path: root/src/include/mb
* Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions.  (Tom Lane, 2015-05-14, 1 file, -11/+30)

  Until now, these functions have only supported encoding conversions using lookup tables, which is fine as long as there are not too many code points to convert. However, GB18030 expects all 1.1 million Unicode code points to be convertible, which would require a ridiculously-sized lookup table. Fortunately, a large fraction of those conversions can be expressed through arithmetic, i.e. the conversions are one-to-one in certain defined ranges. To support that, provide a callback function that is used after consulting the lookup tables. (This patch doesn't actually change anything about the GB18030 conversion behavior; it just provides infrastructure for fixing it.)

  Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway, take the opportunity to rearrange their argument lists into what seems to me a saner order. And beautify the call sites by using lengthof() instead of error-prone sizeof() arithmetic.

  In passing, also mark all the lookup tables used by these calls "const". This moves an impressive amount of stuff into the text segment, at least on my machine, and is safer anyhow.

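  For illustration, a minimal standalone sketch of the callback idea described above (this is not PostgreSQL's code; the range, the offset, and names such as demo_unicode_to_local are invented):

  #include <stdint.h>
  #include <stdio.h>

  typedef uint32_t (*conv_callback) (uint32_t code);

  /* Hypothetical linear range: code points in [0x9000, 0x9FFF] map by a fixed offset. */
  static uint32_t
  demo_unicode_to_local(uint32_t code)
  {
      if (code >= 0x9000 && code <= 0x9FFF)
          return code + 0x1000;       /* arithmetic mapping for this range */
      return 0;                       /* 0 = no algorithmic mapping; caller reports an error */
  }

  static uint32_t
  demo_convert(uint32_t code, conv_callback fallback)
  {
      /* A real implementation would binary-search its lookup tables first. */
      uint32_t    result = 0;         /* pretend the table lookup missed */

      if (result == 0 && fallback != NULL)
          result = fallback(code);    /* consult the callback only after the tables */
      return result;
  }

  int
  main(void)
  {
      printf("0x%X -> 0x%X\n", 0x9123, demo_convert(0x9123, demo_unicode_to_local));
      return 0;
  }
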
* Fix various typos and grammar errors in comments.  (Andres Freund, 2015-04-26, 1 file, -1/+1)

  Author: Dmitriy Olshevskiy
  Discussion: 553D00A6.4090205@bk.ru

* Tweak __attribute__-wrapping macros for better pgindent results.  (Tom Lane, 2015-03-26, 1 file, -2/+2)

  This improves on commit bbfd7edae5aa5ad5553d3c7e102f2e450d4380d4 by making two simple changes:

  * pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn(). Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces pgindent's tendency to misformat declarations involving them.

  * attributes are now always attached to function declarations, not definitions. Previously some places were taking creative shortcuts, which were not merely candidates for bad misformatting by pgindent but often were outright wrong anyway. (It does little good to put a noreturn annotation where callers can't see it.) In any case, if we would like to believe that these macros can be used with non-gcc compilers, we should avoid gratuitous variance in usage patterns.

  I also went through and manually improved the formatting of a lot of declarations, and got rid of excessively repetitive (and now obsolete anyway) comments informing the reader what pg_attribute_printf is for.

* Add macros wrapping all usage of gcc's __attribute__.  (Andres Freund, 2015-03-11, 1 file, -2/+2)

  Until now __attribute__() was defined to be empty for all compilers but gcc. That's problematic because it prevents using it in other compilers, which is necessary e.g. for atomics portability. It's also just generally dubious to do so in a header as widely included as c.h.

  Instead add pg_attribute_format_arg, pg_attribute_printf, pg_attribute_noreturn macros which are implemented in the compilers that understand them. Also add pg_attribute_noreturn and pg_attribute_packed, but don't provide fallbacks, since they can affect functionality.

  This means that external code that, possibly unwittingly, relied on __attribute__ being defined to be empty on !gcc compilers may now run into warnings or errors on those compilers. But there shouldn't be many occurrences of that, and it's hard to work around...

  Discussion: 54B58BA3.8040302@ohmu.fi
  Author: Oskari Saarenmaa, with some minor changes by me.

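  A minimal sketch of the wrapping pattern described above (simplified stand-ins only; the real definitions in c.h are gated on configure results rather than on __GNUC__ alone, and demo_bail_out is an invented example):

  #include <stdio.h>
  #include <stdlib.h>

  /* Expand to the GCC attribute where the compiler understands it, and to nothing otherwise. */
  #if defined(__GNUC__)
  #define pg_attribute_printf(fmt, args) __attribute__((format(printf, fmt, args)))
  #define pg_attribute_noreturn() __attribute__((noreturn))
  #else
  #define pg_attribute_printf(fmt, args)
  #define pg_attribute_noreturn()
  #endif

  /* The attributes go on the declaration, where callers can see them. */
  extern void demo_bail_out(const char *fmt, ...) pg_attribute_printf(1, 2) pg_attribute_noreturn();

  void
  demo_bail_out(const char *fmt, ...)
  {
      /* A real implementation would format and log the message here. */
      (void) fmt;
      exit(1);
  }

  int
  main(void)
  {
      printf("attribute wrappers defined\n");
      return 0;
  }
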
* Update copyright for 2015  (Bruce Momjian, 2015-01-06, 1 file, -1/+1)

  Backpatch certain files through 9.0

* pgindent run for 9.4  (Bruce Momjian, 2014-05-06, 1 file, -5/+5)

  This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not affect backpatching.

* Fix typo  (Peter Eisentraut, 2014-02-13, 1 file, -1/+1)

  Stefan Kaltenbrunner

* Make various variables const (read-only).  (Tom Lane, 2014-01-18, 1 file, -19/+7)

  These changes should generally improve correctness/maintainability. A nice side benefit is that several kilobytes move from initialized data to text segment, allowing them to be shared across processes and probably reducing copy-on-write overhead while forking a new backend. Unfortunately this doesn't seem to help libpq in the same way (at least not when it's compiled with -fpic on x86_64), but we can hope the linker at least collects all nominally-const data together even if it's not actually part of the text segment.

  Also, make pg_encname_tbl[] static in encnames.c, since there seems no very good reason for any other code to use it; per a suggestion from Wim Lewis, who independently submitted a patch that was mostly a subset of this one.

  Oskari Saarenmaa, with some editorialization by me

* Update copyright for 2014  (Bruce Momjian, 2014-01-07, 1 file, -1/+1)

  Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.

* Renovate display of non-ASCII messages on Windows.  (Noah Misch, 2013-06-26, 1 file, -3/+7)

  GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it. Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters.

  elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8.

  Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles.

  Alexander Law and Noah Misch

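  A rough sketch of the gettext mechanism this relies on (illustrative only; the domain name, catalog path, and codeset are placeholders, and PostgreSQL routes the equivalent calls through its own encoding tables):

  #include <libintl.h>
  #include <locale.h>
  #include <stdio.h>

  int
  main(void)
  {
      setlocale(LC_ALL, "");

      /* Select the message catalog, then pin its output codeset explicitly
         instead of letting gettext infer it from the Windows ANSI code page. */
      bindtextdomain("demo-domain", "/usr/share/locale");
      bind_textdomain_codeset("demo-domain", "UTF-8");
      textdomain("demo-domain");

      printf("%s\n", gettext("hello"));
      return 0;
  }
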
* pgindent run for release 9.3  (Bruce Momjian, 2013-05-29, 1 file, -49/+50)

  This is the first run of the Perl-based pgindent script. Also update pgindent instructions.

* Support indexing of regular-expression searches in contrib/pg_trgm.  (Tom Lane, 2013-04-09, 1 file, -0/+5)

  This works by extracting trigrams from the given regular expression, in generally the same spirit as the previously-existing support for LIKE searches, though of course the details are far more complicated.

  Currently, only GIN indexes are supported. We might be able to make it work with GiST indexes later.

  The implementation includes adding API functions to backend/regex/ to provide a view of the search NFA created from a regular expression. These functions are meant to be generic enough to be supportable in a standalone version of the regex library, should that ever happen.

  Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane

* Add noreturn attributes to some error reporting functions  (Peter Eisentraut, 2013-02-12, 1 file, -2/+2)

* Update copyrights for 2013  (Bruce Momjian, 2013-01-01, 1 file, -1/+1)

  Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.

* Fix bogus macro definition.  (Tom Lane, 2012-07-10, 1 file, -1/+1)

  Per buildfarm complaints.

* Add comments about additional mule-internal charsets from emacs's source code (lisp/international/mule-conf.el).  (Tatsuo Ishii, 2012-07-11, 1 file, -3/+11)

  These charsets have not been supported up to now anyway, so this is just for adding commentary. Also add mention that we follow emacs's implementation, not xemacs's.

* Add wchar -> mb conversion routines.  (Robert Haas, 2012-07-04, 1 file, -3/+21)

  This is infrastructure for Alexander Korotkov's work on indexing regular expression searches.

  Alexander Korotkov, with a bit of further hackery on the MULE conversion by me

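  A hedged, backend-only sketch of how the new direction might be used (demo_roundtrip and the buffer sizing are invented; the function names follow the commit subject and may not match the final declarations exactly):

  #include "postgres.h"
  #include "mb/pg_wchar.h"

  static void
  demo_roundtrip(const char *s, int len)
  {
      pg_wchar   *wide = palloc((len + 1) * sizeof(pg_wchar));
      char       *back = palloc(len * MAX_MULTIBYTE_CHAR_LEN + 1);
      int         nchars;
      int         nbytes;

      nchars = pg_mb2wchar_with_len(s, wide, len);        /* mb -> wchar (existing) */
      nbytes = pg_wchar2mb_with_len(wide, back, nchars);  /* wchar -> mb (new) */
      back[nbytes] = '\0';

      elog(DEBUG1, "round-tripped %d chars, %d bytes: %s", nchars, nbytes, back);

      pfree(wide);
      pfree(back);
  }
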
* Improve documentation about MULE encoding.  (Tom Lane, 2012-07-04, 1 file, -49/+96)

  This commit improves the comments in pg_wchar.h and creates #define symbols for some formerly hard-coded values. No substantive code changes.

  Tatsuo Ishii and Tom Lane

* Update copyright notices for year 2012.  (Bruce Momjian, 2012-01-01, 1 file, -1/+1)

* Improve make_greater_string() with encoding-specific incrementers.  (Robert Haas, 2011-10-29, 1 file, -0/+3)

  This infrastructure doesn't in any way guarantee that the character we produce will sort before the one we incremented; but it does at least make it much more likely that we'll end up with something that is a valid character, which improves our chances.

  Kyotaro Horiguchi, with various adjustments by me.

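  An illustrative-only sketch of the incrementer idea (single-byte ASCII case; the type and function names here are invented, and the real encoding-specific incrementers are considerably more careful):

  #include <stdbool.h>
  #include <stdio.h>

  /* Try to bump the last character of a prefix to the next valid character. */
  typedef bool (*character_incrementer) (unsigned char *charptr, int len);

  static bool
  demo_ascii_increment(unsigned char *charptr, int len)
  {
      (void) len;                 /* single-byte encoding: len is always 1 */
      if (*charptr >= 0x7e)
          return false;           /* no printable successor available */
      (*charptr)++;
      return true;
  }

  int
  main(void)
  {
      unsigned char c = 'a';
      character_incrementer inc = demo_ascii_increment;

      if (inc(&c, 1))
          printf("'a' incremented to '%c'\n", c);
      return 0;
  }
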
* Fix char2wchar/wchar2char to support collations properly.  (Tom Lane, 2011-04-23, 1 file, -7/+0)

  These functions should take a pg_locale_t, not a collation OID, and should call mbstowcs_l/wcstombs_l where available. Where those functions are not available, temporarily select the correct locale with uselocale().

  This change removes the bogus assumption that all locales selectable in a given database have the same wide-character conversion method; in particular, the collate.linux.utf8 regression test now passes with LC_CTYPE=C, so long as the database encoding is UTF8.

  I decided to move the char2wchar/wchar2char functions out of mbutils.c and into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus don't really belong with the mbutils.c functions. Keeping them where they were would have required importing pg_locale_t into pg_wchar.h somehow, which did not seem like a good plan.

* Revise the API for GUC variable assign hooks.  (Tom Lane, 2011-04-07, 1 file, -1/+2)

  The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment.

  And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead).

  This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".

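  A hedged sketch of the split hook API from an extension's point of view (the GUC name demo.work_units and the hook bodies are invented; this is backend-extension code, compiled against the server headers):

  #include "postgres.h"
  #include "fmgr.h"
  #include "utils/guc.h"

  PG_MODULE_MAGIC;

  static int  demo_work_units = 8;

  /* Check hook: may reject or canonicalize the proposed value. */
  static bool
  check_work_units(int *newval, void **extra, GucSource source)
  {
      if (*newval % 2 != 0)
      {
          GUC_check_errdetail("demo.work_units must be even.");
          return false;           /* rejected; no assignment happens */
      }
      return true;
  }

  /* Assign hook: must not fail; it just reacts to the already-validated value. */
  static void
  assign_work_units(int newval, void *extra)
  {
      elog(DEBUG1, "demo.work_units set to %d", newval);
  }

  void
  _PG_init(void)
  {
      DefineCustomIntVariable("demo.work_units",
                              "Number of demo work units.",
                              NULL,
                              &demo_work_units,
                              8, 1, 1024,
                              PGC_USERSET, 0,
                              check_work_units, assign_work_units, NULL);
  }
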
* Add ENCODING option to COPY TO/FROM and file_fdw.  (Itagaki Takahiro, 2011-02-21, 1 file, -0/+2)

  File encodings can be specified separately from client encoding. If not specified, client encoding is used for backward compatibility. Cases when the encoding doesn't match client encoding are slower than matched cases because we don't have conversion procs for other encodings. Performance improvement would be future work.

  Original patch by Hitoshi Harada, and modified by me.

* Per-column collation support  (Peter Eisentraut, 2011-02-08, 1 file, -2/+2)

  This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support.

  Peter Eisentraut, reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch

* Stamp copyrights for year 2011.  (Bruce Momjian, 2011-01-01, 1 file, -1/+1)

* Remove cvs keywords from all files.  (Magnus Hagander, 2010-09-20, 1 file, -1/+1)

* Rename utf2ucs() to utf8_to_unicode(), and export it so it can be used elsewhere.  (Tom Lane, 2010-08-18, 1 file, -1/+2)

  Similarly rename the version in mbprint.c, not because this affects anything but just to keep the two copies in exact sync. There was some discussion of having only one copy in src/port/ instead, but this function is so small and unlikely to change that that seems like overkill.

  Slightly editorialized version of a patch by Joseph Adams. (The bug-fix aspect of his patch was applied separately, and back-patched.)

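  A standalone illustration of what a UTF-8 to code point helper does (same idea as utf8_to_unicode(), but this is not PostgreSQL's implementation and does no validation beyond the first byte):

  #include <stdint.h>
  #include <stdio.h>

  static uint32_t
  demo_utf8_to_codepoint(const unsigned char *c)
  {
      if ((*c & 0x80) == 0)               /* 1-byte sequence */
          return *c;
      else if ((*c & 0xe0) == 0xc0)       /* 2-byte sequence */
          return ((c[0] & 0x1f) << 6) | (c[1] & 0x3f);
      else if ((*c & 0xf0) == 0xe0)       /* 3-byte sequence */
          return ((c[0] & 0x0f) << 12) | ((c[1] & 0x3f) << 6) | (c[2] & 0x3f);
      else if ((*c & 0xf8) == 0xf0)       /* 4-byte sequence */
          return ((c[0] & 0x07) << 18) | ((c[1] & 0x3f) << 12) |
                 ((c[2] & 0x3f) << 6) | (c[3] & 0x3f);
      return 0;                           /* invalid first byte */
  }

  int
  main(void)
  {
      const unsigned char euro[] = {0xe2, 0x82, 0xac};    /* U+20AC */

      printf("U+%04X\n", demo_utf8_to_codepoint(euro));
      return 0;
  }
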
* pgindent run for 9.0  (Bruce Momjian, 2010-02-26, 1 file, -2/+2)

* Update copyright for the year 2010.  (Bruce Momjian, 2010-01-02, 1 file, -2/+2)

* Write to the Windows eventlog in UTF16, converting the message encoding as necessary.  (Magnus Hagander, 2009-10-17, 1 file, -1/+9)

  Itagaki Takahiro with some changes from me

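  A Windows-only sketch of the conversion step involved (illustrative; the helper name and buffer size are invented, and real code would pass the result on to wide-character APIs such as ReportEventW):

  #include <windows.h>
  #include <wchar.h>

  /* Convert a UTF-8 message to UTF-16; returns WCHARs written, or 0 on failure. */
  static int
  demo_message_to_utf16(const char *msg_utf8, WCHAR *buf, int buflen)
  {
      return MultiByteToWideChar(CP_UTF8, 0, msg_utf8, -1, buf, buflen);
  }

  int
  main(void)
  {
      WCHAR   wbuf[256];

      if (demo_message_to_utf16("caf\xC3\xA9 message", wbuf, 256))
          wprintf(L"%ls\n", wbuf);
      return 0;
  }
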
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.  (Bruce Momjian, 2009-06-11, 1 file, -7/+7)

* Move gettext encoding names into encnames.c, so we only have one place to update.  (Magnus Hagander, 2009-04-24, 1 file, -1/+12)

  Per discussion.

* Tell gettext which codeset to use by calling bind_textdomain_codeset().  (Heikki Linnakangas, 2009-04-08, 1 file, -2/+2)

  We already did that on Windows, but it's needed on other platforms too when LC_CTYPE=C. With other locales, we enforce (or trust) that the codeset of the locale matches the server encoding so we don't need to bind it explicitly. It should do no harm in that case either, but I don't have full faith in the PG encoding -> OS codeset mapping table yet.

  Per recent discussion on pgsql-hackers.

* Fix SetClientEncoding() to maintain a cache of previously selected encoding conversion functions.  (Tom Lane, 2009-04-02, 1 file, -2/+1)

  This allows transaction rollback to revert to a previous client_encoding setting without doing fresh catalog lookups. I believe that this explains and fixes the recent report of "failed to commit client_encoding" failures.

  This bug is present in 8.3.x, but it doesn't seem prudent to back-patch the fix, at least not till it's had some time for field testing in HEAD.

  In passing, remove SetDefaultClientEncoding(), which was used nowhere.

* Revert pg_bind_textdomain_codeset to an existent-but-empty function when ENABLE_NLS is not defined, for better compatibility of the backend with modules compiled the other way.  (Alvaro Herrera, 2009-03-09, 1 file, -3/+1)

  Per note from Tom after my previous commit.

* pg_bind_textdomain_codeset must exist only on ENABLE_NLS.  (Alvaro Herrera, 2009-03-08, 1 file, -1/+3)

* On Windows, call bind_textdomain_codeset on domains other than the default one, too, so that the codeset is properly mapped on the newly added PL domains.  (Alvaro Herrera, 2009-03-08, 1 file, -1/+2)

* Support for KOI8U encoding  (Peter Eisentraut, 2009-02-10, 1 file, -2/+3)

* Replace argument-checking Asserts with regular test-and-elog checks in all encoding conversion functions.  (Tom Lane, 2009-01-29, 1 file, -1/+20)

  These are not can't-happen cases because it's possible to create a conversion with the wrong conversion function for the specified encoding pair. That would lead to an Assert crash in an Assert-enabled build, or incorrect conversion otherwise, neither of which is desirable. This would be a DOS issue if production databases were customarily built with asserts enabled, but fortunately that's not so.

  Per an observation by Heikki.

  Back-patch to all supported branches.

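  A hedged, backend-only sketch of the pattern (the function name is invented and the real checks may read differently): a plain test plus ereport() fires in every build, unlike an Assert.

  #include "postgres.h"
  #include "fmgr.h"
  #include "mb/pg_wchar.h"

  PG_FUNCTION_INFO_V1(demo_utf8_to_latin1);

  Datum
  demo_utf8_to_latin1(PG_FUNCTION_ARGS)
  {
      int         src_encoding = PG_GETARG_INT32(0);
      int         dest_encoding = PG_GETARG_INT32(1);

      if (src_encoding != PG_UTF8 || dest_encoding != PG_LATIN1)
          ereport(ERROR,
                  (errcode(ERRCODE_INTERNAL_ERROR),
                   errmsg("unexpected encoding pair %d -> %d in conversion function",
                          src_encoding, dest_encoding)));

      /* ... the actual conversion work would go here ... */
      PG_RETURN_VOID();
  }
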
* Add a pg_encoding_mbcliplen() function that is just like pg_mbcliplen() except the caller can specify the encoding to work in; this will be needed for pg_stat_statements.  (Tom Lane, 2009-01-04, 1 file, -1/+3)

  In passing, do some marginal efficiency hacking and clean up some comments. Also, prevent the single-byte-encoding code path from fetching one byte past the stated length of the string (this last is a bug that might need to be back-patched at some point).

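  A hedged usage sketch (backend-only; demo_clip and the byte budget are invented, and the signature shown reflects a plausible reading of the commit message rather than a verified declaration):

  #include "postgres.h"
  #include "mb/pg_wchar.h"

  /* Keep at most byte_budget bytes of query without splitting a multibyte character. */
  static int
  demo_clip(const char *query, int query_len, int encoding, int byte_budget)
  {
      int     cliplen = pg_encoding_mbcliplen(encoding, query, query_len, byte_budget);

      elog(DEBUG1, "keeping %d of %d bytes", cliplen, query_len);
      return cliplen;
  }
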
* Update copyright for 2009.  (Bruce Momjian, 2009-01-01, 1 file, -2/+2)

* Unicode escapes in strings and identifiers  (Peter Eisentraut, 2008-10-29, 1 file, -1/+2)

* Move wchar2char() and char2wchar() from tsearch into /mb to be easier to use for other modules; also move pnstrdup().  (Bruce Momjian, 2008-06-18, 1 file, -1/+6)

  Clean up code slightly.

* Update copyrights in source tree to 2008.  (Bruce Momjian, 2008-01-01, 1 file, -2/+2)

* Re-run pgindent with updated list of typedefs.  (Bruce Momjian, 2007-11-15, 1 file, -5/+5)

  (Updated README should avoid this problem in the future.)

* pgindent run for 8.3.  (Bruce Momjian, 2007-11-15, 1 file, -22/+22)

* Fix pg_wchar_table[] to match revised ordering of the encoding ID enum.  (Tom Lane, 2007-10-15, 1 file, -3/+4)

  Add some comments so hopefully the next poor sod doesn't fall into the same trap. (Wrong comments are worse than none at all...)

* Fix the inadvertent libpq ABI breakage discovered by Martin Pitt: the renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2 initdb and psql if they are run with an 8.3beta1 libpq.so.  (Tom Lane, 2007-10-13, 1 file, -19/+43)

  For the moment we can rearrange the order of enum pg_enc to keep the same number for everything except PG_JOHAB, which isn't a problem since there are no direct references to it in the 8.2 programs anyway. (This does force initdb unfortunately.)

  Going forward, we want to fix things so that encoding IDs can be changed without an ABI break, and this commit includes the changes needed to allow libpq's encoding IDs to be treated as fully independent of the backend's. The main issue is that libpq clients should not include pg_wchar.h or otherwise assume they know the specific values of libpq's encoding IDs, since they might encounter version skew between pg_wchar.h and the libpq.so they are using. To fix, have libpq officially export functions needed for encoding name<=>ID conversion and validity checking; it was doing this anyway unofficially.

  It's still the case that we can't renumber backend encoding IDs until the next bump in libpq's major version number, since doing so will break the 8.2-era client programs. However the code is now prepared to avoid this type of problem in future.

  Note that initdb is no longer a libpq client: we just pull in the two source files we need directly. The patch also fixes a few places that were being sloppy about checking for an unrecognized encoding name.

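  A hedged client-side sketch of the officially exported name<=>ID helpers mentioned above (the exact declarations live in libpq-fe.h; note the IDs returned are libpq's own, not necessarily the backend's):

  #include <stdio.h>
  #include <libpq-fe.h>

  int
  main(void)
  {
      int     id = pg_char_to_encoding("UTF8");   /* name -> libpq encoding ID */

      if (id < 0)
      {
          fprintf(stderr, "unrecognized encoding name\n");
          return 1;
      }
      printf("UTF8 -> id %d -> name %s (valid server encoding: %s)\n",
             id,
             pg_encoding_to_char(id),             /* ID -> name */
             pg_valid_server_encoding_id(id) ? "yes" : "no");
      return 0;
  }
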
* Close previously open holes for invalidly encoded data to enter the database via builtin functions, as recently discussed on -hackers.  (Andrew Dunstan, 2007-09-18, 1 file, -1/+3)

  chr() now returns a character in the database encoding. For UTF8 encoded databases the argument is treated as a Unicode code point. For other multi-byte encodings the argument must designate a strict ascii character, or an error is raised, as is also the case if the argument is 0. ascii() is adjusted so that it remains the inverse of chr().

  The two argument form of convert() is gone, and the three argument form now takes a bytea first argument and returns a bytea. To cover this loss three new functions are introduced:

  . convert_from(bytea, name) returns text - converts the first argument from the named encoding to the database encoding
  . convert_to(text, name) returns bytea - converts the first argument from the database encoding to the named encoding
  . length(bytea, name) returns int - gives the length of the first argument in characters in the named encoding

* Make JOHAB client only encoding per discussions in pgsql-hackers "Server-side support of all encodings" around 2007/3/26.  (Tatsuo Ishii, 2007-04-15, 1 file, -2/+2)

  initdb required.