path: root/ext/mbstring/php_unicode.c
Commit message | Author | Age | Files | Lines
* Don't guard mbstring code with #ifdef HAVE_MBSTRING | Alex Dowad | 2020-08-31 | 1 | -5/+0
|   This is just a very silly feature of mbstring -- you can compile the source files with HAVE_MBSTRING undefined, and it will all just compile to (almost) nothing. What is the use of this? Why compile the source files and link against them if you don't want the mbstring extension? It doesn't make any kind of sense.
* Remove redundant includes from mbstring (and make sure correct config.h is used) | Alex Dowad | 2020-08-31 | 1 | -5/+0
|   Very interesting... it turns out that when Valgrind support was enabled, `#include "config.h"` from within mbstring was actually including the file "config.h" from Valgrind, and not the one from mbstring!! This is because -I/usr/include/valgrind was added to the compiler invocation _before_ -Iext/mbstring/libmbfl. Make sure we actually include the file which was intended.
* Optimize php_unicode_convert_case (cuts mbstring case conversion time ~15%) | Alex Dowad | 2020-08-31 | 1 | -44/+47
|   This function uses various subfunctions to convert case of Unicode wchars. Previously, these subfunctions would store the case-converted characters in a buffer, and the parent function would then pass them (byte by byte) to the next filter in the filter chain. Rather than passing around that buffer, it's better for the subfunctions to directly pass the case-converted bytes to the next filter in the filter chain. This speeds things up nicely.
* Fix [-Wundef] warning in MBString extension | George Peter Banyard | 2020-05-16 | 1 | -1/+1
|
* Fix #79371: mb_strtolower (UTF-32LE): stack-buffer-overflow | Christoph M. Becker | 2020-03-16 | 1 | -1/+1
|   We make sure that negative values are properly compared.
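A minimal sketch of the kind of comparison problem this message describes. It is not taken from the actual patch: the table, its size, and the function names are hypothetical. The assumption is that a signed code value is range-checked before indexing a table, and that a negative value passes a signed `<` check but fails once cast to unsigned.

```c
#define TABLE_SIZE 4u  /* hypothetical table size, for illustration only */

static const int table[TABLE_SIZE] = { 10, 20, 30, 40 };

/* Signed check: a negative c is "less than" the bound and slips through. */
static int in_range_signed(int c)
{
    return c < (int) TABLE_SIZE;
}

/* Unsigned check: a negative c converts to a huge value and is rejected. */
static int in_range_unsigned(int c)
{
    return (unsigned int) c < TABLE_SIZE;
}

/* Only index the table after the unsigned range check has passed. */
static int lookup(int c, int fallback)
{
    return in_range_unsigned(c) ? table[c] : fallback;
}
```

With the signed check, `c = -1` would be accepted and `table[-1]` would read outside the array; the unsigned cast folds both the "too large" and the "negative" case into a single comparison.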
* Remove mention of PHP major version in Copyright headers | Gabriel Caruso | 2019-09-25 | 1 | -2/+0
|   Closes GH-4732.
* Use EMPTY_SWITCH_DEFAULT_CASE in php_unicode.c | Nikita Popov | 2019-04-12 | 1 | -3/+1
|   Avoids a potentially uninitialized variable warning.
* Remove local variables | Peter Kokot | 2019-02-03 | 1 | -9/+0
|   This patch removes the so-called local variables defined on a per-file basis for certain editors to properly show tab width and similar settings. These are mainly used by the Vim and Emacs editors, yet with recent changes the once-working definitions don't work anymore in Vim without custom plugins or additional configuration. Neither are these settings synced across the PHP code base.
|
|   A simpler and better approach is EditorConfig, together with fixing code using some code style fixing tools in the future.
|
|   This patch also removes the so-called modelines for Vim. Modelines allow the Vim editor specifically to set editor configuration such as syntax highlighting, indentation style and tab width in the first line or the last 5 lines of each file. Since the PHP test files already have syntax highlighting set properly in most editors and EditorConfig takes care of the indentation settings, this patch removes these as well for Vim 6.0 and newer versions.
|
|   With the removal of local variables for certain editors such as Emacs and Vim, the footer is also probably not needed anymore when creating extensions using the ext_skel.php script.
|
|   Additionally, Vim modelines for setting PHP syntax and some editor settings have been removed from some *.phpt files. These are mostly not relevant for phpt files, nor do they work properly in the middle of the file.
* Remove yearly range from copyright notice | Zeev Suraski | 2019-01-30 | 1 | -1/+1
|
* Fixed bug #76319 | Nikita Popov | 2018-05-25 | 1 | -1/+14
|   While at it, also make sure that mbstring case conversion takes into account the specified substitution character and substitution mode.
* year++ | Xinchen Hui | 2018-01-02 | 1 | -1/+1
|
* fix c89 compat | Anatol Belski | 2017-07-28 | 1 | -2/+2
|
* Fixed bug #65544 and #71298 | Nikita Popov | 2017-07-28 | 1 | -20/+12
|
* Implement full case mapping | Nikita Popov | 2017-07-28 | 1 | -18/+139
|   Implement full case mapping according to SpecialCasing.txt and also full case folding according to CaseFolding.txt (F). There are a number of caveats:
|
|   * Only language-agnostic and unconditional full case mapping is implemented. The only language-agnostic conditional case mapping rule relates to Greek sigma in final position (Final_Sigma). Correctly handling this requires both arbitrary lookahead and lookbehind, which would require some larger changes to how the case mapping is implemented. This is a possible future extension.
|   * The only language-specific handling that is implemented is for Turkish dotted/undotted Is, if the ISO-8859-9 encoding is used. This matches the previous behavior and makes sure that no codepoints not supported by the encoding are produced. A future extension would be to also handle the Turkish mappings specified by SpecialCasing.txt based on the mbfl internal language.
|   * Full case folding is implemented, but case-insensitive mb_* operations continue to use simple case folding. The reason is that full case folding of the haystack string may change the position at which a match occurred. This would have to be mapped back into the position in the original string.
|   * mb_convert_case() exposes both the full and the simple case mapping / folding, where full is the default. The constants are:
|     * MB_CASE_LOWER (used by mb_strtolower)
|     * MB_CASE_UPPER (used by mb_strtoupper)
|     * MB_CASE_TITLE
|     * MB_CASE_FOLD
|     * MB_CASE_LOWER_SIMPLE
|     * MB_CASE_UPPER_SIMPLE
|     * MB_CASE_TITLE_SIMPLE
|     * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
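To make the difference between simple and full case mapping concrete, here is a small sketch in C. It is not the mbstring implementation: the function names and the `MAX_MAPPED` constant are hypothetical, only ASCII plus two entries from SpecialCasing.txt are covered, and the point is only that a full mapping may expand one code point into several while a simple mapping always yields exactly one.

```c
#include <stddef.h>

#define MAX_MAPPED 3  /* hypothetical upper bound on output code points */

/* Simple uppercase mapping: always exactly one output code point.
 * Only ASCII letters are handled in this sketch. */
static size_t upper_simple(unsigned int cp, unsigned int out[MAX_MAPPED])
{
    out[0] = (cp >= 0x61u && cp <= 0x7Au) ? cp - 0x20u : cp;
    return 1;
}

/* Full uppercase mapping: may expand to several code points. */
static size_t upper_full(unsigned int cp, unsigned int out[MAX_MAPPED])
{
    switch (cp) {
    case 0x00DFu:               /* LATIN SMALL LETTER SHARP S -> "SS" */
        out[0] = 0x53u; out[1] = 0x53u;
        return 2;
    case 0xFB01u:               /* LATIN SMALL LIGATURE FI -> "FI" */
        out[0] = 0x46u; out[1] = 0x49u;
        return 2;
    default:                    /* everything else: fall back to simple */
        return upper_simple(cp, out);
    }
}
```

Because the output can be longer than the input, character offsets in the converted string no longer line up with the original, which is the reason the message gives for keeping case-insensitive operations on simple folding.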
* Use case-folding for case insensitive comparisons | Nikita Popov | 2017-07-28 | 1 | -0/+24
|   Instead of using lowercasing.
* Use MPH for case maps | Nikita Popov | 2017-07-28 | 1 | -32/+30
|   Instead of performing a binary search, use a hashtable to store the case maps. In particular a minimal perfect hash construction is used, which does not require collision resolution (but does use an auxiliary table for the hash perturbation).
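The idea of a minimal perfect hash with an auxiliary seed table can be sketched as follows. This is an illustration under stated assumptions, not the code or the tables used by mbstring: the six-entry key set, the mixing function, the table sizes, and all names are hypothetical, and the tables are built at startup by brute force instead of being generated offline.

```c
#include <stdint.h>
#include <string.h>

#define NKEYS    6  /* exactly one slot per key, hence "minimal" */
#define NBUCKETS 3  /* size of the auxiliary seed table */

/* Hypothetical uppercase -> lowercase pairs used as sample data. */
static const uint32_t keys[NKEYS] = { 0x41, 0x42, 0xC4, 0x391, 0x410, 0x10400 };
static const uint32_t vals[NKEYS] = { 0x61, 0x62, 0xE4, 0x3B1, 0x430, 0x10428 };

static int16_t  seed[NBUCKETS];  /* 0: empty, >0: hash seed, <0: -(slot + 1) */
static uint32_t slot_key[NKEYS], slot_val[NKEYS];

/* Seeded integer mixer (an arbitrary choice for this sketch). */
static uint32_t mix(uint32_t d, uint32_t x)
{
    uint32_t h = d ^ x;
    h = ((h >> 16) ^ h) * 0x45d9f3bu;
    return (h >> 16) ^ h;
}

static void place(int used[NKEYS], uint32_t s, int i)
{
    used[s] = 1;
    slot_key[s] = keys[i];
    slot_val[s] = vals[i];
}

/* Build the tables; returns 0 on success, -1 if no seed was found. */
static int mph_build(void)
{
    int used[NKEYS] = { 0 };
    memset(seed, 0, sizeof seed);
    for (int want = NKEYS; want >= 1; want--) {      /* largest buckets first */
        for (uint32_t b = 0; b < NBUCKETS; b++) {
            int member[NKEYS], n = 0, d;
            for (int i = 0; i < NKEYS; i++)
                if (mix(0, keys[i]) % NBUCKETS == b) member[n++] = i;
            if (n != want) continue;
            if (n == 1) {                            /* any free slot will do */
                uint32_t s = 0;
                while (used[s]) s++;
                place(used, s, member[0]);
                seed[b] = (int16_t) -(int) (s + 1);
                continue;
            }
            for (d = 1; d < INT16_MAX; d++) {        /* search a collision-free seed */
                int trial[NKEYS], ok = 1;
                memcpy(trial, used, sizeof trial);
                for (int j = 0; j < n && ok; j++) {
                    uint32_t s = mix((uint32_t) d, keys[member[j]]) % NKEYS;
                    ok = !trial[s];
                    trial[s] = 1;
                }
                if (ok) break;
            }
            if (d == INT16_MAX) return -1;
            for (int j = 0; j < n; j++)
                place(used, mix((uint32_t) d, keys[member[j]]) % NKEYS, member[j]);
            seed[b] = (int16_t) d;
        }
    }
    return 0;
}

/* Lookup: one probe in the seed table, one in the slot table, no collisions. */
static int mph_lookup(uint32_t code, uint32_t *out)
{
    int16_t d = seed[mix(0, code) % NBUCKETS];
    uint32_t s;
    if (d == 0) return 0;
    s = d < 0 ? (uint32_t) (-d - 1) : mix((uint32_t) d, code) % NKEYS;
    if (slot_key[s] != code) return 0;
    *out = slot_val[s];
    return 1;
}
```

After a successful `mph_build()`, every key resolves with a constant number of probes and a single key comparison, in contrast to the logarithmic number of comparisons a binary search needs.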
* Change layout of case mapping table | Nikita Popov | 2017-07-23 | 1 | -87/+27
|   Previously the case mapping table was segregated by the type of the character (upper, lower, title) and always stored the other two variants (key, other1, other2). Now the table is segregated by the target type (key, other). As only very few characters have more than one target this only slightly increases the size of the table.
|
|   The advantage of this layout is that we only need to perform a single table lookup in the case table. Previously, depending on the case that was hit, either one lookup in the property table, or two lookups in the property table and one lookup in the case table were required.
|
|   This changes the layout from libunicode in the OpenLDAP project -- however, the last commit there was over 10 years ago, so I don't see value in keeping this in sync.
* Merge branch 'PHP-7.2' | Nikita Popov | 2017-07-23 | 1 | -14/+15
|\
| * Another fix for bug #69267 | Nikita Popov | 2017-07-23 | 1 | -1/+1
| |   mb_strtoupper() was converting lowercase characters into titlecase characters, instead of uppercase characters. Luckily there are only very few characters with a distinct titlecase representation, so this mostly worked out okay...
| * Partial fix for bug #69267 | Nikita Popov | 2017-07-23 | 1 | -13/+14
| |   This pulls in 60a25c72ba389f53b0621ca250bc99f3b295d43f from the OpenLDAP project.
* | Directly use encodings instead of no_encoding in libmbfl | Nikita Popov | 2017-07-20 | 1 | -5/+5
| |   In particular strings now store encoding rather than the no_encoding. I've also pruned out libmbfl APIs that existed in two forms, one using no_encoding and the other using encoding. We were not actually using any of the former.
* | Reduce number of encoding conversions in case conversion | Nikita Popov | 2017-07-20 | 1 | -48/+85
| |   Don't indirect through UCS4BE, instead directly work on wchars using a custom filter. This replaces the pipeline
| |
| |       utf8 -> wchar -> ucs4be -> wchar -case-> wchar -> ucs4be -> wchar -> utf8
| |
| |   with
| |
| |       utf8 -> wchar -case-> wchar -> utf8
* | Optimize php_unicode_tolower/upper for ASCII | Nikita Popov | 2017-07-20 | 1 | -39/+22
| |
* | Directly accept encoding in php_unicode_convert_case() | Nikita Popov | 2017-07-19 | 1 | -11/+4
| |   As a side-effect mb_strtolower() and mb_strtoupper() now correctly handle a NULL encoding parameter by using the internal encoding. This is what caused the two test changes.
* | Optimize php_unicode_is_prop() | Nikita Popov | 2017-07-19 | 1 | -14/+20
| |   Do not try to extract the properties from a bitmask. Instead make the function variadic and pass all properties individually. Also add a php_unicode_is_prop1() function to check only a single property.
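The variadic approach described above might look roughly like the following sketch. The names, the property set, the ASCII-only classification, and the `PROP_END` sentinel are all assumptions made for illustration; the real php_unicode_is_prop() and php_unicode_is_prop1() signatures may differ.

```c
#include <stdarg.h>

/* Hypothetical property identifiers; PROP_END terminates the argument list. */
enum { PROP_END = -1, PROP_UPPER = 0, PROP_LOWER = 1, PROP_DIGIT = 2 };

/* Check a single property (ASCII-only classification for this sketch). */
static int has_prop1(unsigned int code, int prop)
{
    switch (prop) {
    case PROP_UPPER: return code >= 0x41u && code <= 0x5Au;
    case PROP_LOWER: return code >= 0x61u && code <= 0x7Au;
    case PROP_DIGIT: return code >= 0x30u && code <= 0x39u;
    default:         return 0;
    }
}

/* Return 1 if the code point has any of the listed properties.
 * The list is passed as individual arguments ending with PROP_END. */
static int has_any_prop(unsigned int code, ...)
{
    va_list va;
    int result = 0;

    va_start(va, code);
    for (;;) {
        int prop = va_arg(va, int);
        if (prop < 0) break;                 /* sentinel reached */
        if (has_prop1(code, prop)) { result = 1; break; }
    }
    va_end(va);
    return result;
}
```

A call such as `has_any_prop(c, PROP_UPPER, PROP_LOWER, PROP_END)` tests each property directly, so no bitmask has to be decoded back into individual property numbers, and callers that need only one property can use the single-property function without any variadic overhead.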
* | Avoid unnecessary encoding lookups in mbstring | Nikita Popov | 2017-07-19 | 1 | -10/+15
|/    Extract part of php_mb_convert_encoding that does the actual work and use it whenever we already know the encoding.
* Update copyright headers to 2017 | Sammy Kaye Powers | 2017-01-02 | 1 | -1/+1
|
* Merge branch 'PHP-5.6' into PHP-7.0 | Lior Kaplan | 2016-01-01 | 1 | -1/+1
|\    * PHP-5.6: Happy new year (Update copyright to 2016)
| * Happy new year (Update copyright to 2016) | Lior Kaplan | 2016-01-01 | 1 | -1/+1
| |
| * bump year | Xinchen Hui | 2015-01-15 | 1 | -1/+1
| |
* | bump year | Xinchen Hui | 2015-01-15 | 1 | -1/+1
| |
* | trailing whitespace removal | Stanislav Malyshev | 2015-01-10 | 1 | -9/+9
| |
* | first shot remove TSRMLS_* things | Anatol Belski | 2014-12-13 | 1 | -11/+11
| |
* | s/PHP 5/PHP 7/ | Johannes Schlüter | 2014-09-19 | 1 | -1/+1
|/
* Bump year | Xinchen Hui | 2014-01-03 | 1 | -1/+1
|
* Happy New Year | Xinchen Hui | 2013-01-01 | 1 | -1/+1
|
* - Year++ | Felipe Pena | 2012-01-01 | 1 | -1/+1
|
* - Year++ | Felipe Pena | 2011-01-01 | 1 | -1/+1
|
* sed -i "s#1997-2009#1997-2010#g" **/*.c **/*.h **/*.php | Sebastian Bergmann | 2010-01-03 | 1 | -1/+1
|
* MFH: Bump copyright year, 3 of 3. | Sebastian Bergmann | 2008-12-31 | 1 | -1/+1
|
* Fixed bug #46626 (mb_convert_case does not handle apostrophe correctly) | Ilia Alshanetsky | 2008-11-24 | 1 | -1/+1
|
* - MFH: Fixed warnings. | Moriyoshi Koizumi | 2008-07-24 | 1 | -2/+2
|
* fixed #43998 Two error messages returned for incorrect encoding for mb_strto[upper|lower] | Rui Hirokawa | 2008-02-16 | 1 | -0/+5
|
* MFH: Bump copyright year, 2 of 2. | Sebastian Bergmann | 2007-12-31 | 1 | -1/+1
|
* MFH: fixed bug #29955 invalid case conversion in iso-8859-9. | Rui Hirokawa | 2007-09-04 | 1 | -4/+2
|
* MFH: Bump year. | Sebastian Bergmann | 2007-01-01 | 1 | -1/+1
|
* bump year and license version | foobar | 2006-01-01 | 1 | -3/+3
|
* MFH: fixed #29955 mb_strtoupper() / lower() broken with Turkish encoding.. | Rui Hirokawa | 2005-12-23 | 1 | -8/+39
|
* - Bump up year | foobar | 2005-08-03 | 1 | -1/+1
|
* - A belated happy holidays and PHP 5 | Andi Gutmans | 2004-01-08 | 1 | -2/+2
|