| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
This is just a very silly feature of mbstring -- you can compile the source files with
HAVE_MBSTRING undefined, and it will all just compile to (almost) nothing. What is the
use of this? Why compile the source files and link against them if you don't want the
mbstring extension? It doesn't make any kind of sense.
|
|
|
|
|
|
|
|
|
|
|
| |
Very interesting... it turns out that when Valgrind support was enabled,
`#include "config.h"` from within mbstring was actually including the file "config.h"
from Valgrind, and not the one from mbstring!!
This is because -I/usr/include/valgrind was added to the compiler invocation _before_
-Iext/mbstring/libmbfl.
Make sure we actually include the file which was intended.
|
|
|
|
|
|
|
|
|
|
|
| |
This function uses various subfunctions to convert case of Unicode wchars.
Previously, these subfunctions would store the case-converted characters in
a buffer, and the parent function would then pass them (byte by byte) to
the next filter in the filter chain.
Rather than passing around that buffer, it's better for the subfunctions to
directly pass the case-converted bytes to the next filter in the filter chain.
This speeds things up nicely.
|
| |
|
|
|
|
| |
We make sure that negative values are properly compared.
|
|
|
|
| |
Closes GH-4732.
|
|
|
|
| |
Avoids a potentially uninitialized variable warning.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the so called local variables defined per
file basis for certain editors to properly show tab width, and
similar settings. These are mainly used by Vim and Emacs editors
yet with recent changes the once working definitions don't work
anymore in Vim without custom plugins or additional configuration.
Neither are these settings synced across the PHP code base.
A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.
This patch also removes the so called modelines for Vim. Modelines
allow Vim editor specifically to set some editor configuration such as
syntax highlighting, indentation style and tab width to be set in the
first line or the last 5 lines per file basis. Since the php test
files have syntax highlighting already set in most editors properly and
EditorConfig takes care of the indentation settings, this patch removes
these as well for the Vim 6.0 and newer versions.
With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.
Additionally, Vim modelines for setting php syntax and some editor
settings has been removed from some *.phpt files. All these are
mostly not relevant for phpt files neither work properly in the
middle of the file.
|
| |
|
|
|
|
|
|
| |
While at it, also make sure that mbstring case conversion takes
into account the specified substitution character and substitution
mode.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:
* Only language-agnostic and unconditional full case mapping
is implemented. The only language-agnostic conditional case
mapping rule relates to Greek sigma in final position
(Final_Sigma). Correctly handling this requires both arbitrary
lookahead and lookbehind, which would require some larger
changes to how the case mapping is implemented. This is a
possible future extension.
* The only language-specific handling that is implemented is
for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
is used. This matches the previous behavior and makes sure
that no codepoints not supported by the encoding are
produced. A future extension would be to also handle the
Turkish mappings specified by SpecialCasing.txt based on
the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
operations continue to use simple case folding. The reason is
that full case folding of the haystack string may change the
position at which a match occurred. This would have to be
mapped back into the position in the original string.
* mb_convert_case() exposes both the full and the simple case
mapping / folding, where full is the default. The constants
are:
* MB_CASE_LOWER (used by mb_strtolower)
* MB_CASE_UPPER (used by mb_strtolower)
* MB_CASE_TITLE
* MB_CASE_FOLD
* MB_CASE_LOWER_SIMPLE
* MB_CASE_UPPER_SIMPLE
* MB_CASE_TITLE_SIMPLE
* MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
|
|
|
|
| |
Instead of using lowercasing.
|
|
|
|
|
|
|
| |
Instead of performing a binary search, use a hashtable to store
the case maps. In particular a minimal perfect hash construction
is used, which does not require collision resolution (but does
use an auxiliary table for the hash perturbation).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the case mapping table was segregated by the type of
the character (upper, lower, title) and always stored the other
two variants (key, other1, other2). Now the table is segregated
by the target type (key, other). As only very few characters have
more than one target this only slightly increases the size of the
table.
The advantage of this layout is that we only need to perform a
single table lookup in the case table. Previously, depending on
the case that was hit, either one lookup in the property table,
or two lookups in the property table and one lookup in the case
table were required.
This changes the layout from libunicode in the OpenLDAP project
-- however, the last commit there was over 10 years ago, so I
don't see value in keeping this in sync.
|
|\ |
|
| |
| |
| |
| |
| |
| |
| | |
mb_strtoupper() was converting lowercase characters into
titlecase characters, instead of uppercase characters. Luckily
there are only very few characters with a distinct titlecase
representation, so this mostly worked out okay...
|
| |
| |
| |
| |
| | |
This pulls in 60a25c72ba389f53b0621ca250bc99f3b295d43f from the
OpenLDAP project.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In particular strings now store encoding rather than the
no_encoding.
I've also pruned out libmbfl APIs that existed in two forms, one
using no_encoding and the other using encoding. We were not actually
using any of the former.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Don't indirect through UCS4BE, instead directly work on wchars
using a custom filter.
This replaces the pipeline
utf8 -> wchar -> ucs4be -> wchar -case-> wchar -> ucs4be -> wchar -> utf8
with
utf8 -> wchar -case-> -> wchar -> utf8
|
| | |
|
| |
| |
| |
| |
| |
| | |
As a side-effect mb_strtolower() and mb_strtoupper() now correctly
handle a NULL encoding parameter by using the internal encoding.
This is what caused the two test changes.
|
| |
| |
| |
| |
| |
| |
| |
| | |
Do not try to extract the properties from a bitmask. Instead make
the function variadic and pass all properties individually.
Also add a php_unicode_is_prop1() function to check only a single
property.
|
|/
|
|
|
| |
Extract part of php_mb_convert_encoding that does the actual work
and use it whenever we already know the encoding.
|
| |
|
|\
| |
| |
| |
| | |
* PHP-5.6:
Happy new year (Update copyright to 2016)
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
mb_strto[upper|lower]
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|