<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/php-git.git/ext/mbstring/tests/data, branch master</title>
<subtitle>git.php.net: repository/php-src.git
</subtitle>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/'/>
<entry>
<title>JIS7/JIS8 encoding: handle invalid 2nd byte for Kanji correctly</title>
<updated>2021-01-14T20:31:31+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-09-14T19:07:03+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=b67e358e75fc55ec802215fac0ca5a8e21b1f307'/>
<id>b67e358e75fc55ec802215fac0ca5a8e21b1f307</id>
<content type='text'>
Previously, in ISO-2022-JP/JIS7/JIS8, if an escape sequence (starting with 0x1B)
appeared where the 2nd byte of a multibyte character should have been, mbstring
would forget all about the truncated multibyte character and happily accept the
escape sequence. However, such sequences are not legal and should be flagged as
errors.

Also, any other illegal bytes appearing where the 2nd byte of a multibyte
character was expected were just passed through quietly to the output. Fix that.

Also add a test suite for both ISO-2022-JP and JIS7/JIS8. (These are extremely
similar encodings; JIS7 and JIS8 are variants of ISO-2022-JP. mbstring's 'JIS'
is actually a combination of JIS7 _and_ JIS8, since the extensions which each
one adds to ISO-2022-JP are disjoint.)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously, in ISO-2022-JP/JIS7/JIS8, if an escape sequence (starting with 0x1B)
appeared where the 2nd byte of a multibyte character should have been, mbstring
would forget all about the truncated multibyte character and happily accept the
escape sequence. However, such sequences are not legal and should be flagged as
errors.

Also, any other illegal bytes appearing where the 2nd byte of a multibyte
character was expected were just passed through quietly to the output. Fix that.

Also add a test suite for both ISO-2022-JP and JIS7/JIS8. (These are extremely
similar encodings; JIS7 and JIS8 are variants of ISO-2022-JP. mbstring's 'JIS'
is actually a combination of JIS7 _and_ JIS8, since the extensions which each
one adds to ISO-2022-JP are disjoint.)
</pre>
</div>
</content>
</entry>
<entry>
<title>ISO-2022-JP-2004 conversion: handle invalid characters correctly</title>
<updated>2021-01-14T20:26:24+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-09-13T17:25:46+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=4b95fdf2cac7269a38d035141e7321a295e19b29'/>
<id>4b95fdf2cac7269a38d035141e7321a295e19b29</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Enhance handling of CP51932 encoding</title>
<updated>2020-11-25T18:51:44+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-10-18T12:27:21+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=5c805655db117192ba0a32e355bb2d75aa9fc234'/>
<id>5c805655db117192ba0a32e355bb2d75aa9fc234</id>
<content type='text'>
- Don't pass 'control' characters through in the middle of a multi-byte char
- Treat truncated multi-byte characters as an error
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Don't pass 'control' characters through in the middle of a multi-byte char
- Treat truncated multi-byte characters as an error
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix mbstring support for SJIS-Mobile (DoCoMo, KDDI, and Softbank variants of Shift-JIS)</title>
<updated>2020-11-25T18:51:44+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-10-20T05:47:20+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=beef597124fa418a89d57d527c0eddf7552c5812'/>
<id>beef597124fa418a89d57d527c0eddf7552c5812</id>
<content type='text'>
Lots of problems here.

- Don't pass 'control' characters through silently in the middle of a
  multi-byte character.
- Treat it as an error if a multi-byte character is truncated.
- For ESC sequences used to encode emoji on earlier Softbank phones, if an
  invalid ESC sequence is found, don't pass it through. Rather, handle it as
  an error and respect `mb_substitute_character`.
- In ranges used by mobile vendors for emoji, if a certain byte sequence
  doesn't map to any emoji, don't emit a mangled value (actually a raw
  (ku*94)+ten value, which may not even be a valid Unicode codepoint at all).
- When converting Unicode to SJIS-Mobile, don't mangle codepoints which fall
  in the 2nd range of Microsoft vendor extensions.

Some vendor-specific emoji have been mapped to standard Unicode codepoints
now, rather than 'private use area' codepoints. When the legacy code was
written, these codepoints may not have existed yet in the Unicode standard
which was current at that time.

Also do a major code cleanup -- remove dead code, rearrange what is left,
use some new macros and helper functions to make the code clearer...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Lots of problems here.

- Don't pass 'control' characters through silently in the middle of a
  multi-byte character.
- Treat it as an error if a multi-byte character is truncated.
- For ESC sequences used to encode emoji on earlier Softbank phones, if an
  invalid ESC sequence is found, don't pass it through. Rather, handle it as
  an error and respect `mb_substitute_character`.
- In ranges used by mobile vendors for emoji, if a certain byte sequence
  doesn't map to any emoji, don't emit a mangled value (actually a raw
  (ku*94)+ten value, which may not even be a valid Unicode codepoint at all).
- When converting Unicode to SJIS-Mobile, don't mangle codepoints which fall
  in the 2nd range of Microsoft vendor extensions.

Some vendor-specific emoji have been mapped to standard Unicode codepoints
now, rather than 'private use area' codepoints. When the legacy code was
written, these codepoints may not have existed yet in the Unicode standard
which was current at that time.

Also do a major code cleanup -- remove dead code, rearrange what is left,
use some new macros and helper functions to make the code clearer...
</pre>
</div>
</content>
</entry>
<entry>
<title>Enhance handling of CP932 text encoding</title>
<updated>2020-11-25T17:52:19+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-10-04T20:29:34+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=2759874a4250e45f25a1807cfaed7ab5bc07ae14'/>
<id>2759874a4250e45f25a1807cfaed7ab5bc07ae14</id>
<content type='text'>
- Don't allow control characters to appear in the middle of a multi-byte
  character. (This was a strange feature of mbstring; it doesn't make much
  sense, and iconv doesn't allow it.)
- Treat truncated multi-byte characters as an error.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Don't allow control characters to appear in the middle of a multi-byte
  character. (This was a strange feature of mbstring; it doesn't make much
  sense, and iconv doesn't allow it.)
- Treat truncated multi-byte characters as an error.
</pre>
</div>
</content>
</entry>
<entry>
<title>Add test suite for SJIS-mac encoding</title>
<updated>2020-11-11T09:18:58+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-09-09T18:02:46+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=1cf12c02f0683eda46634317fa6dcd61d3e3cd5e'/>
<id>1cf12c02f0683eda46634317fa6dcd61d3e3cd5e</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Add test suite for SJIS-2004 encoding</title>
<updated>2020-11-11T09:18:58+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-09-08T20:14:36+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=d40f9cf735d946d77e4d5bbbcb851d067500fccc'/>
<id>d40f9cf735d946d77e4d5bbbcb851d067500fccc</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix mbstring support for EUC-JP text encoding</title>
<updated>2020-11-09T11:45:17+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-10-01T17:56:42+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=8f6889b20d43ed99564c380b879264dca6c541fb'/>
<id>8f6889b20d43ed99564c380b879264dca6c541fb</id>
<content type='text'>
- Don't allow control characters to appear in the middle of a multi-byte
  character. (A strange feature, or perhaps misfeature, of mbstring which is
  not present in other libraries such as iconv.)
- When checking whether a string is valid, reject kuten codes which do not
  map to any character, whether converting from EUC-JP to another encoding,
  or converting another encoding which uses JIS X 0208/0212 charsets to
  EUC-JP.
- Truncated multi-byte characters are treated as an error.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Don't allow control characters to appear in the middle of a multi-byte
  character. (A strange feature, or perhaps misfeature, of mbstring which is
  not present in other libraries such as iconv.)
- When checking whether a string is valid, reject kuten codes which do not
  map to any character, whether converting from EUC-JP to another encoding,
  or converting another encoding which uses JIS X 0208/0212 charsets to
  EUC-JP.
- Truncated multi-byte characters are treated as an error.
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix mbstring support for Shift-JIS</title>
<updated>2020-11-09T11:45:16+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-10-19T18:57:58+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=ad7e0f16ccee09a28b4cb78055937f513aef9e61'/>
<id>ad7e0f16ccee09a28b4cb78055937f513aef9e61</id>
<content type='text'>
- Reject otherwise valid kuten codes which don't map to anything in JIS X 0208.
- Handle truncated multi-byte characters as an error.
- Convert Shift-JIS 0x7E to Unicode 0x203E (overline) as recommended by the
  Unicode Consortium, and as iconv does.
- Convert Shift-JIS 0x5C to Unicode 0xA5 (yen sign) as recommended by the
  Unicode Consortium, and as iconv does.
  (NOTE: This will affect PHP scripts which use an internal encoding of
  Shift-JIS! PHP assigns a special meaning to 0x5C, the backslash. For example,
  it is used for escapes in double-quoted strings. Mapping the Shift-JIS yen
  sign to the Unicode yen sign means the yen sign will not be usable for
  C escapes in double-quoted strings. Japanese PHP programmers who want to
  write their source code in Shift-JIS for some strange reason will have to
  use the JIS X 0208 backslash or 'REVERSE SOLIDUS' character for their C
  escapes.)
- Convert Unicode 0x5C (backslash) to Shift-JIS 0x815F (reverse solidus).
- Immediately handle error if first Shift-JIS byte is over 0xEF, rather than
  waiting to see the next byte. (Previously, the value used was 0xFC, which is
  the limit for the 2nd byte and not the 1st byte of a multi-byte character.)
- Don't allow 'control characters' to appear in the middle of a multi-byte
  character.

The test case for bug 47399 is now obsolete. That test assumed that a number
of Shift-JIS byte sequences which don't map to any character were 'valid'
(because the byte values were within the legal ranges).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Reject otherwise valid kuten codes which don't map to anything in JIS X 0208.
- Handle truncated multi-byte characters as an error.
- Convert Shift-JIS 0x7E to Unicode 0x203E (overline) as recommended by the
  Unicode Consortium, and as iconv does.
- Convert Shift-JIS 0x5C to Unicode 0xA5 (yen sign) as recommended by the
  Unicode Consortium, and as iconv does.
  (NOTE: This will affect PHP scripts which use an internal encoding of
  Shift-JIS! PHP assigns a special meaning to 0x5C, the backslash. For example,
  it is used for escapes in double-quoted strings. Mapping the Shift-JIS yen
  sign to the Unicode yen sign means the yen sign will not be usable for
  C escapes in double-quoted strings. Japanese PHP programmers who want to
  write their source code in Shift-JIS for some strange reason will have to
  use the JIS X 0208 backslash or 'REVERSE SOLIDUS' character for their C
  escapes.)
- Convert Unicode 0x5C (backslash) to Shift-JIS 0x815F (reverse solidus).
- Immediately handle error if first Shift-JIS byte is over 0xEF, rather than
  waiting to see the next byte. (Previously, the value used was 0xFC, which is
  the limit for the 2nd byte and not the 1st byte of a multi-byte character.)
- Don't allow 'control characters' to appear in the middle of a multi-byte
  character.

The test case for bug 47399 is now obsolete. That test assumed that a number
of Shift-JIS byte sequences which don't map to any character were 'valid'
(because the byte values were within the legal ranges).
</pre>
</div>
</content>
</entry>
<entry>
<title>Add test suite for ARMSCII-8 encoding</title>
<updated>2020-11-02T19:31:06+00:00</updated>
<author>
<name>Alex Dowad</name>
<email>alexinbeijing@gmail.com</email>
</author>
<published>2020-10-18T15:51:59+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/php-git.git/commit/?id=ff953f254c3f55c67e2de8a003887876d3b05420'/>
<id>ff953f254c3f55c67e2de8a003887876d3b05420</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
</feed>
