diff options
Diffstat (limited to 'ext/mbstring/README_PHP3-i18n-ja')
-rw-r--r-- | ext/mbstring/README_PHP3-i18n-ja | 774 |
1 files changed, 774 insertions, 0 deletions
diff --git a/ext/mbstring/README_PHP3-i18n-ja b/ext/mbstring/README_PHP3-i18n-ja new file mode 100644 index 0000000..cac00b8 --- /dev/null +++ b/ext/mbstring/README_PHP3-i18n-ja @@ -0,0 +1,774 @@ +========================================== + README for I18N Package +========================================== + +o Name and location of package + +Name: php-3.0.18-i18n-ja-2 +Location: http://www.happysize.co.jp/techie/php-ja-jp/ + ftp://ftp.happysize.co.jp/php-ja-jp/ + http://php.vdomains.org/ + ftp://ftp.vdomains.org/pub/php-ja-jp/ + http://php.jpnnet.com/ + +Currently, this I18N version of PHP only adds Japanese support to base +PHP. It allows you to use Japanese in scripts, as well as conversion +between various Japanese encodings. It will work perfectly fine with +ASCII with i18n option enabled. (note: executable is bit larger due +to UNICODE table). The basic design aproach is to allow for other +languages to be added in the future. Developers are encourage to join +us! + +For more information on Japanese encodings, please refer to the +section "Additional Notes." + + +o What is this package? + +This package allows you to handle multiple Japanese encodings (SJIS, EUC, +UTF-8, JIS) in PHP. If you find any bugs in this package, please report +them to the appropriate mailing list. For now, the PHP-jp mailing list +is the best place for this. + +PHP-jp ML mailto:PHP-jp@sidecar.ics.es.osaka-u.ac.jp + http://sidecar.ics.es.osaka-u.ac.jp/php-jp/ + (discussions are in Japanese) + + +o Who should use this + +Due to lack of documentation, it's not intended for beginners. If +something goes wrong, be prepared to fix it on your own. + + +o Warranty and Copyright + +There is no warranty with this package. Use it at your own risk. + +Please refer to the source code for the copyrights. In general, each +program's copyright is owned by the programmer. Unless you obey the +copyright holders restrictions, you are not allowed to use it in any +form. + + +o Redistribution + +As described in the source code, this package and the components are +allowed to be redistributed with certain restrictions. + +Due to this package being still in beta, please try to redistribute +it as an entire package. Please try not to distribute it as a form +of patch. Because we would prefer to have this package distributed +as one single package (not patch of patch of patch), avoid releasing +any patch to this package. + + +o Who made this + +A team of volunteers, PHP3 Internationalization, has been contributing +their free time producing it. Although we are not related to the core +PHP programmers, we are hoping to have our modifications merged into the +core distribution in the near future. Thus, we did not call this a +"Japanese Patch" (or distribution). Our final goal is to have true +i18nized PHP! + +For anyone interested in this project, please drop us a line. + +Contact Address: + phpj-dev@kage.net + (Discussions are in Japanese, but feel free to write us in English) + +Webpage (English and Japanese): + http://php.jpnnet.com/ + +Project Outline (Japanese): + http://www.happysize.co.jp/techie/php-ja-jp/spec.htm + +Developers: + Hironori Sato <satoh@jpnnet.com> + Shigeru Kanemoto <sgk@happysize.co.jp> + Tsukada Takuya <tsukada@fminn.nagano.nagano.jp> + U. Kenkichi <kenkichi@axes.co.jp> + Tateyama <tateyan@amy.hi-ho.ne.jp> + Other gracious contributors + + +o Future plans + +- fulfilling what's written in outline +- support for other languages other than Japanese +- make the character conversion as a library (?) +- more testing + + +o Special Thanks to + +PHP Japanese webpage maintainer, Hirokawa-san + http://www.cityfujisawa.ne.jp/%7Elouis/apps/phpfi/ +PHP-JP ML's Yamamoto-san + http://sidecar.ics.es.osaka-u.ac.jp/php-jp/ +Previous jp-patch developers + + + +========================================== + Advantages of using I18N package +========================================== + +- allows you to use various character encodings for script files and + http output +- distinguish character encoding in POST/GET/COOKIE +- proper mail output using JIS as body and MIME/Base64/JIS subject +- if http output's Content-Type is text/html, it will set proper charset +- stable character encoding conversion +- multibyte regex + + + +========================================== + Installation +========================================== + +o Summary + +Add --enable-i18n option when running configure. For your own setup, +add any other appropriate options as well. + +Don't forget to copy php3.ini-dist to desired location. +(ex. /usr/local/lib/php3.ini) + +If you have already installed PHP3, copy all the entries in php3.ini-dist +which start with "i18n.xxxx" to php3.ini. + + +o configure option + --enable-i18n + include i18n features + + --enable-mbregex + include multibyte regex library + (without i18n enabled, mbregex functions will not function) + + +o creating cgi version + + % tar xvzf php-3.0.18-i18n-ja-2.tar.gz + % cd php-3.0.18-i18n-ja-2 + % ./configure --enable-i18n --enable-mbregex + % make + + +o creating Apache version (regular module) + + % tar xvzf php-3.0.18-i18n-ja-2.tar.gz + % tar xvzf apache_1.3.x.tar.gz + % cd apache_1.3.x + % ./configure + % cd ../php-3.0.18-i18n-ja-2 + % ./configure --with-apache=../apache_1.3.x --enable-i18n --enable-mbregex + % make + % make install + % cd ../apache_1.3.x + % ./configure --activate-module=src/modules/php3/libphp3.a + % make + % make install + + +o creating Apache DSO version + + create DSO capable Apache first + % tar xvzf apache_1.3.x.tar.gz + % cd apache-1.3.x + % ./configure --enable-shared=max + % make + % make install + + now create php3 + % cd php-3.0.18-i18n-ja-2 + % ./configure --with-apxs=/usr/local/apache/bin/apxs --enable-i18n \ + --enable-mbregex + % make + % make install + + +========================================== + Additional Notes +========================================== + +o Multibyte regex library + +From beta4, we have included the multibyte (mb) regex library which comes with +Ruby. With this addition, you can now use regex in EUC, SJIS and UTF-8 +encoding. To avoid any conflicts with HSREGEX included with Apache, +each function name has been changed. Therefore, mb regex functions are +named differently from the original ereg functions in PHP. The character +encoding used in mb regex is configured in i18n.internal_encoding. + + +o Binary Output + +If http output encoding is set to other than 'pass', conversion of encoding +from internal encoding to http output is done automatically. Thus, +if you prefer to spit out anything in raw binary format, your data +may be corrupted. In such event, set http_output to 'pass'. + +ex. + <? + i18n_http_output("pass"); + ... + echo $the_binary_data_string; + ?> + + +o Content-Type + +Depending on the setting of http_output, PHP will output the proper charset. +ex. Content-Type: text/html; charset="..." + +Be aware of following: + +- If you set Content-Type header using header() function, that will + override the automatic addition of charset. +- Be cautious when you set i18n_http_output, since if any output is + made prior to this, proper header may have been sent out to the + client already. + + +o In the event of trouble + +If you find any bugs or trouble, please contact us at the above address. +It may help us to track the problem if you send us the script as well. + +If you encounter any memory related error such as segmentation violation, +add --enable-debug when you run configure. This will give you more +detail information on where error has occurred. The error is stored +in the server log or regular http output in CGI mode. + + +o About Japanese encodings + +Due to historical reason, there are multiple character encodings used +for Japanese. The most common encodings are: SJIS, EUC, JIS, and UTF-8. +Here are (very) brief description of them: + +EUC + commonly used in UNIX environment + 8bit-8bit combo + always >=0x80 + +SJIS + commonly used in Mac or PCs + similar to EUC + mostly 8bit-8bit (some 8bit-7bit) + mostly >=0x80 + there are some halfwidth (size of ASCII) multibytes + +JIS + commonly used in 7bit environment (nntp and smtp) + starts with escaping char, \033 and a few more characters + +UTF-8 + 16bit+ encoding + defines many languages existing in this world + see http://www.unicode.org/ for more detail + +Because of having all these character encodings, PHP needs to translate +between these encodings on the fly. Also, the addition of the mb regex +library allows you to handle mb strings without fear of getting mb char +chopped in half. + +Since Japanese is not the only language with multiple encodings, we +encourage other developers to modify our code to suit your needs. We +definitely need people to work with Korean, Chinese (both traditional +and simplified), and Russian. Let us know if you are interested in +this project! + + + +========================================== + php3.ini setting +========================================== + +The following init options will allow you to change the default settings. +Define these settings in the global section of php3.ini. + +All keywords are case-insensitive. + +o Encoding naming + + For each encoding, there are three names: standarized, alias, MIME + + - UTF-8 + standard: UTF-8 + alias: N/A + mime: UTF-8 + + - ASCII + standard: ASCII + alias: N/A + mime: US-ASCII + + - Japanese EUC + standard: EUC-JP + alias: EUC, EUC_JP, eucJP, x-euc-jp + mime: EUC-JP + + - Shift JIS + standard: SJIS + alias: x-sjis, MS_Kanji + mime: Shift_JIS + + - JIS + standard: JIS + alias: N/A + mime: ISO-2022-JP + + - Quoted-Printable + standard: Quoted-Printable + alias: qprint + mime: N/A + + - BASE64 + standard: BASE64 + alias: N/A + mime: N/A + + - no conversion + standard: pass + alias: none + mime: N/A + + - auto encoding detection + standard: auto + alias: unknown + mime: N/A + + * N/A - Not Applicapable + +o i18n.http_output - default http output encoding + + i18n.http_output = EUC-JP|SJIS|JIS|UTF-8|pass + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + pass: no conversion + + The default is pass (internal encoding is used) + It can be re-configured on the fly using i18n_http_output(). + + +o i18n.internal_encoding - internal encoding + + i18n.internal_encoding = EUC-JP|SJIS|UTF-8 + EUC-JP : EUC + SJIS: SJIS + UTF-8: UTF-8 + + The default is EUC-JP. + + PHP parser is designed based on using ISO-8859-1. For other + encodings, following conditions have to be satisfied in order + to use them: + - per byte encoding + - single byte charactor in range of 00h-7fh which is compatible + with ASCII + - multibyte without 00h-7fh + In case of Japanese, EUC-JP and UTF-8 are the only encoding that + meets this criteria. + + If i18n.internal_encoding and i18n.http_output differs, conversion + takes place at the time of output. If you convert any data within + PHP scripts to URL encoding, BASE64 or Quoted-Printable, encoding + stays as defined in i18n.internal_encoding. Thus, if you would + prefer to encode in compliance with i18n.http_output, you need + to manually convert encoding. + + ex. $str = urlencode( i18n_convert($str, i18n_http_output()) ); + + Encoding such as ISO-2022-** and HZ encoding which uses escape + sequences can not be used as internal encoding. If used, they + result in following errors: + - parser pukes funky error + - magic_quotes_*** breaks encoding (SJIS may have similar problem) + - string manipulation and regex will malfunction + + +o i18n.script_encoding - script encoding + + i18n.script_encoding = auto|EUC-JP|SJIS|JIS|UTF-8 + auto: automatic + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + + The default is auto. + The script's encoding is converted to i18n.internal_encoding before + entering the script parser. + + Be aware that auto detection may fail under some conditions. + For best auto detection, add multibyte charactor at begining of + script. + + +o i18n.http_input - handling of http input (GET/POST/COOKIE) + + i18n.http_input = pass|auto + auto: auto conversion + pass: no conversion + + The default is auto. + If set to pass, no conversion will take place. + If set to auto, it will automatically detect the encoding. If + detection is successful, it will convert to the proper internal + encoding. If not, it will assume the input as defined in + i18n.http_input_default. + +o i18n.http_input_default - default http input encoding + + i18n.http_input_default = pass|EUC-JP|SJIS|JIS|UTF-8 + pass: no conversion + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + + The default is pass. + This option is only effective as long as i18n.http_input is set to + auto. If the auto detection fails, this encoding is used as an + assumption to convert the http input to the internal encoding. + If set to pass, no conversion will take place. + +o sample settings + + 1) For most flexibility, we recommend using following example. + i18n.http_output = SJIS + i18n.internal_encoding = EUC-JP + i18n.script_encoding = auto + i18n.http_input = auto + i18n.http_input_default = SJIS + + 2) To avoid unexpected encoding problems, try these: + + i18n.http_output = pass + i18n.internal_encoding = EUC-JP + i18n.script_encoding = pass + i18n.http_input = pass + i18n.http_input_default = pass + + + +========================================== + PHP functions +========================================== + +The following describes the additional PHP functions. + +All keywords are case-insensitive. + +o i18n_http_output(encoding) +o encoding = i18n_http_output() + + This will set the http output encoding. Any output following this + function will be controlled by this function. If no argument is given, + the current http output encode setting is returned. + + encodings + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + pass: no conversion + + NONE is not allowed + + +o encoding = i18n_internal_encoding() + + Returns the current internal encoding as a string. + + internal encoding + EUC-JP : EUC + SJIS: SJIS + UTF-8: UTF-8 + + +o encoding = i18n_http_input() + + Returns http input encoding. + + encodings + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + pass: no conversion (only if i18n.http_input is set to pass) + + +o string = i18n_convert(string, encoding) + string = i18n_convert(string, encoding, pre-conversion-encoding) + + Returns converted string in desired encoding. If + pre-conversion-encoding is not defined, the given + string is assumed to be in internal encoding. + + encoding + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + pass: no conversion + + pre-conversion-encoding + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + pass: no conversion + auto: auto detection + + +o encoding = i18n_discover_encoding(string) + + Encoding of the given string is returned (as a string). + + encoding + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + ASCII: ASCII (only 09h, 0Ah, 0Dh, 20h-7Eh) + pass: unable to determine (text is too short to determine) + unknown: unknown or possible error + + +o int = mbstrlen(string) +o int = mbstrlen(string, encoding) + + Returns character length of a given string. If no encoding is defined, + the encoding of string is assumed to be the internal encoding. + + encoding + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + auto: automatic + + +o int = mbstrpos(string1, string2) +o int = mbstrpos(string1, string2, start) +o int = mbstrpos(string1, string2, start, encoding) + + Same as strpos. If no encoding is defined, the encoding of string + is assumed to be the internal encoding. + + encoding + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + + +o int = mbstrrpos(string1, string2) +o int = mbstrrpos(string1, string2, encoding) + + Same as strrpos. If no encoding is defined, the encoding of string + is assumed to be the internal encoding. + + encoding + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + + +o string = mbsubstr(string, position) +o string = mbsubstr(string, position, length) +o string = mbsubstr(string, position, length, encoding) + + Same as substr. If no encoding is defined, the encoding of string + is assumed to be the internal encoding. + + encoding + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + + +o string = mbstrcut(string, position) +o string = mbstrcut(string, position, length) +o string = mbstrcut(string, position, length, encoding) + + Same as subcut. If position is the 2nd byte of a mb character, it will cut + from the first byte of that character. It will cut the string without + chopping a single byte from a mb character. In another words, if you + set length to 5, you will only get two mb characters. If no encoding + is defined, the encoding of string is assumed to be the internal encoding. + + encoding + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + + +o string = i18n_mime_header_encode(string) + MIME encode the string in the format of =?ISO-2022-JP?B?[string]?=. + + +o string = i18n_mime_header_decode(string) + MIME decodes the string. + + +o string = i18n_ja_jp_hantozen(string) +o string = i18n_ja_jp_hantozen(string, option) +o string = i18n_ja_jp_hantozen(string, option, encoding) + + Conversion between full width character and halfwidth character. + + option + The following options are allowed. The default is "KV". + Acronym: FW = fullwidth, HW = halfwidth + + "r" : FW alphabet -> HW alphabet + + "R" : HW alphabet -> FW alphabet + + "n" : FW number -> HW number + + "N" : HW number -> FW number + + "a" : FW alpha numeric (21h-7Eh) -> HW alpha numeric + + "A" : HW alpha numeric (21h-7Eh) -> FW alpha numeric + + "k" : FW katakana -> HW katakana + + "K" : HW katakana -> FW katakana + + "h" : FW hiragana -> HW hiragana + + "H" : HW hiragana -> FW katakana + + "c" : FW katakana -> FW hiragana + + "C" : FW hiragana -> FW katakana + + "V" : merge dakuon character. only works with "K" and "H" option + + encoding + If no encoding is defined, the encoding of string is assumed to be + the internal encoding. + EUC-JP : EUC + SJIS: SJIS + JIS : JIS + UTF-8: UTF-8 + + +int = mbereg(regex_pattern, string, string) +int = mberegi(regex_pattern, string, string) + mb version of ereg() and eregi() + + +string = mbereg_replace(regex_pattern, string, string) +string = mberegi_replace(regex_pattern, string, string) + mb version of ereg_replace() and eregi_replace() + + +string_array = mbsplit(regex, string, limit) + mb version of split() + + + +========================================== + FAQ +========================================== + +Here, we have gathered some commonly asked questions on PHP-jp mailing +list. + +o To use Japanese in GET method + +If you need to assign Japanese text in GET method with argument, such as; +xxxx.php?data=<Japanese text>, use urlencode function in PHP. If not, +text may not be passed onto action php properly. + +ex: <a href="hoge.php?data=<? echo urlencode($data) ?>">Link</a> + + +o When passing data via GET/POST/COOKIE, \ character sneaks in + +When using SJIS as internal encoding, or passed-on data includes '"\, +PHP automatically inserts escaping character, \. Set magic_quotes_gpc +in php3.ini from On to Off. An alternative work around to this problem +is to use StripSlashes(). + +If $quote_str is in SJIS and you would like to extract Japanese text, +use ereg_replace as follows: + +ereg_replace(sprintf("([%c-%c%c-%c]\\\\)\\\\",0x81,0x9f,0xe0,0xfc), + "\\1",$quote_str); + +This will effectively extract Japanese text out of $quote_str. + + +o Sometimes, encoding detection fails + +If i18n_http_input() returns 'pass', it's likely that PHP failed to +detect whether it's SJIS or EUC. In such case, use <input type=hidden +value="some Japanese text"> to properly detect the incoming text's +encoding. + + + +========================================== + Japanese Manual +========================================== +Translated manual done by "PHP Japanese Manual Project" : + +http://www.php.net/manual/ja/manual.php + +Starting 3.0.18-i18n-ja, we have removed doc-jp from tarball package. + + +========================================== + Change Logs +========================================== + +o 2000-10-28, Rui Hirokawa <hirokawa@php.net> + +This patch is derived from php-3.0.15-i18n-ja as well as php-3.0.16 by +Kuwamura applied to original php-3.0.18. It also includes following fixes: + +1) allows you to set charset in mail(). +2) fixed mbregex definitions to avoid conflicts with system regex +3) php3.ini-dist now uses PASS for http_output instead of SJIS + +o 2000-11-24, Hironori Sato <satoh@yyplanet.com> + +Applied above patched and added detection for gdImageStringTTF in configure. +Following setups are known to work: + +gd-1.3-6, gd-devel-1.3-6, freetype-1.3.1-5, freetype-devel-1.3.1-5 + ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf", + i18n_convert("日本語", "UTF-8")); + ImageGif($im); + +gd-1.7.3-1k1, gd-devel-1.7.3-1k1, freetype-1.3.1-5, freetype-devel-1.3.1-5 + ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf","日本語"); + ImagePng($im); + * i18n_internal_encoding = EUC 又は SJIS + +For any gd libraries before 1.6.2, you need to use i18n_convert. For +gd-1.5.2/3, upgrade to anything above 1.7 to use ImageTTFText without +using i18n_convert. As long as you have internal_encoding set to EUC or +SJIS, ImageTTFText should work without mojibake. Again, make sure you +have i18n_http_output("pass") before calling ImageGif, ImagePng, ImageJpeg! + +o 2000-12-09, Rui Hirokawa <hirokawa@php.net> + +Fixed mail() which was causing segmentation fault when header was null. + |