summaryrefslogtreecommitdiff
path: root/ext/mbstring/README_PHP3-i18n-ja
diff options
context:
space:
mode:
Diffstat (limited to 'ext/mbstring/README_PHP3-i18n-ja')
-rw-r--r--ext/mbstring/README_PHP3-i18n-ja774
1 files changed, 774 insertions, 0 deletions
diff --git a/ext/mbstring/README_PHP3-i18n-ja b/ext/mbstring/README_PHP3-i18n-ja
new file mode 100644
index 0000000..cac00b8
--- /dev/null
+++ b/ext/mbstring/README_PHP3-i18n-ja
@@ -0,0 +1,774 @@
+==========================================
+ README for I18N Package
+==========================================
+
+o Name and location of package
+
+Name: php-3.0.18-i18n-ja-2
+Location: http://www.happysize.co.jp/techie/php-ja-jp/
+ ftp://ftp.happysize.co.jp/php-ja-jp/
+ http://php.vdomains.org/
+ ftp://ftp.vdomains.org/pub/php-ja-jp/
+ http://php.jpnnet.com/
+
+Currently, this I18N version of PHP only adds Japanese support to base
+PHP. It allows you to use Japanese in scripts, as well as conversion
+between various Japanese encodings. It will work perfectly fine with
+ASCII with i18n option enabled. (note: executable is bit larger due
+to UNICODE table). The basic design aproach is to allow for other
+languages to be added in the future. Developers are encourage to join
+us!
+
+For more information on Japanese encodings, please refer to the
+section "Additional Notes."
+
+
+o What is this package?
+
+This package allows you to handle multiple Japanese encodings (SJIS, EUC,
+UTF-8, JIS) in PHP. If you find any bugs in this package, please report
+them to the appropriate mailing list. For now, the PHP-jp mailing list
+is the best place for this.
+
+PHP-jp ML mailto:PHP-jp@sidecar.ics.es.osaka-u.ac.jp
+ http://sidecar.ics.es.osaka-u.ac.jp/php-jp/
+ (discussions are in Japanese)
+
+
+o Who should use this
+
+Due to lack of documentation, it's not intended for beginners. If
+something goes wrong, be prepared to fix it on your own.
+
+
+o Warranty and Copyright
+
+There is no warranty with this package. Use it at your own risk.
+
+Please refer to the source code for the copyrights. In general, each
+program's copyright is owned by the programmer. Unless you obey the
+copyright holders restrictions, you are not allowed to use it in any
+form.
+
+
+o Redistribution
+
+As described in the source code, this package and the components are
+allowed to be redistributed with certain restrictions.
+
+Due to this package being still in beta, please try to redistribute
+it as an entire package. Please try not to distribute it as a form
+of patch. Because we would prefer to have this package distributed
+as one single package (not patch of patch of patch), avoid releasing
+any patch to this package.
+
+
+o Who made this
+
+A team of volunteers, PHP3 Internationalization, has been contributing
+their free time producing it. Although we are not related to the core
+PHP programmers, we are hoping to have our modifications merged into the
+core distribution in the near future. Thus, we did not call this a
+"Japanese Patch" (or distribution). Our final goal is to have true
+i18nized PHP!
+
+For anyone interested in this project, please drop us a line.
+
+Contact Address:
+ phpj-dev@kage.net
+ (Discussions are in Japanese, but feel free to write us in English)
+
+Webpage (English and Japanese):
+ http://php.jpnnet.com/
+
+Project Outline (Japanese):
+ http://www.happysize.co.jp/techie/php-ja-jp/spec.htm
+
+Developers:
+ Hironori Sato <satoh@jpnnet.com>
+ Shigeru Kanemoto <sgk@happysize.co.jp>
+ Tsukada Takuya <tsukada@fminn.nagano.nagano.jp>
+ U. Kenkichi <kenkichi@axes.co.jp>
+ Tateyama <tateyan@amy.hi-ho.ne.jp>
+ Other gracious contributors
+
+
+o Future plans
+
+- fulfilling what's written in outline
+- support for other languages other than Japanese
+- make the character conversion as a library (?)
+- more testing
+
+
+o Special Thanks to
+
+PHP Japanese webpage maintainer, Hirokawa-san
+ http://www.cityfujisawa.ne.jp/%7Elouis/apps/phpfi/
+PHP-JP ML's Yamamoto-san
+ http://sidecar.ics.es.osaka-u.ac.jp/php-jp/
+Previous jp-patch developers
+
+
+
+==========================================
+ Advantages of using I18N package
+==========================================
+
+- allows you to use various character encodings for script files and
+ http output
+- distinguish character encoding in POST/GET/COOKIE
+- proper mail output using JIS as body and MIME/Base64/JIS subject
+- if http output's Content-Type is text/html, it will set proper charset
+- stable character encoding conversion
+- multibyte regex
+
+
+
+==========================================
+ Installation
+==========================================
+
+o Summary
+
+Add --enable-i18n option when running configure. For your own setup,
+add any other appropriate options as well.
+
+Don't forget to copy php3.ini-dist to desired location.
+(ex. /usr/local/lib/php3.ini)
+
+If you have already installed PHP3, copy all the entries in php3.ini-dist
+which start with "i18n.xxxx" to php3.ini.
+
+
+o configure option
+ --enable-i18n
+ include i18n features
+
+ --enable-mbregex
+ include multibyte regex library
+ (without i18n enabled, mbregex functions will not function)
+
+
+o creating cgi version
+
+ % tar xvzf php-3.0.18-i18n-ja-2.tar.gz
+ % cd php-3.0.18-i18n-ja-2
+ % ./configure --enable-i18n --enable-mbregex
+ % make
+
+
+o creating Apache version (regular module)
+
+ % tar xvzf php-3.0.18-i18n-ja-2.tar.gz
+ % tar xvzf apache_1.3.x.tar.gz
+ % cd apache_1.3.x
+ % ./configure
+ % cd ../php-3.0.18-i18n-ja-2
+ % ./configure --with-apache=../apache_1.3.x --enable-i18n --enable-mbregex
+ % make
+ % make install
+ % cd ../apache_1.3.x
+ % ./configure --activate-module=src/modules/php3/libphp3.a
+ % make
+ % make install
+
+
+o creating Apache DSO version
+
+ create DSO capable Apache first
+ % tar xvzf apache_1.3.x.tar.gz
+ % cd apache-1.3.x
+ % ./configure --enable-shared=max
+ % make
+ % make install
+
+ now create php3
+ % cd php-3.0.18-i18n-ja-2
+ % ./configure --with-apxs=/usr/local/apache/bin/apxs --enable-i18n \
+ --enable-mbregex
+ % make
+ % make install
+
+
+==========================================
+ Additional Notes
+==========================================
+
+o Multibyte regex library
+
+From beta4, we have included the multibyte (mb) regex library which comes with
+Ruby. With this addition, you can now use regex in EUC, SJIS and UTF-8
+encoding. To avoid any conflicts with HSREGEX included with Apache,
+each function name has been changed. Therefore, mb regex functions are
+named differently from the original ereg functions in PHP. The character
+encoding used in mb regex is configured in i18n.internal_encoding.
+
+
+o Binary Output
+
+If http output encoding is set to other than 'pass', conversion of encoding
+from internal encoding to http output is done automatically. Thus,
+if you prefer to spit out anything in raw binary format, your data
+may be corrupted. In such event, set http_output to 'pass'.
+
+ex.
+ <?
+ i18n_http_output("pass");
+ ...
+ echo $the_binary_data_string;
+ ?>
+
+
+o Content-Type
+
+Depending on the setting of http_output, PHP will output the proper charset.
+ex. Content-Type: text/html; charset="..."
+
+Be aware of following:
+
+- If you set Content-Type header using header() function, that will
+ override the automatic addition of charset.
+- Be cautious when you set i18n_http_output, since if any output is
+ made prior to this, proper header may have been sent out to the
+ client already.
+
+
+o In the event of trouble
+
+If you find any bugs or trouble, please contact us at the above address.
+It may help us to track the problem if you send us the script as well.
+
+If you encounter any memory related error such as segmentation violation,
+add --enable-debug when you run configure. This will give you more
+detail information on where error has occurred. The error is stored
+in the server log or regular http output in CGI mode.
+
+
+o About Japanese encodings
+
+Due to historical reason, there are multiple character encodings used
+for Japanese. The most common encodings are: SJIS, EUC, JIS, and UTF-8.
+Here are (very) brief description of them:
+
+EUC
+ commonly used in UNIX environment
+ 8bit-8bit combo
+ always >=0x80
+
+SJIS
+ commonly used in Mac or PCs
+ similar to EUC
+ mostly 8bit-8bit (some 8bit-7bit)
+ mostly >=0x80
+ there are some halfwidth (size of ASCII) multibytes
+
+JIS
+ commonly used in 7bit environment (nntp and smtp)
+ starts with escaping char, \033 and a few more characters
+
+UTF-8
+ 16bit+ encoding
+ defines many languages existing in this world
+ see http://www.unicode.org/ for more detail
+
+Because of having all these character encodings, PHP needs to translate
+between these encodings on the fly. Also, the addition of the mb regex
+library allows you to handle mb strings without fear of getting mb char
+chopped in half.
+
+Since Japanese is not the only language with multiple encodings, we
+encourage other developers to modify our code to suit your needs. We
+definitely need people to work with Korean, Chinese (both traditional
+and simplified), and Russian. Let us know if you are interested in
+this project!
+
+
+
+==========================================
+ php3.ini setting
+==========================================
+
+The following init options will allow you to change the default settings.
+Define these settings in the global section of php3.ini.
+
+All keywords are case-insensitive.
+
+o Encoding naming
+
+ For each encoding, there are three names: standarized, alias, MIME
+
+ - UTF-8
+ standard: UTF-8
+ alias: N/A
+ mime: UTF-8
+
+ - ASCII
+ standard: ASCII
+ alias: N/A
+ mime: US-ASCII
+
+ - Japanese EUC
+ standard: EUC-JP
+ alias: EUC, EUC_JP, eucJP, x-euc-jp
+ mime: EUC-JP
+
+ - Shift JIS
+ standard: SJIS
+ alias: x-sjis, MS_Kanji
+ mime: Shift_JIS
+
+ - JIS
+ standard: JIS
+ alias: N/A
+ mime: ISO-2022-JP
+
+ - Quoted-Printable
+ standard: Quoted-Printable
+ alias: qprint
+ mime: N/A
+
+ - BASE64
+ standard: BASE64
+ alias: N/A
+ mime: N/A
+
+ - no conversion
+ standard: pass
+ alias: none
+ mime: N/A
+
+ - auto encoding detection
+ standard: auto
+ alias: unknown
+ mime: N/A
+
+ * N/A - Not Applicapable
+
+o i18n.http_output - default http output encoding
+
+ i18n.http_output = EUC-JP|SJIS|JIS|UTF-8|pass
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+ pass: no conversion
+
+ The default is pass (internal encoding is used)
+ It can be re-configured on the fly using i18n_http_output().
+
+
+o i18n.internal_encoding - internal encoding
+
+ i18n.internal_encoding = EUC-JP|SJIS|UTF-8
+ EUC-JP : EUC
+ SJIS: SJIS
+ UTF-8: UTF-8
+
+ The default is EUC-JP.
+
+ PHP parser is designed based on using ISO-8859-1. For other
+ encodings, following conditions have to be satisfied in order
+ to use them:
+ - per byte encoding
+ - single byte charactor in range of 00h-7fh which is compatible
+ with ASCII
+ - multibyte without 00h-7fh
+ In case of Japanese, EUC-JP and UTF-8 are the only encoding that
+ meets this criteria.
+
+ If i18n.internal_encoding and i18n.http_output differs, conversion
+ takes place at the time of output. If you convert any data within
+ PHP scripts to URL encoding, BASE64 or Quoted-Printable, encoding
+ stays as defined in i18n.internal_encoding. Thus, if you would
+ prefer to encode in compliance with i18n.http_output, you need
+ to manually convert encoding.
+
+ ex. $str = urlencode( i18n_convert($str, i18n_http_output()) );
+
+ Encoding such as ISO-2022-** and HZ encoding which uses escape
+ sequences can not be used as internal encoding. If used, they
+ result in following errors:
+ - parser pukes funky error
+ - magic_quotes_*** breaks encoding (SJIS may have similar problem)
+ - string manipulation and regex will malfunction
+
+
+o i18n.script_encoding - script encoding
+
+ i18n.script_encoding = auto|EUC-JP|SJIS|JIS|UTF-8
+ auto: automatic
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+
+ The default is auto.
+ The script's encoding is converted to i18n.internal_encoding before
+ entering the script parser.
+
+ Be aware that auto detection may fail under some conditions.
+ For best auto detection, add multibyte charactor at begining of
+ script.
+
+
+o i18n.http_input - handling of http input (GET/POST/COOKIE)
+
+ i18n.http_input = pass|auto
+ auto: auto conversion
+ pass: no conversion
+
+ The default is auto.
+ If set to pass, no conversion will take place.
+ If set to auto, it will automatically detect the encoding. If
+ detection is successful, it will convert to the proper internal
+ encoding. If not, it will assume the input as defined in
+ i18n.http_input_default.
+
+o i18n.http_input_default - default http input encoding
+
+ i18n.http_input_default = pass|EUC-JP|SJIS|JIS|UTF-8
+ pass: no conversion
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+
+ The default is pass.
+ This option is only effective as long as i18n.http_input is set to
+ auto. If the auto detection fails, this encoding is used as an
+ assumption to convert the http input to the internal encoding.
+ If set to pass, no conversion will take place.
+
+o sample settings
+
+ 1) For most flexibility, we recommend using following example.
+ i18n.http_output = SJIS
+ i18n.internal_encoding = EUC-JP
+ i18n.script_encoding = auto
+ i18n.http_input = auto
+ i18n.http_input_default = SJIS
+
+ 2) To avoid unexpected encoding problems, try these:
+
+ i18n.http_output = pass
+ i18n.internal_encoding = EUC-JP
+ i18n.script_encoding = pass
+ i18n.http_input = pass
+ i18n.http_input_default = pass
+
+
+
+==========================================
+ PHP functions
+==========================================
+
+The following describes the additional PHP functions.
+
+All keywords are case-insensitive.
+
+o i18n_http_output(encoding)
+o encoding = i18n_http_output()
+
+ This will set the http output encoding. Any output following this
+ function will be controlled by this function. If no argument is given,
+ the current http output encode setting is returned.
+
+ encodings
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+ pass: no conversion
+
+ NONE is not allowed
+
+
+o encoding = i18n_internal_encoding()
+
+ Returns the current internal encoding as a string.
+
+ internal encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ UTF-8: UTF-8
+
+
+o encoding = i18n_http_input()
+
+ Returns http input encoding.
+
+ encodings
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+ pass: no conversion (only if i18n.http_input is set to pass)
+
+
+o string = i18n_convert(string, encoding)
+ string = i18n_convert(string, encoding, pre-conversion-encoding)
+
+ Returns converted string in desired encoding. If
+ pre-conversion-encoding is not defined, the given
+ string is assumed to be in internal encoding.
+
+ encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+ pass: no conversion
+
+ pre-conversion-encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+ pass: no conversion
+ auto: auto detection
+
+
+o encoding = i18n_discover_encoding(string)
+
+ Encoding of the given string is returned (as a string).
+
+ encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+ ASCII: ASCII (only 09h, 0Ah, 0Dh, 20h-7Eh)
+ pass: unable to determine (text is too short to determine)
+ unknown: unknown or possible error
+
+
+o int = mbstrlen(string)
+o int = mbstrlen(string, encoding)
+
+ Returns character length of a given string. If no encoding is defined,
+ the encoding of string is assumed to be the internal encoding.
+
+ encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+ auto: automatic
+
+
+o int = mbstrpos(string1, string2)
+o int = mbstrpos(string1, string2, start)
+o int = mbstrpos(string1, string2, start, encoding)
+
+ Same as strpos. If no encoding is defined, the encoding of string
+ is assumed to be the internal encoding.
+
+ encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+
+
+o int = mbstrrpos(string1, string2)
+o int = mbstrrpos(string1, string2, encoding)
+
+ Same as strrpos. If no encoding is defined, the encoding of string
+ is assumed to be the internal encoding.
+
+ encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+
+
+o string = mbsubstr(string, position)
+o string = mbsubstr(string, position, length)
+o string = mbsubstr(string, position, length, encoding)
+
+ Same as substr. If no encoding is defined, the encoding of string
+ is assumed to be the internal encoding.
+
+ encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+
+
+o string = mbstrcut(string, position)
+o string = mbstrcut(string, position, length)
+o string = mbstrcut(string, position, length, encoding)
+
+ Same as subcut. If position is the 2nd byte of a mb character, it will cut
+ from the first byte of that character. It will cut the string without
+ chopping a single byte from a mb character. In another words, if you
+ set length to 5, you will only get two mb characters. If no encoding
+ is defined, the encoding of string is assumed to be the internal encoding.
+
+ encoding
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+
+
+o string = i18n_mime_header_encode(string)
+ MIME encode the string in the format of =?ISO-2022-JP?B?[string]?=.
+
+
+o string = i18n_mime_header_decode(string)
+ MIME decodes the string.
+
+
+o string = i18n_ja_jp_hantozen(string)
+o string = i18n_ja_jp_hantozen(string, option)
+o string = i18n_ja_jp_hantozen(string, option, encoding)
+
+ Conversion between full width character and halfwidth character.
+
+ option
+ The following options are allowed. The default is "KV".
+ Acronym: FW = fullwidth, HW = halfwidth
+
+ "r" : FW alphabet -> HW alphabet
+
+ "R" : HW alphabet -> FW alphabet
+
+ "n" : FW number -> HW number
+
+ "N" : HW number -> FW number
+
+ "a" : FW alpha numeric (21h-7Eh) -> HW alpha numeric
+
+ "A" : HW alpha numeric (21h-7Eh) -> FW alpha numeric
+
+ "k" : FW katakana -> HW katakana
+
+ "K" : HW katakana -> FW katakana
+
+ "h" : FW hiragana -> HW hiragana
+
+ "H" : HW hiragana -> FW katakana
+
+ "c" : FW katakana -> FW hiragana
+
+ "C" : FW hiragana -> FW katakana
+
+ "V" : merge dakuon character. only works with "K" and "H" option
+
+ encoding
+ If no encoding is defined, the encoding of string is assumed to be
+ the internal encoding.
+ EUC-JP : EUC
+ SJIS: SJIS
+ JIS : JIS
+ UTF-8: UTF-8
+
+
+int = mbereg(regex_pattern, string, string)
+int = mberegi(regex_pattern, string, string)
+ mb version of ereg() and eregi()
+
+
+string = mbereg_replace(regex_pattern, string, string)
+string = mberegi_replace(regex_pattern, string, string)
+ mb version of ereg_replace() and eregi_replace()
+
+
+string_array = mbsplit(regex, string, limit)
+ mb version of split()
+
+
+
+==========================================
+ FAQ
+==========================================
+
+Here, we have gathered some commonly asked questions on PHP-jp mailing
+list.
+
+o To use Japanese in GET method
+
+If you need to assign Japanese text in GET method with argument, such as;
+xxxx.php?data=<Japanese text>, use urlencode function in PHP. If not,
+text may not be passed onto action php properly.
+
+ex: <a href="hoge.php?data=<? echo urlencode($data) ?>">Link</a>
+
+
+o When passing data via GET/POST/COOKIE, \ character sneaks in
+
+When using SJIS as internal encoding, or passed-on data includes '"\,
+PHP automatically inserts escaping character, \. Set magic_quotes_gpc
+in php3.ini from On to Off. An alternative work around to this problem
+is to use StripSlashes().
+
+If $quote_str is in SJIS and you would like to extract Japanese text,
+use ereg_replace as follows:
+
+ereg_replace(sprintf("([%c-%c%c-%c]\\\\)\\\\",0x81,0x9f,0xe0,0xfc),
+ "\\1",$quote_str);
+
+This will effectively extract Japanese text out of $quote_str.
+
+
+o Sometimes, encoding detection fails
+
+If i18n_http_input() returns 'pass', it's likely that PHP failed to
+detect whether it's SJIS or EUC. In such case, use <input type=hidden
+value="some Japanese text"> to properly detect the incoming text's
+encoding.
+
+
+
+==========================================
+ Japanese Manual
+==========================================
+Translated manual done by "PHP Japanese Manual Project" :
+
+http://www.php.net/manual/ja/manual.php
+
+Starting 3.0.18-i18n-ja, we have removed doc-jp from tarball package.
+
+
+==========================================
+ Change Logs
+==========================================
+
+o 2000-10-28, Rui Hirokawa <hirokawa@php.net>
+
+This patch is derived from php-3.0.15-i18n-ja as well as php-3.0.16 by
+Kuwamura applied to original php-3.0.18. It also includes following fixes:
+
+1) allows you to set charset in mail().
+2) fixed mbregex definitions to avoid conflicts with system regex
+3) php3.ini-dist now uses PASS for http_output instead of SJIS
+
+o 2000-11-24, Hironori Sato <satoh@yyplanet.com>
+
+Applied above patched and added detection for gdImageStringTTF in configure.
+Following setups are known to work:
+
+gd-1.3-6, gd-devel-1.3-6, freetype-1.3.1-5, freetype-devel-1.3.1-5
+ ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf",
+ i18n_convert("日本語", "UTF-8"));
+ ImageGif($im);
+
+gd-1.7.3-1k1, gd-devel-1.7.3-1k1, freetype-1.3.1-5, freetype-devel-1.3.1-5
+ ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf","日本語");
+ ImagePng($im);
+ * i18n_internal_encoding = EUC 又は SJIS
+
+For any gd libraries before 1.6.2, you need to use i18n_convert. For
+gd-1.5.2/3, upgrade to anything above 1.7 to use ImageTTFText without
+using i18n_convert. As long as you have internal_encoding set to EUC or
+SJIS, ImageTTFText should work without mojibake. Again, make sure you
+have i18n_http_output("pass") before calling ImageGif, ImagePng, ImageJpeg!
+
+o 2000-12-09, Rui Hirokawa <hirokawa@php.net>
+
+Fixed mail() which was causing segmentation fault when header was null.
+