path: root/unidecode
Commit message (author, age, files changed, lines removed/added)
* Improve Yiddish conversion (Alon Bar-Lev, 2021-08-31, 1 file, -22/+22)
|     Clean up invalid characters and typos. Fix up special characters.
* Improve Hebrew conversion (Alon Bar-Lev, 2021-08-31, 1 file, -14/+14)
|     Clean up special, rarely used characters. Bring regular characters closer
|     to the formal document [1].
|
|     [1] https://hebrew-academy.org.il/wp-content/uploads/taatik-ivrit-latinit-1-1.pdf
* Remove __init__.pyi (Tomaz Solc, 2021-02-05, 2 files, -10/+4)
|
* More Python 2 compatibility code clean up. (Tomaz Solc, 2021-02-05, 2 files, -12/+2)
|
* Move Py3.5-compatible type annotations inline. (Tomaz Solc, 2021-02-05, 2 files, -8/+4)
|
* Drop support for Python 2 and 3.4. (Tomaz Solc, 2021-02-05, 1 file, -17/+7)
|
* Add Typing stubs for the main API. (Pascal Corpet, 2021-02-03, 2 files, -0/+11)
|     See PEP 484 (for typing) and PEP 561 (for distributing types).
* Avoid exception chaining on Python 3. (Tomaz Solc, 2021-01-08, 1 file, -4/+7)
|     This avoids exceptions raised by errors='strict' from displaying as
|     "During handling of the above exception ..." in the backtrace, which can
|     be confusing.
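The effect described in this commit message can be illustrated with a small sketch. It assumes a toy lookup table and a hypothetical `ToyTransliterationError` class, and it uses `raise ... from None`, which is one common Python 3 way to suppress the implicit exception context. The commit itself may use a different technique.

```python
class ToyTransliterationError(ValueError):
    """Hypothetical error type used only in this sketch."""


# Toy table for illustration: maps U+00E9 (e with acute) to "e".
TOY_TABLE = {"\u00e9": "e"}


def lookup_strict(char: str) -> str:
    """Return the replacement for char, or raise without a chained KeyError."""
    try:
        return TOY_TABLE[char]
    except KeyError:
        # "from None" sets __suppress_context__, so the traceback does not
        # show "During handling of the above exception ..." for the KeyError.
        raise ToyTransliterationError(
            f"no replacement found for character {char!r}"
        ) from None
```

Without `from None`, the traceback would print the internal `KeyError` first, followed by the "During handling of the above exception" banner and then the error the caller actually cares about.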
* Rename argument replace_char -> replace_str (Tomaz Solc, 2020-12-20, 1 file, -9/+9)
|
* More mass replace '' -> None (Tomaz Solc, 2020-12-20, 4 files, -757/+757)
|     See 35295352.
* Mass replace '[?] ' -> None (Tomaz Solc, 2020-12-20, 79 files, -795/+795)
|     To make use of the new 'errors' argument.
|
|     It seems that '[?] ' (with space) was used for code points that were
|     assigned but whose replacement was not known.
* Mass replace '' -> None. (Tomaz Solc, 2020-12-20, 4 files, -395/+395)
|     To make use of the new 'errors' argument.
|
|     '' was used in the original Perl tables both to mean an unknown
|     replacement and an intentional replacement with an empty string.
|
|     Here I only replace it in ranges I've added later where I'm reasonably
|     sure that '' means unknown replacement.
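The distinction these mass-replace commits draw, between `None` (replacement unknown) and `''` (intentional empty replacement), can be sketched as follows. Everything here is a toy: the table entries, the function name `toy_transliterate`, and the four mode names are assumptions made for illustration and are not taken from the log. Only the `errors` and `replace_str` argument names appear in the commit messages.

```python
from typing import Dict, Optional

# Toy table for illustration only; not copied from the library's data files.
TOY_TABLE: Dict[str, Optional[str]] = {
    "\u00e9": "e",   # known replacement
    "\u0301": "",    # intentional replacement with an empty string
    "\u0378": None,  # replacement unknown
}


class ToyTransliterationError(ValueError):
    """Hypothetical error type used only in this sketch."""


def toy_transliterate(text: str, errors: str = "ignore",
                      replace_str: str = "?") -> str:
    """Transliterate text, applying the errors policy to unknown characters."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII passes through unchanged
            continue
        repl = TOY_TABLE.get(ch)  # None means "no known replacement"
        if repl is None:
            if errors == "ignore":
                repl = ""
            elif errors == "replace":
                repl = replace_str
            elif errors == "preserve":
                repl = ch
            elif errors == "strict":
                raise ToyTransliterationError(
                    f"no replacement found for character {ch!r}")
            else:
                raise ValueError(f"invalid value for errors: {errors!r}")
        out.append(repl)
    return "".join(out)
```

With `None` in the table, an intentional `''` entry is still dropped silently under every policy, while an unknown entry is handled according to `errors`. That separation is not possible when both cases are stored as `''`.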
* Mass replace '[?]' -> None (Tomaz Solc, 2020-12-20, 45 files, -3319/+3319)
|     To make use of the new 'errors' argument.
|
|     It seems '[?]' was used in the original Perl tables for unassigned
|     codepoints.
* Add missing ligatures and quotes in U+1F6xx range (Tomaz Solc, 2020-12-20, 1 file, -0/+258)
|
* Add missing quotation marks in the U+27xx range. (Tomaz Solc, 2020-12-20, 1 file, -5/+5)
|
* Add errors parameter to unidecode() (Tomaz Solc, 2020-12-20, 1 file, -34/+76)
|     This implements the idea in https://github.com/avian2/unidecode/pull/53
* Fix U+204A "TIRONIAN SIGN ET" (Tomaz Solc, 2020-12-06, 1 file, -1/+5)
|     See https://github.com/avian2/unidecode/issues/57
* Add some missing replacements in U+23xx page. (Tomaz Solc, 2020-05-28, 1 file, -43/+43)
|     Content of this commit by Marcoffee (Marco Ribeiro) on GitHub.
|     https://github.com/marcoffee/unidecode/commit/705d91ad4c9c7755529d4be025170b11922f1dee
* Add more latin variants in U+1F1xx page. (Tomaz Solc, 2019-01-19, 1 file, -81/+81)
|     This adds:
|     - SQUARED LATIN CAPITAL LETTERs,
|     - NEGATIVE CIRCLED LATIN CAPITAL LETTERs,
|     - NEGATIVE SQUARED LATIN CAPITAL LETTERs,
|     - TORTOISE SHELL BRACKETED LATIN CAPITAL LETTERs and
|     - CIRCLED ITALIC LATIN CAPITAL LETTERs
* Merge remote-tracking branch 'jdufresne/main' (Tomaz Solc, 2019-01-19, 1 file, -0/+3)
|\
| * Add __main__.py file so the CLI can be executed as a module (Jon Dufresne, 2018-12-31, 1 file, -0/+3)
| |     Allows running the following command to execute the CLI:
| |
| |         $ python -m unidecode ...
| |
| |     https://docs.python.org/3/library/__main__.html
| |
| |     > For a package, the same effect can be achieved by including a
| |     > __main__.py module, the contents of which will be executed when the
| |     > module is run with -m.
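The `__main__.py` mechanism quoted above can be tried out with a throwaway package. The sketch below assumes a hypothetical package named `demo_pkg` created in a temporary directory. It does not reproduce the three lines the commit adds, which are not shown in this log.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo_package() -> str:
    """Create a tiny package with a __main__.py and run it via "python -m"."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        # The package exposes a main() function, as a CLI package might.
        (pkg / "__init__.py").write_text(
            "def main():\n    print('hello from demo_pkg')\n"
        )
        # __main__.py is what "python -m demo_pkg" executes.
        (pkg / "__main__.py").write_text(
            "from demo_pkg import main\n\nmain()\n"
        )
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp,  # -m puts the current directory on sys.path
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
```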
* | Merge remote-tracking branch 'jdufresne/b-literal' (Tomaz Solc, 2019-01-19, 1 file, -1/+1)
|\ \
| * | Replace string literal + encode with bytes literal (Jon Dufresne, 2018-12-31, 1 file, -1/+1)
| |/
| |     Simpler and more forward compatible. The b prefix syntax is available on
| |     all supported Pythons.
* | Merge remote-tracking branch 'jdufresne/argparse' (Tomaz Solc, 2019-01-19, 1 file, -12/+13)
|\ \
| * | Replace use of deprecated optparse with argparse (Jon Dufresne, 2018-12-31, 1 file, -12/+13)
| |/
| |     The Python project considers the optparse module as deprecated. See:
| |
| |     https://docs.python.org/3/library/optparse.html
| |
| |     > Deprecated since version 3.2: The optparse module is deprecated and
| |     > will not be developed further; development will continue with the
| |     > argparse module.
| |
| |     Replace the project's use with the newer argparse. The CLI is fully
| |     equivalent and should not result in any backwards compatibility
| |     concerns.
| |
| |     https://docs.python.org/3/library/argparse.html
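For readers unfamiliar with the migration, a minimal argparse-based parser for a CLI of this shape might look like the sketch below. Only the `-c` option is named elsewhere in this log. The `-e/--encoding` option, the positional `path`, the help strings and the function name `build_parser` are assumptions made for illustration, not the project's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser resembling a small transliteration CLI."""
    parser = argparse.ArgumentParser(
        description="Transliterate text to ASCII (toy example)."
    )
    # Assumed option: encoding used to decode the input.
    parser.add_argument("-e", "--encoding", default="utf-8",
                        help="input encoding (default: utf-8)")
    # -c appears in the log; its exact semantics here are an assumption.
    parser.add_argument("-c", dest="text", metavar="text",
                        help="transliterate this text instead of a file")
    # Assumed positional: optional path, with stdin as the fallback.
    parser.add_argument("path", nargs="?",
                        help="file to read; standard input if omitted")
    return parser
```

Compared with optparse, argparse declares positional arguments with the same `add_argument` call as options, so there is no separate leftover-arguments list to unpack.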
* | Remove unused import from unidecode/util.py (Jon Dufresne, 2018-12-31, 1 file, -1/+0)
|/
* Fix "SQUARE V OVER M" and "SQUARE A OVER M". (Tomaz Solc, 2018-06-19, 1 file, -2/+2)
|
* Use uA instead of microampere, etc. (Tomaz Solc, 2018-06-19, 1 file, -8/+8)
|     These codepoints are defined as "Greek small letter mu" and a Latin
|     capital letter, not with spelled-out unit names. "u" is a common way of
|     representing "micro" SI prefix in ASCII.
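The reasoning in this commit message can be checked against the Unicode compatibility decomposition using only the standard library. U+3382 is picked here as one example of a squared unit symbol. The log does not list which code points the commit touched, and the helper name `ascii_unit` is hypothetical.

```python
import unicodedata

GREEK_SMALL_MU = "\u03bc"


def ascii_unit(symbol: str) -> str:
    """Decompose a squared unit symbol and write the micro prefix as 'u'."""
    # NFKC expands the compatibility character into its component letters,
    # for example Greek small letter mu followed by a Latin capital letter.
    decomposed = unicodedata.normalize("NFKC", symbol)
    return decomposed.replace(GREEK_SMALL_MU, "u")
```

For U+3382 the decomposition is mu followed by "A", so the helper yields "uA" rather than a spelled-out "microampere".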
* Adds decoding for phonetic block 1D00-1D7F (olau, 2018-04-03, 1 file, -86/+86)
|     https://unicode-table.com/en/blocks/phonetic-extensions/
* Improve Hebrew conversion (Alon Bar-Lev, 2018-03-10, 1 file, -15/+15)
|     Convert double-letter transliterations to capital letters, because
|     duplicates make it very hard to tell what the transliteration is, for
|     example:
|
|         kh - is it k and h or kh?
|         tskh - is it t,s,kh or ts,k,h or ts,kh, etc...
|
|     0xa2 Hebrew Bible punctuation mark, should be ignored.
|     0xc6 Opposite Nun, same as 'n'.
|     0xba Hulam Haser, vowel as 'o'.
|     0xbf Makaf Raphe, same as Makaf (0xbe).
|     0xc5 Hebrew Bible punctuation mark, should be ignored.
|     0xc7 Makaf katan, vowel as 'o'.
|     0xd0 Aleph, sounds as AHA, must exist to make the string readable.
|          Distinguish from '`'; use capital A to distinguish from the 'a' vowel.
|     0xf5 Split Vave, same as 'v'.
|     0xf6 Opposite Nun, same as 'n'.
|     0xf7 Small Kuf, same as 'q'.
|
|     Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
* Fix syntax error in an example (Jakub Wilk, 2018-02-19, 1 file, -1/+1)
|
* Surround fractions with spaces (Jeffrey Gerard, 2017-10-10, 1 file, -3/+3)
|     Goal is to avoid incorrect combination with adjacent numbers.
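The problem this commit addresses shows up as soon as a fraction character follows a digit. The sketch below uses two toy tables, one without and one with surrounding spaces. The exact replacement strings in the real tables are not shown in this log, so both tables and the `toy_transliterate` helper are assumptions made for illustration.

```python
from typing import Dict

# Toy tables for U+00BD (vulgar fraction one half); illustration only.
BARE: Dict[str, str] = {"\u00bd": "1/2"}
PADDED: Dict[str, str] = {"\u00bd": " 1/2 "}


def toy_transliterate(text: str, table: Dict[str, str]) -> str:
    """Replace each character found in table; keep everything else as is."""
    return "".join(table.get(ch, ch) for ch in text)


one_and_a_half = "1\u00bd"
ambiguous = toy_transliterate(one_and_a_half, BARE)    # reads as eleven halves
readable = toy_transliterate(one_and_a_half, PADDED)   # reads as 1 and 1/2
```

With the bare table the digit fuses into the numerator, giving "11/2". With the padded table the whole number and the fraction stay visually separate.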
* Add currency translations for U+20B0 through U+20BF (Mike Swanson, 2017-09-22, 1 file, -16/+16)
|
* U+05be is a hyphen (Micha Moskovic, 2017-06-23, 1 file, -1/+1)
|     U+05be is the Hebrew Maqaf character, which is equivalent to a hyphen, as
|     explained in
|     https://en.wikipedia.org/wiki/Hebrew_punctuation#Hyphen_and_maqaf.
* U+2116 is the numero sign (Alan Davidson, 2017-01-16, 1 file, -1/+1)
|
* Add missing square unit symbols. (Tomaz Solc, 2016-11-04, 2 files, -10/+11)
|
* Added latin variants in U+20xx and U+21xx pages. (Tomaz Solc, 2016-11-04, 2 files, -28/+28)
|
* Fix U+02B1 MODIFIER LETTER SMALL H WITH HOOK (Tomaz Solc, 2016-11-04, 1 file, -1/+1)
|
* Fix U+205F MEDIUM MATHEMATICAL SPACE (Tomaz Solc, 2016-11-04, 1 file, -1/+1)
|
* Add U+1F1xx page. (Tomaz Solc, 2016-10-12, 1 file, -0/+258)
|     Includes "DIGIT ... COMMA" and "PARENTHESIZED LATIN CAPITAL LETTER"
|     characters.
* Add missing vulgar fractions. (Tomaz Solc, 2016-10-12, 1 file, -4/+4)
|
* Add a/c, a/s, c/o, c/u (Tomaz Solc, 2016-10-12, 1 file, -4/+4)
|
* Fix transliteration of enclosed alphanumerics (Krzysztof Jurewicz, 2016-05-29, 1 file, -46/+47)
|
* Fix typos (Jakub Wilk, 2015-12-10, 1 file, -1/+1)
|
* Fix docstrings (Tomaz Solc, 2015-11-17, 1 file, -9/+15)
|
* Rename unidecode_fast to unidecode_expect_ascii (Tomaz Solc, 2015-11-17, 1 file, -4/+8)
|     Also, add unidecode_expect_nonascii. "unidecode" is now an alias for
|     "unidecode_expect_ascii"
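The split between an "expect ASCII" and an "expect non-ASCII" entry point can be sketched as a fast path in front of a slower per-character lookup. This is a guess at the general idea, based only on the commit titles. The toy table, the helper `_slow_transliterate`, and the use of `str.encode("ascii")` as the fast check are assumptions, not the library's confirmed implementation.

```python
# Toy table for illustration only.
_TOY_TABLE = {"\u00e9": "e", "\u00fc": "u"}


def _slow_transliterate(text: str) -> str:
    """Per-character lookup; unknown non-ASCII characters are dropped."""
    out = []
    for ch in text:
        out.append(ch if ord(ch) < 128 else _TOY_TABLE.get(ch, ""))
    return "".join(out)


def toy_expect_ascii(text: str) -> str:
    """Cheap check first: pure-ASCII input is returned unchanged."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return _slow_transliterate(text)
    return text


def toy_expect_nonascii(text: str) -> str:
    """Skip the check when most inputs are known to need transliteration."""
    return _slow_transliterate(text)
```

Both functions return the same result for any input. They differ only in which kind of input they handle with the least work.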
* Add unidecode_fast function to speed up mostly-ASCII transliterations. (dukebody, 2015-11-14, 1 file, -6/+30)
|
* Add a newline if the string comes from the command line (Tomaz Solc, 2015-05-14, 1 file, -0/+4)
|
* Don't append an extra new-line. (Tomaz Solc, 2015-05-13, 1 file, -1/+1)
|
* Add -c command line option. (Tomaz Solc, 2015-05-13, 1 file, -6/+18)
|