| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Clean up invalid characters and typos.
Fix up special characters.
|
| |
|
|
|
|
|
|
| |
Clean up special, rarely used characters.
Bring regular characters closer to the formal document [1].
[1] https://hebrew-academy.org.il/wp-content/uploads/taatik-ivrit-latinit-1-1.pdf
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
See PEP 484 (for typing) and PEP 561 (for distributing types).
|
| |
|
|
|
|
| |
This prevents exceptions raised by errors='strict' from being displayed as
"During handling of the above exception ..." in the backtrace, which
can be confusing.
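A minimal sketch of the effect described above, not the project's actual code. It assumes the error is raised while a lookup failure is being handled; `UnidecodeError`, `TABLE`, `lookup_chained` and `lookup_clean` are hypothetical names used only for illustration. `raise ... from None` is one common way to suppress the chained context; the commit may achieve the same result differently.

```python
class UnidecodeError(ValueError):
    """Hypothetical error type used only for this illustration."""


TABLE = {"\u00e9": "e"}  # toy table; anything missing has no known replacement


def lookup_chained(char):
    try:
        return TABLE[char]
    except KeyError:
        # Raised while the KeyError is being handled, so the traceback shows
        # "During handling of the above exception, another exception occurred".
        raise UnidecodeError(f"no replacement for {char!r}")


def lookup_clean(char):
    try:
        return TABLE[char]
    except KeyError:
        # "from None" marks the context as suppressed, so only this error
        # is displayed in the traceback.
        raise UnidecodeError(f"no replacement for {char!r}") from None


def capture(func, char):
    """Return the exception raised by func(char), or None if none was raised."""
    try:
        func(char)
    except UnidecodeError as exc:
        return exc
    return None
```

The difference is visible on the exception object: the chained variant keeps its context displayed, while the clean variant has `__suppress_context__` set.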
|
| | |
|
| |
|
|
| |
See 35295352.
|
| |
|
|
|
|
|
| |
To make use of the new 'errors' argument.
It seems that '[?] ' (with space) was used for code points that were assigned,
but whose replacement was not known.
|
| |
|
|
|
|
|
|
|
| |
To make use of the new 'errors' argument.
'' was used in the original Perl tables both to mean an unknown replacement and
an intentional replacement with an empty string. Here I only replace it in
ranges I've added later where I'm reasonably sure that '' means unknown
replacement.
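A rough sketch of why the distinction between "unknown replacement" and "intentional empty string" matters once an 'errors' argument exists. This is not the library's implementation: `transliterate`, `TOY_TABLE`, the use of `None` as the "unknown" marker and the mode names other than 'strict' are assumptions made for illustration.

```python
def transliterate(text, table, errors="ignore", replace_str="?"):
    """Toy table-driven transliteration with an 'errors' policy."""
    out = []
    for index, char in enumerate(text):
        if ord(char) < 128:
            out.append(char)  # ASCII passes through unchanged
            continue
        repl = table.get(char)
        if repl is None:
            # Unknown replacement: the policy decides what happens.
            if errors == "ignore":
                repl = ""
            elif errors == "strict":
                raise ValueError(f"no replacement for {char!r} at index {index}")
            elif errors == "replace":
                repl = replace_str
            elif errors == "preserve":
                repl = char
            else:
                raise ValueError(f"invalid value for errors: {errors!r}")
        out.append(repl)
    return "".join(out)


TOY_TABLE = {
    "\u00e9": "e",   # known replacement
    "\u200b": "",    # intentional replacement with an empty string
    "\ue000": None,  # replacement unknown
}
```

With this split, an intentionally empty replacement never triggers the error policy, while an unknown one always does.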
|
| |
|
|
|
|
| |
To make use of the new 'errors' argument.
It seems '[?]' was used in the original Perl tables for unassigned codepoints.
|
| | |
|
| | |
|
| |
|
|
| |
This implements the idea in https://github.com/avian2/unidecode/pull/53
|
| |
|
|
| |
See https://github.com/avian2/unidecode/issues/57
|
| |
|
|
|
|
| |
Content of this commit by Marcoffee (Marco Ribeiro) on GitHub.
https://github.com/marcoffee/unidecode/commit/705d91ad4c9c7755529d4be025170b11922f1dee
|
| |
|
|
|
|
|
|
|
|
| |
This adds:
- SQUARED LATIN CAPITAL LETTERs,
- NEGATIVE CIRCLED LATIN CAPITAL LETTERs,
- NEGATIVE SQUARED LATIN CAPITAL LETTERs,
- TORTOISE SHELL BRACKETED LATIN CAPITAL LETTERs and
- CIRCLED ITALIC LATIN CAPITAL LETTERs
|
| |\ |
|
| | |
| |
| | |
Allows running the following command to execute the CLI
$ python -m unidecode ...
https://docs.python.org/3/library/__main__.html
> For a package, the same effect can be achieved by including a
> __main__.py module, the contents of which will be executed when the
> module is run with -m.
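A small self-contained demonstration of the mechanism quoted above. It builds a throwaway package named `demo_pkg` (a hypothetical name, unrelated to the project) containing a `__main__.py`, then runs it with `-m`. The body of `__main__.py` here is a placeholder and does not reflect the project's CLI.

```python
import pathlib
import subprocess
import sys
import tempfile


def run_package_as_module(argument):
    """Create a temporary package with __main__.py and run it via -m."""
    with tempfile.TemporaryDirectory() as tmp:
        package = pathlib.Path(tmp) / "demo_pkg"
        package.mkdir()
        (package / "__init__.py").write_text("")
        # __main__.py is what gets executed for "python -m demo_pkg".
        (package / "__main__.py").write_text(
            "import sys\n"
            "print(' '.join(sys.argv[1:]).upper())\n"
        )
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg", argument],
            cwd=tmp,  # with -m, the current directory is searched for the package
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
```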
|
| |\ \ |
|
| | |/
| |
| |
| |
| | |
Simpler and more forward compatible. The b prefix syntax is available on
all supported Pythons.
|
| |\ \ |
|
| | |/
| |
| | |
The Python project considers the optparse module as deprecated. See:
https://docs.python.org/3/library/optparse.html
> Deprecated since version 3.2: The optparse module is deprecated and
> will not be developed further; development will continue with the
> argparse module.
Replace the project's use with the newer argparse. The CLI is fully
equivalent and should not result in any backwards compatibility
concerns.
https://docs.python.org/3/library/argparse.html
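For illustration, a minimal argparse-based parser of the kind such a migration produces. The program name, options and help texts below are assumptions chosen for the example; the commit message does not list the project's actual options.

```python
import argparse


def build_parser():
    """Build a small command-line parser with argparse."""
    parser = argparse.ArgumentParser(prog="unidecode-demo")
    # Hypothetical options, shown only to illustrate the argparse API.
    parser.add_argument("-e", "--encoding", default="utf-8",
                        help="encoding of the input (default: utf-8)")
    parser.add_argument("-c", dest="text",
                        help="transliterate this text instead of reading a file")
    parser.add_argument("path", nargs="?",
                        help="input file; optional positional argument")
    return parser


# Parsing an explicit list avoids touching the real command line.
args = build_parser().parse_args(["-e", "latin-1", "input.txt"])
```

With optparse, positional arguments come back as a separate list; with argparse they are declared up front and land on the same namespace as the options.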
|
| |/ |
|
| | |
|
| |
|
|
|
|
|
| |
These codepoints are defined as "Greek small letter mu" and a Latin capital
letter, not with spelled-out unit names.
"u" is a common way of representing "micro" SI prefix in ASCII.
|
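The reasoning in the message above can be checked with the standard library. The sketch below uses U+3382 (SQUARE MU A) as an example codepoint; the commit does not list the codepoints it touches, so this choice is an assumption. The helper names are hypothetical, and the library itself uses static tables rather than normalization.

```python
import unicodedata


def describe(char):
    """Return the Unicode names of the compatibility decomposition of char."""
    decomposed = unicodedata.normalize("NFKD", char)
    return [unicodedata.name(c) for c in decomposed]


def ascii_unit(char):
    """Decompose char and write the micro prefix (Greek mu) as ASCII 'u'."""
    return unicodedata.normalize("NFKD", char).replace("\u03bc", "u")


names = describe("\u3382")      # Greek small letter mu + Latin capital letter A
unit = ascii_unit("\u3382")     # "uA"
```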
| |
|
|
| |
https://unicode-table.com/en/blocks/phonetic-extensions/
|
| |
|
| |
Convert double-letter transliterations to capital letters, because duplicates
make it very hard to tell what the transliteration stands for. For example:
kh - is it k and h, or kh?
tskh - is it t,s,kh or ts,k,h or ts,kh, etc.?
0xa2
Hebrew Bible punctuation mark, should be ignored.
0xc6
Opposite Nun, same as 'n'.
0xba
Holam Haser, vowel as 'o'.
0xbf
Makaf Raphe, same as Makaf (0xbe).
0xc5
Hebrew Bible punctuation mark, should be ignored.
0xc7
Makaf katan, vowel as 'o'.
0xd0
Aleph, sounds as AHA, must exist to make the string readable.
Use capital A to distinguish it from '`' and from the 'a' vowel.
0xf5
Split Vav, same as 'v'.
0xf6
Opposite Nun, same as 'n'.
0xf7
Small Kuf, same as 'q'.
Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
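The ambiguity raised at the top of this message (is "kh" one letter or two?) can be made concrete by counting how many ways an output string splits into transliteration units. The unit list below is a hypothetical toy set, not the project's Hebrew table.

```python
def segmentations(text, units):
    """Return every way to split text into a sequence of the given units."""
    if not text:
        return [[]]
    results = []
    for unit in units:
        if text.startswith(unit):
            for rest in segmentations(text[len(unit):], units):
                results.append([unit] + rest)
    return results


# Hypothetical units: single letters plus two-letter transliterations.
UNITS = ["t", "s", "k", "h", "ts", "kh"]

kh_splits = segmentations("kh", UNITS)      # 2 readings
tskh_splits = segmentations("tskh", UNITS)  # 4 readings
```

If multi-letter units are written in a different case from single-letter ones, each output string has a single reading again, which appears to be the motivation given in the message.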
|
| | |
|
| |
|
|
| |
Goal is to avoid incorrect combination with adjacent numbers.
|
| | |
|
| |
|
| |
U+05be is the Hebrew Maqaf character, which is equivalent to a hyphen, as explained in https://en.wikipedia.org/wiki/Hebrew_punctuation#Hyphen_and_maqaf.
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
Includes "DIGIT ... COMMA" and "PARENTHESIZED LATIN CAPITAL LETTER" characters.
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
| |
Also, add unidecode_expect_nonascii. "unidecode" is now an alias for
"unidecode_expect_ascii".
|
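The commit message above does not say how the two functions differ. One plausible reading of the names, sketched here as an assumption, is that the "expect ASCII" variant first tries a cheap ASCII check and only falls back to the per-character path when that fails, while the "expect non-ASCII" variant skips the check. `TOY_TABLE`, `slow_path`, `expect_ascii` and `expect_nonascii` are hypothetical stand-ins.

```python
TOY_TABLE = {"\u00e9": "e", "\u00df": "ss"}  # toy replacements for the sketch


def slow_path(text):
    """Per-character transliteration; unknown characters are dropped."""
    return "".join(c if ord(c) < 128 else TOY_TABLE.get(c, "") for c in text)


def expect_ascii(text):
    """Fast when the input is already ASCII; falls back otherwise."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return slow_path(text)
    return text


def expect_nonascii(text):
    """Skips the ASCII check; better when most inputs need transliteration."""
    return slow_path(text)
```

Both variants return the same result for any input; they differ only in which kind of input they handle with less work.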
| | |
|
| | |
|
| | |
|
| | |
|