| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Update the JSON-LD keyword list to match JSON-LD 1.1
Changes in this patch:
* Update the JSON-LD URL to HTTPS
* Update the list of JSON-LD keywords
* Make the JSON-LD parser less dependent on the JSON lexer implementation
* Add unit tests for the JSON-LD lexer
* Add unit tests for the JSON parser
This includes:
* Testing valid literals
* Testing valid string escapes
* Testing that object keys are tokenized differently from string values
* Rewrite the JSON lexer
Related to #1425
Included in this change:
* The JSON parser is rewritten
* The JSON bare object parser no longer requires additional code
* `get_tokens_unprocessed()` returns as much as it can to reduce yields
(for example, side-by-side punctuation is not returned separately)
* The unit tests were updated
* Add unit tests based on Hypothesis test results
* Reduce HTML formatter memory consumption by ~33% and speed it up
Related to #1425
Tested on a 118MB JSON file. Memory consumption tops out at ~3GB before
this patch and drops to only ~2GB with this patch. These were the command
lines used:
python -m pygments -l json -f html -o .\new-code-classes.html .\jc-output.txt
python -m pygments -l json -f html -O "noclasses" -o .\new-code-styles.html .\jc-output.txt
* Add an LRU cache to the HTML formatter's HTML-escaping and line-splitting
For a 118MB JSON input file, this reduces memory consumption by ~500MB
and reduces formatting time by ~15 seconds.
* JSON: Add a catastrophic backtracking test back to the test suite
* JSON: Update the comment that documents the internal queue
* JSON: Document in comments that ints/floats/constants are not validated
|