path: root/markdown/inlinepatterns.py
Commit message  Author  Age  Files  Lines
* Use pyspelling to check spelling.Waylan Limberg2023-04-061-32/+32
| | | In addition to checking the spelling in our documentation, we are now also checking the spelling of the README.md and similar files as well as comments in our Python code.
* Improve standalone * and _ parsing.Waylan Limberg2022-11-151-1/+1
| | | | | | | | | | | The `NOT_STRONG_RE` regex matches 1, 2, or 3 * or _ which are surrounded by white space to prevent them from being parsed as tokens. However, the surrounding white space should not be consumed by the regex, which is why lookahead and lookbehind assertions are used. As `^` cannot be matched in a lookbehind assertion, it is left outside the assertion, but as it is zero length, that should not matter. Tests added and/or updated to cover various edge cases. Fixes #1300.
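A minimal sketch of the technique this message describes, not the pattern from the file itself: `STANDALONE_RE` and `standalone_spans` are illustrative names, and the pattern is an assumption written to match the description (whitespace is asserted but never consumed, and `^` sits outside the lookbehind).

```python
import re

# Illustrative pattern (an assumption, not copied from inlinepatterns.py):
# one to three `*` or `_` characters standing alone between whitespace.
# The lookbehind and lookahead assert whitespace without consuming it.
STANDALONE_RE = re.compile(r'(?:^|(?<=\s))(\*{1,3}|_{1,3})(?=\s|$)')


def standalone_spans(text):
    """Return the (start, end) span of each standalone token in `text`."""
    return [m.span() for m in STANDALONE_RE.finditer(text)]


# Only the lone `*` is reported; the delimiters of `*em*` touch a word.
spans = standalone_spans("a * b and *em* here")
```

Because the spans cover only the token itself, the surrounding whitespace stays in the text for later patterns to see.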
* Remove previously deprecated objectsWaylan Limberg2022-05-271-6/+0
| | | This completely removes all objects which were deprecated in version 3.0 (this change will be included in version 3.4). Given the time that has passed, and the fact that older unmaintained extensions are not likely to support the new minimum Python version, there is little concern about breaking older extensions.
* [style]: fix various typos in docstrings and commentsFlorian Best2022-03-181-1/+1
|
* Improve email address validation for Automatic LinksCarlos2021-08-111-2/+2
|
* Fix minor grammatical errorTani N-K2021-02-151-1/+1
| | | | Corrected "shorte" to "short"
* Support short reference image links.Waylan Limberg2020-07-011-0/+11
| | | | Fixes #894.
* Fix issues with complex emphasisfacelessuser2020-06-221-2/+2
| | | | | Resolves issue that can occur with complex emphasis combinations. Fixes #979
* Simplify xml.etree.ElementTree loading (#902)Dmitry Shachnev2020-02-031-20/+21
| | | | | | | | cElementTree is a deprecated alias for ElementTree since Python 3.3. Also drop the recommendation to import etree from markdown.util, and deprecate markdown.util.etree.
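For illustration, the plain standard-library import is all that is needed on Python 3.3 and later, where the accelerated implementation is picked up automatically. The element and variable names below are only examples.

```python
import xml.etree.ElementTree as etree

# Build a small element tree and serialize it to a string.
em = etree.Element('em')
em.text = 'text'
html = etree.tostring(em, encoding='unicode')
```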
* Drop support for Python 2.7 (#865)Hugo van Kemenade2019-10-241-10/+7
| | | | | | | * Python syntax upgraded using `pyupgrade --py3-plus` * Travis no longer uses `sudo`. See https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration See #760 for Python Version Support Timeline and related discussion.
* Refactor em strong to consolidate code and fix issue #792Isaac Muse2019-09-031-10/+171
|
* Optimize HTML_RE from quadratic time to linear (#804)Anders Kaseorg2019-08-141-1/+1
| | | | | | Remove misleading escaped_chars_in_js test Signed-off-by: Anders Kaseorg <andersk@mit.edu>
* Optimize several regexes from quadratic time to linear timeAnders Kaseorg2019-03-061-5/+5
| | | | | | Part of the discussion in #798. Signed-off-by: Anders Kaseorg <andersk@mit.edu>
* Emphasis pattern treats newlines as whitespace (#785)Waylan Limberg2019-02-071-1/+1
| | | | | | All whitespace characters should be treated the same by inline patterns. Previously, emphasis patterns were only accounting for spaces, but not other whitespace characters such as newlines. Fixes #783.
* Collapse all whitespace in reference ids (#743)Isaac Muse2018-10-301-1/+1
| | | Previously only newlines preceded by whitespace were collapsed. Fixes #742.
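A rough sketch of what collapsing all whitespace in a reference id could look like. `WHITESPACE_RE` and `normalize_ref_id` are hypothetical names, and the actual code in the file may differ.

```python
import re

# Any run of whitespace (spaces, tabs, newlines) becomes a single space.
WHITESPACE_RE = re.compile(r'\s+')


def normalize_ref_id(ref_id):
    """Collapse every whitespace run in a reference id to one space."""
    return WHITESPACE_RE.sub(' ', ref_id)


normalized = normalize_ref_id("my\n   long\tid")
```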
* Make ENTITY_RE support hexadecimal entitiesissue712Dmitry Shachnev2018-09-251-2/+2
| | | | Fixes #712.
* smart_emphasis keyword > legacy_em extension.Waylan Limberg2018-07-311-11/+9
| | | | | | | | | The smart_strong extension has been removed and its behavior is now the default (smart em and smart strong are the default). The legacy_em extension restores legacy behavior (no smart em or smart strong). This completes the removal of keywords. All parser behavior is now modified by extensions, not by keywords on the Markdown class.
* Fix double escaping of amp in attributes (#670)Isaac Muse2018-07-291-2/+2
| | | | | | | | | | Serializer should only escape & in attributes if not part of &amp; A better regex avoids Unicode and `_` in amp detection. In general, we don't want to escape already escaped content, but with code content, we want literal representations of escaped content, so have code content explicitly escape its content before placing in AtomicStrings. Closes #669.
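One way to avoid double escaping is a negative lookahead, sketched below. `AMP_RE` and `escape_amp` are hypothetical names and the pattern is an assumption, not the serializer's actual regex; in particular, this sketch would still re-escape other entities such as `&lt;`.

```python
import re

# Match `&` only when it is not already the start of `&amp;`.
AMP_RE = re.compile(r'&(?!amp;)')


def escape_amp(value):
    """Escape bare ampersands, leaving existing `&amp;` untouched."""
    return AMP_RE.sub('&amp;', value)


escaped_once = escape_amp('a & b')
escaped_again = escape_amp(escaped_once)
```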
* Consistent copyright headers.Waylan Limberg2018-07-271-0/+20
| | | | Fixes #435.
* All Markdown instances are now 'md'. (#691)Waylan Limberg2018-07-271-25/+29
| | | | | | | | | | | | Previously, instances of the Markdown class were represented as any one of 'md', 'md_instance', or 'markdown'. This inconsistency made it difficult when developing extensions, or just maintaining the existing code. Now, all instances are consistently represented as 'md'. The old attributes on class instances still exist, but raise a DeprecationWarning when accessed. Also on classes where the instance was optional, the attribute always exists now and is simply None if no instance was provided (previously the attribute wouldn't exist).
* Replace homegrown OrderedDict with purpose-built Registry. (#688)Waylan Limberg2018-07-271-23/+22
| | | | | | | | | | | | | | | | | | | All processors and patterns now get "registered" to a Registry. Each item is given a name (string) and a priority. The name is for later reference and the priority can be either an integer or float and is used to sort. Priority is sorted from highest to lowest. A Registry instance is a list-like iterable with the items auto-sorted by priority. If two items have the same priority, then they are listed in the order they were "registered". Registering a new item with the same name as an already registered item replaces the old item with the new item (however, the new item is sorted by its newly assigned priority). To remove an item, "deregister" it by name or index. A backwards compatible shim is included so that existing simple extensions should continue to work. DeprecationWarnings will be raised for any code which calls the old API. Fixes #418.
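The ordering rules above can be sketched with a toy class. `ToyRegistry` is a hypothetical stand-in written for this illustration, not the library's Registry, and it only supports removal by name.

```python
class ToyRegistry:
    """Illustrative stand-in for the behaviour described in the message."""

    def __init__(self):
        # (name, item, priority) tuples kept in registration order.
        self._entries = []

    def register(self, item, name, priority):
        # Re-registering a name replaces the earlier entry.
        self.deregister(name)
        self._entries.append((name, item, priority))

    def deregister(self, name):
        self._entries = [e for e in self._entries if e[0] != name]

    def __iter__(self):
        # sorted() is stable, so equal priorities keep registration order.
        ordered = sorted(self._entries, key=lambda e: e[2], reverse=True)
        return iter([item for _, item, _ in ordered])


reg = ToyRegistry()
reg.register('escape-handler', 'escape', 180)
reg.register('link-handler', 'link', 160)
reg.register('em-handler', 'emphasis', 160)
# Same name as an existing entry: the old item is replaced and re-sorted.
reg.register('new-link-handler', 'link', 200)
order = list(reg)
```

Iterating yields the items from highest to lowest priority, with ties resolved by registration order.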
* Moved enable_attributes keyword to extension: legacy_attrs.Waylan Limberg2018-07-241-21/+1
| | | | | | | If you have existing documents that use the legacy attributes format, then you should enable the legacy_attrs extension for those documents. Everyone is encouraged to use the attr_list extension going forward. Closes #643. Work adapted from 0005d7a of the md3 branch.
* Flexible inline (#629)Isaac Muse2018-01-171-116/+367
| | | | Add new InlineProcessor class that handles inline processing much better and allows for more flexibility. This adds new InlineProcessors that no longer utilize unnecessary pretext and posttext captures. New class can accept the buffer that is being worked on and manually process the text without regex and return new replacement bounds. This helps us to handle links in a better way and handle nested brackets and logic that is too much for regular expressions. The refactor also allows image links to have links/paths with spaces like links. Ref #551, #613, #590, #161.
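To illustrate why manual scanning helps with nested brackets, here is a small sketch that walks the buffer and returns the replacement bound. `find_closing_bracket` is a hypothetical helper, not a function from the library, and it ignores backslash escapes for brevity.

```python
def find_closing_bracket(data, start):
    """Return the index just past the `]` that closes the `[` at `start`.

    Assumes data[start] is `[`. Returns None if the brackets never balance.
    """
    depth = 0
    for index in range(start, len(data)):
        char = data[index]
        if char == '[':
            depth += 1
        elif char == ']':
            depth -= 1
            if depth == 0:
                return index + 1
    return None


text = 'see [a [nested] label](url) here'
start = text.index('[')
end = find_closing_bracket(text, start)
label = text[start + 1:end - 1]
```

A single regular expression cannot count arbitrary nesting depth, whereas the explicit counter handles it in one pass.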
* Removed some Py2.4-2.6 specific code.Waylan Limberg2018-01-111-14/+1
|
* Removed deprecated safe_mode.Waylan Limberg2018-01-111-58/+5
|
* Make sure regex patterns are raw strings (#614)Isaac Muse2018-01-021-1/+1
| | | Python 3.6 is starting to reject invalid escapes. Regular expression patterns should be raw strings to avoid having regex escapes being mistaken for invalid string escapes. Fixes #611.
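As a short illustration, a raw string hands the backslash to the `re` module untouched, while a non-raw literal needs every backslash doubled. The pattern and names below are only examples.

```python
import re

# Raw string: `\d` reaches the regex engine as written.
DIGITS_RE = re.compile(r'\d+')

# Non-raw equivalent: the backslash must be doubled to be a valid literal.
DIGITS_RE_ESCAPED = re.compile('\\d+')

same_pattern = DIGITS_RE.pattern == DIGITS_RE_ESCAPED.pattern
found = DIGITS_RE.findall('issue 611 and 614')
```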
* Feature ancestry (#598)Isaac Muse2017-11-231-0/+2
| | | | | Ancestry exclusion for inline patterns. Adds the ability for an inline pattern to define a list of ancestor tag names that should be avoided. If a pattern would create a descendant of one of the listed tag names, the pattern will not match. Fixes #596.
* Fix new flake8 722 errorfacelessuser2017-10-261-1/+1
|
* fix DeprecationWarning: invalid escape sequenced9pouces2017-07-251-3/+3
|
* Fix typo s/Goggle/Google/Tim Chase2017-06-031-1/+1
|
* Better inline code escaping (#533)Isaac Muse2017-01-201-5/+9
| | | | | This aims to escape code in a more expected fashion. This handles when backticks are escaped and when the escapes before backticks are escaped.
* Add blank lines after toplevel function definitions.Dmitry Shachnev2016-11-181-0/+1
| | | | This fixes warnings with pycodestyle ≥ 2.1, see PyCQA/pycodestyle#400.
* Fix image titles not following specfacelessuser2016-07-261-1/+1
| | | | | Don’t allow spaces in image links. This was also causing an issue where any text following a space was treated as a title. Ref #484.
* Ensure InlinePatterns don't drop newlines.Waylan Limberg2015-11-061-1/+1
| | | | | | Dropped the non-greedy quantifier from the end of the inlinePatterns as it served no useful purpose and was actually (in very rare edge cases) causing newlines to be dropped. Fixes #439. Thanks to @munificent for the report.
* No binary operators at beginning of line.Waylan Limberg2015-02-181-6/+6
| | | | | | | Apparently this is a new requirement of flake8. That's the thing about using tox. Every test run reinstalls all dependencies so an updated dependency might introduce new errors. I could specify a specific version, but I like staying current.
* Flake8 cleanup (mostly whitespace).Waylan Limberg2014-11-201-49/+95
| | | | | | Got all but a couple files in the tests (ran out of time today). Apparently I have been using some bad form for years (although a few things seemed to look better before the update). Anyway, conformant now.
* Issue #365 Bold/Italic nesting fixfacelessuser2014-11-171-2/+2
| | | | | | | | | | | The logic for the current regex for strong/em and em/strong was sound, but the way it was implemented caused some unintended side effects. Whether it is a quirk with regex in general or just with Python’s re engine, I am not sure. Put basically `(\*|_){3}` causes issues with nested bold/italic. So, allowing the group to be defined, and then using the group number to specify the remaining sequential chars is a better way that works more reliably `(\*|_)\2{2}`. Test from issue #365 was also added to check for this case in the future.
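One observable difference between the two forms can be shown in isolation. In this sketch the delimiter is the first group, so the backreference is `\1` rather than the `\2` used in the message; the names are illustrative, and this is not claimed to be the exact cause of the reported issue.

```python
import re

# Each of the three repetitions may pick a different character.
ANY_THREE = re.compile(r'(\*|_){3}')

# The backreference forces the same character three times.
SAME_THREE = re.compile(r'(\*|_)\1{2}')

mixed_any = ANY_THREE.fullmatch('*_*') is not None
mixed_same = SAME_THREE.fullmatch('*_*') is not None
uniform_same = SAME_THREE.fullmatch('___') is not None

# A repeated group only remembers its final repetition.
last_capture = ANY_THREE.fullmatch('*__').group(1)
```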
* Better nested STRONG EM support.Waylan Limberg2014-09-261-2/+6
| | | | | | | | | Fixes #253. Thanks to @facelessuser for the tests. Although I removed a bunch of weird ones (even some that passed) from his PR (#342). For the most part, there is no definitive way for those to be parsed. So there is no point of testing for them. In most of those situations, authors should be mixing underscores and asterisks so it is clear what is intended.
* Fix the lost tail issue in inlineprocessors.facelessuser2014-09-261-8/+8
| | | | | | See #253. Prior to this patch, if any inline processors returned an element with a tail, the tail would end up empty. This resolves that issue and will allow for #253 to be fixed. Thanks to @facelessuser for the work on this.
* Removed some old codeWaylan Limberg2014-08-251-4/+1
| | | | | | These couple lines were from an old - no longer used - method of stashing inlines. There is no need for it today. The if statement would never evaluate True.
* Mark a few more lines with 'no cover' - missed them the first time through. ↵Waylan Limberg2014-07-111-4/+4
| | | | The rest should have test cases added.
* Marked a bunch of lines as 'no cover'. Coverage at 91%Waylan Limberg2014-07-111-6/+6
|
* No longer percent encode spaces in urls.Waylan Limberg2014-01-091-1/+0
| | | | | | | | | | | The current implementation was wrong as it also percent encoded query strings (which should be plus encoded) and calling urllib.quote on the path (and urllib.quote_plus on the query string) assumes the url is not already encoded. What if the document author pasted a url that was already encoded? She probably did not intend for `%20` to become `%2520`. Or did she? It is now clear to me why many implementations do nothing to urls. Just pass them through as-is. Too bad if they are not valid HTML. HTML authors have to encode their own urls, so I guess markdown authors have to as well.
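The double-encoding problem is easy to reproduce. The message refers to Python 2's `urllib.quote`; this sketch uses the Python 3 equivalent, `urllib.parse.quote`, and the file name is only an example.

```python
from urllib.parse import quote

# First pass: the space is percent encoded.
once = quote('my file.html')

# Second pass on already-encoded input: the `%` itself gets encoded.
twice = quote(once)
```

Since the parser cannot tell whether the author's input was already encoded, leaving URLs untouched avoids the second case.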
* Only escape ESCAPED_CHARS.Waylan Limberg2014-01-091-1/+1
| | | | | | Leave all other chars prefaced by a backslash alone. Fixes #242. Not sure why I thought that I needed to add another backslash. Thanks for the report and the test case @mhubig.
* Fixed parsing of brackets within inline image titles.Darell Tan2014-01-051-1/+1
|
* Future imports go after the docstringsAdam Dinwoodie2013-03-181-1/+1
| | | | | | | | | A `from __future__ import ...` statement must go after any docstrings, since putting it before the docstring means the docstring loses its magic and just becomes a string literal. That then causes a syntax error if there are further future statements after the false docstring. This fixes issue #203, using the patch provided by @Arfrever.
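Both effects can be checked from source strings, as in this sketch. The module sources are made up for the example, and the constant names are illustrative.

```python
import ast

GOOD = '"""Module docstring."""\nfrom __future__ import division\n'
BAD = 'from __future__ import division\n"""Not a docstring."""\n'
# A second future import after the false docstring.
BROKEN = BAD + 'from __future__ import print_function\n'

# The string is only a docstring when it is the first statement.
good_doc = ast.get_docstring(ast.parse(GOOD))
bad_doc = ast.get_docstring(ast.parse(BAD))

try:
    compile(BROKEN, '<example>', 'exec')
    broken_compiles = True
except SyntaxError:
    # Future imports must appear at the beginning of the file.
    broken_compiles = False
```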
* Now using universal code for Python 2 & 3.Waylan Limberg2013-02-271-14/+16
| | | | | | | | | | The most notable changes are the use of unicode_literals and absolute_import. Actually, absolute_import was the biggest deal as it gives us relative imports. For the first time extensions import markdown relative to themselves. This allows other packages to embed the markdown lib in a subdir of their project and still be able to use our extensions.
* Whitelisted known safe url schemes in safe_mode. A better fix for #185.Waylan Limberg2013-02-061-6/+7
|
* Forbid javascript:// URLs in safe modePhilipp Hagemeister2013-02-051-0/+3
|
* Enable attributes inside image referencesAdam Backstrom2013-01-271-0/+4
|