path: root/src/lxml/html
Commit message [Author; Age; Files; Lines]
* Avoid using the deprecated "imp" module. (HEAD, master) [Stefan Behnel; 2023-05-11; 1; -1/+2]
|     Closes https://bugs.launchpad.net/lxml/+bug/2018137
* Avoid using the deprecated "imp" module. [Stefan Behnel; 2023-05-11; 1; -2/+4]
|     Closes https://bugs.launchpad.net/lxml/+bug/2018137
* Fix inheritance order of mixin classes in lxml.html (GH-340) [xmo-odoo; 2022-05-17; 2; -10/+48]
|     As the old FIXME comment from https://github.com/lxml/lxml/commit/8132c755adad4a75ba855d985dd257493bccc7fd notes, the mixin should come first for the inheritance to be correct (the left-most class is the first in the MRO, at least if no diamond inheritance is involved).
|     Also fix the odd `super` call in `HtmlMixin`, likely stemming from the incorrect MRO.
|     Fixes the inheritance order of all `HTML*` base classes though it probably doesn't matter for other than `HtmlElement`.
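The point about mixin order and the MRO can be illustrated with a small standalone sketch. The class names below (`Base`, `Mixin`, `MixinFirst`, `MixinLast`) are hypothetical and are not lxml's classes; the snippet only shows, in plain Python, why the left-most base wins method lookup and why a `super()` call in the mixin only takes effect when the mixin comes first.

```python
class Base:
    def describe(self):
        return "base"


class Mixin:
    def describe(self):
        # Cooperative call: continues with the next class in the MRO.
        return "mixin+" + super().describe()


class MixinFirst(Mixin, Base):
    """Mixin listed first: its methods override those of Base."""


class MixinLast(Base, Mixin):
    """Mixin listed last: Base.describe() is found first and wins."""


print([cls.__name__ for cls in MixinFirst.__mro__])
print(MixinFirst().describe())
print(MixinLast().describe())
```

With the mixin listed last, `Base.describe()` never calls `super()`, so the mixin's override is silently skipped.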
* Fix typos (GH-334) [Kian Meng, Ang; 2022-01-02; 1; -1/+1]
|
* Fix a test in Py2. (lxml-4.6.5, lxml-4.6) [Stefan Behnel; 2021-12-12; 1; -1/+6]
|
* Cleaner: cover some more cases where scripts could sneak through in specially crafted style content. [Stefan Behnel; 2021-12-11; 2; -12/+73]
|
* Cleaner: Remove SVG image data URLs since they can embed script content. [Stefan Behnel; 2021-11-11; 2; -8/+60]
|     Reported as GHSL-2021-1038
* Cleaner: Prevent "@import" from re-occurring in the CSS after replacements, e.g. "@@importimport". [Stefan Behnel; 2021-11-11; 2; -0/+22]
|     Reported as GHSL-2021-1037
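To see why "@@importimport" is a problem, here is a minimal sketch of the general pitfall. It is not lxml's actual code or its actual fix: the helper names `strip_once` and `strip_until_stable` and the regular expression are assumptions made for illustration only. A single removal pass can splice the surrounding characters back into the forbidden token.

```python
import re

# Hypothetical pattern for illustration; lxml's real checks may differ.
_IMPORT = re.compile(r"@\s*import", re.IGNORECASE)


def strip_once(css):
    """Remove '@import' in a single pass (naive)."""
    return _IMPORT.sub("", css)


def strip_until_stable(css):
    """Repeat the removal until the text no longer changes."""
    while True:
        cleaned = _IMPORT.sub("", css)
        if cleaned == css:
            return cleaned
        css = cleaned


crafted = "@@importimport url(evil.css)"
print(repr(strip_once(crafted)))          # the token re-forms after one pass
print(repr(strip_until_stable(crafted)))  # no '@import' remains
```

Re-checking the result after replacement (or rejecting the content outright when the token is still present) closes the gap that a single pass leaves open.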
* Add HTML-5 "formaction" attribute to "defs.link_attrs" (GH-316) [Kevin Chung; 2021-03-21; 2; -0/+17]
|     Resolves https://bugs.launchpad.net/lxml/+bug/1888153
|     See https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28957
* Work around Py2's lack of "re.ASCII". (lxml-4.6.2) [Stefan Behnel; 2020-11-26; 1; -2/+4]
|
* Prevent combinations of <math/svg> and <style> to sneak JavaScript through the HTML cleaner. [Stefan Behnel; 2020-11-26; 3; -11/+39]
|
* Prevent combinations of <noscript> and <style> to sneak JavaScript through the HTML cleaner. [Stefan Behnel; 2020-10-18; 2; -0/+13]
|
* html: Add InputGetter.items() method and make .keys() return the field names in document order. [Stefan Behnel; 2020-08-12; 2; -8/+47]
|
* html: Avoid XPath in InputGetter where fast and simple iteration is enough. [Stefan Behnel; 2020-08-12; 1; -20/+19]
|
* html: Simplify and speed up InputGetter.__iter__() and __len__(). [Stefan Behnel; 2020-08-12; 1; -6/+3]
|
* Implement __len__() on InputGetter which is expected by FormElement/FieldsDict (GH-310) [AidanWoolley; 2020-08-12; 1; -0/+3]
|
* Use sphinx-apidoc to create API reference (GH-309) [Chris Mayo; 2020-08-04; 3; -95/+97]
|     * Add some missing files to .gitignore
|     * Remove duplicate open_in_browser from lxml.html.__all__
|     * Make ETreeXMLSchemaTestCase docstring Sphinx autodoc friendly
|     * Fix outdated codespeak.net links in docstrings
|     * Convert html/defs.py comment to be the module docstring
|     * Use sphinx-apidoc to create the API reference instead of epydoc
|       Epydoc is Python 2 only and unmaintained. sphinx-apidoc is run before the build step, to avoid duplicate entries being created.
|     * Include the elements from html.builder in the API reference
|     * Use Python 3.8 for coverage Travis job
|     * Build html documentation in Travis
* Fix an import in Py3. [Stefan Behnel; 2020-08-04; 1; -1/+1]
|
* Remove dead code. [Stefan Behnel; 2020-08-03; 1; -1/+0]
|
* Cleaner: Catch bad arg combo in constructor (GH-301) [Mike Lissner; 2020-06-20; 2; -0/+21]
|     Fixes https://bugs.launchpad.net/lxml/+bug/1882606
* Improve compilation of clean.py (e.g. dict iteration) by switching to language_level=3str. [Stefan Behnel; 2020-06-19; 1; -1/+1]
|
* Avoid calling hasattr when we need the attribute anyway, and validate the argument names passed into Cleaner() along the way. [Stefan Behnel; 2020-06-19; 1; -4/+10]
|
* LP#1882606: ``Cleaner.clean_html()`` discarded comments and PIs regardless of the corresponding configuration option, if "remove_unknown_tags=True" was set. [Stefan Behnel; 2020-06-13; 3; -4/+49]
|
* Use a bound method instead of looking it up on each element. [Stefan Behnel; 2020-06-13; 1; -2/+2]
|
* Add docstrings to Cleaner.allow_element() and Cleaner.allow_embedded_url(). [Stefan Behnel; 2019-08-24; 1; -2/+15]
|
* Modernise some code. [Stefan Behnel; 2019-07-27; 1; -6/+4]
|
* Fix typos (GH-282) [Min ho Kim; 2019-06-24; 1; -1/+1]
|
* LP#1758553: add "source" and "track" to list of empty HTML tags. [Stefan Behnel; 2019-03-08; 1; -1/+1]
|
* Actually use "language_level=2" everywhere for better Py2 compatibility. [Stefan Behnel; 2018-12-02; 1; -1/+1]
|
* Set explicit Cython language levels for compiled modules (Cython suggests to make them explicit). [Stefan Behnel; 2018-12-02; 2; -0/+4]
|
* LP#1799755: Fix ABC imports from collections package to resolve a DeprecationWarning in Py3.7. [Stefan Behnel; 2018-10-24; 2; -2/+5]
|
* Merge lxml-4.2 branch into master. [Stefan Behnel; 2018-09-29; 2; -0/+4]
|\
| * Fix import warnings in Py3.6+ by switching to absolute imports. [Stefan Behnel; 2018-09-29; 2; -0/+4]
| |
* | Merge branch lxml-4.2 into master. [Stefan Behnel; 2018-09-09; 2; -5/+6]
|\ \
| |/
| * Fix typo in test file. [Stefan Behnel; 2018-08-26; 1; -1/+1]
| |
| * Fix: make the cleaner also remove javascript URLs that use escaping. [Stefan Behnel; 2018-09-09; 2; -5/+6]
| |
* | Merge pull request #270 from hugovk/rm-2.6 [scoder; 2018-08-26; 14; -70/+34]
|\ \
| | |     Remove redundant Python <= 2.6 code
| * | Remove ununsed imports [Hugo; 2018-08-26; 11; -13/+8]
| | |
| * | Use tempfile.NamedTemporaryFile directly [Hugo; 2018-08-26; 1; -3/+1]
| | |
| * | Min version of LIBXML_VERSION is now 2.7 [Hugo; 2018-08-26; 1; -2/+1]
| | |
| * | 'assert False' more readable than 'assert 0' [Hugo; 2018-08-26; 1; -1/+1]
| | |
| * | Revert "Replace mutable default argument" [Hugo; 2018-08-26; 1; -18/+6]
| | |     This reverts commit 92faebc0efa332c39a94d90d4ab7eb1a82233c4b.
| * | Remove redundant parentheses [Hugo; 2018-08-25; 1; -1/+1]
| | |
| * | Replace function call with set literal [Hugo; 2018-08-25; 2; -2/+2]
| | |
| * | Replace mutable default argument [Hugo; 2018-08-25; 1; -6/+18]
| | |
| * | Remove redundant code for Python <= 2.6 [Hugo; 2018-08-25; 10; -53/+25]
| |/
* | Fix typo in test file. [Stefan Behnel; 2018-08-26; 1; -1/+1]
|/
* Clean up test code for better readability. [Stefan Behnel; 2017-11-12; 1; -6/+16]
|
* Add better fallbacks to SelectElement.value [Christopher Schramm; 2017-10-05; 3; -8/+51]
|     If a browser encounters a select element without any selected option element, it automatically pre-selects the first one. If multiple options are selected, all but the last one get deselected.
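The browser rule described in this commit message can be written down as a small pure-Python sketch. The function `effective_value` and the `(value, selected)` tuple representation are hypothetical and chosen only for illustration; this is not lxml's `SelectElement` implementation, and it covers only a single-choice select.

```python
def effective_value(options):
    """Return the value a browser would treat as selected.

    `options` is a list of (value, selected) pairs in document order.
    Assumed rule, taken from the commit message above: with no selected
    option the first one is used; with several, the last one wins.
    """
    if not options:
        return None
    selected = [value for value, is_selected in options if is_selected]
    if selected:
        return selected[-1]
    return options[0][0]


print(effective_value([("a", False), ("b", False)]))
print(effective_value([("a", True), ("b", False), ("c", True)]))
```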
* LP#1710429: Fix uninitialised variable usage. [Stefan Behnel; 2017-08-13; 1; -0/+1]
|