Graph | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | Avoid using the deprecated "imp" module. (HEAD, master) | Stefan Behnel | 2023-05-11 | 1 | -1/+2 |
| | | | | Closes https://bugs.launchpad.net/lxml/+bug/2018137 | ||||
* | Avoid using the deprecated "imp" module. | Stefan Behnel | 2023-05-11 | 1 | -2/+4 |
| | | | | Closes https://bugs.launchpad.net/lxml/+bug/2018137 | ||||
* | Fix inheritance order of mixin classes in lxml.html (GH-340) | xmo-odoo | 2022-05-17 | 1 | -2/+42 |
| | | | | As the old FIXME comment from https://github.com/lxml/lxml/commit/8132c755adad4a75ba855d985dd257493bccc7fd notes, the mixin should come first for the inheritance to be correct (the left-most class is the first in the MRO, at least if no diamond inheritance is involved). Also fix the odd `super` call in `HtmlMixin`, likely stemming from the incorrect MRO. Fixes the inheritance order of all `HTML*` base classes though it probably doesn't matter for other than `HtmlElement`. | ||||
* | Fix a test in Py2. (lxml-4.6.5, lxml-4.6) | Stefan Behnel | 2021-12-12 | 1 | -1/+6 |
| | |||||
* | Cleaner: cover some more cases where scripts could sneak through in specially crafted style content. | Stefan Behnel | 2021-12-11 | 1 | -1/+64 |
* | Cleaner: Remove SVG image data URLs since they can embed script content. | Stefan Behnel | 2021-11-11 | 1 | -0/+45 |
| | | | | Reported as GHSL-2021-1038 | ||||
* | Cleaner: Prevent "@import" from re-occurring in the CSS after replacements, e.g. "@@importimport". | Stefan Behnel | 2021-11-11 | 1 | -0/+20 |
| | | | | Reported as GHSL-2021-1037 | ||||
* | Add HTML-5 "formaction" attribute to "defs.link_attrs" (GH-316) | Kevin Chung | 2021-03-21 | 1 | -0/+15 |
| | | | | Resolves https://bugs.launchpad.net/lxml/+bug/1888153 See https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28957 | ||||
* | Prevent combinations of <math/svg> and <style> to sneak JavaScript through the HTML cleaner. | Stefan Behnel | 2020-11-26 | 2 | -3/+25 |
* | Prevent combinations of <noscript> and <style> to sneak JavaScript through the HTML cleaner. | Stefan Behnel | 2020-10-18 | 1 | -0/+10 |
* | html: Add InputGetter.items() method and make .keys() return the field names in document order. | Stefan Behnel | 2020-08-12 | 1 | -0/+16 |
* | Cleaner: Catch bad arg combo in constructor (GH-301) | Mike Lissner | 2020-06-20 | 1 | -0/+15 |
| | | | Fixes https://bugs.launchpad.net/lxml/+bug/1882606 | ||||
* | LP#1882606: ``Cleaner.clean_html()`` discarded comments and PIs regardless of the corresponding configuration option, if "remove_unknown_tags=True" was set. | Stefan Behnel | 2020-06-13 | 2 | -0/+42 |
* | Merge branch lxml-4.2 into master. | Stefan Behnel | 2018-09-09 | 1 | -3/+3 |
|\ | |||||
| * | Fix typo in test file. | Stefan Behnel | 2018-08-26 | 1 | -1/+1 |
| | | |||||
| * | Fix: make the cleaner also remove javascript URLs that use escaping. | Stefan Behnel | 2018-09-09 | 1 | -3/+3 |
| | | |||||
* | | Merge pull request #270 from hugovk/rm-2.6 | scoder | 2018-08-26 | 12 | -62/+32 |
|\ \ | | | | | | | Remove redundant Python <= 2.6 code | ||||
| * | | Remove unused imports | Hugo | 2018-08-26 | 10 | -12/+8 |
| | | | |||||
| * | | Use tempfile.NamedTemporaryFile directly | Hugo | 2018-08-26 | 1 | -3/+1 |
| | | | |||||
| * | | Min version of LIBXML_VERSION is now 2.7 | Hugo | 2018-08-26 | 1 | -2/+1 |
| | | | |||||
| * | | Replace function call with set literal | Hugo | 2018-08-25 | 1 | -1/+1 |
| | | | |||||
| * | | Remove redundant code for Python <= 2.6 | Hugo | 2018-08-25 | 9 | -48/+25 |
| |/ | |||||
* | | Fix typo in test file. | Stefan Behnel | 2018-08-26 | 1 | -1/+1 |
|/ | |||||
* | Clean up test code for better readability. | Stefan Behnel | 2017-11-12 | 1 | -6/+16 |
| | |||||
* | Add better fallbacks to SelectElement.value | Christopher Schramm | 2017-10-05 | 2 | -1/+38 |
| | | | | If a browser encounters a select element without any selected option element, it automatically pre-selects the first one. If multiple options are selected, all but the last one get deselected. | ||||
* | LP#1567526: Make soupparser sort-of handle empty and plain text input instead of raising a TypeError. | Stefan Behnel | 2017-08-13 | 1 | -0/+10 |
* | Fix tests after making "useChardet" handling smarter. | Stefan Behnel | 2017-08-12 | 1 | -5/+16 |
| | |||||
* | soupparse: add test case for double-hyphen | ha shao | 2017-07-29 | 1 | -0/+11 |
| | |||||
* | Fix a typo: referrs -> refers | Felix Yan | 2017-06-12 | 1 | -1/+1 |
| | |||||
* | Perform full-document detection on decoded bytes. | Koert van der Veer | 2017-03-16 | 1 | -0/+6 |
| | | | | Closes #1673355 | ||||
* | add tests for bug #1665241 | Ashish Kulkarni | 2017-02-16 | 1 | -1/+25 |
| | |||||
* | ignore disabled form inputs | Kristian Klemon | 2016-07-26 | 1 | -1/+3 |
| | |||||
* | Merge pull request #180 from chripede/patch-2 | scoder | 2016-07-24 | 1 | -2/+21 |
|\ | | | | | Add inline_style option | ||||
| * | Fix tests for inline_style | Christian Pedersen | 2015-11-20 | 1 | -2/+21 |
| | | |||||
* | | Exclude `file` field `value` from `FormElement.form_values`. | Tomas Divis | 2016-07-20 | 1 | -0/+2 |
|/ | | | | Similar to `submit`, `image` and `reset`, browsers don't send `file` field values in the POST when form is submitted. `FormElement.form_values` method already correctly excluded `submit`, `image` and `reset` fields, now it also excludes the `file` fields. | ||||
* | simplify import check in test and keep original import exception on failures | Stefan Behnel | 2015-06-05 | 1 | -13/+6 |
| | |||||
* | unittest check beautifulsoup/bs4 import properly | mozbugbox | 2015-06-06 | 1 | -5/+14 |
| | |||||
* | BeautifulSoup 4: handle Doctype and Declaration | mozbugbox | 2015-06-05 | 1 | -9/+12 |
| | | | | bs4 can use lxml or html5lib to parse HTML content. Force the bs4 builtin HTML parser when parsing HTML with soupparser. | ||||
* | fix doctest in Py3 | Stefan Behnel | 2015-02-18 | 1 | -2/+2 |
| | |||||
* | implement a set-like interface for the HTML 'class' attribute | Stefan Behnel | 2015-02-18 | 1 | -0/+57 |
| | |||||
* | refactor new code in soupparser, extend tests | Stefan Behnel | 2015-02-16 | 1 | -8/+22 |
| | |||||
* | Make soupparser properly handle everything outside the root tag (doctype declaration, comments, processing instructions.) | Olli Pottonen | 2015-02-16 | 1 | -0/+55 |
| | | | | See https://bugs.launchpad.net/lxml/+bug/1341964. | ||||
* | LP#1419354: fix meta-redirect URL parsing when preceded by whitespace | Stefan Behnel | 2015-02-08 | 1 | -0/+12 |
| | |||||
* | lxml.html.document_fromstring ensure_head_body | jab | 2014-09-04 | 1 | -0/+17 |
| | | | | When using lxml.html.document_fromstring to process html outside your control, you can't be sure it will have a head element or body element. Allowing document_fromstring to accept an ensure_head_body option saves you from having to write code like: `doc = document_fromstring(html)`, then `try: doc.head`, then `except IndexError: doc.insert(0, Element('head'))`, after which `doc.head` can be referenced safely. You can instead just write: `doc = document_fromstring(html, ensure_head_body=True)` | ||||
* | include links in meta refresh tags in iterlinks | jab | 2014-08-22 | 1 | -0/+7 |
| | |||||
* | strip control characters before looking for evil text content in Cleaner | Stefan Behnel | 2014-04-17 | 1 | -1/+8 |
| | |||||
* | clean up test module (mostly formatting) | Stefan Behnel | 2014-02-21 | 1 | -2/+16 |
| | |||||
* | fix typo in comment | Stefan Behnel | 2014-02-21 | 1 | -1/+1 |
| | |||||
* | add test | Stefan Behnel | 2014-02-20 | 1 | -1/+11 |
| | |||||
* | more faking of NamedTemporaryFile(delete=False) in Py2.[45] | Stefan Behnel | 2014-02-19 | 1 | -1/+12 |
| |
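
The GH-340 entry in this log argues that a mixin has to be listed before the base class so that it comes first in the MRO. The following is a minimal, standalone sketch of that rule in plain Python. The class names `Base`, `Mixin`, `MixinFirst` and `MixinLast` are hypothetical stand-ins and are not lxml's actual classes.

```python
class Base:
    """Hypothetical stand-in for a generic element base class."""

    def describe(self):
        return "Base.describe"


class Mixin:
    """Hypothetical mixin that wants to extend Base.describe()."""

    def describe(self):
        # Add our own part, then delegate to the next class in the MRO.
        return "Mixin.describe -> " + super().describe()


class MixinFirst(Mixin, Base):
    """Mixin listed first: it precedes Base in the MRO."""


class MixinLast(Base, Mixin):
    """Mixin listed last: Base.describe is found first and wins."""


print([cls.__name__ for cls in MixinFirst.__mro__])
print(MixinFirst().describe())
print(MixinLast().describe())
```

With the mixin on the left, its `describe()` runs and can delegate to `Base` through `super()`. With the mixin on the right, `Base.describe()` is found first and the mixin's override is never reached.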
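
The GHSL-2021-1037 entry in this log mentions "@import" re-occurring after replacements, e.g. from "@@importimport". The sketch below illustrates the general pitfall with the standard `re` module: removing a token in a single pass can splice a new copy of it together. The pattern and the helper names `strip_once` and `strip_until_stable` are assumptions made for illustration. This is not the code used by lxml's Cleaner.

```python
import re

# Assumed, simplified pattern for illustration only.
IMPORT_RE = re.compile(r"@\s*import", re.IGNORECASE)


def strip_once(css):
    """Remove every match in a single pass (the fragile approach)."""
    return IMPORT_RE.sub("", css)


def strip_until_stable(css):
    """Repeat the removal until the text no longer changes."""
    while True:
        cleaned = IMPORT_RE.sub("", css)
        if cleaned == css:
            return cleaned
        css = cleaned


crafted = "@@importimport url(evil.css);"
print(repr(strip_once(crafted)))
print(repr(strip_until_stable(crafted)))
```

A single pass deletes the inner "@import" and leaves the outer "@" and "import" joined into a fresh "@import". Repeating until a fixed point is one way to close that gap; rejecting the content outright when the token reappears is another.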