summaryrefslogtreecommitdiff
path: root/src/lxml/html/tests
Commit message (Collapse)AuthorAgeFilesLines
* Avoid using the deprecated "imp" module.HEADmasterStefan Behnel2023-05-111-1/+2
| | | | Closes https://bugs.launchpad.net/lxml/+bug/2018137
* Avoid using the deprecated "imp" module.Stefan Behnel2023-05-111-2/+4
| | | | Closes https://bugs.launchpad.net/lxml/+bug/2018137
* Fix inheritance order of mixin classes in lxml.html (GH-340)xmo-odoo2022-05-171-2/+42
| | | | | | | | | | | As the old FIXME comment from https://github.com/lxml/lxml/commit/8132c755adad4a75ba855d985dd257493bccc7fd notes, the mixin should come first for the inheritance to be correct (the left-most class is the first in the MRO, at least if no diamond inheritance is involved). Also fix the odd `super` call in `HtmlMixin`, likely stemming from the incorrect MRO. Fixes the inheritance order of all `HTML*` base classes though it probably doesn't matter for other than `HtmlElement`.
* Fix a test in Py2.lxml-4.6.5lxml-4.6Stefan Behnel2021-12-121-1/+6
|
* Cleaner: cover some more cases where scripts could sneak through in ↵Stefan Behnel2021-12-111-1/+64
| | | | specially crafted style content.
* Cleaner: Remove SVG image data URLs since they can embed script content.Stefan Behnel2021-11-111-0/+45
| | | | Reported as GHSL-2021-1038
* Cleaner: Prevent "@import" from re-occurring in the CSS after replacements, ↵Stefan Behnel2021-11-111-0/+20
| | | | | | e.g. "@@importimport". Reported as GHSL-2021-1037
* Add HTML-5 "formaction" attribute to "defs.link_attrs" (GH-316)Kevin Chung2021-03-211-0/+15
| | | | Resolves https://bugs.launchpad.net/lxml/+bug/1888153 See https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28957
* Prevent combinations of <math/svg> and <style> to sneak JavaScript through ↵Stefan Behnel2020-11-262-3/+25
| | | | the HTML cleaner.
* Prevent combinations of <noscript> and <style> to sneak JavaScript through ↵Stefan Behnel2020-10-181-0/+10
| | | | the HTML cleaner.
* html: Add InputGetter.items() method and make .keys() return the field names ↵Stefan Behnel2020-08-121-0/+16
| | | | in document order.
* Cleaner: Catch bad arg combo in constructor (GH-301)Mike Lissner2020-06-201-0/+15
| | | Fixes https://bugs.launchpad.net/lxml/+bug/1882606
* LP#1882606: ``Cleaner.clean_html()`` discarded comments and PIs regardless ↵Stefan Behnel2020-06-132-0/+42
| | | | of the corresponding configuration option, if "remove_unknown_tags=True" was set.
* Merge branch lxml-4.2 into master.Stefan Behnel2018-09-091-3/+3
|\
| * Fix typo in test file.Stefan Behnel2018-08-261-1/+1
| |
| * Fix: make the cleaner also remove javascript URLs that use escaping.Stefan Behnel2018-09-091-3/+3
| |
* | Merge pull request #270 from hugovk/rm-2.6scoder2018-08-2612-62/+32
|\ \ | | | | | | Remove redundant Python <= 2.6 code
| * | Remove ununsed importsHugo2018-08-2610-12/+8
| | |
| * | Use tempfile.NamedTemporaryFile directlyHugo2018-08-261-3/+1
| | |
| * | Min version of LIBXML_VERSION is now 2.7Hugo2018-08-261-2/+1
| | |
| * | Replace function call with set literalHugo2018-08-251-1/+1
| | |
| * | Remove redundant code for Python <= 2.6Hugo2018-08-259-48/+25
| |/
* | Fix typo in test file.Stefan Behnel2018-08-261-1/+1
|/
* Clean up test code for better readability.Stefan Behnel2017-11-121-6/+16
|
* Add better fallbacks to SelectElement.valueChristopher Schramm2017-10-052-1/+38
| | | | If a browser encounters a select element without any selected option element, it automatically pre-selects the first one. If multiple options are selected, all but the last one get deselected.
* LP#1567526: Make soupparser sort-of handle empty and plain text input ↵Stefan Behnel2017-08-131-0/+10
| | | | instead of raising a TypeError.
* Fix tests after making "useChardet" handling smarter.Stefan Behnel2017-08-121-5/+16
|
* soupparse: add test case for double-hyphenha shao2017-07-291-0/+11
|
* Fix a typo: referrs -> refersFelix Yan2017-06-121-1/+1
|
* Perform full-document detection on decoded bytes.Koert van der Veer2017-03-161-0/+6
| | | | Closes #1673355
* add tests for bug #1665241Ashish Kulkarni2017-02-161-1/+25
|
* ignore disabled form inputsKristian Klemon2016-07-261-1/+3
|
* Merge pull request #180 from chripede/patch-2scoder2016-07-241-2/+21
|\ | | | | Add inline_style option
| * Fix tests for inline_styleChristian Pedersen2015-11-201-2/+21
| |
* | Exclude `file` field `value` from `FormElement.form_values`.Tomas Divis2016-07-201-0/+2
|/ | | | Similar to `submit`, `image` and `reset`, browsers don't send `file` field values in the POST when form is submitted. `FormElement.form_values` method already correctly excluded `submit`, `image` and `reset` fields, now it also excludes the `file` fields.
* simplify import check in test and keep original import exception on failuresStefan Behnel2015-06-051-13/+6
|
* unittest check beautifulsoup/bs4 import properlymozbugbox2015-06-061-5/+14
|
* BeautifulSoup 4: handle Doctype and Declarationmozbugbox2015-06-051-9/+12
| | | | | bs4 can use lxml or html5lib to parse html content. Force bs4 builtin html parser when parse html with soupparser.
* fix doctest in Py3Stefan Behnel2015-02-181-2/+2
|
* implement a set-like interface for the HTML 'class' attributeStefan Behnel2015-02-181-0/+57
|
* refactor new code in soupparser, extend testsStefan Behnel2015-02-161-8/+22
|
* Make soupparser properly handle everything outside the root tag (doctypeOlli Pottonen2015-02-161-0/+55
| | | | | | declaration, comments, processing instructions.) See https://bugs.launchpad.net/lxml/+bug/1341964.
* LP#1419354: fix meta-redirect URL parsing when preceded by whitespaceStefan Behnel2015-02-081-0/+12
|
* lxml.html.document_fromstring ensure_head_bodyjab2014-09-041-0/+17
| | | | | | | | | | | | | | | | | | When using lxml.html.document_fromstring to process html outside your control, you can't be sure it will have a head element or body element. Allowing document_fromstring to accept an ensure_head_body option saves you from having to write code like: doc = document_fromstring(html) try: doc.head except IndexError: doc.insert(0, Element('head')) # now we can safely reference doc.head You can instead just write: doc = document_fromstring(html, ensure_head_body=True)
* include links in meta refresh tags in iterlinksjab2014-08-221-0/+7
|
* strip control characters before looking for evil text content in CleanerStefan Behnel2014-04-171-1/+8
|
* clean up test module (mostly formatting)Stefan Behnel2014-02-211-2/+16
|
* fix typo in commentStefan Behnel2014-02-211-1/+1
|
* add testStefan Behnel2014-02-201-1/+11
|
* more faking of NamedTemporaryFile(delete=False) in Py2.[45]Stefan Behnel2014-02-191-1/+12
|