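``lxml.html`` adds a ``find_class`` method to elements. It returns every
element (including the element it is called on) whose ``class`` attribute
contains the given name as one of its whitespace-separated tokens, so
``not-fn`` does not match ``fn``::

    >>> from lxml.etree import Comment
    >>> from lxml.html import document_fromstring, fragment_fromstring, tostring
    >>> from lxml.html.clean import clean, clean_html
    >>> from lxml.html import usedoctest
    >>> try: unicode = __builtins__["unicode"]
    ... except (KeyError, NameError): unicode = str

    >>> h = document_fromstring('''
    ... <html><head></head>
    ... <body>
    ...   <a class="vcard
    ... fn   url" href="foobar">P1</a>
    ...   <a class="not-fn vcard" href="baz">P2</a>
    ... </body></html>''')
    >>> print(tostring(h, encoding=unicode))
    <html>
      <head></head>
      <body>
        <a class="vcard
    fn   url" href="foobar">P1</a>
        <a class="not-fn vcard" href="baz">P2</a>
      </body>
    </html>
    >>> print([e.text for e in h.find_class('fn')])
    ['P1']
    >>> print([e.text for e in h.find_class('vcard')])
    ['P1', 'P2']

Also added is ``find_rel_links``, which you can use to search for links like
``<a rel="$something">``. The comparison is case-insensitive, but the ``rel``
value must match as a whole (``tagging`` does not match ``tag``)::

    >>> h = document_fromstring('''
    ... <a href="1">test 1</a>
    ... <a href="2" rel="tag">item 2</a>
    ... <a href="3" rel="tagging">item 3</a>
    ... <a href="4" rel="TAG">item 4</a>''')
    >>> print([e.attrib['href'] for e in h.find_rel_links('tag')])
    ['2', '4']
    >>> print([e.attrib['href'] for e in h.find_rel_links('nofollow')])
    []

Another method, ``get_element_by_id``, returns the element with the given
``id``::

    >>> print(tostring(fragment_fromstring('''
    ...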