How do I find XML tags recursively with lxml?

Question

<?xml version="1.0" ?>
<data>
    <test>
        <f1 />
    </test>
    <test2>
        <test3>
            <f1 />
        </test3>
    </test2>
    <f1 />
</data>

With lxml, is it possible to search for the tag "f1" recursively? I tried the findall method, but it only finds immediate children.

I'm starting to think I should switch to BeautifulSoup for this!

Max Shawabkeh · Accepted Answer

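You can search recursively with an XPath-style path expression. findall() accepts ElementPath, a subset of XPath, and a path starting with .// matches elements at any depth below the current element, not just direct children.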

>>> from lxml import etree
>>> q = etree.fromstring('<xml><hello>a</hello><x><hello>b</hello></x></xml>')
>>> q.findall('hello')      # Tag name, first level only.
[<Element hello at 414a7c8>]
>>> q.findall('.//hello')   # XPath, recursive.
[<Element hello at 414a7c8>, <Element hello at 414a818>]
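As a minimal sketch, here is the same .// path applied to the XML from the question (the names xml_bytes, direct and recursive are only for illustration). A bare tag name only sees the f1 that is a direct child of data, while .//f1 also picks up the two nested ones, in document order.

```python
from lxml import etree

# The document from the question. Passing bytes is always safe with an
# XML declaration (lxml rejects str input whose declaration names an encoding).
xml_bytes = b"""<?xml version="1.0" ?>
<data>
    <test><f1 /></test>
    <test2><test3><f1 /></test3></test2>
    <f1 />
</data>"""

root = etree.fromstring(xml_bytes)

# Plain tag name: only direct children of <data> are considered.
direct = root.findall("f1")

# './/f1': every <f1> descendant at any depth, in document order.
recursive = root.findall(".//f1")

print(len(direct), len(recursive))  # prints "1 3"

# getparent() is lxml-specific; it shows where each match sits in the tree.
for el in recursive:
    print(el.getparent().tag)
```

The loop prints test, test3 and data, one per line, which confirms that the nested matches are found as well as the top-level one.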

codersofthedark · Answer

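iterfind() iterates over all elements that match a path expression.

findall() returns a list of the matching elements.

find() efficiently returns only the first match.

findtext() returns the .text content of the first match.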
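To compare the four methods side by side, here is a short sketch that calls each of them with the same .//item path. The catalog document below is made up for illustration; the return-value behaviour in the comments is that of lxml's find/findtext (None when nothing matches, and findtext gives an empty string for a match without text).

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
root = etree.XML(
    "<catalog><item>apple</item><group><item>pear</item></group></catalog>"
)

# iterfind(): a lazy iterator over every match.
it = root.iterfind(".//item")
first_from_iter = next(it)

# findall(): all matches collected into a list.
all_items = root.findall(".//item")

# find(): only the first match, or None if nothing matches.
first = root.find(".//item")
missing = root.find(".//nothing")

# findtext(): .text of the first match (None if there is no match).
text = root.findtext(".//item")

print(first_from_iter.text)           # apple
print([el.text for el in all_items])  # ['apple', 'pear']
print(first.text, text, missing)      # apple apple None
```

iterfind() is the better fit for very large trees, since it does not build the whole result list up front.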

Examples:

>>> from lxml import etree
>>> root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

# Find a child of an Element:
>>> print(root.find("b"))
None
>>> print(root.find("a").tag)
a

# Find an Element anywhere in the tree:
>>> print(root.find(".//b").tag)
b
>>> [b.tag for b in root.iterfind(".//b")]
['b', 'b']

# Find Elements with a certain attribute:
>>> print(root.findall(".//a[@x]")[0].tag)
a
>>> print(root.findall(".//a[@y]"))
[]
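Neither answer mentions them, but lxml elements offer two other ways to search recursively: iter(tag), which walks the element and all of its descendants and yields those with a matching tag, and xpath(), which accepts full XPath 1.0 instead of the ElementPath subset used by the find* methods. A sketch using the same sample string as the examples above:

```python
from lxml import etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

# iter(tag): depth-first walk over root and all descendants,
# yielding only elements whose tag is "b".
iter_tags = [el.tag for el in root.iter("b")]

# xpath(): full XPath 1.0; '//b' selects every <b> in the document.
xpath_tags = [el.tag for el in root.xpath("//b")]

# Unlike find*(), XPath can also return non-element results,
# such as attribute values (returned as string objects).
x_values = root.xpath("//a/@x")

print(iter_tags)   # ['b', 'b']
print(xpath_tags)  # ['b', 'b']
print(x_values)
```

For the question's document, root.iter("f1") or root.xpath("//f1") would therefore return all three f1 elements, just like findall(".//f1").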

Reference: http://lxml.de/tutorial.html#elementpath

(This answer is a selection of the relevant parts of the linked page.)