BeautifulSoup: Parsing HTML - getting part of an href

Question

I am trying to parse this:

<td height="16" class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a></td>

I can't figure out how to get the 76561198134729239. What I tried:

    import requests
    from lxml import html
    from bs4 import BeautifulSoup

    r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
    content = r.content
    soup = BeautifulSoup(content, "html.parser")
    element = soup.find("td", { "class":"listtable_1", "target":"_blank" })
    print(element.text)

Martin Evans · Accepted Answer

That HTML contains many such entries. To get all of them, you could use the following:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
    soup = BeautifulSoup(r.content, "html.parser")

    # Visit every matching <td>, then every link inside it that opens in a new tab.
    for td in soup.find_all("td", class_="listtable_1"):
        for a in td.find_all("a", href=True, target="_blank"):
            print(a.text)

This returns the following:

    76561198143466239
    76561198094114508
    76561198053422590
    76561198066478249
    76561198107353289
    76561198043513442
    76561198128253254
    76561198134729239
    76561198003749039
    76561198091968935
    76561198071376804
    76561198068375438
    76561198039625269
    76561198135115106
    76561198096243060
    76561198067255227
    76561198036439360
    76561198026089333
    76561198126749681
    76561198008927797
    76561198091421170
    76561198122328638
    76561198104586244
    76561198056032796
    76561198059683068
    76561197995961306
    76561198102013044

MYGz · Answer

target="_blank" is an attribute of the anchor tag a inside the td tag. It is not an attribute of the td tag itself.

You can get the text like this:

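    from bs4 import BeautifulSoup

    html = """
    <td height="16" class="listtable_1">
        <a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">
            76561198134729239
        </a>
    </td>"""

    soup = BeautifulSoup(html, "html.parser")
    # Find the <td> by its class, then the <a> inside it by its target attribute.
    link = soup.find("td", {"class": "listtable_1"}).find("a", {"target": "_blank"})
    # strip() drops the whitespace around the number in the sample markup.
    print(link.text.strip())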

Output:

76561198134729239
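To see why the call in the question fails, the sketch below runs both lookups against the snippet from the question. The `SAMPLE_HTML` string and the variable names are introduced here only for illustration. Passing both keys in one attribute dict asks for a single td that has both attributes, so `find()` returns `None`, and calling `.text` on that result is what raises the `AttributeError`.

```python
from bs4 import BeautifulSoup

# Illustrative copy of the markup from the question (not fetched from the site).
SAMPLE_HTML = (
    '<td height="16" class="listtable_1">'
    '<a href="http://steamcommunity.com/profiles/76561198134729239" '
    'target="_blank">76561198134729239</a></td>'
)
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Both filters apply to the same <td>, which has no target attribute.
combined = soup.find("td", {"class": "listtable_1", "target": "_blank"})
print(combined)  # None

# Filter each element by its own attribute, guarding against missing matches.
td = soup.find("td", class_="listtable_1")
link = td.find("a", target="_blank") if td is not None else None
steam_id = link.get_text(strip=True) if link is not None else None
print(steam_id)
```

The `None` checks are optional for this snippet, but they keep the lookup from raising if a page lacks the expected cell.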

alecxe · Answer

As others have mentioned, you are trying to check attributes of different elements in a single find() call. Instead, you can chain find() calls as MYGz suggested, or use a single CSS selector:

    soup.select_one("td.listtable_1 a[target=_blank]").get_text()

If you need to find multiple elements this way, use select():

    for elm in soup.select("td.listtable_1 a[target=_blank]"):
        print(elm.get_text())
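The selector approach can be tried offline with a small sketch like the one below. The two-row `SAMPLE_HTML` is a hypothetical fragment modeled on the markup in the question, not the real page, and the attribute value is quoted in the selector purely as a defensive habit.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample modeled on the markup in the question.
SAMPLE_HTML = """
<table>
  <tr><td height="16" class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a></td></tr>
  <tr><td height="16" class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198003749039" target="_blank">76561198003749039</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
selector = 'td.listtable_1 a[target="_blank"]'

# select_one() returns the first match, or None when nothing matches.
first = soup.select_one(selector)
first_id = first.get_text(strip=True) if first is not None else None

# select() returns every match; the link text and the href are both available.
all_ids = [a.get_text(strip=True) for a in soup.select(selector)]
all_hrefs = [a["href"] for a in soup.select(selector)]

print(first_id)
print(all_ids)
```

Because `select_one()` can return `None`, checking the result before calling `get_text()` avoids the same `AttributeError` seen in the question.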

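宏杰李 · Answer

"class":"listtable_1" belongs to the td tag, while target="_blank" belongs to the a tag. You should not use them together in one find() call.

You can use the "Steam Community" cell as an anchor and find the number that follows it.

Alternatively, use the URL. It contains the information you need and is easy to locate: find the link and split its href on /.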

    import re

    # Match every link whose href mentions steamcommunity, then keep the last path segment.
    for a in soup.find_all("a", href=re.compile(r"steamcommunity")):
        num = a["href"].split("/")[-1]
        print(num)
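Splitting on / works for the URLs shown here, but it would return an empty string if a link ended with a trailing slash. As a slightly more defensive sketch, the standard library's `urllib.parse.urlparse` can isolate the path first. The helper name `profile_id_from_href` and the sample markup (including the trailing-slash link) are assumptions made for illustration, not part of the original page.

```python
import re
from urllib.parse import urlparse

from bs4 import BeautifulSoup


def profile_id_from_href(href):
    """Return the last non-empty path segment of a URL, or None if there is none."""
    path = urlparse(href).path.rstrip("/")
    last = path.rsplit("/", 1)[-1]
    return last or None


# Hypothetical sample; the second link has a trailing slash on purpose.
SAMPLE_HTML = """
<td class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a></td>
<td class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198003749039/" target="_blank">76561198003749039</a></td>
<td class="listtable_1"><a href="http://example.com/other">unrelated</a></td>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ids = [
    profile_id_from_href(a["href"])
    for a in soup.find_all("a", href=re.compile(r"steamcommunity"))
]
print(ids)
```

Parsing the URL also keeps a query string, if one were present, out of the extracted ID.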

Code for the "Steam Community" approach:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
    soup = BeautifulSoup(r.content, "html.parser")

    # Each "Steam Community" label cell is followed by the cell holding the ID.
    for td in soup.find_all("td", string="Steam Community"):
        num = td.find_next_sibling("td").text
        print(num)

Out:

    76561198143466239
    76561198094114508
    76561198053422590
    76561198066478249
    76561198107353289
    76561198043513442
    76561198128253254
    76561198134729239
    76561198003749039
    76561198091968935
    76561198071376804
    76561198068375438
    76561198039625269
    76561198135115106
    76561198096243060
    76561198067255227
    76561198036439360
    76561198026089333
    76561198126749681
    76561198008927797
    76561198091421170
    76561198122328638
    76561198104586244
    76561198056032796
    76561198059683068
    76561197995961306
    76561198102013044
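The label-then-sibling lookup can also be checked without a network request. The row layout in `SAMPLE_HTML` below (a "Steam Community" label cell followed directly by the cell holding the link) is an assumption inferred from this answer, not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Assumed row layout: a label cell followed by the cell that holds the value.
SAMPLE_HTML = """
<table>
<tr><td class="listtable_1">Steam Community</td><td class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a></td></tr>
<tr><td class="listtable_1">Other label</td><td class="listtable_1">not a profile link</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

ids = []
# string= matches only cells whose entire text is exactly "Steam Community".
for label in soup.find_all("td", string="Steam Community"):
    value = label.find_next_sibling("td")
    if value is not None:  # skip label cells that have no following <td>
        ids.append(value.get_text(strip=True))

print(ids)
```

This approach depends on the visible label text, so it would need adjusting if the site renamed or localized that cell.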