Get web page contents with Python?

Question

I'm using Python 3.1.

Anyway, I'm trying to get the contents of this webpage. I've done some googling and tried various things, but nothing has worked. I'm guessing this should be an easy task, but... I just can't get it. :/

Results with urllib and urllib2:

>>> import urllib2
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import urllib2
ImportError: No module named urllib2
>>> import urllib
>>> urllib.urlopen("http://www.python.org")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    urllib.urlopen("http://www.python.org")
AttributeError: 'module' object has no attribute 'urlopen'
>>>

Python 3 solution

Thanks, Jason. :D

import urllib.request
page = urllib.request.urlopen('http://services.runescape.com/m=hiscore/ranking?table=0&category_type=0&time_filter=0&date=1519066080774&user=zezima')
print(page.read())

Jason R. Coombs · Accepted Answer

Since you're using Python 3.1, you need to use the new Python 3.1 APIs.

Try:

import urllib.request
urllib.request.urlopen('http://www.python.org/')
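Note that in Python 3 `read()` on the response returns bytes, not a string. A minimal sketch of turning the body into text, assuming a current Python 3 (3.4 or later) and a UTF-8 fallback when the server declares no charset; `fetch_text` is a hypothetical helper, not part of urllib:

```python
import urllib.request


def fetch_text(url, timeout=10.0, default_encoding="utf-8"):
    """Download url and return its body decoded as str.

    Assumption: if the Content-Type header carries no charset,
    fall back to default_encoding.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()  # bytes in Python 3
        # Use the charset the server declared, e.g. "text/html; charset=utf-8"
        charset = resp.headers.get_content_charset() or default_encoding
    # errors="replace" keeps going if the declared charset is wrong
    return raw.decode(charset, errors="replace")
```

For example, `print(fetch_text('http://www.python.org/')[:200])` prints the first 200 characters of the page as text rather than as a `b'...'` literal.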

Alternatively, write your code against the Python 2 examples and convert it using the 2to3 tool. On Windows, 2to3.py is in \python31\tools\scripts; can someone else point out where 2to3.py lives on other platforms?

Edit

These days, I use six to write code that is compatible with both Python 2 and 3.

from six.moves import urllib
urllib.request.urlopen('http://www.python.org')

Assuming six is installed, this works on both Python 2 and Python 3.

Jonathan Hartley · Answer

The best way to do this today is to use the requests library:

import requests
response = requests.get('http://hiscore.runescape.com/index_lite.ws?player=zezima')
print(response.status_code)
print(response.content)
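If installing requests is not an option, a rough standard-library analogue of `status_code` and `content` can be sketched as follows (Python 3; `get_status_and_content` is a hypothetical name). One behavioural difference: `requests.get` returns a response even for 4xx/5xx statuses, while `urlopen` raises `urllib.error.HTTPError`, so the sketch catches it:

```python
import urllib.error
import urllib.request


def get_status_and_content(url, timeout=10.0):
    """Return (status_code, body_bytes), roughly like requests.get(url).

    Network failures (urllib.error.URLError) are not caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode(), resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError also behaves like a response: .code holds the status
        # and .read() returns the error page body.
        return err.code, err.read()
```

Usage: `status, body = get_status_and_content('http://hiscore.runescape.com/index_lite.ws?player=zezima')`.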

Olu Smith · Answer

If you ask me, try this (Python 2):

import urllib2
resp = urllib2.urlopen('http://hiscore.runescape.com/index_lite.ws?player=zezima')

and then read it the usual way:

page = resp.read()

Good luck, though.

Joe Koberg · Answer

Mechanize is a great package for "acting like a browser", if you need to handle cookie state and so on.
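Mechanize is a third-party package. If only the cookie handling is needed, the standard library can keep cookies across requests; a minimal sketch (Python 3, helper name hypothetical), which does not reproduce mechanize's form filling or link following:

```python
import http.cookiejar
import urllib.request


def make_cookie_aware_opener():
    """Build an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor adds stored cookies to outgoing HTTP(S) requests
    # and saves any Set-Cookie headers from responses into the jar.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Every request made through `opener.open(url)` shares the same `jar`, so a login cookie set by one page is sent with the next request, which is the browser-like behaviour described above.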

http://wwwsearch.sourceforge.net/mechanize/

JasDev · Answer

You can use urllib2 and parse the HTML yourself.

Or try Beautiful Soup to do the parsing for you.
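For the do-it-yourself route, Python 3's built-in `html.parser` is enough for simple extraction. A minimal sketch that pulls out the page title and the `href` of every link; the class and function names are hypothetical, and this is not Beautiful Soup's API:

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collect the <title> text and every <a href="..."> value."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html):
    """Return (title, links) for an HTML string."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links
```

The HTML string would typically come from decoding the downloaded page, for example `urlopen(url).read().decode('utf-8')`, assuming the page is UTF-8.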

Martin Thoma · Answer

Python 2.X and Python 3.X:

try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

url = 'http://hiscore.runescape.com/index_lite.ws?player=zezima'
response = urlopen(url)
# read() returns bytes on Python 3; str() would give "b'...'",
# so decode instead (assumes the page is UTF-8)
data = response.read().decode('utf-8')

Swathi Bhuvaneshwar Babu · Answer

To get the contents of a web page, you can use the following code (Python 2):

# -*- coding: utf-8 -*-
# python
# example of getting a web page
from urllib import urlopen
print urlopen("http://xahlee.info/python/python_index.html").read()