Python: urllib / urllib2 / httplib confusion

Question

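I'm trying to test the functionality of a web app by scripting a login sequence in Python, but I'm having some trouble.

Here's what I need to do:

Do a POST with a few parameters and headers.
Follow a redirect.
Retrieve the HTML body.

Now, I'm relatively new to Python, and the two things I've tested so far haven't worked. First I used httplib, with putrequest() (passing the parameters within the URL) and putheader(). This didn't seem to follow the redirects.

Then I tried urllib and urllib2, passing both headers and parameters as dicts. This seems to return the login page instead of the page I'm trying to log in to. I guess that's because of missing cookies or something similar.

Am I missing something simple?

Thanks.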

S.Lott · Accepted Answer

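Focus on urllib2 for this; it works quite well. Don't mess with httplib; it isn't the top-level API.

What you're noting is that urllib2 doesn't follow redirects.

You need to fold in an instance of HTTPRedirectHandler that will catch and follow the redirects.

Further, you may want to subclass the default HTTPRedirectHandler to capture information that you'll then check as part of your unit testing.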

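    import urllib2

    # self.cookies is presumably a cookielib.CookieJar held by the enclosing test class
    cookie_handler = urllib2.HTTPCookieProcessor(self.cookies)
    redirect_handler = urllib2.HTTPRedirectHandler()
    opener = urllib2.build_opener(redirect_handler, cookie_handler)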

You can then use this opener object to POST and GET, handling redirects and cookies properly.
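For readers on Python 3, where urllib2 became urllib.request, here is a minimal sketch of the subclassing idea. The class name RecordingRedirectHandler and the `seen` attribute are made up for illustration; redirect_request is the hook the standard handler calls for each redirect. As far as I know, build_opener already installs a plain HTTPRedirectHandler by default, and passing a subclass instance replaces it, so the cookie processor is the part that changes the login behaviour.

```python
import http.cookiejar
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects as usual, but remembers each hop for later checks."""

    def __init__(self):
        super().__init__()
        self.seen = []  # list of (status code, from URL, to URL)

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Note the hop, then let the standard handler build the follow-up request.
        self.seen.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


cookies = http.cookiejar.CookieJar()
redirects = RecordingRedirectHandler()
opener = urllib.request.build_opener(
    redirects, urllib.request.HTTPCookieProcessor(cookies)
)
```

After a call to opener.open(...), a unit test could assert on redirects.seen and on the contents of cookies.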

You may also want to add your own subclass of HTTPHandler to capture and log various error codes.
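One way to sketch the logging idea on Python 3 is shown below. It is not an HTTPHandler subclass: it swaps in a small BaseHandler with an http_response hook, which urllib.request calls for each response that passes through the opener. The names StatusLogger and `codes` are hypothetical.

```python
import logging
import urllib.request

log = logging.getLogger("http-status")


class StatusLogger(urllib.request.BaseHandler):
    """Records the status code of every HTTP(S) response it is shown."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def http_response(self, request, response):
        self.codes.append(response.code)
        if response.code >= 400:
            log.warning("%s -> %s", request.full_url, response.code)
        return response  # hand the response on unchanged

    # Use the same hook for https:// URLs.
    https_response = http_response


status_logger = StatusLogger()
opener = urllib.request.build_opener(status_logger)
```

A test can then inspect status_logger.codes after driving the login sequence through opener.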

Jason Pepas · Answer

Here's my take on this issue.

    #!/usr/bin/env python

    import urllib
    import urllib2


    class HttpBot:
        """an HttpBot represents one browser session, with cookies."""

        def __init__(self):
            cookie_handler = urllib2.HTTPCookieProcessor()
            redirect_handler = urllib2.HTTPRedirectHandler()
            self._opener = urllib2.build_opener(redirect_handler, cookie_handler)

        def GET(self, url):
            return self._opener.open(url).read()

        def POST(self, url, parameters):
            return self._opener.open(url, urllib.urlencode(parameters)).read()


    if __name__ == "__main__":
        bot = HttpBot()
        ignored_html = bot.POST('https://example.com/authenticator', {'passwd': 'foo'})
        print bot.GET('https://example.com/interesting/content')
        ignored_html = bot.POST('https://example.com/deauthenticator', {})
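The class above is Python 2. A rough Python 3 port, assuming only the standard library, might look like the following. The main differences are that urllib2 and urllib.urlencode now live in urllib.request and urllib.parse, and that POST data has to be bytes. The method names get/post, the `cookies` attribute, and the helper encode_form are my own choices, not part of the original answer.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def encode_form(parameters):
    """URL-encode a dict of form fields into the bytes that open() expects."""
    return urllib.parse.urlencode(parameters).encode("ascii")


class HttpBot:
    """One browser-like session: cookies persist across requests."""

    def __init__(self):
        self.cookies = CookieJar()
        # build_opener adds the default redirect handler on its own.
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies)
        )

    def get(self, url):
        with self._opener.open(url) as response:
            return response.read()

    def post(self, url, parameters):
        # Passing a data argument turns the request into a POST.
        with self._opener.open(url, encode_form(parameters)) as response:
            return response.read()
```

Usage would mirror the original: bot.post(login_url, {...}) followed by bot.get(page_url), with the difference that read() returns bytes rather than str.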

Ace · Answer

@S.Lott, thank you. Your suggestion worked for me, with some modification. Here's how I did it.

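    import urllib
    import urllib2
    from cookielib import CookieJar

    # params, headers, Host and page are defined elsewhere in the script
    data = urllib.urlencode(params)
    url = Host + page
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)

    # keep the cookies set by the first response and reuse them
    cookies = CookieJar()
    cookies.extract_cookies(response, request)
    cookie_handler = urllib2.HTTPCookieProcessor(cookies)
    redirect_handler = urllib2.HTTPRedirectHandler()
    opener = urllib2.build_opener(redirect_handler, cookie_handler)

    response = opener.open(request)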

Eli Courtwright · Answer

I had to do this exact thing myself recently. I only needed classes from the standard library. Here's an excerpt from my code:

    from urllib import urlencode
    from urllib2 import urlopen, Request

    # encode my POST parameters for the login page
    login_qs = urlencode([("username", USERNAME), ("password", PASSWORD)])

    # extract my session id by loading a page from the site
    set_cookie = urlopen(URL_BASE).headers.getheader("Set-Cookie")
    sess_id = set_cookie[set_cookie.index("=")+1:set_cookie.index(";")]

    # construct headers dictionary using the session id
    headers = {"Cookie": "session_id=" + sess_id}

    # perform login and make sure it worked
    if "Announcements:" not in urlopen(Request(URL_BASE + "login", headers=headers), login_qs).read():
        print "Didn't log in properly"
        exit(1)

    # here's the function I used after this for loading pages
    def download(page=""):
        return urlopen(Request(URL_BASE + page, headers=headers)).read()

    # for example (download() already prepends URL_BASE):
    print download("config")
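The index-based slicing above assumes the session id is the first name=value pair in the Set-Cookie header. A slightly more forgiving sketch, written for Python 3 and assuming the cookie is named session_id as in the snippet, parses the header with http.cookies.SimpleCookie instead. The function name session_id_from and the header string are invented for the example.

```python
from http.cookies import SimpleCookie


def session_id_from(set_cookie_header, name="session_id"):
    """Return the value of the named cookie from a Set-Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


# Invented header value, for illustration only.
header = "session_id=abc123; Path=/; HttpOnly"
headers = {"Cookie": "session_id=" + session_id_from(header)}
```

The resulting headers dictionary can be passed to Request in the same way as in the excerpt.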

Matthew Christensen · Answer

I'd give Mechanize (http://wwwsearch.sourceforge.net/mechanize/) a shot. It could well handle your cookies and headers transparently.

gimel · Answer

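Try twill - a simple language that allows users to browse the Web from a command-line interface. With twill, you can navigate through Web sites that use forms, cookies, and most standard Web features. More to the point, twill is written in Python and has a Python API, for example: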

    from twill import get_browser

    b = get_browser()
    b.go("http://www.python.org/")
    b.showforms()

chnrxn · Answer

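Besides the fact that you may be missing a cookie, there might be fields in the form that you are not POSTing to the web server. The best way to find out is to capture the actual POST from a web browser. You can use LiveHTTPHeaders or WireShark to snoop the traffic and then mimic the same behaviour in your script.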
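A browser capture remains the most reliable reference. As a complementary check from the script side, this Python 3 sketch lists the input fields of a login page so that hidden ones are not forgotten in the POST. The class name InputCollector, the HTML, and the field name csrf_token are all invented for the example.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collects the name/value pairs of every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if "name" in attributes:
            # Inputs without a value attribute are stored as empty strings.
            self.fields[attributes["name"]] = attributes.get("value") or ""


# Invented login form, for illustration only.
page = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="t0k3n">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

collector = InputCollector()
collector.feed(page)
form_fields = collector.fields
```

Comparing form_fields with the parameters the script actually sends shows which fields, if any, are missing.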

Ned Batchelder · Answer

Funkload is a great web app testing tool as well. It wraps webunit to handle the browser emulation, then gives you both functional and load testing features on top.