How to bypass Google reCAPTCHA while scraping with Requests

Question

Python code that requests the URL:

import requests

# Send a browser-like User-Agent to work around the blocking issue
agent = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}
# Make the request to the link
response = requests.get("https://www.naukri.com/jobs-in-andhra-pradesh", headers=agent)

Output when printing the HTML:

<!DOCTYPE html>
<html>
<head>
<title>Naukri reCAPTCHA</title> #the title is the actual title of the URL that I requested
<meta name="robots" content="noindex, nofollow">
<link rel="stylesheet" href="https://static.naukimg.com/s/4/101/c/common_v62.min.css" />
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
</head>
</html>

Joshua Varghese · Accepted Answer

Using Google Cache together with a referer header prevents these captchas (do not send more than 2 requests per second, or you may get blocked):

header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "referer": "https://www.google.com/",
}
r = requests.get("http://webcache.googleusercontent.com/search?q=cache:www.naukri.com/jobs-in-andhra-pradesh", headers=header)

This gives:

>>> r.content
[Squeezed 2554 lines]
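
To stay under the limit of 2 requests per second mentioned above when fetching several cached pages, the calls can be spaced out. The following is only a minimal sketch: `Throttle`, `looks_like_captcha` and `fetch_all` are hypothetical helper names, not part of Requests, and the captcha check is a heuristic based solely on the blocked page shown in the question.

```python
import time


class Throttle:
    """Space out calls so that at most `max_per_second` happen per second.

    Hypothetical helper; `clock` and `sleep` are injectable for testing.
    """

    def __init__(self, max_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def looks_like_captcha(html):
    """Heuristic: the blocked page in the question loads the reCAPTCHA script."""
    return "google.com/recaptcha/api.js" in html or "reCAPTCHA" in html


def fetch_all(urls, headers, get, throttle=None):
    """Fetch each URL with `get` (for example requests.get), throttled.

    Returns a dict mapping URL to page text, or to None when the
    response looks like a captcha page.
    """
    throttle = throttle or Throttle()
    pages = {}
    for url in urls:
        throttle.wait()
        text = get(url, headers=headers).text
        pages[url] = None if looks_like_captcha(text) else text
    return pages
```

It could be called as `fetch_all(list_of_cache_urls, header, requests.get)`, reusing the `header` dictionary from the answer. Passing `get` in as a parameter is just a design choice that keeps the sketch testable without network access.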