Notes on Python's requests module

While building a crawler I had occasion to look into the requests module, so these are my notes. The reference is the official documentation.

GET requests

import requests

response = requests.get("http://example.com")

# url (requests normalizes an empty path to "/")
print(response.url) # http://example.com/

# status code (int)
print(response.status_code) # 200, 404, etc...

# contents (str)
print(response.text) # html source code
        

Specifying headers

headers = {"User-Agent": "curl"}

response = requests.get("https://example.com", headers=headers)
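To check which headers would go out without sending anything, the request can be built and prepared offline. This is only a small sketch using requests.Request and prepare(); the URL is a placeholder. Note that prepare() on its own does not merge in the defaults a Session would add.

```python
import requests

# Build the request object without sending it
req = requests.Request("GET", "https://example.com", headers={"User-Agent": "curl"})
prepared = req.prepare()

# Header names are looked up case-insensitively
user_agent = prepared.headers["user-agent"]
print(user_agent)  # curl
```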
        

Specifying URL parameters

payload = {"user": "labu"}
response = requests.get("https://example.com", params=payload) # https://example.com/?user=labu
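How params ends up in the URL can be inspected offline with a prepared request. In the sketch below, build_url is a hypothetical helper of my own (not part of requests) and the URL is a placeholder. It also shows two behaviors worth knowing: a list value repeats the key, and a key whose value is None is left out.

```python
import requests


def build_url(params):
    """Return the URL requests would use for these params (nothing is sent)."""
    req = requests.Request("GET", "https://example.com", params=params)
    return req.prepare().url


single = build_url({"user": "labu"})
multi = build_url({"tag": ["a", "b"]})                # list value -> repeated key
skipped = build_url({"user": "labu", "page": None})   # None value -> dropped

print(single)   # https://example.com/?user=labu
print(multi)    # https://example.com/?tag=a&tag=b
print(skipped)  # https://example.com/?user=labu
```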
        

Authentication

Basic authentication

from requests.auth import HTTPBasicAuth

# "username" and "password" are placeholders for real credentials
basic = HTTPBasicAuth("username", "password")
response = requests.get("https://example.com", auth=basic)
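What Basic authentication actually adds is an Authorization header holding the Base64 of "user:password". The sketch below checks this offline with prepared requests; the credentials and URL are dummy values. It also shows that passing a plain (user, password) tuple as auth is treated the same as HTTPBasicAuth.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "https://example.com"  # placeholder URL, nothing is sent

explicit = requests.Request("GET", url, auth=HTTPBasicAuth("user", "pass")).prepare()
shorthand = requests.Request("GET", url, auth=("user", "pass")).prepare()

# The header value is "Basic " followed by base64("user:pass")
expected = "Basic " + base64.b64encode(b"user:pass").decode("ascii")

print(explicit.headers["Authorization"] == expected)   # True
print(shorthand.headers["Authorization"] == expected)  # True
```

Since Base64 is an encoding and not encryption, Basic authentication should only be used over HTTPS.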
        

Digest authentication

from requests.auth import HTTPDigestAuth

# "username" and "password" are placeholders for real credentials
digest = HTTPDigestAuth("username", "password")
response = requests.get("https://example.com", auth=digest)
        

POST requests

Sending data (text, numbers)

payload = {"user": "labu", "password": "12345"}

response = requests.post("https://example.com", data=payload)
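A dict passed as data is sent form-encoded. For comparison, the sketch below also uses the json parameter of requests (not covered in these notes), which serializes the same dict as a JSON body. Both requests are only prepared, never sent, and the URL is a placeholder.

```python
import json

import requests

url = "https://example.com"  # placeholder URL, nothing is sent
payload = {"user": "labu", "password": "12345"}

form = requests.Request("POST", url, data=payload).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

print(form.headers["Content-Type"])     # application/x-www-form-urlencoded
print(form.body)                        # user=labu&password=12345
print(as_json.headers["Content-Type"])  # application/json
decoded = json.loads(as_json.body)      # parse the JSON body back into a dict
print(decoded == payload)               # True
```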
        

Sending files

# open in binary mode; the with block closes the file after the upload
with open("flag.txt", "rb") as f:
    files = {"test_pdf": f}
    response = requests.post("https://example.com", files=files)
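The files argument produces a multipart/form-data body. The sketch below prepares such a request offline, using an in-memory io.BytesIO in place of a real file and the (filename, file object, content type) tuple form. The file contents and URL are dummy values.

```python
import io

import requests

# In-memory stand-in for a file opened with open(..., "rb")
fake_file = io.BytesIO(b"dummy contents")
files = {"test_pdf": ("flag.txt", fake_file, "text/plain")}

prepared = requests.Request("POST", "https://example.com", files=files).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type.split(";")[0])  # multipart/form-data

# The dict key becomes the form field name, the tuple's first item the filename
has_field = b'name="test_pdf"; filename="flag.txt"' in prepared.body
print(has_field)  # True
```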
        

Handling cookies

token = "EXAMPLE=="

jar = requests.cookies.RequestsCookieJar()
# domain is a bare host name, without the URL scheme
jar.set("token", token, domain="example.com", path="/cookies")

response = requests.get("https://example.com/cookies", cookies=jar)
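A cookie in the jar is only sent when its domain and path match the request, and the domain has to be a bare host name such as example.com. The sketch below checks this offline; cookie_header is a hypothetical helper of my own, and the token and URL are dummy values.

```python
import requests


def cookie_header(domain):
    """Return the Cookie header requests would send, or None (nothing is sent)."""
    jar = requests.cookies.RequestsCookieJar()
    jar.set("token", "EXAMPLE==", domain=domain, path="/cookies")
    req = requests.Request("GET", "https://example.com/cookies", cookies=jar)
    return req.prepare().headers.get("Cookie")


matching = cookie_header("example.com")
with_scheme = cookie_header("https://example.com")

print(matching)     # token=EXAMPLE==
print(with_scheme)  # None, the domain never matches the request host
```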
        

Handling redirects

response = requests.get("http://pan-pudding.com")

print(response.url) # https://pan-pudding.com/ (converted from http to https)
print(response.history) # [<Response [301]>]
        
response = requests.get("http://pan-pudding.com", allow_redirects=False)

print(response.status_code) # 301
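
The same behavior can be reproduced without depending on an external site. The sketch below starts a throwaway local server (RedirectHandler and its /old and /new paths are hypothetical, made up for this example) and compares a request that follows the redirect with one that does not. It assumes local sockets are available.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old answers 301 to /new, anything else 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"arrived"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

followed = session.get(base + "/old")
not_followed = session.get(base + "/old", allow_redirects=False)

history_codes = [r.status_code for r in followed.history]
print(followed.status_code, history_codes)  # 200 [301]
print(not_followed.status_code, not_followed.headers["Location"])  # 301 /new

session.close()
server.shutdown()
server.server_close()
```

With allow_redirects=False the 301 response itself is returned, so the target can be read from its Location header.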