Assorted notes on Python's requests module

I had a chance to look into the requests module while building a crawler, so these are my notes. The reference is the official documentation.

GET requests

For a plain GET request, do the following:

import requests

response = requests.get("http://example.com")

# url
print(response.url) # http://example.com/

# status code (int)
print(response.status_code) # 200, 404, etc...

# contents (str)
print(response.text) # html source code

The text attribute holds the response body (here, the HTML) decoded to a str.
The status code is returned as an int.

Specifying headers

To send custom headers, do the following:

headers = {"User-Agent": "curl"}

response = requests.get("https://example.com", headers=headers)

Passing a dict to the headers argument lets you send custom headers.
This is handy for rewriting the Referer in CTFs, or for adding a header that identifies you as a program participant in a bug bounty.
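To check which headers would be sent without touching the network, one option is to build a prepared request and inspect it. This is only a minimal sketch: the Referer value and the X-Bug-Bounty header name are made up for illustration (a real program specifies its own header).

```python
import requests

# Hypothetical header values, for illustration only.
headers = {
    "User-Agent": "curl",
    "Referer": "https://example.com/admin",
    "X-Bug-Bounty": "labu",
}

# Request(...).prepare() builds the request locally; nothing is sent.
prepared = requests.Request("GET", "https://example.com", headers=headers).prepare()

for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Header names are looked up case-insensitively, so prepared.headers["referer"] returns the same value as prepared.headers["Referer"].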

Specifying URL parameters

Use the params keyword argument.

payload = {"user": "labu"}
response = requests.get("https://example.com", params=payload) # https://example.com/?user=labu
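How the dict turns into a query string can be checked offline with a prepared request. A small sketch, assuming a hypothetical /search path and made-up parameter names:

```python
import requests

# A list value is sent as a repeated key; keys whose value is None are dropped.
payload = {"user": "labu", "tag": ["ctf", "web"], "unused": None}

# prepare() builds the final URL locally; nothing is sent.
prepared = requests.Request("GET", "https://example.com/search", params=payload).prepare()

print(prepared.url)  # https://example.com/search?user=labu&tag=ctf&tag=web
```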

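Authentication

Passing one of the provided authentication objects to the auth keyword lets requests handle authentication.

Basic authentication

Basic authentication is done with HTTPBasicAuth.

from requests.auth import HTTPBasicAuth

# "USERNAME" and "PASSWORD" are placeholders for real credentials
basic = HTTPBasicAuth("USERNAME", "PASSWORD")
response = requests.get("https://example.com", auth=basic)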
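What HTTPBasicAuth adds to the request can be seen by preparing a request locally. A minimal sketch with placeholder credentials ("user" / "pass" are assumptions, not real accounts):

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

# Placeholder credentials, for illustration only.
username, password = "user", "pass"

prepared = requests.Request(
    "GET", "https://example.com", auth=HTTPBasicAuth(username, password)
).prepare()
print(prepared.headers["Authorization"])

# The same header built by hand: "Basic " + base64("username:password").
expected = "Basic " + base64.b64encode(f"{username}:{password}".encode()).decode()
print(prepared.headers["Authorization"] == expected)

# A plain (username, password) tuple is shorthand for HTTPBasicAuth.
shorthand = requests.Request(
    "GET", "https://example.com", auth=(username, password)
).prepare()
print(shorthand.headers["Authorization"] == expected)
```

The credentials are only base64-encoded, not encrypted, so Basic authentication should be used over HTTPS.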

Digest authentication

Use HTTPDigestAuth.

from requests.auth import HTTPDigestAuth

# "USERNAME" and "PASSWORD" are placeholders for real credentials
digest = HTTPDigestAuth("USERNAME", "PASSWORD")
response = requests.get("https://example.com", auth=digest)
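HTTPDigestAuth takes care of the challenge-response exchange for you. As a rough picture of what gets computed, the sketch below follows the MD5 / qop=auth case described in RFC 2617 using only hashlib. It is an illustration of the scheme, not the library's actual code, and digest_response is a hypothetical helper name.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a string."""
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(username, password, realm, nonce, method, uri, nc, cnonce, qop="auth"):
    """Compute the `response` field for MD5 digest auth with qop=auth (RFC 2617)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Input values taken from the worked example in RFC 2617, section 3.5.
result = digest_response(
    username="Mufasa",
    password="Circle Of Life",
    realm="testrealm@host.com",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    method="GET",
    uri="/dir/index.html",
    nc="00000001",
    cnonce="0a4f113b",
)
print(result)
```

The point is that the password itself is never sent; only a hash mixed with the server-supplied nonce goes over the wire.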

POST requests

POST works in basically the same way as GET.

Sending data (text, numbers)

Pass the data via the data keyword to send it as POST parameters.

payload = {"user": "labu", "password": "12345"}

response = requests.post("https://example.com", data=payload)
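To see what a dict passed to data looks like on the wire, the request can be prepared locally without sending it. A minimal sketch using the same payload:

```python
import requests

payload = {"user": "labu", "password": "12345"}

# prepare() encodes the body locally; nothing is sent.
prepared = requests.Request("POST", "https://example.com", data=payload).prepare()

print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # user=labu&password=12345
```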

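Sending files

Open the file with the open function and pass it via the files parameter to send it.

# the with block closes the file once the upload has finished
with open("flag.txt", "rb") as f:
    files = {"test_pdf": f}
    response = requests.post("https://example.com", files=files)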
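The files parameter produces a multipart/form-data body. The sketch below shows this offline; it assumes an in-memory io.BytesIO object as a stand-in for the real file, with made-up contents, and uses the (filename, file object) tuple form to set the file name explicitly.

```python
import io

import requests

# In-memory stand-in for open("flag.txt", "rb"); the contents are made up.
fake_file = io.BytesIO(b"dummy contents")
files = {"test_pdf": ("flag.txt", fake_file)}

# prepare() builds the multipart body locally; nothing is sent.
prepared = requests.Request("POST", "https://example.com", files=files).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
print(prepared.body.decode())
```

The dict key ("test_pdf") becomes the form field name, while "flag.txt" is sent as the file name.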

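Handling cookies

Cookies from the response are stored in the cookies attribute in a dict-like form (strictly speaking, a RequestsCookieJar).
Cookies can also be sent with a request.
With a RequestsCookieJar, attributes such as domain and path can be set per cookie.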

token = "EXAMPLE=="

jar = requests.cookies.RequestsCookieJar()
jar.set("token", token, domain="example.com", path="/cookies")  # domain is a host name, without the scheme

response = requests.get("https://example.com/cookies", cookies=jar)
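The effect of the domain and path attributes can be checked without sending anything by preparing requests and looking at the resulting Cookie header. A minimal sketch; the /elsewhere path is a made-up example of a non-matching URL:

```python
import requests

token = "EXAMPLE=="

jar = requests.cookies.RequestsCookieJar()
jar.set("token", token, domain="example.com", path="/cookies")

# Build the requests locally (nothing is sent) and inspect the Cookie header.
matching = requests.Request("GET", "https://example.com/cookies", cookies=jar).prepare()
other_path = requests.Request("GET", "https://example.com/elsewhere", cookies=jar).prepare()

print(matching.headers.get("Cookie"))    # token=EXAMPLE==
print(other_path.headers.get("Cookie"))  # None, because the path does not match
```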

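Handling redirects

Unless told otherwise, redirects are followed, and the response from the final destination is returned automatically.
The redirect history is available as a list from the history attribute.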

response = requests.get("http://pan-pudding.com")

print(response.url) # https://pan-pudding.com/ (converted from http to https)
print(response.history) # [<Response [301]>]

Redirects can also be explicitly disabled.

response = requests.get("http://pan-pudding.com", allow_redirects=False)

print(response.status_code) # 301
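The redirect behaviour can also be tried without relying on an external site. The sketch below spins up a throwaway local server with the standard library's http.server; the RedirectHandler class and the /old and /new paths are hypothetical names chosen for this example, and it assumes no HTTP proxy is configured in the environment.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Tiny local server: /old answers 301 to /new, anything else answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"arrived"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# Default: the 301 is followed and recorded in history.
followed = requests.get(base + "/old")
print(followed.url)
print([r.status_code for r in followed.history])

# allow_redirects=False: the 301 itself is returned.
not_followed = requests.get(base + "/old", allow_redirects=False)
print(not_followed.status_code, not_followed.headers["Location"])

server.shutdown()
server.server_close()
```

With allow_redirects=False the Location header is still available, so the redirect target can be inspected before deciding whether to follow it.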