How can I obtain cookies automatically for dynamic requests, instead of manually copying and hard-coding them?
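Hey, I ran into exactly this when I was wrestling with the Yahoo Finance endpoints. Copying cookies by hand is tedious, and once they expire you have to capture them all over again. requests.Session() takes care of this: it maintains the session and manages cookies for you. The crumb parameter that is hard-coded in your script can be fetched automatically in the same way, so you no longer need to copy it by hand. Here is the approach, followed by the revised code: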
Core idea
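- Create a session object with requests.Session(). It stores the cookies returned by the first request and sends them with every later request made through the same session, so you never handle them by hand.
- Visit the Yahoo Finance mutual fund screener page first and extract the crumb parameter that Yahoo requires from the page (an anti-cross-site-request token; each session has its own).
- Send the POST request through the same session object, passing the extracted crumb as a query parameter and leaving the cookies to the Session.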
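To see the cookie mechanism from the first bullet in isolation, here is a minimal offline sketch. It deliberately swaps requests.Session for the standard library's http.cookiejar.CookieJar and urllib (requests keeps its session cookies in a CookieJar subclass), and it talks to a throwaway local server rather than Yahoo. The handler class, the cookie name sid, and the value abc123 are made up for illustration.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a website: sets a cookie, then echoes what it receives."""

    def do_GET(self):
        received = self.headers.get("Cookie", "")
        self.send_response(200)
        if not received:
            # First visit: hand out a session cookie, as a real site would.
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        # Echo the Cookie header so the client can see what it sent.
        self.wfile.write(received.encode())

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo_cookie_persistence():
    """Return the Cookie header the server saw on the first and second request."""
    server = HTTPServer(("127.0.0.1", 0), CookieEchoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"

    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),          # ignore any proxy settings
        urllib.request.HTTPCookieProcessor(jar),  # store and replay cookies
    )
    try:
        with opener.open(url, timeout=5) as resp:
            first = resp.read().decode()   # nothing stored yet, so no cookie is sent
        with opener.open(url, timeout=5) as resp:
            second = resp.read().decode()  # the jar replays the stored cookie
    finally:
        server.shutdown()
        server.server_close()
    return first, second
```

demo_cookie_persistence() returns what the server saw in the Cookie header on each visit: an empty string for the first request and the stored sid cookie for the second. A requests.Session applies the same store-and-replay behaviour to every request made through it.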
Complete revised code
    import sys

    import requests
    from bs4 import BeautifulSoup

    # Create a session object that manages cookies automatically
    session = requests.Session()
    session.headers.update({
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
    })

    # Step 1: visit the main page to obtain the cookies and the crumb parameter
    main_page_url = 'https://finance.yahoo.com/research-hub/screener/mutualfunds/'
    try:
        main_response = session.get(main_page_url, timeout=30)
        main_response.raise_for_status()  # raise on 4xx/5xx responses
    except requests.exceptions.RequestException as e:
        print(f"Failed to load the main page: {e}")
        sys.exit(1)

    # Parse the crumb out of the page. Yahoo usually stores it in a JS object
    # that looks like "CrumbStore":{"crumb":"..."}; this can change without notice.
    soup = BeautifulSoup(main_response.text, 'html.parser')
    crumb = None
    marker = '"crumb":"'
    for script in soup.find_all('script'):
        text = script.string
        if text and 'CrumbStore' in text and marker in text:
            # Slice out the value between the marker and the next double quote
            crumb_start = text.find(marker) + len(marker)
            crumb_end = text.find('"', crumb_start)
            crumb = text[crumb_start:crumb_end]
            break

    if not crumb:
        print("Could not find the crumb parameter; check whether the page structure has changed")
        sys.exit(1)

    # Step 2: send the POST request through the same session; cookies are attached automatically
    link = 'https://query1.finance.yahoo.com/v1/finance/screener'
    params = {
        'formatted': 'true',
        'useRecordsResponse': 'true',
        'lang': 'en-US',
        'region': 'US',
        'crumb': crumb,  # the crumb extracted above
    }
    payload = {
        "size": 25,
        "offset": 0,
        "sortType": "DESC",
        "sortField": "fundnetassets",
        "includeFields": [
            "ticker", "companyshortname", "intradaypricechange", "percentchange",
            "intradayprice", "trailing_ytd_return", "trailing_3m_return",
            "annualreturnnavy1", "annualreturnnavy3", "annualreturnnavy5",
            "annualreportnetexpenseratio", "annualreportgrossexpenseratio",
            "fundnetassets", "performanceratingoverall", "fiftydaymovingavg",
            "twohundreddaymovingavg", "day_open_price", "fiftytwowklow", "fiftytwowkhigh"
        ],
        "topOperator": "AND",
        "query": {"operator": "and", "operands": [{"operator": "or", "operands": [{"operator": "eq", "operands": ["exchange", "NAS"]}]}]},
        "quoteType": "MUTUALFUND"
    }

    try:
        res = session.post(link, params=params, json=payload, timeout=30)
        res.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        # e.response is None when no response arrived at all (e.g. a connection error)
        if e.response is not None:
            print(f"Response body: {e.response.text}")
        sys.exit(1)

    print(f"Status code: {res.status_code}")
    # Parse the result and print the company names
    data = res.json()
    for item in data['finance']['result']:
        for elem in item['records']:
            print(elem['companyName'])
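The final loop assumes that every level of the JSON is present. As a hedged sketch, the helper below walks the same finance, result, records path but tolerates missing or null levels, which is handy if the endpoint answers with an error body instead of records. The name iter_field and the sample payloads in the usage note are hypothetical; only the key names are taken from the code above.

```python
def iter_field(data, field):
    """Yield `field` from every record of a screener-style response dict.

    Missing or null levels are skipped rather than raising KeyError/TypeError.
    """
    results = (data.get("finance") or {}).get("result") or []
    for item in results:
        for record in item.get("records") or []:
            if field in record:
                yield record[field]
```

For example, list(iter_field({"finance": {"result": [{"records": [{"companyName": "Fund A"}]}]}}, "companyName")) gives ["Fund A"], while a payload whose result is null simply produces an empty list.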
Key details
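- Session management: requests.Session() stores and sends cookies automatically. The cookies obtained on the first visit to the main page are added to the headers of the later POST request, with no manual copying.
- crumb parameter: Yahoo's endpoint requires this parameter to validate the request, and it is bound to the current session's cookies. Extracting it from the page fetched in the same session keeps the two in sync and avoids the mismatch that a hard-coded value causes.
- Exception handling: request errors are caught so that you can track down why a request failed (for example, a network problem), and the script stops with a message when the crumb cannot be found because the page structure has changed.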
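As a sturdier alternative to slicing the script text by position, the crumb can be pulled out with a regular expression and decoded as a JSON string, which also turns escapes such as \u002F back into /. This is only a sketch: the function name extract_crumb and any sample HTML you feed it are hypothetical, and it assumes the page still embeds the value as "crumb":"...", which Yahoo may change at any time.

```python
import json
import re

# Matches "crumb":"<value>", allowing backslash escapes inside the value.
CRUMB_RE = re.compile(r'"crumb"\s*:\s*"((?:[^"\\]|\\.)*)"')


def extract_crumb(html):
    """Return the first crumb value found in `html`, or None if there is none."""
    match = CRUMB_RE.search(html)
    if match is None:
        return None
    raw = match.group(1)
    try:
        # Re-wrap the value in quotes so the JSON parser decodes its escapes.
        return json.loads('"' + raw + '"')
    except json.JSONDecodeError:
        # Not a valid JSON string (e.g. a JS-only escape): return it undecoded.
        return raw
```

In the script above this would be called as extract_crumb(main_response.text), with the same "not found" check applied to a None result.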
Note: this content comes from Stack Exchange; the question was asked by MITHU.




