How can a Python 3 web crawler handle local cookies? Looking for example code
Hey there! Let's tackle your two questions one by one to get your Python download script up and running smoothly.
1. How to Manage Cookies (Including Multiple Cookies)
The key is the Session object from Python's requests library. A session persists state across requests: it stores every cookie the server sets and automatically sends the matching ones back on later requests, however many there are. There are two common scenarios:
- Scenario 1: Log in from your script and let the session collect the cookies. Simulate the login process; the session stores all authentication cookies the site returns, and subsequent requests send them automatically to prove your identity.
- Scenario 2: Reuse existing browser cookies: If you’re already logged in via your browser, you can extract those cookies and import them into your session to skip the login step.
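To see that behaviour without touching a real site, here is a minimal sketch that runs a throwaway local server (Python's built-in http.server). The server sets two cookies on /login and echoes back whatever Cookie header it receives on any other path. The paths, cookie names and values are made up for the demo; the point is only how requests.Session stores and resends the cookies.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieDemoHandler(BaseHTTPRequestHandler):
    """Toy server: /login sets two cookies, any other path echoes the Cookie header."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            # Two separate Set-Cookie headers, as a real login response might send
            self.send_header("Set-Cookie", "session_id=abc123; Path=/")
            self.send_header("Set-Cookie", "user_auth_token=xyz789; Path=/")
            body = b"logged in"
        else:
            body = self.headers.get("Cookie", "").encode()
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo's console output quiet


server = HTTPServer(("127.0.0.1", 0), CookieDemoHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as session:
    session.trust_env = False  # ignore proxy settings from the environment for this local demo
    session.get(f"{base_url}/login")               # the session stores both cookies
    stored = session.cookies.get_dict()
    echoed = session.get(f"{base_url}/echo").text  # ...and sends both back automatically

server.shutdown()
server.server_close()

print(stored)
print(echoed)
```

Both cookies come back on the second request with no manual bookkeeping. Setting trust_env to False only keeps requests from routing this local demo through a proxy configured in your environment.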
2. Example Code
Example 1: Auto-Manage Cookies with Session (Full Login Flow)
This example walks through a complete login-then-download workflow, where the session handles all cookie management for you:
import sys

import requests
from bs4 import BeautifulSoup  # Needed to extract CSRF tokens (common on login pages)

# Initialize a session to persist cookies across requests
session = requests.Session()

# Step 1: Fetch the login page to get a CSRF token (many sites require this for security)
login_page_url = "https://your-target-site.com/login"
login_page_response = session.get(login_page_url)
login_page_response.raise_for_status()
soup = BeautifulSoup(login_page_response.text, "html.parser")

# Extract the CSRF token; adjust the tag and attribute names to match your target site
csrf_field = soup.find("input", attrs={"name": "csrf_token"})
if csrf_field is None:
    sys.exit("No CSRF token field found; inspect the login form's HTML and adjust the selector.")
csrf_token = csrf_field.get("value")

# Step 2: Submit the login form with your credentials and CSRF token.
# Post to the URL in the form's action attribute; it is often, but not always, the login page itself.
login_action_url = login_page_url  # replace with the form's action URL if it differs
login_payload = {
    "username": "your-registered-username",
    "password": "your-password",
    "csrf_token": csrf_token,
}
login_response = session.post(login_action_url, data=login_payload)

# Verify login success. A 200 status alone proves nothing, because many sites return 200
# with an error message on a failed login, so also look for text that only appears when
# logged in (customize "Welcome back" for your site).
if login_response.ok and "Welcome back" in login_response.text:
    print("Login successful!")
else:
    sys.exit("Login failed: double-check your credentials or CSRF token logic.")

# Step 3: Download the file; the session automatically sends your auth cookies
download_url = "https://your-target-site.com/download.php?file=your-desired-file"
download_response = session.get(download_url)
download_response.raise_for_status()  # stop on HTTP errors such as 403/404 instead of saving an error page

# Save the downloaded file to your local machine
with open("downloaded-file.zip", "wb") as output_file:
    output_file.write(download_response.content)
print("File downloaded successfully!")
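The CSRF lookup in step 1 assumes a hidden input named csrf_token. A slightly more forgiving variant is sketched below: it also checks a meta tag named csrf-token, the convention Rails uses. find_csrf_token is a hypothetical helper name, and both field names are assumptions to adapt to your target site (Django's hidden field, for example, is called csrfmiddlewaretoken).

```python
from typing import Optional

from bs4 import BeautifulSoup


def find_csrf_token(html: str, field_name: str = "csrf_token") -> Optional[str]:
    """Return a CSRF token from a login page, or None if none is found.

    Hypothetical helper: field_name and the meta-tag fallback are assumptions
    to adapt to the target site.
    """
    soup = BeautifulSoup(html, "html.parser")

    # 1) Hidden form field, e.g. <input type="hidden" name="csrf_token" value="...">
    field = soup.find("input", attrs={"name": field_name})
    if field is not None and field.get("value"):
        return field["value"]

    # 2) Meta tag, e.g. <meta name="csrf-token" content="..."> (the convention Rails uses)
    meta = soup.find("meta", attrs={"name": "csrf-token"})
    if meta is not None and meta.get("content"):
        return meta["content"]

    return None
```

Returning None instead of raising lets the calling code decide whether a missing token is fatal, e.g. token = find_csrf_token(login_page_response.text) followed by a None check.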
Example 2: Import Existing Browser Cookies
If you don’t want to simulate login (e.g., you’re already logged in via Chrome/Firefox), you can reuse your browser’s cookies:
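- Extract browser cookies: Use a browser extension such as "EditThisCookie" (for Chrome) to export your cookies as JSON, or copy each cookie's name and value from your browser's developer tools (the Application panel in Chrome, the Storage tab in Firefox).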
- Import cookies into your session:
import requests

session = requests.Session()

# Replace this with the cookies you extracted from your browser
browser_cookies = {
    "session_id": "abc123xyz789",
    "user_auth_token": "9876543210abc",
    # Add any other cookies from the site here
}

# Add each cookie to the session, scoped to the site's domain
for cookie_name, cookie_value in browser_cookies.items():
    session.cookies.set(
        name=cookie_name,
        value=cookie_value,
        domain="your-target-site.com",  # should match the domain of the URLs you will request
    )

# Download the file with your imported cookies
download_url = "https://your-target-site.com/download.php?file=your-desired-file"
download_response = session.get(download_url)
download_response.raise_for_status()  # stop on HTTP errors such as 403 (often a sign of stale cookies)

# Save the file
with open("downloaded-file.zip", "wb") as output_file:
    output_file.write(download_response.content)
print("File downloaded successfully!")
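If your extension exports JSON rather than a flat name/value dictionary, you can turn the export into a cookie jar that keeps each cookie's domain and path. This sketch assumes the export is a JSON array of objects with "name" and "value" keys plus optional "domain", "path" and "secure" keys (the names Chrome's chrome.cookies API uses for those fields); export formats differ, so check your actual file and adjust the keys. jar_from_export and load_export_into_session are hypothetical helper names.

```python
import json
from typing import Any, Dict, List

import requests
from requests.cookies import RequestsCookieJar, create_cookie


def jar_from_export(exported: List[Dict[str, Any]]) -> RequestsCookieJar:
    """Build a cookie jar from an exported list of cookie objects.

    Assumed format: each item has "name" and "value", and optionally
    "domain", "path" and "secure" (adjust the keys to your export file).
    """
    jar = RequestsCookieJar()
    for item in exported:
        cookie = create_cookie(
            name=item["name"],
            value=item["value"],
            domain=item.get("domain", ""),  # an empty domain means the cookie is not restricted to one site
            path=item.get("path", "/"),
            secure=bool(item.get("secure", False)),
        )
        jar.set_cookie(cookie)
    return jar


def load_export_into_session(session: requests.Session, export_path: str) -> None:
    """Read a JSON export file (hypothetical path) and merge its cookies into the session."""
    with open(export_path, encoding="utf-8") as f:
        session.cookies.update(jar_from_export(json.load(f)))
```

Keeping the domain from the export, instead of hard-coding one as in the dictionary version, matters when a site spreads its cookies across a parent domain and a subdomain.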
Key Takeaways
- Session handles multiple cookies automatically: You don't need to track or add each cookie manually; the session stores all cookies returned by the server and sends them on future requests to the same site.
- CSRF tokens matter: Many login forms require a CSRF token, so fetch the login page first and grab the token before submitting credentials.
- Cookie expiry: Reused browser cookies stop working once they expire or the server ends the session (for example, typically when you log out in the browser), so you'll need to re-extract them at that point.
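Because the question asks about local cookies, one way to avoid logging in (or re-extracting cookies) on every run is to back the session with a file. This sketch swaps the session's default jar for the standard library's http.cookiejar.MozillaCookieJar, which reads and writes the Netscape cookies.txt format; requests allows any CookieJar-compatible object as Session.cookies. make_persistent_session and save_session_cookies are hypothetical helper names, and the cookie in the demo is made up.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

import requests
from requests.cookies import create_cookie


def make_persistent_session(cookie_file: str) -> requests.Session:
    """Create a Session whose cookies live in a Netscape-format cookies.txt file."""
    session = requests.Session()
    session.cookies = MozillaCookieJar(cookie_file)  # requests accepts a standard http.cookiejar jar here
    if os.path.exists(cookie_file):
        # ignore_discard=True also loads session cookies (ones with no expiry date);
        # ignore_expires=False skips cookies that have already expired.
        session.cookies.load(ignore_discard=True, ignore_expires=False)
    return session


def save_session_cookies(session: requests.Session) -> None:
    """Write the session's cookies back to the file the jar was created with."""
    # Without ignore_discard=True, session cookies would silently be left out.
    session.cookies.save(ignore_discard=True)


# Round-trip demo in a throwaway directory; the cookie is a made-up example.
with tempfile.TemporaryDirectory() as tmp_dir:
    cookie_path = os.path.join(tmp_dir, "cookies.txt")

    first_run = make_persistent_session(cookie_path)
    # In real use this cookie would come from a login response
    first_run.cookies.set_cookie(
        create_cookie("session_id", "abc123", domain="your-target-site.com")
    )
    save_session_cookies(first_run)

    second_run = make_persistent_session(cookie_path)  # e.g. the next time the script starts
    restored = {cookie.name: cookie.value for cookie in second_run.cookies}

print(restored)
```

Since expired cookies are dropped on load, the script still has to fall back to logging in (or to fresh browser cookies) once the stored ones age out.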
Question source: Stack Exchange; asked by yorkevin.




