Curl request to Twitter search returns 302, while browser/Postman returns 200 (help needed)
First off, I spot a critical mistake in your code that is likely contributing to the 302 issue: you're setting the HTTP header after calling curl_exec(). That line has no effect, because the request has already been sent by the time CURLOPT_HTTPHEADER is set. Let's fix that first, then address Twitter's anti-scraping checks, which are probably what is blocking your request.
Step 1: Fix the Header Order & Add Required Configs
Move the CURLOPT_HTTPHEADER line before curl_exec($ch), and add additional settings to mimic a real browser session. Here's the corrected code:
    $param = "?f=tweets&q=+LAPOR1708&src=typd&max_position=".$scrollCursor;
    $url = "https://twitter.com/i/search/timeline".$param;

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_VERBOSE, true);

    // Use a modern user-agent instead of the outdated Firefox 2.0 one
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36');
    curl_setopt($ch, CURLOPT_URL, $url);

    // Set headers BEFORE executing the request
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language: en-US,en;q=0.5",
        "Referer: https://twitter.com/search?q=LAPOR1708",
        "DNT: 1"
    ]);

    // Follow 302 redirects (Twitter often uses these to set up sessions)
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    // Persist cookies to maintain a valid session, like a browser does
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'twitter_cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'twitter_cookies.txt');

    $result = curl_exec($ch);
    dd(curl_getinfo($ch));
    curl_close($ch);
Step 2: Why This Fixes the 302 Issue
Twitter's anti-scraping systems now check far more than just a user-agent. Here's what each change does:
- Proper Header Order: Headers are applied before the request is sent, so Twitter sees the same accept rules as a real browser.
- Modern User-Agent: Your old Firefox 2.0 agent is an easy signal that the request does not come from a current browser. A modern agent blends in with regular traffic.
- Referer Header: Tells Twitter your request originated from the search page, mimicking a user navigating normally.
- Cookie Persistence: Twitter requires valid session cookies to serve timeline data. By storing and reusing cookies, you maintain a session just like a browser.
- Follow Redirects: The 302 is likely a temporary redirect to set up a valid session; enabling this lets Curl follow that redirect instead of stopping at the 302 response.
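To check whether the redirect is actually being followed, it can help to summarize what curl_getinfo() reports after the request. The following is only a minimal sketch: summarizeCurlInfo() is a hypothetical helper name, not something from your code, and it assumes you pass it the array returned by curl_getinfo($ch), which contains the http_code, redirect_count and redirect_url keys.

```php
<?php
// Hypothetical helper (not part of the original code): summarizes the array
// returned by curl_getinfo($ch) so the redirect behaviour is easy to read.
function summarizeCurlInfo(array $info): string
{
    $code      = $info['http_code'] ?? 0;
    $redirects = $info['redirect_count'] ?? 0;
    $target    = $info['redirect_url'] ?? '';

    if ($code >= 300 && $code < 400) {
        // The final response is still a redirect, so it was not followed
        // (CURLOPT_FOLLOWLOCATION is off, or the redirect limit was reached).
        return "stopped at {$code}, Location: " . ($target !== '' ? $target : 'unknown');
    }
    if ($code >= 200 && $code < 300) {
        return "success {$code} after {$redirects} redirect(s)";
    }
    return "unexpected status {$code}";
}

// Usage with the handle from Step 1, after curl_exec($ch):
// echo summarizeCurlInfo(curl_getinfo($ch)), PHP_EOL;
```

If this still reports that the request stopped at 302, the Location it prints shows where Twitter is trying to send you, which is usually the quickest hint about what the server expects.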
Step 3: Extra Troubleshooting If You Still Get 302s
If the above doesn't work, try these steps:
- Manually Add Fresh Cookies: Log into Twitter in your browser, copy the cookies from dev tools (Application > Cookies > twitter.com), and add them directly to the headers:
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        // ... other headers
        "Cookie: auth_token=YOUR_AUTH_TOKEN; ct0=YOUR_CT0_COOKIE; ..."
    ]);
  Note: Cookies expire, so you'll need to refresh them periodically.
- Check for JavaScript Rendering: Twitter might be serving content via JavaScript now. If the timeline data isn't in the raw HTML response, you may need to use a headless browser like Puppeteer instead of Curl.
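For the manual-cookie option, building the Cookie header from an array makes it easier to swap in fresh values when the old ones expire. This is only a sketch under stated assumptions: buildCookieHeader() is a hypothetical helper, and the cookie values are the same placeholders used above, not real tokens.

```php
<?php
// Hypothetical helper: turns name => value pairs into one Cookie request header.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

// Placeholder values; replace them with cookies copied from your own browser session.
$headers = [
    'Accept-Language: en-US,en;q=0.5',
    buildCookieHeader([
        'auth_token' => 'YOUR_AUTH_TOKEN',
        'ct0'        => 'YOUR_CT0_COOKIE',
    ]),
];

// Apply to the handle from Step 1; this must run before curl_exec($ch):
// curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
```

As an alternative, the same "name=value; name2=value2" string without the "Cookie: " prefix can be passed through the CURLOPT_COOKIE option instead of a raw header.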
Keep in mind that Twitter actively blocks scraping, so these fixes might stop working over time as they update their anti-scraping measures. Always ensure you're complying with Twitter's Terms of Service when scraping their data.
This question originally comes from Stack Exchange; asked by Rangga Rizky.




