Python + Selenium/Requests: Access Denied When Scraping the NSE Option Chain
Hey there, let's break down why you're getting blocked by NSE India's platform and walk through actionable fixes to extract that option chain data successfully.
Why You're Getting Blocked
NSE uses robust anti-scraping measures that go beyond basic request headers:
- Selenium Fingerprinting: Default Selenium configurations leave obvious traces (like window.navigator.webdriver being set to true) that the site detects.
- Outdated User-Agent: Your Chrome 44 UA is way too old. Modern sites flag outdated browser versions as suspicious.
- Incomplete Request Flow: When using Requests, you need to strictly mimic how a real browser interacts with the site (including picking up cookies from the homepage before fetching CSRF tokens or calling the API).
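To make the User-Agent point concrete, here is a minimal sketch that pulls the Chrome major version out of a UA string and compares it against a cutoff. The helper names and the cutoff of 100 are assumptions chosen for illustration, not anything NSE publishes, and the sample UA strings are just examples.

```python
import re
from typing import Optional

# Hypothetical cutoff: treat anything older than this Chrome major version as stale.
MIN_CHROME_MAJOR = 100


def chrome_major_version(user_agent: str) -> Optional[int]:
    """Return the Chrome major version found in a User-Agent string, or None."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(match.group(1)) if match else None


def looks_outdated(user_agent: str, minimum: int = MIN_CHROME_MAJOR) -> bool:
    """True when the UA advertises a Chrome version below the chosen cutoff."""
    major = chrome_major_version(user_agent)
    return major is not None and major < minimum


old_ua = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
new_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
print(looks_outdated(old_ua), looks_outdated(new_ua))
```

A server-side filter can run a check this cheap on every request, which is why an old UA alone can be enough to get you flagged.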
Fix #1: Fix Selenium's Automation Traces
Here's an updated Selenium setup that hides most automation flags and mimics real user behavior:
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    import time

    opts = Options()
    # Use a modern, real User-Agent
    opts.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36")
    # Accept and Accept-Language are HTTP headers, not Chrome switches, so passing
    # them to add_argument has no effect. --lang sets the browser locale instead.
    opts.add_argument("--lang=en-US")
    # Critical: hide Selenium's automation signatures
    opts.add_argument("--disable-blink-features=AutomationControlled")
    opts.add_experimental_option("excludeSwitches", ["enable-automation"])
    opts.add_experimental_option("useAutomationExtension", False)

    driver = webdriver.Chrome(options=opts)

    # Override navigator.webdriver on every new document. A plain execute_script
    # call only patches the current page and is lost on the next navigation.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
    )

    # Step 1: Visit the homepage first to let the site set necessary cookies
    driver.get("https://www.nseindia.com/")
    # Wait for a core element to load (adjust the selector if needed)
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "searchBtn")))
    time.sleep(2)  # Add a natural delay

    # Step 2: Navigate to the option chain page
    driver.get("https://www.nseindia.com/option-chain")
    # Wait for the option table to load (adjust the class name if needed)
    WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CLASS_NAME, "opttbldata")))

    # Now you can scrape the table or interact with elements as needed
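Before scraping, it helps to confirm that the page you got back is the real one and not a denial page. The sketch below is a rough check only: the marker strings are assumptions, so look at the actual "Access Denied" page you receive and adjust them.

```python
# Marker strings are assumptions; inspect a real denial page and adjust them.
BLOCK_MARKERS = ("access denied", "you don't have permission")


def looks_blocked(page_html: str, markers=BLOCK_MARKERS) -> bool:
    """Return True if the HTML contains any of the denial-page markers."""
    text = page_html.lower()
    return any(marker in text for marker in markers)
```

With Selenium you could call looks_blocked(driver.page_source) right after each driver.get(), and with Requests looks_blocked(response.text), then stop or back off instead of parsing a page that has no data in it.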
Fix #2: Correct Request Flow for Requests Library
If you prefer using Requests over Selenium, follow this strict flow to avoid being blocked:
    import requests
    from bs4 import BeautifulSoup

    # Use modern headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    }

    session = requests.Session()
    session.headers.update(headers)

    # 1. First visit the homepage to get cookies and, if the page exposes one, a CSRF token
    home_response = session.get("https://www.nseindia.com/", timeout=10)
    soup = BeautifulSoup(home_response.text, "html.parser")
    # find() returns None when the tag is missing, so don't index it directly
    csrf_meta = soup.find("meta", {"name": "csrf-token"})

    # 2. Request the option chain API with the cookies (and the CSRF token when present)
    api_url = "https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY"
    api_headers = {"Referer": "https://www.nseindia.com/option-chain"}
    if csrf_meta is not None and csrf_meta.get("content"):
        api_headers["X-CSRF-Token"] = csrf_meta["content"]

    api_response = session.get(api_url, headers=api_headers, timeout=10)
    if api_response.status_code == 200:
        option_data = api_response.json()
        print("Success! Option chain data retrieved.")
        # Process your data here
    else:
        print(f"Failed with status code: {api_response.status_code}")
        print(api_response.text)
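Once the API call succeeds, you'll usually want flat rows rather than nested JSON. Here is a minimal sketch of that step. The field names (records, data, strikePrice, expiryDate, CE, PE, lastPrice, openInterest) are assumptions about the response shape, and the sample payload is made up, so print your real option_data first and adjust the keys.

```python
from typing import Any, Dict, List


def flatten_option_chain(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a nested option-chain payload into one row per call/put leg."""
    rows = []
    # Assumed layout: payload["records"]["data"] is a list of per-strike entries.
    for entry in payload.get("records", {}).get("data", []):
        for side in ("CE", "PE"):  # call and put legs, when present
            leg = entry.get(side)
            if not leg:
                continue
            rows.append({
                "strike": entry.get("strikePrice"),
                "expiry": entry.get("expiryDate"),
                "type": side,
                "last_price": leg.get("lastPrice"),
                "open_interest": leg.get("openInterest"),
            })
    return rows


# Made-up sample in the assumed shape, for illustration only.
sample = {
    "records": {
        "data": [
            {
                "strikePrice": 18000,
                "expiryDate": "26-Oct-2023",
                "CE": {"lastPrice": 120.5, "openInterest": 1500},
                "PE": {"lastPrice": 95.0, "openInterest": 2100},
            },
            {"strikePrice": 18100, "expiryDate": "26-Oct-2023", "CE": {"lastPrice": 80.0, "openInterest": 900}},
        ]
    }
}
for row in flatten_option_chain(sample):
    print(row)
```

Using .get() throughout means a missing key gives you None in a row instead of a KeyError halfway through the loop.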
Extra Tips to Stay Unblocked
- Add Natural Delays: Real users don't click or navigate instantly. Use time.sleep() or WebDriverWait instead of firing requests back-to-back.
- Avoid Headless Mode Initially: Headless browsers are easier to detect. Test in regular browser mode first, then try --headless=new if needed.
- Respect Rate Limits: Don't spam requests. NSE will block you if you hit their servers too frequently.
- Check for Cloudflare: If you see a CAPTCHA, you may need to solve it manually once before automating, or use tools that handle CAPTCHAs (but always follow NSE's terms of service).
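The delay and rate-limit tips can be sketched as a small retry helper with exponential backoff and jitter. This is one possible approach, not a requirement from NSE: the function names and the defaults (2 second base, 60 second cap, 4 attempts) are assumptions you should tune.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0,
                  jitter: float = 0.5, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Grows exponentially from `base`, is capped at `cap`, and gets up to
    `jitter` seconds of random noise added so requests are not evenly spaced.
    """
    rng = rng if rng is not None else random.Random()
    delay = min(cap, base * (2 ** attempt))
    return delay + rng.uniform(0, jitter)


def call_with_backoff(fetch: Callable[[], T], is_ok: Callable[[T], bool],
                      max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep,
                      **delay_kwargs) -> T:
    """Call fetch() until is_ok(result) is true or max_attempts is reached.

    Returns the last result either way, so the caller can inspect a failure.
    """
    result = fetch()
    for attempt in range(max_attempts - 1):
        if is_ok(result):
            return result
        sleep(backoff_delay(attempt, **delay_kwargs))
        result = fetch()
    return result
```

With the Requests flow, usage could look like call_with_backoff(lambda: session.get(api_url, timeout=10), lambda r: r.status_code == 200). The sleep parameter is injectable so you can test the logic without actually waiting.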
The question originally comes from Stack Exchange; asked by Wacao.




