
Help: Access Denied When Scraping the NSE Option Chain with Python + Selenium/Requests

Hey there, let's break down why you're getting blocked by NSE India's platform and walk through actionable fixes to extract that option chain data successfully.

Why You're Getting Blocked

NSE uses robust anti-scraping measures that go beyond basic request headers:

  • Selenium Fingerprinting: Default Selenium configurations leave obvious traces (like window.navigator.webdriver being set to true) that the site detects.
  • Outdated User-Agent: Your Chrome 44 UA is way too old—modern sites flag outdated browser versions as suspicious.
  • Incomplete Request Flow: When using Requests, you need to strictly mimic how a real browser interacts with the site: load the homepage first so the session picks up its cookies (and any CSRF token the page exposes), and only then call the data endpoint.
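
To make the User-Agent point concrete, here is a minimal sketch of a sanity check you could run on your own UA string before sending it. The function names, the threshold of 100, and the old UA string are assumptions for illustration (your exact Chrome 44 string may differ), and this is not how NSE itself decides what to block.

```python
import re

def chrome_major_version(user_agent: str):
    """Return the Chrome major version found in a User-Agent string, or None."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(match.group(1)) if match else None

def looks_outdated(user_agent: str, min_major: int = 100) -> bool:
    """Flag a UA with no Chrome version or one below min_major (an assumed threshold)."""
    major = chrome_major_version(user_agent)
    return major is None or major < min_major

# Hypothetical old UA of the kind described above, and the modern one used in the fixes
old_ua = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36")
new_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36")

print(looks_outdated(old_ua))  # True
print(looks_outdated(new_ua))  # False
```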

Fix #1: Fix Selenium's Automation Traces

Here's an updated Selenium setup that hides most automation flags and mimics real user behavior:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

opts = Options()
# Use a modern, real User-Agent
opts.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36")
# Chrome has no command-line switches for the Accept / Accept-Language headers;
# --lang sets the browser locale, which Chrome normally uses for Accept-Language
opts.add_argument("--lang=en-US")

# Critical: Hide Selenium's automation signatures
opts.add_argument("--disable-blink-features=AutomationControlled")
opts.add_experimental_option("excludeSwitches", ["enable-automation"])
opts.add_experimental_option('useAutomationExtension', False)

driver = webdriver.Chrome(options=opts)
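# Override navigator.webdriver on every new page. A plain execute_script call
# would only patch the current blank page and be lost on the next navigation.
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
)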

# Step 1: Visit the homepage first to let the site set necessary cookies
driver.get('https://www.nseindia.com/')
# Wait for a core element to load (adjust the selector if needed)
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "searchBtn")))
time.sleep(2)  # Add a natural delay

# Step 2: Navigate to the option chain page
driver.get('https://www.nseindia.com/option-chain')
# Wait for the option table to load
WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CLASS_NAME, "opttbldata")))

# Now you can scrape the table or interact with elements as needed
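
Once the table has loaded, one way to pull the rows out is to parse the page HTML. Below is a minimal sketch using only the standard library; TableRowParser, parse_rows and the sample HTML are made up for illustration, and the real option chain markup will have more columns and nested elements than this.

```python
from html.parser import HTMLParser

class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def parse_rows(html: str):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows

# Made-up sample standing in for driver.page_source
sample_html = """
<table>
  <tr><th>OI</th><th>LTP</th><th>Strike</th></tr>
  <tr><td>1,200</td><td>85.50</td><td>19500</td></tr>
</table>
"""
print(parse_rows(sample_html))
```

In the Selenium script above you would call parse_rows(driver.page_source) instead of passing the sample string.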

Fix #2: Correct Request Flow for Requests Library

If you prefer using Requests over Selenium, follow this strict flow to avoid being blocked:

import requests
from bs4 import BeautifulSoup

# Use modern headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
}

session = requests.Session()
session.headers.update(headers)

# 1. First visit the homepage to get cookies and, if the page exposes one, a CSRF token
home_response = session.get("https://www.nseindia.com/", timeout=10)
soup = BeautifulSoup(home_response.text, 'html.parser')
csrf_meta = soup.find('meta', {'name': 'csrf-token'})  # None if the page has no such tag

# 2. Request the option chain API with the cookies (and the CSRF token when one was found)
api_url = "https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY"
api_headers = {
    "Referer": "https://www.nseindia.com/option-chain"
}
if csrf_meta and csrf_meta.get('content'):
    api_headers["X-CSRF-Token"] = csrf_meta['content']

api_response = session.get(api_url, headers=api_headers)

if api_response.status_code == 200:
    option_data = api_response.json()
    print("Success! Option chain data retrieved.")
    # Process your data here
else:
    print(f"Failed with status code: {api_response.status_code}")
    print(api_response.text)
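
For the "process your data" step, here is a rough sketch of flattening the JSON into one row per option leg. The field names (records, data, strikePrice, expiryDate, CE, PE, lastPrice, openInterest) are assumptions about the response layout, and the sample payload is invented, so print your real option_data first and adjust the keys to match.

```python
def flatten_option_chain(payload: dict) -> list:
    """Turn the nested payload into a flat list of dicts, one per call/put leg."""
    rows = []
    # Assumed layout: payload["records"]["data"] is a list of per-strike entries
    for entry in payload.get("records", {}).get("data", []):
        for side in ("CE", "PE"):
            leg = entry.get(side)
            if not leg:
                continue  # this strike has no quote on that side
            rows.append({
                "strike": entry.get("strikePrice"),
                "expiry": entry.get("expiryDate"),
                "type": side,
                "last_price": leg.get("lastPrice"),
                "open_interest": leg.get("openInterest"),
            })
    return rows

# Invented sample standing in for api_response.json()
sample_payload = {
    "records": {
        "data": [
            {
                "strikePrice": 19500,
                "expiryDate": "26-Oct-2023",
                "CE": {"lastPrice": 85.5, "openInterest": 1200},
                "PE": {"lastPrice": 40.25, "openInterest": 900},
            },
            {
                "strikePrice": 19600,
                "expiryDate": "26-Oct-2023",
                "PE": {"lastPrice": 95.0, "openInterest": 300},
            },
        ]
    }
}

for row in flatten_option_chain(sample_payload):
    print(row)
```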

Extra Tips to Stay Unblocked

  • Add Natural Delays: Real users don't click or navigate instantly—use time.sleep() or WebDriverWait instead of firing requests back-to-back.
  • Avoid Headless Mode Initially: Headless browsers are easier to detect. Test in regular browser mode first, then try --headless=new if needed.
  • Respect Rate Limits: Don't spam requests. NSE will block you if you hit their servers too frequently.
  • Check for Cloudflare or Similar Bot Protection: If you see a CAPTCHA or a challenge page, you may need to solve it manually once before automating, or use tools that handle CAPTCHAs (but always follow NSE's terms of service).
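
The delay and rate-limit tips can be combined into a small retry helper. This is only a sketch: fetch_with_backoff, its parameters and the fake responses are hypothetical, and the delays are arbitrary starting points rather than limits published by NSE.

```python
import random
import time

def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0,
                       sleep=time.sleep, jitter=random.uniform):
    """Call fetch() until it reports status 200 or the attempts run out.

    fetch must return a (status_code, body) tuple. Between failed attempts the
    wait doubles each time (base_delay, 2*base_delay, ...) plus random jitter.
    Returns the body on success, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt) + jitter(0, base_delay))
    return None

# Demo with a fake fetch that is refused twice and then succeeds.
# The sleep and jitter functions are swapped out so the demo runs instantly.
fake_responses = iter([(403, ""), (429, ""), (200, "ok")])
recorded_delays = []
result = fetch_with_backoff(
    lambda: next(fake_responses),
    sleep=recorded_delays.append,
    jitter=lambda low, high: 0.0,
)
print(result, recorded_delays)
```

With the Requests flow, fetch could be a small function that calls session.get(api_url, headers=api_headers) and returns its status_code and text.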

This question was originally asked on Stack Exchange by Wacao.
