Scraping NBA data to CSV fails with IndexError: list index out of range

阿华AIGC实验室

2026-5-14

Fixing the CSV export error in your NBA data scraper

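The IndexError: list index out of range is raised on the line table = soup.find_all('table')[1]. The root cause is that the code asks for the second table on the page (indexes start at 0), but the page that actually came back does not have it. Either the page contains fewer than two tables, or the request was blocked by the site's anti-scraping protection and returned an empty page or a verification page.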
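
The failure can be reproduced without any network access. The sketch below uses a plain Python list as a stand-in for the result of find_all('table'); the variable names are illustrative and not part of the original script.

```python
# Stand-in for soup.find_all('table') when the page holds only one table.
tables = ["<table>only one</table>"]

try:
    second = tables[1]  # index 1 means "the second element"
    error_message = None
except IndexError as exc:
    second = None
    error_message = str(exc)

print(error_message)
```

Any list with fewer than two elements behaves the same way, which is why an empty or blocked page triggers the error.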

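Here is how to fix it, step by step:

1. First, check whether the request is being blocked

Start by adding a few debug lines to the scrape_data function to confirm the request actually worked: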

def scrape_data(url):
    response = requests.get(url, timeout=10)
    # Print the status code (200 means success; 403 or 404 indicates a problem)
    print(f"Status code: {response.status_code}")
    soup = BeautifulSoup(response.content, 'html.parser')
    # Print how many tables the page contains
    print(f"Tables on page: {len(soup.find_all('table'))}")
    # ...rest of the original code...

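If the status code is not 200, or the table count is 0, the request was most likely blocked by the site's anti-scraping protection, and you need to make it look like a browser request.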
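
The two debug values can be turned into a single readable verdict. This is a minimal sketch; diagnose_response is a hypothetical helper, and wanted_index=1 mirrors the [1] used in the original script.

```python
def diagnose_response(status_code, table_count, wanted_index=1):
    """Explain whether find_all('table')[wanted_index] can succeed."""
    if status_code != 200:
        return f"request failed with HTTP {status_code}"
    if table_count == 0:
        return "page loaded but contains no tables"
    if table_count <= wanted_index:
        return f"only {table_count} table(s), index {wanted_index} is out of range"
    return "ok"

print(diagnose_response(403, 0))
print(diagnose_response(200, 1))
```

In the real script the arguments would come from response.status_code and len(soup.find_all('table')).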

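2. Add request headers to get past the anti-scraping check

CBS Sports checks whether a request comes from a real browser, so pass a headers argument to requests.get to present a browser identity: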

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
response = requests.get(url, headers=headers, timeout=10)

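3. Don't rely on an index to locate the table

The original code picks the table by position with find_all('table')[1], which is fragile: as soon as the site changes its layout, the index stops pointing at the right table. It is better to locate the table by its class attribute. For example, inspecting the page source suggests the stats table's class is usually TableBase-table, so replace the original lookup with the following: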

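# Locate the stats table by class; if the class is wrong, check the page source and adjust
table = soup.find('table', class_='TableBase-table')
if table is None:
    print("Target table not found; check the page structure or the class name")
    return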
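
The class-based lookup can be tried offline before pointing it at the live site. The sketch below assumes a made-up HTML snippet (SAMPLE_HTML) and a hypothetical helper find_stats_table; the real CBS Sports markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page differs.
SAMPLE_HTML = """
<table class="Nav-table"><tr><td>menu</td></tr></table>
<table class="TableBase-table">
  <thead><tr><th>Player</th><th>PPG</th></tr></thead>
  <tbody><tr><td>Player A</td><td>30.1</td></tr></tbody>
</table>
"""

def find_stats_table(html, class_name="TableBase-table"):
    """Return the first table carrying class_name, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("table", class_=class_name)

found = find_stats_table(SAMPLE_HTML)
missing = find_stats_table("<html><body>Access denied</body></html>")

# find() returns None instead of raising, so the failure is easy to handle.
header = [th.get_text(strip=True) for th in found.find("thead").find_all("th")]
print(header, missing)
```

Unlike indexing into find_all, the lookup keeps working if another table is added before the stats table.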

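4. Fix the header and data-row extraction

The original rows = table.select('tbody &gt; tr') is wrong: &gt; is the HTML entity for >, and the selector should contain the literal character, 'tbody > tr'. In addition, the header should be read from the thead element rather than from the rows of tbody.

Complete, runnable code after the fixes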

import csv
import requests
from bs4 import BeautifulSoup

def scrape_data(url):
    # Send a browser-like User-Agent header
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
    }
    response = requests.get(url, headers=headers, timeout=10)
    # Raise immediately on a failed request so the cause is easy to spot
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')
    # Locate the target stats table by class
    table = soup.find('table', class_='TableBase-table')
    if table is None:
        print("Target table not found; check the page structure or the class name")
        return

    # Read the column headers from thead
    header = [th.text.strip() for th in table.find('thead').find_all('th')]
    # Read all data rows from tbody
    rows = table.find('tbody').find_all('tr')

    # Write the CSV; newline='' avoids blank lines, utf-8 avoids garbled text
    with open('statsoutput.csv', 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(header)
        for row in rows:
            data = [td.text.strip() for td in row.find_all('td')]
            # Keep only rows whose length matches the header; skip invalid rows
            if len(data) == len(header):
                writer.writerow(data)

if __name__=="__main__":
    url = "https://www.cbssports.com/nba/stats/playersort/nba/year-2019-season-preseason-category-scoringpergame"
    scrape_data(url)
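
The CSV-writing step can also be checked in isolation. This is a minimal sketch that writes to an in-memory buffer instead of statsoutput.csv; rows_to_csv and the sample data are hypothetical and only illustrate the same length filter used in the script.

```python
import csv
import io

def rows_to_csv(header, rows):
    """Serialize the header plus every row of matching length to CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    for row in rows:
        # Same filter as in scrape_data: skip rows that do not match the header
        if len(row) == len(header):
            writer.writerow(row)
    return buffer.getvalue()

sample_header = ["Player", "PPG"]
sample_rows = [
    ["Player A", "30.1"],
    [],                              # empty separator row
    ["Player B", "28.4", "extra"],   # malformed row with an extra cell
    ["Player C", "25.0"],
]
csv_text = rows_to_csv(sample_header, sample_rows)
print(csv_text)
```

The empty row and the three-cell row are dropped, so every line in the output has the same number of columns as the header.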