Python web page comment scraping error: IndexError: list index out of range

Fixing the IndexError in Your Web Scraping Script

Hey there! Let's break down why you're hitting that IndexError: list index out of range and get your script back on track.

What's Causing the Error?

Your code loops through every comment on the page, but not all of those comments contain the #tco_detail_data element you're targeting. When sauce.select("#tco_detail_data") runs on a comment without that element, it returns an empty list, and indexing that empty list with [0] raises IndexError: list index out of range.
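
To see the failure in isolation, here is a minimal sketch that reproduces it without touching the network. The HTML string and the first_failure helper are made up for illustration, and html.parser is used instead of lxml only so the snippet runs without an extra dependency.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page: two comments, only the second holds the target element.
HTML = """
<div>
  <!-- <p>unrelated markup</p> -->
  <!-- <ul id="tco_detail_data"><li>Price: 100</li></ul> -->
</div>
"""


def first_failure(html):
    """Return the text of the first comment where select(...)[0] raises IndexError."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        sauce = BeautifulSoup(comment, "html.parser")
        try:
            # select() returns an empty list here when the id is absent,
            # so the [0] lookup is what actually fails.
            sauce.select("#tco_detail_data")[0]
        except IndexError:
            return str(comment).strip()
    return None


failed_on = first_failure(HTML)
print(failed_on)
```

The loop fails on the very first comment that lacks the element, which is why the original script never reaches the comment that does contain the data.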

Here's the Fixed Script

We'll adjust the code to safely check for the target element before trying to process it:

import requests
from bs4 import BeautifulSoup, Comment

res = requests.get("replace_with_the_above_link")
soup = BeautifulSoup(res.text, 'lxml')

for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    sauce = BeautifulSoup(comment, 'lxml')
    # Use find() instead of select()[0] to avoid immediate errors
    items = sauce.find(id="tco_detail_data")
    if items:
        # Only process data if we actually found the target element
        data = ' '.join([' '.join(item.text.split()) for item in items.select("li")])
        print(data)
    # Optional: Uncomment below to debug which comments don't have the element
    # else:
    #     print(f"Skipping comment (no #tco_detail_data): {comment[:100]}...")

Key Changes Explained

  1. Swap select()[0] for find():
    sauce.find(id="tco_detail_data") returns None if the element isn't found, whereas select("#tco_detail_data")[0] raises IndexError on the empty result. This gives us control over how to handle missing elements.
  2. Add an existence check:
    The if items: condition ensures we only run the data-processing logic when we've successfully located the element we need.
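
If you want to check the two changes above without requesting the real page, one option is to wrap them in a small function and feed it inline HTML. This is only a sketch: the extract_detail_rows name, the target_id parameter, and the sample markup are assumptions, and it uses html.parser and an explicit "is None" test rather than the lxml parser and "if items:" from the script above.

```python
from bs4 import BeautifulSoup, Comment


def extract_detail_rows(html, target_id="tco_detail_data"):
    """Collect whitespace-normalized <li> text from commented-out markup.

    Comments that do not contain an element with the given id are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        fragment = BeautifulSoup(comment, "html.parser")
        container = fragment.find(id=target_id)
        if container is None:
            # No target in this comment: skip it instead of indexing into nothing.
            continue
        for li in container.select("li"):
            rows.append(" ".join(li.get_text().split()))
    return rows


# Hypothetical sample: one comment without the target, one with it.
SAMPLE = """
<div>
  <!-- <p>no target here</p> -->
  <!-- <ul id="tco_detail_data"><li>Price:   100</li><li>Stock:
  5</li></ul> -->
</div>
"""

rows = extract_detail_rows(SAMPLE)
print(rows)
```

Returning a list instead of printing inside the loop makes the behavior easy to assert on, and an empty list tells you that no comment contained the element.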

Bonus Debug Tip

If you're still not seeing your expected output, uncomment the else block to print a snippet of comments that don't contain your target element. This can help you confirm whether the element is actually in a comment, or if the page structure has changed since you wrote the script.
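
A slightly more structured take on the same check is sketched below: it reports whether the target id sits in the regular markup, inside a comment, or nowhere at all. The locate_target helper and its return labels are hypothetical, and html.parser is used so the snippet has no dependency on lxml.

```python
from bs4 import BeautifulSoup, Comment


def locate_target(html, target_id="tco_detail_data"):
    """Return 'live', 'comment', or 'missing' depending on where the id is found."""
    soup = BeautifulSoup(html, "html.parser")
    # Markup inside comments is not parsed into tags, so this only sees live elements.
    if soup.find(id=target_id) is not None:
        return "live"
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        fragment = BeautifulSoup(comment, "html.parser")
        if fragment.find(id=target_id) is not None:
            return "comment"
    return "missing"


# Hypothetical inputs covering the three cases.
in_comment = locate_target('<div><!-- <ul id="tco_detail_data"></ul> --></div>')
in_page = locate_target('<div><ul id="tco_detail_data"></ul></div>')
nowhere = locate_target("<div><!-- <p>other</p> --></div>")
print(in_comment, in_page, nowhere)
```

A result of 'live' would suggest the page no longer hides the data in a comment, and 'missing' would suggest the structure changed or the content is loaded some other way.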

The question in this content comes from Stack Exchange; asked by SIM.
