Python: IndexError (list index out of range) when scraping data from HTML comments
Hey there! Let's break down why you're hitting that IndexError: list index out of range and get your script back on track.
What's Causing the Error?
Your code loops through every comment in the page, but not all of those comments contain the #tco_detail_data element you're targeting. When sauce.select("#tco_detail_data") runs on a comment without that element, it returns an empty list. Trying to grab the first item with [0] on an empty list triggers the index error immediately.
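To see the failure in isolation, here is a minimal sketch that reproduces it without touching the network. The two markup strings are made-up stand-ins for comment bodies (they are not taken from the real page), and first_match_unsafe is a hypothetical helper name; the built-in html.parser is used only so the snippet has no lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical comment bodies: only the first one contains the target element.
comment_with_target = '<ul id="tco_detail_data"><li>Price: 10</li></ul>'
comment_without_target = '<p>unrelated markup</p>'


def first_match_unsafe(markup):
    """Mimic the failing pattern: index the select() result without checking it."""
    soup = BeautifulSoup(markup, "html.parser")
    return soup.select("#tco_detail_data")[0]


# Works: select() found one element, so [0] is valid.
found = first_match_unsafe(comment_with_target)
print(found.get("id"))

# Fails: select() returned an empty list, so [0] raises IndexError.
try:
    first_match_unsafe(comment_without_target)
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

The second call fails for the same reason the original loop does: nothing matched, so there is no first item to index.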
Here's the Fixed Script
We'll adjust the code to safely check for the target element before trying to process it:
    import requests
    from bs4 import BeautifulSoup, Comment

    res = requests.get("replace_with_the_above_link")
    soup = BeautifulSoup(res.text, 'lxml')

    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        sauce = BeautifulSoup(comment, 'lxml')
        # Use find() instead of select()[0] to avoid immediate errors
        items = sauce.find(id="tco_detail_data")
        if items:
            # Only process data if we actually found the target element
            data = ' '.join([' '.join(item.text.split()) for item in items.select("li")])
            print(data)
        # Optional: Uncomment below to debug which comments don't have the element
        # else:
        #     print(f"Skipping comment (no #tco_detail_data): {comment[:100]}...")
Key Changes Explained
- Swap select()[0] for find(): sauce.find(id="tco_detail_data") returns None if the element isn't found, instead of throwing an error right away. This gives us control over how to handle missing elements.
- Add an existence check: The if items: condition ensures we only run the data-processing logic when we've successfully located the element we need.
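If you would rather keep CSS selectors, select_one() is an alternative that also returns None when nothing matches. The sketch below wraps that idea in a helper; extract_detail_text is a hypothetical name, the sample markup in the usage lines is invented for illustration, and html.parser is used only to avoid depending on lxml.

```python
from bs4 import BeautifulSoup


def extract_detail_text(markup):
    """Return the whitespace-normalized <li> text under #tco_detail_data,
    or None when the markup does not contain that element."""
    soup = BeautifulSoup(markup, "html.parser")
    # select_one() returns the first match or None, so there is nothing to index.
    container = soup.select_one("#tco_detail_data")
    if container is None:
        return None
    return " ".join(" ".join(li.text.split()) for li in container.select("li"))


# Usage with invented sample markup:
print(extract_detail_text('<ul id="tco_detail_data"><li> A  1 </li><li>B\n2</li></ul>'))
print(extract_detail_text("<p>no target here</p>"))
```

Returning None for the missing case leaves it to the caller to decide whether to skip, log, or raise.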
Bonus Debug Tip
If you're still not seeing your expected output, uncomment the else block to print a snippet of comments that don't contain your target element. This can help you confirm whether the element is actually in a comment, or if the page structure has changed since you wrote the script.
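Another way to check the page structure is to count how many comments do and do not contain the target. This is a rough sketch under stated assumptions: sample_page is an invented stand-in for the real response body, tally_comments is a hypothetical helper, and html.parser replaces lxml only to keep the snippet dependency-light.

```python
from bs4 import BeautifulSoup, Comment

# Invented sample page: one comment holds the target, two do not.
sample_page = """
<html><body>
<!-- <div id="tco_detail_data"><ul><li>Item one</li></ul></div> -->
<!-- just a note for developers -->
<!-- <p>other hidden markup</p> -->
</body></html>
"""


def tally_comments(html):
    """Count comments that contain #tco_detail_data and those that do not."""
    soup = BeautifulSoup(html, "html.parser")
    with_target, without_target = 0, 0
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Re-parse the comment body so its markup becomes searchable.
        inner = BeautifulSoup(comment, "html.parser")
        if inner.find(id="tco_detail_data") is not None:
            with_target += 1
        else:
            without_target += 1
    return with_target, without_target


counts = tally_comments(sample_page)
print(counts)
```

If the first count comes back as zero on the real page, the element is probably no longer inside a comment, or its id has changed.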
This question comes from Stack Exchange; it was asked by SIM.




