
How to loop over search results with a for loop and XPath in Selenium (Python)

How to turn a Selenium Wikipedia scraper's while loop into a for loop that goes through every result

Hey there! Let's fix that loop issue so you can scrape all Wikipedia search results without worrying about hardcoding a number or hitting errors. Here's how to refactor your code properly:

The core problem

Your original while loop relies on a fixed count (x != 10), which causes two frustrating problems:

  • If there are fewer than 10 results, you'll get a NoSuchElementException when trying to access an index that doesn't exist
  • If there are more than 10 results, you'll miss scraping all the extra entries

The better approach is to collect all matching elements first and then iterate over that collection. The loop then runs exactly once per result, whether the page has 3 results or 30.
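To see both failure modes without opening a browser, here is a small pure-Python sketch. The lists of strings are a hypothetical stand-in for the title elements Selenium would find, Python's IndexError plays the role of Selenium's NoSuchElementException, and the function names are illustrative only.

```python
# Lists of strings stand in for the result titles on a page (hypothetical data).
# IndexError here plays the role NoSuchElementException plays in Selenium.

def scrape_fixed_count(titles, count=10):
    """Mimics the original while loop: assumes exactly `count` results exist."""
    scraped = []
    x = 0
    while x != count:
        scraped.append(titles[x])  # fails if there are fewer than `count` titles
        x += 1
    return scraped  # silently stops at `count` if there are more


def scrape_all(titles):
    """Iterates over whatever was found, however many items that is."""
    scraped = []
    for title in titles:
        scraped.append(title)
    return scraped


few = [f"Result {i}" for i in range(3)]
many = [f"Result {i}" for i in range(15)]

try:
    scrape_fixed_count(few)
except IndexError:
    print("fixed count: crashed on 3 results")

print("fixed count:", len(scrape_fixed_count(many)), "of", len(many))
print("for loop:", len(scrape_all(few)), "and", len(scrape_all(many)))
```

Running it prints the crash message for the 3-item list, then `fixed count: 10 of 15` and `for loop: 3 and 15`.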

Improved code

We'll add an explicit wait so the results have finished loading before Selenium looks for them (critical for finding elements reliably), then use the plural find_elements method (find_elements_by_xpath in older Selenium 3 code) to get all result titles at once:

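from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

path = r"C:\webdrivers\chromedriver.exe"  # raw string so backslashes are not treated as escape sequences
# Selenium 4 takes the driver path through a Service object instead of a positional argument
driver = webdriver.Chrome(service=Service(path))

# XPath of the result title links (tied to Wikipedia's current page layout)
RESULTS_XPATH = '//*[@id="mw-content-text"]/div[3]/ul/li/div[1]/a'

try:
    driver.get("https://en.wikipedia.org/w/index.php?cirrusUserTesting=glent_m0&search=1st+indochinese+war&title=Special%3ASearch&go=Go&ns0=1")

    # Wait up to 10 seconds until at least one result title is present
    try:
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, RESULTS_XPATH))
        )
    except TimeoutException:
        pass  # no results appeared; find_elements below then returns an empty list

    # Get all result title elements (the plural method returns a list)
    result_titles = driver.find_elements(By.XPATH, RESULTS_XPATH)

    # Iterate over each title element and print its text
    for title in result_titles:
        print(title.text)

finally:
    driver.quit()  # make sure the browser closes even if an error occurs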
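The finally clause in the code above is what guarantees driver.quit() runs. The sketch below shows that behavior without Selenium: FakeDriver is a hypothetical class that only records whether quit() was called, and the scraping failure is simulated.

```python
class FakeDriver:
    """Hypothetical stand-in for a WebDriver; it only records whether quit() ran."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


def scrape(driver, fail):
    try:
        if fail:
            raise RuntimeError("simulated scraping failure")
        return "ok"
    finally:
        driver.quit()  # runs on success and on error alike


good, bad = FakeDriver(), FakeDriver()
print(scrape(good, fail=False), good.closed)  # ok True

try:
    scrape(bad, fail=True)
except RuntimeError as exc:
    print("error:", exc, "| closed:", bad.closed)  # error: simulated scraping failure | closed: True
```

Both drivers end up closed: the exception still propagates to the caller, but only after the finally block has run.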

Key improvements

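  • Raw-string path: the r prefix stops Python from treating backslashes in the path as escape sequences (for example, \t would become a tab character), which could otherwise corrupt a Windows path
  • Explicit wait: WebDriverWait waits until the results are present, avoiding the race condition where elements are looked up before the page has finished loading
  • Plural element lookup: find_elements (find_elements_by_xpath in Selenium 3) returns a list of every matching element, or an empty list if nothing matches, so you no longer have to guess how many results there are
  • try/finally: ensures the browser is closed properly even if an error occurs partway through scraping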
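The explicit wait works by polling: WebDriverWait re-checks the expected condition (every 0.5 seconds by default) until it returns something truthy, which until() then returns, or until the timeout expires and it raises TimeoutException. Below is a simplified pure-Python illustration of that idea, not Selenium's implementation. wait_until and results_present are hypothetical names, and unlike the real WebDriverWait this sketch does not ignore NoSuchElementException between polls.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Minimal polling loop in the spirit of WebDriverWait.until (hypothetical helper).

    Calls `condition()` repeatedly until it returns a truthy value and returns
    that value; raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Hypothetical "page" whose results only appear on the third check,
# simulating content that is still loading.
checks = {"count": 0}


def results_present():
    checks["count"] += 1
    return ["Result A", "Result B"] if checks["count"] >= 3 else []


print(wait_until(results_present, timeout=2, poll_frequency=0.01))
```

The first two checks return an empty (falsy) list, so the loop sleeps and retries; the third check returns the titles, which become the return value, just as until() hands back the located elements.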

This question comes from Stack Exchange; it was asked by Morgan.
