How to loop over search results with a for loop and XPath in Selenium (Python)

阿华AIGC实验室

2026-5-7

How to turn the while loop in a Selenium Wikipedia scraper into a for loop over all results

Hey there! Let's fix that loop issue so you can scrape all Wikipedia search results without worrying about hardcoding a number or hitting errors. Here's how to refactor your code properly:

Core problem analysis

Your original while loop relies on a fixed count (x != 10), which causes two frustrating problems:

If there are fewer than 10 results, you'll get a NoSuchElementException when the loop tries to access an index that doesn't exist.
If there are more than 10 results, you'll miss every entry past the tenth.

The smarter approach is to grab all matching elements first, then iterate over that collection. That way you automatically handle any number of results, no matter how many or how few.
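To make the "collect first, then iterate" idea concrete, here is a minimal sketch that runs without a browser. FakeElement and FakeDriver are hypothetical stand-ins invented for this illustration, and collect_titles is a hypothetical helper name. The sketch assumes the Selenium 4 style call find_elements(by, value) and passes the plain string "xpath", which is the value of By.XPATH.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only .text is modeled)."""

    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Hypothetical stand-in for a WebDriver so the sketch runs without a browser."""

    def __init__(self, texts):
        self._elements = [FakeElement(t) for t in texts]

    def find_elements(self, by, value):
        # Like the real plural lookup, this returns a list (possibly empty).
        return list(self._elements)


def collect_titles(driver, xpath):
    """Return the text of every element matching xpath, however many there are."""
    # "xpath" is the string value of By.XPATH in Selenium.
    elements = driver.find_elements("xpath", xpath)
    # Iterating over the list needs no counter and no hardcoded limit.
    return [element.text for element in elements]


# Two results: both are collected.
print(collect_titles(FakeDriver(["First Indochina War", "Viet Minh"]), "//a"))
# Zero results: an empty list comes back and no exception is raised.
print(collect_titles(FakeDriver([]), "//a"))
```

With a real driver the same helper should work unchanged, because the loop depends only on the length of the list that the lookup returns.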

Improved code

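We'll add a wait to ensure the search results have loaded (critical for Selenium to find elements reliably) and use the plural find_elements method with By.XPATH to get all result titles. The older find_elements_by_xpath helper and the positional driver path in webdriver.Chrome(path) were deprecated in Selenium 4 and later removed, so the code below uses find_elements(By.XPATH, ...) and a Service object instead:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

path = r"C:\webdrivers\chromedriver.exe"  # raw string avoids backslash-escape problems
driver = webdriver.Chrome(service=Service(path))

# XPath matching the title link of every search result
RESULT_XPATH = '//*[@id="mw-content-text"]/div[3]/ul/li/div[1]/a'

try:
    driver.get("https://en.wikipedia.org/w/index.php?cirrusUserTesting=glent_m0&search=1st+indochinese+war&title=Special%3ASearch&go=Go&ns0=1")

    # Wait up to 10 seconds for at least one result title to be present
    WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.XPATH, RESULT_XPATH))
    )

    # The plural method returns a list of every matching element
    result_titles = driver.find_elements(By.XPATH, RESULT_XPATH)

    # Print the text of each title element
    for title in result_titles:
        print(title.text)

finally:
    driver.quit()  # close the browser even if an error occurs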
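If you want more than the printed text, the same for loop can build a list of results. The sketch below is one possible extension, not part of the original answer: extract_results is a hypothetical helper, and StubLink is a hypothetical stand-in so the example runs without a browser. It relies only on the .text property and the get_attribute method that Selenium's WebElement provides.

```python
class StubLink:
    """Hypothetical stand-in for a WebElement wrapping an <a> tag."""

    def __init__(self, text, href):
        self.text = text
        self._href = href

    def get_attribute(self, name):
        # Only the href attribute is modeled in this stub.
        return self._href if name == "href" else None


def extract_results(elements):
    """Turn link elements into (title, url) pairs, skipping blank titles."""
    results = []
    for element in elements:
        title = element.text.strip()
        if not title:
            continue  # ignore links with no visible text
        results.append((title, element.get_attribute("href")))
    return results


links = [
    StubLink("First Indochina War", "https://en.wikipedia.org/wiki/First_Indochina_War"),
    StubLink("   ", "https://en.wikipedia.org/wiki/Empty"),
]
for title, url in extract_results(links):
    print(title, url)
```

In the scraper above you would call extract_results(result_titles) inside the try block, before driver.quit() runs, because element lookups stop working once the browser is closed.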