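Highcharts scraping problem: unable to get all data points in the chart
Question: how to scrape the multi-tab Highcharts data (Loose/CIB/New, etc.) on PriceCharting
I'm trying to scrape the Highcharts chart data for The Legend of Zelda (PAL NES version) on PriceCharting, at https://www.pricecharting.com/game/pal-nes/legend-of-zelda. My current code only gets the data for the default "Loose" tab, but I also need the chart data for the other options: "CIB", "New", "Graded", "Boxed" and "Manual". I don't know how to handle that. Here is my current code, which only works for "Loose":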
    import time
    import pandas as pd
    from selenium.webdriver.chrome.service import Service
    from selenium import webdriver

    service = Service(executable_path="/driver_selenium/geckodriver.exe")
    driver = webdriver.Firefox(service=service)
    website = "https://www.pricecharting.com/game/pal-nes/legend-of-zelda#completed-auctions-graded"
    driver.get(website)
    time.sleep(5)

    temp = driver.execute_script('return window.Highcharts.charts[0].series[0].options.data')
    data = [item[1] for item in temp]
    print(data)
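Bonus thanks if you can suggest a better way to extract the data without Selenium (similar to the approach in "Scrape highchart into python")!
Solution
1. Improve the Selenium code to support extracting data from multiple tabs
The core of the problem is that the page re-renders the Highcharts data after a tab switch, so we need to simulate a click on each tab, wait for the chart to update, and only then extract the data. Using WebDriverWait instead of time.sleep is recommended here: it polls for a condition and returns as soon as the condition holds, which is both faster and more reliable than a fixed delay.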
Code example:
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.service import Service  # geckodriver pairs with the Firefox service
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Returns the first series' data, or null while the chart is not ready
    READ_SERIES_JS = """
    const chart = window.Highcharts && window.Highcharts.charts[0];
    return chart && chart.series.length ? chart.series[0].options.data : null;
    """

    # Start the browser
    service = Service(executable_path="/driver_selenium/geckodriver.exe")
    driver = webdriver.Firefox(service=service)
    website = "https://www.pricecharting.com/game/pal-nes/legend-of-zelda#completed-auctions-graded"
    driver.get(website)

    # All the tabs we want to read
    condition_labels = ["Loose", "CIB", "New", "Graded", "Boxed", "Manual"]
    all_data = {}
    wait = WebDriverWait(driver, 10)
    previous = None  # data read for the previous tab

    def chart_updated(d):
        # Truthy only when data exists AND differs from the previous tab's data;
        # checking for non-empty data alone would pass before the chart refreshes
        data = d.execute_script(READ_SERIES_JS)
        return data if data and data != previous else False

    try:
        for label in condition_labels:
            try:
                # Click the tab button (adjust the XPath to the page's actual markup)
                tab_button = wait.until(EC.element_to_be_clickable((By.XPATH, f"//button[text()='{label}']")))
                tab_button.click()
                # Wait for the chart to show new data, then keep the prices
                temp = wait.until(chart_updated)
                all_data[label] = [item[1] for item in temp]
                previous = temp
                print(f"Fetched data for the {label} tab")
            except Exception as e:
                print(f"Failed to fetch data for the {label} tab: {e}")
    finally:
        driver.quit()

    # Wrap each list in a Series so columns of different lengths are padded with NaN
    df = pd.DataFrame({label: pd.Series(values) for label, values in all_data.items()})
    print(df.head())
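Key notes:
- Use WebDriverWait to wait until the element is clickable and the chart has loaded, which avoids the inefficiency of a fixed wait time.
- Locate each tab button by XPath so that the right one is clicked; adjust the expression if the page's markup differs.
- Loop over all the required tabs and store the data in a single dictionary.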
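A Highcharts chart can hold several series at once, so it may be worth checking what the chart already contains before clicking through tabs. The sketch below is only an illustration: `read_all_series` and `ALL_SERIES_JS` are names made up here, and whether PriceCharting puts each condition into its own series of the same chart is an assumption to verify by printing the returned keys. It relies only on the `chart.series`, `series.name` and `series.options.data` properties of the Highcharts API and on Selenium's `execute_script`.

```python
# JavaScript run in the page: collect every series of one Highcharts chart.
# arguments[0] is the chart index passed in from Python.
ALL_SERIES_JS = """
const charts = (window.Highcharts && window.Highcharts.charts) || [];
const chart = charts[arguments[0]];
if (!chart) { return null; }
return chart.series.map(s => ({name: s.name, data: s.options.data}));
"""


def read_all_series(driver, chart_index=0):
    """Return {series name: [price, ...]} for one chart, or {} if it is not ready.

    `driver` is any object with a Selenium-style execute_script method.
    Each data point is assumed to be a [timestamp, price] pair.
    """
    raw = driver.execute_script(ALL_SERIES_JS, chart_index)
    if not raw:
        return {}
    return {s["name"]: [point[1] for point in (s["data"] or [])] for s in raw}
```

With a live browser session this would be called as `read_all_series(driver)` after the page has loaded. If it returns a single key, the chart really is redrawn per tab and the click-and-wait loop is still needed.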
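2. A better option without Selenium: request the API endpoint directly
PriceCharting sends an AJAX request to fetch the data when you switch tabs. We can call that endpoint directly, with no need to render the whole page, which is more efficient.
Steps:
- Open the browser developer tools (F12) and switch to the "Network" tab.
- Click the different price tabs on the page and watch the XHR requests; you should see a request similar to /ajax/price-chart-data. Confirm the exact path and parameters there.
- Analyze the request parameters: product_id is the ID of the current game (it can be extracted from the page URL or source; for PAL NES Zelda it is 1026), and condition corresponds to the tab (loose/cib/new/graded/boxed/manual).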
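To make the parameters concrete, the following sketch composes one URL per condition so they can be compared against what the Network tab actually shows. It uses only the standard library. The endpoint path, the parameter names and the `type` value are taken from this answer and should be treated as unverified; `build_chart_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Path as cited in this answer; confirm it in the browser's Network tab
BASE_URL = "https://www.pricecharting.com/ajax/price-chart-data"
CONDITIONS = ["loose", "cib", "new", "graded", "boxed", "manual"]


def build_chart_urls(product_id, chart_type="completed-auctions"):
    """Return {condition: full request URL} for every condition."""
    urls = {}
    for condition in CONDITIONS:
        query = urlencode(
            {"product_id": product_id, "condition": condition, "type": chart_type}
        )
        urls[condition] = f"{BASE_URL}?{query}"
    return urls


if __name__ == "__main__":
    for condition, url in build_chart_urls(1026).items():
        print(condition, url)
```

If the URLs printed here do not match the requests seen in the developer tools, the requests in the tools are the ones to copy.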
Code example:
    import pandas as pd
    import requests

    # Basic request configuration
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "Referer": "https://www.pricecharting.com/game/pal-nes/legend-of-zelda",
    }
    # Confirm the path and parameters in the Network tab before relying on them
    base_url = "https://www.pricecharting.com/ajax/price-chart-data"
    product_id = 1026  # taken from the page source or the network requests
    condition_mapping = {
        "Loose": "loose",
        "CIB": "cib",
        "New": "new",
        "Graded": "graded",
        "Boxed": "boxed",
        "Manual": "manual",
    }
    all_data = {}

    for label, condition in condition_mapping.items():
        params = {
            "product_id": product_id,
            "condition": condition,
            "type": "completed-auctions",  # the "Completed Auctions" chart on the page
        }
        try:
            response = requests.get(base_url, headers=headers, params=params, timeout=10)
            response.raise_for_status()  # raise on HTTP error status codes
            data = response.json()
            # Extract the prices (data format: [[timestamp, price], ...])
            all_data[label] = [item[1] for item in data["series"][0]["data"]]
            print(f"Fetched data for the {label} tab")
        except Exception as e:
            print(f"Failed to fetch data for the {label} tab: {e}")

    # Wrap each list in a Series so columns of different lengths are padded with NaN
    df = pd.DataFrame({label: pd.Series(values) for label, values in all_data.items()})
    print(df.head())
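Key notes:
- Include User-Agent and Referer in the request headers to reduce the chance of being blocked by the site's anti-scraping measures.
- product_id can be found by viewing the page source (search for product_id) or in the network request parameters.
- The JSON returned by the endpoint has a clear structure: read the data of the relevant series directly, with no page rendering involved.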
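Because each condition can have a different number of data points, putting the price lists side by side by position may pair up prices from different dates. A minimal sketch of aligning them on the timestamp instead is shown below. It assumes the points are [timestamp, price] pairs with timestamps in milliseconds since the Unix epoch, which is what Highcharts datetime axes use; `to_dated_prices` and `align_by_date` are illustrative names, and the sample values in the usage note are made up.

```python
from datetime import datetime, timezone


def to_dated_prices(points):
    """Convert [[timestamp_ms, price], ...] into {date: price}."""
    return {
        datetime.fromtimestamp(ts / 1000, tz=timezone.utc).date(): price
        for ts, price in points
    }


def align_by_date(series_by_label):
    """Merge {label: points} into one row per date; missing prices become None."""
    dated = {label: to_dated_prices(pts) for label, pts in series_by_label.items()}
    all_dates = sorted(set().union(*dated.values()))
    return [
        {"date": d.isoformat(), **{label: prices.get(d) for label, prices in dated.items()}}
        for d in all_dates
    ]
```

For example, `align_by_date({"Loose": [[1609459200000, 100]], "CIB": []})` yields one row for 2021-01-01 with `None` in the CIB column. The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)` if a DataFrame is wanted.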
The question originates from Stack Exchange; it was asked by Lee Roy.




