Highcharts scraping problem: unable to get all the data points in a chart

阿华AIGC实验室

2026-4-27

Question: how to scrape the multi-tab Highcharts data (Loose/CIB/New, etc.) on PriceCharting

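I am trying to scrape the Highcharts chart data for "The Legend of Zelda" (PAL NES version) on PriceCharting. The link is: https://www.pricecharting.com/game/pal-nes/legend-of-zelda. My current code only gets the data for the "Loose" tab, which is shown by default, but I also need the chart data for the other options: "CIB", "New", "Graded", "Boxed" and "Manual". I don't know how to handle that. Here is my current code, which only works for "Loose":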

import time
import pandas as pd
from selenium import webdriver
# geckodriver drives Firefox, so the Service class comes from the firefox package
from selenium.webdriver.firefox.service import Service

service = Service(executable_path="/driver_selenium/geckodriver.exe")
driver = webdriver.Firefox(service=service)
website = "https://www.pricecharting.com/game/pal-nes/legend-of-zelda#completed-auctions-graded"
driver.get(website)
time.sleep(5)  # fixed wait for the page and chart to load
# Each point is [timestamp, price]; keep only the prices of the first series
temp = driver.execute_script('return window.Highcharts.charts[0].series[0].options.data')
data = [item[1] for item in temp]
print(data)

If you can suggest a better way to extract the data without Selenium (similar to the approach in "Scrape highchart into python"), I would be extra grateful!

Solution

1. Improving the Selenium code to extract data for every tab

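The core of the problem is that the page re-renders the Highcharts data after a tab switch, so we need to simulate a click on each tab, wait for the chart to update, and then extract the corresponding data. Using WebDriverWait instead of time.sleep makes the waiting more efficient and reliable.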

Code example:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service  # geckodriver drives Firefox
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Start the browser
service = Service(executable_path="/driver_selenium/geckodriver.exe")
driver = webdriver.Firefox(service=service)
website = "https://www.pricecharting.com/game/pal-nes/legend-of-zelda#completed-auctions-graded"
driver.get(website)

# All the tabs to collect
condition_labels = ["Loose", "CIB", "New", "Graded", "Boxed", "Manual"]
all_data = {}

wait = WebDriverWait(driver, 10)

for label in condition_labels:
    try:
        # Click the tab button (this XPath assumes each tab is a <button> whose text is exactly the label)
        tab_button = wait.until(EC.element_to_be_clickable((By.XPATH, f"//button[text()='{label}']")))
        tab_button.click()

        # Wait until Highcharts exists and the first series holds data.
        # This check alone does not prove the chart has already switched to the new tab.
        wait.until(lambda d: d.execute_script('return window.Highcharts && window.Highcharts.charts[0] && window.Highcharts.charts[0].series[0].options.data.length > 0'))

        # Extract the data for the current tab; each point is [timestamp, price]
        temp = driver.execute_script('return window.Highcharts.charts[0].series[0].options.data')
        price_data = [item[1] for item in temp]
        all_data[label] = price_data
        print(f"Got the data for the {label} tab")
    except Exception as e:
        print(f"Failed to get the data for the {label} tab: {e}")

# The series may differ in length, so wrap each list in a Series before building the DataFrame
df = pd.DataFrame({label: pd.Series(prices) for label, prices in all_data.items()})
print(df.head())

driver.quit()

Key points:

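WebDriverWait waits for the element to be clickable and for the chart to load, which avoids the inefficiency of a fixed sleep
Each tab button is located by XPath on its visible text; adjust the expression if the page's markup differs
The loop goes through every required tab and stores the data in one dictionary for unified handling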
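
One caveat with the wait used above: a "data is non-empty" condition is already true before the click, so it can return the previous tab's data. A minimal sketch of a stricter wait follows, polling until the data differs from what was seen before the click. The helper name wait_for_change is hypothetical, and the lambda stands in for the driver.execute_script call so the sketch runs without a browser.

```python
import time


def wait_for_change(fetch, previous, timeout=10.0, interval=0.25):
    """Poll fetch() until it returns a non-empty value different from previous.

    Returns the new value, or raises TimeoutError if nothing changed in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = fetch()
        if current and current != previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("chart data did not change before the timeout")
        time.sleep(interval)


# Stand-in for driver.execute_script(...): stale data twice, then new data.
_responses = iter([[[1, 10.0]], [[1, 10.0]], [[1, 25.0], [2, 26.0]]])
fresh = wait_for_change(lambda: next(_responses), previous=[[1, 10.0]], interval=0.01)
print(fresh)  # [[1, 25.0], [2, 26.0]]
```

With Selenium, fetch would be a lambda wrapping driver.execute_script('return window.Highcharts.charts[0].series[0].options.data'), and previous would be the data read just before tab_button.click(). If two tabs happen to hold identical data, this wait times out, so treat a timeout as "unchanged" rather than as a hard failure.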

2. A better approach without Selenium: requesting the API endpoint directly

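PriceCharting sends an AJAX request to fetch the data when a tab is switched. We can call that endpoint directly without rendering the whole page, which is more efficient. Confirm the exact request in the browser first, following the steps in this section.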

Steps:

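Open the browser developer tools (F12) and switch to the "Network" tab
Click the different price tabs on the page and watch the XHR requests; you should see a request similar to /ajax/price-chart-data
Analyze the request parameters: product_id is the ID of the current game (it can be taken from the page URL or source; for PAL NES Zelda it is 1026), and condition corresponds to the tab (loose/cib/new/graded/boxed/manual)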
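
To compare against what the Network panel shows, the URLs can be built and printed before any request is sent. This is a small sketch that only uses the endpoint path, product_id and condition values quoted in the steps above; they are taken from this answer and should be checked against the real requests. The function name build_chart_url is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint and ID as quoted in the steps above; verify them in the Network panel.
BASE_URL = "https://www.pricecharting.com/ajax/price-chart-data"
PRODUCT_ID = 1026


def build_chart_url(condition, product_id=PRODUCT_ID):
    """Return the full request URL for one condition tab."""
    query = urlencode({"product_id": product_id, "condition": condition})
    return f"{BASE_URL}?{query}"


for condition in ["loose", "cib", "new", "graded", "boxed", "manual"]:
    print(build_chart_url(condition))
```

If the printed URLs do not match the requests the browser actually makes, adjust the path and parameter names to the observed ones before writing any scraping code.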

Code example:

import requests
import pandas as pd

# Basic request configuration
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Referer": "https://www.pricecharting.com/game/pal-nes/legend-of-zelda"
}
base_url = "https://www.pricecharting.com/ajax/price-chart-data"
product_id = 1026  # taken from the page source or the network requests
condition_mapping = {
    "Loose": "loose",
    "CIB": "cib",
    "New": "new",
    "Graded": "graded",
    "Boxed": "boxed",
    "Manual": "manual"
}

all_data = {}

for label, condition in condition_mapping.items():
    params = {
        "product_id": product_id,
        "condition": condition,
        "type": "completed-auctions"  # corresponds to the "Completed Auctions" chart on the page
    }

    try:
        # A timeout keeps the script from hanging if the server does not respond
        response = requests.get(base_url, headers=headers, params=params, timeout=10)
        response.raise_for_status()  # raise if the request failed
        data = response.json()

        # Extract the prices (the data format is [[timestamp, price], ...])
        price_data = [item[1] for item in data["series"][0]["data"]]
        all_data[label] = price_data
        print(f"Got the data for the {label} tab")
    except Exception as e:
        print(f"Failed to get the data for the {label} tab: {e}")

# The series may differ in length, so wrap each list in a Series before building the DataFrame
df = pd.DataFrame({label: pd.Series(prices) for label, prices in all_data.items()})
print(df.head())
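
Keeping only the prices drops the dates, and the conditions may not share the same dates. A minimal sketch of aligning the series on their timestamps instead is shown here. It assumes the [[timestamp, price], ...] format mentioned above with timestamps in milliseconds since the Unix epoch (the usual Highcharts convention); the function name merge_by_timestamp and the sample values are made up for illustration.

```python
from datetime import datetime, timezone


def merge_by_timestamp(series_by_label):
    """Align several [[timestamp_ms, price], ...] series on their timestamps."""
    labels = list(series_by_label)
    by_ts = {}
    for label, points in series_by_label.items():
        for ts_ms, price in points:
            by_ts.setdefault(ts_ms, {})[label] = price

    rows = []
    for ts_ms in sorted(by_ts):
        # Assumes millisecond timestamps; divide by 1000 to get seconds
        date = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date().isoformat()
        row = {"date": date}
        for label in labels:
            row[label] = by_ts[ts_ms].get(label)  # None where a series has no point
        rows.append(row)
    return rows


# Illustrative values only, not real PriceCharting data
sample = {
    "Loose": [[1704067200000, 30.0], [1706745600000, 32.5]],
    "CIB": [[1706745600000, 80.0]],
}
for row in merge_by_timestamp(sample):
    print(row)
```

The resulting list of dictionaries can be passed to pd.DataFrame(rows) to get one row per date and one column per condition, with gaps where a condition has no sale on that date.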