Why do pytrends values differ from the Google Trends UI? (Normalized data)

阿华AIGC实验室

2026-6-12

Problem: data collected with pytrends does not match the Google Trends web UI

Symptoms

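pytrends was used to collect five years (today 5-y) of trend data for classic films such as Gone with the Wind and Casablanca in the United States, France, and the United Kingdom, with custom headers to avoid 429 errors.
Even with the time range, region, and keywords matched exactly, the resulting DataFrame differs from the web UI in several ways:
- Peaks fall in different weeks
- Values are scaled differently
- The relative levels between keywords differ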
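
One way to make these differences concrete is to line a pytrends series up against the CSV exported from the web UI and measure the gap. The sketch below is only an illustration: `compare_series` is a hypothetical helper, and the two series are made-up sample values standing in for one pytrends column and the matching UI export, both indexed by week start date.

```python
import pandas as pd

def compare_series(api: pd.Series, ui: pd.Series) -> dict:
    """Summarize how two weekly interest series differ (hypothetical helper)."""
    # Keep only the weeks that are present in both sources
    joined = pd.concat({"api": api, "ui": ui}, axis=1).dropna()
    peak_api = joined["api"].idxmax()
    peak_ui = joined["ui"].idxmax()
    return {
        "weeks_compared": len(joined),
        # Negative means the pytrends peak comes earlier than the UI peak
        "peak_shift_days": (peak_api - peak_ui).days,
        "max_abs_diff": float((joined["api"] - joined["ui"]).abs().max()),
        "correlation": float(joined["api"].corr(joined["ui"])),
    }

# Made-up sample data: four consecutive weeks starting on a Sunday
weeks = pd.date_range("2024-01-07", periods=4, freq="7D")
api_values = pd.Series([40, 100, 60, 20], index=weeks)
ui_values = pd.Series([35, 70, 100, 25], index=weeks)

report = compare_series(api_values, ui_values)
print(report)
```

A nonzero `peak_shift_days` corresponds to the first symptom. A large `max_abs_diff` together with a high correlation would suggest a scaling difference rather than different underlying data.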

Root causes

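pytrends relies on an unofficial Google Trends API, and its results can differ from the web UI for the following reasons:

Data sampling: Google Trends is computed from a sample of searches, so two requests for the same query can return slightly different values. The web UI may also sample at a finer granularity, whereas the API returns aggregated data and loses some detail.
Batch scaling: when pytrends queries several keywords in one batch, values are scaled against the highest interest in that batch, so they only line up with the web UI if it is scaled against the same baseline.
Time zone alignment: the original code set tz=360 (an offset in minutes; pytrends documents 360 as US Central Time) for every country, which does not match the time zone the web UI applies for each country and shifts the dates of the weekly data points.
Caching and freshness: the web UI shows the latest data, whereas the API may return cached, aggregated data.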
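
The time zone point is easy to get wrong because of the sign. According to the pytrends documentation, `tz` is an offset in minutes and US Central Time (UTC-6) is written as 360, so the value is the negated UTC offset. Below is a minimal sketch of that conversion; `pytrends_tz` is a hypothetical helper name, and the offsets are standard-time values that ignore daylight saving.

```python
from datetime import timedelta

def pytrends_tz(utc_offset: timedelta) -> int:
    """Convert a UTC offset into the minute offset pytrends expects for tz.

    The sign is flipped: zones west of UTC give positive values.
    """
    return -int(utc_offset.total_seconds() // 60)

# Standard-time offsets for the three regions in this post (assumed; DST ignored)
standard_offsets = {
    "US": timedelta(hours=-5),  # US Eastern Standard Time
    "FR": timedelta(hours=1),   # Central European Time
    "GB": timedelta(hours=0),   # Greenwich Mean Time
}

tz_values = {code: pytrends_tz(offset) for code, offset in standard_offsets.items()}
print(tz_values)
```

Countries that span several time zones, such as the United States, have no single correct value, so the choice of Eastern Time here is itself an assumption.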

Solution

To address these issues, the code is adjusted as follows:

Adjusted code

import pandas as pd
from pytrends.request import TrendReq as UTrendReq
import time
import random
from functools import reduce

# Reference keyword intended for cross-query normalization (not used further down in this script)
REFERENCE_MOVIE = "Gone with the Wind"
movies = [
    "Gone with the Wind",
    "Casablanca",
    "The Godfather",
    "Citizen Kane",
    "The Sound of Music",
    "12 Angry Men",
    "Psycho",
    "Singin' in the Rain"
]

countries = {
    "US": "United States",
    "FR": "France",
    "GB": "United Kingdom"
}

category = 0
timeframe = "today 5-y"
gprop = ""

# Subclass that sends browser-like headers with every request to reduce 429 responses
class TrendReq(UTrendReq):
    def _get_data(self, url, method='get', trim_chars=0, **kwargs):
        headers = {
            'accept': 'application/json, text/plain, */*',
            'accept-language': 'en-US,en;q=0.9',
            'content-type': 'application/json;charset=UTF-8',
            'origin': 'https://trends.google.com',
            'referer': 'https://trends.google.com/trends/',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
        }
        return super()._get_data(url, method=method, trim_chars=trim_chars, headers=headers, **kwargs)

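# Match the time zone used by each country's web UI.
# pytrends' tz is an offset in minutes with the sign flipped relative to UTC
# (the pytrends docs give 360 for US Central Time, UTC-6).
tz_map = {
    "US": 300,   # UTC-5 (US Eastern Standard Time)
    "FR": -60,   # UTC+1 (Central European Time, France)
    "GB": 0      # UTC+0 (Greenwich Mean Time, United Kingdom)
}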

all_dfs = []

for country_code, country_name in countries.items():
    # Use the time zone that matches the current country
    pytrends = TrendReq(hl='en-US', tz=tz_map[country_code])

    # Query one keyword at a time so batch scaling does not interfere
    for movie in movies:
        for attempt in range(3):
            try:
                pytrends.build_payload([movie], cat=category, timeframe=timeframe, geo=country_code, gprop=gprop)
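                df = pytrends.interest_over_time().drop(columns=['isPartial'], errors='ignore')
                if df.empty:
                    # No data for this keyword and region; retrying will not help
                    print(f"No data for {movie} in {country_name}")
                    break
                df = df.rename(columns={movie: f"{movie}: ({country_name})"})
                df = df.reset_index()
                all_dfs.append(df)
                break
            except Exception as e:
                # Report the failure instead of swallowing it silently
                print(f"Attempt {attempt + 1} failed for {movie} ({country_code}): {e}")
                if attempt < 2:
                    # Back off before retrying
                    time.sleep(random.uniform(5, 10))
        # Pause between keywords to stay under the rate limit
        time.sleep(random.uniform(5, 10))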

if all_dfs:
    result = reduce(lambda left, right: pd.merge(left, right, on="date", how="outer"), all_dfs)
    result = result.rename(columns={"date": "Week"})

    # Build a two-row header: movie titles on the first row, countries on the second
    movie_row = [""] + [col.split(": (")[0] for col in result.columns if col != "Week"]
    country_row = ["Week"] + [col.split(": (")[1][:-1] for col in result.columns if col != "Week"]

    final_df = pd.DataFrame([movie_row, country_row], columns=result.columns)
    final_df = pd.concat([final_df, result], ignore_index=True)

    final_df.to_csv("movie_trends_interest_over_time.csv", header=False, index=False)
else:
    print("No data collected.")
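
Querying one keyword at a time means each series is scaled to its own peak, so the columns in the CSV are not directly comparable with each other. The script defines `REFERENCE_MOVIE` but does not use it. One common way to restore comparability, sketched here under assumptions, is to include the reference keyword in every request and rescale each batch so that the reference peaks at the same value. `rescale_to_reference` and `combine_batches` are hypothetical helpers, and the two batches are made-up values standing in for `interest_over_time()` results.

```python
import pandas as pd

REFERENCE_MOVIE = "Gone with the Wind"

def rescale_to_reference(batch: pd.DataFrame, reference: str,
                         target_peak: float = 100.0) -> pd.DataFrame:
    """Rescale one batch so the reference column peaks at target_peak."""
    ref_peak = batch[reference].max()
    if ref_peak <= 0:
        raise ValueError("reference keyword has no signal in this batch")
    return batch * (target_peak / ref_peak)

def combine_batches(batches: list, reference: str) -> pd.DataFrame:
    """Put several batches that share a reference column on one common scale."""
    rescaled = [rescale_to_reference(b, reference) for b in batches]
    combined = rescaled[0]
    for extra in rescaled[1:]:
        # The reference column is already present, so drop the duplicate
        combined = combined.join(extra.drop(columns=[reference]), how="outer")
    return combined

# Made-up sample data: the same reference series, scaled differently per batch
weeks = pd.date_range("2024-01-07", periods=3, freq="7D")
batch_a = pd.DataFrame(
    {REFERENCE_MOVIE: [50, 100, 25], "Casablanca": [20, 40, 10]}, index=weeks
)
batch_b = pd.DataFrame(
    {REFERENCE_MOVIE: [10, 20, 5], "The Godfather": [100, 60, 80]}, index=weeks
)

combined = combine_batches([batch_a, batch_b], REFERENCE_MOVIE)
print(combined)
```

After rescaling, values can exceed 100 because they are expressed relative to the reference peak. Google Trends reports rounded integers, so a reference keyword that is much weaker than the other terms in its batch loses precision; a reference of similar popularity to the terms it is paired with tends to work better.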