How do I get a paper's citation count for the past 12 months? Problems hit while using the APIs

阿华AIGC实验室

2026-6-1

Solution: getting a paper's citation count for a specific time range

Option 1: segmented Semantic Scholar API queries to work around the 10k limit

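Semantic Scholar does not allow offset + limit to exceed 10,000, but the target window can be split into several shorter periods, each queried separately, with the counts summed at the end. With "Attention is All you Need" at about 7k citations in a single year, a quarterly split gives roughly 1-2k per quarter, well below the cap. If a segment ever comes close to 10,000 results, split it further (for example, by month).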

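Steps:
1. Split 2025.4.25-2026.4.26 into four quarterly windows that do not share a boundary day (otherwise papers dated on a boundary are counted twice), for example 2025.4.25-2025.7.24, 2025.7.25-2025.10.24, 2025.10.25-2026.1.24, and 2026.1.25-2026.4.26.
2. Call the Semantic Scholar citations endpoint once per window, restrict the citing papers to that window by date, and page through the window with offset + limit. The date restriction is expressed here as the publicationDateOrYear=startDate:endDate query parameter, which filters on the citing paper's publication date; check the parameter name against the current Graph API reference, since filter=citationDate:[startDate TO endDate] is not a documented form.
3. Add up the counts of all windows to get the total.
Python snippet: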

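import requests

def get_citations_in_range(paper_id, start_date, end_date):
    """Count citing papers published between start_date and end_date (inclusive)."""
    api_url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations"
    headers = {"Accept": "application/json"}
    total = 0
    offset = 0
    limit = 1000  # at most 1000 records per request
    while True:
        params = {
            "offset": offset,
            "limit": limit,
            # Assumption: date filter on the citing paper's publication date;
            # verify the parameter name in the current API reference.
            "publicationDateOrYear": f"{start_date}:{end_date}",
            "fields": "publicationDate",
        }
        response = requests.get(api_url, headers=headers, params=params, timeout=30)
        response.raise_for_status()  # fail loudly instead of returning a partial count
        data = response.json()
        total += len(data.get("data", []))
        # The response carries the next offset in "next"; it is absent on the last page.
        if data.get("next") is None:
            break
        offset = data["next"]
    return total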

# "Attention is All you Need", referenced by its arXiv ID (the API accepts the ARXIV: prefix)
paper_id = "ARXIV:1706.03762"
# Inclusive windows that do not share a boundary day, so no paper is counted twice
time_ranges = [
    ("2025-04-25", "2025-07-24"),
    ("2025-07-25", "2025-10-24"),
    ("2025-10-25", "2026-01-24"),
    ("2026-01-25", "2026-04-26"),
]

total_citations = sum(get_citations_in_range(paper_id, s, e) for s, e in time_ranges)
print(f"Total citations over the past 12 months: {total_citations}")
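
The quarterly boundaries above are fixed by hand and assume every quarter stays under the 10,000-result cap. As a minimal sketch, the window can instead be bisected whenever a segment reaches the cap. Here `count_fn` is a hypothetical callable (for example, a thin wrapper around `get_citations_in_range` that formats the dates as strings) and `cap` is an assumed threshold; neither is part of the Semantic Scholar API.

```python
from datetime import date, timedelta

def split_window(start, end):
    """Split the inclusive range [start, end] into two non-overlapping halves."""
    mid = start + timedelta(days=(end - start).days // 2)
    return (start, mid), (mid + timedelta(days=1), end)

def count_with_splitting(count_fn, start, end, cap=10_000):
    """Sum counts over [start, end], bisecting any window whose count reaches cap.

    count_fn(start, end) is a hypothetical callable returning the number of
    citations in the inclusive window; a result >= cap is treated as truncated.
    """
    n = count_fn(start, end)
    if n < cap or start == end:
        return n  # window is complete, or a single day cannot be split further
    left, right = split_window(start, end)
    return (count_with_splitting(count_fn, *left, cap=cap)
            + count_with_splitting(count_fn, *right, cap=cap))
```

Bisecting only when needed keeps the number of requests low for sparsely cited papers while still handling windows that turn out to be denser than expected.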

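Option 2: read yearly/monthly figures from a Semantic Scholar statistics field

This option relies on Semantic Scholar returning a paper's per-year and per-month citation statistics directly, which avoids iterating over every citation record and is more efficient.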

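Call https://api.semanticscholar.org/graph/v1/paper/{paper_id}?fields=citationStats and read the yearly and monthly data from citationStats in the response. Check the field list in the current Graph API reference first: the documented count field is citationCount, and a request for an unsupported field returns an error, in which case fall back to Option 1.
For the 2025.4.25-2026.4.26 window, add the monthly counts for April-December 2025 to those for January-April 2026. This counts April 2025 and April 2026 as whole months, so when day-level precision matters, correct the two boundary months with short segments from Option 1.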
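
The month-level arithmetic above can be sketched as follows. The `monthly` mapping from `"YYYY-MM"` keys to counts is an assumed shape for illustration, not a documented response format, and the numbers are made up.

```python
def sum_monthly(monthly, start_month, end_month):
    """Sum counts for the months in [start_month, end_month] (keys are 'YYYY-MM').

    Zero-padded 'YYYY-MM' strings sort chronologically, so plain string
    comparison is enough to select the range.
    """
    return sum(n for month, n in monthly.items()
               if start_month <= month <= end_month)

# Illustrative, made-up counts (not real data).
monthly = {"2025-03": 5, "2025-04": 10, "2025-12": 20,
           "2026-01": 30, "2026-04": 40, "2026-05": 50}
window_total = sum_monthly(monthly, "2025-04", "2026-04")
print(window_total)  # 10 + 20 + 30 + 40 = 100
```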

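Option 3: fix the OpenAlex query

When OpenAlex returns too few results, the likely cause is a wrong query parameter or incomplete pagination:

Use the correct work ID: the OpenAlex ID given here for "Attention is All you Need" is W2741950973.
Set the date filter correctly: filter=cites:{paper_id},from_publication_date:{start_date},to_publication_date:{end_date}, with dates in YYYY-MM-DD format. Commas separate individual filters in OpenAlex, so a range cannot be written as publication_date:{start_date},{end_date}.
Page with OpenAlex's cursor rather than an offset, so that the full result set is retrieved without gaps.
Python snippet: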

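import requests

def get_openalex_citations(paper_id, start_date, end_date):
    """Count works that cite paper_id and were published in [start_date, end_date]."""
    api_url = "https://api.openalex.org/works"
    headers = {"Accept": "application/json"}
    # Commas join filters with AND; a date range needs the from_/to_ filters.
    work_filter = (f"cites:{paper_id},from_publication_date:{start_date},"
                   f"to_publication_date:{end_date}")
    total = 0
    cursor = "*"  # "*" starts cursor pagination
    while cursor:
        params = {"filter": work_filter, "cursor": cursor, "per-page": 200}
        response = requests.get(api_url, headers=headers, params=params, timeout=30)
        response.raise_for_status()  # fail loudly instead of returning a partial count
        data = response.json()
        # Count the works on this page. meta.count is the total for the whole
        # query, so adding it on every page would overcount.
        total += len(data["results"])
        cursor = data["meta"].get("next_cursor")  # None after the last page
    return total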

paper_id = "W2741950973"
total = get_openalex_citations(paper_id, "2025-04-25", "2026-04-26")
print(f"Citations over the past 12 months according to OpenAlex: {total}")
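
Because OpenAlex reports the total number of matching works in `meta.count`, a count-only query may not need to page at all. The following is a minimal sketch using only the standard library; the helper names `build_count_url` and `fetch_count` are hypothetical, and the work ID is the one quoted above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_count_url(work_id, start_date, end_date):
    """Build an OpenAlex /works URL matching works that cite work_id in a date range."""
    work_filter = ",".join([
        f"cites:{work_id}",
        f"from_publication_date:{start_date}",
        f"to_publication_date:{end_date}",
    ])
    # per-page=1 keeps the payload small; only meta.count is needed.
    return "https://api.openalex.org/works?" + urlencode(
        {"filter": work_filter, "per-page": 1})

def fetch_count(work_id, start_date, end_date):
    """Return meta.count for the filtered query (performs one HTTP request)."""
    url = build_count_url(work_id, start_date, end_date)
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)["meta"]["count"]

url = build_count_url("W2741950973", "2025-04-25", "2026-04-26")
print(url)
```

Cursor paging remains the right tool when the citing works themselves are needed; for a bare count, a single request is cheaper and avoids pagination mistakes.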