How do you use Python with the Wikipedia API to get the categories of an existing page?

阿华AIGC实验室

2026-5-15

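Yes, the Wikipedia API can handle this.

Compared with requesting the page directly with urlopen and checking the status code, using Wikipedia's MediaWiki API for both the existence check and the category lookup is the cleaner approach. It is the officially provided interface, so it is more stable and you are on firmer ground with respect to the site's usage rules. The approach and sample code follow.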

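Core idea

The query module of the Wikipedia API can handle both tasks in a single request: pass the title you want to look up, add prop=categories to fetch its categories, and set format=json so the response is easy to parse. A title that does not exist comes back flagged as missing, so the same response also tells you whether the page exists.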
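To make the response shape concrete, here is a minimal sketch of parsing it. The helper name summarize_pages and both sample dictionaries are made up for illustration: they are hand-written to mirror the general layout of a format=json reply (a "missing" key on a page that does not exist, a "categories" list on one that does), not captured API output.

```python
def summarize_pages(response_data):
    """Map each returned title to its category names, or None if the page is missing.

    summarize_pages is a hypothetical helper written for this illustration.
    """
    result = {}
    for page in response_data.get("query", {}).get("pages", {}).values():
        if "missing" in page or "invalid" in page:
            result[page.get("title")] = None
            continue
        # A page with no visible categories simply has no "categories" key
        result[page["title"]] = [
            cat["title"].split(":", 1)[-1] for cat in page.get("categories", [])
        ]
    return result


# Hand-written samples (placeholder IDs and titles), not real API output
sample_existing = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "ns": 0,
                "title": "Example page",
                "categories": [
                    {"ns": 14, "title": "Category:Sample category A"},
                    {"ns": 14, "title": "Category:Sample category B"},
                ],
            }
        }
    }
}
sample_missing = {
    "query": {"pages": {"-1": {"ns": 0, "title": "No such page", "missing": ""}}}
}

print(summarize_pages(sample_existing))
print(summarize_pages(sample_missing))
```

Returning None for a missing page and an empty list for a page without categories keeps the two cases distinguishable for the caller.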

Python code example (based on the `urllib` library you are already using)

import urllib.request
import urllib.parse
import json

term = "forensics"
api_base_url = "https://en.wikipedia.org/w/api.php"

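# Build the API request parameters
request_params = {
    "action": "query",
    "titles": term,
    "prop": "categories",
    "redirects": 1,  # Follow redirects so a redirect title resolves to its target article
    "cllimit": "max",  # Ask for the per-request maximum (lower it if you need fewer)
    "clshow": "!hidden",  # Exclude hidden categories used for internal maintenance
    "format": "json"
}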

# Encode the parameters and build the full request URL
encoded_params = urllib.parse.urlencode(request_params)
full_request_url = f"{api_base_url}?{encoded_params}"

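try:
    # Wikimedia's User-Agent policy asks clients to identify themselves.
    # The value below is a placeholder; replace it with one describing your script.
    request = urllib.request.Request(
        full_request_url,
        headers={"User-Agent": "CategoryLookupExample/0.1 (placeholder contact: you@example.com)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response_data = json.loads(response.read().decode("utf-8"))

    # Only one title was requested, so "pages" holds a single entry
    pages = response_data["query"]["pages"]
    page = next(iter(pages.values()))

    if "missing" in page or "invalid" in page:
        print(f"The page '{term}' does not exist on Wikipedia")
    else:
        print(f"The page '{term}' exists. Its categories are:")
        # A page with no visible categories has no "categories" key, hence .get().
        # str.removeprefix needs Python 3.9+ and strips only the leading "Category:".
        category_list = [cat["title"].removeprefix("Category:") for cat in page.get("categories", [])]
        for category in category_list:
            print(f"- {category}")
except Exception as e:
    print(f"An error occurred during the request: {e}")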

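Key details

Existence check: in the pages object returned by the API, a title that does not exist is keyed by a negative placeholder ID (-1 for a single-title query) and carries a missing field; an existing page is keyed by its positive numeric page ID.
Category filtering: the clshow=!hidden parameter filters out the hidden categories used for internal administration, leaving only the public categories related to the page's topic.
Request etiquette: the Wikipedia API is rate-limited, so avoid sending many requests in a short period or you may be temporarily blocked. Send a descriptive User-Agent header, and for batch lookups join several titles with | in the titles parameter rather than issuing one request per title. When a result set does not fit in one response, the API returns a continue object; pass its values back with the next request to fetch the remaining results.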
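The continuation step can be sketched as a loop. This is a rough illustration under stated assumptions: fetch_all_categories and fake_fetch are hypothetical names, fetch stands for any callable that takes a parameter dict and returns the decoded JSON reply, and the canned replies in fake_fetch are hand-written stand-ins so the loop can be run without touching the network.

```python
def fetch_all_categories(fetch, title):
    """Collect category titles across continuation batches.

    fetch is a hypothetical callable: it takes a dict of request
    parameters and returns the decoded JSON response as a dict.
    """
    params = {
        "action": "query",
        "titles": title,
        "prop": "categories",
        "cllimit": "max",
        "clshow": "!hidden",
        "format": "json",
    }
    categories = []
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            for cat in page.get("categories", []):
                categories.append(cat["title"])
        if "continue" not in data:
            return categories
        # Merge the continuation values into the next request's parameters
        params = {**params, **data["continue"]}


def fake_fetch(params):
    """Stand-in for a real HTTP call: returns two hand-written batches."""
    if "clcontinue" not in params:
        return {
            "continue": {"clcontinue": "1|B", "continue": "||"},
            "query": {"pages": {"1": {"title": "T", "categories": [{"title": "Category:A"}]}}},
        }
    return {
        "query": {"pages": {"1": {"title": "T", "categories": [{"title": "Category:B"}]}}}
    }


print(fetch_all_categories(fake_fetch, "T"))
```

In real use, fetch could be a small wrapper around whichever HTTP client you prefer, for example a function that calls the API endpoint with the given parameters and returns the parsed JSON.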

A more concise alternative (using the `requests` library)

If you can install the requests library, the code becomes shorter and easier to read:

import requests

term = "forensics"
api_base_url = "https://en.wikipedia.org/w/api.php"

request_params = {
    "action": "query",
    "titles": term,
    "prop": "categories",
    "redirects": 1,  # Follow redirects so a redirect title resolves to its target article
    "cllimit": "max",
    "clshow": "!hidden",
    "format": "json"
}

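# The User-Agent value is a placeholder; replace it with one describing your script
headers = {"User-Agent": "CategoryLookupExample/0.1 (placeholder contact: you@example.com)"}
response = requests.get(api_base_url, params=request_params, headers=headers, timeout=10)
response.raise_for_status()  # Surface HTTP errors instead of failing later on bad JSON
response_data = response.json()

# Only one title was requested, so "pages" holds a single entry
pages = response_data["query"]["pages"]
page = next(iter(pages.values()))

if "missing" in page or "invalid" in page:
    print(f"The page '{term}' does not exist on Wikipedia")
else:
    print(f"The page '{term}' exists. Its categories are:")
    # A page with no visible categories has no "categories" key, hence .get()
    category_list = [cat["title"].removeprefix("Category:") for cat in page.get("categories", [])]
    for category in category_list:
        print(f"- {category}")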

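That covers both parts of the request: it checks whether the page exists and retrieves its topical categories.

The question comes from Stack Exchange; it was asked by J Cena.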