Retrieving Person Information with Python, SPARQL, DBpedia, and Wikidata

Ahua AIGC Lab (阿华AIGC实验室)

2026-5-19

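Processing a Bilingual Name CSV: Targeted DBpedia and Wikidata Queries and Result Merging

For your use case, here is an end-to-end workflow that reads names from a CSV, runs targeted queries against both knowledge bases, and merges the results. The steps are as follows: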

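1. Preparation: load and preprocess the CSV data

First, extract the names from the CSV, keeping the English and Hebrew versions in separate columns. pandas is the most convenient tool for this: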

import pandas as pd

# Load the CSV; the column names are assumed to be "english_name" and "hebrew_name"
df = pd.read_csv("your_names.csv")

# Strip surrounding whitespace so exact label matching does not fail
df["english_name"] = df["english_name"].str.strip()
df["hebrew_name"] = df["hebrew_name"].str.strip()

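2. Targeted DBpedia queries (by specific name)

DBpedia exposes a SPARQL endpoint. For each English or Hebrew name, build a query that matches rdfs:label exactly (same spelling, same language tag) to locate the entity: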

from SPARQLWrapper import SPARQLWrapper, JSON

def query_dbpedia(name, lang="en"):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    # Escape backslashes and quotes so the name stays a valid SPARQL string literal
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    # Build the query: match the label in the given language and fetch the wanted fields.
    # The rdfs: and dbo: prefixes are predefined on the public DBpedia endpoint.
    query = f"""
    SELECT ?person ?birthDate ?birthPlace ?deathDate ?deathPlace WHERE {{
        ?person rdfs:label "{safe_name}"@{lang}.
        OPTIONAL {{ ?person dbo:birthDate ?birthDate. }}
        OPTIONAL {{ ?person dbo:birthPlace ?birthPlace. }}
        OPTIONAL {{ ?person dbo:deathDate ?deathDate. }}
        OPTIONAL {{ ?person dbo:deathPlace ?deathPlace. }}
    }}
    LIMIT 1
    """
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Keep only the first match; OPTIONAL fields may be absent from the binding
    if results["results"]["bindings"]:
        res = results["results"]["bindings"][0]
        return {
            "english_dbpedia_link": res["person"]["value"],
            "birth_date": res.get("birthDate", {}).get("value"),
            "birth_place": res.get("birthPlace", {}).get("value"),
            "death_date": res.get("deathDate", {}).get("value"),
            "death_place": res.get("deathPlace", {}).get("value"),
            "source": "DBpedia"
        }
    return None

Tip: if DBpedia has a Hebrew label for the person, pass lang="he" to try matching on the Hebrew name instead.
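
Because the name is pasted straight into the query text, a name containing a double quote or a backslash would produce an invalid query. The following is a minimal sketch of one way to guard against that; the helper names escape_sparql_literal and build_label_pattern are hypothetical and not part of SPARQLWrapper.

```python
def escape_sparql_literal(text: str) -> str:
    """Escape characters that would break a double-quoted SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")   # backslash first, so later escapes are not doubled
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
    )


def build_label_pattern(name: str, lang: str = "en") -> str:
    """Return a triple pattern that matches an exact rdfs:label in one language."""
    return f'?person rdfs:label "{escape_sparql_literal(name)}"@{lang} .'


# A nickname in quotes no longer terminates the literal early
print(build_label_pattern('Dwayne "The Rock" Johnson'))
# Hebrew text passes through unchanged; only the language tag differs
print(build_label_pattern("דוד בן-גוריון", lang="he"))
```

The returned pattern can be dropped into the WHERE clause in place of the hand-formatted label line.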

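3. Targeted Wikidata queries (by specific name)

Wikidata is also queried with SPARQL and stores labels in many languages, so you can pass both the English and the Hebrew name to improve match accuracy: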

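def query_wikidata(english_name, hebrew_name):
    # Wikimedia asks clients to send a descriptive User-Agent; replace this placeholder with your own
    sparql = SPARQLWrapper(
        "https://query.wikidata.org/sparql",
        agent="person-info-script/0.1 (contact: you@example.com)",
    )

    def esc(text):
        # Escape backslashes and quotes so the name stays a valid SPARQL string literal
        return text.replace("\\", "\\\\").replace('"', '\\"')

    # A matching Hebrew label is not required; it only ranks that candidate first.
    # Empty CSV cells arrive as NaN (a float), so they are skipped here.
    he_clause = ""
    if isinstance(hebrew_name, str) and hebrew_name:
        he_clause = f'OPTIONAL {{ ?person rdfs:label "{esc(hebrew_name)}"@he. BIND(1 AS ?heMatch) }}'

    query = f"""
    SELECT ?person ?birthDate ?birthPlace ?deathDate ?deathPlace WHERE {{
        ?person rdfs:label "{esc(english_name)}"@en.
        {he_clause}
        OPTIONAL {{ ?person wdt:P569 ?birthDate. }}
        OPTIONAL {{ ?person wdt:P19 ?birthPlace. }}
        OPTIONAL {{ ?person wdt:P570 ?deathDate. }}
        OPTIONAL {{ ?person wdt:P20 ?deathPlace. }}
    }}
    ORDER BY DESC(?heMatch)
    LIMIT 1
    """
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    if results["results"]["bindings"]:
        res = results["results"]["bindings"][0]
        return {
            "birth_date": res.get("birthDate", {}).get("value"),
            "birth_place": res.get("birthPlace", {}).get("value"),
            "death_date": res.get("deathDate", {}).get("value"),
            "death_place": res.get("deathPlace", {}).get("value"),
            "source": "Wikidata"
        }
    return None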
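
The two endpoints do not necessarily format dates the same way: Wikidata's P569 and P570 values typically come back as full timestamps such as 1879-03-14T00:00:00Z, while DBpedia's dbo:birthDate is usually a plain date. If you plan to compare or mix values from both sources, a small normalizer helps. This is a sketch under that assumption, and normalize_date is a hypothetical helper name.

```python
from typing import Optional


def normalize_date(value: Optional[str]) -> Optional[str]:
    """Reduce an ISO-style timestamp to its date part; pass plain dates through."""
    if not value:
        return None
    # Everything before the first "T" is the calendar date
    return value.split("T", 1)[0]


print(normalize_date("1879-03-14T00:00:00Z"))
print(normalize_date("1879-03-14"))
print(normalize_date(None))
```

Applying it to both birth_date and death_date before merging keeps the output column in one format.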

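Note: Wikidata property IDs are fixed (for example, P569 is date of birth). The place fields come back as entity URIs, so if you need readable place names, add ?birthPlace rdfs:label ?birthPlaceName to the query together with a language filter such as FILTER(LANG(?birthPlaceName) = "en").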
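
As a rough illustration of that note, the sketch below builds a query that returns a person's birthplace together with its label in a chosen language. It assumes you already know the entity's Wikidata ID; the function name build_birthplace_query and its qid parameter are hypothetical, and Q937 (Albert Einstein) is used only as a sample ID.

```python
import re


def build_birthplace_query(qid: str, lang: str = "en") -> str:
    """Build a SPARQL query for one entity's birthplace and its label in `lang`."""
    # Reject anything that is not a plain item ID before placing it in the query
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    if not re.fullmatch(r"[A-Za-z]{2,3}(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"not a language tag: {lang!r}")
    return f"""
    SELECT ?birthPlace ?birthPlaceName WHERE {{
        wd:{qid} wdt:P19 ?birthPlace .
        OPTIONAL {{
            ?birthPlace rdfs:label ?birthPlaceName .
            FILTER(LANG(?birthPlaceName) = "{lang}")
        }}
    }}
    """


print(build_birthplace_query("Q937", lang="he"))
```

The label lookup sits inside OPTIONAL so that a place without a label in the requested language still returns its URI.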

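4. Merge the query results and write the output

Loop over every row of the CSV, query both knowledge bases, then merge the results (the link comes from DBpedia, and Wikidata fills in any field DBpedia lacks):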

final_results = []

for idx, row in df.iterrows():
    eng_name = row["english_name"]
    heb_name = row["hebrew_name"]

    # Query DBpedia by English name
    dbpedia_res = query_dbpedia(eng_name)
    # If the English name found nothing, fall back to the Hebrew name
    # (empty cells are read as NaN, so only try when it is a real string)
    if not dbpedia_res and isinstance(heb_name, str):
        dbpedia_res = query_dbpedia(heb_name, lang="he")

    # Query Wikidata
    wikidata_res = query_wikidata(eng_name, heb_name)

    # Treat a missing result as an empty dict so .get() is always safe
    db = dbpedia_res or {}
    wd = wikidata_res or {}

    # Merge: prefer the DBpedia value, fall back to Wikidata when it is missing
    combined = {
        "name": eng_name,
        "hebrew_name": heb_name,
        "english_dbpedia_link": db.get("english_dbpedia_link"),
        "birth_date": db.get("birth_date") or wd.get("birth_date"),
        "birth_place": db.get("birth_place") or wd.get("birth_place"),
        "death_date": db.get("death_date") or wd.get("death_date"),
        "death_place": db.get("death_place") or wd.get("death_place"),
        "source": "DBpedia" if dbpedia_res else ("Wikidata" if wikidata_res else "Not found")
    }
    final_results.append(combined)

# Save the merged rows as a new CSV
final_df = pd.DataFrame(final_results)
final_df.to_csv("person_info.csv", index=False, encoding="utf-8")
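
A single source column cannot say which knowledge base supplied each individual field once values are mixed. If that matters for your data, one option is to record provenance per field. The sketch below is one possible variant of the merge rule above, not part of the original workflow; merge_with_provenance is a hypothetical name, and the sample values are made-up placeholders.

```python
from typing import Optional

FIELDS = ("birth_date", "birth_place", "death_date", "death_place")


def merge_with_provenance(dbpedia_res: Optional[dict],
                          wikidata_res: Optional[dict]) -> dict:
    """Prefer DBpedia per field, fall back to Wikidata, and note where each value came from."""
    merged = {}
    for field in FIELDS:
        merged[field] = None
        merged[field + "_source"] = None
        # Sources are listed in priority order; the first non-empty value wins
        for label, res in (("DBpedia", dbpedia_res), ("Wikidata", wikidata_res)):
            if res and res.get(field):
                merged[field] = res[field]
                merged[field + "_source"] = label
                break
    return merged


# Placeholder inputs: DBpedia knows only the birth date, Wikidata only the birth place
row = merge_with_provenance(
    {"birth_date": "1900-01-01", "birth_place": None},
    {"birth_place": "http://example.org/place"},
)
print(row["birth_date"], row["birth_date_source"])
print(row["birth_place"], row["birth_place_source"])

# Either result may be None when a lookup found nothing
print(merge_with_provenance(None, None)["birth_date"])
```

The returned dict can be combined with the name columns and appended to final_results in the same way as the combined dict.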