How do you normalize place names and find a city's state/country in Python?

阿华AIGC实验室

2026-4-29

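Place-name normalization and geographic parsing: options

For the problems you described (recognizing place-name aliases and abbreviations, and finding which state a city belongs to), here are several practical Python tools and approaches. In my experience they cover most scenarios: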

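1. Online parsing: GeoPy + Nominatim

GeoPy is a mature geocoding library. Paired with Nominatim, the geocoding service built on OpenStreetMap data, it can resolve free-form place names (including many common aliases and abbreviations) into structured address data with city, state, and country levels.

Code example: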

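import time

from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut, GeocoderUnavailable

# Create the geocoder once and reuse it; replace user_agent with your own project name
geolocator = Nominatim(user_agent="your_project_name")

def geo_classify(location, max_retries=3):
    for attempt in range(max_retries):
        try:
            # addressdetails=True returns the structured address breakdown;
            # language='en' asks for English names instead of local-language ones
            results = geolocator.geocode(location, exactly_one=False,
                                         addressdetails=True, language='en')
            break
        except (GeocoderTimedOut, GeocoderUnavailable):
            # Bounded retry with a growing delay (the original recursed without limit)
            time.sleep(2 ** attempt)
    else:
        return None  # every attempt failed

    if not results:
        return None

    # Take the top-ranked result
    address = results[0].raw.get('address', {})

    # Depending on the place, the city name may be stored under city, town or village
    city = address.get('city') or address.get('town') or address.get('village')
    state = address.get('state')
    country = address.get('country')
    return {
        'cities': [city] if city else [],
        'states': [state] if state else [],
        'countries': [country] if country else []
    }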

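# Try the cases from your question (actual results depend on current OSM data)
print(geo_classify('NYC'))
# Expected: New York City, New York state, United States; exact name strings come from OSM

print(geo_classify('ga'))
# A bare two-letter query is ambiguous and the top hit may not be the US state of Georgia;
# normalizing aliases first (section 3) makes the result predictable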
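The field-extraction step can be pulled out into a pure function that works on the address dict alone, so it can be tested without calling Nominatim. This is a minimal sketch: the extra fallback keys `hamlet` and `municipality` are Nominatim address keys added here as an assumption, `first_present` and `extract_admin_fields` are hypothetical helper names, and the sample dict is hand-written, not real API output.

```python
from typing import Any, Dict, List, Optional

# Keys tried in order for the city-level name (assumed fallback order)
CITY_KEYS = ("city", "town", "village", "hamlet", "municipality")


def first_present(address: Dict[str, Any], keys) -> Optional[str]:
    """Return the first non-empty value among the given keys, or None."""
    for key in keys:
        value = address.get(key)
        if value:
            return value
    return None


def extract_admin_fields(address: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map a Nominatim-style address dict to the cities/states/countries shape."""
    city = first_present(address, CITY_KEYS)
    state = address.get("state")
    country = address.get("country")
    return {
        "cities": [city] if city else [],
        "states": [state] if state else [],
        "countries": [country] if country else [],
    }


# Hand-written sample shaped like Nominatim's address block (illustrative only)
sample = {"town": "Decatur", "county": "DeKalb County", "state": "Georgia", "country": "United States"}
print(extract_admin_fields(sample))
# {'cities': ['Decatur'], 'states': ['Georgia'], 'countries': ['United States']}
```

Inside `geo_classify`, the return statement could then become `return extract_admin_fields(address)`.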

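Notes:

Nominatim's public server allows at most 1 request per second under its usage policy. For batch jobs, add a delay (for example time.sleep(1)), or switch to a commercial geocoding service such as the Google Maps Geocoding API for higher throughput and stability.
Nominatim usually recognizes common aliases like NYC, but for less common ones it helps to add an alias-mapping layer in front of it (see below).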
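For batch jobs, the 1-second spacing and a cache for repeated names can be wrapped around any geocoding function. This is a stdlib-only sketch: `batch_geocode` and `min_interval` are hypothetical names, and `fake_lookup` stands in for a real call such as `geo_classify` so the example runs offline. geopy also ships `geopy.extra.rate_limiter.RateLimiter`, which covers the throttling part on its own.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def batch_geocode(
    names: Iterable[str],
    lookup: Callable[[str], Optional[dict]],
    min_interval: float = 1.0,
) -> Dict[str, Optional[dict]]:
    """Look up each distinct name once, spacing real calls at least min_interval seconds apart."""
    cache: Dict[str, Optional[dict]] = {}
    last_call = None  # time.monotonic() of the previous real lookup
    for name in names:
        key = name.strip().lower()
        if key in cache:
            continue  # repeated names are served from the cache, no request sent
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        cache[key] = lookup(name)
    return cache


# Stand-in for a real geocoder so the sketch runs offline
calls = []


def fake_lookup(name: str) -> Optional[dict]:
    calls.append(name)
    return {"query": name}


results = batch_geocode(["NYC", "nyc ", "Austin", "NYC"], fake_lookup, min_interval=0.01)
print(sorted(results))  # ['austin', 'nyc']
print(len(calls))       # 2
```

With a real geocoder you would call `batch_geocode(names, geo_classify)` and keep the default `min_interval=1.0`.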

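2. Offline processing: pgeocode + us

If your dataset is large and you'd rather not depend on an online API, you can combine offline tools:

pgeocode: an offline geocoding library built on GeoNames postal-code data. It downloads each country's data on first use and caches it locally; after that, queries need no network and are fast.
us: a library for US states that converts between official state abbreviations and full names.

Code example: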

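import pgeocode
import us

# Offline lookup for the US (GeoNames data is downloaded and cached on first use)
nomi = pgeocode.Nominatim('us')

# City -> state. pgeocode is keyed by postal code; query_location (pgeocode >= 0.4)
# searches by place name and returns a DataFrame of matching postal-code rows
def get_state_from_city(city):
    matches = nomi.query_location(city)
    if matches.empty:
        return None
    # Several postal codes (possibly in different states) can share one place name
    return matches['state_name'].iloc[0]

# GeoNames lists NYC postal codes under "New York", not "New York City"
print(get_state_from_city('New York'))

# State abbreviation -> full name (lookup returns None for unknown input)
state = us.states.lookup('GA')
print(state.name if state else None)  # Georgia
# Full name -> abbreviation
print(us.states.lookup('New York').abbr)  # NY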
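Inputs often combine a city with a state abbreviation, like "Atlanta, GA". Below is a minimal stdlib sketch of splitting such strings and expanding the state part; `split_city_state` is a hypothetical helper and `STATE_ABBR` is a hypothetical three-entry subset used only so the example runs without dependencies. In practice `us.states.lookup` would replace the dictionary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical subset; with the us library, use us.states.lookup(...) instead
STATE_ABBR: Dict[str, str] = {"GA": "Georgia", "NY": "New York", "TX": "Texas"}


def split_city_state(text: str) -> Tuple[str, Optional[str]]:
    """Split 'City, ST' into (city, full state name); state is None if absent or unknown."""
    city, sep, state_part = text.rpartition(",")
    if not sep:
        return text.strip(), None  # no comma: treat the whole string as the city
    abbr = state_part.strip().upper()
    # Accept either an abbreviation or an already spelled-out state name
    full = STATE_ABBR.get(abbr) or next(
        (name for name in STATE_ABBR.values() if name.upper() == abbr), None
    )
    return city.strip(), full


print(split_city_state("Atlanta, GA"))    # ('Atlanta', 'Georgia')
print(split_city_state("Austin, texas"))  # ('Austin', 'Texas')
print(split_city_state("Chicago"))        # ('Chicago', None)
```

Splitting on the last comma (`rpartition`) keeps city names that themselves contain commas intact.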

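3. Supplement with a custom alias map

For niche aliases or slang that Nominatim doesn't recognize (for example, local neighborhood nicknames), maintain your own alias dictionary and normalize names before querying. Watch for ambiguous keys: 'la' can mean Los Angeles or Louisiana, and 'ga' can mean the US state or the country of Georgia, so map each key to the meaning that fits your data: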

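PLACE_ALIASES = {
    'nyc': 'New York City',
    'la': 'Los Angeles',
    'chi': 'Chicago',
    'atx': 'Austin',
    # Qualify with the country so the geocoder is less likely to return the country of Georgia
    'ga': 'Georgia, USA'
}

def normalize_location(location):
    # Match case-insensitively; input without an alias entry is returned unchanged
    location_lower = location.strip().lower()
    return PLACE_ALIASES.get(location_lower, location)

# Normalize first, then geocode
normalized = normalize_location('nyc')
print(geo_classify(normalized))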
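Real inputs also vary in punctuation and spacing ("N.Y.C.", "  nyc "), so normalizing the lookup key before consulting the alias map widens what it catches. A sketch, assuming you want dots and extra whitespace ignored: `alias_key` and `resolve_alias` are hypothetical helpers, and `ALIASES` repeats a subset of `PLACE_ALIASES` so the block runs on its own. Ambiguous keys return every candidate so the caller can decide.

```python
import re
from typing import Dict, List

# Ambiguous aliases map to several candidates (e.g. 'la' = Los Angeles or Louisiana)
ALIASES: Dict[str, List[str]] = {
    "nyc": ["New York City"],
    "la": ["Los Angeles", "Louisiana"],
    "atx": ["Austin"],
}


def alias_key(text: str) -> str:
    """Drop dots, collapse runs of whitespace, trim, and lowercase."""
    return re.sub(r"\s+", " ", text.replace(".", "")).strip().lower()


def resolve_alias(text: str) -> List[str]:
    """Return candidate names for an alias, or the original text if unmapped."""
    return ALIASES.get(alias_key(text), [text])


print(resolve_alias("N.Y.C."))  # ['New York City']
print(resolve_alias("LA"))      # ['Los Angeles', 'Louisiana']
print(resolve_alias("Boston"))  # ['Boston']
```

When more than one candidate comes back, each one can be geocoded and the result kept whose state or country matches the rest of the record.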

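4. Local dataset matching (maximum control)

If you need broad, offline coverage of US city-to-state mappings, download SimpleMaps' US Cities dataset (the Basic version is free) and match against it locally with pandas. It lists US cities with their state, ZIP codes, and coordinates, works fully offline, and lookups are fast. No dataset guarantees 100% coverage, and the free version contains fewer places than the paid ones.

A simple example: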

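import pandas as pd

# Load the dataset (assumes it has been downloaded; adjust the file name to your download)
cities_df = pd.read_csv('us_cities.csv')

def get_state_by_city(city_name):
    name = city_name.strip().lower()
    # Exact, case-insensitive match first
    matches = cities_df[cities_df['city'].str.lower() == name]
    if matches.empty:
        # Fall back to a plain substring match (regex=False avoids regex surprises);
        # note this catches partial names such as "West New York", not true aliases
        matches = cities_df[cities_df['city'].str.contains(city_name, case=False,
                                                           regex=False, na=False)]
    if matches.empty:
        return None
    # Many names exist in several states; prefer the most populous match if available
    if 'population' in matches.columns:
        matches = matches.sort_values('population', ascending=False)
    return matches.iloc[0]['state_name']

print(get_state_by_city('New York'))  # expected: New York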
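Because the same city name exists in several states (Portland is in both Oregon and Maine), it can help to build a name-to-states index once and inspect every candidate instead of only the first row. A stdlib sketch, assuming the CSV has `city` and `state_name` columns as the pandas example above does; `build_city_index` is a hypothetical helper, and the inline rows are a tiny illustrative sample, not the real dataset.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def build_city_index(f: TextIO) -> Dict[str, List[str]]:
    """Map lowercase city name -> sorted list of distinct states containing it."""
    index: Dict[str, set] = defaultdict(set)
    for row in csv.DictReader(f):
        index[row["city"].strip().lower()].add(row["state_name"])
    return {city: sorted(states) for city, states in index.items()}


# Tiny illustrative sample; in practice pass open('us_cities.csv', newline='')
sample_csv = io.StringIO(
    "city,state_name\n"
    "Portland,Oregon\n"
    "Portland,Maine\n"
    "Austin,Texas\n"
)
index = build_city_index(sample_csv)
print(index["portland"])             # ['Maine', 'Oregon']
print(index.get("springfield", []))  # []
```

A lookup that returns more than one state is a signal to ask for (or infer) the state rather than silently picking one.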