
GeoPandas to_file raises a utf8 decode error, help needed (Python 2.7)

Fixing UTF-8 Decode Error When Using GeoPandas to_file() in Python 2.7

Hey there, let's work through this encoding error you're facing. You don't need to hunt for the ogrext.pyx file: it is a Cython source file that gets compiled into a binary extension module when Fiona is built, so an installed package typically won't include it as plain text. Here's what's going on and how to fix it:

What's Causing the Error?

The 'utf8' codec can't decode byte 0xb9... error means your GeoDataFrame contains byte strings that are not valid UTF-8. They are most likely in a legacy encoding that is common on Windows or in ArcGIS exports, such as GBK or CP1252. Fiona (the library GeoPandas relies on for file I/O) assumes UTF-8 by default, so it fails when it tries to decode those bytes.
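To see the failure in isolation, here is a minimal sketch that needs neither GeoPandas nor Fiona. The sample text and the helper name decode_or_error are made up for illustration; the assumption is that your attribute values look like the GBK bytes below.

```python
# Hypothetical sample: the GBK bytes for a two-character Chinese place name.
raw = u'\u5e7f\u5dde'.encode('gbk')


def decode_or_error(raw_bytes, encoding):
    """Return (text, None) on success or (None, exception) on failure."""
    try:
        return raw_bytes.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as UTF-8 fails on the very first byte, as in the reported error.
text, error = decode_or_error(raw, 'utf-8')
print(error)

# Decoding with the encoding the bytes were actually written in succeeds.
text, error = decode_or_error(raw, 'gbk')
print(error is None)
```

The first byte of this sample is 0xb9, which UTF-8 only allows in the middle of a multi-byte character, so the decoder rejects it immediately.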

Step-by-Step Fixes


1. Specify the Correct Encoding Directly in to_file()

The simplest fix is to tell Fiona which encoding to use when saving. Pass an encoding argument to to_file() that matches your data (common choices are gbk on Chinese-language Windows systems and cp1252 on Western European ones):

df.to_file('psuedo.shp', encoding='gbk')

This tells Fiona to handle string values with the encoding you specify instead of assuming UTF-8.

2. Convert Your Data's Encoding to UTF-8 Explicitly

If specifying the encoding doesn't resolve the issue, you can convert every string column in your GeoDataFrame to UTF-8 yourself. In Python 2.7, str values are byte strings, so decode each one from its original encoding and re-encode it as UTF-8:

# Iterate over all columns with string/object data types
for col in df.columns:
    if df[col].dtype == object:
        # Replace 'gbk' with your actual data encoding if needed
        df[col] = df[col].apply(lambda x: x.decode('gbk').encode('utf-8') if isinstance(x, str) else x)

# Save the cleaned data
df.to_file('psuedo.shp')
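One caveat with the loop above: a single value that does not decode under the assumed encoding will abort the whole apply() call. As a rough sketch, you could check a column first with a helper like the one below. The name find_undecodable and the sample values are hypothetical, not part of GeoPandas or Fiona.

```python
def find_undecodable(values, encoding='gbk'):
    """Return (index, value) pairs for byte strings that fail to decode."""
    failures = []
    for index, value in enumerate(values):
        # Skip anything that is not a byte string (None, numbers, unicode).
        if not isinstance(value, bytes):
            continue
        try:
            value.decode(encoding)
        except UnicodeDecodeError:
            failures.append((index, value))
    return failures


# Hypothetical column contents: valid GBK, plain ASCII, non-strings,
# and one byte string that is not valid GBK.
sample = [u'\u5e7f\u5dde'.encode('gbk'), b'plain ascii', None, 42, b'\xff\xff']
print(find_undecodable(sample, 'gbk'))
```

With a GeoDataFrame you could call it as find_undecodable(df[col], 'gbk') for each object column. An empty list suggests the assumed encoding at least decodes every value, though it does not prove it is the right one.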

If you're unsure of the original encoding, try gb2312 or cp1252 as alternatives to gbk.
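If you'd rather not guess by hand, a small helper can try the candidates in order. This is only a sketch: guess_encoding and its default candidate list are assumptions for illustration, and a successful decode is a hint, not proof.

```python
def guess_encoding(raw, candidates=('utf-8', 'gbk', 'cp1252')):
    """Return the first candidate that decodes raw without error, else None."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


# Hypothetical sample values.
print(guess_encoding(u'\u5e7f\u5dde'.encode('gbk')))
print(guess_encoding(b'plain ascii'))
```

Order matters here. cp1252 accepts almost any byte sequence, so it should come last, otherwise it will match GBK data and produce mojibake. GBK extends GB2312, so listing gbk covers most gb2312 data as well.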

3. Convert Strings to Unicode (Python 2.7-Specific)

Python 2.7 has two separate string types: str (bytes) and unicode (text). Converting every str value to unicode removes the ambiguity about how the bytes are encoded:

for col in df.columns:
    if df[col].dtype == object:
        # Replace 'gbk' with your data's actual encoding
        df[col] = df[col].apply(lambda x: unicode(x, 'gbk') if isinstance(x, str) else x)

# Save the data; Fiona will handle the Unicode-to-UTF-8 conversion automatically
df.to_file('psuedo.shp')
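Before saving, it may be worth confirming that no byte strings are left behind, for example in a column the loop skipped. The sketch below works on a plain dict of column name to values so it runs without pandas. The helper name columns_with_byte_strings and the sample table are hypothetical.

```python
def columns_with_byte_strings(table):
    """Return sorted names of columns that still hold byte-string values.

    table is a mapping of column name -> iterable of values.
    """
    return sorted(
        name for name, values in table.items()
        if any(isinstance(value, bytes) for value in values)
    )


# Hypothetical table: 'name' was converted to unicode, 'city' was not.
table = {
    'name': [u'\u5e7f\u5dde', u'abc'],
    'city': [u'\u5e7f\u5dde'.encode('gbk'), None],
    'count': [1, 2],
}
print(columns_with_byte_strings(table))
```

For a GeoDataFrame, something like columns_with_byte_strings({col: df[col] for col in df.columns}) should work the same way. An empty result means every remaining string is already unicode.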

Why You Don't Need ogrext.pyx

ogrext.pyx is part of Fiona's Cython source, which is compiled into a binary module when Fiona is built, so there is no practical way to edit it in an installed package. The fixes above target your data instead, which is the right place to solve an encoding problem like this.

The question comes from Stack Exchange; asked by Samantha Leo.
