Help with a UnicodeDecodeError when processing a CSV file in Python 2.7
Hey there! Let's break down why you're hitting this UnicodeDecodeError and how to fix it.
The Root Cause
In Python 2.7, csv.DictReader returns byte strings (the str type) when reading your CSV file. Encoding only makes sense for Unicode text, so when you call .encode('utf-8') on a byte string, Python first tries to decode it to Unicode implicitly using the default ASCII codec. If your CSV has any non-ASCII characters (accents, emojis, or non-English text), that implicit decode fails, which is why an encode call ends up raising the UnicodeDecodeError you're seeing.
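To make the hidden step visible, here is a minimal sketch that spells out the ASCII decode explicitly, so it behaves the same way on Python 2.7 and Python 3. The sample bytes are just an illustration, not data from your file:

```python
# UTF-8 bytes for "café": the last character takes two bytes, both non-ASCII
raw = b'caf\xc3\xa9'

error = None
try:
    # Python 2 performs this decode implicitly when you call
    # .encode('utf-8') on a byte string
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the codec the bytes were actually written in works fine
text = raw.decode('utf-8')
```

The failure comes from the ASCII decode, not from the UTF-8 encode you asked for.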
The Fixes
The core rule here is: Decode first to get Unicode, then encode if you need byte strings later. Here are a few actionable solutions:
1. Decode with the CSV's Actual Encoding
First, figure out what encoding your Book2.csv uses (common ones are utf-8, cp1252 for Windows, or gbk for Chinese text). Then explicitly decode the byte string to Unicode:
import csv

TEST_SENTENCES = []
with open('Book2.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Replace 'utf-8' with your CSV's actual encoding
        unicode_tweet = row["Tweet"].decode('utf-8')
        TEST_SENTENCES.append(unicode_tweet)
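If you end up needing more than the Tweet column, it may be easier to decode every field of a row in one place. The helper below is a sketch with a made-up name (decode_row is not part of the csv module), and it assumes every column shares the same encoding:

```python
def decode_row(row, encoding='utf-8'):
    """Return a copy of a DictReader row with byte-string values decoded.

    Hypothetical helper. Keys are left untouched; values that are not
    byte strings (for example None for missing fields) pass through as-is.
    """
    decoded = {}
    for key, value in row.items():
        if isinstance(value, bytes):  # in Python 2, bytes is an alias for str
            decoded[key] = value.decode(encoding)
        else:
            decoded[key] = value
    return decoded


# Stand-in for one row as csv.DictReader would return it in Python 2
sample_row = {'Tweet': b'caf\xc3\xa9', 'Id': b'1', 'Lang': None}
clean_row = decode_row(sample_row)
```

Inside the loop you would then write row = decode_row(row) and work with Unicode values from there on.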
2. If You Need UTF-8 Byte Strings
If your downstream program expects UTF-8 encoded byte strings, decode first then encode properly:
import csv

TEST_SENTENCES = []
with open('Book2.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Step 1: Decode to Unicode (use the file's actual encoding here)
        unicode_tweet = row["Tweet"].decode('utf-8')
        # Step 2: Encode to a UTF-8 byte string
        encoded_tweet = unicode_tweet.encode('utf-8')
        TEST_SENTENCES.append(encoded_tweet)
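The two steps only change anything when the file is not already UTF-8. As a rough illustration, assume the file was saved as cp1252 (that is a guess for the example, not something known about Book2.csv); transcode is a hypothetical helper name:

```python
def transcode(raw_bytes, source_encoding, target_encoding='utf-8'):
    """Decode bytes from source_encoding, then re-encode to target_encoding."""
    return raw_bytes.decode(source_encoding).encode(target_encoding)


# "café" as written by a cp1252 editor: the accented letter is one byte
cp1252_bytes = b'caf\xe9'
utf8_bytes = transcode(cp1252_bytes, 'cp1252')
```

In UTF-8 the same letter becomes two bytes, so the byte string changes even though the text it represents does not.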
3. Fallback with Latin-1 (If You Don't Know the Encoding)
If you're unsure of the CSV's encoding, latin-1 is a safe fallback because it maps every byte value (0 to 255) to a character, so decoding never raises an error. The catch is that if the file isn't actually Latin-1, non-ASCII characters will come out garbled, but your program won't crash:
unicode_tweet = row["Tweet"].decode('latin-1')
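One way to limit the garbling is to try the encoding you think is most likely first and only fall back to latin-1 when that fails. This is a sketch under that assumption, with a hypothetical helper name:

```python
def decode_with_fallback(raw_bytes, preferred='utf-8'):
    """Try the preferred codec; fall back to latin-1, which never fails."""
    try:
        return raw_bytes.decode(preferred)
    except UnicodeDecodeError:
        return raw_bytes.decode('latin-1')


good = decode_with_fallback(b'caf\xc3\xa9')  # valid UTF-8, decoded as UTF-8
rescued = decode_with_fallback(b'caf\xe9')   # invalid UTF-8, latin-1 is used

# What going straight to latin-1 does to UTF-8 data: no error, wrong text
garbled = b'caf\xc3\xa9'.decode('latin-1')
```

Here garbled has five characters instead of four, which is the "won't display correctly" trade-off mentioned above.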
Quick Note
Python 2's CSV handling is notoriously finicky with Unicode compared to Python 3. Sticking to the "decode early, encode late" workflow will save you a lot of headaches with text encoding issues.
The question comes from Stack Exchange; asked by Patrick Reid.




