How Do You Decode a JavaScript Unicode String in Python? Fixing the Encoding Error for the Two Hearts Symbol

阿华AIGC实验室

2026-5-22

Fixing UnicodeEncodeError for the TWO HEARTS Emoji in Python

Got it, let's break down why you're hitting that error and how to get the proper 💕 emoji working smoothly in Python.

What's Causing the Surrogate Error?

The root issue here is that your string a doesn't hold the TWO HEARTS emoji as a single code point (U+1F495). It most likely holds the two UTF-16 surrogate code points that JavaScript uses to represent it.

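Surrogate pairs are how UTF-16 represents characters outside the Basic Multilingual Plane, which includes most emoji. JavaScript strings are sequences of 16-bit code units, so such a character shows up as two "surrogate" units (for example, \ud83d + \udc95 for the two hearts emoji). A Python 3 str is a sequence of code points instead, so a literal like '\ud83d\udc95' holds two separate surrogate code points rather than one emoji. When Python tries to encode them as UTF-8, it raises UnicodeEncodeError, because UTF-8 does not allow surrogate code points (U+D800 to U+DFFF) to be encoded at all, whether they appear alone or side by side.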
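To make the pairing concrete, here is a minimal sketch of the standard UTF-16 arithmetic that turns the two surrogates into one code point. The variable names are only illustrative.

```python
# Combine a UTF-16 surrogate pair into a single code point by hand.
high, low = 0xD83D, 0xDC95  # the surrogate pair for TWO HEARTS

# Each surrogate carries 10 bits; the result is offset by 0x10000.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(hex(code_point))  # 0x1f495
print(chr(code_point))  # 💕

# The surrogate version is two code points long, the real emoji is one.
print(len('\ud83d\udc95'), len(chr(code_point)))  # 2 1
```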

How to Fix It

Here are two straightforward ways to end up with the correct emoji: convert the surrogate pair in Python, or avoid receiving it in the first place.

1. Round-Trip Through UTF-16 with `surrogatepass`

Python's built-in UTF-16 codec can merge the pair for you. Encode the string to UTF-16 with the surrogatepass error handler, then decode the resulting bytes as ordinary UTF-16. Here's the code:

# If your input string is the surrogate pair version (e.g., from JavaScript)
surrogate_str = '\ud83d\udc95'
# Write the surrogates out as raw UTF-16 code units, then decode them as a pair
fixed_emoji = surrogate_str.encode('utf-16', 'surrogatepass').decode('utf-16')

print(fixed_emoji)  # Output: 💕
# Now encoding/decoding works without errors
print(fixed_emoji.encode('utf-8').decode('utf-8'))  # Output: 💕

The surrogatepass error handler lets the encoder write each surrogate as a raw 16-bit code unit instead of raising an error. The UTF-16 decoder then reads the two code units as a valid pair and returns the single code point U+1F495.
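If the conversion is needed in more than one place, it can be wrapped in a small helper. This is a sketch: fix_surrogates is a hypothetical name, and it assumes every surrogate in the input has its partner. A lone surrogate should still fail at the decode step.

```python
def fix_surrogates(text: str) -> str:
    """Merge UTF-16 surrogate pairs inside a str into real code points."""
    # Encode surrogates as raw UTF-16 code units, then decode normally
    # so that each high/low pair becomes one character.
    return text.encode('utf-16', 'surrogatepass').decode('utf-16')


fixed = fix_surrogates('I \ud83d\udc95 Python')
print(fixed)                  # I 💕 Python
print(fixed.encode('utf-8'))  # encodes without raising
```

Text that contains no surrogates passes through unchanged, so the helper is safe to apply to mixed input.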

2. Fix the Source (Preventative Measure)

If you're pulling this string from a JavaScript source, make sure the data reaches Python as UTF-8 bytes rather than as \uXXXX surrogate escapes. In JavaScript:

Use encodeURIComponent() when sending the string over HTTP. It percent-encodes the string's UTF-8 bytes, which Python can decode directly (for example with urllib.parse.unquote).
Or convert the string to a UTF-8 byte array (for example with TextEncoder) before sending, so Python doesn't have to deal with surrogates at all.
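On the Python side, both options decode without any surrogate handling. The sketch below assumes the sender produced '%F0%9F%92%95' (the result of encodeURIComponent on the two hearts emoji) or the equivalent four raw bytes. The variable names are illustrative.

```python
from urllib.parse import unquote

# Percent-encoded UTF-8, as produced by encodeURIComponent('💕') in JavaScript.
received = '%F0%9F%92%95'
emoji = unquote(received)  # unquote decodes the bytes as UTF-8 by default
print(emoji)  # 💕

# Raw UTF-8 bytes, as produced by a UTF-8 byte array on the sending side.
raw = bytes([0xF0, 0x9F, 0x92, 0x95])
print(raw.decode('utf-8'))  # 💕
```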

Why Your Original Code Failed

When you ran a.encode('utf-8').decode('utf-8'), a held surrogate code points, and the UTF-8 encoder rejects those: surrogates are only meaningful as pairs of UTF-16 code units. Once you convert the pair into the actual emoji character, the encode/decode cycle works, because U+1F495 is an ordinary code point that UTF-8 encodes as four bytes.
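The difference is easy to reproduce. This short sketch assumes a holds the surrogate pair from the question and compares it with the real emoji written as '\U0001F495'.

```python
a = '\ud83d\udc95'  # two surrogate code points, as in the failing case

error_reason = None
try:
    a.encode('utf-8')
except UnicodeEncodeError as exc:
    # The UTF-8 codec refuses to encode surrogate code points.
    error_reason = exc.reason
    print('encode failed:', error_reason)

emoji = '\U0001F495'  # the real TWO HEARTS code point
encoded = emoji.encode('utf-8')
print(encoded)                           # b'\xf0\x9f\x92\x95'
print(encoded.decode('utf-8') == emoji)  # True
```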

This question comes from Stack Exchange; it was asked by Mike.