You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

如何在Python中解码JavaScript Unicode字符串?解决双心符号编码报错

Fixing UnicodeEncodeError for the TWO HEARTS Emoji in Python

Got it, let's break down why you're hitting that error and how to get the proper 💕 emoji working smoothly in Python.

What's Causing the Surrogate Error?

The root issue here is that your string a isn't the valid UTF-8 representation of the TWO HEARTS emoji—it’s probably made up of Unicode surrogate pairs.

Surrogate pairs are a UTF-16 workaround for characters that need 4 bytes (like most emojis). JavaScript often splits these 4-byte characters into two 2-byte "surrogate" characters (for example, \ud83d + \udc95 for the two hearts emoji). When Python tries to encode these unpaired/standalone surrogates as UTF-8, it throws an error because UTF-8 doesn’t recognize individual surrogate characters as valid.

How to Fix It

Here are two straightforward ways to convert those surrogate pairs into the correct emoji:

1. Use codecs to Resolve Surrogate Pairs

Python’s codecs module has built-in support for handling surrogate pairs by treating them as UTF-16. Here’s the code:

import codecs

# If your input string is the surrogate pair version (e.g., from JavaScript)
surrogate_str = '\ud83d\udc95'
fixed_emoji = codecs.decode(surrogate_str, 'utf-16', 'surrogatepass')

print(fixed_emoji)  # Output: 💕
# Now encoding/decoding works without errors
print(fixed_emoji.encode('utf-8').decode('utf-8'))  # Output: 💕

The surrogatepass flag tells the UTF-16 decoder to combine the two surrogate characters into the single 4-byte Unicode emoji they’re meant to represent.

2. Fix the Source (Preventative Measure)

If you’re pulling this string from a JavaScript source, make sure the data is sent as valid UTF-8 instead of surrogate pairs. In JavaScript:

  • Use encodeURIComponent() when sending the string over HTTP—this encodes it as proper UTF-8 that Python can decode directly.
  • Or convert the string to a UTF-8 byte array before sending, so Python doesn’t have to deal with surrogates at all.

Why Your Original Code Failed

When you ran a.encode('utf-8').decode('utf-8'), if a contained unpaired surrogate characters, UTF-8 couldn’t encode them—surrogates are only valid in UTF-16 as pairs. Once you convert the surrogate pair to the actual emoji character, the encode/decode cycle works flawlessly because it’s a valid UTF-8 character.

内容的提问来源于stack exchange,提问作者Mike

火山引擎 最新活动