How do you correctly split a product name from its rating in Python when the name contains stray digits?

阿华AIGC实验室

2026-5-8

Splitting the product name from the trailing rating

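This problem comes up a lot. When a product name contains digits, simple string splitting runs into trouble, because it cannot tell the digits inside the name apart from the fixed-format rating at the end. A regular expression handles this better, since it can anchor on the fixed rating format at the end of the string.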
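To make the failure concrete, here is a minimal sketch of the naive approach. The helper name `naive_split` and the shortened second sample string are assumptions made for illustration; the idea is simply to peel off the last four space-separated tokens.

```python
def naive_split(text):
    # Peel off the last four space-separated tokens: "<X>", "out", "of", "5"
    name, rating, *_ = text.rsplit(" ", 4)
    return name, rating

samples = (
    "Vintage Wooden Table Lamp 5 out of 5",
    "Fixture L 40x W 12x H 31.54.8 out of 5",  # name runs straight into the rating
)

for sample in samples:
    name, rating = naive_split(sample)
    try:
        print(name, "|", float(rating))
    except ValueError:
        # "31.54.8" is not a number: the name's "31.5" and the rating "4.8" are fused
        print("cannot parse rating token:", repr(rating))
```

The first sample splits cleanly, but in the second the token in front of "out of 5" is "31.54.8", which mixes the end of the name with the rating and cannot be converted to a float.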

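Core idea

The rating has a fixed format, X out of 5, where X is an integer or a decimal (for example 4.8, 5, or 3.5), and it always sits at the very end of the string. We can match that trailing rating block with a regular expression and take everything before it as the product name.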

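Recommended regular expression:

([0-5](?:\.\d+)?) out of 5$

Breaking the pattern down:

[0-5]: the integer part of the rating, a single digit from 0 to 5. Using \d+ here would be too permissive when the name runs straight into the rating: in "H 31.54.8 out of 5" it would capture 54.8 instead of 4.8.
(?:\.\d+)?: an optional decimal part, a dot followed by at least one digit (covers both integer and decimal ratings)
 out of 5: the fixed rating suffix, including its leading space
$: anchors the match at the end of the string, so a similar-looking fragment in the middle of the product name is not matched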
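To see why the width of the integer part matters, here is a small sketch comparing two patterns. The names `LOOSE` and `STRICT` are hypothetical, and the sample is a shortened stand-in for a name that ends in "31.5" and is followed immediately by the rating "4.8".

```python
import re

# Any run of digits is allowed as the integer part
LOOSE = re.compile(r'(\d+\.?\d*) out of 5$')
# Integer part limited to one digit from 0 to 5, decimals optional
STRICT = re.compile(r'([0-5](?:\.\d+)?) out of 5$')

sample = "Pendant Light H 31.54.8 out of 5"

loose_rating = LOOSE.search(sample).group(1)
strict_rating = STRICT.search(sample).group(1)

print("loose :", loose_rating)   # picks up a digit that belongs to the name
print("strict:", strict_rating)  # stops at the single-digit integer part
```

Because `re.search` returns the leftmost match, the loose pattern starts as early as it can and swallows the trailing "5" of the name, while the strict pattern can only begin at a lone digit from 0 to 5.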

Code examples

Method 1: locate the rating with `re.search`

import re

# Sample product strings for testing
catalogue = [
    "Moooni Modern Rectangular Raindrop Crystal Chandelier Ceiling Lighting Fixture Rectangle Pendant Flush Mount LED Light for Dining Room L 40x W 12x H 31.54.8 out of 5",
    "Vintage Wooden Table Lamp 5 out of 5",
    "Smart Bluetooth Speaker 2.1 3.7 out of 5"
]

for item in catalogue:
    # Look for the rating at the end of the string. The integer part is limited
    # to one digit (0-5); with \d+ the first entry would yield 54.8, not 4.8.
    match_result = re.search(r'([0-5](?:\.\d+)?) out of 5$', item)
    if match_result:
        # Extract the rating and convert it to a float
        rating = float(match_result.group(1))
        # Everything before the rating is the product name; trim extra spaces
        product_name = item[:match_result.start()].strip()

        print(f"Product name: {product_name}")
        print(f"Rating: {rating}\n")
    else:
        print(f"No rating found: {item}\n")

Method 2: split the string directly with `re.split`

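import re

# Reuses the `catalogue` list defined in Method 1
for item in catalogue:
    # Split at most once. No leading space is required in the pattern, because
    # the first entry has no space between the name and the rating.
    parts = re.split(r'([0-5](?:\.\d+)?) out of 5$', item, maxsplit=1)
    # With a capturing group, a successful split returns
    # [name, rating, ''], so three parts rather than two
    if len(parts) == 3:
        product_name = parts[0].strip()
        rating = float(parts[1])

        print(f"Product name: {product_name}")
        print(f"Rating: {rating}\n")
    else:
        print(f"No rating found: {item}\n")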

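Both methods are meant to sidestep digits and decimal points inside the product name. The split is dependable when the rating sits at the very end in the X out of 5 form; a name that runs directly into the rating with no space, as in the first sample string, is the case worth checking by hand.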
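If the split is needed in more than one place, the same idea can be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation: the function name `split_name_and_rating`, the named group, the tolerance for trailing whitespace, and the rejection of scores above 5 are all assumptions added for illustration.

```python
import re
from typing import Optional, Tuple

# Assumption: the score has a single-digit integer part (0-5) and optional decimals.
# \s* tolerates trailing whitespace before the end of the string.
RATING_AT_END = re.compile(r'(?P<rating>[0-5](?:\.\d+)?) out of 5\s*$')

def split_name_and_rating(text: str) -> Optional[Tuple[str, float]]:
    """Return (name, rating), or None when no valid trailing rating is found."""
    match = RATING_AT_END.search(text)
    if match is None:
        return None
    rating = float(match.group("rating"))
    if rating > 5:
        # Something like "5.9 out of 5" fits the pattern but is not a valid score
        return None
    return text[:match.start()].strip(), rating

print(split_name_and_rating("Smart Bluetooth Speaker 2.1 3.7 out of 5"))
print(split_name_and_rating("No rating here"))
```

Returning None instead of printing lets the caller decide what to do with strings that carry no recognisable rating.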

The question behind this post comes from Stack Exchange, where it was asked by Amrit G.