Python list comprehension: removing duplicate tuples that differ only in order
Fixing the double counting of burger combinations
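The core problem you ran into is symmetric double counting. Your list comprehension treats the same burger with the top and bottom breads swapped as two different combinations. For example, ('Weissbrot', 'Wildschwein', 'Kaese', 'Kopfsalat', 'Vollkorn') and ('Vollkorn', 'Wildschwein', 'Kaese', 'Kopfsalat', 'Weissbrot') are really the same burger, but both are counted. That is why the total doubles from the correct 138 to 276.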
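You can confirm the factor of two by looking only at the bread ends. The sketch below uses only the standard library's `itertools` and is meant as an illustration. It compares ordered pairs of distinct breads, which is what `bottom != top` produces, with unordered pairs. There are 4 × 3 = 12 ordered pairs but only 6 unordered ones, and every ordered pair has its mirror image in the list.

```python
from itertools import combinations, permutations

breads = ["Weissbrot", "Vollkorn", "Dinkel", "Speckbrot"]

# Ordered pairs of distinct breads: (a, b) and (b, a) both appear,
# just like bottom/top under the condition bottom != top
ordered = list(permutations(breads, 2))

# Unordered pairs: each combination of two different breads appears once
unordered = list(combinations(breads, 2))

print(len(ordered), len(unordered))

# Every ordered pair has its mirror image among the ordered pairs
assert all((b, a) in ordered for a, b in ordered)
```

Each unordered bread pair is counted twice in the ordered version. That doubling carries through unchanged to the fillings, which is why 138 becomes 276.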
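The most direct solution: avoid duplicates at generation time
Add one condition to the list comprehension that forces the bottom and top breads into a fixed order, such as lexicographic string order. Then each pair of distinct breads is generated only once. Concretely, replace the original `if bottom != top` with `if bottom < top`. Python compares strings lexicographically by character code point, so exactly one of the two orientations of a pair passes the test, and the mirrored duplicates are never produced. Because `bottom < top` is false when both breads are the same, it also keeps the original rule that the top and bottom breads must differ.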
The modified code:

```python
breads = ["Weissbrot", "Vollkorn", "Dinkel", "Speckbrot"]
patties = ["Wildschwein", "Rind", "Halloumi", "Aubergine"]
souces = ["Kaese", "Knoblauch", "Curry"]
toppings = ["Kopfsalat", "Bacon", "Tomate"]

burger = [(bottom, patty, souce, topping, top)
          for bottom in breads
          for patty in patties
          for souce in souces
          for topping in toppings
          for top in breads
          if bottom < top  # key change: fix the bread order to avoid mirrored duplicates
          if (bottom, patty) != ("Speckbrot", "Aubergine")
          and (top, patty) != ("Speckbrot", "Aubergine")
          and (patty, souce) != ("Aubergine", "Kaese")
          and (patty, topping) != ("Aubergine", "Bacon")
          and (patty, bottom) != ("Halloumi", "Speckbrot")
          and (patty, top) != ("Halloumi", "Speckbrot")
          and (patty, topping) != ("Halloumi", "Bacon")]

print(len(burger))  # now prints 138, as expected
```
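Why does this work?
- The original comprehension produces every pair of distinct breads twice, once as (bottom, top) and once as (top, bottom). The `bottom < top` condition keeps exactly one of those two orders, which removes the duplicates at the source.
- This is cheaper than generating everything and deduplicating afterwards (for example with a set). The mirrored tuples are never built or stored, and no extra deduplication pass is needed. The loops still visit all 16 (bottom, top) combinations, though; the condition only discards the unwanted ones.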
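You may want to avoid even visiting the mirrored bread pairs. One option, which swaps in a different technique from the condition above, is to iterate over `itertools.combinations(breads, 2)`. It yields each unordered pair of distinct breads exactly once. In this sketch, `allowed` is a hypothetical helper name that bundles the same exclusion rules as the original comprehension.

```python
from itertools import combinations

breads = ["Weissbrot", "Vollkorn", "Dinkel", "Speckbrot"]
patties = ["Wildschwein", "Rind", "Halloumi", "Aubergine"]
souces = ["Kaese", "Knoblauch", "Curry"]
toppings = ["Kopfsalat", "Bacon", "Tomate"]


def allowed(bottom, patty, souce, topping, top):
    """Hypothetical helper: the same exclusion rules as the original comprehension."""
    breads_used = (bottom, top)
    if patty == "Aubergine" and ("Speckbrot" in breads_used
                                 or souce == "Kaese"
                                 or topping == "Bacon"):
        return False
    if patty == "Halloumi" and ("Speckbrot" in breads_used or topping == "Bacon"):
        return False
    return True


# combinations() yields each unordered pair of distinct breads once,
# so mirrored (bottom, top) pairs are never visited, not just filtered out
burger = [(bottom, patty, souce, topping, top)
          for bottom, top in combinations(breads, 2)
          for patty in patties
          for souce in souces
          for topping in toppings
          if allowed(bottom, patty, souce, topping, top)]

print(len(burger))
```

`combinations` follows the order of `breads`, so it keeps a different orientation than `bottom < top` does. For example, it keeps Weissbrot at the bottom with Vollkorn on top. The number of burgers is the same 138.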
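Another approach: normalize and deduplicate after generation
If you would rather not change the generation logic, you can first keep all combinations. Then normalize the bread ends of each burger, for example by putting them in sorted order, and deduplicate with a set: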
```python
# First generate every allowed combination, mirrored duplicates included
# (reuses breads, patties, souces and toppings from the code above)
raw_burgers = [(bottom, patty, souce, topping, top)
               for bottom in breads
               for patty in patties
               for souce in souces
               for topping in toppings
               for top in breads
               if bottom != top
               if (bottom, patty) != ("Speckbrot", "Aubergine")
               and (top, patty) != ("Speckbrot", "Aubergine")
               and (patty, souce) != ("Aubergine", "Kaese")
               and (patty, topping) != ("Aubergine", "Bacon")
               and (patty, bottom) != ("Halloumi", "Speckbrot")
               and (patty, top) != ("Halloumi", "Speckbrot")
               and (patty, topping) != ("Halloumi", "Bacon")]

# Normalize each burger: put the two bread ends in sorted order so that mirrored
# burgers become the same tuple (same 5-field layout), then deduplicate with a set
unique_burgers = list({(min(b, t), *mid, max(b, t)) for b, *mid, t in raw_burgers})
print(len(unique_burgers))  # also prints 138
```
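However, this approach has to build and hold every duplicate first (276 tuples here, to end up with 138). That makes it less efficient than the first approach, so it is better suited to small datasets.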
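If you do deduplicate after generation, a set gives no guaranteed order. One variant keeps the original (bottom, patty, souce, topping, top) tuples exactly as generated, including the first orientation seen, and preserves their order. This works because dicts keep insertion order in Python 3.7+. `dedupe_mirrored` is a hypothetical helper name, and the three-item `raw` list is only demo data built from the question's example. The memory cost is unchanged: the full raw list still has to exist first.

```python
def dedupe_mirrored(burgers):
    """Hypothetical helper: keep the first occurrence of each burger, treating
    (bottom, ..., top) and (top, ..., bottom) as the same burger."""
    seen = {}
    for burger in burgers:
        bottom, *middle, top = burger
        # Canonical key: bread ends in sorted order plus the fillings
        key = (min(bottom, top), *middle, max(bottom, top))
        seen.setdefault(key, burger)  # the first orientation seen wins
    return list(seen.values())  # dicts preserve insertion order


raw = [
    ("Weissbrot", "Wildschwein", "Kaese", "Kopfsalat", "Vollkorn"),
    ("Vollkorn", "Wildschwein", "Kaese", "Kopfsalat", "Weissbrot"),
    ("Dinkel", "Rind", "Curry", "Tomate", "Vollkorn"),
]
print(dedupe_mirrored(raw))
```

Here the second tuple is dropped as the mirror image of the first. Applied to `raw_burgers`, the same call should return 138 tuples.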
This question comes from Stack Exchange and was asked by badatprog.




