Help needed: TypeError: <lambda>() missing 1 required positional argument: 'y' when running Spark word count code in Jupyter

阿华AIGC实验室

2026-4-28

Fixing the TypeError When Reversing Key-Value Pairs in Spark

Let’s break down what’s causing your error and get your word count output in the exact format you want.

The Root Cause of the Error

The problem hits this line of code:

```
sortedwords = wordcount.map(lambda x,y: (y,x)).sortByKey()
```

When you call map() on wordcount, each element is passed to the lambda as a single tuple (like ('self', 111)), not as two separate arguments. Your lambda is defined with two parameters (x and y), so Spark supplies one value (the tuple) where two are required. That is why you get TypeError: <lambda>() missing 1 required positional argument: 'y'.
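The same failure can be reproduced without Spark. The following is a minimal sketch in plain Python; the names pair, swap_two_args, and swap_one_arg are illustrative and do not come from your script:

```python
# One element as map() would hand it to the function: a single tuple.
pair = ('self', 111)

# Hypothetical name for the lambda used in the failing line.
swap_two_args = lambda x, y: (y, x)

try:
    swap_two_args(pair)      # one argument passed, two expected
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Accepting the tuple as one parameter works.
swap_one_arg = lambda item: (item[1], item[0])
print(swap_one_arg(pair))  # (111, 'self')
```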

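The error with lambda x,y : y,x happens because the return value is not wrapped in parentheses. Inside a call such as map(...), Python reads the comma as an argument separator, so the lambda body is just y and x becomes a separate second argument to map() rather than part of a single tuple (y, x).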
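To see how the comma is parsed, here is a small sketch that uses a hypothetical stand-in function, count_args, in place of map(). It only reports how many arguments it received:

```python
def count_args(*args):
    """Hypothetical stand-in for map(): returns the number of arguments received."""
    return len(args)

x = 'placeholder'  # defined only so the unparenthesized call below can run

# Without parentheses the comma separates two arguments: `lambda x, y: y` and `x`.
without_parens = count_args(lambda x, y: y, x)

# With parentheses the lambda body is one tuple, so there is a single argument.
with_parens = count_args(lambda x, y: (y, x))

print(without_parens, with_parens)  # 2 1
```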

The Correct Lambda Syntax

You have two reliable ways to fix this:

Index into the tuple (universally compatible across Python versions):
Access the first and second elements of the tuple directly using square brackets. This is straightforward and easy to read:
```
sortedwords = wordcount.map(lambda item: (item[1], item[0])).sortByKey()
```
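Destructure the tuple in the lambda (concise, Python 2 only):
If you prefer a cleaner style, you can unpack the tuple in the lambda's parameter list. This works only in Python 2. Python 3 removed tuple parameter unpacking (PEP 3113), so there the line below is a SyntaxError and any unpacking has to happen inside the function body instead:
```
# Python 2 only
sortedwords = wordcount.map(lambda (word, count): (count, word)).sortByKey()
```
The wording of your error message ("missing 1 required positional argument") is the Python 3 form, so the index method is the one to use here, and it is also the safer choice for cross-version compatibility.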
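For Python 3, one way to keep named fields is to unpack inside a small named function rather than in the parameter list. This is a sketch, assuming a helper called swap (not part of your script) and a plain list standing in for the RDD so that it runs without Spark:

```python
def swap(pair):
    """Turn (word, count) into (count, word)."""
    word, count = pair       # unpacking in the body is valid in Python 3
    return (count, word)

# Plain-list stand-in for a few elements of wordcount.
sample = [('self', 111), ('employment', 75), ('building', 33)]

# sorted() roughly mimics the ascending order produced by sortByKey().
swapped = sorted(map(swap, sample))
print(swapped)  # [(33, 'building'), (75, 'employment'), (111, 'self')]
```

With Spark, the same function would be passed as wordcount.map(swap).sortByKey().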

Full Corrected Code

Here’s your complete script with the fix applied:

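```
import re

def normalizewords(text):
    # Lowercase the line and split on runs of non-word characters
    return re.compile(r'\W+', re.UNICODE).split(text.lower())

# sc is assumed to be an existing SparkContext in the notebook session
inputs = sc.textFile('Book.txt')
words = inputs.flatMap(normalizewords)
wordcount = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)
# Fixed line: each element arrives as one (word, count) tuple
sortedwords = wordcount.map(lambda item: (item[1], item[0])).sortByKey()
sortedwords.collect()
```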
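To sanity-check the logic without a Spark context, the same steps can be sketched in plain Python. This is a stand-in rather than Spark code: text is a made-up sample string instead of Book.txt, collections.Counter plays the role of map plus reduceByKey, and the filter that drops empty strings is an addition that the Spark script does not have.

```python
import re
from collections import Counter

def normalizewords(text):
    # Same splitting rule as in the Spark script
    return re.compile(r'\W+', re.UNICODE).split(text.lower())

# Made-up sample text (assumption), used in place of Book.txt
text = "Self employment: building an internet business. Business of one!"

# Drop the empty string that split() leaves after trailing punctuation
words = [w for w in normalizewords(text) if w]

# Counter stands in for map(lambda x: (x, 1)).reduceByKey(...)
wordcount = Counter(words)

# Swap to (count, word) and sort ascending, as map(...).sortByKey() would
sortedwords = sorted((count, word) for word, count in wordcount.items())
print(sortedwords)
```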

Expected Output

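After applying the fix, sortedwords.collect() returns (count, word) tuples in the format you requested. Because sortByKey() sorts in ascending order by default, the pairs come back ordered by count, so these sample pairs would appear in the following relative order:

[(26, 'internet'), (33, 'building'), (75, 'employment'), (100, 'one'), (111, 'self'), (178, 'an'), (383, 'business'), (970, 'of')]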

The question comes from Stack Exchange; it was asked by BetaTester.
