Help with TypeError: <lambda>() missing 1 required positional argument: 'y' when running Spark word count code in Jupyter

Fixing the TypeError When Reversing Key-Value Pairs in Spark

Let’s break down what’s causing your error and get your word count output in the exact format you want.

The Root Cause of the Error

The problem hits this line of code:

sortedwords = wordcount.map(lambda x,y: (y,x)).sortByKey()

When you use map() on wordcount, each element passed to the lambda is a single tuple (like ('self', 111)), not two separate arguments. But your lambda is defined to expect two parameters (x and y), which is why you get the TypeError: <lambda>() missing 1 required positional argument: 'y' — Spark is sending one value (the tuple) but your lambda is asking for two.
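The same behavior can be reproduced without Spark. The sketch below uses Python's built-in map() as a stand-in for the RDD's map() (an assumption made for illustration; the sample pairs are borrowed from the question's output), since both call the function with one element at a time:

```python
# Plain-Python stand-in for the RDD: a list of (word, count) tuples.
pairs = [('self', 111), ('employment', 75)]

# A two-parameter lambda receives ONE tuple per element, so the call fails.
try:
    list(map(lambda x, y: (y, x), pairs))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A one-parameter lambda receives the whole tuple and can index into it.
swapped = list(map(lambda item: (item[1], item[0]), pairs))

print(error_message)
print(swapped)
```

The first print shows the TypeError text raised by the two-parameter lambda; the second shows the swapped pairs produced by the one-parameter version.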

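The error with lambda x,y : y,x happens because the return value is not wrapped in parentheses. A lambda body ends at the first unparenthesized comma, so Python reads this as two separate expressions (the lambda lambda x,y : y, followed by x) instead of a lambda that returns a single tuple.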
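To see how Python parses the unparenthesized form, a small check like the following may help (plain Python, no Spark required; the variable names are illustrative only):

```python
# Without parentheses, the lambda body stops at the first comma,
# so this builds a 2-tuple: (a lambda that returns y, the integer 0).
unparenthesized = (lambda x, y: y, 0)

# With parentheses around the body, the lambda itself returns a tuple.
parenthesized = lambda x, y: (y, x)

print(type(unparenthesized).__name__, len(unparenthesized))
print(parenthesized('self', 111))
```

Note that parenthesized still takes two arguments, so it demonstrates only the return-value syntax; it would still fail inside map(), which passes a single tuple.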

The Correct Lambda Syntax

You have two ways to fix this, depending on your Python version:

  1. Index into the tuple (universally compatible across Python versions):
    Access the first and second elements of the tuple directly using square brackets. This is straightforward and easy to read:

    sortedwords = wordcount.map(lambda item: (item[1], item[0])).sortByKey()
    
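  2. Destructure the tuple in the lambda parameters (concise, but Python 2 only):
    If you prefer a cleaner style, Python 2 lets you unpack the tuple directly in the lambda's parameter list. Python 3 removed tuple parameters (PEP 3113), so the line below raises a SyntaxError there; in Python 3, unpack inside a named function's body or use indexing instead. Your error message is in the Python 3 format, so this option likely won't work in your notebook.

    # Python 2 only; SyntaxError in Python 3
    sortedwords = wordcount.map(lambda (word, count): (count, word)).sortByKey()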
    

    That said, the index method is safer for cross-version compatibility.
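For Python 3, two alternatives to indexing are sketched below: a named function that unpacks the pair in its body, and a slice-based reversal. Both are plain-Python illustrations; swap_pair and swap_by_slice are hypothetical names, and either callable could be passed to wordcount.map() in place of the lambda.

```python
def swap_pair(pair):
    """Return (count, word) for a (word, count) pair."""
    word, count = pair  # unpack inside the body; valid in Python 2 and 3
    return (count, word)

# Slicing with [::-1] reverses a tuple, so it also swaps a 2-element pair.
swap_by_slice = lambda pair: pair[::-1]

sample = [('self', 111), ('employment', 75)]
print([swap_pair(p) for p in sample])
print([swap_by_slice(p) for p in sample])
```

The named function is a little longer but keeps the descriptive names word and count, which is the main appeal of the destructuring style.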

Full Corrected Code

Here’s your complete script with the fix applied:

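import re

def normalizewords(text):
    # Lowercase the line and split it on runs of non-word characters
    return re.compile(r'\W+', re.UNICODE).split(text.lower())

# sc is assumed to be the SparkContext already available in the notebook
inputs = sc.textFile('Book.txt')
words = inputs.flatMap(normalizewords)
# Count the occurrences of each word
wordcount = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)
# Fixed line below: swap (word, count) to (count, word), then sort by count
sortedwords = wordcount.map(lambda item: (item[1], item[0])).sortByKey()
sortedwords.collect()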

Expected Output

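After applying the fix, sortedwords.collect() returns (count, word) pairs in the format you requested. Because sortByKey() sorts in ascending order by default, the pairs come back ordered from the lowest count to the highest. For the eight words in your example, the ordering would be:

[(26, 'internet'), (33, 'building'), (75, 'employment'), (100, 'one'), (111, 'self'), (178, 'an'), (383, 'business'), (970, 'of')]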

The question comes from Stack Exchange; it was asked by BetaTester.