Help needed: finding the Top 10 most frequent words in a text with Python (I'm familiar with C#/Java/C++)

阿华AIGC实验室

2026-5-20

Solving the sorting problem when counting the Top 10 most frequent words in Python

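Hey there! As someone coming to Python from C#/Java/C++, your approach is right on track: counting word frequencies with a dictionary is the most direct way to start. Getting stuck on the sorting is perfectly normal, since sorting a dictionary in Python works differently from what you're used to in statically typed languages 😉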

Let me fill in the whole workflow first, with the focus on the sorting problem:

Step 1: Count the word frequencies (your idea is already right; this just tidies up the code)

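First, preprocess the text: convert it to lowercase and strip punctuation so that "Hello" and "hello" are not counted as different words. Then count how many times each word occurs: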

# Sample text; replace it with your own content
text = "Python is great, python is easy! I love python, it's better than C++ sometimes."

# Preprocess: lowercase, split into words, strip leading/trailing punctuation
words = [word.strip('.,!?\'\"') for word in text.lower().split()]

# Count word frequencies (two options; pick whichever feels natural)
# Option 1: your original approach, more explicit
word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

# Option 2: a more Pythonic shortcut that uses dict.get() to skip the membership check
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1
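
One edge case in the preprocessing is worth a quick sketch: a token made up entirely of punctuation (a stray "!" or "...") becomes an empty string after strip() and would then be counted as a word. The helper below is only an illustration; the names tokenize and STRIP_CHARS are made up for this example, and it assumes the same punctuation set as the code above.

```python
# Hypothetical helper (the names are illustrative, not from the original post).
# Same punctuation set as the list comprehension above.
STRIP_CHARS = ".,!?'\""

def tokenize(text):
    """Lowercase, split on whitespace, strip edge punctuation, drop empty tokens."""
    stripped = (token.strip(STRIP_CHARS) for token in text.lower().split())
    # A token such as "!" strips down to "", so filter those out
    return [word for word in stripped if word]

print(tokenize("Hello , hello ! It's C++ ..."))
# ['hello', 'hello', "it's", 'c++']
```

Note that strip() only touches the two ends of a token, so the apostrophe inside "it's" and the plus signs in "c++" are left alone.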

Step 2: Solve the sorting problem and extract the Top 10 words

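A regular Python dict preserves insertion order (guaranteed since 3.7), but it is never ordered by value. So we take its key-value pairs and pass them to sorted() with an explicit sort key, which gives us a sorted list: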

# Sort by frequency in descending order; the result is a list of (word, count) tuples
sorted_word_counts = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)

# Take the first 10
top_10_words = sorted_word_counts[:10]

# Print the result
print("Top 10 most frequent words:")
for word, count in top_10_words:
    print(f"- {word}: {count} times")

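Here is what the key parts do:

word_counts.items() returns a view of the dictionary's (key, value) pairs; sorted() turns it into a list such as [('python', 3), ('is', 2), ...]
key=lambda item: item[1] tells sorted() to order by the second element of each pair (the frequency)
reverse=True sorts in descending order, so the most frequent words come first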
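
One thing the sort above leaves open is how ties are ordered. sorted() is stable, so words with the same count simply keep the order in which they were inserted into the dict. If you want a deterministic tie-break (roughly what a chained comparator gives you in Java or C#), a tuple key is one way to do it. The snippet below is a minimal sketch with made-up sample counts, not part of the original solution:

```python
# Made-up sample counts, for illustration only
word_counts = {"is": 2, "python": 3, "easy": 1, "great": 1, "love": 2}

# Sort by count descending (negated count), then by word ascending for ties
ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))

for word, count in ranked[:3]:
    print(f"- {word}: {count} times")
# - python: 3 times
# - is: 2 times
# - love: 2 times
```

Negating the count avoids reverse=True, which would also reverse the alphabetical order of the tied words.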

The lazy upgrade: let the standard library do it

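If you'd rather write less code, collections.Counter from the standard library is built for exactly this kind of frequency counting and does the job in one step: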

from collections import Counter

# Preprocess as before to get the words list
top_10_words = Counter(words).most_common(10)

# Print the result directly
for word, count in top_10_words:
    print(f"- {word}: {count} times")

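Counter.most_common(n) returns the n most frequent elements as (element, count) pairs, ordered from most to least common. For a beginner it is both efficient and hard to get wrong.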
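
If it helps to connect the two approaches: Counter is a subclass of dict, so everything from the manual version still applies to it. The short sketch below uses a made-up word list and checks that the sorted() approach and most_common() agree on it.

```python
from collections import Counter

# Made-up word list, for illustration only
words = ["python", "is", "great", "python", "is", "easy", "python"]
counts = Counter(words)

# Counter is a dict subclass, so normal key lookup works
print(counts["python"])  # 3
# Unlike a plain dict, a missing key gives 0 instead of raising KeyError
print(counts["rust"])    # 0

# The manual sort and the built-in shortcut give the same Top 2 here
manual_top = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:2]
shortcut_top = counts.most_common(2)
print(manual_top == shortcut_top)  # True
```

In practice that means you can start with Counter for the counting and still fall back to sorted() with a custom key whenever you need a different ordering.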