How to Get Course Statistics That Meet a Condition by Grouping a Pandas DataFrame

阿华AIGC实验室

2026-5-26

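Solving the pandas grouping and counting problems

Hey, let me help you sort out these two grouping questions! We'll break them down step by step, so you get not only the maximum value but also the group it belongs to: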

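Question 1: Find the course with the most registered students

The idea is to group by course, count the students in each group, and then locate the course with the highest count:

Count the registered students for each course. count() counts every non-null student_id row, so if the same student can be registered for the same course more than once, replace count() with nunique() to count each student only once: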

import pandas as pd

# Assumes your DataFrame is named df and has 'course' and 'student_id' columns
course_student_counts = df.groupby('course')['student_id'].count()
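To make the difference between count() and nunique() concrete, here is a minimal sketch. The sample data is made up for illustration; only the column names student_id and course follow the snippets above.

```python
import pandas as pd

# Hypothetical sample data: student 1 is registered for "Math" twice.
df = pd.DataFrame({
    "student_id": [1, 1, 2, 3, 3],
    "course": ["Math", "Math", "Math", "Physics", "Chemistry"],
})

# count() counts every non-null row in the group, duplicates included.
row_counts = df.groupby("course")["student_id"].count()

# nunique() counts each distinct student once per course.
unique_counts = df.groupby("course")["student_id"].nunique()

print(row_counts["Math"])     # 3 rows
print(unique_counts["Math"])  # 2 distinct students
```

Which one you want depends on whether a repeated registration should count as an extra student.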

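Get the course with the most students and its student count:

# Name of the course with the most students
top_course = course_student_counts.idxmax()
# The corresponding student count
top_course_student_num = course_student_counts.max()

print(f"Course with the most registered students: {top_course}, total students: {top_course_student_num}")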
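One thing to watch: idxmax() returns only the first label that holds the maximum, so a tie between courses is silently reduced to one result. If ties matter to you, a sketch like the following lists every course at the top. The counts here are invented for illustration.

```python
import pandas as pd

# Hypothetical per-course counts in which two courses tie for the maximum.
course_student_counts = pd.Series({"Math": 5, "Physics": 5, "Chemistry": 3})

# idxmax() returns only the first label holding the maximum.
first_top = course_student_counts.idxmax()

# Comparing against max() keeps every course that reaches it.
max_count = course_student_counts.max()
tied_top_courses = course_student_counts[
    course_student_counts == max_count
].index.tolist()

print(first_top)         # Math
print(tied_top_courses)  # ['Math', 'Physics']
```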

If you want to see the student counts for all courses in sorted order, add:

# Sort all courses by student count, descending
sorted_course_counts = course_student_counts.sort_values(ascending=False)
print(sorted_course_counts)

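Question 2: Among the two sections with the most students, find which course has the most registered students

This takes three steps:

First, find the two sections with the most students: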

# Count the students in each section
section_student_counts = df.groupby('section')['student_id'].count()
# Take the names of the 2 sections with the most students
top_two_sections = section_student_counts.nlargest(2).index.tolist()

Filter the data down to the rows that belong to these two sections:

top_sections_data = df[df['section'].isin(top_two_sections)]

In the filtered data, count the students for each course and pick the largest:

course_in_top_sections_counts = top_sections_data.groupby('course')['student_id'].count()
top_course_in_sections = course_in_top_sections_counts.idxmax()
top_course_in_sections_num = course_in_top_sections_counts.max()

print(f"Among the two largest sections, the course with the most registrations is: {top_course_in_sections}, total students: {top_course_in_sections_num}")
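Putting the three steps together, here is a self-contained sketch you can run as-is. The helper name top_course_in_top_sections and the sample data are hypothetical, chosen only for illustration; the column names match the snippets above.

```python
import pandas as pd


def top_course_in_top_sections(df, n_sections=2):
    """Return (course, count) for the most-registered course within
    the n_sections sections that have the most students."""
    # Step 1: student count per section, then the largest n_sections.
    section_counts = df.groupby("section")["student_id"].count()
    top_sections = section_counts.nlargest(n_sections).index.tolist()

    # Step 2: keep only the rows belonging to those sections.
    subset = df[df["section"].isin(top_sections)]

    # Step 3: student count per course inside the subset.
    course_counts = subset.groupby("course")["student_id"].count()
    return course_counts.idxmax(), int(course_counts.max())


# Hypothetical data: section A has 4 students, B has 3, C has 2.
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "section": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "course": ["Math", "Math", "Physics", "Math",
               "Physics", "Physics", "Math",
               "Chemistry", "Chemistry"],
})

result = top_course_in_top_sections(df)
print(result)  # ('Math', 4)
```

Sections A and B are selected, and within them Math has 4 registrations against 3 for Physics. Section C's Chemistry rows are excluded by the filter.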

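Tip: if several sections tie on student count, nlargest(2) still returns exactly two of them, because its default keep='first' keeps the tied entries that appear first. If you want every section tied at the cutoff to be included (say 3 sections all have 10 students and you want all 3), use section_student_counts.nlargest(2, keep='all').index.tolist(). The alternative section_student_counts.sort_values(ascending=False).head(2).index.tolist() also returns exactly two sections, but which of the tied ones you get is not guaranteed.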
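Tie handling in nlargest is controlled by its keep parameter. Here is a minimal sketch with invented section counts, in which three sections tie at 10 students:

```python
import pandas as pd

# Hypothetical section counts: A, B and C all have 10 students.
section_student_counts = pd.Series({"A": 10, "B": 10, "C": 10, "D": 4})

# Default keep='first': exactly 2 sections, ties resolved by order of appearance.
strict_top_two = section_student_counts.nlargest(2).index.tolist()

# keep='all': every section tied with the last kept value is included.
top_two_with_ties = section_student_counts.nlargest(2, keep="all").index.tolist()

print(len(strict_top_two))     # 2
print(len(top_two_with_ties))  # 3
```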

The question in this post comes from Stack Exchange and was asked by Chittu.
