Why does Python CPU-bound performance differ between Windows and WSL Ubuntu on the same machine?
Background
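I wrote a test script to compare single-threaded, multithreaded and multiprocess performance in Python 3. I ran it on the same machine, an AMD Ryzen 3 3200G (3.6 GHz, 4 cores, 4 logical processors), once in Windows 10 PowerShell and once in a WSL Ubuntu 18.04 terminal. In all three modes the Ubuntu run took roughly 30% less time than the Windows run (more than a second less for the single-threaded and multithreaded cases, about half a second less for multiprocessing). Why is there such a difference?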
Test code
# import libraries
from multiprocessing import Pool
import time
import threading


def calculate_sum_upto(n):
    total = 0
    for i in range(n):
        total += i
    # print("Sum : " + str(total))


def test_all(limit):
    print("\nFor sum of series upto : " + str(limit))

    # Define input case, that is an array of numbers
    array_of_numbers = [limit for i in range(8)]

    # Adding time for performance calculation
    start_time_1 = time.time()

    # First, let's try using raw approach
    # print("\nStarting Raw approach...\n")
    for num in array_of_numbers:
        calculate_sum_upto(num)
    # print("result obtained using raw approach : " + str(super_sum_raw))
    # print("\nRaw approach finished.")
    end_time_1 = time.time()

    start_time_2 = time.time()

    # Now trying using parallel processing
    # print("\n\nStarting multiprocessing approach...\n")
    pool = Pool()
    super_sum_optimized_values = pool.map(calculate_sum_upto, array_of_numbers)
    pool.close()
    pool.join()
    # print("result obtained using parallel processing approach : " + str(super_sum_optimized))
    # print("\nParallel Processing approach finished.")
    end_time_2 = time.time()

    start_time_3 = time.time()

    # Trying using general threading approach
    # print("\n\nStarting Threading approach...\n")
    thread_array = [threading.Thread(target=calculate_sum_upto, args=(num,)) for num in array_of_numbers]
    for thread in thread_array:
        thread.start()
    for thread in thread_array:
        thread.join()
    # print("\nThreading approach finished.\n\n")
    end_time_3 = time.time()

    # Printing results
    print("\nRaw approach : {:10.5f}".format(end_time_1 - start_time_1))
    print("Multithreading approach : {:10.5f}".format(end_time_3 - start_time_3))
    print("Multiprocessing approach : {:10.5f}".format(end_time_2 - start_time_2))


if __name__ == "__main__":
    # print("This test bench records time for calculating sum of series upto n terms for 4 numbers using 3 approaches : \n1 : Linear calculation for each number one after the other.\n2 : Calculating sum of series for 4 numbers on 4 different threads.\n3 : Calculating sum of series for 4 numbers on 4 different processes.")
    # print("For simplicity, all 4 numbers have the same value, i.e. sum of series upto n terms for m, 4 times.")
    n = 10000
    # for i in range(5):
    #     test_all(n)
    #     n *= 10
    test_all(10000000)
    print("\n\nEnd of test.")
Results
Windows:
For sum of series upto : 10000000
Raw approach : 5.08537
Multithreading approach : 5.52041
Multiprocessing approach : 1.40911
WSL Ubuntu:
For sum of series upto : 10000000
Raw approach : 3.60763
Multithreading approach : 3.70080
Multiprocessing approach : 0.93371
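To put the two result listings side by side, the short sketch below computes how much less time the WSL run took in each mode, and how much multiprocessing helped on each system. The timings are copied from the output above; the dictionary layout and the helper name `relative_saving` are only illustrative choices, not part of the original script.

```python
# Timings in seconds, copied from the two result listings above.
windows = {"raw": 5.08537, "threading": 5.52041, "multiprocessing": 1.40911}
wsl = {"raw": 3.60763, "threading": 3.70080, "multiprocessing": 0.93371}


def relative_saving(slow, fast):
    """Fraction of wall time saved by the faster run, relative to the slower one."""
    return (slow - fast) / slow


# Per-mode saving of WSL over Windows.
savings = {name: relative_saving(windows[name], wsl[name]) for name in windows}

# Speedup of multiprocessing over the single-threaded run on each system.
speedup_windows = windows["raw"] / windows["multiprocessing"]
speedup_wsl = wsl["raw"] / wsl["multiprocessing"]

for name, saving in savings.items():
    diff = windows[name] - wsl[name]
    print(f"{name:16s} {diff:6.3f} s less on WSL ({saving:.0%} of the Windows time)")

print(f"multiprocessing speedup on Windows: {speedup_windows:.2f}x")
print(f"multiprocessing speedup on WSL:     {speedup_wsl:.2f}x")
```

Looking at relative savings rather than absolute seconds makes the three modes comparable, since the multiprocessing run is much shorter to begin with.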
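Core reasons for the performance difference
This comes down to the combined effect of low-level operating system mechanisms, how the Python interpreter is built on each platform, and where WSL's optimization effort has gone. It breaks down as follows: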
Platform differences in the Python interpreter build
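The CPython you run on Windows and the CPython inside WSL are two separate builds of the interpreter. The standard Windows build is compiled with MSVC, while Ubuntu's native Linux build is compiled with GCC, which among other things lets CPython's bytecode loop use computed gotos (MSVC does not support them). A pure CPU-bound loop like yours spends almost all of its time in that bytecode loop, so build-level differences of this kind show up directly in the timings. It is also worth checking that both environments run the same Python version, since interpreter speed changes between releases.
Differences in scheduling and resource allocation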
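Linux's CFS (Completely Fair Scheduler) handles CPU-bound tasks efficiently, and process creation and context switching are generally cheaper on Linux than on Windows, which matters most in the multiprocess case. Python's multiprocessing module reflects this: on Linux it starts workers with fork, while on Windows it has to spawn a fresh interpreter for each worker and re-import the main module. Windows also tends to run more background services and processes, which take a share of CPU time. If your WSL is version 2, it runs a real Linux kernel inside a lightweight virtual machine, so resource usage is close to native Linux. Under WSL 1, Linux processes are still scheduled by the Windows kernel, so the scheduler argument applies only to WSL 2.
Memory management tuning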
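Linux's memory manager, in how it handles contiguous allocation and caching, tends to suit compute-heavy workloads, which can make the interpreter's memory accesses somewhat cheaper. Windows memory management has to balance desktop applications, the graphical interface and other scenarios, and is less specifically tuned for pure computation. This is probably a minor factor for a loop like yours, because CPython serves small objects from its own allocator rather than going to the operating system each time.
WSL-specific performance work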
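Microsoft has put substantial work into WSL 2 performance, particularly for CPU and memory, so code running in it performs close to native Linux. The Windows Python environment may also pay for extra layers in system calls and file access, although a pure compute loop like this one makes few such calls, so this is unlikely to be the main cause here.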
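Several of the points above depend on details of each environment: which compiler built the interpreter, which Python version is running, which kernel is underneath, and how multiprocessing starts its workers (fork is the default on Linux, spawn on Windows). A minimal sketch for collecting those details is shown below; run it once in PowerShell and once in the WSL terminal and compare the output. The function name `describe_environment` is hypothetical. It relies on two assumptions that the original post does not state: that the kernel release string under WSL contains "microsoft" (and "WSL2" under WSL 2), and that `CONFIG_ARGS` is typically `None` on Windows builds.

```python
import multiprocessing
import os
import platform
import sysconfig


def describe_environment():
    """Collect interpreter and OS details relevant to the comparison."""
    return {
        "implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
        # Compiler that built this interpreter, e.g. an MSC or GCC string.
        "compiler": platform.python_compiler(),
        "system": platform.system(),
        # Assumption: under WSL this string usually contains "microsoft".
        "kernel_release": platform.release(),
        "logical_cpus": os.cpu_count(),
        # "fork" by default on Linux, "spawn" on Windows.
        "mp_start_method": multiprocessing.get_start_method(),
        # Build-time configure flags; typically None on Windows builds.
        "configure_args": sysconfig.get_config_var("CONFIG_ARGS"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:16s}: {value}")
```

This only reports the environment; it does not measure anything. If the two outputs show different Python versions or compilers, that alone could account for part of the gap.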
Additional notes
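Note that your test is a typical CPU-bound workload, which plays to Linux's strengths. With an IO-bound workload (file reads and writes, network requests), the gap would shrink noticeably, and Windows may even do better in some IO scenarios. Also, on both systems multiprocessing gives roughly a 3.6x to 3.9x speedup over the single-threaded run, while multithreading gives none and is in fact slightly slower. That matches the behavior of Python's GIL (Global Interpreter Lock): threads cannot get past the GIL to use multiple cores for CPU-bound work, whereas separate processes each have their own interpreter and their own GIL, so they can use all the cores.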
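The GIL point can be checked in isolation with a smaller experiment. The sketch below is not the original test script; it swaps in `concurrent.futures` executors so that the thread and process cases share one code path, and it uses `time.perf_counter` for timing. The names `sum_upto` and `timed_map`, the worker count of 4, and the reduced input size are all illustrative assumptions.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_upto(n):
    """CPU-bound pure-Python loop, the same shape as calculate_sum_upto."""
    total = 0
    for i in range(n):
        total += i
    return total


def timed_map(executor_cls, func, inputs, workers=4):
    """Run func over inputs on the given executor; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(func, inputs))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # Smaller than the original 10,000,000 to keep the run short.
    inputs = [2_000_000] * 8
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        seconds, _ = timed_map(cls, sum_upto, inputs)
        print(f"{cls.__name__:20s} {seconds:8.3f} s")
```

On a multi-core machine the thread pool should take about as long as running the eight calls one after another, while the process pool should finish in a fraction of that time. The `if __name__ == "__main__":` guard matters on Windows, where each worker process re-imports the main module.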
The question comes from Stack Exchange; it was asked by Shivang Gangadia.




