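PySpark on Windows 11: Python worker exits unexpectedly (connection reset)
Hi everyone, I'm stuck running a PySpark test script on Windows 11. The Python worker exits unexpectedly and reports a connection reset. I've spent quite a while on it without getting anywhere, so I'm hoping someone can point me in the right direction!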
My environment
- OS: Windows 11
- Apache Spark: 4.0.1
- Java: 17.0.12 2024-07-16 LTS
- Python version I specified: 3.11.9
- Hadoop: 3.3.6
- Runtime: virtual environment (.venv)
- Note: the error output contains a system path to Python 3.13, which also puzzles me...
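One way to see which interpreters are in play is a small stdlib-only check run inside the activated .venv. It assumes (not verified for this exact setup) that when PYSPARK_PYTHON is unset, PySpark starts workers with an executable named `python3` looked up on PATH. On Windows that name can resolve to the Microsoft Store install under `C:\Program Files\WindowsApps\...`, which would line up with the 3.13 path in the traceback. The helper name `interpreter_report` is hypothetical.

```python
import os
import shutil
import sys


def interpreter_report() -> dict:
    """Collect the interpreters the driver and a default worker lookup might use."""
    return {
        # Interpreter running this script (the driver); expected to be .venv\Scripts\python.exe
        "driver": sys.executable,
        "driver_version": sys.version.split()[0],
        # Explicit worker override, if one is set (None otherwise)
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        # What bare names resolve to on PATH; "python3" may be the Store alias on Windows
        "which_python": shutil.which("python"),
        "which_python3": shutil.which("python3"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key:15} {value}")
```

If `which_python3` points into WindowsApps while `driver` points into `.venv`, that mismatch is worth ruling out first.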
Reproduction code
import os
from pyspark.sql import SparkSession

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"
os.environ["SPARK_LOCAL_HOSTNAME"] = "localhost"

spark = SparkSession.builder.appName("PySpark Test").getOrCreate()
data = [("Alice", 1), ("Bob", 2)]
df = spark.createDataFrame(data, ["Name", "ID"])
df.show()
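A variant of the repro worth trying pins the worker interpreter to the one running the script, using the standard `PYSPARK_PYTHON` environment variable set before `getOrCreate()` starts Spark. This is a sketch under the assumption that the worker is currently picking up a different Python; the function names are hypothetical.

```python
import os
import sys


def pin_pyspark_python(env=None):
    """Point PySpark worker processes at the interpreter running this script."""
    env = os.environ if env is None else env
    env["PYSPARK_PYTHON"] = sys.executable
    return env


def run_repro():
    """Same repro as in the post, with the worker interpreter pinned first."""
    pin_pyspark_python()
    os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
    os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"
    os.environ["SPARK_LOCAL_HOSTNAME"] = "localhost"

    # pyspark is imported lazily; the env vars only need to exist before getOrCreate()
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PySpark Test").getOrCreate()
    try:
        df = spark.createDataFrame([("Alice", 1), ("Bob", 2)], ["Name", "ID"])
        df.show()
    finally:
        spark.stop()
```

Calling `run_repro()` from `src/test.py` should either print the two-row table or fail with a different, hopefully more specific, error.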
Full error output
(.venv) PS D:\coding\source\test_tasks\test_tasks\products_spark> python -m src.test
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception ignored in: <_io.BufferedRWPair object at 0x00000186DF5A5880>
Traceback (most recent call last):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.13_3.13.2032.0_x64__qbz5n2kfra8p0\Lib\socket.py", line 737, in write
OSError: [WinError 10038] An operation was attempted on something that is not a socket
25/09/28 18:29:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed). Consider setting 'spark.sql.execution.pyspark.udf.faulthandler.enabled' or 'spark.python.worker.faulthandler.enabled' configuration to 'true' for the better Python traceback.
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:621)
...
Caused by: java.net.SocketException: Connection reset
at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)
... 26 more
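The exception text itself suggests enabling the Python faulthandler. A minimal sketch that sets both keys, copied verbatim from the message, on the session builder could look like this; `build_debug_session` is a hypothetical helper name, and whether the dumped traceback pinpoints this particular crash is not guaranteed.

```python
# Config keys taken verbatim from the SparkException message in the log
FAULTHANDLER_CONFS = {
    "spark.python.worker.faulthandler.enabled": "true",
    "spark.sql.execution.pyspark.udf.faulthandler.enabled": "true",
}


def build_debug_session(app_name="PySpark Test"):
    """Create a SparkSession that asks Python workers to report a traceback on crash."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in FAULTHANDLER_CONFS.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

These settings have to be in place before the SparkContext is created, so a fresh Python process is the safest place to try them.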
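What I've already tried
- Explicitly set SPARK_LOCAL_IP and SPARK_LOCAL_HOSTNAME to local addresses to rule out network connection problems
- Enabled PYARROW_IGNORE_TIMEZONE to rule out timezone-related PyArrow compatibility issues
- Confirmed the virtual environment activates correctly and all dependencies are installed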
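Confirming that the venv is active only says something about the driver process; the worker is a separately launched interpreter. A quick stdlib check for the driver side relies on the documented venv behavior that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment; `in_virtualenv` is a hypothetical name.

```python
import sys


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix is the env dir, sys.base_prefix the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("prefix:     ", sys.prefix)
    print("interpreter:", sys.executable)
```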
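Questions
- I'm using a Python 3.11.9 virtual environment, so why does the error show the system Python 3.13 path? Could Spark be calling the system Python by default and causing a version conflict?
- What usually causes WinError 10038 ("not a socket") in a PySpark setting?
- Is there a targeted configuration or troubleshooting approach that can pin down the actual cause of the Python worker crash?
Any help would be much appreciated, I'm completely stuck here 😭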