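How do you find deadlocked threads in Python (including Django programs)?
Hey, let me walk you through how to track down deadlocked threads in Python (and Django). As you mentioned, Java has ThreadMXBean, which can find deadlocked threads directly. Python has no built-in tool that direct, but there are a few practical approaches: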
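1. Dump all thread stacks with faulthandler (simplest and most effective)
faulthandler has been part of the standard library since Python 3.3. It dumps the call stacks of all threads in one go, and those stacks show which threads are stuck waiting for a lock, which makes it the first tool to reach for when hunting deadlocks.
Usage:
Add the following to your Django project's entry point (for example wsgi.py or manage.py):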
```python
import faulthandler
import signal

# Dump the stacks of all threads whenever the process receives SIGQUIT (Ctrl+\)
faulthandler.register(signal.SIGQUIT)
```
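When you suspect a deadlock, press Ctrl+\ in the terminal running the project (or send the signal with kill -QUIT <pid>) to get the stack of every thread. In the test scenario, you'll see thread1 stopped on its with b_lock: line and thread2 stopped on its with a_lock: line, each waiting for the lock the other holds: the classic deadlock signature. The lock acquisition itself is implemented in C, so it does not appear as a separate Python frame. Note that faulthandler.register() is not available on Windows.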
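If sending a signal isn't an option (the process runs detached, or you're on Windows), faulthandler.dump_traceback_later() can act as a watchdog that dumps all thread stacks if some piece of work hangs. Below is a minimal sketch, assuming you can wrap the suspect work in a function; run_with_hang_dump is a hypothetical helper name, not part of faulthandler. Only one such timer can be pending per process: a second call replaces the first.

```python
import faulthandler
import sys


def run_with_hang_dump(func, timeout, dump_file=None):
    """Call func(). If it is still running after `timeout` seconds, faulthandler
    writes the stacks of all threads to dump_file once, without exiting.

    run_with_hang_dump is a hypothetical helper, not a faulthandler API.
    dump_file must be a real file with a fileno() (io.StringIO will not work).
    """
    if dump_file is None:
        dump_file = sys.stderr
    faulthandler.dump_traceback_later(timeout, repeat=False, file=dump_file, exit=False)
    try:
        return func()
    finally:
        # The work finished (or raised), so disarm the pending dump
        faulthandler.cancel_dump_traceback_later()
```

If func() returns before the timeout, nothing is written; if it hangs, the dump shows where every thread is blocked while func() keeps running.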
2. Analyze thread stacks and lock ownership yourself (suited to automated detection)
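If you'd rather detect deadlocks proactively than trigger a dump by hand, the idea is to build a wait-for graph: which lock each thread is blocked on, and which thread holds that lock. sys._current_frames() gives you each thread's stack, but the ownership data has to come from somewhere else: threading.Lock does not record its holder, and in CPython its acquire() is implemented in C, so no Python frame named acquire ever shows up on a waiting thread's stack. The detection function below therefore assumes the locks you care about are created through a small wrapper, TrackedLock (a hypothetical helper, not a standard-library class), that records this information itself:
```python
import sys
import threading
import traceback

# Assumption: threading.Lock does not record which thread holds it, and its
# acquire() runs in C, so it leaves no Python frame to inspect. This hypothetical
# TrackedLock wrapper records owner/waiter information itself.
_registry_guard = threading.Lock()
_owners = {}   # TrackedLock -> ident of the thread that holds it
_waiting = {}  # thread ident -> TrackedLock the thread is blocked on


class TrackedLock:
    """Minimal stand-in for threading.Lock (acquire/release/with) that records ownership."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, blocking=True, timeout=-1):
        me = threading.get_ident()
        with _registry_guard:
            _waiting[me] = self
        acquired = False
        try:
            acquired = self._lock.acquire(blocking, timeout)
        finally:
            with _registry_guard:
                _waiting.pop(me, None)
                if acquired:
                    _owners[self] = me
        return acquired

    def release(self):
        # Clear ownership before releasing, so a new owner is never overwritten
        with _registry_guard:
            _owners.pop(self, None)
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()


def detect_deadlocks():
    # Snapshot the wait-for graph: waiting thread -> thread holding that lock
    with _registry_guard:
        wait_for = {tid: _owners.get(lock) for tid, lock in _waiting.items()}
    deadlocked_thread_ids = set()
    for start in wait_for:
        # Follow the edges; revisiting a thread on the current path means a cycle
        path = []
        tid = start
        while tid is not None and tid not in path:
            path.append(tid)
            tid = wait_for.get(tid)
        if tid is not None:
            deadlocked_thread_ids.update(path[path.index(tid):])
    frames = sys._current_frames()
    if deadlocked_thread_ids:
        print("Deadlocked thread IDs:", deadlocked_thread_ids)
        # Print the call stack of each deadlocked thread
        for tid in deadlocked_thread_ids:
            if tid in frames:
                print(f"\nCall stack of thread {tid}:")
                traceback.print_stack(frames[tid])
    else:
        print("No deadlock detected")
    return deadlocked_thread_ids
```
Call detect_deadlocks() inside the server process, for example from a view or from a background thread started at startup. A Django management command runs in a separate process, so its sys._current_frames() cannot see the server's threads. Only locks created as TrackedLock are tracked, so swap them in for threading.Lock() in the code you want to monitor, and expect a little extra overhead on every acquire. A cycle among threads that acquire with a timeout may still resolve on its own. Avoid relying on private attributes such as _owner: plain Lock objects don't have one, and internals can change between Python versions.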
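sys._current_frames() is also handy on its own, without any lock instrumentation: combined with threading.enumerate(), it can label each stack with the thread's name and return the dump as a string you can send to your Django logger instead of the terminal. A minimal sketch; format_thread_stacks is a hypothetical helper name.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of every live thread's stack, labelled by thread name."""
    # Map thread idents to names so the dump is easier to read than raw idents
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, "<unknown>")
        chunks.append(f"--- {name} (ident={ident}) ---\n")
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)
```

Giving your worker threads descriptive names (threading.Thread(..., name="...")) makes this output much easier to scan.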
3. Troubleshooting example for your test code
I filled in your unfinished test code so the scenario is easier to follow:
```python
import threading
import time

from django.http import HttpResponse

a_lock = threading.Lock()
b_lock = threading.Lock()


def thread1():
    with a_lock:
        time.sleep(1)  # give thread2 time to acquire b_lock
        with b_lock:  # blocks: thread2 holds b_lock and waits for a_lock
            print("thread1 got both locks")


def thread2():
    with b_lock:
        time.sleep(1)  # give thread1 time to acquire a_lock
        with a_lock:  # blocks: thread1 holds a_lock and waits for b_lock
            print("thread2 got both locks")


def deadlock(request):
    t1 = threading.Thread(target=thread1)
    t2 = threading.Thread(target=thread2)
    t1.start()
    t2.start()
    return HttpResponse("Test threads started")
```
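After requesting this view to trigger the deadlock, dump the stacks with faulthandler: thread1 is stuck on its with b_lock: line and thread2 on its with a_lock: line, each waiting for the lock the other holds. That is direct evidence of a deadlock.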
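To see this without Django, the same two-lock scenario can run as a plain script. The sketch below is a reproduction under a few assumptions: it shortens the sleeps, marks the threads as daemons so the stuck pair doesn't keep the interpreter alive at exit, and captures the faulthandler dump into a string through a temporary file, because faulthandler writes to a real file descriptor. reproduce_and_dump is a hypothetical name.

```python
import faulthandler
import tempfile
import threading
import time

a_lock = threading.Lock()
b_lock = threading.Lock()


def thread1():
    with a_lock:
        time.sleep(0.2)  # give thread2 time to take b_lock
        with b_lock:  # blocks forever: thread2 holds b_lock
            print("thread1 got both locks")


def thread2():
    with b_lock:
        time.sleep(0.2)  # give thread1 time to take a_lock
        with a_lock:  # blocks forever: thread1 holds a_lock
            print("thread2 got both locks")


def reproduce_and_dump():
    """Start both threads, wait until they are stuck, and return the
    faulthandler dump of all threads as text."""
    for target in (thread1, thread2):
        # daemon=True so the deadlocked threads don't block interpreter exit
        threading.Thread(target=target, daemon=True).start()
    time.sleep(1)  # long enough for both threads to block on the second lock
    with tempfile.TemporaryFile() as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read().decode("utf-8", "replace")
```

Running print(reproduce_and_dump()) shows one stack ending in thread1 and one ending in thread2, each pointing at its inner with line.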
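Summary
Python has no API like Java's that returns the deadlocked threads in one call, but dumping stacks with faulthandler, or analyzing thread stacks together with lock ownership, will both locate a deadlock. In a Django project, start with faulthandler because it is simple and effective; if you need automated detection, build a helper along the lines of method 2.
The question comes from Stack Exchange and was asked by Kohei TAMURA.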




