如何捕获scipy.sparse.linalg.spsolve()抛出的Windows致命异常(访问违规)
问题描述
我在循环里用scipy.sparse.linalg.spsolve()求解迭代更新的线性方程组,之前遇到奇异矩阵问题时,用try-except块加Tikhonov正则化就能解决,但现在碰到了一个完全抓不住的崩溃,报错信息如下:
Windows fatal exception: access violation
Main thread:
Current thread 0x00000270 (most recent call first):
File "...\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py", line 203 in spsolve
File "...\curvefit.py", line 401 in solve_linear_system_with_regularization
File "...\curvefit.py", line 320 in smoothing_spline
File "...\curvefit.py", line 580 in run_curvefit
File "...\fitSplinesShapeConstraints.py", line 53 in fitSplinesShapeConstraints
File "...\main.py", line 107 in
...(后续调用栈省略)
Restarting kernel...
我现在的代码只能捕获MatrixRankWarning,但这个访问违规异常完全绕开了我的异常处理,想问问该怎么解决这个问题。
我的代码
import numpy as np import scipy.sparse as sp import scipy.sparse.linalg as la from scipy.optimize import minimize_scalar from scipy.sparse import identity import scipy.sparse from scipy.sparse.linalg.dsolve.linsolve import MatrixRankWarning # Import the custom warning class import warnings def solve_linear_system_with_regularization(A, bb): regularization_range = np.logspace(-20, 15, 36) # Regularization parameter range best_residual_norm = np.inf best_solution = None best_reg_param = None consecutive_error_increases_regularized_solution = 0 with warnings.catch_warnings(): # Convert warnings to exceptions warnings.simplefilter("error", MatrixRankWarning) try: parameters = la.spsolve(A, bb) return parameters except MatrixRankWarning as e: # If the matrix is singular, apply Tikhonov regularization if 'singular' in str(e): print("Matrix is singular. Applying Tikhonov regularization in an attempt to solve anyways...") for reg_param in regularization_range: try: print('trying with regularization parameter: ' + str(reg_param)) # Try solving the regularized system # Create a sparse identity matrix with the same shape as A identity_matrix = identity(A.shape[0], format='csr') A_reg = A + reg_param * identity_matrix x_reg = la.spsolve(A_reg, bb) # Regularized solution # Calculate the dot product result dot_product_result = np.dot(A.toarray(), x_reg) # Calculate the residual norm (error) of the solution residual_norm = np.linalg.norm(dot_product_result - bb) # Check if the residual norm is increasing if residual_norm > best_residual_norm: consecutive_error_increases_regularized_solution += 1 else: consecutive_error_increases_regularized_solution = 0 # Update best solution if the residual norm is smaller if residual_norm < best_residual_norm: best_solution = x_reg best_reg_param = reg_param best_residual_norm = residual_norm except MatrixRankWarning: # Skip if regularization causes singularity continue # Break out of the loop if residual_norm is increasing three times if consecutive_error_increases_regularized_solution >= 3: print('The error from adding more regularization seems to be increasing for the last 3 increases of the regularization parameter') break if best_solution is None: raise Warning("Unable to find a suitable regularization parameter.") print("The system of equations has been solved with a regularization parameter of:", best_reg_param) return best_solution else: # If it's another type of error, raise it # I never reach this point... raise e
解决方案思路
这种Windows访问违规属于底层C/C++层面的异常,它跳过了Python的异常机制直接触发系统级错误,常规try-except根本抓不到。可以从这几个方向入手解决:
1. 提前排查矩阵的异常情况
访问违规通常是矩阵本身有问题导致底层求解器崩溃,调用spsolve前先做这些检查:
- 验证维度匹配:确保
A.shape[0] == len(bb),避免维度不匹配引发的底层错误 - 强制转换稀疏格式:
spsolve对CSR/CSC格式支持最好,调用前加A = A.tocsr() - 检查数值异常:用
np.isinf(A.data).any()和np.isnan(A.data).any()排查是否有无限值或NaN,有的话先做清洗或缩放
2. 换用更健壮的迭代求解器
spsolve是直接调用LU分解的直接求解器,对病态矩阵容错性差,可以换成迭代求解器比如gmres或bicgstab,它们不仅鲁棒性更强,还会抛出Python层面的异常,更容易捕获:
# 示例:用gmres替代spsolve处理正则化后的矩阵 try: x_reg, info = la.gmres(A_reg, bb, tol=1e-6) if info == 0: # 求解成功的标志 # 后续计算残差等逻辑 pass else: # 求解未收敛,跳过当前正则化参数 continue except Exception as e: print(f"gmres求解失败: {str(e)}") continue
3. 捕获系统级异常(不推荐,仅作最后手段)
可以用ctypes模块捕获Windows的系统异常,但这属于hack手段,跨平台兼容性极差,只建议在Windows环境下临时用:
import ctypes # 禁用系统错误弹窗,避免程序卡死 ctypes.windll.kernel32.SetErrorMode(0x0001) # 定义Windows访问违规的错误码 SEH_ACCESS_VIOLATION = 0x80000003 try: x_reg = la.spsolve(A_reg, bb) except Exception as e: # 先处理常规Python异常 pass except ctypes.WinError as e: if e.winerror == SEH_ACCESS_VIOLATION: print("捕获到访问违规,跳过当前正则化参数") continue else: # 其他系统异常直接抛出 raise e
4. 保存崩溃时的矩阵用于调试
当崩溃发生时,把当前的矩阵和向量存到文件,后续单独加载分析问题根源:
try: x_reg = la.spsolve(A_reg, bb) except: # 保存崩溃现场的矩阵和向量 np.save("crash_A_reg.npy", A_reg.toarray()) np.save("crash_bb.npy", bb) # 重新抛出异常方便定位 raise
加载后可以检查矩阵是否有全零行、极端值等异常,针对性修复。
备注:内容来源于stack exchange,提问作者Brian




