Why does sklearn.linear_model.LogisticRegression perform so poorly with n_jobs=2?

阿华AIGC实验室

2026-5-26

Why n_jobs=2 Hurts Your Logistic Regression Performance on MNIST

Great question! Let's break down what's happening here and how to fix it. It comes down to some easy-to-miss details of how scikit-learn's LogisticRegression uses n_jobs.

1. `n_jobs` doesn't help (and can harm) your current solver/strategy

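When you switched to solver='lbfgs', scikit-learn automatically used the multinomial multi-class strategy. With the default multi_class='auto', it picks multinomial for any multiclass target unless the solver is liblinear (as of scikit-learn 1.5 the multi_class parameter itself is deprecated, and multinomial is the default multiclass behavior for solvers such as lbfgs). The multinomial approach optimizes a single loss function across all 10 MNIST classes, so it can't be split into independent per-class sub-tasks for parallelization.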
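To make the "single loss" point concrete, here is a sketch of the two objectives. It writes $W = (w_1, \dots, w_K)$ for the per-class weight vectors, $(x_i, y_i)$ for the $n$ training samples, and $\lambda$ for a generic L2 strength; intercepts are omitted, and scikit-learn's exact scaling of the penalty through C is an assumption left out here.

```latex
\begin{aligned}
L_{\text{multi}}(W) &= \frac{1}{n}\sum_{i=1}^{n}\Big[\log\sum_{k=1}^{K}\exp\big(w_k^\top x_i\big) - w_{y_i}^\top x_i\Big] + \frac{\lambda}{2}\sum_{k=1}^{K}\lVert w_k\rVert^2,\\
L_{\text{ovr}}(W) &= \sum_{k=1}^{K}\ell_k(w_k),\qquad
\ell_k(w_k) = \frac{1}{n}\sum_{i=1}^{n}\log\Big(1+\exp\big(-s_{ik}\,w_k^\top x_i\big)\Big) + \frac{\lambda}{2}\lVert w_k\rVert^2,
\end{aligned}
```

where $s_{ik} = +1$ if $y_i = k$ and $-1$ otherwise. The log-sum-exp term ties every $w_k$ to every other one, so the multinomial objective has to be minimized as one problem, while the OvR objective is a sum of $K$ terms that each depend on a single $w_k$, which is what makes it parallelizable. Both objectives are convex in $W$ (strictly convex for $\lambda > 0$).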

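Setting n_jobs=2 here doesn't speed up training at all: scikit-learn only uses n_jobs to parallelize over classes in one-vs-rest mode (and ignores it entirely for liblinear). With a multinomial lbfgs fit there is just one optimization task, yet it can still be dispatched through joblib, so n_jobs=2 may add overhead such as starting worker processes and copying the data to them. That overhead costs wall-clock time, but it cannot push lbfgs toward a worse solution: the whole optimization still runs in a single worker, and the regularized logistic loss is convex, so there is no worse local minimum to settle into. If accuracy dropped (not just speed), look for a configuration change that came along with n_jobs=2, such as a different multi_class setting or a fit that stopped before converging.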
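A quick way to check this on your own machine is to fit the same lbfgs model once per n_jobs value and compare fit time and predictions. This is a sketch that assumes scikit-learn's bundled load_digits dataset (8x8 digits, no download needed) as a stand-in for MNIST; timings will vary by machine, and only the predictions are expected to agree.

```python
# Sketch: fit the same lbfgs (multinomial) model with n_jobs=1 and n_jobs=2 and
# compare wall-clock time and predictions. load_digits stands in for MNIST here.
import time

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0  # load_digits pixel values range from 0 to 16
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for n_jobs in (1, 2):
    model = LogisticRegression(solver="lbfgs", max_iter=1000, n_jobs=n_jobs)
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    results[n_jobs] = (model, elapsed)
    print(f"n_jobs={n_jobs}: fit {elapsed:.3f}s, test accuracy {model.score(X_test, y_test):.4f}")

# Fraction of test samples on which the two fitted models predict the same label.
agreement = (results[1][0].predict(X_test) == results[2][0].predict(X_test)).mean()
print(f"prediction agreement: {agreement:.4f}")
```

If the n_jobs=2 run is slower while the accuracies match, the difference is scheduling overhead rather than a different solution.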

2. You might have accidentally switched to `ovr` mode

If you also set multi_class='ovr' (one-vs-rest) when you added n_jobs=2, that's a more likely culprit for lower accuracy. In ovr mode, n_jobs does parallelize training by fitting one independent "class vs. all others" model per class (up to 10 parallel tasks for MNIST). But although MNIST's ten classes are roughly balanced, each binary subproblem is not: about 10% "digit 0" vs. 90% "not 0". The binary models are also trained independently, so their scores are not jointly normalized the way multinomial (softmax) probabilities are. Together these usually make ovr somewhat less accurate than multinomial on MNIST.
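The sketch below shows both effects: the share of positives in each one-vs-rest subproblem, and an OvR model side by side with the default multinomial one. It assumes load_digits as a stand-in for MNIST and uses sklearn.multiclass.OneVsRestClassifier (a swapped-in wrapper, not the multi_class='ovr' flag) so that it also runs on releases where multi_class is deprecated. Which model scores higher on this small dataset is not guaranteed.

```python
# Sketch: how imbalanced each one-vs-rest subproblem is, and how an OvR model
# compares with the default multinomial model. load_digits stands in for MNIST.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Share of positives in each "class k vs. all others" binary problem.
positive_share = np.bincount(y_train) / len(y_train)
for k, share in enumerate(positive_share):
    print(f"class {k}: {share:.1%} positive, {1 - share:.1%} negative")

# OneVsRestClassifier fits one binary model per class; n_jobs=2 fits them in parallel.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000), n_jobs=2).fit(X_train, y_train)
multinomial = LogisticRegression(max_iter=1000).fit(X_train, y_train)

ovr_acc = ovr.score(X_test, y_test)
multi_acc = multinomial.score(X_test, y_test)
print(f"one-vs-rest test accuracy: {ovr_acc:.4f}")
print(f"multinomial test accuracy: {multi_acc:.4f}")
```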

Fixes to Get Your Model Back on Track

Let's resolve this quickly:

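Stick with the multinomial strategy: this is the same setup that worked well initially. With solver='lbfgs' and a multiclass target, scikit-learn already fits a multinomial model, so there is no need to force it; passing multi_class='multinomial' still works on older releases but is deprecated from scikit-learn 1.5 onward. Since parallelism doesn't help here, leave n_jobs at its default (None, i.e. a single job) or set n_jobs=1. Your code would look like:
```
from sklearn.linear_model import LogisticRegression

# With lbfgs, a multiclass target is fit as one multinomial (softmax) model.
# n_jobs stays at its default: there is only one optimization task to run.
model = LogisticRegression(solver='lbfgs', max_iter=1000)
```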
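Adjust resources if you insist on parallel training: n_jobs only helps with one-vs-rest, so switch to multi_class='ovr' (or, on scikit-learn 1.5 and later where multi_class is deprecated, wrap the model in sklearn.multiclass.OneVsRestClassifier, which takes its own n_jobs). Then make sure the machine has at least as many physical cores as n_jobs. On Google Compute Engine each vCPU is a single hardware thread, so an n1-standard-2 (2 vCPUs) gives you only one physical core and two workers there compete for it; n1-standard-4 is the smallest n1 standard shape with two physical cores. Just note that this strategy may still not match the multinomial mode's accuracy on MNIST.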
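Before picking n_jobs, it can help to compare logical CPUs with physical cores on the instance. This sketch relies on joblib.cpu_count and its only_physical_cores flag; choose_n_jobs is a hypothetical helper, and capping workers at the physical core count is a rule of thumb rather than a scikit-learn requirement.

```python
# Sketch: compare logical CPUs (vCPUs / hardware threads) with physical cores
# before choosing n_jobs for one-vs-rest training.
import os

import joblib


def choose_n_jobs(n_classes: int) -> int:
    """Hypothetical helper: cap OvR workers at both the class count and physical cores."""
    physical = joblib.cpu_count(only_physical_cores=True)
    return max(1, min(n_classes, physical))


logical = joblib.cpu_count()  # also respects container CPU limits
physical = joblib.cpu_count(only_physical_cores=True)
print(f"os.cpu_count(): {os.cpu_count()}, joblib logical: {logical}, physical cores: {physical}")
print(f"suggested n_jobs for 10 classes: {choose_n_jobs(10)}")
```

On an n1-standard-2 this would be expected to report one physical core, so the suggested value would be 1. The result can be passed straight to OneVsRestClassifier(LogisticRegression(), n_jobs=choose_n_jobs(10)).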
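Increase iteration limits: lbfgs stops after max_iter=100 iterations by default, which is often too few for MNIST; scikit-learn then emits a ConvergenceWarning and returns an under-fit model. Raising max_iter to 500 or 1000, and scaling the pixel values (for example to [0, 1]), gives the solver room to reach the optimum. This addresses an unconverged fit whatever n_jobs is set to; parallel overhead itself does not change where lbfgs converges.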
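To see whether convergence is actually the problem, you can record ConvergenceWarning during the fit instead of letting it scroll past. This sketch assumes load_digits as a stand-in for MNIST and uses a hypothetical helper, fit_and_report; whether the default fit warns on your scikit-learn version is not guaranteed, so the script reports it rather than assuming it.

```python
# Sketch: detect an under-converged lbfgs fit via ConvergenceWarning, then refit
# with scaled features and a larger max_iter. load_digits stands in for MNIST.
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def fit_and_report(model):
    """Hypothetical helper: fit `model`, return (test accuracy, ConvergenceWarning raised?)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X_train, y_train)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model.score(X_test, y_test), warned


# Default max_iter=100 on raw 0-16 pixel values.
baseline_acc, baseline_warned = fit_and_report(LogisticRegression())
# Pixels scaled to [0, 1] plus a larger iteration budget.
tuned_acc, tuned_warned = fit_and_report(
    make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
)
print(f"default: accuracy {baseline_acc:.4f}, ConvergenceWarning: {baseline_warned}")
print(f"tuned:   accuracy {tuned_acc:.4f}, ConvergenceWarning: {tuned_warned}")
```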