Why does sklearn.linear_model.LogisticRegression perform so poorly with n_jobs=2?
n_jobs=2 Hurts Your Logistic Regression Performance on MNIST

Great question! Let's break down what's happening here and how to fix it. This ties into some easy-to-miss details of scikit-learn's LogisticRegression behavior.
1. n_jobs doesn't help (and can harm) your current solver/strategy
When you switched to solver='lbfgs', scikit-learn automatically used the multinomial multi-class strategy, because multi_class='auto' selects it for every solver except liblinear whenever the target has more than two classes. The multinomial approach optimizes a single loss function over all 10 MNIST classes at once, so it can't be split into independent per-class sub-tasks for parallelization.
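For reference, the multinomial objective that lbfgs minimizes can be written as below. This assumes scikit-learn's default L2 penalty with inverse regularization strength C, K = 10 classes, and per-class weights w_k and intercepts b_k (notation chosen here for illustration). The softmax denominator ties every class's parameters together in each term, which is why the fit can't be split into ten independent jobs; the objective is also convex, so every local minimum is a global one.

$$
\min_{W,\,b}\; \frac{1}{2}\sum_{k=1}^{K}\lVert w_k\rVert_2^2 \;-\; C\sum_{i=1}^{n}\log\frac{\exp\!\left(w_{y_i}^\top x_i + b_{y_i}\right)}{\sum_{k=1}^{K}\exp\!\left(w_k^\top x_i + b_k\right)}
$$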
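Setting n_jobs=2 therefore doesn't speed up training at all: scikit-learn documents n_jobs as the number of cores used when parallelizing over classes with multi_class='ovr', and here there is only one model to fit. At best the setting does nothing; at worst it routes that single fit through joblib's parallel machinery and adds overhead such as worker startup and data copying. That overhead costs time, but it cannot make the model worse: the L2-regularized logistic regression loss is convex, so lbfgs reaches the same minimum (up to the solver's tolerance) whatever n_jobs is, and there is no worse local minimum to settle into. If accuracy dropped, look for another setting that changed at the same time, such as the one below.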
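A quick way to check this on your own setup is to fit the same lbfgs model with n_jobs=1 and n_jobs=2 and compare the results. The sketch below assumes scikit-learn's bundled load_digits dataset (8x8 digit images) as a small stand-in for MNIST so it runs without a download; the helper fit_and_score is hypothetical. Both runs should report the same accuracy and near-identical coefficients, with only the training time differing.

```python
import time

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# load_digits (8x8 images, 10 classes) stands in for MNIST here.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values from 0..16 to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_and_score(n_jobs):
    """Hypothetical helper: fit a multinomial lbfgs model, report accuracy and time."""
    model = LogisticRegression(solver="lbfgs", max_iter=1000, n_jobs=n_jobs)
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    return model, model.score(X_test, y_test), elapsed


model_1, acc_1, t_1 = fit_and_score(n_jobs=1)
model_2, acc_2, t_2 = fit_and_score(n_jobs=2)

# Same convex objective and same solver: the coefficients should agree
# closely, so only the wall-clock time is expected to differ.
coef_gap = np.max(np.abs(model_1.coef_ - model_2.coef_))
print(f"n_jobs=1: accuracy {acc_1:.4f}, {t_1:.2f}s")
print(f"n_jobs=2: accuracy {acc_2:.4f}, {t_2:.2f}s")
print(f"largest coefficient difference: {coef_gap:.2e}")
```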
2. You might have accidentally switched to ovr mode
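If you also set multi_class='ovr' (one-vs-rest) when adding n_jobs=2, that's the more likely culprit. In ovr mode, n_jobs does parallelize training: scikit-learn fits a separate binary "class vs. all others" model for each digit and can run those fits on different cores. But ovr is typically less accurate than multinomial on MNIST. Even though MNIST's ten classes are roughly balanced, each binary model trains on skewed data (about 10% "digit 0" vs. 90% "not 0"), and the ten models are trained independently rather than against one shared loss, which tends to give weaker individual models and lower overall accuracy. In that case the drop comes from switching to ovr, not from n_jobs itself: with n_jobs=1 the ovr model would score the same, just more slowly.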
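The sketch below illustrates both halves of this point, again assuming load_digits as a stand-in for MNIST. It wraps LogisticRegression in sklearn.multiclass.OneVsRestClassifier instead of passing multi_class='ovr' (that parameter is deprecated in recent scikit-learn releases); the wrapper trains one binary model per digit and uses its own n_jobs to spread those fits across workers. The helper ovr_accuracy is hypothetical. The code also prints the positive-class share each binary model sees, which shows the roughly 10% vs. 90% split.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values from 0..16 to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Share of positive examples in each "digit k vs. rest" task (about 0.1 each).
positive_share = np.bincount(y_train) / len(y_train)
print("positive share per binary task:", np.round(positive_share, 3))


def ovr_accuracy(n_jobs):
    """Hypothetical helper: one binary logistic regression per class.

    n_jobs only controls how many of those independent fits run at once.
    """
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000), n_jobs=n_jobs)
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)


ovr_1, acc_ovr_1 = ovr_accuracy(n_jobs=1)
ovr_2, acc_ovr_2 = ovr_accuracy(n_jobs=2)

# Multinomial baseline: lbfgs's default for a multiclass target.
multi = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc_multi = multi.score(X_test, y_test)

print(f"ovr, n_jobs=1: {acc_ovr_1:.4f}")
print(f"ovr, n_jobs=2: {acc_ovr_2:.4f}")
print(f"multinomial:   {acc_multi:.4f}")
```

On a dataset this small the gap between ovr and multinomial may be narrow; the point of the comparison is that changing n_jobs leaves the ovr score unchanged, while switching strategy is what can move it.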
Fixes to Get Your Model Back on Track
Let's resolve this quickly:
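- Stay with the multinomial strategy: this is the setup that worked well initially. With solver='lbfgs' and a multiclass target it is already the default; on scikit-learn versions before 1.5 you can also pin it explicitly with multi_class='multinomial' (that parameter is deprecated from 1.5 on). Since parallelism doesn't help here, set n_jobs=1 or omit it entirely. Your code would look like:

```python
from sklearn.linear_model import LogisticRegression

# lbfgs fits one multinomial (softmax) model over all classes, so n_jobs
# is left at its default: there is nothing to parallelize across.
model = LogisticRegression(solver='lbfgs', max_iter=1000)
```

- Adjust resources if you insist on parallel training: if you want n_jobs to do something, switch to multi_class='ovr' (on scikit-learn 1.5 and later, wrap the model in sklearn.multiclass.OneVsRestClassifier, which takes its own n_jobs) and run on a machine with at least as many physical cores as n_jobs. On GCE each vCPU is a single hyperthread, so an n1-standard-2 (2 vCPUs) gives you only one physical core, and two workers will mostly compete for it. Just note that this strategy may never match the multinomial mode's accuracy on MNIST.
- Increase iteration limits: the default max_iter=100 is often too low for lbfgs on MNIST and triggers a ConvergenceWarning. Raising max_iter to 500 or 1000 gives the solver enough iterations to reach the minimum. Parallel overhead only costs wall-clock time; it doesn't change how many iterations lbfgs needs.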
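The last two fixes can be checked programmatically. The sketch below is an illustration under assumptions: fit_until_converged is a hypothetical helper that refits with a larger max_iter while scikit-learn reports a ConvergenceWarning, and joblib.cpu_count(only_physical_cores=True) is used to compare the physical core count with the n_jobs you plan to use. load_digits again stands in for MNIST and is left unscaled on purpose; whether it converges within 100 iterations depends on your scikit-learn version.

```python
import warnings

import joblib
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression


def fit_until_converged(X, y, max_iters=(100, 500, 1000)):
    """Hypothetical helper: raise max_iter until lbfgs stops warning."""
    for max_iter in max_iters:
        model = LogisticRegression(solver="lbfgs", max_iter=max_iter)
        with warnings.catch_warnings(record=True) as caught:
            # Make sure every ConvergenceWarning is recorded, not deduplicated.
            warnings.simplefilter("always", ConvergenceWarning)
            model.fit(X, y)
        if not any(issubclass(w.category, ConvergenceWarning) for w in caught):
            return model, max_iter
    # Every attempt warned; return the last (largest max_iter) model anyway.
    return model, max_iter


X, y = load_digits(return_X_y=True)  # raw pixel values 0..16, deliberately unscaled
model, chosen_max_iter = fit_until_converged(X, y)
print(f"selected max_iter = {chosen_max_iter}")

# Compare physical cores (not hyperthreads) with the n_jobs you plan to use.
planned_n_jobs = 2
physical_cores = joblib.cpu_count(only_physical_cores=True)
if physical_cores < planned_n_jobs:
    print(f"Only {physical_cores} physical core(s): n_jobs={planned_n_jobs} will oversubscribe.")
```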
This question comes from Stack Exchange; it was asked by Fallen Apart.




