Backward Elimination in Multiple Linear Regression: Why Remove Features with p-Value > 0.05, and Related Questions
Great question—let’s unpack this clearly, since p-values and backward elimination can trip up even experienced analysts!
First: What the p-Value Means Here
In linear regression, each feature’s p-value tests the null hypothesis "this feature’s regression coefficient is 0." In plain terms: once the other features in the model are accounted for, this feature has no linear relationship with the target variable.
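To make that concrete, here is a minimal sketch of how those p-values can be computed by hand, assuming NumPy and SciPy are available. The helper name `ols_p_values` and the simulated data are made up for illustration; in practice a library such as statsmodels reports the same quantities for you.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Return (coefficients, two-sided p-values) for OLS with an intercept.

    X is an (n, k) array of predictors without an intercept column.
    Index 0 of each returned array refers to the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])           # prepend intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # least-squares fit
    residuals = y - design @ beta
    dof = n - design.shape[1]                           # residual degrees of freedom
    sigma2 = residuals @ residuals / dof                # estimated error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)     # covariance of the estimates
    t_stats = beta / np.sqrt(np.diag(cov))              # t = estimate / standard error
    p_values = 2 * stats.t.sf(np.abs(t_stats), dof)     # two-sided test of "coefficient = 0"
    return beta, p_values


# Simulated data (an assumption for this example): only the first feature matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 3.0 * X[:, 0] + rng.normal(size=200)

beta, p = ols_p_values(X, y)
print("coefficients:", beta)
print("p-values:    ", p)
```

The first feature should come out with a tiny p-value. The second is pure noise, so its p-value will usually (though not always) land above 0.05.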
The 0.05 threshold is a conventional "significance level": we’re willing to accept a 5% chance of incorrectly rejecting a true null hypothesis (a Type I error). If a feature’s p-value is > 0.05, we fail to reject the null hypothesis: there isn’t enough statistical evidence to conclude that the feature adds value to the model. That is not proof the feature is useless, only that the data don’t show otherwise. Backward elimination starts with all features, then removes the least significant one (the one with the largest p-value, provided it is > 0.05), refits the model, and repeats until every remaining feature has p ≤ 0.05. The refit matters because dropping one feature changes the p-values of the others.
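The loop just described can be sketched as follows. This is a minimal illustration assuming NumPy and SciPy, not a reference implementation; the function names, the feature names `x0` to `x3`, and the simulated data are all hypothetical.

```python
import numpy as np
from scipy import stats


def feature_p_values(X, y):
    """Two-sided p-values for each column of X in an OLS fit with intercept."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - design.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov))
    return (2 * stats.t.sf(np.abs(t_stats), dof))[1:]   # drop the intercept's p-value


def backward_elimination(X, y, names, alpha=0.05):
    """Drop the feature with the largest p-value until all are <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        p = feature_p_values(X[:, keep], y)   # refit on the surviving features
        worst = int(np.argmax(p))             # least significant feature
        if p[worst] <= alpha:
            break                             # everything left is significant
        del keep[worst]                       # remove it and go around again
    return [names[i] for i in keep]


# Simulated data (an assumption): x0 and x1 drive y, x2 and x3 are noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 2.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=300)

selected = backward_elimination(X, y, ["x0", "x1", "x2", "x3"])
print(selected)
```

Note that only one feature is removed per pass, and the p-values are recomputed each time rather than read once from the full model.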
Why We Don’t Drop Smaller p-Value Features
Smaller p-values (e.g., p=0.01, p=0.0001) mean stronger evidence that the feature’s coefficient is not 0. These features passed the significance test, so we have good reason to believe they help predict the target. Backward elimination is about pruning unnecessary features, not discarding the ones that actually contribute to the model. It’s like cleaning out your closet: you get rid of clothes you never wear (p > 0.05), not the ones you use all the time (p ≤ 0.05).
Conditions for Valid p-Values in Regression
p-values only make sense if your regression model meets these key assumptions—ignore them, and your p-values will be unreliable:
- Linearity: The expected value of the target variable is a linear function of the predictors.
- Independence: Residuals (the difference between predicted and actual values) are independent of each other (no autocorrelation, which is common in time series data).
- Homoscedasticity: Residuals have constant variance across all predictor values (no "funnel" or "horn" shape in residual plots).
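- Normality: Residuals follow a normal distribution. This matters most for small samples; with large samples the tests are approximately valid even when residuals are only roughly normal.
- No multicollinearity: Predictors aren’t highly correlated with each other. Severe multicollinearity inflates the standard errors, so coefficient estimates become unstable and a genuinely useful feature can show a large p-value simply because it overlaps with another predictor.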
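Two of these assumptions are easy to check numerically. The sketch below, assuming only NumPy, computes variance inflation factors (for multicollinearity) and the Durbin-Watson statistic (for first-order autocorrelation in residuals). The function names and the simulated inputs are made up for illustration, and the random noise passed to `durbin_watson` stands in for the residuals of a fitted model.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))   # blows up if a column is an exact combination
    return np.array(vifs)


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    diffs = np.diff(residuals)
    return diffs @ diffs / (residuals @ residuals)


# Simulated predictors (an assumption): x2 is nearly a copy of x0.
rng = np.random.default_rng(1)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = x0 + 0.05 * rng.normal(size=500)
X = np.column_stack([x0, x1, x2])

vifs = variance_inflation_factors(X)
dw = durbin_watson(rng.normal(size=500))   # stand-in for model residuals
print("VIF:", vifs)
print("Durbin-Watson:", dw)
```

As common rules of thumb, a VIF above roughly 5 to 10 is treated as a warning sign, and a Durbin-Watson value near 2 suggests no first-order autocorrelation. Linearity, homoscedasticity, and normality are usually checked with residual plots instead.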
This question comes from Stack Exchange; it was asked by Sai Krishna.




