Statsmodels OLS raises 'exog contains inf or nans' even though the data has no NaN or infinite values: what could cause it?
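Hi all, I've run into a puzzling problem while running OLS regressions with Statsmodels. I have 4 different dependent variables that I plug into the model one at a time. The first 3 fit without any problem, but the last one immediately raises MissingDataError: exog contains inf or nans.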
I have checked the data over and over and confirmed that it contains no NaN or infinite values:
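- `x.isna().sum()` returns 0 for every count, so there are no missing values.
- I also ran this code specifically to detect infinite values and NaN:

```python
import numpy as np

# Flags the Series if any element equals inf, -inf or NaN
if x.isin([np.inf, -np.inf, np.nan]).any():
    print("Series contains infinite values")
else:
    print("Series does not contain infinite values")
```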
It prints Series does not contain infinite values, so the data really does not seem to contain any of these values.
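A slightly stricter variant of the `isin` check above, sketched under the assumption that `x` holds numeric data (a pandas Series, a DataFrame, or a NumPy array). Instead of matching specific values, it converts to a float array and tests every element with `np.isfinite`, which flags NaN, `inf` and `-inf` in one pass. It also works for a DataFrame, where `x.isin(...).any()` returns one boolean per column and using that result in an `if` raises a "truth value is ambiguous" error. The helper name `has_nonfinite` is hypothetical.

```python
import numpy as np
import pandas as pd


def has_nonfinite(data) -> bool:
    """Return True if any value in `data` is NaN, inf or -inf.

    Hypothetical helper: `data` may be a pandas Series/DataFrame or a NumPy array.
    Converting to float64 first makes the test independent of the pandas dtype;
    a non-numeric value (e.g. a stray string) makes the conversion raise ValueError.
    """
    arr = np.asarray(data, dtype=float)
    return not np.isfinite(arr).all()


s = pd.Series([1.0, 2.0, 3.0])
print(has_nonfinite(s))   # False

df = pd.DataFrame({"a": [1.0, np.inf], "b": [0.0, 1.0]})
print(has_nonfinite(df))  # True
```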
Here is the full traceback:
```
----> 5 result = sm.OLS(y,sm.add_constant(x_adj)).fit()
      6 return result

~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
    870     def __init__(self, endog, exog=None, missing='none', hasconst=None,
    871                  **kwargs):
--> 872         super(OLS, self).__init__(endog, exog, missing=missing,
    873                                   hasconst=hasconst, **kwargs)
    874         if "weights" in self._init_keys:

~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, endog, exog, weights, missing, hasconst, **kwargs)
    701         else:
    702             weights = weights.squeeze()
--> 703         super(WLS, self).__init__(endog, exog, missing=missing,
    704                                   weights=weights, hasconst=hasconst, **kwargs)
    705         nobs = self.exog.shape[0]

~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, endog, exog, **kwargs)
    188     """
    189     def __init__(self, endog, exog, **kwargs):
--> 190         super(RegressionModel, self).__init__(endog, exog, **kwargs)
    191         self._data_attr.extend(['pinv_wexog', 'weights'])
    192

~\anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
    235
    236     def __init__(self, endog, exog=None, **kwargs):
--> 237         super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
    238         self.initialize()
    239

~\anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
     75         missing = kwargs.pop('missing', 'none')
     76         hasconst = kwargs.pop('hasconst', None)
---> 77         self.data = self._handle_data(endog, exog, missing, hasconst,
     78                                       **kwargs)
     79         self.k_constant = self.data.k_constant

~\anaconda3\lib\site-packages\statsmodels\base\model.py in _handle_data(self, endog, exog, missing, hasconst, **kwargs)
     99
    100     def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
--> 101         data = handle_data(endog, exog, missing, hasconst, **kwargs)
    102         # kwargs arrays could have changed, easier to just attach here
    103         for key in kwargs:

~\anaconda3\lib\site-packages\statsmodels\base\data.py in handle_data(endog, exog, missing, hasconst, **kwargs)
    670
    671     klass = handle_data_class_factory(endog, exog)
--> 672     return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
    673                  **kwargs)

~\anaconda3\lib\site-packages\statsmodels\base\data.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
     85         self.const_idx = None
     86         self.k_constant = 0
---> 87         self._handle_constant(hasconst)
     88         self._check_integrity()
     89         self._cache = {}

~\anaconda3\lib\site-packages\statsmodels\base\data.py in _handle_constant(self, hasconst)
    131             exog_max = np.max(self.exog, axis=0)
    132             if not np.isfinite(exog_max).all():
--> 133                 raise MissingDataError('exog contains inf or nans')
    134             exog_min = np.min(self.exog, axis=0)
    135             const_idx = np.where(exog_max == exog_min)[0].squeeze()

MissingDataError: exog contains inf or nans
```
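The last frame of the traceback shows the test that fails: statsmodels takes the column-wise maximum of exog and raises if any maximum is not finite (lines 131-133 of `data.py`). The failing call passes `sm.add_constant(x_adj)`, while the checks shown earlier were run on `x`, so it may help to replay that exact test on the object handed to `sm.OLS`. This is a minimal sketch, assuming the exog can be converted to float; the helper name `columns_failing_statsmodels_check` is hypothetical and it only mirrors the lines visible in the traceback, not statsmodels' full logic.

```python
import numpy as np
import pandas as pd


def columns_failing_statsmodels_check(exog) -> list:
    """Replay the check visible in the traceback (data.py, _handle_constant):
    compute the column-wise max and report columns whose max is not finite.

    Hypothetical helper; pass it the exact object you give to sm.OLS as exog.
    """
    frame = pd.DataFrame(exog)            # keeps column names if exog is a DataFrame
    values = frame.to_numpy(dtype=float)
    exog_max = np.max(values, axis=0)     # a NaN anywhere in a column makes its max NaN
    bad = ~np.isfinite(exog_max)
    return list(frame.columns[bad])


exog = pd.DataFrame({
    "const": [1.0, 1.0, 1.0],
    "x1": [0.5, np.nan, 2.0],
    "x2": [1.0, 2.0, 3.0],
})
print(columns_failing_statsmodels_check(exog))  # ['x1']
```

In the setting of the question this would be called as `columns_failing_statsmodels_check(sm.add_constant(x_adj))`. Because only the maximum is tested, NaN and `+inf` trip this particular check, whereas a column that mixes `-inf` with finite values has a finite maximum and passes it.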
Does anyone know what else, besides NaN or infinite values in the data, could trigger this error? Any pointers would be appreciated, thanks!
Note: this question comes from Stack Exchange; the original asker is Kyle_Stockton.




