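How do I set a fixed lag order for a time series in sts.adfuller()? And how do I make ADF tests on multiple variables use the same lag order to get consistent P-values?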
Hey there! Let's break down your two questions about setting lag orders in the Augmented Dickey-Fuller (ADF) test using statsmodels:
Question 1: How to set a fixed lag order in sts.adfuller()?
To force the ADF test to use a specific, fixed number of lags instead of letting it auto-select based on criteria like AIC/BIC, you just need to adjust two parameters in the adfuller() function:
- Set autolag=None to turn off automatic lag selection.
- Pass your desired lag count to the maxlag parameter as an integer. With autolag=None, maxlag is used as the exact number of lags rather than as an upper bound.
Here's a quick code example:
    import statsmodels.tsa.stattools as sts

    # Assume `your_time_series` is your single-variable time series data
    fixed_lag_result = sts.adfuller(your_time_series, maxlag=2, autolag=None)
This will run the ADF test with exactly 2 lags every time, no automatic adjustments.
Question 2: How to get consistent P-values across multiple variables with a uniform lag order?
The default behavior of adfuller() (using autolag='AIC' by default) picks lag orders individually for each variable based on that variable's own information criterion. That's why your two test results have different lags (2 vs 1) and P-values (0.477 vs 0.0)—each variable's optimal lag (per AIC) is different.
To make all your tests use the same lag order (and thus produce comparable P-values), follow these steps:
- Choose a common lag order: You can either pick a pre-defined number (based on domain knowledge or your data's frequency) or calculate the maximum lag auto-selected for any of your variables (to use the most conservative lag structure).
- Run the ADF test for every variable with this fixed lag: Use autolag=None and set maxlag to your chosen common value.
Example code for two variables
    import statsmodels.tsa.stattools as sts
    import pandas as pd

    # Assume `df` is a pandas DataFrame with your two variables
    # in columns 'var1' and 'var2'

    # Option 1: Pick a pre-defined common lag
    common_lag = 2

    # Option 2: Calculate max auto-selected lag across variables
    # (index 2 of the result is the number of lags actually used)
    # lag_var1 = sts.adfuller(df['var1'])[2]
    # lag_var2 = sts.adfuller(df['var2'])[2]
    # common_lag = max(lag_var1, lag_var2)

    # Test both variables with the same fixed lag
    result_var1 = sts.adfuller(df['var1'], maxlag=common_lag, autolag=None)
    result_var2 = sts.adfuller(df['var2'], maxlag=common_lag, autolag=None)

    # Print results for comparison
    print("Var1 ADF Result (fixed lag {}):".format(common_lag), result_var1)
    print("Var2 ADF Result (fixed lag {}):".format(common_lag), result_var2)
Now both tests use identical lag structures, so their P-values are directly comparable. For your provided results, re-running with a fixed lag of 2 would mean the second variable uses 2 lags instead of 1. Its P-value may change as a result, but both tests will then rest on the same lag specification for cross-variable comparison.
The question comes from Stack Exchange, asked by Merahamad.




