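How do I set a fixed lag order for a time series in sts.adfuller()? And how do I make ADF tests on multiple variables use the same lag order to get consistent P-values?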

阿华AIGC实验室

2026-4-28

Hey there! Let's break down your two questions about setting lag orders in the Augmented Dickey-Fuller (ADF) test using statsmodels:

Answers to ADF Test Lag Order Questions

Question 1: How to set a fixed lag order in `sts.adfuller()`?

To force the ADF test to use a specific, fixed number of lags instead of letting it auto-select based on criteria like AIC/BIC, you just need to adjust two parameters in the adfuller() function:

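Set autolag=None to turn off automatic lag selection.
Pass your desired lag count to the maxlag parameter as an integer. adfuller() has no lags parameter; with autolag=None, the value of maxlag is used as the lag order directly.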

Here's a quick code example:

import statsmodels.tsa.stattools as sts

# Assume `your_time_series` is your single-variable time series data
# With autolag=None, maxlag is the exact number of lags used
fixed_lag_result = sts.adfuller(your_time_series, maxlag=2, autolag=None)

This will run the ADF test with exactly 2 lags every time, no automatic adjustments.

Question 2: How to get consistent P-values across multiple variables with a uniform lag order?

By default, adfuller() uses autolag='AIC', which picks the lag order separately for each variable based on that variable's own information criterion. That's why your two test results have different lags (2 vs 1) and P-values (0.477 vs 0.0): the AIC-optimal lag differs between the two variables.
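For context, the lag order is the number of lagged difference terms in the ADF regression. As a sketch, assuming the default regression='c' (constant, no trend) and writing p for the lag order (the symbols are notation chosen here for illustration, not taken from the question):

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t
```

The test statistic is the t-ratio on \gamma, with the unit-root null \gamma = 0. Changing p changes the regression itself, so the statistic and the P-value can change with it.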

To make all your tests use the same lag order (and thus produce comparable P-values), follow these steps:

Choose a common lag order: You can either pick a pre-defined number (based on domain knowledge or your data's frequency) or calculate the maximum lag auto-selected for any of your variables (to use the most conservative lag structure).
Run the ADF test for every variable with this fixed lag: Use autolag=None and set maxlag to your chosen common value.

Example code for two variables

import statsmodels.tsa.stattools as sts
import pandas as pd

# Assume `df` contains your two variables in columns 'var1' and 'var2'
# Option 1: Pick a pre-defined common lag
common_lag = 2

# Option 2: Calculate max auto-selected lag across variables
# (index 2 of the returned tuple is the number of lags used)
# lag_var1 = sts.adfuller(df['var1'])[2]
# lag_var2 = sts.adfuller(df['var2'])[2]
# common_lag = max(lag_var1, lag_var2)

# Test both variables with the same fixed lag
result_var1 = sts.adfuller(df['var1'], maxlag=common_lag, autolag=None)
result_var2 = sts.adfuller(df['var2'], maxlag=common_lag, autolag=None)

# Print results for comparison
print("Var1 ADF Result (fixed lag {}):".format(common_lag), result_var1)
print("Var2 ADF Result (fixed lag {}):".format(common_lag), result_var2)

Now both tests use identical lag structures, so their P-values are directly comparable. For your provided results, re-running with a fixed lag of 2 would mean the second variable uses 2 lags instead of 1. Its P-value may change as a result, but both results will be consistent for cross-variable comparison.

The question originally appeared on Stack Exchange; asked by Merahamad.