Help debugging why a PyTorch LSTM's multi-step forecast converges to a constant value
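Hi everyone, I've recently been building an LSTM in PyTorch for multi-step stock price forecasting, pulling AAPL data with yfinance. The overall pipeline looks fairly standard, but when I forecast future time steps the output always converges to a fixed value with no fluctuation at all. I'd appreciate help figuring out where the problem is.
Here is my whole pipeline:
1. Data acquisition and preprocessing
First I download the data with yfinance, then smooth it with a rolling window: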
import yfinance as yf
import pandas as pd
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download the data
df = yf.download(tickers='AAPL')

# Rolling-window smoothing (uses the Close column, so only a single feature is processed)
df_smoothed = df['Close'].rolling(30, min_periods=1).mean()
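Then I convert the data into the sequence format the LSTM needs. The input window has length window_range, and the target is the next window, i.e. the input window shifted forward by one step: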
def create_ds_for_forecasting(df, window_range):
    # Force a single feature: shape (n_samples, 1)
    df_values = df.copy().values.reshape(-1, 1)
    X, y = [], []
    for i in np.arange(0, len(df_values) - window_range - 1):
        X.append(df_values[i:i + window_range])          # input window
        y.append(df_values[i + 1:i + window_range + 1])  # same window shifted by one step
    return torch.Tensor(np.array(X)).to(device), torch.Tensor(np.array(y)).to(device)
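To double-check what this function produces, here is a minimal NumPy-only sketch of the same windowing on a toy series. The name make_windows and the toy input are my own for illustration; the loop bounds are copied from the function above.

```python
import numpy as np

def make_windows(values, window_range):
    """Same slicing as create_ds_for_forecasting, but without torch."""
    values = np.asarray(values, dtype=np.float32).reshape(-1, 1)
    X, y = [], []
    for i in range(len(values) - window_range - 1):
        X.append(values[i:i + window_range])
        y.append(values[i + 1:i + window_range + 1])
    return np.array(X), np.array(y)

# Toy series 0..9 with a window of 3
X, y = make_windows(np.arange(10), window_range=3)
print(X.shape, y.shape)           # (6, 3, 1) (6, 3, 1)
print(X[0].ravel(), y[0].ravel())  # [0. 1. 2.] [1. 2. 3.]
```

So each target is the input shifted by one step, and only the last element of each target is actually new information relative to its input.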
2. LSTM model definition
This is the LSTM model I defined, followed by a fully connected layer and a Tanh activation:
from torch import nn

class ModeloLSTM(nn.Module):
    def __init__(self, num_layers, hidden_size, input_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.lstm = nn.LSTM(
            input_size=self.input_size,
            num_layers=self.num_layers,
            hidden_size=self.hidden_size,
            batch_first=True
        ).to(device)
        self.fc = nn.Linear(hidden_size, 1).to(device)
        self.tanh = nn.Tanh()

    def forward(self, x):
        # Initialize the hidden state dynamically for each batch
        if self.batch_size != 0:
            # Batched input: states are (num_layers, batch, hidden_size)
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        elif self.batch_size == 0:
            # Unbatched input: states are (num_layers, hidden_size)
            h0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out)
        out = self.tanh(out)
        return out
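As a sanity check on the two branches in forward, here is a small sketch that compares a bare nn.LSTM called on one unbatched window against the same window wrapped in a batch of one. It assumes a PyTorch version that accepts unbatched (2-D) input for nn.LSTM; the random input and the sizes are placeholders, not my real data.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
lstm.eval()

x = torch.randn(30, 1)  # one unbatched window: (seq_len, input_size)

with torch.no_grad():
    # Unbatched call: states have shape (num_layers, hidden_size)
    h0 = torch.zeros(1, 50)
    c0 = torch.zeros(1, 50)
    out_unbatched, _ = lstm(x, (h0, c0))

    # Batched call with batch size 1: states have shape (num_layers, 1, hidden_size)
    h0_b = torch.zeros(1, 1, 50)
    c0_b = torch.zeros(1, 1, 50)
    out_batched, _ = lstm(x.unsqueeze(0), (h0_b, c0_b))

# Compare the two outputs after dropping the batch dimension
same = torch.allclose(out_unbatched, out_batched.squeeze(0), atol=1e-5)
print(out_unbatched.shape, out_batched.shape, same)
```

If the two outputs agree, the state shapes by themselves should not change the result, which would point away from the hidden-state initialization as the cause.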
3. Training and test loops
I train with SmoothL1Loss and the Adam optimizer. The training loop is as follows:
# Assume X_train, y_train, X_test, y_test already exist
# (the dataset split and normalization steps are omitted here)
modelo = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=64)
criterion = nn.SmoothL1Loss()
optimizer = torch.optim.Adam(modelo.parameters(), lr=1e-3)

# Training loop
loader = DataLoader(TensorDataset(X_train, y_train), shuffle=True, batch_size=64, drop_last=True)
num_epochs = 5
for epoch in range(num_epochs):
    modelo.train()
    epoch_loss = 0.0
    for inputs, label in loader:
        outputs = modelo(inputs)
        loss = criterion(outputs, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"Epoch {epoch+1}/{num_epochs}, Average Loss: {epoch_loss/len(loader):.4f}")

# Test loop
modelo.eval()
y_pred = []
batch_size = 64
loader = DataLoader(X_test, batch_size=batch_size)
with torch.no_grad():
    for x_batch in loader:
        # Keep only the prediction for the last time step of each window
        y_pred_i = modelo(x_batch)[:, -1, :]
        y_pred.append(y_pred_i)
y_pred = torch.cat(y_pred, axis=0)
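The test results look normal. But when I save the model weights, initialize a model with batch_size=0 (because forecasting runs on a single sample with no batch dimension), and run the future multi-step forecast, the problem shows up:
4. Future multi-step forecasting code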
# Load the model
window_range = 30        # assume a 30-day window
days_to_simulate = 90    # forecast 3 months
df_test = df_smoothed[-window_range:]  # last window of real data as the initial input

# Initialize the unbatched model
model = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=0)
model.load_state_dict(modelo.state_dict())
model.eval()

# Start the multi-step forecast
input_data = torch.Tensor(df_test.values.reshape(-1, 1)).to(device)
seq_prediction = torch.Tensor(model(input_data))[-1,:].unsqueeze(-1).to(device)
with torch.no_grad():
    for i in range(0, days_to_simulate):
        if i < window_range:
            input_data = torch.cat((input_data[-window_range+i:,:], seq_prediction), dim=0)
        elif i >= window_range:
            input_data = seq_prediction[-window_range:]
        next_pred = torch.Tensor(model(input_data))[-1,:].unsqueeze(-1).to(device)
        seq_prediction = torch.cat((seq_prediction, next_pred), dim=0)

# Visualization
starting_dates = pd.date_range(start=df.index[-window_range], periods=window_range)
predicted_dates = pd.date_range(start=df.index[-1], periods=days_to_simulate+1)
starting_series = pd.Series(df[-window_range:]['Close'].values.flatten(), index=starting_dates)
# Assumes normalization was applied earlier; undo it here
predicted_series = pd.Series(scaler.inverse_transform(seq_prediction.detach().cpu().numpy()).flatten(), index=predicted_dates)

import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(starting_series.index, starting_series.values.flatten(), linestyle='-', label='Actual data')
plt.plot(predicted_series.index, predicted_series.values.flatten(), linestyle='-', label='Forecast')
plt.title('Stock price forecast')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
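The problem
The forecast quickly converges to a fixed value with no fluctuation at all, like a flat line. I first suspected the batch size, but then noticed that forward actually uses x.size(0) to handle the batch dimension dynamically, so in theory it should work regardless of the batch size.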
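One way I picture the flat line: if a model that is fed its own output behaves like a contraction, repeated feedback settles on a single fixed point no matter where it starts. The toy below is not my LSTM. It is a scalar map x -> tanh(w*x + b), with w and b picked arbitrarily by me so that the slope stays below 1, just to show what that behavior looks like.

```python
import math

def iterate_map(x0, w=0.8, b=0.1, steps=200):
    """Feed a scalar map's output back in as its next input."""
    xs = [x0]
    for _ in range(steps):
        xs.append(math.tanh(w * xs[-1] + b))
    return xs

# Two very different starting points
trajectory_a = iterate_map(-0.9)
trajectory_b = iterate_map(0.9)

# Both runs end at (numerically) the same value and stop moving
print(trajectory_a[-1], trajectory_b[-1])
print(abs(trajectory_a[-1] - trajectory_a[-2]))
```

Whether my trained network really acts like such a contraction in closed loop is exactly what I am unsure about.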
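I've looked into a few directions myself but haven't found the root cause:
- Tanh activation: Tanh outputs lie in (-1, 1). Even if my normalized data falls inside that range, could repeatedly feeding the model's own output back in as input make the forecast drift toward the mean?
- Hidden state initialization: for single-sample prediction, h0 and c0 have shape (num_layers, hidden_size), whereas in training they are (num_layers, batch_size, hidden_size). Could this shape difference cause abnormal output?
- Input construction in the multi-step loop: is the window length wrong after each concatenation? For example, when i < window_range, does input_data actually stay at length window_range?
- Mismatch between the training and forecasting tasks: in training, y is the full sequence of the next window, but at forecast time I only take the last time step. Could the model have failed to learn to continue a sequence properly?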
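To make the window-length question concrete, here is a sketch that replays only the slicing and concatenation of my forecasting loop on plain Python lists of labels ("r" for real values, "p" for predictions). No model is involved, and simulate_windows is a helper I made up for this check.

```python
def simulate_windows(window_range, days_to_simulate):
    """Replay the index arithmetic of the forecasting loop on labels."""
    input_data = [f"r{k}" for k in range(window_range)]  # last real window
    seq_prediction = ["p0"]  # first prediction, made before the loop
    windows = []
    for i in range(days_to_simulate):
        if i < window_range:
            input_data = input_data[-window_range + i:] + seq_prediction
        else:
            input_data = seq_prediction[-window_range:]
        windows.append(list(input_data))
        # The model would emit one new prediction here
        seq_prediction = seq_prediction + [f"p{i + 1}"]
    return windows

small = simulate_windows(window_range=4, days_to_simulate=3)
print(small[0])  # ['r0', 'r1', 'r2', 'r3', 'p0']
print(small[1])  # ['r2', 'r3', 'p0', 'p0', 'p1']

full = simulate_windows(window_range=30, days_to_simulate=40)
print({len(w) for w in full[:30]}, {len(w) for w in full[30:]})  # {31} {30}
```

With this replay, the window is window_range + 1 long while i < window_range, and earlier predictions appear more than once in it, so the model would see inputs that are not a clean continuation of the series.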
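I really can't find the cause at this point. Could anyone help me figure out what is making the forecast converge to a fixed value? Thanks, everyone!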




