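How to feed data into a PyTorch LSTM layer and implement model training and prediction
Migrating from Keras to PyTorch: completing your BiLSTM training and prediction workflow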
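Hey, moving from Keras to PyTorch does take some getting used to. PyTorch is imperative in style and does not wrap the training workflow into a ready-made fit() call the way Keras does. Let me map the model and training logic from your Keras code onto PyTorch and work through training and prediction step by step: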
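1. First, align your model structure
Your current PyTorch model is missing the Embedding layer from the Keras version. Also, Bidirectional(LSTM(units=len(X_train))) in the Keras code is presumably a slip: units is the dimensionality of the LSTM's hidden state, so it should not be the number of training samples. Start by completing the model so that it matches the Keras version: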
import torch
import torch.nn as nn
import torch.optim as optim

# Assumes these parameters are already defined (mirroring the Keras model);
# replace each ... with your own value
vocab_size = ...      # your vocabulary size
embedding_size = ...  # your embedding dimension
input_length = 55     # corresponds to Keras's input_length
hidden_size = 64      # use your Keras LSTM's units value here, not len(X_train)
num_layers = 1        # one Keras LSTM layer corresponds to num_layers=1; adjust as needed
num_classes = ...     # your number of classes
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Replace ... with your pretrained embedding matrix, shape (vocab_size, embedding_size)
pretrained_weights = torch.tensor(..., dtype=torch.float)  # converted to a PyTorch tensor

# Bidirectional LSTM model, completed with an Embedding layer
class BiRNN(nn.Module):
    def __init__(self, vocab_size, embedding_size, hidden_size, num_layers, num_classes, pretrained_weights):
        super().__init__()
        # Corresponds to the Keras Embedding layer; loads the pretrained weights
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.embedding.weight.data.copy_(pretrained_weights)
        self.embedding.weight.requires_grad = False  # False keeps the pretrained weights frozen
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(embedding_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size * 2, num_classes)  # bidirectional, so the feature size doubles

    def forward(self, x):
        # x shape: (batch_size, input_length)
        # After the Embedding layer: (batch_size, input_length, embedding_size)
        x = self.embedding(x)
        # Initialize the LSTM hidden and cell states on the same device as the input
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)
        # Forward pass through the LSTM
        # out shape: (batch_size, input_length, hidden_size*2)
        # h_n shape: (num_layers*2, batch_size, hidden_size)
        out, (h_n, _) = self.lstm(x, (h0, c0))
        # Many-to-one: concatenate the last layer's final forward and backward states.
        # out[:, -1, :] would contain the backward direction after it has read only one token.
        out = self.fc(torch.cat((h_n[-2], h_n[-1]), dim=1))  # output shape: (batch_size, num_classes)
        return out

# Instantiate the model
model = BiRNN(vocab_size, embedding_size, hidden_size, num_layers, num_classes, pretrained_weights).to(device)
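A quick shape check on dummy data can make the LSTM's return values concrete. The sketch below uses made-up toy sizes and random inputs in place of real embedded tokens; it only illustrates where a bidirectional nn.LSTM places each direction's final state, which matters when picking the features for a many-to-one classifier.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes, chosen only for illustration
batch_size, seq_len, embedding_size, hidden_size = 4, 55, 8, 16

lstm = nn.LSTM(embedding_size, hidden_size, num_layers=1,
               batch_first=True, bidirectional=True)
x = torch.randn(batch_size, seq_len, embedding_size)  # stands in for embedded tokens

out, (h_n, c_n) = lstm(x)  # initial states default to zeros when omitted

print(out.shape)  # (batch_size, seq_len, hidden_size * 2)
print(h_n.shape)  # (num_layers * 2, batch_size, hidden_size)

# Forward direction: its final state sits at the LAST time step of `out`
fwd_matches = torch.allclose(out[:, -1, :hidden_size], h_n[-2])
# Backward direction: its final state sits at the FIRST time step of `out`
bwd_matches = torch.allclose(out[:, 0, hidden_size:], h_n[-1])
print(fwd_matches, bwd_matches)

# Whole-sequence summary from both directions
summary = torch.cat((h_n[-2], h_n[-1]), dim=1)  # (batch_size, hidden_size * 2)
```

The backward half of out[:, -1, :] has seen only the last token, so concatenating h_n[-2] and h_n[-1] is the closer match to what Keras's Bidirectional(LSTM(...)) returns when return_sequences is left at False.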
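2. Data preparation: load data in batches with DataLoader
Keras's fit handles batching automatically, whereas in PyTorch we wrap the data in a Dataset and a DataLoader ourselves, which gives more flexibility: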
from torch.utils.data import TensorDataset, DataLoader

# Convert to PyTorch tensors. Mind the dtypes: X is long (the Embedding layer needs indices)
# and y is long (CrossEntropyLoss needs class indices)
X_train_tensor = torch.tensor(X_train, dtype=torch.long).to(device)
y_train_tensor = torch.tensor(y_train, dtype=torch.long).to(device)  # Note: no one-hot encoding, pass class indices directly
X_val_tensor = torch.tensor(X_val, dtype=torch.long).to(device)
y_val_tensor = torch.tensor(y_val, dtype=torch.long).to(device)

# Wrap the tensors in Datasets
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)

# Create the DataLoaders, specifying batch_size and whether to shuffle
batch_size = 32  # adjust as needed
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
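To see what a DataLoader actually yields, here is a small sketch on synthetic data. The array names, the sample count of 70, and the vocabulary bound of 1000 are made up for illustration; only the sequence length of 55 comes from the setup above.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in data: 70 padded sequences of length 55, 3 classes
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 1000, size=(70, 55))
y_demo = rng.integers(0, 3, size=(70,))

demo_dataset = TensorDataset(torch.tensor(X_demo, dtype=torch.long),
                             torch.tensor(y_demo, dtype=torch.long))
demo_loader = DataLoader(demo_dataset, batch_size=32, shuffle=True)

# Each iteration yields one (inputs, labels) pair of tensors
batch_sizes = []
for batch_X, batch_y in demo_loader:
    batch_sizes.append(batch_X.size(0))

print(len(demo_loader))  # number of batches per epoch: 3
print(batch_sizes)       # [32, 32, 6], the last batch is smaller
```

Because the last batch can be smaller than batch_size, per-epoch metrics are best weighted by the actual batch size. Passing drop_last=True to DataLoader discards the incomplete batch instead, if that is preferred.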
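Key point: PyTorch's CrossEntropyLoss does not need y to be one-hot encoded the way Keras's categorical_crossentropy does. Pass the class indices directly (the original integer labels behind your Keras y_train), which matches the behavior of Keras's sparse_categorical_crossentropy. If your Keras code used categorical_crossentropy, remember to convert y from one-hot encoding back to indices. CrossEntropyLoss also applies log-softmax internally, so the model should output raw logits without a softmax layer.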
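If the labels are currently one-hot encoded, the conversion is a single argmax. The sketch below uses made-up labels and logits to show the conversion and to check that CrossEntropyLoss on raw logits equals the mean negative log-softmax of the true class.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical one-hot labels, shaped like the output of Keras's to_categorical
y_onehot = np.array([[1, 0, 0],
                     [0, 0, 1],
                     [0, 1, 0],
                     [0, 0, 1]], dtype=np.float32)

# One-hot -> class indices
y_indices = np.argmax(y_onehot, axis=1)
y_tensor = torch.tensor(y_indices, dtype=torch.long)

# Made-up raw model outputs (logits), one row per sample
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 1.5],
                       [0.1, 1.2, 0.4],
                       [1.0, 0.2, 0.3]])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, y_tensor)

# The same value computed by hand: mean negative log-probability of the true class
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(y_tensor)), y_tensor].mean()

print(y_indices)  # [0 2 1 2]
print(loss.item(), manual.item())
```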
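3. Implement the training loop
This is where PyTorch differs most from Keras: you write the forward pass, loss computation, backward pass, and parameter update yourself: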
# Define the loss function and optimizer, corresponding to Keras's compile step
criterion = nn.CrossEntropyLoss()
# RMSprop as in the Keras model; note that PyTorch's default alpha=0.99 differs from Keras's default rho=0.9
optimizer = optim.RMSprop(model.parameters(), lr=0.0005)
epochs = 100

# Start training
for epoch in range(epochs):
    model.train()  # switch to training mode
    train_loss = 0.0
    train_correct = 0
    for batch_X, batch_y in train_loader:
        # No-op if the tensors already live on `device`
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        # Backward pass and optimization
        optimizer.zero_grad()  # clear gradients so they do not accumulate
        loss.backward()        # compute gradients
        optimizer.step()       # update parameters
        # Accumulate the training loss and the number of correct predictions
        train_loss += loss.item() * batch_X.size(0)
        preds = outputs.argmax(dim=1)
        train_correct += (preds == batch_y).sum().item()
    # Per-epoch averages
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = train_correct / len(train_loader.dataset)

    # Evaluate on the validation set
    model.eval()  # switch to evaluation mode
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():  # no gradients needed for validation, which saves memory and time
        for batch_X, batch_y in val_loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            val_loss += loss.item() * batch_X.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == batch_y).sum().item()
    val_loss = val_loss / len(val_loader.dataset)
    val_acc = val_correct / len(val_loader.dataset)

    # Print training progress
    print(f'Epoch {epoch+1}/{epochs}')
    print(f'Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f}')
    print(f'Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}\n')
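The evaluation half of the loop can also be factored into a small helper so the same code serves validation and test data. This is only a sketch: the name evaluate and the toy linear model used to exercise it are hypothetical and not part of the original code.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def evaluate(model, loader, criterion, device):
    """Return (mean loss, accuracy) of `model` over every sample in `loader`."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():
        for batch_X, batch_y in loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            outputs = model(batch_X)
            # Weight each batch loss by its size so a smaller last batch is handled correctly
            total_loss += criterion(outputs, batch_y).item() * batch_X.size(0)
            correct += (outputs.argmax(dim=1) == batch_y).sum().item()
            seen += batch_X.size(0)
    return total_loss / seen, correct / seen

# Toy check with a hypothetical stand-in model whose logits equal its inputs
device = torch.device('cpu')
toy_model = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(2))

X = torch.tensor([[3.0, 0.0], [0.0, 3.0], [2.0, 1.0], [1.0, 2.0]])
y = torch.tensor([0, 1, 0, 1])
toy_loader = DataLoader(TensorDataset(X, y), batch_size=2)

toy_loss, toy_acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss(), device)
print(toy_loss, toy_acc)
```

With the real model, a call such as evaluate(model, val_loader, criterion, device) would return the same two validation metrics that the loop computes inline.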
4. Make predictions with the trained model
Once training is done, prediction takes only a few steps:
model.eval()  # switch to evaluation mode
with torch.no_grad():
    # Assumes you have test data X_test; convert it to a tensor first
    X_test_tensor = torch.tensor(X_test, dtype=torch.long).to(device)
    outputs = model(X_test_tensor)
    # Predicted class index for each sample
    _, predictions = torch.max(outputs, 1)
    # Apply softmax if you need probabilities
    probabilities = torch.softmax(outputs, dim=1)

# Convert the results to NumPy arrays (if needed)
predictions_np = predictions.cpu().numpy()
probabilities_np = probabilities.cpu().numpy()
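When the test set is too large to push through the model in one go, the same prediction logic can run in chunks. The helper below is a sketch under assumptions: the name predict_in_batches, the chunk size, and the small stand-in model are all hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn

def predict_in_batches(model, X, device, batch_size=256):
    """Return (class indices, class probabilities) as NumPy arrays, computed chunk by chunk."""
    model.eval()
    preds, probs = [], []
    with torch.no_grad():
        for start in range(0, len(X), batch_size):
            chunk = torch.as_tensor(X[start:start + batch_size], dtype=torch.long).to(device)
            outputs = model(chunk)
            probs.append(torch.softmax(outputs, dim=1).cpu())
            preds.append(outputs.argmax(dim=1).cpu())
    return torch.cat(preds).numpy(), torch.cat(probs).numpy()

# Toy check with a hypothetical stand-in model: 7 sequences of length 5, 3 classes
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Embedding(10, 4), nn.Flatten(), nn.Linear(4 * 5, 3))
X_demo = np.random.default_rng(0).integers(0, 10, size=(7, 5))

demo_preds, demo_probs = predict_in_batches(demo_model, X_demo, torch.device('cpu'), batch_size=3)
print(demo_preds.shape, demo_probs.shape)  # (7,) (7, 3)
```

Softmax does not change which class has the highest score, so taking argmax over the logits or over the probabilities gives the same predictions.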
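That completes the migration from Keras to PyTorch. The workflow corresponds step by step to your original Keras logic; the main difference is that PyTorch has you write the training loop yourself, which gives you more flexibility.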
The question comes from Stack Exchange; asked by user9355680.




