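How do I replace the tanh activation in an LSTM with ReLU while keeping CUDA acceleration?
Replacing tanh with ReLU in an LSTM while keeping CUDA support
Hey, I wrestled with this one too. Here are two approaches I've tested myself; both run on CUDA devices:
Option 1: Custom LSTMCell in PyTorch (runs on the GPU)
The claim you came across that "a custom LSTMCell doesn't support the GPU" most likely comes from older material. In PyTorch 1.0+, a custom cell built from ordinary tensor ops runs on CUDA as long as the model and the data are both moved to the CUDA device. The one caveat is that a hand-written cell does not use the fused cuDNN kernel behind nn.LSTM, so it is typically slower than the built-in layer.
First, define a custom LSTMCell that uses ReLU in place of the default tanh: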
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class ReLULSTMCell(nn.LSTMCell):
        def __init__(self, input_size, hidden_size, bias=True):
            super().__init__(input_size, hidden_size, bias)
            # Activation used in place of tanh
            self.activation = nn.ReLU()

        def forward(self, input, hx=None):
            if hx is None:
                zeros = torch.zeros(input.size(0), self.hidden_size,
                                    dtype=input.dtype, device=input.device)
                hx = (zeros, zeros)
            h_prev, c_prev = hx

            # Reuse the parent's parameters for the gate pre-activations.
            # F.linear also handles bias=False, where bias_ih/bias_hh are None.
            gates = (F.linear(input, self.weight_ih, self.bias_ih)
                     + F.linear(h_prev, self.weight_hh, self.bias_hh))
            ingate, forgetgate, cellgate, outgate = gates.chunk(4, dim=1)

            ingate = torch.sigmoid(ingate)
            forgetgate = torch.sigmoid(forgetgate)
            outgate = torch.sigmoid(outgate)
            cellgate = self.activation(cellgate)  # ReLU instead of tanh

            c_next = forgetgate * c_prev + ingate * cellgate
            # The output also uses ReLU here; adjust as needed
            h_next = outgate * self.activation(c_next)
            return h_next, c_next
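As a sanity check on the gate arithmetic, the same update can be written as a standalone function and compared with the built-in nn.LSTMCell when the activation is tanh. This is only a sketch: `lstm_step` is a hypothetical helper name, and it assumes the i, f, g, o gate ordering that PyTorch documents for LSTMCell weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def lstm_step(cell, x, h_prev, c_prev, act):
    """One LSTM step using `cell`'s parameters and a pluggable activation `act`."""
    gates = (F.linear(x, cell.weight_ih, cell.bias_ih)
             + F.linear(h_prev, cell.weight_hh, cell.bias_hh))
    i, f, g, o = gates.chunk(4, dim=1)  # PyTorch gate order: input, forget, cell, output
    c_next = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * act(g)
    h_next = torch.sigmoid(o) * act(c_next)
    return h_next, c_next


torch.manual_seed(0)
cell = nn.LSTMCell(10, 20)
x, h0, c0 = torch.randn(4, 10), torch.randn(4, 20), torch.randn(4, 20)

with torch.no_grad():
    h_ref, c_ref = cell(x, (h0, c0))                             # built-in tanh cell
    h_tanh, c_tanh = lstm_step(cell, x, h0, c0, torch.tanh)      # should match it
    h_relu, c_relu = lstm_step(cell, x, h0, c0, torch.relu)      # ReLU variant

matches_builtin = (torch.allclose(h_tanh, h_ref, atol=1e-5)
                   and torch.allclose(c_tanh, c_ref, atol=1e-5))
print("tanh version matches nn.LSTMCell:", matches_builtin)
```

If the tanh version agrees with nn.LSTMCell, swapping in ReLU changes only the two activation calls and leaves the gate logic untouched.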
Then wrap the cell in a full LSTM layer (with support for common options such as multiple layers and batch_first):
    class ReLULSTM(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers=1, bias=True, batch_first=False):
            super().__init__()
            self.input_size = input_size
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.batch_first = batch_first
            # Stack the cells; only the first layer sees input_size features
            self.cells = nn.ModuleList([
                ReLULSTMCell(input_size if i == 0 else hidden_size, hidden_size, bias)
                for i in range(num_layers)
            ])

        def forward(self, input, hx=None):
            if self.batch_first:
                # Convert to PyTorch's default (seq_len, batch, input_size) layout
                input = input.transpose(0, 1)
            seq_len, batch_size, _ = input.size()

            # Initial state as (h_0, c_0), each (num_layers, batch, hidden_size),
            # created on the same device and with the same dtype as the input
            if hx is None:
                zeros = input.new_zeros(self.num_layers, batch_size, self.hidden_size)
                hx = (zeros, zeros)
            h = list(hx[0].unbind(0))
            c = list(hx[1].unbind(0))

            outputs = []
            for t in range(seq_len):
                x_t = input[t]
                # Run the layers in order, updating each layer's state so it
                # carries over to the next time step
                for layer_idx, cell in enumerate(self.cells):
                    h[layer_idx], c[layer_idx] = cell(x_t, (h[layer_idx], c[layer_idx]))
                    x_t = h[layer_idx]
                outputs.append(x_t)

            outputs = torch.stack(outputs, dim=0)
            if self.batch_first:
                # Back to (batch, seq_len, hidden_size)
                outputs = outputs.transpose(0, 1)
            return outputs, (torch.stack(h, dim=0), torch.stack(c, dim=0))
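One way to gain confidence in this kind of wrapper is to run the same time-step loop with the stock nn.LSTMCell and compare it with nn.LSTM, since the two should agree when they share weights. The sketch below does that for a single layer; the `unroll` helper is illustrative and not part of PyTorch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1)
cell = nn.LSTMCell(input_size=10, hidden_size=20)

# Copy the layer-0 parameters of nn.LSTM into the cell so both share weights
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)


def unroll(cell, x):
    """Apply `cell` step by step to x of shape (seq_len, batch, input_size)."""
    h = x.new_zeros(x.size(1), cell.hidden_size)
    c = x.new_zeros(x.size(1), cell.hidden_size)
    outputs = []
    for x_t in x:
        h, c = cell(x_t, (h, c))  # the state from step t feeds step t + 1
        outputs.append(h)
    return torch.stack(outputs, dim=0), (h, c)


x = torch.randn(15, 32, 10)  # seq_len=15, batch=32, input_size=10
with torch.no_grad():
    ref_out, (ref_h, ref_c) = lstm(x)
    out, (h, c) = unroll(cell, x)

print("max abs difference:", (out - ref_out).abs().max().item())
```

The important detail is that h and c are reassigned inside the loop; if the state is not carried forward, every step starts from the initial state and the outputs no longer match.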
To use it, just move the model to the CUDA device:
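    # Use the GPU when one is available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Create the model and move it to the device
    model = ReLULSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True).to(device)

    # Example input on the same device: batch_size=32, seq_len=15, input_size=10
    input = torch.randn(32, 15, 10, device=device)
    output, (h_n, c_n) = model(input)  # output: (32, 15, 20)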
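Option 2: Set the activation directly in TensorFlow/Keras (simpler)
If you're on TensorFlow/Keras, there's no need for a custom cell at all. The built-in LSTM layer accepts ReLU through its activation argument and runs on the GPU: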
    from tensorflow.keras.layers import LSTM

    # Create an LSTM layer that uses ReLU as its activation
    lstm_layer = LSTM(units=64, activation='relu', recurrent_activation='sigmoid')
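Here recurrent_activation is the activation used for the gates. It is usually best left as sigmoid, though you can change it if you need to. One caveat: Keras only selects its fused cuDNN kernel when activation is tanh and recurrent_activation is sigmoid (among other conditions), so with ReLU the layer falls back to the generic GPU implementation.
Additional notes
The old claim that "a custom LSTMCell doesn't support the GPU" reflects a limitation of early framework versions. Current mainstream frameworks (PyTorch 1.0+, TensorFlow 2.x) no longer have it: as long as the model and the input data are both on the CUDA device, a custom cell runs on the GPU.
The question originally comes from Stack Exchange, asked by Venkat.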
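To confirm that the model and the data really ended up on the same device, a small check along these lines can help. It is a minimal sketch: `pick_device` and `on_device` are hypothetical helper names, and nn.LSTMCell stands in for any custom cell.

```python
import torch
import torch.nn as nn


def pick_device():
    """Return the CUDA device when available, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def on_device(module, device):
    """True if every parameter of `module` lives on a device of the given type."""
    return all(p.device.type == device.type for p in module.parameters())


device = pick_device()
cell = nn.LSTMCell(10, 20).to(device)    # stand-in for any custom cell
x = torch.randn(32, 10, device=device)   # input created on the same device
h, c = cell(x)                           # the initial state defaults to zeros

print("parameters on", device.type, ":", on_device(cell, device))
print("output device:", h.device.type)
```

A device mismatch between parameters and inputs raises a RuntimeError in PyTorch, so a check like this mainly helps locate which side was left on the CPU.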




