Help debugging an MNIST handwritten-digit neural network: accuracy stalls at 9.8% and learning stops after the first epoch

阿华AIGC实验室

2026-3-31

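Hi everyone, I've been writing a neural network for MNIST handwritten-digit recognition. I thought the code was finished, but training ran into a serious problem: accuracy gets stuck at 9.8% after the first epoch and never improves. The initial accuracy before any training was 8.35%, and after every epoch since then it has been exactly 9.8%, with no change at all.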

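I'm currently using MSE as the loss function, tanh as the activation function, and a learning rate of 0.1. The accuracy (in %) over the first few epochs is: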

8.35 (before training), 9.8, 9.8, 9.8, 9.8, 9.8
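An accuracy that stays at exactly the same value is what a network that always predicts the same class would produce. If the evaluation runs on the standard 10,000-image MNIST test set (an assumption, since the post does not say which set is used), 980 of those images are zeros, so always answering 0 would score exactly 9.8%. A minimal sketch of how that could be checked is below; `prediction_histogram` and the `outputs` array are hypothetical names, not part of the code in this post.

```python
import numpy as np

def prediction_histogram(outputs):
    """Count how often each class index is the argmax of a network output.

    `outputs` is a hypothetical array of shape (num_samples, num_classes),
    holding one row of output activations per image.
    """
    predicted = np.argmax(outputs, axis=1)
    return np.bincount(predicted, minlength=outputs.shape[1])

# Toy stand-in for a saturated network: every output is exactly -1,
# so argmax falls back to index 0 for every sample.
saturated = -np.ones((5, 10))
hist = prediction_histogram(saturated)
print(hist)  # → [5 0 0 0 0 0 0 0 0 0]
```

If the histogram on the real outputs puts every sample in one bin, the network has collapsed to a constant prediction rather than learned slowly.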

First, here are the basic functions I have implemented so far:

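tanh(z): takes a vector and returns the vector after applying the tanh activation
tanhDerivative(z): takes a vector and returns the corresponding vector of derivatives
feedforward(input, stop): computes the network output; the stop parameter lets the computation halt at any layer and returns that layer's activations
feedforward2(input, stop): like the forward pass above, but returns the layer's output before the activation function is applied (the z value in the formulas)
MSE(input, desiredOutput): computes the mean squared error
transformer(label): converts a single label (for example 0) into the target output vector, in the form [[1],[-1],[-1],...,[-1]] (1 at the position of the label, -1 everywhere else)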
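For readers who want to reproduce the setup, the helpers described above could look roughly like the sketch below. This is an assumption-based reconstruction from the descriptions, not the actual implementation; the names `tanh_derivative` and `encode_label` and the `num_classes` parameter are hypothetical.

```python
import numpy as np

def tanh(z):
    """Element-wise tanh activation."""
    return np.tanh(z)

def tanh_derivative(z):
    """Element-wise derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

def encode_label(label, num_classes=10):
    """Return a (num_classes, 1) column with +1 at `label` and -1 elsewhere."""
    target = -np.ones((num_classes, 1))
    target[int(label), 0] = 1.0
    return target

target = encode_label(0)
print(target.ravel())  # first entry is 1, the remaining nine are -1
```

Note that `encode_label` expects the raw integer label; a label that has already been rescaled would be truncated by `int()` and map to the wrong position.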

I strongly suspect the problem is in the train and backpropagation functions. Here is the code for both:

# Training function; the backpropagation logic lives in a separate method
def train(self, dataLocation, learnRate, batchSize=100):
    self.bias_updates = self.bias_templates # accumulated bias updates
    self.weight_updates = self.weight_templates # accumulated weight updates
    file = np.loadtxt(dataLocation, delimiter=",", dtype="float128")
    count = 0
    print("Starting training")
    for row in file:
        count += 1
        data = []
        for item in row:
            data.append([item/255]) # normalize inputs to the 0-1 range, stored as a 2D array
        desired = self.Transformer(data.pop(0)) # convert the label into the target output vector
        self.backpropagation(data, desired)
        if count % 100 == 0:
            for n in range(0,len(self.bias_updates)):
                self.bias_updates[n] *= learnRate
                self.weight_updates[n] *= learnRate
                self.biases[n] -= self.bias_updates[n]
                self.weights[n] -= self.weight_updates[n]
            self.bias_updates = self.bias_templates
            self.weight_updates = self.weight_templates
            print(count//100)

# Backpropagation function, the core of the training logic
# Note: the learning rate is applied only in train()
def backpropagation(self, input, desiredOutput):
    SigmoidLastLayerActivations = np.array(self.feedforward(input, self.size-1))
    LastLayerActivation = np.array(self.feedforward2(input, self.size-1))
    # Delta of the last layer: ∂C/∂z(L)
    δ = 2 * (SigmoidLastLayerActivations-desiredOutput) * self.tanhDerivative(LastLayerActivation) 
    self.bias_updates[-1] += δ # update for the last layer's biases
    # Update for the last layer's weights
    self.weight_updates[-1] += np.matmul(δ, np.transpose(self.feedforward(input, self.size-2))) 

    # Walk backwards through the earlier layers
    for i in range(len(self.weight_templates)-2 ,-1 , -1):
       requiredWeights = np.transpose(np.array(self.weights[i+1])) # transposed weight matrix required by the formula
       LayerActivations = np.array(self.feedforward2(input, i+1)) # the z values in the formula
       SigmoidLayerActivations = np.transpose(np.array(self.feedforward(input,i))) # the activations in the formula
       
       δ = np.matmul(requiredWeights, δ) * self.tanhDerivative(LayerActivations)
       self.bias_updates[i] += δ
       # Update for the current layer's weights
       otherVariable = np.matmul(δ, SigmoidLayerActivations) 
       self.weight_updates[i] += otherVariable

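Note: some variable names in the code contain "Sigmoid" because I originally used the sigmoid activation function and later switched to tanh without renaming them. Please ignore that; the actual logic uses tanh.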
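Two details of train() may be worth ruling out before looking at the math. First, `self.bias_updates = self.bias_templates` binds a second name to the same object rather than copying it, so in-place `+=` and `*=` would also change the templates. Second, the inner loop divides every item of the row by 255, including the label in column 0, before `data.pop(0)` is called. The sketch below illustrates both effects in isolation. It assumes the templates are lists of NumPy zero arrays, which the post does not show; `templates`, `updates`, and `row` are hypothetical stand-ins.

```python
import copy
import numpy as np

# Hypothetical stand-in for self.bias_templates: a list of zero arrays.
templates = [np.zeros((3, 1)), np.zeros((2, 1))]

# Plain assignment: both names refer to the same list and the same arrays.
updates = templates
updates[0] += 1.0  # in-place add, like the += in backpropagation()
aliased_template_changed = bool(templates[0].any())  # True: template is no longer zero

# Deep copy: the arrays are independent, so the template stays zero.
templates = [np.zeros((3, 1)), np.zeros((2, 1))]
updates = copy.deepcopy(templates)
updates[0] += 1.0
copied_template_changed = bool(templates[0].any())  # False

# Label handling: every item is scaled, then the first one is popped.
row = np.array([5.0, 0.0, 255.0])      # label 5 followed by two pixel values
data = [[item / 255] for item in row]
popped = data.pop(0)                   # [5/255], a one-element list, not 5
label = int(row[0])                    # reading the label before any scaling

print(aliased_template_changed, copied_template_changed, popped, label)
```

If the real templates behave like this, the accumulated updates would never be reset between batches, and Transformer would receive a scaled one-element list instead of the digit. Whether that matters depends on how Transformer and the templates are actually written.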

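I really can't figure out why the accuracy stays stuck at 9.8%. It feels like something in the backpropagation or weight-update logic must be wrong, but I have searched for a long time without finding it. If you need the code for other functions or any more details, just let me know. I'd appreciate it if you could take a look and tell me where the problem is. Thanks, everyone!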
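For anyone who wants to test the backpropagation math separately from the training loop, a numerical gradient check is one option. The sketch below is a standalone illustration on a tiny tanh network, not the class from this post: it uses a sum-of-squares cost (whose derivative is the 2 * (a - y) factor used above), and all function names, layer sizes, and the random seed are assumptions.

```python
import numpy as np

def forward(weights, biases, x):
    """Return the z values and activations of every layer (activations[0] is x)."""
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        zs.append(W @ activations[-1] + b)
        activations.append(np.tanh(zs[-1]))
    return zs, activations

def cost(weights, biases, x, y):
    """Sum of squared errors between the network output and the target."""
    return float(np.sum((forward(weights, biases, x)[1][-1] - y) ** 2))

def backprop(weights, biases, x, y):
    """Analytic gradients of cost() with respect to every weight and bias."""
    zs, activations = forward(weights, biases, x)
    delta = 2 * (activations[-1] - y) * (1 - np.tanh(zs[-1]) ** 2)
    grad_w = [None] * len(weights)
    grad_b = [None] * len(weights)
    for i in range(len(weights) - 1, -1, -1):
        grad_b[i] = delta
        grad_w[i] = delta @ activations[i].T
        if i > 0:
            # Propagate delta to the previous layer through W^T and tanh'.
            delta = (weights[i].T @ delta) * (1 - np.tanh(zs[i - 1]) ** 2)
    return grad_w, grad_b

def max_gradient_error(weights, biases, x, y, eps=1e-6):
    """Largest gap between analytic and central-difference weight gradients."""
    grad_w, _ = backprop(weights, biases, x, y)
    worst = 0.0
    for W, G in zip(weights, grad_w):
        for idx in np.ndindex(*W.shape):
            original = W[idx]
            W[idx] = original + eps
            plus = cost(weights, biases, x, y)
            W[idx] = original - eps
            minus = cost(weights, biases, x, y)
            W[idx] = original  # restore the weight before moving on
            worst = max(worst, abs((plus - minus) / (2 * eps) - G[idx]))
    return worst

rng = np.random.default_rng(0)
sizes = [4, 3, 2]  # tiny hypothetical network: 4 inputs, 3 hidden, 2 outputs
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=(m, 1)) for m in sizes[1:]]
x = rng.normal(size=(4, 1))
y = np.array([[1.0], [-1.0]])
error = max_gradient_error(weights, biases, x, y)
print(error)
```

A very small error suggests the backpropagation formulas agree with the cost function; a large one points to an indexing or transpose mistake. Running the same comparison against the real backpropagation method would require it to return gradients for a single sample instead of accumulating them.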