如何在Caffe中训练带scale参数的VGG-16模型？

阿华AIGC实验室

2026-5-15

解决Caffe中添加输入scale参数后模型训练无进展的问题

你遇到的核心问题是输入数据缩放后，模型的学习动力学被彻底打破——原来的学习率、权重初始化等设置都是针对[0,255]范围的输入设计的，缩放到[0,1]后输入量级缩小了255倍，导致梯度更新幅度过小，模型根本无法学到有效特征。下面是具体的修复步骤：

1. 紧急调整学习率

你原本的base_lr=0.0648是基于原始输入尺度计算的，现在输入缩小了255倍，对应的学习率需要按比例放大：

最直接的方式是把base_lr乘以255，也就是设置为0.0648 * 255 ≈ 16.524
如果担心大学习率导致震荡，可以先从base_lr=1.0开始尝试，然后根据loss的变化逐步调整（比如loss下降慢就加大学率，震荡就减小）

2. 修正均值文件的处理逻辑

Caffe的transform_param预处理顺序是先减均值，再乘以scale，但你的imagenet_mean.binaryproto是基于[0,255]原始图像计算的，这会导致最终输入变成(image - mean) * 0.00390625，和你期望的image*0.00390625 - mean*0.00390625完全不符。解决方法二选一：

重新计算基于[0,1]尺度的均值文件（把原始均值除以255）
手动修改预处理逻辑：先对原始图像做scale，再减去缩放后的均值（可以通过自定义Data层或者预处理脚本实现）

3. 优化权重初始化方式

你当前用的xavier初始化是基于输入输出维度计算的，输入量级变小后，初始化的权重尺度可能不足以驱动有效学习：

把卷积层和全连接层的权重初始化换成msra（更适配ReLU激活的深层模型）
或者给xavier初始化增加std参数，手动放大权重的初始化范围，比如：
```
weight_filler {
  type: "xavier"
  std: 0.1
}
```

4. 先做小规模调试验证

不要直接跑几十万轮，先做快速验证：

把batch_size调到1，max_iter设为100，观察loss是否有明显下降
打印第一层卷积的输出张量，确认输入缩放后的数据分布是否合理（比如是否大部分值在0-1区间，减去均值后是否有正负分布）

5. 确保训练/测试预处理完全一致

你当前的配置已经做到了这一点，但要再次确认：训练和测试阶段的transform_param中，scale、mean_file、crop_size等参数完全相同，避免测试阶段出现数据分布不匹配的问题。

你的原始配置文件参考

train_val.prototxt

name: "ES VGG"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
    mirror: true
    crop_size: 224
    mean_file: "/local/datasets/imagenet/ilsvrc12/imagenet_mean.binaryproto"
  }
  data_param {
    source: "/local/datasets/imagenet/ilsvrc12_train_lmdb"
    batch_size: 6
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
    mirror: false
    crop_size: 224
    mean_file: "/local/datasets/imagenet/ilsvrc12/imagenet_mean.binaryproto"
  }
  data_param {
    source: "/local/datasets/imagenet/ilsvrc12_val_lmdb"
    batch_size: 6
    backend: LMDB
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  convolution_param {
    num_output: 64
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1_2"
  type: "ReLU"
  bottom: "conv1_2"
  top: "conv1_2"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  convolution_param {
    num_output: 128
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu2_1"
  type: "ReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv2_2"
  convolution_param {
    num_output: 128
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu2_2"
  type: "ReLU"
  bottom: "conv2_2"
  top: "conv2_2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2_2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3_1"
  convolution_param {
    num_output: 256
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3_1"
  type: "ReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "conv3_2"
  type: "Convolution"
  bottom: "conv3_1"
  top: "conv3_2"
  convolution_param {
    num_output: 256
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3_2"
  type: "ReLU"
  bottom: "conv3_2"
  top: "conv3_2"
}
layer {
  name: "conv3_3"
  type: "Convolution"
  bottom: "conv3_2"
  top: "conv3_3"
  convolution_param {
    num_output: 256
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3_3"
  type: "ReLU"
  bottom: "conv3_3"
  top: "conv3_3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3_3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4_1"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu4_1"
  type: "ReLU"
  bottom: "conv4_1"
  top: "conv4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu4_2"
  type: "ReLU"
  bottom: "conv4_2"
  top: "conv4_2"
}
layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "conv4_2"
  top: "conv4_3"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu4_3"
  type: "ReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4_3"
  top: "pool4"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv5_1"
  type: "Convolution"
  bottom: "pool4"
  top: "conv5_1"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu5_1"
  type: "ReLU"
  bottom: "conv5_1"
  top: "conv5_1"
}
layer {
  name: "conv5_2"
  type: "Convolution"
  bottom: "conv5_1"
  top: "conv5_2"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu5_2"
  type: "ReLU"
  bottom: "conv5_2"
  top: "conv5_2"
}
layer {
  name: "conv5_3"
  type: "Convolution"
  bottom: "conv5_2"
  top: "conv5_3"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5_3"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.01
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.01
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.01
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracytop1"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracytop1"
  accuracy_param {
    top_k: 1
  }
  include {
    phase: TEST
  }
}
layer {
  name: "accuracytop5"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracytop5"
  accuracy_param {
    top_k: 5
  }
  include {
    phase: TEST
  }
}

solver.prototxt

net: "models/es_vgg/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.0648
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 18900000
momentum: 0.9
weight_decay: 0.0005