How do I train a VGG-16 model with an input scale parameter in Caffe?
Fixing a Caffe model that stops making progress after an input scale parameter is added
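The core problem is that scaling the input changes the model's learning dynamics. Your learning rate, weight initialization and other settings were chosen for inputs in the [0,255] range. scale: 0.00390625 equals 1/256, so the input is now 256 times smaller (roughly [0,1] before mean subtraction). This makes the gradient updates much smaller and can leave the model unable to learn useful features. The concrete fixes are: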
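1. Adjust the learning rate first
Your base_lr=0.0648 was chosen for the original input scale. Since the input is now 256 times smaller, a rough heuristic is to scale the learning rate up proportionally:
- The most direct option is to multiply base_lr by 256, i.e. 0.0648 * 256 ≈ 16.59.
- If you are worried that such a large learning rate will make training oscillate, start from base_lr=1.0 and tune it based on the loss (raise it if the loss falls slowly, lower it if the loss oscillates).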
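The arithmetic behind this heuristic can be written out as a short Python sketch. It assumes the compensation rule is simply new_lr = old_lr / scale, with scale and base_lr taken from the configs below; lr_candidates is a hypothetical helper for a geometric sweep starting from the conservative base_lr=1.0, not part of Caffe.

```python
# Sketch of the "compensate the learning rate for the input scale" heuristic.
# Assumption: new_lr = old_lr / scale; this is a starting point, not a
# guaranteed-stable setting.

SCALE = 0.00390625    # transform_param.scale in train_val.prototxt (= 1/256)
OLD_BASE_LR = 0.0648  # base_lr in solver.prototxt


def rescaled_lr(old_lr: float, scale: float) -> float:
    """Undo the shrink factor: inputs multiplied by `scale` -> lr divided by it."""
    return old_lr / scale


def lr_candidates(start: float = 1.0, factor: float = 2.0, n: int = 5) -> list:
    """Hypothetical helper: geometric sweep upward from a conservative start."""
    return [start * factor ** i for i in range(n)]


if __name__ == "__main__":
    print(f"input shrink factor: {1 / SCALE:g}")                          # 256
    print(f"rescaled base_lr:    {rescaled_lr(OLD_BASE_LR, SCALE):.4f}")  # 16.5888
    print("lr sweep:", lr_candidates())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Stepping up from 1.0 rather than jumping straight to about 16.6 trades a few extra short runs for a lower risk of divergence.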
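2. Check how the mean file is applied
Caffe's transform_param subtracts the mean first and then multiplies by scale, so it computes (image - mean) * 0.00390625. This equals image * 0.00390625 - mean * 0.00390625, which is exactly the result you want, so the imagenet_mean.binaryproto computed on [0,255] images is the correct one to use here:
- Keep the original mean file; do not divide it by 255 or 256. Caffe would subtract that shrunken mean from the raw [0,255] pixels before scaling, so the mean would effectively not be removed.
- Only if you move preprocessing outside Caffe and scale the image before subtracting the mean (for example with a custom Data layer or a preprocessing script) does the mean need to be scaled by the same factor.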
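The following Python sketch mimics the per-pixel formula Caffe applies, out = (pixel - mean) * scale, and compares it with scaling first. The pixel and mean values are made up for illustration; the point is that the two orders agree, and that pre-dividing the mean breaks mean subtraction.

```python
# Sketch of the per-pixel formula Caffe's data transformer applies:
#     out = (pixel - mean) * scale
# The pixel/mean numbers are made up for illustration.

SCALE = 0.00390625  # 1/256, as in transform_param


def caffe_order(pixel: float, mean: float, scale: float) -> float:
    """Subtract the mean first, then scale (what Caffe does)."""
    return (pixel - mean) * scale


def scale_first(pixel: float, mean: float, scale: float) -> float:
    """Scale pixel and mean separately, then subtract (what was expected)."""
    return pixel * scale - mean * scale


if __name__ == "__main__":
    pixel, mean = 200.0, 104.0  # hypothetical raw pixel and [0,255] mean
    print(caffe_order(pixel, mean, SCALE))  # 0.375
    print(scale_first(pixel, mean, SCALE))  # 0.375, same result
    # A mean pre-divided by 256 is subtracted from the raw [0,255] pixel
    # before scaling, so almost nothing is removed:
    print(caffe_order(pixel, mean / 256, SCALE))  # 0.7796630859375
```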
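3. Reconsider the weight initialization
The xavier filler you use now sets the weight range from the layer dimensions (by default the fan-in), not from the input magnitude, so with the smaller input the initial weights may be too small to drive effective learning:
- Switch the weight filler of the convolution and fully connected layers to msra, which suits deep ReLU networks better.
- Or set the initialization range by hand with a gaussian filler, e.g. weight_filler { type: "gaussian" std: 0.1 }, choosing std per layer. Caffe's xavier filler ignores the std field, so weight_filler { type: "xavier" std: 0.1 } has no effect.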
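To see what scale each filler produces, the sketch below evaluates the fan-in formulas that Caffe's xavier and msra fillers use by default (n = input channels × kernel height × kernel width; xavier draws uniformly from ±sqrt(3/n), msra draws from a Gaussian with std sqrt(2/n)) for a few layers of this network. The layer list is transcribed from train_val.prototxt; fc6 is treated as a 7×7 "kernel" over the 512×7×7 pool5 output of a 224×224 crop.

```python
import math

# Sketch of the fan-in formulas behind Caffe's default fillers
# (variance_norm FAN_IN):
#   xavier: uniform in [-sqrt(3/n), +sqrt(3/n)]  (the std field is ignored)
#   msra:   gaussian with std = sqrt(2/n)
# with n = fan_in = input_channels * kernel_h * kernel_w.

# (layer, input channels, kernel size) transcribed from train_val.prototxt.
# fc6 is treated as a 7x7 kernel over the 512x7x7 pool5 output of a 224 crop.
LAYERS = [
    ("conv1_1", 3, 3),
    ("conv1_2", 64, 3),
    ("conv3_1", 128, 3),
    ("conv5_3", 512, 3),
    ("fc6", 512, 7),
]


def fan_in(channels: int, kernel: int) -> int:
    return channels * kernel * kernel


def xavier_bound(n: int) -> float:
    return math.sqrt(3.0 / n)


def msra_std(n: int) -> float:
    return math.sqrt(2.0 / n)


if __name__ == "__main__":
    for name, channels, kernel in LAYERS:
        n = fan_in(channels, kernel)
        print(f"{name:8s} fan_in={n:6d}  xavier=+/-{xavier_bound(n):.4f}  "
              f"msra std={msra_std(n):.4f}")
```

Because a uniform distribution on ±sqrt(3/n) has standard deviation sqrt(1/n), msra weights come out about √2 times larger than xavier ones at every layer, while a fixed gaussian std such as 0.1 is larger or smaller than both depending on the layer's fan-in.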
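4. Validate on a small scale first
Don't start a run of hundreds of thousands of iterations right away; check quickly first:
- Set batch_size to 1 and max_iter to 100, and see whether the loss drops noticeably.
- Print the input data blob (and the output of the first convolution) to confirm the scaled data distribution is reasonable: after mean subtraction and scaling the values should be centered near zero, roughly within [-1, 1], with both positive and negative values.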
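For the quick sanity run, two small checks can be scripted. The sketch assumes the solver's display lines contain "Iteration N" followed by "loss = X" (adjust LOSS_RE if your Caffe version formats them differently) and compares the latest loss with ln(1000) ≈ 6.908, the softmax loss of a uniform guess over the 1000 ImageNet classes; a loss stuck at that value usually means the network is not learning. summarize expects a flat list of data-blob values; with pycaffe these could come from net.blobs['data'].data after net.forward(). The function names and the margin value are hypothetical.

```python
import math
import re
import statistics

# Hypothetical helpers for the quick sanity run.
# Assumption: solver display lines contain "Iteration N" followed later on the
# same line by "loss = X"; adjust LOSS_RE if your log looks different.
LOSS_RE = re.compile(r"Iteration (\d+).*?loss = ([-+0-9.eE]+)")

# Softmax loss of a uniform guess over 1000 classes: ln(1000) ~= 6.908.
CHANCE_LOSS = math.log(1000)


def parse_losses(log_text: str) -> list:
    """Return [(iteration, loss), ...] from the training log text."""
    return [(int(i), float(x)) for i, x in LOSS_RE.findall(log_text)]


def is_learning(losses: list, margin: float = 0.1) -> bool:
    """True if the latest loss is clearly below chance level."""
    return bool(losses) and losses[-1][1] < CHANCE_LOSS - margin


def summarize(values: list) -> dict:
    """Basic statistics of a flat list of data-blob values."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "frac_negative": sum(v < 0 for v in values) / len(values),
    }


if __name__ == "__main__":
    # Made-up log excerpt in the assumed format (not real Caffe output).
    log = "Iteration 0, loss = 6.91\nIteration 20, loss = 6.90\nIteration 40, loss = 6.91\n"
    losses = parse_losses(log)
    print(losses, "learning:", is_learning(losses))  # stuck near ln(1000): False

    # Simulated preprocessing: raw pixels, a [0,255] mean of 104, scale 1/256.
    data = [(p - 104) * 0.00390625 for p in (0, 50, 104, 150, 255)]
    print(summarize(data))
```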
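5. Keep training and test preprocessing identical
Your configuration already does this, but double-check that scale, mean_file and crop_size in the transform_param of the TRAIN and TEST data layers are identical (only mirror should differ), so the test phase does not see a different data distribution.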
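The consistency check can also be automated. This Python sketch pulls the transform_param blocks out of the prototxt text with regular expressions and reports which of scale, crop_size and mean_file differ between the first two (assumed TRAIN, then TEST). It is a rough text match, not a real protobuf parser, and relies on transform_param containing no nested braces, which holds for this config. The excerpt is trimmed from the train_val.prototxt shown below.

```python
import re

# Rough text check (not a protobuf parser). Assumptions: transform_param blocks
# contain no nested braces, and the TRAIN data layer comes before the TEST one.
MUST_MATCH = ("scale", "crop_size", "mean_file")  # mirror may legitimately differ

TP_RE = re.compile(r"transform_param\s*\{([^{}]*)\}")
FIELD_RE = re.compile(r'(\w+):\s*("[^"]*"|\S+)')


def transform_params(prototxt: str) -> list:
    """One {field: value} dict per transform_param block, in file order."""
    return [dict(FIELD_RE.findall(body)) for body in TP_RE.findall(prototxt)]


def mismatches(prototxt: str) -> list:
    """Fields in MUST_MATCH whose TRAIN and TEST values differ."""
    train, test = transform_params(prototxt)[:2]
    return [f for f in MUST_MATCH if train.get(f) != test.get(f)]


# Trimmed excerpt of the two data layers from train_val.prototxt.
EXCERPT = (
    'layer { name: "data" type: "Data" include { phase: TRAIN } '
    'transform_param { scale: 0.00390625 mirror: true crop_size: 224 '
    'mean_file: "/local/datasets/imagenet/ilsvrc12/imagenet_mean.binaryproto" } } '
    'layer { name: "data" type: "Data" include { phase: TEST } '
    'transform_param { scale: 0.00390625 mirror: false crop_size: 224 '
    'mean_file: "/local/datasets/imagenet/ilsvrc12/imagenet_mean.binaryproto" } } '
)

if __name__ == "__main__":
    print(mismatches(EXCERPT))  # []: scale, crop_size and mean_file agree
    broken = EXCERPT.replace("scale: 0.00390625 mirror: false", "mirror: false")
    print(mismatches(broken))   # ['scale']
```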
Your original configuration files, for reference
train_val.prototxt
name: "ES VGG"
layer { name: "data" type: "Data" top: "data" top: "label" include { phase: TRAIN } transform_param { scale: 0.00390625 mirror: true crop_size: 224 mean_file: "/local/datasets/imagenet/ilsvrc12/imagenet_mean.binaryproto" } data_param { source: "/local/datasets/imagenet/ilsvrc12_train_lmdb" batch_size: 6 backend: LMDB } }
layer { name: "data" type: "Data" top: "data" top: "label" include { phase: TEST } transform_param { scale: 0.00390625 mirror: false crop_size: 224 mean_file: "/local/datasets/imagenet/ilsvrc12/imagenet_mean.binaryproto" } data_param { source: "/local/datasets/imagenet/ilsvrc12_val_lmdb" batch_size: 6 backend: LMDB } }
layer { name: "conv1_1" type: "Convolution" bottom: "data" top: "conv1_1" convolution_param { num_output: 64 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu1_1" type: "ReLU" bottom: "conv1_1" top: "conv1_1" }
layer { name: "conv1_2" type: "Convolution" bottom: "conv1_1" top: "conv1_2" convolution_param { num_output: 64 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu1_2" type: "ReLU" bottom: "conv1_2" top: "conv1_2" }
layer { name: "pool1" type: "Pooling" bottom: "conv1_2" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv2_1" type: "Convolution" bottom: "pool1" top: "conv2_1" convolution_param { num_output: 128 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu2_1" type: "ReLU" bottom: "conv2_1" top: "conv2_1" }
layer { name: "conv2_2" type: "Convolution" bottom: "conv2_1" top: "conv2_2" convolution_param { num_output: 128 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu2_2" type: "ReLU" bottom: "conv2_2" top: "conv2_2" }
layer { name: "pool2" type: "Pooling" bottom: "conv2_2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv3_1" type: "Convolution" bottom: "pool2" top: "conv3_1" convolution_param { num_output: 256 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu3_1" type: "ReLU" bottom: "conv3_1" top: "conv3_1" }
layer { name: "conv3_2" type: "Convolution" bottom: "conv3_1" top: "conv3_2" convolution_param { num_output: 256 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu3_2" type: "ReLU" bottom: "conv3_2" top: "conv3_2" }
layer { name: "conv3_3" type: "Convolution" bottom: "conv3_2" top: "conv3_3" convolution_param { num_output: 256 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu3_3" type: "ReLU" bottom: "conv3_3" top: "conv3_3" }
layer { name: "pool3" type: "Pooling" bottom: "conv3_3" top: "pool3" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv4_1" type: "Convolution" bottom: "pool3" top: "conv4_1" convolution_param { num_output: 512 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu4_1" type: "ReLU" bottom: "conv4_1" top: "conv4_1" }
layer { name: "conv4_2" type: "Convolution" bottom: "conv4_1" top: "conv4_2" convolution_param { num_output: 512 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu4_2" type: "ReLU" bottom: "conv4_2" top: "conv4_2" }
layer { name: "conv4_3" type: "Convolution" bottom: "conv4_2" top: "conv4_3" convolution_param { num_output: 512 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu4_3" type: "ReLU" bottom: "conv4_3" top: "conv4_3" }
layer { name: "pool4" type: "Pooling" bottom: "conv4_3" top: "pool4" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv5_1" type: "Convolution" bottom: "pool4" top: "conv5_1" convolution_param { num_output: 512 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu5_1" type: "ReLU" bottom: "conv5_1" top: "conv5_1" }
layer { name: "conv5_2" type: "Convolution" bottom: "conv5_1" top: "conv5_2" convolution_param { num_output: 512 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu5_2" type: "ReLU" bottom: "conv5_2" top: "conv5_2" }
layer { name: "conv5_3" type: "Convolution" bottom: "conv5_2" top: "conv5_3" convolution_param { num_output: 512 kernel_size: 3 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0 } } }
layer { name: "pool5" type: "Pooling" bottom: "conv5_3" top: "pool5" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "fc6" type: "InnerProduct" bottom: "pool5" top: "fc6" inner_product_param { num_output: 4096 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.01 } } }
layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" }
layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } }
layer { name: "fc7" type: "InnerProduct" bottom: "fc6" top: "fc7" inner_product_param { num_output: 4096 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.01 } } }
layer { name: "relu7" type: "ReLU" bottom: "fc7" top: "fc7" }
layer { name: "drop7" type: "Dropout" bottom: "fc7" top: "fc7" dropout_param { dropout_ratio: 0.5 } }
layer { name: "fc8" type: "InnerProduct" bottom: "fc7" top: "fc8" inner_product_param { num_output: 1000 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.01 } } }
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "fc8" bottom: "label" top: "loss" }
layer { name: "accuracytop1" type: "Accuracy" bottom: "fc8" bottom: "label" top: "accuracytop1" accuracy_param { top_k: 1 } include { phase: TEST } }
layer { name: "accuracytop5" type: "Accuracy" bottom: "fc8" bottom: "label" top: "accuracytop5" accuracy_param { top_k: 5 } include { phase: TEST } }
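One more thing worth checking in the network definition: every Convolution and hidden InnerProduct layer normally feeds a ReLU. The Python sketch below does a rough regex pass over the layer definitions (not a protobuf parser; it assumes name: comes before type: in each layer, as it does here). Run on the conv5 to fc6 excerpt of the config above, it flags conv5_3, which in this file goes straight into pool5 with no relu5_3 layer, unlike the standard VGG-16 definition.

```python
import re

# Rough regex pass over layer definitions (not a protobuf parser).
# Assumption: in each layer, name: comes first and the first type: after it is
# the layer type, as in the train_val.prototxt above.
HEAD_RE = re.compile(r'name:\s*"([^"]+)"\s+type:\s*"([^"]+)"')
BOTTOM_RE = re.compile(r'bottom:\s*"([^"]+)"')
TOP_RE = re.compile(r'top:\s*"([^"]+)"')


def parse_layers(prototxt: str) -> list:
    """[(name, type, bottoms, tops), ...] in file order."""
    layers = []
    for chunk in prototxt.split("layer {")[1:]:
        head = HEAD_RE.search(chunk)
        if head:
            layers.append((head.group(1), head.group(2),
                           BOTTOM_RE.findall(chunk), TOP_RE.findall(chunk)))
    return layers


def missing_relu(prototxt: str, skip=frozenset({"fc8"})) -> list:
    """Convolution/InnerProduct layers (except `skip`) whose top no ReLU reads."""
    layers = parse_layers(prototxt)
    relu_inputs = {b for _, kind, bottoms, _ in layers if kind == "ReLU" for b in bottoms}
    return [name for name, kind, _, tops in layers
            if kind in ("Convolution", "InnerProduct") and name not in skip
            and not any(t in relu_inputs for t in tops)]


# conv5_2 .. relu6 excerpt of the config above.
EXCERPT = (
    'layer { name: "conv5_2" type: "Convolution" bottom: "conv5_1" top: "conv5_2" '
    'convolution_param { num_output: 512 kernel_size: 3 pad: 1 } } '
    'layer { name: "relu5_2" type: "ReLU" bottom: "conv5_2" top: "conv5_2" } '
    'layer { name: "conv5_3" type: "Convolution" bottom: "conv5_2" top: "conv5_3" '
    'convolution_param { num_output: 512 kernel_size: 3 pad: 1 } } '
    'layer { name: "pool5" type: "Pooling" bottom: "conv5_3" top: "pool5" '
    'pooling_param { pool: MAX kernel_size: 2 stride: 2 } } '
    'layer { name: "fc6" type: "InnerProduct" bottom: "pool5" top: "fc6" '
    'inner_product_param { num_output: 4096 } } '
    'layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" } '
)

if __name__ == "__main__":
    print(missing_relu(EXCERPT))  # ['conv5_3']
```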
solver.prototxt
net: "models/es_vgg/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.0648
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 18900000
momentum: 0.9
weight_decay: 0.0005
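For reference, Caffe's "step" policy (as documented in caffe.proto) computes lr = base_lr * gamma ^ floor(iter / stepsize). The sketch below tabulates it for the solver settings above: with gamma: 0.1 and stepsize: 100000 the rate falls tenfold every 100,000 iterations, so by iteration 1,000,000 it is 10^-10 of base_lr, while max_iter allows 189 such decays. Whatever new base_lr you pick, it may be worth checking the schedule this way.

```python
# Caffe's "step" policy: lr = base_lr * gamma ^ floor(iter / stepsize).
# Values taken from the solver.prototxt above.
BASE_LR = 0.0648
GAMMA = 0.1
STEPSIZE = 100_000
MAX_ITER = 18_900_000


def step_lr(iteration: int, base_lr: float = BASE_LR,
            gamma: float = GAMMA, stepsize: int = STEPSIZE) -> float:
    """Learning rate in effect at `iteration` under the step policy."""
    return base_lr * gamma ** (iteration // stepsize)


if __name__ == "__main__":
    for it in (0, 99_999, 100_000, 500_000, 1_000_000):
        print(f"iter {it:>9,}: lr = {step_lr(it):.3e}")
    print("decays before max_iter:", MAX_ITER // STEPSIZE)  # 189
```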




