Help: How do I use Bayesian optimization (skopt/Optuna) to tune deep CNN hyperparameters?

阿华AIGC实验室

2026-5-8

Fixing Your scikit-optimize Bayesian Optimization for CNN Hyperparameter Tuning

Let's break down the issues in your current code, walk through a working implementation, and cover a few tips for running it efficiently on Colab.

Key Issues in Your Current Code

First, let's list the critical problems preventing your code from running properly:

No validation set: You can't compute a validation error to return to the Bayesian optimizer—this is the core metric it uses to guide the search.
Redundant data loading: You load CIFAR-10 every time the objective function runs, which wastes time and memory.
TensorFlow graph/session leaks: Building a new graph and session on every iteration without closing the previous session can cause memory bloat on Colab.
Learning rate decay bug: You reassign model_learning_rate after defining it as a placeholder, which breaks the computation graph.
Missing return value: Your function doesn't return the validation error, so the optimizer has no feedback.

Working Implementation with scikit-optimize

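Let's rewrite this step by step to fix these issues. We'll use a cleaner TensorFlow setup (or you could switch to Keras for even less boilerplate, but we'll stick with your TF approach first). Note that the code below uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session, tf.layers); on TensorFlow 2.x these are only available under tf.compat.v1, with eager execution disabled.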

Step 1: Preprocess Data Once (Outside the Objective Function)

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args

# Set random seeds for reproducibility
randomState = 42
np.random.seed(randomState)
tf.set_random_seed(randomState)

# Load and preprocess data ONCE
(train_X, train_y), (test_X, test_y) = cifar10.load_data()
train_X = train_X.astype('float32') / 255.0
test_X = test_X.astype('float32') / 255.0

# Split training set into train + validation (80/20 split)
val_split = 0.2
val_size = int(len(train_X) * val_split)
train_X, val_X = train_X[:-val_size], train_X[-val_size:]
train_y, val_y = train_y[:-val_size], train_y[-val_size:]

# One-hot encode labels (critical for classification tasks like CIFAR10)
train_y = tf.keras.utils.to_categorical(train_y, 10)
val_y = tf.keras.utils.to_categorical(val_y, 10)
input_size = 10  # Number of output classes

Step 2: Define Your CNN Model

Replace this with your actual model architecture, but here's a working example:

def cnn(inputs, dropout_rate):
    x = tf.layers.conv2d(inputs, 32, (3,3), activation='relu', padding='same')
    x = tf.layers.max_pooling2d(x, (2,2))
    x = tf.layers.conv2d(x, 64, (3,3), activation='relu', padding='same')
    x = tf.layers.max_pooling2d(x, (2,2))
    x = tf.layers.conv2d(x, 128, (3,3), activation='relu', padding='same')
    x = tf.layers.max_pooling2d(x, (2,2))
    x = tf.layers.flatten(x)
    x = tf.layers.dense(x, 256, activation='relu')
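    # training=True keeps dropout active; feed a rate of 0.0 to switch it off at validation time
    x = tf.layers.dropout(x, rate=dropout_rate, training=True)
    # Return raw logits: softmax_cross_entropy_with_logits_v2 applies the softmax itself
    x = tf.layers.dense(x, input_size)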
    return x

Step 3: Fix the Objective Function

# Define the search space for hyperparameters
dimensions = [
    Integer(100, 500, name='cnn_num_steps'),  # Training steps per epoch
    Integer(5, 20, name='cnn_init_epoch'),    # Epochs before first LR decay
    Integer(20, 50, name='cnn_max_epoch'),    # Total training epochs
    Real(0.8, 0.99, name='cnn_learning_rate_decay'),  # LR decay rate
    Integer(32, 128, name='cnn_batch_size'),  # Batch size: any integer in [32, 128], not only powers of 2
    Real(0.2, 0.5, name='cnn_dropout_rate'),  # Dropout rate
    Real(1e-4, 1e-2, prior='log-uniform', name='cnn_init_learning_rate')  # Initial LR (log scale fits better)
]

# Track iteration count for logging
iteration = 0

@use_named_args(dimensions=dimensions)
def bayes_opt(cnn_num_steps, cnn_init_epoch, cnn_max_epoch, cnn_learning_rate_decay, cnn_batch_size, cnn_dropout_rate, cnn_init_learning_rate):
    global iteration
    iteration += 1
    print(f"=== Starting Iteration {iteration} ===")

    # Reset TF graph and session properly to avoid memory leaks
    tf.reset_default_graph()
    sess = tf.Session()

    # Define placeholders
    inputs = tf.placeholder(tf.float32, [None, 32, 32, 3], name="inputs")
    targets = tf.placeholder(tf.float32, [None, input_size], name="targets")
    model_dropout_rate = tf.placeholder(tf.float32, name="dropout_rate")
    init_lr = tf.placeholder(tf.float32, name="init_lr")

    # Fixed learning rate decay (corrected the variable assignment bug)
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(
        learning_rate=init_lr,
        global_step=global_step,
        decay_steps=cnn_init_epoch * cnn_num_steps,  # Decay after N total steps, not epochs
        decay_rate=cnn_learning_rate_decay,
        staircase=False
    )

    # Build model, loss, and accuracy metrics
    prediction = cnn(inputs, model_dropout_rate)
    with tf.name_scope('loss'):
        # Use cross-entropy for classification (MSE was likely a mistake for CIFAR10)
        model_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=prediction, labels=targets))
    with tf.name_scope('accuracy'):
        correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(targets, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Optimizer
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(model_loss, global_step=global_step)

    # Initialize variables
    sess.run(tf.global_variables_initializer())

    # Training loop with validation and early stopping
    best_val_loss = float('inf')
    patience = 3
    no_improve_count = 0

    for epoch in range(cnn_max_epoch):
        # Training steps per epoch
        for step in range(cnn_num_steps):
            # Generate random batch (replace with your generate_batches if needed)
            idx = np.random.choice(len(train_X), cnn_batch_size)
            batch_X = train_X[idx]
            batch_y = train_y[idx]

            # Run training step
            sess.run(train_step, feed_dict={
                inputs: batch_X,
                targets: batch_y,
                model_dropout_rate: cnn_dropout_rate,
                init_lr: cnn_init_learning_rate
            })

        # Validate after each epoch (disable dropout during validation!)
        val_loss, val_acc = sess.run([model_loss, accuracy], feed_dict={
            inputs: val_X,
            targets: val_y,
            model_dropout_rate: 0.0
        })
        print(f"Iteration {iteration}, Epoch {epoch+1}: Val Loss = {val_loss:.4f}, Val Acc = {val_acc:.4f}")

        # Early stopping to save time
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            no_improve_count = 0
        else:
            no_improve_count += 1
            if no_improve_count >= patience:
                print(f"Early stopping: no validation improvement for {patience} epochs")
                break

    # Clean up session to free memory
    sess.close()

    # Return validation loss (we want to MINIMIZE this)
    return best_val_loss
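
To get a feel for what the decay schedule in the objective function does to the learning rate, here is a rough plain-Python sketch of the documented tf.train.exponential_decay formula, lr = init_lr * decay_rate ** (global_step / decay_steps). The helper name exponential_decay_lr and the sample hyperparameter values are assumptions made for illustration; they are not part of the original code.

```python
def exponential_decay_lr(init_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Plain-Python mirror of the tf.train.exponential_decay formula (illustrative helper)."""
    if staircase:
        # Staircase mode only changes the rate once every decay_steps steps
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return init_lr * decay_rate ** exponent


# Sample values picked from inside the search space above (assumptions)
init_lr = 1e-3
num_steps = 200          # plays the role of cnn_num_steps
init_epoch = 10          # plays the role of cnn_init_epoch
decay_rate = 0.9         # plays the role of cnn_learning_rate_decay
decay_steps = init_epoch * num_steps

for epoch in (0, 10, 20, 50):
    step = epoch * num_steps
    lr = exponential_decay_lr(init_lr, step, decay_steps, decay_rate)
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

With staircase=False the rate shrinks a little on every step and is multiplied by decay_rate once per cnn_init_epoch epochs. If early stopping ends a run after only a few epochs, the decay parameters have very little effect on that trial.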

Step 4: Run the Bayesian Optimization

# Start the optimization (n_calls = number of hyperparameter combinations to test)
result = gp_minimize(
    bayes_opt,
    dimensions=dimensions,
    n_calls=20,
    random_state=randomState,
    verbose=1
)

# Print the best results
print("\n=== Best Hyperparameters Found ===")
print(f"Steps per epoch: {result.x[0]}")
print(f"Initial decay epoch: {result.x[1]}")
print(f"Max epochs: {result.x[2]}")
print(f"LR decay rate: {result.x[3]:.4f}")
print(f"Batch size: {result.x[4]}")
print(f"Dropout rate: {result.x[5]:.4f}")
print(f"Initial LR: {result.x[6]:.6f}")
print(f"Best validation loss: {result.fun:.4f}")
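
Reading result.x by position breaks silently if the order of the search space changes, so it can help to pair each value with its dimension name. The sketch below shows the idea with stand-in data: dimension_names, best_x, and the helper name_results are hypothetical, and the values are made up.

```python
# Stand-ins for the skopt objects (assumptions for illustration):
# dimension_names mirrors the name= arguments in `dimensions`,
# best_x mirrors the ordering of result.x; the values are made up.
dimension_names = [
    "cnn_num_steps", "cnn_init_epoch", "cnn_max_epoch",
    "cnn_learning_rate_decay", "cnn_batch_size",
    "cnn_dropout_rate", "cnn_init_learning_rate",
]
best_x = [300, 10, 30, 0.95, 64, 0.3, 0.001]


def name_results(names, values):
    """Pair each hyperparameter name with its value, failing loudly on a length mismatch."""
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    return dict(zip(names, values))


best_params = name_results(dimension_names, best_x)
for name, value in best_params.items():
    print(f"{name}: {value}")
```

With the real objects, the call would presumably look like name_results([d.name for d in dimensions], result.x), assuming every dimension was created with a name= argument as in Step 3.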

Efficient Implementation Tips for Colab

Use Early Stopping: As shown above, this cuts down unnecessary training time when the model stops improving.
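
Here is a minimal, framework-free sketch of the same patience logic used in the objective function, in case you want to reuse it across models. The EarlyStopper class and the loss values are made up for illustration.

```python
class EarlyStopper:
    """Track the best validation loss and signal a stop after `patience` bad epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def update(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch
losses = [1.9, 1.5, 1.3, 1.35, 1.32, 1.31, 1.2]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.update(loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best val loss {stopper.best}")
```

This run stops at epoch 6 with a best loss of 1.3 and never reaches the better 1.2 at epoch 7. That is the trade-off of a small patience value: you save time, but you may cut off a run that was about to improve.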

Switch to Keras: Keras removes most of the session and placeholder boilerplate, and a Keras model drops straight into a skopt objective function. Here's a quick snippet of a Keras-based objective function:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_keras_model(dropout_rate, init_lr):
    model = Sequential([
        Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)),
        MaxPooling2D((2,2)),
        Conv2D(64, (3,3), activation='relu'),
        MaxPooling2D((2,2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(dropout_rate),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=init_lr),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

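@use_named_args(dimensions=dimensions)
def bayes_opt_keras(cnn_num_steps, cnn_init_epoch, cnn_max_epoch, cnn_learning_rate_decay,
                    cnn_batch_size, cnn_dropout_rate, cnn_init_learning_rate):
    # cnn_num_steps, cnn_init_epoch and cnn_learning_rate_decay are accepted but unused here
    tf.keras.backend.clear_session()  # Free the previous trial's model before building a new one
    model = build_keras_model(cnn_dropout_rate, cnn_init_learning_rate)
    history = model.fit(
        train_X, train_y,
        batch_size=int(cnn_batch_size),
        epochs=int(cnn_max_epoch),
        validation_data=(val_X, val_y),
        callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)],
        verbose=1
    )
    return min(history.history['val_loss'])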

Try Optuna as an Alternative: Optuna has built-in trial pruning (stopping unpromising runs early) and requires less boilerplate. A quick example, without pruning enabled:

import optuna

def objective(trial):
    # Suggest only the hyperparameters this Keras model actually uses
    max_epoch = trial.suggest_int('max_epoch', 20, 50)
    batch_size = trial.suggest_int('batch_size', 32, 128)
    dropout = trial.suggest_float('dropout', 0.2, 0.5)
    init_lr = trial.suggest_float('init_lr', 1e-4, 1e-2, log=True)

    # Build and train model
    model = build_keras_model(dropout, init_lr)
    history = model.fit(
        train_X, train_y,
        batch_size=batch_size,
        epochs=max_epoch,
        validation_data=(val_X, val_y),
        callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)],
        verbose=0
    )
    return min(history.history['val_loss'])

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)
print("Best params:", study.best_params)
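
The Optuna example above runs every trial to completion. To show what pruning means in practice, here is a simplified plain-Python sketch of a median rule: a trial is stopped when its validation loss at some epoch is worse than the median loss earlier trials had at that same epoch. This is an illustration only, not Optuna's implementation; the function should_prune, the min_trials parameter, and all loss values are hypothetical. In Optuna itself, the equivalent mechanism is to call trial.report(value, step) each epoch, check trial.should_prune(), and raise optuna.TrialPruned, with a pruner such as optuna.pruners.MedianPruner passed to create_study.

```python
from statistics import median


def should_prune(current_value, step, history, min_trials=2):
    """Simplified median rule: prune if current_value at `step` is worse (higher)
    than the median of earlier trials' values at the same step."""
    peers = [values[step] for values in history if len(values) > step]
    if len(peers) < min_trials:
        return False  # Too few finished trials to compare against
    return current_value > median(peers)


# Made-up per-epoch validation losses of three earlier trials
history = [
    [1.8, 1.4, 1.1],
    [1.9, 1.6, 1.3],
    [2.0, 1.5, 1.2],
]

# A new trial that starts fine but falls behind at the second epoch
new_trial_losses = [1.85, 1.7, 1.0]
pruned_at = None
for step, loss in enumerate(new_trial_losses):
    if should_prune(loss, step, history):
        pruned_at = step
        break

print(f"pruned at step index {pruned_at}")
```

The new trial survives step 0 (1.85 is below the median 1.9) and is pruned at step 1 (1.7 is above the median 1.5), so the remaining epochs are never trained.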

Limit Search Space: Keep hyperparameter ranges narrow and restrict parameters to sensible values (e.g., batch sizes of 32, 64, or 128) so the optimizer needs fewer trials to cover the space.
Enable GPU Acceleration: Ensure Colab uses a GPU (Runtime > Change runtime type > GPU) to speed up training drastically.
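
On the Limit Search Space tip: one way to restrict batch size to powers of two is to search over the exponent instead of the raw integer. The sketch below uses only the standard library; sample_batch_size and its parameters are hypothetical names. With the real libraries, a categorical dimension should do the same job, for example skopt's Categorical([32, 64, 128]) or Optuna's trial.suggest_categorical('batch_size', [32, 64, 128]).

```python
import random


def sample_batch_size(rng, min_exp=5, max_exp=7):
    """Draw a batch size from {2**min_exp, ..., 2**max_exp} (32, 64, 128 by default)."""
    return 2 ** rng.randint(min_exp, max_exp)


rng = random.Random(42)
samples = [sample_batch_size(rng) for _ in range(10)]
print("sampled batch sizes:", samples)

# Compare how many distinct values each formulation lets the optimizer try
wide_candidates = len(range(32, 129))                    # every integer in [32, 128]
narrow_candidates = len({2 ** e for e in range(5, 8)})   # powers of two only
print(f"integer range: {wide_candidates} candidates, powers of two: {narrow_candidates}")
```

Cutting the batch-size dimension from 97 candidates to 3 leaves more of a 20-trial budget for the parameters that tend to matter more, such as the learning rate.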