Help: How do I tune a deep CNN's hyperparameters with Bayesian optimization (skopt/Optuna)?
Fixing Your scikit-optimize Bayesian Optimization for CNN Hyperparameter Tuning
Let's break down the issues in your current code, walk through a working implementation, and finish with tips for running it efficiently on Colab.
Key Issues in Your Current Code
First, let's list the critical problems preventing your code from running properly:
- No validation set: Without one, you can't compute a validation error to return to the Bayesian optimizer, and that number is the only signal it uses to guide the search.
- Redundant data loading: You load CIFAR-10 every time the objective function runs, which wastes time and memory.
- TensorFlow graph/session leaks: Building a new graph and opening a new session on every iteration, without resetting the default graph or closing the session, makes memory pile up on Colab.
- Learning rate decay bug: You reassign `model_learning_rate` after defining it as a placeholder, which breaks the link between that Python name and the placeholder in the computation graph.
- Missing return value: Your function doesn't return the validation error, so the optimizer gets no feedback.
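For the learning-rate bug, one common fix is to leave the placeholder alone and feed it a plain Python float each epoch. The sketch below computes that float under an assumed schedule: hold `init_lr` for `init_epoch` epochs, then multiply by `lr_decay` once per further epoch. That schedule is an assumption and may not match what your original code intended. `lr_for_epoch` is a hypothetical helper, not part of TensorFlow.

```python
def lr_for_epoch(init_lr, lr_decay, init_epoch, epoch):
    """Learning rate for a 0-based epoch index (hypothetical helper).

    Keeps init_lr for the first init_epoch epochs, then multiplies it by
    lr_decay once for every additional epoch.
    """
    decay_power = max(epoch + 1 - init_epoch, 0)
    return init_lr * (lr_decay ** decay_power)


# Example: 5 epochs, constant for the first 2, then halved each epoch
schedule = [lr_for_epoch(0.01, 0.5, 2, e) for e in range(5)]
print(schedule)  # [0.01, 0.01, 0.005, 0.0025, 0.00125]
```

Inside the epoch loop, you would pass this value for the learning-rate placeholder in `feed_dict` instead of rebinding the Python name.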
Working Implementation with scikit-optimize
Let's rewrite this step by step to fix these issues. We'll use a cleaner TensorFlow setup (or you could switch to Keras for even less boilerplate, but we'll stick with your TF approach first).
Step 1: Preprocess Data Once (Outside the Objective Function)
```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style API: placeholders, sessions, tf.layers
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args

# The graph-mode code in this answer needs eager execution turned off on TF 2.x
tf.disable_v2_behavior()

# Set random seeds for reproducibility
randomState = 42
np.random.seed(randomState)
tf.set_random_seed(randomState)

# Load and preprocess data ONCE, outside the objective function
(train_X, train_y), (test_X, test_y) = cifar10.load_data()
train_X = train_X.astype('float32') / 255.0
test_X = test_X.astype('float32') / 255.0

# Split the training set into train + validation (80/20 split)
val_split = 0.2
val_size = int(len(train_X) * val_split)
train_X, val_X = train_X[:-val_size], train_X[-val_size:]
train_y, val_y = train_y[:-val_size], train_y[-val_size:]

# One-hot encode labels for the 10-class cross-entropy loss
train_y = to_categorical(train_y, 10)
val_y = to_categorical(val_y, 10)

input_size = 10  # Number of output classes (width of the output layer)
```
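The split above takes the last 20% of the training set in file order. CIFAR-10's training batches are stored in random order, so that works here. Shuffling the indices once before slicing is still a safer habit if you reuse this code on a dataset sorted by class. Below is a NumPy sketch; `shuffled_train_val_split` is a hypothetical helper.

```python
import numpy as np


def shuffled_train_val_split(X, y, val_fraction=0.2, seed=42):
    """Shuffle indices once, then carve off the last val_fraction as validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    val_size = max(1, int(len(X) * val_fraction))  # keep at least one validation sample
    train_idx, val_idx = perm[:-val_size], perm[-val_size:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]


# Toy check with 10 fake "images" whose single pixel equals their label
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
X_tr, X_val, y_tr, y_val = shuffled_train_val_split(X, y, 0.2)
print(X_tr.shape, X_val.shape)  # (8, 1) (2, 1)
```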
Step 2: Define Your CNN Model
Replace this with your actual model architecture, but here's a working example:
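```python
def cnn(inputs, dropout_rate):
    # tf.layers is the legacy TF1 layers API; recent TF 2.x releases may not ship it,
    # in which case the Keras variant further down is the easier route
    x = tf.layers.conv2d(inputs, 32, (3, 3), activation=tf.nn.relu, padding='same')
    x = tf.layers.max_pooling2d(x, pool_size=(2, 2), strides=(2, 2))  # strides is required
    x = tf.layers.conv2d(x, 64, (3, 3), activation=tf.nn.relu, padding='same')
    x = tf.layers.max_pooling2d(x, pool_size=(2, 2), strides=(2, 2))
    x = tf.layers.conv2d(x, 128, (3, 3), activation=tf.nn.relu, padding='same')
    x = tf.layers.max_pooling2d(x, pool_size=(2, 2), strides=(2, 2))
    x = tf.layers.flatten(x)
    x = tf.layers.dense(x, 256, activation=tf.nn.relu)
    # tf.layers.dropout does nothing unless training=True is passed; tf.nn.dropout
    # always applies the fed rate, so feeding 0.0 at validation time turns it off
    x = tf.nn.dropout(x, rate=dropout_rate)
    # Output raw logits: softmax_cross_entropy_with_logits_v2 applies softmax itself
    logits = tf.layers.dense(x, input_size)
    return logits
```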
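`tf.nn.softmax_cross_entropy_with_logits_v2` applies softmax internally, so the network's last layer should output raw logits with no activation. If the model already ends in a softmax, the loss applies softmax a second time. The loss then can never get close to zero, even for a confident, correct prediction. Below is a small NumPy check of that effect; the helper functions are illustrative re-implementations, not TensorFlow code.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy_from_logits(logits, one_hot):
    # Mathematically what the TF loss computes: softmax, then -sum(y * log p), averaged
    return float(-(one_hot * np.log(softmax(logits))).sum(axis=-1).mean())


logits = np.array([[8.0, 0.0, 0.0]])  # a confident, correct prediction
target = np.array([[1.0, 0.0, 0.0]])

correct = cross_entropy_from_logits(logits, target)          # logits in: roughly 0.0007
double = cross_entropy_from_logits(softmax(logits), target)  # probabilities in: roughly 0.55
print(correct, double)
```

With 10 classes the floor is higher still. The inputs to the second softmax lie in [0, 1], so the true class can get at most e/(e+9) ≈ 0.23, a loss of about 1.46.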
Step 3: Fix the Objective Function
```python
from skopt.space import Categorical  # discrete choices for the batch size

# Define the search space for hyperparameters
dimensions = [
    Integer(100, 500, name='cnn_num_steps'),             # Training steps per epoch
    Integer(5, 20, name='cnn_init_epoch'),               # Epochs per LR decay step
    Integer(20, 50, name='cnn_max_epoch'),               # Maximum training epochs
    Real(0.8, 0.99, name='cnn_learning_rate_decay'),     # LR decay rate
    Categorical([32, 64, 128], name='cnn_batch_size'),   # Batch size, restricted to powers of 2
    Real(0.2, 0.5, name='cnn_dropout_rate'),             # Dropout rate
    Real(1e-4, 1e-2, prior='log-uniform', name='cnn_init_learning_rate'),  # Initial LR on a log scale
]

# Track iteration count for logging
iteration = 0

@use_named_args(dimensions=dimensions)
def bayes_opt(cnn_num_steps, cnn_init_epoch, cnn_max_epoch, cnn_learning_rate_decay,
              cnn_batch_size, cnn_dropout_rate, cnn_init_learning_rate):
    global iteration
    iteration += 1
    print(f"=== Starting Iteration {iteration} ===")

    # Start each trial from an empty graph so old ops and variables are released
    tf.reset_default_graph()
    sess = tf.Session()

    # Define placeholders
    inputs = tf.placeholder(tf.float32, [None, 32, 32, 3], name="inputs")
    targets = tf.placeholder(tf.float32, [None, input_size], name="targets")
    model_dropout_rate = tf.placeholder(tf.float32, name="dropout_rate")
    init_lr = tf.placeholder(tf.float32, name="init_lr")

    # Learning-rate schedule built in the graph; nothing reassigns the placeholder
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(
        learning_rate=init_lr,
        global_step=global_step,
        decay_steps=cnn_init_epoch * cnn_num_steps,  # decay_steps counts training steps, not epochs
        decay_rate=cnn_learning_rate_decay,
        staircase=True,  # multiply by decay_rate once every cnn_init_epoch epochs
    )

    # Build model (raw logits), loss, and accuracy metrics
    prediction = cnn(inputs, model_dropout_rate)
    with tf.name_scope('loss'):
        # Cross-entropy for classification (MSE was likely a mistake for CIFAR-10)
        model_loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits_v2(logits=prediction, labels=targets))
    with tf.name_scope('accuracy'):
        correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(targets, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Optimizer (increments global_step on every training step)
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(model_loss, global_step=global_step)

    # Initialize variables
    sess.run(tf.global_variables_initializer())

    # Training loop with validation and early stopping
    best_val_loss = float('inf')
    patience = 3
    no_improve_count = 0
    for epoch in range(cnn_max_epoch):
        for step in range(cnn_num_steps):
            # Random mini-batch (replace with your generate_batches if needed)
            idx = np.random.choice(len(train_X), cnn_batch_size, replace=False)
            sess.run(train_step, feed_dict={
                inputs: train_X[idx],
                targets: train_y[idx],
                model_dropout_rate: cnn_dropout_rate,
                init_lr: cnn_init_learning_rate,
            })

        # Validate after each epoch with dropout disabled (rate 0.0)
        val_loss, val_acc = sess.run([model_loss, accuracy], feed_dict={
            inputs: val_X, targets: val_y, model_dropout_rate: 0.0})
        print(f"Iteration {iteration}, Epoch {epoch+1}: Val Loss = {val_loss:.4f}, Val Acc = {val_acc:.4f}")

        # Early stopping to save time
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            no_improve_count = 0
        else:
            no_improve_count += 1
            if no_improve_count >= patience:
                print(f"Early stopping: no validation improvement for {patience} epochs")
                break

    # Close the session to free memory
    sess.close()

    # Return the best validation loss as a plain float (gp_minimize MINIMIZES it)
    return float(best_val_loss)
```
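`prior='log-uniform'` on the initial learning rate means the exponent is sampled uniformly. Each decade (1e-4 to 1e-3, 1e-3 to 1e-2) is therefore equally likely, whereas a plain uniform prior would put about 91% of its samples above 1e-3. Below is a NumPy illustration of the difference; it mimics the idea and is not skopt's internal sampler.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high, n = 1e-4, 1e-2, 100_000

# Log-uniform: draw the exponent uniformly, then exponentiate
log_uniform = 10 ** rng.uniform(np.log10(low), np.log10(high), n)
# Plain uniform over the same interval, for comparison
uniform = rng.uniform(low, high, n)

# Share of samples that land in the lower decade [1e-4, 1e-3)
lu_frac = (log_uniform < 1e-3).mean()
u_frac = (uniform < 1e-3).mean()
print(f"log-uniform: {lu_frac:.3f}, uniform: {u_frac:.3f}")  # roughly 0.5 vs 0.09
```

Optuna's `suggest_float(..., log=True)` expresses the same idea.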
Step 4: Run the Bayesian Optimization
```python
# Start the optimization (n_calls = number of hyperparameter combinations to test)
result = gp_minimize(
    bayes_opt,
    dimensions=dimensions,
    n_calls=20,
    random_state=randomState,
    verbose=1,
)

# Print the best results (result.x follows the order of `dimensions`)
print("\n=== Best Hyperparameters Found ===")
print(f"Steps per epoch: {result.x[0]}")
print(f"Initial decay epoch: {result.x[1]}")
print(f"Max epochs: {result.x[2]}")
print(f"LR decay rate: {result.x[3]:.4f}")
print(f"Batch size: {result.x[4]}")
print(f"Dropout rate: {result.x[5]:.4f}")
print(f"Initial LR: {result.x[6]:.6f}")
print(f"Best validation loss: {result.fun:.4f}")
```
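`result.x` only holds the single best point. skopt's result object also keeps every evaluated point in `result.x_iters` and the matching losses in `result.func_vals`. That is useful when several configurations score almost the same. Below is a sketch of ranking them; `top_k_trials` is a hypothetical helper, and the numbers are made-up stand-ins, not real results.

```python
def top_k_trials(x_iters, func_vals, names, k=3):
    """Return the k lowest-loss trials as dicts of {name: value}, best first.

    x_iters/func_vals mirror skopt's result.x_iters and result.func_vals;
    names must follow the same order as the search-space dimensions.
    """
    ranked = sorted(zip(func_vals, x_iters), key=lambda pair: pair[0])
    return [
        {"val_loss": float(loss), **dict(zip(names, params))}
        for loss, params in ranked[:k]
    ]


# Stand-in values; with a real run pass result.x_iters, result.func_vals
# and [d.name for d in dimensions]
names = ["cnn_batch_size", "cnn_init_learning_rate"]
x_iters = [[64, 3e-3], [32, 1e-4], [128, 8e-4]]
func_vals = [0.91, 1.40, 0.87]
for row in top_k_trials(x_iters, func_vals, names, k=2):
    print(row)
```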
Efficient Implementation Tips for Colab
- Use Early Stopping: As shown above, this cuts down unnecessary training time when the model stops improving.
- Switch to Keras: Keras removes the placeholder, session, and manual training-loop boilerplate, and a `model.fit` call drops straight into an skopt objective. Here's a quick snippet of a Keras-based objective function:
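```python
# Keras variant: intended for TF 2.x with its default eager execution
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_keras_model(dropout_rate, init_lr):
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(dropout_rate),  # Keras applies dropout only during training
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=init_lr),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

@use_named_args(dimensions=dimensions)
def bayes_opt_keras(**params):
    # use_named_args passes every dimension as a keyword argument. This variant
    # ignores cnn_num_steps, cnn_init_epoch and cnn_learning_rate_decay, so you
    # can drop them from the search space when using it.
    keras.backend.clear_session()  # release the previous trial's model
    model = build_keras_model(params['cnn_dropout_rate'], params['cnn_init_learning_rate'])
    history = model.fit(
        train_X, train_y,
        batch_size=int(params['cnn_batch_size']),
        epochs=int(params['cnn_max_epoch']),
        validation_data=(val_X, val_y),
        callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)],
        verbose=1,
    )
    return float(min(history.history['val_loss']))
```
- Try Optuna as an Alternative: Optuna has built-in trial pruning (stopping unpromising trials early) and needs less boilerplate, because hyperparameters are suggested inside the objective instead of being declared up front. A quick example: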
```python
import optuna
from tensorflow import keras

def objective(trial):
    # Suggest only the hyperparameters this Keras model actually uses
    max_epoch = trial.suggest_int('max_epoch', 20, 50)
    batch_size = trial.suggest_categorical('batch_size', [32, 64, 128])
    dropout = trial.suggest_float('dropout', 0.2, 0.5)
    init_lr = trial.suggest_float('init_lr', 1e-4, 1e-2, log=True)

    # Build and train the model
    keras.backend.clear_session()  # release the previous trial's model
    model = build_keras_model(dropout, init_lr)
    history = model.fit(
        train_X, train_y,
        batch_size=batch_size,
        epochs=max_epoch,
        validation_data=(val_X, val_y),
        callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)],
        verbose=0,
    )
    return min(history.history['val_loss'])

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)
print("Best params:", study.best_params)
```
- Limit Search Space: Don't make hyperparameter ranges too wide; stick to reasonable values (e.g., batch sizes of 32, 64, 128) so fewer trials are needed.
- Enable GPU Acceleration: Ensure Colab uses a GPU (Runtime > Change runtime type > GPU) to speed up training drastically.
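Optuna's pruning compares a running trial's intermediate scores with those of earlier trials. The sketch below is a simplified, pure-Python version of the median-stopping idea behind `optuna.pruners.MedianPruner`, written for illustration only. It is not Optuna's code, and `median_should_prune` is a hypothetical name.

In real Optuna code, you would do the following:

- Create the study with `optuna.create_study(direction='minimize', pruner=optuna.pruners.MedianPruner())`.
- After each epoch, call `trial.report(val_loss, epoch)`.
- Check `trial.should_prune()` and raise `optuna.TrialPruned()` when it returns True.

```python
from statistics import median


def median_should_prune(completed_curves, current_curve, step, n_startup_trials=5):
    """Simplified median-stopping rule for a minimized metric (illustrative only).

    completed_curves: one dict {step: val_loss} per finished trial.
    current_curve: {step: val_loss} reported so far by the running trial.
    Prune when the running trial's best loss so far is worse than the median
    of the finished trials' losses at the same step.
    """
    finished_at_step = [curve[step] for curve in completed_curves if step in curve]
    if len(finished_at_step) < n_startup_trials:
        return False  # not enough finished trials to compare against yet
    best_so_far = min(v for s, v in current_curve.items() if s <= step)
    return best_so_far > median(finished_at_step)


# Three finished trials with validation losses at epochs 0 and 1 (made-up numbers)
finished = [{0: 1.8, 1: 1.4}, {0: 1.7, 1: 1.3}, {0: 1.9, 1: 1.5}]
print(median_should_prune(finished, {0: 2.0, 1: 1.6}, step=1, n_startup_trials=3))  # True
print(median_should_prune(finished, {0: 1.5, 1: 1.2}, step=1, n_startup_trials=3))  # False
```

The `n_startup_trials` guard matters: with only one or two finished trials, the median is too noisy to justify killing a run.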
The question was originally posted on Stack Exchange by Hamilton.




