How do I use the PTB model from the TensorFlow tutorial after training? (Adapting to version 1.7)
Let's walk through how to use a trained RNN (like the one from TensorFlow's PTB tutorial) for inference in TensorFlow 1.7, where some older approaches no longer work. I'll break this down into actionable steps with code examples written against TF 1.7's API.
Key Principles First
Before diving into code, remember these critical points for TF 1.7:
- Disable training-only operations: When running inference, turn off dropout, batch norm training modes, etc.—these are only for training and will mess up your results.
- Match vocabulary & model structure: Your inference code must use the exact same vocabulary and model architecture as your training code (variable names, layer counts, hidden sizes, etc.).
- Preserve RNN state: For sequence generation, you need to pass the RNN's state between steps so it retains context.
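To make the last point concrete, here is a minimal sketch in plain Python of what carrying state between steps changes. `toy_step` is a hypothetical stand-in for one RNN step, not the PTB model; it only illustrates that the same input yields different outputs once earlier context is kept.

```python
def toy_step(word_id, state):
    """Hypothetical stand-in for one RNN step: returns (output, new_state)."""
    new_state = (state * 31 + word_id) % 1000  # mixes history with the new input
    output = new_state % 10
    return output, new_state


def run(word_ids, carry_state):
    """Feed word ids one at a time, optionally resetting the state each step."""
    state = 0
    outputs = []
    for word_id in word_ids:
        if not carry_state:
            state = 0  # context is thrown away before every step
        output, state = toy_step(word_id, state)
        outputs.append(output)
    return outputs


with_state = run([3, 3, 3], carry_state=True)
without_state = run([3, 3, 3], carry_state=False)
print(with_state, without_state)
```

With the state reset every step, the output depends only on the current word, which is why generation without state passing tends to repeat itself.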
Step 1: Save Your Trained Model
First, make sure you saved your model during training using tf.train.Saver(). Add this at the end of your training loop:
    # In your training script, after the model graph has been built
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # ... (training logic here)
        # Save after training completes
        saver.save(sess, "./trained_ptb_model")  # Saves variables to this path
This creates checkpoint files (.meta, .data, .index) that we'll load later.
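Before moving on, it can help to confirm that those files really exist for the prefix you saved to. The helper below is a small sketch using only the standard library; `checkpoint_parts` is a name made up for this example, and the `.data-` shard naming is assumed to follow the usual `prefix.data-00000-of-00001` pattern.

```python
import glob
import os


def checkpoint_parts(prefix):
    """Report which checkpoint files exist on disk for a given save prefix."""
    base = os.path.basename(prefix)
    names = [os.path.basename(p) for p in glob.glob(prefix + ".*")]
    return {
        "index": base + ".index" in names,
        "meta": base + ".meta" in names,
        # Variable values live in one or more ".data-XXXXX-of-YYYYY" shards
        "data": any(n.startswith(base + ".data-") for n in names),
    }


print(checkpoint_parts("./trained_ptb_model"))
```

Because the inference code in Step 2 rebuilds the graph in Python, restoring should only need the `.index` and `.data` files; the `.meta` file matters when the graph itself is imported from disk.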
Step 2: Inference Code for TF 1.7
Below is a complete example for loading the model and generating text (adapted for the PTB tutorial's RNN):
1. Reuse the Model Class (with Inference Mode)
First, define the same model class but set is_training=False to disable training-specific layers:
    import tensorflow as tf
    import numpy as np
    # reader.py ships with the PTB tutorial in the tensorflow/models repo
    # (tutorials/rnn/ptb); it is not importable as tensorflow.models.rnn.ptb
    # in TF 1.7, so keep the file next to this script.
    import reader

    # Load the same vocabulary used during training. Assuming the tutorial's
    # reader.py, _build_vocab builds the word -> id mapping from the training
    # text, so point it at the file you trained on.
    vocab = reader._build_vocab("./path/to/your/vocab/file")
    reverse_vocab = {v: k for k, v in vocab.items()}  # Map IDs back to words

    class PTBModel(object):
        def __init__(self, is_training, config):
            self.batch_size = config.batch_size
            self.num_steps = config.num_steps
            size = config.hidden_size
            vocab_size = config.vocab_size

            # Input placeholder (adjust batch/step size for inference)
            self.input_data = tf.placeholder(tf.int32, [config.batch_size, config.num_steps])
            # State for all layers: [num_layers, 2 (c and h), batch_size, size]
            self.initial_state = tf.placeholder(
                tf.float32, [config.num_layers, 2, config.batch_size, size])

            # Unpack state for multi-layer LSTM
            state_tuple = tuple(
                tf.contrib.rnn.LSTMStateTuple(self.initial_state[i][0], self.initial_state[i][1])
                for i in range(config.num_layers)
            )

            # Build embedding layer (CPU is faster for this)
            with tf.device("/cpu:0"):
                embedding = tf.get_variable("embedding", [vocab_size, size])
                inputs = tf.nn.embedding_lookup(embedding, self.input_data)

            # Build one LSTM cell per layer (no dropout for inference).
            # Reusing a single cell object as [lstm_cell] * num_layers either
            # raises an error or shares weights between layers in TF 1.x.
            def make_cell():
                cell = tf.contrib.rnn.BasicLSTMCell(size, forget_bias=0.0, state_is_tuple=True)
                if is_training and config.keep_prob < 1:
                    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=config.keep_prob)
                return cell

            cell = tf.contrib.rnn.MultiRNNCell(
                [make_cell() for _ in range(config.num_layers)], state_is_tuple=True)

            # Run RNN steps
            outputs, self.final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=state_tuple)
            output = tf.reshape(outputs, [-1, size])

            # Softmax layer to get word probabilities
            softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
            softmax_b = tf.get_variable("softmax_b", [vocab_size])
            self.logits = tf.matmul(output, softmax_w) + softmax_b
            self.probs = tf.nn.softmax(self.logits)
2. Define Inference Config
Adjust the config to fit inference (smaller batch/step size, no dropout):
    class InferenceConfig(object):
        init_scale = 0.1
        num_layers = 2
        num_steps = 1        # Process one word at a time
        hidden_size = 200
        keep_prob = 1.0      # Disable dropout for inference
        batch_size = 1       # Single sample for generation
        vocab_size = 10000   # Match your training vocab size

    config = InferenceConfig()
    model = PTBModel(is_training=False, config=config)
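One way to keep the architecture fields from drifting apart is to derive the inference config from the training config and override only the batch, step, and dropout settings. The sketch below is plain Python; `TrainConfig`, `DerivedInferenceConfig`, and the values in them are hypothetical placeholders, so substitute whatever you actually trained with.

```python
class TrainConfig(object):
    """Hypothetical training config; use the values you actually trained with."""
    num_layers = 2
    num_steps = 20
    hidden_size = 200
    keep_prob = 0.5
    batch_size = 20
    vocab_size = 10000


class DerivedInferenceConfig(TrainConfig):
    """Inherits the architecture, overrides only the inference-specific fields."""
    num_steps = 1     # one word at a time
    batch_size = 1    # single sequence
    keep_prob = 1.0   # no dropout


# Fields that determine variable shapes and therefore must match the checkpoint
ARCHITECTURE_FIELDS = ("num_layers", "hidden_size", "vocab_size")


def architecture_matches(a, b):
    """True if both configs agree on every shape-defining field."""
    return all(getattr(a, f) == getattr(b, f) for f in ARCHITECTURE_FIELDS)


print(architecture_matches(TrainConfig, DerivedInferenceConfig))
```

Changing `num_steps` and `batch_size` is safe because they only affect placeholder shapes, not the shapes of the saved variables.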
3. Load Model & Generate Text
Finally, load the saved model and generate text by feeding one word at a time and passing the RNN state between steps:
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Restore the trained variables
        saver.restore(sess, "./trained_ptb_model")

        # Initialize RNN state
        state = np.zeros((config.num_layers, 2, config.batch_size, config.hidden_size))

        # Start with a seed word
        seed_word = "the"
        current_word_id = vocab[seed_word]
        generated_text = [seed_word]

        # Generate 100 words
        for _ in range(100):
            feed_dict = {
                model.input_data: [[current_word_id]],
                model.initial_state: state
            }
            # Get next word probabilities and updated state
            probs, state = sess.run([model.probs, model.final_state], feed_dict=feed_dict)
            # final_state comes back as nested LSTMStateTuples; repack it
            # into the [num_layers, 2, batch_size, size] placeholder shape
            state = np.array(state)

            # Option 1: Pick most likely word (argmax)
            next_word_id = int(np.argmax(probs[0]))
            # Option 2: Random sampling for more diverse text
            # next_word_id = np.random.choice(len(vocab), p=probs[0])

            next_word = reverse_vocab[next_word_id]
            generated_text.append(next_word)
            current_word_id = next_word_id

    # Print the result
    print("Generated Text:\n", " ".join(generated_text))
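Pure argmax tends to loop on a few frequent words, while plain sampling can wander. A common middle ground, not part of the original tutorial, is temperature sampling. The sketch below uses only the standard library; `sample_word_id` is a name invented for this example, and it expects a one-dimensional sequence of probabilities such as `probs[0]`.

```python
import random


def sample_word_id(probs, temperature=1.0, rng=random):
    """Sample an index from `probs`, sharpened or flattened by `temperature`.

    temperature < 1 favours the most likely words, temperature > 1 makes the
    choice more uniform, and temperature == 1 samples from `probs` as given.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # p ** (1 / T) is equivalent to softmax(log(p) / T) after normalisation
    weights = [float(p) ** (1.0 / temperature) for p in probs]
    if sum(weights) <= 0:
        raise ValueError("probabilities must contain a positive entry")
    # random.choices accepts unnormalised weights, so small float32 rounding
    # errors in the softmax output are not a problem here
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


next_id = sample_word_id([0.1, 0.7, 0.2], temperature=0.8, rng=random.Random(0))
print(next_id)
```

In the generation loop this would replace the argmax line, for example `next_word_id = sample_word_id(probs[0], temperature=0.8)`.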
Common Pitfalls to Avoid
- Variable name mismatches: Ensure your inference model uses the exact same variable names as your training model (don't rename layers or scopes).
- Forgetting state: If you don't pass the RNN state between steps, your model will lose context and generate nonsensical text.
- Training mode enabled: Always set is_training=False for inference. Dropout in particular will degrade results if left on.
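For the first pitfall, a quick way to spot mismatches is to compare the variable names stored in the checkpoint with the names in the inference graph before calling restore. The helper below is a plain-Python sketch; `compare_variable_names` is a made-up name. As far as I know, in TF 1.x the two name lists can be obtained with `tf.train.list_variables(path)` (which yields name and shape pairs) and `[v.op.name for v in tf.global_variables()]`, but treat that as an assumption and check it against your installed version.

```python
def compare_variable_names(checkpoint_names, graph_names):
    """Return the names that appear on only one side."""
    ckpt = set(checkpoint_names)
    graph = set(graph_names)
    return {
        # Variables the graph expects but the checkpoint does not contain
        "missing_in_checkpoint": sorted(graph - ckpt),
        # Variables saved in the checkpoint that the graph never asks for
        "unused_in_checkpoint": sorted(ckpt - graph),
    }


# Hypothetical names: training wrapped the model in a "Model" scope,
# while the inference graph was built without it.
report = compare_variable_names(
    ["Model/embedding", "Model/softmax_w", "Model/softmax_b"],
    ["embedding", "softmax_w", "softmax_b"],
)
print(report)
```

If every name differs only by a leading scope, as in this made-up case, building the inference model inside the same `tf.variable_scope` used during training is usually the simplest fix.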
This question originally comes from Stack Exchange; asked by Verych.




