How do you use the PTB model from the TensorFlow tutorial after training? Adapting it to version 1.7

阿华AIGC实验室

2026-5-25

How to Use a Trained RNN Model in TensorFlow 1.7

Let's walk through how to use a trained RNN (like the one from TensorFlow's PTB tutorial) in TensorFlow 1.7, where some approaches written for older releases no longer work. I'll break this down into actionable steps with code examples written against TF 1.7's API.

Key Principles First

Before diving into code, remember these critical points for TF 1.7:

Disable training-only operations: When running inference, turn off dropout, batch norm training modes, etc. These exist only for training; left on, they add noise to the predictions.
Match vocabulary & model structure: Your inference code must use the exact same vocabulary and model architecture as your training code (variable names, layer counts, hidden sizes, etc.).
Preserve RNN state: For sequence generation, you need to pass the RNN's state between steps so it retains context.
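
To see why the last point matters, here is a minimal NumPy sketch of a toy tanh RNN (not the PTB model; the sizes and weights are made up for illustration). It compares carrying the hidden state across steps with resetting it before every step:

```python
import numpy as np

def rnn_step(x, h, w_x, w_h):
    """One step of a toy tanh RNN: returns the new hidden state."""
    return np.tanh(x @ w_x + h @ w_h)

rng = np.random.RandomState(0)
w_x = rng.randn(3, 4)     # input-to-hidden weights (toy sizes)
w_h = rng.randn(4, 4)     # hidden-to-hidden weights
inputs = rng.randn(5, 3)  # a sequence of 5 input vectors

def run(inputs, carry_state):
    """Feed the sequence one step at a time, optionally forgetting the state."""
    h = np.zeros(4)
    for x in inputs:
        if not carry_state:
            h = np.zeros(4)  # throw away everything seen before this step
        h = rnn_step(x, h, w_x, w_h)
    return h

carried = run(inputs, carry_state=True)   # depends on the whole sequence
reset = run(inputs, carry_state=False)    # depends only on the last input
```

With the state reset, the final hidden state is a function of the last input alone, which is what happens in generation code that forgets to feed the previous state back in.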

Step 1: Save Your Trained Model

First, make sure you saved your model during training using tf.train.Saver(). Add this at the end of your training loop:

# Create the Saver once the model graph is built, then save inside the training session
saver = tf.train.Saver()
with tf.Session() as sess:
    # ... (training logic here)
    # Save after training completes
    saver.save(sess, "./trained_ptb_model")  # Path prefix; TF appends the file extensions

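This creates checkpoint files that share the given prefix (trained_ptb_model.meta, trained_ptb_model.index, and trained_ptb_model.data-00000-of-00001), plus a small file named checkpoint that records the latest save. We'll load them later by passing the same prefix to saver.restore().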
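
Because saver.restore() takes the prefix rather than a single file name, it can help to check that the files are where you expect before building the inference graph. A minimal standard-library sketch; the helper names checkpoint_files and looks_restorable are hypothetical, not part of TensorFlow:

```python
import glob
import os

def checkpoint_files(prefix):
    """Return the files on disk that share the checkpoint prefix."""
    return sorted(glob.glob(glob.escape(prefix) + ".*"))

def looks_restorable(prefix):
    """True if an .index file and at least one .data shard exist for the prefix."""
    has_index = os.path.exists(prefix + ".index")
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_data
```

For example, looks_restorable("./trained_ptb_model") should be true after Step 1 has run. This only checks that the files exist; it says nothing about whether the variable names inside match your inference graph.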

Step 2: Inference Code for TF 1.7

Below is a complete example for loading the model and generating text (adapted for the PTB tutorial's RNN):

1. Reuse the Model Class (with Inference Mode)

First, define the same model class but set is_training=False to disable training-specific layers:

import tensorflow as tf
import numpy as np
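# reader.py ships with the PTB tutorial in the tensorflow/models repository
# (tutorials/rnn/ptb); the old tensorflow.models.rnn.ptb package is not part of TF 1.7.
# This assumes reader.py has been copied next to this script.
import reader

# Rebuild the same vocabulary used during training. reader.py has no public
# vocabulary loader, so this calls its _build_vocab helper on the training text.
vocab = reader._build_vocab("./path/to/your/ptb.train.txt")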
reverse_vocab = {v: k for k, v in vocab.items()}  # Map IDs back to words

class PTBModel(object):
    def __init__(self, is_training, config):
        self.batch_size = config.batch_size
        self.num_steps = config.num_steps
        size = config.hidden_size
        vocab_size = config.vocab_size

        # Input placeholder (adjust batch/step size for inference)
        self.input_data = tf.placeholder(tf.int32, [config.batch_size, config.num_steps])
        self.initial_state = tf.placeholder(tf.float32, [config.num_layers, 2, config.batch_size, size])
        
        # Unpack state for multi-layer LSTM
        state_tuple = tuple(
            tf.contrib.rnn.LSTMStateTuple(self.initial_state[i][0], self.initial_state[i][1])
            for i in range(config.num_layers)
        )

        # Build embedding layer (CPU is faster for this)
        with tf.device("/cpu:0"):
            embedding = tf.get_variable("embedding", [vocab_size, size])
            inputs = tf.nn.embedding_lookup(embedding, self.input_data)

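        # Build one LSTM cell per layer (no dropout for inference).
        # [lstm_cell] * num_layers would reuse a single cell object for every layer,
        # which TF 1.x either rejects or turns into shared weights.
        def make_cell():
            cell = tf.contrib.rnn.BasicLSTMCell(size, forget_bias=0.0, state_is_tuple=True)
            if is_training and config.keep_prob < 1:
                cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=config.keep_prob)
            return cell
        cell = tf.contrib.rnn.MultiRNNCell(
            [make_cell() for _ in range(config.num_layers)], state_is_tuple=True)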

        # Run RNN steps
        outputs, self.final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=state_tuple)
        output = tf.reshape(outputs, [-1, size])

        # Softmax layer to get word probabilities
        softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
        softmax_b = tf.get_variable("softmax_b", [vocab_size])
        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)
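
The vocabulary loaded at the top of this listing has to assign the same ID to each word as it did during training, otherwise the embedding rows no longer line up with the words. As an illustration, here is a self-contained sketch of a frequency-sorted vocabulary in the style of the tutorial's reader (most frequent word first, ties broken alphabetically). The functions build_vocab and encode and the sample text are assumptions made for this example, not the tutorial's own code:

```python
import collections

def build_vocab(text):
    """Map each word to an ID, most frequent first, ties broken alphabetically."""
    words = text.replace("\n", " <eos> ").split()
    counter = collections.Counter(words)
    ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: idx for idx, (word, _) in enumerate(ordered)}

def encode(words, vocab, unk="<unk>"):
    """Convert words to IDs, falling back to the unknown-word token."""
    return [vocab.get(w, vocab[unk]) for w in words]

sample = "the cat sat\nthe <unk> sat\n"
vocab = build_vocab(sample)
reverse_vocab = {i: w for w, i in vocab.items()}
ids = encode(["the", "dog"], vocab)  # "dog" is not in the sample text
```

Because the IDs depend on word counts, building the vocabulary from a different file than the one used for training will silently shift them. Mapping unseen seed words to <unk> also avoids a KeyError when the seed is not in the vocabulary.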

2. Define Inference Config

Adjust the config to fit inference (smaller batch/step size, no dropout):

class InferenceConfig(object):
    init_scale = 0.1
    num_layers = 2
    num_steps = 1  # Process one word at a time
    hidden_size = 200
    keep_prob = 1.0  # Disable dropout for inference
    batch_size = 1  # Single sample for generation
    vocab_size = 10000  # Match your training vocab size

config = InferenceConfig()
model = PTBModel(is_training=False, config=config)
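
If you still have the config object from training, an alternative is to copy it and override only the fields that are safe to change. batch_size, num_steps, and keep_prob do not affect any variable shapes, whereas hidden_size, num_layers, and vocab_size must stay as trained. A minimal sketch; TrainConfig and its values are placeholders for whatever you actually trained with:

```python
import copy

class TrainConfig(object):
    """Hypothetical training settings; substitute the values you trained with."""
    init_scale = 0.1
    num_layers = 2
    num_steps = 20
    hidden_size = 200
    keep_prob = 0.5
    batch_size = 20
    vocab_size = 10000

def make_inference_config(train_config):
    """Copy a config object, overriding only the inference-specific fields."""
    cfg = copy.copy(train_config)
    cfg.batch_size = 1   # one sequence at a time
    cfg.num_steps = 1    # one word per step
    cfg.keep_prob = 1.0  # no dropout
    return cfg

train_config = TrainConfig()
inference_config = make_inference_config(train_config)
```

Deriving the inference config this way keeps the shape-defining fields in one place, so they cannot drift apart from the training values.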

3. Load Model & Generate Text

Finally, load the saved model and generate text by feeding one word at a time and passing the RNN state between steps:

saver = tf.train.Saver()

with tf.Session() as sess:
    # Restore the trained variables
    saver.restore(sess, "./trained_ptb_model")

    # Initialize RNN state
    state = np.zeros((config.num_layers, 2, config.batch_size, config.hidden_size))

    # Start with a seed word
    seed_word = "the"
    current_word_id = vocab[seed_word]
    generated_text = [seed_word]

    # Generate 100 words
    for _ in range(100):
        feed_dict = {
            model.input_data: [[current_word_id]],
            model.initial_state: state
        }
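        # Get next word probabilities and updated state
        probs, state = sess.run([model.probs, model.final_state], feed_dict=feed_dict)
        # final_state comes back as a tuple of LSTMStateTuples; pack it into the
        # [num_layers, 2, batch_size, hidden_size] layout the placeholder expects
        state = np.array(state)

        # Option 1: Pick most likely word (argmax)
        next_word_id = int(np.argmax(probs[0]))
        # Option 2: Random sampling for more diverse text
        # (float32 probabilities may not sum to exactly 1, so renormalize in float64)
        # p = probs[0].astype(np.float64)
        # next_word_id = int(np.random.choice(len(p), p=p / p.sum()))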
        
        next_word = reverse_vocab[next_word_id]
        generated_text.append(next_word)
        current_word_id = next_word_id

    # Print the result
    print("Generated Text:\n", " ".join(generated_text))
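
For the random-sampling option, one common refinement is a temperature parameter that trades diversity against coherence. The helper below is a sketch, not part of the tutorial; the name sample_word_id and its defaults are assumptions. It also converts to float64 and renormalizes, since np.random.choice can reject float32 probabilities that do not sum to 1 closely enough:

```python
import numpy as np

def sample_word_id(probs, temperature=1.0, rng=np.random):
    """Draw one word ID from a probability vector.

    temperature < 1 sharpens the distribution (closer to argmax);
    temperature > 1 flattens it (more diverse output).
    """
    p = np.asarray(probs, dtype=np.float64)
    logits = np.log(p + 1e-12) / temperature  # small offset avoids log(0)
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    p /= p.sum()                              # renormalize in float64
    return int(rng.choice(len(p), p=p))
```

Inside the generation loop this would replace the argmax line, for example next_word_id = sample_word_id(probs[0], temperature=0.8).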

Common Pitfalls to Avoid

Variable name mismatches: Ensure your inference model uses the exact same variable names as your training model (don't rename layers or scopes).
Forgetting state: If you don't pass the RNN state between steps, your model will lose context and generate nonsensical text.
Training mode enabled: Always set is_training=False for inference—dropout in particular will degrade performance if left on.

The question in this post comes from Stack Exchange; it was asked by Verych.