TensorFlow error when loading a trained model: weights/Variable not found in checkpoint

阿华AIGC实验室

2026-5-15

Fixing TensorFlow's NotFoundError When Restoring a Custom Trained Model

Hey there, let's walk through why you're seeing the error "NotFoundError: Key weights/Variable not found in checkpoint" and how to fix it step by step.

Why This Happens

There are a few common culprits here:

Mismatched Graph Structure: The TensorFlow graph you're using to restore the model doesn't match the one you used to train it. Checkpoints map saved weights to variables by name. This error means the graph you are restoring into contains a variable named weights/Variable, but the checkpoint has no entry under that name. Usually the variable names changed between training and restoring (a different scope, a renamed layer, or the model being built twice in the same graph).
Incorrect Checkpoint Path: When restoring, you used Trained_Data/modelstep.ckpt, but your training code saved to an absolute path (/Users/prakash/Desktop/hackathon/Trained_Data/modelstep.ckpt). If your current working directory isn't /Users/prakash/Desktop/hackathon/, TensorFlow can't find the right checkpoint files. Also note that your training saves checkpoints with step suffixes (like modelstep.ckpt-0, modelstep.ckpt-100), so passing just modelstep.ckpt won't point to a valid checkpoint.
Initializing Variables at the Wrong Time: Running tf.global_variables_initializer() after saver.restore() overwrites the restored weights with fresh initial values. This doesn't raise the NotFoundError, but it silently leaves you with an untrained model. Running the initializer before restoring is harmless but unnecessary, because restore assigns every variable the Saver manages.
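To tell which of these causes you're hitting, it helps to compare the variable names your restore graph expects with the keys actually stored in the checkpoint. The sketch below uses hypothetical helper names (they are not part of TensorFlow). checkpoint_variable_names and graph_variable_names wrap the real tf.train.list_variables and tf.global_variables calls, assuming TF 1.x or tensorflow.compat.v1. diff_variable_names is plain Python, so the comparison logic can be tried without TensorFlow installed.

```python
def checkpoint_variable_names(checkpoint_prefix):
    """Names stored in a checkpoint (needs TensorFlow, imported lazily)."""
    import tensorflow.compat.v1 as tf  # assumes TF 1.14+ or TF 2.x
    return [name for name, _shape in tf.train.list_variables(checkpoint_prefix)]


def graph_variable_names():
    """Names of the global variables in the current default graph."""
    import tensorflow.compat.v1 as tf
    return [v.op.name for v in tf.global_variables()]


def diff_variable_names(graph_names, checkpoint_names):
    """Return (missing, unused), both sorted.

    missing: names the graph wants but the checkpoint lacks; each of these
             would produce "Key ... not found in checkpoint" on restore.
    unused:  checkpoint entries that no graph variable asks for.
    """
    graph_set, ckpt_set = set(graph_names), set(checkpoint_names)
    return sorted(graph_set - ckpt_set), sorted(ckpt_set - graph_set)
```

Call it after rebuilding the graph, e.g. diff_variable_names(graph_variable_names(), checkpoint_variable_names(path)). If a missing name and an unused name differ only by a suffix such as _1 (weights/Variable vs weights_1/Variable), that usually suggests the model-building code ran twice in one graph, which TF 1.x resolves by renaming the second scope.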

How to Fix It

1. Reuse the Exact Same Model Graph First

Before creating a Saver or restoring, you must rebuild the exact same model structure you used during training. That means redefining all layers, variables, placeholders (x, y_, keep_prob), and operations (accuracy, train_step) exactly as they were in your training code.

Once the graph is identical, create the Saver and restore:

```python
import tensorflow as tf  # TF 1.x API; on TF 2.x use: import tensorflow.compat.v1 as tf
# (and call tf.disable_v2_behavior() right after that import)

# First, copy/paste your training model definition code here
# (convolutional layers, fully connected layers, accuracy metric, etc.)

saver = tf.train.Saver()  # Create the Saver AFTER defining the model graph
with tf.Session() as sess:
    # No tf.global_variables_initializer() needed: restore() assigns every saved
    # variable, and running the initializer AFTER restore would wipe those weights.

    # Use the full checkpoint prefix, including the step suffix
    saver.restore(sess, "/Users/prakash/Desktop/hackathon/Trained_Data/modelstep.ckpt-19900")
    print("Model restored successfully!")
```

2. Use the Correct Checkpoint Path

Stick to absolute paths to avoid working directory confusion, or make sure your code runs from /Users/prakash/Desktop/hackathon/ if using relative paths.
Remember: TensorFlow saves each checkpoint as a set of files (e.g., modelstep.ckpt-19900.index, modelstep.ckpt-19900.data-00000-of-00001, modelstep.ckpt-19900.meta). Pass only the shared prefix (modelstep.ckpt-19900); TensorFlow locates the rest.
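Because a step-suffixed checkpoint is only usable through its exact prefix, it can help to list the prefixes that actually exist on disk before calling restore. This is a standard-library-only sketch with hypothetical helper names. It assumes the default V2 checkpoint format, in which every saved step has a <prefix>.index file.

```python
import os
import re

# Matches the index file of a step-suffixed checkpoint, e.g. "modelstep.ckpt-19900.index"
_INDEX_RE = re.compile(r"^(?P<prefix>.+-(?P<step>\d+))\.index$")


def list_checkpoint_prefixes(checkpoint_dir):
    """Return full checkpoint prefixes found in checkpoint_dir, oldest step first."""
    found = []
    for filename in os.listdir(checkpoint_dir):
        match = _INDEX_RE.match(filename)
        if match:
            found.append((int(match.group("step")), match.group("prefix")))
    # Sort by the numeric step so that step 1000 comes after step 200
    return [os.path.join(checkpoint_dir, prefix) for _step, prefix in sorted(found)]


def is_valid_prefix(prefix):
    """A V2 checkpoint prefix is only restorable if its .index file exists."""
    return os.path.exists(prefix + ".index")
```

For example, list_checkpoint_prefixes("/Users/prakash/Desktop/hackathon/Trained_Data")[-1] would give the newest step's prefix, while is_valid_prefix on the bare Trained_Data/modelstep.ckpt path returns False, which is exactly the step-suffix mistake described earlier.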

3. Auto-Load the Latest Checkpoint (Optional)

To avoid manually typing the step number, use tf.train.latest_checkpoint to grab the most recent saved model:

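```python
checkpoint_dir = "/Users/prakash/Desktop/hackathon/Trained_Data/"
# Reads the "checkpoint" state file in checkpoint_dir; returns None if nothing is found
latest_checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)

# Run this inside the tf.Session() block from step 1, replacing the hard-coded path
saver.restore(sess, latest_checkpoint_path)
```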
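Since tf.train.latest_checkpoint returns None when it finds no checkpoint in the directory, a small wrapper can fail early with a message that names the directory searched. restore_latest and its find_latest parameter are hypothetical. find_latest defaults to the real tf.train.latest_checkpoint (assuming the TF 1.x-style API) and is injectable only so the logic can be exercised without TensorFlow.

```python
def restore_latest(saver, sess, checkpoint_dir, find_latest=None):
    """Restore the newest checkpoint in checkpoint_dir or raise a clear error."""
    if find_latest is None:
        import tensorflow.compat.v1 as tf  # assumes TF 1.x-style API
        find_latest = tf.train.latest_checkpoint
    checkpoint_path = find_latest(checkpoint_dir)
    if checkpoint_path is None:
        raise FileNotFoundError(
            "No checkpoint found in %r; check the path and that a 'checkpoint' "
            "state file exists there." % checkpoint_dir
        )
    saver.restore(sess, checkpoint_path)
    return checkpoint_path  # handy for logging which step was loaded
```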

Quick Checks to Avoid Future Headaches

Never run tf.global_variables_initializer() after restoring a checkpoint; it overwrites the weights you just loaded.
If you modify your model structure (add layers, rename variables), old checkpoints won't be compatible anymore. You'll need to retrain or implement weight transfer logic.
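Double-check that the Saver object is created after defining your model graph, not before. A Saver built without arguments only tracks the variables that exist at the moment it is constructed.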
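One common form of that weight-transfer logic is a partial restore: tf.train.Saver accepts var_list as a dictionary mapping checkpoint names to graph variables, so you can load the layers that still exist (even if renamed) and leave new layers freshly initialized. The helper below is a hypothetical sketch that builds such a dictionary. The rename table is an assumption you would fill in yourself, and each mapped variable must keep its original shape or the restore will fail.

```python
def build_restore_map(graph_vars, checkpoint_names, rename=None):
    """Map checkpoint names to graph variables for tf.train.Saver(var_list=...).

    graph_vars: dict of graph variable name -> variable object.
    checkpoint_names: names stored in the old checkpoint.
    rename: optional dict of old checkpoint name -> new graph name.
    Checkpoint entries with no matching graph variable are skipped, and graph
    variables absent from the checkpoint are left out of the map entirely.
    """
    rename = rename or {}
    restore_map = {}
    for ckpt_name in checkpoint_names:
        graph_name = rename.get(ckpt_name, ckpt_name)
        if graph_name in graph_vars:
            restore_map[ckpt_name] = graph_vars[graph_name]
    return restore_map


# Hypothetical usage inside a TF 1.x session (not executed here):
#   graph_vars = {v.op.name: v for v in tf.global_variables()}
#   ckpt_names = [name for name, _ in tf.train.list_variables(ckpt_path)]
#   saver = tf.train.Saver(var_list=build_restore_map(graph_vars, ckpt_names, rename))
#   sess.run(tf.global_variables_initializer())  # new layers get fresh values first
#   saver.restore(sess, ckpt_path)                # then the mapped weights are loaded
```

Here the initializer runs before the restore on purpose: variables outside the map still need values, and the restore then overwrites only the mapped ones.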

The question behind this post comes from Stack Exchange; it was asked by Tarun Prakash.