TensorFlow error when loading a trained model: weights/Variable not found in checkpoint
Fixing TensorFlow's NotFoundError When Restoring a Custom Trained Model
Hey there, let's walk through why you're seeing "NotFoundError: Key weights/Variable not found in checkpoint" and how to fix it step by step.
Why This Happens
There are a few common culprits here:
- Mismatched Graph Structure: The TensorFlow graph you're using to restore the model doesn't match the one you used to train it. Checkpoints rely on variable names to map weights. The error means the graph you're loading into contains a variable named weights/Variable, but the checkpoint has no entry under that name, typically because the variable was created under a different scope or name during training.
- Incorrect Checkpoint Path: When restoring, you used Trained_Data/modelstep.ckpt, but your training code saved to an absolute path (/Users/prakash/Desktop/hackathon/Trained_Data/modelstep.ckpt). If your current working directory isn't /Users/prakash/Desktop/hackathon/, TensorFlow can't find the right checkpoint files. Also, note that your training saves checkpoints with step suffixes (like modelstep.ckpt-0, modelstep.ckpt-100), so using just modelstep.ckpt won't point to a valid file. A bad path usually fails with a different message than the one above, but it's worth ruling out.
- Variable Initialization Order: Running tf.global_variables_initializer() before restoring is harmless, because the restore overwrites the initial values. Running it after restoring resets every variable and throws away the weights you just loaded. That won't raise this particular error, but it leaves you with an untrained model.
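To see which of these causes applies, it can help to compare the variable names in your graph with the keys stored in the checkpoint. The sketch below is a minimal illustration, not the only way to do it: diff_variable_names and names_from_tensorflow are hypothetical helper names, the example name lists are made up, and the TensorFlow part assumes a 1.x install where tf.global_variables() and tf.train.list_variables() are available.

```python
def diff_variable_names(graph_names, checkpoint_names):
    """Return (missing_from_checkpoint, unused_in_checkpoint) as sorted lists."""
    graph_set = set(graph_names)
    ckpt_set = set(checkpoint_names)
    return sorted(graph_set - ckpt_set), sorted(ckpt_set - graph_set)


def names_from_tensorflow(checkpoint_prefix):
    """Collect both name lists from TensorFlow 1.x (imported lazily)."""
    import tensorflow as tf  # only needed when this helper is called

    # Call this after the model has been built in the default graph.
    graph_names = [v.op.name for v in tf.global_variables()]
    checkpoint_names = [name for name, _shape
                        in tf.train.list_variables(checkpoint_prefix)]
    return graph_names, checkpoint_names


# Illustration with made-up names: the graph expects weights/Variable,
# but the checkpoint only holds an un-scoped Variable.
missing, unused = diff_variable_names(
    ["weights/Variable", "biases/Variable"],
    ["Variable", "biases/Variable"],
)
print("Graph variables missing from checkpoint:", missing)
print("Checkpoint entries with no graph variable:", unused)
```

Any name in the first list is a key the Saver will ask for and fail to find, which is exactly what the NotFoundError reports.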
How to Fix It
1. Reuse the Exact Same Model Graph First
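Before creating a Saver or restoring, you must rebuild the exact same model structure you used during training. That means redefining all layers, variables, placeholders (x, y_, keep_prob), and operations (accuracy, train_step) exactly as they were in your training code, in the same order and under the same scopes. Order matters because TensorFlow auto-names unnamed variables by creation order (Variable, Variable_1, and so on), and the Saver matches on those names.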
Once the graph is identical, create the Saver and restore:
    import tensorflow as tf

    # First, copy/paste your training model definition code here
    # (convolutional layers, fully connected layers, accuracy metric, etc.)

    saver = tf.train.Saver()  # Create the Saver AFTER defining the model graph

    with tf.Session() as sess:
        # No initializer is needed here: restore() assigns every saved variable.
        # Running tf.global_variables_initializer() AFTER restore() would
        # overwrite the loaded weights.

        # Use the full path to your checkpoint, including the step suffix
        saver.restore(sess, "/Users/prakash/Desktop/hackathon/Trained_Data/modelstep.ckpt-19900")
        print("Model restored successfully!")
2. Use the Correct Checkpoint Path
- Stick to absolute paths to avoid working directory confusion, or make sure your code runs from /Users/prakash/Desktop/hackathon/ if using relative paths.
- Remember: TensorFlow saves checkpoints as a set of files (e.g., .data, .index, .meta). You only need to pass the prefix (e.g., modelstep.ckpt-19900), and TensorFlow will handle the rest.
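If you're unsure which prefixes actually exist on disk, you can list them yourself. This is a rough sketch that assumes the V2 checkpoint format (one .index file per checkpoint) and step-suffixed names like the ones above; checkpoint_prefixes is a hypothetical helper, and the demo uses empty stand-in files rather than real checkpoints.

```python
import os
import re
import tempfile


def checkpoint_prefixes(directory):
    """Return checkpoint prefixes found in `directory`, ordered by step number."""
    found = []
    for filename in os.listdir(directory):
        # Each V2 checkpoint has one index file named <prefix>.index
        match = re.fullmatch(r"(.+-(\d+))\.index", filename)
        if match:
            step = int(match.group(2))
            found.append((step, os.path.join(directory, match.group(1))))
    return [prefix for _step, prefix in sorted(found)]


# Demo on a throwaway directory with empty stand-in files.
demo_dir = tempfile.mkdtemp()
for name in ["modelstep.ckpt-0.index", "modelstep.ckpt-100.index",
             "modelstep.ckpt-100.meta", "checkpoint"]:
    open(os.path.join(demo_dir, name), "w").close()

prefixes = checkpoint_prefixes(demo_dir)
print([os.path.basename(p) for p in prefixes])
```

Each returned string is the kind of value you would pass to saver.restore: the path without the .index, .data, or .meta extension.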
3. Auto-Load the Latest Checkpoint (Optional)
To avoid manually typing the step number, use tf.train.latest_checkpoint to grab the most recent saved model:
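    checkpoint_dir = "/Users/prakash/Desktop/hackathon/Trained_Data/"
    latest_checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
    if latest_checkpoint_path is None:  # nothing found in that directory
        raise ValueError("No checkpoint found in " + checkpoint_dir)
    saver.restore(sess, latest_checkpoint_path)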
Quick Checks to Avoid Future Headaches
- Don't run tf.global_variables_initializer() after restoring a checkpoint; it resets the variables and wipes out the weights you just loaded.
- If you modify your model structure (add layers, rename variables), old checkpoints won't be compatible anymore. You'll need to retrain or implement weight transfer logic.
- Double-check that the Saver object is created after defining your model graph, not before.
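If the only difference between the old and new graph is a variable name, one possible form of weight transfer is to pass tf.train.Saver a dict that maps checkpoint keys to the current variables. The sketch below only shows how such a dict could be built; checkpoint_name_map is a hypothetical helper, the rename from weights/Variable to Variable is an assumed example, and stand-in objects replace real tf.Variable instances so the snippet runs without building a graph.

```python
from types import SimpleNamespace


def checkpoint_name_map(variables, renames):
    """Map checkpoint keys to graph variables.

    `renames` maps a current graph name to the name it had at training time;
    variables not listed keep their current name.
    """
    return {renames.get(v.op.name, v.op.name): v for v in variables}


# Stand-in objects that mimic the `.op.name` attribute of tf.Variable.
w = SimpleNamespace(op=SimpleNamespace(name="weights/Variable"))
b = SimpleNamespace(op=SimpleNamespace(name="biases/Variable"))

# Assumed rename: the checkpoint stored the weights as plain "Variable".
var_list = checkpoint_name_map([w, b], {"weights/Variable": "Variable"})
print(sorted(var_list))

# With real variables (TensorFlow 1.x), the dict would be used like this:
#   var_list = checkpoint_name_map(tf.global_variables(), {...})
#   saver = tf.train.Saver(var_list=var_list)
```

This only helps when shapes still match; if layers were added or resized, the affected variables still need retraining.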
This question originally comes from Stack Exchange; asked by Tarun Prakash.




