Troubleshooting a MobileNet Fruit Classifier That Reaches 0.999 Validation Accuracy but Mispredicts Test Images

阿华AIGC实验室

2026-4-29

Hey there, let's break down what's going on with your fruit classification model. Very high validation accuracy combined with failures on simple test images is a classic scenario, and there are a few key culprits to investigate:

1. Data Distribution Mismatch (Most Likely Cause)

Your model's stellar validation accuracy shows it has learned the patterns in your training/validation data. The most likely reason it fails on test images is that those test shots don't match the distribution the model was trained on. Here's where to look:

Background Replacement Edge Cases: When you swapped the white backgrounds with Open Images data, did you ensure the resulting training images cover the same variety of backgrounds, lighting, and angles as your test images? For example, if your test banana is on a messy kitchen counter but all training bananas are on clean outdoor backgrounds, the model will get confused by the unfamiliar context.
Fruit Variation: Do your test fruits look like the ones in the training set? For instance, if your training set only has yellow bananas but your test image is a green unripe one, the model may not recognize it. Also check that your class_names correspond exactly, and in the same order, to the fruit classes the model was trained on from the Fruits-360 dataset after background replacement.
Validation Set Bias: How did you build your validation set? If it is just a random subset of the same "clean" background-replaced images, it is nearly identical to the training data, so a 0.999 score tells you little about real-world variation.
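One quick thing to rule out from the list above is a label mix-up. Here is a minimal sketch, assuming you trained with ImageDataGenerator.flow_from_directory, which assigns class indices in alphanumeric order of the subdirectory names and exposes them as a class_indices dict. The helper names and the three-class mapping are made up for illustration; swap in your own iterator's class_indices and your real class_names.

```python
def names_in_index_order(class_indices):
    """Return class names sorted by the integer index Keras assigned to them."""
    return [name for name, _ in sorted(class_indices.items(), key=lambda kv: kv[1])]


def check_class_names(class_names, class_indices):
    """Return (position, yours, expected) for every position where the lists disagree."""
    expected = names_in_index_order(class_indices)
    if len(class_names) != len(expected):
        raise ValueError(f"{len(class_names)} names given, model has {len(expected)} classes")
    return [(i, a, b) for i, (a, b) in enumerate(zip(class_names, expected)) if a != b]


# Hypothetical mapping, shaped like the `class_indices` attribute of the
# iterator returned by ImageDataGenerator.flow_from_directory.
class_indices = {"Apple": 0, "Banana": 1, "Cherry": 2}
class_names = ["Banana", "Apple", "Cherry"]  # hand-written list in the wrong order

mismatches = check_class_names(class_names, class_indices)
print(mismatches)  # [(0, 'Banana', 'Apple'), (1, 'Apple', 'Banana')]
```

An empty list means your class_names line up with the model's output indices, so a wrong prediction really is a wrong prediction and not just a wrong label lookup.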

2. Potential Overfitting (Even With High Val Accuracy)

While your validation accuracy is 0.999, don't rule out overfitting entirely:

Data Leakage: Double-check that no validation images accidentally made their way into the training set. Separately, make sure the validation generator applies only the same rescaling as training and no random augmentation (validation data shouldn't be augmented!), so validation scores stay comparable from epoch to epoch.
Insufficient Regularization: Your model only uses a Dropout(0.2) layer. For fine-tuning a pre-trained MobileNet, this might not be enough to prevent the model from memorizing training details instead of learning general fruit features. Try increasing the dropout rate to 0.3 or 0.4, or add L2 regularization to the dense layer.
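For the leakage check, a rough sketch like the following can flag validation files that are byte-for-byte copies of training files. It assumes your images sit in two directory trees on disk; the function names are illustrative. Note that it only catches exact copies, so the same fruit pasted onto a different background will not be flagged.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_leaked_files(train_dir, val_dir):
    """Return validation files whose bytes also appear somewhere under train_dir."""
    train_digests = {
        file_digest(p) for p in Path(train_dir).rglob("*") if p.is_file()
    }
    return sorted(
        p for p in Path(val_dir).rglob("*")
        if p.is_file() and file_digest(p) in train_digests
    )
```

You would call it as find_leaked_files("data/train", "data/val"), where both paths are placeholders for your own folders. An empty result rules out exact duplicates only; near-duplicates still need a visual check.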

3. Prediction Pipeline Inconsistencies

Small mismatches between training-time and prediction-time preprocessing can break predictions:

Normalization Check: In your prediction code, you divide the image by 255. Does your training data generator (tr_data_gen) do the same? If you used ImageDataGenerator(rescale=1./255) during training, you're good. If not, the difference in input scaling will throw off the model.
Input Size: Confirm that IMG_SIZE during training is exactly 224x224 (matching your prediction code). A mismatch here can lead to incorrect feature extraction.
Model Loading: Ensure the saved model (better_modelv2.0) is the exact one you trained, including the fine-tuned layers. Sometimes loading models with TensorFlow Hub layers can have edge cases. To rule this out, re-save the model after training, reload it, and compare its predictions with the in-memory model on the same batch.
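To catch the shape and scaling mismatches described above before they reach the model, you could run every prediction batch through a small guard. This is only a sketch: it assumes training used 224x224 RGB images rescaled to [0, 1] (for example via rescale=1./255), and check_model_input is a made-up helper, not part of Keras.

```python
import numpy as np

IMG_SIZE = 224  # assumed to match the value used during training


def check_model_input(batch, img_size=IMG_SIZE):
    """Raise ValueError if `batch` does not look like a training-time batch."""
    batch = np.asarray(batch)
    expected = (img_size, img_size, 3)
    if batch.ndim != 4 or batch.shape[1:] != expected:
        raise ValueError(
            f"expected shape (N, {img_size}, {img_size}, 3), got {batch.shape}"
        )
    if not np.issubdtype(batch.dtype, np.floating):
        raise ValueError(f"expected a float array, got {batch.dtype}")
    lo, hi = float(batch.min()), float(batch.max())
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"expected values in [0, 1], got [{lo:.3f}, {hi:.3f}]")
    return batch
```

Wrapping the call as model.predict(check_model_input(batch)) makes a forgotten division by 255 or a missing batch dimension fail loudly instead of silently producing a wrong class.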

4. Pre-trained Model Epoch Requirements: No Fixed Minimum

There's no hard rule for the minimum number of epochs when using pre-trained models; it depends on your task:

For feature extraction (no fine-tuning), 2-5 epochs might be enough to converge.
For fine-tuning (like unfreezing the last 10 layers of MobileNet), you might need more epochs (10-20) to let the top layers adapt to your data. However, reaching 0.999 val accuracy in only 5 epochs suggests either that your data is extremely simple or that your validation set isn't challenging enough.

Next Steps to Debug

Compare Test vs. Training Images: Grab the misclassified test image and visually compare it to images of the same class in your training set. Look for differences in lighting, angle, background, or fruit state (ripe vs. unripe).
Re-split Your Dataset: Create a completely separate test set (not just your hand-picked images) from the Fruits-360 data (with background replacement) and evaluate the model on that. If it performs well there, the issue is with your hand-picked test images being out of distribution.
Adjust Regularization: Increase dropout or add L2 regularization, then re-train to see if the model's generalization improves.
Check Training Curves: Plot your training/validation loss and accuracy over epochs. If validation loss starts to rise after a few epochs while training loss keeps falling, that's a clear sign of overfitting. If both stay low, the problem is most likely a data distribution mismatch.
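If you'd rather not eyeball the curves, the "Check Training Curves" step can be roughly automated. The sketch below assumes you kept the object returned by model.fit, whose history attribute is a dict with 'loss' and 'val_loss' lists (one value per epoch). The diagnose_history name and the patience threshold are illustrative choices, not a standard rule.

```python
def diagnose_history(history, patience=2):
    """Return a rough label for a Keras-style history dict.

    `history` should look like `History.history`: a dict holding
    'loss' and 'val_loss' lists with one entry per epoch.
    """
    loss, val_loss = history["loss"], history["val_loss"]
    # Epoch (0-based) at which validation loss was lowest.
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_since_best = len(val_loss) - 1 - best_epoch
    train_still_falling = loss[-1] < loss[best_epoch]
    if epochs_since_best >= patience and train_still_falling:
        return f"possible overfitting: val_loss bottomed out at epoch {best_epoch + 1}"
    return "no overfitting signal in the curves; check data distribution next"
```

For example, diagnose_history(fit_result.history) with a hypothetical fit_result returned by model.fit. Treat the answer as a hint only: flat, low curves can't tell you whether the validation set itself is too easy.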

The question discussed here comes from Stack Exchange; it was asked by Snake Jazz.