
Question: MNIST-trained model misclassifies custom handwritten digit images (preprocessing issue)

Hey there! Let's dig into why your custom handwritten digit image isn't being predicted correctly. The main issue is that your custom image preprocessing pipeline isn't fully aligned with how you prepared the MNIST training data, plus there are a few key steps missing to match MNIST's standard format perfectly.

Common Preprocessing Mismatches to Fix

  • Image Alignment & Centering: MNIST digits are consistently centered in the 28x28 frame. If your custom digit is off-center, too small, or shifted, the model won't recognize it. You need to crop and center the digit before resizing.
  • Resize Interpolation: OpenCV's default interpolation can introduce artifacts when shrinking to 28x28. Using INTER_AREA is better for preserving the digit's structure during downscaling.
  • Noise Reduction & Thresholding: Custom images often have subtle noise or gradient backgrounds, while MNIST digits have sharp, high contrast. Applying a threshold will make your image match MNIST's clean style.
  • Color Inversion Check: MNIST stores white digits on a black background (background = 0, digit strokes near 255). If your custom image is the opposite (a dark digit on light paper), you need to invert it—but make sure the inversion actually matches your input's style rather than applying it blindly.
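The cropping-and-centering idea from the list above can be sketched in plain NumPy, without OpenCV, on a tiny synthetic image (the 10x10 size and blob position are made up for illustration):

```python
import numpy as np

# Synthetic binary image: a white "digit" blob sitting off-center
# on a black background (MNIST polarity: digit = 255, background = 0)
img = np.zeros((10, 10), dtype=np.uint8)
img[1:4, 6:9] = 255  # 3x3 blob in the top-right corner

# Bounding box of all non-zero pixels (what cv2.boundingRect computes)
ys, xs = np.nonzero(img)
y0, y1 = ys.min(), ys.max() + 1
x0, x1 = xs.min(), xs.max() + 1
digit = img[y0:y1, x0:x1]

# Paste the crop onto a fresh canvas, centered
canvas = np.zeros((10, 10), dtype=np.uint8)
h, w = digit.shape
py, px = (10 - h) // 2, (10 - w) // 2
canvas[py:py + h, px:px + w] = digit

print(canvas.sum() == img.sum())  # same pixels, now centered
```

The real pipeline below does the same thing with `cv2.findContours` plus `cv2.boundingRect`, which is more robust when the image contains stray noise pixels.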

Fixed Preprocessing Code for Custom Images

Here's an updated version of your custom image handling code that aligns perfectly with MNIST's preprocessing steps:

import cv2
import numpy as np
from PIL import Image
# `model` is assumed to be your already-trained Keras MNIST model

# Load an image for prediction
image_path = 'digit.png'  # Replace with your image path
print(f"Loading and predicting for {image_path}...")

try:
    # Read the image in grayscale
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    if image is None:
        raise IOError(f"Error loading image at {image_path}")

    # Step 1: Reduce noise with Gaussian blur
    image = cv2.GaussianBlur(image, (3, 3), 0)

    # Step 2: Convert to high-contrast binary-like image (matches MNIST)
    _, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Step 3: Crop to the digit's bounding box to remove extra background
    contours, _ = cv2.findContours(image.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Grab the largest contour (assumed to be the digit)
        cnt = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(cnt)
        image = image[y:y+h, x:x+w]

    # Step 4: Pad to square then resize to 28x28 (preserves aspect ratio)
    height, width = image.shape
    max_dim = max(height, width)
    # Create a square canvas
    square_image = np.zeros((max_dim, max_dim), dtype=np.uint8)
    # Calculate padding to center the digit
    pad_y = (max_dim - height) // 2
    pad_x = (max_dim - width) // 2
    square_image[pad_y:pad_y+height, pad_x:pad_x+width] = image
    # Resize with area interpolation to avoid artifacts
    image = cv2.resize(square_image, (28, 28), interpolation=cv2.INTER_AREA)

    # Step 5: Ensure MNIST polarity (white digit on a black background)
    avg_pixel = np.mean(image)
    if avg_pixel > 127:  # Background is bright, invert so it becomes black
        image = cv2.bitwise_not(image)

    # Step 6: Normalize exactly like MNIST training data
    image_normalized = image.astype('float32') / 255

    # Save preprocessed image for debugging
    image_for_saving = (image_normalized * 255).astype(np.uint8)
    preprocessed_image_path = "preprocessed_digit.png"
    pil_image = Image.fromarray(image_for_saving)
    pil_image.save(preprocessed_image_path)
    print(f"Saved preprocessed image to {preprocessed_image_path}")

    # Prepare input for model (matches MNIST's input shape: (1, 28, 28, 1))
    image_input = np.expand_dims(image_normalized, axis=(0, -1))
    prediction = np.argmax(model.predict(image_input, verbose=0))
    print(f"Predicted Digit: {prediction}")

except Exception as e:
    print(f"Error processing the image: {e}")

Key Changes Explained

  • Noise Reduction: Gaussian blur smooths small artifacts that could throw off the model.
  • Thresholding: Converts your image to high-contrast data, just like the MNIST digits the model was trained on.
  • Cropping & Centering: Ensures the digit is the focus and centered in the frame—this is critical because MNIST's training data all has centered digits.
  • Square Padding: Resizing a non-square image directly would stretch the digit; padding to a square first preserves its shape.
  • Smart Inversion: Automatically checks whether the image needs inverting to match MNIST's white-on-black polarity, so you don't have to adjust manually for each input image's style.
  • Efficient Shape Handling: Adds batch and channel dimensions in one line, perfectly matching how you prepared your MNIST training data.
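The one-line shape expansion from the last bullet can be verified in isolation with pure NumPy, no model needed (the zero array is just a stand-in for a preprocessed digit):

```python
import numpy as np

# Stand-in for a preprocessed, normalized 28x28 digit
image_normalized = np.zeros((28, 28), dtype=np.float32)

# Add a batch dimension (axis 0) and a channel dimension (axis -1) at once
image_input = np.expand_dims(image_normalized, axis=(0, -1))
print(image_input.shape)  # (1, 28, 28, 1)
```

Passing a tuple to `axis` requires NumPy 1.18 or newer; on older versions you would chain two `np.expand_dims` calls instead.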

Quick Verification Steps

  • Open the saved preprocessed_digit.png—it should look just like a real MNIST sample: a white digit centered on a black 28x28 square.
  • Check the pixel values: the background should be near 0 (black) and the digit strokes near 255 (white), just like your training data.
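The same two checks can be done programmatically. This sketch uses a synthetic MNIST-style array for self-containment; in practice you would load your saved preprocessed_digit.png instead, and the stroke region chosen here is arbitrary:

```python
import numpy as np

# Synthetic stand-in: black background (0) with a bright digit region (255)
img = np.zeros((28, 28), dtype=np.uint8)
img[8:20, 10:18] = 255

background_fraction = np.mean(img < 50)   # most pixels should be near-black
digit_fraction = np.mean(img > 200)       # stroke pixels should be near-white
print(f"background: {background_fraction:.2f}, digit: {digit_fraction:.2f}")
```

For a correctly preprocessed digit, the background fraction should dominate (MNIST digits occupy a minority of the 28x28 frame) and the remaining pixels should sit near 255.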

Note: content sourced from Stack Exchange; question by Montrell Jubilee.
