Troubleshooting ValueError: axes don't match array in a PyTorch FPN Segmentation Model

Ahua AIGC Lab (阿华AIGC实验室)

2026-5-11

Hey there, let's figure out why you're hitting ValueError: axes don't match array when using your FPN model on the Carvana dataset. The error is raised by x.transpose(0,2,1), which is a clear clue: that call names three axes, so it fails whenever x does not have exactly three dimensions. Here's how to troubleshoot and fix this:
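To see the failure in isolation, here is a minimal sketch that reproduces it with NumPy alone. The array shapes are made up for illustration and are not taken from the Carvana data.

```python
import numpy as np

# Hypothetical shapes, chosen only for illustration.
mask_2d = np.zeros((4, 6), dtype=np.uint8)      # (H, W): no channel axis
image_3d = np.zeros((4, 6, 3), dtype=np.uint8)  # (H, W, C)

# Three axes requested, three axes available: this works.
ok_shape = image_3d.transpose(0, 2, 1).shape

# Three axes requested, only two available: NumPy raises ValueError.
try:
    mask_2d.transpose(0, 2, 1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print("3D result shape:", ok_shape)
print("2D error:", error_message)
```

If the second print shows an error for the 2D array, the same thing is most likely happening to one of your samples.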

1. First, Verify Your Image/Mask Dimensions

Even though you said both images and masks are 3-channel, there might be hidden inconsistencies. Let's check the actual shape of every sample you're loading.

Add a print statement in your custom Dataset's __getitem__ method right after loading the image and mask:

def __getitem__(self, idx):
    img_path = self.image_paths[idx]
    mask_path = self.mask_paths[idx]
    
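    img = cv2.imread(img_path)  # or whatever library you're using
    mask = cv2.imread(mask_path)
    # cv2.imread returns None instead of raising when a file cannot be read
    if img is None or mask is None:
        raise FileNotFoundError(f"Could not read {img_path} or {mask_path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    mask = cv2.cvtColor(mask, cv2.COLOR_BGR2RGB)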
    
    # Add this line to check shapes
    print(f"Sample {idx}: Image shape = {img.shape}, Mask shape = {mask.shape}")
    
    # Rest of your preprocessing...

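Run this for a few batches. If you see any mask with shape (H, W) (2D instead of 3D), that's the culprit: some masks are being loaded as single-channel grayscale instead of 3-channel RGB, and transpose(0,2,1) fails because a 2D array has no axis 2. Note that cv2.imread with its default flag always returns three channels, so a 2D mask usually means the file is read with a grayscale flag or a different library (PIL, for example), or that a later preprocessing step drops the channel axis.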
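Printing every sample gets noisy on a full dataset. As a rough sketch, a helper like the one below collects only the offending indices. The name find_unexpected_shapes and the stand-in data are hypothetical; it assumes each sample is an (image, mask) pair of NumPy arrays taken before any tensor conversion.

```python
import numpy as np

def find_unexpected_shapes(samples, expected_ndim=3):
    """Return (index, image_shape, mask_shape) for every sample whose
    image or mask does not have `expected_ndim` axes."""
    problems = []
    for idx, (img, mask) in enumerate(samples):
        if img.ndim != expected_ndim or mask.ndim != expected_ndim:
            problems.append((idx, img.shape, mask.shape))
    return problems

# Stand-in data: the second sample has a 2D mask.
samples = [
    (np.zeros((8, 8, 3)), np.zeros((8, 8, 3))),
    (np.zeros((8, 8, 3)), np.zeros((8, 8))),
]
problems = find_unexpected_shapes(samples)
print(problems)
```

With a real Dataset you could pass something like (dataset[i] for i in range(len(dataset))), provided __getitem__ returns the raw arrays at that point.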

2. Fix Mask Preprocessing for Carvana Dataset

When the Carvana masks are loaded as 3-channel arrays, all three channels are identical; the extra channels are just a side effect of how a single-channel segmentation mask gets read. We don't need 3 channels for a binary segmentation task (car vs. background), so let's standardize the masks to single-channel:

import numpy as np

# Masks may arrive as (H, W, 3) or, for the problem samples, as (H, W)
if mask.ndim == 3:
    mask = mask[..., 0]  # Take the first channel; all channels are the same
mask = np.expand_dims(mask, axis=-1)  # Add back a channel dimension: (H, W, 1)

This ensures every mask has a consistent 3D shape, so the transpose operation won't throw an axis mismatch error.
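If you would rather verify the "all channels are identical" assumption than trust it, a slightly stricter variant could look like the sketch below. The function name to_single_channel is hypothetical, and the test arrays are synthetic.

```python
import numpy as np

def to_single_channel(mask):
    """Collapse a mask to shape (H, W, 1), checking that channels agree."""
    if mask.ndim == 2:
        return mask[..., np.newaxis]
    if mask.ndim != 3:
        raise ValueError(f"expected a 2D or 3D mask, got shape {mask.shape}")
    first = mask[..., :1]          # slicing keeps the channel axis: (H, W, 1)
    if not np.all(mask == first):  # broadcasts `first` against every channel
        raise ValueError("mask channels differ; dropping them would lose data")
    return first

# Synthetic masks: one 2D, one with three identical channels.
gray = np.arange(16, dtype=np.uint8).reshape(4, 4)
rgb = np.repeat(gray[..., np.newaxis], 3, axis=-1)

print(to_single_channel(gray).shape)
print(to_single_channel(rgb).shape)
```

Raising on mismatched channels is a design choice: it surfaces masks that are not really single-channel instead of silently discarding information.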

3. Correct the Transpose Logic in Your `to_tensor` Function

Your current transpose x.transpose(0,2,1) targets the wrong axis order if your input is in (H, W, C) format (the standard for loaded images): it keeps axis 0 in place and swaps axes 1 and 2, producing (H, C, W). To convert (H, W, C) to the PyTorch-friendly (C, H, W) format, use this instead:

import numpy as np
import torch

def to_tensor(x):
    # Handle 2D inputs just in case: (H, W) -> (H, W, 1)
    if x.ndim == 2:
        x = x[..., np.newaxis]
    # Convert (H, W, C) -> (C, H, W)
    x = x.transpose(2, 0, 1)
    return torch.tensor(x, dtype=torch.float32)

Using transpose(2, 0, 1) moves the channel axis (index 2) to the first position and shifts height and width to positions 1 and 2, which is the layout PyTorch expects for model inputs.
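The difference between the two axis orders is easy to check on a small array. This sketch uses NumPy only and a made-up 4x6 image with 3 channels:

```python
import numpy as np

# Made-up image: H=4, W=6, C=3, filled with distinct values.
hwc = np.arange(4 * 6 * 3, dtype=np.float32).reshape(4, 6, 3)

wrong = hwc.transpose(0, 2, 1)  # (H, C, W): channels end up in the middle
right = hwc.transpose(2, 0, 1)  # (C, H, W): channels first

print(wrong.shape)
print(right.shape)

# The value at (h, w, c) in the original sits at (c, h, w) afterwards.
same_pixel = right[2, 1, 5] == hwc[1, 5, 2]
print(same_pixel)
```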

4. Double-Check Your Model's Output Channels

Since Carvana is a binary segmentation task, make sure your FPN model is initialized to output 1 channel (not 3):

from segmentation_models_pytorch import FPN

# Initialize model with classes=1 for binary segmentation
model = FPN(
    encoder_name="resnet34",  # or your chosen encoder
    encoder_weights="imagenet",
    in_channels=3,  # input is 3-channel images
    classes=1  # binary output (car/background)
)

If you had classes=3 before, the model would output tensors of shape (N, 3, H, W) while your masks are (N, 1, H, W), so the two would no longer line up when the loss is computed.
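A cheap guard before the loss makes this kind of mismatch obvious. The helper below is a hypothetical sketch; it only reads .shape, so it should work with torch tensors as well, but NumPy arrays stand in here to keep it self-contained. The batch and image sizes are made up.

```python
import numpy as np

def check_same_shape(pred, mask):
    """Raise a descriptive error when prediction and mask shapes differ."""
    if tuple(pred.shape) != tuple(mask.shape):
        raise ValueError(
            f"prediction shape {tuple(pred.shape)} != mask shape {tuple(mask.shape)}"
        )

mask = np.zeros((2, 1, 32, 32))         # (N, 1, H, W) mask batch
pred_binary = np.zeros((2, 1, 32, 32))  # shape a classes=1 model would produce
pred_rgb = np.zeros((2, 3, 32, 32))     # shape a classes=3 model would produce

check_same_shape(pred_binary, mask)     # passes silently

try:
    check_same_shape(pred_rgb, mask)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)
print(mismatch)
```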

Walk through these steps, and your axis mismatch error should disappear. Let me know if you hit any other snags!

The question in this post comes from Stack Exchange; it was asked by Enes Uğuroğlu.