Help: comparing images with Python's ms_ssim fails with a 4D tensor requirement error

Fixing 4D Tensor & Type Errors for pytorch-msssim's ms_ssim

Let's break down your errors and fix the code step by step.

Why You're Getting These Errors

  • ValueError: Input images must be 4-d tensors: The ms_ssim function expects input tensors in the shape (batch_size, channels, height, width) (4D). When you use totensor() on a single PIL image, you get a 3D tensor (channels, height, width) — missing the required batch dimension.
  • TypeError: pic should be Tensor or ndarray: This came from your commented-out redundant conversions (like totensor(topil(np.array(image1)))), which are unnecessary and risk passing invalid types through the conversion chain.
  • AttributeError: 'numpy.ndarray' object has no attribute 'type': Numpy arrays don't have a .type() method — this happened when you tried to mix numpy operations with PyTorch tensor requirements incorrectly (e.g., malformed np.expand_dims calls).
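
The missing batch dimension can be illustrated with plain NumPy (a sketch of the same idea; in the actual fix below, PyTorch's `tensor.unsqueeze(0)` plays this role):

```python
import numpy as np

# A fake 3-channel 64x64 image in (C, H, W) layout, like ToTensor() produces
img = np.zeros((3, 64, 64), dtype=np.float32)
print(img.ndim)  # 3 — a 3D array, which is what ms_ssim rejects

# Insert a batch axis at position 0, mirroring tensor.unsqueeze(0) in PyTorch
batched = np.expand_dims(img, axis=0)
print(batched.shape)  # (1, 3, 64, 64) — the 4D (B, C, H, W) layout ms_ssim expects
```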

Fixed Code

from PIL import Image
from pytorch_msssim import ms_ssim
import torchvision.transforms as transforms

# Define transforms once outside the function for efficiency
totensor = transforms.ToTensor()

def ssimcompare(path1: str, path2: str) -> float:
    # Load images directly as PIL objects, ensure 3-channel RGB (handles grayscale too)
    image1 = Image.open(path1).convert("RGB")
    image2 = Image.open(path2).convert("RGB")
    
    # Convert to 3D tensors (C, H, W) with values normalized to [0, 1]
    tensor1 = totensor(image1)
    tensor2 = totensor(image2)
    
    # Add batch dimension to make them 4D (B, C, H, W) — required by ms_ssim
    tensor1 = tensor1.unsqueeze(0)
    tensor2 = tensor2.unsqueeze(0)
    
    # Calculate MS-SSIM: data_range=1.0 because totensor() normalizes pixels to [0, 1];
    # size_average=True (the default) averages over the batch, giving a single score
    ms_ssim_value = ms_ssim(tensor1, tensor2, data_range=1.0, size_average=True)
    
    # Convert tensor result to a scalar float for return
    return ms_ssim_value.item()

Key Fixes & Explanations

  • Simplify Conversions: We skip unnecessary numpy ↔ PIL ↔ tensor loops — totensor() works directly on PIL images, so we use it directly.
  • Standardize Image Channels: .convert("RGB") ensures both images have 3 channels, avoiding shape mismatches if one image is grayscale.
  • Add Batch Dimension: .unsqueeze(0) adds a single batch dimension at index 0, turning 3D tensors into the 4D format ms_ssim requires.
  • Correct data_range: The totensor() transform scales pixel values from [0, 255] to [0, 1], so we set data_range=1.0 instead of 255. Using 255 here would lead to incorrect similarity scores.
  • Extract Scalar Value: The ms_ssim result is a PyTorch tensor, so .item() converts it to a regular float for the return type.
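
To see why `data_range=1.0` is the right setting, here is the scaling `ToTensor()` applies to 8-bit pixel data, sketched with NumPy for illustration (`ToTensor()` does this division internally):

```python
import numpy as np

# 8-bit pixels as loaded from a typical JPEG/PNG: values in [0, 255]
pixels = np.array([0, 128, 255], dtype=np.uint8)

# ToTensor() divides by 255, mapping the value range to [0.0, 1.0]
normalized = pixels.astype(np.float32) / 255.0
print(normalized.max())  # 1.0 — hence data_range=1.0, not 255
```

Passing `data_range=255` to `ms_ssim` while feeding it `[0, 1]` tensors mismatches the assumed dynamic range and distorts the score.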

Test It Out

Call the function with your image paths:

score = ssimcompare("path/to/your/image1.jpg", "path/to/your/image2.jpg")
print(f"MS-SSIM Score: {score}")

Scores range from 0 (completely dissimilar) to 1 (identical images).
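
If you later need to score many image pairs at once, `ms_ssim` also accepts batches larger than one: stacking per-image tensors along a new leading axis builds the 4D batch. A NumPy sketch of the shape arithmetic (with PyTorch you would use `torch.stack(..., dim=0)`, and `size_average=False` to get one score per pair):

```python
import numpy as np

# Three fake images, each (C, H, W) — in practice, outputs of totensor()
imgs = [np.zeros((3, 32, 32), dtype=np.float32) for _ in range(3)]

# Stacking along a new leading axis builds one (B, C, H, W) batch,
# analogous to torch.stack(tensors, dim=0)
batch = np.stack(imgs, axis=0)
print(batch.shape)  # (3, 3, 32, 32)
```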

The question is sourced from Stack Exchange, asked by Kaltresian.
