Combining object-detection and semantic-segmentation deep-learning models for defect detection: mAP evaluation and troubleshooting a code error

Problem analysis and code modification plan

You are hitting two main problems: an input-shape mismatch and a U-Net output unpacking error. Let's break them down one at a time and give concrete fixes:

1. Input-shape mismatch

The first warning tells you that your U-Net model was built for 256x256 inputs, but you are feeding it 800x800 images, which prevents the model from computing correctly. The fix is to resize images of the scratches class separately to the 256x256 size the U-Net expects, instead of reusing the RetinaNet resize logic.
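
As a quick sanity check, a minimal sketch of that idea is shown below; the file name is a placeholder and unetmodel is assumed to be your already-loaded Keras U-Net:

import cv2

unet_input_size = (256, 256)
print(unetmodel.input_shape)  # should report the 256x256 input the model was built for

raw_image = cv2.imread('sample_scratch.png')          # e.g. one of your 800x800 images
unet_image = cv2.resize(raw_image, unet_input_size)   # resize only for the U-Net branch
scale_x = raw_image.shape[1] / unet_input_size[0]     # kept to map predicted boxes back
scale_y = raw_image.shape[0] / unet_input_size[1]     # to the original image size later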

2. U-Net output unpacking error

This is the direct cause of the error: U-Net is a semantic-segmentation model, so its output is a segmentation mask of shape (1, 256, 256, 1), not the three tensors boxes, scores, labels returned by the object-detection model. You cannot unpack the segmentation result directly into three variables; you need to extract detection boxes from the segmentation mask instead.
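
To make the mismatch concrete, here is a minimal sketch; batch, unetmodel and retinanet_model are placeholders for your prepared input batch and loaded models:

seg = unetmodel.predict_on_batch(batch)                                # one array, shape (1, 256, 256, 1)
boxes, scores, labels = retinanet_model.predict_on_batch(batch)[:3]   # three separate arrays

# The failing pattern: slicing a (1, 256, 256, 1) array with [:3] still leaves a
# single element along the first axis, so unpacking it into three names raises
# "not enough values to unpack".
# boxes, scores, labels = unetmodel.predict_on_batch(batch)[:3]       # ValueError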


Complete modified code snippet

I modified the part of the _get_detections function that handles the U-Net to address both issues, adding logic to extract detection boxes from the segmentation mask (based on OpenCV contour detection):

# Imports assumed from the keras-retinanet project this snippet is adapted from; adjust paths to your layout.
import os
import time

import cv2
import keras
import numpy as np
import progressbar

from keras_retinanet.utils.visualization import draw_annotations, draw_detections


def _get_detections(generator, model, unetmodel=None, score_threshold=0.05, max_detections=100, save_path=None):
    """
    Get the detections from the model using the generator.
    The result is a list of lists such that the size is:
        all_detections[num_images][num_classes] = detections[num_detections, 4 + num_classes]
    # Arguments
        generator : The generator used to run images through the model.
        model : The model to run on the images.
        unetmodel : Optional U-Net segmentation model used for images whose test_class is 'scratches'.
        score_threshold : The score confidence threshold to use.
        max_detections : The maximum number of detections to use per image.
        save_path : The path to save the images with visualized detections to.
    # Returns
        A list of lists containing the detections for each image in the generator.
    """
    all_detections = [[None for i in range(generator.num_classes()) if generator.has_label(i)] for j in range(generator.size())]
    all_inferences = [None for i in range(generator.size())]

    # Input size the U-Net expects
    unet_input_size = (256, 256)

    for i in progressbar.progressbar(range(generator.size()), prefix='Running network: '):
        test_class = generator.load_annotations(i)['test_class']
        print("------")
        if test_class == 'scratches':
            usemodel = unetmodel
            print("Using unetmodel")
            raw_image = generator.load_image(i)
            
            # Resize the image to 256x256 specifically for the U-Net
            image = cv2.resize(raw_image.copy(), unet_input_size)
            # Scale factors between the original image and the U-Net input, used later to map boxes back
            original_scale = (raw_image.shape[1]/unet_input_size[0], raw_image.shape[0]/unet_input_size[1])
            image = generator.preprocess_image(image)

            if keras.backend.image_data_format() == 'channels_first':
                image = image.transpose((2, 0, 1))

            # Run the U-Net model
            start = time.time()
            seg_mask = unetmodel.predict_on_batch(np.expand_dims(image, axis=0))[0,:,:,0]
            seg_mask_bin = (seg_mask > 0.4).astype(np.uint8)
            inference_time = time.time() - start

            # Extract detection boxes from the segmentation mask (contour detection)
            contours, _ = cv2.findContours(seg_mask_bin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            
            image_boxes = []
            image_scores = []
            image_labels = []
            # Adjust this label value to match your dataset (the class id for scratches)
            scratch_label = 1
            
            for cnt in contours:
                # Bounding rectangle of the contour
                x, y, w, h = cv2.boundingRect(cnt)
                # Map box coordinates back to the original image size
                x_original = int(x * original_scale[0])
                y_original = int(y * original_scale[1])
                x2_original = int((x + w) * original_scale[0])
                y2_original = int((y + h) * original_scale[1])
                image_boxes.append([x_original, y_original, x2_original, y2_original])
                # Use the mean probability over the segmented region as the detection score
                roi_score = np.mean(seg_mask[y:y+h, x:x+w])
                image_scores.append(roi_score)
                image_labels.append(scratch_label)
            
            # Convert to numpy arrays and handle the no-detections case
            image_boxes = np.array(image_boxes) if image_boxes else np.empty((0,4))
            image_scores = np.array(image_scores) if image_scores else np.empty((0,))
            image_labels = np.array(image_labels) if image_labels else np.empty((0,))
            
            # Filter out low-score results and sort
            if len(image_scores) > 0:
                indices = np.where(image_scores > score_threshold)[0]
                image_boxes = image_boxes[indices]
                image_scores = image_scores[indices]
                image_labels = image_labels[indices]
                # Sort by score in descending order and keep the top max_detections results
                scores_sort = np.argsort(-image_scores)[:max_detections]
                image_boxes = image_boxes[scores_sort]
                image_scores = image_scores[scores_sort]
                image_labels = image_labels[scores_sort]
            
            # Concatenate into the same detection format as the RetinaNet branch
            if len(image_boxes) > 0:
                image_detections = np.concatenate([image_boxes, np.expand_dims(image_scores, axis=1), np.expand_dims(image_labels, axis=1)], axis=1)
            else:
                image_detections = np.empty((0,6))
        else:
            usemodel = model
            print("Using retinanetmodel")
            raw_image = generator.load_image(i)
            image, scale = generator.resize_image(raw_image.copy())
            image = generator.preprocess_image(image)

            if keras.backend.image_data_format() == 'channels_first':
                image = image.transpose((2, 0, 1))

            # Run the RetinaNet model
            start = time.time()
            boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))[:3]
            inference_time = time.time() - start

            # Correct the boxes for the resize scale
            boxes /= scale

            # Filter out low-score results and sort
            indices = np.where(scores[0, :] > score_threshold)[0]
            scores = scores[0][indices]
            scores_sort = np.argsort(-scores)[:max_detections]
            image_boxes = boxes[0, indices[scores_sort], :]
            image_scores = scores[scores_sort]
            image_labels = labels[0, indices[scores_sort]]
            image_detections = np.concatenate([image_boxes, np.expand_dims(image_scores, axis=1), np.expand_dims(image_labels, axis=1)], axis=1)

        if save_path is not None:
            draw_annotations(raw_image, generator.load_annotations(i), label_to_name=generator.label_to_name)
            draw_detections(raw_image, image_boxes, image_scores, image_labels, label_to_name=generator.label_to_name, score_threshold=score_threshold)
            cv2.imwrite(os.path.join(save_path, '{}.png'.format(i)), raw_image)

        # Store the detections in all_detections
        for label in range(generator.num_classes()):
            if not generator.has_label(label):
                continue
            all_detections[i][label] = image_detections[image_detections[:, -1] == label, :-1]
        all_inferences[i] = inference_time

    return all_detections, all_inferences

Key modifications

  1. Input-size adaptation: a separate resize step for the U-Net scales images to 256x256, and the detection boxes are then mapped back to the original image size using the computed scale factors.
  2. Segmentation mask to detection boxes: OpenCV's findContours extracts the bounding rectangles of the segmented regions as detection boxes, with the mean probability over each region used as the detection score, so the result format matches the RetinaNet output.
  3. Empty-result handling: the case with no detected objects is handled explicitly, so array concatenation does not raise an error.
  4. Label matching: adjust the value of scratch_label to fit your dataset and keep it consistent with RetinaNet's label scheme; otherwise classes will not be matched correctly when mAP is computed later (see the usage sketch below).
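
As a reference point, here is a hedged usage sketch of how the modified function could be called. The model paths, backbone name and generator are placeholders, and generator is assumed to be your custom generator that exposes a 'test_class' field in load_annotations:

from keras.models import load_model
from keras_retinanet import models

# Placeholders: substitute your own converted inference model, U-Net weights and generator.
retinanet = models.load_model('path/to/retinanet_inference.h5', backbone_name='resnet50')
unet = load_model('path/to/unet_scratches.h5')

all_detections, all_inferences = _get_detections(
    generator,
    retinanet,
    unetmodel=unet,
    score_threshold=0.05,
    max_detections=100,
    save_path=None,
)

# If your generator follows the keras-retinanet interface, scratch_label can be
# derived from it instead of being hard-coded, e.g.:
# scratch_label = generator.name_to_label('scratches')

From there, the per-class detections can be matched against the ground-truth annotations to compute per-class average precision and mAP, in the same way keras-retinanet's evaluate function consumes the output of _get_detections.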

The question in this content comes from Stack Exchange; the original asker is Mansi.
