Fusing object-detection and semantic-segmentation deep learning models for defect detection: mAP evaluation and debugging a runtime error
Problem analysis and code changes
You are hitting two distinct problems: an input-shape mismatch and an unpacking error on the U-Net output. Let's break them down and fix each one:
1. Input shape mismatch
The first warning says your U-Net was built for 256x256 inputs, but you are feeding it 800x800 images, so the model cannot run correctly. The fix is to resize scratches-class images separately to the 256x256 the U-Net expects, rather than reusing RetinaNet's resize logic.
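The ratio-based mapping used below can be sketched with plain numpy (the resize itself would use `cv2.resize`; the 800x800 and 256x256 sizes come from your setup, and the example box is made up):

```python
import numpy as np

original_size = (800, 800)      # (height, width) of the raw image
unet_input_size = (256, 256)    # (width, height) the U-Net expects

# Per-axis scale factors from U-Net coordinates back to the original image
scale_x = original_size[1] / unet_input_size[0]
scale_y = original_size[0] / unet_input_size[1]

# A box predicted on the 256x256 mask, as [x1, y1, x2, y2]
box_256 = np.array([10, 20, 50, 60], dtype=np.float64)
box_original = box_256 * np.array([scale_x, scale_y, scale_x, scale_y])
```

The two scale factors are kept separate so the mapping still works for non-square images.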
2. U-Net output unpacking error
This is what actually triggers the traceback: U-Net is a semantic segmentation model, so its output is a segmentation mask of shape (1, 256, 256, 1), not the boxes, scores, labels triple returned by a detection model. You cannot unpack the segmentation result into three variables; you need to derive detection boxes from the mask instead.
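A minimal numpy-only illustration of the idea, using a synthetic mask with a single bright region (the full code below uses `cv2.findContours` instead, which also handles multiple disconnected regions):

```python
import numpy as np

# A synthetic (1, 256, 256, 1) "U-Net output" with one defect region
seg_output = np.zeros((1, 256, 256, 1), dtype=np.float32)
seg_output[0, 40:60, 100:140, 0] = 0.9

# boxes, scores, labels = seg_output   # WRONG: this unpacks axis 0, not three tensors
mask = seg_output[0, :, :, 0] > 0.4    # binarize the probability map
ys, xs = np.nonzero(mask)
box = [xs.min(), ys.min(), xs.max() + 1, ys.max() + 1]   # [x1, y1, x2, y2]
score = float(seg_output[0, :, :, 0][mask].mean())       # mean probability as score
```

The key point is that boxes and scores are *derived* from the mask, then packaged into the same format the detection branch produces.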
Revised code
To address both problems, I modified the U-Net branch of the _get_detections function, adding logic that extracts detection boxes from the segmentation mask via OpenCV contour detection:

```python
def _get_detections(generator, model, unetmodel=None, score_threshold=0.05, max_detections=100, save_path=None):
    """ Get the detections from the model using the generator.

    The result is a list of lists such that the size is:
        all_detections[num_images][num_classes] = detections[num_detections, 4 + num_classes]

    # Arguments
        generator       : The generator used to run images through the model.
        model           : The model to run on the images.
        score_threshold : The score confidence threshold to use.
        max_detections  : The maximum number of detections to use per image.
        save_path       : The path to save the images with visualized detections to.
    # Returns
        A list of lists containing the detections for each image in the generator.
    """
    all_detections = [[None for i in range(generator.num_classes()) if generator.has_label(i)] for j in range(generator.size())]
    all_inferences = [None for i in range(generator.size())]

    # Input size the U-Net was built for
    unet_input_size = (256, 256)

    for i in progressbar.progressbar(range(generator.size()), prefix='Running network: '):
        test_class = generator.load_annotations(i)['test_class']
        print("------")
        if test_class == 'scratches':
            usemodel = unetmodel
            print("Using unetmodel")
            raw_image = generator.load_image(i)
            # Resize the image to 256x256 specifically for the U-Net
            image = cv2.resize(raw_image.copy(), unet_input_size)
            # Ratio between the original image and the U-Net input,
            # used later to map the boxes back to the original size
            original_scale = (raw_image.shape[1] / unet_input_size[0],
                              raw_image.shape[0] / unet_input_size[1])
            image = generator.preprocess_image(image)
            if keras.backend.image_data_format() == 'channels_first':
                image = image.transpose((2, 0, 1))

            # Run the U-Net
            start = time.time()
            seg_mask = unetmodel.predict_on_batch(np.expand_dims(image, axis=0))[0, :, :, 0]
            seg_mask_bin = (seg_mask > 0.4).astype(np.uint8)
            inference_time = time.time() - start

            # Extract detection boxes from the segmentation mask (contour detection)
            contours, _ = cv2.findContours(seg_mask_bin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            image_boxes = []
            image_scores = []
            image_labels = []
            # Adjust this to your dataset (the class id assigned to scratches)
            scratch_label = 1
            for cnt in contours:
                # Bounding rectangle of the contour
                x, y, w, h = cv2.boundingRect(cnt)
                # Map the box coordinates back to the original image size
                x_original = int(x * original_scale[0])
                y_original = int(y * original_scale[1])
                x2_original = int((x + w) * original_scale[0])
                y2_original = int((y + h) * original_scale[1])
                image_boxes.append([x_original, y_original, x2_original, y2_original])
                # Use the mean probability over the segmented region as the score
                roi_score = np.mean(seg_mask[y:y + h, x:x + w])
                image_scores.append(roi_score)
                image_labels.append(scratch_label)

            # Convert to numpy arrays, handling the empty-detection case
            image_boxes = np.array(image_boxes) if image_boxes else np.empty((0, 4))
            image_scores = np.array(image_scores) if image_scores else np.empty((0,))
            image_labels = np.array(image_labels) if image_labels else np.empty((0,))

            # Filter out low-scoring results and sort
            if len(image_scores) > 0:
                indices = np.where(image_scores > score_threshold)[0]
                image_boxes = image_boxes[indices]
                image_scores = image_scores[indices]
                image_labels = image_labels[indices]

                # Sort by descending score and keep the top max_detections
                scores_sort = np.argsort(-image_scores)[:max_detections]
                image_boxes = image_boxes[scores_sort]
                image_scores = image_scores[scores_sort]
                image_labels = image_labels[scores_sort]

            # Assemble the detections in the same format RetinaNet produces
            # (done outside the if-block so image_detections is always defined)
            if len(image_boxes) > 0:
                image_detections = np.concatenate([image_boxes,
                                                   np.expand_dims(image_scores, axis=1),
                                                   np.expand_dims(image_labels, axis=1)], axis=1)
            else:
                image_detections = np.empty((0, 6))
        else:
            usemodel = model
            print("Using retinanetmodel")
            raw_image = generator.load_image(i)
            image, scale = generator.resize_image(raw_image.copy())
            image = generator.preprocess_image(image)
            if keras.backend.image_data_format() == 'channels_first':
                image = image.transpose((2, 0, 1))

            # Run RetinaNet
            start = time.time()
            boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))[:3]
            inference_time = time.time() - start

            # Correct the box coordinates for the resize scale
            boxes /= scale

            # Filter out low-scoring results and sort
            indices = np.where(scores[0, :] > score_threshold)[0]
            scores = scores[0][indices]
            scores_sort = np.argsort(-scores)[:max_detections]
            image_boxes = boxes[0, indices[scores_sort], :]
            image_scores = scores[scores_sort]
            image_labels = labels[0, indices[scores_sort]]
            image_detections = np.concatenate([image_boxes,
                                               np.expand_dims(image_scores, axis=1),
                                               np.expand_dims(image_labels, axis=1)], axis=1)

        if save_path is not None:
            draw_annotations(raw_image, generator.load_annotations(i), label_to_name=generator.label_to_name)
            draw_detections(raw_image, image_boxes, image_scores, image_labels,
                            label_to_name=generator.label_to_name, score_threshold=score_threshold)
            cv2.imwrite(os.path.join(save_path, '{}.png'.format(i)), raw_image)

        # Store the detections in all_detections
        for label in range(generator.num_classes()):
            if not generator.has_label(label):
                continue
            all_detections[i][label] = image_detections[image_detections[:, -1] == label, :-1]
        all_inferences[i] = inference_time

    return all_detections, all_inferences
```
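Both branches share the same filter-then-top-k post-processing. A standalone numpy sketch with made-up boxes and scores shows the shape of the result:

```python
import numpy as np

score_threshold = 0.05
max_detections = 2

# Hypothetical detections: three boxes as [x1, y1, x2, y2] with scores and labels
image_boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 3, 3]], dtype=np.float64)
image_scores = np.array([0.9, 0.02, 0.6])
image_labels = np.array([1.0, 1.0, 1.0])

# Drop low-confidence detections
indices = np.where(image_scores > score_threshold)[0]
# Sort the survivors by descending score and keep the top max_detections
scores_sort = np.argsort(-image_scores[indices])[:max_detections]
keep = indices[scores_sort]

# (num_detections, 6) array: [x1, y1, x2, y2, score, label]
image_detections = np.concatenate([image_boxes[keep],
                                   np.expand_dims(image_scores[keep], axis=1),
                                   np.expand_dims(image_labels[keep], axis=1)], axis=1)
```

Each row ends with the label, which is what the final per-class slicing (`image_detections[:, -1] == label`) relies on.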
Key changes
- Input size adaptation: the U-Net branch gets its own resize to 256x256, and the predicted boxes are mapped back to the original image size using the resize ratio.
- Mask-to-box conversion: OpenCV's findContours extracts each segmented region's bounding rectangle as a detection box, and the mean probability over the region serves as the detection score, so the output format matches RetinaNet's.
- Empty-result handling: the no-detection case is handled explicitly so the array concatenation does not raise.
- Label matching: adjust scratch_label to your dataset so it agrees with RetinaNet's label ids; otherwise classes will not match up when you compute mAP later.
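The empty-result path can be checked in isolation: with the `np.empty` placeholders, the concatenation and the per-class slicing both degrade gracefully instead of raising.

```python
import numpy as np

# What the U-Net branch produces when findContours returns no contours
image_boxes = np.empty((0, 4))
image_scores = np.empty((0,))
image_labels = np.empty((0,))

# Concatenating zero-row arrays is valid and yields a (0, 6) result
image_detections = np.concatenate([image_boxes,
                                   np.expand_dims(image_scores, axis=1),
                                   np.expand_dims(image_labels, axis=1)], axis=1)

# Per-class filtering on an empty array yields an empty (0, 5) slice, not an error
label = 1
per_class = image_detections[image_detections[:, -1] == label, :-1]
```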
The question comes from Stack Exchange, asked by Mansi.




