How do I feed an OpenCV video stream into PyTorch for real-time YOLOv3 object detection?

Fixing real-time webcam detection with YOLOv3

Let me walk through these problems one by one:
1. First error: missing 1 required positional argument: 'self'

The root cause is that you are passing the wrong argument to the streaming function. Your main code already instantiates the model and loads its weights, but the function call passes the YOLOv3 class itself instead of that instance.

Look at your main code:
```python
# You already created the correct model instance
model = YOLOv3(in_channels=3, num_classes=20).to(config.DEVICE)
model.load_state_dict(checkpoint["state_dict"])
```
But your function call is:

```python
streaming(YOLOv3, 0.6, 0.6, config.ANCHORS, ip_camera)
```
So inside the function, model is the class, not an instance, and calling model.eval() naturally fails with a missing self argument. The correct call is:

```python
streaming(model, 0.6, 0.6, config.ANCHORS, ip_camera, outputFile)
```
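The class-vs-instance mistake can be reproduced in isolation. This is a minimal sketch with a hypothetical stub class (not the author's YOLOv3), just to show why calling a method on the class itself triggers the error:

```python
class YOLOv3Stub:
    """Hypothetical stand-in for the real YOLOv3 class."""
    def eval(self):
        return self  # the real nn.Module.eval() also returns the module

# Passing the class itself: eval() is unbound, so nothing fills the `self` slot
try:
    YOLOv3Stub.eval()  # TypeError: missing 1 required positional argument: 'self'
except TypeError as err:
    print(err)

# Passing an instance binds `self` automatically, so the call succeeds
model = YOLOv3Stub()
model.eval()
```

This is exactly what happens inside streaming when YOLOv3 is passed instead of model.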
2. Wrong input tensor dimensions (RuntimeError: Boolean value of Tensor...)
Your frame-handling logic is wrong on two counts:

- for frame in image iterates over the rows of a single image, so each iteration yields one row of pixel data, not a complete frame.
- YOLOv3 expects a tensor of shape (batch_size, channels, height, width), while OpenCV returns frames in (height, width, channels) HWC layout. You need to convert HWC to CHW, add a batch dimension, and normalize the pixel values.
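The preprocessing step can be sketched on its own. This uses NumPy with a dummy random frame standing in for a captured one; the PyTorch equivalent is noted in the comments:

```python
import numpy as np

# A dummy BGR frame shaped the way cv2.VideoCapture.read() returns it: HWC
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)

# HWC -> CHW, scale pixel values to [0, 1], then add a batch dimension
chw = frame.transpose(2, 0, 1).astype(np.float32) / 255.0
batched = chw[np.newaxis, ...]

print(batched.shape)  # (1, 3, 480, 640)

# With PyTorch, the equivalent one-liner is:
#   torch.from_numpy(frame.transpose((2, 0, 1))).float().div(255.0).unsqueeze(0)
```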
3. Broken video-writing logic

You create a new cv2.VideoWriter on every loop iteration, which keeps overwriting the output file. Also, frame.write(nms_boxes) is wrong on both ends: nms_boxes is a list of detection boxes, not an image, and write() belongs to the writer, not the frame. You need to draw the detection boxes onto the original frame first, then write the annotated frame.
The corrected streaming function in full
```python
def streaming(model, thresh, iou_thresh, anchors, ip_camera, outputFile):
    stream = cv2.VideoCapture(ip_camera)

    # Helpful message when the connection fails
    if not stream.isOpened():
        print('Not opened.')
        print('Please ensure the following:')
        print('1. DroidCam is not running in your browser.')
        print('2. The IP address given is correct.')
        return

    # Original video dimensions and frame rate
    orig_width = int(stream.get(cv2.CAP_PROP_FRAME_WIDTH))
    orig_height = int(stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(stream.get(cv2.CAP_PROP_FPS)) or 30  # fall back to 30 if unavailable

    # Initialize the video writer ONCE, outside the loop, so the file
    # is not recreated and overwritten on every frame
    fourcc = cv2.VideoWriter_fourcc(*'MJPG')
    out_writer = cv2.VideoWriter(outputFile, fourcc, fps, (orig_width, orig_height))

    # YOLOv3 input size
    input_size = 416

    # Prepare the anchors once
    anchors = torch.tensor(anchors).to(config.DEVICE)

    # Put the model in evaluation mode once
    model.eval()

    while stream.isOpened():
        ret, frame = stream.read()
        if not ret:
            break  # read failure or end of stream

        # Keep the original frame for drawing
        orig_frame = frame.copy()

        # Preprocess: resize, HWC -> CHW, normalize, add a batch dimension
        resized_frame = cv2.resize(frame, (input_size, input_size),
                                   interpolation=cv2.INTER_AREA)
        img_tensor = torch.from_numpy(
            resized_frame.transpose((2, 0, 1))).float() / 255.0
        img_tensor = img_tensor.unsqueeze(0).to(config.DEVICE)

        # Inference with gradients disabled for speed
        with torch.no_grad():
            out = model(img_tensor)
            bboxes = [[] for _ in range(img_tensor.shape[0])]
            for i in range(len(out)):  # the three output scales
                batch_size, A, S, _, _ = out[i].shape
                anchor = anchors[i]
                boxes_scale_i = cells_to_bboxes(
                    out[i], anchor, S=S, is_preds=True
                )
                for idx, box in enumerate(boxes_scale_i):
                    bboxes[idx] += box

        # NMS to keep the best boxes
        nms_boxes = non_max_suppression(
            bboxes[0], iou_threshold=iou_thresh,
            threshold=thresh, box_format="midpoint"
        )

        # Draw boxes on the original frame (map input-size coords back)
        for box in nms_boxes:
            # Assumes [x1, y1, x2, y2, score, class]; convert first if
            # your boxes are in midpoint format
            x1 = int(box[0] * orig_width / input_size)
            y1 = int(box[1] * orig_height / input_size)
            x2 = int(box[2] * orig_width / input_size)
            y2 = int(box[3] * orig_height / input_size)
            score = box[4]
            cls_idx = int(box[5])

            # Draw the rectangle and the class label
            cv2.rectangle(orig_frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            text = f"{config.CLASSES[cls_idx]}: {score:.2f}"
            cv2.putText(orig_frame, text, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

        # Show and write the annotated frame
        cv2.imshow('YOLOv3 Real-Time Detection', orig_frame)
        out_writer.write(orig_frame)

        # Press q to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release all resources
    stream.release()
    out_writer.release()
    cv2.destroyAllWindows()
```
Additional notes
- You don't need to convert the model to ONNX; running inference directly in PyTorch is perfectly fine unless you have cross-platform deployment requirements.
- model.eval() and the anchor preprocessing only need to run once; moving them outside the loop improves throughput considerably.
- Always map box coordinates from the 416x416 input size back to the original frame size, or the boxes will be drawn in completely wrong positions.
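The coordinate mapping in the last bullet can be factored into a small helper. This is a sketch with a hypothetical scale_box function, assuming corner-format (x1, y1, x2, y2) boxes:

```python
def scale_box(box, input_size, orig_w, orig_h):
    """Map corner coordinates from the model's input size back to the original frame."""
    x1, y1, x2, y2 = box
    return (int(x1 * orig_w / input_size),
            int(y1 * orig_h / input_size),
            int(x2 * orig_w / input_size),
            int(y2 * orig_h / input_size))

# A box detected at (100, 50, 300, 200) on the 416x416 input,
# mapped onto a 1280x720 frame:
print(scale_box((100, 50, 300, 200), 416, 1280, 720))  # (307, 86, 923, 346)
```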
The question comes from Stack Exchange; the original asker is Simone De Bellis.




