How do I feed an OpenCV video stream into PyTorch for real-time YOLOv3 object detection?

Fixing real-time webcam detection with YOLOv3

Let me walk through these problems one by one:
1. First error: missing 1 required positional argument: 'self'

The root cause is that you are passing the wrong argument to the streaming function. Your main code already instantiates the model and loads its weights, but the function call passes the YOLOv3 class itself instead of that instance.

Look at your main code:
```python
# You already created the correct model instance
model = YOLOv3(in_channels=3, num_classes=20).to(config.DEVICE)
model.load_state_dict(checkpoint["state_dict"])
```
But your function call is:

```python
streaming(YOLOv3, 0.6, 0.6, config.ANCHORS, ip_camera)
```
So inside the function, model is the class, not an instance, and calling model.eval() naturally fails with a missing self argument. The correct call is:

```python
streaming(model, 0.6, 0.6, config.ANCHORS, ip_camera, outputFile)
```
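The class-vs-instance mistake can be reproduced in isolation. This is a minimal sketch with a hypothetical stub class (not the author's YOLOv3), just to show why calling a method on the class itself triggers the error:

```python
class YOLOv3Stub:
    """Hypothetical stand-in for the real YOLOv3 class."""
    def eval(self):
        return self  # the real nn.Module.eval() also returns the module

# Passing the class itself: eval() is unbound, so nothing fills the `self` slot
try:
    YOLOv3Stub.eval()  # TypeError: missing 1 required positional argument: 'self'
except TypeError as err:
    print(err)

# Passing an instance binds `self` automatically, so the call succeeds
model = YOLOv3Stub()
model.eval()
```

This is exactly what happens inside streaming when YOLOv3 is passed instead of model.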
2. Wrong input tensor dimensions (RuntimeError: Boolean value of Tensor...)
Your frame-handling logic is wrong on two counts:

- for frame in image iterates over the rows of a single image, so each iteration yields one row of pixel data, not a complete frame.
- YOLOv3 expects a tensor of shape (batch_size, channels, height, width), while OpenCV returns frames in (height, width, channels) HWC layout. You need to convert HWC to CHW, add a batch dimension, and normalize the pixel values.
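The preprocessing step can be sketched on its own. This uses NumPy with a dummy random frame standing in for a captured one; the PyTorch equivalent is noted in the comments:

```python
import numpy as np

# A dummy BGR frame shaped the way cv2.VideoCapture.read() returns it: HWC
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)

# HWC -> CHW, scale pixel values to [0, 1], then add a batch dimension
chw = frame.transpose(2, 0, 1).astype(np.float32) / 255.0
batched = chw[np.newaxis, ...]

print(batched.shape)  # (1, 3, 480, 640)

# With PyTorch, the equivalent one-liner is:
#   torch.from_numpy(frame.transpose((2, 0, 1))).float().div(255.0).unsqueeze(0)
```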
3. Broken video-writing logic

You create a new cv2.VideoWriter on every loop iteration, which keeps overwriting the output file. Also, frame.write(nms_boxes) is wrong on both ends: nms_boxes is a list of detection boxes, not an image, and write() belongs to the writer, not the frame. You need to draw the detection boxes onto the original frame first, then write the annotated frame.
The corrected streaming function in full
```python
def streaming(model, thresh, iou_thresh, anchors, ip_camera, outputFile):
    stream = cv2.VideoCapture(ip_camera)

    # Helpful message when the connection fails
    if not stream.isOpened():
        print('Not opened.')
        print('Please ensure the following:')
        print('1. DroidCam is not running in your browser.')
        print('2. The IP address given is correct.')
        return

    # Original video dimensions and frame rate
    orig_width = int(stream.get(cv2.CAP_PROP_FRAME_WIDTH))
    orig_height = int(stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(stream.get(cv2.CAP_PROP_FPS)) or 30  # fall back to 30 if unavailable

    # Initialize the video writer ONCE, outside the loop, so the file
    # is not recreated and overwritten on every frame
    fourcc = cv2.VideoWriter_fourcc(*'MJPG')
    out_writer = cv2.VideoWriter(outputFile, fourcc, fps, (orig_width, orig_height))

    # YOLOv3 input size
    input_size = 416

    # Prepare the anchors once
    anchors = torch.tensor(anchors).to(config.DEVICE)

    # Put the model in evaluation mode once
    model.eval()

    while stream.isOpened():
        ret, frame = stream.read()
        if not ret:
            break  # read failure or end of stream

        # Keep the original frame for drawing
        orig_frame = frame.copy()

        # Preprocess: resize, HWC -> CHW, normalize, add a batch dimension
        resized_frame = cv2.resize(frame, (input_size, input_size),
                                   interpolation=cv2.INTER_AREA)
        img_tensor = torch.from_numpy(
            resized_frame.transpose((2, 0, 1))).float() / 255.0
        img_tensor = img_tensor.unsqueeze(0).to(config.DEVICE)

        # Inference with gradients disabled for speed
        with torch.no_grad():
            out = model(img_tensor)
            bboxes = [[] for _ in range(img_tensor.shape[0])]
            for i in range(len(out)):  # the three output scales
                batch_size, A, S, _, _ = out[i].shape
                anchor = anchors[i]
                boxes_scale_i = cells_to_bboxes(
                    out[i], anchor, S=S, is_preds=True
                )
                for idx, box in enumerate(boxes_scale_i):
                    bboxes[idx] += box

        # NMS to keep the best boxes
        nms_boxes = non_max_suppression(
            bboxes[0], iou_threshold=iou_thresh,
            threshold=thresh, box_format="midpoint"
        )

        # Draw boxes on the original frame (map input-size coords back)
        for box in nms_boxes:
            # Assumes [x1, y1, x2, y2, score, class]; convert first if
            # your boxes are in midpoint format
            x1 = int(box[0] * orig_width / input_size)
            y1 = int(box[1] * orig_height / input_size)
            x2 = int(box[2] * orig_width / input_size)
            y2 = int(box[3] * orig_height / input_size)
            score = box[4]
            cls_idx = int(box[5])

            # Draw the rectangle and the class label
            cv2.rectangle(orig_frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            text = f"{config.CLASSES[cls_idx]}: {score:.2f}"
            cv2.putText(orig_frame, text, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

        # Show and write the annotated frame
        cv2.imshow('YOLOv3 Real-Time Detection', orig_frame)
        out_writer.write(orig_frame)

        # Press q to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release all resources
    stream.release()
    out_writer.release()
    cv2.destroyAllWindows()
```
Additional notes
- You don't need to convert the model to ONNX; running inference directly in PyTorch is perfectly fine unless you have cross-platform deployment requirements.
- model.eval() and the anchor preprocessing only need to run once; moving them outside the loop improves throughput considerably.
- Always map box coordinates from the 416x416 input size back to the original frame size, or the boxes will be drawn in completely wrong positions.
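The coordinate mapping in the last bullet can be factored into a small helper. This is a sketch with a hypothetical scale_box function, assuming corner-format (x1, y1, x2, y2) boxes:

```python
def scale_box(box, input_size, orig_w, orig_h):
    """Map corner coordinates from the model's input size back to the original frame."""
    x1, y1, x2, y2 = box
    return (int(x1 * orig_w / input_size),
            int(y1 * orig_h / input_size),
            int(x2 * orig_w / input_size),
            int(y2 * orig_h / input_size))

# A box detected at (100, 50, 300, 200) on the 416x416 input,
# mapped onto a 1280x720 frame:
print(scale_box((100, 50, 300, 200), 416, 1280, 720))  # (307, 86, 923, 346)
```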
The question comes from Stack Exchange; the original asker is Simone De Bellis.




