
Help: TypeError compatibility issue after switching face_recognition to the CNN model

Got it, let's break down why you're hitting this TypeError when switching to the CNN face detector in the face_recognition library, and fix it step by step.

What's Causing the Error?

When you use the default HOG detector, face_recognition.face_locations() returns simple coordinate tuples (top, right, bottom, left). But when you switch to the CNN model, it returns _dlib_pybind11.mmod_rectangle objects instead.

The problem is that the underlying dlib shape predictor (used by face_encodings()) expects a standard dlib.rectangle object, not an mmod_rectangle. That's exactly what the error message is telling you—you're passing the wrong rectangle type to the shape predictor.
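To make the mismatch concrete, here's a minimal pure-Python sketch. `FakeRectangle` and `FakeMmodRectangle` are stand-ins I'm defining to mimic dlib's interfaces (so no dlib install is needed); they show why code written for HOG-style tuples breaks on a CNN detection object:

```python
class FakeRectangle:
    """Stand-in for dlib.rectangle: coordinates exposed via methods, not indices."""
    def __init__(self, left, top, right, bottom):
        self._l, self._t, self._r, self._b = left, top, right, bottom
    def top(self): return self._t
    def right(self): return self._r
    def bottom(self): return self._b
    def left(self): return self._l

class FakeMmodRectangle:
    """Stand-in for dlib's mmod_rectangle: wraps a rectangle plus a confidence score."""
    def __init__(self, rect, confidence):
        self.rect = rect
        self.confidence = confidence

det = FakeMmodRectangle(FakeRectangle(10, 20, 110, 120), confidence=0.98)

# Tuple-style indexing, which the HOG code path relies on, raises TypeError here:
try:
    top = det[0]
except TypeError as e:
    print("TypeError:", e)

# The coordinates live on the wrapped rectangle instead:
print(det.rect.top(), det.rect.right(), det.rect.bottom(), det.rect.left())  # 20 110 120 10
```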

The Fix

We just need to convert those mmod_rectangle objects into the format face_encodings() actually accepts: plain (top, right, bottom, left) coordinate tuples. Each mmod_rectangle exposes its coordinates through its inner .rect attribute (a standard dlib.rectangle), and face_recognition converts the tuples back into dlib rectangles internally before calling the shape predictor.

Here's the updated code that works with the CNN model:

import cv2
import face_recognition

cap = cv2.VideoCapture(0)
Nichapa_im1 = face_recognition.load_image_file("C:\\Users\\ACER\\Desktop\\facetest\\data01\\Nichapa04.jpg")
A_encoding1 = face_recognition.face_encodings(Nichapa_im1)[0]
person_face_encodings = [A_encoding1]
person_face_names = ["NICHAPA"]
data_locations = []
data_encodings = []
data_names = []
frameProcess = True

while True:
    ret, frame = cap.read()
    if not ret:
        # No frame from the camera; stop instead of crashing on a None frame
        break
    resizing = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
    # cvtColor instead of [:, :, ::-1]: dlib can reject negative-stride array views
    rgb_resizing = cv2.cvtColor(resizing, cv2.COLOR_BGR2RGB)
    
    if frameProcess:
        # Explicitly use the CNN model for face detection
        data_locations = face_recognition.face_locations(rgb_resizing, model="cnn")
        
        # face_encodings() expects plain (top, right, bottom, left) tuples,
        # so pull the coordinates out of each mmod_rectangle's inner .rect
        converted_locations = [
            (loc.rect.top(), loc.rect.right(), loc.rect.bottom(), loc.rect.left())
            for loc in data_locations
        ]
        
        # Now pass the converted locations to face_encodings
        data_encodings = face_recognition.face_encodings(rgb_resizing, converted_locations)
        
        data_names = []
        for dc in data_encodings:
            matches = face_recognition.compare_faces(person_face_encodings, dc)
            name = "UNKNOWN"
            if True in matches:
                first_match_index = matches.index(True)
                name = person_face_names[first_match_index]
            data_names.append(name)
    
    frameProcess = not frameProcess
    
    # Adjust drawing logic to work with mmod_rectangle objects
    for loc, name in zip(data_locations, data_names):
        # Extract coordinates from the .rect attribute
        top = loc.rect.top() * 4
        right = loc.rect.right() * 4
        bottom = loc.rect.bottom() * 4
        left = loc.rect.left() * 4
        
        cv2.rectangle(frame, (left, top), (right, bottom), (255, 0, 0), 2)
        cv2.rectangle(frame, (left, bottom - 20), (right, bottom), (255, 0, 0), cv2.FILLED)
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(frame, name, (left + 6, bottom - 6), font, 0.7, (255, 255, 255), 2)
    
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('x'):
        break

cap.release()
cv2.destroyAllWindows()

Key Changes Explained

  1. Explicit CNN Model: We added model="cnn" to face_locations() to make sure we're using the CNN detector (you might have already done this in your api.py, but it's good to make it explicit here).
  2. Rectangle Conversion: We loop through each detected location and read the coordinates off its inner .rect attribute (a standard dlib.rectangle), building the (top, right, bottom, left) tuples that face_encodings() expects.
  3. Drawing Adjustment: Since data_locations now holds mmod_rectangle objects, we can't treat them like tuples anymore—we have to pull the coordinates from their .rect attribute before scaling and drawing.

Quick Note

Keep in mind that CNN face detection is more accurate than HOG, but it's also slower. If you're running this on a CPU, you might notice lag. For real-time use, consider using a GPU with dlib's CUDA support.
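Another CPU-side mitigation, already half-present via the frameProcess toggle in the loop above, is to run detection only on every Nth frame rather than every other frame. A minimal sketch of that gating pattern (pure Python; should_process is a hypothetical helper name):

```python
def should_process(frame_index, every_n=3):
    """Run the expensive CNN detection only on every Nth frame."""
    return frame_index % every_n == 0

# Which of the first 10 frames would trigger detection with every_n=3:
processed = [i for i in range(10) if should_process(i, every_n=3)]
print(processed)  # [0, 3, 6, 9]
```

Between detection frames you'd keep drawing the last known boxes, exactly as the existing loop does with data_locations and data_names.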

The question above was sourced from Stack Exchange; original asker: Bbest Nichapa
