How do I integrate object classification with gender and emotion recognition models in OpenCV?

阿华AIGC实验室

2026-5-28

Integrating Object Detection with Gender & Emotion Recognition Models

Let's break down how to merge your two OpenCV-based pipelines into a single workflow. The core idea is to run your MobileNet SSD model first to detect objects, then run your gender/emotion recognition pipeline only on the regions classified as person. This saves computation and avoids processing non-human objects. Here's a step-by-step solution with full integrated code:

Key Integration Logic

Run Object Detection First: Use MobileNet SSD to detect all objects, then pick out the person detections for further processing.
Focus on Person Regions: For each detected person, extract their bounding box from the image.
Detect Faces in Person Regions: Use the Haar cascade face detector to find faces within the person's area (more efficient than scanning the whole image).
Predict Gender & Emotion: Pass each detected face through your pre-trained gender and emotion models.
Draw All Results: Overlay object boxes, face boxes, and gender/emotion labels onto the original image.
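
The first two steps can be sketched in isolation. This is a minimal sketch, not part of the original answer: the helper name person_boxes is hypothetical, and it assumes the detections array has the (1, 1, N, 7) layout that net.forward() returns for this SSD model, with box coordinates normalised to the 0..1 range. The synthetic array stands in for real network output.

```python
import numpy as np

PERSON_CLASS_ID = 15  # index of "person" in the CLASSES list used in this answer


def person_boxes(detections, width, height, min_confidence=0.2):
    """Return (confidence, (startX, startY, endX, endY)) for each person detection.

    Assumes each detection row is [_, class_id, confidence, x1, y1, x2, y2]
    with coordinates normalised to 0..1 (hypothetical helper, for illustration).
    """
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        class_id = int(detections[0, 0, i, 1])
        if confidence <= min_confidence or class_id != PERSON_CLASS_ID:
            continue
        box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])
        start_x, start_y, end_x, end_y = box.astype("int")
        # Clip to the image so that later slicing never uses negative indices
        start_x, start_y = max(0, int(start_x)), max(0, int(start_y))
        end_x, end_y = min(width, int(end_x)), min(height, int(end_y))
        if end_x > start_x and end_y > start_y:
            results.append((confidence, (start_x, start_y, end_x, end_y)))
    return results


# Synthetic detections: one person, one dog, one low-confidence person
fake = np.zeros((1, 1, 3, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 15, 0.90, 0.10, 0.20, 0.50, 0.80]
fake[0, 0, 1] = [0, 12, 0.95, 0.00, 0.00, 0.30, 0.30]
fake[0, 0, 2] = [0, 15, 0.05, 0.40, 0.40, 0.60, 0.60]
boxes = person_boxes(fake, width=200, height=100)
print(boxes)
```

Only the first row survives: the dog is dropped by class and the second person by confidence.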

Full Integrated Code

import numpy as np
import argparse
import cv2
from keras.models import load_model
from utils.datasets import get_labels
from utils.inference import detect_faces
from utils.inference import draw_text
from utils.inference import draw_bounding_box
from utils.inference import apply_offsets
from utils.inference import load_detection_model
from utils.inference import load_image
from utils.preprocessor import preprocess_input

# ----------------------
# Argument Parsing (Unified for Both Pipelines)
# ----------------------
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to input image")
# MobileNet SSD parameters
ap.add_argument("-p", "--prototxt", required=True, help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True, help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2, help="minimum probability to filter weak object detections")
# Gender/Emotion model paths
ap.add_argument("-fd", "--face-detector", default="../trained_models/detection_models/haarcascade_frontalface_default.xml", 
                help="path to Haar cascade face detector")
ap.add_argument("-em", "--emotion-model", default="../trained_models/emotion_models/fer2013_mini_XCEPTION.102-0.66.hdf5",
                help="path to pre-trained emotion model")
ap.add_argument("-gm", "--gender-model", default="../trained_models/gender_models/simple_CNN.81-0.96.hdf5",
                help="path to pre-trained gender model")
args = vars(ap.parse_args())

# ----------------------
# Load All Models Once
# ----------------------
# MobileNet SSD for general object detection
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

print("[INFO] loading object detection model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# Gender/Emotion recognition stack
print("[INFO] loading face detection and emotion/gender models...")
face_detection = load_detection_model(args["face_detector"])
emotion_classifier = load_model(args["emotion_model"], compile=False)
gender_classifier = load_model(args["gender_model"], compile=False)

emotion_labels = get_labels('fer2013')
gender_labels = get_labels('imdb')
font = cv2.FONT_HERSHEY_SIMPLEX

# Hyperparameters for face preprocessing (matching your original code)
gender_offsets = (10, 10)
emotion_offsets = (0, 0)
emotion_target_size = emotion_classifier.input_shape[1:3]
gender_target_size = gender_classifier.input_shape[1:3]

# ----------------------
# Process Input Image
# ----------------------
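image = cv2.imread(args["image"])
# cv2.imread returns None instead of raising when the file cannot be read
if image is None:
    raise SystemExit("[ERROR] could not read image: {}".format(args["image"]))
(h, w) = image.shape[:2]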
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Run object detection pipeline
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5)
print("[INFO] computing object detections...")
net.setInput(blob)
detections = net.forward()

# ----------------------
# Process Each Detection
# ----------------------
for i in np.arange(0, detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    
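    # Filter out weak detections
    if confidence > args["confidence"]:
        idx = int(detections[0, 0, i, 1])
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        # Clip the box to the image so the slices below never use negative indices
        startX, startY = max(0, startX), max(0, startY)
        endX, endY = min(w, endX), min(h, endY)
        if endX <= startX or endY <= startY:
            continue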
        
        # Draw label for non-person objects
        if CLASSES[idx] != "person":
            label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
            cv2.rectangle(image, (startX, startY), (endX, endY), COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(image, label, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
            continue
        
        # Draw person bounding box
        label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
        cv2.rectangle(image, (startX, startY), (endX, endY), COLORS[idx], 2)
        y = startY - 15 if startY - 15 > 15 else startY + 15
        cv2.putText(image, label, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
        
        # Extract person region for face detection
        person_gray = gray_image[startY:endY, startX:endX]
        person_rgb = rgb_image[startY:endY, startX:endX]
        
        # Detect faces within the person's region
        faces = detect_faces(face_detection, person_gray)
        for (face_x, face_y, face_w, face_h) in faces:
            # Adjust face coordinates to match original image
            face_x_global = startX + face_x
            face_y_global = startY + face_y
            face_endX = face_x_global + face_w
            face_endY = face_y_global + face_h
            
            # Crop the face for each model with its own offsets, then resize the crop
            # to that model's input size. This assumes apply_offsets(face_coordinates,
            # offsets) returns (x1, x2, y1, y2), as in the utils this code builds on.
            face_coordinates = (face_x_global, face_y_global, face_w, face_h)
            x1, x2, y1, y2 = apply_offsets(face_coordinates, emotion_offsets)
            gray_face = gray_image[max(0, y1):y2, max(0, x1):x2]
            x1, x2, y1, y2 = apply_offsets(face_coordinates, gender_offsets)
            rgb_face = rgb_image[max(0, y1):y2, max(0, x1):x2]

            # Predict emotion from the grayscale crop
            gray_face = cv2.resize(gray_face, emotion_target_size)
            gray_face = preprocess_input(gray_face, True)
            gray_face = np.expand_dims(gray_face, 0)
            gray_face = np.expand_dims(gray_face, -1)
            emotion_prediction = emotion_classifier.predict(gray_face)
            emotion_text = emotion_labels[np.argmax(emotion_prediction)]

            # Predict gender from the RGB crop
            rgb_face = cv2.resize(rgb_face, gender_target_size)
            rgb_face = preprocess_input(rgb_face, False)
            rgb_face = np.expand_dims(rgb_face, 0)
            gender_prediction = gender_classifier.predict(rgb_face)
            gender_text = gender_labels[np.argmax(gender_prediction)]
            
            # Draw face box and labels
            draw_bounding_box((face_x_global, face_y_global, face_w, face_h), image, (255, 0, 0))
            # Position gender text above face box (cv2.putText is called directly so the
            # call does not depend on the argument order of the draw_text helper)
            gender_y = face_y_global - 30 if face_y_global - 30 > 30 else face_y_global + 30
            cv2.putText(image, gender_text, (face_x_global, gender_y), font, 0.5, (0, 255, 0), 1)
            # Position emotion text below face box
            emotion_y = face_endY + 20 if face_endY + 20 < h - 20 else face_endY - 20
            cv2.putText(image, emotion_text, (face_x_global, emotion_y), font, 0.5, (0, 0, 255), 1)

# Show final combined output
cv2.imshow("Combined Detection Output", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
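
The coordinate bookkeeping in the face loop is the easiest part to get wrong, so here it is as a standalone sketch. The function face_box_in_image is a hypothetical helper, not part of the utils folder. It assumes the face detector reports (x, y, w, h) relative to the person crop and that the offsets are a symmetric margin around the face.

```python
def face_box_in_image(person_box, face_box, offsets, image_size):
    """Map a face box found inside a person crop to full-image slice bounds.

    person_box: (startX, startY, endX, endY) of the person in the full image
    face_box:   (x, y, w, h) relative to the person crop
    offsets:    (x_off, y_off) extra margin added around the face
    image_size: (width, height) of the full image
    Returns (x1, y1, x2, y2), clipped to the image borders.
    """
    start_x, start_y, _, _ = person_box
    x, y, w, h = face_box
    x_off, y_off = offsets
    width, height = image_size
    # Shift by the person's top-left corner, widen by the offsets, then clip
    x1 = max(0, start_x + x - x_off)
    y1 = max(0, start_y + y - y_off)
    x2 = min(width, start_x + x + w + x_off)
    y2 = min(height, start_y + y + h + y_off)
    return x1, y1, x2, y2


inner = face_box_in_image((100, 50, 300, 400), (40, 20, 60, 60), (10, 10), (640, 480))
edge = face_box_in_image((0, 0, 200, 200), (5, 5, 50, 50), (10, 10), (640, 480))
print(inner)  # (130, 60, 210, 140)
print(edge)   # (0, 0, 65, 65)
```

The second call shows why the clipping matters: without it the offsets would push the slice start to -5, and a negative index would silently select the wrong region.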

Important Notes

Path Adjustments: Double-check that all model paths (MobileNet prototxt/model, Haar cascade, emotion/gender models) match your local file structure.
Confidence Tuning: Adjust the --confidence argument to filter out weaker object detections. You can also tweak Haar cascade parameters if face detection needs refinement.
Performance Boost: By limiting gender/emotion processing to person regions, you cut down on unnecessary computation compared to scanning the entire image.
Helper Files: Ensure the utils folder (with datasets.py, inference.py, etc.) from your original gender/emotion code is present in your project directory.
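
For the path check in the first note, a small pre-flight test can report every missing file at once before any model is loaded. This is a sketch under assumptions: missing_model_files is a hypothetical helper, and the file names in the demo are placeholders rather than real model files.

```python
import tempfile
from pathlib import Path


def missing_model_files(paths):
    """Return the entries of `paths` (name -> path) that are not existing files."""
    return {name: path for name, path in paths.items() if not Path(path).is_file()}


# Demo: one path that exists (a temporary file) and one that does not
with tempfile.NamedTemporaryFile(suffix=".prototxt") as existing:
    report = missing_model_files({
        "prototxt": existing.name,
        "model": "no/such/file.caffemodel",
    })
print(report)
```

In the integrated script, the same function could be called on the parsed arguments (prototxt, model, face_detector, emotion_model, gender_model) right after ap.parse_args(), exiting with a message if the returned dictionary is not empty.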

The question comes from Stack Exchange; it was asked by Sandhya SK.