How can I integrate an object classification model with gender and emotion recognition models in OpenCV?
Integrating Object Detection with Gender & Emotion Recognition Models
Got it, let's break down how to merge your two OpenCV-based pipelines into a single workflow. The core idea is to run your MobileNet SSD model first to find the people in the image, and then run your gender/emotion recognition pipeline only on those person regions. This means the face detector never has to scan parts of the image that contain no people. Here's a step-by-step solution with the full integrated code:
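To make the "people first" idea concrete, here is a minimal sketch of the filtering step on its own, written as a hypothetical helper `person_boxes`. It assumes the usual SSD output layout, where each detection row is `[image_id, class_id, confidence, x1, y1, x2, y2]` with coordinates normalized to [0, 1], and that `person` is index 15 in the 21-class VOC list used by this MobileNet SSD model.

```python
from typing import Iterable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (startX, startY, endX, endY) in pixels

# Index of "person" in the 21-entry VOC CLASSES list (assumption: same list as below)
PERSON_CLASS_ID = 15


def person_boxes(rows: Iterable[Sequence[float]], w: int, h: int,
                 min_conf: float = 0.2) -> List[Box]:
    """Keep confident 'person' rows and convert them to clipped pixel boxes.

    Each row is [image_id, class_id, confidence, x1, y1, x2, y2] with
    coordinates normalized to [0, 1].
    """
    boxes = []
    for row in rows:
        class_id, conf = int(row[1]), float(row[2])
        if class_id != PERSON_CLASS_ID or conf <= min_conf:
            continue
        x1, y1, x2, y2 = (float(v) for v in row[3:7])
        # Scale to pixels and clip, since SSD boxes can spill past the border
        box = (max(0, int(x1 * w)), max(0, int(y1 * h)),
               min(w, int(x2 * w)), min(h, int(y2 * h)))
        # Drop boxes that collapse to nothing after clipping
        if box[2] > box[0] and box[3] > box[1]:
            boxes.append(box)
    return boxes


rows = [
    [0, 15, 0.90, 0.25, 0.25, 0.50, 0.75],     # confident person
    [0, 12, 0.88, 0.60, 0.50, 0.90, 0.95],     # dog: ignored
    [0, 15, 0.05, 0.00, 0.00, 0.25, 0.25],     # weak person: ignored
    [0, 15, 0.60, 0.75, -0.125, 1.125, 0.50],  # person partly outside the frame
]
print(person_boxes(rows, w=400, h=300))  # → [(100, 75, 200, 225), (300, 0, 400, 150)]
```

With the real network, the rows would come from `detections[0, 0]`, since iterating a 2-D NumPy array yields one detection row at a time.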
Key Integration Logic
- Run Object Detection First: Use MobileNet SSD to find all objects, then pass only detections of the `person` class on to the next steps (other classes are still drawn, but not analyzed further).
- Focus on Person Regions: For each detected person, extract their bounding box from the image.
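- Detect Faces in Person Regions: Use the Haar cascade face detector to find faces within each person's area (cheaper than scanning the whole image), then shift the face coordinates back to full-image coordinates by adding the person box's top-left corner.
- Predict Gender & Emotion: Crop each detected face, resize it to each model's input size, and pass it through your pre-trained gender and emotion models.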
- Draw All Results: Overlay object boxes, face boxes, and gender/emotion labels onto the original image.
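The face step is where the coordinate bookkeeping matters: the face detector reports faces relative to the person crop, so they must be shifted back before cropping from the full image, and padded boxes must be clipped because a negative slice start in NumPy wraps around instead of failing. A minimal sketch with two hypothetical helpers, `face_to_global` and `padded_slice_bounds`, assuming faces come as `(x, y, width, height)` and offsets as `(x_pad, y_pad)` like the `gender_offsets`/`emotion_offsets` values in the code:

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def face_to_global(face: Rect, person_origin: Tuple[int, int]) -> Rect:
    """Shift a face box found inside a person crop back to full-image coordinates."""
    fx, fy, fw, fh = face
    px, py = person_origin  # top-left corner (startX, startY) of the person box
    return (px + fx, py + fy, fw, fh)


def padded_slice_bounds(face: Rect, offsets: Tuple[int, int],
                        img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Return (x1, x2, y1, y2) for the face padded by offsets, clipped to the image.

    The result can be used directly as image[y1:y2, x1:x2].
    """
    x, y, fw, fh = face
    ox, oy = offsets
    x1, x2 = max(0, x - ox), min(img_w, x + fw + ox)
    y1, y2 = max(0, y - oy), min(img_h, y + fh + oy)
    return (x1, x2, y1, y2)


face_global = face_to_global((5, 8, 40, 40), person_origin=(100, 75))
print(face_global)                                           # → (105, 83, 40, 40)
print(padded_slice_bounds(face_global, (10, 10), 400, 300))  # → (95, 155, 73, 133)
```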
Full Integrated Code
```python
import argparse

import cv2
import numpy as np
from keras.models import load_model

# Helpers from your original gender/emotion project's utils/ folder
from utils.datasets import get_labels
from utils.inference import apply_offsets
from utils.inference import detect_faces
from utils.inference import draw_bounding_box
from utils.inference import load_detection_model
from utils.preprocessor import preprocess_input

# ----------------------
# Argument Parsing (Unified for Both Pipelines)
# ----------------------
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to input image")
# MobileNet SSD parameters
ap.add_argument("-p", "--prototxt", required=True,
                help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
                help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
                help="minimum probability to filter weak object detections")
# Gender/Emotion model paths
ap.add_argument("-fd", "--face-detector",
                default="../trained_models/detection_models/haarcascade_frontalface_default.xml",
                help="path to Haar cascade face detector")
ap.add_argument("-em", "--emotion-model",
                default="../trained_models/emotion_models/fer2013_mini_XCEPTION.102-0.66.hdf5",
                help="path to pre-trained emotion model")
ap.add_argument("-gm", "--gender-model",
                default="../trained_models/gender_models/simple_CNN.81-0.96.hdf5",
                help="path to pre-trained gender model")
args = vars(ap.parse_args())

# ----------------------
# Load All Models Once
# ----------------------
# MobileNet SSD for general object detection
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

print("[INFO] loading object detection model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# Gender/Emotion recognition stack
print("[INFO] loading face detection and emotion/gender models...")
face_detection = load_detection_model(args["face_detector"])
emotion_classifier = load_model(args["emotion_model"], compile=False)
gender_classifier = load_model(args["gender_model"], compile=False)
emotion_labels = get_labels('fer2013')
gender_labels = get_labels('imdb')
font = cv2.FONT_HERSHEY_SIMPLEX

# Hyperparameters for face preprocessing (matching your original code)
gender_offsets = (10, 10)
emotion_offsets = (0, 0)
emotion_target_size = emotion_classifier.input_shape[1:3]
gender_target_size = gender_classifier.input_shape[1:3]

# ----------------------
# Process Input Image
# ----------------------
image = cv2.imread(args["image"])
(h, w) = image.shape[:2]
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Run object detection pipeline
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5)
print("[INFO] computing object detections...")
net.setInput(blob)
detections = net.forward()

# ----------------------
# Process Each Detection
# ----------------------
for i in np.arange(0, detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    # Skip weak detections
    if confidence <= args["confidence"]:
        continue

    idx = int(detections[0, 0, i, 1])
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    (startX, startY, endX, endY) = box.astype("int").tolist()
    # SSD boxes can extend past the image border; clip them so slicing is valid
    startX, startY = max(0, startX), max(0, startY)
    endX, endY = min(w, endX), min(h, endY)

    # Draw the box and class label for every detection (person or not)
    label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
    cv2.rectangle(image, (startX, startY), (endX, endY), COLORS[idx], 2)
    y = startY - 15 if startY - 15 > 15 else startY + 15
    cv2.putText(image, label, (startX, y), font, 0.5, COLORS[idx], 2)

    # Only people are passed on to face / gender / emotion analysis
    if CLASSES[idx] != "person" or endX <= startX or endY <= startY:
        continue

    # Detect faces within the person's region only
    person_gray = gray_image[startY:endY, startX:endX]
    faces = detect_faces(face_detection, person_gray)

    for (face_x, face_y, face_w, face_h) in faces:
        # Shift face coordinates from person-crop space to full-image space
        face_coords = (int(startX + face_x), int(startY + face_y),
                       int(face_w), int(face_h))

        # apply_offsets takes (x, y, w, h) and returns the padded box as (x1, x2, y1, y2)
        x1, x2, y1, y2 = apply_offsets(face_coords, gender_offsets)
        rgb_face = rgb_image[max(0, y1):y2, max(0, x1):x2]
        x1, x2, y1, y2 = apply_offsets(face_coords, emotion_offsets)
        gray_face = gray_image[max(0, y1):y2, max(0, x1):x2]

        # Resize each crop to the input size its classifier expects
        try:
            rgb_face = cv2.resize(rgb_face, gender_target_size)
            gray_face = cv2.resize(gray_face, emotion_target_size)
        except cv2.error:
            continue  # empty crop, e.g. at the image border

        # Gender prediction on the RGB crop
        rgb_face = preprocess_input(rgb_face, False)
        rgb_face = np.expand_dims(rgb_face, 0)
        gender_prediction = gender_classifier.predict(rgb_face)
        gender_text = gender_labels[np.argmax(gender_prediction)]

        # Emotion prediction on the grayscale crop (add batch and channel axes)
        gray_face = preprocess_input(gray_face, True)
        gray_face = np.expand_dims(gray_face, 0)
        gray_face = np.expand_dims(gray_face, -1)
        emotion_prediction = emotion_classifier.predict(gray_face)
        emotion_text = emotion_labels[np.argmax(emotion_prediction)]

        # Draw face box, gender text above it and emotion text below it
        fx, fy, fw, fh = face_coords
        face_endY = fy + fh
        draw_bounding_box(face_coords, image, (255, 0, 0))
        gender_y = fy - 30 if fy - 30 > 30 else fy + 30
        cv2.putText(image, gender_text, (fx, gender_y), font, 0.5, (0, 255, 0), 1)
        emotion_y = face_endY + 20 if face_endY + 20 < h - 20 else face_endY - 20
        cv2.putText(image, emotion_text, (fx, emotion_y), font, 0.5, (0, 0, 255), 1)

# Show final combined output
cv2.imshow("Combined Detection Output", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
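The two `np.expand_dims` calls on the grayscale face are easy to misread. As a minimal NumPy-only sketch, the hypothetical helpers below reproduce the shape and scaling steps, assuming `preprocess_input(x, True)` divides by 255 and then maps the result to [-1, 1], while `preprocess_input(x, False)` only divides by 255 (check this against your own `utils/preprocessor.py`). The face is assumed to be already resized; the 64x64 size is only an example, since the real size comes from each model's `input_shape`.

```python
import numpy as np


def prepare_gray_face(gray_face: np.ndarray) -> np.ndarray:
    """Scale a resized uint8 grayscale face to [-1, 1] and add batch/channel axes."""
    x = gray_face.astype("float32") / 255.0
    x = (x - 0.5) * 2.0                 # assumed "v2" scaling: [0, 1] -> [-1, 1]
    x = np.expand_dims(x, 0)            # batch axis:   (H, W)    -> (1, H, W)
    return np.expand_dims(x, -1)        # channel axis: (1, H, W) -> (1, H, W, 1)


def prepare_rgb_face(rgb_face: np.ndarray) -> np.ndarray:
    """Scale a resized uint8 RGB face to [0, 1] and add a batch axis."""
    x = rgb_face.astype("float32") / 255.0   # assumed plain [0, 1] scaling
    return np.expand_dims(x, 0)              # (H, W, 3) -> (1, H, W, 3)


face = np.zeros((64, 64), dtype=np.uint8)
face[:, 32:] = 255  # half black, half white
batch = prepare_gray_face(face)
print(batch.shape, batch.min(), batch.max())  # → (1, 64, 64, 1) -1.0 1.0
```

The grayscale model needs the extra trailing axis because Keras convolution layers expect an explicit channel dimension, even when there is only one channel.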
Important Notes
- Path Adjustments: Double-check that all model paths (MobileNet prototxt/model, Haar cascade, emotion/gender models) match your local file structure.
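- Confidence Tuning: Adjust the `--confidence` argument to filter out weaker object detections. If face detection needs refinement, you can also tweak the Haar cascade's `detectMultiScale` parameters (such as `scaleFactor` and `minNeighbors`), which are set inside the `detect_faces` helper in `utils/inference.py`.
- Performance Boost: Because the Haar cascade only scans person regions, face detection does less work than scanning the entire image, especially when people fill a small part of the frame. MobileNet SSD itself still runs once on the full image.
- Helper Files: Ensure the `utils` folder (with `datasets.py`, `inference.py`, etc.) from your original gender/emotion code is present in your project directory.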
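To put a rough number on the performance note, a hypothetical helper can measure what fraction of the image the face detector still has to scan. This sketch treats pixel count as a proxy for Haar cascade cost, which is only an approximation; boxes are `(x1, y1, x2, y2)` in pixels.

```python
import numpy as np


def scanned_fraction(boxes, img_w, img_h):
    """Fraction of image pixels covered by the union of (x1, y1, x2, y2) boxes."""
    covered = np.zeros((img_h, img_w), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the image before marking its pixels
        covered[max(0, y1):min(img_h, y2), max(0, x1):min(img_w, x2)] = True
    return float(covered.mean())


# Two overlapping people in a 400x300 frame
print(scanned_fraction([(100, 75, 200, 225), (150, 75, 250, 225)], 400, 300))  # → 0.1875
```

Because the code calls the face detector once per person box, overlapping boxes are scanned more than once, so the union fraction is a lower bound on the actual work, and the same face can be reported twice.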
The question was sourced from Stack Exchange; it was asked by Sandhya SK.




