Help Wanted: Speeding Up Real-Time Face Recognition on Raspberry Pi 3 with Parallel Programming

Ahua AIGC Lab (阿华AIGC实验室)

2026-5-20

Fixing Slow Real-Time Face Recognition on Raspberry Pi 3 with Parallel Processing

Hey there! I totally get the frustration of laggy face recognition on a Raspberry Pi 3—those limited CPU cores can turn a smooth real-time app into a stuttery mess. Let’s walk through practical fixes, including parallel processing, tailored to your face_recognition and dlib setup.

First: Quick Wins Before Parallel Processing

Before diving into threads/processes, let’s squeeze out some easy performance gains that can make a big difference:

1. Reduce Frame Resolution

Face detection time grows roughly in proportion to the number of pixels in the image. Capture at a smaller size (like 320x240 or 640x480) instead of full HD. This drastically cuts the number of pixels dlib has to scan while usually keeping enough detail to detect faces that are reasonably close to the camera. If distant faces stop being detected, step the resolution back up.
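To put rough numbers on the resolution change, here is a small sketch that compares pixel counts. It assumes 1920x1080 as the "full HD" starting point, and the helper name pixel_ratio is made up for illustration. The real speedup depends on the detector and is only approximately proportional to pixel count.

```python
# Rough pixel-count comparison. Detection cost is only approximately
# proportional to the number of pixels, so treat these as upper-level estimates.
def pixel_ratio(src, dst):
    """Return how many times more pixels `src` (w, h) has than `dst` (w, h)."""
    return (src[0] * src[1]) / (dst[0] * dst[1])

FULL_HD = (1920, 1080)  # assumed camera default, for illustration only

for target in [(640, 480), (320, 240)]:
    print(target, f"{pixel_ratio(FULL_HD, target):.2f}x fewer pixels")
```

Going from full HD to 320x240 leaves dlib with a small fraction of the original pixels, which is why this step is worth trying before any threading work.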

2. Skip Frames

You don’t need to process every single frame. Processing only every 2nd or 3rd frame cuts the recognition workload to a half or a third, and with a camera delivering around 30 fps the results still refresh often enough that the gap is hard to notice.
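One way to skip frames without making the boxes flicker is to reuse the last detection result on the frames you skip. The sketch below shows the idea with a stand-in detector; should_process and annotate_stream are hypothetical helper names, not part of face_recognition or OpenCV.

```python
def should_process(frame_index, every_n):
    """True for every `every_n`-th frame (frame_index counts from 1)."""
    return frame_index % every_n == 0

def annotate_stream(num_frames, every_n, detect):
    """Run `detect` on every n-th frame and reuse the last result in between."""
    last_result = None
    results = []
    for i in range(1, num_frames + 1):
        if should_process(i, every_n):
            last_result = detect(i)
        results.append(last_result)
    return results

# Stand-in detector: it only records which frame it ran on.
print(annotate_stream(6, 3, detect=lambda i: f"faces@{i}"))
# prints [None, None, 'faces@3', 'faces@3', 'faces@3', 'faces@6']
```

In a real loop, detect would wrap the face_recognition calls, and the reused result would be drawn on every displayed frame.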

3. Compile dlib with NEON Optimization

The dlib build that gets pulled in when you install face_recognition from PyPI may not enable the Raspberry Pi 3’s NEON SIMD instructions. Building dlib from source with NEON turned on can boost performance noticeably. The 2-3x figure often quoted is a rough estimate, so time your own pipeline before and after:

# Uninstall existing dlib and face_recognition first
pip uninstall -y dlib face_recognition

# Install required dependencies
sudo apt-get install build-essential cmake libopenblas-dev liblapack-dev libx11-dev libgtk-3-dev python3-dev

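# Clone dlib and build its Python bindings with NEON enabled.
# The -mfpu=neon compiler flag follows the face_recognition project's Raspberry Pi
# install notes and applies to a 32-bit OS. On a 64-bit OS, omit the flag
# (NEON is part of the baseline there and the compiler rejects -mfpu).
# A separate cmake build of the C++ library is not needed for Python use.
git clone https://github.com/davisking/dlib.git
cd dlib
python3 setup.py install --compiler-flags "-mfpu=neon"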

# Reinstall face_recognition
pip install face_recognition
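Before spending time on a source build, it can help to confirm that the CPU actually advertises NEON. The sketch below parses the Features line of /proc/cpuinfo; the function name has_simd is hypothetical. It assumes a Linux system, where 32-bit ARM kernels list "neon" and 64-bit ARM kernels list "asimd".

```python
def has_simd(cpuinfo_text):
    """Return True if a Features line lists 'neon' (32-bit ARM) or 'asimd' (64-bit ARM)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features":
            flags = value.split()
            if "neon" in flags or "asimd" in flags:
                return True
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("NEON available:", has_simd(f.read()))
    except OSError:
        print("Could not read /proc/cpuinfo (not a Linux system?)")
```

This only tells you what the hardware supports. Whether your installed dlib was compiled to use it is a separate question, best answered by timing a few frames before and after the rebuild.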

Parallel Processing with Threading

The main bottleneck in your original code is likely that you’re capturing frames and running face recognition in the same thread—so the app waits for recognition to finish before grabbing the next frame. We’ll split this into two dedicated threads:

Frame Capture Thread: Handles reading frames from the camera (an IO-bound task that spends a lot of time waiting on hardware) and feeds them into a thread-safe queue.
Face Recognition Thread: Pulls frames from the queue and runs recognition (a CPU-bound task that keeps one core busy while the capture thread waits on the camera).

Here’s a modified, optimized version of your code (assuming you use OpenCV for camera access):

import cv2
import face_recognition
import threading
from queue import Queue, Empty
import numpy as np

# Configure performance settings
FRAME_WIDTH = 320
FRAME_HEIGHT = 240
SKIP_FRAMES = 2  # Process every 2nd frame
MAX_QUEUE_SIZE = 5  # Cap the backlog when recognition is slower than capture

# Initialize thread-safe queue and stop signal
frame_queue = Queue(maxsize=MAX_QUEUE_SIZE)
stop_event = threading.Event()

def capture_frames():
    """Thread to capture camera frames and add to queue"""
    video_capture = cv2.VideoCapture(0)
    video_capture.set(cv2.CAP_PROP_FRAME_WIDTH, FRAME_WIDTH)
    video_capture.set(cv2.CAP_PROP_FRAME_HEIGHT, FRAME_HEIGHT)

    frame_count = 0
    while not stop_event.is_set():
        ret, frame = video_capture.read()
        if not ret:
            break

        # Skip frames to reduce processing load
        frame_count += 1
        if frame_count % SKIP_FRAMES != 0:
            continue

        # Halve the frame and convert BGR -> RGB (face_recognition expects RGB).
        # cvtColor returns a contiguous array; a [:, :, ::-1] view can be rejected
        # by newer dlib builds. If faces are missed at this size, raise fx/fy and
        # adjust the scale factor used when drawing below.
        small_frame = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)
        rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)

        # If queue is full, discard oldest frame to avoid backlog
        if frame_queue.full():
            try:
                frame_queue.get_nowait()
            except Empty:
                pass  # The consumer drained the queue in the meantime
        frame_queue.put((frame, rgb_small_frame))

    video_capture.release()
    # Let the recognition thread exit if the camera stops delivering frames
    stop_event.set()

def process_faces():
    """Thread to run face recognition on queued frames"""
    # Load your known face encodings ONCE at startup (don't reload every frame!)
    known_face_encodings = []
    known_face_names = []
    # Example setup:
    # person1_image = face_recognition.load_image_file("person1.jpg")
    # known_face_encodings.append(face_recognition.face_encodings(person1_image)[0])
    # known_face_names.append("Person 1")

    while not stop_event.is_set() or not frame_queue.empty():
        try:
            # Get frame from queue (wait up to 1 second to avoid busy waiting)
            frame, rgb_small_frame = frame_queue.get(timeout=1)
        except Empty:
            continue

        # Detect faces and generate encodings
        face_locations = face_recognition.face_locations(rgb_small_frame)
        face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)

        # Match faces to known encodings
        face_names = []
        for face_encoding in face_encodings:
            matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
            name = "Unknown"
            # Use the closest match if available
            face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
            if len(face_distances) > 0:
                best_match_index = np.argmin(face_distances)
                if matches[best_match_index]:
                    name = known_face_names[best_match_index]
            face_names.append(name)

        # Draw results on original frame (scale back locations since we resized by 0.5)
        for (top, right, bottom, left), name in zip(face_locations, face_names):
            top *= 2
            right *= 2
            bottom *= 2
            left *= 2

            cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)
            cv2.rectangle(frame, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)
            cv2.putText(frame, name, (left + 6, bottom - 6), cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255), 1)

        # Display the output. Some OpenCV GUI backends expect imshow/waitKey on the
        # main thread; if the window misbehaves, call process_faces() from the main
        # thread instead of starting it as a separate thread.
        cv2.imshow('Video', frame)

        # Exit on 'q' press
        if cv2.waitKey(1) & 0xFF == ord('q'):
            stop_event.set()
            break

    cv2.destroyAllWindows()

if __name__ == "__main__":
    # Start both threads
    capture_thread = threading.Thread(target=capture_frames)
    process_thread = threading.Thread(target=process_faces)

    capture_thread.start()
    process_thread.start()

    # Wait for threads to finish cleanly
    capture_thread.join()
    process_thread.join()

Key Details About This Code:

Thread-Safe Queue: Ensures frames are passed between threads without race conditions, and limits queue size to prevent memory bloat if recognition can’t keep up with capture.
Separated Tasks: The capture thread handles idle-waiting IO while the recognition thread uses CPU for processing, making better use of the Pi’s resources.
One-Time Loading: Known face encodings are loaded once at startup, not every frame—this avoids redundant work that kills performance.
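The "drop the oldest frame when the queue is full" behaviour can be checked without a camera. Here is a minimal sketch using only the standard library; put_latest is a hypothetical helper name, and it assumes a single producer thread, as in the capture/recognition split described above.

```python
from queue import Queue, Empty

def put_latest(q, item):
    """Put `item` on a bounded queue, discarding the oldest entry if it is full.

    Assumes a single producer thread; with several producers the final put()
    could still block briefly.
    """
    if q.full():
        try:
            q.get_nowait()
        except Empty:
            pass  # A consumer emptied the queue between full() and get_nowait()
    q.put(item)

q = Queue(maxsize=3)
for frame_id in range(1, 6):  # Simulate 5 frames arriving with no consumer running
    put_latest(q, frame_id)

remaining = []
while not q.empty():
    remaining.append(q.get_nowait())
print(remaining)  # prints [3, 4, 5]
```

Only the three newest frames survive, so the recognition thread always works on recent images instead of falling further behind the camera.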

When to Try Multiprocessing

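If threading still isn’t enough, you could switch to multiprocessing (using Python’s multiprocessing module instead of threading). The two-thread design only overlaps camera I/O with recognition; the recognition work itself still runs on one core. Separate worker processes each have their own interpreter and GIL, so several recognition workers can run in parallel on the Pi 3’s four cores. The trade-offs are the cost of copying frames between processes and higher memory use, since each worker needs its own copy of the known encodings. Threading is simpler, so it is worth trying first, and it is often sufficient for real-time face recognition on the Pi 3.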
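As a rough illustration of the multiprocessing route, the sketch below spreads a CPU-bound stand-in function across four worker processes with multiprocessing.Pool. fake_recognize and run_pool are hypothetical names, and the arithmetic loop merely stands in for the face_recognition calls. A real version would also have to load the known encodings inside each worker.

```python
from multiprocessing import Pool

def fake_recognize(frame_id):
    """Stand-in for a CPU-bound recognition call (hypothetical workload)."""
    total = 0
    for i in range(50_000):
        total += (i * frame_id) % 7
    return frame_id, total

def run_pool(frame_ids, workers=4):
    """Process frames across `workers` processes; results come back in input order."""
    with Pool(processes=workers) as pool:
        return pool.map(fake_recognize, frame_ids)

if __name__ == "__main__":
    results = run_pool(range(1, 9))
    print([frame_id for frame_id, _ in results])
```

Pool.map keeps results in input order, which makes it easy to match each result to its frame. For a live video stream you would more likely feed workers from a queue, so treat this as a starting point rather than a finished design.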

The question in this post comes from Stack Exchange; it was asked by rkoofoo.