Help speeding up real-time face recognition on a Raspberry Pi 3: optimizing with parallel programming
Hey there! I totally get the frustration of laggy face recognition on a Raspberry Pi 3: with only four modest CPU cores, a smooth real-time app can quickly turn into a stuttery mess. Let’s walk through practical fixes, including parallel processing, tailored to your face_recognition and dlib setup.
First: Quick Wins Before Parallel Processing
Before diving into threads/processes, let’s squeeze out some easy performance gains that can make a big difference:
1. Reduce Frame Resolution
Face recognition time grows with the number of pixels in the image. Shrink your camera frames to something smaller (like 320x240 or 640x480) instead of using Full HD. This drastically cuts the number of pixels dlib has to process, and it usually keeps enough detail to detect faces that are reasonably close to the camera.
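To get a feel for how much work a smaller frame saves, here is a small sketch that only does the arithmetic. Pixel count is a rough proxy for detection cost, not a measured timing, and the helper names `pixel_count` and `reduction_factor` are made up for this illustration.

```python
# Compare how many pixels the detector has to scan at each resolution.
RESOLUTIONS = {
    "1920x1080": (1920, 1080),
    "640x480": (640, 480),
    "320x240": (320, 240),
}


def pixel_count(width, height):
    """Number of pixels in a frame of the given size."""
    return width * height


def reduction_factor(src, dst):
    """How many times fewer pixels dst has compared to src."""
    return pixel_count(*src) / pixel_count(*dst)


full_hd = RESOLUTIONS["1920x1080"]
for name, size in RESOLUTIONS.items():
    factor = reduction_factor(full_hd, size)
    print(f"{name}: {pixel_count(*size)} pixels, {factor:.2f}x fewer than Full HD")
```

Going from Full HD to 320x240 leaves the detector with 27 times fewer pixels to look at, which is why this is usually the first thing to try.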
2. Skip Frames
You don’t need to process every single frame. For example, process only every 2nd or 3rd frame. Viewers are unlikely to notice the gap, and the recognition workload drops to a half or a third.
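The skipping logic itself is tiny. A minimal sketch, using plain integers as stand-ins for camera frames (the `every_nth` helper is a hypothetical name, not part of OpenCV or face_recognition):

```python
def every_nth(frames, n):
    """Yield every n-th item from an iterable of frames (the n-th, 2n-th, ...)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, frame in enumerate(frames, start=1):
        if index % n == 0:
            yield frame


# Stand-in for camera frames: integers 1..10 instead of images.
fake_frames = range(1, 11)
processed = list(every_nth(fake_frames, 3))
print(processed)  # [3, 6, 9]
```

In a real capture loop the same modulo check would sit right after reading a frame, so skipped frames never reach the recognition step.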
3. Compile dlib with NEON Optimization
A dlib build that was not compiled with NEON enabled won’t take advantage of the Raspberry Pi 3’s NEON SIMD instructions. Compiling dlib from source with NEON support can boost performance substantially (possibly 2-3x, though the actual gain depends on your build and workload):
# Uninstall existing dlib and face_recognition first
pip uninstall -y dlib face_recognition

# Install required dependencies
sudo apt-get install build-essential cmake libopenblas-dev liblapack-dev libx11-dev libgtk-3-dev python3-dev

# Clone dlib and compile with NEON enabled
# (USE_NEON_INSTRUCTIONS is the dlib CMake option that turns NEON on)
git clone https://github.com/davisking/dlib.git
cd dlib
mkdir build && cd build
cmake -DUSE_NEON_INSTRUCTIONS=ON ..
cmake --build . --config Release
cd ..

# Build and install the Python bindings with the same option
python3 setup.py install --yes USE_NEON_INSTRUCTIONS

# Reinstall face_recognition
pip install face_recognition
Parallel Processing with Threading
The main bottleneck in your original code is likely that you’re capturing frames and running face recognition in the same thread, so the app waits for recognition to finish before grabbing the next frame. We’ll split this into two dedicated threads:
- Frame Capture Thread: Handles reading frames from the camera (an IO-bound task that spends a lot of time waiting on hardware) and feeds them into a thread-safe queue.
- Face Recognition Thread: Pulls frames from the queue and runs recognition (a CPU-bound task that accounts for most of the processing time).
Here’s a modified, optimized version of your code (assuming you use OpenCV for camera access):
import queue
import threading

import cv2
import face_recognition
import numpy as np

# Configure performance settings
FRAME_WIDTH = 320
FRAME_HEIGHT = 240
SKIP_FRAMES = 2      # Process every 2nd frame
MAX_QUEUE_SIZE = 5   # Prevent queue overflow from slow processing

# Initialize thread-safe queue and stop signal
frame_queue = queue.Queue(maxsize=MAX_QUEUE_SIZE)
stop_event = threading.Event()


def capture_frames():
    """Thread to capture camera frames and add to queue"""
    video_capture = cv2.VideoCapture(0)
    video_capture.set(cv2.CAP_PROP_FRAME_WIDTH, FRAME_WIDTH)
    video_capture.set(cv2.CAP_PROP_FRAME_HEIGHT, FRAME_HEIGHT)
    frame_count = 0
    try:
        while not stop_event.is_set():
            ret, frame = video_capture.read()
            if not ret:
                break
            # Skip frames to reduce processing load
            frame_count += 1
            if frame_count % SKIP_FRAMES != 0:
                continue
            # Resize and convert from BGR (OpenCV) to RGB (face_recognition)
            small_frame = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)
            rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)
            # If queue is full, discard oldest frame to avoid backlog
            if frame_queue.full():
                try:
                    frame_queue.get_nowait()
                except queue.Empty:
                    pass  # The consumer drained the queue in the meantime
            frame_queue.put((frame, rgb_small_frame))
    finally:
        # Tell the recognition thread to stop if the camera fails or ends
        stop_event.set()
        video_capture.release()


def process_faces():
    """Thread to run face recognition on queued frames"""
    # Load your known face encodings ONCE at startup (don't reload every frame!)
    known_face_encodings = []
    known_face_names = []
    # Example setup:
    # person1_image = face_recognition.load_image_file("person1.jpg")
    # known_face_encodings.append(face_recognition.face_encodings(person1_image)[0])
    # known_face_names.append("Person 1")

    while not stop_event.is_set() or not frame_queue.empty():
        try:
            # Get frame from queue (wait up to 1 second to avoid busy waiting)
            frame, rgb_small_frame = frame_queue.get(timeout=1)
        except queue.Empty:
            continue

        # Detect faces and generate encodings
        face_locations = face_recognition.face_locations(rgb_small_frame)
        face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)

        # Match faces to known encodings
        face_names = []
        for face_encoding in face_encodings:
            matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
            name = "Unknown"
            # Use the closest match if available
            face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
            if len(face_distances) > 0:
                best_match_index = np.argmin(face_distances)
                if matches[best_match_index]:
                    name = known_face_names[best_match_index]
            face_names.append(name)

        # Draw results on original frame (scale back locations since we resized)
        for (top, right, bottom, left), name in zip(face_locations, face_names):
            top *= 2
            right *= 2
            bottom *= 2
            left *= 2
            cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)
            cv2.rectangle(frame, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)
            cv2.putText(frame, name, (left + 6, bottom - 6),
                        cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255), 1)

        # Display the output
        cv2.imshow('Video', frame)
        frame_queue.task_done()

        # Exit on 'q' press
        if cv2.waitKey(1) & 0xFF == ord('q'):
            stop_event.set()
            break

    cv2.destroyAllWindows()


if __name__ == "__main__":
    # Start both threads
    # Note: some OpenCV GUI backends only work reliably from the main thread.
    # If the window misbehaves, call process_faces() directly here instead.
    capture_thread = threading.Thread(target=capture_frames)
    process_thread = threading.Thread(target=process_faces)
    capture_thread.start()
    process_thread.start()

    # Wait for threads to finish cleanly
    capture_thread.join()
    process_thread.join()
Key Details About This Code:
- Thread-Safe Queue: Ensures frames are passed between threads without race conditions, and limits queue size to prevent memory bloat if recognition can’t keep up with capture.
- Separated Tasks: The capture thread handles idle-waiting IO while the recognition thread uses CPU for processing, making better use of the Pi’s resources.
- One-Time Loading: Known face encodings are loaded once at startup, not on every frame. This avoids redundant work that kills performance.
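The "discard the oldest frame when the queue is full" behaviour can be isolated into a small helper, which makes it easier to reason about. This is a minimal sketch using only the standard library; `put_latest` is a hypothetical name, and integers stand in for frames.

```python
import queue


def put_latest(q, item):
    """Put item on a bounded queue, discarding the oldest entry if it is full."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # Drop the oldest frame to make room
            except queue.Empty:
                pass  # A consumer emptied the queue in the meantime; retry


frames = queue.Queue(maxsize=3)
for frame_id in range(6):  # The producer runs ahead of a consumer that never reads
    put_latest(frames, frame_id)

remaining = []
while not frames.empty():
    remaining.append(frames.get_nowait())
print(remaining)  # [3, 4, 5]
```

Only the three most recent frames survive, so the recognition thread always works on fresh images instead of an ever-growing backlog.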
When to Try Multiprocessing
If threading still isn’t enough, you could switch to multiprocessing (using Python’s multiprocessing module instead of threading). Each worker process has its own interpreter and its own GIL, so multiprocessing can spread recognition across all 4 cores of the Pi 3 regardless of whether the dlib calls release the GIL. The trade-off is that frames have to be copied between processes, which adds overhead. Threading is simpler, so it is worth trying first and measuring before moving to multiple processes.
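As a rough illustration of the multiprocessing route, here is a sketch that farms work out to a pool of four worker processes. The `recognize` function is a hypothetical stand-in that just burns CPU; in a real application it would run face_recognition on an image, and the frame IDs would be actual frames.

```python
import multiprocessing as mp


def recognize(frame_id):
    """Stand-in for the CPU-heavy recognition step (hypothetical)."""
    checksum = sum(i * i for i in range(50_000)) % 97  # Simulated CPU work
    return frame_id, checksum


def recognize_in_parallel(frame_ids, workers=4):
    """Run recognize() over frame_ids in a pool of worker processes."""
    with mp.Pool(processes=workers) as pool:
        # map() preserves input order, so results line up with frame_ids.
        return pool.map(recognize, frame_ids)


if __name__ == "__main__":
    results = recognize_in_parallel(range(8), workers=4)
    print([frame_id for frame_id, _ in results])
```

Keep in mind that every frame sent to a worker is pickled and copied, so on a Pi 3 it pays to send the already downscaled frame rather than the full-size one.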
The question comes from Stack Exchange; asked by rkoofoo.




