Help needed: memory allocation failure when running a DeepFace + Sort object detection script on an RTX 3050 GPU

阿华AIGC实验室

2026-4-14

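I'm new to object detection. I recently wrote a script that uses the DeepFace library to estimate a person's gender and age, then adds tracking with the Sort library. The script runs fine on CPU and on Google Colab, but on my RTX 3050 GPU it throws the following error: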

Error processing frame: {{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:AddV2]

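I have already tried setting experimental.set_memory_growth to True, but the same error still occurs. Task Manager shows that the GPU is not at 100% utilization, and oddly, whenever the error occurs, GPU usage drops straight to 0%.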

Is my GPU simply not powerful enough for object detection at this scale? Are there any optimizations I have to apply?

Source code snippet

results = DeepFace.analyze(resized_frame, actions=['gender', 'age'], enforce_detection=True,
                           detector_backend='retinaface')

# Process each detected face
detections = []
for result in results:
    # Extract bounding box and convert to integers
    x = int(result['region']['x'])
    y = int(result['region']['y'])
    w = int(result['region']['w'])
    h = int(result['region']['h'])
    # confidence = float(result.get('face_confidence', 0.5))  # Default to 0.5 if missing

    # Add detection for SORT tracker
    if is_within_roi(x, y, w, h):
        confidence = float(result.get('face_confidence', 0.5))
        detections.append([x, y, x + w, y + h, confidence])

# Ensure detections array is not empty
if detections:
    # Update tracker with valid detections
    tracked_objects = tracker.update(np.array(detections))
else:
    tracked_objects = []

# Iterate through tracked objects
for track in tracked_objects:
    track_id = int(track[4])  # Unique ID assigned by tracker
    x1, y1, x2, y2 = map(int, track[:4])

    # Match tracker object with DeepFace result (basic IOU/position check)
    for result in results:
        rx, ry, rw, rh = map(int, [result['region']['x'], result['region']['y'], result['region']['w'],
                                   result['region']['h']])
        if abs(x1 - rx) < 20 and abs(y1 - ry) < 20:  # Adjust threshold if necessary
            # Retrieve gender confidence scores
            male_confidence = result['gender']['Man']
            female_confidence = result['gender']['Woman']

            # Retrieve age from DeepFace result
            age = int(result.get('age', -1))  # Default to -1 if age is missing
            age_range = get_age_range(age)
            gender = "Male" if male_confidence > 99.8 else "Female" if female_confidence > 60 else "Unknown"


            if track_id not in tracking_data:
                tracking_data[track_id] = {
                    "Ages": [],
                    "Genders": set(),
                    "Entry Time": formatted_time,
                    "Exit Time": None
                }
            tracking_data[track_id]["Ages"].append(age)
            tracking_data[track_id]["Genders"].add(gender)

            if is_within_roi(x1, y1, x2 - x1, y2 - y1):
                tracking_data[track_id]["Exit Time"] = formatted_time

            if gender != "Unknown":
                label = f"{gender}, Age: {age_range}"
                color = (0, 255, 0)
                draw_label(resized_frame, x1, y1, x2 - x1, y2 - y1, label, color)

            if gender == "Male":
                global_male_count.add(track_id)
                male_ages.append(age)
            elif gender == "Female":
                global_female_count.add(track_id)
                female_ages.append(age)

            # Draw the label on the frame
            draw_label(resized_frame, x1, y1, x2 - x1, y2 - y1, label, color)

Possible optimizations and things to check

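Don't worry: the RTX 3050 is fully capable of this kind of face detection plus attribute analysis. Most likely something in the code or the environment configuration has not been tuned yet. Note that the utilization figure in Task Manager reflects compute activity, not memory, so the GPU can run out of memory while utilization is low. Here are some concrete directions to try: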

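Tune DeepFace's inference parameters
- Switch to a lighter detector backend: the retinaface backend you are using is accurate but memory-hungry. Try mtcnn or ssd, which typically need much less GPU memory.
- Enable automatic mixed-precision inference: add the following at the top of the script, before any model is built, so that TensorFlow runs Keras layers in half precision (FP16) where it can. This can noticeably reduce GPU memory use: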
```
from tensorflow.keras.mixed_precision import set_global_policy
set_global_policy('mixed_float16')
```
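- Shrink the input frame further: for example, reduce resized_frame to 640x480. Face detection is fairly robust at small sizes, and this cuts both the computation and the memory needed per frame.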
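
The backend switch and the downscaling step could be combined roughly as follows. This is only a sketch: `fit_within` and `analyze_downscaled` are hypothetical helper names, the 640x480 limit is the example value from above, and `ssd` is just one of the lighter backends to try. The DeepFace call keeps the same arguments as in your snippet.

```python
def fit_within(width, height, max_width=640, max_height=480):
    """Return (new_width, new_height) that fits inside the limit.

    The aspect ratio is preserved and frames are never upscaled.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def analyze_downscaled(frame, detector_backend="ssd"):
    """Downscale a BGR frame, then run DeepFace on the smaller copy.

    Returns the DeepFace results and the factor needed to map the
    detected regions back to the original frame coordinates.
    """
    # Imported lazily so the size helper above works without these packages.
    import cv2
    from deepface import DeepFace

    height, width = frame.shape[:2]
    new_width, new_height = fit_within(width, height)
    small = cv2.resize(frame, (new_width, new_height), interpolation=cv2.INTER_AREA)
    results = DeepFace.analyze(
        small,
        actions=["gender", "age"],
        enforce_detection=True,  # same as the original snippet
        detector_backend=detector_backend,
    )
    return results, width / new_width
```

Multiply each region's x, y, w and h by the returned factor if you draw on the full-size frame rather than on the downscaled copy.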

GPU memory management details

Make sure the memory growth configuration runs before the first DeepFace call:

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
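
If memory growth alone does not help, another option is to give TensorFlow a fixed memory budget. The sketch below assumes TensorFlow 2.4 or later, where `tf.config.set_logical_device_configuration` is available; `limit_gpu_memory` is a hypothetical helper name, and the limit you pass in is something to experiment with, not a recommended value. As far as I know a hard limit cannot be combined with memory growth on the same device, so use one or the other.

```python
def limit_gpu_memory(memory_limit_mb):
    """Cap the memory TensorFlow may allocate on each visible GPU.

    Must be called before TensorFlow initializes the GPU, that is,
    before DeepFace builds any model. Returns the number of GPUs found.
    """
    if memory_limit_mb <= 0:
        raise ValueError("memory_limit_mb must be positive")

    # Imported lazily so the argument check works without TensorFlow.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Use this instead of set_memory_growth, not in addition to it.
        tf.config.set_logical_device_configuration(
            gpu,
            [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
        )
    return len(gpus)
```

For example, calling `limit_gpu_memory(3072)` at the very top of the script would ask TensorFlow to stay under roughly 3 GB per GPU; pick a value that leaves headroom for the display and other applications.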

Manually clear cached state after processing each frame, so that memory use does not build up and suddenly overflow:
```
import gc
import tensorflow as tf

tf.keras.backend.clear_session()  # reset Keras' global state
gc.collect()  # free unreferenced Python objects
```
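
Cleaning up on every single frame adds overhead, so it may be worth doing it only every N frames. A minimal sketch, assuming a frame counter in your loop: `maybe_cleanup` and its parameters are hypothetical names, and the interval of 100 is arbitrary. Keep in mind that clear_session() is unlikely to unload models that DeepFace still holds references to, so treat this as something to test, not a guaranteed fix.

```python
import gc


def maybe_cleanup(frame_index, every_n_frames=100, clear_keras_session=False):
    """Run cleanup on every Nth frame and report whether it ran."""
    if every_n_frames <= 0 or frame_index == 0:
        return False
    if frame_index % every_n_frames != 0:
        return False
    if clear_keras_session:
        # Imported lazily so the scheduling logic works without TensorFlow.
        import tensorflow as tf

        tf.keras.backend.clear_session()
    gc.collect()
    return True
```

Call `maybe_cleanup(frame_index)` at the end of each loop iteration, and enable `clear_keras_session` only if you see memory climbing steadily over time.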

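Code logic optimizations
- Optimize the nested loop used for track matching: the current nested loop over tracked_objects and results adds computation. You could store each result's region in a dictionary up front and match by position, or switch to IoU-based matching to cut down on repeated traversal.
- Avoid drawing the label twice: draw_label is currently called in two places, and the second call also runs when gender is "Unknown", where label and color may be undefined or left over from an earlier iteration. Merge them into a single call inside the `if gender != "Unknown"` check.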
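
The IoU-based matching mentioned above could look something like this. It is a sketch, not a drop-in replacement: `iou`, `region_to_box`, `match_tracks` and the 0.3 threshold are all hypothetical choices, and it assumes the tracker returns rows of `[x1, y1, x2, y2, track_id]` as in your snippet. It still compares every track with every face, which is cheap for a handful of faces; the gain is that each track gets its single best match instead of relying on a fixed 20-pixel offset.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def region_to_box(region):
    """Convert a DeepFace region dict (x, y, w, h) to (x1, y1, x2, y2)."""
    return (
        region["x"],
        region["y"],
        region["x"] + region["w"],
        region["y"] + region["h"],
    )


def match_tracks(tracks, results, min_iou=0.3):
    """Map each track ID to the index of its best-overlapping result.

    Tracks whose best overlap is below min_iou are left out.
    """
    result_boxes = [region_to_box(r["region"]) for r in results]
    matches = {}
    for track in tracks:
        track_box = tuple(track[:4])
        track_id = int(track[4])
        best_index, best_score = None, min_iou
        for index, box in enumerate(result_boxes):
            score = iou(track_box, box)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is not None:
            matches[track_id] = best_index
    return matches
```

With the mapping in hand, the per-track loop can look up `results[matches[track_id]]` directly and call draw_label once per track.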

Environment compatibility checks
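- Check that your CUDA, cuDNN and TensorFlow versions match: the RTX 3050 is an Ampere card (compute capability 8.6), which needs CUDA 11.1 or later, and TensorFlow 2.5 through 2.10 are built against CUDA 11.2 with cuDNN 8.1. A version mismatch can cause odd memory allocation errors.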
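- Use the nvidia-smi command to check whether other processes are holding GPU memory, such as an AI model that was never shut down or a game, and make sure enough memory is free.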
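
To keep an eye on memory from inside the script, you could also query nvidia-smi programmatically. This is a rough sketch that assumes nvidia-smi is on your PATH; `parse_memory_csv` and `query_gpu_memory` are hypothetical helper names. The values are in MiB, one row per GPU.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' CSV lines into a list of (used, total) ints."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory (MiB) of each GPU."""
    completed = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(completed.stdout)
```

Printing `query_gpu_memory()` just before the script starts and again when the error is caught should show whether memory was already mostly taken by something else or filled up during processing.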