How can I pass an OpenGL buffer directly to FFmpeg for NVENC H.264 encoding without a CPU copy?

阿华AIGC实验室

2026-5-25

Great question. Cutting out the GPU-to-CPU round trip is what keeps latency low and throughput high when encoding OpenGL frames with NVENC. Here is how to avoid the CPU copy altogether, using CUDA-OpenGL interop and FFmpeg’s NVENC support for frames that live in CUDA memory:

Core Approach: CUDA-OpenGL Interop + NVENC Direct GPU Encoding

The key insight here is that NVENC runs on the NVIDIA GPU and can directly process data stored in CUDA memory. By creating a shared memory bridge between OpenGL and CUDA, we can feed OpenGL frame data straight to NVENC without ever copying it to CPU RAM.

1. Set Up CUDA-OpenGL Interop Context

First, you need to enable CUDA to access your OpenGL frame buffer or texture:

Initialize CUDA and bind to your OpenGL context: Call cuInit(0) to initialize CUDA, then, with your OpenGL context current, use cuGLGetDevices to fetch the CUDA device that drives it. Create or retain a CUDA context on that device (for example with cuCtxCreate). NVENC itself requires a Kepler-generation GPU or newer.
Register your OpenGL resource with CUDA: Use cuGraphicsGLRegisterImage to register the texture or renderbuffer attached to your framebuffer object (FBO); the FBO itself cannot be registered. Pass the flag CU_GRAPHICS_REGISTER_FLAGS_READ_ONLY, since CUDA only needs to read frame data for encoding. Register once at startup, not once per frame.
Map the resource for CUDA access: Before processing each frame, call cuGraphicsMapResources to make the OpenGL memory accessible to CUDA. Unmap it with cuGraphicsUnmapResources once the frame data has been handed off, because OpenGL must not touch the resource while it is mapped.

2. Prepare the Mapped CUDA Resource for NVENC

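The NVENC API accepts CUDA device pointers and CUDA arrays, but FFmpeg’s NVENC wrapper expects AV_PIX_FMT_CUDA frames backed by linear device memory. Convert your mapped OpenGL resource into that form:

Extract the CUDA array from the mapped resource with cuGraphicsSubResourceGetMappedArray. A CUarray is opaque and has no device pointer of its own, so copy it into linear device memory with cuMemcpy2D. This is a device-to-device copy that never leaves the GPU.
If your OpenGL frame uses RGBA (common for rendering) and NVENC needs YUV420, you can either:
- Use a lightweight CUDA kernel to convert RGBA to YUV directly in GPU memory, or
- Let NVENC handle the format conversion internally. The NVENC API takes packed RGB input through buffer formats such as NV_ENC_BUFFER_FORMAT_ABGR, which FFmpeg selects when the frames context uses an RGB sw_format such as AV_PIX_FMT_RGB0 or AV_PIX_FMT_BGR0. This conversion happens on the GPU, with no CPU involvement.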
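As a rough illustration of the first option, the per-pixel arithmetic such a kernel would perform can be written as a plain C reference that runs on the CPU. This is only a sketch, not the original author’s code: it assumes BT.601 limited-range coefficients, NV12 output (a common NVENC input layout), and even frame dimensions, and the function names are hypothetical. In a real CUDA kernel, each thread would handle one 2x2 block instead of the nested loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clamp_u8(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* BT.601 limited-range conversion in 8.8 fixed point (assumed coefficients).
   The +4096 and +32768 terms are the 16 and 128 offsets pre-shifted by 8,
   which keeps every intermediate value non-negative before the shift. */
static uint8_t rgb_to_y(int r, int g, int b) { return clamp_u8((66 * r + 129 * g + 25 * b + 128 + 4096) >> 8); }
static uint8_t rgb_to_u(int r, int g, int b) { return clamp_u8((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8); }
static uint8_t rgb_to_v(int r, int g, int b) { return clamp_u8((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8); }

/* Convert a pitched RGBA image to NV12: a full-resolution Y plane followed by
   a half-resolution plane of interleaved U,V pairs. Pitches are in bytes. */
static void rgba_to_nv12(const uint8_t *rgba, size_t rgba_pitch,
                         uint8_t *y_plane, size_t y_pitch,
                         uint8_t *uv_plane, size_t uv_pitch,
                         int width, int height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    for (int y = 0; y < height; y += 2) {
        for (int x = 0; x < width; x += 2) {
            int sum_r = 0, sum_g = 0, sum_b = 0;
            for (int dy = 0; dy < 2; dy++) {
                for (int dx = 0; dx < 2; dx++) {
                    const uint8_t *p = rgba + (size_t)(y + dy) * rgba_pitch + (size_t)(x + dx) * 4;
                    y_plane[(size_t)(y + dy) * y_pitch + (size_t)(x + dx)] = rgb_to_y(p[0], p[1], p[2]);
                    sum_r += p[0];
                    sum_g += p[1];
                    sum_b += p[2];
                }
            }
            /* Chroma is sampled once per 2x2 block from the rounded average color. */
            int r = (sum_r + 2) / 4, g = (sum_g + 2) / 4, b = (sum_b + 2) / 4;
            uint8_t *uv = uv_plane + (size_t)(y / 2) * uv_pitch + (size_t)x;
            uv[0] = rgb_to_u(r, g, b);
            uv[1] = rgb_to_v(r, g, b);
        }
    }
}
```

If you let NVENC do the conversion instead, none of this is needed. The sketch mainly shows that the conversion is cheap, independent per block, and therefore easy to run on the GPU.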

3. Configure FFmpeg’s NVENC Encoder for Direct GPU Input

FFmpeg’s NVENC encoder supports hardware frame input directly from CUDA memory. Here’s how to set it up:

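Ensure FFmpeg is compiled with NVENC support: The build needs the nv-codec-headers (ffnvcodec) available at configure time. Recent FFmpeg versions then detect NVENC automatically, or you can request it explicitly with --enable-nvenc. The --enable-cuvid flag only affects hardware decoding. To verify, check that h264_nvenc appears in the output of ffmpeg -encoders.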

Create a CUDA hardware frame context: This tells FFmpeg to expect CUDA memory pointers. Example code snippet:

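#include <libavutil/hwcontext.h>
#include <libavutil/hwcontext_cuda.h>

// Allocate a CUDA device context and attach the CUcontext used for GL interop.
// av_hwdevice_ctx_alloc returns an AVBufferRef; the context struct lives in ->data.
AVBufferRef *hw_device_ctx = av_hwdevice_ctx_alloc(AV_HWDEVICE_TYPE_CUDA);
AVHWDeviceContext *device_ctx = (AVHWDeviceContext*)hw_device_ctx->data;
AVCUDADeviceContext *cuda_device_ctx = (AVCUDADeviceContext*)device_ctx->hwctx;
cuda_device_ctx->cuda_ctx = your_cu_context; // placeholder: the CUcontext created on your GL device
av_hwdevice_ctx_init(hw_device_ctx);         // returns < 0 on failure

// Allocate the hardware frame context (a pool of CUDA frames)
AVBufferRef *hw_frames_ctx = av_hwframe_ctx_alloc(hw_device_ctx);
AVHWFramesContext *frames_ctx = (AVHWFramesContext*)hw_frames_ctx->data;
frames_ctx->format = AV_PIX_FMT_CUDA;
frames_ctx->sw_format = AV_PIX_FMT_RGB0; // RGBA byte order with alpha ignored; match your OpenGL format
frames_ctx->width = your_frame_width;
frames_ctx->height = your_frame_height;
av_hwframe_ctx_init(hw_frames_ctx);      // returns < 0 on failure

// The encoder must reference the same frames context before avcodec_open2()
codec_ctx->pix_fmt = AV_PIX_FMT_CUDA;
codec_ctx->hw_frames_ctx = av_buffer_ref(hw_frames_ctx);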

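Package the CUDA memory into an AVFrame: For each frame, take an AVFrame from the CUDA frame pool and copy the mapped OpenGL image into it on the GPU. A CUarray does not expose a device pointer (CUDA_ARRAY_DESCRIPTOR only reports width, height, format, and channel count), so the frame cannot point at the array directly:

AVFrame *frame = av_frame_alloc();
// Allocates linear CUDA memory and fills in format, width, height,
// data[], linesize[] and hw_frames_ctx for us
av_hwframe_get_buffer(hw_frames_ctx, frame, 0);

// Map the registered GL image and fetch its backing CUDA array
CUarray cu_array;
cuGraphicsMapResources(1, &your_registered_resource, 0);
cuGraphicsSubResourceGetMappedArray(&cu_array, your_registered_resource, 0, 0);

// Device-to-device 2D copy from the array into the frame (no CPU involved)
CUDA_MEMCPY2D copy = {0};
copy.srcMemoryType = CU_MEMORYTYPE_ARRAY;
copy.srcArray = cu_array;
copy.dstMemoryType = CU_MEMORYTYPE_DEVICE;
copy.dstDevice = (CUdeviceptr)frame->data[0];
copy.dstPitch = frame->linesize[0];
copy.WidthInBytes = your_frame_width * 4; // 4 bytes per RGBA pixel
copy.Height = your_frame_height;
cuMemcpy2D(&copy);

cuGraphicsUnmapResources(1, &your_registered_resource, 0);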

Send the frame directly to FFmpeg: Set frame->pts, then call avcodec_send_frame(codec_ctx, frame) and drain the output with avcodec_receive_packet. The NVENC encoder reads straight from CUDA memory, so no CPU copy is required.
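One detail about the frame setup is worth spelling out. On a CUDA frame, linesize[0] is a row pitch in bytes and may be larger than width × 4 because of alignment padding, so a copy into the frame has to go row by row (a 2D copy) instead of as one flat block. The plain C sketch below mimics that pitched-copy behavior on the CPU purely for illustration; copy_pitched_2d is a hypothetical helper, not a CUDA or FFmpeg function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `width_in_bytes` bytes between two buffers whose rows
   start `src_pitch` and `dst_pitch` bytes apart. Padding bytes between the end
   of a row and the start of the next one are left untouched. */
static void copy_pitched_2d(uint8_t *dst, size_t dst_pitch,
                            const uint8_t *src, size_t src_pitch,
                            size_t width_in_bytes, size_t height)
{
    assert(width_in_bytes <= dst_pitch && width_in_bytes <= src_pitch);
    for (size_t row = 0; row < height; row++)
        memcpy(dst + row * dst_pitch, src + row * src_pitch, width_in_bytes);
}
```

A flat memcpy of width × 4 × height bytes would only be correct when both pitches equal the row width, which is why the GPU-side copy takes separate pitch, width-in-bytes, and height parameters.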

4. Critical Cleanup Steps

Always unmap the CUDA resource with cuGraphicsUnmapResources after encoding each frame to release access back to OpenGL.
When shutting down, unregister the OpenGL resource from CUDA with cuGraphicsUnregisterResource.
Properly free FFmpeg’s AVFrame, hardware frame context, and device context to avoid memory leaks.

Key Notes

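Driver & Hardware Compatibility: Use a GPU that supports both CUDA and NVENC (most modern NVIDIA GPUs do) and a recent NVIDIA driver. The minimum driver version depends on the nv-codec-headers version your FFmpeg build was compiled against, so check that release’s notes if the encoder fails to open.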
Format Matching: Align your OpenGL frame format with NVENC’s supported input formats to minimize on-GPU conversion overhead.
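Latency Optimization: For low-latency use cases, configure FFmpeg’s NVENC encoder with a low-latency tuning and no lookahead. Recent FFmpeg builds use the p1 to p7 presets combined with tune=ll (low latency) or tune=ull (ultra-low latency); the older preset=llhp (low-latency high-performance) is deprecated there but still applies to older builds. Also set rc-lookahead=0 and delay=0 to skip unnecessary buffering.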

The question comes from Stack Exchange; it was asked by Ian A McElhenny.