
Help needed: unexpected behavior when using Metal Tier 2 argument buffers

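Root cause analysis
  • Incorrectly encoded argument buffer: buffer1.gpuAddress and buffer2.gpuAddress are stored in the argument buffer as raw 64-bit integers. Unless the project targets Metal 3, where writing gpuAddress values directly into an argument buffer is supported, the in-memory representation of pointer members (such as constant float * and device float *) is defined by the Metal implementation and has to be written through MTLArgumentEncoder before the shader can resolve it correctly.
  • Undefined behavior spreads: when the shader reads the mis-encoded args.output, the result is undefined behavior. This not only makes the write to buffer2 fail but can also corrupt later memory accesses. The read through args.input uses the same mis-encoded buffer, which is why buffer4 is still not written correctly after args.output[index] = input[index]; is commented out.
  • Changing the address space does not help: changing the address space of kernel_args cannot fix the problem, because the fault lies in how the argument buffer is filled, not in a mismatched address space declaration.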
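
To make the first point concrete, the manual fill it describes amounts to something like the following host-side sketch. This is only an illustration: RawKernelArgs and pack_raw_args are hypothetical names that do not appear in the original code, and the layout assumes each pointer member of kernel_args occupies one 64-bit slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side mirror of the shader's kernel_args struct,
// assuming each pointer member occupies one 64-bit slot.
struct RawKernelArgs {
    std::uint64_t input;   // would hold buffer1.gpuAddress
    std::uint64_t output;  // would hold buffer2.gpuAddress
};

static_assert(sizeof(RawKernelArgs) == 16, "two 64-bit slots, no padding");
static_assert(offsetof(RawKernelArgs, output) == 8, "output follows input");

// Builds the byte image that a manual fill would copy into the argument
// buffer. Nothing here tells Metal which resources the addresses refer to.
std::vector<unsigned char> pack_raw_args(std::uint64_t input_address,
                                         std::uint64_t output_address) {
    RawKernelArgs args{input_address, output_address};
    std::vector<unsigned char> bytes(sizeof(args));
    std::memcpy(bytes.data(), &args, sizeof(args));
    assert(bytes.size() == 16);
    return bytes;
}
```

The byte image carries nothing but two numbers, so whether the shader can use them depends entirely on the layout and residency rules of the Metal version in use. The solution below avoids that dependence by letting MTLArgumentEncoder write the buffer.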
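Solution

Use MTLArgumentEncoder to encode the argument buffer correctly, as follows:

1. Modify the host code (replace the argument buffer creation logic)

Remove the code that fills buffer3 by hand and let MTLArgumentEncoder produce a valid argument buffer instead: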

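// Replace the original manual creation of buffer3 with:
// Create an argument encoder from the function's argument at buffer index 0
id<MTLArgumentEncoder> argEncoder = [function newArgumentEncoderWithBufferIndex:0];
// Allocate the argument buffer; the encoder reports the required size
id<MTLBuffer> buffer3 = [device newBufferWithLength:argEncoder.encodedLength
                                            options:MTLResourceStorageModeShared |
                                                    MTLResourceCPUCacheModeWriteCombined];
// Set buffer3 as the destination the encoder writes into
[argEncoder setArgumentBuffer:buffer3 offset:0];
// Encode buffer1 into the input member of kernel_args (index 0)
[argEncoder setBuffer:buffer1 offset:0 atIndex:0];
// Encode buffer2 into the output member of kernel_args (index 1)
[argEncoder setBuffer:buffer2 offset:0 atIndex:1];
// At dispatch time, buffers reached only through the argument buffer must also
// be made resident with useResource:usage: on the compute command encoder.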

2. Complete modified host code

#import <Metal/Metal.h>
#include <iostream>
#include <vector>

int main(int argc, const char * argv[]) {
    std::vector<float> temp = std::vector<float> (1024*100, 5.0);

    id<MTLDevice> device = [MTLCopyAllDevices() objectAtIndex:0];
    id<MTLCommandQueue> queue = [device newCommandQueue];

    id<MTLLibrary> library = [device newDefaultLibrary];
    id<MTLFunction> function = [library newFunctionWithName:@"threadgroup_test"];

    MTLComputePipelineDescriptor *compute = [MTLComputePipelineDescriptor new];
    compute.threadGroupSizeIsMultipleOfThreadExecutionWidth = YES;
    compute.computeFunction = function;
    compute.maxTotalThreadsPerThreadgroup = 1024;
    compute.buffers[0].mutability = MTLMutabilityImmutable;
    compute.buffers[1].mutability = MTLMutabilityImmutable;
    compute.buffers[2].mutability = MTLMutabilityMutable;

    NSError *error = nil;
    id<MTLComputePipelineState> state = [device newComputePipelineStateWithDescriptor:compute
                                                                              options:MTLPipelineOptionNone
                                                                           reflection:NULL
                                                                                error:&error];
    // Check the returned object rather than the error pointer, and stop on failure.
    if (!state) {
        NSLog(@"%@", error);
        return 1;
    }
    id<MTLBuffer> buffer1 = [device newBufferWithBytes:temp.data()
                                                length:temp.size()*sizeof(float)
                                               options:MTLResourceStorageModeShared |
                                                       MTLResourceCPUCacheModeWriteCombined];
    id<MTLBuffer> buffer2 = [device newBufferWithLength:temp.size()*sizeof(float)
                                                options:MTLResourceStorageModeShared];

    // --- Modified argument buffer creation ---
    id<MTLArgumentEncoder> argEncoder = [function newArgumentEncoderWithBufferIndex:0];
    id<MTLBuffer> buffer3 = [device newBufferWithLength:argEncoder.encodedLength
                                                options:MTLResourceStorageModeShared |
                                                        MTLResourceCPUCacheModeWriteCombined];
    [argEncoder setArgumentBuffer:buffer3 offset:0];
    [argEncoder setBuffer:buffer1 offset:0 atIndex:0];
    [argEncoder setBuffer:buffer2 offset:0 atIndex:1];
    // --- End of modification ---

    id<MTLBuffer> buffer4 = [device newBufferWithLength:temp.size()*sizeof(float)
                                                options:MTLResourceStorageModeShared];

    NSUInteger threads_per_group = state.maxTotalThreadsPerThreadgroup;
    NSUInteger thread_groups = temp.size()/threads_per_group;
    
    id<MTLCommandBuffer> command_buffer = [queue commandBuffer];

    id<MTLComputeCommandEncoder> encoder = [command_buffer computeCommandEncoderWithDispatchType:MTLDispatchTypeSerial];
        
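    [encoder setComputePipelineState:state];
    [encoder setBuffer:buffer3 offset:0 atIndex:0];
    [encoder setBuffer:buffer1 offset:0 atIndex:1];
    [encoder setBuffer:buffer4 offset:0 atIndex:2];
    // Buffers referenced through the argument buffer must be made resident explicitly.
    [encoder useResource:buffer1 usage:MTLResourceUsageRead];
    [encoder useResource:buffer2 usage:MTLResourceUsageWrite];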

    [encoder dispatchThreadgroups:MTLSizeMake(thread_groups, 1, 1)
            threadsPerThreadgroup:MTLSizeMake(threads_per_group, 1, 1)];
    [encoder endEncoding];

    [command_buffer commit];
    [command_buffer waitUntilCompleted];

    const float *data1 = static_cast<float *> (buffer2.contents);
    const float *data2 = static_cast<float *> (buffer4.contents);
    for (size_t i = 0, ie = temp.size(); i < ie; i++) {
        std::cout << i << " " << temp[i] << " " << *(data1 + i) << " " << *(data2 + i) << std::endl;
    }

    return 0;
}
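
The host code above computes the threadgroup count with truncating division, which covers every element only because 1024*100 elements divide evenly by the threadgroup size. As a minimal sketch of the arithmetic (threadgroups_for and divides_evenly are hypothetical helpers, not part of the original code), a rounded-up count could be computed like this:

```cpp
#include <cassert>
#include <cstddef>

// Number of threadgroups needed to cover element_count elements, rounding up
// so that a partial final group is still dispatched.
std::size_t threadgroups_for(std::size_t element_count,
                             std::size_t threads_per_group) {
    assert(threads_per_group > 0);
    return (element_count + threads_per_group - 1) / threads_per_group;
}

// True when truncating division (as used in the host code) already covers
// every element, i.e. no partial group is needed.
bool divides_evenly(std::size_t element_count, std::size_t threads_per_group) {
    assert(threads_per_group > 0);
    return element_count % threads_per_group == 0;
}
```

With 1024*100 elements and 1024 threads per group, both approaches give 100 threadgroups. Note that state.maxTotalThreadsPerThreadgroup may be smaller than 1024 on some devices, and that the shader as written has no bounds check, so rounding up would additionally require passing the element count to the kernel and guarding on index.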

3. The shader code needs no changes (keep the original version)

#include <metal_stdlib>
using namespace metal;

struct kernel_args {
    constant float *input;
    device float *output;
};

kernel void threadgroup_test(constant kernel_args &args [[buffer(0)]],
                             constant float *input [[buffer(1)]],
                             device float *output [[buffer(2)]],
                             uint index [[thread_position_in_grid]]) {
    args.output[index] = input[index];
    output[index] = args.input[index];
}
How it works

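MTLArgumentEncoder follows Metal's argument buffer rules and writes each resource reference in the encoding the shader expects, so the host code does not have to assume how a pointer member is laid out in memory. The shader can then resolve the pointers in the argument buffer and perform the intended reads and writes. Buffers that the shader reaches only through an argument buffer must additionally be made resident by calling useResource:usage: on the compute command encoder.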

The question originates from Stack Exchange; asked by user1139069.
