Hugging Face API call fails: troubleshooting the code and providing a fix

阿华AIGC实验室

2026-4-14

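It looks like the core problem is that your input does not match the format the Llama-3.3 Instruct model expects, which makes the API return a 400 Bad Request error. Let me walk through the analysis and fix the code step by step: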

Error analysis

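Instruction-tuned chat models such as Llama must receive input built with the official chat template; a plain-text question cannot be passed in directly. In addition, your request payload is missing some key parameters the model needs, which is another possible cause of the 400 error.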
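To make the template requirement concrete, here is a minimal sketch that renders a list of role/content messages into a single prompt string. The helper name build_llama3_prompt is hypothetical, and the blank line after each header follows the published Llama 3 prompt format as I understand it, so check it against the model card before relying on it.

```python
def build_llama3_prompt(messages):
    """Render [{'role': ..., 'content': ...}] into a Llama 3 style prompt string."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn marker.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model continues from this point.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful, honest and concise assistant."},
    {"role": "user", "content": "What is the biggest animal?"},
]
prompt = build_llama3_prompt(messages)
print(prompt)
```

Building the string from a message list keeps the special tokens in one place, so adding a turn does not require hand-editing the template.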

Complete fixed code

import requests

# Corrected API URL (must include the model's full namespace)
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct"
API_TOKEN = "hf_********************************"

headers = {
    "Authorization": f"Bearer {API_TOKEN}"
}

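# Build the input in the chat format that Llama-3.3 Instruct expects
# Template per the model's official documentation: role headers for system and user, plus separators;
# each header is followed by a blank line ("\n\n")
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful, honest and concise assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "What is the biggest animal?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)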

payload = {
    "inputs": prompt,
    # Add generation parameters to make the response more controllable
    "parameters": {
        "max_new_tokens": 100,
        "temperature": 0.7,
        "top_p": 0.9
    }
}

def query_huggingface_model(api_url, headers, payload):
    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()  # Raise an exception on 4xx/5xx responses
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None

# Send the request and handle the response
response = query_huggingface_model(API_URL, headers, payload)
if response:
    # Extract and format the response (the returned text contains the generated continuation)
    if isinstance(response, list) and len(response) > 0:
        print("Model Response:", response[0].get("generated_text", "").split("<|start_header_id|>assistant<|end_header_id|>")[-1].strip())
    else:
        print("Model Response:", response)
else:
    print("Failed to get a response from the model.")
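The code above prints only the exception message, which usually hides the reason the server gave for rejecting the request. The sketch below shows one way to surface that reason. The helper name describe_error is hypothetical, and the assumption that error bodies are JSON objects with an "error" key should be verified against the responses you actually receive; the sample bodies here are made up for illustration.

```python
import json


def describe_error(status_code, body_text):
    """Summarize a failed HTTP response, preferring the server's own error message."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = None
    if isinstance(body, dict) and "error" in body:
        # Assumed shape: {"error": "..."}; fall back to raw text otherwise.
        detail = body["error"]
    else:
        detail = body_text.strip() or "<empty body>"
    return f"HTTP {status_code}: {detail}"


# Sample bodies are invented for illustration, not captured API responses.
print(describe_error(400, '{"error": "example validation message"}'))
print(describe_error(400, "Bad Request"))
```

Inside query_huggingface_model, you could catch requests.exceptions.HTTPError separately and pass e.response.status_code and e.response.text to this helper before returning None.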

Key fixes explained

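Input format fix:
The plain-text question is wrapped in the chat template that Llama-3.3 requires, including the role headers for the system prompt and the user question and the officially specified separators (for example, <|eot_id|> marks the end of a turn). This is the core of fixing the 400 error.
Added generation parameters:
A parameters field is added to the payload, setting max_new_tokens (caps the response length), temperature (controls sampling randomness), and top_p (nucleus sampling threshold), so the output is more predictable and missing parameters can be ruled out as a cause of API validation failures.
Improved response handling:
New extraction logic pulls out only the assistant's generated part, so the output is not cluttered with template text.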
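The response-handling step can be pulled into a small function so it is easy to check without calling the API. This is a sketch under assumptions: the name extract_assistant_reply is hypothetical, the response shape (a list of dicts with a generated_text key) mirrors what the code above expects, and the sample response is made up rather than real model output.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"


def extract_assistant_reply(response):
    """Return the assistant's text from a text-generation style response, or None."""
    if not isinstance(response, list) or not response:
        return None
    first = response[0]
    if not isinstance(first, dict):
        return None
    text = first.get("generated_text", "")
    # Keep only what follows the last assistant header (unchanged if absent).
    reply = text.rsplit(ASSISTANT_HEADER, 1)[-1]
    # Drop an end-of-turn marker if the server left one in the text.
    return reply.replace("<|eot_id|>", "").strip()


# Invented response for illustration only.
sample = [{"generated_text": ASSISTANT_HEADER + "\n\nThe blue whale.<|eot_id|>"}]
print(extract_assistant_reply(sample))
```

Returning None for unexpected shapes (such as an error dict) lets the caller fall back to printing the raw response, as the original code does.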