Node.js: max_tokens truncation of OpenRouter/OpenAI API responses causes a SyntaxError in JSON.parse (help wanted)

阿华AIGC实验室

2026-3-30

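I'm calling an LLM through OpenRouter from an Express service (the API is largely compatible with OpenAI's) and asking for a structured JSON response. The problem: when the model's output exceeds the max_tokens limit, the response is cut off midway and JSON.parse() throws SyntaxError: Unexpected end of JSON input.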

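The failure looks random: sometimes the JSON is complete, sometimes it is cut off in the middle of a string, a key name, or an array element, depending on whether the output happens to fit within the token budget.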

Minimal reproduction

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o-mini',
    max_tokens: 600,
    messages: [
      { role: 'system', content: 'Return valid JSON only. No markdown.' },
      { role: 'user', content: 'Analyze this URL and return a JSON object with score, category_grades (object), and recommendations (array of 10 objects with title and description).' }
    ],
  }),
});

const data = await response.json();
const content = data.choices[0].message.content;

// Throws when max_tokens has truncated the output
const parsed = JSON.parse(content);

Examples of truncated responses

A truncated response string may look like this (cut off in the middle of a value):

{
  "score": 62,
  "category_grades": {
    "content": "B",
    "schema": "D",
    "headings": "C"
  },
  "recommendations": [
    {"title": "Add FAQ schema", "description": "Implement FAQPage structured data to improve ex

Or like this (cut off in the middle of a key name):

{
  "score": 74,
  "category_grades": {"content": "B", "schema": "C", "headings": "B-", "meta

Worse, the model sometimes ignores the "no markdown" instruction and wraps the JSON in a ```json fence, which also makes parsing fail.
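Both failure modes can be caught before JSON.parse() is ever called. The sketch below assumes the OpenAI-compatible response shape, where each entry in choices carries a finish_reason that is 'length' when max_tokens cut the output short. The helper name extractJsonText is hypothetical, not part of any SDK.

```javascript
// Hypothetical helper: inspects one entry of `choices` before parsing.
function extractJsonText(choice) {
  // OpenAI-compatible APIs report finish_reason === 'length'
  // when the output was cut off by max_tokens.
  if (choice.finish_reason === 'length') {
    throw new Error('Response truncated by max_tokens');
  }
  const text = (choice.message.content ?? '').trim();
  // Strip an optional leading ```json (or bare ```) fence and a trailing ``` fence.
  return text.replace(/^```(?:json)?\s*/i, '').replace(/\s*```$/, '');
}
```

Typical use would be JSON.parse(extractJsonText(data.choices[0])), with the thrown error handled by whatever repair or retry strategy is chosen.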

Suggested solutions

1. Strengthen the system prompt so the model prioritizes JSON completeness

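State explicitly in the system prompt that when tokens are tight, the model should keep the JSON structurally complete by shrinking the content (fewer recommendations, shorter descriptions) rather than emit truncated JSON. The model cannot see the max_tokens value, so this lowers the odds of truncation but does not rule it out: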

{
  role: 'system',
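  content: 'Return **structurally complete, valid JSON** and never emit truncated output. If the token budget is not enough for 10 recommendations, return fewer; if descriptions run long, shorten them. Do not use markdown or add any extra text.'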
}

2. Repair truncated JSON with a library

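The jsonrepair library can repair incomplete JSON automatically and handles most common truncation cases (adding missing brackets and quotes, closing unfinished objects and arrays, and so on). A repaired document is syntactically valid but may still be missing the data that was cut off.
Install the dependency first: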

npm install jsonrepair

Then use it in your code:

import { jsonrepair } from 'jsonrepair';

// Strip a possible markdown fence once, up front
const cleanedContent = content.trim().replace(/^```(?:json)?\s*|\s*```$/g, '');

let parsed; // declared outside the try blocks so it stays in scope afterwards
try {
  parsed = JSON.parse(cleanedContent);
} catch (parseError) {
  try {
    // jsonrepair returns a repaired JSON string, which still needs parsing
    parsed = JSON.parse(jsonrepair(cleanedContent));
  } catch (repairError) {
    console.error('Failed to repair JSON:', repairError);
    // Retry logic can go here
  }
}
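Because a repaired object can be valid JSON yet incomplete, it may be worth checking its shape before using it. The following is a minimal sketch, assuming the object layout requested in the prompt above (score, category_grades, recommendations). The function name normalizeAnalysis and the partial flag are hypothetical. It cannot detect a description string that was cut mid-sentence and then closed by the repair step.

```javascript
// Hypothetical validator for the object shape requested in the prompt.
function normalizeAnalysis(parsed, expectedCount = 10) {
  if (typeof parsed !== 'object' || parsed === null || typeof parsed.score !== 'number') {
    throw new Error('Repaired JSON is missing a numeric score');
  }
  const recs = Array.isArray(parsed.recommendations) ? parsed.recommendations : [];
  // Keep only entries where both fields are present as strings.
  const complete = recs.filter(
    (r) => r && typeof r.title === 'string' && typeof r.description === 'string'
  );
  return {
    score: parsed.score,
    category_grades: parsed.category_grades ?? {},
    recommendations: complete,
    // True when fewer complete recommendations arrived than were requested.
    partial: complete.length < expectedCount,
  };
}
```

A caller could accept a partial result as-is, or treat partial === true as a signal to retry with a larger token budget.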

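3. Use function calling to enforce structured output

OpenAI and many other LLM providers support function calling (tools). With an output schema defined, the model is steered toward returning arguments that match the structure, which is more reliable than prompt instructions alone. The result arrives as a JSON string in message.tool_calls[0].function.arguments rather than in message.content, and it can still be truncated if max_tokens is too low: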

body: JSON.stringify({
  model: 'openai/gpt-4o-mini',
  max_tokens: 600,
  messages: [
    { role: 'user', content: 'Analyze this URL and return the required data.' }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'analyze_url',
        parameters: {
          type: 'object',
          properties: {
            score: { type: 'integer' },
            category_grades: {
              type: 'object',
              properties: {
                content: { type: 'string' },
                schema: { type: 'string' },
                headings: { type: 'string' }
              },
              required: ['content', 'schema', 'headings']
            },
            recommendations: {
              type: 'array',
              items: {
                type: 'object',
                properties: {
                  title: { type: 'string' },
                  description: { type: 'string' }
                },
                required: ['title', 'description']
              }
            }
          },
          required: ['score', 'category_grades', 'recommendations']
        }
      }
    }
  ],
  tool_choice: { type: 'function', function: { name: 'analyze_url' } }
})
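With a forced tool call, the payload is read from the tool call rather than from message.content. The sketch below assumes the OpenAI-compatible response shape (choices[0].message.tool_calls[n].function.arguments as a JSON string). The helper name parseToolArguments is hypothetical.

```javascript
// Hypothetical helper: extracts and parses the arguments of a forced tool call.
function parseToolArguments(data, expectedName) {
  const choice = data.choices?.[0];
  const call = choice?.message?.tool_calls?.[0];
  if (!call || call.function.name !== expectedName) {
    throw new Error('Model did not call ' + expectedName);
  }
  // Tool-call arguments count toward max_tokens too, so they can be cut off.
  if (choice.finish_reason === 'length') {
    throw new Error('Tool call arguments truncated by max_tokens');
  }
  // `arguments` is a JSON-encoded string, not an object.
  return JSON.parse(call.function.arguments);
}
```

For the request above, this would be called as parseToolArguments(data, 'analyze_url').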

4. Compute and adjust max_tokens dynamically

Use the tiktoken library to count the prompt's tokens first, then set a reasonable max_tokens based on the model's limits:

npm install tiktoken

Then compute it in code:

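import { encoding_for_model } from 'tiktoken';

// `messages` is the same array that is sent in the request body
const enc = encoding_for_model('gpt-4o-mini');
// Approximation: counts the serialized messages, not the exact chat format
const promptTokens = enc.encode(JSON.stringify(messages)).length;
enc.free(); // release the WASM-backed encoder

// gpt-4o-mini has a 128k context window, but output is capped separately
// (16,384 tokens at the time of writing; check the model's documentation)
const maxOutputTokens = Math.min(16384, 128000 - promptTokens);

// Set max_tokens to this value or lower, leaving some headroom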

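5. Retry automatically when parsing fails

When JSON parsing fails, call the model again and tell it that the previous response was truncated and that it must return complete JSON: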

async function getUrlAnalysis() {
  let attempt = 0;
  const maxAttempts = 3;
  let messages = [
    { role: 'system', content: 'Return structurally complete, valid JSON. No markdown.' },
    { role: 'user', content: 'Analyze this URL and return a JSON object with score, category_grades, and recommendations (array of 10 objects).' }
  ];

  while (attempt < maxAttempts) {
    try {
      const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
        method: 'POST',
        headers: { /* request headers */ },
        body: JSON.stringify({
          model: 'openai/gpt-4o-mini',
          max_tokens: 800 + attempt * 400, // raise the token budget on each retry
          messages: messages
        })
      });
      const data = await response.json();
      const content = data.choices[0].message.content;
      const cleanedContent = content.replace(/^```json\s*|\s*```$/g, '');
      const parsed = JSON.parse(cleanedContent);
      return parsed;
    } catch (e) {
      attempt++;
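      if (attempt >= maxAttempts) throw new Error('Could not get complete JSON after multiple retries');
      // Append a message telling the model its previous response was truncated
      messages.push({
        role: 'user',
        content: 'Your previous response was truncated and the JSON was incomplete. Please regenerate **complete, valid JSON** with nothing cut off.'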
      });
    }
  }
}
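A variant of the retry idea is to raise max_tokens only when the API itself reports truncation, instead of retrying on any parse error. This is a sketch under assumptions: completeWithBudget and the callModel parameter are hypothetical names, callModel(maxTokens) is expected to send the request and resolve to an OpenAI-style response body, and the default budgets are arbitrary.

```javascript
// Hypothetical helper: `callModel(maxTokens)` must resolve to an
// OpenAI-style response body ({ choices: [{ finish_reason, message }] }).
async function completeWithBudget(callModel, { start = 600, limit = 4800, factor = 2 } = {}) {
  for (let maxTokens = start; maxTokens <= limit; maxTokens *= factor) {
    const data = await callModel(maxTokens);
    const choice = data.choices[0];
    // 'length' means the output hit max_tokens; try again with a larger budget.
    if (choice.finish_reason !== 'length') {
      return JSON.parse(choice.message.content);
    }
  }
  throw new Error(`Output still truncated at the ${limit}-token limit`);
}
```

Capping the budget with limit keeps a runaway response from multiplying cost indefinitely. A parse error on a non-truncated response still propagates to the caller, where the message-based retry above can take over.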