Document understanding
Last updated: 2025.12.01 16:53:40 | First published: 2025.12.01 16:53:40

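Some models can process PDF documents and use their vision capabilities to understand the context of the whole document. When a PDF is passed in, the model splits the file page by page into multiple images, analyzes the text, images, and other information on each page, and combines that information to complete document-understanding tasks.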

Prerequisites

API

Responses API

Document input methods

The following document input methods are supported:

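  • Local file upload: through the Files API (recommended) or as a Base64-encoded string.
  • File URL input: for files that are already available at a publicly accessible URL. The file must not exceed 50 MB.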
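The size limits stated on this page (50 MB for file URL input, 512 MB through the Files API) can be turned into a simple pre-flight check. The following is a minimal sketch, not part of any SDK: `choose_input_method` is a hypothetical helper, "MB" is assumed to mean MiB, and Base64 input is left out because the page recommends the Files API for local files.

```python
from typing import Optional

MB = 1024 * 1024            # assumption: "MB" on this page is read as MiB
URL_LIMIT = 50 * MB         # limit stated for file URL input
FILES_API_LIMIT = 512 * MB  # limit stated for Files API uploads


def choose_input_method(size_bytes: int, public_url: Optional[str] = None) -> str:
    """Return "file_url" or "files_api" for a PDF of the given size (hypothetical helper)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # A public URL can be passed directly only while the file is within 50 MB.
    if public_url and size_bytes <= URL_LIMIT:
        return "file_url"
    # Otherwise fall back to the recommended Files API upload.
    if size_bytes <= FILES_API_LIMIT:
        return "files_api"
    raise ValueError("file exceeds the 512 MB Files API limit")
```

For example, a 100 MB file that has a public URL still resolves to `"files_api"`, because it is over the URL limit.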

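Local file upload

Files API upload (recommended)

We recommend uploading local files through the Files API first. It supports files up to 512 MB, avoids re-uploading the content with every request, reduces the latency caused by preprocessing, and lets the same file be reused across multiple requests, which saves public-network download latency. For how file preprocessing works, see Appendix: File preprocessing.

  • Files uploaded this way are stored for 7 days by default. The storage period can be set to between 1 and 30 days.
  • If you need the analysis results in real time, or want to avoid client timeouts caused by complex tasks, use streaming output. See Streaming output for an example.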
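The retention rule above (7 days by default, 1 to 30 days allowed) can be checked on the client before uploading. This is only a sketch: `expiry_timestamp` is a hypothetical helper, and the request field that carries the retention setting is not given on this page, so the code only validates the range and computes a Unix timestamp.

```python
import time
from typing import Optional

DEFAULT_RETENTION_DAYS = 7   # default stated above
MIN_RETENTION_DAYS = 1       # lower bound stated above
MAX_RETENTION_DAYS = 30      # upper bound stated above
SECONDS_PER_DAY = 86400


def expiry_timestamp(days: int = DEFAULT_RETENTION_DAYS,
                     now: Optional[float] = None) -> int:
    """Return the Unix time at which a file stored for `days` days would expire."""
    if not MIN_RETENTION_DAYS <= days <= MAX_RETENTION_DAYS:
        raise ValueError("retention must be between 1 and 30 days")
    start = time.time() if now is None else now
    return int(start) + days * SECONDS_PER_DAY
```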

Code example:

  1. Upload the PDF file to obtain a File ID.

    curl https://ark.cn-beijing.volces.com/api/v3/files \
    -H "Authorization: Bearer $ARK_API_KEY" \
    -F 'purpose=user_data' \
    -F 'file=@/Users/doc/demo.pdf'
    
  2. Reference the File ID in the Responses API.

    curl https://ark.cn-beijing.volces.com/api/v3/responses \
    -H "Authorization: Bearer $ARK_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "doubao-seed-1-6-251015",
        "input": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_file",
                        "file_id": "file-20251018****"
                    },
                    {
                        "type": "input_text",
                        "text": "Extract the text of the document paragraph by paragraph and output it as JSON, including each paragraph type (type) and text content (content)."
                    }
                ]
            }
        ]
    }'
    

Output example:

{
    "text": [
        {
            "type": "heading",
            "content": "1 Introduction"
        },
        {
            "type": "paragraph",
            "content": "Diffusion models [3–5] learn to reverse a process that incrementally corrupts data with noise, effectively decomposing a complex distribution into a hierarchy of simplified representations. This coarse-to-fine generative approach has proven remarkably successful across a wide range of applications, including image and video synthesis [6] as well as solving complex challenges in natural sciences [7]."
        },
        ...
        {
            "type": "heading",
            "content": "3 Seed Diffusion"
        },
        {
            "type": "paragraph",
            "content": "As the first experimental model in our Seed Diffusion series, Seed Diffusion Preview is specifically focused on code generation, thus adopting the data pipeline (code/code-related data only) and processing methodology of the open-sourced Seed Coder project [20]. The architecture is a standard dense Transformer, and we intentionally omit complex components such as LongCoT reasoning in this initial version to first establish a strong and efficient performance baseline. This section introduces its key components and training strategies."
        }
    ]
}
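Once the model returns JSON in the shape shown above, it can be post-processed with the standard library. The sketch below assumes the response text is a JSON object with a top-level "text" list, as in the example; `extract_by_type` and `SAMPLE` are illustrative names, and the fence-stripping step is a defensive assumption in case the model wraps its answer in a Markdown code fence.

```python
import json
from typing import List

FENCE = "`" * 3  # Markdown code-fence marker

# Shortened, valid stand-in for the output example above.
SAMPLE = json.dumps({
    "text": [
        {"type": "heading", "content": "1 Introduction"},
        {"type": "paragraph", "content": "(paragraph text)"},
        {"type": "heading", "content": "3 Seed Diffusion"},
    ]
})


def extract_by_type(model_output: str, wanted: str) -> List[str]:
    """Return the content of every item whose "type" equals `wanted`."""
    text = model_output.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    return [item["content"] for item in data.get("text", [])
            if item.get("type") == wanted]
```

Calling `extract_by_type(SAMPLE, "heading")` returns the two heading strings in document order.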

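Base64-encoded input

Convert the local file to a Base64-encoded string and submit it to the model. This method suits small documents: the file must not exceed 50 MB, and the request body must not exceed 64 MB.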
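The two limits interact, because Base64 encoding turns every 3 bytes into 4 characters. The arithmetic below is a sketch under stated assumptions: "MB" is read as MiB, the limits are applied exactly as written, and the rest of the JSON body is ignored. How the service actually measures either limit is not specified here.

```python
MB = 1024 * 1024  # assumption: "MB" on this page is read as MiB


def base64_length(raw_bytes: int) -> int:
    """Number of characters in the padded Base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_size_for_body(body_limit: int, overhead: int = 0) -> int:
    """Largest raw file size whose Base64 form plus `overhead` fits in `body_limit`."""
    return 3 * ((body_limit - overhead) // 4)


# A file at the 50 MB limit encodes to about 66.7 MB, which is over 64 MB.
encoded_50mb = base64_length(50 * MB)
# Under these assumptions the request-body limit is reached first, at 48 MB.
practical_limit = max_raw_size_for_body(64 * MB)
```

Under these assumptions, files between roughly 48 MB and 50 MB are better sent through the Files API or a file URL.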

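Note

Convert the document to a Base64-encoded string, then concatenate it in the format data:{mime_type};base64,{base64_data} and pass it to the model.

  • {mime_type}: the media type of the file. It must match the file format (application/pdf).
  • {base64_data}: the Base64-encoded string of the file.
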
# Encode the PDF and strip line breaks so the string stays valid inside the JSON body.
# When file_data is used, the filename parameter is required.
BASE64_FILE=$(base64 < demo.pdf | tr -d '\n') && curl https://ark.cn-beijing.volces.com/api/v3/responses \
   -H "Content-Type: application/json"  \
   -H "Authorization: Bearer $ARK_API_KEY"  \
   -d @- <<EOF
{
  "model": "doubao-seed-1-6-251015",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_file",
          "file_data": "data:application/pdf;base64,$BASE64_FILE",
          "filename": "demo.pdf"
        },
        {
          "type": "input_text",
          "text": "Extract the text of the document paragraph by paragraph and output it as JSON, including each paragraph type (type) and text content (content)."
        }
      ]
    }
  ]
}
EOF
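The same data:{mime_type};base64,{base64_data} string can be built in Python. This is a minimal sketch using only the standard library; `to_data_url` and `build_file_data_part` are hypothetical helper names, and the returned dict simply mirrors the input_file fields used in the curl example above.

```python
import base64
from pathlib import Path


def to_data_url(data: bytes, mime_type: str = "application/pdf") -> str:
    """Encode raw bytes as a data URL in the format described above."""
    # b64encode produces a single line, so no newline stripping is needed.
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_file_data_part(path: str) -> dict:
    """Build an input_file content part from a local PDF path."""
    pdf = Path(path)
    return {
        "type": "input_file",
        "file_data": to_data_url(pdf.read_bytes()),
        "filename": pdf.name,  # required whenever file_data is used
    }
```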

File URL input

If the document is already available at a publicly accessible URL, you can put that URL directly in the Responses API request. The file must not exceed 50 MB.

curl https://ark.cn-beijing.volces.com/api/v3/responses \
-H "Authorization: Bearer $ARK_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "doubao-seed-1-6-251015",
    "input": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_url": "https://ark-project.tos-cn-beijing.volces.com/doc_pdf/demo.pdf"
                },
                {
                    "type": "input_text",
                    "text": "Extract the text of the document paragraph by paragraph and output it as JSON, including each paragraph type (type) and text content (content)."
                }
            ]
        }
    ]
}'
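The three request bodies on this page differ only in which field of the input_file part carries the document: file_id, file_data (with filename), or file_url. The sketch below builds the `input` value for any of them. `build_input` is a hypothetical helper, and requiring exactly one source is a choice made here for clarity, not a documented API rule.

```python
from typing import Any, Dict, List, Optional


def build_input(prompt: str, *,
                file_id: Optional[str] = None,
                file_url: Optional[str] = None,
                file_data: Optional[str] = None,
                filename: Optional[str] = None) -> List[Dict[str, Any]]:
    """Build the `input` list for a Responses API request with one PDF and one prompt."""
    sources = {"file_id": file_id, "file_url": file_url, "file_data": file_data}
    given = {key: value for key, value in sources.items() if value is not None}
    if len(given) != 1:
        raise ValueError("provide exactly one of file_id, file_url, file_data")
    part: Dict[str, Any] = {"type": "input_file", **given}
    if file_data is not None:
        # The Base64 example notes that filename is required with file_data.
        if not filename:
            raise ValueError("filename is required with file_data")
        part["filename"] = filename
    return [{"role": "user",
             "content": [part, {"type": "input_text", "text": prompt}]}]
```

The result can be serialized with `json.dumps` together with the `model` field to form the request body.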

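Streaming output

Streaming output displays content as it is generated. It shortens the perceived wait for users and avoids client timeouts caused by long inference on complex tasks, which keeps the request flow smooth.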

import asyncio
import os
from volcenginesdkarkruntime import AsyncArk
from volcenginesdkarkruntime.types.responses.response_completed_event import ResponseCompletedEvent
from volcenginesdkarkruntime.types.responses.response_reasoning_summary_text_delta_event import ResponseReasoningSummaryTextDeltaEvent
from volcenginesdkarkruntime.types.responses.response_output_item_added_event import ResponseOutputItemAddedEvent
from volcenginesdkarkruntime.types.responses.response_text_delta_event import ResponseTextDeltaEvent
from volcenginesdkarkruntime.types.responses.response_text_done_event import ResponseTextDoneEvent

client = AsyncArk(
    base_url='https://ark.cn-beijing.volces.com/api/v3',
    api_key=os.getenv('ARK_API_KEY')
)

async def main():
    # Upload the PDF file; the with block closes the file handle after the upload.
    print("Upload pdf file")
    # Replace with your local PDF path.
    with open("/Users/doc/demo.pdf", "rb") as pdf:
        file = await client.files.create(
            file=pdf,
            purpose="user_data"
        )
    print(f"File uploaded: {file.id}")

    # Wait for the file to finish processing
    await client.files.wait_for_processing(file.id)
    print(f"File processed: {file.id}")

    stream = await client.responses.create(
        model="doubao-seed-1-6-251015",
        input=[
            {"role": "user", "content": [
                {
                    "type": "input_file",
                    "file_id": file.id  # ref pdf file id
                },
                {
                    "type": "input_text",
                    "text": "Extract the text of the document paragraph by paragraph and output it as JSON, including each paragraph type (type) and text content (content)."
                }
            ]},
        ],
        caching={
            "type": "enabled",
        },
        store=True,
        stream=True
    )
    async for event in stream:
        if isinstance(event, ResponseReasoningSummaryTextDeltaEvent):
            print(event.delta, end="")
        if isinstance(event, ResponseOutputItemAddedEvent):
            print("\noutPutItem " + event.type + " start:")
        if isinstance(event, ResponseTextDeltaEvent):
            print(event.delta,end="")
        if isinstance(event, ResponseTextDoneEvent):
            print("\noutPutTextDone.")
        if isinstance(event, ResponseCompletedEvent):
            print("Response Completed. Usage = " + event.response.usage.model_dump_json())

if __name__ == "__main__":
    asyncio.run(main())

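Appendix: File preprocessing

A PDF file is split page by page into multiple images. During preprocessing the resulting images are not rescaled, so they retain the original information in the PDF completely and clearly. When the images are passed to the model as input, they are scaled automatically according to the auto behavior of the model's input.content.detail parameter.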