doubao-1.5-ui-tars - Volcano Ark Large Model Service Platform - Volcano Engine

doubao-1.5-ui-tars

Last updated: 2025.08.26 15:56:06
First published: 2025.04.17 09:13:59

Model quality: ★★★★★
Speed: ★★★★
Price (CNY per million tokens): 3.5 (input), 12 (output)
Input: text, image (video and audio are not supported)
Output: text (image, video, and audio are not supported)

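An agent model built natively for graphical user interface (GUI) interaction. Using human-like capabilities such as perception, reasoning, and action execution, it interacts with GUIs continuously and fluently.
Unlike traditional modular frameworks, the model integrates all core capabilities (perception, reasoning, and basic understanding) into a single vision-language model (VLM), enabling end-to-end task automation without predefined workflows or hand-written rules.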

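Maximum context length: 128k
Maximum input length: 96k
Maximum chain-of-thought length: 32k
Configurable maximum answer length: 16k
Default maximum answer length: 4k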
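The limits above can be turned into a simple pre-flight check before a request is sent. The following is a minimal sketch, not an official client: it assumes "k" means 1,024 tokens (the page does not say whether it is 1,000 or 1,024), it assumes token counts are obtained elsewhere, and the function name `check_request` is hypothetical.

```python
K = 1024  # assumption: the page does not state whether "k" is 1,000 or 1,024

MAX_INPUT = 96 * K                # maximum input length
MAX_ANSWER_CONFIGURABLE = 16 * K  # configurable maximum answer length
DEFAULT_MAX_ANSWER = 4 * K        # default maximum answer length


def check_request(input_tokens: int, max_answer_tokens: int = DEFAULT_MAX_ANSWER) -> list:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if input_tokens > MAX_INPUT:
        problems.append(
            f"input of {input_tokens} tokens exceeds the {MAX_INPUT}-token input limit"
        )
    if max_answer_tokens > MAX_ANSWER_CONFIGURABLE:
        problems.append(
            f"answer limit of {max_answer_tokens} tokens exceeds "
            f"the configurable maximum of {MAX_ANSWER_CONFIGURABLE}"
        )
    return problems
```

Calling `check_request(1000)` returns an empty list, while an oversized input or answer limit each adds one message to the result.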

See also: notes on model input and output length limits

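Model pricing

Unit: CNY per million tokens

Input: 3.50
Output: 12.00
Cache hit: not applicable
Cache storage (per hour): not applicable
Input (batch): not applicable
Output (batch): not applicable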

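Using context caching incurs cache-hit and cache-storage fees; batch inference incurs input (batch) and output (batch) fees. For details, see the model service pricing page.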
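As a worked example of the online-inference prices listed above (3.50 CNY per million input tokens, 12.00 CNY per million output tokens), the sketch below estimates the cost of a call. The function name `estimate_cost` is hypothetical, and whether chain-of-thought tokens are billed as output is not stated on this page, so the caller decides what to count as output.

```python
from decimal import Decimal

INPUT_PRICE = Decimal("3.50")    # CNY per million input tokens
OUTPUT_PRICE = Decimal("12.00")  # CNY per million output tokens
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in CNY of one or more calls from their token counts."""
    return (INPUT_PRICE * input_tokens + OUTPUT_PRICE * output_tokens) / MILLION
```

For instance, 2,000,000 input tokens and 500,000 output tokens come to 7.00 + 6.00 = 13.00 CNY. `Decimal` is used instead of `float` so that currency amounts are not subject to binary rounding.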

Supported capabilities

Deep thinking

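Model versions

doubao-1.5-ui-tars

doubao-1-5-ui-tars-250428
- Added deep thinking. The `thinking` field in the API controls whether it is enabled (before answering, the model first thinks in depth and outputs its chain of thought, then gives the answer).
- Raised the model's length limits:
  - Maximum context length: raised from 32k to 128k
  - Maximum input length: changed from not applicable to 96k
  - Maximum chain-of-thought length: changed from unsupported to 32k
  - Configurable maximum output length: raised from 4k to 16k
  - Default maximum output length: unchanged at 4k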

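Rate limits

Rate limits cap the number of requests or tokens used within a given time window to keep API access fair and reliable.

TPM (tokens per minute): 5,000,000

RPM (requests per minute): 30,000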
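A client can stay under these limits by tracking its own usage. The sketch below is one possible client-side guard, assuming a 60-second sliding window; how the service actually measures the window and counts tokens is not described on this page, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard that tracks requests and tokens over the last 60 seconds."""

    def __init__(self, rpm: int = 30_000, tpm: int = 5_000_000, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.events = deque()   # one (timestamp, tokens) pair per accepted request
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Forget requests that are at least 60 seconds old.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both limits, else False."""
        now = self.clock()
        self._evict(now)
        if len(self.events) + 1 > self.rpm:
            return False
        if self.tokens_in_window + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before each request and wait or back off when it returns `False`. This only reduces the chance of hitting the server-side limit; it does not replace handling rate-limit errors returned by the API.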

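Documentation

Chat API

Description of the model's API call parameters

Look up value ranges, defaults, and examples for API request and response parameters.

GUI task handling

Model usage tutorial

Learn how to call the model quickly, with typical sample code that you can extend.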

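Other notes

Deep thinking mode switch

Supported by the doubao-1-5-ui-tars-250428 version. Enabling deep thinking mode yields better model results.
For usage examples, see "Enabling and disabling deep thinking".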
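To illustrate the switch, the sketch below builds a chat request body with deep thinking turned on or off. Only the field name `thinking` comes from this page; the value shape `{"type": "enabled"}` / `{"type": "disabled"}` is an assumption and should be checked against the chat API reference. The helper name `build_chat_request` is hypothetical, and the function only builds a dictionary without sending anything.

```python
def build_chat_request(messages: list, deep_thinking: bool = True) -> dict:
    """Build a request body for the chat API (a sketch; nothing is sent)."""
    return {
        "model": "doubao-1-5-ui-tars-250428",
        "messages": messages,
        # Assumed value shape; only the field name `thinking` is confirmed by this page.
        "thinking": {"type": "enabled" if deep_thinking else "disabled"},
    }
```

Passing `deep_thinking=False` yields a body whose `thinking` entry is `{"type": "disabled"}` under the assumption above.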

System prompt templates

GUI tasks require a fixed prompt template. For usage and configuration, see "System prompt design".
Prompt templates for doubao-1-5-ui-tars-250428

# Prompt template for computer (desktop) GUI tasks
COMPUTER_USE_DOUBAO = '''You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.

## Output Format
```
Thought: ...
Action: ...
```

## Action Space
click(point='<point>x1 y1</point>')
left_double(point='<point>x1 y1</point>')
right_single(point='<point>x1 y1</point>')
drag(start_point='<point>x1 y1</point>', end_point='<point>x2 y2</point>')
hotkey(key='ctrl c') # Split keys with a space and use lowercase. Also, do not use more than 3 keys in one hotkey action.
type(content='xxx') # Use escape characters \\', \\\", and \\n in content part to ensure we can parse the content in normal python string format. If you want to submit your input, use \\n at the end of content. 
scroll(point='<point>x1 y1</point>', direction='down or up or right or left') # Show more information on the `direction` side.
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format.

## Note
- Use {language} in `Thought` part.
- Write a small plan and finally summarize your next action (with its target element) in one sentence in `Thought` part.

## User Instruction
{instruction}
'''


# Prompt template for phone GUI tasks
PHONE_USE_DOUBAO = '''
You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. 
## Output Format
```
Thought: ...
Action: ...
```

## Action Space
click(point='<point>x1 y1</point>')
long_press(point='<point>x1 y1</point>')
type(content='') #If you want to submit your input, use "\\n" at the end of `content`.
scroll(point='<point>x1 y1</point>', direction='down or up or right or left')
open_app(app_name=\'\')
drag(start_point='<point>x1 y1</point>', end_point='<point>x2 y2</point>')
press_home()
press_back()
finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format.

## Note
- Use {language} in `Thought` part.
- Write a small plan and finally summarize your next action (with its target element) in one sentence in `Thought` part.

## User Instruction
{instruction}
'''
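Both templates ask the model to reply with a `Thought:` line followed by an `Action:` call taken from the action space. The sketch below shows one way a client might parse such a reply into a structured action. It is an illustrative parser, not the official one: the function name `parse_reply` is hypothetical, escape sequences inside `content` are left as received, and the coordinates are returned as-is because this page does not say whether they are absolute pixels or relative values.

```python
import re

# Matches the "Thought: ... Action: ..." reply format requested by the templates.
REPLY_RE = re.compile(r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<action>.*)", re.DOTALL)
# Matches a call such as click(point='<point>100 200</point>') or wait().
CALL_RE = re.compile(r"(?P<name>\w+)\((?P<args>.*)\)\s*$", re.DOTALL)
# Matches one keyword argument with a single-quoted value (backslash escapes allowed).
ARG_RE = re.compile(r"(\w+)='((?:\\.|[^'\\])*)'", re.DOTALL)
# Matches a point value such as <point>100 200</point>.
POINT_RE = re.compile(r"<point>\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*</point>")


def parse_reply(text: str) -> dict:
    """Split a model reply into its thought, action name, and keyword arguments."""
    reply = REPLY_RE.search(text)
    if reply is None:
        raise ValueError("reply does not contain 'Thought:' and 'Action:'")
    # Tolerate surrounding whitespace and a trailing code fence, if the model emits one.
    action_text = reply["action"].strip().strip("`").strip()
    call = CALL_RE.match(action_text)
    if call is None:
        raise ValueError(f"cannot parse action: {action_text!r}")
    args = {}
    for key, raw in ARG_RE.findall(call["args"]):
        point = POINT_RE.fullmatch(raw)
        # Points become (x, y) tuples; every other value stays a raw string.
        args[key] = (float(point[1]), float(point[2])) if point else raw
    return {"thought": reply["thought"], "name": call["name"], "args": args}
```

On the request side, `{language}` and `{instruction}` are the only brace placeholders in the two templates, so `COMPUTER_USE_DOUBAO.format(language="English", instruction="...")` should be enough to fill them in.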