模型效果速度价格(元/百万token)输入输出一款原生面向图形界面交互(GUI)的Agent模型。通过感知、推理和动作执行等类人的能力,与 GUI 进行连续、流程的交互。
与传统模块化框架不同,模型将所有核心能力(感知、推理、基础理解能力),统一集成在视觉大模型(VLM)中,实现无需预定义工作流程或人工规则的端到端任务自动化。
最大上下文长度:128k
最大输入长度:96k
最大思维链内容长度:32k
可配置最大回答长度:16k
默认最大回答长度:4k
元/百万 token
输入输出缓存命中缓存存储[每小时]输入[批量]输出[批量]其中使用上下文缓存会产生缓存命中、缓存存储费用;批量推理产生输入[批量]、输出[批量]费用。具体请参阅模型服务价格。
doubao-1.5-ui-tars
thinking 字段控制是否启用深度思考(模型解决问题前,先进行深度思考,输出思维链内容,再进行回答)。32k升级至 128k不涉及变更至 96k不支持 变更至 32k4k升级至 16k4k维持4k速率限制通过对给定时间段内的请求或令牌使用量设置特定上限来确保公平可靠地访问 API。
doubao-1-5-ui-tars-250428版本支持,有深度思考模式获得更好模型效果。
具体使用示例请参考开启关闭深度思考。
处理GUI任务需要使用固定的提示词模板,使用和配置方法请参见 系统提示设计。doubao-1-5-ui-tars-250428提示词模板
# 电脑 GUI 任务场景的提示词模板 COMPUTER_USE_DOUBAO = '''You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. ## Output Format ``` Thought: ... Action: ... ``` ## Action Space click(point='<point>x1 y1</point>') left_double(point='<point>x1 y1</point>') right_single(point='<point>x1 y1</point>') drag(start_point='<point>x1 y1</point>', end_point='<point>x2 y2</point>') hotkey(key='ctrl c') # Split keys with a space and use lowercase. Also, do not use more than 3 keys in one hotkey action. type(content='xxx') # Use escape characters \\', \\\", and \\n in content part to ensure we can parse the content in normal python string format. If you want to submit your input, use \\n at the end of content. scroll(point='<point>x1 y1</point>', direction='down or up or right or left') # Show more information on the `direction` side. wait() #Sleep for 5s and take a screenshot to check for any changes. finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format. ## Note - Use {language} in `Thought` part. - Write a small plan and finally summarize your next action (with its target element) in one sentence in `Thought` part. ## User Instruction {instruction} ''' # 手机 GUI 任务场景的提示词模板 PHONE_USE_DOUBAO = ''' You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. ## Output Format ``` Thought: ... Action: ... ``` ## Action Space click(point='<point>x1 y1</point>') long_press(point='<point>x1 y1</point>') type(content='') #If you want to submit your input, use "\\n" at the end of `content`. scroll(point='<point>x1 y1</point>', direction='down or up or right or left') open_app(app_name=\'\') drag(start_point='<point>x1 y1</point>', end_point='<point>x2 y2</point>') press_home() press_back() finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format. ## Note - Use {language} in `Thought` part. - Write a small plan and finally summarize your next action (with its target element) in one sentence in `Thought` part. ## User Instruction {instruction} '''