NVIDIA Cosmos™ is a platform built for physical AI. It provides state-of-the-art generative world foundation models (WFMs), robust guardrails, and accelerated data processing and curation pipelines. Designed for real-world systems, Cosmos helps developers rapidly advance physical-AI applications such as autonomous vehicles (AVs), robots, and video-analytics AI agents.
Cosmos-Transfer 2.5 is a conditional world-generation model that produces world simulations from multiple spatial control inputs, such as segmentation, depth, and edge modalities. It uses an adaptive, customizable spatial conditioning scheme that lets different condition inputs carry different weights at different spatial locations. This makes world generation highly controllable and applicable to a range of world-to-world transfer scenarios, including robot Sim2Real and autonomous-vehicle data augmentation.
Project repository: https://github.com/nvidia-cosmos/cosmos-transfer2.5
This tutorial shows how to use Cosmos-Transfer 2.5 to batch-produce high-quality training datasets from a combination of a custom video and a text prompt, focusing on two core data-production scenarios.
Scenario 1: simulation enhancement. Use the prompt to re-render simulated footage in a photorealistic style, bridging the sim-to-real gap and improving downstream model performance.
Scenario 2: conditional generalization of real-captured data. Modify the prompt to transform elements of the scene, greatly increasing the diversity and robustness of the training data without re-collecting it. For example, replace the pears in a training sample with apples, change the color of the robot arm, or adjust the lighting.
The two scenarios differ only in the prompt and input_video you supply; the steps are otherwise identical, so they are not described separately below.
Network configuration:
The pre-inference environment setup downloads models from the Hugging Face Hub. We recommend using Volcengine's 网际快车 network accelerator; follow its documentation to set the required environment variables and configure the proxy port in your terminal.
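As a concrete sketch of what that terminal proxy setup usually looks like (the address and port below are placeholders, not values from the accelerator's documentation; substitute your own):

```shell
# Hypothetical proxy endpoint -- replace with the host/port from your
# accelerator or proxy tool's documentation.
export https_proxy="http://127.0.0.1:7890"
export http_proxy="http://127.0.0.1:7890"
echo "proxy set to ${https_proxy}"
```

Alternatively, huggingface_hub honors the `HF_ENDPOINT` environment variable if you prefer to download from a mirror endpoint instead of proxying.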
Requesting access to the NVIDIA model:
1. Request access: log in to the model page and click "Agree and Access repository" to accept the license.
2. Get a token: on the settings page, create an Access Token of type Read and copy it.
3. Authenticate: in the same terminal, run:
export HF_TOKEN="hf_<your token>"
# Enter the cosmos-transfer working directory
cd /workspace/cosmos-transfer2.5
# Activate the environment
source .venv/bin/activate
NVIDIA official documentation for inference configuration
Before executing the command below in the terminal, configure the following parameters:
python examples/inference.py -i assets/robot_example/depth/robot_depth_spec.json -o outputs/depth
The JSON file given by the -i flag configures the inputs; the generated video is written to the path given by -o.
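Since the goal of this tutorial is batch dataset production, one spec JSON per clip can be driven by a small loop. The sketch below assumes your spec files live in a directory named specs/ (a name chosen here for illustration, not part of the repository layout); the echo makes it a dry run that only prints each command, so remove it to actually launch inference:

```shell
# Dry-run batcher: print (or, without `echo`, execute) one inference
# command per spec file, each writing to its own output folder.
run_all_specs() {
  for spec in specs/*.json; do
    [ -e "$spec" ] || continue        # glob matched nothing
    name=$(basename "$spec" .json)
    echo python examples/inference.py -i "$spec" -o "outputs/${name}"
  done
}
run_all_specs
```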
{ "name": "pick_orange_demo", "prompt_path": "/preset-datasets/robotics/cosmos_transfer_intput_data/pick_orange/prompt.txt", "video_path": "/preset-datasets/robotics/cosmos_transfer_intput_data/pick_orange/input_real.mp4", "guidance": 3, "depth": { "control_weight": 0.8 }, "edge": { "control_weight": 0.7 }, "seg": { "control_weight": 0.5 } }
Parameter notes:
Example prompt:
Prompt:The video is a demonstration of robotic manipulation, likely in a laboratory or testing environment. It features two robotic arms interacting with a piece of blue fabric. The setting is a room with a beige couch in the background, providing a neutral backdrop for the robotic activity. The robotic arms are positioned on either side of the fabric, which is placed on a yellow cushion. The left robotic arm is white with a black gripper, while the right arm is black with a more complex, articulated gripper. At the beginning, the fabric is laid out on the cushion. The left robotic arm approaches the fabric, its gripper opening and closing as it positions itself. The right arm remains stationary initially, poised to assist. As the video progresses, the left arm grips the fabric, lifting it slightly off the cushion. The right arm then moves in, its gripper adjusting to grasp the opposite side of the fabric. Both arms work in coordination, lifting and holding the fabric between them. The fabric is manipulated with precision, showcasing the dexterity and control of the robotic arms. The camera remains static throughout, focusing on the interaction between the robotic arms and the fabric, allowing viewers to observe the detailed movements and coordination involved in the task.
Adjust the prompt to your business scenario, observing the following requirements:
- Format: txt
- Language: English
- Style: stay close to a realistic scene; avoid overly abstract descriptions.
This must be configured by you.
Check the generated mp4 video under the output path specified by -o in your inference command.
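When producing data in batches, a quick sanity check can count the .mp4 files each run produced (a sketch assuming the outputs/ layout used in the commands above):

```shell
# Count generated videos per output directory.
check_outputs() {
  for dir in outputs/*/; do
    [ -d "$dir" ] || continue         # glob matched nothing
    count=$(find "$dir" -name '*.mp4' | wc -l | tr -d ' ')
    echo "${dir}: ${count} video(s)"
  done
}
check_outputs
```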