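How to drive a VRM character in Three.js from JSON containing MediaPipe coordinates, and how to fix abnormal motion
A normalization approach for fixing abnormal motion when MediaPipe coordinates drive a Three.js VRM character
When Pose, Hand, and Face coordinates exported from MediaPipe as JSON drive a VRM character in Three.js and the motion looks wrong, the cause is almost always one of two things: the coordinate systems do not match, or the coordinates were never adapted to the VRM bone space. The normalization and conversion steps below address both.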
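1. Fix the coordinate-system differences first
MediaPipe and Three.js/VRM use fundamentally different coordinate systems:
- MediaPipe: landmarks are normalized to the video frame with the origin at the top-left corner; x points right, y points down, and z is depth, with smaller values closer to the camera.
- Three.js/VRM: right-handed; x points right, y points up, and z points out of the screen toward the viewer. Bone transforms are expressed in each bone's parent-local space.
Flip the axes before doing anything else:
- Negate z: vrmZ = -mediaPipeZ
- Negate y: vrmY = -mediaPipeY. MediaPipe's y grows downward, so skipping this step leaves the character upside down.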
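The axis flip can be sketched as a small pure function. `toThreeAxes` is a hypothetical helper name, and the optional `aspect` parameter (frame width divided by height) is an assumption added here: because x and y are each normalized to their own image dimension, a non-square frame distorts directions unless x, and z which MediaPipe reports on roughly the same scale as x, are rescaled.

```javascript
// Hypothetical helper, not part of MediaPipe or Three.js.
// Converts one normalized landmark to y-up axes centered on the frame.
// `aspect` = frame width / height (assumption: pass 1 to skip the correction).
function toThreeAxes(landmark, aspect = 1) {
  return {
    x: (landmark.x - 0.5) * aspect, // center on the frame, undo width normalization
    y: 0.5 - landmark.y,            // flip: MediaPipe y grows downward
    z: -landmark.z * aspect,        // flip: MediaPipe z grows away from the camera
  };
}
```

For a 1280x720 frame, `toThreeAxes(landmark, 1280 / 720)` keeps horizontal and vertical distances on the same scale.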
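2. Normalizing Pose coordinates and mapping them to bones
MediaPipe Pose coordinates are normalized to the video frame (x and y in [0, 1]; z is depth relative to the hip midpoint, on roughly the same scale as x). VRM needs bone offsets and rotations in the character's local space. The steps are:
- Use the character's pelvis as the root reference: take the midpoint of the MediaPipe left and right hips (LEFT_HIP/RIGHT_HIP) as the root offset and subtract it from every Pose coordinate to get relative positions.
- Adapt to the character's body size: compute the ratio between the VRM character's height (the y distance from the hips bone to the head bone) and the MediaPipe body height (the y distance from the nose to the root midpoint), then scale every Pose coordinate by that ratio.
- Filter unreliable points: when visibility < 0.5 the landmark is not trustworthy, so substitute the last valid value or the bone's default position.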
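Code example (Pose processing)

    // Assumes THREE (three.js r123+ for Quaternion.invert) is in scope and that
    // vrm was loaded with three-vrm. getBoneNode is the three-vrm 0.x accessor;
    // in 1.x it is deprecated in favor of getNormalizedBoneNode / getRawBoneNode.

    // MediaPipe y grows downward and z grows away from the camera, so negate both.
    function toVec3(landmark) {
      return new THREE.Vector3(landmark.x, -landmark.y, -landmark.z);
    }

    function normalizePose(mediaPipePose, vrm) {
      const hips = vrm.humanoid.getBoneNode('hips');
      const head = vrm.humanoid.getBoneNode('head');

      // Root midpoint between the left and right hips
      const rootMid = toVec3(mediaPipePose.LEFT_HIP)
        .add(toVec3(mediaPipePose.RIGHT_HIP))
        .multiplyScalar(0.5);

      // Body-size scale. bone.position is parent-local, so compare world positions.
      const vrmHeight = head.getWorldPosition(new THREE.Vector3()).y
        - hips.getWorldPosition(new THREE.Vector3()).y;
      const mpHeight = toVec3(mediaPipePose.NOSE).y - rootMid.y;
      const scale = mpHeight > 0 ? vrmHeight / mpHeight : 1;

      // Joint landmark -> the VRM humanoid bone that starts at that joint, plus
      // the bone's rest direction. The rest directions are an assumption: a
      // T-pose model facing +Z whose bones have identity rest rotations.
      // NOSE is not mapped: head rotation cannot be derived from a single point.
      const poseToVRMMap = {
        LEFT_SHOULDER: { bone: 'leftUpperArm', rest: new THREE.Vector3(1, 0, 0) },
        RIGHT_SHOULDER: { bone: 'rightUpperArm', rest: new THREE.Vector3(-1, 0, 0) },
        LEFT_ELBOW: { bone: 'leftLowerArm', rest: new THREE.Vector3(1, 0, 0) },
        LEFT_HIP: { bone: 'leftUpperLeg', rest: new THREE.Vector3(0, -1, 0) },
        RIGHT_HIP: { bone: 'rightUpperLeg', rest: new THREE.Vector3(0, -1, 0) },
        // Add the remaining joints you want to drive
      };

      // Hip-relative, scaled positions. They are returned rather than written to
      // bone.position, because overwriting bone offsets stretches the mesh.
      const localPositions = {};

      // Process parent joints before child joints so parent rotations are current.
      for (const [mpName, landmark] of Object.entries(mediaPipePose)) {
        if (landmark.visibility < 0.5) continue;
        const point = toVec3(landmark);
        localPositions[mpName] = point.clone().sub(rootMid).multiplyScalar(scale);

        // The segment parent -> this landmark rotates the bone at the parent joint
        const parentMpName = getParentPoseLandmark(mpName);
        const parentLandmark = parentMpName ? mediaPipePose[parentMpName] : null;
        const entry = parentMpName ? poseToVRMMap[parentMpName] : null;
        const bone = entry ? vrm.humanoid.getBoneNode(entry.bone) : null;
        if (!parentLandmark || parentLandmark.visibility < 0.5 || !bone) continue;

        const dir = point.sub(toVec3(parentLandmark)).normalize();
        // bone.quaternion is parent-local, so express the direction in parent space
        const parentWorldQuat = bone.parent.getWorldQuaternion(new THREE.Quaternion());
        dir.applyQuaternion(parentWorldQuat.invert());
        bone.quaternion.setFromUnitVectors(entry.rest, dir);
      }
      return localPositions;
    }

    function getParentPoseLandmark(mpName) {
      // Parent-child relations between Pose landmarks,
      // for example the parent of LEFT_ELBOW is LEFT_SHOULDER
      const parentMap = {
        LEFT_ELBOW: 'LEFT_SHOULDER',
        RIGHT_ELBOW: 'RIGHT_SHOULDER',
        LEFT_WRIST: 'LEFT_ELBOW',
        // Add the parents of the remaining landmarks
      };
      return parentMap[mpName] || null;
    }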
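The "last valid value or default position" fallback for unreliable points can be sketched separately from the bone mapping. `createVisibilityFilter` and its `defaults` argument are hypothetical names introduced for illustration; the sketch assumes each frame is an object keyed by landmark name whose entries carry a `visibility` field.

```javascript
// Hypothetical helper: keeps the last reliable value of every landmark.
// Assumes a pose frame looks like { NOSE: { x, y, z, visibility }, ... }.
function createVisibilityFilter(threshold = 0.5) {
  const lastValid = {}; // landmark name -> last landmark that passed the threshold
  return function filterPose(pose, defaults = {}) {
    const out = {};
    for (const [name, landmark] of Object.entries(pose)) {
      if (landmark.visibility >= threshold) {
        lastValid[name] = landmark;   // reliable: remember and use it
        out[name] = landmark;
      } else if (lastValid[name]) {
        out[name] = lastValid[name];  // unreliable: reuse the last good value
      } else if (defaults[name]) {
        out[name] = defaults[name];   // never seen: fall back to a default
      }
      // otherwise the landmark is omitted for this frame
    }
    return out;
  };
}
```

Create one filter per tracked person and pass each incoming frame through it before mapping landmarks to bones.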
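3. Normalizing hand coordinates
MediaPipe Hand coordinates are normalized to the image in x and y, and z is depth relative to the wrist. The steps are:
- Place the hand in the character's space: start from the position of the matching wrist in the Pose data and add each hand landmark's offset from the hand's own wrist point.
- Adapt to the VRM hand size: scale the offsets by an empirical factor (for example 0.1) so they match the size of the character's hand bones.
- Map to the VRM hand bone hierarchy: use the parent-child relations between finger joints to compute each bone's relative position and rotation.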
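Code example (hand processing)

    // side is 'left' or 'right'. poseWristPos is a THREE.Vector3 already converted
    // to the character's space (y up, z flipped). Bone names follow VRM 1.0;
    // VRM 0.x names the thumb bones Proximal / Intermediate / Distal instead.
    function normalizeHand(mediaPipeHand, poseWristPos, vrm, side) {
      const HAND_SCALE = 0.1; // empirical factor, tune per character
      const toVec3 = (lm) => new THREE.Vector3(lm.x, -lm.y, -lm.z);
      const mpWrist = toVec3(mediaPipeHand.WRIST);

      // Joint landmark -> suffix of the VRM humanoid bone that starts at that joint.
      // Fingertip landmarks have no bone of their own.
      const handToVRMMap = {
        WRIST: 'Hand',
        THUMB_CMC: 'ThumbMetacarpal',
        THUMB_MCP: 'ThumbProximal',
        THUMB_IP: 'ThumbDistal',
        // Add the remaining finger joints
      };

      // Parent of each hand landmark
      const handParentMap = {
        THUMB_MCP: 'THUMB_CMC',
        THUMB_IP: 'THUMB_MCP',
        THUMB_TIP: 'THUMB_IP',
        INDEX_FINGER_PIP: 'INDEX_FINGER_MCP',
        // Add the parents for the other fingers
      };

      // Assumed rest direction of the finger bones: T-pose, model facing +Z
      const restDir = new THREE.Vector3(side === 'left' ? 1 : -1, 0, 0);

      const positions = {};
      for (const [mpName, landmark] of Object.entries(mediaPipeHand)) {
        const point = toVec3(landmark);
        // Offset from the hand's wrist, scaled, then anchored at the Pose wrist
        positions[mpName] = point.clone().sub(mpWrist)
          .multiplyScalar(HAND_SCALE).add(poseWristPos);

        // The segment parent -> this landmark rotates the bone at the parent joint
        const parentMpName = handParentMap[mpName];
        const parentLandmark = parentMpName ? mediaPipeHand[parentMpName] : null;
        const suffix = parentMpName ? handToVRMMap[parentMpName] : null;
        const bone = suffix ? vrm.humanoid.getBoneNode(side + suffix) : null;
        if (!parentLandmark || !bone) continue;

        const dir = point.sub(toVec3(parentLandmark)).normalize();
        // bone.quaternion is parent-local, so express the direction in parent space
        const parentWorldQuat = bone.parent.getWorldQuaternion(new THREE.Quaternion());
        dir.applyQuaternion(parentWorldQuat.invert());
        bone.quaternion.setFromUnitVectors(restDir, dir);
      }
      return positions;
    }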
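4. Driving expressions from Face Mesh coordinates
The 478 MediaPipe Face Mesh points are normalized to the image rather than to the face, so raw distances change with how large the face appears in the frame. VRM faces are driven by BlendShapes (called expressions in VRM 1.0), so what matters is the relative change of each feature:
- Extract the key feature points: for example the upper and lower eyelids, the mouth corners, and the eyebrows.
- Compute the feature values: for example eye openness (the y distance between the upper and lower eyelids) and mouth-corner lift (the y difference between a mouth corner and a stable reference point on the face).
- Map to VRM BlendShape weights: normalize each value to the range [0, 1] and set it as the weight of the matching BlendShape.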
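Code example (facial expression processing)

    // Uses the three-vrm 0.x API (vrm.blendShapeProxy) with VRM 0.x preset names.
    // In three-vrm 1.x the equivalent call is
    // vrm.expressionManager.setValue('blinkLeft', weight).
    function updateFaceBlendShapes(mediaPipeFace, vrm) {
      const clamp01 = (v) => Math.min(Math.max(v, 0), 1);

      // Left-eye openness. MediaPipe y grows downward, so bottom.y - top.y is
      // positive. 386/374 are the upper/lower lid of the left eye in MediaPipe's
      // naming; 159/145 are the matching points on the right eye.
      const leftEyeTop = mediaPipeFace[386];
      const leftEyeBottom = mediaPipeFace[374];
      const leftEyeOpen = leftEyeBottom.y - leftEyeTop.y;
      // blink_l: 0 = eye open, 1 = eye closed. 0.05 is an empirical
      // "fully open" distance; calibrate it against your footage.
      vrm.blendShapeProxy.setValue('blink_l', 1 - clamp01(leftEyeOpen / 0.05));

      // Smile: how far the left mouth corner (291) rises above the lip center (13)
      const leftMouth = mediaPipeFace[291];
      const mouthCenterY = mediaPipeFace[13].y;
      const leftSmile = mouthCenterY - leftMouth.y;
      // joy: 0 = neutral, 1 = full smile. VRM 0.x has no per-side smile preset.
      // 0.1 is an empirical divisor; calibrate it as well.
      vrm.blendShapeProxy.setValue('joy', clamp01(leftSmile / 0.1));

      // Add other expressions: right-eye blink, mouth shapes, frowning, and so on
    }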
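Because the face landmarks are normalized to the image, a fixed pixel-like threshold stops working when the subject moves closer to or farther from the camera. One common workaround, sketched here under assumptions, is to divide the eyelid distance by the eye width so the feature becomes a scale-independent ratio before it is remapped to [0, 1]. The function names and the `closedRatio`/`openRatio` defaults are hypothetical and need calibrating per subject.

```javascript
// Clamp a value to the [0, 1] range expected by BlendShape weights.
function clamp01(v) {
  return Math.min(1, Math.max(0, v));
}

// 2D distance between two landmarks (x and y only).
function dist2d(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Eyelid distance divided by eye width: independent of face size in the frame.
function eyeOpenRatio(top, bottom, inner, outer) {
  const width = dist2d(inner, outer);
  return width > 0 ? dist2d(top, bottom) / width : 0;
}

// Map the ratio to a blink weight: 1 = closed, 0 = open.
// closedRatio / openRatio are assumed calibration values, not MediaPipe constants.
function blinkWeight(ratio, closedRatio = 0.1, openRatio = 0.3) {
  return 1 - clamp01((ratio - closedRatio) / (openRatio - closedRatio));
}
```

The resulting weight can be passed to whichever blink BlendShape the model exposes.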
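5. Additional optimization tips
- Smooth the motion: apply linear interpolation or a Kalman filter to the coordinates across consecutive frames to avoid jittery motion.
- Prefer rotation: VRM character motion is driven mainly by bone rotation, so compute rotations from the vectors between landmarks wherever possible instead of setting positions directly.
- Validate bounds: make sure the converted coordinates stay within the range of the character's bone space to avoid exaggerated displacement.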
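The smoothing and bounds-checking tips can be sketched as two small helpers. The smoother below uses per-frame linear interpolation toward the new value (an exponential moving average), which is a simple stand-in rather than a Kalman filter. `createLandmarkSmoother`, `clampOffset`, `alpha`, and `maxLength` are hypothetical names and parameters, and landmarks are assumed to arrive as an array of `{ x, y, z }` objects.

```javascript
// Hypothetical helper: blends each frame toward the new landmarks.
// alpha = 1 means no smoothing; smaller values are smoother but lag more.
function createLandmarkSmoother(alpha = 0.5) {
  let previous = null;
  return function smooth(landmarks) {
    if (previous === null || previous.length !== landmarks.length) {
      // First frame (or landmark count changed): start from the raw values
      previous = landmarks.map((p) => ({ x: p.x, y: p.y, z: p.z }));
    } else {
      const last = previous;
      previous = landmarks.map((p, i) => ({
        x: last[i].x + alpha * (p.x - last[i].x),
        y: last[i].y + alpha * (p.y - last[i].y),
        z: last[i].z + alpha * (p.z - last[i].z),
      }));
    }
    return previous;
  };
}

// Hypothetical helper: limits an offset vector to at most maxLength,
// keeping its direction, to avoid exaggerated displacement.
function clampOffset(offset, maxLength) {
  const length = Math.hypot(offset.x, offset.y, offset.z);
  if (length <= maxLength || length === 0) return { ...offset };
  const k = maxLength / length;
  return { x: offset.x * k, y: offset.y * k, z: offset.z * k };
}
```

For bone rotations, the same idea applies with spherical interpolation between the previous and the target quaternion instead of per-component blending.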
The question comes from Stack Exchange; it was asked by Moriel Greenberg.




