
Troubleshooting YouTube Video Scraping Failures in Docker: "Python not found" and Related Issues

Debugging a YouTube Video Info Scraper in a Docker Environment

Problem Overview

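A YouTube video info scraper built with Remix, Node.js, and yt-dlp runs fine locally, but after deployment to Docker it shows three problems:

  • Calling yt-dlp through Python fails with "Python not found"
  • The YouTube API returns an "UNPLAYABLE" playability status
  • The fallback to youtube-dl-exec is too slow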

Error Output

GET /?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DZyF-pmNfnH0 200 - - 17100.357 ms
Player API response for ZyF-pmNfnH0 (attempt 1): {
  "status": 200,
  "hasStreamingData": false,
  "playabilityStatus": "UNPLAYABLE",
  "reason": "This content isn't available."
}


Python yt-dlp error: Error: Command failed: python /app/yt_dlp_temp.py "https://www.youtube.com/watch?v=ZyF-pmNfnH0"
/bin/sh: 1: python: not found

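Attempted Fixes

  • Switching between YouTube API client endpoints (ANDROID, WEB)
  • Several fallback strategies
  • Various yt-dlp configurations

Questions

  1. How can the "python not found" error in Docker be fixed?
  2. Why does the YouTube API return UNPLAYABLE when the video is clearly accessible?
  3. How can performance be improved to match sites such as y2mate?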

Code

api.info.ts (main route handler)

// app/routes/api.info.ts
import { json, type LoaderFunctionArgs } from "@remix-run/node";
import youtubedl from "youtube-dl-exec";
import { fetchWithPythonYtDlp } from "~/lib/yt-dlp-wrapper";

export async function loader({ request }: LoaderFunctionArgs) {
  const url = new URL(request.url);
  const videoUrl = url.searchParams.get("url");

  try {
    const videoId = extractVideoId(videoUrl);
    let info;
    
    try {
      info = await fetchVideoInfo(videoId);
      if (info.formats.length === 0) throw new Error(`Format array is empty`);
    } catch (error) {
      console.warn(`Player API failed, falling back to yt-dlp:`, error);
      try {
        info = await fetchWithPythonYtDlp(videoUrl);
      } catch (error) {
        console.warn(`Falling back to youtube-dl-exec:`, error);
        info = await fetchVideoInfoFallback(videoId, videoUrl);
      }
    }

    return json({ success: true, data: info });
  } catch (error) {
    return json({ error: "Failed to fetch video info" }, { status: 500 });
  }
}

yt-dlp-wrapper.ts (Python fallback)

// ~/lib/yt-dlp-wrapper.ts
import { exec } from "child_process";
import { promisify } from "util";
import fs from "fs";

const execPromise = promisify(exec);

export async function fetchWithPythonYtDlp(videoUrl: string) {
  const pythonScript = `
import yt_dlp
import json
import sys

video_url = sys.argv[1]
ydl_opts = {'quiet': True, 'skip_download': True}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(video_url, download=False)
print(json.dumps(info))
`;

  fs.writeFileSync("yt_dlp_temp.py", pythonScript);
  const { stdout } = await execPromise(`python yt_dlp_temp.py "${videoUrl}"`);
  return JSON.parse(stdout);
}

Dockerfile

# ---------- Stage 1: Build ----------
FROM node:18-bullseye-slim AS builder

# Install yt-dlp for build phase (if needed)
RUN apt-get update && apt-get install -y python3 python3-pip && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir yt-dlp

# Set working directory
WORKDIR /app

# Copy and install dependencies
COPY package*.json ./
RUN npm install

# Copy all source files
COPY . .

# Build Remix app
RUN npm run build


# ---------- Stage 2: Runtime ----------
FROM node:18-bullseye-slim

# Install only runtime dependencies
RUN apt-get update && \
    apt-get install -y ffmpeg python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Install yt-dlp (needed at runtime)
RUN pip3 install --no-cache-dir yt-dlp

# Set working directory
WORKDIR /app

# Copy package files and install only prod deps
COPY package*.json ./
RUN npm install --omit=dev

# Copy only built + required app files from builder stage
COPY --from=builder /app/build ./build
COPY --from=builder /app/public ./public
COPY --from=builder /app/app ./app
COPY --from=builder /app/remix.config.js ./remix.config.js

# Expose port and start the app
EXPOSE 3000
CMD ["npm", "run", "start"]

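Solutions

1. Fixing the "python not found" error in Docker

Debian/Ubuntu-based images provide only the python3 command by default, with no python alias, so the call fails. There are two ways to fix it:

  • Option 1: add an alias in the Dockerfile
    In the Runtime stage, add: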
RUN ln -s /usr/bin/python3 /usr/bin/python
  • Option 2: change the command in the Node.js code
    In yt-dlp-wrapper.ts, change the exec call to:
const { stdout } = await execPromise(`python3 /app/yt_dlp_temp.py "${videoUrl}"`);

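Further improvement: store the temp script at an absolute path and delete it after execution, even when the command fails, so files do not pile up:

const tempPath = "/app/yt_dlp_temp.py";
fs.writeFileSync(tempPath, pythonScript);
try {
  const { stdout } = await execPromise(`python3 ${tempPath} "${videoUrl}"`);
  return JSON.parse(stdout);
} finally {
  fs.unlinkSync(tempPath); // delete the temp file whether or not the command succeeded
}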
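
The temp file and the shell can both be avoided altogether. The following is a minimal sketch of one such variant, not the asker's code: it hands the script to `python3 -c` through `execFile`, so the URL travels as a plain argument instead of being interpolated into a `/bin/sh` command line. The names `PYTHON_BIN`, `buildPythonArgs`, and `fetchWithPython3`, as well as the timeout and buffer sizes, are assumptions chosen for illustration.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFilePromise = promisify(execFile);

// Assumption: the interpreter is on PATH as "python3", as in the Debian-based image.
const PYTHON_BIN = "python3";

// Same logic as the temp script, kept in memory instead of written to disk.
const PYTHON_SCRIPT = [
  "import json, sys",
  "import yt_dlp",
  "opts = {'quiet': True, 'skip_download': True}",
  "with yt_dlp.YoutubeDL(opts) as ydl:",
  "    info = ydl.extract_info(sys.argv[1], download=False)",
  "    # sanitize_info makes the info dict JSON-serializable",
  "    print(json.dumps(ydl.sanitize_info(info)))",
].join("\n");

// With "-c", Python sets sys.argv[0] to "-c" and sys.argv[1] to the first extra argument.
export function buildPythonArgs(videoUrl: string): string[] {
  return ["-c", PYTHON_SCRIPT, videoUrl];
}

export async function fetchWithPython3(videoUrl: string): Promise<unknown> {
  const { stdout } = await execFilePromise(PYTHON_BIN, buildPythonArgs(videoUrl), {
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024, // format lists can exceed the 1 MiB default
    timeout: 60_000, // illustrative limit: give up after 60 seconds
  });
  return JSON.parse(stdout);
}
```

Because no shell is involved, quotes or other special characters in the URL cannot alter the command, and concurrent requests no longer share one temp file.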

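2. Fixing the UNPLAYABLE status from the YouTube API

The YouTube Player API can restrict access based on the request's User-Agent, the IP's geolocation, login state, and similar signals, so even a public video may be blocked. Possible fixes:

  • Send a browser-like User-Agent
    In fetchVideoInfo, the function that calls the Player API, add a browser UA: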
const response = await fetch(`https://www.youtube.com/youtubei/v1/player?key=YOUR_API_KEY`, {
  method: "POST",
  headers: {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ videoId: videoId })
});
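  • Prefer yt-dlp for fetching info
    Skip the Player API and call yt-dlp directly. yt-dlp handles the User-Agent, signature deciphering, and similar anti-scraping measures itself, which makes blocks less likely.
  • Check the IP's standing
    The public IP that the Docker container's traffic leaves from may be flagged by YouTube; try a different IP or a proxy service.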
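
Before falling back, it can help to record why the Player API refused the request. The sketch below assumes the raw response nests the status as `playabilityStatus.status` with an optional `reason`, and lists formats under `streamingData`; these field names are an assumption about an unofficial endpoint and may change. `PlayerResponse` and `checkPlayability` are hypothetical names, not part of the asker's code.

```typescript
// Assumed shape of the relevant parts of a Player API response.
interface PlayerResponse {
  playabilityStatus?: { status?: string; reason?: string };
  streamingData?: { formats?: unknown[]; adaptiveFormats?: unknown[] };
}

type Playability =
  | { ok: true; formatCount: number }
  | { ok: false; why: string };

// Decide whether the response is usable, and if not, say why.
export function checkPlayability(resp: PlayerResponse): Playability {
  const status = resp.playabilityStatus?.status ?? "MISSING";
  if (status !== "OK") {
    const reason = resp.playabilityStatus?.reason ?? "no reason given";
    return { ok: false, why: `${status}: ${reason}` };
  }
  const formatCount =
    (resp.streamingData?.formats?.length ?? 0) +
    (resp.streamingData?.adaptiveFormats?.length ?? 0);
  return formatCount > 0
    ? { ok: true, formatCount }
    : { ok: false, why: "OK status but no streaming formats" };
}
```

Logging the `why` string on each failure makes it easier to tell an IP-level block (the same reason for every video) from a restriction on one particular video.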

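3. Improving performance to match y2mate-style sites

The main bottlenecks are the cost of starting a child process on every request, the absence of caching, and a fallback order that tries the least reliable path first. Possible optimizations:

  • Reuse a yt-dlp process
    Instead of writing a temp script on every request, use child_process.spawn to start a long-lived process (for example, a Python worker that imports yt_dlp once and reads video URLs from stdin), so the interpreter and module startup cost is paid only once.
  • Add a yt-dlp cache directory
    Give yt-dlp a cache directory and create that directory in the image:
    • In the yt-dlp options, add: 'cachedir': '/app/.yt-dlp-cache' (the command-line equivalent is --cache-dir)
    • In the Runtime stage of the Dockerfile, add: RUN mkdir -p /app/.yt-dlp-cache && chown node:node /app/.yt-dlp-cache
  • Cache video info in Redis
    Cache video info that has already been fetched, with a lifetime of 1 to 2 days, to avoid repeated requests to YouTube. Direct stream URLs inside the result are time-limited and may expire sooner, so they may need a shorter lifetime than the metadata.
  • Simplify the fallback logic
    Try yt-dlp first rather than the Player API. The Player API is easily blocked and unreliable, so skipping it removes a failed attempt from most requests.
  • Use a prebuilt yt-dlp binary
    Consider a prebuilt yt-dlp release binary in place of the pip install. This removes pip from the image; whether it also shortens startup time is worth measuring rather than assuming.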
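
The caching idea above can be sketched without Redis. The following is a minimal in-memory illustration, assuming a single Node.js process; `withTtlCache` and its parameters are hypothetical names. It stores the pending promise itself, so concurrent requests for the same video share one yt-dlp run, and it drops failed lookups so the next request retries.

```typescript
type Fetcher<T> = (key: string) => Promise<T>;

interface Entry<T> {
  value: Promise<T>;
  expiresAt: number;
}

// Wrap a fetcher so results are reused until ttlMs has elapsed.
// The clock is injectable so the expiry logic can be tested.
export function withTtlCache<T>(
  fetcher: Fetcher<T>,
  ttlMs: number,
  now: () => number = Date.now,
): Fetcher<T> {
  const entries = new Map<string, Entry<T>>();
  return (key) => {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    const value = fetcher(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    // Do not cache failures: remove the entry so a later call retries.
    value.catch(() => {
      if (entries.get(key)?.value === value) entries.delete(key);
    });
    return value;
  };
}
```

A wrapper such as `withTtlCache(fetchWithPythonYtDlp, 24 * 60 * 60 * 1000)` would then be called from the loader in place of the raw function. The Map here grows without bound and is lost on restart, so a shared store such as Redis remains the better fit once several containers serve requests.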

This question originated on Stack Exchange; it was asked by Gods Neo.
