Troubleshooting a YouTube video-info scraper in Docker: "python not found" and related failures
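Problem overview
A YouTube video-info scraper built with Remix, Node.js, and yt-dlp works locally, but after deployment to Docker it shows three problems:
- Calling yt-dlp through Python fails with "python: not found"
- The YouTube API returns an "UNPLAYABLE" playability status
- The fallback to youtube-dl-exec is too slow
Error output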
    GET /?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DZyF-pmNfnH0 200 - - 17100.357 ms
    Player API response for ZyF-pmNfnH0 (attempt 1): {
      "status": 200,
      "hasStreamingData": false,
      "playabilityStatus": "UNPLAYABLE",
      "reason": "This content isn't available."
    }
    Python yt-dlp error: Error: Command failed: python /app/yt_dlp_temp.py "https://www.youtube.com/watch?v=ZyF-pmNfnH0"
    /bin/sh: 1: python: not found
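Solutions already tried
- Switching between YouTube API endpoints (ANDROID, WEB)
- Several fallback strategies
- Various yt-dlp configurations
Questions
- How can the "python not found" error in Docker be fixed?
- Why does the YouTube API return an UNPLAYABLE status when the video is clearly accessible?
- How can performance be improved to match sites like y2mate?
Code
api.info.ts (main route handler)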
    // app/routes/api.info.ts
    import { json, type LoaderFunctionArgs } from "@remix-run/node";
    import youtubedl from "youtube-dl-exec";
    import { fetchWithPythonYtDlp } from "~/lib/yt-dlp-wrapper";

    export async function loader({ request }: LoaderFunctionArgs) {
      const url = new URL(request.url);
      const videoUrl = url.searchParams.get("url");

      try {
        const videoId = extractVideoId(videoUrl);
        let info;
        try {
          info = await fetchVideoInfo(videoId);
          if (info.formats.length === 0) throw new Error(`Format array is empty`);
        } catch (error) {
          console.warn(`Player API failed, falling back to yt-dlp:`, error);
          try {
            info = await fetchWithPythonYtDlp(videoUrl);
          } catch (error) {
            console.warn(`Falling back to youtube-dl-exec:`, error);
            info = await fetchVideoInfoFallback(videoId, videoUrl);
          }
        }
        return json({ success: true, data: info });
      } catch (error) {
        return json({ error: "Failed to fetch video info" }, { status: 500 });
      }
    }
yt-dlp-wrapper.ts (Python fallback)
    // ~/lib/yt-dlp-wrapper.ts
    import { exec } from "child_process";
    import { promisify } from "util";
    import fs from "fs";

    const execPromise = promisify(exec);

    export async function fetchWithPythonYtDlp(videoUrl: string) {
      const pythonScript = `
    import yt_dlp
    import json
    import sys

    video_url = sys.argv[1]
    ydl_opts = {'quiet': True, 'skip_download': True}
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(video_url, download=False)
        print(json.dumps(info))
    `;
      fs.writeFileSync("yt_dlp_temp.py", pythonScript);
      const { stdout } = await execPromise(`python yt_dlp_temp.py "${videoUrl}"`);
      return JSON.parse(stdout);
    }
Dockerfile
    # ---------- Stage 1: Build ----------
    FROM node:18-bullseye-slim AS builder

    # Install yt-dlp for build phase (if needed)
    RUN apt-get update && apt-get install -y python3 python3-pip && rm -rf /var/lib/apt/lists/*
    RUN pip3 install --no-cache-dir yt-dlp

    # Set working directory
    WORKDIR /app

    # Copy and install dependencies
    COPY package*.json ./
    RUN npm install

    # Copy all source files
    COPY . .

    # Build Remix app
    RUN npm run build

    # ---------- Stage 2: Runtime ----------
    FROM node:18-bullseye-slim

    # Install only runtime dependencies
    RUN apt-get update && \
        apt-get install -y ffmpeg python3 python3-pip && \
        rm -rf /var/lib/apt/lists/*

    # Install yt-dlp (needed at runtime)
    RUN pip3 install --no-cache-dir yt-dlp

    # Set working directory
    WORKDIR /app

    # Copy package files and install only prod deps
    COPY package*.json ./
    RUN npm install --omit=dev

    # Copy only built + required app files from builder stage
    COPY --from=builder /app/build ./build
    COPY --from=builder /app/public ./public
    COPY --from=builder /app/app ./app
    COPY --from=builder /app/remix.config.js ./remix.config.js

    # Expose port and start the app
    EXPOSE 3000
    CMD ["npm", "run", "start"]
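Solutions
1. Fixing the "python not found" error in Docker
On Debian/Ubuntu-based images, installing the python3 package provides only the python3 command. There is no python alias, so the call to python fails. There are two ways to fix this:
- Option 1: add an alias in the Dockerfile
In the Runtime stage, add:
    RUN ln -s /usr/bin/python3 /usr/bin/python
- Option 2: change the command in the Node.js code
In yt-dlp-wrapper.ts, change the command to:
    const { stdout } = await execPromise(`python3 /app/yt_dlp_temp.py "${videoUrl}"`);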
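Additional improvement: store the temporary script at an absolute path and delete it after execution so that files do not pile up:
    const tempPath = "/app/yt_dlp_temp.py";
    fs.writeFileSync(tempPath, pythonScript);
    try {
      const { stdout } = await execPromise(`python3 ${tempPath} "${videoUrl}"`);
      return JSON.parse(stdout);
    } finally {
      fs.unlinkSync(tempPath); // delete the temporary file even if the command fails
    }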
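The wrapper interpolates the user-supplied URL into a shell command string, so a crafted URL could break out of the quotes. One possible variation, sketched below, passes the script to python3 with -c and the URL as a separate argument through execFile, which needs neither a shell nor a temporary file. The Runner type, runWithExecFile, and the 64 MB maxBuffer are illustrative assumptions, not part of the original code. The sketch also assumes python3 and the yt_dlp module are installed in the runtime image, as in the Dockerfile above.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Inline script; with `python3 -c <script> <url>`, sys.argv[1] is the URL.
const PYTHON_SCRIPT = `
import json
import sys
import yt_dlp

ydl_opts = {"quiet": True, "skip_download": True}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(sys.argv[1], download=False)
    # sanitize_info makes the info dict JSON-serializable
    print(json.dumps(ydl.sanitize_info(info)))
`;

// Hypothetical seam so the process call can be swapped out (e.g. in tests).
type Runner = (command: string, args: string[]) => Promise<string>;

const runWithExecFile: Runner = async (command, args) => {
  // execFile does not spawn a shell, so the URL is never parsed as shell syntax.
  // The info JSON can be large, hence the raised maxBuffer (assumed value).
  const { stdout } = await execFileAsync(command, args, {
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024,
  });
  return stdout;
};

export async function fetchWithPythonYtDlp(
  videoUrl: string,
  run: Runner = runWithExecFile
): Promise<unknown> {
  const stdout = await run("python3", ["-c", PYTHON_SCRIPT, videoUrl]);
  return JSON.parse(stdout);
}
```

Because the command name is python3, this version also sidesteps the missing python alias without changing the Dockerfile.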
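2. Fixing the UNPLAYABLE status from the YouTube API
The YouTube Player API restricts access based on factors such as the request's User-Agent, the IP's geolocation, and whether the client is logged in, so even a public video can be blocked. Possible fixes:
- Send a browser-like User-Agent
In the fetchVideoInfo function that calls the Player API, add a browser UA:
    const response = await fetch(`https://www.youtube.com/youtubei/v1/player?key=YOUR_API_KEY`, {
      method: "POST",
      headers: {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ videoId: videoId })
    });
- Prefer yt-dlp for fetching the info
Skip the Player API and call yt-dlp directly. yt-dlp handles the UA, signature handling, and other anti-scraping measures itself, which reduces the chance of being blocked.
- Check the IP's status
The public IP used by the Docker container may have been flagged by YouTube. Try a different IP or a proxy service.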
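To make "prefer yt-dlp" a configuration choice rather than nested try/catch blocks, the fallback chain can be expressed as an ordered list of strategies. The sketch below is one possible shape. VideoInfo, Strategy, and fetchWithFallback are hypothetical names introduced for illustration, and the only assumption about the result is that it carries a formats array, as the loader's empty-formats check implies.

```typescript
// Hypothetical minimal shape: only the `formats` array is inspected here.
interface VideoInfo {
  formats: unknown[];
  [key: string]: unknown;
}

interface Strategy {
  name: string;
  run: (videoUrl: string) => Promise<VideoInfo>;
}

// Tries each strategy in order and returns the first result with a non-empty
// formats array, together with the name of the strategy that produced it.
export async function fetchWithFallback(
  videoUrl: string,
  strategies: Strategy[]
): Promise<{ source: string; info: VideoInfo }> {
  const failures: string[] = [];
  for (const { name, run } of strategies) {
    try {
      const info = await run(videoUrl);
      if (Array.isArray(info.formats) && info.formats.length > 0) {
        return { source: name, info };
      }
      failures.push(`${name}: empty formats`);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${name}: ${reason}`);
    }
  }
  throw new Error(`All strategies failed: ${failures.join("; ")}`);
}
```

With this shape, putting an entry that wraps fetchWithPythonYtDlp first and the Player API call second is a one-line reordering, and the collected failure messages show which strategy failed and why.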
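3. Improving performance to match sites like y2mate
The main bottlenecks are subprocess startup overhead, the lack of caching, and a poorly ordered fallback chain. Optimizations:
- Reuse a yt-dlp process
Instead of writing a temporary script on every request, use child_process.spawn to start a long-lived yt-dlp worker process and send it requests over stdin, so that process startup is not paid on each request.
- Add a yt-dlp cache directory
Give yt-dlp a cache directory and create it in Docker:
  - In the yt-dlp options, add (the Python option key is cachedir; --cache-dir is the command-line flag):
        'cachedir': '/app/.yt-dlp-cache'
  - In the Runtime stage of the Dockerfile, add:
        RUN mkdir -p /app/.yt-dlp-cache && chown node:node /app/.yt-dlp-cache
- Cache video info in Redis
Cache fetched video info with a TTL of 1-2 days to avoid repeated requests to YouTube. Direct stream URLs inside the format list are time-limited, so they may need a shorter TTL than the metadata.
- Optimize the fallback order
Try yt-dlp first rather than the Player API. The Player API is easily blocked and unreliable, so this removes unnecessary retry overhead.
- Use a prebuilt yt-dlp binary
Use a precompiled yt-dlp binary instead of the pip install to reduce dependency loading time at startup.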
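The caching idea can be sketched without committing to a particular store. The example below uses an in-process Map as a stand-in for Redis and is only meant to show the shape: a TTL per entry, plus sharing of in-flight requests so that concurrent lookups for the same video trigger one yt-dlp call. withTtlCache and its parameters are hypothetical names, and a real deployment with several containers would still need a shared store such as Redis.

```typescript
type Fetcher<T> = (key: string) => Promise<T>;

// Wraps a fetcher with a TTL cache and in-flight request sharing.
// `now` is injectable so expiry can be exercised with a fake clock.
export function withTtlCache<T>(
  fetcher: Fetcher<T>,
  ttlMs: number,
  now: () => number = Date.now
): Fetcher<T> {
  const entries = new Map<string, { value: T; expiresAt: number }>();
  const inFlight = new Map<string, Promise<T>>();

  return async (key) => {
    // Serve a cached value while it is still fresh.
    const cached = entries.get(key);
    if (cached && cached.expiresAt > now()) return cached.value;

    // Join a request that is already running for the same key.
    const pending = inFlight.get(key);
    if (pending) return pending;

    const request = fetcher(key)
      .then((value) => {
        entries.set(key, { value, expiresAt: now() + ttlMs });
        return value;
      })
      .finally(() => {
        // Failed requests are not cached; the next call retries.
        inFlight.delete(key);
      });
    inFlight.set(key, request);
    return request;
  };
}
```

A possible use is `const cachedInfo = withTtlCache(fetchWithPythonYtDlp, 60 * 60 * 1000)`, keyed by the video URL. The one-hour TTL here is an arbitrary illustrative value.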
This question comes from Stack Exchange; asked by Gods Neo.




