How to Scrape the Real Media URL Instead of a Blob URL in JavaScript: Troubleshooting a NightmareJS Scraping Issue

Fixing Blob URL Issue with NightmareJS for HLS (.m3u8) Video Streams

Hey there, let's sort out this problem where you're getting Blob URLs instead of the actual video source with NightmareJS. The root cause is that the site streams video over HLS (.m3u8 playlists) and plays it through the MediaSource API. The player script attaches a MediaSource object to the video element through a blob: URL, so reading the src attribute gives you that local handle rather than the real stream URL. Here's how to adjust your code to capture the actual .m3u8 link:

Step 1: Capture .m3u8 Requests via Nightmare's Network Listener

Instead of extracting the video element's src, we'll listen to the network traffic that Nightmare's browser generates and pick out the .m3u8 stream URL. This works because the player script has to fetch the .m3u8 playlist over the network before it can feed any media to the MediaSource behind the Blob URL.
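
The URL check is the part most likely to need tuning, because playlist URLs often carry a query string (a signed token, for example) that a plain "ends with .m3u8" test would miss. Here is a minimal sketch of a stricter check. The helper name isHlsPlaylistUrl is hypothetical and not part of Nightmare; it only assumes the captured URLs are absolute.

```javascript
// Hypothetical helper (not a Nightmare API): returns true when the URL's path
// ends in .m3u8, ignoring any query string or fragment.
function isHlsPlaylistUrl(rawUrl) {
  try {
    const { pathname } = new URL(rawUrl);
    return pathname.toLowerCase().endsWith(".m3u8");
  } catch (e) {
    // new URL() throws on strings that are not absolute URLs
    return false;
  }
}
```

Inside the network listener, this could stand in for the inline regex test on the request URL.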

Modified Code Example

Here's your updated route handler with the necessary changes:

app.get("/video/:VideoName", function(req, res) {
  var VideoName = req.params.VideoName;
  request("https://sample.com/videos/" + VideoName, function(err, response, html) {
    if (!err && response.statusCode == 200) {
      const $ = cheerio.load(html);
      const videoInfo = $(".video-info");
      const includesVideo = VidLinks.find(e => e.name == VideoName);

      if (includesVideo) {
        res.render("videoPlayer", { episode: includesVideo });
      } else {
        const Nightmare = require('nightmare');
        const nightmare = Nightmare({ show: true });
        let m3u8Url = null; // Store the captured .m3u8 URL

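        // Listen for network responses to capture .m3u8 links.
        // Nightmare has no 'request' event; .on() forwards Electron webContents events.
        // 'did-get-response-details' exists in the older Electron builds that Nightmare
        // bundles (it was removed in later Electron releases), so check your version.
        nightmare.on('did-get-response-details', (event, status, newURL) => {
          // Match .m3u8 even when a query string follows (adjust for variant streams)
          if (/\.m3u8(\?|$)/i.test(newURL)) {
            console.log("Found HLS stream:", newURL);
            m3u8Url = newURL;
          }
        });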

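        var iframeLink = videoInfo.find("iframe").attr("src");
        // Only protocol-relative URLs ("//host/path") need a scheme; replacing every
        // "//" would corrupt links that already start with "https://"
        if (iframeLink && iframeLink.startsWith("//")) iframeLink = "https:" + iframeLink;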

        nightmare
          .goto(iframeLink)
          .click("#myVideo") // Trigger video loading to initiate stream requests
          .wait(3000) // Give time for the stream to load (adjust delay based on site speed)
          .end()
          .then(() => {
            if (m3u8Url) {
              // Store the actual stream URL instead of Blob
              VidLinks.push({ name: VideoName, url: m3u8Url });
              res.render("videoPlayer", { video: m3u8Url });
            } else {
              res.send("Failed to capture video stream URL");
            }
          })
          .catch((error) => {
            console.error("Nightmare error:", error);
            res.send("Error capturing video stream");
          });
      }
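    } else {
      // err is null when the request succeeded with a non-200 status, so log both
      console.error("Request failed:", err || ("HTTP " + response.statusCode));
      res.status(500).send("500 Error, Try Again.");
    }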
  });
});

Key Changes Explained

  • Network Request Listener: A nightmare.on(...) handler, registered before .goto(), watches the page's network activity. Any URL that points at a .m3u8 playlist is recorded as the HLS stream source.
  • Wait for Stream Initialization: The .wait(3000) gives the browser time to fetch the .m3u8 file after clicking the video. Tweak this delay if the site loads streams faster or slower.
  • Store the Real Stream URL: Instead of saving the Blob URL to VidLinks, we store the actual .m3u8 playlist address.
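
A fixed delay is either too short on a slow site or wasted time on a fast one. One alternative is to poll for the captured URL and stop as soon as it appears. The sketch below is only an illustration: pollFor is a hypothetical helper, not a Nightmare method, and the default timeout and interval are arbitrary assumptions.

```javascript
// Hypothetical helper: resolves with the first truthy value returned by
// getValue(), or with null once timeoutMs has elapsed.
function pollFor(getValue, timeoutMs = 10000, intervalMs = 100) {
  return new Promise((resolve) => {
    const deadline = Date.now() + timeoutMs;
    (function check() {
      const value = getValue();
      if (value) return resolve(value);
      if (Date.now() >= deadline) return resolve(null);
      setTimeout(check, intervalMs);
    })();
  });
}
```

In the route handler, this could take the place of .wait(3000): run the .goto(...).click(...) steps first, then await pollFor(() => m3u8Url) before calling .end(), so the browser stays open only as long as it takes for the playlist request to show up.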

Additional Notes

  • If the site offers multiple quality variants (e.g., 720p, 1080p), you might capture several .m3u8 URLs: typically one master playlist plus one playlist per quality level. Refine the regex to target a specific stream (e.g., /1080p.*\.m3u8(\?|$)/i).
  • .m3u8 is a streaming playlist, not a direct MP4 file. If you need a downloadable MP4, use tools like ffmpeg to convert the stream:
    ffmpeg -i "your-captured-m3u8-url" -c copy output.mp4
    
  • Always ensure you have permission to scrape and access the video content. Respect the site's terms of service and robots.txt rules.
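
For the multiple-variant case mentioned above, an alternative to guessing quality from the URL is to read the master playlist itself, which lists each variant on an #EXT-X-STREAM-INF line followed by its URI. The sketch below assumes you have already downloaded the playlist text (for example with the same request library the route uses). The function name listVariants is hypothetical, and the parser handles only the BANDWIDTH and RESOLUTION attributes.

```javascript
// Hypothetical helper: parse an HLS master playlist and return its variants,
// highest BANDWIDTH first. Variant URIs are resolved against the playlist URL.
function listVariants(playlistText, playlistUrl) {
  const tag = "#EXT-X-STREAM-INF:";
  const lines = playlistText.split(/\r?\n/).map((line) => line.trim());
  const variants = [];
  for (let i = 0; i < lines.length; i++) {
    if (!lines[i].startsWith(tag)) continue;
    const attrs = lines[i].slice(tag.length);
    // Anchor on start-of-string or a comma so AVERAGE-BANDWIDTH is not matched
    const bandwidth = Number((/(?:^|,)BANDWIDTH=(\d+)/.exec(attrs) || [])[1] || 0);
    const resolution = (/(?:^|,)RESOLUTION=(\d+x\d+)/.exec(attrs) || [])[1] || null;
    // The variant URI is the next line that is neither empty nor a tag/comment
    let j = i + 1;
    while (j < lines.length && (lines[j] === "" || lines[j].startsWith("#"))) j++;
    if (j < lines.length) {
      variants.push({ bandwidth, resolution, url: new URL(lines[j], playlistUrl).href });
    }
  }
  return variants.sort((a, b) => b.bandwidth - a.bandwidth);
}
```

The first element of the returned array is the highest-bandwidth variant, and its url can be passed to ffmpeg in place of the master playlist. If the captured playlist contains no #EXT-X-STREAM-INF lines, it is already a single-quality media playlist and the array is empty.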

This question originally comes from Stack Exchange; it was asked by Venkat Lohith Dasari.
