How can I use a Python script to get the MP3 request URLs for audiobooks on ting22.com?

阿华AIGC实验室

2026-5-14

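A Python approach to getting audiobook MP3 request URLs

For your Windows 7 x64 + Python 3.7.0 setup, here are two workable ways to extract the MP3 request URL from each target page (the link only, without downloading the file):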

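I. Python libraries to use

Pick the libraries that match how the page loads its audio:

requests + BeautifulSoup4: for static pages where the MP3 link is written directly into the HTML
selenium: for pages where the MP3 link is generated by JavaScript and a real browser has to load the page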
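
A quick way to choose between the two is to check whether a `.mp3` URL appears literally in the raw HTML. The sketch below is only an illustration: `find_mp3_links`, the regular expression, and both sample snippets are made up for this example and say nothing about what ting22.com actually serves.

```python
import re

# Matches absolute or protocol-relative URLs ending in .mp3, with an optional
# query string. The pattern is an assumption about how a link might be written.
MP3_PATTERN = re.compile(
    r"""(?:https?:)?//[^\s"'<>]+?\.mp3(?:\?[^\s"'<>]*)?""",
    re.IGNORECASE,
)


def find_mp3_links(html):
    """Return every .mp3 URL that appears literally in the HTML text."""
    return MP3_PATTERN.findall(html)


# Hypothetical page fragments: one with the link in the markup, one without.
sample_static = '<audio src="https://cdn.example.com/book/0002.mp3" controls></audio>'
sample_dynamic = '<audio id="player"></audio><script src="/js/player.js"></script>'

print(find_mp3_links(sample_static))   # non-empty: static parsing is worth trying
print(find_mp3_links(sample_dynamic))  # empty: the link is probably set by JavaScript
```

If the list is empty for a page you fetched with requests, that is a hint (not proof) that you will need the selenium approach.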

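II. Approach 1: static page parsing (try this first)

If the target page embeds the MP3 link directly in an HTML <audio> tag, this is the fastest and simplest method.

Step 1: Install dependencies

Open a command prompt and run:

pip install requests beautifulsoup4

Step 2: Sample code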

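import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the request is less likely to be rejected as a bot
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
}

# Walk the target pages, from 659-2 through 659-1724
for page_num in range(2, 1725):
    page_url = f"https://www.ting22.com/ting/659-{page_num}.html"
    try:
        # Fetch the page
        response = requests.get(page_url, headers=headers, timeout=10)
        if response.status_code != 200:
            print(f"Page {page_num}: request failed, status code {response.status_code}")
            continue
        response.encoding = response.apparent_encoding  # guess the encoding from the body

        # Parse the HTML and look for the MP3 link
        soup = BeautifulSoup(response.text, "html.parser")
        # Assumes the link is in the src attribute of <audio> or of a nested <source>;
        # adjust the selector to the real page structure
        audio_tag = soup.find("audio")
        source_tag = audio_tag.find("source") if audio_tag else None
        src = None
        if audio_tag and audio_tag.get("src"):
            src = audio_tag["src"]
        elif source_tag and source_tag.get("src"):
            src = source_tag["src"]

        if src:
            mp3_url = urljoin(page_url, src)  # resolve a relative src against the page URL
            print(f"Page {page_num} MP3 URL: {mp3_url}")
            # Optional: append the link to a file
            # with open("mp3_urls.txt", "a", encoding="utf-8") as f:
            #     f.write(f"{page_num}\t{mp3_url}\n")
        else:
            print(f"Page {page_num}: no MP3 link found")
    except Exception as e:
        print(f"Page {page_num}: error while processing: {e}")
    finally:
        # Short delay after every page so the requests are not frequent enough to get blocked
        time.sleep(1)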
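
With more than 1,700 pages, the run will probably get interrupted at some point, so it can help to turn the optional file write into a small resumable log. This is a minimal sketch that reuses the `page<TAB>url` line format from the commented-out code; the helper names `load_done_pages`, `append_result`, and `pages_to_fetch` are hypothetical.

```python
import os


def load_done_pages(path):
    """Return the set of page numbers that already have a URL recorded in the file."""
    done = set()
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as f:
        for line in f:
            page, _, url = line.rstrip("\n").partition("\t")
            if page.isdigit() and url:
                done.add(int(page))
    return done


def append_result(path, page_num, mp3_url):
    """Append one 'page<TAB>url' line to the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{page_num}\t{mp3_url}\n")


def pages_to_fetch(path, first=2, last=1724):
    """List the page numbers in [first, last] that have no recorded URL yet."""
    done = load_done_pages(path)
    return [n for n in range(first, last + 1) if n not in done]
```

In the loop you would then iterate over `pages_to_fetch("mp3_urls.txt")` instead of the full range and call `append_result(...)` whenever a link is found, so a restart picks up where the last run stopped.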

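III. Approach 2: dynamic page parsing (use when the static method fails)

If the Media filter in the F12 DevTools Network panel shows the MP3 request but static parsing finds no link, the link is being generated by JavaScript. In that case, use selenium to load the page in a real browser and capture the Media requests from its network activity.

Step 1: Install dependencies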

pip install selenium

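Then download ChromeDriver. Windows 7 only supports Chrome 109 and earlier, so you need the ChromeDriver that matches your Chrome version (for example, ChromeDriver 109.0.5414.74 for Chrome 109). After downloading, put the driver executable in Python's Scripts directory, or add its folder to the PATH environment variable.

Step 2: Sample code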
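
Since a mismatched driver is a common reason the browser fails to start, a tiny check like the one below can help. It is a sketch under the assumption that matching the major version number is what matters; the helper names are hypothetical, and you would paste in the version strings yourself (for example from chrome://version and from running `chromedriver --version`).

```python
def major_version(version):
    """Return the leading number of a dotted version string, e.g. '109.0.5414.74' -> 109."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(chrome_version, driver_version):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Example version strings typed in by hand
print(driver_matches_browser("109.0.5414.120", "109.0.5414.74"))  # True
print(driver_matches_browser("109.0.5414.120", "114.0.5735.90"))  # False
```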

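import json
import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Configure Chrome: hide the automation flag and send a regular browser User-Agent
chrome_options = Options()
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36")
# Record network events in Chrome's performance log so they can be read back with get_log()
chrome_options.set_capability("goog:loggingPrefs", {"performance": "ALL"})

# Start the browser
driver = webdriver.Chrome(options=chrome_options)

try:
    for page_num in range(2, 1725):
        page_url = f"https://www.ting22.com/ting/659-{page_num}.html"
        driver.get_log("performance")  # discard events left over from the previous page
        try:
            driver.get(page_url)
            # Wait up to 10 seconds for the <audio> element to appear
            WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "audio")))
        except TimeoutException:
            print(f"Page {page_num}: no <audio> element within 10 seconds")
            continue
        time.sleep(2)  # give the audio request time to start

        # Read the logged network events and keep Media requests for .mp3 files
        mp3_urls = []
        for entry in driver.get_log("performance"):
            message = json.loads(entry["message"])["message"]
            if message.get("method") != "Network.requestWillBeSent":
                continue
            params = message["params"]
            url = params["request"]["url"]
            if params.get("type") == "Media" and url.split("?")[0].endswith(".mp3"):
                mp3_urls.append(url)

        if mp3_urls:
            print(f"Page {page_num} MP3 URL: {mp3_urls[0]}")
            # Optional: append the link to a file
        else:
            print(f"Page {page_num}: no MP3 request captured")

        time.sleep(1)
finally:
    # Always close the browser when done
    driver.quit()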