
Python 3.8 + Brotli 1.0.9: BrotliDecompress fails when decoding br-compressed content (help wanted)

Fixing the BrotliDecompress failed error

Hey there, let's dig into why your Brotli decompression is throwing that error. I've run into similar issues before, so here's what's going on and how to fix it:

Most likely cause: Requests already handles decompression for you

The big gotcha here is that the Requests library decompresses response bodies automatically by default. When you send Accept-Encoding: gzip, deflate, br and the server returns Brotli-compressed content, Requests quietly unpacks it for you (Brotli decoding is available whenever the brotli or brotlicffi package is installed, which is the case here). That means response.content already holds the decompressed bytes, not the Brotli-compressed bytes you're expecting. Running brotli.decompress() on data that has already been decompressed raises that BrotliDecompress failed error.
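To see why decompressing twice fails, here is a minimal sketch that uses the standard library's gzip module as a stand-in for Brotli. That substitution is an assumption made so the example runs without third-party packages; brotli.decompress raises its own brotli.error rather than an OSError. The payload and the helper name can_decompress are hypothetical.

```python
import gzip


def can_decompress(data: bytes) -> bool:
    """Return True if data is a valid gzip stream, False otherwise."""
    try:
        gzip.decompress(data)
    except (OSError, EOFError):
        # gzip.BadGzipFile is a subclass of OSError
        return False
    return True


body = b"<html>hello</html>"                     # page body before compression
on_the_wire = gzip.compress(body)                # compressed bytes as transmitted
already_decoded = gzip.decompress(on_the_wire)   # what an HTTP client hands back

print(can_decompress(on_the_wire))       # the compressed stream decodes fine
print(can_decompress(already_decoded))   # plain HTML is not a compressed stream
```

The second call fails for the same reason as in the question: the bytes are already plain HTML, so there is no compressed stream left to decode.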

Here's a simplified, fixed version of your code that leverages Requests' built-in handling:

import requests

headers = {
    'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    'Accept-Encoding': "gzip, deflate, br",
    'Host': "book.douban.com",
    'Referer': "https://book.douban.com/",  # Fixed: Use full URL instead of partial domain
    'Sec-Fetch-Dest': "document",
    'Sec-Fetch-Mode': "navigate",
    'Upgrade-Insecure-Requests': "1"
}

def fetch_page(url):
    """Return the decoded page body, or "" if the request fails."""
    s = requests.Session()
    try:
        response = s.get(url, headers=headers)
        # Raise on HTTP errors like 403/500 instead of silently ignoring them
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return ""

    if response.status_code != 200:
        return ""

    print(response.headers)
    # Requests has already decompressed the body, so use response.text directly
    return response.text


html = fetch_page("https://book.douban.com/tag/%E5%B0%8F%E8%AF%B4")

If you really need to handle Brotli manually

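If you have a specific reason to unpack the content yourself (like debugging), you need to bypass Requests' automatic decompression. Pass stream=True so the body is not consumed up front, then read the still-compressed bytes from response.raw with decode_content disabled, before anything touches response.content or response.text: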

import brotli
import requests

headers = {
    'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    'Accept-Encoding': "gzip, deflate, br",
    'Host': "book.douban.com",
    'Referer': "https://book.douban.com/",
    'Sec-Fetch-Dest': "document",
    'Sec-Fetch-Mode': "navigate",
    'Upgrade-Insecure-Requests': "1"
}

def fetch_page_raw(url):
    """Fetch url and decompress a Brotli body manually; return "" on failure."""
    s = requests.Session()
    try:
        # Stream mode leaves the body unread, so the compressed bytes are still available
        response = s.get(url, headers=headers, stream=True)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return ""

    if response.status_code != 200:
        return ""

    print(response.headers)
    if response.headers.get('Content-Encoding') == 'br':
        # Read the raw bytes with automatic decoding disabled, then decompress manually
        raw_compressed_data = response.raw.read(decode_content=False)
        uncompressed_data = brotli.decompress(raw_compressed_data)
        # Assumes the page is UTF-8 encoded
        return uncompressed_data.decode('utf-8')
    # Not Brotli: let Requests decode the body as usual
    return response.text


html = fetch_page_raw("https://book.douban.com/tag/%E5%B0%8F%E8%AF%B4")
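The manual path hinges on reading the Content-Encoding header correctly. Here is a minimal sketch of a more tolerant check, assuming the header value may vary in case or list several codings separated by commas. The helper name is_brotli_encoded is hypothetical and not part of Requests or brotli.

```python
def is_brotli_encoded(content_encoding):
    """Return True if 'br' appears among the codings in a Content-Encoding value."""
    if not content_encoding:
        return False
    # Header values are case-insensitive and may list several codings
    codings = [token.strip().lower() for token in content_encoding.split(",")]
    return "br" in codings


print(is_brotli_encoded("br"))
print(is_brotli_encoded("gzip"))
print(is_brotli_encoded(None))
```

If the header lists more than one coding, a single brotli.decompress() call is not enough: the codings have to be undone in reverse order of how they were applied.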

Extra tips to avoid future issues

  • Upgrade your Brotli version: Brotli 1.0.9 is fairly old (released in 2020). Newer releases include bug fixes, so upgrading is a cheap way to rule out a problem in the library itself. Run this command to upgrade:
    pip install --upgrade brotli
    
  • Fix your Referer header: Your original code used book.douban.com instead of the full URL https://book.douban.com/. A malformed Referer might trigger Douban's anti-scraping measures.
  • Avoid early returns in try blocks: Your original code had a return "" inside the try block that skipped all of the status-code checking logic. Use raise or print to handle errors instead of bailing out early.
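
Before upgrading, it can help to confirm which versions are actually installed. Here is a minimal sketch using the standard library's importlib.metadata (available since Python 3.8). The helper name installed_version is hypothetical, and the distribution names in the loop are assumed to match what pip reports on your machine.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("Brotli", "requests", "urllib3"):
    print(name, installed_version(name))
```

A None result for Brotli would mean the package is not installed in the interpreter you are running, which is worth checking when several Python environments are in use.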

This question originally comes from Stack Exchange; asked by xin.chen.
