Help request: a SeleniumBase Python script fails on Ubuntu Linux because it cannot get past the CAPTCHA

阿华AIGC实验室

2026-4-14

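It looks like your SeleniumBase script runs fine on Windows, but on Linux (launched with xvfb-run) the Cloudflare CAPTCHA is not handled correctly, so the script fails later when it cannot find the target element. Here are a few practical directions to try: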

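Strengthen SeleniumBase's anti-detection launch options
Anti-detection for a headless browser on Linux needs more careful configuration. You can adjust the SB launch options and add the ones that matter on Linux:
```
with SB(uc=True, test=True, ad_block=True, headless=True, incognito=True, chromium_arg="--no-sandbox,--disable-gpu") as sb:
```
--no-sandbox is a Chrome flag commonly used on Linux to avoid permission problems, for example when running as root or inside a container. As far as I know, SB() has no no_sandbox or disable_gpu keyword, so both flags are passed through chromium_arg as a comma-separated string. uc=True (undetected-chromedriver mode) is the core of getting past Cloudflare, so make sure it is enabled.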
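
To keep one script working on both Windows and Linux, the launch options can be chosen by platform. This is only a minimal sketch: `build_sb_options` is a hypothetical helper name, not part of SeleniumBase, and it assumes extra Chrome flags are handed to SB() through `chromium_arg`.

```python
import sys


def build_sb_options(platform=sys.platform, headless=True):
    """Return keyword arguments for SB(), adding Linux-only Chrome flags.

    Hypothetical helper for illustration; not a SeleniumBase function.
    """
    options = {
        "uc": True,
        "test": True,
        "ad_block": True,
        "incognito": True,
        "headless": headless,
    }
    if platform.startswith("linux"):
        # Flags often needed when Chrome runs as root or in a container.
        options["chromium_arg"] = "--no-sandbox,--disable-gpu"
    return options


if __name__ == "__main__":
    print(build_sb_options())
```

The result would then be unpacked into the context manager, as in `with SB(**build_sb_options()) as sb:`.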

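Improve the wait logic around the CAPTCHA
A fixed sleep(2) may not be reliable on Linux. Use explicit waits instead, so that the CAPTCHA element has fully loaded and the verification has finished before the next steps run: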

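sb.activate_cdp_mode(url)
cf_shadow = '[style="display: grid;"] div div'
# wait_for_element_visible() raises on timeout instead of returning False,
# so catch the exception in case no CAPTCHA is shown at all
try:
    # Wait up to 10 seconds for the CAPTCHA widget to appear
    sb.wait_for_element_visible(cf_shadow, timeout=10)
    captcha_shown = True
except Exception:
    captcha_shown = False
if captcha_shown:
    sb.cdp.gui_click_element(cf_shadow)
    # Wait for the CAPTCHA to finish (the widget disappears)
    sb.wait_for_element_not_visible(cf_shadow, timeout=15)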
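
The idea of replacing a fixed sleep with an explicit wait can be sketched independently of SeleniumBase. `wait_until` below is a hypothetical helper, not a library function: it polls a condition until it holds or a deadline passes, and returns a boolean instead of raising.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll condition() until it is truthy or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With SeleniumBase it could be combined with a non-raising check, for example `wait_until(lambda: not sb.is_element_visible(cf_shadow), timeout=15)`.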

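SeleniumBase also provides uc_open_with_reconnect for pages protected by Cloudflare. It opens the URL and keeps chromedriver disconnected for reconnect_time seconds, so the Cloudflare check runs while no WebDriver is attached. It does not retry by itself or click the checkbox, so if a checkbox is still shown afterwards, follow it with sb.uc_gui_click_captcha(). You can use it in place of the plain open method: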

sb.uc_open_with_reconnect(url, reconnect_time=30)
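
If a single attempt is not enough, retries have to be written by hand. The following is a rough sketch under assumptions: `open_with_retries` is a hypothetical helper, the two callables are supplied by the caller, and the reconnect times are arbitrary example values.

```python
def open_with_retries(open_once, is_blocked, reconnect_times=(4, 8, 12)):
    """Call open_once(reconnect_time) with increasing waits.

    Returns the reconnect_time that got through, or None if every
    attempt was still blocked.
    """
    for reconnect_time in reconnect_times:
        open_once(reconnect_time)
        if not is_blocked():
            return reconnect_time
    return None


# Possible wiring with SeleniumBase (assumes `sb` and `url` already exist):
# open_with_retries(
#     lambda t: sb.uc_open_with_reconnect(url, reconnect_time=t),
#     lambda: not sb.is_element_present('a[href="#rut"]'),
# )
```

Passing the page-specific check in as `is_blocked` keeps the retry loop reusable for other sites.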

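Adjust the xvfb configuration
The xvfb display resolution can affect page rendering and cause layout problems. You can start the script with a higher resolution: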
```
xvfb-run -a -s "-screen 0 1920x1080x24" python3 py3.py
```
Also make sure that the Chrome version on Linux matches the chromedriver version SeleniumBase uses; a version mismatch can also break the anti-detection behavior.
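
A quick way to compare the two versions is to parse the text printed by `google-chrome --version` and `chromedriver --version`. This is a small sketch; the helper names are hypothetical and it only compares major versions.

```python
import re


def major_version(version_output):
    """Extract the major version from `--version` output, or None."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(chrome_output, driver_output):
    """True when both outputs parse and share the same major version."""
    chrome = major_version(chrome_output)
    driver = major_version(driver_output)
    return chrome is not None and chrome == driver
```

The two strings could be collected with `subprocess.run([...], capture_output=True, text=True).stdout` and passed to `versions_match`.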
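
Temporarily debug by looking at the page state
You can temporarily turn off headless mode and run the script directly on an Ubuntu machine with a desktop, watch how the CAPTCHA is actually displayed, and check whether Cloudflare is recognizing the browser fingerprint on Linux.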
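
When no desktop is available, the page title and source can give a rough hint about whether the script is stuck on a challenge page. The check below is only a heuristic sketch: the function name is hypothetical and the marker strings are assumptions about what a Cloudflare interstitial usually contains, so it can produce false positives on pages that embed Turnstile on purpose.

```python
def looks_like_cloudflare_challenge(title, html):
    """Heuristically detect a Cloudflare interstitial from title and HTML."""
    # Assumed markers: the usual interstitial title and Turnstile references.
    markers = ("just a moment", "challenges.cloudflare.com", "cf-turnstile")
    haystack = f"{title}\n{html}".lower()
    return any(marker in haystack for marker in markers)
```

In the script it might be called as `looks_like_cloudflare_challenge(sb.get_title(), sb.get_page_source())`, saving a screenshot with `sb.save_screenshot("blocked.png")` when it returns True (the file name is just an example).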

Here is a complete example script that combines the suggestions above:

import json
from seleniumbase import SB

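with SB(uc=True, test=True, ad_block=True, headless=True, incognito=True, chromium_arg="--no-sandbox,--disable-gpu") as sb:
    url = "https://www.nombrerutyfirma.com/"
    # Open with chromedriver disconnected so the Cloudflare check can run
    sb.uc_open_with_reconnect(url, reconnect_time=30)
    
    # Wait for the page body to load (the RUT tab check below is what
    # confirms that the CAPTCHA was actually passed)
    sb.wait_for_element('body', timeout=20)
    
    # Wait until the RUT tab is visible before clicking it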
    sb.wait_for_element_visible('a[href="#rut"]', timeout=15)
    sb.click('a[href="#rut"]')
    
    rut_input = sb.wait_for_element_visible('div#rut input[name="term"]', timeout=10)
    rut_input.send_keys("21.405.338-1") 
    rut_input.send_keys("\ue007")
    
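    # Wait for at least one result row, then collect all rows
    row_selector = 'table.table-hover tbody tr'
    sb.wait_for_element(row_selector, timeout=10)
    rows = sb.find_elements(row_selector)
    data = []
    for row in rows:
        # Raw Selenium elements need the locator strategy spelled out
        columns = row.find_elements("css selector", "td")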
        record = {
            "Nombre": columns[0].text,
            "RUT": columns[1].text,
            "Sexo": columns[2].text,
            "Dirección": columns[3].text,
            "Ciudad/Comuna": columns[4].text,
        }
        data.append(record)

    json_data = json.dumps(data, ensure_ascii=False, indent=4)
    print(json_data)
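
The loop above indexes columns[0] to columns[4] directly, which fails if the table contains a row with fewer cells. A more defensive variant could look like the sketch below; `rows_to_records` is a hypothetical helper, and the sample row is made-up placeholder data, not output from the site.

```python
import json

FIELDS = ("Nombre", "RUT", "Sexo", "Dirección", "Ciudad/Comuna")


def rows_to_records(rows):
    """Map lists of cell text to dicts, skipping rows with too few cells."""
    records = []
    for cells in rows:
        if len(cells) < len(FIELDS):
            continue  # e.g. a single-cell "no results" row
        records.append({f: c.strip() for f, c in zip(FIELDS, cells)})
    return records


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    sample = [
        ["Nombre Ejemplo", "0.000.000-0", "X", "Calle 1", "Ciudad"],
        ["Sin resultados"],
    ]
    print(json.dumps(rows_to_records(sample), ensure_ascii=False, indent=4))
```

In the script, the input could be built with `[[td.text for td in row.find_elements("css selector", "td")] for row in rows]`.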

Note: this content comes from Stack Exchange; the question was asked by Oswuell.