extract_signals function using Selenium and ChromeDriver in an Azure Function returns a 500 Internal Server Error, and the failing line of code cannot be located

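My http_trigger test function works correctly and writes files to Azure Blob Storage as expected, but calling the extract_signals function returns a 500 Internal Server Error, and I cannot find where in the code it fails. The function uses headless Chrome, installs ChromeDriver dynamically via ChromeDriverManager().install(), and relies on DefaultAzureCredential() to access Azure Key Vault.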

Application Insights shows only the log below. The error appears right after the driver is downloaded and cached, with no further detail:

Connected! You are now viewing logs of Function runs in the current Code + Test panel. To see all the logs for this Function, please go to 'Logs' from the Function menu.
2025-07-28T04:59:31Z [Verbose] AuthenticationScheme: WebJobsAuthLevel was successfully authenticated.
2025-07-28T04:59:31Z [Verbose] Authorization was successful.
2025-07-28T04:59:31Z [Information] Executing 'Functions.extract_signals' (Reason='This function was programmatically called via the host APIs.', Id=5e2ad53f-b0ca-4349-b470-c654ab4c8f2c)
2025-07-28T04:59:31Z [Verbose] Sending invocation id: '5e2ad53f-b0ca-4349-b470-c654ab4c8f2c
2025-07-28T04:59:31Z [Verbose] Posting invocation id:5e2ad53f-b0ca-4349-b470-c654ab4c8f2c on workerId:d9a24210-7c41-47c1-8649-ce95814f013f
2025-07-28T04:59:31Z [Information] ====== WebDriver manager ======
2025-07-28T04:59:31Z [Information] Get LATEST chromedriver version for google-chrome
2025-07-28T04:59:32Z [Information] About to download new driver from https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_linux64.zip
2025-07-28T04:59:32Z [Information] Driver downloading response is 200
2025-07-28T04:59:32Z [Information] Get LATEST chromedriver version for google-chrome
2025-07-28T04:59:32Z [Information] Get LATEST chromedriver version for google-chrome
2025-07-28T04:59:32Z [Information] Driver has been saved in cache [/home/.wdm/drivers/chromedriver/linux64/114.0.5735.90]
2025-07-28T04:59:32Z [Error] Executed 'Functions.extract_signals' (Failed, Id=5e2ad53f-b0ca-4349-b470-c654ab4c8f2c, Duration=700ms)

Dependencies (requirements.txt)

attrs==25.3.0
azure-core==1.35.0
azure-functions==1.23.0
azure-identity==1.23.1
azure-keyvault==4.2.0
azure-keyvault-certificates==4.10.0
azure-keyvault-keys==4.11.0
azure-keyvault-secrets==4.10.0
azure-storage-blob==12.26.0
azure-storage-file-datalake==12.21.0
beautifulsoup4==4.13.4
certifi==2025.7.14
cffi==1.17.1
charset-normalizer==3.4.2
cryptography==45.0.5
h11==0.16.0
idna==3.10
isodate==0.7.2
MarkupSafe==3.0.2
msal==1.32.3
msal-extensions==1.3.1
numpy==2.2.5
outcome==1.3.0.post0
packaging==25.0
pandas==2.2.3
pip==23.2.1
pycparser==2.22
PyJWT==2.10.1
PySocks==1.7.1
python-dateutil==2.9.0.post0
python-dotenv==1.1.0
pytz==2025.2
requests==2.32.3
selenium==4.32.0
setuptools==65.5.0
six==1.17.0
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.7
trio==0.30.0
trio-websocket==0.12.2
typing_extensions==4.14.1
tzdata==2025.2
urllib3==2.5.0
webdriver-manager==4.0.2
websocket-client==1.8.0
Werkzeug==3.1.3
wsproto==1.2.0

Function code snippet (function_app.py)

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options as ChromeOptions
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
from azure.identity import DefaultAzureCredential, ClientSecretCredential
from azure.storage.blob import (
    BlobServiceClient, ContainerClient, BlobClient,
    BlobSasPermissions, ContainerSasPermissions, AccountSasPermissions,
    Services, ResourceTypes, UserDelegationKey,
    generate_account_sas, generate_container_sas, generate_blob_sas,
)
from azure.keyvault.secrets import SecretClient
from datetime import datetime, timedelta
from argparse import ArgumentParser
from concurrent.futures import ThreadPoolExecutor
from dotenv import load_dotenv
from pathlib import Path
from urllib.parse import urlencode
import time
import requests
import os
import azure.functions as func
import logging
import json

# # this is strictly used only in development
# # load env variables
# env_dir = Path('../').resolve()
# load_dotenv(os.path.join(env_dir, '.env'))

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

def batch_signal_files_lookup(data: list, batch_size: int):
    """ returns an iterator with a json object representing the all the base url and the relative url of the compressed audio recording/si
    """
    # Note: the original code is incomplete at this point; the user-provided content is kept as-is

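Troubleshooting suggestions

Here are some practical directions to check, which you can work through one at a time to locate the problem:

  • Add detailed log checkpoints: call logging.info() or logging.debug() before and after the key steps, such as creating the Chrome instance, initializing SecretClient, and reading the secret. The last checkpoint that appears in the log tells you which step failed.

  • Check whether the Chrome browser is present: the Azure Functions Linux Consumption plan does not come with Chrome preinstalled. ChromeDriver is only the driver; without the browser itself it has nothing to launch. You could install Chrome from a custom startup script when the function starts, or switch to a Premium or Dedicated plan and use a custom image that includes Chrome. Also remember to pass the arguments that headless Chrome needs: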

    options = ChromeOptions()
    options.add_argument("--headless=new")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    
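  • Catch and log detailed exceptions: wrap the critical code blocks in try-except, catch the exception, and log the full stack trace, for example:

    try:
        logging.info("Starting Chrome driver initialization")
        driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options)
        logging.info("Chrome driver initialized successfully")
    except Exception as e:
        logging.error(f"Chrome driver initialization failed: {e}", exc_info=True)
        raise
    

    The exception details are then written to Application Insights, where the cause of the error can be read directly.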

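  • Verify the Key Vault permissions: when running in a Function app, DefaultAzureCredential typically authenticates with the app's managed identity, so make sure that identity has been granted permission to read secrets in Key Vault. Otherwise the credential or the secret request fails, and unless the exception is logged the function errors out without an obvious message.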

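  • Check the function's resource limits: headless Chrome needs a fair amount of memory, and the memory available on the Consumption plan (limited to about 1.5 GB per instance) may not be enough. If that turns out to be the bottleneck, consider a plan with more memory, and extend the function timeout so that resource exhaustion or a timeout does not surface as a 500 error. Note that the invocation in the log above failed after only 700 ms, so a timeout is unlikely to be the cause in this particular case.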
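
The logging and exception-capture suggestions above can be combined into one small helper. The following is a minimal sketch, not the original poster's code: the name log_stage is hypothetical, and it uses only the standard logging module, on the assumption that the Functions Python worker forwards root-logger records to Application Insights.

```python
import logging
from contextlib import contextmanager


@contextmanager
def log_stage(name: str):
    """Log the start and end of one step, and the traceback if it fails.

    `log_stage` is a hypothetical helper name used for illustration only.
    """
    logging.info("START %s", name)
    try:
        yield
    except Exception:
        # logging.exception logs at ERROR level and appends the traceback.
        logging.exception("FAILED %s", name)
        raise  # re-raise so the invocation is still reported as failed
    logging.info("DONE %s", name)
```

Each suspect step can then be wrapped separately, for example `with log_stage("chrome-init"):` around the webdriver.Chrome(...) call and `with log_stage("keyvault-read"):` around the secret lookup. The last START line without a matching DONE line identifies the failing step.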

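For the browser-presence check suggested above, one option is to look for a Chrome or Chromium executable on PATH before starting the driver. This is a rough sketch under stated assumptions: find_browser_binary is a hypothetical helper, and the candidate names are common Linux executable names rather than anything confirmed for the Azure Functions image.

```python
import shutil
from typing import Iterable, Optional

# Assumed candidate executable names; adjust to what the image actually ships.
DEFAULT_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_browser_binary(
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None.

    `find_browser_binary` is a hypothetical helper name for illustration.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

Logging the result at the top of extract_signals, and raising a clear error when it is None, would show directly in Application Insights whether the browser is missing.
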
Content sourced from Stack Exchange
