How to fix Chrome's "Save As" dialog interaction when downloading web pages with Selenium in multiple processes?
I have two scripts, and I run into an interaction problem when they download web pages in multiple processes:
`other.py` builds the list of URLs and launches the download script asynchronously:

    import subprocess

    # Example URL list
    urls = ['https://www.walmart.com/ip/Sabrina-Carpenter-Cherry-Pop-EDP-30ml-1oz/5492571361?classType=REGULAR&athbdg=L1600',
            'https://www.walmart.com/ip/Hoey-5-1-Painless-Hair-Remover-Women-Facial-Removal-Electric-Cordless-Shaver-Set-Wet-Dry-Lady-Razor-Women-Bikini-Line-Nose-Hair-Eyebrow-Arm-Leg-USB-R/647670434?classType=REGULAR']

    # Launch get_url.py asynchronously so it does not block the code that follows
    subprocess.Popen(['python', 'get_url.py', str(urls)])

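A quick check of what `get_url.py` actually receives: `str(urls)` of a plain list is already a valid Python literal, so `ast.literal_eval` on the unmodified argument returns the list. Stripping the brackets first, as `get_url.py` does, yields a tuple instead, or a bare string when only one URL is passed, which `executer.map` would then iterate character by character. A minimal sketch, assuming `urls` is a plain list of strings as in the example; the helper names and the `example.com` URLs are hypothetical.

```python
import ast


def parse_url_arg(arg):
    """Hypothetical helper: turn the str(list) argument back into a list of URLs."""
    parsed = ast.literal_eval(arg)
    # A bare string literal (a single URL) is wrapped so callers always get a list
    return [parsed] if isinstance(parsed, str) else list(parsed)


def parse_like_get_url(arg):
    """Same parsing as get_url.py: strip brackets and newlines, then literal_eval."""
    return ast.literal_eval(arg.replace('[', '').replace(']', '').replace('\n', ', '))


two = ['https://example.com/a', 'https://example.com/b']  # hypothetical URLs
one = ['https://example.com/a']

print(parse_url_arg(str(two)))                      # ['https://example.com/a', 'https://example.com/b']
print(type(parse_like_get_url(str(two))).__name__)  # tuple
print(type(parse_like_get_url(str(one))).__name__)  # str
```

On the calling side, `subprocess.Popen([sys.executable, 'get_url.py', str(urls)])` passes the same string while making sure the child runs under the same interpreter as `other.py`.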
`get_url.py` receives the URLs and downloads the pages in parallel with a process pool:

    import pandas as pd
    import os
    import time
    import sys
    import ast
    from datetime import datetime
    import pyautogui
    from selenium.webdriver.chrome.options import Options
    from selenium import webdriver
    from concurrent.futures import ProcessPoolExecutor

    def get_page(url):
        # Build a timestamped file name
        file_name = f"{url[:20]}_{pd.to_datetime(datetime.now()).strftime('%Y-%m-%d %H-%M-%S')}.html"
        file_path = os.path.join(os.getcwd(), 'data', 'htmls')
        path_and_name = os.path.join(file_path, file_name)

        # Start Chrome and open the target URL
        driver = webdriver.Chrome(options=options)
        driver.get(url)

        # Press the shortcut that opens the "Save As" dialog, type the save path, and confirm
        time.sleep(1)
        pyautogui.hotkey('ctrl', 's')
        time.sleep(1)
        pyautogui.typewrite(path_and_name)
        time.sleep(.5)
        pyautogui.hotkey('enter')
        time.sleep(.2)

        # Wait for the download to finish, then close the browser
        while True:
            files = os.listdir(file_path)
            if file_name in files:
                driver.close()
                break
            time.sleep(.1)

    # Receive the URL string from other.py and convert it to a list
    urls = sys.argv[1]
    urls = ast.literal_eval(urls.replace('[', '').replace(']', '').replace('\n', ', '))

    if __name__ == '__main__':
        options = Options()
        # Process the URLs with multiple processes (required for speed)
        with ProcessPoolExecutor(max_workers=10) as executer:
            executer.map(get_page, urls, chunksize=1)

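One detail of `get_page` worth checking separately from the dialog: for the URLs shown, `url[:20]` is `https://www.walmart.`, so every file name starts with the same prefix, that prefix contains `/` (a path separator) and `:` (not allowed in Windows file names), and two workers that start within the same second build identical names. A minimal sketch of a name builder that avoids this, assuming a timestamp with microseconds is unique enough per worker; `safe_name` is a hypothetical helper, not part of the original code.

```python
import re
from datetime import datetime


def safe_name(url, when=None):
    """Hypothetical helper: timestamped .html name usable as a single path component."""
    if when is None:
        when = datetime.now()
    # Collapse every run of characters outside [A-Za-z0-9._-] (e.g. ':', '/', '?') into '_'
    stem = re.sub(r'[^A-Za-z0-9._-]+', '_', url)[:80]
    # Microseconds (%f) keep names distinct when workers start within the same second
    return f"{stem}_{when.strftime('%Y-%m-%d %H-%M-%S-%f')}.html"


url = 'https://www.walmart.com/ip/Sabrina-Carpenter-Cherry-Pop-EDP-30ml-1oz/5492571361?classType=REGULAR&athbdg=L1600'
print(url[:20])  # https://www.walmart.
print(safe_name(url, datetime(2024, 1, 2, 3, 4, 5)))
print(safe_name(url, datetime(2024, 1, 2, 3, 4, 5, 1)))
```

Keeping the full sanitized URL (truncated to 80 characters here, an arbitrary limit) also makes it easier to tell which page each file came from.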
The problem
With a single browser window everything works, but as soon as `ProcessPoolExecutor` opens several windows, `pyautogui.typewrite` misbehaves:
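- It cannot reliably target the right "Save As" dialog, so `path_and_name` is typed twice or only partially, which causes failed downloads, wrong file names, or wrong paths.
- If I click another window while the script is running (for example, the code editor), `pyautogui` types the path and file name straight into whichever window is active.
- Even with the browser in headless mode, the problems above remain.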
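These symptoms follow from how `pyautogui` works: it synthesizes OS-level keystrokes for whichever window currently has keyboard focus, and there is only one focused window, so ten workers (or a click in the editor) compete for it; in headless mode there is no browser window on screen at all, so the keystrokes can only land in some other application. A sketch that avoids the GUI entirely, under these assumptions: Selenium 4 with Chrome, whose driver exposes `execute_cdp_cmd`; `Page.captureSnapshot` (an experimental Chrome DevTools Protocol command that returns one MHTML file) is an acceptable stand-in for "Save As"; and `driver.page_source` is enough when only the HTML of the current DOM is needed (it embeds no images or stylesheets). The function names are hypothetical. The worker also creates `Options()` itself because, with the spawn start method (the default on Windows and macOS), worker processes do not run the `if __name__ == '__main__':` block where `get_url.py` defines `options`.

```python
import os


def save_html(driver, url, path):
    """Write the DOM Chrome currently holds (driver.page_source) to path; no dialog involved."""
    driver.get(url)
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write(driver.page_source)
    return path


def save_mhtml(driver, url, path):
    """Save a single-file MHTML snapshot through the Chrome DevTools Protocol."""
    driver.get(url)
    # Page.captureSnapshot is experimental; the MHTML text is returned under 'data'
    snapshot = driver.execute_cdp_cmd('Page.captureSnapshot', {'format': 'mhtml'})
    # newline='' keeps the CRLF line endings that MHTML uses
    with open(path, 'w', encoding='utf-8', newline='') as fh:
        fh.write(snapshot['data'])
    return path


def get_page(url, path):
    """Worker sketch: each process drives its own headless Chrome, so focus never matters."""
    # Imported here so every worker process loads Selenium itself
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument('--headless=new')
    driver = webdriver.Chrome(options=options)
    try:
        os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
        return save_mhtml(driver, url, path)
    finally:
        # quit() ends the whole WebDriver session; close() would only close the window
        driver.quit()
```

With a pool, this could be driven as `list(executor.map(get_page, urls, paths))` under an `if __name__ == '__main__':` guard, where `paths` is a hypothetical list of unique output paths, one per URL. Wrapping the call in `list(...)` consumes the iterator, so an exception raised inside a worker is re-raised in the parent instead of being silently dropped.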
How can I fix this code?
Note: this question was taken from Stack Exchange; it was asked by Saeed.




