How can I read 300 URLs from Excel with Selenium and run them one by one? (With a code optimization request)
How to Read URLs from Excel and Open Them with Selenium (Java)
Hey there! Let's walk through how to adjust your code to read those 300 URLs from Excel and open each one in Chrome using Selenium. Here's a step-by-step breakdown and the corrected implementation:
Key Steps to Make This Work
- Gather dependencies: You'll need the Apache POI libraries (to handle Excel files) and Selenium WebDriver. If using Maven, add these to your pom.xml:

```xml
<!-- Apache POI for Excel parsing -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.5</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.5</version>
</dependency>
<!-- Selenium WebDriver -->
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.15.0</version>
</dependency>
```

- Extract URLs from Excel: Parse the Excel sheet and collect valid URLs into a list.
- Loop through URLs: Use a single WebDriver instance to open each URL (more efficient than launching a new browser every time).
- Add error handling: Catch common issues like missing files, invalid URLs, or page load failures.
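The "collect valid URLs" step from the list above can be sketched in isolation with plain Java and no POI dependency. In the full program below, the raw values come from Excel cells; here they are hard-coded, and the `filterUrls` helper is a hypothetical name used only for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class UrlFilterSketch {
    // Keeps only trimmed values that look like web URLs;
    // nulls and non-URL strings are silently dropped.
    static List<String> filterUrls(List<String> rawValues) {
        List<String> urls = new ArrayList<>();
        for (String value : rawValues) {
            if (value == null) continue;
            String trimmed = value.trim();
            if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
                urls.add(trimmed);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("https://example.com", "not a url", " http://example.org ");
        System.out.println(filterUrls(raw));
        // prints [https://example.com, http://example.org]
    }
}
```

The same prefix check appears inside the Excel-reading loop in the corrected code below; pulling it into a helper like this makes it easy to unit-test separately.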
Corrected Code Implementation
Here's the modified version of your code that does exactly what you need:
```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;

import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Iterator;

public class ExcelUrlOpener {
    public static void main(String[] args) {
        // Set ChromeDriver path
        System.setProperty("webdriver.chrome.driver",
                "C:\\Users\\Jatin\\Downloads\\chromedriver_win32\\chromedriver.exe");

        // Optional: Run Chrome in headless mode to save resources
        // (remove if you want to see the browser)
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");

        // Initialize WebDriver (reuse this instance for all URLs)
        WebDriver driver = new ChromeDriver(options);

        // Add implicit wait to handle page loading delays
        driver.manage().timeouts().implicitlyWait(java.time.Duration.ofSeconds(10));

        List<String> urlList = new ArrayList<>();

        // Read URLs from Excel file
        try {
            File excelFile = new File("C:\\Users\\Jatin\\Documents\\Output.xlsx");
            FileInputStream fis = new FileInputStream(excelFile);
            XSSFWorkbook workbook = new XSSFWorkbook(fis);
            XSSFSheet sheet = workbook.getSheetAt(0); // Target the first sheet
            Iterator<Row> rowIterator = sheet.iterator();

            // Skip header row if your Excel has one (delete this block if no header)
            if (rowIterator.hasNext()) {
                rowIterator.next();
            }

            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();
                Cell urlCell = row.getCell(0); // Assume URLs are in the first column (index 0)

                // Only collect valid string URLs starting with http/https
                if (urlCell != null && urlCell.getCellType() == CellType.STRING) {
                    String url = urlCell.getStringCellValue().trim();
                    if (url.startsWith("http://") || url.startsWith("https://")) {
                        urlList.add(url);
                    }
                }
            }

            // Clean up Excel resources
            workbook.close();
            fis.close();
        } catch (Exception e) {
            System.err.println("Error reading Excel file: " + e.getMessage());
            e.printStackTrace();
            driver.quit();
            return;
        }

        // Open each URL in the browser
        for (int i = 0; i < urlList.size(); i++) {
            String url = urlList.get(i);
            try {
                System.out.printf("Processing URL %d/%d: %s%n", i + 1, urlList.size(), url);
                driver.get(url);
                // Optional: Add a short delay between URLs (adjust or remove as needed)
                Thread.sleep(1500);
            } catch (Exception e) {
                System.err.printf("Failed to open URL %d: %s%n", i + 1, url);
                e.printStackTrace();
            }
        }

        // Clean up: Close the browser once all URLs are processed
        driver.quit();
        System.out.println("All URLs processed successfully!");
    }
}
```
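The per-URL try/catch pattern used in the loop above (one failure must not abort the whole batch) can be demonstrated without a browser. Here `processItem` is a hypothetical stand-in for `driver.get(url)`, assuming any action that may throw:

```java
import java.util.List;

public class PerItemErrorHandling {
    // Stand-in for driver.get(url): throws on a "bad" item.
    static void processItem(String item) {
        if (item.contains("bad")) {
            throw new IllegalStateException("failed to load " + item);
        }
    }

    public static void main(String[] args) {
        List<String> items = List.of("https://ok-1.com", "https://bad.com", "https://ok-2.com");
        int succeeded = 0;
        for (int i = 0; i < items.size(); i++) {
            try {
                processItem(items.get(i));
                succeeded++;
            } catch (Exception e) {
                // Log and continue: one bad item must not stop the batch
                System.err.printf("Item %d failed: %s%n", i + 1, e.getMessage());
            }
        }
        System.out.println("Succeeded: " + succeeded + "/" + items.size());
        // prints Succeeded: 2/3
    }
}
```

The design point is simply that the try/catch sits inside the loop rather than around it, so the iteration index keeps advancing past failures.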
Key Improvements Over Your Original Code
- URL collection: Instead of just printing Excel values, we store valid URLs in a list for later use.
- Reused WebDriver: Launching one browser instance instead of 300 saves resources and speeds up the process.
- Validation: We filter out non-URL values to avoid errors when trying to open invalid strings.
- Header handling: Skips the first row if your Excel has a header (like "URL")—remove that block if your sheet has no header.
- Error resilience: Catches errors for individual URLs so one bad link doesn't stop the entire batch.
- Headless option: Optional setting to run Chrome without a visible window (great for background processing).
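One note on the validation bullet: the prefix check accepts anything that starts with http(s)://, including malformed values like "https://" alone. If you want a slightly stricter check, the JDK's java.net.URI can reject values without a valid host. This helper (isLikelyUrl is a hypothetical name, not part of the code above) is a minimal sketch:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class StrictUrlCheck {
    // Returns true only if the value parses as an absolute http/https URI
    // with a host component; relative or malformed strings are rejected.
    static boolean isLikelyUrl(String value) {
        if (value == null) return false;
        try {
            URI uri = new URI(value.trim());
            String scheme = uri.getScheme();
            return ("http".equals(scheme) || "https".equals(scheme)) && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLikelyUrl("https://example.com/page")); // true
        System.out.println(isLikelyUrl("http//missing-colon.com"));  // false
        System.out.println(isLikelyUrl("ftp://example.com"));        // false
    }
}
```

You could swap this in for the startsWith check in the Excel-reading loop if bad rows are a concern with 300 URLs.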
This question comes from Stack Exchange, asked by Jåţîñ Sēţĥï.