Help needed: Python Selenium scraper cannot save scraped data to an Excel file
How to Save Scraped Data to Excel with Your Python Web Scraper
Hey there! Nice work getting that infinite scroll scraper up and running. Let's get that data saved to Excel properly. Here's what's off in your current code and how to fix it:
- You're creating a new `pd.DataFrame` on every loop iteration, which overwrites previous data instead of building up a full dataset.
- You're storing Selenium `WebElement` objects directly in the DataFrame instead of extracting the actual text content.
- You're missing the final step to write your collected data to an Excel file.
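To make the first two problems concrete without launching a browser, here is a minimal sketch. `FakeElement` is a hypothetical stand-in for a Selenium `WebElement` (only its `.text` attribute is modeled), and the names are placeholder data, not scraped results.

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only `.text` is modeled."""
    text: str


elements = [FakeElement("Cafe A"), FakeElement("Cafe B"), FakeElement("Cafe C")]

# Problem pattern: the variable is rebound on every pass,
# so only the row from the last iteration survives the loop.
for elem in elements:
    overwritten = [{"name": elem.text}]

# Fixed pattern: append the extracted text (not the element object)
# to one list that is created before the loop and outlives it.
rows = []
for elem in elements:
    rows.append({"name": elem.text})

print(len(overwritten), len(rows))
```

The same idea applies to the DataFrame: build it once from the accumulated list after the loop rather than recreating it inside the loop.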
Here's the revised code that fixes all these issues:
```python
import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

##### Web scraper for infinite scrolling page #####
# Selenium 4 takes the driver path through a Service object; the older
# executable_path keyword is deprecated and was removed in later 4.x releases.
driver = webdriver.Chrome(service=Service('./chromedriver.exe'))
driver.get("https://www.zomato.com/ncr/south-delhi-restaurants/fast-food?rating_range=4.0-5.0&category=2")
time.sleep(10)  # Allow time for the web page to fully load

scroll_pause_time = 1
screen_height = driver.execute_script("return window.screen.height;")
i = 1
count = 0

# Initialize an empty list to collect all restaurant data
restaurant_data = []

while True:
    # Scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # Update scroll height after each scroll (page content might load dynamically)
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break loop when we've scrolled past the total page height
    if (screen_height * i) > scroll_height:
        break

# Extract all restaurant elements once scrolling has finished,
# so each restaurant is collected only once
title_elements = driver.find_elements(By.XPATH, "//a[@class='sc-jHZirH intUsQ']")
for elem in title_elements:
    count += 1
    # Extract visible text from elements instead of storing the WebElement itself
    name = elem.find_element(By.XPATH, './div/h4').text
    address = elem.find_element(By.XPATH, './p[1]').text
    # Add the current restaurant's data to our collection list
    restaurant_data.append({
        "Bakery Restaurants": name,
        "Address": address
    })

print(count)  # Total number of restaurants collected

# Convert our list of restaurant dictionaries to a DataFrame
df = pd.DataFrame(restaurant_data)

# Write the DataFrame to an Excel file (index=False removes auto-generated row numbers)
df.to_excel("Bakery.xlsx", index=False)

driver.quit()  # End the browser session and the chromedriver process
```
Key Changes Explained:
- Data Accumulation: We use an empty list `restaurant_data` to collect each restaurant's details as a dictionary. This ensures we don't lose data between loop iterations.
- Text Extraction: The `.text` property pulls the visible text from each Selenium WebElement. This is the actual content you want to save, not the element object itself.
- Excel Export: After collecting all data, we convert the list to a Pandas DataFrame and use `df.to_excel()` to save it directly to a file. The `index=False` argument keeps your Excel sheet clean by omitting the default row index.
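The list-to-DataFrame-to-Excel step can also be tried on its own, without Selenium. The sketch below is only an illustration: `rows_to_frame` and `save_frame` are hypothetical helper names, and the rows are placeholder values rather than real scraped data.

```python
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from a list of dicts; the dict keys become column headers."""
    return pd.DataFrame(rows, columns=["Bakery Restaurants", "Address"])


def save_frame(df, path):
    """Write the frame to an .xlsx file without the row-index column.

    Assumes an Excel engine such as openpyxl is installed alongside pandas.
    """
    df.to_excel(path, index=False)


# Placeholder rows, not real scraped data.
sample_rows = [
    {"Bakery Restaurants": "Example Bakery", "Address": "1 Example Road"},
    {"Bakery Restaurants": "Sample Cafe", "Address": "2 Sample Street"},
]

df = rows_to_frame(sample_rows)
print(df.shape)
```

Calling `save_frame(df, "Bakery.xlsx")` afterwards would write the two rows under their column headers. Passing `columns=` explicitly is a design choice that fixes the column order even if a dictionary is built with its keys in a different order.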
Run this code, and you'll find a Bakery.xlsx file in the directory you ran the script from, with all your scraped restaurant names and addresses! Note that pandas needs the openpyxl package installed to write .xlsx files.
The question comes from Stack Exchange; asked by krishan.




