Help request: a Python Selenium scraper cannot save its scraped data to an Excel file

How to Save Scraped Data to Excel with Your Python Web Scraper

Hey there! Nice work getting that infinite scroll scraper up and running. Now let's get that data saved to Excel properly. Here's what's off in your current code and how to fix it:

  • You're creating a new pd.DataFrame every loop iteration, which overwrites previous data instead of building up a full dataset.
  • You're storing Selenium WebElement objects directly in the DataFrame instead of extracting the actual text content.
  • You're missing the final step to write your collected data to an Excel file.
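To see the first problem in isolation, here is a minimal sketch that needs no browser. The names list holds made-up sample values, and the variable names overwritten and accumulated are only illustrative.

```python
import pandas as pd

names = ["Cafe A", "Cafe B", "Cafe C"]  # made-up sample values, not scraped data

# Problem pattern: a new DataFrame on every iteration replaces the previous one
for name in names:
    overwritten = pd.DataFrame({"Bakery Restaurants": [name]})

# Fix: collect rows in a list and build the DataFrame once, after the loop
rows = []
for name in names:
    rows.append({"Bakery Restaurants": name})
accumulated = pd.DataFrame(rows)

print(len(overwritten), len(accumulated))  # prints: 1 3
```

Only the last restaurant survives in the first version, while the second keeps all three rows.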

Here's the revised code that fixes all these issues:

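import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

##### Web scraper for infinite scrolling page #####
# Recent Selenium 4 releases no longer accept executable_path; pass the driver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver.exe'))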
driver.get("https://www.zomato.com/ncr/south-delhi-restaurants/fast-food?rating_range=4.0-5.0&category=2")
time.sleep(10) # Allow time for the web page to fully load
scroll_pause_time = 1 
screen_height = driver.execute_script("return window.screen.height;") 
i = 1
count = 0

# Initialize an empty list to collect all restaurant data
restaurant_data = []

while True:
    # Scroll one screen height each time
    driver.execute_script(f"window.scrollTo(0, {screen_height * i});")
    i += 1
    time.sleep(scroll_pause_time)
    # Update scroll height after each scroll (page content might load dynamically)
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break loop when we've scrolled past the total page height
    if (screen_height * i) > scroll_height:
        break

# Extract all restaurant elements (this class name looks auto-generated, so it may change when the site is updated)
title_elements = driver.find_elements(By.XPATH, "//a[@class='sc-jHZirH intUsQ']")

for elem in title_elements:
    count += 1
    # Extract visible text from elements instead of storing the WebElement itself
    name = elem.find_element(By.XPATH, './div/h4').text
    address = elem.find_element(By.XPATH, './p[1]').text
    # Add the current restaurant's data to our collection list
    restaurant_data.append({
        "Bakery Restaurants": name,
        "Address": address
    })
    print(count)

# Convert our list of restaurant dictionaries to a DataFrame
df = pd.DataFrame(restaurant_data)
# Write the DataFrame to an Excel file (index=False omits the row index; writing .xlsx requires the openpyxl package)
df.to_excel("Bakery.xlsx", index=False)

# quit() closes every browser window and also ends the chromedriver process
driver.quit()

Key Changes Explained:

  1. Data Accumulation: We use an empty list restaurant_data to collect each restaurant's details as a dictionary. This ensures we don't lose data between loop iterations.
  2. Text Extraction: The .text property returns the visible text of a Selenium WebElement as a plain string. That string is the content you want to save, not the element object itself.
  3. Excel Export: After collecting all data, we convert the list to a Pandas DataFrame and use df.to_excel() to save it directly to a file. The index=False argument keeps your Excel sheet clean by omitting the default row index.
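The three changes can also be tried without opening a browser. The sketch below is only an illustration under stated assumptions: FakeElement is a hypothetical stand-in for a Selenium WebElement (it imitates nothing but the .text attribute), rows_from_elements and save_rows are hypothetical helper names, and the sample values are made up. Writing .xlsx files with pandas also assumes the openpyxl package is installed.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is imitated."""
    text: str


def rows_from_elements(pairs):
    """Turn (name_element, address_element) pairs into plain-text row dicts."""
    return [
        {"Bakery Restaurants": name_el.text, "Address": addr_el.text}
        for name_el, addr_el in pairs
    ]


def save_rows(rows, path):
    """Write the rows to an Excel file (pandas needs openpyxl for .xlsx)."""
    pd.DataFrame(rows).to_excel(path, index=False)


# Made-up sample data, not real scraped results
pairs = [
    (FakeElement("Cafe A"), FakeElement("1 Example Road")),
    (FakeElement("Cafe B"), FakeElement("2 Example Road")),
]
rows = rows_from_elements(pairs)
df = pd.DataFrame(rows)
print(df)
```

Calling save_rows(rows, "Bakery.xlsx") would then write the same two rows to disk. Keeping extraction and export in separate functions makes it easy to check the collected rows before anything is written.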

Run this code, and you'll find a Bakery.xlsx file in the directory you ran the script from, containing all your scraped restaurant names and addresses!

The question originates from Stack Exchange; asked by krishan.
