Help: writing to Excel from Python without overwriting existing data, and without the header row and row numbers

Solution to Your Amazon Scraper & Excel Writing Issue

Hey Ross, let's fix your code and address your two main questions directly:

Key Fixes & Explanations

1. Stop Writing Headers and Row Numbers

When using to_excel(), you can disable both with two simple parameters:

  • *index=False*: Removes the row numbers (index column) from the output
  • *header=False*: Stops the column headers from being written to Excel
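to_csv() accepts the same index and header parameters as to_excel(), so a quick way to see what each flag removes, without needing an Excel engine installed, is to render a small frame as text. The two-row DataFrame below is made-up sample data:

```python
import pandas as pd

# Made-up sample rows standing in for scraped data
sample = pd.DataFrame({"Url": ["u1", "u2"], "Price": [12.99, 8.5]})

# Defaults: a header row plus a leading index column (0, 1, ...)
with_extras = sample.to_csv().splitlines()

# Same two flags as to_excel(): no index column, no header row
data_only = sample.to_csv(index=False, header=False).splitlines()

print(with_extras)  # [',Url,Price', '0,u1,12.99', '1,u2,8.5']
print(data_only)    # ['u1,12.99', 'u2,8.5']
```

Passing the same two flags to to_excel() has the equivalent effect in the spreadsheet: the first data value lands in cell A1, with no header row above it and no index column beside it.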

2. Avoid Overwriting Existing Data

Your original code created a new ExcelWriter on every loop iteration. ExcelWriter opens the file in write mode by default, so each iteration replaced the file written by the previous one. Instead, collect all your scraped data first and build a single DataFrame, then write it to Excel once. This prevents the overwrites and is also faster, because the file is opened and saved only once.
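The collect-then-write pattern can be sketched independently of the scraping details. In this sketch, collect_rows and fake_scrape are hypothetical names: fake_scrape stands in for the requests/BeautifulSoup step and returns a (url, title, price) tuple or raises on failure. Nothing is written to disk inside the loop:

```python
import pandas as pd


def collect_rows(urls, scrape_one):
    """Call scrape_one(url) for each URL and gather the results.

    scrape_one is a hypothetical callable returning (url, title, price);
    URLs whose call raises are reported and skipped.
    """
    rows = []
    for url in urls:
        try:
            rows.append(scrape_one(url))
        except Exception as exc:
            print(f"Skipping {url}: {exc}")
    # Build the DataFrame once, after the loop, from all gathered rows
    return pd.DataFrame(rows, columns=["Url", "Title", "Price"])


def fake_scrape(url):
    # Hypothetical stub for demonstration only: fails on one URL
    if url.endswith("bad"):
        raise ValueError("price element not found")
    return (url, f"Title for {url}", 9.99)


result = collect_rows(
    ["https://example.com/a", "https://example.com/bad", "https://example.com/b"],
    fake_scrape,
)
print(result)
```

Only after the loop would you call something like result.to_excel(path, index=False, header=False), a single time, so there is exactly one write and nothing gets overwritten.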

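If you ever need to append to an existing Excel file instead of starting fresh, open ExcelWriter with mode='a' and engine='openpyxl'. The xlsxwriter engine, which pandas uses for .xlsx files whenever it is installed, can only create new files, so it cannot append. Note that mode='a' on its own only adds new sheets to the workbook; to add rows below the existing data on the same sheet you also need if_sheet_exists='overlay' (pandas 1.4+) and a startrow pointing past the last used row. For your current use case, though, collecting the data first is the cleanest approach.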
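Here is a sketch of the append case, assuming pandas 1.4 or newer (for if_sheet_exists='overlay') and openpyxl installed. append_rows is a hypothetical helper name; it reopens the existing sheet and starts writing at the sheet's current max_row, so the new rows go underneath the old ones:

```python
import os
import tempfile

import pandas as pd


def append_rows(path, new_rows, sheet_name="Sheet1"):
    """Write new_rows below the existing data in sheet_name of an existing .xlsx file.

    Assumes the file and sheet already exist and the sheet is not empty
    (openpyxl reports max_row == 1 even for an empty sheet).
    """
    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="overlay") as writer:
        # max_row is the last used row (1-based); used as a 0-based
        # startrow it points at the first empty row underneath
        start = writer.book[sheet_name].max_row
        new_rows.to_excel(writer, sheet_name=sheet_name, startrow=start,
                          index=False, header=False)


# Usage demo in a throwaway directory (file name and values are made up)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "python2.xlsx")
    pd.DataFrame([["u1", 1.5], ["u2", 2.5]]).to_excel(path, index=False, header=False)
    append_rows(path, pd.DataFrame([["u3", 3.5]]))
    combined = pd.read_excel(path, header=None)

print(combined)
```

Reading the file back with header=None shows three data rows, the two original ones followed by the appended one.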

Updated Working Code

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Read the original URL list from your input Excel (URLs in the first column).
# The first row is treated as a header; pass header=None if the sheet has none.
df = pd.read_excel('python.xlsx')

# Browser-like User-Agent, defined once rather than on every iteration
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'}

# Collect one [url, title, price] row per successfully scraped page
rows = []

for row_idx in range(len(df)):
    url = df.iloc[row_idx, 0]

    try:
        # timeout stops a single unresponsive page from hanging the whole run
        page = requests.get(url, headers=headers, timeout=10)
        page.raise_for_status()  # Raise on HTTP errors like 404 or 500
        soup = BeautifulSoup(page.content, 'html.parser')

        # Extract and clean product data. find() returns None when an id is
        # missing, so .get_text() then raises and the except block handles it.
        title = soup.find(id="productTitle").get_text().strip()
        price = soup.find(id="priceblock_ourprice").get_text().strip()
        # Drop the leading currency symbol (assumed to be one character, e.g. "$")
        # and any thousands separators: "$1,299.00" -> 1299.0
        converted_price = float(price[1:].replace(',', ''))

        rows.append([url, title, converted_price])

    except Exception as e:
        print(f"Error processing row {row_idx}: {e}")
        continue  # Skip problematic rows and keep the script running

# Build the DataFrame once from everything collected
scraped_data = pd.DataFrame(rows, columns=['Url', 'Title', 'Price'])

# Write all collected data to Excel in one go, with no header row and no index column
with pd.ExcelWriter(r'C:\Users\HP\Documents\python2.xlsx') as writer:
    scraped_data.to_excel(writer, sheet_name='Sheet1', index=False, header=False)
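The price conversion is the most fragile step: Amazon price text varies in length and may include thousands separators. A more tolerant option is to pull the first number out of the text with a regular expression. This is a sketch that assumes US-style formatting (comma as thousands separator, dot as decimal point); parse_price is a hypothetical helper name:

```python
import re

# First run of digits, optionally with comma groups and a decimal part
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first number in text as a float, e.g. '$1,299.00' -> 1299.0.

    Assumes US-style formatting; raises ValueError if no number is found.
    """
    match = _PRICE_RE.search(text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group().replace(",", ""))


print(parse_price("$12.99"))        # 12.99
print(parse_price("  $1,299.00 "))  # 1299.0
```

Inside the loop it would replace the float(...) conversion line, e.g. converted_price = parse_price(price). A ValueError from it is still caught by the existing except block, so the row is skipped as before.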

Additional Updates Based on Your Notes

  • Update 1: Instead of writing to Excel in each loop, we collect all rows into a single DataFrame first, then write once. This avoids repeated file operations and accidental overwrites.
  • Update 2: To prevent headers from being written, we pass header=False directly to to_excel(); there is no need to modify the DataFrame's structure itself.
  • Update 3: Row numbers are removed by passing index=False to to_excel(); you don't need to alter the DataFrame's index to achieve this.

Quick Note on Error Handling

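I added a try-except block to handle cases where a URL fails to load or returns an HTTP error status, the product title or price element can't be found, or the price text can't be converted to a number. Each failed row is reported and skipped, so one bad page doesn't crash the script halfway through scraping.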

Question sourced from Stack Exchange; asked by Ross Symonds.
