How do I export data scraped with Selenium to Excel using Python's Pandas?

Export Bitcoin Historical Data to Excel with Pandas & Selenium

Hey there! I see you've already nailed the Selenium part for scraping Bitcoin's historical data from Investing.com. Let's tweak your code to collect that data properly and export it to Excel using Pandas. Here's a straightforward, step-by-step solution:

Step 1: Install Required Libraries

First, make sure you have Pandas and the Excel-compatible engine openpyxl installed. Run this in your terminal:

pip install pandas openpyxl
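
To confirm the install worked before running the scraper, a quick check like the one below can help. It is only a sketch: `missing_packages` is a hypothetical helper name, and the check only tests whether each package can be found by Python, not whether its version is recent enough.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The scraper also needs selenium, so it is included in the check.
missing = missing_packages(["pandas", "openpyxl", "selenium"])
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported as missing, rerun the pip command above with that package name added.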

Step 2: Modified Scraping & Export Code

Here's your updated code, which collects the scraped cells into rows and exports them to an Excel file:

import random
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd

# Initialize Chrome driver (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service('./chromedriver.exe'))
driver.get('https://es.investing.com/crypto/bitcoin/btc-usd-historical-data?cid=1035793')

# Handle cookie consent popup
# (Selenium 4 removed find_element_by_xpath; use find_element(By.XPATH, ...) instead)
boton = driver.find_element(By.XPATH, '//button[@id="onetrust-accept-btn-handler"]')
boton.click()
sleep(random.uniform(5.0, 10.0))

# Select weekly data interval (your original option[3] choice)
lista = driver.find_element(By.XPATH, '//*[@id="data_interval"]/option[3]')
lista.click()
sleep(random.uniform(10.0, 20.0))

# Close signup popup
registro = driver.find_element(By.XPATH, '//*[@id="PromoteSignUpPopUp"]/div[2]/i')
registro.click()
sleep(random.uniform(8.0, 10.0))

# Initialize empty list to store structured data
data = []
# Get total rows and columns in the table
rows = len(driver.find_elements(By.XPATH, '//*[@id="curr_table"]/tbody/tr'))
cols = len(driver.find_elements(By.XPATH, '//*[@id="curr_table"]/tbody/tr[1]/td'))

# Scrape each row and collect cell values
for n in range(1, rows+1):  # Start at 1 to include the first data row
    row_data = []
    for b in range(1, cols+1):  # Capture all columns instead of truncating
        cell_text = driver.find_element(By.XPATH, f'//*[@id="curr_table"]/tbody/tr[{n}]/td[{b}]').text
        row_data.append(cell_text)
    data.append(row_data)

# Define column names (assumed to match the Spanish headers on es.investing.com;
# check the names and their order against the live table, one name per scraped column)
columns = ['Fecha', 'Apertura', 'Máximo', 'Mínimo', 'Cierre', 'Volumen', 'Variación%']

# Convert scraped data to a Pandas DataFrame
df = pd.DataFrame(data, columns=columns)

# Export DataFrame to Excel
df.to_excel('bitcoin_historical_data.xlsx', index=False, engine='openpyxl')

# Clean up: close the browser
driver.quit()

print("Success! Your Bitcoin data is saved to bitcoin_historical_data.xlsx")

Key Changes Explained:

  • Added import pandas as pd to leverage Pandas' data structuring and Excel export tools.
  • Created a data list to store each row's values (instead of just printing them) for easy conversion to a DataFrame.
  • Adjusted the row loop to start at 1, because your original code skipped the first data row with range(2, rows+1).
  • Changed the column loop to range(1, cols+1) so it captures every column instead of stopping early, ensuring we get all available metrics (date, open, high, low, close, volume, percentage change).
  • Defined explicit column names that match the website's Spanish headers for clarity in the Excel file.
  • Used df.to_excel() with index=False to exclude Pandas' default index column, and engine='openpyxl' to support the modern .xlsx format.
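
The DataFrame and export steps can be tried without a browser. The sketch below uses made-up placeholder rows (not real prices) in place of the scraped cells, and writes to an in-memory buffer rather than a file so nothing is left on disk. The round trip through `pd.read_excel` is only there to show what ends up in the workbook.

```python
import io

import pandas as pd

# Placeholder rows standing in for scraped cell text (not real market data).
data = [
    ['01.01.2021', '1,0', '2,0', '0,5', '1,5', '10K', '1,00%'],
    ['08.01.2021', '1,5', '2,5', '1,0', '2,0', '12K', '2,00%'],
]
columns = ['Fecha', 'Apertura', 'Máximo', 'Mínimo', 'Cierre', 'Volumen', 'Variación%']

# Each inner list becomes one row; the list length must equal len(columns).
df = pd.DataFrame(data, columns=columns)

# Write the workbook to memory, then read it back as text to inspect it.
buffer = io.BytesIO()
df.to_excel(buffer, index=False, engine='openpyxl')
buffer.seek(0)
round_trip = pd.read_excel(buffer, engine='openpyxl', dtype=str)

print(round_trip.shape)
print(list(round_trip.columns))
```

Because index=False was passed, the workbook contains only the seven named columns and no extra index column.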

Quick Notes:

  • If you still want to exclude specific columns, just adjust the column loop's range (e.g., go back to range(1, cols-4) as you originally had) and update the columns list to match the remaining fields.
  • Double-check that your ChromeDriver version matches your installed Chrome browser to avoid compatibility errors.
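
As an alternative to changing the loop range (this is a different approach from the one in the code above), you could scrape every column and then pick the ones you want by name in Pandas. A minimal sketch, again with placeholder values and an arbitrary choice of columns to keep:

```python
import pandas as pd

columns = ['Fecha', 'Apertura', 'Máximo', 'Mínimo', 'Cierre', 'Volumen', 'Variación%']
# One placeholder row standing in for scraped cell text (not real market data).
df = pd.DataFrame(
    [['01.01.2021', '1,0', '2,0', '0,5', '1,5', '10K', '1,00%']],
    columns=columns,
)

# Keep only the columns of interest; the names must exist in df.columns.
wanted = ['Fecha', 'Cierre']
subset = df[wanted]

print(list(subset.columns))
```

Selecting by name keeps the column list and the data in sync automatically, at the cost of scraping cells you later discard.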

The question comes from Stack Exchange; asked by Socrame.
