How do I export data scraped with Selenium to Excel using Python's Pandas?
Export Bitcoin Historical Data to Excel with Pandas & Selenium
Hey there! I see you've already nailed the Selenium part for scraping Bitcoin's historical data from Investing.com. Let's tweak your code to collect that data properly and export it to Excel using Pandas. Here's a straightforward, step-by-step solution:
Step 1: Install Required Libraries
First, make sure you have Pandas and the Excel-compatible engine openpyxl installed. Run this in your terminal:
pip install pandas openpyxl
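If you want to confirm the install worked before launching the browser, a quick check like the one below can help. This is only a sketch: the helper name `missing_packages` is made up for illustration, and it only checks that each package can be found, not which version is installed.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # selenium is included because the scraping code below depends on it too
    missing = missing_packages(["pandas", "openpyxl", "selenium"])
    if missing:
        print("Still need to install:", ", ".join(missing))
    else:
        print("All required packages are available.")
```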
Step 2: Modified Scraping & Export Code
Here's your updated code that structures the scraped data and exports it to an Excel file seamlessly:
import random
from time import sleep

from selenium import webdriver
import pandas as pd

# Note: find_element_by_xpath and the positional driver path are Selenium 3 style,
# kept from the original question; Selenium 4 uses driver.find_element(By.XPATH, ...).

# Initialize Chrome driver
driver = webdriver.Chrome('./chromedriver.exe')
driver.get('https://es.investing.com/crypto/bitcoin/btc-usd-historical-data?cid=1035793')

# Handle cookie consent popup
boton = driver.find_element_by_xpath('//button[@id="onetrust-accept-btn-handler"]')
boton.click()
sleep(random.uniform(5.0, 10.0))

# Select weekly data interval (your original option[3] choice)
lista = driver.find_element_by_xpath('//*[@id="data_interval"]/option[3]')
lista.click()
sleep(random.uniform(10.0, 20.0))

# Close signup popup
registro = driver.find_element_by_xpath('//*[@id="PromoteSignUpPopUp"]/div[2]/i')
registro.click()
sleep(random.uniform(8.0, 10.0))

# Initialize empty list to store structured data
data = []

# Get total rows and columns in the table
rows = len(driver.find_elements_by_xpath('//*[@id="curr_table"]/tbody/tr'))
cols = len(driver.find_elements_by_xpath('//*[@id="curr_table"]/tbody/tr[1]/td'))

# Scrape each row and collect cell values
for n in range(1, rows+1):  # Start at 1 to include the first data row
    row_data = []
    for b in range(1, cols+1):  # Capture all columns instead of truncating
        cell_text = driver.find_element_by_xpath(f'//*[@id="curr_table"]/tbody/tr[{n}]/td[{b}]').text
        row_data.append(cell_text)
    data.append(row_data)

# Define column names (matches the Spanish headers on es.investing.com)
columns = ['Fecha', 'Apertura', 'Máximo', 'Mínimo', 'Cierre', 'Volumen', 'Variación%']

# Convert scraped data to a Pandas DataFrame
df = pd.DataFrame(data, columns=columns)

# Export DataFrame to Excel
df.to_excel('bitcoin_historical_data.xlsx', index=False, engine='openpyxl')

# Clean up: close the browser
driver.quit()
print("Success! Your Bitcoin data is saved to bitcoin_historical_data.xlsx")
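One thing to watch: `pd.DataFrame(data, columns=columns)` raises a `ValueError` if the scraped rows don't have as many cells as there are column names. A small guard run before building the DataFrame can make that easier to spot. The helper `split_by_width` and the sample rows here are hypothetical, just to illustrate the idea.

```python
def split_by_width(rows, columns):
    """Separate rows whose cell count matches the header from those that don't."""
    expected = len(columns)
    good = [row for row in rows if len(row) == expected]
    bad = [row for row in rows if len(row) != expected]
    return good, bad


# Hypothetical sample: one normal row and one stray single-cell row
columns = ["Fecha", "Cierre"]
data = [["01.01.2021", "29.359,9"], ["No results"]]

good, bad = split_by_width(data, columns)
print(f"{len(good)} usable rows, {len(bad)} skipped")
```

You would then pass `good` to `pd.DataFrame` and inspect `bad` to see whether the table layout changed.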
Key Changes Explained:
- Added `import pandas as pd` to leverage Pandas' data structuring and Excel export tools.
- Created a `data` list to store each row's values (instead of just printing them) for easy conversion to a DataFrame.
- Adjusted the row loop to start at `1`; your original code skipped the first data row with `range(2, rows+1)`.
- Modified the column loop to capture all columns (`cols+1`) instead of stopping early, ensuring we get all available metrics (date, open, high, low, close, volume, percentage change).
- Defined explicit column names that match the website's Spanish headers for clarity in the Excel file.
- Used `df.to_excel()` with `index=False` to exclude Pandas' default index column, and `engine='openpyxl'` to support the modern .xlsx format.
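Keep in mind that every value collected through `.text` is a string, so the prices will show up in Excel as text unless you convert them first. Here is a rough sketch of a converter, assuming the site formats numbers the Spanish way (`.` for thousands, `,` for decimals) and that volumes may carry a `K`/`M`/`B` suffix. Both are assumptions on my part, so check them against your actual output; `parse_es_number` is a made-up helper name.

```python
def parse_es_number(text):
    """Turn text like '43.250,5' or '-2,35%' into a float; return None if it can't be parsed."""
    # Assumed format: '.' separates thousands, ',' marks the decimals
    cleaned = text.strip().replace("%", "").replace(".", "").replace(",", ".")
    multiplier = 1.0
    # Assumed volume suffixes (K = thousand, M = million, B = billion)
    if cleaned and cleaned[-1] in "KMB":
        multiplier = {"K": 1e3, "M": 1e6, "B": 1e9}[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        return float(cleaned) * multiplier
    except ValueError:
        return None


print(parse_es_number("43.250,5"))
print(parse_es_number("-2,35%"))
```

With the DataFrame from the code above, something like `df['Cierre'] = df['Cierre'].map(parse_es_number)` before calling `to_excel()` should give you numeric cells for that column.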
Quick Notes:
- If you still want to exclude specific columns, just adjust the column loop's range (e.g., go back to `range(1, cols-4)` as you originally had) and update the `columns` list to match the remaining fields.
- Double-check that your ChromeDriver version matches your installed Chrome browser to avoid compatibility errors.
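As an alternative to changing the loop range, you could scrape every column and pick the ones you want by name afterwards. A minimal sketch in plain Python follows; `keep_columns` is a hypothetical helper and the sample row is invented for illustration.

```python
def keep_columns(rows, columns, wanted):
    """Return (rows, columns) restricted to the names in `wanted`, keeping the original order."""
    idx = [i for i, name in enumerate(columns) if name in wanted]
    trimmed_rows = [[row[i] for i in idx] for row in rows]
    trimmed_columns = [columns[i] for i in idx]
    return trimmed_rows, trimmed_columns


# Invented sample row, only to show the shape of the data
columns = ["Fecha", "Apertura", "Cierre"]
data = [["01.01.2021", "28.951,7", "29.359,9"]]

rows_out, cols_out = keep_columns(data, columns, {"Fecha", "Cierre"})
print(cols_out, rows_out)
```

If the data is already in a DataFrame, `df[['Fecha', 'Cierre']]` does the same selection in one step.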
This question originates from Stack Exchange; asked by Socrame.




