Help: Failing to Scrape SBIN's Last 24 Months of Stock Prices from a JSP Page in R

阿华AIGC实验室 (Ahua AIGC Lab)

2026-5-11

How to Scrape SBIN's Last 24 Months Stock Price Data into R

Hey there! Let’s tackle this SBIN stock data scraping issue together. I’ve helped folks troubleshoot similar problems before, so let’s break down why your RSelenium/httr attempts might be failing and walk through two solutions: one using a dedicated financial data package (much easier!) and another for direct web scraping if you need that specific table.

Why Your Initial Attempts Might Be Failing

Dynamic Content: If the table is loaded by JavaScript, httr alone can’t see it, because httr only fetches the raw HTML the server sends and never runs scripts.
Timing Issues with RSelenium: You might be trying to extract the table before it fully loads, or using the wrong selector to target it.
Stock Ticker Confusion: SBIN is listed on India’s National Stock Exchange (NSE), so most data tools need an exchange-qualified symbol (on Yahoo Finance, for example, it is SBIN.NS).
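A quick way to test the dynamic-content theory is to check whether your table’s selector matches anything in the raw HTML that httr returns. The helper below is a hypothetical sketch (not part of httr or rvest) built on rvest’s `read_html()` and `html_elements()`. If the table is visible in your browser but this returns FALSE on the httr response body, the table is most likely injected by JavaScript, or fetched by a separate request, after the page loads.

```r
library(rvest)

# Hypothetical helper: TRUE if `css` matches at least one element in the
# raw HTML string, i.e. the content exists without running any JavaScript.
selector_in_static_html <- function(html, css) {
  length(html_elements(read_html(html), css)) > 0
}

# Usage against the live page (placeholder URL, needs network access):
# resp <- httr::GET("YOUR_TARGET_PAGE_URL")
# selector_in_static_html(httr::content(resp, as = "text"), "table")
```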

Solution 1: Use `quantmod` (Recommended, No Web Scraping Needed!)

The quantmod package is built for fetching financial time series from online data sources (Yahoo Finance by default) rather than scraping web pages, so it is far more reliable than parsing a table yourself. Here’s how to get the last 24 months of SBIN data:

# Install and load the package (run once)
install.packages("quantmod")
library(quantmod)

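# Define date range: last 24 months from today
# (base R's months() only extracts month names, so step back with seq() instead)
end_date <- Sys.Date()
start_date <- seq(end_date, length.out = 2, by = "-24 months")[2]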

# Fetch SBIN data (use "SBIN.NS" for NSE listing)
getSymbols("SBIN.NS", from = start_date, to = end_date, auto.assign = TRUE)

# Convert the xts object to a data frame with a proper Date column
# (index() and coredata() come from zoo, which quantmod loads)
sbin_data <- data.frame(Date = index(SBIN.NS), coredata(SBIN.NS))

# Check the first few rows
head(sbin_data)

This gives you the standard daily metrics (Open, High, Low, Close, Volume, and Adjusted close), in columns named like SBIN.NS.Open and SBIN.NS.Adjusted, without dealing with page rendering or selectors.
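To get plain column names and drop incomplete rows, you can post-process the data frame. The helper below is a hypothetical base-R sketch, not a quantmod function. It assumes the `<ticker>.Open`-style column names that getSymbols() produces, and it drops rows with missing values, which quantmod sometimes warns about for Yahoo series.

```r
# Hypothetical helper (not part of quantmod): strip the "<ticker>." prefix
# from column names and keep only rows without missing values.
tidy_quote_df <- function(df, ticker) {
  prefix <- paste0(ticker, ".")
  nm <- names(df)
  has_prefix <- startsWith(nm, prefix)
  nm[has_prefix] <- substring(nm[has_prefix], nchar(prefix) + 1)
  names(df) <- nm
  # complete.cases() flags rows with no NA in any column
  df[complete.cases(df), , drop = FALSE]
}

# Usage: sbin_data <- tidy_quote_df(sbin_data, "SBIN.NS")
```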

Solution 2: Web Scraping with RSelenium (If You Need the Exact Table)

If you specifically need the table from your target link, here’s a refined RSelenium workflow that fixes common timing/selector issues:

Step 1: Set Up RSelenium

# Install required packages (run once)
install.packages(c("RSelenium", "rvest", "dplyr"))
library(RSelenium)
library(rvest)
library(dplyr)

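# Optional: set a realistic user agent at browser launch to reduce anti-scraping blocks.
# navigator.userAgent is read-only in JavaScript, so executeScript() cannot change it;
# pass it as a Chrome argument instead (newer chromedriver versions may expect the
# capability key "goog:chromeOptions" rather than "chromeOptions").
ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
eCaps <- list(chromeOptions = list(args = list(paste0("--user-agent=", ua))))

# Start Chrome driver (adjust port if needed)
driver <- rsDriver(browser = "chrome", port = 4567L, extraCapabilities = eCaps)
remDr <- driver[["client"]]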

Step 2: Load the Page & Wait for the Table

# Navigate to your target URL (replace with your actual link)
remDr$navigate("YOUR_TARGET_PAGE_URL")

# RSelenium's remoteDriver has no waitForElement() method. Instead, set an implicit
# wait so findElement() keeps retrying for up to 10 seconds until the table appears.
# (The CSS selector in Step 3 must match your table's actual ID/class; find it via browser dev tools, F12.)
remDr$setTimeout(type = "implicit", milliseconds = 10000)
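RSelenium’s remoteDriver does not provide a waitForElement() method, so if you prefer an explicit wait over an implicit timeout, a small polling loop does the job. `wait_until()` below is a hypothetical base-R helper, not part of RSelenium. It repeatedly calls a condition function until it returns TRUE or the timeout expires.

```r
# Hypothetical explicit-wait helper: poll `condition()` every `interval`
# seconds until it returns TRUE, or stop with an error after `timeout` seconds.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(invisible(TRUE))
    if (Sys.time() > deadline) stop("Timed out after ", timeout, " seconds")
    Sys.sleep(interval)
  }
}

# Usage with RSelenium: findElements() returns an empty list (not an error)
# while the element is absent. The selector is the placeholder used above.
# wait_until(function() {
#   length(remDr$findElements(using = "css selector",
#                             value = "table#sbin-historical-table")) > 0
# })
```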

Step 3: Extract & Clean the Table

# Get the table's HTML (outerHTML includes the <table> tag itself)
table_el <- remDr$findElement(using = "css selector", value = "table#sbin-historical-table")
table_html <- table_el$getElementAttribute("outerHTML")[[1]]

# Parse HTML into a data frame (rvest >= 1.0 fills missing cells with NA
# automatically; its fill argument is deprecated)
sbin_table <- read_html(table_html) %>%
  html_table() %>%
  .[[1]]

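# Clean up the data (adjust based on your table's structure)
sbin_table <- sbin_table %>%
  # Remove an extra header row, if one was parsed as data
  filter(row_number() > 1) %>%
  # X1...X6 are the default names only when the table has no <th> header row;
  # if html_table() already picked up real headers, rename from those instead
  rename(Date = X1, Open = X2, High = X3, Low = X4, Close = X5, Volume = X6) %>%
  # Scraped cells are text such as "1,234.50": drop commas and convert to numbers
  mutate(across(c(Open, High, Low, Close, Volume), ~ as.numeric(gsub(",", "", .x))))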

# View the result
head(sbin_table)
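Before pointing this parsing code at the live page, it can help to try it offline on a static snippet. The sketch below uses a made-up fixture table standing in for the outerHTML string from Step 3. Its columns, ISO date format, and comma thousands separators are assumptions, so the real page may need different cleaning. It shows that `html_table()` takes names from a `<th>` header row and leaves comma-formatted numbers as text until you convert them.

```r
library(rvest)

# Made-up fixture, not real SBIN data: a tiny table shaped like a price history
fixture <- '<table id="sbin-historical-table">
  <tr><th>Date</th><th>Close</th><th>Volume</th></tr>
  <tr><td>2026-05-08</td><td>1,234.50</td><td>1,000,000</td></tr>
  <tr><td>2026-05-07</td><td>1,200.00</td><td>250,000</td></tr>
</table>'

# Strip thousands separators before converting text cells to numbers
clean_numeric <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# html_element() selects the single table; html_table() returns one tibble
prices <- html_table(html_element(read_html(fixture), "table"))
prices$Close  <- clean_numeric(prices$Close)
prices$Volume <- clean_numeric(prices$Volume)
prices$Date   <- as.Date(prices$Date)  # assumes ISO dates in the fixture
```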

Step 4: Clean Up

# Close the browser and stop the driver
remDr$close()
driver$server$stop()
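If scraping fails partway (for example, the selector never matches), the cleanup lines above never run and the browser and driver keep running. A minimal sketch using base R’s `on.exit()` guarantees cleanup runs whether the scrape succeeds or errors. `with_cleanup()` is a hypothetical helper, not part of RSelenium.

```r
# Hypothetical helper: run `scrape()` and always run `cleanup()` afterwards,
# even if `scrape()` throws an error.
with_cleanup <- function(scrape, cleanup) {
  on.exit(cleanup(), add = TRUE)
  scrape()
}

# Usage with the objects from Steps 1-3:
# sbin_table <- with_cleanup(
#   scrape  = function() { ...Steps 2 and 3... },
#   cleanup = function() { remDr$close(); driver$server$stop() }
# )
```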

Key Tips for Success

Find the Right Selector: Use your browser’s dev tools (right-click the table → Inspect) to get the exact CSS selector or XPath for the table.
Avoid Anti-Scraping Blocks: Add delays between actions, use a realistic user agent, and don’t hammer the website with requests.
Stick to quantmod When Possible: It returns clean, structured time series and spares you the rendering, selector, and blocking headaches of web scraping.
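If you do end up fetching several pages (for example, a history table split across multiple pages), the "add delays" tip can be sketched as a loop with a randomized pause between requests. `polite_map()` and its wait bounds are hypothetical choices, not values any site documents, and `fetch` stands for whatever function retrieves one page.

```r
# Hypothetical helper: apply `fetch` to each URL, sleeping a random
# number of seconds in [min_wait, max_wait] between requests
# (no pause after the last one).
polite_map <- function(urls, fetch, min_wait = 2, max_wait = 5) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    results[[i]] <- fetch(urls[[i]])
    if (i < length(urls)) Sys.sleep(runif(1, min_wait, max_wait))
  }
  results
}
```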

This question originally comes from Stack Exchange; it was asked by Nad Pat.