Help: failing to scrape SBIN's last-24-months share price table from a JSP page in R
Hey there! Let’s tackle this SBIN stock data scraping issue together. I’ve helped folks troubleshoot similar problems before, so let’s break down why your RSelenium/httr attempts might be failing and walk through two reliable solutions—one using a dedicated financial data package (way easier!) and another for direct web scraping if you need that specific table.
Why Your Initial Attempts Might Be Failing
- Dynamic Content: If the table is loaded via JavaScript, httr alone can't see it, because it only fetches the raw HTML and never runs the page's scripts.
- Timing Issues with RSelenium: You might be trying to extract the table before it has fully loaded, or using the wrong selector to target it.
- Stock Ticker Confusion: SBIN (State Bank of India) is listed on the National Stock Exchange (NSE) of India, so most data tools need the exchange-qualified ticker, e.g. "SBIN.NS" on Yahoo Finance (BSE listings use the ".BO" suffix).
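A quick way to tell which of the first two problems you have is to check whether the table is present in the raw HTML that httr receives. If it is missing there but visible in the browser, JavaScript is building it. The sketch below is a minimal, regex-based check. The helper name `table_in_static_html` and the table id are hypothetical, and the two HTML strings are made-up stand-ins for a real response.

```r
# Hypothetical helper: does the raw (un-rendered) HTML already contain the table?
# If not, the table is probably injected by JavaScript and httr alone won't see it.
# Assumes table_id is a plain id without regex metacharacters.
table_in_static_html <- function(html_text, table_id) {
  # Match an opening <table ...> tag whose id attribute equals table_id
  pattern <- sprintf("<table[^>]*\\bid\\s*=\\s*[\"']%s[\"']", table_id)
  grepl(pattern, html_text, ignore.case = TRUE, perl = TRUE)
}

# In practice html_text could come from httr, e.g.:
#   res <- httr::GET("YOUR_TARGET_PAGE_URL")
#   html_text <- httr::content(res, as = "text", encoding = "UTF-8")
static_page  <- '<html><body><table id="sbin-historical-table"><tr><td>1</td></tr></table></body></html>'
dynamic_page <- '<html><body><div id="app"></div><script src="loader.js"></script></body></html>'

table_in_static_html(static_page, "sbin-historical-table")   # TRUE
table_in_static_html(dynamic_page, "sbin-historical-table")  # FALSE
```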
Solution 1: Use quantmod (Recommended—No Web Scraping Needed!)
The quantmod package is built for pulling financial data from online sources (getSymbols() uses Yahoo Finance by default), so it's far more reliable than scraping a web page. Here's how to get the last 24 months of SBIN data:
# Install and load the packages (install once)
install.packages(c("quantmod", "lubridate"))
library(quantmod)
library(lubridate)  # provides months() and %m-%; base R's months() can't do date arithmetic

# Define date range: last 24 months from today
end_date <- Sys.Date()
start_date <- end_date %m-% months(24)  # %m-% rolls back to a valid day instead of returning NA

# Fetch SBIN data (use "SBIN.NS" for the NSE listing); this creates an xts object named SBIN.NS
getSymbols("SBIN.NS", from = start_date, to = end_date, auto.assign = TRUE)

# Convert the xts object to a plain data frame, keeping the dates as a proper Date column
sbin_data <- data.frame(Date = index(SBIN.NS), coredata(SBIN.NS))

# Check the first few rows
head(sbin_data)
This gives you the standard daily price fields (Open, High, Low, Close, Volume and Adjusted close, in columns named SBIN.NS.Open, SBIN.NS.Close and so on) without dealing with page rendering or selectors.
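If you'd rather not add lubridate, or you already have price rows from some other source, base R can compute the 24-month window and filter to it. This is a sketch only. `window_start`, `last_n_months` and the `prices` frame are hypothetical, and the prices are dummy values, not real SBIN quotes. One caveat: month arithmetic on dates like the 31st may roll into the following month rather than clamping to month end.

```r
# Base-R sketch: compute the start of an n-month window and keep only rows inside it.
window_start <- function(end_date, n_months = 24) {
  # seq.Date accepts negative month steps such as "-24 months"
  seq(end_date, by = paste0("-", n_months, " months"), length.out = 2)[2]
}

last_n_months <- function(df, end_date, n_months = 24) {
  start <- window_start(end_date, n_months)
  df[df$Date >= start & df$Date <= end_date, , drop = FALSE]
}

# Made-up example frame for illustration only
prices <- data.frame(
  Date  = as.Date(c("2021-12-31", "2022-03-15", "2023-06-30", "2024-03-15")),
  Close = c(100, 101, 102, 103)
)

end_date <- as.Date("2024-03-15")
window_start(end_date)           # "2022-03-15"
last_n_months(prices, end_date)  # drops the 2021-12-31 row
```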
Solution 2: Web Scraping with RSelenium (If You Need the Exact Table)
If you specifically need the table from your target link, here’s a refined RSelenium workflow that fixes common timing/selector issues:
Step 1: Set Up RSelenium
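# Install required packages (run once)
install.packages(c("RSelenium", "rvest", "dplyr"))
library(RSelenium)
library(rvest)
library(dplyr)

# Start Chrome and chromedriver (change the port if 4567 is already in use).
# Set a realistic user agent when the browser starts: assigning navigator.userAgent
# via executeScript() does not change the User-Agent header sent to the server.
# ("goog:chromeOptions" is the current capability key; older setups used "chromeOptions".)
ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
driver <- rsDriver(
  browser = "chrome", port = 4567L,
  extraCapabilities = list("goog:chromeOptions" = list(args = list(paste0("--user-agent=", ua))))
)
remDr <- driver[["client"]]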
Step 2: Load the Page & Wait for the Table
# Navigate to your target URL (replace with your actual link)
remDr$navigate("YOUR_TARGET_PAGE_URL")

# Let element lookups retry for up to 10 s (an implicit wait) instead of a fixed Sys.sleep()
remDr$setImplicitWaitTimeout(milliseconds = 10000)

# Replace the CSS selector with your table's actual ID/class (find it via browser dev tools, F12);
# findElement() now waits for the table and throws an error if it never appears
table_el <- remDr$findElement(using = "css selector", value = "table#sbin-historical-table")
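If a fixed Sys.sleep() is too fragile and the driver's built-in waits don't behave as expected with your driver version, a small polling helper is another option. The sketch below is a generic retry loop, not part of RSelenium. The name `wait_for` is hypothetical, and the demo uses a stand-in condition so it runs offline.

```r
# Hypothetical polling helper: call condition() until it returns TRUE
# or `timeout` seconds have passed; errors count as "not ready yet".
wait_for <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ok <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ok) return(invisible(TRUE))
    if (Sys.time() >= deadline) stop("Condition not met within ", timeout, " seconds")
    Sys.sleep(interval)
  }
}

# With RSelenium, the condition could wrap findElement(), e.g.:
#   wait_for(function() { remDr$findElement("css selector", "table#sbin-historical-table"); TRUE })

# Offline demo: a condition that only becomes TRUE on its third check
calls <- 0
became_ready <- function() { calls <<- calls + 1; calls >= 3 }
wait_for(became_ready, timeout = 5, interval = 0.01)
calls  # 3
```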
Step 3: Extract & Clean the Table
# Get the table's HTML
table_html <- remDr$findElement(using = "css selector", value = "table#sbin-historical-table")$getElementAttribute("outerHTML")[[1]]

# Parse the HTML into a data frame (html_table() returns a list of tables; take the first)
sbin_table <- read_html(table_html) %>%
  html_table() %>%
  .[[1]]

# Clean up the data (adjust based on your table's structure)
sbin_table <- sbin_table %>%
  # Remove any extra header rows (if your table has them)
  filter(row_number() > 1) %>%
  # Rename columns to match your table's headers; X1..X6 assume html_table() found no header row
  rename(Date = X1, Open = X2, High = X3, Low = X4, Close = X5, Volume = X6)

# View the result
head(sbin_table)
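Scraped tables often come back as text such as "1,234.50" or "15-03-2024", so the columns usually need converting before you can analyse them. The sketch below is a minimal cleaner under stated assumptions. The function name `clean_price_table`, the column names and the "%d-%m-%Y" date format are all hypothetical, so check them against what your page actually shows. The `raw` frame holds dummy values, not real SBIN quotes.

```r
# Hypothetical cleaner: strip thousands separators, convert to numeric,
# parse dates with an assumed format, and sort oldest first.
clean_price_table <- function(df, date_format = "%d-%m-%Y") {
  num_cols <- intersect(c("Open", "High", "Low", "Close", "Volume"), names(df))
  for (col in num_cols) {
    # Remove commas and whitespace before converting text to numbers
    df[[col]] <- as.numeric(gsub("[,[:space:]]", "", df[[col]]))
  }
  df$Date <- as.Date(df$Date, format = date_format)
  df[order(df$Date), , drop = FALSE]
}

# Dummy rows shaped like a scraped table (illustration only)
raw <- data.frame(
  Date   = c("15-03-2024", "14-03-2024"),
  Open   = c("1,234.50", "1,220.00"),
  Close  = c("1,240.10", "1,231.75"),
  Volume = c("12,345,678", "9,876,543"),
  stringsAsFactors = FALSE
)
clean_price_table(raw)
```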
Step 4: Clean Up
# Close the browser and stop the driver
remDr$close()
driver$server$stop()
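If a step fails partway through (for example, a selector that doesn't match), the cleanup lines never run and the browser and driver keep running. A common R pattern is to register the cleanup with on.exit() so it runs whether the work succeeds or fails. In this sketch, `with_cleanup`, `scrape` and `cleanup` are hypothetical placeholders. In real use, `cleanup` would call remDr$close() and driver$server$stop().

```r
# Hypothetical wrapper: cleanup() runs on normal return AND when scrape() throws an error
with_cleanup <- function(scrape, cleanup) {
  on.exit(cleanup(), add = TRUE)
  scrape()
}

# Offline demo with stand-in functions
closed <- FALSE
result <- tryCatch(
  with_cleanup(function() stop("selector not found"), function() closed <<- TRUE),
  error = function(e) conditionMessage(e)
)
closed  # TRUE: cleanup ran despite the error
```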
Key Tips for Success
- Find the Right Selector: Use your browser’s dev tools (right-click the table → Inspect) to get the exact CSS selector or XPath for the table.
- Avoid Anti-Scraping Blocks: Add delays between actions, use a realistic user agent, and don’t hammer the website with requests.
- Stick to quantmod When Possible: It's maintained, reliable, and avoids all the headaches of web scraping.
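To follow the "add delays" tip when looping over several pages, you can use a randomized pause, which spaces out requests less mechanically than a fixed sleep. This is a sketch only. The name `polite_pause` and the 2 to 5 second default range are arbitrary assumptions, not a documented limit for any site.

```r
# Hypothetical helper: sleep for a random duration between min_s and max_s seconds
# and return the duration actually used.
polite_pause <- function(min_s = 2, max_s = 5) {
  wait <- runif(1, min = min_s, max = max_s)
  Sys.sleep(wait)
  invisible(wait)
}

# Hypothetical loop over several pages (page_urls is a placeholder):
# for (u in page_urls) {
#   remDr$navigate(u)
#   # ... extract the table ...
#   polite_pause()
# }
```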
The question comes from Stack Exchange; it was asked by Nad Pat.




