Technical Q&A: Fetching Data from Multiple World Bank API Databases and Handling Pagination
Great questions about working directly with the World Bank API. Let's break this down step by step; no dedicated World Bank wrapper package is required, just `httr` and `jsonlite` for the requests and JSON parsing.
Short answer: You can’t do this in a single API request. The World Bank API is designed such that all indicators in one request must belong to the same data source (database).
Here’s why your earlier attempts behaved the way they did:
- When you used `source=2` (World Development Indicators, WDI) with multiple indicators separated by semicolons, it worked because all of those indicators live in the WDI database.
- When you mixed indicators from WDI and Doing Business (different sources), the API could not resolve indicators across separate datasets, so the request failed.
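If you are unsure which database a given indicator lives in, the API's `/v2/indicator/{id}` endpoint reports its source in the metadata. Here is a minimal sketch (the `indicator_source` helper is just an illustrative name, not part of the API):

```r
library(httr)
library(jsonlite)

# Look up which database (source) an indicator belongs to; with
# flatten = TRUE, the nested source object becomes source.id / source.value
indicator_source <- function(indicator_id) {
  url <- paste0("http://api.worldbank.org/v2/indicator/",
                indicator_id, "?format=json")
  parsed <- fromJSON(content(httr::GET(url), "text", encoding = "UTF-8"),
                     flatten = TRUE)
  parsed[[2]][, c("id", "source.id", "source.value")]
}

indicator_source("AG.AGR.TRAC.NO")  # should report source 2 (WDI)
```

Running this once per indicator before querying tells you how to group your requests by `source`.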
Solution: Separate Requests + Merge in R
You’ll need to make one request per data source, then combine the results in R using common keys like country ISO code and year. Here’s a concrete example:
```r
library(httr)
library(jsonlite)

# Fetch WDI indicators (source=2: World Development Indicators)
wdi_parsed <- fromJSON(
  content(
    httr::GET("http://api.worldbank.org/v2/country/all/indicator/AG.AGR.TRAC.NO;NE.CON.PRVT.ZS?format=json&source=2&per_page=1000"),
    "text", encoding = "UTF-8"
  ),
  flatten = TRUE
)
wdi_data <- wdi_parsed[[2]]  # API returns metadata ([[1]]) + actual data ([[2]])

# Fetch a Doing Business indicator (source=14: Doing Business)
db_parsed <- fromJSON(
  content(
    httr::GET("http://api.worldbank.org/v2/country/all/indicator/IC.BUS.EASE.XQ?format=json&source=14&per_page=1000"),
    "text", encoding = "UTF-8"
  ),
  flatten = TRUE
)
db_data <- db_parsed[[2]]

# The API returns long-format rows (one per country-year-indicator), so
# reshape each result to wide (one column per indicator) before merging
to_wide <- function(df) {
  reshape(df[, c("countryiso3code", "date", "indicator.id", "value")],
          idvar = c("countryiso3code", "date"),
          timevar = "indicator.id", direction = "wide")
}

# Merge the datasets using country code and year as shared keys
merged_data <- merge(
  to_wide(wdi_data), to_wide(db_data),
  by = c("countryiso3code", "date"),
  all = TRUE  # preserve all rows, even if some have missing data
)
```
The World Bank API has a hard maximum of 1,000 rows per page, so setting `per_page=9999999` won't work; the API will automatically cap it at 1,000. To get all the data, you need to:
- First retrieve the total number of rows from the API’s metadata.
- Calculate how many pages you need to cover all rows.
- Loop through each page to fetch data, then combine everything into one dataframe.
Solution: Automated Page Loop
Here’s a reusable function to handle this:
```r
# Function to fetch all pages of data for a given API endpoint
# (assumes httr and jsonlite are loaded, as in the example above)
fetch_all_pages <- function(base_url) {
  # Initial request to read the total row count from the metadata element
  initial_response <- fromJSON(
    content(httr::GET(base_url), "text", encoding = "UTF-8"),
    flatten = TRUE
  )
  total_rows  <- initial_response[[1]]$total
  per_page    <- 1000                            # max allowed by World Bank API
  total_pages <- ceiling(total_rows / per_page)  # round up to cover all rows

  # Store data from each page in a list
  all_data <- vector("list", total_pages)

  # Loop through each page
  for (page_num in seq_len(total_pages)) {
    page_url <- paste0(base_url, "&page=", page_num)
    page_response <- fromJSON(
      content(httr::GET(page_url), "text", encoding = "UTF-8"),
      flatten = TRUE
    )
    all_data[[page_num]] <- page_response[[2]]
    Sys.sleep(0.5)  # small delay to avoid hitting API rate limits
  }

  # Combine all pages into a single dataframe
  do.call(rbind, all_data)
}

# Example usage: fetch all WDI data for your two indicators
full_wdi_dataset <- fetch_all_pages("http://api.worldbank.org/v2/country/all/indicator/AG.AGR.TRAC.NO;NE.CON.PRVT.ZS?format=json&source=2&per_page=1000")
```
Key Notes:
- The API returns data as a list: the first element is metadata (total rows, page count), the second is the actual dataset.
- Using `ceiling(total_rows / per_page)` ensures you don't miss any rows if the total isn't an exact multiple of 1,000.
- The `Sys.sleep(0.5)` call is optional but recommended to avoid overwhelming the API with rapid consecutive requests.
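To see the metadata fields the paging loop relies on, you can print the first list element of any response. A minimal sketch reusing one of the URLs from above:

```r
library(httr)
library(jsonlite)

# Print the metadata element of a response; it carries the paging fields
# (page, pages, per_page, total) used to size the loop
url <- "http://api.worldbank.org/v2/country/all/indicator/AG.AGR.TRAC.NO?format=json&source=2&per_page=1000"
meta <- fromJSON(content(httr::GET(url), "text", encoding = "UTF-8"),
                 flatten = TRUE)[[1]]
str(meta)
```

The `pages` field gives you the page count directly, so you could also use it instead of computing `ceiling(total / per_page)` yourself.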
This question comes from Stack Exchange; it was asked by Jeparov.