Technical Q&A: Fetching Data from Multiple World Bank API Databases and Handling Pagination
Great questions about working directly with the World Bank API. Let's break this down step by step; no dedicated World Bank wrapper package is required, just `httr` and `jsonlite` for the requests and JSON parsing.
Short answer: You can’t do this in a single API request. The World Bank API is designed such that all indicators in one request must belong to the same data source (database).
Here’s why your earlier attempts behaved the way they did:
- When you used `source=2` (World Development Indicators, WDI) with multiple indicators separated by semicolons, it worked because all of those indicators live in the WDI database.
- When you mixed indicators from WDI and Doing Business (different sources), the API could not resolve indicators across separate datasets, so the request failed.
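If you are unsure which database a given indicator lives in, the API's `/v2/indicator/{id}` endpoint reports its source in the metadata. Here is a minimal sketch (the `indicator_source` helper is just an illustrative name, not part of the API):

```r
library(httr)
library(jsonlite)

# Look up which database (source) an indicator belongs to; with
# flatten = TRUE, the nested source object becomes source.id / source.value
indicator_source <- function(indicator_id) {
  url <- paste0("http://api.worldbank.org/v2/indicator/",
                indicator_id, "?format=json")
  parsed <- fromJSON(content(httr::GET(url), "text", encoding = "UTF-8"),
                     flatten = TRUE)
  parsed[[2]][, c("id", "source.id", "source.value")]
}

indicator_source("AG.AGR.TRAC.NO")  # should report source 2 (WDI)
```

Running this once per indicator before querying tells you how to group your requests by `source`.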
Solution: Separate Requests + Merge in R
You’ll need to make one request per data source, then combine the results in R using common keys like country ISO code and year. Here’s a concrete example:
```r
library(httr)
library(jsonlite)

# Fetch WDI indicators (source=2: World Development Indicators)
wdi_parsed <- fromJSON(
  content(
    httr::GET("http://api.worldbank.org/v2/country/all/indicator/AG.AGR.TRAC.NO;NE.CON.PRVT.ZS?format=json&source=2&per_page=1000"),
    "text", encoding = "UTF-8"
  ),
  flatten = TRUE
)
wdi_data <- wdi_parsed[[2]]  # API returns metadata ([[1]]) + actual data ([[2]])

# Fetch a Doing Business indicator (source=14: Doing Business)
db_parsed <- fromJSON(
  content(
    httr::GET("http://api.worldbank.org/v2/country/all/indicator/IC.BUS.EASE.XQ?format=json&source=14&per_page=1000"),
    "text", encoding = "UTF-8"
  ),
  flatten = TRUE
)
db_data <- db_parsed[[2]]

# The API returns long-format rows (one per country-year-indicator), so
# reshape each result to wide (one column per indicator) before merging
to_wide <- function(df) {
  reshape(df[, c("countryiso3code", "date", "indicator.id", "value")],
          idvar = c("countryiso3code", "date"),
          timevar = "indicator.id", direction = "wide")
}

# Merge the datasets using country code and year as shared keys
merged_data <- merge(
  to_wide(wdi_data), to_wide(db_data),
  by = c("countryiso3code", "date"),
  all = TRUE  # preserve all rows, even if some have missing data
)
```
The World Bank API has a hard maximum of 1,000 rows per page, so setting `per_page=9999999` won't work; the API will automatically cap it at 1,000. To get all the data, you need to:
- First retrieve the total number of rows from the API’s metadata.
- Calculate how many pages you need to cover all rows.
- Loop through each page to fetch data, then combine everything into one dataframe.
Solution: Automated Page Loop
Here’s a reusable function to handle this:
```r
# Function to fetch all pages of data for a given API endpoint
# (assumes httr and jsonlite are loaded, as in the example above)
fetch_all_pages <- function(base_url) {
  # Initial request to read the total row count from the metadata element
  initial_response <- fromJSON(
    content(httr::GET(base_url), "text", encoding = "UTF-8"),
    flatten = TRUE
  )
  total_rows  <- initial_response[[1]]$total
  per_page    <- 1000                            # max allowed by World Bank API
  total_pages <- ceiling(total_rows / per_page)  # round up to cover all rows

  # Store data from each page in a list
  all_data <- vector("list", total_pages)

  # Loop through each page
  for (page_num in seq_len(total_pages)) {
    page_url <- paste0(base_url, "&page=", page_num)
    page_response <- fromJSON(
      content(httr::GET(page_url), "text", encoding = "UTF-8"),
      flatten = TRUE
    )
    all_data[[page_num]] <- page_response[[2]]
    Sys.sleep(0.5)  # small delay to avoid hitting API rate limits
  }

  # Combine all pages into a single dataframe
  do.call(rbind, all_data)
}

# Example usage: fetch all WDI data for your two indicators
full_wdi_dataset <- fetch_all_pages("http://api.worldbank.org/v2/country/all/indicator/AG.AGR.TRAC.NO;NE.CON.PRVT.ZS?format=json&source=2&per_page=1000")
```
Key Notes:
- The API returns data as a list: the first element is metadata (total rows, page count), the second is the actual dataset.
- Using `ceiling(total_rows / per_page)` ensures you don't miss any rows if the total isn't an exact multiple of 1,000.
- The `Sys.sleep(0.5)` call is optional but recommended to avoid overwhelming the API with rapid consecutive requests.
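To see the metadata fields the paging loop relies on, you can print the first list element of any response. A minimal sketch reusing one of the URLs from above:

```r
library(httr)
library(jsonlite)

# Print the metadata element of a response; it carries the paging fields
# (page, pages, per_page, total) used to size the loop
url <- "http://api.worldbank.org/v2/country/all/indicator/AG.AGR.TRAC.NO?format=json&source=2&per_page=1000"
meta <- fromJSON(content(httr::GET(url), "text", encoding = "UTF-8"),
                 flatten = TRUE)[[1]]
str(meta)
```

The `pages` field gives you the page count directly, so you could also use it instead of computing `ceiling(total / per_page)` yourself.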
This question comes from Stack Exchange; it was asked by Jeparov.