Accessing and plotting World Bank data with R
Over the past couple of days I have been playing with the World Bank’s data sets, and I have to admit that I am blown away by them. It is amazing to see what is available on their web site, and their Data Visualisation Tools page is well worth a visit. Better still, they provide an API to their data, which they have used to build a pretty cool iPhone app. You can have the world’s data in your pocket.
In this post I will show how to access World Bank data from R. As an example we create a motion chart in the Hans Rosling style, as found on the Google Public Data Explorer site, which also uses data from the World Bank. Reproducing that chart should give us confidence that we understand the World Bank’s interface. You can find this example as demo WorldBank in the googleVis package from version 0.2.10 onwards.
So let’s try to replicate the initial plot of the Google Public Data Explorer, which shows fertility rate against life expectancy for each country from 1960 to today, whereby the countries are represented as bubbles, with the size reflecting the population and the colour the region.
Duncan Temple Lang provides examples for accessing the World Bank’s data using his RJSONIO and RCurl packages. The World Bank data is available via their API as either XML or JSON. We will use JSON, because the fromJSON function of the RJSONIO package makes it straightforward to read a JSON data set into R and transform it into a data frame. To query the database we have to know which indicator variable we want and what its key is. Thankfully, the World Bank provides a page that lists all indicator variables. Clicking on any of them reveals the indicator key in the URL. For our example we get the following mappings:
Indicator | Key |
fertility rate | SP.DYN.TFRT.IN |
life expectancy | SP.DYN.LE00.IN |
population | SP.POP.TOTL |
GDP per capita (current US$) | NY.GDP.PCAP.CD |
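As a small sketch of how these keys end up in a query, the snippet below stores the mappings in a named vector and assembles one URL per indicator. The URL pattern is the one used in this post's demo code; the helper name buildWorldBankURL is made up for illustration, and the endpoint may have changed since this post was written. Nothing is downloaded here.

```r
## Indicator keys from the table above, as a named character vector
indicators <- c(fertility.rate  = "SP.DYN.TFRT.IN",
                life.expectancy = "SP.DYN.LE00.IN",
                population      = "SP.POP.TOTL",
                GDP.per.capita  = "NY.GDP.PCAP.CD")

## Hypothetical helper (not part of any package): build the query URL
## for one indicator key, following the pattern used in this post
buildWorldBankURL <- function(id, date = "1960:2010", per.page = 12000) {
  paste("http://api.worldbank.org/countries/all/indicators/", id,
        "?date=", date, "&format=json&per_page=", per.page, sep = "")
}

## One URL per indicator; the names of the vector are carried over
urls <- sapply(indicators, buildWorldBankURL)
print(urls[["population"]])
```

Keeping the keys in a named vector means the same names can later be reused as column names for the downloaded values.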
That’s about it. From Duncan we have learned how to create the URL string to query the database and how to transform the query result from JSON into a data frame. The rest is re-arranging the data and combining the various data sets to get the final table. We display it as a motion chart using the gvisMotionChart function of the googleVis package. You find the detailed R code below.
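Before the full listing, the JSON-to-data-frame step can be tried offline. The sketch below assumes the response has the shape the demo code relies on: a two-element array whose second element holds one record per country and year. The JSON string, the country "Alphaland" and all values are invented for illustration.

```r
library(RJSONIO)

## Made-up response: element 1 is paging metadata,
## element 2 is the list of observations
json <- '[{"page": 1, "pages": 1, "total": 2},
  [{"country": {"id": "XA", "value": "Alphaland"}, "value": "2.5", "date": "2010"},
   {"country": {"id": "XA", "value": "Alphaland"}, "value": null, "date": "2009"}]]'

## asText = TRUE tells fromJSON that we pass JSON content, not a file or URL
records <- fromJSON(json, asText = TRUE)[[2]]

## JSON null arrives as NULL in R, so map it to NA before converting
toy <- data.frame(
  year = as.numeric(sapply(records, function(x) x[["date"]])),
  value = as.numeric(sapply(records, function(x) {
    v <- x[["value"]]
    if (is.null(v)) NA else v
  })),
  country.name = sapply(records, function(x) x[["country"]][["value"]]),
  country.id = sapply(records, function(x) x[["country"]][["id"]]),
  stringsAsFactors = FALSE
)
print(toy)
```

The explicit NULL check matters because sapply would otherwise return a list with an empty element instead of a vector of equal length.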
## This demo shows how country level data can be accessed from
## the World Bank via their API and displayed with a Motion Chart.
## Inspired by Google's Public Data Explorer, see
## http://www.google.com/publicdata/home
##
## For the World Bank Data terms of use see:
## http://data.worldbank.org/summary-terms-of-use
##
## To run this demo an internet connection and Flash are required.
## This demo is part of the googleVis R package.
##
## Markus Gesmann, 24 September 2011
## Distributed under GPL 2 or later

library(googleVis)  ## provides gvisMotionChart

## Download one indicator for all countries and return a data frame
## with the columns year, <value>, country.name and country.id
getWorldBankData <- function(id='SP.POP.TOTL', date='1960:2010',
                             value="value", per.page=12000){
  require(RJSONIO)
  url <- paste("http://api.worldbank.org/countries/all/indicators/", id,
               "?date=", date, "&format=json&per_page=", per.page,
               sep="")

  ## The first list element holds paging metadata, the second the data
  wbData <- fromJSON(url)[[2]]

  wbData <- data.frame(
    year = as.numeric(sapply(wbData, "[[", "date")),
    ## Missing observations arrive as NULL and are mapped to NA
    value = as.numeric(sapply(wbData, function(x)
      ifelse(is.null(x[["value"]]), NA, x[["value"]]))),
    country.name = sapply(wbData, function(x) x[["country"]]['value']),
    country.id = sapply(wbData, function(x) x[["country"]]['id'])
  )

  names(wbData)[2] <- value

  return(wbData)
}

## Download the country meta data (region, income level, coordinates)
getWorldBankCountries <- function(){
  require(RJSONIO)
  wbCountries <-
    fromJSON("http://api.worldbank.org/countries?per_page=12000&format=json")
  wbCountries <- data.frame(t(sapply(wbCountries[[2]], unlist)))
  ## Convert via character, so that factor columns yield the
  ## coordinates rather than the internal level codes
  wbCountries$longitude <- as.numeric(as.character(wbCountries$longitude))
  wbCountries$latitude <- as.numeric(as.character(wbCountries$latitude))
  ## Works whether region.value is stored as factor or character
  wbCountries$region.value <- gsub(" \\(all income levels\\)",
                                   "", wbCountries$region.value)
  return(wbCountries)
}

## Create a string 1960:this year, e.g. 1960:2011
years <- paste("1960:", format(Sys.Date(), "%Y"), sep="")

## Fertility rate
fertility.rate <- getWorldBankData(id='SP.DYN.TFRT.IN', date=years,
                                   value="fertility.rate")

## Life Expectancy
life.exp <- getWorldBankData(id='SP.DYN.LE00.IN', date=years,
                             value="life.expectancy")

## Population
population <- getWorldBankData(id='SP.POP.TOTL', date=years,
                               value="population")

## GDP per capita (current US$)
GDP.per.capita <- getWorldBankData(id='NY.GDP.PCAP.CD', date=years,
                                   value="GDP.per.capita.Current.USD")

## Merge data sets (joined on year, country.name and country.id)
wbData <- merge(life.exp, fertility.rate)
wbData <- merge(wbData, population)
wbData <- merge(wbData, GDP.per.capita)

## Get country mappings
wbCountries <- getWorldBankCountries()

## Add regional information
wbData <- merge(wbData, wbCountries[c("iso2Code", "region.value",
                                      "incomeLevel.value")],
                by.x="country.id", by.y="iso2Code")

## Filter out the aggregates and country id column
subData <- subset(wbData, !region.value %in% "Aggregates",
                  select= -country.id)

## Create a motion chart
M <- gvisMotionChart(subData, idvar="country.name", timevar="year",
                     options=list(width=700, height=600))

## Display the chart in your browser
plot(M)
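The merging and filtering steps can be checked on toy data without an internet connection. In the sketch below the countries, codes and numbers are all invented; only the column names and the merge and subset calls follow the code above.

```r
## Invented observations in the layout returned by getWorldBankData
life.exp <- data.frame(year = c(2009, 2010, 2010),
                       country.name = c("Alphaland", "Alphaland", "World"),
                       country.id = c("XA", "XA", "1W"),
                       life.expectancy = c(70, 71, 65))
population <- data.frame(year = c(2010, 2010),
                         country.name = c("Alphaland", "World"),
                         country.id = c("XA", "1W"),
                         population = c(100, 1000))

## Invented country mapping in the layout of getWorldBankCountries
regions <- data.frame(iso2Code = c("XA", "1W"),
                      region.value = c("Region X", "Aggregates"))

## merge joins on all shared columns (year, country.name, country.id);
## rows without a match in both data sets are dropped
toy <- merge(life.exp, population)

## Attach the region via the country code, then drop the aggregates
toy <- merge(toy, regions, by.x = "country.id", by.y = "iso2Code")
subToy <- subset(toy, !region.value %in% "Aggregates", select = -country.id)
print(subToy)
```

Note that the default inner join silently drops the 2009 row, because population has no value for that year. With real data this means a country and year only survive if every indicator has a row for them; passing all=TRUE to merge would keep such rows with NA instead.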
Addition: We could simplify the code by using the WDI package by Vincent Arel-Bundock, as Diego points out in his comment below.
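A rough sketch of what that might look like is shown below. It assumes the WDI package is installed, that its WDI function accepts a named indicator vector to rename the value columns (which I believe recent versions do), and that extra = TRUE adds region and income information. The wrapper fetchWithWDI is a made-up name, and calling it needs an internet connection.

```r
## Hypothetical wrapper around WDI::WDI; defining it downloads nothing
fetchWithWDI <- function(start = 1960, end = 2010) {
  if (!requireNamespace("WDI", quietly = TRUE)) {
    stop("Please install the WDI package first")
  }
  WDI::WDI(country = "all",
           indicator = c(fertility.rate  = "SP.DYN.TFRT.IN",
                         life.expectancy = "SP.DYN.LE00.IN",
                         population      = "SP.POP.TOTL",
                         GDP.per.capita  = "NY.GDP.PCAP.CD"),
           start = start, end = end,
           extra = TRUE)  ## assumed to add region, income etc.
}

## Usage (requires internet access):
## wdiData <- fetchWithWDI()
## head(wdiData)
```

If this works as assumed, the result already comes back as one data frame, so the separate download, merge and country-mapping steps above would no longer be needed.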
Citation
For attribution, please cite this work as: Markus Gesmann (Sep 25, 2011) Accessing and plotting World Bank data with R. Retrieved from https://magesblog.com/post/2011-09-25-accessing-and-plotting-world-bank-data/
@misc{ 2011-accessing-and-plotting-world-bank-data-with-r,
author = { Markus Gesmann },
title = { Accessing and plotting World Bank data with R },
url = { https://magesblog.com/post/2011-09-25-accessing-and-plotting-world-bank-data/ },
year = { 2011 },
updated = { Sep 25, 2011 }
}