Crowdsourcing ground-truth demographics for machine learning

The collection of geo-demographic data for understanding cities has typically been undertaken through censuses and other large-scale surveys. This information provides the basis for a wide range of activities, including city planning, market intelligence and policy, and is routinely collected by both governments and market intelligence companies within developed countries at significant cost. In less developed countries, this cost means that information is typically collected less frequently and at a lower fidelity, if at all. To address this situation, the N/LAB is working on alternative methods for deriving geo-demographic data, with a focus on Tanzania in Africa. Working with partner companies, the project aims to leverage existing data streams such as Call Detail Records, Mobile Money and drone/high-resolution satellite imagery to automatically generate equivalent geo-demographic data for direct use by local companies (market intelligence) and in policy (measurement of economic activity and citizen vulnerability).

As part of this goal, an extensive exercise to develop ground-truths for fine-grained geographical regions in Dar es Salaam, Tanzania, has been undertaken, jointly funded by the RUCK and the Gate Foundation. The ground-truths, acquired through repeated surveys and statistically significant agreement between local experts, cover social mobility, average age group, cultural background, average age of buildings, land use, zoning and population density, as well as the social demographics of the regions.

This ongoing project has underpinned, and will continue to underpin, a number of other projects undertaken by the N/LAB, most recently contributing to N/LAB’s project on Deep Learning and Aerial Imagery for Land Use Classification.


  • Second round of data collection occurring late March.
    Contact us for details and/or to discuss access to this data.