Reaching the unreached with life-saving vaccines through data science and geospatial technologies

*Niccolo Cirone, Technology for Development Specialist, UNICEF Western and Central Africa Regional Office, Kyriaki Kalimeri, Senior Researcher, Frontier Data Network, Rocco Panciera, Geospatial Health Specialist, UNICEF HQ*

Each year, approximately 6 million of the 20 million children born in West and Central Africa miss out on life-saving vaccines during their first year of life; these children are sometimes called 'zero-dose children'. UNICEF’s ‘Reach the Unreached’ initiative aims to improve child vaccination coverage data by pairing efforts to strengthen administrative systems with innovative probability models and alternative data sources that provide geolocated vaccination estimates. However, incorporating multiple probability models and data sources brings new challenges, such as potential biases and inconsistencies. This article highlights a recent pilot designed to equip programme teams with actionable insights to navigate these discrepancies in vaccination coverage estimates.

**Data inadequacy and inaccuracies limit immunization programming**

In the absence of timely and accurate vaccine administration records and civil registration systems, immunization planning, delivery and monitoring are often informed by a combination of different national coverage data sources. This includes the annual WHO and UNICEF Estimates of National Immunization Coverage (WUENIC), which are derived from administrative data sources, and household surveys, where available. Denominator data for eligible children is often sourced from population projections based on the most recent census, helping to estimate the eligible population for immunization, in the absence of real-time demographic information.

While widely used to estimate vaccination coverage, national estimates do not effectively represent diverse sub-national realities and can magnify or mask underlying data challenges. National estimates can also be especially misleading in countries or locations where population dynamics have changed in recent years, due to urbanization or other demographic or socio-economic trends. Additionally, administrative data on vaccination coverage in many countries show discrepancies compared to survey data—sometimes reporting aggregated vaccination coverages of over 100 per cent.
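
To see how an administrative figure can exceed 100 per cent, it helps to write the calculation out. The sketch below assumes the common definition of administrative coverage as doses recorded divided by the estimated eligible population; the function name and all district figures are invented for illustration and are not taken from any country's data.

```python
def admin_coverage(doses_administered: int, target_population: float) -> float:
    """Administrative coverage in per cent: doses recorded / estimated eligible children."""
    if target_population <= 0:
        raise ValueError("target_population must be positive")
    return 100.0 * doses_administered / target_population


# Hypothetical district (all numbers are illustrative assumptions).
doses = 9_500              # first doses recorded during the year
projected_target = 9_000   # denominator projected from an outdated census
actual_target = 11_000     # children actually living in the district

reported = admin_coverage(doses, projected_target)
actual = admin_coverage(doses, actual_target)
print(f"reported: {reported:.1f}%  actual: {actual:.1f}%")
```

In this invented case the numerator is the same in both calculations; only the denominator changes, which is why outdated population projections alone can push reported coverage above 100 per cent.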

UNICEF works closely with countries to improve administrative data through efforts to enhance the quality and timeliness of civil registration (e.g. birth and death registration) records, update population projections, and strengthen data systems. In fact, civil registration records offer significant advantages in estimating vaccine coverage and are the gold standard for denominator data, providing continuous, up-to-date, and disaggregated figures down to the lowest administrative levels.

While strengthening civil registration and vital statistics (CRVS) systems remains an integral component of UNICEF’s efforts to improve vaccine coverage data, in contexts where administrative data at a local level presents challenges, estimates continue to play a crucial role in vaccination planning.

Inaccuracies and biases in estimates have consequences for immunization programme planning and implementation. They may mislead health workers on the ground about the actual size of their caseload, or the areas that require priority vaccination efforts, ultimately preventing UNICEF, governments, and health workers from accurately identifying areas with vulnerable “zero-dose” children who lack essential vaccinations and, often, other fundamental child rights.

**Modern data tech introduces new possibilities – and new challenges**

The Reach the Unreached (RtU) initiative was launched by UNICEF’s Western and Central Africa Office as a collaboration with five pilot countries (Cameroon, Côte d’Ivoire, Chad, Guinea and Mali). Its goal is to explore the potential of new data sources and methods to highlight geographic inequities in immunization coverage. The initiative uses machine learning algorithms to disaggregate population data and Bayesian models to estimate immunization coverage, helping uncover the prevalence of zero-dose children, that is, those who have not received any vaccines.
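
A minimal sketch of how the two layers can be combined, assuming that each grid cell carries an estimate of children under one and an estimated coverage fraction, and that the zero-dose count per cell is the product of population and the unvaccinated share. This is a simplification for illustration, not the RtU implementation; the function name and the tiny 2×2 grids are hypothetical.

```python
def zero_dose_grid(under_one_pop, coverage):
    """Cell-by-cell zero-dose estimate: children under one x (1 - coverage)."""
    if len(under_one_pop) != len(coverage):
        raise ValueError("grids must have the same number of rows")
    result = []
    for pop_row, cov_row in zip(under_one_pop, coverage):
        if len(pop_row) != len(cov_row):
            raise ValueError("grid rows must have the same length")
        row = []
        for pop, cov in zip(pop_row, cov_row):
            if not 0.0 <= cov <= 1.0:
                raise ValueError("coverage must be a fraction between 0 and 1")
            row.append(pop * (1.0 - cov))
        result.append(row)
    return result


# Hypothetical 2x2 grids (illustrative values only).
pop = [[120.0, 80.0], [40.0, 0.0]]    # children under one per cell
cov = [[0.90, 0.50], [0.25, 0.70]]    # modelled coverage per cell

grid = zero_dose_grid(pop, cov)
total = sum(sum(row) for row in grid)
print(f"estimated zero-dose children: {total:.0f}")
```

Because the result is a product of two modelled layers, an error in either the population grid or the coverage grid carries directly into the zero-dose estimate.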

Using this approach, Frontier Data Network colleagues in the regional office and country offices put over 1.1 million unreached children on the map, aiming to make an additional, granular source of information available to participating countries to identify local geographies at risk of being left behind, uncovering and investigating child rights inequities, starting with immunization and birth registration.

While the employed methodologies are rigorous and well-documented among geostatistics academics and practitioners, RtU represents a spearheading innovation for these countries as the methods are applied with the most country-specific, up-to-date inputs. The following chart summarizes the models leveraged by RtU to estimate the number and distribution of zero-dose children, together with several examples of world-class institutions proposing different designs, with different results.

As UNICEF Geospatial Health Specialist Rocco Panciera notes, the addition of new data sources and models introduces new challenges for interpretation. “While the proliferation of granular population estimates and vaccination coverage datasets is beneficial and possibly game-changing, these new sources of information will only make an impact for improving health programming and health outcomes if they're integrated into existing information systems and decision-making processes at the country level.”

**Comparing estimation models to improve decision-making**

Recognizing the potential issues and biases posed by these innovative methodologies, RtU team members contacted Frontier Data Network (FDN) researchers and data scientists for support to compare different estimation models. The FDN team worked with the RtU team to co-define the problem and data science requirements and secured additional expertise by mobilizing a partnership with MIT.

With the support of MIT data science students, the FDN and RtU teams systematically compared a series of population and vaccination estimates generated by three major organizations that routinely produce such estimates: the Institute for Health Metrics and Evaluation (IHME) (coverage estimates), WorldPop (population and vaccination coverage estimates), and the Demographic and Health Survey (DHS) Programme (coverage estimates).

The RtU approach uses advanced statistical and machine learning methods, which means there is some level of uncertainty in the population data, vaccination coverage, and the final estimates of children who have not received vaccinations. To better understand this uncertainty, FDN and MIT looked at how different datasets affect the estimates of zero-dose children, while considering possible errors and variations. Their analysis showed that there is significant variability across the models used for both population and vaccination coverage, even for similar countries and years. These comparisons were based on data from national surveys and the annual WHO and UNICEF estimates of vaccination coverage.

Given the observed high variability between the population and coverage estimates produced by different organizations for comparable countries and years, and in the absence of a clear ground truth to assess the absolute accuracy of the estimates, the analysis focused on the relative comparison of the estimates between different datasets. This showed that the estimates produced as part of the RtU, which use country-specific covariates rather than regional or global ones, were often the best aligned with the national-level WUENIC estimates, strengthening confidence in the validity of the methodology adopted during the initiative.
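
One way to picture such a relative comparison is sketched below, assuming each source provides subnational (population, coverage) pairs that are aggregated to a population-weighted national figure and then ranked by their absolute gap to a national reference value. The source names, numbers and the reference value are invented placeholders, not results from the analysis.

```python
def national_coverage(units):
    """Population-weighted mean coverage over (population, coverage) pairs."""
    total_pop = sum(pop for pop, _ in units)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(pop * cov for pop, cov in units) / total_pop


def rank_by_gap(sources, reference):
    """Sort sources by absolute distance between their national figure and the reference."""
    gaps = {
        name: abs(national_coverage(units) - reference)
        for name, units in sources.items()
    }
    return sorted(gaps.items(), key=lambda item: item[1])


# Hypothetical subnational estimates from three unnamed sources.
sources = {
    "source_a": [(1000, 0.80), (3000, 0.60)],
    "source_b": [(1000, 0.95), (3000, 0.85)],
    "source_c": [(1000, 0.85), (3000, 0.75)],
}
reference = 0.78  # placeholder for a national reference estimate

ranking = rank_by_gap(sources, reference)
for name, gap in ranking:
    print(f"{name}: gap to reference = {gap:.3f}")
```

A small gap at national level does not prove that the subnational pattern is right; it only shows that the source is consistent with the reference once aggregated.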

Ultimately, this analysis helped build trust in these scientifically rigorous models that make use of the latest available official data but disaggregate them in a more detailed and advanced way, providing valuable benefits in terms of granularity over the application of generic census projections and national estimates.

This image illustrates the wide variability of vaccination coverage estimates obtained when different probabilistic approaches are considered. Taking Côte d’Ivoire as an example, and the WHO/UNICEF estimates as the common standard for vaccination coverage, administrative data sources over-estimate vaccination coverage relative to other sources, while the estimates based on Demographic and Health Survey (DHS) data under-estimate it. By comparison, the Reach the Unreached approach, which combines country-specific covariates, the best possible input data (most granular, most up to date), and solid data science methodologies, yields a result similar to WUENIC when aggregated to the national level.

“We lack insights into how data bias and algorithmic inequalities affect combined population estimation and vaccination coverage models,” says Manuel Garcia-Herranz, FDN’s Principal Researcher. “Even for single models, understanding performance across different socioeconomic contexts is challenging. For chained estimations, such analysis is nonexistent. How can algorithmic inequalities shape RtU results? Do they cluster in specific areas? While our technical partners focused on building models, we needed support to assess the interplay of these layers. Working with MIT provided additional and dedicated expertise to help us evaluate outputs and identify risks of algorithmic distortions.”

**A warning for decision-makers and data practitioners**

This initiative demonstrates the potential of leveraging data from machine learning algorithms and probabilistic models to examine geographic inequities in the reach of health services, including immunization coverage, supporting the goal of “leaving no one behind”.

As part of this effort with the FDN, the RtU and MIT teams conducted additional analyses showing strong links between under-immunization and other child rights deprivations, particularly the lack of birth certificates, which are essential for legal identity. These findings highlight both a warning and an opportunity for UNICEF, governments, and stakeholders to use non-traditional data sources for a coordinated approach to immunization and birth registration, targeting the same “unreached” children.

By employing a combination of methodologies, this project also highlights the importance of thoroughly assessing external data sources before integrating them into planning or programming, always keeping in view the final aim of the decision-making process to evaluate whether a dataset or a model is fit for purpose, and with what caveats.

**Looking Ahead**

As UNICEF continues working with governments to strengthen civil registration and other administrative data systems to improve the precision of vaccination coverage estimations, the RtU team is making complementary frontier datasets usable for programmatic decision-making at a granular level, helping to overcome existing pitfalls.

For example, high-resolution raster data of population, zero-dose children, and vaccination coverage can be aggregated by custom geometries like catchment areas of health facilities, or overlaid with information on the capacity of local health systems (e.g., geospatial data for human resources, cold chain, and logistics) and other child rights-based institutions, to identify potential opportunities for more efficient allocation of resources. Additional analysis, enabled by these datasets, could include assessing the resilience of road networks to external disruptions such as extreme climate events or political conflicts, and their impact on the accessibility of healthcare for the most vulnerable populations.
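
The aggregation step can be sketched as a simple zonal sum, assuming the catchment areas have already been rasterized onto the same grid as the zero-dose layer, so that each cell carries a catchment label. In practice a GIS library would handle the polygons and projections; the function, grid values and facility names below are hypothetical.

```python
from collections import defaultdict


def zonal_sum(values, zones):
    """Sum raster cell values per zone label, skipping unassigned or missing cells."""
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value is None:
                continue  # cell outside every catchment, or no data
            totals[zone] += value
    return dict(totals)


# Hypothetical 2x3 grids (illustrative values only).
zero_dose = [
    [12.0, 40.0, 5.0],
    [30.0, 0.0, None],   # None marks a no-data cell
]
catchment = [
    ["clinic_a", "clinic_a", "clinic_b"],
    ["clinic_a", None, "clinic_b"],   # None marks a cell in no catchment
]

totals = zonal_sum(zero_dose, catchment)
for facility, count in sorted(totals.items()):
    print(f"{facility}: {count:.0f} estimated zero-dose children")
```

The same pattern extends to other layers: replacing the zero-dose grid with a population grid gives the caseload per facility, which can then be set against that facility's staffing or cold chain capacity.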

A key contribution of this approach is its generalizability and sustainability across countries, providing comparable insights and country-level coverage maps with fine-grained geographical resolution. This ambitious initiative also trained over 120 government personnel in the participating countries to build local GIS and data science capacity, ensuring localized ownership and uptake of this innovative approach. As the pilot phase comes to an end, new countries, including Burkina Faso and the Democratic Republic of the Congo, are interested in joining the initiative, and discussions are ongoing to replicate the approach to support equity in other child rights, looking for instance at out-of-school children.

*Acknowledgements*

The Frontier Data Network (FDN), supported by the Frontier Data Technology Unit at UNICEF’s Chief Statistician Office, is dedicated to accelerating adoption of innovative data technologies and practices to enhance delivery of impactful, reproducible, data-driven solutions for the world’s most vulnerable children. We achieve this by cultivating community and partnerships to scale talent, data infrastructure, and new analytics initiatives.

The project team includes Niccolo Cirone, Initiative Coordinator and Data Specialist, Minu Limbu, Initiative Supervisor and Regional Business Analyst (Innovation and T4D), Janice Kaday Williams, Technology for Development Specialist, Abdoulaye Sandiakou Doucoure, Chief of ICT4D, Ulrike Gilbert, Regional Health Adviser, Karin Heissler, Regional Child Protection Adviser, Alex Garraud, Immunization Specialist, Eyram Adzra, Digital Health Specialist, Gloria Waithira Mathenge, Civil Registration Vital Statistics and Legal Identity Specialist, Stephanie Kauv, M&E Specialist at UNICEF’s West and Central Africa Regional Office; Yves Jaques, Chief Frontier Data Tech Unit, Manuel Garcia-Herranz, Principal Researcher, Frontier Data Network; Kyriaki Kalimeri, Senior Researcher, Frontier Data Network; Rocco Panciera, Geospatial Health Specialist, Digital Health and Information Systems Unit, Programme Division; Bhaskar Mishra, Child Protection Specialist, Child Protection Section, Programme Division, Martin Bogaert and Tommaso Salvatore, MIT Sloan Operations Research Center.