Innovative Big Data Approaches for Capturing and Analyzing Data to Monitor and Achieve the SDGs


Executive summary

This report showcases around 140 big data approaches that could assist traditional statistical methods in capturing and analysing data, both to support the calculation of SDG indicators and to help achieve SDG targets. The presented approaches also aim to replace costly, infrequent surveys of traditional statistics with cheaper, real-time information. The report is structured as follows. First, the SDGs are introduced, with a focus on current challenges regarding missing data and methodologies. Then an overview of big data, IoT and AI is given, with a focus on categorization, opportunities and challenges. The main section is dedicated to describing and classifying the aforementioned approaches and linking them to suitable SDG indicators and targets. Benefits, risks and potential recommendations for pilot projects are discussed for each big data category. This is followed by a summary of the key findings, an analysis and the conclusion.

The 2030 Agenda for Sustainable Development, covering the period from 2016 until 2030, comprises 17 Sustainable Development Goals (SDGs), subdivided into 169 targets and 232 indicators. In comparison, there were only 8 Millennium Development Goals (MDGs) with 21 targets and 60 indicators for the previous period from 2000 until 2015. Not only do the SDGs cover a much broader range of issues, but the SDG indicators are also very different from, and more complex than, the MDG indicators, which makes them in many instances challenging for traditional statistics. Innovative approaches are therefore required. Technology has advanced in recent years to a stage where it now appears promising to harness big data both for the achievement of SDG targets and for the calculation of SDG indicators. Many of these new data are emitted passively and collected as by-products of people's interactions with and uses of digital devices. Data coming from various sources provide unique insights into human behaviour and beliefs, which could be harnessed to improve people's quality of life, thereby ultimately contributing to the achievement of the SDGs.

In contrast to MDG data, which were mostly collected and owned by Governments, many critical SDG data are produced passively by people, collected by machines and owned by corporations. Under the umbrella of corporate social responsibility, data philanthropy is a win-win opportunity: corporations can improve their reputation cost-efficiently, while the UN and other organizations receive data they can use. The concept of open data calls on Governments to provide free-of-charge, up-to-date, openly licensed and machine-readable online data so that NGOs and other stakeholders can analyse them.

Consequently, larger quantities of data have become available at low cost and in real time, enabling the collection of larger, statistically more robust samples. Combined with the concepts of real-time data and open data, big data can enable more efficient citizen participation and feedback. Together, these make it possible to monitor SDG indicators more frequently where feasible, which facilitates the timely identification of significant developments and thus informs appropriate policy and programmatic decisions. Big data also offers a key opportunity for predictive analytics: by identifying trends, it allows probabilistic scenarios for the future to be proposed. Connectivity to transmit the data and capacity to perform real-time analysis remain challenges.
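As a minimal sketch of the trend-identification step, assuming a regularly sampled indicator series (the values below are hypothetical), an ordinary least-squares line can be fitted and extrapolated. This illustrates only a deterministic trend; proposing probabilistic scenarios would additionally require a model of uncertainty, which is not shown here.

```python
from statistics import mean


def fit_linear_trend(values):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t = range(n)
    t_mean, y_mean = mean(t), mean(values)
    # Slope = covariance(t, y) / variance(t)
    cov = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    var = sum((ti - t_mean) ** 2 for ti in t)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def project(intercept, slope, n_observed, steps_ahead):
    """Extrapolate the fitted line `steps_ahead` periods beyond the last observation."""
    return [intercept + slope * (n_observed - 1 + k) for k in range(1, steps_ahead + 1)]


# Hypothetical monthly values of an SDG-related proxy indicator
series = [10.0, 12.0, 14.0, 16.0]
intercept, slope = fit_linear_trend(series)
print(slope, intercept)                          # 2.0 10.0
print(project(intercept, slope, len(series), 2))  # [18.0, 20.0]
```

A straight line is the simplest possible trend model; real indicator series with seasonality or structural breaks would need a richer model.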

To analyse the data efficiently, both the sheer volume of data and the need for analysis approaches suited to every type of big data must be addressed. Real-time data are only beneficial if real-time analysis can also be conducted. Privacy violations, in which the individuals who produced the data are traced and potentially face negative repercussions, must be prevented. Significant challenges remain in accessing relevant data, some of which have not been collected at all, for institutional or technical reasons. Regarding data formats, the introduction and widespread acceptance of standardized formats and controlled vocabularies would be desirable, both to reduce data-cleaning effort and to allow approaches to be applied globally towards SDG targets and indicators.
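To illustrate how a controlled vocabulary reduces cleaning effort, the following minimal sketch maps inconsistent raw labels onto canonical terms and sets aside anything it cannot map for manual review. The vocabulary, the field name `sex` and the records are hypothetical examples, not taken from any official SDG standard.

```python
# Hypothetical controlled vocabulary: each canonical term lists accepted raw spellings.
CONTROLLED_VOCABULARY = {
    "FEMALE": {"f", "female", "woman", "women"},
    "MALE": {"m", "male", "man", "men"},
}

# Invert once into a lookup table: raw spelling -> canonical term.
LOOKUP = {
    variant: canonical
    for canonical, variants in CONTROLLED_VOCABULARY.items()
    for variant in variants
}


def normalize(records, field):
    """Map record[field] to its canonical term; return (cleaned, unmapped_values)."""
    cleaned, unmapped = [], []
    for record in records:
        # Trim whitespace and ignore case before looking the value up.
        raw = str(record.get(field, "")).strip().lower()
        canonical = LOOKUP.get(raw)
        if canonical is None:
            unmapped.append(record.get(field))  # keep original value for manual review
        else:
            cleaned.append({**record, field: canonical})
    return cleaned, unmapped


rows = [{"sex": " Female "}, {"sex": "M"}, {"sex": "unknown"}]
cleaned, unmapped = normalize(rows, "sex")
# cleaned  -> [{'sex': 'FEMALE'}, {'sex': 'MALE'}]
# unmapped -> ['unknown']
```

If every data provider already used the canonical terms, the lookup table and the manual-review list would both become unnecessary, which is the saving that standardization offers.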

It is also critical to scrutinize apparent correlations before drawing conclusions, especially when analysing important and novel SDG-related data. Data literacy capacities are particularly needed in National Statistics Offices and, as is often the case, tend to be weaker in developing countries. Where the costs of producing SDG-relevant data have been calculated, innovative big data approaches are recognized as a means of reducing the expense of traditional statistics. Although the low costs of big data do provide an opportunity, additional costs must be taken into account, especially when it comes to big data for development. The Sustainable Development Solutions Network (2015a) estimates that approximately US$1 billion is required annually to enable statistical systems to monitor the SDGs. Further costs include initial investments in sensor installations and related infrastructure, as well as running costs for energy consumption. Moreover, because of the large volume of data, errors are not only more likely but also harder to detect.

In addition, problems remain with the indicators regarding methodology and data availability, owing to the novel data types as well as the massive amount of data required: only one third of the SDG indicators have an established methodology and at least some data availability. As of April 2017, progress towards the targets associated with the remaining two thirds of the indicators cannot be monitored. The report therefore analyses the potential of exhaust data, sensing data and digital content to resolve these issues. While big data are the principal topic of this report, their potential is only fully realized through synergies with the internet of things (IoT) and with artificial intelligence (AI) methods: the growing IoT captures numerous additional data, and innovative AI techniques can analyse large amounts of data, often in real time. This report also addresses the initial challenge that all three concepts are only vaguely defined and are surrounded by media hype that is not always helpful.

One way to look at the IoT is that it aims to reduce the information gap between the physical world and the internet. As such, the IoT is characterized by a shift from person-to-person (P2P) to machine-to-machine (M2M) communication and decision making. The key components of the IoT are sensors, which detect changes in their environment and can quantify the extent of the change. This enables much more comprehensive, remote monitoring of the state of the environment, including nature, human behaviour, urban settings, infrastructure and means of transport. Consequently, the IoT contributes enormously to big data: as more and more sensors produce data, predictive analytics become more precise. Possible applications range from the remote management of manufacturing, infrastructure, traffic and buildings to the monitoring of the environment and of illegal activities.
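The idea of a sensor that detects a change and quantifies its extent can be sketched as follows. This is a minimal illustration under stated assumptions, not a method from the report: each reading is compared with a rolling baseline (the mean of the previous `window` readings), and a change is reported when the deviation exceeds `threshold`. The function name, both parameters and the water-level readings are hypothetical.

```python
from collections import deque


def detect_changes(readings, window=3, threshold=5.0):
    """Return (index, change) pairs where a reading departs from the recent baseline.

    The baseline is the mean of the previous `window` readings; a change is
    reported when |reading - baseline| exceeds `threshold`.
    """
    history = deque(maxlen=window)  # holds only the most recent `window` readings
    events = []
    for i, value in enumerate(readings):
        if len(history) == window:
            baseline = sum(history) / window
            change = value - baseline  # signed size of the change
            if abs(change) > threshold:
                events.append((i, change))
        history.append(value)
    return events


# Hypothetical hourly water-level readings (cm) from a single sensor
levels = [100.0, 101.0, 99.0, 100.0, 112.0, 113.0]
print(detect_changes(levels))  # jump flagged at index 4 (+12.0) and again at index 5
```

Because the baseline still contains pre-jump readings, the same step is reported again at index 5; in practice, `window` and `threshold` would have to be tuned to the sensor's noise level.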

Healthcare can benefit from the IoT in two ways: through preventive intervention and through remote monitoring. To promote accessibility and coordination, universal standards and protocols for interoperability within the IoT are needed sooner rather than later. Much concern has been expressed about security, given the range of things that are, or are planned to be, connected to the IoT, such as household devices, means of transport, factories, hospitals and medical sensors. Since one promising application of the IoT is monitoring other devices for energy efficiency, it is critical that the IoT itself does not waste energy. Moreover, the IoT contributes to electronic waste because devices have short lifespans and contain hazardous elements that are difficult or impossible to recycle, which particularly affects developing countries. Solutions for energy-efficient IoT and for reducing electronic waste are therefore critical. Overall, big data has had, and still has, other traditional sources, but the IoT has enormous potential to provide far more data, accurately and in real time.

Unlike big data and the IoT, the field of AI has not yet been embraced much by the United Nations, either in terms of projects or in terms of policies. Machine learning and deep learning offer opportunities for data mining and pattern recognition: they can discover regularities and correlations within big data and use them to draw conclusions that support decision making and to make predictions about the future. The ENEA countries have been pioneers in some fields of AI, e.g. robotics, and have the capacity to continue top-class AI research, including research towards the SDGs.

Adverse effects of AI are also starting to be identified, including discrimination against groups of the population and the exacerbation of economic inequality. The data analysis phase must not become a bottleneck, which can only be prevented through powerful AI methodologies. Moreover, the monitoring of individuals, e.g. through biometric sensors, raises privacy concerns that must be addressed by strict guidelines. AI safety is an area of research that explores methods to increase the likelihood that, if and when a machine reaches superintelligence, its behaviour is aligned with human values. In this context, it is highly desirable that such a machine value the SDGs along with the many other things humans value. While AI is currently booming again, there is no guarantee that this will continue or that all expectations will be fulfilled. Overall, AI allows data to be analysed in new ways and has enormous potential to analyse far more data and deliver actionable information in real time.

Although big data, the IoT and AI are each powerful and innovative technologies in their own right, combining them creates further synergies. This explains the current enthusiasm about these developments, and these synergies can also be harnessed for the calculation of SDG indicators or for the achievement of SDG targets. The United Nations has acknowledged both the relevance and the potential of data. The report "A world that counts", commissioned by former UN Secretary-General Ban Ki-moon, states: "Never again should it be possible to say ‘we didn’t know’. No one should be invisible. This is the world we want – a world that counts." (Independent Expert Advisory Group on a Data Revolution for Sustainable Development, 2014, p. 3). Accordingly, the Economic and Social Commission for Asia and the Pacific (ESCAP) supports its member States in their efforts to achieve the 2030 Agenda and recommends enhancing data capacities and harnessing science, technology and innovation for this purpose (United Nations Economic and Social Commission for Asia and the Pacific, 2016a; United Nations Economic and Social Commission for Asia and the Pacific, 2016b). The ESCAP East and North-East Asia (ENEA) sub-region includes several technologically advanced countries and thus offers a suitable environment for pioneering ground-breaking methodologies towards the SDGs.

Because they span an extremely wide range of topics, the 232 SDG indicators are very diverse. Three types of indicators were identified as not suitable for big data: the number or proportion of countries or governments; agreements or strategies; and financial flows, investments or budgets. These indicators are not about what people say or do, nor can sensors capture data about them. Big data approaches were found to assist the calculation of 42 of the 232 SDG indicators. Of these 232 indicators, 70 were classified as not suitable for big data, so approaches were presented for 26 per cent of the remaining 162 indicators, i.e. more than one quarter. Big data and AI approaches were found to assist the achievement of 66 of the 169 SDG targets. This is 39 per cent, i.e. a higher percentage than the 24.5 per cent that assist the calculation of SDG indicators. Overall, it is recommended that the ENEA countries promote and incentivize a vibrant environment for big data, IoT and AI that involves the private sector (including start-ups), academic research institutions and NGOs, in order to further foster citizen participation.
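The shares quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the paragraph; the helper name `share` is hypothetical. It also shows that measuring the 42 indicators against all 232 indicators, rather than against the 162 suitable ones, gives a noticeably lower share.

```python
# Counts as stated in the text
TOTAL_INDICATORS = 232
UNSUITABLE_INDICATORS = 70
INDICATORS_WITH_APPROACHES = 42
TOTAL_TARGETS = 169
TARGETS_WITH_APPROACHES = 66


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


suitable = TOTAL_INDICATORS - UNSUITABLE_INDICATORS          # 162 suitable indicators
print(share(INDICATORS_WITH_APPROACHES, suitable))           # 25.9, i.e. about 26 per cent
print(share(INDICATORS_WITH_APPROACHES, TOTAL_INDICATORS))   # 18.1, share of all 232 indicators
print(share(TARGETS_WITH_APPROACHES, TOTAL_TARGETS))         # 39.1, i.e. about 39 per cent
```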