*As the world faces more frequent and intense crises---driven in part by climate change, political and social conflict, cyber attacks, and disease outbreaks---humanitarians are working to protect vulnerable people beyond the initial shock of each new emergency. While coordinating efforts between humanitarian programmes and social protection systems may help support individual and community resilience, it also creates new data protection risks. In this post, ahead of the ICRC DigitHarium's 'Digital Dilemmas' virtual debate on social protection systems, Jill Capotosto from the Yale Jackson Institute for Global Affairs reviews one of the key data protection concerns of combining -- or 'mosaicking' -- humanitarian and social protection data systems: the risk of revealing new information, or 'data revelation'. Particularly in fragile environments, the unintentional revelation of new information may expose both beneficiaries and humanitarian organizations to unanticipated harm.*
Many of the world's most vulnerable people find themselves increasingly at risk of facing humanitarian crises, a trend exacerbated by the COVID-19 pandemic. Even before the pandemic, the World Bank estimated that more than half of the world's poor will likely live in fragile and conflict-affected settings by 2030, up from less than 20% in 2016. Climate change will play no small part in increasing the likelihood, frequency, and intensity of conflict, worsening drought and flood conditions and generating greater numbers of extreme weather events.
In this new landscape, the well-being of at-risk people depends on more than just a swift humanitarian response; it requires a seamless transition from emergency aid to long-term resilience. To do this, humanitarians are increasingly working with local governments to coordinate humanitarian programmes with government-run social protection systems. Effective coordination requires close collaboration, which often includes sharing detailed data about programmes and programme recipients. While this kind of data sharing may improve efficiency and protect people from falling through the cracks, it also poses significant data protection risks. These risks affect not only specific beneficiaries and programmes, but entire humanitarian organizations. This is especially true in today's fast-growing open data landscape. By combining datasets from collaborative humanitarian and social protection programmes with existing datasets, outside actors can increasingly reveal new---and potentially harmful---information through the act of mosaicking.
The datasphere, mosaicking, and sensitive data
By 2025, the global digital datasphere will grow to a whopping 175 zettabytes---equivalent to 175 trillion gigabytes, or 175 sextillion bytes, of digital data. In 2020, there are already more than 40 times more bytes of data than there are stars in the observable universe (a number that tops one billion trillion). We'll soon be adding nearly five zettabytes every year to the datasphere, more than six trillion bytes of data per person on earth.
Housed within this universe, the humanitarian datasphere is also growing at a breakneck pace. From its launch in 2014 to today, the Humanitarian Data Exchange has ballooned from 900 datasets to almost 18,000---a 1,900% increase in just six years. Every month, an average of 100,000 people access this data, which covers every humanitarian crisis in the world.
In response to calls for collaboration, innovation, and data democracy, more and more of this data is openly shared. Open data advocates celebrate the potential for widely shared datasets to be combined---or mosaicked---with other datasets to reveal new information for the good of society. This includes everything from exposing gender pay gaps to uncovering government corruption. Humanitarians see incredible opportunities to use data to predict potential humanitarian crises, and to respond more effectively and efficiently to disasters or hardships. They share data with one another and with partner governments to provide critical services (a practice that will only increase if humanitarians begin to engage in government social protection programmes). They also make data public to support researchers, educate the public, and share with donors.
Yet these opportunities are tempered by the risk that sensitive information -- including personally identifiable information -- may be unintentionally revealed. In the case of humanitarian data, this can mean information about some of the world's most vulnerable people, as well as critical, life-saving programmes. Sensitive data may be revealed when the data security guidelines in place to protect vulnerable populations fail, often because they are not adequately applied to a quickly changing data landscape. Steps to de-identify datasets, for example, may fall far short of true anonymization as re-identification becomes increasingly sophisticated; sometimes, nominally de-identified data can still be mosaicked with other datasets to reveal sensitive information about individual people or groups.
An example from a non-humanitarian setting pulls these shortcomings into sharp relief. Sparked by a study that found Mohammed to be the most common name among New York City taxi drivers, a single private citizen compared Islamic prayer times with publicly available data from the NYC Taxi & Limousine Commission to identify drivers who could be Muslim. Simply mosaicking these two datasets revealed individuals whose periods of inactivity overlapped conspicuously with the five daily calls to prayer. Though the dataset had been carefully stripped of personally identifiable information---names, license plate numbers, medallion numbers---specific, potentially sensitive, groups still came into focus with minimal manipulation.
Humanitarians must be aware of the risks of data revelation and mosaicking and take proactive measures to protect sensitive data. Data is not just bytes in the ether---data is people.
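One proactive measure is to check, before any release, how many records share the same combination of seemingly harmless attributes. The sketch below is illustrative only: the records are made up, and the field names (`age_band`, `district`, `household_size`) are assumptions chosen for the example rather than fields from any real dataset. It computes a simple k-anonymity figure, the size of the smallest group of records that look alike on those attributes. A group of one means that a single person could be singled out by anyone holding a second dataset with the same attributes.

```python
from collections import Counter

# Hypothetical, made-up records: names and IDs have already been removed.
released = [
    {"age_band": "20-29", "district": "North", "household_size": 4},
    {"age_band": "20-29", "district": "North", "household_size": 4},
    {"age_band": "30-39", "district": "North", "household_size": 7},
    {"age_band": "30-39", "district": "South", "household_size": 2},
    {"age_band": "30-39", "district": "South", "household_size": 2},
    {"age_band": "30-39", "district": "South", "household_size": 2},
]

# Attributes that are not identifying alone but may be in combination
# (an assumption for this example).
QUASI_IDENTIFIERS = ("age_band", "district", "household_size")


def combination(record, keys):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[key] for key in keys)


def group_sizes(records, keys):
    """Count how many records share each combination of values."""
    return Counter(combination(record, keys) for record in records)


def k_anonymity(records, keys):
    """Size of the smallest group; the release is k-anonymous for this k."""
    return min(group_sizes(records, keys).values())


def risky_records(records, keys, k=2):
    """Records whose combination of values is shared by fewer than k rows."""
    sizes = group_sizes(records, keys)
    return [r for r in records if sizes[combination(r, keys)] < k]


if __name__ == "__main__":
    print("k =", k_anonymity(released, QUASI_IDENTIFIERS))
    for record in risky_records(released, QUASI_IDENTIFIERS):
        print("unique combination:", record)
```

A check like this only covers the attributes it is told about. It says nothing about external datasets that may appear later, so it complements, rather than replaces, the broader awareness of the data landscape discussed in this post.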
Mosaicking in a humanitarian setting
It's not difficult to imagine how this can (and does) play out in humanitarian settings. Slightly adapting the taxi driver example above, we can imagine a group of recently displaced people enrolled in a programme to receive cash assistance through prepaid debit cards. Transaction data from the cards captures the location and time of cash withdrawals or deposits, as well as information about the types of items purchased with the cards. The time and location data connected to these withdrawals could easily be mosaicked with locational data on mosques and calls to prayer, revealing people who are frequently near mosques at prayer times. Transaction data for food purchases may reveal certain dietary patterns, which could further suggest a specific religious or ethnic affiliation. These assumptions, in turn, could be used to profile an individual or group, possibly to monitor them for presumed political views or allegiances, or to target, exclude, or exploit them in some other way.
For humanitarians -- who gather data on some of the world's most vulnerable people -- the implications are clear. Data created to support refugees, displaced people, asylum seekers, and others could actually put lives in greater danger if it allows them to be identified or tracked, or if it reveals other sensitive parts of their identity. It also poses risks to humanitarians and humanitarian organizations, whose operations could be compromised if, for example, the data revealed the location of camps or the supply chains of critical goods.
But humanitarians will not -- and cannot -- simply stop collecting and sharing data. Their ability to provide life-saving services depends on collaborative datasets that allow them to know who needs what where. Their ability to protect also depends on how well they understand the data landscape that lies outside of their control, particularly when it comes to metadata---data about data. Humanitarian organizations cannot dictate the data that is generated, processed, or shared by stakeholders who are party to a conflict, including mobile network operators and financial institutions that may be responsible for the kinds of transaction and location data highlighted in the examples above. This means that humanitarians must be aware of external datasets---and who has access to them---so they can responsively structure their own operations to protect beneficiaries and programmes from potentially dangerous mosaicking.
What we can do
If data sharing is both crucial and risky, how can data be shared in a way that is humanitarian? For the ICRC, it comes back to the fundamental principles governing the organization's daily operations---neutrality, impartiality, and independence in humanitarian action, commonly referred to as NIIHA. These principles offer guidance on what data to collect, how to share it, and with whom. They instruct humanitarians not to be influenced by external forces urging them to collect or share more data than what is required. They provide a buffer against calls to share data too broadly or with risky parties, and to store data for longer than necessary. Importantly, they complement the work on mosaicking being done by the OCHA Centre for Humanitarian Data. The Centre's new Data Environment Mapping Tool, used to identify potentially related HDX datasets, is an important step in understanding the existing humanitarian data ecosystem. However, it is not enough on its own, as it does not capture the many non-humanitarian datasets with which HDX data could be mosaicked, nor does it consider potential future datasets or future technologies that may change the nature of mosaicking. Truly limiting the risk of sensitive data revelation depends not just on understanding the information that is already out there, but also on taking proactive steps to create and manage safer datasets that uphold NIIHA.
Following the spirit of NIIHA and the ICRC Data Protection Handbook, the humanitarian sector can take specific steps to limit the risk of revealing potentially sensitive data:
1. Understand the risk of metadata
As mentioned above, metadata can be a central ingredient in data revelation, especially when it involves transaction and location data generated and controlled by mobile network operators, financial institutions, or other entities that process large quantities of sensitive data. How these stakeholders choose to generate, store, combine, and share their data---including whether they choose to share it with government authorities who may themselves use the data in many different ways---is beyond the control of humanitarian organizations. While this dynamic may not always pose a risk, humanitarians should be able to identify both the potential for harm and the steps needed to mitigate or avoid the unwanted mosaicking of metadata. Overall, humanitarians must be aware of what data is generated by whom in the process of providing aid. This requires knowing the stakeholders who are involved, as well as the types of information that may be inferred by combining datasets, to get a full picture of both how and to what extent the mosaicking of third-party data may pose a problem. Building on this landscape analysis, humanitarians can then decide how to structure programmes, determining the extent to which they must limit data generation or reduce reliance on third-party processors. They may also consider delivery modalities other than cash, including in-kind assistance, to ensure different degrees of protection. Chapter 9 of the Data Protection Handbook, which covers Cash Transfer Programming, offers helpful guidance on navigating these decisions.
2. Purpose limitation, proportionality, and data minimization
Before collecting or processing any data, humanitarians should clearly delineate their specific goals and purposes, setting clear boundaries not only on the data they will collect, but also on how the data will be processed, shared, and stored. In establishing these parameters, humanitarians can compare their goals with the data required to meet them. Are the anticipated outcomes strong enough to justify collecting and processing the data? Do the benefits outweigh the risks or the burdens? If the answer is yes, the next step is to collect and process only the absolute minimum amount of data needed, then to delete that data when the planned processing is complete. Following these steps prevents unnecessary data collection, processing, sharing, and storage, limiting the proliferation of errant, potentially identifiable, information.
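The steps above can be made concrete in the intake pipeline itself. The following is a minimal sketch, not a prescribed implementation: the purpose name, the allowed fields, and the 90-day retention period are all hypothetical values chosen for illustration. It keeps only the fields a stated purpose requires at the moment of collection, and drops records once the retention period has passed.

```python
from datetime import date, timedelta

# Hypothetical purpose definition; every value here is illustrative only.
PURPOSE = {
    "name": "cash_assistance_eligibility",
    "allowed_fields": {"household_id", "household_size", "district"},
    "retention_days": 90,
}


def minimize(raw, purpose):
    """Keep only the fields that the stated purpose requires."""
    return {k: v for k, v in raw.items() if k in purpose["allowed_fields"]}


def intake(raw, purpose, collected_on):
    """Store the minimized data together with its collection date."""
    return {"collected_on": collected_on, "data": minimize(raw, purpose)}


def is_expired(record, purpose, today):
    """True once a record has been held longer than the retention period."""
    limit = timedelta(days=purpose["retention_days"])
    return today - record["collected_on"] > limit


def purge(records, purpose, today):
    """Return only the records that are still within the retention period."""
    return [r for r in records if not is_expired(r, purpose, today)]


if __name__ == "__main__":
    raw = {
        "household_id": "H-001",
        "household_size": 5,
        "district": "North",
        "religion": "not needed for this purpose",
        "phone": "not needed for this purpose",
    }
    stored = [intake(raw, PURPOSE, date(2021, 1, 1))]
    print(stored[0]["data"])                         # extra fields are gone
    print(purge(stored, PURPOSE, date(2021, 6, 1)))  # empty: past retention
```

Filtering at intake, rather than before sharing, means the unneeded fields never exist in storage and so cannot be leaked or mosaicked later.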
3. Minimize sharing
Data sharing can be critical for effective aid delivery, but just because data must be shared does not mean it must be shared widely or freely. By only sharing with trusted partners on a need-to-know basis, and by creating strong data sharing agreements, humanitarians can safeguard against sensitive data being leaked, inappropriately shared, or captured in a breach. These steps can help ensure that if and when data is shared publicly, it is only done by organizations that have employed safety measures, such as purpose limitation, data minimization, de-identification, and protections against the revelation of demographically identifiable information (DII).
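One common safeguard when a public release is needed is to share aggregate counts instead of individual records, and to withhold counts for very small groups. The sketch below assumes made-up records and an arbitrary threshold of five; both are illustrative choices, not recommendations from the Data Protection Handbook.

```python
from collections import Counter

# Hypothetical, made-up beneficiary records.
records = (
    [{"district": "North"} for _ in range(6)]
    + [{"district": "South"} for _ in range(2)]
)


def aggregate_with_suppression(rows, group_key, threshold=5):
    """Count rows per group; replace counts below the threshold with None."""
    counts = Counter(row[group_key] for row in rows)
    return {
        group: (count if count >= threshold else None)
        for group, count in counts.items()
    }


if __name__ == "__main__":
    print(aggregate_with_suppression(records, "district"))
```

Suppression is not a complete protection. If a grand total is published alongside the table, a withheld cell can sometimes be recovered by subtraction, so totals and related tables need the same scrutiny.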
4. Continuing education
New datasets, and new tools for analyzing datasets, are emerging almost constantly. Humanitarians must stay aware of the ever-changing global datasphere so they can understand new and emerging risks, and continue to adjust their own data practices accordingly.
As greater collaboration opens new doors for both humanitarian aid programmes and social protection systems, the potential for humanitarian data is wide open. While the risks of data collection and sharing can never be fully avoided, they can be managed. This requires unwavering dedication to humanitarian principles and purpose, putting the protection of the world's most vulnerable people at the forefront of every data decision. Data can offer new opportunities to provide protection and aid, smoothing the transition from emergency response to long-term resilience. By balancing these opportunities with a proper understanding of, and dedicated attention to, risks, humanitarians can move more capably into the data future.