Informing humanitarians worldwide 24/7 — a service provided by UN OCHA

Unlocking the Power of AI for Humanitarians

In the fast-paced world of humanitarian response, every second counts, and timely, well-informed decisions can make a real difference in life-saving operations. That’s where HumSet and HumBERT come in: two groundbreaking tools developed by Data Friendly Space for the DEEP Platform.

HumSet: Turbocharged Data Assistance

HumSet is an open-source humanitarian dataset: a rich, multilingual collection of humanitarian response documents annotated by experts. It is designed to support models that quickly extract vital information from large amounts of text, and to facilitate humanitarian data analysis ([1]).

How does it help? Whether you are assessing urgent needs or preparing critical reports, HumSet streamlines the process. Imagine you’re a humanitarian analyst working with large volumes of text about crises, disasters or emergencies. For tasks such as information extraction or summarisation, you need labelled (classified) text data.

HumSet provides exactly that annotated humanitarian data. It saves the time and resources needed to label data from scratch, and it brings humanitarian domain knowledge into these tasks. So when you need to prepare reports or assess situations, HumSet makes your job easier.
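To make “labelled data” concrete, here is a minimal Python sketch of what a multi-label annotation can look like once it is prepared for training a classifier. The sector names, the `Entry` class and the `encode_labels` helper are hypothetical illustrations, not HumSet’s actual schema or tag set (see [1] for those). The idea is simply that one excerpt can carry several tags, which are turned into a multi-hot vector with one position per label.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sector labels, for illustration only; HumSet's real tag set is described in [1].
SECTORS: List[str] = ["Education", "Food Security", "Health", "Protection", "Shelter", "WASH"]


@dataclass
class Entry:
    """A made-up annotated excerpt: a piece of text plus the tags an analyst assigned."""
    excerpt: str
    sectors: List[str]


def encode_labels(tags: List[str], label_set: List[str]) -> List[int]:
    """Return a multi-hot vector: 1 where the label was assigned, 0 elsewhere."""
    unknown = set(tags) - set(label_set)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if label in tags else 0 for label in label_set]


entry = Entry(
    excerpt="Cholera cases are rising in the camp as latrines overflow.",
    sectors=["Health", "WASH"],
)
print(encode_labels(entry.sectors, SECTORS))  # [0, 0, 1, 0, 0, 1]
```

Rejecting unknown tags, rather than silently dropping them, helps catch annotation typos before they reach training.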

HumBERT: The Language Expert

Imagine a humanitarian language expert. General-purpose language models are trained on a wide range of text and do well in many tasks, but they often struggle in specialised fields that use vocabulary and concepts uncommon in everyday language. To tackle this issue, HumBERT was developed as a language model pre-trained on humanitarian text, so it is familiar with the terminology and concepts of the sector.

Here’s the cool part: you can fine-tune HumBERT for specific tasks. Need it to recognise named entities, or to classify text automatically by needs, sectors or impacts? HumBERT’s got your back. Because it was trained on multilingual humanitarian texts, such as reports and news articles about humanitarian crises and aid, it is better suited to analysing this type of information.
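How might a fine-tuned model pick sectors? A common approach for multi-label classification (assumed here, not confirmed as the DEEP team’s setup) is to add a classification head that outputs one score (logit) per label, apply an independent sigmoid to each score, and keep every label above a threshold. The sketch below shows only that last step in plain Python. The logits are made-up numbers standing in for a fine-tuned model’s output, and the 0.5 threshold and label names are assumptions.

```python
import math
from typing import Dict, List

# Hypothetical label set; one logit is expected per label, in this order.
SECTORS: List[str] = ["Education", "Food Security", "Health", "Protection", "Shelter", "WASH"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_sectors(logits: List[float], labels: List[str], threshold: float = 0.5) -> Dict[str, float]:
    """Keep every label whose independent sigmoid probability reaches the threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = {label: sigmoid(z) for label, z in zip(labels, logits)}
    return {label: p for label, p in probs.items() if p >= threshold}


# Made-up logits standing in for a fine-tuned model's output on one excerpt.
logits = [-2.1, 0.3, 1.8, -0.7, -3.0, 2.4]
print(sorted(predict_sectors(logits, SECTORS)))  # ['Food Security', 'Health', 'WASH']
```

Because each label gets its own probability, an excerpt about a cholera outbreak can be tagged with both Health and WASH at once; a softmax, by contrast, would force the labels to compete for a single slot.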

Together, HumSet and HumBERT empower humanitarians to act swiftly and accurately, making a real impact on the ground.

Real-Life Use Cases

Mitigating Bias: In one case study ([2]), HumSet played a crucial role as a benchmark dataset for comparing the performance of different foundational language models, including HumBERT, and for identifying and reducing their gender and geographical biases. In humanitarian work, biased classifications or inaccurate geographic information can distort the picture of needs. Accurate, unbiased classification matters: it helps us assess needs, and track how they change over time, ethically.

Enhancing Geographical Detection: Another case study ([3]) focused on improving the detection of geographical entities. HumSet served as a key dataset for building a tool that better identifies geographic locations in text and disambiguates them. Think of it as enhancing our “geocoding” abilities for humanitarian documents: knowing where events occur is essential for effective response and planning.
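Why is geocoding hard? Many place names are ambiguous: Tripoli, for example, is a city in both Libya and Lebanon. The toy Python sketch below illustrates one simple idea, looking up names in a gazetteer and resolving ambiguous ones using the other, unambiguous places mentioned in the same document. It is not the method used in [3]; the three-entry `GAZETTEER` and the `resolve_places` function are hypothetical, and a practical tool would typically rely on a trained entity recogniser and a full gazetteer.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical three-entry gazetteer: place name -> countries where a place of that name exists.
GAZETTEER: Dict[str, List[str]] = {
    "Tripoli": ["Libya", "Lebanon"],  # ambiguous
    "Benghazi": ["Libya"],
    "Beirut": ["Lebanon"],
}

# One regex that matches any gazetteer name as a whole word.
PLACE_PATTERN = re.compile(r"\b(" + "|".join(re.escape(name) for name in GAZETTEER) + r")\b")


def resolve_places(text: str) -> Dict[str, Optional[str]]:
    """Map each place name found in text to a country, or None if it stays ambiguous."""
    mentions = PLACE_PATTERN.findall(text)
    # Unambiguous mentions act as context votes for the document's country.
    context = Counter(GAZETTEER[m][0] for m in mentions if len(GAZETTEER[m]) == 1)
    resolved: Dict[str, Optional[str]] = {}
    for name in mentions:
        candidates = GAZETTEER[name]
        best = max(candidates, key=lambda country: context[country])
        top = context[best]
        tied = sum(1 for country in candidates if context[country] == top)
        # A single candidate always wins; otherwise require a unique, non-zero vote count.
        if len(candidates) == 1 or (top > 0 and tied == 1):
            resolved[name] = best
        else:
            resolved[name] = None
    return resolved


print(resolve_places("Fighting near Tripoli displaced families who fled to Benghazi."))
# {'Tripoli': 'Libya', 'Benghazi': 'Libya'}
print(resolve_places("Flooding was reported in Tripoli."))
# {'Tripoli': None}
```

Leaving a tied or context-free name as `None` rather than guessing is a deliberate choice: for response planning, a location flagged for human review is arguably safer than a confident wrong one.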

Powered by ReliefWeb Reports

HumSet and HumBERT are like language superheroes in the humanitarian world. They draw their knowledge from three main sources: ReliefWeb, UNHCR Refworld, and Europe Media Monitor News Brief (EMMT). These reports are highly valuable sources of information that let our tools learn about the world’s challenges. Imagine a library of 50 million sentences, each packed with insights: that is the humanitarian corpus these tools use, and it is what allows them to decipher the complexities of the world.

Ready to Dive Deeper?

Both HumSet and HumBERT are available on the DEEP Platform NLP Hugging Face profile ([4]). Explore their capabilities and join us in harnessing AI for a more efficient and impactful humanitarian response.

In Conclusion

HumSet and HumBERT represent a significant step forward in the use of AI for humanitarian work. These NLP-based solutions support evidence-based decision-making and provide faster, deeper access to crucial humanitarian information. Let’s continue using technology to make a difference! 🌐🤝

Authors:

References:

[1] Selim Fekih, Nicolò Tamagnone, Benjamin Minixhofer, Ranjan Shrestha, Ximena Contla, Ewan Oglethorpe, and Navid Rekabsaz. 2022. HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crises Response. In Findings of the Association for Computational Linguistics: EMNLP 2022.

[2] Nicolò Tamagnone, Selim Fekih, Ximena Contla, Nayid Orozco, and Navid Rekabsaz. 2023. Leveraging domain knowledge for inclusive and bias-aware humanitarian response entry classification. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI '23). Article 690, 6219–6227.

[3] Enrico Belliardo, Kyriaki Kalimeri, and Yelena Mejova. 2023. Leave no Place Behind: Improved Geolocation in Humanitarian Documents. In Proceedings of the 2023 ACM Conference on Information Technology for Social Good (GoodIT '23).

[4] https://huggingface.co/nlp-thedeep