Using data science to improve dataset tagging on HDX
By Ghadeer Abouda
As a Data Science Fellow with the Centre, I explored how data contributors use HDX metadata tags and whether a typical user can expect to find the data they are looking for. The goal is for a user to enter a keyword into the search bar and get back a list of all relevant data. If a contributor tags a dataset imprecisely, that dataset may never be found. In this blog post, I describe how I analyzed the relatedness of datasets to their tags and developed a solution to improve the tagging of newly added datasets on HDX.
The Humanitarian Data Exchange (HDX) now hosts more than 11,000 datasets, shared by hundreds of organizations and covering humanitarian crises around the world. As more data is added, it is critical that users can find what they are looking for. Keyword search, which relies on user-generated metadata, is the most common way to find specific data on the platform. Accurate, relevant tags support better search results and promote increased user engagement with data on HDX.