Dataset release coming soon! For inquiries and collaboration, feel free to reach out to us.
In this project, we aim to collect and annotate emotion in texts for African languages, which are among the most diverse and under-resourced languages in the world. We will use various sources of text data, such as news articles, social media posts, literary works, and oral narratives, to cover a range of domains, genres, and styles. We will also use a comprehensive and consistent emotion annotation scheme, based on the six basic emotions (anger, disgust, fear, happiness, sadness, and surprise), to label the texts with their corresponding emotion categories.
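As a rough illustration of the annotation scheme described above, one annotated text could be represented as a small record holding the text, its language, and a set of emotion labels. This is only a sketch: the `AnnotatedText` class, its field names, the language string, and the choice to allow several labels per text are assumptions made for illustration, not the released dataset's actual format.

```python
from dataclasses import dataclass, field

# The six basic emotions named in the project description.
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")


@dataclass
class AnnotatedText:
    """Hypothetical record layout; the released dataset may differ."""

    text: str
    language: str
    emotions: set = field(default_factory=set)

    def __post_init__(self):
        # Reject any label outside the six-emotion scheme.
        self.emotions = set(self.emotions)
        unknown = self.emotions - set(EMOTIONS)
        if unknown:
            raise ValueError(f"unknown emotion labels: {sorted(unknown)}")

    def to_vector(self):
        """Return a 0/1 list ordered like EMOTIONS."""
        return [int(e in self.emotions) for e in EMOTIONS]


# Placeholder text and an assumed language name, for illustration only.
example = AnnotatedText(
    text="<text goes here>",
    language="hausa",
    emotions={"happiness", "surprise"},
)
print(example.to_vector())
```

Keeping the labels as a set and deriving the vector on demand keeps the record readable while still giving models a fixed-order encoding.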
The main objectives of this project are:
- To create a large and high-quality emotion dataset for African languages, which can be used for various NLP tasks, such as emotion detection, emotion generation, emotion analysis, and emotion synthesis.
- To investigate the similarities and differences of emotion expression and perception across different African languages and cultures, and to identify the linguistic and cultural factors that influence them.
- To develop and evaluate NLP models and methods for emotion processing in African languages, and to explore the challenges and opportunities of cross-lingual and multilingual emotion processing.
This project will contribute to the advancement of NLP research and applications for African languages, and to the understanding of emotion as a universal and diverse human phenomenon.
| # | Language | Country | Language Coordinators |
|---|---|---|---|
| 1. | Hausa | Nigeria | Murja Sani Gadanya |
| 2. | Yoruba | Nigeria | David Ifeoluwa Adelani |
| 3. | Igbo | Nigeria | Chiamaka Ijeoma Chukwuneke |
| 4. | Nigerian-Pidgin | Nigeria | Saminu Mohammad Aliyu |
| 5. | Amharic | Ethiopia | Ebrahim Chekol Jibril |
| 6. | Tigrinya | Ethiopia | Hagos Tesfahun Gebremichael |
| 7. | Oromo | Ethiopia | Tadesse Kebede Guge |
| 8. | Somali | Ethiopia | Elyas Abdi Ismail |
| 9. | Twi | Ghana | Abigail Oppong |
| 10. | Swahili | Kenya | Lilian D. A. Wanzare |
| 11. | Moroccan Arabic | Morocco | Oumaima Hourrane |
| 12. | Kinyarwanda | Rwanda | Samuel Rutunda |
| 13. | isiZulu | South Africa | Rooweither Mabuya |
| 14. | isiXhosa | South Africa | Andiswa Bukula |
| 15. | Algerian Arabic | Algeria | Nedjma Ousidhoum |
This is a collaborative project with team members from different universities, institutions, and industry. Team members include:
| Name | Affiliation |
|---|---|
| Shamsuddeen Hassan Muhammad | Bayero University, Kano, Nigeria; MasaKhane |
| Esubalew Alemneh | Bahir Dar University, Bahir Dar, Ethiopia |
| Ibrahim Said Ahmad | Northeastern University; Bayero University, Kano; MasaKhane |
| Seid Muhie Yimam | University of Hamburg; MasaKhane; EthioNLP |
| Idris Abdulmumin | Ahmadu Bello University, Zaria, Nigeria |
| Abinew Ali Ayele | Bahir Dar University; EthioNLP |
| David Ifeoluwa Adelani | MasaKhane; Saarland University |
| Saminu Mohammad Aliyu | Bayero University, Kano; MasaKhane |
| Nedjma Ousidhoum | University of Cambridge |
To extend the reach of emotion classification data on a global scale, we are organizing a SemEval 2025 shared task. Leveraging datasets sponsored by the Lacuna Fund, our initiative encompasses over 12 additional languages, predominantly low-resource languages in Asia and Latin America. This shared task not only facilitates broader use of the dataset but also advances research on low-resource languages worldwide.
