All our community docs! Start here! Let's put Africa on the NLP map.
MASAKHANE is a research effort for NLP for African languages that is OPEN SOURCE, CONTINENT-WIDE, DISTRIBUTED and ONLINE. This GitHub repository houses the data, code, results and research for building open baseline NLP results for African languages.
Website: masakhane.io
Masakhane is a grassroots organisation whose mission is to strengthen and spur NLP research in African languages, for Africans, by Africans. Despite the fact that 2000 of the world’s languages are African, African languages are barely represented in technology. The tragic past of colonialism has been devastating for African languages in terms of their support, preservation and integration. This has resulted in a technological space that does not understand our names, our cultures, our places, our history.
Masakhane roughly translates to “We build together” in isiZulu. Our goal is for Africans to shape and own these technological advances towards human dignity, well-being and equity, through inclusive community building, open participatory research and multidisciplinarity.
Umuntu Ngumuntu Ngabantu - loosely translated from isiZulu, this means “a person is a person through another person” or “I am because you are”. This philosophy calls for collaboration, participation and community. It favours relationality over individualism, for stronger social cohesion and sustainable communities. It holds that we share our successes and that one’s personhood is evaluated based on one’s contributions to the community.
African-centricity - We centralize the narratives of Africans as a remedy to the effects of Eurocentrism on our beliefs. This way we reassert a new way of looking at information from an African perspective and shun any attempts to devalue our knowledge and stories.
Ownership - We believe that Africans should be in charge of owning, driving and participating in the NLP research process, rather than being observers or data providers.
Openness - We believe in sharing our ideas and progress openly, especially on the African continent, for Africans. We’re against research that takes African contributions or data and puts them behind a paywall that is infeasible for Africans to access.
Multidisciplinarity - We truly value participation from all fields and experience, and believe that multidisciplinarity leads to a more robust and more inclusive society.
Everyone has valuable knowledge - We believe that each person’s individual experiences have value, and that each person is worth listening to and has something to contribute.
Kindness - We believe that being considerate, friendly and generous within our community is the best way to support it and encourage more inclusivity.
Responsibility - We believe that each person in the technology process has an ethical responsibility for what they produce in the world. For this reason, we actively reckon with the ethical impacts of our work.
Data sovereignty - We believe Africans should be able to decide what data represents our communities globally, retain ultimate ownership of that data, and know how it is used.
Reproducibility - We believe in reproducible research. As a result, we publish our code and data from our research so that others can reproduce and build upon it.
Sustainability - We believe that sustainability is necessary for societal change: small daily efforts over a long time are what truly change the world. To that end, we aim for sustainability of our work by being fully integrated with technological stakeholders, to ensure the community continues to thrive into the future.
For Africa: To build and facilitate a community of NLP researchers, to connect and grow it, to spur and share further research, and to build helpful tools for applications in government, medicine, science and education, to enable language preservation and increase its global visibility and relevance.
For NLP Research: To build data sets and tools to facilitate NLP research on African languages, and to pose new research problems to enrich the NLP research landscape.
For the global research community: To discover best practices for distributed research, to be applied by other emerging research communities.
There are many ways to contribute to MASAKHANE.
Want more details? Check out our current initiatives
Join our Slack
Request to join our Google Group - this will add you to our weekly meetings
So that we can feature you on our webpage, masakhane.io, please fill in our membership form HERE.
Please be patient when waiting for a response from our email address; we're very behind on our administration in the time of COVID-19.
Every week we have more ideas, and more impromptu projects emerge. Keen on any initiatives? Join our Slack and find the respective group.
Working on a Masakhane initiative that is not listed here? Please add it with a PR :heart:
Keen to help on any of these initiatives? Please see our message board
Initiative | Description | Slack Channel | Repository |
---|---|---|---|
Machine Translation Benchmarks | Continued expansion and iterations on our language benchmarks as documented on the main GitHub README | #benchmarks | HERE |
NER Datasets and Benchmarks | We're busy releasing datasets and research around NER | #ner | HERE |
Dataset Creation | We never have enough data. More is always needed. We have a number of members finding creative ways to build datasets. | #datasetcreation | |
Reproducibility | The goal is to ensure reproducibility and comparability of models and results. | #reproducibility | |
Takalani NLP | Development of Language Models for South African languages | #takalani-nlp | |
Wazobia | NMT for Yoruba, Igbo, Hausa and other Nigerian languages | #wazobia | |
Multilingual Chatbot | Developing multilingual chatbots | #multilingual-dialogue | |
Transfer Learning | Transfer Learning & Multilingual Expansion of Benchmarks | #transfer-learning | |
Evaluation of Masakhane Models | How good are the Masakhane models? How can we measure it, besides looking at BLEU scores? | #evaluation | |
Text-to-speech | Corpora and models for text-to-speech synthesis (TTS) from audio Bibles in Ewe, Hausa, Lingala, Asante Twi, Akuapem Twi and Yoruba | #bible-speech | HERE |
See Code of Conduct