Daba: a model and tools for Manding corpora

Kirill Maslinsky


Abstract

This article provides a brief overview of the Daba software package, created in the course of building corpora for Manding languages. Key software features are motivated by the tasks and problems characteristic of many African languages. The corpus-building model proposed here was initially developed for the Bambara Reference Corpus, which is freely accessible online. The morphological analysis procedure and the corpus annotation scheme are discussed in detail. Daba uses a morpheme-based morphological annotation scheme inspired by the interlinear glossed format used to present linguistic examples. A scheme mapping Daba's morpheme-based morphological information onto traditional word-based corpus annotation is provided. Since Bambara is characterized by a low level of written language standardization, special attention is paid to the issues of representing variability in corpus annotation. Résumé. The article deals with the "Daba" software package created in the course of the ...

Kenneth Ngure

2013

African Language Technology is rapidly becoming one of the hottest new topics in computational linguistics. The increasing availability of digital resources, an exponentially growing number of publications and a myriad of exciting new projects are just some of the indications that African Language Technology has been firmly established as a mature field of research. The AfLaT workshops attempt to bring together researchers in the field of African Language Technology and provide a forum to present ongoing efforts and discuss common obstacles and goals. We are pleased to present to you the proceedings of the Second Workshop on African Language Technology (AfLaT 2010), which is held in conjunction with the Seventh International Conference on Language Resources and Evaluation (LREC 2010). We were overwhelmed by the quantity and quality of the submissions we received this year, but were lucky enough to have a wonderful program committee, who sacrificed their valuable time to help us pick the cream of the crop. We pay tribute to their efforts by highlighting reviewers' quotes in the next paragraphs. Grover et al. kick off the proceedings with a comprehensive overview of the HLT situation in South Africa, followed by Bański and Wójtowicz's description of an initiative that is beneficial to the creation of resources [...] for African languages. De Pauw et al. describe techniques that could be used to develop a plethora of [...] HLT resources with minimal human effort, while Shah et al. present impressive results on tackling the problem of NER in MT systems between languages, one of which at least is poorly resourced. Groenewald and du Plooy's paper tackles the all-too-often overlooked problem of text anonymization in corpus collection, followed by Chege et al.'s effort that is significant [...] to the open source community, not just for Gĩkũyũ but for the African languages in general.
Faaß presents a useful resource for further computational processing of the language of Northern Sotho. Tachbelie and Menzel provide a clear and concise overview of the general issues affecting language models for morphologically rich languages, while Van der Merwe et al. go into an informative discussion of the properties of the Zulu verb, its extensions, and deverbatives. The paper by Oosthuizen et al. aptly discusses the issue of quantifying and correcting transcription differences between inexperienced transcribers, while Davydov's paper is an interesting case study for collecting corpora for "languages recently put into writing". Ng'ang'a presents the key resource for the identification of a machine-readable dialectal dictionary for Igbo and Purvis concludes by discussing a corpus that contributes to the development of HLT tools for Dagbani. We are proud to have Justus Roux as the invited speaker for this year's edition of AfLaT to discuss one of the most often asked and rarely answered questions in our field of research: Do we need linguistic knowledge for speech technology applications in African languages? We hope you enjoy the AfLaT 2010 workshop and look forward to meeting you again at AfLaT 2011.
