Building an annotated corpus for Amazighe

2013

Abstract

This paper gives an overview of the morpho-syntactic features of the Amazighe language and of corpus encoding. We then present our experience of constructing a corpus annotated with part-of-speech (POS) information. The annotated corpus consists of 20,667 Moroccan Amazighe tokens drawn from different materials; to our knowledge, it is the first such corpus for the Amazighe language. The account is also meant to give insight into the encoding and tagging processes used for this corpus.