Discovering Collocations in Modern Greek Language

2004

Abstract

This paper describes two statistical methods for extracting collocations from Modern Greek text corpora: the mean and variance method and a method based on the χ² test. The mean and variance method computes the distances ("offsets") between words in a corpus and looks for characteristic patterns in those distances. The χ² test is applied to a sample of occurrences under a null hypothesis H₀ and checks whether the words are associated. Because the χ² test does not assume that word probabilities in the corpus are normally distributed, it appears to be the more flexible of the two. Both methods extract interesting collocations that are useful in applications such as computational lexicography, language generation, and machine translation.
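
As an illustration only, the two methods can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `offset_stats` and `chi_square_bigram`, the window size, the whitespace-style token list, and the restriction of the χ² test to adjacent word pairs in a 2x2 contingency table are all choices made here for the example. The null hypothesis H₀ is taken to be that the two words occur independently of each other.

```python
from math import sqrt


def offset_stats(tokens, w1, w2, window=5):
    """Mean and sample standard deviation of signed offsets from w1 to w2.

    Assumption: only co-occurrences within `window` tokens are counted.
    Returns (mean, std_dev, count), or None if the pair never co-occurs.
    """
    offsets = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == w2:
                offsets.append(j - i)  # positive: w2 follows w1
    n = len(offsets)
    if n == 0:
        return None
    mean = sum(offsets) / n
    # Sample variance; defined as 0 when there is a single observation.
    var = sum((d - mean) ** 2 for d in offsets) / (n - 1) if n > 1 else 0.0
    return mean, sqrt(var), n


def chi_square_bigram(tokens, w1, w2):
    """Pearson chi-square statistic for the adjacent pair (w1, w2).

    Assumption: a 2x2 contingency table over all adjacent word pairs,
    with H0 stating that w1 and w2 occur independently.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    o11 = sum(1 for a, b in bigrams if a == w1 and b == w2)
    o12 = sum(1 for a, b in bigrams if a == w1 and b != w2)
    o21 = sum(1 for a, b in bigrams if a != w1 and b == w2)
    o22 = n - o11 - o12 - o21
    denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denom == 0:
        return 0.0  # a row or column is empty; the statistic is undefined
    return n * (o11 * o22 - o12 * o21) ** 2 / denom
```

In this sketch, a low standard deviation of the offsets suggests that the two words tend to appear at a fixed distance from each other, which is the kind of pattern the mean and variance method looks for. For the χ² statistic with one degree of freedom, a value above 3.841 rejects H₀ at the 0.05 significance level, which suggests that the two words are associated.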