A Morphology-aware Network for Morphological Disambiguation

Eray Yildiz Caglar Tirkaz H. Bahadir Sahin Mustafa Tolga Eren Ozan Sonmez
Huawei Turkey Research and Development Center, Umraniye, Istanbul, Turkey
{[Link], [Link]}@[Link]
{caglartirkaz, hbahadirsahin, osonmez}@[Link]
arXiv:1702.03654v1 [[Link]] 13 Feb 2017

Abstract

Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological analysis of a word. Morphological disambiguation is important because it generally is one of the first steps of natural language processing and its performance affects subsequent analyses. In this paper, we propose a system that uses deep learning techniques for morphological disambiguation. Many of the state-of-the-art results in computer vision, speech recognition and natural language processing have been obtained through deep learning models. However, applying deep learning techniques to morphologically rich languages is not well studied. In this work, while we focus on Turkish morphological disambiguation we also present results for French and German in order to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resources. In the experiments, we achieve 84.12%, 88.35% and 93.78% morphological disambiguation accuracy among the ambiguous words for Turkish, German and French respectively.

Introduction

Morphological analysis is generally achieved through the use of finite state transducers (FSTs) (Kaplan and Kay 1981; Koskenniemi 1984; Beesley and Karttunen 2003; Oflazer 1993). During morphological analysis, the surface form of the word is given as input and an FST is used to output the possible morphological analyses of the input word. A morphological analysis contains a root and a set of tags called morphemes, the minimal units of meaning in a language (Oflazer 1993).

A morphological disambiguator is used to select the correct analysis among the possible analyses of a word using the context that the word appears in. The output of morphological disambiguation contains syntactic and semantic information about a word, such as its POS tag, tense and polarity, and whether it is accusative, possessive or genitive. This information is vital for some NLP tasks such as dependency parsing and semantic role labeling, and it can also be utilized in other NLP tasks such as topic modeling, named entity recognition and machine translation.

While morphological disambiguation is important for natural language processing in any language, it is vital in morphologically rich languages. We specifically focus on Turkish, an important language spoken by over 70 million people, whose complex morphology allows the construction of thousands of word forms from each root through inflectional and derivational suffixation (Hakkani-Tür, Oflazer, and Tür 2000). For instance, yürü (walk), yürüdüm (I walked), yürüyeceksiniz (you will walk), yürüttüler (they made somebody walk), yürüyünce (when (he/she/it) walks) and yürüyecektiler (they were going to walk) are some of the possible word formations of the Turkish verb root yürü. In all the examples "yürü" is the root of the word whereas the suffixes are used to change meaning.

Morphological analysis of a word might produce more than one analysis since there might be multiple interpretations of a single word. Consider the example given in Table 1 where the Turkish word "dolar" is analyzed. The output of the morphological analyzer for this word contains four possible analyses. The reason is that each of "dolar", "dola", "dol" and "do" can be used as a root, while at the same time "r", "ar" and "lar" are all valid suffixes in Turkish. Thus, all four analyses are valid and lead to quite different meanings.

Table 1: Morphological analyses of the Turkish word "dolar"

Morphological Analysis | English Translation
dolar +Noun +3sg +Pnon +Nominative | dollar
dola +Verb +Positive +Aorist +3sg | he/she wraps
dol +Verb +Positive +Aorist +3sg | it fills
do +Noun +3pl +Pnon +Nominative | multiple C (musical note)

Copyright © 2016, Association for the Advancement of Artificial Intelligence ([Link]). All rights reserved.

Another reason for multiple morphological analyses is that a morpheme might change meaning depending on the context of the word. Consider the example given in Table 2. In the first row "evi" is used in the accusative case whereas in the second row it is used as a possessive noun. The word "evi" has two morphological analyses sharing the same root; thus, the only difference between the analyses is at the suffix of the word. The suffix "-i" in the first sentence marks the accusative, while its interpretation is "third person possessive" in the second one. In addition, some root words might have multiple meanings. For instance, the Turkish word "yüz" could be interpreted as a noun (face), a verb (swim) or a number (hundred) depending on its context.

Table 4: Possible word formations and morphological analyses of the French word "savoir"

Word | Morphological Analyses
savoir | Noun Masculine Singular; Verb Infinitive
sais | Verb Present SecondPerson Singular; Verb Present FirstPerson Singular
savons | Verb Present FirstPerson Plural
savaient | Verb Imperfect ThirdPerson Plural
saches | Verb Subjunctive SecondPerson Singular
sachant | Verb Present Participle
su | Verb Past Participle Masculine Singular

Table 2: Multiple interpretations of the Turkish word "evi"
Turkish sentence and its translation | Morphological Analysis
Evi bulabildiniz mi? (Did you find the house?) | ev +Noun +3sg +Pnon +Accusative
Evi gerçekten güzelmiş. (His/her house is really beautiful.) | ev +Noun +3sg +P3sg +Nominative

As we noted before, morphological disambiguation is important for NLP in most languages. For instance, although German and French do not have a morphology as rich as Turkish, NLP in these languages can still benefit from morphological disambiguation. Higher accuracies in NLP tasks such as POS tagging and dependency parsing can be obtained if the morphology of the words is taken into account (Sennrich et al. 2009; Candito et al. 2010). We apply our general-purpose morphological disambiguation method to these languages and show that high accuracies can be obtained for POS tagging and lemmatization. Possible word formations and morphological analyses of the German word "Haus" and the French word "savoir" are given in Table 3 and Table 4 respectively.

Table 3: Possible word formations and morphological analyses of the German word "Haus"

Word | Morphological Analyses
haus | Noun Neuter Nominative Singular; Noun Neuter Dative Singular; Noun Neuter Accusative Singular
häuser | Noun Neuter Accusative Plural; Noun Neuter Nominative Plural; Noun Neuter Genitive Plural
häusern | Noun Neuter Dative Plural
hause | Noun Neuter Dative Singular
hauses | Noun Neuter Genitive Singular

There are various approaches to morphological disambiguation based on lexical rules or statistical models. Rule-based methods apply hand-crafted rules in order to select the correct morphological analyses or eliminate incorrect ones (Oflazer and Kuruöz 1994; Oflazer and Tur 1996; Daybelge and Cicekli 2007). Yüret and Türe (2006) proposed a decision list learning algorithm for the extraction of Turkish morphological disambiguation rules from disambiguated training data. Tür et al. (2000) developed a statistical model which scores the probability of each analysis using trigram models of the tags and roots. Sak et al. (2007) applied a multilayer perceptron algorithm which uses n-grams of the roots and tags as features. A CRF-based disambiguation model was proposed by Razieh et al. (2012). Finally, hybrid models which combine statistical and rule-based approaches have also been proposed (Oflazer and Tur 1996; Kutlu and Cicekli 2013).

We propose a deep neural architecture followed by the Viterbi algorithm for the morphological disambiguation of words in a sentence. In this paper we focus on Turkish as an example even though the proposed model can be utilized in all morphologically rich languages. We test our approach on German and French in order to show that the proposed method also works well for other languages. The network architecture in this paper is designed to produce a classification score for a sequence of n words. It consists of two layers and a softmax layer. The first layer of the model builds a representation for each word using root embeddings and some syntactic and semantic features. The second layer takes as input the learned word representations and incorporates contextual information. A softmax layer uses the output of the second layer to produce a classification score. We use the neural network to produce a score for each length-n sequence in a given sentence. We then employ the Viterbi algorithm to produce the morphological disambiguation for each word in the sentence by finding the most probable sequence using the output of the softmax layer.

Related Works

In a natural language processing pipeline, morphological disambiguation can be considered at the same level as POS tagging. For POS tagging in English, various approaches such as rule-based models (Brill 1992), statistical models (Brill 1995), maximum entropy models (Ratnaparkhi 1997), HMMs (Cutting et al. 1992), CRFs (Lafferty, McCallum, and Pereira 2001) and decision trees (Schmid 1994) have been proposed. However, morphological disambiguation is a much harder problem in general because it requires the classification of roots, suffixes and the corresponding labels. Moreover, compared to an agglutinative language such as Turkish, English words can take on only a limited number of word forms and part-of-speech tags. Yüret and Türe (2006) observe that more than ten thousand tag types exist in a corpus comprised of a million Turkish words. Thus, due to the high number of possible tags and the number of possible analyses in languages with productive morphology, morphological disambiguation is quite different from part-of-speech tagging in English.

The previous work on morphological disambiguation in morphologically rich languages can be summarized into three categories: rule-based, statistical and hybrid approaches. In the rule-based approaches a large number of hand-crafted rules are used to select the correct morphological analyses or to eliminate incorrect ones (Karlsson et al. 1995; Oflazer and Kuruöz 1994; Oflazer and Tur 1996; Daybelge and Cicekli 2007). Statistical approaches generally utilize the statistics of root and tag sequences for the selection of the best roots and tags. A statistical Turkish morphological disambiguation model which scores the probability of each tag by considering statistics over the derivational boundaries and roots using trigrams was proposed by Tür et al. (2000). They test their model on manually disambiguated test data consisting of 2763 words and obtain 93.5% accuracy in morphological disambiguation (including non-ambiguous words). A similar morphology-aware nonparametric Bayesian model is proposed in (Chahuneau, Smith, and Dyer 2013). They integrate their generative model into NLP applications such as language modeling, word alignment and morphological disambiguation and obtain state-of-the-art results for Russian morphological disambiguation. Yüret and Türe (2006) extract Turkish morphological disambiguation rules using a decision list learner, the Greedy Prepend Algorithm (GPA), and achieve 95.8% accuracy on manually disambiguated data consisting of around 1K words. Megyesi (1999) adapts a transformation-based syntactic rule learner (Brill 1995) for Hungarian and Hajic (1998) extends this work to Czech and five other languages. Sak et al. (2007) apply a multilayer perceptron algorithm using a set of 23 features including trigram and bigram statistics of morphological tags and roots. They obtain 96.8% accuracy on test data consisting of 2.5K words. Ehsani et al. (2012) apply conditional random fields (CRFs) using several features derived from morphological and syntactic properties of words and achieve 96.8% accuracy. Görgün and Yildiz (2011) use a J48 decision tree and achieve 95.6% accuracy. There are also several studies that combine statistical and rule-based approaches, such as (Ezeiza et al. 1998; Oflazer and Tur 1996; Kutlu and Cicekli 2013; Orosz and Novák 2013).

Although deep learning techniques have been successfully used in various NLP tasks in English (Collobert and Weston 2008; Collobert et al. 2011; Le and Mikolov 2014; Pennington, Socher, and Manning 2014; Luong, Socher, and Manning 2013; Socher et al. 2012), this study is unique in that we create a deep learning architecture specifically suited to handling morphologically rich languages. One similar work to ours is the recent work of Luong et al. (2013), who introduce morphological RNNs to create word representations through the composition of morphemes. However, they present their results only for English, which is not morphologically as rich as languages such as Turkish or Finnish. There are also recent works that suggest integrating morphological knowledge into distributed word representations, such as (Cotterell and Schütze 2015) and (Cui et al. 2015). Cotterell and Schütze (2015) extend the log-bilinear model (an instance of language models that make the Markov assumption, as n-gram language models do) in order to jointly predict the next morphological tag along with the next word, encouraging the resulting embeddings to encode morphology. On the other hand, (Cui et al. 2015) propose a method for learning embeddings which is a modified version of the skip-gram algorithm (Mikolov et al. 2013) that benefits from morphological knowledge when predicting the target word. Using morphology-based word representations improves performance on different NLP tasks such as word similarity and statistical machine translation according to the empirical evaluation of Botha and Blunsom (2014).

Our work uses a convolutional architecture and handles any number of morphological features in order to build word representations while performing disambiguation at the same time.

Method

In this work we propose an architecture with the ability to represent morphologically rich words and model spatial dependencies among word vectors. A softmax layer trained on top of these layers is used to predict the likelihood of a window of words. Finally, the Viterbi algorithm is used on the outputs of the softmax layer in order to find the optimal disambiguation of the words in a sentence. We also show how unsupervised pre-training can be used to improve the performance of the designed system and achieve state-of-the-art accuracy for Turkish morphological disambiguation.

The input to our model is a sentence where each word in the sentence needs to be disambiguated. We first tokenize the sentences and then use morphological analyzers to find the possible analyses of each word in the sentence. The HFST tool (Lindén, Silfverberg, and Pirinen 2009) is used to perform morphological analysis in German and French, whereas the analyzer of (Oflazer 1993) is used for Turkish morphological analysis. NLP systems that use deep learning generally employ word embeddings in order to represent each word in a dictionary. Word embeddings are dense low-dimensional real-valued vectors with each dimension corresponding to a latent feature of the word (Turian, Ratinov, and Bengio 2010). In a morphologically rich language, representing words in surface form might not be a good idea since many surface forms can be derived from a single root. Thus, in our design, each word in surface form is represented with a root and a set of morphological features, where each root and feature has individual embeddings that are learned during training. Root and morphological feature embeddings can have varying lengths, and through their concatenation surface form words are represented with fixed-length embeddings.

Our architecture is illustrated in Figure 1, where individual layers are marked with (a), (b) and (c). The first layer, (a), takes the root and morphological features of a single word as input and propagates to the next layer.
Figure 1: Structure of the morphology-aware model. Layer (a) constructs word vectors using the morphological features of the input word. Layer (b) allows the model to utilize contextual information by considering a window of words. Layer (c) is the softmax layer that makes a binary decision of whether or not the current disambiguation result is correct.
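To make the data flow through layers (a)-(c) concrete, here is a minimal plain-Python sketch of the forward pass: a window of concatenated word vectors is fed through a tanh layer into a two-way softmax. The dimensions, weight initialization and the dense (rather than convolutional) layers are our own simplification for illustration, not the authors' exact implementation.

```python
import math
import random

random.seed(0)

def dense(x, W, b):
    # fully connected layer; W has one row of weights per output unit
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def tanh_layer(x, W, b):
    # tanh is the transfer function used in the first and second layers
    return [math.tanh(v) for v in dense(x, W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init(out_dim, in_dim):
    W = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return W, b

# toy input: a window of 5 words, each represented by an 8-dim
# concatenated root+feature embedding (the output of layer (a))
window = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
x = [v for word in window for v in word]   # concatenate the window

W1, b1 = init(30, len(x))                  # layer (b): contextual layer
W2, b2 = init(2, 30)                       # layer (c): scores for softmax

h = tanh_layer(x, W1, b1)
p = softmax(dense(h, W2, b2))              # [P(incorrect), P(correct)]
```

The binary softmax output plays the role of the likelihood that the current morphological analysis of the window is correct.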

The second layer, (b), takes a window of n words as input and propagates to the softmax layer, (c). The non-linearity in both the first and the second layers is provided through the use of tanh as the transfer function. The softmax layer is responsible for deciding the likelihood of the current morphological analysis of the words, i.e., a binary decision is produced with the expected result of 1 if the analysis is correct and 0 otherwise.

We train our network with the possible sequences of morphological analyses in the training data. For each sentence, and for each word, we select the n-2 words preceding the word and their ground-truth annotations along with the possible annotations of the last two words. We also add n-1 out-of-sentence tokens at the beginning of each sentence so that all words in the sentence are included in the training data. We label the sequences containing the correct morphological analysis as positive whereas the remaining sequences are labeled as negative. This way the model is trained to predict the correct annotation for the last two words in a sequence given that the first n-2 words have correct annotations. Training is performed with stochastic gradient descent and AdaGrad (Duchi, Hazan, and Singer 2011) as the optimization algorithms. At inference time, given a sentence containing words to disambiguate, we use the network to make predictions for windows of words in the sentence and then use the Viterbi algorithm to select the best morphological analysis for each word.

Unsupervised pre-training of word embeddings has been employed in various NLP tasks, and its usage has improved recognition accuracies (Collobert et al. 2011; Turian, Ratinov, and Bengio 2010). In order to improve the performance of our disambiguation system we also use unsupervised methods to pre-train the root embeddings of words. We created a corpus comprised of 1 billion Turkish words that we collected from various sources, such as e-books and web sites. Although our corpus is rather small compared to English corpora, it is the largest text corpus in Turkish that we know of. After we trained the supervised disambiguation system as described above, we disambiguated each word in the corpus and extracted the roots of the words. Next, we built representations for the root forms of the words using the unsupervised skip-gram algorithm (Mikolov et al. 2013). After obtaining the pre-trained root vectors, we retrained our disambiguation system with pre-trained root embeddings. This technique allowed us to further improve the disambiguation accuracies we obtained.

As discussed earlier, the first layer takes as input the root and the morphological features of a word. The morphological features we use are presented in Table 5. Specifically, the set of morphological features we consider contains the root, main POS tag, minor POS tag, person and possessive agreements, plurality, gender, case marker, polarity and tense. Note that the information contained in a surface word form may differ due to the morphological characteristics of a language. For instance, German and French have a gender feature contrary to Turkish, while Turkish words have possessive agreement and polarity. Main POS tag describes the category of a word and can take on values such as noun, verb, adjective and adverb.
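The inference step described above, scoring windows with the network and then running the Viterbi algorithm over the per-word candidate analyses, can be sketched as follows. The scoring function and the candidate lists here are illustrative stand-ins for the softmax outputs and the morphological analyzer's output, not the authors' code.

```python
def viterbi(candidates, score):
    """candidates: one list of candidate analyses per word in the sentence.
    score(prev, curr): score of analysis `curr` following `prev`
    (a stand-in for the network's softmax output on a window)."""
    # best[i][c] = (best path score ending in candidate c at word i, backpointer)
    best = [{c: (score(None, c), None) for c in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for c in candidates[i]:
            prev, (s_prev, _) = max(best[i - 1].items(),
                                    key=lambda kv: kv[1][0] + score(kv[0], c))
            layer[c] = (s_prev + score(prev, c), prev)
        best.append(layer)
    # backtrack from the best final candidate
    path = [max(best[-1], key=lambda c: best[-1][c][0])]
    for i in range(len(candidates) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# toy example: two words, the first one ambiguous
cands = [["dolar+Noun", "dola+Verb"], ["mi+Ques"]]

def toy_score(prev, curr):
    # hypothetical network scores favoring the "dolar+Noun" reading
    table = {(None, "dolar+Noun"): 0.9, (None, "dola+Verb"): 0.1,
             ("dolar+Noun", "mi+Ques"): 0.8, ("dola+Verb", "mi+Ques"): 0.3}
    return table[(prev, curr)]

result = viterbi(cands, toy_score)
print(result)  # ['dolar+Noun', 'mi+Ques']
```

The dynamic program keeps, for every candidate analysis of the current word, the best-scoring path that ends in it, so the globally most probable sequence is recovered without enumerating all combinations.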
Table 5: The morphological (morphosyntactic and morphosemantic) features we used to represent each word

Language | Root | Main POSTag | Minor POSTag | Person Agreement | Plurality | Gender | Possessive Agreement | Case Marker | Polarity | Tense
Turkish | + | + | + | + | + | - | + | + | + | +
German | + | + | + | + | + | + | - | + | - | +
French | + | + | - | + | + | + | - | - | - | +
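The fixed-length word representation described in the Method section, a concatenation of the root embedding with one embedding per morphological feature (with NULL standing in for features a word lacks), might look like the sketch below. The feature subset, lookup tables and random initialization are illustrative assumptions, not the paper's trained values.

```python
import random

random.seed(42)

# illustrative embedding sizes (the paper reports 50 for roots, 20 for
# POS tags and 5 for the remaining morphological features)
SIZES = {"root": 50, "main_pos": 20, "case": 5, "polarity": 5, "tense": 5}
FEATURES = list(SIZES)

tables = {f: {} for f in FEATURES}  # lazily grown embedding lookup tables

def embedding(feature, value):
    table = tables[feature]
    if value not in table:
        table[value] = [random.uniform(-0.1, 0.1) for _ in range(SIZES[feature])]
    return table[value]

def represent(analysis):
    """analysis: dict mapping feature name -> value; a feature that cannot
    be extracted from the word is labeled NULL, exactly as in the paper."""
    vec = []
    for f in FEATURES:
        vec.extend(embedding(f, analysis.get(f, "NULL")))
    return vec

# "dolar +Noun +3sg +Pnon +Nominative": a noun carries no polarity or tense,
# so those slots fall back to the shared NULL embedding
v = represent({"root": "dolar", "main_pos": "Noun", "case": "Nom"})
print(len(v))  # 85 = 50 + 20 + 5 + 5 + 5
```

Because every feature slot is always filled, either with a learned value embedding or with NULL, all surface words map to vectors of the same fixed length, which is what the convolutional layers above require.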

Minor POS tag determines the minor morphological properties of a word, such as semantic markers, causative markers and post-positions. "Since", "While", "Propernoun" and "Without" can be given as examples of this kind of morphological feature in Turkish. Person and possessive agreement are used to answer the questions "who" and "whose" respectively, i.e., they are used to indicate a person or an ownership relationship. Case markers relate nouns to the rest of the sentence as prepositions do in English. Nominative (none), dative (to, for), locative (at, in, on), ablative (from, out of) and genitive (of) are examples of the forms that can be observed in a sentence. The polarity of a word is positive if the word is not negated and negative otherwise. Tense indicates the tense of verbs, such as the present, past and future tenses. Additionally, we consider the moods of verbs within the tense feature. Moods express the speaker's attitude, such as the indicative, imperative or subjunctive moods. In languages with grammatical gender such as German and French, every noun is associated with a gender. The morphological analyzer we use associates each French word with one of two genders (masculine and feminine) while it associates each German word with one of four possible genders (masculine, feminine, neuter and no gender). Some of the suffixes in Turkish change word meaning, creating derivational boundaries in the morphological analyses. The morphological features of a word given in Table 5 are extracted after the final derivational boundary. In Turkish, we add one more feature to each word, named previous tags, in order to account for the previous suffixes that the word might have. This way, our model learns the effect of suffixes that change word meaning. Some of the described morphological features exist only for certain word categories. For instance, possessive agreement and case marker features can only exist in nouns, polarity and tense exist in verbs, and person agreement exists in nouns and verbs. If a morphological feature cannot be extracted from a word, we label it as having NULL for that feature.

Experiments

For Turkish, we used a semi-automatically disambiguated corpus containing 1M tokens (Yüret and Türe 2006). Since this dataset is annotated semi-automatically, it contains some noise. In order to reduce the effect of noise on the recognition accuracies, we created a test set by randomly selecting sentences containing 20K of the tokens and manually annotating them. We make this test data publicly available¹ so that Turkish morphological disambiguation algorithms can be compared more accurately in the future.

¹ [Link]

We use the SPMRL 2014 dataset (Seddah and Tsarfaty 2014) for German and French. This dataset is created in the Penn treebank format and was used for a shared task on statistical parsing of morphologically rich languages. It contains 1M and 500K sentences with POS tag and morphological information for German and French respectively, providing 90% of the sentences as a training set and the remaining 10% as a test set. We align the features in the treebank to the HFST outputs in order to determine the correct morphological analyses generated by the HFST tool. We use this dataset for both training and testing. The development sets for each language are randomly separated from the training data and are used to optimize the embedding lengths of the morphological features.

We noticed that similar parameters lead to the best performance across languages. Thus, in the experiments, we used embedding lengths of 50, 20 and 5 for the roots, the POS tags and the other morphological features respectively. The numbers of filters in the first and second layers are 30 and 40 respectively. The window length n, which determines the number of words input to the second layer, is set to 5.

Table 6: POS tagging, lemmatization and morphological disambiguation accuracies of the proposed approach for Turkish, German and French.

Task | Turkish (%) | German (%) | French (%)
POS Tagging | 96.85 | 98.35 | 98.47
Lemmatization | 97.59 | 95.95 | 99.52
Morph. disamb. | 84.12 | 88.35 | 93.78

The experimental results for POS tagging, lemmatization and morphological disambiguation in Turkish, German and French are presented in Table 6. Notice that the POS tagging and lemmatization accuracies refer to the percentages of POS tags and lemmas predicted correctly, while the morphological disambiguation accuracies refer to the percentages of words disambiguated correctly among the ambiguous words. According to the results, we observe that even though our initial target was Turkish morphological disambiguation, our model consistently obtains high accuracies in French and German as well.

In Table 7, we present the results of various models for Turkish morphological disambiguation on our hand-labeled test data. The results of the multilayer perceptron developed in (Sak, Güngör, and Saraçlar 2007) and the decision list learning algorithm developed in (Yüret and Türe 2006) are presented in lines 1 and 2 respectively. We present the Turkish morphological disambiguation results obtained by our model with and without pre-training in lines 3 and 4 respectively.
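The evaluation metric used throughout the tables above, accuracy measured only over the ambiguous words, can be sketched as follows. The token strings and the way candidates are represented are hypothetical; only the exclusion of unambiguous tokens from the denominator reflects the paper's setup.

```python
def disambiguation_accuracy(gold, predicted, candidates):
    """Accuracy among ambiguous words only: tokens with a single
    candidate analysis are excluded, since the analyzer forces them."""
    correct = total = 0
    for g, p, cands in zip(gold, predicted, candidates):
        if len(cands) < 2:          # unambiguous token: skip
            continue
        total += 1
        correct += (g == p)
    return correct / total if total else 1.0

# toy sentence with two ambiguous tokens, one predicted correctly
gold       = ["ev+Acc", "bul+Verb", "ev+P3sg"]
predicted  = ["ev+Acc", "bul+Verb", "ev+Acc"]
candidates = [["ev+Acc", "ev+P3sg"], ["bul+Verb"], ["ev+Acc", "ev+P3sg"]]
acc = disambiguation_accuracy(gold, predicted, candidates)
print(acc)  # 0.5
```

This is why the disambiguation numbers in Table 6 are much lower than the POS tagging and lemmatization numbers: the easy, unambiguous tokens never enter the denominator.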
Table 7: Comparison of the disambiguation accuracy of the proposed approach with the state-of-the-art models in Turkish.

Method | Accuracy (%)
Multilayer Perceptron | 82.13
Decision List | 83.31
Proposed Model, w/o pre-training | 84.12
Proposed Model, with pre-training | 85.18

As we discussed before, unsupervised pre-training of the embeddings can boost the accuracies of neural networks. As expected, morphological disambiguation accuracy increases by around 1% (around a 6% reduction in error) when the root embeddings are pre-trained instead of randomly initialized. We see that even without unsupervised pre-training our algorithm outperforms the current state-of-the-art models, and we are able to further improve the accuracy by pre-training the embeddings.

Although we do not evaluate the effects of unsupervised pre-training for German and French, it is expected that higher accuracies can be achieved using unsupervised pre-training of the embeddings for these languages as well. Error analysis for Turkish morphological disambiguation shows that the root is incorrectly decided in 30% of the errors. The root is correct but the POS tag is incorrectly decided in 40% of the errors, while 30% of the errors are caused by wrong decisions on other inflectional groups. Compared with the study of Sak et al. (2007), there is no significant difference in the distribution of mistakes. However, our method performs better in root decisions due to the unsupervised learning of root embeddings. As discussed before, the available data for the Turkish morphological disambiguation task contains some systematic errors. Yüret and Türe (2006) report that the accuracy of the training data is below 95%. According to our observations there is a major confusion between noun and adjective POS tags in the training data, which affects the decisions of morphological disambiguation systems. In our experiments, we observe that 18% of the errors are caused by such confusion, whereas the ratio of these errors is reported as 22% in the experiments of Sak et al. (2007).

Summary and Future Work

In this paper, we present a model capable of learning word representations for languages with rich morphology. We show the utility of our approach on the task of Turkish, German and French morphological disambiguation. We also show the effect of unsupervised pre-training on recognition accuracies and improve the current state of the art in Turkish morphological disambiguation. We publicly make available a manually annotated test set containing 20K tokens, which we believe will benefit Turkish NLP.

This paper presents a deep learning architecture specifically aiming to handle morphologically rich languages. Nonetheless, NLP systems that work on languages such as English can also benefit from our work. Using our model, English words can be separated into morphemes so that they can be better represented. This allows creating systems that are less affected by problems such as data sparsity (Luong, Socher, and Manning 2013).

While using pre-training, we only considered the pre-trained root embeddings. It would be preferable to pre-train all the embeddings using our text corpus, which we leave as future work. Another point of note is the selected embedding sizes that we used in our experiments. While we worked on a development set separated from the training data for parameter selection, further investigation of parameter selection might improve the obtained accuracies.

Acknowledgments

This project is partially funded by TUBITAK-TEYDEB (The Scientific and Technological Research Council of Turkey, Technology and Innovation Funding Programs Directorate) under project number 3140951.

References

[Beesley and Karttunen 2003] Beesley, K. R., and Karttunen, L. 2003. Finite State Morphology. Center for the Study of Language and Information.
[Botha and Blunsom 2014] Botha, J. A., and Blunsom, P. 2014. Compositional morphology for word representations and language modelling. arXiv preprint arXiv:1405.4273.
[Brill 1992] Brill, E. 1992. A simple rule-based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, ANLC '92, 152–155. Stroudsburg, PA, USA: Association for Computational Linguistics.
[Brill 1995] Brill, E. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Comput. Linguist. 21(4):543–565.
[Candito et al. 2010] Candito, M.; Nivre, J.; Denis, P.; and Anguiano, E. H. 2010. Benchmarking of statistical dependency parsers for French. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 108–116. Association for Computational Linguistics.
[Chahuneau, Smith, and Dyer 2013] Chahuneau, V.; Smith, N. A.; and Dyer, C. 2013. Knowledge-rich morphological priors for Bayesian language models. Association for Computational Linguistics.
[Collobert and Weston 2008] Collobert, R., and Weston, J. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, 160–167. New York, NY, USA: ACM.
[Collobert et al. 2011] Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; and Kuksa, P. 2011. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12:2493–2537.
[Cotterell and Schütze 2015] Cotterell, R., and Schütze, H. 2015. Morphological word-embeddings. In Annual Conference of the North American Chapter of the ACL, 1287–1292.
[Cui et al. 2015] Cui, Q.; Gao, B.; Bian, J.; Qiu, S.; Dai, H.; and Liu, T.-Y. 2015. KNET: A general framework for learning word embedding using morphological knowledge. ACM Transactions on Information Systems (TOIS) 34(1):4.
[Cutting et al. 1992] Cutting, D.; Kupiec, J.; Pedersen, J.; and Sibun, P. 1992. A practical part-of-speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, ANLC '92, 133–140. Stroudsburg, PA, USA: Association for Computational Linguistics.
[Daybelge and Cicekli 2007] Daybelge, T., and Cicekli, I. 2007. A rule-based morphological disambiguator for Turkish. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2007), Borovets, 145–149.
[Lindén, Silfverberg, and Pirinen 2009] Lindén, K.; Silfverberg, M.; and Pirinen, T. 2009. HFST tools for morphology: an efficient open-source package for construction of morphological analyzers. In State of the Art in Computational Morphology. Springer. 28–47.
[Luong, Socher, and Manning 2013] Luong, T.; Socher, R.; and Manning, C. D. 2013. Better word representations with recursive neural networks for morphology. In Proceedings of the Seven-
[Duchi, Hazan, and Singer 2011] Duchi, J.; Hazan, E.; and Singer, teenth Conference on Computational Natural Language Learning,
Y. 2011. Adaptive subgradient methods for online learning and CoNLL 2013, Sofia, Bulgaria, August 8-9, 2013, 104–113.
stochastic optimization. The Journal of Machine Learning Re- [Megyesi 1999] Megyesi, B. 1999. Improving brill’s pos tagger
search 12:2121–2159. for an agglutinative language. In Proceedings of the Joint SIGDAT
[Ehsani et al. 2012] Ehsani, R.; Alper, M. E.; Eryigit, G.; and Adali, Conference on Empirical Methods in Natural Language Process-
E. 2012. Disambiguating main POS tags for turkish. In Pro- ing and Very Large Corpora, 275–284.
ceedings of the 24th Conference on Computational Linguistics and [Mikolov et al. 2013] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado,
Speech Processing, ROCLING 2012, Yuan Ze University, Chung- G. S.; and Dean, J. 2013. Distributed representations of words and
Li, Taiwan, September 21-22, 2012. phrases and their compositionality. In Advances in neural informa-
tion processing systems, 3111–3119.
[Ezeiza et al. 1998] Ezeiza, N.; Alegria, I.; Arriola, J. M.; Urizar,
R.; and Aduriz, I. 1998. Combining stochastic and rule-based [Oflazer and Kuruöz 1994] Oflazer, K., and Kuruöz, I. 1994. Tag-
methods for disambiguation in agglutinative languages. In Pro- ging and morphological disambiguation of turkish text. In Pro-
ceedings of the 36th Annual Meeting of the Association for Com- ceedings of the Fourth Conference on Applied Natural Language
putational Linguistics and 17th International Conference on Com- Processing, ANLC ’94, 144–149. Stroudsburg, PA, USA: Associ-
putational Linguistics-Volume 1, 380–384. Association for Com- ation for Computational Linguistics.
putational Linguistics. [Oflazer and Tur 1996] Oflazer, K., and Tur, G. 1996. Combining
[Görgün and Yildiz 2011] Görgün, O., and Yildiz, O. T. 2011. A hand-crafted rules and unsupervised learning in constraint-based
novel approach to morphological disambiguation for turkish. In morphological disambiguation. In Conference on Empirical Meth-
Computer and Information Sciences II - 26th International Sympo- ods in Natural Language Processing, 69–81.
sium on Computer and Information Sciences, London, UK, 26-28 [Oflazer 1993] Oflazer, K. 1993. Two-level description of turkish
September 2011, 77–83. morphology. In Proceedings of the Sixth Conference on European
[Hajic and Hladka 1998] Hajic, J., and Hladka, B. 1998. Czech Chapter of the Association for Computational Linguistics, EACL
language processing—pos tagging. In Proceedings of the First ’93, 472–472. Stroudsburg, PA, USA: Association for Computa-
International Conference on Language Resources & Evaluation, tional Linguistics.
931–936. [Orosz and Novák 2013] Orosz, G., and Novák, A. 2013. Purepos
[Hakkani-Tür, Oflazer, and Tür 2000] Hakkani-Tür, D. Z.; Oflazer, 2.0: a hybrid tool for morphological disambiguation. In RANLP,
K.; and Tür, G. 2000. Statistical morphological disambiguation 539–545.
for agglutinative languages. In Proceedings of the 18th Conference [Pennington, Socher, and Manning 2014] Pennington, J.; Socher,
on Computational Linguistics - Volume 1, COLING ’00, 285–291. R.; and Manning, C. 2014. Glove: Global vectors for word rep-
Stroudsburg, PA, USA: Association for Computational Linguistics. resentation. In Proceedings of the 2014 Conference on Empirical
[Kaplan and Kay 1981] Kaplan, R. M., and Kay, M. 1981. Phono- Methods in Natural Language Processing (EMNLP), 1532–1543.
logical rules and finite-state transducers. In Linguistic Society of Doha, Qatar: Association for Computational Linguistics.
America Meeting Handbook, Fifty-Sixth Annual Meeting, 27–30. [Ratnaparkhi 1997] Ratnaparkhi, A. 1997. A maximum entropy
[Karlsson et al. 1995] Karlsson, F.; Voutilainen, A.; Heikkila, J.; model for part-of-speech tagging. In EMNLP 1997.
and Anttila, A., eds. 1995. Constraint Grammar: A Language- [Sak, Güngör, and Saraçlar 2007] Sak, H.; Güngör, T.; and
Independent System for Parsing Unrestricted Text. Mouton de Saraçlar, M. 2007. Morphological disambiguation of Turkish text
Gruyter. with perceptron algorithm. In CICLing 2007, volume LNCS 4394,
[Koskenniemi 1984] Koskenniemi, K. 1984. A general computa- 107–118.
tional model for word-form recognition and production. In Pro- [Schmid 1994] Schmid, H. 1994. Probabilistic part-of-speech tag-
ceedings of the 10th international conference on Computational ging using decision trees. In Proceedings of the international con-
Linguistics, 178–181. Association for Computational Linguistics. ference on new methods in language processing. Vol. 12.
[Kutlu and Cicekli 2013] Kutlu, M., and Cicekli, I. 2013. A hybrid [Seddah and Tsarfaty 2014] Seddah, D., and Tsarfaty, R. 2014. In-
morphological disambiguation system for turkish. In Sixth Interna- troducing the spmrl 2014 shared task on parsing morphologically-
tional Joint Conference on Natural Language Processing, IJCNLP rich languages. SPMRL-SANCL 2014 103.
2013, Nagoya, Japan, October 14-18, 2013, 1230–1236. [Sennrich et al. 2009] Sennrich, R.; Schneider, G.; Volk, M.; and
[Lafferty, McCallum, and Pereira 2001] Lafferty, J.; McCal- Warin, M. 2009. A new hybrid dependency parser for german.
lum, A.; and Pereira, F. C. 2001. Conditional random fields: Proceedings of the German Society for Computational Linguistics
Probabilistic models for segmenting and labeling sequence data. and Language Technology 115–124.
[Le and Mikolov 2014] Le, Q., and Mikolov, T. 2014. Distributed [Socher et al. 2012] Socher, R.; Huval, B.; Manning, C. D.; and Ng,
representations of sentences and documents. In Jebara, T., and A. Y. 2012. Semantic compositionality through recursive matrix-
Xing, E. P., eds., Proceedings of the 31st International Conference vector spaces. In Proceedings of the 2012 Joint Conference on
on Machine Learning (ICML-14), 1188–1196. JMLR Workshop Empirical Methods in Natural Language Processing and Compu-
and Conference Proceedings. tational Natural Language Learning, EMNLP-CoNLL ’12, 1201–
1211. Stroudsburg, PA, USA: Association for Computational Lin-
guistics.
[Turian, Ratinov, and Bengio 2010] Turian, J.; Ratinov, L.; and
Bengio, Y. 2010. Word representations: A simple and general
method for semi-supervised learning. In Proceedings of the 48th
Annual Meeting of the Association for Computational Linguistics,
ACL ’10, 384–394. Stroudsburg, PA, USA: Association for Com-
putational Linguistics.
[Yüret and Türe 2006] Yüret, D., and Türe, F. 2006. Learning mor-
phological disambiguation rules for turkish. In Proceedings of the
Main Conference on Human Language Technology Conference of
the North American Chapter of the Association of Computational
Linguistics, HLT-NAACL ’06, 328–334. Stroudsburg, PA, USA:
Association for Computational Linguistics.
