1996, Arxiv preprint cmp-lg/9608013
128 pages. This thesis is about the computational morphological analysis and generation of Turkish word forms. The Turkish morphological description is encoded using the two-level morphological model. This description consists of a phonological component that contains the two-level morphophonemic rules, and a lexicon component that lists lexical items (indivisible words and affixes) and encodes the morphotactic constraints. Within the scope of the study, a generic word grammar in tabular form expressing the ordering relationships among morphemes is designed, and morphophonemic processes, along with solutions to exceptional cases, are formulated.
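As a rough illustration of the lexical-to-surface mapping that two-level morphophonemic rules describe, the sketch below resolves a single archiphoneme. It is not the thesis's rule set: the function name `realize`, the `+` boundary symbol, and the use of uppercase `A` for the low unrounded vowel archiphoneme are assumptions made for this example, and only two-way (front/back) vowel harmony is modelled.

```python
# Minimal sketch, assuming 'A' marks a vowel that surfaces as 'a' or 'e'
# depending on the preceding vowel, and '+' marks a morpheme boundary.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def realize(lexical: str) -> str:
    """Map a lexical form such as 'ev+lAr' to a surface form such as 'evler'."""
    surface = []
    last_vowel = None
    for ch in lexical:
        if ch == "+":
            continue  # the morpheme boundary has no surface realization
        if ch == "A":
            # 'a' after a back vowel, 'e' otherwise
            ch = "a" if last_vowel in BACK_VOWELS else "e"
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            last_vowel = ch
        surface.append(ch)
    return "".join(surface)


print(realize("ev+lAr"))       # plural of 'ev' (house)
print(realize("okul+lAr+dA"))  # plural + locative of 'okul' (school)
```

A real two-level description states such correspondences as declarative rules compiled into finite-state transducers, so the same rules serve both analysis and generation; the loop above only covers the generation direction.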
Formalising Natural Languages with NooJ, 2013
1998
?? pages. This thesis is about developing a teaching tool for the morphology of Turkish. The tool will provide a practice environment for non-native learners of Turkish and students of linguistics. Two-level morphology is chosen to model the morphology of Turkish. In this model, morphophonemic processes are described by two-level phonology rules, and the ordering constraints on morphemes are encoded by a word grammar.
2001
We describe the design objectives, features, and the computational language model of a computer-mediated tool designed for learners of Turkish morphology. The underlying system is a generative grammar, more specifically, a computational word grammar that makes use of feature structures to deliver composition and decomposition of morphemes at the syntax-lexicon interface via two-level morphology. The potential applicability of the Web-based tool in learning Turkish as a foreign language, and the possibility of its use in a linguistics class are discussed.
2019
We present a broad-coverage model of Turkish morphology and an open-source morphological analyzer that implements it. The model captures intricacies of the Turkish morphology-syntax interface and thus could be used as a baseline that guides language model development. It introduces a novel fine part-of-speech tagset and a fine-grained affix inventory, and represents morphotactics without zero-derivations. The morphological analyzer is freely available. It consists of modular reusable components of human-annotated gold-standard lexicons, and implements Turkish morphotactics as finite-state transducers using OpenFst and morphophonemic processes as Thrax grammars.
This paper describes the implementation of a two-level morphological analyzer for the Turkmen Language. Like all Turkic languages, the Turkmen Language is an agglutinative language that has productive inflectional and derivational suffixes. In this work, we implemented a finite-state two-level mo
Advances in natural language processing, 2008
The paper argues for two points in relation to Turkish NLP: (i) we are better off developing and using research methodologies and tools that are not language-specific, although the models built with these methods and tools must certainly exploit language-specific thinking or technology. One way to do this is to collect distributional data at the level of morphemes. (ii) we need to incorporate semantics into the picture somehow, otherwise what we do is form recognition, or contextually deprived (or dissituated) form production. The last point raises problems from the world's morphologies (and from Turkish morphology in particular) for the current state of the art in NLP, where morphological processing is usually separated from syntactic processing for practical reasons. There is no semantic motivation to separate morphological processing of compositional meaning from syntactic processing of meaning. In fact, semantic aspects indicate that we should integrate them. I will mention some attempts at the problem and suggest some lines of research.
Proceedings - Natural Language Processing in a Deep Learning World, 2019
In this paper, we present a two-level morphological analyzer for Turkish which consists of five main components: a finite-state transducer, a rule engine for suffixation, a lexicon, a trie data structure, and an LRU cache. We use Java to implement the finite-state machine logic and the rule engine, and XML to describe the finite-state transducer rules of the Turkish language, which makes the morphological analyzer both easily extendible and easily applicable to other languages. Empowered with a comprehensive lexicon of 54,000 bare forms including 19,000 proper nouns, our morphological analyzer is amongst the most reliable analyzers produced so far. The analyzer is compared with Turkish morphological analyzers in the literature. By using an LRU cache and a trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours.
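The combination of a trie lexicon and an LRU cache mentioned in this abstract can be sketched as follows. The paper's system is written in Java; this Python sketch is only an illustration under stated assumptions, not the authors' implementation. The class `Trie`, the function `candidate_stems`, the four-entry toy lexicon, and the cache size are all hypothetical.

```python
from functools import lru_cache


class Trie:
    """A plain character trie; each node marks whether a lexicon entry ends there."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes(self, word: str) -> list:
        """Return every lexicon entry that is a prefix of `word`, shortest first."""
        found, node = [], self
        for i, ch in enumerate(word):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_word:
                found.append(word[: i + 1])
        return found


# Hypothetical toy lexicon of bare forms.
LEXICON = Trie()
for stem in ("ev", "kitap", "göz", "gözlük"):
    LEXICON.insert(stem)


@lru_cache(maxsize=1024)  # least recently used results are evicted first
def candidate_stems(word: str) -> tuple:
    """Look up possible stems of `word`; repeated words are served from the cache."""
    return tuple(LEXICON.prefixes(word))


print(candidate_stems("gözlükler"))
print(candidate_stems("gözlükler"))  # second call is a cache hit
print(candidate_stems.cache_info())
```

The trie finds all candidate stems in a single left-to-right pass over the word, and the cache pays off because word frequencies in running text are highly skewed, so a small number of frequent forms account for many lookups.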
This paper primarily discusses how to model Turkish morphotactics using flag diacritics. We present a two-level Turkish morphological analyzer based on a lexicon of word lemmata with over 49321 entries, as well as an auxiliary unknown word analyzer. Our main analyzer demonstrates the use of flag diacritics for Turkish, which is to date not a well-researched approach for the language. Turkish is an agglutinative language with many exceptions to phonetic and morphological rules, and flag diacritics are useful in handling these exceptions. Our unknown word analyzer operates without an extra lexicon, using affix stripping to find word lemmata by recursively removing affixes. We use the described methodology to find all possible lemmata which are not in our lexicon.
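The lexicon-free affix stripping described for the unknown word analyzer can be sketched as a short recursion. This is a minimal sketch under stated assumptions rather than the paper's method: the four-suffix inventory, the minimum stem length, and the name `candidate_lemmata` are hypothetical, and no phonological checks are applied to the stripped forms.

```python
# Hypothetical toy suffix inventory: plural (-ler/-lar) and locative (-de/-da).
SUFFIXES = ("ler", "lar", "de", "da")


def candidate_lemmata(word: str, min_len: int = 2) -> set:
    """Collect every form reachable by recursively removing known suffixes."""
    candidates = {word}
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Strip only when the suffix matches and a plausible stem remains.
        if word.endswith(suffix) and len(stem) >= min_len:
            candidates |= candidate_lemmata(stem, min_len)
    return candidates


print(sorted(candidate_lemmata("evlerde")))
```

Because no lexicon constrains the result, the function deliberately over-generates: every intermediate form is kept as a possible lemma, and a later step would have to rank or filter the candidates.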
2017
The Kazakh and Turkish languages belong to the group of Turkic languages and have much in common. A detailed comparison of the ontologies of Kazakh and Turkish nouns made it possible to analyze the morphological rules of these languages and to introduce a unified system of designations, with the aim of creating a uniform morphological analyzer based on a general algorithm of morphological analysis.
TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES, 2019
Morphological analysis is an important component of natural language processing systems like spelling correction tools, parsers, machine translation systems, and dictionary tools. In this paper, we present TRMOR, a morphological analyzer for Turkish, which uses the SFST tool (Stuttgart Finite-State Transducer). TRMOR can be freely used for academic research (see schmid/tools/SFST/). It covers a large part of Turkish morphology including inflection, derivation, and some compounding. It uses morphotactic and morphophonological rules and a stem lexicon. We describe the morphological structure of Turkish, explain the phonological and morphological rules implemented in TRMOR, evaluate the system, and test it in special cases. The evaluation of TRMOR was carried out on gold-standard words: one thousand words were randomly selected from Wikipedia word lists and provided with gold-standard analyses. TRMOR has 94.12% precision on these 1000 words. The gold-standard morphological analyses were prepared for this evaluation since, to our knowledge, there is no gold-standard segmentation available for Turkish morphological analyzers for noncommercial purposes.
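The abstract does not spell out how precision is computed, so the following is only one plausible reading: the share of analyses produced by the analyzer that also appear in the gold standard. The function name, the analysis strings, and the two-word data set are hypothetical and are not taken from the TRMOR evaluation.

```python
def precision(predicted: dict, gold: dict) -> float:
    """Fraction of produced analyses that also occur in the gold standard."""
    produced = sum(len(analyses) for analyses in predicted.values())
    correct = sum(
        len(analyses & gold.get(word, set()))
        for word, analyses in predicted.items()
    )
    return correct / produced if produced else 0.0


# Hypothetical toy data: word -> set of analysis strings.
predicted = {"evler": {"ev+PL"}, "gözlük": {"gözlük", "göz+lük"}}
gold = {"evler": {"ev+PL"}, "gözlük": {"gözlük"}}

print(precision(predicted, gold))
```

Here two of the three produced analyses match the gold standard, so the toy score is two thirds.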
Abstract: In multi-word unit (MWU) extraction studies, most of the challenges for rich-morphology languages like Turkish can be overcome by the study of how colligational filtering works in our minds, along with how statistical and collocational sorting affects the process. Based on the assumption that lexicalization of any given collocation as an MWU also requires compatibility with some lexical or morphosyntactic constraints, this study will present the morphosyntactic tendencies observed in colligational patterns of Turkish MWUs and discuss their implications for language-specific MWU filtering processes. The aim of the study is to discuss whether, in Turkish, associative strength is enough for a collocation to be lexicalized as an MWU. Another purpose of the study is to show some morphosyntactic and lexical constraints that may validate collocations as lexical multi-word units in Turkish. The paper will also underscore the methodological perspectives of MWU identification valid for rich-morphology languages. To achieve these goals, we first extracted MWU candidates (trigrams) from a 10-million-word sub-corpus of the Turkish National Corpus (TNC) by using Text-NSP (Banerjee & Pederson, 2011). After that, the 3-grams were annotated by using the NLP dictionary of TNC-tagger, and classified according to their colligational patterns and the lexical categories of the MWU. The most frequently observed colligational patterns are argued to be morphosyntactic tendencies governing MWU lexicalization in Turkish. In this respect, the study aims to contribute to the understudied area of formulaic language in Turkish. (To cite: Aksan, Yeşim, Ümit Mersinli & Serap Altunay (2016). Türkçede Çok Sözcüklü Birimlerin İşlev Dizisi Örüntüleri. Dil ve Edebiyat Dergisi, 13(2), 71-108.)
2006
Grammaticalization theory assumes that the gradual progression from a content item to a grammatical marker is accompanied by a number of interdependent phonological, morphosyntactic, and functional processes. Accordingly, morphologization processes, such as cliticization and compounding, are said to be concomitant with phonological erosion and desemantization (Lehmann [1982] 1995, Heine & Reh 1984, Heine, Claudi & Hünnemeyer 1991, Hopper & Traugott 1993, Croft 2003). Some proponents of this theory even claim that the loss of autonomy and substance defines grammaticalization as opposed to other mechanisms in language change, for instance reanalysis (Haspelmath 1998).
Materials Methods Technologies, 2014
This paper presents a three-layered morpho-phonological analyzer for Turkish. These layers correspond to the phonetic, phonological and morphological levels of the language. The layered approach is based on the recognition of the autonomy of the levels of a language. Three types of automata are designed and implemented to carry out the autonomous operations: an automaton without initial and final states for analyzing phonetic dependencies, an acyclic automaton for performing phonological operations and a cyclic automaton for morphological operations. Each automaton is responsible for combining linguistic units relevant to the level implemented into larger fragments. However, in addition to this axis of combination, the analyzer also implements an axis of abstraction. Notice that the linguistic levels are argued to be autonomous, not entirely independent. The vertical interaction among the levels is carried out by an automaton providing the one higher up with properly abstracted information.
2014
This article describes a multifunctional computer model of the Turkic affixal morphemes. This model is a hierarchical system of characteristics of morphemes belonging to different language levels: phonological, morphological, syntactic and semantic, and it requires a certain structure and unification in the description of characteristics of morphemes. It is a kind of “inventory” base of the language that can be used for different purposes; in particular, to perform automated comparative analysis of the properties of the Turkic languages, and to develop different linguoprocessors working with Turkic languages. Here, we describe the elements of the multifunctional computer model with examples on the Tatar and Kazakh languages.
STAD Sanal Türkoloji Araştırmaları Dergisi / Electronic Journal of Turcology Researches, 2019
The theoretical grammar of Turkish is a major research area with interesting topics. An important book written by V. G. Guzev on the theoretical grammar of Turkish was published in Petersburg in 2015. In Guzev’s work, remarkable observations are made on the grammar of Turkish. First of all, I would like to mention that I found this work successful. The book consists of four main headings following a preamble and a long introduction. Phonology, morphonology, morphology, and functional syntax of Turkish are investigated. Research on the understanding of many linguistic cases has been advanced, and opinions based on the foundations of the Indo-European languages have begun to be abandoned in order to create a more effective grammar in additive languages. Studies on the grammar of Turkish have also accelerated in the last decade. However, these studies have unfortunately not been completed. According to Guzev, Turkish writers do not follow linguistic developments in the world because they think that the research of non-native Turkish speakers would not be understandable enough. Guzev is right on this point. Unfortunately, it is true that studies conducted on Turkish in the world are not followed well. The native Turkish speaker researcher may occasionally lose sight of the characteristics, functionality, and subtleties of language in a natural language environment. The non-native Turkish speaker can clearly see these functions when compared to other languages. In this respect, I believe that studies conducted with the foreign literature in mind will be sounder. In this article, the book is evaluated and some of the topics are discussed.
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL '06, 2006
This paper investigates the use of sublexical units as a solution to handling the complex morphology with productive derivational processes, in the development of a lexical functional grammar for Turkish. Such sublexical units make it possible to expose the internal structure of words with multiple derivations to the grammar rules in a uniform manner. This in turn leads to more succinct and manageable rules. Further, the semantics of the derivations can also be systematically reflected in a compositional way by constructing PRED values on the fly. We illustrate how we use sublexical units for handling simple productive derivational morphology and more interesting cases such as causativization, etc., which change verb valency. Our priority is to handle several linguistic phenomena in order to observe the effects of our approach on both the c-structure and the f-structure representation, and grammar writing, leaving the coverage and evaluation issues aside for the moment.
LOT Publications, 2020
The present study aims to explain the phonology-morphology interface and phonological processes without referring to extra-phonological objects. We develop a new model of constituent structure based on templates, by which specific morphological categories such as base (root/stem), prefix and suffix become visible in the phonology component: a base is recognizable by its unique constituent structure and is thereby distinguishable from a suffix and/or prefix, each having its own specific constituent structure in phonology. These unique constituent structures are called templates, thanks to which phonological processes and the phonology-morphology interface are non-arbitrarily explainable. The New Template Model works with licensing mechanisms and the parameters/sub-parameters occurring under the Parametric Hierarchical System. The model explains the phonology-morphology interface in the case of Turkish, which provides a rich data source regarding phonological processes as it is an agglutinative language with a high degree of suffixation, and also in other languages. This book is of interest to phonologists and morphologists interested in Turkish phonology and in the way in which phonology and morphology interact in languages of the world.