Christopher Mala

Public Views

David Pierre Leibovitz

Carleton University

Hazrat Masoumeh University

University of Hyderabad

Punjabi University

Dublin City University

Viacheslav Kuleshov

Stockholm University

University of Bristol

Thennarasu Sakkan

Central University of Kerala

Stephen Doherty

The University of New South Wales

University of Florida

Telugu Computational Tools by Christopher Mala

Telugu Word Synthesizer

This paper describes the development of a Morphological Generator, a generic engine that can be used for any language by plugging in a language-specific database. The Generator synthesizes all and only the well-formed word forms, covering both inflectional and productive derivational forms. The engine is language-independent and is based on the word-and-paradigm method. The computational model uses a machine learning method built on a morphological database developed with the word-and-paradigm model of morphology, an approach that ensures not only coverage but also evolvement. The engine takes as input a root together with its inflectional categories (features): gender, number, person and case for nouns, verbal categories for verbs, and other relevant inflectional endings depending on the category. In this paper we describe how the Morphological Generator handles all of the inflectional forms in addition to the productive derivational forms. Input and output are in Shakti Standard Form (SSF). When tested on Telugu, Hindi and Tamil, its accuracy was 97.2%, 98% and 94% respectively.
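The word-and-paradigm idea in this abstract (a root plus a feature bundle is mapped to a surface form through the paradigm the root belongs to) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `PARADIGMS`, `LEXICON` and `generate` are hypothetical, the data is toy English rather than the paper's Telugu database, and SSF input/output is not modelled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical paradigm table: paradigm name -> feature -> (chars to strip, suffix to add).
# Toy English data, not taken from the paper.
PARADIGMS: Dict[str, Dict[str, Tuple[int, str]]] = {
    "cat": {"sg": (0, ""), "pl": (0, "s")},
    "baby": {"sg": (0, ""), "pl": (1, "ies")},
}

# Hypothetical lexicon: each root is assigned to the paradigm it inflects like.
LEXICON: Dict[str, str] = {"cat": "cat", "dog": "cat", "baby": "baby", "city": "baby"}


def generate(root: str, feature: str) -> Optional[str]:
    """Return the word form for root + feature, or None if it cannot be generated."""
    paradigm = LEXICON.get(root)
    if paradigm is None:
        return None  # unknown root: no language data plugged in for it
    rule = PARADIGMS[paradigm].get(feature)
    if rule is None:
        return None  # the paradigm has no cell for this feature
    strip, suffix = rule
    stem = root[: len(root) - strip]  # strip == 0 leaves the root unchanged
    return stem + suffix


if __name__ == "__main__":
    print(generate("city", "pl"))
    print(generate("dog", "pl"))
```

Keeping the language data in tables and the lookup logic in one function mirrors the abstract's separation of a generic engine from a pluggable language database; adding a language in this sketch means supplying new tables only.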

Telugu Spell-Checker

by Christopher Mala and Amba Kulkarni

Spell Checker is an application that handles spelling errors and Spelling Variations (SV). All misspelt words are marked and offered for correction. The system can also be used as an editor in which the text is checked for spelling errors and suggestions for correction are provided. Telugu is an agglutinating language with a very complex morphology, coupled with prolific sandhi (morphophonemics). Sandhi in Telugu is not limited to internal sandhi but is also external. Both consonantal and vocalic sandhi are common and well studied in Telugu [Krishnamurti, 1957, 1985]. Identifying the specific sandhi type and splitting it appropriately is a very challenging task. External sandhi is a linguistic phenomenon that refers to a set of changes occurring at word boundaries. These changes are similar to phonological processes such as substitution (modification by various means), deletion, and insertion. External sandhi is often reflected orthographically in Telugu. In such cases it produces forms that are morphologically unanalyzable, posing a problem for all kinds of NLP applications. In this paper, we discuss in detail the processes of external sandhi in Telugu and the computational tool, the Spell Checker.

A Tamil – Telugu Machine Translation System

We present the development of a Machine Translation (MT) system that translates texts from Tamil to Telugu and vice versa (bi-directional). It is based on the Transfer Approach. The system's architecture is divided into three stages: a Source Language analysis module (SL), a Source Language to Target Language transfer module (SL-TL), and a Target Language generation module (TL). The major cross-linguistic differences encountered between Tamil and Telugu during the development of the system are discussed here.
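The three-stage architecture named in this abstract (analysis, transfer, generation) can be sketched as a simple pipeline. This is only a schematic under stated assumptions: the tables `SRC_ANALYSIS`, `TRANSFER` and `TGT_GENERATION`, the `Token` type, and the placeholder words (`foo-s`, `FOO+PL`, and so on) are all hypothetical and stand in for real Tamil and Telugu resources, and structural (syntactic) transfer is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    root: str
    num: str  # a single feature (number) is enough for the sketch


# Placeholder data, not real Tamil or Telugu.
SRC_ANALYSIS: Dict[str, Tuple[str, str]] = {"foo-s": ("foo", "pl"), "bar": ("bar", "sg")}
TRANSFER: Dict[str, str] = {"foo": "FOO", "bar": "BAR"}
TGT_GENERATION: Dict[Tuple[str, str], str] = {("FOO", "pl"): "FOO+PL", ("BAR", "sg"): "BAR"}


def analyze(sentence: str) -> List[Token]:
    """SL stage: map each source word to its root and features."""
    return [Token(*SRC_ANALYSIS[word]) for word in sentence.split()]


def transfer(tokens: List[Token]) -> List[Token]:
    """SL-TL stage: replace source roots with target roots, keeping features."""
    return [Token(TRANSFER[t.root], t.num) for t in tokens]


def generate(tokens: List[Token]) -> str:
    """TL stage: synthesize target word forms from root + features."""
    return " ".join(TGT_GENERATION[(t.root, t.num)] for t in tokens)


def translate(sentence: str) -> str:
    return generate(transfer(analyze(sentence)))


if __name__ == "__main__":
    print(translate("foo-s bar"))
```

Because each stage only consumes the previous stage's output, the reverse direction could in principle reuse the same pipeline shape with the source and target resources swapped.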

A Hindi-Telugu Bi-Directional Machine Translation System

by Christopher Mala, Karumuri V. Subbarao, and Bindu Madhavi

The development of Machine Translation (MT) is one of the most challenging tasks among Natural Language Processing applications. A number of approaches to MT are practiced all over the world, chiefly direct translation, interlingual translation, transfer-based translation, and combinations of these, besides statistical and corpus-based methods. It is a known fact that Indian languages exhibit a considerable amount of diversity among themselves at every level, viz. the morphological, syntactic, semantic and lexical levels. In the transfer-based approach, a representation of the source language (SL) at a certain level is transferred to the corresponding target language (TL) representation. Given this diversity, building a Machine Translation system for these languages using the transfer-based method can be non-trivial and challenging. The present paper discusses the successful implementation of the transfer-based approach in a Machine Translation (MT) system for Hindi<->Telugu. The resources for this system come from eleven different institutions across India.

The Tibetic languages and their classification

A TELUGU MORPHOLOGICAL ANALYZER

by Amba Kulkarni and Christopher Mala

A Morphological Analyzer (MA) is a program that analyses words of a natural language into their roots and their constituent morpho-syntactic elements, along with their attributes. The present paper demonstrates a computational implementation of a Morphological Analyzer for Telugu. The algorithm used to build this MA is theoretically justified and is implemented for Telugu in the context of the Modern Standard Written variety. The present proposal demonstrates an optimal organization of the linguistic database and its performance in a computational environment, ensuring high precision and coverage in the parsing of word forms. The current MA engine's coverage may range between 95% and 97% on a variety of corpora (a corpus of 3 million words).
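As a rough illustration of what analysing a word "into its root and morpho-syntactic attributes" means, the sketch below recovers (root, feature) pairs by generating every form in a small paradigm table and matching it against the input. This generate-and-match strategy is a simplification chosen for brevity and is not claimed to be the algorithm used in the paper; the names `PARADIGMS`, `LEXICON` and `analyze` are hypothetical, and the data is toy English rather than Telugu.

```python
from typing import Dict, List, Tuple

# Hypothetical paradigm table: paradigm name -> feature -> (chars to strip, suffix to add).
PARADIGMS: Dict[str, Dict[str, Tuple[int, str]]] = {
    "cat": {"sg": (0, ""), "pl": (0, "s")},
    "baby": {"sg": (0, ""), "pl": (1, "ies")},
}

# Hypothetical lexicon: root -> paradigm it inflects like.
LEXICON: Dict[str, str] = {"cat": "cat", "dog": "cat", "baby": "baby", "city": "baby"}


def analyze(word: str) -> List[Tuple[str, str]]:
    """Return every (root, feature) analysis whose generated form equals the word."""
    analyses = []
    for root, paradigm in LEXICON.items():
        for feature, (strip, suffix) in PARADIGMS[paradigm].items():
            form = root[: len(root) - strip] + suffix
            if form == word:
                analyses.append((root, feature))
    return analyses  # an empty list means the word is not covered


if __name__ == "__main__":
    print(analyze("cities"))
    print(analyze("xyz"))
```

Coverage in the abstract's sense corresponds to the share of corpus words for which such a function returns at least one analysis; a production analyzer would index suffixes rather than enumerate the whole lexicon.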

Papers by Christopher Mala

The Tibetic languages and their classification

Trans-Himalayan Linguistics, 2013

Rule Based Approach of Clause Boundary Identification in Telugu

by Christopher Mala and Thennarasu Sakkan

One of the major challenges in Natural Language Processing and Computational Linguistics is identifying clauses and their boundaries. This paper attempts to develop an automatic Clause Boundary Identifier (CBI) for the Telugu language. Telugu belongs to the South-Central Dravidian language family and is head-final, left-branching and morphologically agglutinative in nature (Bh. Krishnamurti, 2003). A large corpus is studied to frame the rules for identifying clause boundaries, and these rules are supplied to a computational algorithm; some of the issues in identifying clause boundaries are also discussed. A clause-boundary-annotated corpus can be developed from raw text and used to train a machine learning algorithm, which in turn helps in the development of a Hybrid Clause Boundary Identification Tool for Telugu. Its implementation and evaluation are discussed in this paper.

A Tamil-Telugu Machine Translation System

by Christopher Mala and Uma Maheshwar Rao Garapati

caltslab.uohyd.ernet.in

... For instance, (27) Ta. eṉ-akku nīccal teriyum. Eng. I know to swim. 'me-DAT swimming kno...

Course Material Decisions and Factors: Unpacking the Opaque Box

From Start-Up to Adolescence: University of Oklahoma’s OER Efforts

Making the Connections: The Role of Professional Development in Advocating for OER

Library-Supported Adoption and Creation Programs

Seeking Alternatives to High-Cost Textbooks: Six Years of The Open Education Initiative at the University of Massachusetts Amherst

OER: A Field Guide for Academic Librarians | Editor's Cut

Advocacy in OER: A Statewide Strategy for Building a Sustainable Library Effort

A Telugu Morphological Analyzer

A Morphological Analyzer (MA) is a program that analyses words of a natural language into their roots and their constituent morpho-syntactic elements, along with their attributes. The present paper demonstrates a computational implementation of a Morphological Analyzer for Telugu. The algorithm used to build this MA is theoretically justified and is implemented for Telugu in the context of the Modern Standard Written variety. The present proposal demonstrates an optimal organization of the linguistic database and its performance in a computational environment, ensuring high precision and coverage in the parsing of word forms. The current MA engine's coverage may range between 95% and 97% on a variety of corpora (a corpus of 3 million words). Introduction: It is a well-known fact that the morphology of Telugu is not only rich in terms of the density of word forms produced for a given root/stem but also diverse in the morphological strategies that are usually empl...

From Textbook Affordability to Transformative Pedagogy: Growing an OER Community
