Papers by Thatsanee Charoenporn

14th Annual Meeting of the Association for Natural Language Processing, Mar 18, 2008
WordNet is a lexical knowledge base that has been widely used in recent years. A word concept is defined by a set of its synonyms, called a synset. The English WordNet was originally proposed and developed at Princeton University. Since then, WordNets for several languages, such as EuroWordNet, have been constructed. For Asian languages, efforts to create WordNets for Chinese, Korean, Hindi, and Japanese can also be found. This paper aims to create a linkage among Asian languages by adopting the concepts of semantic relations and synsets expressed in WordNet. Based on the Princeton WordNet (PWN), we propose a method for generating a WordNet from an existing bilingual dictionary. Our algorithm aligns PWN synsets to bilingual dictionary entries through the English equivalent and its part of speech (POS). The more English equivalents of a word that appear in a synset, the higher the confidence in the synset assignment. We also introduce a web-based collaborative workbench, called KUI (Knowledge Unifying Initiator), for revising the results of synset assignment, and provide a framework for creating an Asian WordNet via linkage through PWN synsets.
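The alignment idea in this abstract can be illustrated with a minimal sketch. It assumes a simple reading of the description: a dictionary entry is a candidate for a synset when their POS tags agree and at least one English equivalent appears among the synset's lemmas, and the confidence is the count of matching equivalents. The function name `assign_synsets`, the data layout, the synset identifiers, and the toy entries are all hypothetical and are not taken from the paper.

```python
def assign_synsets(entries, synsets):
    """Align bilingual dictionary entries to synsets (illustrative only).

    entries: list of (headword, pos, [English equivalents])
    synsets: dict mapping synset id -> (pos, set of English lemmas)
    Returns a dict mapping headword -> [(synset id, matched count), ...],
    ordered from the highest matched count to the lowest.
    """
    result = {}
    for headword, pos, equivalents in entries:
        candidates = []
        for synset_id, (synset_pos, lemmas) in synsets.items():
            if synset_pos != pos:
                continue  # POS must agree before a synset is considered
            matched = len(set(equivalents) & lemmas)
            if matched > 0:
                candidates.append((synset_id, matched))
        # More matching English equivalents means higher confidence.
        candidates.sort(key=lambda c: (-c[1], c[0]))
        result[headword] = candidates
    return result


# Hypothetical toy data; identifiers and lemmas are made up for illustration.
TOY_SYNSETS = {
    "car.n.01": ("n", {"car", "auto", "automobile", "motorcar"}),
    "car.n.02": ("n", {"car", "railcar", "railway_car"}),
    "run.v.01": ("v", {"run"}),
}
TOY_ENTRIES = [("rotyon", "n", ["car", "automobile"])]

if __name__ == "__main__":
    for word, ranked in assign_synsets(TOY_ENTRIES, TOY_SYNSETS).items():
        print(word, ranked)
```

In this toy case the entry matches two equivalents in the first noun synset and only one in the second, so the first is ranked higher. A ranked list of this kind is what a reviewer might then revise by hand.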
Proceedings of the 7th …, Aug 6, 2009
This paper describes the semi-automatic construction of a Thai WordNet and the method applied for the Asian WordNet. Based on the Princeton WordNet (PWN), we develop a method for generating a WordNet from an existing bilingual dictionary. We automatically align PWN synsets to a bilingual dictionary through the English equivalent and its part of speech (POS). Manual translation is also employed after the alignment. We also develop a web-based collaborative workbench, called KUI (Knowledge Unifying Initiator), for ...
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008
Numerative classifiers are ubiquitous in many Asian languages. This paper proposes a method to construct a taxonomy of numerative classifiers based on a noun-classifier agreement database. The taxonomy defines superordinate-subordinate relations among numerative classifiers and represents them in tree structures. Experiments to construct taxonomies were conducted for evaluation using data from three languages: Chinese, Japanese and Thai. We found that our method was ...

Orchid, TR-NECTEC- …, Jan 1, 1997
This paper presents a procedure for building a Thai part-of-speech (POS) tagged corpus named ORCHID [1]. It is a collaborative project between the Communications Research Laboratory (CRL) of Japan and the National Electronics and Computer Technology Center (NECTEC) of Thailand. We propose a new tagset, based on previous research on Thai parts of speech, for use in a multilingual machine translation project. We mark up the corpus at three levels: paragraph, sentence and word. The corpus keeps text information in text information lines and numbering lines, which are necessary for retrieval. Since Thai text has no explicit word or sentence boundaries, punctuation, or inflection, we have to separate a paragraph into sentences before tagging the POS. We apply a probabilistic trigram model for simultaneous word segmentation and POS tagging. Rules for syllable construction are additionally used to reduce the number of candidates for computing the probability. The problems in POS assignment are formalized to reduce the ambiguity that arises between similar POSs.
Automatic corpus-based Thai word extraction with the C4.5 learning algorithm
Proceedings of the 18th …, Jan 1, 2000