Semantic lexicons are indispensable to research in lexical semantics and word sense disambiguation (WSD). For the study of WSD for English text, researchers have been using different kinds of lexicographic resources, including machine readable dictionaries (MRDs), machine readable thesauri, and bilingual corpora. In recent years, WordNet has become the most widely used resource for the study of WSD and lexical semantics in general. This paper describes the Class-Based Translation Model and its application in assigning translations to nominal senses in WordNet in order to build a prototype Chinese WordNet. Experiments and evaluations show that the proposed approach can potentially be adopted to speed up the construction of WordNet for Chinese and other languages.
We introduce a method for disambiguating a given group of semantically related words with respect to a certain sense inventory, such as WordNet or the Cambridge English Dictionary. In our approach, every member word is converted into a set of senses to be disambiguated. The method involves clustering relevant senses, filtering out irrelevant senses, and determining the intended sense for each word in the group based on pairwise sense similarity. A preliminary evaluation of our method on several datasets shows that the method extends and outperforms previous work that deals only with noun groups [1]. Our method is more generally applicable, allowing noun, verb, and adjective groups, and can be used to align two ontologies to combine knowledge resources, as well as to generate training data for word sense disambiguation tasks.
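The core idea of picking, for each word in a group, the sense that agrees best with the other members can be sketched as follows. This is a minimal toy version: the inventory, glosses, and Jaccard gloss overlap are invented stand-ins for the WordNet senses and the pairwise sense-similarity measure the paper actually uses.

```python
# Hypothetical mini sense inventory: word -> {sense_id: gloss}
# (toy data; the paper works over WordNet with a richer similarity)
INVENTORY = {
    "bank":  {"bank.n.01": "sloping land beside a body of water",
              "bank.n.02": "financial institution that accepts deposits"},
    "river": {"river.n.01": "large natural stream of water"},
    "shore": {"shore.n.01": "land along the edge of a body of water"},
}

def gloss_sim(g1, g2):
    """Jaccard overlap of gloss word sets: a crude stand-in for
    a real sense-similarity measure."""
    w1, w2 = set(g1.split()), set(g2.split())
    return len(w1 & w2) / len(w1 | w2)

def disambiguate(words, inventory):
    """For each word, keep the sense with the highest total pairwise
    similarity to the best-matching senses of the other words."""
    result = {}
    for w in words:
        best_sense, best_score = None, -1.0
        for sid, gloss in inventory[w].items():
            score = sum(
                max(gloss_sim(gloss, g) for g in inventory[other].values())
                for other in words if other != w)
            if score > best_score:
                best_sense, best_score = sid, score
        result[w] = best_sense
    return result

senses = disambiguate(["bank", "river", "shore"], INVENTORY)
print(senses["bank"])  # the water-related sense wins in this toy group
```

In this toy group the riverbank sense of "bank" outscores the financial one because its gloss overlaps the glosses of "river" and "shore".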
This paper describes a heuristic algorithm capable of automatically assigning a label to each of the senses in a machine readable dictionary (MRD) for the purpose of acquiring a computational-semantic lexicon for the treatment of lexical ambiguity. Including these labels in the MRD-based lexical database offers several benefits. The labels can be used as a coarser sense division, so that unnecessarily fine sense distinctions can be avoided in word sense disambiguation (WSD). The algorithm is based primarily on simple word matching between an MRD definition sentence and the word lists of an LLOCE topic. We also describe an implementation of the algorithm for labeling definition sentences in the Longman Dictionary of Contemporary English (LDOCE). For this purpose, the topics and sets of related words in the Longman Lexicon of Contemporary English (LLOCE) are used. Quantitative results for a 12-word test set are reported. Our discussion entails how the availability of these labels provides the means for treating such problems as: acquisition of a lexicon capable of providing broad coverage, systematic word sense shifts, lexical underspecification, and acquisition of zero-derivatives.
2022 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
Named-entity phrases in free text represent a formidable challenge to text analysis. Translating a named entity is important for the tasks of Cross-Language Information Retrieval and Question Answering. However, neither task is easy to handle, because named entities found in free text are often not listed in a monolingual or bilingual dictionary. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list will certainly ensure a high accuracy rate in text analysis. We use a list of proper names and transliterations to train a Machine Transliteration Model. With the model, it is possible to extract proper names and their transliterations from a bilingual corpus with high average precision and recall rates.
Previous studies have shown that web-based concordancing is advantageous for tertiary-level learners and above in improving their vocabulary knowledge and writing skills; however, its effects on primary school students are less well known, as relatively little research has been conducted for this particular age group. Therefore, in the present study, a Chinese-English concordancer specifically designed for Taiwanese primary school students was developed, with a corpus built from sentences modeled on their English textbooks. The research aimed to investigate the effectiveness of web-based concordancing on children's vocabulary learning. In order to observe the development of vocabulary knowledge, one of the longitudinal designs, the time-series design (Mellow, Reeder, & Forster, 1996), was adopted over 28 weeks. In the study, seven fifth graders in an intact class were required to provide the word meanings and usage of the target words and produce sentences with the words before, dur...
Int. J. Comput. Linguistics Chin. Lang. Process., 2008
Researchers have developed many computational tools aimed at extracting collocations for both second language learners and lexicographers. Unfortunately, the tremendously large number of collocates returned by these tools usually overwhelms language learners. In this paper, we introduce a thesaurus-based semantic classification model that automatically learns semantic relations for classifying adjective-noun (A-N) and verb-noun (V-N) collocations into different thesaurus categories. Our model is based on iterative random walking over a weighted graph derived from an integrated knowledge source of word senses in WordNet and semantic categories of a thesaurus for collocation classification. We conduct an experiment on a set of collocations whose collocates involve varying levels of abstractness in the collocation usage box of Macmillan English Dictionary. Experimental evaluation with a collection of 150 multiple-choice questions commonly used as a similarity benchmark in the TOEFL syn...
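Iterative random walking over a weighted graph, the mechanism this model rests on, can be sketched as a small power iteration. The graph below is invented toy data: a collocation node linked to two candidate thesaurus categories with hypothetical edge weights, not the paper's actual WordNet/thesaurus graph.

```python
def random_walk(neighbors, damping=0.85, iters=50):
    """Power-iteration random walk with teleportation; returns an
    approximate stationary score for every node in the graph."""
    nodes = list(neighbors)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            total = sum(neighbors[n].values())  # outgoing weight mass
            for m, w in neighbors[n].items():
                new[m] += damping * score[n] * w / total
        score = new
    return score

# Toy graph: a collocation node connected to two candidate thesaurus
# categories through shared senses (all names and weights hypothetical).
graph = {
    "heavy_rain": {"weather": 0.9, "weight": 0.1},
    "weather":    {"heavy_rain": 1.0},
    "weight":     {"heavy_rain": 1.0},
}
scores = random_walk(graph)
print(max(("weather", "weight"), key=scores.get))  # → weather
```

After the walk converges, the category that accumulated more probability mass ("weather" here) would be chosen for the collocation.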
We introduce a method for assisting English as a Second Language (ESL) learners by providing translations of Collins COBUILD grammar patterns (GPs) for a given word. In our approach, a bilingual parallel corpus is transformed into bilingual GP pairs aimed at providing native-language support for learning word usage through GPs. The method involves automatically parsing sentences to extract GPs, automatically generating translated GP pairs from bilingual sentences, and automatically extracting common bilingual GPs. At run-time, the target word is used to look up GPs and translations, and the retrieved common GPs and their example sentences are shown to the user. We present a prototype phrase search engine, Linggle GPTrans, that implements the methods to assist ESL learners. Preliminary evaluation on a set of more than 300 GP-translation pairs shows that the methods achieve 91% accuracy.
We introduce a method for generating error-correction rules for grammar pattern errors in a given annotated learner corpus. In our approach, annotated edits in the learner corpus are converted into edit rules for correcting common writing errors. The method involves automatic extraction of grammar patterns, and automatic alignment of the erroneous patterns and correct patterns. At run-time, grammar patterns are extracted from the grammatically correct sentences, and correction rules are retrieved by aligning the extracted grammar patterns with the erroneous patterns. Using the proposed method, we generate 1,499 high-quality correction rules related to 232 headwords. The method can be used to assist ESL students in avoiding grammatical errors, and aid teachers in correcting students' essays. Additionally, the method can be used in the compilation of collocation error dictionaries and the construction of grammar error correction systems.
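The alignment step that turns an annotated edit into a correction rule can be sketched with a standard sequence alignment. This is a toy illustration, not the paper's extraction pipeline: the pattern tokens and the rule format are hypothetical, and `difflib` stands in for whatever alignment the authors use.

```python
import difflib

def pattern_edit_rule(wrong, right):
    """Align an erroneous grammar pattern (token list) with its
    correction and return the non-matching spans as (old, new)
    replacement rules (hypothetical rule format)."""
    sm = difflib.SequenceMatcher(None, wrong, right)
    rules = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op != "equal":  # keep only the edited spans
            rules.append((" ".join(wrong[i1:i2]), " ".join(right[j1:j2])))
    return rules

# Toy annotated edit: learner wrote "discuss about N", corrected to
# "discuss N" -- the derived rule deletes the spurious "about".
print(pattern_edit_rule(["discuss", "about", "N"], ["discuss", "N"]))
# → [('about', '')]
```

At run-time, matching a learner sentence's extracted pattern against the left-hand sides of such rules would retrieve the corresponding corrections.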
This paper describes the development of an innovative web-based environment for English language learning with advanced data-driven and statistical approaches. The project uses various corpora, including a Chinese-English parallel corpus (Sinorama) and various natural language processing (NLP) tools to construct effective English learning tasks for college learners with adaptive computational scaffolding. It integrates the expertise of a group of researchers in four areas: (a) advances in NLP technologies and applications, (b) construction of a self-access reading environment, (c) exploration of English language learning through written exercises and translations, and (d) use of bilingual corpora for culture-based English learning. In this paper, the conceptualization of the system and its various reference tools (e.g., a bilingual concordancer) for English learning and pilot testing on various modules (e.g., a reading module, Text Grader, and Collocation Practice) are reported.
We introduce a method for learning to grammatically categorize and organize the contexts of a given query. In our approach, grammatical descriptions, from general word groups to specific lexical phrases, are imposed on the query's contexts, aimed at accelerating lexicographers' and language learners' navigation through, and GRASP upon, the word usages. The method involves lemmatizing, part-of-speech tagging, and shallowly parsing a general corpus and constructing its inverted files for monolingual queries, and word-aligning parallel texts and extracting and pruning translation equivalents for cross-lingual ones. At run-time, grammar-like patterns are generated, organized to form a thesaurus index structure on query words' contexts, and presented to users along with their instantiations. Experimental results show that the extracted predominant patterns resemble phrases in grammar books, and that the abstract-to-concrete context hierarchy of query words effectively assists the process of language learning, especially in sentence translation or composition. Index terms: grammatical constructions, lexical phrases, context, language learning, inverted files, phrase pairs, cross-lingual pattern retrieval.
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, 2003
We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources, including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on the Class-Based Sense Definition Model (CBSDM), which generates the glosses and translations for a class of word senses. The model can be applied to resolve sense ambiguity for words in a parallel corpus. That sense tagging procedure, in effect, produces a semantic bilingual concordance, which can be used to train WSD systems for the two languages involved. Experimental results show that CBSDM trained on the Longman Dictionary of Contemporary English, English-Chinese Edition (LDOCE E-C) and the Longman Lexicon of Contemporary English (LLOCE) is very effective in turning a Chinese-English parallel corpus into sense-tagged data for the development of WSD systems.
2013 International Conference on Asian Language Processing, 2013
Syntactic patterns that are hard to express with binary dependency relations need special treatment, since structure evaluation for such constructions differs from the general parsing framework. Moreover, these different syntactic patterns (special cases) should be handled with a distinct estimation model rather than the general one. In this paper, we present a special-case probability re-estimation model (SCM), integrating the general model with an adaptable estimation model for special cases. The SCM model can estimate evaluation scores for specific syntactic constructions more accurately, and is able to adopt different features in different cases. Experimental results show that our proposed model outperforms a state-of-the-art parser for Chinese.
Proceedings of the COLING/ACL Interactive Presentation Sessions, 2006
This paper introduces a method for computational analysis of move structures in abstracts of research articles. In our approach, sentences in a given abstract are analyzed and labeled with a specific move in light of various rhetorical functions. The method involves automatically gathering a large number of abstracts from the Web and building a language model of abstract moves. We also present a prototype concordancer, CARE, which exploits the move-tagged abstracts for digital learning. This system provides a promising approach to Web-based computer-assisted academic writing.
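Labeling a sentence with its most likely rhetorical move via a per-move language model can be sketched as below. This is a toy illustration under stated assumptions: the two moves, the training sentences, and the add-one-smoothed unigram model are all invented; the paper's actual model is trained on a large collection of Web-harvested abstracts.

```python
import math
from collections import Counter

# Hypothetical training data: move label -> example sentences.
TRAIN = {
    "background": ["previous studies have examined this problem",
                   "research on this topic has grown"],
    "method":     ["we present a method for analysis",
                   "our approach uses a language model"],
}

def train_lm(sentences):
    """Build an add-one-smoothed unigram language model."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

MODELS = {move: train_lm(sents) for move, sents in TRAIN.items()}

def label(sentence):
    """Assign the move whose language model gives the sentence the
    highest log-probability."""
    return max(MODELS, key=lambda m: sum(math.log(MODELS[m](w))
                                         for w in sentence.split()))

print(label("we present a new method"))  # → method
```

A move-tagged abstract is then just the sequence of labels assigned to its sentences, which is what a concordancer like CARE would index.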
Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions, 2004
This paper describes a database of translation memory, TotalRecall, developed to encourage authentic and idiomatic use in second language writing. TotalRecall is a bilingual concordancer that supports search queries in English or Chinese for relevant sentences and translations. Although initially intended for learners of English as a Foreign Language (EFL) in Taiwan, it is a gold mine of texts in English and Mandarin Chinese. TotalRecall is particularly useful for those who write in or translate into a foreign language. We exploited and structured existing high-quality translations from bilingual corpora from the Taiwan-based Sinorama Magazine and the Official Records of the Hong Kong Legislative Council to build a bilingual concordance. Novel approaches were taken to provide high-precision bilingual alignment on the subsentential and lexical levels. A browser-based user interface was developed for ease of access over the Internet. Users can search for a word, phrase, or expression in English or Mandarin. The Web-based user interface facilitates the recording of user actions to provide data for further research.
While researchers have examined the effectiveness of various online gloss types on incidental L2 vocabulary learning, little research on online gloss languages has been conducted. Previous attempts that compared the effects of L1 and L2 glosses have reported mixed results. To fill these gaps, this study examined the effectiveness of Chinese and English e-glosses on incidental English vocabulary learning in a less-researched student group in CALL: junior high-school English-as-a-foreign-language (EFL) students. Seventy-eight students with Chinese as their first language read two online passages with either Chinese (L1) or English (L2) glosses. They were divided into four treatment groups: (1) high-proficiency students receiving the L1 gloss before the L2 gloss (n = 19), (2) high-proficiency students receiving the L2 gloss before the L1 gloss (n = 19), (3) low-proficiency students receiving the L1 gloss before the L2 gloss (n = 20), and (4) low-proficiency students receiving the L2 gloss before the L1 gloss (n = 20)...
Various writing assistance tools have been developed through efforts in natural language processing, with different degrees of success in curriculum integration depending on their functional rigor and pedagogical designs. In this paper, we developed a system, WriteAhead, that provides six types of suggestions while non-native graduate students of English from different disciplines compose journal abstracts, and assessed its effectiveness. The method involved automatically building domain-specific corpora of abstracts from the Web via domain names and related keywords as query expansions, and automatically extracting vocabulary and n-grams from the corpora in order to offer writing suggestions. At runtime, learners' input in the writing area of the system actively triggers a set of corresponding writing suggestions. This abstract writing assistant system facilitates interactions between learners and the system for writing abstracts in an effective and contextualized way, by providing suggestions such as collocations or transitional words. To assess WriteAhead, we compared the writing performance of two groups of students with or without use of the system, and collected student perception data. Findings show that the experimental group wrote better, and most students were satisfied with the system for most suggestion types, as they could effectively compose quality abstracts through the language support provided by WriteAhead.
Semantic lexicons are indispensable to research in lexical semantics and word sense disambiguatio... more Semantic lexicons are indispensable to research in lexical semantics and word sense disambiguation (WSD). For the study of WSD for English text, researchers have been using different kinds of lexicographic resources, including machine readable dictionaries (MRDs), machine readable thesauri, and bilingual corpora. In recent years, WordNet has become the most widely used resource for the study of WSD and lexical semantics in general. This paper describes the Class-Based Translation Model and its application in assigning translations to nominal senses in WordNet in order to build a prototype Chinese WordNet. Experiments and evaluations show that the proposed approach can potentially be adopted to speed up the construction of WordNet for Chinese and other languages.
We introduce a method for disambiguating a given group of semantically related words with respect... more We introduce a method for disambiguating a given group of semantically related words with respect to a certain sense inventory, such as WordNet or Cambridge English Dictionary. In our approach, every member word is converted into a set of senses to be disambiguated. The method involves clustering relevant senses and filtering out irrelevant senses, and determining the intended senses for each word in the group based on pairwise sense similarity. A preliminary evaluation of our method on several datasets shows that the method extends and outperforms the previous work that only deals with noun groups [1]. Our method is more generally applicable, allowing nouns, verbs, and adjectives groups, and can be used to aligning two anthologies to combine knowledge resources, as well as to generate training data for word sense disambiguation tasks.
This paper describes a heuristic algorithm capable of automatically assigning a label to each of ... more This paper describes a heuristic algorithm capable of automatically assigning a label to each of the senses in a machine readable dictionary (MRD) for the purpose of acquiring a computational-semantic lexicon for treatment of lexical ambiguity. Including these labels in the MRD-based lexical database offers several positive effects. The labels can be used as a coarser sense division so unnecessarily fine sense distinction can be avoided in word sense disambiguation (WSD).The algorithm is based primarily on simple word matching between an MRD definition sentence and word lists of an LLOCE topic. We also describe an implementation of the algorithm for labeling definition sentences in Longman Dictionary of Contemporary English (LDOCE). For this purpose the topics and sets of related words in Longman Lexicon of Contemporary English (LLOCE) are used in this work. Quantitative results for a 12-word test set are reported. Our discussion entails how the availability of these labels provides the means for treating such problems as: acquisition of a lexicon capable of providing broad coverage, systematic word sense shifts, lexical underspecification, and acquisition of zero-derivatives.
We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits mu... more We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on Class Based Sense Definition Model (CBSDM) that generates the glosses and translations for a class of word senses. The model can be applied to resolve sense ambiguity for words in a parallel corpus. That sense tagging procedure, in effect, produces a semantic bilingual concordance, which can be used to train WSD systems for the two languages involved. Experimental results show that CBSDM trained on Longman Dictionary of Contemporary English, English-Chinese Edition (LDOCE E-C) and Longman Lexicon of Contemporary English (LLOCE) is very effectively in turning a Chinese-English parallel corpus into sense tagged data for development of WSD systems.
2022 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
The named-entity phrases in free text represent a formidable challenge to text analysis. Translat... more The named-entity phrases in free text represent a formidable challenge to text analysis. Translating a named-entity is important for the task of Cross Language Information Retrieval and Question Answering. However, both tasks are not easy to handle because named-entities found in free text are often not listed in a monolingual or bilingual dictionary. Although it is possible to identify and translate named-entities on the fly without a list of proper names and transliterations, an extensive list certainly will ensure the high accuracy rate of text analysis. We use a list of proper names and transliterations to train a Machine Transliteration Model. With the model it is possible to extract proper names and their transliterations in a bilingual corpus with high average precision and recall rates.
Previous studies have shown that web-based concordancing is advantageous for tertiary-level learn... more Previous studies have shown that web-based concordancing is advantageous for tertiary-level learners or above in improving their vocabulary knowledge and writing skills; however, its effects on primary school students are less well known as relatively little research has been conducted for this particular age group. Therefore, In the present study, a Chinese-English concordancer specifically designed for Taiwanese primary school students was developed with a corpus taken from sentences modeling their English textbooks. The research aimed to investigate the effectiveness of web-based concordancing on children's vocabulary learning. In order to observe the development of vocabulary knowledge, one of the longitudinal designs, the time-series design (Mellow, Reeder, & Forster, 1996), was adopted for 28 weeks long. In the study, seven fifth graders in an intact class were required to provide the word meanings, usage of the target words and produce sentences with the words before, dur...
Int. J. Comput. Linguistics Chin. Lang. Process., 2008
Researchers have developed many computational tools aimed at extracting collocations for both sec... more Researchers have developed many computational tools aimed at extracting collocations for both second language learners and lexicographers. Unfortunately, the tremendously large number of collocates returned by these tools usually overwhelms language learners. In this paper, we introduce a thesaurus-based semantic classification model that automatically learns semantic relations for classifying adjective-noun (A-N) and verb-noun (V-N) collocations into different thesaurus categories. Our model is based on iterative random walking over a weighted graph derived from an integrated knowledge source of word senses in WordNet and semantic categories of a thesaurus for collocation classification. We conduct an experiment on a set of collocations whose collocates involve varying levels of abstractness in the collocation usage box of Macmillan English Dictionary. Experimental evaluation with a collection of 150 multiple-choice questions commonly used as a similarity benchmark in the TOEFL syn...
We introduce a method for assisting English as Second Language (ESL) learners by providing transl... more We introduce a method for assisting English as Second Language (ESL) learners by providing translations of Collins COBUILD grammar patterns(GP) for a given word. In our approach, bilingual parallel corpus is transformed into bilingual GP pairs aimed at providing native language support for learning word usage through GPs. The method involves automatically parsing sentences to extract GPs, automatically generating translation GP pairs from bilingual sentences, and automatically extracting common bilingual GPs. At run-time, the target word is used for lookup GPs and translations, and the retrieved common GPs and their example sentences are shown to the user. We present a prototype phrase search engine, Linggle GPTrans, that implements the methods to assist ESL learners. Preliminary evaluation on a set of more than 300 GP-translation pairs shows that the methods achieve 91% accuracy.
We introduce a method for generating error-correction rules for grammar pattern errors in a given... more We introduce a method for generating error-correction rules for grammar pattern errors in a given annotated learner corpus. In our approach, annotated edits in the learner corpus are converted into edit rules for correcting common writing errors. The method involves automatic extraction of grammar patterns, and automatic alignment of the erroneous patterns and correct patterns. At run-time, grammar patterns are extracted from the grammatically correct sentences, and correction rules are retrieved by aligning the extracted grammar patterns with the erroneous patterns. Using the proposed method, we generate 1,499 high-quality correction rules related to 232 headwords. The method can be used to assist ESL students in avoiding grammatical errors, and aid teachers in correcting students’ essays. Additionally, the method can be used in the compilation of collocation error dictionaries and the construction of grammar error correction systems.
This paper describes the development of an innovative web-based environment for English language ... more This paper describes the development of an innovative web-based environment for English language learning with advanced data-driven and statistical approaches. The project uses various corpora, including a Chinese-English parallel corpus (Sinorama) and various natural language processing (NLP) tools to construct effective English learning tasks for college learners with adaptive computational scaffolding. It integrates the expertise of a group of researchers in four areas: (a) advances in NLP technologies and applications, (b) construction of a self-access reading environment, (c) exploration of English language learning through written exercises and translations, and (d) use of bilingual corpora for culture-based English learning. In this paper, the conceptualization of the system and its various reference tools (e.g., a bilingual concordancer) for English learning and pilot testing on various modules (e.g., a reading module, Text Grader, and Collocation Practice) are reported.
We introduce a method for learning to grammatically categorize and organize the contexts of a giv... more We introduce a method for learning to grammatically categorize and organize the contexts of a given query. In our approach, grammatical descriptions, from general word groups to specific lexical phrases, are imposed on the query's contexts aimed at accelerating lexicographers' and language learners' navigation through and GRASP upon the word usages. The method involves lemmatizing, part-of-speech tagging and shallowly parsing a general corpus and constructing its inverted files for monolingual queries, and word-aligning parallel texts and extracting and pruning translation equivalents for cross-lingual ones. At run-time, grammar-like patterns are generated, organized to form a thesaurus index structure on query words' contexts, and presented to users along with their instantiations. Experimental results show that the extracted predominant patterns resemble phrases in grammar books and that the abstract-to-concrete context hierarchy of querying words effectively assists the process of language learning, especially in sentence translation or composition. Index terms-Grammatical constructions, lexical phrases, context, language learning, inverted files, phrase pairs, crosslingual pattern retrieval.
Proceedings of the second SIGHAN workshop on Chinese language processing -, 2003
We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits mu... more We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on Class Based Sense Definition Model (CBSDM) that generates the glosses and translations for a class of word senses. The model can be applied to resolve sense ambiguity for words in a parallel corpus. That sense tagging procedure, in effect, produces a semantic bilingual concordance, which can be used to train WSD systems for the two languages involved. Experimental results show that CBSDM trained on Longman Dictionary of Contemporary English, English-Chinese Edition (LDOCE E-C) and Longman Lexicon of Contemporary English (LLOCE) is very effectively in turning a Chinese-English parallel corpus into sense tagged data for development of WSD systems.
2013 International Conference on Asian Language Processing, 2013
Syntactic patterns which are hard to be expressed by binary dependent relations need special trea... more Syntactic patterns which are hard to be expressed by binary dependent relations need special treatments, since structure evaluations of such constructions are different from general parsing framework. Moreover, these different syntactic patterns (special cases) should be handled with distinct estimated model other than the general one. In this paper, we present a special-case probability re-estimation model (SCM), integrating the general model with an adoptable estimated model in special cases. The SCM model can estimate evaluation scores in specific syntactic constructions more accurately, and is able for adopting different features in different cases. Experiment results show that our proposed model has better performance than the state-of-the-art parser in Chinese.
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, 2006
This paper introduces a method for computational analysis of move structures in abstracts of research articles. In our approach, sentences in a given abstract are analyzed and labeled with a specific move in light of various rhetorical functions. The method involves automatically gathering a large number of abstracts from the Web and building a language model of abstract moves. We also present a prototype concordancer, CARE, which exploits the move-tagged abstracts for digital learning. This system provides a promising approach to Web-based computer-assisted academic writing.
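The core idea of labeling each sentence with a rhetorical move via a language model can be sketched as a per-move unigram model with add-alpha smoothing. This is a minimal illustration, not the paper's actual system: the move inventory and the toy training sentences below are assumptions for the example.

```python
import math
from collections import Counter

# Hypothetical move labels and a tiny toy training set; the paper's
# actual move inventory and Web-harvested corpus are not reproduced here.
TRAIN = {
    "Purpose": ["this paper introduces a method for analysis",
                "we present a prototype concordancer"],
    "Method":  ["sentences are analyzed and labeled with a move",
                "the method involves building a language model"],
}

def train_unigram_models(data):
    """Build a unigram count model for each move label."""
    models = {}
    for move, sents in data.items():
        counts = Counter(w for s in sents for w in s.split())
        models[move] = (counts, sum(counts.values()))
    return models

def classify(sentence, models, alpha=1.0):
    """Assign the move whose smoothed unigram language model gives the
    sentence the highest log-probability."""
    vocab = {w for counts, _ in models.values() for w in counts}
    best, best_lp = None, float("-inf")
    for move, (counts, total) in models.items():
        lp = sum(math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                 for w in sentence.split())
        if lp > best_lp:
            best, best_lp = move, lp
    return best

models = train_unigram_models(TRAIN)
print(classify("we present a method", models))  # → Purpose
```

A real system would use much richer models (e.g., n-grams with a Markov assumption over move sequences), but the argmax-over-move-models structure is the same.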
Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions, 2004
This paper describes a translation memory database, TotalRecall, developed to encourage authentic and idiomatic use in second language writing. TotalRecall is a bilingual concordancer that supports queries in English or Chinese for relevant sentences and their translations. Although initially intended for learners of English as a Foreign Language (EFL) in Taiwan, it is a gold mine of texts in English and Mandarin Chinese. TotalRecall is particularly useful for those who write in or translate into a foreign language. We exploited and structured existing high-quality translations from bilingual corpora from the Taiwan-based Sinorama Magazine and the Official Records of the Hong Kong Legislative Council to build a bilingual concordance. Novel approaches were taken to provide high-precision bilingual alignment at the subsentential and lexical levels. A browser-based user interface was developed for ease of access over the Internet. Users can search for a word, phrase, or expression in English or Mandarin. The Web-based user interface facilitates recording of user actions to provide data for further research.
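At its simplest, a bilingual concordancer looks up a query in one side of a collection of aligned sentence pairs and returns the pair. The sketch below assumes a tiny in-memory list of invented English-Chinese pairs; TotalRecall's real pipeline (corpus cleanup, subsentential alignment, indexing) is far more elaborate.

```python
# Toy aligned sentence pairs; invented examples, not TotalRecall data.
ALIGNED = [
    ("He kept his promise.", "他信守承諾。"),
    ("She made a promise to her mother.", "她向母親許下承諾。"),
    ("The meeting was postponed.", "會議延期了。"),
]

def concordance(query, pairs):
    """Return every aligned pair whose English side contains the query
    (case-insensitive substring match)."""
    q = query.lower()
    return [(en, zh) for en, zh in pairs if q in en.lower()]

for en, zh in concordance("promise", ALIGNED):
    print(en, "|", zh)
```

This shows why a learner benefits: a single query surfaces several authentic usages of "promise" together with their translations, rather than a dictionary gloss in isolation.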
While researchers have examined the effectiveness of various online gloss types on incidental L2 vocabulary learning, little research on online gloss languages has been conducted. Previous attempts that compared the effects of L1 and L2 glosses have reported mixed results. To fill the gaps, this study examined the effectiveness of Chinese and English e-glosses on incidental English vocabulary learning in a less-researched student group in CALL: junior high-school English-as-a-foreign-language (EFL) students. Seventy-eight students with Chinese as their first language read two online passages with either Chinese (L1) or English (L2) glosses. They were divided into four treatment groups: (1) high-proficiency students receiving the L1 gloss before the L2 gloss (n = 19), (2) high-proficiency students receiving the L2 gloss before the L1 gloss (n = 19), (3) low-proficiency students receiving the L1 gloss before the L2 gloss (n = 20), and (4) low-proficiency students receiving the L2 gloss before the L1 gloss (n = 20)...
Various writing assistance tools have been developed through efforts in natural language processing, with varying degrees of success in curriculum integration depending on their functional rigor and pedagogical design. In this paper, we developed a system, WriteAhead, that provides six types of suggestions while non-native graduate students of English from different disciplines compose journal abstracts, and we assessed its effectiveness. The method involved automatically building domain-specific corpora of abstracts from the Web, using domain names and related keywords as query expansions, and automatically extracting vocabulary and n-grams from the corpora in order to offer writing suggestions. At runtime, the learner's input in the writing area actively triggers a set of corresponding writing suggestions. This abstract-writing assistant facilitates interaction between learners and the system for writing abstracts in an effective and contextualized way, by providing suggestions such as collocations or transitional words. To assess WriteAhead, we compared the writing performance of two groups of students, with and without the system, and collected student perception data. Findings show that the experimental group wrote better, and most students were satisfied with the system for most suggestion types, as they could effectively compose quality abstracts with the language support provided by WriteAhead.
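The "extract n-grams from a corpus, then trigger suggestions from the learner's partial input" loop can be sketched in a few lines. This is a minimal illustration under assumed data: the three-sentence toy corpus and trigram order are inventions for the example, not WriteAhead's actual corpora or ranking.

```python
from collections import Counter

# Toy "corpus" of abstract sentences; WriteAhead's real corpora are
# domain-specific and harvested from the Web.
CORPUS = [
    "this paper presents a novel method",
    "this paper presents an efficient algorithm",
    "we propose a novel approach",
]

def extract_ngrams(corpus, n=3):
    """Count all word n-grams occurring in the corpus."""
    counts = Counter()
    for sent in corpus:
        words = sent.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

def suggest(prefix, counts, k=3):
    """Return the k most frequent n-grams that start with the words the
    learner has typed so far."""
    p = tuple(prefix.split())
    hits = [(ng, c) for ng, c in counts.items() if ng[:len(p)] == p]
    return [" ".join(ng) for ng, _ in sorted(hits, key=lambda x: -x[1])[:k]]

trigrams = extract_ngrams(CORPUS)
print(suggest("this paper", trigrams))  # → ['this paper presents']
```

In the described system, typing in the writing area would fire queries like this against precomputed domain-specific n-gram tables, surfacing frequent continuations as the learner composes.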
Papers by Jason Chang