Multi-Task Neural Model for Agglutinative Language Translation

Yirong Pan1,2,3, Xiao Li1,2,3, Yating Yang1,2,3, and Rui Dong1,2,3


1 Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, China
2 University of Chinese Academy of Sciences, China
3 Xinjiang Laboratory of Minority Speech and Language Information Processing, China
[email protected]
{xiaoli, yangyt, dongrui}@ms.xjb.ac.cn

Abstract

Neural machine translation (NMT) has recently achieved impressive performance by using large-scale parallel corpora. However, it struggles in the low-resource and morphologically-rich scenarios of agglutinative language translation tasks. Inspired by the finding that monolingual data can greatly improve NMT performance, we propose a multi-task neural model that jointly learns to perform bi-directional translation and agglutinative language stemming. Our approach employs a shared encoder and decoder to train a single model without changing the standard NMT architecture, instead adding a token before each source-side sentence to specify the desired target output of the two different tasks. Experimental results on Turkish-English and Uyghur-Chinese show that our proposed approach can significantly improve the translation performance on agglutinative languages by using a small amount of monolingual data.

1 Introduction

Neural machine translation (NMT) has achieved impressive performance on many high-resource machine translation tasks (Bahdanau et al., 2015; Luong et al., 2015a; Vaswani et al., 2017). The standard NMT model uses the encoder to map the source sentence to a continuous representation vector, and then feeds the resulting vector to the decoder to produce the target sentence.

However, the NMT model still suffers in the low-resource and morphologically-rich scenarios of agglutinative language translation tasks, such as Turkish-English and Uyghur-Chinese. Both Turkish and Uyghur are agglutinative languages with complex morphology. The morpheme structure of a word can be denoted as: prefix1 + ... + prefixN + stem + suffix1 + ... + suffixN (Ablimit et al., 2010). Since the suffixes have many inflected and morphological variants, the vocabulary size of an agglutinative language is considerable even in small-scale training data. Moreover, many words have different morphemes and meanings in different contexts, which leads to inaccurate translation results.

Recently, researchers have shown great interest in utilizing monolingual data to further improve NMT model performance (Cheng et al., 2016; Ramachandran et al., 2017; Currey et al., 2017). Sennrich et al. (2016) pair the target-side monolingual data with automatic back-translation as additional training data to train the NMT model. Zhang and Zong (2016) use the source-side monolingual data and employ a multi-task learning framework for translation and source sentence reordering. Domhan and Hieber (2017) modify the decoder to enable multi-task learning for translation and language modeling. However, the above works mainly focus on boosting translation fluency and lack consideration of morphological and linguistic knowledge.

Stemming is a morphological analysis method that is widely used in information retrieval tasks (Kishida, 2005). By removing the suffixes of a word, stemming allows the variants of the same word to share representations and reduces data sparseness. We consider that stemming can lead to better generalization on agglutinative languages, which helps NMT capture in-depth semantic information. Thus, we use stemming as an auxiliary task for agglutinative language translation.

In this paper, we investigate a method to exploit the monolingual data of the agglutinative language to enhance the representation ability of the encoder. This is achieved by training a multi-task neural model that jointly performs bi-directional translation and agglutinative language stemming, utilizing a shared encoder and decoder. We treat stemming as a sequence generation task.

2 Related Work

Multi-task learning (MTL) aims to improve the generalization performance of a main task by using other related tasks, and has been successfully applied to research fields ranging from language (Liu et al., 2015; Luong et al., 2015a) and vision (Yim et al., 2015; Misra et al., 2016) to speech (Chen and Mak, 2015; Kim et al., 2016). Many natural language processing (NLP) tasks have been chosen as auxiliary tasks to deal with increasingly complex tasks. Luong et al. (2015b) employ a small amount of syntactic parsing and image caption data for English-German translation. Hashimoto et al. (2017) present a joint MTL model to handle the tasks of part-of-speech (POS) tagging, dependency parsing, semantic relatedness, and textual entailment for English. Kiperwasser and Ballesteros (2018) utilize POS tagging and dependency parsing for English-German machine translation. To the best of our knowledge, we are the first to incorporate the stemming task into an MTL framework to further improve the translation performance on agglutinative languages.

Recently, several works have combined the MTL method with sequence-to-sequence NMT models for machine translation tasks. Dong et al. (2015) follow a one-to-many setting that utilizes a shared encoder for all the source languages with respective attention mechanisms and multiple decoders for the different target languages. Luong et al. (2015b) follow a many-to-many setting that uses multiple encoders and decoders with two separate unsupervised objective functions. Zoph and Knight (2016) follow a many-to-one setting that employs multiple encoders for all the source languages and one decoder for the desired target language. Johnson et al. (2017) propose a simpler method in a one-to-one setting, which trains a single NMT model with a shared encoder and decoder in order to enable multilingual translation. The method requires no changes to the standard NMT architecture but instead requires adding a token at the beginning of each source sentence to specify the desired target sentence. Inspired by their work, we employ the standard NMT model with one encoder and one decoder for parameter sharing and model generalization. In addition, we build a joint vocabulary on the concatenation of the source-side and target-side words.

Several works on morphologically-rich NMT have focused on using morphological analysis to pre-process the training data (Luong et al., 2016; Huck et al., 2017; Tawfik et al., 2019). Gulcehre et al. (2015) segment each Turkish sentence into a sequence of morpheme units and remove any non-surface morphemes for Turkish-English translation. Ataman et al. (2017) propose a vocabulary reduction method that considers the morphological properties of the agglutinative language, based on unsupervised morphology learning. This work takes inspiration from our previously proposed segmentation method (Pan et al., 2020), which segments the word into a sequence of subword units with morpheme structure and can effectively reduce language complexity.

3 Multi-Task Neural Model

3.1 Overview

We propose a multi-task neural model for machine translation from and into a low-resource and morphologically-rich agglutinative language. We train the model to jointly learn to perform both the bi-directional translation task and the stemming task on an agglutinative language by using the standard NMT framework. Moreover, we add an artificial token before each source sentence to specify the desired target output of the different tasks. The architecture of the proposed model is shown in Figure 1. We take the Turkish-English translation task as an example. The "<MT>" token denotes the bilingual translation task and the "<ST>" token denotes the stemming task on Turkish sentences.

Figure 1: The architecture of the multi-task neural model that jointly learns to perform bi-directional translation between Turkish and English, and stemming for the Turkish sentence. All training data is fed to one shared encoder-decoder framework:
  <MT> + English sentence  ->  Turkish sentence   (bilingual data)
  <MT> + Turkish sentence  ->  English sentence   (bilingual data)
  <ST> + Turkish sentence  ->  stem sequence      (monolingual data)
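As an illustration of the data construction in Figure 1, the following is a minimal Python sketch (our own illustration, not the authors' released code): given one subword-segmented Turkish-English sentence pair, it emits the two translation examples and the stemming example. The suffix-dropping heuristic, which keeps only the units carrying the "@@" continuation marker, is an assumption that reproduces the stemming target shown later in Table 2 but simplifies the actual stem extraction.

```python
def make_multitask_examples(turkish_segmented, english_segmented):
    """Build the three training examples of Figure 1 from one sentence pair.

    turkish_segmented / english_segmented: subword-segmented sentences, e.g.
      "Başla@@ ma ritüel@@ lerini yaş@@ ıyoruz ." and
      "We go through initiation rit@@ es ."
    Returns (source, target) pairs for En-Tr translation, Tr-En translation,
    and Turkish stemming.
    """
    examples = [
        ("<MT> " + english_segmented, turkish_segmented),   # En -> Tr
        ("<MT> " + turkish_segmented, english_segmented),   # Tr -> En
    ]
    # Stemming target: keep only the stem subword units. In the segmentation
    # used here, suffix units are the final unit of a word and carry no "@@"
    # marker; this is a simplification of the full stemming procedure.
    stems = [tok for tok in turkish_segmented.split() if tok.endswith("@@")]
    examples.append(("<ST> " + turkish_segmented, " ".join(stems)))
    return examples

print(make_multitask_examples("Başla@@ ma ritüel@@ lerini yaş@@ ıyoruz .",
                              "We go through initiation rit@@ es ."))
```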

3.2 Neural Machine Translation (NMT)

Our proposed multi-task neural model, which uses the source-side monolingual data for the agglutinative language translation task, can be applied to any NMT structure with an encoder-decoder framework. In this work, we follow the NMT model proposed by Vaswani et al. (2017), the Transformer, which we briefly summarize here.

Firstly, the Transformer maps the source sequence x = (x_1, ..., x_m) and the target sequence y = (y_1, ..., y_n) into word embedding matrices. Secondly, in order to make use of the word order in the sequence, these word embedding matrices are summed with their positional encoding matrices to generate the source-side and target-side positional embedding matrices. The encoder is composed of a stack of N identical layers. Each layer has two sub-layers, the multi-head self-attention and the fully connected feed-forward network, which map the source-side positional embedding matrix into a representation vector.

The decoder is also composed of a stack of N identical layers. Each layer has three sub-layers: the multi-head self-attention, the multi-head attention, and the fully connected feed-forward network. The multi-head attention attends over the outputs of the encoder and the decoder to generate a context vector. The feed-forward network followed by a linear layer maps the context vector into a vector with the original space dimension. Finally, the softmax function is applied to the vector to predict the target word sequence.

4 Experiment

4.1 Dataset

The statistics of the training, validation, and test datasets for the Turkish-English and Uyghur-Chinese machine translation tasks are shown in Table 1.

Task   Data   # Sent    # Src       # Trg
Tr-En  train  355,251   6,356,767   8,021,161
       valid  2,455     37,153      52,125
       test   4,962     69,006      96,291
Uy-Ch  train  333,097   6,026,953   5,748,298
       valid  700       17,821      17,085
       test   1,000     20,580      18,179

Table 1: The statistics of the training, validation, and test datasets on the Turkish-English and Uyghur-Chinese machine translation tasks. "# Src" denotes the number of source tokens, and "# Trg" denotes the number of target tokens.

For the Turkish-English machine translation, following Sennrich et al. (2015a), we use the WIT corpus (Cettolo et al., 2012) and the SETimes corpus (Tyers and Alperen, 2010) as the training dataset, merge dev2010 and tst2010 as the validation dataset, and use tst2011, tst2012, tst2013, and tst2014 from the IWSLT as the test datasets. We also use the talks data from the 2018 IWSLT evaluation campaign (https://wit3.fbk.eu/archive/2018-01/additional_TED_xml/) and the news data from the 2017 News Crawl corpora (http://data.statmt.org/wmt18/translation-task/) as external monolingual data for the stemming task on Turkish sentences.

For the Uyghur-Chinese machine translation, we use the news data from the China Workshop on Machine Translation in 2017 (CWMT2017) as the training and validation datasets, and the news data from CWMT2015 as the test dataset. Each Uyghur sentence has four Chinese reference sentences. Moreover, we use the news data from the Tianshan website (http://uy.ts.cn/) as external monolingual data for the stemming task on Uyghur sentences.

4.2 Data Preprocessing

We normalize and tokenize the experimental data. We utilize the jieba toolkit (https://github.com/fxsjy/jieba) to segment the Chinese sentences, and we utilize the Zemberek toolkit (https://github.com/ahmetaa/zemberek-nlp) with morphological disambiguation (Sak et al., 2007) and the morphological analysis tool of Tursun et al. (2016) to annotate the morpheme structure of the words in Turkish and Uyghur, respectively.

Figure 2: An example of the morphological segmentation method for a word in the Turkish sentence "bir dilin son hecelerini kendisiyle birlikte mezara ...":
  Morpheme segmentation:   hece+ler+i+ni
  Stem + combined suffix:  hece+lerini
  BPE applied to the stem: he@@+ce@@+lerini

We use our previously proposed morphological segmentation method (Pan et al., 2020), which segments each word into smaller subword units with morpheme structure. Since Turkish and Uyghur have only a few prefixes, we combine the prefixes with the stem into the stem unit. As shown in Figure 2, the morpheme structure of the Turkish word "hecelerini" (syllables) is: hece + lerini. Then the byte pair encoding (BPE) technique (Sennrich et al., 2015b) is applied to the stem unit "hece" to segment it into "he@@" and "ce@@". Thus the Turkish word is segmented into a sequence of subword units: he@@ + ce@@ + lerini.
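To make this pipeline concrete, the following is a minimal sketch (not the exact implementation of Pan et al., 2020): it assumes a morphological analyzer has already produced the morpheme sequence of a word, combines the suffix morphemes into one unit, and applies a pre-learned BPE merge list (here a hypothetical toy merge table) to the stem unit only.

```python
def segment_word(morphemes, bpe_merges):
    """Combine suffix morphemes into one unit and apply BPE to the stem.

    morphemes: morpheme sequence from a morphological analyzer,
               e.g. ["hece", "ler", "i", "ni"] for Turkish "hecelerini".
    bpe_merges: ordered list of (left, right) merge pairs learned on stems.
    """
    stem, suffixes = morphemes[0], morphemes[1:]
    combined_suffix = "".join(suffixes)           # "lerini"

    # Apply BPE to the stem: start from characters, apply merges in order.
    symbols = list(stem)                          # ["h", "e", "c", "e"]
    for left, right in bpe_merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                symbols[i:i + 2] = [left + right]
            else:
                i += 1

    # Mark every unit except the last with "@@" (word-internal continuation).
    units = symbols + ([combined_suffix] if combined_suffix else [])
    return [u + "@@" for u in units[:-1]] + [units[-1]]

# Toy merge table (hypothetical, for illustration only).
merges = [("h", "e"), ("c", "e")]
print(segment_word(["hece", "ler", "i", "ni"], merges))
# -> ['he@@', 'ce@@', 'lerini']
```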

In this paper, we utilize the above morphological segmentation method in our experiments by applying BPE on the stem units with 15K merge operations for the Turkish words and 10K merge operations for the Uyghur words. The standard NMT model trained on this experimental data is denoted as the "baseline NMT model". Moreover, we employ BPE to segment the words in English and Chinese by learning separate vocabularies with 32K merge operations. Table 2 shows the training sentence samples for the multi-task neural model on the Turkish-English machine translation task.

Task                 Training sentence samples
En-Tr translation    Source: <MT> We go through initiation rit@@ es.
                     Target: Başla@@ ma ritüel@@ lerini yaş@@ ıyoruz.
Tr-En translation    Source: <MT> Başla@@ ma ritüel@@ lerini yaş@@ ıyoruz.
                     Target: We go through initiation rit@@ es.
Turkish stemming     Source: <ST> Başla@@ ma ritüel@@ lerini yaş@@ ıyoruz.
                     Target: Başla@@ ritüel@@ yaş@@

Table 2: The training sentence samples for the multi-task neural model on the Turkish-English machine translation task. We add "<MT>" and "<ST>" before each source sentence to specify the desired target outputs of the different tasks.

In addition, to verify the effectiveness of the morphological segmentation method, we employ pure BPE to segment the words in Turkish and Uyghur by learning separate vocabularies with 36K and 38K merge operations, respectively. The standard NMT model trained on this experimental data is denoted as the "general NMT model". Table 3 shows the detailed statistics of the different word segmentation methods on Turkish, English, Uyghur, and Chinese. "Vocab" denotes the vocabulary size after data preprocessing, and "Avg.Len" denotes the average sentence length.

Lang  Method  # Merge  Vocab   Avg.Len
Tr    Morph   15K      36,468  28
Tr    BPE     36K      36,040  22
En    BPE     32K      31,306  25
Uy    Morph   10K      38,164  28
Uy    BPE     38K      38,292  21
Ch    BPE     32K      40,835  19

Table 3: The detailed statistics of using different word segmentation methods on Turkish, English, Uyghur, and Chinese.

4.3 Training and Evaluation Details

We employ the Transformer model implemented in the Sockeye toolkit (Hieber et al., 2017). The number of layers in both the encoder and decoder is set to N=6, the number of attention heads is set to 8, and the number of hidden units in the feed-forward network is set to 1024. We use an embedding size of 512 dimensions for both the source and target words, and a batch size of 128 sentences. The maximum sentence length is set to 100 tokens, with label smoothing of 0.1. We apply layer normalization and add dropout to the embedding and Transformer layers with a probability of 0.1. Moreover, we use the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 0.0002, and save a checkpoint every 1500 updates.

Model training stops after 8 checkpoints without improvement in validation perplexity. Following Niu et al. (2018a), we select the 4 best checkpoints based on the validation perplexity values and combine them in a linear ensemble for decoding. Decoding is performed using beam search with a beam size of 5. We evaluate the machine translation performance using the case-sensitive BLEU score (Papineni et al., 2002) with standard tokenization.

4.4 Neural Translation Models

In this paper, we select 4 neural translation models for comparison. More details about the models are given below:

General NMT Model: The standard NMT model trained on the experimental data segmented by BPE.

Baseline NMT Model: The standard NMT model trained on the experimental data segmented by the morphological segmentation method. The following models also use this word segmentation method.

Bi-Directional NMT Model: Following Niu et al. (2018b), we train a single NMT model to perform bi-directional machine translation. We concatenate the bilingual parallel sentences in both directions. Since the source and target sentences come from the same language pair, we share the source and target vocabulary and tie their word embeddings during model training.

Multi-Task Neural Model: We simply use the monolingual data of the agglutinative language taken from the bilingual parallel sentences. We use a joint vocabulary, and tie the word embeddings as well as the output layer's weight matrix (a joint-vocabulary construction sketch follows below).
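As a concrete illustration of the shared vocabulary used by the bi-directional and multi-task models, the following is a minimal sketch (our own illustration, not the Sockeye implementation; the file names in the usage comment are hypothetical): it builds one joint vocabulary over the concatenation of the segmented source-side and target-side training files, including the task tokens, so that the tied embedding and output layers can share a single index space.

```python
from collections import Counter

def build_joint_vocab(corpus_paths, task_tokens=("<MT>", "<ST>"), min_count=1):
    """Build one shared token-to-id vocabulary over several segmented corpora.

    corpus_paths: paths to the preprocessed (subword-segmented) training files
                  for both translation directions and the stemming task.
    """
    counts = Counter()
    for path in corpus_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())

    # Reserved symbols first, then task tokens, then corpus tokens by frequency.
    vocab = {"<pad>": 0, "<unk>": 1, "<s>": 2, "</s>": 3}
    for tok in task_tokens:
        vocab.setdefault(tok, len(vocab))
    for tok, c in counts.most_common():
        if c >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

# Example usage with hypothetical file names:
# vocab = build_joint_vocab(["train.tagged.src", "train.tagged.trg"])
```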

5 Results and Discussion

Task   Model     tst11  tst12  tst13  tst14
Tr-En  general   25.92  26.55  27.34  26.35
       baseline  26.48  27.02  27.91  26.33
En-Tr  general   13.73  14.68  13.84  14.65
       baseline  14.85  15.93  15.45  15.93

Table 4: The BLEU scores of the general NMT model and the baseline NMT model on the machine translation task between Turkish and English.

Table 4 shows the BLEU scores of the general NMT model and the baseline NMT model on the machine translation task. We can observe that the baseline NMT model is comparable to the general NMT model and achieves the highest BLEU scores on almost all the test datasets in both directions, which indicates that the NMT baseline based on our proposed segmentation method is competitive.

5.1 Using Original Monolingual Data

Task   Model           tst11  tst12  tst13  tst14
Tr-En  baseline        26.48  27.02  27.91  26.33
       bi-directional  26.21  27.17  28.68  26.90
       multi-task      26.82  27.96  29.16  27.98
En-Tr  baseline        14.85  15.93  15.45  15.93
       bi-directional  15.08  16.20  16.25  16.56
       multi-task      15.65  17.10  16.35  16.41

Table 5: The BLEU scores of the baseline NMT model, the bi-directional NMT model, and the multi-task neural model on the machine translation task between Turkish and English.

Table 5 shows the BLEU scores of the baseline NMT model, the bi-directional NMT model, and the multi-task neural model on the machine translation task between Turkish and English. The table shows that the multi-task neural model outperforms both the baseline NMT model and the bi-directional NMT model, achieving the highest BLEU scores on almost all the test datasets in both directions, which suggests that the multi-task neural model is capable of improving the bi-directional translation quality on agglutinative languages. The main reason is that, compared with the bi-directional NMT model, our proposed multi-task neural model additionally employs the stemming task for the agglutinative language, which helps the NMT model learn both the source-side semantic information and the target-side language modeling.

The validation perplexity as a function of training epochs for the different neural translation models is shown in Figure 3. We can see that the perplexity values are consistently lower for the multi-task neural model, and that it converges rapidly.

Figure 3: Validation perplexity (y-axis) as a function of training epochs (x-axis) for the different neural translation models on the translation task.

Table 6 shows a translation example for the different models on Turkish-English. We can see that the translation result of the multi-task neural model is more accurate. The Turkish word "taklit" means "imitate" in English; both the baseline NMT model and the bi-directional NMT model translate it into the synonym "emulate". However, they are not able to express the meaning of the sentence correctly. The main reason is that the auxiliary task of stemming forces the proposed model to focus more strongly on the core meaning of each word (or stem), therefore helping the model make correct lexical choices and capture in-depth semantic information.

source          üniversite hayatı taklit ediyordu.
reference       College was imitating life.
baseline        It was emulating a university life.
bi-directional  The university was emulating its lives.
multi-task      The university was imitating life.

Table 6: A translation example for the different NMT models on Turkish-English.

5.2 Using External Monolingual Data

Moreover, we evaluate the multi-task neural model on using external monolingual data for the Turkish stemming task. We use the parallel sentences and the monolingual data in a 1:1 ratio, and shuffle them randomly before each training epoch. More details about the data are given below, followed by a short sketch of the mixing procedure:

Original Data: The monolingual data comes from the original bilingual parallel sentences.

Talks Data: The monolingual data contains talks.

News Data: The monolingual data contains news.

Talks and News Mixed Data: The monolingual data contains talks and news in a 3:4 ratio, the same as the original bilingual parallel sentences.
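The following is a minimal sketch of the 1:1 mixing and per-epoch shuffling described above (our own illustration; the function and argument names are assumptions, not the actual training pipeline). Calling it once per epoch with a different seed reproduces the reshuffling before each epoch.

```python
import random

def epoch_batches(parallel_pairs, stemming_pairs, batch_size=128, seed=0):
    """Yield shuffled mini-batches mixing translation and stemming examples 1:1.

    parallel_pairs: list of (source, target) pairs already prefixed with <MT>.
    stemming_pairs: list of (source, target) pairs already prefixed with <ST>.
    """
    rng = random.Random(seed)
    # Match the amount of monolingual stemming data to the parallel data (1:1).
    n = min(len(parallel_pairs), len(stemming_pairs))
    epoch_data = parallel_pairs[:n] + stemming_pairs[:n]
    rng.shuffle(epoch_data)  # reshuffled for every epoch via a new seed
    for i in range(0, len(epoch_data), batch_size):
        yield epoch_data[i:i + batch_size]
```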

Task   Data      tst11  tst12  tst13  tst14
Tr-En  original  26.82  27.96  29.16  27.98
       talks     26.55  27.94  29.13  28.02
       news      26.47  28.18  28.89  27.40
       mixed     26.60  27.93  29.58  27.32
En-Tr  original  15.65  17.10  16.35  16.41
       talks     15.57  16.97  16.22  16.91
       news      15.67  17.19  16.26  16.69
       mixed     15.96  17.35  16.55  16.89

Table 7: The BLEU scores of the multi-task neural model when using external monolingual data: talks data, news data, and mixed data.

Table 7 shows the BLEU scores of the proposed multi-task neural model when using different external monolingual data. We can see that there is no obvious difference in Turkish-English translation performance across the different monolingual data, whether the data is in-domain or out-of-domain with respect to the test dataset. However, for the English-Turkish machine translation task, which can be seen as an agglutinative language generation task, using the mixed data of talks and news achieves further improvements in BLEU scores on almost all the test datasets. The main reason is that the proposed multi-task neural model incorporates much morphological and linguistic information of Turkish rather than of English, and thus mainly strengthens the source-side representation ability on the agglutinative language rather than the target-side language modeling.

We also evaluate the translation performance of the general NMT model, the baseline NMT model, and the multi-task neural model with external news data on the machine translation task between Uyghur and Chinese. The experimental results are shown in Table 8. The results indicate that the multi-task neural model achieves the highest BLEU scores on the test dataset by utilizing external monolingual data for the stemming task on Uyghur sentences.

Task   Model                                                    BLEU
Uy-Ch  general NMT model                                        35.12
       baseline NMT model                                       35.46
       multi-task neural model with external monolingual data   36.47
Ch-Uy  general NMT model                                        21.00
       baseline NMT model                                       21.57
       multi-task neural model with external monolingual data   23.02

Table 8: The BLEU scores of the general NMT model, the baseline NMT model, and the multi-task neural model with external monolingual data on the Uyghur-Chinese and Chinese-Uyghur machine translation tasks.

6 Conclusions

In this paper, we propose a multi-task neural model for translation from and into a low-resource and morphologically-rich agglutinative language. The model jointly learns to perform bi-directional translation and agglutinative language stemming by utilizing a shared encoder and decoder under the standard NMT framework. Extensive experimental results show that the proposed model is beneficial for agglutinative language machine translation, and that only a small amount of agglutinative-language data can improve the translation performance in both directions. Moreover, the proposed approach with external monolingual data is more useful for translating into the agglutinative language, achieving an improvement of +1.42 BLEU points for translation from English into Turkish and +1.45 BLEU points from Chinese into Uyghur.

In future work, we plan to utilize other word segmentation methods for model training. We also plan to combine the proposed multi-task neural model with the back-translation method to enhance the ability of the NMT model on target-side language modeling.

Acknowledgements

We are very grateful to the mentor of this paper for her meaningful feedback. We thank the three anonymous reviewers for their insightful comments and practical suggestions. This work is supported by the High-Level Talents Introduction Project of Xinjiang under Grant No. Y839031201, the National Natural Science Foundation of China under Grant No. U1703133, the National Natural Science Foundation of Xinjiang under Grant No. 2019BL-0006, the Open Project of Xinjiang Key Laboratory under Grant No. 2018D04018, and the Youth Innovation Promotion Association of the Chinese Academy of Sciences under Grant No. 2017472.

References

Mijit Ablimit, Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara, and Askar Hamdulla. 2010. Uyghur morpheme-based language models and ASR. In IEEE International Conference on Signal Processing.

Duygu Ataman, Matteo Negri, Marco Turchi, and Marcello Federico. 2017. Linguistically motivated vocabulary reduction for neural machine translation from Turkish to English. The Prague Bulletin of Mathematical Linguistics, 108:331-342.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. WIT3: Web inventory of transcribed and translated talks. In Proceedings of the 16th Conference of the European Association for Machine Translation.

Dongpeng Chen and Brian Kan-Wing Mak. 2015. Multitask learning of deep neural networks for low-resource speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Semi-supervised learning for neural machine translation. In Proceedings of the 54th Annual Meeting of the ACL.

Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Anna Currey, Antonio Valerio Miceli Barone, and Kenneth Heafield. 2017. Copied monolingual data improves low-resource neural machine translation. In Proceedings of the Second Conference on Machine Translation.

Tobias Domhan and Felix Hieber. 2017. Using target-side monolingual data for neural machine translation through multi-task learning. In Proceedings of EMNLP.

Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. Multi-task learning for multiple language translation. In Proceedings of ACL.

Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, et al. 2015. On using monolingual corpora in neural machine translation. arXiv preprint arXiv:1503.03535v2.

Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher. 2017. A joint many-task model: Growing a neural network for multiple NLP tasks. In Proceedings of EMNLP.

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, and Matt Post. 2017. Sockeye: A toolkit for neural machine translation. arXiv preprint arXiv:1712.05690.

Matthias Huck, Simon Riess, and Alexander Fraser. 2017. Target-side word segmentation strategies for neural machine translation. In Proceedings of the Second Conference on Machine Translation.

Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. In Proceedings of ACL.

Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2017. Google's multilingual neural machine translation system: Enabling zero-shot translation. In Proceedings of ACL.

Suyoun Kim, Takaaki Hori, and Shinji Watanabe. 2016. Joint CTC-attention based end-to-end speech recognition using multi-task learning. arXiv preprint arXiv:1609.06773.

Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR.

Eliyahu Kiperwasser and Miguel Ballesteros. 2018. Scheduled multi-task learning: From syntax to translation. Transactions of the Association for Computational Linguistics.

Kazuaki Kishida. 2005. Technical issues of cross-language information retrieval: A review. Information Processing and Management, 41(3):433-455. https://doi.org/10.1016/j.ipm.2004.06.007.

Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh, and Ye-Yi Wang. 2015. Representation learning using multi-task deep neural networks for semantic classification and information retrieval. In Proceedings of NAACL.

Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015a. Effective approaches to attention-based neural machine translation. In Proceedings of EMNLP.

Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2015b. Multi-task sequence to sequence learning. arXiv preprint arXiv:1511.06114.

Minh-Thang Luong and Christopher D. Manning. 2016. Achieving open vocabulary neural machine translation with hybrid word-character models. In Proceedings of ACL.

Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. 2016. Cross-stitch networks for multi-task learning. In Proceedings of CVPR.

Xing Niu, Sudha Rao, and Marine Carpuat. 2018a. Multi-task neural models for translating between styles within and across languages. In Proceedings of COLING.

Xing Niu, Michael Denkowski, and Marine Carpuat. 2018b. Bi-directional neural machine translation with synthetic parallel data. arXiv preprint arXiv:1805.11213.

Yirong Pan, Xiao Li, Yating Yang, and Rui Dong. 2020. Morphological word segmentation on agglutinative languages for neural machine translation. arXiv preprint arXiv:2001.01589.

Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of ACL.

Prajit Ramachandran, Peter Liu, and Quoc Le. 2017. Unsupervised pretraining for sequence to sequence learning. In Proceedings of EMNLP.

Haşim Sak, Tunga Güngör, and Murat Saraçlar. 2007. Morphological disambiguation of Turkish text with perceptron algorithm. In International Conference on Intelligent Text Processing and Computational Linguistics.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015a. Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015b. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Controlling politeness in neural machine translation via side constraints. In Proceedings of NAACL.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of NIPS.

Ahmed Tawfik, Mahitab Emam, Khaled Essam, Robert Nabil, and Hany Hassan. 2019. Morphology-aware word-segmentation in dialectal Arabic adaptation of neural machine translation. In Proceedings of the Fourth Arabic Natural Language Processing Workshop.

Eziz Tursun, Debasis Ganguly, Turghun Osman, Yating Yang, Ghalip Abdukerim, Junlin Zhou, and Qun Liu. 2016. A semi-supervised tag-transition-based Markovian model for Uyghur morphology analysis. ACM Transactions on Asian and Low-Resource Language Information Processing.

Francis M. Tyers and Murat Alperen. 2010. South-East European Times: A parallel corpus of the Balkan languages. In Proceedings of the LREC Workshop on Exploitation of Multilingual Resources and Tools for Central and (South-) Eastern European Languages.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.

Junho Yim, Heechul Jung, ByungIn Yoo, Changkyu Choi, Dusik Park, and Junmo Kim. 2015. Rotating your face using multi-task deep neural network. In Proceedings of CVPR.

Jiajun Zhang and Chengqing Zong. 2016. Exploiting source-side monolingual data in neural machine translation. In Proceedings of EMNLP.

Barret Zoph and Kevin Knight. 2016. Multi-source neural translation. In Proceedings of NAACL.
