TOWARDS CROSS-ATTENTION PRE-TRAINING IN NEURAL MACHINE TRANSLATION
Abstract
The advent of pre-training techniques and large language models has significantly improved the performance of many natural language processing (NLP) tasks. However, applying pre-trained language models to neural machine translation remains challenging, because little information about the interaction between the two languages of a translation pair is learned during pre-training. In this paper, we explore several approaches to defining a training scheme that pre-trains the cross-attention module between the encoder and the decoder using only large-scale monolingual corpora of each language independently. The experiments show promising results, demonstrating the effectiveness of using pre-trained language models in neural machine translation.
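To make the idea concrete, the following is a minimal, illustrative sketch (in PyTorch) of a cross-attention block in which decoder states attend to encoder states, the module that the abstract proposes to pre-train. The class name, dimensions, and the frozen encoder/decoder setup in the usage example are assumptions for illustration only, not the authors' actual architecture or training scheme.

# Minimal sketch of a cross-attention bridge between encoder and decoder.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Cross-attention block: decoder states attend to encoder states."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_states: torch.Tensor, enc_states: torch.Tensor) -> torch.Tensor:
        # Queries come from the decoder; keys and values come from the encoder.
        ctx, _ = self.attn(query=dec_states, key=enc_states, value=enc_states)
        return self.norm(dec_states + ctx)  # residual connection + layer norm

# Toy usage: hidden states that would come from separately pre-trained,
# monolingual encoder and decoder language models (random here); in a
# cross-attention pre-training setup, only the bridge parameters would be updated.
bridge = CrossAttentionBridge()
enc_states = torch.randn(2, 20, 512)   # e.g., source-side monolingual encoder outputs
dec_states = torch.randn(2, 15, 512)   # e.g., target-side monolingual decoder states
out = bridge(dec_states, enc_states)   # shape: (2, 15, 512)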
Keywords
cross-attention, cross-lingual, natural language processing, neural machine translation, pre-training, language model