SYNTAX-ENHANCED NEURAL MACHINE TRANSLATION WITH GRAPH ENCODER

Hồng Bửu Long Nguyễn, Hùng Việt Phạm


Abstract

Neural Machine Translation (NMT) is a paradigm in machine translation (MT) powered by recent advances in sequence-to-sequence learning frameworks. With the advance of neural networks, NMT has become the most promising MT approach in recent years. Despite this apparent success, NMT still suffers from one significant drawback: integrating syntactic knowledge into neural networks. This paper proposes an extension of the NMT model that incorporates additional syntactic information from constituency trees. We represent the constituency trees as graphs and encode them with a graph encoder to enhance the attention layer, which allows the decoder to attend to both the sequential and the graph representation at each decoding step. Experiments on English-Vietnamese datasets show promising results, demonstrating the effectiveness of our syntax-enhanced NMT method.
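The abstract describes representing a constituency tree as a graph before feeding it to a graph encoder. As a minimal illustration of that preprocessing step (not the paper's actual pipeline; the function name, node/edge encoding, and input format are all assumptions for this sketch), a bracketed parse can be flattened into a node list and a parent-to-child edge list:

```python
# Illustrative sketch: converting a bracketed constituency parse into a
# graph (node list + edge list) of the kind a graph encoder could consume.
# All names and the exact graph layout here are hypothetical.

def parse_to_graph(tree: str):
    """Convert an s-expression parse, e.g. "(S (NP she) (VP (V saw)))",
    into (nodes, edges), where nodes holds labels/words in creation order
    and edges holds (parent_index, child_index) pairs."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    nodes, edges, stack = [], [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            # Open a constituent: next token is its phrase label.
            node_id = len(nodes)
            nodes.append(tokens[i + 1])
            if stack:
                edges.append((stack[-1], node_id))  # parent -> child
            stack.append(node_id)
            i += 2
        elif tok == ")":
            stack.pop()  # close the current constituent
            i += 1
        else:
            # Leaf word: attach it under the open constituent.
            node_id = len(nodes)
            nodes.append(tok)
            edges.append((stack[-1], node_id))
            i += 1
    return nodes, edges

nodes, edges = parse_to_graph("(S (NP she) (VP (V saw) (NP it)))")
# nodes: ['S', 'NP', 'she', 'VP', 'V', 'saw', 'NP', 'it']
# edges: [(0, 1), (1, 2), (0, 3), (3, 4), (4, 5), (3, 6), (6, 7)]
```

In a full model, each node label or word would then be embedded and the edge list used for message passing in the graph encoder, while the original sentence is encoded separately by the sequential encoder.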

 

