In:
ACM Transactions on Asian and Low-Resource Language Information Processing, Association for Computing Machinery (ACM), Vol. 21, No. 2 (2022-03-31), p. 1-12
Abstract:
We present a simple, efficient data augmentation approach for boosting Chinese-Vietnamese neural machine translation performance by leveraging the linguistic difference between the two languages. We first define the formalized representation of modifier symmetry, which is one of the most representative linguistic differences between Chinese and Vietnamese. We then propose and test two data augmentation strategies for leveraging the linguistic difference, which can be integrated naturally with different translation models. Results indicate that both strategies can introduce linguistic rules to boost translation accuracy. Tests on Chinese-Vietnamese benchmarks show significant accuracy improvements. To facilitate studies in this domain, we also release an open-source toolkit with flexible implementation for Chinese-Vietnamese linguistic difference tagging.
Type of Medium:
Online Resource
ISSN:
2375-4699, 2375-4702
Language:
English
Publisher:
Association for Computing Machinery (ACM)
Publication Date:
2022
ZDB-ID:
2820615-0