Research Article
Improving Transformer-Based Neural Machine Translation with Prior Alignments
Table 3
Overview of the datasets.
| Vietnamese/English | Training | Validation | Testing |
|---|---|---|---|
| Number of news articles | 930 | 40 | 30 |
| Number of sentences | 42,026 | 1,482 | 1,527 |
| Average sentence length | 26.2/19.2 | 24.5/17.8 | 28.3/20.6 |
| Alignments per sentence | 22.4 | 20.8 | 23.1 |
| Number of unique tokens | 16,441/36,672 | 2,720/4,981 | 3,462/6,211 |
| Number of alignments | 942,001 | 30,821 | 35,291 |
| Number of tokens | 1,099,205/806,456 | 36,276/26,315 | 43,286/31,513 |
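The per-sentence rows of Table 3 appear to follow from the totals: dividing the token and alignment counts by the number of sentences reproduces the reported averages. The sketch below is a minimal consistency check, not part of the original study. It assumes that the first value in each pair is the Vietnamese count and the second the English count, as the column header suggests; the names `SPLITS` and `per_sentence_averages` are hypothetical.

```python
# Totals copied from Table 3. Assumption: in each "a/b" pair,
# a is the Vietnamese count and b is the English count.
SPLITS = {
    "training": {
        "sentences": 42026, "alignments": 942001,
        "tokens_vi": 1099205, "tokens_en": 806456,
    },
    "validation": {
        "sentences": 1482, "alignments": 30821,
        "tokens_vi": 36276, "tokens_en": 26315,
    },
    "testing": {
        "sentences": 1527, "alignments": 35291,
        "tokens_vi": 43286, "tokens_en": 31513,
    },
}


def per_sentence_averages(split):
    """Return per-sentence averages for one split, rounded to one decimal."""
    n = split["sentences"]
    return {
        "avg_len_vi": round(split["tokens_vi"] / n, 1),
        "avg_len_en": round(split["tokens_en"] / n, 1),
        "alignments_per_sentence": round(split["alignments"] / n, 1),
    }


if __name__ == "__main__":
    for name, split in SPLITS.items():
        print(name, per_sentence_averages(split))
```

Under these assumptions, the recomputed values match the "Average sentence length" and "Alignments per sentence" rows for all three splits.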