Research Article
Parallel Cleaning Algorithm for Similar Duplicate Chinese Data Based on BERT
Table 1
Error rate on IMDB and Sogou.
| Method | IMDB | Sogou |
| --- | --- | --- |
| Head only | 5.63 | 2.58 |
| Tail only | 5.44 | 3.17 |
| Head + tail | 5.42 | 2.43 |
| hier.mean | 5.89 | 2.83 |
| hier.max | 5.71 | 2.47 |
| hier.self-attention | 5.49 | 2.65 |