Research Article

Parallel Cleaning Algorithm for Similar Duplicate Chinese Data Based on BERT

Table 2

Configuration parameters of cluster nodes.

Host name  IP             Chip model                              Number of cores  Running memory (GB)  Hard disk size (GB)
Master     192.168.2.101  Intel® Core™ i7-6700 CPU @ 3.40 GHz     8                8                    1,000
slave1     192.168.2.103  Intel® Xeon® CPU E5-1603 v3 @ 2.80 GHz  4                16                   2,000
slave2     192.168.2.102  Intel® Core™ i3-2120 CPU @ 3.30 GHz     4                8                    500
slave3     192.168.2.104  Intel® Core™ i5-4590 CPU @ 3.30 GHz     4                8                    500