Research Article

Medical Specialty Classification Based on Semiadversarial Data Augmentation

Figure 8

Comparison of the sentence embedding distributions generated by SRA and SemiADA. We randomly select 10 original samples, drawn from different categories in the dataset, and generate 20 new samples for each original sample with SRA and with SemiADA. To prevent embeddings from different categories from overlapping in the visualization, we add a bias term of a different size to the embeddings of each category, so that the categories lie far apart. We repeat the experiment three times to obtain three sets of plots (each column is one set of experimental results); the results for SRA are shown in (a1–a3) and those for SemiADA in (b1–b3). Only the coverage of each category (the area covered by each color) needs to be compared: the larger the coverage, the wider the range of the decision space that the generated examples cover.
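The visualization step described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the embeddings have already been projected to 2-D, and the function names `offset_by_category` and `bounding_box_area`, the `spacing` parameter, and the choice to shift along the x-axis are all hypothetical. The caption judges coverage visually; the bounding-box area used here is only a crude numerical stand-in for "the area covered by each color".

```python
import numpy as np


def offset_by_category(points, labels, spacing=10.0):
    """Shift each category's 2-D points by a distinct offset so clusters do not overlap.

    Assumption: category k (in sorted label order) is moved k * spacing along x.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    shifted = points.copy()
    for k, category in enumerate(np.unique(labels)):
        shifted[labels == category, 0] += k * spacing
    return shifted


def bounding_box_area(points):
    """Crude coverage proxy: area of the axis-aligned bounding box of 2-D points."""
    points = np.asarray(points, dtype=float)
    extent = points.max(axis=0) - points.min(axis=0)
    return float(np.prod(extent))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_categories, n_generated = 10, 20  # counts taken from the caption
    labels = np.repeat(np.arange(n_categories), n_generated)
    # Synthetic stand-in for projected embeddings of the generated samples.
    points = rng.normal(size=(n_categories * n_generated, 2))
    shifted = offset_by_category(points, labels)
    for category in range(n_categories):
        area = bounding_box_area(shifted[labels == category])
        print(f"category {category}: coverage proxy = {area:.3f}")
```

Because the offset is a pure translation, it separates the categories in the plot without changing the spread of any one category, so per-category coverage can still be compared between two augmentation methods.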