Research Article

A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation

Figure 1

Architecture of the molecular representation with multiple SMILES-based augmentation for molecular property prediction. (a) Data augmentation using multiple SMILES. After the original datasets are cleaned and invalid molecules are removed, multiple SMILES sequences are generated for each molecule and then one-hot encoded. (b) The stacked CNN and RNN. The encoded input passes through several layers (including dense, dropout, pooling, and gather layers), and the network finally predicts characteristics such as molecular properties.
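
The one-hot vectorization step in panel (a) can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `build_vocab` and `one_hot`, the character-level tokenization, and the zero-padding to a common length are all assumptions made for illustration. The three input strings are hand-written equivalent SMILES for ethanol; in practice such variants would be produced by a cheminformatics toolkit.

```python
# Illustrative sketch only: character-level one-hot encoding of several
# SMILES strings that describe the same molecule (ethanol).
# Assumption: one token per character, so multi-character atoms such as
# "Cl" or "Br" would need a proper tokenizer in real use.

def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to a column index."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}

def one_hot(smiles, vocab, max_len):
    """Return a max_len x len(vocab) matrix; unused rows stay all-zero (padding)."""
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[pos][vocab[ch]] = 1
    return matrix

# Hand-written equivalent SMILES for ethanol (hypothetical augmentation output).
variants = ["CCO", "OCC", "C(O)C"]
vocab = build_vocab(variants)
max_len = max(len(s) for s in variants)
encoded = [one_hot(s, vocab, max_len) for s in variants]

for smiles, matrix in zip(variants, encoded):
    print(smiles, matrix)
```

Each variant becomes a fixed-size matrix, so all augmented copies of a molecule can be stacked into one batch and fed to the convolutional and recurrent layers of panel (b) with the same property label.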