A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation

<div>Architecture of the molecular representation learning framework with multiple SMILES-based augmentation for molecular property prediction. (a) Data augmentation using multiple SMILES. After the original datasets are cleaned and invalid molecules are removed, multiple SMILES sequences are generated for each molecule and then one-hot encoded. (b) The stacked CNN and RNN networks. The encoded sequences pass through several layers (dense, dropout, pooling, and gather layers), and target characteristics such as molecular properties are predicted at the output.</div>
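The one-hot step in panel (a) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes character-level tokenization (so multi-character atoms such as `Cl` or `Br` would be split), a hypothetical padding symbol `_`, and a hand-written list of three equivalent SMILES strings for ethanol standing in for the output of the augmentation step. The helper names `build_vocab` and `one_hot` are illustrative only.

```python
def build_vocab(smiles_list, pad="_"):
    """Map each character seen in the SMILES strings to an integer index.

    Index 0 is reserved for the padding symbol (an assumption of this sketch).
    """
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate([pad] + chars)}


def one_hot(smiles, vocab, max_len, pad="_"):
    """Encode one SMILES string as a max_len x len(vocab) matrix of 0s and 1s."""
    padded = smiles.ljust(max_len, pad)  # right-pad shorter strings
    matrix = []
    for ch in padded:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1  # exactly one active position per character
        matrix.append(row)
    return matrix


# Three valid SMILES strings for the same molecule (ethanol), written by hand
# here in place of an automated SMILES enumeration step.
ethanol_variants = ["CCO", "OCC", "C(O)C"]

vocab = build_vocab(ethanol_variants)
max_len = max(len(s) for s in ethanol_variants)
encoded = [one_hot(s, vocab, max_len) for s in ethanol_variants]

print(len(encoded), len(encoded[0]), len(encoded[0][0]))
```

Each variant becomes a matrix of the same shape, so a single molecule contributes several distinct training inputs that share one property label. In practice the variants would come from a cheminformatics toolkit rather than a hand-written list.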

Computational Intelligence and Neuroscience

Figure 1

Figure 1: A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation