Abstract
Background. Esophageal carcinoma (ESCA) is a serious threat to human health and the sixth most common cause of cancer-related mortality worldwide. Methods. Key targets of ESCA are screened through the GeneCards and DisGeNET databases combined with the Gene Expression Omnibus (GEO) database (GSE1420 and GSE20347). Data for ESCA samples are then downloaded from The Cancer Genome Atlas (TCGA) database for integrated analysis. The effect of epithelial cell adhesion molecule (EpCAM) expression on the survival of patients with ESCA is evaluated by Kaplan–Meier and Cox analyses. Virtual screening is carried out using the Surflex-Dock molecular docking module. Chemical components that bind well to EpCAM are selected using a total score >5 as the threshold. The interactions between ginsenosides and EpCAM are analyzed with LigPlot+ v.2.2 to identify the binding sites. Results. Four ESCA targets are obtained from the GeneCards, DisGeNET, and GEO databases. High EpCAM expression is associated with histologic grade, stage, patient age, N classification, T classification, and radiation therapy. Kaplan–Meier curves for overall survival show that higher EpCAM expression is associated with worse outcomes in patients with ESCA. Univariate and multivariate Cox analyses indicate that EpCAM mRNA expression might be a useful biomarker for ESCA. Molecular docking suggests that ginsenoside Rg3 and ginsenoside Rh2 readily adopt favorable docking modes and have a high affinity for EpCAM. The 6′-hydroxyl and 6″-hydroxyl on the 3-glycosyl of ginsenoside Rg3 tend to form hydrogen bonds with Lys151 and Lys221 in the active site of the EpCAM ligand-binding domain. The hydroxyl group at position 12 of the ginsenoside Rh2 glycoside framework forms a hydrogen bond with Leu240. The formation of hydrogen bonds plays an important role in the binding of ginsenoside Rg3 and ginsenoside Rh2 to EpCAM, as well as in the stability of the EpCAM conformation. Conclusion. EpCAM may be a potential biomarker for the early diagnosis and prognosis of ESCA. Ginsenoside Rg3 and ginsenoside Rh2 have potential anti-esophageal cancer activity. This study provides a reference for research on the chemical constituents of ginsenosides in the treatment of esophageal cancer.
1. Introduction
Esophageal carcinoma (ESCA) is a malignancy prevalent worldwide that comprises esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC) [1]. ESCA is a serious threat to human health and the sixth most common cause of cancer-related deaths worldwide [2], with high morbidity and mortality. The prognosis of ESCA is poor; for example, the 5-year survival rate in the nonmetastatic setting is between 20% and 35%, and patients may die even after complete resection of the primary tumor and multimodal treatment [3, 4]. ESCA usually originates in the esophageal epithelium and mainly manifests as squamous cell carcinoma or adenocarcinoma. Rare subtypes include mucoepidermoid carcinoma, small cell carcinoma, neuroendocrine tumors, and adenosquamous carcinoma [5]. Early ESCA is difficult to detect by conventional endoscopy and radiological examination, which allows malignant progression through distant metastasis via the lymphatic system. Furthermore, even when ESCA is detected, developments in endoscopic imaging, ablation, and resection techniques have increased the dependence on endoscopy while its role in the therapeutic model remains limited [6]. Therefore, it is of great clinical significance to identify reliable biomarkers for the diagnosis and prognosis of ESCA. In this study, the key targets of ESCA are screened by network pharmacology combined with bioinformatics. Data for esophageal carcinoma samples are then downloaded from the TCGA database for integrated analysis.
Epithelial cell adhesion molecule (EpCAM) is a transmembrane glycoprotein originally described by Koprowski et al. [7]. It is considered a reliable surface binding site for pure cell adhesion molecules and therapeutic antibodies. EpCAM is a cancer stem cell (CSC) marker that is expressed in various epithelial carcinomas, including ESCA. Recent research has shown a clear correlation between human cancers and high expression of EpCAM [8, 9]. High expression of EpCAM in primary tumors is often associated with more aggressive phenotypes and has a negative impact on patient prognosis [10]. The carcinoma-associated antigen encoded by EpCAM is detected on healthy epithelial cells and on gastrointestinal carcinomas [11, 12]. These findings provide evidence that EpCAM is important for the diagnosis and treatment of these types of cancers.
Ginsenosides extracted directly from Araliaceae plants are called prototype ginsenosides, and their transformation products are called rare ginsenosides. A growing number of studies have shown that the rare ginsenosides have a strong antitumor effect and a wide range of antitumor mechanisms [13]. They can achieve a therapeutic effect in several ways, such as direct action on cancer cells, inhibition of tumor growth by indirect induction of apoptosis, and enhancement of immunity [14]. These functional advantages make ginsenosides promising candidates for cancer treatment. At present, there are few reports on the effect of ginsenosides on esophageal cancer. Nonetheless, existing studies have found that the rare ginsenosides Rk3, Rg5, Rh2, and Rg3 have a certain therapeutic effect in esophageal cancer [15–18]. Therefore, ginsenoside Rk3, ginsenoside Rg5, ginsenoside Rh2, and ginsenoside Rg3 are selected as the research objects in the present study to investigate, at the molecular level, their interaction with EpCAM, a therapeutic target of esophageal cancer. Virtual screening is carried out using the Surflex-Dock molecular docking module. Chemical components that bind well to EpCAM are selected using a total score >5 as the threshold. The interactions between ginsenosides and EpCAM are analyzed with LigPlot+ v.2.2 to identify the binding sites. The workflow for the network pharmacology and bioinformatics analysis of publicly available datasets is shown in Figure 1.

2. Materials and Methods
2.1. Target Screening for ESCA
The targets of ESCA are obtained from the GeneCards human gene database (https://www.genecards.org/) and DisGeNET (http://www.disgenet.org/search) by searching for “esophageal carcinoma.” Targets common to GeneCards and DisGeNET are identified using the Venn diagram tool in FunRich 3.1.
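The intersection step can be illustrated with a minimal Python sketch. This is not the FunRich implementation; the function name `common_targets` and the two short gene lists are hypothetical and stand in for the exported database results. Gene symbols are upper-cased before comparison, which is an assumption about how the exports are harmonized.

```python
def common_targets(*target_lists):
    """Return the sorted gene symbols that occur in every input list."""
    # Normalize symbols so that "EpCAM" and "EPCAM" are treated as the same gene.
    sets = [{g.strip().upper() for g in lst if g.strip()} for lst in target_lists]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical exports, for illustration only.
genecards = ["EPCAM", "MMP1", "SPP1", "TP53"]
disgenet = ["EpCAM", "MMP1", "CRNN", "SPP1"]

print(common_targets(genecards, disgenet))  # ['EPCAM', 'MMP1', 'SPP1']
```

The same function accepts any number of lists, so the common DEGs of the two GEO datasets could be intersected with the database targets in a single call.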
2.2. GEO Data Download and Preprocessing
Microarray expression data of GSE1420 (GPL96, Affymetrix Human Genome U133A Array) and GSE20347 (GPL571, Affymetrix Human Genome U133A 2.0 Array) are collected from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) using the following keywords: “esophageal carcinoma,” “Homo sapiens,” and “gene expression data” [19]. GSE1420 comprises eight ESCA tissues and eight adjacent nontumor tissues. GSE20347 includes 17 ESCA tissues and 17 adjacent nontumor tissues. The CEL format files are used as input. Background correction and normalization are performed using the robust multichip average (RMA) function implemented in the affy package in R software (version 3.5.1).
2.3. Identification of Differentially Expressed Genes
In the present study, the LIMMA package was used to identify differentially expressed genes (DEGs) between ESCA and normal tissues. The Benjamini-Hochberg procedure was applied to control the false discovery rate (FDR) in multiple comparisons. The DEGs’ cutoff value was set as and FDR as <0.05 [20]. The DEGs common to GSE1420 and GSE20347 were obtained using the Venn diagram tool in FunRich 3.1.
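The Benjamini-Hochberg adjustment used for the FDR criterion can be sketched in a few lines of Python. This is an illustrative re-implementation, not the LIMMA code; the function name and the example p-values are hypothetical, and only the FDR < 0.05 criterion stated above is applied.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

Each adjusted value is the smallest FDR level at which the corresponding gene would be declared significant, so filtering on `q < 0.05` reproduces the FDR part of the cutoff.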
2.4. TCGA Data Download and Preprocessing
Genome-wide transcriptome profiles of 160 ESCA tissues and 11 adjacent nontumor tissues are downloaded from The Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/) using the following keywords: “esophageal carcinoma,” “Homo sapiens,” “gene expression quantification,” and “HTSeq-FPKM.” The clinical data of 183 ESCA patients are downloaded from the TCGA database by searching the following keywords: “esophageal carcinoma,” “clinical,” and “BCR XML.” The clinical data include patient age, alcohol consumption history, gender, survival status, TNM classification, pharmaceutical therapy, and histologic grade. All data are processed with R software (version 3.5.1).
2.5. Statistical Analysis
The statistical analysis follows the methods outlined by Jiao et al. [21]. The expression of EpCAM, MMP1, SPP1, and CRNN in patients in the TCGA-ESCA dataset is evaluated using point plots. For each gene, patients are divided into high- and low-expression strata at the median expression value, and overall survival (OS) is compared between the two strata by Kaplan–Meier analysis. This analysis is based on the expression and clinical data of the TCGA database, which are merged using Perl. The chi-square and Fisher’s exact tests are applied to identify the correlation between EpCAM mRNA expression and the clinical features of ESCA. A univariate Cox analysis is performed to select potential prognostic factors. A multivariate Cox analysis is conducted to verify the correlation between EpCAM expression and survival along with other clinical features. is considered statistically significant.
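The median split and the Kaplan–Meier estimate described above can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used in the study; the function names and the toy data are hypothetical, and assigning values equal to the median to the low group is an assumption.

```python
from statistics import median


def median_split(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cut = median(expression)
    # Assumption: values equal to the median are assigned to the low group.
    return ["high" if x > cut else "low" for x in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the patient died at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data: follow-up in months and death indicators (hypothetical).
groups = median_split([1.0, 4.0, 2.0, 8.0])
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
print(groups, curve)
```

In practice the curve would be computed separately for the high and low strata and the two curves compared with a log-rank test, which is omitted here.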
2.6. Compounds and EpCAM Molecular Docking and Interaction
Molecular docking is a powerful computational tool that can be used to predict the interaction energy between a receptor and a ligand. To do so, it determines the orientation of the ligand that forms the lowest-energy complex within the receptor’s binding pocket. In this docking assay, the 2D structures of ginsenoside Rk3, ginsenoside Rh2, ginsenoside Rg3, and ginsenoside Rg5 are downloaded from the PubChem database. The structure of the human epithelial cell adhesion molecule (hEpCAM, PDB ID: 4MZV, resolution 1.865 Å) [22] is retrieved from the Protein Data Bank (PDB). The SYBYL 2.1.1 program is used to evaluate the binding potential between EpCAM and the ginsenosides. The Surflex-Dock scores (total score) are expressed in −log10 (Kd) units [23]. The interactions between the ginsenosides and EpCAM are examined using LigPlot+ v.2.2 [24].
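Because the total score is expressed in −log10 (Kd) units, a score can be converted back to an approximate dissociation constant. The short sketch below shows the conversion for the score levels of 5 and 7 used as reference points in this paper. The function name is hypothetical, and Kd is assumed to be in mol/L; the predicted values are estimates from a scoring function, not measured affinities.

```python
def total_score_to_kd(total_score):
    """Convert a score in -log10(Kd) units to Kd (assumed to be in mol/L)."""
    return 10.0 ** (-total_score)


for score in (5.0, 7.0):
    kd = total_score_to_kd(score)
    # A difference of 2 score units corresponds to a 100-fold change in Kd.
    print(f"total score {score:.1f} -> Kd ~ {kd:.1e} mol/L")
```

Under this assumption, a total score of 5 corresponds to a predicted Kd of about 10 µmol/L and a score of 7 to about 0.1 µmol/L.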
3. Results
3.1. Screening of Intersections of Genes in ESCA
Two GEO datasets are downloaded, preprocessed, and merged into a global dataset that contains 25 esophageal cancer and 25 normal samples (Table 1). The expression values of GSE1420 and GSE20347 are normalized so that measurements can be compared between groups. The distributions of DEGs are presented in volcano plots and heat maps (Figures 2(a)–2(d)). A total of 415 DEGs in GSE1420 and 221 DEGs in GSE20347 are identified by the LIMMA package, and 97 DEGs are shared by the two datasets (Figure 2(e)). From the GeneCards and DisGeNET databases, 580 (relevance score >10%) and 305 ESCA targets are retrieved, respectively. Four ESCA disease targets (EpCAM, MMP1, SPP1, and CRNN) are then obtained by combining the database targets with the DEGs of GSE1420 and GSE20347 (Figure 2(f)).

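Figure 2: (a)–(d) volcano plots and heat maps of the DEGs; (e) DEGs common to GSE1420 and GSE20347; (f) ESCA targets obtained from the databases and the GEO datasets.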
3.2. Characteristics of the Study Population
The clinical data of 183 ESCA patients are downloaded from the TCGA database and comprise patient age, alcohol consumption history, gender, survival status, TNM classification, pharmaceutical therapy, histologic grade, clinical stage, and radiation therapy (Table 2).
3.3. Screening Key Targets
EpCAM expression is compared between ESCA and normal tissues, and the comparison indicates that EpCAM expression is elevated in ESCA. The identified markers are then evaluated for their value in predicting patient survival. EpCAM, which is overexpressed in ESCA, shows a negative correlation with patient survival (Figure 3); in other words, patients with higher EpCAM expression have worse overall survival. The difference in CRNN expression between normal and ESCA tissues is not statistically significant (Figure S1). MMP1 and SPP1 expression is elevated in ESCA. The expression levels of MMP1, SPP1, and CRNN are not associated with the overall survival of ESCA patients. Therefore, we conducted no further analysis on those genes and concentrated on EpCAM.

Figure 3: EpCAM expression and overall survival in ESCA, panels (a) and (b).
3.4. High EpCAM Expression in ESCA
EpCAM expression in ESCA and normal tissues is compared (Figure 4). Differences in EpCAM expression are observed according to histologic grade, stage, patient age, N classification, radiation therapy, and T classification.

3.5. High EpCAM Expression Is an Independent Risk Factor for Overall Survival among ESCA Patients
Univariate and multivariate Cox analyses (Figure S2) show that EpCAM expression is an independent risk factor for overall survival (OS) among ESCA patients (hazard ratio [HR] = 1.00, 95% confidence interval [CI]: 1.00–1.01; Table 3).
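When a continuous covariate is entered into a Cox model, the hazard ratio refers to a one-unit increase in that covariate, so a value close to 1.00 can still correspond to a sizeable effect over the observed expression range. The sketch below illustrates this scaling. It assumes that EpCAM expression was modeled as a continuous variable; the per-unit hazard ratio of 1.005 and the 100-unit difference are hypothetical values chosen for illustration, not estimates from this study.

```python
import math


def hazard_ratio_for_difference(hr_per_unit, delta):
    """Hazard ratio implied by a covariate difference of `delta` units."""
    beta = math.log(hr_per_unit)  # Cox regression coefficient per unit
    return math.exp(beta * delta)


# Hypothetical: HR of 1.005 per expression unit, patients 100 units apart.
print(round(hazard_ratio_for_difference(1.005, 100), 3))
```

Under the proportional hazards model the effect is multiplicative, which is why the hazard ratio is raised to the power of the covariate difference rather than multiplied by it.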
3.6. Molecular Docking and Interaction
EpCAM is loaded into the SYBYL 2.1.1 program for molecular docking verification, together with ginsenoside Rk3, ginsenoside Rh2, ginsenoside Rg3, and ginsenoside Rg5 (Table 4). The crystal structure of EpCAM contains an original ligand, and the active pocket is defined by the original ligand A/DMU301. The docking results provide a similarity value between 0 and 1; the larger the value, the more similar the molecular conformations. The similarity value between the docked and original conformations is 0.75, which indicates good overlap and good accuracy of the docking process. The total score is used as the evaluation index of the docking results. A total score ≥5.0 indicates that an active molecule has strong binding activity with EpCAM, and a total score ≥7.0 suggests even stronger binding activity. CH4, C=O, and N-H are used as molecular probes to determine the cavity of the receptor and perform the docking. The docking results indicate that ginsenoside Rg3 and ginsenoside Rh2 have strong binding activity with EpCAM. In addition, the glycosyl chains of ginsenoside Rg3, ginsenoside Rk3, and ginsenoside Rg5 bind to residues outside the active pocket by hydrogen bonding. The hydroxyl groups on the ginsenoside Rh2 glycoside skeleton bind to residues in the active pocket by hydrogen bonding, and the overall spatial structure is reasonable. However, the ligands differ in their level of exposure in the active pocket, and they interact with the molecules in the pocket (Figure 5). The interactions between EpCAM and ginsenoside Rk3, ginsenoside Rh2, ginsenoside Rg3, and ginsenoside Rg5 are analyzed with LigPlot+ v.2.2 (Figure S3).
In ginsenoside Rg3, the 6′-hydroxyl on the 3-glycosyl forms a hydrogen bond with Lys151, and the 6″-hydroxyl on the 3-glycosyl forms a hydrogen bond with Lys221 (dashed line, Figure S3(a)). The hydroxyl group at position 12 of the ginsenoside Rh2 glycoside framework forms a hydrogen bond with Leu240 (dashed line, Figure S3(b)). In ginsenoside Rg5, the 6′-hydroxyl on the 3-glycosyl forms a hydrogen bond with Lys151, and the 6″-hydroxyl on the 3-glycosyl forms a hydrogen bond with Lys241 (dashed line, Figure S3(c)). In ginsenoside Rk3, the glycosyl group at position 6 forms hydrogen bonds with both Lys151 and Asp241 (dashed line, Figure S3(d)).

Figure 5: docking of the ginsenosides with EpCAM, panels (a)–(e).
4. Discussion
With the extensive application of gene chip technologies, abundant expression profile information, and screening of DEGs, biomarkers in tumor tissues can be detected efficiently by integrating publicly available datasets. Multidisciplinary cross-integration contributes to the rapid development of bioinformatics [25]. The GEO database collects large amounts of genomics data, such as gene chips, filters, and serial analyses of gene expression, submitted by researchers from all over the world [19]. The GEO database facilitates genetic studies and provides important online resources for the reintegration and in-depth exploration of gene expression data [26]. TCGA is a database of cancer gene information that includes gene expression data, cancer mutation profiles, and related clinical information [27]. These databases are valuable resources for studying the occurrence, development, and prognosis of cancer. In this study, we also integrate molecular docking, which can predict the degree of binding of small molecules to targets and facilitate the screening of active components in Chinese medicines. The screening of prognostic genes for ESCA has been addressed in previous studies [28]. In contrast, this study combines network pharmacology with bioinformatics to screen four ESCA targets. Moreover, deep mining of the TCGA clinical data indicates that the EpCAM gene may be a potential target for the early diagnosis and prognosis of ESCA.
The EpCAM gene encodes a protein of 314 amino acids with a relative molecular weight of 40 kDa. EpCAM is also regarded as a CSC marker [29, 30]. It has been reported that CSC subpopulations can initiate cancer development, promote cancer metastasis and drug resistance, and even lead to recurrence of ESCA [31]. Furthermore, the expression of EpCAM in ESCA patients is known to be dynamic. During tumor formation, high expression is associated with cell proliferation, whereas low expression is related to migratory and invasive phenotypes of ESCA cells [32]. These two points show that the expression of EpCAM provides the conditions necessary for the growth of ESCA. In the present study, high EpCAM expression in ESCA is observed, which is consistent with other findings on EpCAM expression in tumors. Distinct histologic grades and survival status are associated with EpCAM expression, which suggests a possible relationship between EpCAM expression and survival in ESCA. Collectively, our data demonstrate genome-wide transcriptional regulation by EpCAM and suggest target genes as biomarker candidates for EpCAM-associated ESCA.
Ginseng saponins are characterized by many positive biological activities, including antitumor, antiradiation, and antiaging activities [33, 34]. Modern studies have shown that ginsenosides have antitumor effects and relatively minor side effects. Ginsenoside Rh2 [15], ginsenoside Rg3 [16], ginsenoside Rk3 [17], and ginsenoside Rg5 [18] have been reported to play important roles in ESCA. Many studies have confirmed that the anticancer effect of ginsenosides is mainly related to cell apoptosis, cell autophagy, or cell cycle arrest. Ginsenoside Rh2 and ginsenoside Rg3, in particular, exhibit the strongest ability to inhibit cancer cells and have low toxicity to normal cells [35, 36]. In this study, ginsenoside Rg3, ginsenoside Rh2, ginsenoside Rg5, and ginsenoside Rk3 are found to form stable hydrogen bonds with the Lys151, Lys221, Leu240, and Asp241 residues, which increases the binding stability. At the binding sites, they are also subject to strong hydrophobic interactions with Leu233, Asp232, Leu242, Pro244, Val220, and Asp219. Two of these molecules, ginsenoside Rg3 and ginsenoside Rh2, have strong binding activity with EpCAM, with docking scores higher than 5.0. In ginsenoside Rg3, the 6′-hydroxyl on the 3-glycosyl forms a hydrogen bond with Lys151, and the 6″-hydroxyl on the 3-glycosyl forms a hydrogen bond with Lys221. The hydroxyl group at position 12 of the ginsenoside Rh2 glycoside framework forms a hydrogen bond with Leu240. The formation of hydrogen bonds plays an important role in the binding of ginsenoside Rg3 and ginsenoside Rh2 to EpCAM, as well as in the stability of the EpCAM conformation.
By integrating methods from multiple disciplines, this study shows that a high expression level of EpCAM is associated with poor overall survival of patients with ESCA. This finding is of great significance for the clinical diagnosis and treatment of ESCA and provides a theoretical basis for clarifying the role of EpCAM in ESCA. The study provides novel insights into the molecular mechanism of ESCA and may serve as a reference for clinical studies in ESCA.
5. Conclusion
This study screens four ESCA targets by combining network pharmacology with bioinformatics. Deep mining of the TCGA clinical data indicates that EpCAM may be a potential biomarker for the early diagnosis and prognosis of ESCA. Molecular docking indicates that ginsenoside Rg3 and ginsenoside Rh2 readily adopt favorable docking modes and have a high affinity for EpCAM. These results may also serve as a reference for the virtual screening of ginsenosides. There are some limitations in this study. Relevant clinical and basic experiments should be conducted in the future to verify the stability and practicability of the target.
Data Availability
The data used to support the findings of this study are included within the supplementary information file.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this manuscript.
Authors’ Contributions
Xin Yang and Yahui Li conceived and designed the research; Xin Yang and Haibing Qian analyzed the data; Xin Yang wrote the paper. All authors read and approved the final version of the manuscript.
Acknowledgments
This work was financed by the research projects on Science and Technology of Guizhou Province (Qian Kehe Platform Talents [2019] 1028), Guizhou Province Department of Education Youth Science and Technology Talent Growth Project (Guizhou Education KY word [2018] 211), Guizhou Provincial Administration of Traditional Chinese Medicine, Ethnic Medicine Science and Technology Research Topic (QZYY-2019-023), and Guizhou University of Traditional Chinese Medicine (20180007).
Supplementary Materials
Figure S1: MMP1, SPP1, and CRNN expression in ESCA. Figure S2: multivariate analysis of the correlation of EpCAM expression with OS among ESCA patients. Figure S3: structure of interaction between ginsenoside and EpCAM.