Abstract
Breast cancer is the most common malignancy in women. Lapatinib, an oral drug that dually targets HER2/neu and EGFR, is often limited by drug resistance in breast cancer patients, but the exact mechanism of acquired resistance to lapatinib is not fully understood, so studying this mechanism is crucial. We used bioinformatics and machine learning algorithms to screen and analyze pivotal genes for lapatinib resistance in breast cancer, with the aim of identifying biomarkers for diagnosing lapatinib resistance and assessing their correlation with immune cell infiltration. We downloaded lapatinib resistance-related breast cancer datasets from the GEO database, performed DEG screening and enrichment analysis, and then used the LASSO and support vector machine (SVM) algorithms to identify hub genes, which were validated in the test set. We also investigated the correlation between hub gene expression and survival, differences in immune infiltration, and the correlation between hub genes and immune cells. Four pivotal genes were finally screened: TTK, CENPN, UGCG, and AURKB. Their expression differed between normal and drug-resistant samples, all four were predictive of prognosis in lapatinib-resistant breast cancer patients, and the expression of TTK and CENPN was associated with tumor infiltration of immune cells. The four pivotal genes screened in this study will help to further explore the molecular mechanisms of lapatinib treatment resistance in breast cancer.
1. Introduction
Breast cancer is the most common malignant cancer among women of reproductive age worldwide, and its incidence increases with age [1]. In recent years, breast cancer incidence and mortality among Chinese women have been increasing at a rate of 3% per year, and statistics from 2020 show that breast cancer has replaced lung cancer as the most frequent tumor in the world, seriously threatening women’s life and health [2]. Long-term follow-up reveals that elderly breast cancer patients often die from chronic underlying diseases such as heart disease, lung disease, and cerebrovascular disease [3]. The cure rate for early-stage breast cancer is high, reaching 80% to 90%, but early-stage symptoms are not obvious and can be easily overlooked. Many breast cancer patients are already in the middle or late stage when diagnosed, which makes treatment more difficult and lowers the survival rate [4]. A major cause of cancer is the mutation of cancer driver genes, and the screening of breast cancer driver genes is of great significance for studying the pathogenesis of breast cancer, identifying effective treatment options, and developing new anticancer drugs. Scientific and effective prediction methods are therefore of great importance for the diagnosis of breast cancer, since early detection and timely treatment can effectively improve the chances of survival and reduce patients’ pain.
In breast cancer, 20% to 30% of patients have HER-2-positive expression in tumor tissue [5], and the prognosis of these patients is reported to be worse than that of other patients. Patients can choose treatment options such as lapatinib after failure of trastuzumab [6]. Lapatinib is an orally administered small-molecule tyrosine kinase inhibitor that dually targets the epidermal growth factor receptor (EGFR, also known as human epidermal growth factor receptor-1, HER-1) and human epidermal growth factor receptor-2 (HER-2) [7]. It was approved by the U.S. FDA for use in HER-2-positive advanced or metastatic breast cancer previously treated with trastuzumab [8]. However, clinical resistance to the drug is one of the main reasons for treatment failure, and its resistance mechanism is not yet clear.
In this paper, we rely on high-throughput gene expression profile data on lapatinib resistance in breast cancer to screen characteristic genes and support effective prediction of lapatinib-resistant breast cancer. We propose a LASSO-SVM-based method for screening feature genes. First, we use the feature selection capability of the LASSO method to obtain a streamlined subset of candidate genes. Second, we apply the SVM machine learning algorithm and use cross-validation to optimize its parameters and obtain an optimized SVM model. The streamlined gene subset obtained with the LASSO method is then combined with the SVM-based model, and the effectiveness and prediction accuracy of the approach are verified, in order to mine lapatinib resistance-related breast cancer hub genes and investigate their resistance mechanisms.
2. Materials and Methods
2.1. Data Sources and Processing
The data were retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) of the National Center for Biotechnology Information (NCBI) by entering the search formula “Breast cancer [Title] AND lapatinib” and searching for human series containing normal controls. Three microarray datasets (GSE16179, GSE38376, and GSE61756) were downloaded; GSE16179 and GSE38376 were used as the training set, and GSE61756 was used as the validation set.
2.2. Differentially Expressed Genes (DEGs) Screening
The microarray data (GSE16179 and GSE38376) were normalized and merged into one dataset. The gene expression profiles were analyzed using the “limma” package in R. DEGs were screened using the absolute fold change and a significance threshold as the cut-off criteria, and volcano plots and heat maps were produced.
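The cut-off step can be illustrated with a minimal Python sketch. It assumes that a table of log fold changes and adjusted P values has already been computed (for example, by limma); the `GeneStat` type, the `screen_degs` function, the gene names, and the threshold values are hypothetical placeholders and are not the values used in this study.

```python
from typing import Dict, List, NamedTuple


class GeneStat(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log_fc: float   # log fold change, resistant vs. control
    adj_p: float    # multiple-testing adjusted P value


def screen_degs(stats: List[GeneStat],
                log_fc_cutoff: float,
                p_cutoff: float) -> Dict[str, List[str]]:
    """Split genes into up- and downregulated DEGs using two thresholds."""
    up, down = [], []
    for s in stats:
        if s.adj_p >= p_cutoff:
            continue                      # not significant
        if s.log_fc >= log_fc_cutoff:
            up.append(s.gene)             # upregulated
        elif s.log_fc <= -log_fc_cutoff:
            down.append(s.gene)           # downregulated
    return {"up": up, "down": down}


# Toy table with made-up genes and values, for illustration only.
toy_stats = [
    GeneStat("GENE_A", 2.1, 0.001),
    GeneStat("GENE_B", -1.7, 0.010),
    GeneStat("GENE_C", 0.3, 0.0005),   # significant but small change
    GeneStat("GENE_D", 2.5, 0.200),    # large change but not significant
]

# Placeholder thresholds; the study's actual cut-offs are not assumed here.
toy_degs = screen_degs(toy_stats, log_fc_cutoff=1.0, p_cutoff=0.05)
```

A gene must pass both thresholds to be retained, which is why GENE_C and GENE_D are dropped in the toy example.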
2.3. Screening of Hub Genes by LASSO Regression and SVM
The “glmnet” package in R was used to run the LASSO algorithm, and the genes at the point of minimum cross-validation error were taken as the LASSO-screened genes [9]. The “kernlab” and “caret” packages were used in the same way to obtain the genes screened by the SVM algorithm at the point of minimum cross-validation error. The “venn” package was used to obtain the intersection of the genes screened by the two algorithms, and the “ggplot2” package was used to plot the graphs [10].
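To make the feature selection idea behind LASSO concrete, the following is a minimal sketch of coordinate descent for an L1-penalized least-squares model. It is not the glmnet implementation: it assumes a squared-error loss, centered features and response, and a fixed penalty `lam` rather than one chosen by cross-validation. The function names and the toy data are hypothetical.

```python
from typing import List


def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero by penalty; return 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float],
                             lam: float, n_iter: int = 200) -> List[float]:
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant (all-zero) column carries no signal
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: three mutually orthogonal, centered features; only feature 0
# has a strong effect on the response (values are made up).
X_toy = [[1.0, 1.0, 1.0],
         [1.0, -1.0, -1.0],
         [-1.0, 1.0, -1.0],
         [-1.0, -1.0, 1.0]]
y_toy = [2.1, 1.9, -1.9, -2.1]

beta_toy = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
selected = [j for j, b in enumerate(beta_toy) if b != 0.0]
```

Features whose coefficients are shrunk exactly to zero are discarded, so the nonzero entries of `beta_toy` play the role of the LASSO-screened genes. In practice the penalty would be chosen by cross-validation, as described above.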
2.4. Pathway Function Analysis
The differential genes obtained from the training set were analyzed using the “stringi,” “clusterProfiler,” “ggplot2,” and “enrichplot” packages in R. GO enrichment analysis covered three categories: cellular component (CC), biological process (BP), and molecular function (MF). KEGG and GSEA analyses were performed with the same packages.
2.5. Prognostic Survival Analysis
The Kaplan-Meier Plotter online database (http://kmplot.com/analysis/) was used for prognostic survival analysis of hub genes, and the best cut-off values were selected to plot overall survival (OS) and relapse-free survival (RFS) curves. The test level was .
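For readers unfamiliar with the estimator behind these curves, the following minimal sketch computes a Kaplan-Meier survival curve in Python. The study itself used the online Kaplan-Meier Plotter, not this code; the function name and the follow-up data are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times  : follow-up time of each patient
    events : 1 if the event (death or relapse) was observed, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Events observed exactly at time t.
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Made-up follow-up data: five patients, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

To compare high- and low-expression groups, the same function would be applied separately to the patients above and below the chosen expression cut-off.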
2.6. Receiver Operating Characteristic (ROC) Curve Analysis
Based on the expression of TTK and CENPN in the training datasets GSE16179 and GSE38376, ROC curves were plotted in R using the “pROC” package. Test ROC curves were plotted in the same way for the validation dataset GSE61756, to assess how accurately gene expression differentiates disease status.
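The area under the ROC curve can be illustrated with a short Python sketch based on its rank interpretation: the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. This is not the pROC implementation; it assumes that higher scores indicate the positive class, and the function name and toy values are hypothetical.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up expression scores: label 1 = resistant, label 0 = control.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

For a gene that is lower in the resistant group, the scores would be negated (or the labels swapped) before calling the function.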
2.7. Immune Cell Infiltration Analysis
The “pheatmap” and “vioplot” packages in R were used to visualize immune cell infiltration in breast cancer tissue and normal tissue. The “limma,” “reshape2,” “ggpubr,” and “ggExtra” packages were used to visualize the relationship between resistance-associated tissues and normal tissues and to analyze the correlation between TTK, CENPN, and immune cells.
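The gene-to-immune-cell correlation step can be sketched as follows. The sketch assumes a Spearman rank correlation, which is a common choice for this kind of analysis but is not stated in the text; the function names and the toy values are hypothetical.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: gene expression vs. estimated fraction of one immune cell type.
toy_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_fraction = [0.30, 0.25, 0.27, 0.10, 0.05]
toy_rho = spearman(toy_expression, toy_fraction)
```

A negative coefficient, as in the toy example, means that samples with higher expression of the gene tend to have a lower estimated fraction of that immune cell type.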
3. Results
3.1. Screening of Differentially Expressed Genes in Lapatinib-Resistant Breast Cancer Tissues
A total of 14 breast cancer samples after conventional lapatinib treatment and 16 lapatinib-resistant breast cancer samples were collated from the GEO datasets GSE16179 and GSE38376. Differential analysis yielded 115 DEGs, of which 55 were downregulated and 60 were upregulated (Figures 1(a) and 1(b)).

Figure 1: panels (a) and (b).
3.2. Enrichment Analysis of Differential Genes
The GO enrichment analysis showed that, in BP, the DEGs were mainly enriched in chromosome segregation, nuclear division, and organelle fission. In MF, the enrichment was mainly in tubulin binding, ATPase activity, and microtubule binding, and in CC, the enrichment was mainly in spindle, chromosomal region, chromosome, centromeric region, and other cellular components (Figures 2(a) and 2(b)). The KEGG signaling pathway enrichment analysis revealed that the DEGs were mainly enriched in the cell cycle and progesterone-mediated oocyte maturation signaling pathways (Figures 3(a) and 3(b)). The GSEA enrichment analysis showed that the cell cycle, DNA replication, oocyte meiosis, progesterone-mediated oocyte maturation, spliceosome, arachidonic acid metabolism, endocytosis, lysosome, peroxisome, and valine, leucine, and isoleucine degradation pathways were enriched (Figures 4(a) and 4(b)).

Figure 2: panels (a) and (b).
Figure 3: panels (a) and (b).
Figure 4: panels (a) and (b).
3.3. Screening of Hub Genes
The genes with minimal cross-validation error in the training datasets GSE16179 and GSE38376 were identified by the LASSO regression algorithm, and 8 genes were screened: TTK, CENPN, ATP6V1B1, C6orf173, UGCG, AURKB, C1orf64, and S100A8 (Figure 5(a)). The SVM machine learning algorithm was likewise used to identify the genes with the smallest cross-validation error in the training datasets, and 36 feature genes were screened, including CENPN, CCDC99, TTK, KIF20A, PLK4, and UGCG (Figure 5(b)). The hub genes TTK, CENPN, UGCG, and AURKB were obtained by constructing a Venn diagram to take the intersection of the two algorithms (Figure 6(a)). TTK, CENPN, UGCG, and AURKB were then validated in the dataset GSE61756, and their expression differences were tested for statistical significance. The expression differences of TTK and CENPN in the validation set were statistically significant. UGCG was expressed at a high level in the lapatinib-resistant breast cancer group, whereas TTK, CENPN, and AURKB were expressed at low levels (Figures 6(b)–6(e)).

Figure 5: panels (a) and (b).
Figure 6: panels (a)–(e).
3.4. Clinical Prognostic Analysis
In the training datasets GSE16179 and GSE38376, the area under the curve (AUC) values for TTK, CENPN, UGCG, and AURKB were 0.982, 0.973, 0.871, and 0.875, respectively. In the validation dataset GSE61756, the corresponding AUC values were 0.735, 0.751, 0.644, and 0.827, indicating that the four genes have good diagnostic potential for lapatinib-resistant breast cancer. In addition, the OS and RFS survival curves obtained with the Kaplan-Meier method are shown in Figures 7 and 8. Evaluation of the prognostic value of the hub genes showed that the expression of TTK, CENPN, UGCG, and AURKB in breast cancer was strongly associated with OS and RFS: patients with high expression of UGCG had a good survival prognosis, whereas those with high expression of TTK, CENPN, and AURKB had a poor survival prognosis.

Figure 7: panels (a)–(h).
Figure 8: panels (a)–(h).
3.5. Immune Infiltration Analysis
Based on the gene expression matrix, the abundance of 22 immune cell types in the training datasets GSE16179 and GSE38376 was calculated using R. Only mast cells resting showed a statistically significant difference, with a higher level in the experimental group than in the control group. However, the figure shows that B cells naive, plasma cells, T cells CD8, T cells CD4 memory resting, NK cells resting, NK cells activated, monocytes, macrophages M0, dendritic cells resting, mast cells activated, and neutrophils were more abundant in controls, while B cells memory, T cells CD4 naive, T cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), T cells gamma delta, macrophages M1, macrophages M2, dendritic cells activated, mast cells resting, and eosinophils were more abundant in the lapatinib-resistant group (Figures 9(a) and 9(b)). We further analyzed the correlation between the four genes (TTK, CENPN, UGCG, and AURKB) and immune cells. The results showed that CENPN was negatively correlated with dendritic cells activated and mast cells resting, and TTK was negatively correlated with mast cells resting (Figures 9(c)–9(e)).

Figure 9: panels (a)–(e).
4. Discussion
In this study, we first performed DEG screening and enrichment analysis and obtained 115 DEGs, including 60 upregulated genes and 55 downregulated genes. GO analysis showed that the DEGs were involved in chromosome segregation, nuclear division, organelle fission, tubulin binding, ATPase activity, microtubule binding, spindle, chromosomal region, and chromosome, centromeric region, and KEGG pathway enrichment analysis showed enrichment in the cell cycle and progesterone-mediated oocyte maturation signaling pathways. The cell cycle is noteworthy here. In normal cell cycle regulation, specific regulatory molecules must be rapidly degraded at specific moments so that DNA replication, mitotic progression, maintenance of G1 phase, and cytoplasmic division can be completed, and a disruption at any stage of the process can eventually produce malignantly proliferating cells and even lead to tumorigenesis. Its role in a variety of cancers has been demonstrated. Interestingly, the GSEA enrichment analysis also highlighted processes such as cell cycle and DNA replication, suggesting an association between cell cycle checkpoint genes and lapatinib resistance in breast cancer.
To our knowledge, this is the first study to combine the LASSO and SVM algorithms to identify biomarkers of lapatinib resistance-related breast cancer and validate them. We finally screened four hub genes: TTK, CENPN, UGCG, and AURKB. The AUC values of the four hub genes were >0.870 in the training set but fell as low as about 0.64 in the validation set, indicating that the constructed model performs relatively robustly on the training data but that its testing performance needs to be improved. TTK is a hub gene that regulates mitotic checkpoints and chromosome attachment, and elevated levels of TTK can lead to centrosome enlargement and chromosome instability, which are closely associated with tumorigenesis and poor prognosis [11]. Triple negative breast cancer (TNBC) is characterized by the absence of estrogen receptor (ER), progesterone receptor (PR), and HER-2 and is more aggressive than other breast cancer subtypes [12]. Genomic and proteomic analysis of human breast cancer samples has found higher levels of TTK mRNA and protein expression, especially in samples from TNBC patients with poor prognosis [13]. King et al. investigated the role of TTK in TNBC and found that high TTK expression was associated with the mesenchymal and proliferative phenotype of TNBC cells; they also examined pharmacological inhibition and gene silencing of TTK [14]. Although the involvement of TTK in drug resistance mechanisms has been demonstrated in various malignancies, such as lung cancer [15], ovarian cancer [16], and TNBC [17], there are no reports related to lapatinib specifically. CENPN is a component of the mitotic nucleosome-associated complex and plays a central role in kinetochore assembly, mitotic progression, and chromosome segregation [18].
It has been reported that high expression of CENPN severely affects the prognosis of breast cancer patients and is detrimental to survival, especially in patients with recurrent breast cancer and a history of smoking [19]. In addition, CENPN expression can trigger the PI3K/Akt/mTOR signaling pathway, which may affect DRFS in breast cancer patients and modulate tumor proliferation [20]. Reports on the role of CENPN in drug resistance have so far described only its involvement in platinum resistance in ovarian cancer [21], and the relevance of CENPN to drug resistance in breast cancer remains to be confirmed. UGCG can reduce the level of ceramide in vivo, allowing cells to evade ceramide-induced apoptosis, and it acts as a common drug resistance gene [22]. UGCG has been implicated in drug resistance in leukemia [23], mesothelioma [24], and colorectal cancer [25], but its mechanism of drug resistance in breast cancer has not been studied. However, UGCG has been shown to be involved in metabolic activities such as glycolysis [22] and glutamine metabolism [26] in breast cancer, thereby promoting cancer development and progression. AURKB is a serine/threonine kinase that regulates chromosome condensation, affects bipolar spindle formation and cytoplasmic division, and is closely related to tumor pathological processes [27]. In breast cancer cells, AURKB expression decreases DTL/RAMP phosphorylation and reduces DTL/RAMP protein stability [28]. Aberrant expression of AURKB has been reported to be associated with poor prognostic outcome in breast cancer [29]. Although no relationship between AURKB and lapatinib resistance has been found, it has been reported that AURKB phosphorylation leads to a PRKCE/AURKB/RAB27B secretion axis that regulates paclitaxel resistance in breast cancer [30].
The tumor microenvironment (TME) is highly complex and has been shown to be an important source of potential therapeutic targets, and a growing body of literature suggests that the TME plays a pivotal role in tumor progression and therapeutic response [31]. For example, immunotherapy efficacy and chemotherapy benefit can be predicted from the cellular components of the TME at the time of diagnosis [32, 33]. Unfortunately, although we found differences in the distribution of immune cells between the control and resistant subgroups, none of them was statistically significant except for mast cells resting. We believe there are two reasons for this. First, the current datasets on lapatinib resistance in breast cancer are small and contain a limited number of samples. Second, the SVM and LASSO algorithms further narrowed the set of genes available for follow-up. Even so, we suggest that the distribution of immune cells may be related to lapatinib resistance genes in breast cancer.
The data in this study were obtained from the GEO database, which has some limitations in terms of samples, and the findings have so far been validated only in bioinformatics data. Further validation at the mRNA and protein levels, in cellular experiments and in clinical samples from multiple study centers, is needed to confirm whether the expression of these genes differs significantly. The TTK and CENPN genes screened by bioinformatics analysis in this study may provide clues for the early diagnosis and prognosis of lapatinib-resistant breast cancer and may serve as a theoretical basis for further insight into the mechanism of lapatinib resistance in breast cancer.
In conclusion, this study used bioinformatics methods to mine and analyze gene chip data on lapatinib resistance in breast cancer. The results help to further explore the molecular mechanism of lapatinib resistance, although experimental verification is still needed to realize the clinical value of these genes.
Data Availability
The datasets used and/or analyzed during this study are available from the corresponding author on reasonable request.
Conflicts of Interest
The authors declare that they have no competing interests.
Authors’ Contributions
Ruihua Fan and Yaning Zhu contributed equally to this study and are co-first authors.
Acknowledgments
This work is supported in part by the Huai’an International Collaborative Project (No. HAC201621).