Abstract

Objective. Breast invasive carcinoma (BRCA), as a systemic disease, is currently the most malignant tumor among women. Early detection of BRCA will increase the probability of cure. Pyrimidine metabolism (PyM) stands for an essential metabolic pathway related to DNA replication of cancer cells, which may also serve as a diagnostic marker and therapeutic target. Therefore, the aim of this research is to discover a prognostic signature associated with PyM for BRCA. Methods. The BRCA mRNA sequencing data along with microarray data were obtained based on The Cancer Genome Atlas (TCGA) database. In addition, 4 PyM-related gene sets were profiled through gene set enrichment analysis (GSEA); it revealed the core genes differentially expressed in cancer and paracancerous tissue. Thereafter, genes were subjected to univariate as well as multivariate regression for constructing an mRNA signature to independently predict BRCA prognosis. Then, the Kaplan-Meier (KM) curve was applied for validation. The prognostic power of the signature was verified against the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) database. Results. We constructed a three-mRNA (RRM2B, NME3, and POLD2) gene signature related to PyM to predict overall survival (OS) for BRCA. The as-constructed gene signature was adopted to classify cases as high- or low-risk group, identifying patients with BRCA with poor prognosis. Additionally, the risk score obtained using our constructed 3-mRNA prognosis signature is independent from other clinical variables. Conclusion. Our findings suggested that PyM-related mRNA signature might be a combined prognostic biomarker for BRCA and can provide important reference that are useful for individualized treatment for BRCA patients.

1. Introduction

Nucleotide metabolism is a critical pathway that generates purine and pyrimidine molecules for DNA replication, RNA synthesis, and cellular bioenergetics [1]. The increasing metabolism of nucleotides facilitates out-of-control tumor growth, which serves as a cancer hallmark [2]. There has been an explosion of knowledge in disorders of pyrimidine metabolism during the last 20 years [3]. Pyrimidines have long been considered building blocks in synthesizing nucleic acids as well as intermediates for metabolic energy transfer. Growing attention has been paid to pyrimidines since their genetic alterations in metabolism are associated with different symptoms (like immunodeficiency, hyperuricemia, and even neurological disorders) [4]. PyM represents a complicated enzyme network, which combines salvage and de novo synthesis of nucleotides, along with catalytic pyrimidine degradation [5]. PyM contains 3 related pathways shown below: (1) free base and nucleoside salvage, (2) de novo synthesis based on ribose precursors and amino acids, and (3) excessive nucleoside and nucleotide catabolism [6]. Early success in cancer metabolism took advantage of this characteristic by making cancer cells vulnerable to inhibition of this pathway [7]. The importance of intact pyrimidine pathways in human physiology and their upregulation in malignancy [8] makes them ideal targets for pharmacological interventions. Agents inhibiting the synthesis and incorporation of nucleotides in DNA are widely used as chemotherapeutics to reduce tumor growth, cause DNA damage, and induce cell death [9]. Heidelberger and colleagues designed fluorinated uracil-based pyrimidine analogues, which disrupted tumor DNA biosynthesis and which are to this day used to treat colorectal and breast cancer [10, 11].

BRCA is a malignant tumor in which cancer cells have penetrated the basement membrane of the breast ducts or lobular acinars and invaded the interstitium. The vast majority of breast invasive carcinoma is adenocarcinomas, which originate from the parenchymal epithelial cells of the breast, especially the peripheral ductal lobular units of the breast [12]. Ninety-nine percent of BRCA patients are women, so this disease is the malignant tumor with the highest incidence among women [13]. Among the known risk factors for BRCA, in addition to age factors, individual family history, menstrual history, pregnancy history, and benign breast lesions are all closely related to the risk of breast cancer [14]. Early detection and early diagnosis of this disease is the key to improving the efficacy and can significantly extend the survival period [15]. Clinically, there is always a need for better or alternative methods to identify people at risk of cancer [16]. Previous studies have found many prognostic biomarkers in patients with BRCA [17]. However, there is little research on the systematic study of metabolic status as well as prognostic significance among tumor cases, in particular for research associated with PyM, and this is possibly a novel point cut in our study. Therefore, in this study, we are trying to exploit the gene signature associated with PyM in BRCA.

This work carried out GSEA for identifying the gene sets associated with PyM to differentiate clinical as well as molecular parameters for BRCA. We developed a PyM-related prognostic signature (RRM2B, NME3, and POLD2) with whole genome expression data from TCGA database. Surprisingly, the local PyM-related risk signature could independently classify patients with BRCA with a high risk of unfavorable outcome. Then, the prognostic power of the signature was validated in the METABRIC database. Our finding provides important references to understand the mechanism of PyM and to develop an individualized treatment for BRCA patients.

2. Materials and Methods

2.1. Collection of Gene Expression Data and Patient Clinicopathological Parameters

The whole mRNA expression data and corresponding clinical parameters of BRCA were extracted from TCGA (http://cancergenome.nih.gov/) and METABRIC database. METABRIC is a Canada-UK joint project that is aimed at further classifying breast tumors based on molecular characteristics that help determine the best course of treatment. Altogether, 1108 BRCA cases together with 113 normal subjects who had matched clinical characteristics were obtained from TCGA. Table 1 shows the clinical characteristics of all participants. And we collected 1892 BRCA samples from the METABRIC database.

2.2. Functional and Pathway Enrichment Analysis

GSEA (http://www.broadinstitute.org/gsea/index.jsp) can be applied to examine the significance of gene set-derived genomes in two gene expression data groups [18]. We discovered three PyM-related gene sets on the GSEA website, which were called GO_PYRIMIDINE_CONTAINING_COMPOUND_CATABOLIC_PROCESS, GO_PYRIMIDINE_CONTAINING_COMPOUND_BIOSYNTHETIC_PROCESS, KEGG_PYRIMIDINE_METABOLISM, and GO_PYRIDINE_CONTAINING_COMPOUND_METABOLIC_PROCESS in Molecular Signatures Database v4.0 (http://www.broadinstitute.org/gsea/msigdb/index.jsp). Gene sets were determined by the corrected value () in subsequent analysis. Thereafter, we also screened core genes (core enrichment: yes) in subsequent analysis. Later, the above screened genes were subjected to functional enrichment analysis by the bioinformatics approach Metascape (http://metascape.org). The aim of Metascape is to develop a set of reliable, productive, and intuitive tools that help the biomedical research communities to analyze gene/protein lists and make better data-driven decisions [19].

2.3. Establishment and Confirmation of a Prognostic Signature

Figure 1 displays the flowchart of the present work. In this study, we used the univariate Cox model to calculate the association of every screened PyM-related mRNA expression with patient OS. After that, the multivariate Cox analysis method was used to evaluate the weight of mRNA, and the prognostic gene from the previous step was further analyzed and confirmed as a factor to independently predict prognosis. Afterwards, we determined the risk scores for all BRCA cases according to mRNA expression as well as the regression coefficients acquired upon multivariate Cox regression. . In addition, R package was utilized for exploring the relationship between risk scores and OS. Thereafter, the median risk score value was adopted as the threshold for classifying 1108 BRCA cases as a high- or low-risk subgroups. KM curves were used for survival analysis of single genes.

2.4. Statistical Analysis

GraphPad Prism 7 and SPSS 16.0 were utilized for statistical analysis. At the same time, for prognostic genes in BRCA, their genetic alterations were determined using the cBioPortal web software (http://www.cbioportal.org/). The chi-square test was used to demonstrate the relationship between risk score and clinical parameters.

3. Results

3.1. Genes from the PyM-Related Gene Sets Show Significant Differences between Adjacent Cancer Samples and Tumor Samples

For BRCA, the mRNA expression profiles, together with matched clinical characteristics, were acquired with TCGA. We discovered four PyM-related gene sets on the GSEA website, which were called GO_PYRIMIDINE_CONTAINING_COMPOUND_CATABOLIC_PROCESS, GO_PYRIMIDINE_CONTAINING_COMPOUND_BIOSYNTHETIC_PROCESS, KEGG_PYRIMIDINE_METABOLISM, and GO_PYRIDINE_CONTAINING_COMPOUND_METABOLIC_PROCESS. First, we used GSEA to explore whether the genomes from the PyM-related gene sets show significant differences between the two groups of gene expression data. We found that only KEGG_PYRIMIDINE_METABOLISM gene set differs significantly between adjacent cancer samples and BRCA samples (normalized value = 0.002 < 0.05) (Table 2, Figure 2(a)). Next, the core gene from the abovementioned gene set was screened, that is, the gene that has made the main contribution to the enrichment score of the gene set. We then selected 61 core genes for further analysis.

The Metascape Bioinformatics Tool was utilized for functional analysis of core genes to verify the above conclusion. The histogram and network diagram showed that the most enriched KEGG pathway was pyrimidine metabolism, suggesting that the core genes are indeed associated with PyM (Figure 2(b)).

3.2. Identification of PyM-Related Genes Associated with Prognosis in BRCA

According to the overall design and flow diagram of this study, we used the univariate Cox model to calculate the relationship between the expression levels of 61 selected PyM-related mRNAs and the patient’s OS. It was found that there are 19 prognostic mRNAs in patients with BRCA. In total, 3 mRNAs (RRM2B, POLD2, and NME3) were selected upon multivariate Cox regression analysis to be the independent prognostic models () (Table 3, Figure 3(a)). Afterwards, those chosen mRNAs were divided into risk (RRM2B and POLD2, hazard ratio: ) or protective (NME3, ) subtype.

Thereafter, alterations of those 3 screened mRNAs within BRCA were examined using cancer samples derived from the cBioPortal database. As a result, RRM2B had 16% cases of gene mutation, including gene amplification, missense mutation, and deep deletion; there were 5% of gene mutations in NME3, including gene amplification and deep deletion; 1.3% cases of POLD2 had gene mutation, including gene amplification and missense mutation (Figure 3(b)).

Expression levels of these 3 genes in BRCA and matched noncarcinoma tissues were differentially analyzed. As a result, the 3 genes in breast invasive cancer tissues were significantly upregulated (, Figure 3(c)).

3.3. Establishment and Confirmation of a Prognostic Signature

Then, the risk scores for all BRCA patients were determined according to the 3 prognostic mRNA expression as well as the regression coefficient () acquired through multivariate Cox regression. . The unique value of the risk score for each BRCA patient in the dataset can be calculated and was ranked in increasing order (Figure 4(a)). Thereafter, the median risk score was utilized to be the threshold for classifying 1108 BRCA cases as a high- or low-risk subgroups. As shown by KM curve analysis, high-risk patients had a dismal prognosis related to low-risk counterparts (Log-rank, Figure 4(c)). Figure 4(b) displays the risk score, OS (years) together with life status for 1108 cases from the dataset. As observed, a greater risk score indicated the worse survival status and shorter survival time of patients.

In addition, the chi-square test was adopted for revealing the association of risk score with clinical characteristics (Table 4), implying that a higher risk score was associated with T (tumor), N (node), and ER (estrogen receptor) status by IHC (immunohistochemistry) ().

3.4. The Three-mRNA Prognostic Signature Is Robust in BRCA Patients

The constructed 3-mRNA signature was used in the validation set including 1892 BRCA samples from the METABRIC database to validate its prediction ability. In the validation set, we used the same risk prediction model to calculate the risk score of each patient with BRCA and divided them into high-risk and low-risk subgroups using the median risk score. Conforming to prior results, high-risk patients showed markedly reduced survival time relative to low-risk patients in the validation set (Log-rank value < 0.0001; Figure 5(c)). In addition, Figures 5(a) and 5(b) show the distribution of risk scores, life status, and survival time of 1892 BRCA patients. The above findings suggested that our constructed 3-mRNA signature contributed to the effective prediction of BRCA prognosis.

For verifying the superior effectiveness of the constructed 3-mRNA signatures on the single genes that make them up, we validated them through KM analysis. The results showed that when these 3 genes are used as an independent biomarker individually, the ability to predict the patient’s survival is lower than the 3-mRNA signature (Log-rank ) (Figure 5(d)).

3.5. The Three-mRNA Signature Is an Independent Prognostic Indicator in BRCA Patients

For assessing the independence of our 3-mRNA signature-derived risk score from other clinical variables, we carried out univariate as well as multivariate Cox regression analysis. As shown in Figure 6(a), the distributions of diverse clinical factors in each subject were analyzed. First of all, upon univariate Cox regression, age, PR status, HER-2 status, ER status, M, N, stage, and risk score were significantly related to patients’ survival with values less than 0.05 (Figure 6(b)). Moreover, multivariate Cox regression analysis showed that the risk score generated from the 3-mRNA signature was an independent prognostic indicator, after adjusting for N (Figure 6(c)). And the risk score is the most robust parameter predicting the prognosis of patients with BRCA, because the probability of death in patients with high risk is 2.995 times that of patients with low risk.

In addition, a stratified analysis was carried out based on these clinical characteristics to determine the appropriate patient group for the risk prediction model. It turns out that the risk score is still remaining with the ability to predict OS within each subgroup of age, N, and HER-2 status (Figures 7(a), 7(b), and 7(d)). However, the risk score is more suitable for the subgroup of M0 (Figure 7(c)), which suggests that BRCA may be diseases that need further explanation.

4. Discussion

Breast cancer is one of the most common cancers among women, and susceptibility is explained by genetic, lifestyle, and environmental components [20]. BRCA is a major type of breast cancer; the main feature is that the tumor of this breast cancer infiltrates nearby tissues and has an obvious tendency to metastasize so far [21]. It is still a challenge to detect BRCA early. Therefore, creditable diagnostic approaches that attain high accuracy in prediction of the least genes should be developed to detect BRCA earlier [22]. Thanks to technological development, an increasing number of biomarkers are identified for the effective prediction of BRCA prognosis [23]. For instance, BRCA1 and BRCA2, the two primary BRCA suppressor genes, have been discovered in the 1990s [24]. Breast tumors carrying BRCA1 mutants are linked to basal-like and triple-negative phenotypes [25], but those with BRCA2 mutations are generally of the luminal subtype [26]. lncRNA OIP5-AS1 promotes breast cancer progression by regulating miR-216a-5p/GLO1 [27].CLIC2 is a useful biomarker for identifying breast cancer patients who could benefit from immune checkpoint blockade [28]. In addition, serum proteomics can be used in combination with bioinformatics analysis for the early detection of BRCA [29]. RS/DJ-1, the PTEN regulator [30], has been identified as the circulatory antigen discovered in serum samples of 37% new BRCA cases, rather than from normal subjects [31]. It can be seen that the methods for screening biomarkers are becoming increasingly diversified, and bioinformatics methods have gradually become a new attempt for us.

In recent years, the energy metabolism of tumors has become an indispensable research hotspot. Different from resting cells, tumor cells continuously supply deoxyribonucleoside triphosphates (dNTPs) resting in the de novo pathway, which thus facilitates the out-of-control tumor growth [1]. PyM represents a part of nucleotide metabolism to generate deoxy/ribonucleotides and nucleosides of pyrimidine bases (including uracil, thymine, and cytosine) [32]. Notably, the deoxyribonucleotide pool necessary for the proliferation of cells can be generated based on purine metabolism [33]. The persistent dNTP supply plays a crucial role in maintaining cancer cell survival [7]. Therefore, the permanent activation of the PyM gene from the beginning is necessary for growing tumors. KRAS drives tumor growth in pancreatic cancer by activating PyM [34]. Additionally, tumor cells are able to utilize diverse mechanisms for activating PyM genes and desensitizing the feedback regulatory pathway, thus resulting in allosteric suppression and maintaining the persistent cell nitrogen flow into the pathway producing dNTP along with ribonucleotide phosphate [35].

GSEA was performed in the present work for identifying gene sets associated with PyM to differentiate those clinical as well as molecular parameters for BRCA. In addition, the BRCA mRNA sequencing data along with the microarray data were acquired on TCGA database. Moreover, we applied the prognostic signature related to PyM (RRM2B, NME3, and POLD2) to predict OS for BRCA using Cox regression analysis. The as-constructed gene signature was adopted for classifying cases as high- or low-risk subgroups, identifying patients with BRCA with poor prognosis. Then, the METABRIC database was used to validate the signature prognosis prediction ability. Surprisingly, our prognostic model performed well on the KM analysis in 1892 BRCA patients. Stratification analysis indicated that the 3-mRNA signature-based risk score might independently predict the prognosis of each subgroup of age, N, and HER-2 status. However, the risk score is more suitable for the subgroup of M0 rather than M1, that is, the prognostic signature we selected is more suitable for BRCA that had not undergone distant metastasis, which suggests that our 3-mRNA signature may have great significance for the early diagnosis of BRCA. These findings suggested that our constructed 3-mRNA signature might serve as a biomarker for BRCA cases. Yet the low sample size was its limitation.

5. Conclusions

The present work first proposes the use of 3-mRNA signatures associated with PyM using bioinformatics methods for the prognosis of BRCA. In our study, cases who had high risk scores were found to show dismal prognostic outcome. The constructed 3-mRNA signature can be used as a prognostic marker for BRCA irrespective of additional clinicopathological factors. We believe that bioinformatics methods can be well combined with the early detection of breast cancer and can provide general guidance for the future application of molecular medicine combined biomarkers in other diseases. In the future, we will follow up with biological experiments and verify these biomarkers with our collaborators.

Abbreviations

BRCA:Breast invasive carcinoma
PyM:Pyrimidine metabolism
TCGA:The Cancer Genome Atlas
GSEA:Gene set enrichment analysis
KM:Kaplan-Meier
METABRIC:Molecular Taxonomy of Breast Cancer International Consortium
OS:Overall survival
IHC:Immunohistochemistry.

Data Availability

The datasets generated and analyzed during the current study are available in TCGA (http://cancergenome.nih.gov/) and METABRIC (https://ega-archive.org/dacs/EGAC00001000484) database.

Consent is not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

XXZ, ZYP, QZ, and YL collected the data; XXX performed the statistical analysis; XXZ wrote the manuscript; LYJ designed the study. All authors read and approved the final manuscript.

Acknowledgments

We thank all the participants who were involved in this study. This work was supported by the National Key Research and Special Development Project of China (Grant/Award Number: 2016YFC1303002).