Abstract

Immune system dysregulation is associated with tumor incidence and growth. Here, we established an RNA-based individualized immune signature associated with prognosis for nonsmall cell lung cancer (NSCLC) to guide adjuvant therapy. We downloaded publicly accessible data on RNA expression and clinical characteristics of NSCLC from the Cancer Genome Atlas (TCGA). From immune-related genes (IRGs) retrieved from the immunology database and analysis portal (ImmPort) database, we then screened differentially expressed immune-related genes (DEIRGs). Using overall survival (OS) as a clinical endpoint, we identified 26 prognostic DEIRGs via univariate and multivariate Cox regression analysis, and then developed a risk model based on these 26 IRGs with an area under the curve (AUC) of 0.701, and its predictive ability independent from other clinical factors. We also downloaded tumor immune infiltrate data and analyzed the correlations between lymphocytic infiltration with our risk scores, but found no significant association. Furthermore, we retrieved 86 differentially expressed transcription factors (TFs) to assess their regulatory relationships with the 26 prognostic DEIRGs. In summary, we developed a robust risk model to predict survival in patients with NSCLC, based on the expression of 26 IRGs. It provides novel predictive and therapeutic molecular targets.

1. Introduction

Worldwide, lung cancer is the most commonly diagnosed cancer (~11.6%) and accounts for the most cancer-related deaths (~18.4%). Due to the lack of characteristic early symptoms, almost 70 percent of lung cancer patients have developed the locally advanced or metastatic disease at the time of diagnosis. About 85% of lung cancers are NSCLC and have a poor prognosis; 5-year OS remains low (15%) across all stages [1]. Mortality and morbidity for NSCLC are similar, which indicates that its treatment is unsatisfactory, and has room for improvement.

Cancer immunotherapy is a personalized modality that leverages the immune system to combat tumors [2, 3]. It has shown long-term survival benefits; the 5-year CheckMate-003 follow-up study improved the 5-year survival rate of patients with advanced NSCLC from 4.9% to 16% [4]. In recent decades, advances in immunotherapy for numerous cancer types have become clinically available [5, 6]. However, as not all patients can benefit from these therapies, accurate screening for suitable candidates is the focus of much current research. Some biomarkers have proved useful in predicting patient survival and disease prognosis [7]. Programmed cell death protein-1 expression is the cornerstone for immunotherapy prediction. Combined with other indicators, it can help identify immunotherapy candidates.

Signatures based on IRGs have been explored in numerous studies to help stratify lung cancer patients’ prognoses. Li et al. notably improved prognostic estimations among patients with nonsquamous NSCLC by establishing individualized immune signatures [8]; however, their findings were only applicable to patients with early-stage disease. Other studies have focused on lung adenocarcinoma [9, 10] or lung squamous cell carcinoma [11]. However, studies of its application to NSCLC have been sparse.

To understand further IRGs’ clinical roles in NSCLC, including their prognostic significance and possible applications as targets for therapy, we developed an individualized prognostic risk model for NSCLC that relies on IRG transcription expression levels.

2. Materials and Methods

2.1. Data Acquisition from Public Databases

We downloaded data on NSCLC samples for transcriptome RNA sequencing from the TCGA portal database, including data for both adjacent nontumor lung tissues () and tumor tissues () (all data has been normalized by fragments per kilobase per million), as well as clinical data including age, gender, and pathological TNM stage and OS from the patients who provided the samples. OS was used as the primary endpoint and defined as the time from diagnosis to death. The patients were censored if the date of death was unknown. We also downloaded the IRG list from ImmPort [12]. The ImmPort database provides timely and precise immunology data updates and offers an IRGs list that can be used for cancer research. The genes on the list were actively involved in immune activity.

2.2. Differential Gene Expression Analysis and Survival Analysis

We evaluated gene transcription data between adjacent nontumor and tumor samples to select differentially expressed genes involved in NSCLC onset, using the Wilcoxon test (R software limma package), setting a log2 |fold change|>1 and a false discovery rate (FDR) <0.01 as cutoff values. DEIRGs were extracted from these differentially expressed genes. We used analyses of functional enrichment that included the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO), via the Bioconductor package “clusterProfiler,” and visualized via “ggplot2.” The terms with FDR<0.05 were considered significantly enriched. OS was selected as the primary endpoint. Prognostic DEIRGs were then selected through univariate Cox analysis, using the R software survival package (). These prognostic DEIRGs were also subjected to functional enrichment analyses.

2.3. Constructing the Immune-Related Gene Prognostic Index (IRGPI-)-Based Risk Model

We performed multivariate Cox regression analysis on prognostic DEIRGs that were identified as significant in univariate Cox analysis. All independent prognostic indicators from the multivariate analysis were used for the IRGPI-based risk model. We calculated risk scores according to each IRG’s expression level multiplied by the coefficient from Cox regression. We classified patients whose scores were below and above the risk score median value as low and high risk, respectively; differences were evaluated via the log-rank test. We used Kaplan–Meier survival curves to evaluate the differences between the two groups. The AUC of the survival receiver operating characteristic (ROC) curve was constructed to evaluate the model’s performance using the survival ROC package in R software [13]. We evaluated OS correlation to clinical pathologic factors (gender, age, and pathological stage) and risk scores via univariate and multivariate Cox analyses.

2.4. Construction of a Transcription Factor Regulatory Network

We then explored the regulatory mechanisms of the IRGs included in the risk model. As TFs regulate gene expression, we wished to know the mechanisms through which the TFs operated. We downloaded information on 318 TFs from the Cistrome Cancer database [14]. TFs that showed differential expression were extracted to construct the molecular regulatory network with prognostic DEIRGs included in our risk model. Correlation coefficients >0.3 and were regarded as significantly associated.

2.5. Estimating the Correlation of IRGPI and Tumor-Infiltrating Immune Cells

As immune cells are recognized as the main tumor immune microenvironment (TIME) portion, we downloaded immune infiltrate data of patients with NSCLC from the Tumor Immune Estimation Resource (TIMER) online database, which estimates the abundance of six types of immune cells, including B cells, CD4+ T cells, macrophages, dendritic cells, CD8+ T cells, and neutrophils [15]. The correlation of tumor-infiltrating immune cells with IRGPI was assessed by the Pearson correlation coefficient test.

2.6. Association between Model and Clinical Variables

We evaluated how our risk model is associated with gender, age, and pathological stage. As age is a continuous variable, we used the median as a cutoff. Pathological stages were described as dichotomous categorical variables (stages I–II vs III–IV; T1–2 vs T3–4; N0 vs N1–3; and M0 vs M1). Statistical comparisons of gene expression for two groups were evaluated by Student’s t-test. R (version 3.6.1; https://www.r-project.org/) was used for all statistical analyses. was considered significant unless specified otherwise.

3. Results

3.1. Construction of Differentially Expressed and Survival-Associated IRGs

Of the 7,336 genes that were differentially expressed, 5,439 were upregulated, and 1,897 were downregulated on the tumor samples. Of the 529 extracted DEIRGs, 333 genes were upregulated, and 196 were downregulated (Figure 1, Supplementary Table 1).

GO functional enrichment analysis showed that the DEIRGs were significantly enriched for “humoral immune response” among biological processes, for “immunoglobulin complex” among cellular components, and for “antigen-binding” among molecular functions (Figure 2(a)); these GO terms are preferentially involved in immune functions. In the KEGG pathways, the above genes were significantly enriched in cytokine–cytokine receptor interactions (Figure 2(b)).

To develop prognostic signatures with potential clinical utility, we screened the 529 DEIRGs for correlations with clinical outcomes and found that 41 DEIRGs were significantly associated with OS. Enrichment analysis showed that these 41 prognostic DEIRGs were related to two major GO enrichment terms: extracellular region (GO:0005576) and growth factor activity (GO:0008083).

3.2. Construction of IRGPI-Based Risk Model

Of the 41 prognostic DEIRGs identified in univariate analysis for potential inclusion in the model, 26 remained as independent prognostic predictors after multivariate analysis (Figure 3). We calculated a risk score for each patient using their respective IRG expression levels × each IRG’s Cox regression-determined coefficient, as shown below:

We stratified patients into low and high immune risk groups, using the median value of the gene set risk scores. Survival analysis depicted a great difference between the two groups (Figure 4(a)). ROC curve analysis showed a moderate potential for survival prediction (AUC: 0.701; Figure 4(b)). In multivariate analysis, together with additional clinical factors (gender, age, and pathological stage), the risk score remained as an independent prognostic signature (hazard ratio: 1.132; 95% confidence interval: 1.101–1.164; ; Figure 5).

3.3. TF Regulatory Network

The two major protein networks were the protein-protein interaction network and the transcriptional regulation network. We screened 318 TFs derived from the Cistrome Cancer database and found differential expression between nontumor lung tissue and NSCLC samples in 86 TFs. We then constructed a gene regulatory network from the 86 TFs and 26 prognostic DEIRGs, based on their gene expression values (correlation coefficient threshold: ≥0.3; ; Figure 6).

3.4. Interactions between the IRGPI with Tumor Immune Cell Infiltration and Clinicopathologic Parameters

To see if IRGPI reflected TIME status, we analyzed associations between risk scores and tumor-infiltrated immune cells (B cells, CD4+ T cells, CD8+ T cells, macrophages, neutrophils, and dendritic cells). Correlations were not apparent (Figure 7). We also investigated the expression differences of IRGPI between categorical Clinicopathologic parameters; the expression of S100A16, RNASE7, LTB4R2, and INHA were different at different pathological stages; PLAU, CRABP1, IGKV1-6, and INHA were differentially expressed at different T stages; RBP2, RNASE7, LTB4R2, and GCGR expression associated with distant metastasis; and IL33, PDGFB, PNOC, and SHC3 correlated with lymph node metastasis. The major result is the higher pathological stage was associated with a higher risk score (; Table 1).

4. Discussion

The immune system influences cancer initiation and progression and dysregulated immune contexture, and Immunoscore can affect oncologic outcomes [16, 17]. As the conventional TNM staging system provides limited prognostic information, combining immunological classifications with the American Joint Committee on Cancer/Union for International Cancer Control TNM staging system could greatly improve prognostic stratification. However, Immunoscore evaluates immune cell infiltration rather than the overall tumor immune status. The development of high-throughput RNA sequencing technology allows us to use immune-related gene expression to assess the status of immune cells and tumor cells, leading to more precise prognoses. Correlations between IRGs and lung adenocarcinoma prognosis have been explored by other studies [8, 9], but not for NSCLC. Therefore, developing a reliable IRG-based prognostic model of NSCLC and exploring the IRGs’ respective clinical significances and molecular roles are critical.

In this study, we established a prognosis prediction model based on 26 IRGs. It showed moderate predictive ability (AUC: 0.701) and maintained its predictive ability independently from other clinical characteristics (gender, age, and pathological stage) and was strongly correlated with the clinicopathologic stage. Notably, our model indicates that fibroblast growth factor receptor 4 (FGFR4), a receptor tyrosine kinase, has potential as a therapeutic target in NSCLC, which has been also found in another study [18]. FGFR has been shown to play central functions in inflammation, embryogenesis, malignant tumor cell proliferation, and angiogenesis [19]. Studies have reported FGFR alterations in several solid malignancies, especially urothelial carcinoma, and it has recently become an object of targeted therapy. The FGFR inhibitors, erdafitinib, and rogaratinib have been approved for clinical practice [2022]. Hepatocyte nuclear factor 4-gamma (HNF4G) belongs to the orphan nuclear receptor superfamily and has been shown to influence growth and invasiveness in bladder cancer [23]; its place in our high-risk score group suggests its function on NSCLC which deserves further exploration. Teng, Y. C. et al. found that retinoblastoma-binding protein-2(RBP2), a histone demethylase, promoted lung tumorigenesis and progression, and expression of integrin-β1, which is associated with lung cancer metastasis [24]; our current study supports this result. Increased expression of delta-like ligand 4 (DLL4), another IRG in our study, has been observed in many tumor types and may be related to worse outcomes [2530]. DLL4-mediated Notch signaling signifies another key pathway for vascular development. Demcizumab, a humanized monoclonal antibody that inhibits DLL4 and interrupts Notch-mediated signaling, a phase IB trial has explored its feasibility combined with standard chemotherapy in metastatic nonsquamous NSCLC [31]. Subsequent randomized phase II clinical trials have shown some effect (NCT02259582). Dll4 blockade is a promising anti-angiogenesis therapy, particularly against resistant tumors [32].

Gene functional enrichment analyses suggest that the pathways implicated in DEIRGs are primarily associated with cytokine–cytokine receptor signaling pathways, which are crucial in angiogenesis, inflammatory processes, and chemotaxis [33]. A boosted inflammatory microenvironment is also a consistent feature in tumor progression and neoplastic processes [33, 34].

We constructed a TF-mediated network to discover molecular mechanisms of prognostic DEIRGs. TFs affect the prognosis mainly by regulating the expression of DLL4, THBS1, JAG1, SHC3, and LTB4R2. TCF-21 positively regulates FGFR4 according to our network, but conclusive evidence is not yet available. ERG interacts with Notch intracellular domain (NICD) and β-catenin and is required for Ang1-dependent β-catenin recruitment at the DLL4 locus [35]. The network also showed a strong correlation between LHX2 and NRTN, which implies an insight into changes in the NSCLC immune system at the molecular level.

High levels of tumor-infiltrating lymphocytes (TILs) are associated with better outcomes for patients with completely resected NSCLC [36]. Schalper et al. observed the prognostic value of CD8+ TILs in NSCLC [37]; other studies confirmed this result [38, 39]. However, Wakabayashi et al. suggest that CD4+ T cells in cancer stroma, rather than CD8+ T cells in cancer cell nests, are related to favorable prognosis in NSCLC patients [40]. These studies disagree on how tumor growth and prognosis are influenced by TILs. Few studies have explored the relationships between IRGPI and TILs in lung cancer, and results have not been consistent; although high neutrophil infiltration may predict worse outcomes [8, 11], and this correlation is not very strong [11]. We analyzed the relationships in our study to examine the immune microenvironment but found no significant correlation between them. Further studies are needed.

Taken together, our results show that IRGPI can both estimate the survival of NSCLC patients and indicate potential therapeutic targets for further study. Dysregulated IRGs may indicate variations in immunotherapy sensitivity, permitting individualized treatment strategies.

Inevitably, our research had several limitations. First, we used retrospective data to develop a signature from public databases. Second, clinical validation is needed to verify the signature’s efficacy. Third, as we did not enroll patients who were treated with immune checkpoint inhibitors, we were unable to confirm any association between immunotherapy response and the signature.

Additional prospective studies are needed to validate our model’s prognostic accuracy for survival and immunotherapy response in patients with NSCLC. Furthermore, our model’s IRGs suggest novel molecular targets and prognostic biomarkers, which also warrant investigation and clinical translation.

5. Conclusion

In summary, we developed a robust model to predict the survival of patients with NSCLC, based on the expression of 26 IRGs, which can potentially augment TNM staging. Although our model needs further validation, it may provide novel predictive and therapeutic molecular targets in patients with NSCLC.

Abbreviations

NSCLC:Nonsmall cell lung cancer
TCGA:The cancer genome atlas
IRGs:Immune-related genes
ImmPort:Immunology database and analysis portal
DEIRGs:Differentially expressed immune-related genes
OS:Overall survival
AUC:Area under the curve
TIMER:Tumor immune estimation resource
FDR:False discovery rate
KEGG:Kyoto Encyclopedia of Genes and Genomes
GO:Gene Ontology
IRGPI:Immune-related gene prognostic index
ROC:Receiver operating characteristic
TFs:Transcription factors
TIME:Tumor immune microenvironment
T:Tumor
N:Node
M:Metastasis
BP:Biological process
CC:Cellular components
MF:Molecular functions
HR:Hazard ratio
CI:Confidence interval
FGFR4:Fibroblast growth factor receptor 4
HNF4G:Hepatocyte nuclear factor 4 gamma
RBP2:Retinoblastoma-binding protein-2
Dll4:Delta-like ligand 4
TILs:Tumor-infiltrating lymphocytes
FC:Fold change.

Data Availability

We extracted RNA sequencing data and clinical information of NSCLC patients from the TCGA data portal (https://cancergenome.nih.gov/). In the ImmPort database (https://immport.niaid.nih.gov/), the IRGs list was retrieved, and TFs were retrieved from the Cistrome database (http://cistrome.org/). From TIMER online database (https://cistrome.shinyapps.io/timer/), we retrieved the information about the immune infiltrate of NCSLC patients. All data downloaded at 13 January 2020.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Juan Ye designed the study; Ying Cao participated in data extraction and analysis and drafting of the manuscript; Hongyu Zhu prepared the figures and tables; Juan Ye, Hailin Shen, Desen Liu, Zhenkai Li, Hailong Shang, Hongdi Du, and Ying Wang reviewed the manuscript for important intellectual content critically. Proofreading and approval of the final manuscript draft were done by all authors.

Acknowledgments

We thank Liwen Bianji, Edanz Group China (http://www.liwenbianji.cn/ac), for editing the English text of a draft of this manuscript.

Supplementary Materials

Supplementary Table 1: The DEIRGs between adjacent nontumor and tumor tissues. (Supplementary Materials)