Abstract
In medical visualization, nursing notes contain rich information about a patient’s pathological condition. However, they are not widely used in the prediction of clinical outcomes. With advances in natural language processing, information can now be extracted from large-scale unstructured data such as nursing notes. This study extracted sentiment information from nursing notes and explored its association with in-hospital 28-day mortality in sepsis patients. Patient data and nursing notes were extracted from the MIMIC-III database. A COX proportional hazard model was used to analyze the relationship between sentiment scores in nursing notes and in-hospital 28-day mortality. Based on the COX model, the individual prognostic index (PI) was calculated, and survival was then analyzed. Among 1851 eligible sepsis patients, 580 died in hospital within 28 days (dead group), while 1271 survived (survived group). Significant differences were shown between the two groups in sentiment polarity, Simplified Acute Physiology Score II (SAPS-II) score, age, and intensive care unit (ICU) type (all ). Multivariate COX analysis showed that sentiment polarity (HR: 0.499, 95% CI: 0.409-0.610, ) and sentiment subjectivity (HR: 0.710, 95% CI: 0.559-0.902, ) were inversely associated with in-hospital 28-day mortality, while the SAPS-II score (HR: 1.034, 95% CI: 1.029-1.040, ) was positively associated with in-hospital 28-day mortality. The median death time of patients with was significantly earlier than that of patients with (13.5 vs. 49.8 days, ). In conclusion, sentiments in nursing notes are associated with the in-hospital 28-day mortality and survival of sepsis patients.
1. Introduction
Sepsis, a syndrome of life-threatening physiologic, pathologic, and biochemical dysfunction due to uncontrolled responses to infection, is one of the leading causes of death in intensive care units (ICUs) [1]. Despite advances in care, sepsis remains among the costliest diseases, accounting for over $20 billion (5.2%) of total United States (US) hospital costs [2]. In the US, admission for sepsis has overtaken that for stroke and myocardial infarction [3]. According to statistics, the prevalence of sepsis is up to 535 cases per 100,000 person-years and is on the rise [4]. Population-level epidemiological data show that there are 31.5 million cases of sepsis and 19.4 million cases of severe sepsis worldwide, with 5.3 million potential deaths each year [5], and in-hospital mortality reaches 25%-30% [6].
Currently, severity of illness scores (SOI) are usually used to predict mortality in ICUs. The SOI system is built on coded data on patients’ demographics, vital signs, and laboratory results, usually accessed from electronic health records. Electronic health records also contain unstructured data, such as clinical notes written by clinicians, which are not frequently used for predicting mortality [7]. Studies have demonstrated that clinicians can properly predict mortality in ICUs [8, 9]. Thus, their notes may provide important information for assessing patients’ health status. A previous study showed that the sentiment of clinicians towards patients could be evaluated by sentiment analysis, a method to classify the subjective properties of written text [10]. Sentiments measured in clinical notes differ according to demographic features and clinical outcomes [10]. Some studies suggest that sentiments measured in clinical notes are associated with hospital readmission and mortality [11, 12].
In this study, we investigated the association of sentiments in nursing notes with the in-hospital 28-day mortality of sepsis patients based on the Medical Information Mart for Intensive Care (MIMIC-III) database, a freely accessible critical care database, with the aim of providing evidence for the improvement of patients’ outcomes in ICUs.
2. Methods
2.1. Study Population
Patient data and nursing notes were accessed from the MIMIC-III database developed by the MIT Laboratory for Computational Physiology. As an openly available dataset, MIMIC-III contains deidentified health data related to approximately 60,000 ICU admissions, including demographics, laboratory tests, medications, vital signs, transcribed nursing notes, diagnostic and procedure codes, fluid balance, length of stay, survival data, and others [13]. The inclusion criteria of this study were as follows: (1) patients diagnosed with sepsis, severe sepsis, or septic shock (International Classification of Diseases 9 (ICD-9) codes: 99591, 99592, and 78552) in the MIMIC-III database and (2) aged 15 years or above at hospital admission. The exclusion criteria were as follows: (1) notes identified by physicians as errors, (2) notes written less than 12 hours before the time of death, and (3) patients without any nursing note data.
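The inclusion and exclusion criteria above can be sketched as two small filters. This is a minimal illustration, not the authors' extraction code: the function names, argument layout, and the choice to also drop any note stamped after the time of death are assumptions made here, and real MIMIC-III tables would need to be mapped onto these arguments.

```python
from datetime import datetime, timedelta

# ICD-9 codes named in the inclusion criteria
# (sepsis, severe sepsis, septic shock).
SEPSIS_ICD9 = {"99591", "99592", "78552"}
MIN_AGE_YEARS = 15
NOTE_EXCLUSION_WINDOW = timedelta(hours=12)


def is_eligible_patient(icd9_codes, age_at_admission):
    """Inclusion: at least one sepsis-related ICD-9 code and age >= 15 years."""
    has_sepsis_code = bool(SEPSIS_ICD9 & set(icd9_codes))
    return has_sepsis_code and age_at_admission >= MIN_AGE_YEARS


def keep_note(note_time, flagged_as_error, death_time=None):
    """Exclusion: drop error-flagged notes and notes written less than
    12 hours before death. Assumption: notes stamped after the time of
    death are dropped as well."""
    if flagged_as_error:
        return False
    if death_time is not None and note_time > death_time - NOTE_EXCLUSION_WINDOW:
        return False
    return True
```

A note written exactly 12 hours before death is kept, since the criterion excludes only notes written less than 12 hours before death.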
The data used in this study were obtained from the MIMIC-III database (https://mimic.physionet.org/), an openly available dataset. The data collection in the MIMIC-III was approved by the Ethics Review Board of the Beth Israel Deaconess Medical Center, and all private information has been desensitized.
2.2. Sentiment Analysis
Two techniques (syntactic and semantic) are mainly used to classify and compute the sentiment polarity in text [14]. A semantic approach means that the sentiment is extracted based on text meaning and is commonly obtained using a classifier [14]. To make inferences based on text structural features, this study employed a syntactic technique to extract sentiments.
The Python programming language and the TextBlob natural language processing library were used to compute sentiment scores for the nursing notes [15]. The sentiment of text strings was computed using the pattern module in TextBlob. The pattern module comprises a lexicon of English adverbs and adjectives that are mapped to three dimensions of sentiment scores: polarity, subjectivity, and intensity [16]. TextBlob returns sentiment polarity as a score from -1 to 1 and sentiment subjectivity as a score from 0 to 1. Higher scores indicate more positive and more subjective sentiments, respectively. In this study, a polarity score and a subjectivity score were assigned to each nursing note; the scores were computed by creating a TextBlob object initialized with the nursing note string and extracting the sentiment attributes from the object [7]. The mean scores of sentiment polarity and subjectivity in nursing notes written during hospitalization were calculated for the first hospital admission of each patient and then used as predictors in the model of this study. For an example of sentiment polarity scores using TextBlob, see Table 1.
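The per-patient averaging step can be sketched as follows. To keep the example runnable without external packages, it uses a tiny made-up lexicon (`TOY_LEXICON`) as a stand-in scorer; this lexicon, its values, and the function names are hypothetical and are not TextBlob's lexicon or the authors' code. The closing comment shows how TextBlob's `sentiment` property could be plugged in instead.

```python
from statistics import mean

# Hypothetical toy lexicon for illustration only:
# word -> (polarity in [-1, 1], subjectivity in [0, 1]).
TOY_LEXICON = {
    "stable": (0.5, 0.5),
    "comfortable": (0.4, 0.8),
    "poor": (-0.4, 0.6),
    "worse": (-0.5, 0.6),
}


def toy_scorer(text):
    """Average the lexicon scores of matched words; (0.0, 0.0) if none match."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not hits:
        return 0.0, 0.0
    return mean(p for p, _ in hits), mean(s for _, s in hits)


def patient_mean_sentiment(notes, scorer=toy_scorer):
    """Score every note, then average polarity and subjectivity over notes."""
    scores = [scorer(note) for note in notes]
    return mean(p for p, _ in scores), mean(s for _, s in scores)


# With TextBlob installed, the scorer could be replaced by:
#   from textblob import TextBlob
#   scorer = lambda text: tuple(TextBlob(text).sentiment)
# since TextBlob(text).sentiment is a (polarity, subjectivity) named tuple.
```

For the two hypothetical notes `"patient stable and comfortable"` and `"breathing worse overnight"`, the toy scorer yields note-level polarities of 0.45 and -0.5, so the patient-level mean polarity is slightly negative.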
2.3. Mortality and Survival Assessment
As a common predictor of ICU mortality, the Simplified Acute Physiology Score II (SAPS-II) is a composite score that includes 17 variables (age, 12 physiology variables, type of admission, and 3 underlying disease variables) [17]. In this study, the SAPS-II score was calculated from MIMIC-III data using SQL scripts from the MIT Laboratory for Computational Physiology Git repository. Additionally, gender and ICU type were included as variables because they are readily available in the MIMIC-III database but are not part of SAPS-II. Survival was defined as the number of days from hospital admission to death or right-censoring time.
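The survival definition above maps naturally onto a (duration, event) pair. The sketch below is an illustration under stated assumptions: the function names are hypothetical, dates are taken at day resolution, and a death on day 28 exactly is counted as 28-day mortality, which the text does not specify.

```python
from datetime import date


def survival_days(admit, death=None, censor=None):
    """Return (days, died): days from admission to death, or to the
    right-censoring date when no death was recorded."""
    if death is not None:
        return (death - admit).days, True
    if censor is None:
        raise ValueError("a censoring date is required when no death is recorded")
    return (censor - admit).days, False


def died_within(days, died, horizon=28):
    """Flag a death that occurred within the horizon (28 days here).
    Assumption: the boundary day itself counts as within the horizon."""
    return died and days <= horizon
```

For example, a hypothetical patient admitted on 2020-01-01 who died on 2020-01-15 contributes `(14, True)` and is flagged as a 28-day death, whereas a patient censored on 2020-01-20 contributes `(19, False)`.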
2.4. Statistical Analysis
Statistical analysis was performed using SPSS 22.0 software (IBM Corp., Armonk, NY, USA) and Python (version 3.7) for text analysis. Normally distributed data were compared by the $t$-test and presented as mean ± standard deviation; non-normally distributed data were compared with the Mann-Whitney rank-sum test and presented as median and quartiles ($M$ (Q1, Q3)). Enumeration data were compared by the $\chi^2$ test and presented as $n$ (%). The COX proportional hazard model was used to analyze the relationship between sentiment scores in nursing notes and the in-hospital 28-day mortality of sepsis patients. The power of our study was 0.858.
The general form of the COX model was $h(t, X) = h_0(t)\exp(\beta^{T}X)$, in which $h_0(t)$ and $h(t, X)$ represented the baseline hazard function and the hazard function at time point $t$, respectively, $X$ was the covariate vector, and $\beta$ was the unknown vector of regression coefficients. The formula of the individual prognostic index (PI) was $\mathrm{PI} = \beta^{T}X = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$. Based on the COX model, the individual PI was calculated. The greater the individual PI, the worse the prognosis. The survival curves were compared using a log-rank test. Box plots, histograms, and forest plots in our study were plotted with Python. The power analysis was carried out to assess the statistical power () using PASS 15.0 software (NCSS, LLC). The results showed that the power values of the sentiment polarity score and sentiment subjectivity score were both 1.000, indicating that our findings were reliable. A significant difference was shown at .
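Since the prognostic index is a linear combination of regression coefficients and covariates, it can be illustrated with a few lines of code. The sketch below takes three of the multivariate hazard ratios reported in this paper and converts them to coefficients with the natural logarithm. It is deliberately partial: the age, gender, and ICU-type terms of the full model are omitted, so the resulting values are not comparable to the PI used in the study, and the covariate values passed in are hypothetical.

```python
import math

# Multivariate hazard ratios reported in this study for three covariates.
REPORTED_HR = {"polarity": 0.499, "subjectivity": 0.710, "saps_ii": 1.034}

# In a COX model, each coefficient is the natural log of the hazard ratio.
BETA = {name: math.log(hr) for name, hr in REPORTED_HR.items()}


def prognostic_index(covariates, beta=BETA):
    """PI = sum of beta_i * x_i over the covariates listed in `beta`."""
    return sum(beta[name] * covariates[name] for name in beta)


def relative_hazard(covariates, reference, beta=BETA):
    """exp(PI difference): hazard of one patient relative to a reference."""
    return math.exp(prognostic_index(covariates, beta)
                    - prognostic_index(reference, beta))
```

Two hypothetical patients who differ only by 10 SAPS-II points have a relative hazard of 1.034 raised to the 10th power, and a higher polarity score lowers the PI because its coefficient is negative.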
3. Results
3.1. Baseline Characteristics of the Study Population
In the MIMIC-III database, a total of 3567 patients admitted to the ICU were identified. Of these, 1128 patients without sepsis, 356 lacking sentiment polarity and subjectivity scores, 172 with missing SAPS-II scores, and 60 with missing survival data were excluded. In total, 1851 sepsis patients were eligible for the study, among whom 580 died in hospital within 28 days of ICU admission (dead group), while 1271 survived (survived group). The baseline characteristics of the two groups are compared in Table 2, and the flowchart is presented in Figure 1.
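The patient flow described above can be checked with simple arithmetic. The snippet below only restates the counts given in this paragraph; the variable names are chosen here for illustration.

```python
# Counts taken from the patient-selection paragraph above.
screened = 3567
excluded = {
    "without sepsis": 1128,
    "no sentiment scores": 356,
    "missing SAPS-II": 172,
    "missing survival data": 60,
}

eligible = screened - sum(excluded.values())
died, survived = 580, 1271

# Crude in-hospital 28-day mortality in the eligible cohort, in percent.
mortality_pct = round(100 * died / eligible, 1)

print(eligible, died + survived, mortality_pct)
```

The exclusions leave 1851 patients, which matches the sum of the dead and survived groups, and correspond to a crude 28-day mortality of about 31.3%.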

The sentiment polarity score of patients in the survived group was significantly higher than that in the dead group (), while the SAPS-II score was notably lower than that in the dead group () (Table 2, Figure 2). The differences were significant between the two groups in age () and ICU type (), but not in the sentiment subjectivity score () and gender () (Table 2, Figure 3).


3.2. COX Regression Analysis of the Association between Sentiments and 28-Day Mortality
As shown in Table 3, univariate analysis showed an inverse association between sentiment polarity and 28-day mortality (hazard ratio (HR): 0.458, 95% confidence interval (95% CI): 0.401-0.524, ) and no association between sentiment subjectivity and 28-day mortality (HR: 0.863, 95% CI: 0.657-1.133, ). The risk of 28-day mortality in sepsis patients increased by 4% for each 1-point increase in the SAPS-II score (HR: 1.040, 95% CI: 1.036-1.045, ). There was no association between gender and 28-day mortality (HR: 1.104, 95% CI: 0.936-1.301, ).
In multivariate analysis, both sentiment polarity (HR: 0.499, 95% CI: 0.409-0.610, ) and sentiment subjectivity (HR: 0.710, 95% CI: 0.559-0.902, ) were inversely associated with in-hospital 28-day mortality, while the SAPS-II score (HR: 1.034, 95% CI: 1.029-1.040, ) was positively associated with in-hospital 28-day mortality. Patients aged ≥80 years had an increased risk of in-hospital 28-day mortality compared with those aged <40 years (HR: 1.612, 95% CI: 1.032-2.520, ). Compared with patients aged <40 years, no significant differences in in-hospital 28-day mortality were found for those aged 40-59 (HR: 1.217, 95% CI: 0.781-1.886, ), 60-69 (HR: 1.479, 95% CI: 0.943-2.321, ), 70-74 (HR: 1.048, 95% CI: 0.637-1.723, ), or 75-79 years (HR: 1.030, 95% CI: 0.629-1.687, ). In addition, no significant association was found between gender and 28-day mortality (HR: 1.104, 95% CI: 0.934-1.306, ). Patients who stayed in the trauma/surgical intensive care unit (TSICU) were least likely to die within 28 days after admission (HR: 0.280, 95% CI: 0.190-0.414, ) (Table 3, Figure 4).
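The hazard ratios above are per-unit effects, so they can be rescaled to larger increments by exponentiation. The following sketch shows the arithmetic for the univariate SAPS-II and polarity estimates; the helper names are hypothetical, and the calculation assumes the log-linear form of the COX model.

```python
import math


def percent_change(hr):
    """Percent change in hazard per unit increase implied by a hazard ratio."""
    return (hr - 1) * 100


def scaled_hazard_ratio(hr_per_unit, units):
    """Hazard ratio for a change of `units`, assuming a log-linear effect."""
    return math.exp(math.log(hr_per_unit) * units)


# Univariate estimates reported above.
saps_hr = 1.040
polarity_hr = 0.458

print(percent_change(saps_hr))           # about 4% higher hazard per point
print(scaled_hazard_ratio(saps_hr, 10))  # about 1.48 per 10 points
print(percent_change(polarity_hr))       # about 54.2% lower per unit of polarity
```

Because polarity ranges only from -1 to 1, a full unit change in polarity is a large shift, which is worth keeping in mind when reading its hazard ratio.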

3.3. Survival Analysis
According to the individual PI, patients were assigned into the high-risk group () and the low- and middle-risk group (), and the survival curves are illustrated in Figure 5. It could be observed that the median death time of the high-risk group was significantly earlier than that of the low- and middle-risk group (13.5 vs. 49.8 days, ).
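A median death time of this kind is typically read off a survival curve. The sketch below assumes a Kaplan-Meier estimator, which the text does not name explicitly, and returns the first time at which the estimated survival probability falls to 0.5 or below. The function name and the toy inputs are hypothetical; exact fractions are used to avoid floating-point issues at the 0.5 boundary.

```python
from collections import Counter
from fractions import Fraction


def km_median(durations, events):
    """Smallest time at which the Kaplan-Meier survival estimate is <= 0.5.

    durations: follow-up time for each patient.
    events: True/1 if the patient died at that time, False/0 if censored.
    Returns None when the curve never reaches 0.5.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # deaths plus censorings at each time
    at_risk = len(durations)
    survival = Fraction(1)
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t.
            survival *= Fraction(at_risk - d, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= leaving[t]
    return None
```

With five hypothetical patients followed for 2, 4, 6, 8, and 10 days who all died, the estimate passes through 4/5 and 3/5 before reaching 2/5 at day 6, so the median is 6 days. Censoring two of them (days 4 and 8) delays the crossing to day 10.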

4. Discussion
In the present study, a total of 1851 sepsis patients were eligible according to the inclusion and exclusion criteria, among whom 580 died in hospital within 28 days, while 1271 survived. Multivariate COX analysis showed that sentiment polarity and sentiment subjectivity were inversely associated with in-hospital 28-day mortality. Based on the quartiles of the individual PI, patients were assigned into the high-risk group and the low- and middle-risk group. Survival analysis indicated that the high-risk group had an earlier median death time than the low- and middle-risk group. Together, these findings suggest that the quantitative measurement of sentiments in nursing notes is associated with the in-hospital 28-day mortality and survival of sepsis patients; nursing notes, which contain rich information, may serve as a potential predictor of clinical outcomes in the ICU.
Brief fragments of text can reflect the author’s feelings about a given topic. Recently developed language processing tools allow the characterization of such feelings, for example the sentiment in text documents [18]. Sentiment is usually described as the relative positivity or polarity of a text string and is measured by a number from -1 (very negative) to 1 (very positive) [14]. It can also be interpreted as the estimated probability of “positive” or “negative” through a classifier. Sentiment analysis permits us to gain insights into the clinicians’ emotions and attitudes towards patients through the subjective expressions made by clinicians in the text of clinical notes, thus contributing to the prediction of patients’ outcomes [19–22]. In health-related fields, sentiment analysis has been widely applied to Cancer Survivors Network (CSN) breast and colorectal cancer discussion posts [23], health reforms on Twitter [24], encounter notes of patients with critical illness [25], etc. This study was aimed at identifying the association between sentiments in nursing notes and the in-hospital 28-day mortality of sepsis patients. The results showed that both sentiment polarity and sentiment subjectivity were inversely associated with in-hospital 28-day mortality, which is supported by the finding of McCoy et al. that the sentiment measured in hospital discharge notes was related to hospital readmissions and mortality risk [11]. Based on the COX model, the patients with were found to have a higher risk of death than those with , highlighting the potential value of sentiments in survival analysis. A previous study has shown a strong association between sentiments and the risk of death even after adjustment for severity of illness and baseline information [25].
The strength of the present study was that it was the first to investigate the association between sentiments in nursing notes and the in-hospital 28-day mortality of sepsis patients. The nursing notes written less than 12 hours before the time of death were excluded, which made the results more reliable. However, the present study also had several limitations, so the results should be interpreted cautiously. First, nursing notes from the single-center MIMIC-III database may show different characteristics because of variations in clinicians, experience, training, or working environment, which may limit the generalizability of the results. Second, the approach used to measure the sentiment in the present study was not the only approach available. Other techniques, such as those based on machine learning models that make semantic inferences, could produce different results. Third, the mean sentiment scores could only characterize the variations at the level of patients, but not at the levels of sentences, paragraphs, or documents. Fourth, the nursing notes were recorded by caregivers, who may be research nurses, medical doctors, and so on (available at https://mimic.mit.edu/docs/iii/tables/caregivers/). It cannot be determined whether the sentiments in nursing notes are based on past or personal experiences. Moreover, subtle changes in sentiments over time were not captured. In the future, the temporal pattern of nursing notes will be examined to gain more insights.
5. Conclusions
Sentiments in nursing notes are associated with the in-hospital 28-day mortality and survival of sepsis patients, suggesting the importance of sentiments in nursing notes for the prediction of clinical outcomes in the ICU. Although predicting clinical outcomes is still a complex problem, the information extracted from unstructured data like nursing notes may contribute to further improving prediction performance.
Abbreviations
PI: Prognostic index
SAPS-II: Simplified Acute Physiology Score II
ICU: Intensive care unit
SOI: Severity of illness scores
MIMIC-III: Medical Information Mart for Intensive Care III
ICD-9: International Classification of Diseases 9
HR: Hazard ratio
TSICU: Trauma/surgical intensive care unit
CSN: Cancer Survivors Network
Data Availability
The data utilized to support the findings are available from the corresponding authors upon request. The data applied in the present study were from the MIMIC-III database (https://mimic.physionet.org/), a freely accessible database.
Ethical Approval
The data collected in the MIMIC-III was approved by the Ethics Review Board of the Beth Israel Deaconess Medical Center, and all private information has been desensitized.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Authors’ Contributions
QYG contributed to the study design and manuscript writing. DW, PPS, and WFW contributed to the data collection and analysis. XRL contributed to the study design and study supervision. QYG, DW, PPS, WFW, and XRL contributed to the critical revisions of important content. All authors revised and approved the final manuscript.
Acknowledgments
This research was funded by the Shandong Social Science Planning Research Project-2019 (19CCXJ05).