Sentiment Analysis Based on the Nursing Notes on In-Hospital 28-Day Mortality of Sepsis Patients Utilizing the MIMIC-III Database

Gao, Qiaoyan; Wang, Dandan; Sun, Pingping; Luan, Xiaorong; Wang, Wenfeng

doi:https://doi.org/10.1155/2021/3440778

Computational and Mathematical Methods in Medicine


Special Issue

Machine Learning and Artificial Intelligence Methods in Computer Vision and Visualization for Healthcare


Research Article | Open Access

Volume 2021 | Article ID 3440778 | https://doi.org/10.1155/2021/3440778


Qiaoyan Gao,¹ Dandan Wang,¹ Pingping Sun,¹ Xiaorong Luan,² and Wenfeng Wang³,⁴,⁵

Academic Editor: Pan Zheng

Received 08 Jun 2021

Accepted 24 Sept 2021

Published 13 Oct 2021

Abstract

In medical visualization, nursing notes contain rich information about a patient’s pathological condition. However, they are not widely used in the prediction of clinical outcomes. With advances in natural language processing, information can now be extracted from large-scale unstructured data such as nursing notes. This study extracted sentiment information from nursing notes and explored its association with in-hospital 28-day mortality in sepsis patients. Patient data and nursing notes were extracted from the MIMIC-III database. A COX proportional hazard model was used to analyze the relationship between sentiment scores in nursing notes and in-hospital 28-day mortality. Based on the COX model, the individual prognostic index (PI) was calculated, and survival was then analyzed. Among 1851 eligible sepsis patients, 580 died in hospital within 28 days (dead group), while 1271 survived (survived group). The two groups differed significantly in sentiment polarity, Simplified Acute Physiology Score II (SAPS-II) score, age, and intensive care unit (ICU) type. Multivariate COX analysis showed that sentiment polarity (HR: 0.499, 95% CI: 0.409-0.610) and sentiment subjectivity (HR: 0.710, 95% CI: 0.559-0.902) were inversely associated with in-hospital 28-day mortality, while the SAPS-II score (HR: 1.034, 95% CI: 1.029-1.040) was positively associated with it. The median death time of patients in the high-risk PI group was significantly earlier than that of patients in the low- and middle-risk group (13.5 vs. 49.8 days). In conclusion, sentiments in nursing notes are associated with the in-hospital 28-day mortality and survival of sepsis patients.

1. Introduction

Sepsis, a syndrome of life-threatening physiologic, pathologic, and biochemical dysfunction due to uncontrolled responses to infection, is one of the leading causes of death in intensive care units (ICUs) [1]. Despite advances in care, sepsis remains among the costliest diseases, accounting for over $20 billion (approximately 5.2%) of total United States (US) hospital costs [2]. In the US, admissions for sepsis have overtaken those for stroke and myocardial infarction [3]. The prevalence of sepsis is up to 535 cases per 100,000 person-years and is rising [4]. Population-level epidemiological data show that there are 31.5 million cases of sepsis and 19.4 million cases of severe sepsis worldwide, with 5.3 million potential deaths each year [5], and in-hospital mortality reaches 25%-30% [6].

Currently, severity of illness (SOI) scores are usually used to predict mortality in ICUs. SOI systems are built on coded data such as patients’ demographics, vital signs, and laboratory results, usually accessed from electronic health records. Electronic health records also contain unstructured data, such as clinical notes written by clinicians, which are not frequently used for predicting mortality [7]. Studies have demonstrated that clinicians can properly predict mortality in ICUs [8, 9]. Thus, their notes may provide important information for assessing patients’ health status. A previous study showed that the sentiment of clinicians towards patients could be evaluated by sentiment analysis, a method to classify the subjective properties of written text [10]. Sentiments measured in clinical notes differ according to demographic features and clinical outcomes [10]. Studies also suggest that sentiments measured in clinical notes are associated with hospital readmission and mortality [11, 12].

In this study, we investigated the association of sentiments in nursing notes with the in-hospital 28-day mortality of sepsis patients based on the Medical Information Mart for Intensive Care (MIMIC-III) database, a freely accessible critical care database, aimed at providing some evidence for the improvement of patients’ outcomes in ICUs.

2. Methods

2.1. Study Population

Patient data and nursing notes were accessed from the MIMIC-III database developed by the MIT Laboratory for Computational Physiology. As an openly available dataset, MIMIC-III contains deidentified health data related to approximately 60,000 ICU admissions, including demographics, laboratory tests, medications, vital signs, transcribed nursing notes, diagnostic and procedure codes, fluid balance, length of stay, survival data, and others [13]. The inclusion criteria of this study were as follows: (1) patients diagnosed with sepsis, severe sepsis, or septic shock (International Classification of Diseases 9 (ICD-9) codes: 99591, 99592, and 78552) in the MIMIC-III database and (2) aged 15 years or older at hospital admission. The exclusion criteria were as follows: (1) notes identified by physicians as errors, (2) notes written less than 12 hours before the time of death, and (3) patients without any nursing note data.

The data used in this study were obtained from the MIMIC-III database (https://mimic.physionet.org/), an openly available dataset. The data collection in the MIMIC-III was approved by the Ethics Review Board of the Beth Israel Deaconess Medical Center, and all private information has been desensitized.

2.2. Sentiment Analysis

Two techniques (syntactic and semantic) are mainly used to classify and compute the sentiment polarity of text [14]. A semantic approach extracts sentiment from the meaning of the text, commonly using a classifier [14]. To make inferences based on structural features of the text, this study employed a syntactic technique to extract sentiments.

The Python programming language and the TextBlob natural language processing library were used to compute sentiment scores for the nursing notes [15]. The sentiment of text strings was computed using the pattern module in TextBlob. The pattern module comprises a lexicon of English adverbs and adjectives that are mapped to three dimensions of sentiment scores: polarity, subjectivity, and intensity [16]. TextBlob returns sentiment polarity as a score from -1 to 1 and sentiment subjectivity as a score from 0 to 1. Higher polarity scores indicate more positive sentiment, and higher subjectivity scores indicate more subjective sentiment. In this study, both a polarity score and a subjectivity score were assigned to each nursing note by initializing a TextBlob object with the note text and reading the sentiment attributes from the object [7]. The mean scores of sentiment polarity and subjectivity in nursing notes written during hospitalization were calculated for the first hospital admission of each patient and then used as predictors in the model of this study. For an example of sentiment polarity scores using TextBlob, see Table 1.
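The per-note scoring and per-admission averaging described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `textblob_scorer` and `mean_sentiment` are hypothetical, and the TextBlob path assumes the library is installed. The averaging helper accepts any scorer callable, so it can be exercised without TextBlob.

```python
from statistics import mean
from typing import Callable, Iterable, Tuple

# (polarity in [-1, 1], subjectivity in [0, 1])
Score = Tuple[float, float]


def textblob_scorer(text: str) -> Score:
    """Score one note with TextBlob's default pattern-based analyzer."""
    from textblob import TextBlob  # third-party; imported lazily

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def mean_sentiment(notes: Iterable[str], scorer: Callable[[str], Score]) -> Score:
    """Average per-note polarity and subjectivity over one admission."""
    scores = [scorer(note) for note in notes]
    if not scores:
        raise ValueError("at least one nursing note is required")
    mean_polarity = mean(s[0] for s in scores)
    mean_subjectivity = mean(s[1] for s in scores)
    return mean_polarity, mean_subjectivity


# Hypothetical usage (requires TextBlob):
# mean_sentiment(["Resting comfortably.", "Condition remains poor."], textblob_scorer)
```

Passing the scorer as a parameter keeps the aggregation step independent of the sentiment library, which also makes it straightforward to compare alternative sentiment methods.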

2.3. Mortality and Survival Assessment

As a common predictor of ICU mortality, the Simplified Acute Physiology Score II (SAPS-II) is a composite score of 17 variables (age, 12 physiology variables, type of admission, and 3 underlying disease variables) [17]. In this study, the SAPS-II score was calculated from MIMIC-III data using SQL scripts from the MIT Laboratory for Computational Physiology git repository. Additionally, gender and ICU type were included as variables because they are freely accessible in the MIMIC-III database but are not part of SAPS-II. Survival was defined as the number of days from hospital admission to death or right-censoring time.

2.4. Statistical Analysis

Statistical analysis was performed using SPSS 22.0 software (IBM Corp., Armonk, NY, USA) and Python (version 3.7) for text analysis. Normally distributed data were compared using the t-test and presented as mean ± standard deviation; non-normally distributed data were compared using the Mann-Whitney rank-sum test and presented as median and quartiles (M (Q1, Q3)). Enumeration data were compared using the chi-square test and presented as n (%). The COX proportional hazard model was used to analyze the relationship between sentiment scores in nursing notes and the in-hospital 28-day mortality of sepsis patients. The power of our study was 0.858.

The general form of the COX model was $h(t, X) = h_0(t)\exp(\beta^{\top} X)$, in which $h_0(t)$ and $h(t, X)$ represented the baseline hazard function and the hazard function at time point $t$, respectively, $X$ was the covariate vector, and $\beta$ was the unknown vector of regression coefficients. The formula of the individual prognostic index (PI) was $\mathrm{PI} = \beta^{\top} X = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$. Based on the COX model, the individual PI was calculated. The greater the individual PI, the worse the prognosis. The survival curves were compared using a log-rank test. Box plots, histograms, and forest plots were produced with Python. A power analysis was carried out to assess the statistical power using PASS 15.0 software (NCSS, LLC). The power values for the sentiment polarity score and the sentiment subjectivity score were both 1.000, indicating that the findings were reliable. A significant difference was shown at P < 0.05.
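As a rough sketch of how a prognostic index is obtained from a fitted COX model, the example below assumes the standard linear-predictor form, in which the PI is the sum of each regression coefficient multiplied by its covariate, and each coefficient is the natural logarithm of the corresponding hazard ratio. The hazard ratios are the multivariate values reported in this article, but age and ICU type are omitted, and the patient values and function names are hypothetical, so the numbers do not reproduce the study's PI.

```python
import math

# Multivariate hazard ratios reported in this article.
# Age group and ICU type are left out, so results are illustrative only.
REPORTED_HR = {
    "polarity": 0.499,
    "subjectivity": 0.710,
    "saps_ii": 1.034,
}

# A COX coefficient is the natural log of its hazard ratio.
BETA = {name: math.log(hr) for name, hr in REPORTED_HR.items()}


def prognostic_index(covariates: dict) -> float:
    """Return PI = sum(beta_i * x_i) over the covariates in BETA."""
    return sum(BETA[name] * covariates[name] for name in BETA)


def relative_hazard(covariates: dict) -> float:
    """Return exp(PI), the hazard relative to an all-zero covariate vector."""
    return math.exp(prognostic_index(covariates))


# Hypothetical patients who differ only in mean note polarity.
patient_a = {"polarity": 0.10, "subjectivity": 0.40, "saps_ii": 45}
patient_b = {"polarity": -0.10, "subjectivity": 0.40, "saps_ii": 45}

# More negative notes give a larger PI, i.e. a worse predicted prognosis.
print(prognostic_index(patient_a) < prognostic_index(patient_b))  # True
```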

3. Results

3.1. Baseline Characteristics of the Study Population

In the MIMIC-III database, a total of 3567 patients admitted to the ICU were identified. Of these, 1128 patients without sepsis, 356 lacking sentiment polarity and subjectivity scores, 172 with missing SAPS-II scores, and 60 with missing survival data were excluded. In total, 1851 sepsis patients were eligible for the study, among whom 580 died in hospital within 28 days of ICU admission (dead group), while 1271 survived (survived group). The baseline characteristics of the two groups are compared in Table 2, and the flowchart is presented in Figure 1.
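The cohort counts above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this section; the variable names are illustrative.

```python
# Counts as reported in the study flow description.
initial = 3567
exclusions = {
    "without sepsis": 1128,
    "missing sentiment scores": 356,
    "missing SAPS-II": 172,
    "missing survival data": 60,
}

eligible = initial - sum(exclusions.values())
died, survived = 580, 1271

# The exclusions and the two outcome groups should both give 1851.
assert eligible == 1851
assert died + survived == eligible

mortality_rate = died / eligible
print(f"{eligible} eligible; 28-day mortality {mortality_rate:.1%}")
# 1851 eligible; 28-day mortality 31.3%
```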

The sentiment polarity score of patients in the survived group was significantly higher than that in the dead group, while the SAPS-II score was notably lower than that in the dead group (Table 2, Figure 2). The two groups differed significantly in age and ICU type, but not in the sentiment subjectivity score or gender (Table 2, Figure 3).

3.2. COX Regression Analysis of the Association between Sentiments and 28-Day Mortality

As shown in Table 3, univariate analysis showed an inverse association between sentiment polarity and 28-day mortality (hazard ratio (HR): 0.458, 95% confidence interval (95% CI): 0.401-0.524) and no association between sentiment subjectivity and 28-day mortality (HR: 0.863, 95% CI: 0.657-1.133). Each 1-point increase in the SAPS-II score was associated with a 4% increase in the risk of 28-day mortality in sepsis patients (HR: 1.040, 95% CI: 1.036-1.045). There was no association between gender and 28-day mortality (HR: 1.104, 95% CI: 0.936-1.301).
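To make the hazard ratios easier to read, the sketch below converts an HR into a percent change in hazard. It assumes the usual COX interpretation, in which the HR applies per one-unit increase in the covariate and compounds multiplicatively over several units; the function name is hypothetical.

```python
def percent_change_in_hazard(hr: float, units: float = 1.0) -> float:
    """Percent change in hazard for a `units` increase in the covariate."""
    return (hr ** units - 1.0) * 100.0


# Univariate SAPS-II hazard ratio reported above: about +4% per point.
print(round(percent_change_in_hazard(1.040), 1))      # 4.0
# The same HR compounded over a 10-point increase.
print(round(percent_change_in_hazard(1.040, 10), 1))  # 48.0
# Univariate polarity HR per 1-unit increase in mean polarity.
print(round(percent_change_in_hazard(0.458), 1))      # -54.2
```

Because polarity ranges only from -1 to 1, a full one-unit difference in mean polarity is large; smaller differences can be evaluated by passing a fractional `units` value.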

In multivariate analysis, both sentiment polarity (HR: 0.499, 95% CI: 0.409-0.610) and sentiment subjectivity (HR: 0.710, 95% CI: 0.559-0.902) were inversely associated with in-hospital 28-day mortality, while the SAPS-II score (HR: 1.034, 95% CI: 1.029-1.040) was positively associated with it. Patients aged ≥80 years had an increased risk of in-hospital 28-day mortality compared with those aged <40 years (HR: 1.612, 95% CI: 1.032-2.520). Compared with patients aged <40 years, no significant differences in in-hospital 28-day mortality were found for those aged 40-59 (HR: 1.217, 95% CI: 0.781-1.886), 60-69 (HR: 1.479, 95% CI: 0.943-2.321), 70-74 (HR: 1.048, 95% CI: 0.637-1.723), or 75-79 years (HR: 1.030, 95% CI: 0.629-1.687). In addition, gender was not significantly associated with 28-day mortality (HR: 1.104, 95% CI: 0.934-1.306). Patients who stayed in the trauma/surgical intensive care unit (TSICU) were the least likely to die within 28 days after admission (HR: 0.280, 95% CI: 0.190-0.414) (Table 3, Figure 4).

3.3. Survival Analysis

According to the individual PI, patients were assigned to the high-risk group and the low- and middle-risk group, and the survival curves are illustrated in Figure 5. The median death time of the high-risk group was significantly earlier than that of the low- and middle-risk group (13.5 vs. 49.8 days).
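A median survival time of this kind is commonly read from a Kaplan-Meier curve as the first time at which estimated survival falls to 0.5 or below. The sketch below is a plain-Python Kaplan-Meier estimator under that assumption; whether the authors computed the median this way is not stated, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct death time.

    `events[i]` is True for a death and False for right-censoring.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve = []
    survival = 1.0
    for t in sorted(deaths):
        # Everyone whose follow-up reaches t is still at risk at t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which S(t) <= 0.5, or None if it is never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy follow-up data in days (not from the study).
toy_times = [3, 5, 5, 7, 9, 11]
toy_events = [True, False, True, True, True, False]
print(median_survival(kaplan_meier(toy_times, toy_events)))  # 7
```

Comparing two such curves for significance would additionally require a log-rank test, which this sketch does not implement.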

4. Discussion

In the present study, a total of 1851 sepsis patients were eligible according to the inclusion and exclusion criteria, among whom 580 died in hospital within 28 days, while 1271 survived. Multivariate COX analysis showed that sentiment polarity and sentiment subjectivity were inversely associated with in-hospital 28-day mortality. Based on the quartiles of the individual PI, patients were assigned to the high-risk group and the low- and middle-risk group. Survival analysis indicated that the high-risk group had an earlier median death time than the low- and middle-risk group. Together, these findings suggest that the quantitative measurement of sentiments in nursing notes is associated with the in-hospital 28-day mortality and survival of sepsis patients, and that nursing notes, which contain rich information, may serve as a potential predictor of clinical outcomes in the ICU.

Brief fragments of text can reflect an author’s feelings about a given topic. Language processing tools developed in recent years allow such feelings, including the sentiment of text documents, to be characterized [18]. Sentiment is usually described as the relative positivity or polarity of a text string and is measured by a number from -1 (very negative) to 1 (very positive) [14]. It can also be interpreted as the estimated probability of “positive” or “negative” produced by a classifier. Sentiment analysis offers insight into clinicians’ emotions and attitudes towards patients through the subjective expressions in the text of clinical notes, thus contributing to the prediction of patients’ outcomes [19–22]. In health-related fields, sentiment analysis has been applied to Cancer Survivors Network (CSN) breast and colorectal cancer discussion posts [23], health reforms on Twitter [24], encounter notes of patients with critical illness [25], etc. This study was aimed at identifying the association between sentiments in nursing notes and the in-hospital 28-day mortality of sepsis patients. The results showed that both sentiment polarity and sentiment subjectivity were inversely associated with in-hospital 28-day mortality, consistent with the finding of McCoy et al. that the sentiment measured in hospital discharge notes was related to hospital readmission and mortality risk [11]. Based on the COX model, patients in the high-risk PI group were found to have a higher risk of death than those in the low- and middle-risk group, highlighting the potential value of sentiments in survival analysis. A previous study has shown a strong association between sentiments and the risk of death even after adjustment for severity of illness and baseline information [25].

A strength of the present study is that it was the first to investigate the association between sentiments in nursing notes and the in-hospital 28-day mortality of sepsis patients. Nursing notes written less than 12 hours before the time of death were excluded, which made the results more reliable. However, the present study also had several limitations, and its results should be interpreted cautiously. First, nursing notes from the single-center MIMIC-III database may show different characteristics because of variations in clinicians, experience, training, or working environment, which limits the generalizability of the results. Second, the approach used to measure sentiment in the present study was not the only approach available. Other techniques, such as those based on machine learning models that make semantic inferences, could produce different results. Third, the mean sentiment scores could only characterize variations at the level of patients, but not at the levels of sentences, paragraphs, or documents. Fourth, the nursing notes were recorded by caregivers including research nurses, medical doctors, and others (see https://mimic.mit.edu/docs/iii/tables/caregivers/). It cannot be determined whether the sentiments in nursing notes are based on past or personal experiences. Moreover, subtle changes in sentiment over time were not captured. In the future, the temporal pattern of nursing notes will be examined to gain more insights.

5. Conclusions

Sentiments in nursing notes are associated with the in-hospital 28-day mortality and survival of sepsis patients, suggesting the importance of sentiments in nursing notes for the prediction of clinical outcomes in the ICU. Although predicting clinical outcomes is still a complex problem, the information extracted from unstructured data like nursing notes may contribute to further improving prediction performance.

Abbreviations

PI:	Prognostic index
SAPS-II:	Simplified Acute Physiology Score II
ICU:	Intensive care unit
SOI:	Severity of illness scores
MIMIC-III:	Medical Information Mart for Intensive Care III
ICD-9:	International Classification of Diseases 9
HR:	Hazard ratio
TSICU:	Trauma/surgical intensive care unit
CSN:	Cancer Survivors Network.

Data Availability

The data utilized to support the findings are available from the corresponding authors upon request. The data applied in the present study were from the MIMIC-III database (https://mimic.physionet.org/), a freely accessible database.

Ethical Approval

The data collected in the MIMIC-III was approved by the Ethics Review Board of the Beth Israel Deaconess Medical Center, and all private information has been desensitized.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Authors’ Contributions

QYG contributed to the study design and manuscript writing. DW, PPS, and WFW contributed to the data collection and analysis. XRL contributed to the study design and study supervision. QYG, DW, PPS, WFW, and XRL contributed to the critical revisions of important content. All authors revised and approved the final manuscript.

Acknowledgments

This research was funded by the Shandong Social Science Planning Research Project-2019 (19CCXJ05).

References

[1] S. Li, X. Hu, J. Xu et al., “Increased body mass index linked to greater short- and long-term survival in sepsis patients: a retrospective analysis of a large clinical database,” International Journal of Infectious Diseases, vol. 87, pp. 109–116, 2019.
[2] M. Singer, C. S. Deutschman, C. W. Seymour et al., “The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3),” Journal of the American Medical Association, vol. 315, no. 8, pp. 801–810, 2016.
[3] C. W. Seymour, T. D. Rea, J. M. Kahn, A. J. Walkey, D. M. Yealy, and D. C. Angus, “Severe sepsis in pre-hospital emergency care,” American Journal of Respiratory and Critical Care Medicine, vol. 186, no. 12, pp. 1264–1271, 2012.
[4] A. J. Walkey, T. Lagu, and P. K. Lindenauer, “Trends in sepsis and infection sources in the United States. A population-based study,” Annals of the American Thoracic Society, vol. 12, no. 2, pp. 216–220, 2015.
[5] C. Fleischmann, A. Scherag, N. K. Adhikari et al., “Assessment of global incidence and mortality of hospital-treated sepsis. Current estimates and limitations,” American Journal of Respiratory and Critical Care Medicine, vol. 193, no. 3, pp. 259–272, 2016.
[6] J. Cohen, J. L. Vincent, N. K. Adhikari et al., “Sepsis: a roadmap for future research,” The Lancet Infectious Diseases, vol. 15, no. 5, pp. 581–614, 2015.
[7] I. E. R. Waudby-Smith, N. Tran, J. A. Dubin, and J. Lee, “Sentiment in nursing notes as an indicator of out-of-hospital mortality in intensive care patients,” PLoS One, vol. 13, no. 6, article e0198687, 2018.
[8] G. Rocker, D. Cook, P. Sjokvist et al., “Clinician predictions of intensive care unit mortality,” Critical Care Medicine, vol. 32, no. 5, pp. 1149–1154, 2004.
[9] T. Sinuff, N. K. Adhikari, D. J. Cook et al., “Mortality predictions in the intensive care unit: comparing physicians with scoring systems,” Critical Care Medicine, vol. 34, no. 3, pp. 878–885, 2006.
[10] M. M. Ghassemi, R. G. Mark, and S. Nemati, “A visualization of evolving clinical sentiment using vector representations of clinical notes,” in 2015 Computing in Cardiology Conference (CinC), pp. 629–632, Nice, France, 2015.
[11] T. H. McCoy, V. M. Castro, A. Cagan, A. M. Roberson, I. S. Kohane, and R. H. Perlis, “Sentiment measured in hospital discharge notes is associated with readmission and mortality risk: an electronic health record study,” PLoS One, vol. 10, no. 8, article e0136341, 2015.
[12] N. Tran and J. Lee, “Using multiple sentiment dimensions of nursing notes to predict mortality in the intensive care unit,” in 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), pp. 283–286, Las Vegas, NV, USA, 2018.
[13] A. E. Johnson, T. J. Pollard, L. Shen et al., “MIMIC-III, a freely accessible critical care database,” Scientific Data, vol. 3, no. 1, article 160035, 2016.
[14] B. Liu and L. Zhang, “A survey of opinion mining and sentiment analysis,” in Mining Text Data, Springer, Boston, MA, 2012.
[15] S. Loria, P. Keen, and M. Honnibal, TextBlob: Simplified Text Processing, 2018, https://media.readthedocs.org/pdf/textblob/dev/textblob.pdf.
[16] J. Zhang and H. Jin, “Method of subjective lexicon creation for Chinese sentiment analysis,” Applied Mechanics & Materials, vol. 34-35, pp. 801–805, 2010.
[17] J. R. Le Gall, S. Lemeshow, and F. Saulnier, “A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study,” Journal of the American Medical Association, vol. 270, no. 24, pp. 2957–2963, 1993.
[18] P. S. Dodds, K. D. Harris, I. M. Kloumann, C. A. Bliss, and C. M. Danforth, “Temporal patterns of happiness and information in a global social network: hedonometrics and Twitter,” PLoS One, vol. 6, no. 12, article e26752, 2011.
[19] J. Cioffi, “Recognition of patients who require emergency assistance: a descriptive study,” Heart & Lung, vol. 29, no. 4, pp. 262–268, 2000.
[20] M. Brabrand, J. Hallas, and T. Knudsen, “Nurses and physicians in a medical admission unit can accurately predict mortality of acutely admitted patients: a prospective cohort study,” PLoS One, vol. 9, no. 7, article e101739, 2014.
[21] S. A. Collins and D. K. Vawdrey, “"Reading between the lines" of flow sheet data: nurses' optional documentation associated with cardiac arrest outcomes,” Applied Nursing Research, vol. 25, no. 4, pp. 251–257, 2012.
[22] S. A. Collins, K. Cato, D. Albers et al., “Relationship between nursing documentation and patients' mortality,” American Journal of Critical Care, vol. 22, no. 4, pp. 306–313, 2013.
[23] K. Portier, G. E. Greer, L. Rokach et al., “Understanding topics and sentiment in an online cancer survivor community,” Journal of the National Cancer Institute. Monographs, vol. 2013, no. 47, pp. 195–198, 2013.
[24] D. King, D. Ramirez-Cano, F. Greaves, I. Vlaev, S. Beales, and A. Darzi, “Twitter and the health reforms in the English National Health Service,” Health Policy, vol. 110, no. 2-3, pp. 291–297, 2013.
[25] G. E. Weissman, L. H. Ungar, M. O. Harhay, K. R. Courtright, and S. D. Halpern, “Construct validity of six sentiment analysis methods in the text of encounter notes of patients with critical illness,” Journal of Biomedical Informatics, vol. 89, pp. 114–121, 2019.

Copyright

Copyright © 2021 Qiaoyan Gao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
