Abstract
With the emergence and rapid growth of text mining, computer-assisted approaches for capturing sentiment from textual data have become a promising field, particularly as researchers in the big data era increasingly face the problem of filtering large volumes of irrelevant information to capture what is essential. This study aims to identify and classify the sentiment orientation in CEO letters, to uncover their main corporate social responsibility (CSR) themes, and to examine how effectively the sentiment of CEO letters forecasts financial performance. A specific sentiment dictionary based on appraisal theory is proposed to identify and classify the sentiment orientation in CEO letters. Additionally, the qualitative data analysis software NVivo is applied to explore the CSR topics. Furthermore, a modified Altman’s Z-score model and machine-learning approaches are employed to predict financial performance. Preliminary evaluations show that approximately 62.14% of the texts express positive polarity, even when the companies are not in a promising economic situation. The CSR themes focus mainly on business ethical responsibility, particularly ethical activities. Among the machine-learning approaches tested, logistic regression is the most suitable for predicting financial performance, achieving the highest accuracy, 70.46%. These encouraging results indicate that the sentiment information in CEO letters is a vital factor for anticipating financial performance. This work not only offers a new analytic framework that links linguistic theory with computer science and economic models but may also improve stakeholders’ decision-making.
1. Introduction
With the rapid and unprecedented expansion of research in data mining, big data, and artificial intelligence, natural language processing (NLP) for handling large volumes of textual data has become a prevalent field with great future prospects. Sentiment analysis offers a more efficient and accurate way to process text, and its rapid pace of innovation, low cost, and scalability make it a highly attractive alternative approach.
Sentiment analysis, also known as “opinion/sentiment mining” or “subjectivity analysis,” is a prominent interdisciplinary field combining computational linguistics and computer science. It attempts to extract the subjective opinions, feelings, and attitudes contained in a text and analyzes how language is used to convey subjectivity and viewpoints on a particular topic [1].
Correspondingly, the corporate social responsibility report (CSRR), which contains abundant sentiment, is crucial for reflecting companies’ sustainability standpoints on their operating ideas, strategies, and methods. For this reason, the CSRR can be a valuable source for investors’ decision-making. For instance, the CEO letter conveys the information that corporations wish to communicate about themselves to potential stakeholders, and this information may be important in investment and lending decisions because it provides valuable background for interpreting and explaining financial performance that may not be covered in companies’ financial statements [2]. This increases the demand for sentiment analysis, which turns such letters into valuable resources for stakeholders’ decision-making [3]. China, as one of the largest developing countries in the world, has developed CSR rapidly, and the Chinese government in particular has paid increasing attention to sustainable development. However, few researchers have provided empirical evidence on the sentiment analysis of CSR in China, particularly under information asymmetry.
To our knowledge, prior studies have proposed many sentiment analysis techniques for extracting useful content from massive social media data [4, 5]; some scholars have categorized sentences as positive/negative [4], while others have divided sentences into critical/emotional based on their own intuition. The resulting taxonomies, however, are inconsistent and incomparable because different self-designed frameworks have been applied in different studies. Although sentiment analysis of CSR and other nonfinancial information appears to be a promising way to predict financial performance and assist investors in making investment decisions, little research has been done in this field.
To address this gap, this paper examines three dimensions: first, at the microlevel, it determines the sentiment polarity and sentiment attributes of letters to shareholders using appraisal theory; second, at the mesolevel, it identifies the prominent CSR themes using NVivo 12 Plus software; third, at the macrolevel, it examines whether these sentiment elements can anticipate financial performance as measured by the Z-score.
The remainder of the paper is organized as follows: the literature review covers CSR, CEO letters, the primary sentiment analysis approaches, appraisal theory, and the forecasting of financial performance from textual information; the research methodology section introduces the annotation study and the experimental procedures; the experimental results and analysis section presents and analyzes the results; and the conclusion summarizes the findings, contributions, limitations, and suggestions for future work.
2. Literature Review
2.1. CEO Letters in CSR Report
The CEO letter (hereafter, shareholders’ letter or letter to shareholders) is the most widely read part of the CSRR and a valuable communication source for stakeholders’ decision-making. The letter to shareholders is widely regarded as a promotional genre that conveys companies’ subjective standpoints and aims to portray a positive image [6], covering what stakeholders want to know: the previous year’s performance, important events, corporate social responsibility activities, and the future vision of the company. The letter therefore plays a vital role in communicating a company’s competitive advantage [7]. A number of scholars have recognized the significance of CEO letters; for instance, Kohut and Segars [7] characterize the effectiveness of letters to shareholders and how this information can benefit the company, and Patelli and Pedrini [8] indicate that, even under tough economic conditions, companies still sincerely disclose nonfinancial information to stakeholders. Surprisingly, however, little research has used sentiment analysis to examine how companies construct their images and present them to readers.
2.2. Sentiment Analysis
In artificial intelligence (AI), text mining is an effective and efficient way to process large amounts of textual content, for example by extracting sentiment polarity with natural language processing. In particular, sentiment analysis has gradually become a popular technique. A considerable amount of literature has concentrated on analyzing documents’ sentiment polarity [9]. Automatic sentiment classification, most commonly the classification of reviews as positive or negative, has been extensively applied to reviews of products [10], movies [11], books [12], and shopping [13], as well as to social networks [14] and students’ course evaluation comments [15, 16]. More recent work has focused on extracting textual information from financial reports, as the text may contain more information than the numerical part of an annual report [17, 18]. These studies indicate that sentiment analysis is an important technique for forecasting financial performance and can thus support the decision-making of potential stakeholders [19, 20]. Researchers have therefore recognized the potential value of analyzing the textual information in financial reports for predicting financial performance and managing risks.
The challenge of detecting sentiment in text has been tackled from various perspectives, but previous approaches can be grouped into two main categories, the lexicon-based approach and the machine-learning approach [21], as depicted in Figure 1.

2.2.1. Lexicon-Based Approach
Lexicon-based methods are rule-based: they require a predefined word list with polarities to identify viewpoints and sentiments [4, 22], and they compute the orientation of a text from the semantic orientation of its words or phrases [23]. In other words, when a new text is analyzed, its words are matched against the words in the sentiment dictionary, and an aggregation algorithm then combines their values. The aggregated positive and negative values of the words determine the sentiment orientation of the whole text. An idealized operating mechanism is shown in Figure 2.
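To make the matching and aggregation steps concrete, the following minimal Python sketch scores a text by summing the polarity values of the dictionary words it contains and reads the orientation from the sign of the sum. The lexicon entries and the function name `text_orientation` are hypothetical illustrations, not the dictionary constructed in this study, and plain summation is only one of several possible aggregation rules.

```python
import re

# Hypothetical toy lexicon mapping words to polarity values; it stands in for
# a real sentiment dictionary and is NOT the dictionary built in this study.
LEXICON = {"excellent": 3, "growth": 2, "satisfied": 2, "decline": -2, "loss": -3}


def text_orientation(text, lexicon=LEXICON):
    """Match each word against the lexicon and sum the matched polarity values."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)  # unmatched words add 0
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label


print(text_orientation("We are satisfied with excellent growth despite a small loss."))
```

Here the positive terms outweigh the single negative term, so the sentence is labeled positive; a domain-specific dictionary would change which words match and with what values.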

Table 1 enumerates related work on the lexicon-based approach to sentiment analysis, listing the objectives, the associated models, and the experimental results. We employ precision, recall, and F-measure as evaluation metrics, as is common in information retrieval and document classification research. Precision is the proportion of the items assigned to a class by a classification technique that are correctly classified, and recall is the proportion of the items manually assigned to that class in the gold standard that the technique correctly classifies. The F-measure is the harmonic mean of precision and recall; unlike the arithmetic mean, it remains low whenever either precision or recall is low.
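As a sketch of how these metrics are computed for one class, the hypothetical helper below compares predicted labels with a manually annotated gold standard; it assumes single-label classification with two parallel label lists.

```python
def precision_recall_f1(predicted, gold, positive="positive"):
    """Precision, recall, and F-measure for one class.

    predicted and gold are parallel label lists; gold is the manual annotation.
    """
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == positive and g == positive)  # true positives
    fp = sum(1 for p, g in pairs if p == positive and g != positive)  # false positives
    fn = sum(1 for p, g in pairs if p != positive and g == positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(precision_recall_f1(
    ["positive", "positive", "negative", "positive"],
    ["positive", "negative", "positive", "positive"],
))
```

For example, with a precision of 1.0 and a recall of 0.1, the arithmetic mean would be 0.55, whereas the F-measure is about 0.18, which reflects the weak recall.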
The merit of the lexicon-based approach is that its dictionaries do not need to be altered across diverse domains [4]. Similarly, Brooke, Tofiloski, and Taboada [29] claimed that, compared with machine learning, the rule-based approach is less cumbersome because researchers do not need to spend a large amount of time and effort annotating training data for a specific domain.
The lexicon-based approach, however, has its own limitations. Extracting individual words may miss important meanings in the text. Moreover, existing general-purpose sentiment dictionaries do not fit the specific context of letters to shareholders: words such as lower and decrease are negative in a general dictionary but do not necessarily carry a negative meaning in CSR texts (for example, lower emissions). Hence, to alleviate this problem, sentiment identification requires a more comprehensive, purpose-built sentiment dictionary for the CSR domain.
2.2.2. Machine-Learning Approach
Machine-learning approaches train classifiers, such as support vector machines (SVM), Naive Bayes (NB), logistic regression (LR), and maximum entropy (ME), to assess the positive or negative orientation of content, based on initial manual coding followed by a test on the dataset to check whether the sentiment is indeed captured [30]. In other words, human annotators code a small sample of the dataset, and the machine then takes over once it has “learned” which kinds of words are associated with positive and negative sentiment [31]. A detailed operating mechanism is shown in Figure 2.
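The coding-then-learning workflow can be illustrated with a minimal multinomial Naive Bayes classifier written in plain Python. This is a generic textbook sketch with Laplace smoothing, not the classifier configuration of any cited study; the four annotated sentences are invented stand-ins for the human-coded sample.

```python
import math
from collections import Counter


def train_nb(labeled):
    """Count word frequencies per label from (tokens, label) training pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of annotated sentences
    vocab = set()
    for tokens, label in labeled:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denom = sum(counts.values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented stand-in for the small human-coded sample.
labeled = [
    ("profit rose strongly".split(), "positive"),
    ("excellent growth this year".split(), "positive"),
    ("revenue fell sharply".split(), "negative"),
    ("weak demand and losses".split(), "negative"),
]
model = train_nb(labeled)
print(predict_nb(model, "strong growth".split()))
```

Once trained, classifying a new sentence only requires a lookup per word, which is why the approach is fast after the annotation effort has been paid.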
Machine-learning approaches can be further divided into supervised, unsupervised, and semisupervised approaches [32]. The training objective is normally to classify or distinguish instances. Supervised approaches construct the model from labeled instances, whereas unsupervised approaches are used in data mining to discover what is inside unlabeled data. Semisupervised techniques lie between the two: they use unlabeled data, together with a smaller labeled set, to learn a function that improves classification performance.
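Self-training is one simple semisupervised technique: a model trained on the labeled set labels the unlabeled texts it is most confident about, adds them to the training data, and repeats. The sketch below assumes a deliberately simple word-count scorer as the base model and a hypothetical confidence `threshold`; practical systems would typically wrap a stronger classifier such as Naive Bayes or an SVM.

```python
from collections import Counter


def word_scores(labeled):
    """Base model: +1 per occurrence in a positive text, -1 per occurrence in a negative one."""
    scores = Counter()
    for tokens, label in labeled:
        sign = 1 if label == "positive" else -1
        for token in tokens:
            scores[token] += sign
    return scores


def self_train(labeled, unlabeled, threshold=2, max_rounds=5):
    """Repeatedly pseudo-label confident unlabeled texts and retrain on them."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        scores = word_scores(labeled)  # retrain on the current labeled set
        confident, remaining = [], []
        for tokens in pool:
            total = sum(scores[token] for token in tokens)
            if abs(total) >= threshold:
                confident.append((tokens, "positive" if total > 0 else "negative"))
            else:
                remaining.append(tokens)
        if not confident:  # nothing new could be labeled; stop early
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


seed = [(["strong", "growth"], "positive"), (["weak", "loss"], "negative")]
unlabeled = [["strong", "growth", "momentum"], ["momentum", "improved"], ["weak", "loss", "pressure"]]
labeled, still_unlabeled = self_train(seed, unlabeled)
print(len(labeled), still_unlabeled)
```

The threshold trades coverage for reliability: a lower value labels more texts but risks propagating early mistakes.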
Table 2 summarizes the relevant literature on machine-learning approaches to sentiment identification, showing that most existing studies analyze text by building a large dataset to measure subjectivity.
The advantage of the machine-learning approach is that, once a trained model is available, the classifier can rapidly determine the polarity of a text. For instance, Yang et al. [35] applied a naive Bayes classifier and class association rules to identify the sentiment of consumer reviews and achieved satisfactory results. Additionally, Troussas et al. [36] classified people’s feelings and attitudes about certain topics and showed how sentiment analysis using naive Bayes can efficiently assist language learning. Specifically, Yuan et al. [39] showed that SVM-based sentiment analysis of CSR in financial reports substantially improved classification accuracy. This provides the inspiration and basis for the present study.
Although the machine-learning approach has long been used in topical text classification with good results because of its efficiency and accuracy, it has shortcomings arising from the high level of manual human intervention it requires. The classifiers need sizable human-annotated datasets for training and testing, which are extremely costly and time-consuming to produce, especially in the big data era [31]. Moreover, suitable training datasets are sometimes unavailable. Previous attempts at categorizing movie reviews and book reviews with the machine-learning approach met with limited success. Because the technique faces the obstacle of domain specificity, its accuracy can drop considerably: a carefully customized training dataset may not generalize well to texts from other domains [40, 41]. Consequently, this study needs to capture the particular characteristics of the specific domain of letters to shareholders.
2.3. Forecasting Financial Performance by Using Text Information
Recently, sentiment analysis has been widely used to explore the relationship between textual information and corporate financial performance: by evaluating authors’ opinions, attitudes, and sentiment polarity, future financial performance may be predicted. Many scholars have analyzed textual information in annual reports, newspapers, and other documents to forecast stock returns or financial performance. Table 3 lists studies indicating that the sentiment of documents can be significantly correlated with financial performance.
Building on this literature, our study uses letters to shareholders in CSRRs to test, with various machine-learning methods, whether the sentiment in the letters can predict financial performance. Furthermore, to enable a better comparison of sentiment attribute categories and to construct a sentiment classification dictionary suited to this specific context, appraisal theory is applied and the seed dictionary is manually annotated for the domain of CEO letters.
2.4. Appraisal Theory
Although machine-learning methods perform better, the results of different experiments are not comparable because they use different categorizations. A detailed theoretical framework is needed to address this difficulty: researchers must tackle more challenging tasks and perform more sophisticated sentiment analyses based on systematic and well-grounded sentiment theories and frameworks. Such frameworks can serve as common theoretical platforms for comparing results across studies.
Existing sentiment techniques alone may not be sufficient for sentiment classification, so linguistic frameworks have been employed as theoretical platforms for sentiment analysis. To date, computational linguists have proposed several sentiment frameworks. Wiebe et al. [48] proposed a framework based on private states with three types of expressions: explicit mentions of private states, speech events expressing private states, and expressive subjective elements. The framework was designed to extend attitude types to include intention, warning, uncertainty, and evaluation. Asher, Benamara, and Mathieu [49] proposed an annotation framework with four categories, namely reporting expressions, judgement expressions, advice expressions, and sentiment expressions, to capture various kinds of feelings and opinions. The most comprehensive linguistic theory of sentiment, however, is the appraisal framework [50], also known as interpersonal semantics, developed within the tradition of Systemic Functional Linguistics [51]. Appraisal theory is a framework for describing the language of evaluation in text [52] and an approach to linguistics that focuses on the semantics of text rather than its grammar [53].
The framework provides a taxonomy of the language used to convey attitude, engagement, and graduation in evaluations. The detailed taxonomy is depicted in Figure 3. Attitude concerns how people express their opinions and emotions in interpersonal interaction; engagement concerns how an evaluation is positioned with respect to others’ opinions; and graduation denotes how attitudes are strengthened or weakened through language.

Attitude, engagement, and graduation can be further divided into several subtypes, which are explained more in-depth in the following paragraphs.
Attitude can be further divided into three distinct subsystems: affect, appreciation, and judgement. Affect identifies the feelings and emotions of the author (happy and sad), which can be expressed as a behavioral process or an internal mental state. Appreciation evaluates things and phenomena (beautiful and ugly) in terms of their impact, quality, composition, complexity, or valuation. Judgement considers the author’s attitude towards somebody’s behavior (heroic and idiotic); the evaluation may concern social sanction or social esteem. Social sanction involves veracity or propriety, while social esteem involves how normally someone behaves, how capable the person is, or how tenacious the person is.
Engagement contains two subcategories: monoglossic and heteroglossic. Monoglossic expressions do not recognize dialogistic alternatives; that is, the writer or speaker expresses the appraisal directly. Heteroglossic expressions recognize alternative dialogistic positions and voices; that is, the writer or speaker attributes the evaluation to other voices or sources to make it credible.
Graduation also includes two subclasses: force and focus. Force assesses the degree of intensity, and focus sharpens or softens the specification of a category.
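The taxonomy described above can be represented as a nested dictionary, which is convenient when dictionary entries need to be tagged with a full category path. The sketch below encodes the subsystems named in this section, grouping impact and quality under reaction and balance and complexity under composition, following Martin and White; the helper `find_path` is a hypothetical illustration.

```python
# Appraisal taxonomy as nested dicts; an empty dict marks a leaf subsystem.
APPRAISAL = {
    "attitude": {
        "affect": {},
        "appreciation": {
            "reaction": {"impact": {}, "quality": {}},
            "composition": {"balance": {}, "complexity": {}},
            "valuation": {},
        },
        "judgement": {
            "social esteem": {"normality": {}, "capacity": {}, "tenacity": {}},
            "social sanction": {"veracity": {}, "propriety": {}},
        },
    },
    "engagement": {"monoglossic": {}, "heteroglossic": {}},
    "graduation": {"force": {}, "focus": {}},
}


def find_path(tree, target, path=()):
    """Depth-first search returning the list of node names from root to target, or None."""
    for name, children in tree.items():
        current = path + (name,)
        if name == target:
            return list(current)
        found = find_path(children, target, current)
        if found:
            return found
    return None


print(find_path(APPRAISAL, "tenacity"))
```

Storing the full path with each dictionary entry lets counts be aggregated at any level, for example all attitude terms or only judgement terms.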
The main advantage of using appraisal theory for sentiment identification is that it allows a deeper look into the thoughts of authors or publishers, revealing their actual sentiments through linguistic and psychological analysis of their texts. In our study, we extract the sentiment in shareholders’ letters and aim to draw statistical conclusions about the types of sentiment that companies express in the letters they write. Ultimately, the analysis can help potential investors make better decisions.
Scholars have made many useful attempts at sentiment analysis. Table 4 summarizes the relevant sentiment analysis literature that employs the appraisal framework; as the table shows, research integrating appraisal theory into the analysis of sentiment orientation is relatively scarce. Korenek and Šimko [60] manually created an emotional dictionary based on appraisal theory to evaluate emotions in Weibo posts; Taboada and Grieve [54] improved the aggregation of adjective values based on appraisal theory. In addition, Khoo et al. [50] extended the appraisal framework to analyze news reports, which are longer than microposts, and assessed the framework’s utility and possible problems, which is a useful precedent for the current study.
As indicated in Table 4, previous work has shown that the appraisal framework is a highly useful tool for sentiment classification [50, 60]. This paper therefore uses appraisal theory to improve the sentiment classification of CEO letters. Appraisal theory is useful for uncovering various aspects of sentiment and helps researchers better understand the feelings expressed in shareholders’ letters. Following the doctoral dissertation of Bloom [59], which laid the basis for sentiment classification using appraisal theory, we apply a more systematic and well-grounded approach to sentiment analysis based on Martin and White’s appraisal theory, which can help potential investors understand corporations’ attitudes accurately and make investment decisions more efficiently.
3. Research Methodology
As mentioned earlier, in order to achieve our goals in this study, a three-dimensional research framework has been established to enable us to systematically test different perspectives of text mining strategies with both quantitative and qualitative methods (see Figure 4).

The specific research questions that we address are the following:
RQ1: What are the prominent sentiment attributes in CEO letters that can actively promote companies’ images and avoid negative views?
RQ2: What are the main sentiment themes in CEO letters, and which one can best encourage investment by other companies?
RQ3: How can the sentiment attributes of CEO letters anticipate corporate financial performance in a way that is useful for stakeholders’ decision-making?
These questions evaluate CEO letters in three dimensions. At the microlevel, appraisal theory and a lexicon-based sentiment method are applied to classify sentiment attributes precisely. At the mesolevel, NVivo qualitative data analysis software is used for a thematic analysis of CEO letters. At the macrolevel, machine-learning approaches are used to assess the power of CEO letters to predict corporate financial performance as measured by the Z-score model.
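As a sketch of the macrolevel step, the code below fits a logistic regression by batch gradient descent on two sentiment features per letter and predicts a binary financial-health label. The features (share of positive terms and negative terms per 100 words), the label definition (1 for the Z-score safe zone, 0 otherwise), and the data are all hypothetical; the study’s actual feature set and model settings are not described here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Label 1 (hypothetically: Z-score safe zone) when the linear score is non-negative."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


# Invented features per letter: [share of positive terms, negative terms per 100 words].
X = [[0.80, 0.5], [0.75, 0.7], [0.55, 2.0], [0.50, 2.5]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])
```

With real letters, accuracy would be estimated on held-out data rather than on the training set shown here.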
3.1. Data Collection and Description
Various methods have been developed to measure sentiment. The method adopted in this study is a specific sentiment dictionary based on the principles of appraisal theory. Using appraisal theory allows researchers to categorize and compare sentiment attributes based on a systematic linguistic theory and to explore authors’ thoughts more thoroughly. The major difference from previous work using appraisal theory is that the whole basic appraisal tree is used and the specific features of shareholders’ letters are assessed.
Analysis results differ significantly across genres, such as political news, movie reviews, and product reviews, because of their distinct syntactic structures and lexical choices [56]. Therefore, to maintain consistency, a single genre is examined based on the appraisal framework. Shareholders’ letters are good candidates because they are likely to use similar language within the same genre of writing, and examples of appraisal attributes can be easily detected in them. Most importantly, identifying valuable information in shareholders’ letters is an effective way to help companies improve the quality of the information they release and to help stakeholders make investment decisions.
All CSR reports with an English version were downloaded from GRI’s Sustainability Disclosure Database (https://database.globalreporting.org), which provides access to all types of sustainability reports from reporting organizations in various industries. To avoid problems caused by industry-specific attributes and different financial performance evaluation, financial industries were excluded. Furthermore, to ensure comparable information, all selected reports had to have an official English version and the companies had to be listed on the Hong Kong publicly regulated market.
In total, the English versions of the 2016 CSR reports of 41 Chinese companies were collected, and the 41 shareholders’ letters were retrieved from these reports. The resulting corpus, named SLC, contains 35,670 tokens. All financial data for calculating the Z-score were gathered from the Wind financial database.
The year 2016 was selected because the United Nations General Assembly unanimously adopted Resolution 70/1, Transforming our World: the 2030 Agenda for Sustainable Development. This historic document lays out 17 sustainable development goals, which aim to integrate and balance the three dimensions of sustainable development: economic, social, and environmental. The goals and targets came into effect on 1 January 2016 and are intended to guide action over the following fifteen years.
3.2. Data Preprocessing
Most CEO letters were released in PDF format. First, the letters were manually converted from PDF documents into plain-text (.txt) files containing one sentence per line. The text documents were then linguistically preprocessed as follows:
Tokenisation: splitting texts into sentences and words
Part-of-speech tagging: adding a POS tag to each word in a sentence
Lemmatisation: converting each word into its basic lemma form
Deletion of stop words and company names
Shareholders’ letters have specific content that differs from other texts, and this must be handled to control for bias in the analysis. For instance, if a company were named Best Buy, the positive word “Best” in its name could significantly shift the polarity of the text. The preprocessing therefore combined tokenization, lemmatization, part-of-speech tagging, and deletion of stop words and named entities, which gave the best result.
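A minimal sketch of this preprocessing, assuming regular-expression tokenization, a toy stop-word list, and a toy lemma table, is shown below. In practice a part-of-speech tagger and lemmatizer from a library such as NLTK or spaCy would replace the `LEMMAS` lookup (POS tagging is omitted here), and the company-name list would be built from the sampled firms; `best buy` is only the hypothetical example from the text.

```python
import re

# Toy stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "our", "is", "are", "this"}
# Hypothetical company names, deleted so that "best" in "Best Buy" is not scored.
COMPANY_NAMES = ["best buy"]
# Toy lemma table standing in for a real lemmatizer.
LEMMAS = {"achieved": "achieve", "results": "result", "improved": "improve"}


def preprocess(text):
    """Split a letter into sentences and return one cleaned lemma list per sentence."""
    text = text.lower()
    for name in COMPANY_NAMES:
        text = text.replace(name, " ")  # delete company names before tokenizing
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    cleaned = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence)  # word tokenization
        lemmas = [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]
        cleaned.append(lemmas)
    return cleaned


print(preprocess("Best Buy achieved strong results. Our service improved."))
```

Removing company names before tokenization matters because, once split into words, "best" can no longer be told apart from an evaluative use.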
3.3. Specific Dictionary Construction and Classification
The main issue in the sentiment analysis of textual documents is the correct choice of sentiment-bearing terms. The categorization of words is not always unambiguous and requires contextual knowledge, because words have multiple meanings and a domain-specific tone. Therefore, appraisal theory is employed in this study.
To apply appraisal theory, a dictionary tagged with attributes from Martin and White’s categorization (2005) has to be constructed. Following Korenek and Šimko [60], who built a dictionary based on appraisal theory specializing in microblogs, we collected approximately 550 words from Martin and White’s classification. To broaden the dictionary, the WordNet database, Collins Thesaurus, and Merriam-Webster Thesaurus were used to find synonyms of these words, forming the basic sentiment lexicon. The next step extracts candidate sentiment words from the self-constructed SLC corpus using a statistical approach based on pointwise mutual information (PMI), which compares candidate words with the existing basic sentiment lexicon. PMI is defined as follows:
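Assuming the standard definition, PMI(w, s) = log2 [P(w, s) / (P(w) P(s))] for a candidate word w and a seed word s, the sketch below estimates the probabilities as the fraction of corpus sentences containing the words. The sentence-level co-occurrence window and the toy corpus are assumptions; the study may use a different window or variant (for example, summing PMI with positive seeds and subtracting PMI with negative seeds).

```python
import math


def pmi(sentences, word, seed):
    """PMI(word, seed) = log2(P(word, seed) / (P(word) * P(seed))).

    Probabilities are the fraction of sentences containing the word(s).
    Returns None when the pair never co-occurs (PMI would be minus infinity).
    """
    n = len(sentences)
    sets = [set(s) for s in sentences]
    p_w = sum(word in s for s in sets) / n
    p_s = sum(seed in s for s in sets) / n
    p_ws = sum(word in s and seed in s for s in sets) / n
    if p_ws == 0:
        return None
    return math.log2(p_ws / (p_w * p_s))


# Invented, already preprocessed sentences.
sentences = [
    ["steady", "growth", "excellent"],
    ["excellent", "service"],
    ["steady", "decline"],
    ["growth", "target"],
]
print(pmi(sentences, "service", "excellent"), pmi(sentences, "decline", "excellent"))
```

A positive PMI means the candidate co-occurs with the seed more often than chance would predict, which is the signal used to propose it as a dictionary entry before manual screening.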
Based on the PMI value, we decide whether each word should be selected as a target. The selected candidate words were then merged with the basic sentiment lexicon to construct a new dictionary of approximately six hundred words specializing in the content of shareholders’ letters.
Because of the specifics of shareholders’ letters, the traditional lexicon-based approach cannot capture the complex context, and the statistical PMI method is not always accurate. Manual screening is therefore essential at this step.
To further categorize the new dictionary, each word is manually classified according to its attributes in appraisal theory. In line with [60], who used a 10-point Likert scale from −5 to +5, the same scale is adopted in this research. For manual coding based on the participants’ subjective judgement, eight human annotators, labeled a to h, were invited to assign polarity independently. All annotators were well-versed in the appraisal framework. They were asked to specify the type of attitude, engagement, or graduation present and to assign a scale value and polarity to each candidate word. To address inconsistent understanding of some ambiguous words, the annotation was carried out over two rounds, separated by an intermediate analysis of agreement and disagreement among the annotators, until a consensus was reached.
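Assuming agreement is measured with Fleiss’ kappa, a common statistic for more than two annotators, the sketch below computes it over the annotators’ polarity labels (the sign of each −5 to +5 score) and lists the words on which annotators disagree, which would be candidates for discussion in the second round. The choice of statistic and the scores are assumptions for illustration.

```python
from collections import Counter


def polarity(score):
    """Collapse a -5..+5 appraisal score into a polarity label."""
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neu"


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings holds one list of labels per item, all of equal length."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]
    # Observed agreement: proportion of agreeing rater pairs per item, averaged.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)
    if p_e == 1:
        return None  # all labels identical: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def disputed(word_scores):
    """Words whose annotators disagree on polarity (candidates for round two)."""
    return [w for w, s in word_scores.items() if len({polarity(x) for x in s}) > 1]


# Invented scores from eight annotators (a to h) for three candidate words.
scores = {
    "excellent": [4, 5, 4, 5, 3, 4, 5, 4],
    "lower": [-2, 1, -1, 2, -3, 1, 0, -1],
    "loss": [-4, -3, -5, -4, -4, -3, -5, -4],
}
labels = [[polarity(x) for x in s] for s in scores.values()]
print(disputed(scores), round(fleiss_kappa(labels), 3))
```

In this invented example only "lower" is disputed, which mirrors the domain ambiguity discussed for CSR texts.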
A partial example of the entries is presented in Table 5. Each entry contains a word, a part-of-speech tag, a category and subcategories according to appraisal theory, and an appraisal value.
Based on the newly established sentiment dictionary for a specific corpus, we can further precisely analyze the sentiment characteristics of CEO letters from micro-, meso-, and macrolevel analysis.
3.4. Data Analysis
3.4.1. Microlevel Analysis
Regarding RQ1, we categorized CEO letters based on the newly established sentiment classification dictionary, considering the following eleven categories of terms:
A. Positive attitude affect (e.g., happy, convinced, and satisfied)
B. Negative attitude affect (e.g., sorry and sad)
C. Positive attitude judgement (e.g., lucky, fortunate, and famous)
D. Negative attitude judgement (e.g., imperfect, unknown, and severe)
E. Positive attitude appreciation (e.g., exciting, dramatic, and excellent)
F. Negative attitude appreciation (e.g., imbalanced, disharmonious, and conflicting)
G. Positive engagement (e.g., certainly, obviously, naturally, and evidently)
H. Negative engagement (e.g., falsely and compellingly)
I. Positive graduation force (e.g., greatly, slightly, and somewhat)
J. Negative graduation force (e.g., small and remote)
K. Positive graduation focus (e.g., true and genuinely)
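To illustrate the microlevel counting, the sketch below tallies how often each category code A–K occurs in a preprocessed letter and the share of hits that fall into the positive categories (A, C, E, G, I, K). The lexicon entries are hypothetical stand-ins shaped like the Table 5 entries, not the study’s actual dictionary.

```python
from collections import Counter

# Hypothetical entries: lemma -> (category code A-K, appraisal value).
LEXICON = {
    "satisfied": ("A", 3),
    "sorry": ("B", -2),
    "fortunate": ("C", 2),
    "excellent": ("E", 4),
    "imbalanced": ("F", -2),
    "certainly": ("G", 2),
    "greatly": ("I", 2),
}
POSITIVE_CODES = set("ACEGIK")  # the positive categories in the list above


def profile(lemmas, lexicon=LEXICON):
    """Count category hits in one letter and the share of positive hits."""
    hits = Counter(lexicon[t][0] for t in lemmas if t in lexicon)
    total = sum(hits.values())
    positive = sum(n for code, n in hits.items() if code in POSITIVE_CODES)
    share = positive / total if total else 0.0
    return hits, share


hits, share = profile(["we", "certainly", "greatly", "excellent", "sorry", "imbalanced", "result"])
print(dict(hits), share)
```

Aggregating such per-letter profiles across the corpus yields the kind of overall positive-polarity proportion reported for the letters.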
We assumed that well-performing corporations use more positive, optimistic tones. Conversely, we expected more active language from poorly performing companies, which need to take positive action to improve their image and attract more investors.
3.4.2. Mesolevel Analysis
With the spread of computer technology, a range of software packages has emerged to assist with the analysis of qualitative data. It is generally accepted that computer-assisted qualitative data analysis software (CAQDAS), used appropriately, can enhance the data handling and analysis process and relieve analysts of complicated data management.
The software package NVivo (now updated to version 12) is one of the most prominent CAQDAS packages; it helps analysts work more efficiently and back up findings rigorously with grounded data [62].
In this study, NVivo was used to perform a thematic analysis to identify the broad CSR topics in the shareholders’ letters. Thematic analysis categorizes qualitative data by analyzing similar themes and interpreting the research findings [63]. Here, it is used to detect the main ideas in the CEO letters, with NVivo used to label, or code, the different types of CSR.
NVivo allows nodes to have more than one dimension (tree branches). We were therefore able to identify concepts with more than one dimension or group them within a more general concept. This is valuable for finding connections because it prompts analysts to think about their concepts in more detail, facilitating conceptual clarity and early discourse analysis [64]. Figure 5 shows a sample of the tree node structure of CSR.
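The grouping of specific concepts under more general ones can be mimicked outside NVivo with a small tree structure. The sketch below is a hypothetical illustration, not NVivo’s API or export format: each coded segment is attached to a node path, and reference counts are rolled up so that a parent node also counts everything coded at its children. The node names and segments are invented.

```python
from collections import defaultdict

# Hypothetical coded references: (node path from root, coded text segment).
CODED = [
    (("CSR", "ethical responsibility", "business ethics"), "We uphold integrity in all operations."),
    (("CSR", "ethical responsibility", "anti-corruption"), "We have zero tolerance for bribery."),
    (("CSR", "environmental responsibility"), "Emissions fell this year."),
]


def rollup(coded):
    """Count references at every node, including those coded at its descendants."""
    counts = defaultdict(int)
    for path, _segment in coded:
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1  # credit the node and each of its ancestors
    return dict(counts)


for node, n in sorted(rollup(CODED).items()):
    print(" > ".join(node), n)
```

Rolling counts up the tree is what makes it possible to compare broad themes (for example, ethical versus environmental responsibility) while still keeping the finer subcodes.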

The coding stripes function is useful for annotating segments across whole documents, which facilitates the comparison of categories. Figure 6 depicts an example of data coded at the ethical responsibility node.

Overall, NVivo offers a valuable tool for exploring the complexities of potential relationships without forcing the data into specific categories. When we identified a possible relationship with CSR, we first represented it in NVivo as a free node and later created tree nodes to capture its internal relationships.
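The tree node workflow described above (coding segments at nested nodes, and later attaching a free node under a broader concept) can be pictured as a simple tree data structure. The sketch below is a hypothetical stand-in, not NVivo's internal model or API: the `Node` class and its `child`, `code`, `adopt` and `total` methods are invented for illustration, and the node names are taken from the ethical responsibility branch discussed in Section 4.2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an NVivo node: a name, coded text segments, and child nodes."""
    name: str
    segments: list = field(default_factory=list)
    children: dict = field(default_factory=dict)

    def child(self, name):
        # Create the child node on first use so branches can be built incrementally.
        if name not in self.children:
            self.children[name] = Node(name)
        return self.children[name]

    def code(self, path, segment):
        """Code a text segment at the node reached by following `path` from this node."""
        node = self
        for name in path:
            node = node.child(name)
        node.segments.append(segment)
        return node

    def adopt(self, free_node):
        """Attach a previously free (parentless) node under this node, keeping its coding."""
        self.children[free_node.name] = free_node
        return free_node

    def total(self):
        """Number of segments coded at this node or at any of its descendants."""
        return len(self.segments) + sum(c.total() for c in self.children.values())


root = Node("CSR")
ethical = root.child("Ethical responsibility")
root.code(["Ethical responsibility", "Ethical activity"],
          "We implemented new energy efficiency projects worldwide")
# A concept first coded as a free node, later placed in the tree.
free = Node("Ethical ability")
free.segments.append("China Telecom is among the first batch of the national demonstration bases")
ethical.adopt(free)
```

Aggregating counts up the tree with `total()` mirrors how coding at child nodes rolls up into a broader theme such as ethical responsibility.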
3.4.3. Macrolevel Analysis
Altman’s model of financial health (Altman’s Z-score) was selected for the quantitative evaluation of the assessed companies because it was designed to assess the financial health of industrial companies whose shares are publicly traded, and such companies, listed on the Hong Kong market, were selected for this study.
The purpose of this so-called bankruptcy model is to predict the probability of survival or bankruptcy of the analyzed company. The closer a company is to bankruptcy, the better Altman’s index works as a predictor of financial health; it predicts bankruptcy reliably about one to two years in advance. For an industrial company with shares publicly traded on the stock market, Altman’s Z-score model is defined by the following relation:

$$Z_i = 1.2X_{1i} + 1.4X_{2i} + 3.3X_{3i} + 0.6X_{4i} + 1.0X_{5i},$$

where i denotes the i-th company, X1 is working capital/total assets, X2 is retained earnings/total assets, X3 is earnings before interest and tax/total assets, X4 is market value of equity/total liabilities, and X5 is sales/total assets. Detailed variable definitions can be found in Table 6.
The Zi value ranges from −4 to +8; the higher the value, the better the company’s financial health. Companies are classified as follows: (1) Zi > 2.99, “safe zone” (a financially strong company with a high probability of survival); (2) 1.80 ≤ Zi ≤ 2.99, “grey zone” (a company with certain financial difficulties whose future cannot be clearly determined); and (3) Zi < 1.80, “distress zone” (a company with serious financial problems that is threatened by bankruptcy).
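The scoring and zoning rules can be written as two small functions. This is a sketch under an explicit assumption: the coefficients 1.2, 1.4, 3.3, 0.6 and 1.0 are those of Altman's original model for publicly traded industrial firms, which is the version that uses the 1.80 and 2.99 cut-offs quoted above; the function names `altman_z` and `zone` are illustrative only.

```python
def altman_z(x1, x2, x3, x4, x5):
    """Altman Z-score with the original published weights (assumed here).

    x1: working capital / total assets
    x2: retained earnings / total assets
    x3: EBIT / total assets
    x4: market value of equity / total liabilities
    x5: sales / total assets
    """
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """Map a Z-score to the three zones used in the text (boundaries inclusive for grey)."""
    if z > 2.99:
        return "safe zone"
    if z >= 1.80:
        return "grey zone"
    return "distress zone"
```

Note that the grey zone is closed at both ends, so a score of exactly 2.99 or 1.80 is classified as grey.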
Owing to the special historical conditions under which China’s stock market formed, the types of stocks in China differ from those in other countries. The stocks of listed companies can generally be divided into two categories: tradable shares and nontradable shares. Since nontradable shares have no market price in the Hong Kong stock market, the model was slightly modified: X4 = (share price × tradable shares + net asset value per share × nontradable shares)/total liabilities, and X5 = prime operating revenue/total assets.
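The modified ratios amount to valuing nontradable shares at net asset value per share instead of a market price. A minimal sketch follows; the function and parameter names are hypothetical, chosen to mirror the verbal definitions above.

```python
def modified_x4(share_price, tradable_shares, nav_per_share, nontradable_shares, total_liabilities):
    """Modified X4: tradable shares at market price, nontradable shares at net asset value."""
    equity_value = share_price * tradable_shares + nav_per_share * nontradable_shares
    return equity_value / total_liabilities


def modified_x5(prime_operating_revenue, total_assets):
    """Modified X5: prime operating revenue in place of sales, scaled by total assets."""
    return prime_operating_revenue / total_assets
```

For instance, 100 tradable shares at a price of 10 plus 50 nontradable shares at a net asset value of 4 give an equity value of 1,200; against total liabilities of 600 this yields X4 = 2.0.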
The outputs of the forecasting models were the financial performance classes obtained from the Z-score bankruptcy model, namely, “safe zone,” “grey zone,” and “distress zone.” We also defined output classes for the change in financial performance (increase, no change, and decrease). Because the classes were imbalanced in the dataset, we used the Synthetic Minority Oversampling Technique (SMOTE) algorithm [65] to modify the training dataset. The algorithm oversamples the minority classes so that all classes are represented equally in the training dataset.
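SMOTE creates synthetic minority examples by interpolating between a minority sample and one of its nearest neighbours within the same class. The NumPy sketch below shows the core idea; it is not the implementation used in the study, and the function name `smote`, its default `k=5`, and the fallback of duplicating a single-member class are assumptions of this example. With very small classes (such as the 2 improved companies of 41), the number of neighbours is capped at the class size minus one.

```python
import numpy as np


def smote(X, y, k=5, seed=0):
    """Oversample every minority class with synthetic points until all classes match the largest."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        n_new = target - n
        if n_new == 0:
            continue
        Xc = X[y == cls]
        k_eff = min(k, n - 1)  # a class cannot have more neighbours than other members
        base = rng.integers(0, n, size=n_new)  # minority samples to start from
        if k_eff == 0:
            # A single-member class has no neighbour to interpolate towards: duplicate it.
            synthetic = Xc[base]
        else:
            # k nearest neighbours of each minority sample, searched within the same class.
            dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
            np.fill_diagonal(dist, np.inf)
            neighbours = np.argsort(dist, axis=1)[:, :k_eff]
            chosen = neighbours[base, rng.integers(0, k_eff, size=n_new)]
            gap = rng.random((n_new, 1))  # position on the segment between the two points
            synthetic = Xc[base] + gap * (Xc[chosen] - Xc[base])
        X_parts.append(synthetic)
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays within the region its real members occupy.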
The set of eleven sentiment attributes presented in the previous section was drawn from the shareholders’ letters. Following previous studies [46, 66], the input attributes were collected for Chinese companies in 2016, while the output financial performance (Z-score) was evaluated for 2017, and the change in financial performance was measured as the 2017 Z-score relative to its 2016 value. To avoid problems with industry-specific attributes and differences in financial performance evaluation, the financial industry was excluded. As a result, of the 41 Chinese companies, 9 were classified into the “grey zone” and 32 into the “distress zone”; comparing the financial situation between 2016 and 2017, only 2 companies improved, and the remaining 39 were unchanged.
Table 7 summarizes the Z-scores of the Chinese companies in 2016 and 2017.
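The text does not state exactly how a 2016 to 2017 comparison is mapped to the increase, no change, and decrease classes. One plausible reading, assumed here and not confirmed by the source, is that a change is counted only when a company moves between zones. The hypothetical helper below encodes that assumption.

```python
# Assumption: zones are ordered from worst to best, and only a move between zones counts as a change.
ZONE_RANK = {"distress zone": 0, "grey zone": 1, "safe zone": 2}


def change_class(zone_2016, zone_2017):
    """Label the year-on-year change in financial performance from the two zone assignments."""
    delta = ZONE_RANK[zone_2017] - ZONE_RANK[zone_2016]
    if delta > 0:
        return "increase"
    if delta < 0:
        return "decrease"
    return "no change"
```

Under this reading, a company whose Z-score rises but stays inside the distress zone is labelled "no change", which would be consistent with most companies being reported as unchanged.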
Next, several machine-learning forecasting methods were explored, namely logistic regression, support vector machine (SVM), and naïve Bayes. Unlike logistic regression, the other two methods can handle nonlinear data.
The logistic regression (LR) model was used with the ridge estimator defined by le Cessie and van Houwelingen [67]. The classification performance of logistic regression depends on the number of iterations and the ridge factor.
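The role of the two hyperparameters can be seen in a small ridge-penalized logistic regression. The sketch below fits a binary model by plain gradient descent; it is an illustration of the ridge idea only, not the estimator of [67] nor the software configuration used in the study, and the names `fit_ridge_logistic` and `predict`, the learning rate, and the binary (rather than multiclass) setting are all assumptions.

```python
import numpy as np


def fit_ridge_logistic(X, y, ridge=1e-8, iterations=1000, lr=0.1):
    """Binary logistic regression with an L2 (ridge) penalty, fitted by gradient descent.

    X: (n_samples, n_features) inputs such as sentiment attributes; y: 0/1 labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        grad_w = X.T @ (p - y) / n + ridge * w   # ridge term shrinks weights, not the intercept
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Class 1 when the linear score is positive, i.e. probability above 0.5."""
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)
```

A larger ridge factor keeps the weights small, which guards against overfitting on a dataset as small as 41 letters; the number of iterations controls how close the optimizer gets to the penalized optimum.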
Support vector machines (SVMs) are a family of supervised learning methods widely used for classification and regression. A linear SVM separates the classes with the hyperplane that has the largest margin, and kernel functions extend it to nonlinear data.
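As an illustration of the maximum-margin idea, the sketch below trains a linear soft-margin SVM with the Pegasos stochastic sub-gradient method. This is a swapped-in training procedure chosen for brevity, not necessarily the solver used in the study; it is linear only (no kernel), regularizes the bias through an appended constant feature, and the names `fit_linear_svm` and `svm_predict` are hypothetical.

```python
import numpy as np


def fit_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM trained with the Pegasos stochastic sub-gradient method.

    y must be encoded as -1/+1. A bias is handled by appending a constant feature.
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            if y[i] * (X[i] @ w) < 1:
                # Inside the margin: the hinge loss is active, so step towards the sample.
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                # Correct with margin: only the regularizer shrinks the weights.
                w = (1 - eta * lam) * w
    return w


def svm_predict(X, w):
    """Sign of the linear score, with the same constant feature appended."""
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    return np.where(X @ w >= 0, 1, -1)
```

Only samples that fall inside the margin change the weights, which is why the final boundary is determined by the points closest to it (the support vectors).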
Naïve Bayes (NB) is a simple multiclass classification algorithm that assumes independence between every pair of features. It can be trained very efficiently: in a single pass over the training data, it computes the conditional probability distribution of each feature given the label, and then it applies Bayes’ theorem to compute the conditional probability distribution of the label given an observation and uses it for prediction.
In sum, logistic regression, SVM, and naïve Bayes were the three methods used to forecast financial performance.
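The single-pass training and Bayes' rule prediction described above can be sketched with the multinomial variant of naïve Bayes, which suits count features such as the number of terms per sentiment category. The choice of the multinomial variant, the Laplace smoothing parameter `alpha`, and the function names are assumptions of this example; the source does not specify which NB variant was used.

```python
import numpy as np


def fit_multinomial_nb(X, y, alpha=1.0):
    """One pass over count features: class priors and smoothed per-class feature probabilities."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # Sum feature counts per class, then add alpha (Laplace smoothing) to avoid zero probabilities.
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood


def predict_nb(X, classes, log_prior, log_likelihood):
    """Bayes' theorem in log space: log P(c) + sum_j x_j log P(feature j | c), then argmax."""
    scores = np.asarray(X, dtype=float) @ log_likelihood.T + log_prior
    return classes[np.argmax(scores, axis=1)]
```

Working in log space avoids numerical underflow when many small feature probabilities are multiplied together.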
4. Experimental Results and Discussion
After data filtering, we selected 41 CEO letters from the CSR reports of Chinese companies and analyzed them in depth at the microlevel, mesolevel, and macrolevel.
4.1. Sentiment Attributes
The detailed statistics for the eleven sentiment attributes are given in Table 8. The three most frequent sentiment attributes are positive attitude judgement, positive attitude appreciation, and positive graduation force.
From the data collection described earlier, we know that of the 41 Chinese companies, 9 were classified into the “grey zone” and 32 into the “distress zone.” None of the companies were classified into the safe zone. Combining the eleven categories with the 2016 financial performance, we found, interestingly, that poorly performing companies tend to use more active language to describe and evaluate their CSR. This phenomenon can be further explained by the impression management effect. Impression management refers to the process by which people try to manage and control the impressions others form of them [68]. For corporations, appropriate impression management can help them communicate smoothly with stakeholders. Companies use impression management to actively promote their images and avoid negative views. This is probably why companies facing a gloomy economic situation still concentrate on shaping positive and optimistic images.
4.2. Sentimental Themes
This section identifies the sentimental themes in CEO letters using NVivo. NVivo’s coding stripes function enables us to examine all the relevant text, identify the sentences that contributed to a given relationship, and find out which node those sentences belong to. After coding all the relevant text, we gathered the comprehensive sentimental themes in the CEO letters (see Table 9).
According to Table 9, the most common sentimental theme in CEO letters is ethical responsibility. For companies, business ethical responsibility means a system of moral and ethical beliefs that guides their behaviors, values, and decisions and minimizes unjustified harm, suffering, waste, or destruction to people and the environment [69]. This result reveals that a large number of companies (32 of 41) have put great effort into business ethics. The concept of business ethics emerged in the 1960s as corporations became more aware of a rising consumer-based society concerned about the environment, social causes, and corporate responsibility. In fact, the importance of business ethics reaches far beyond the strength of a management team’s bond or employee loyalty. As with all business initiatives, a company’s ethical operation is directly related to its short-term and long-term profitability. A business’s reputation in the surrounding community, among other businesses, and with individual investors is paramount in determining whether it is a worthwhile investment. If a company is perceived as unethical, investors are less inclined to support its operation.
In addition, we divided ethical responsibility into three parts: ethical accomplishment, ethical activity, and ethical ability. Ethical accomplishment refers to awards that companies have achieved. Ethical ability expresses companies’ core competence and organizational identity. Ethical activity informs investors about the activities and functions taking place in the company. Examples are given below:
(a) We implemented new energy efficiency projects worldwide, including the implementation of the ISO 50001 Energy Management System in all our European Union locations (Ethical activity, Lenovo 2016).
(b) In 2016, we have achieved safe flight hours of 2.38 million, transported 115 million passengers, eliminated incidents by human errors, continued to maintain the best safety record in China’s civil aviation (Ethical accomplishment, China Southern Airlines 2016).
(c) Consequently, China Telecom is among the first batch of the national demonstration bases for entrepreneurship and innovation (Ethical ability, China Telecom 2016).
The coding results show that ethical activity accounts for the largest proportion, which means that firms prefer to demonstrate their actions and practices to society. Corporations have more incentive to be ethical as socially responsible and ethical investing keeps growing. An increasing number of investors are seeking ethically operating companies to invest in, which drives more firms to take this issue seriously. Consistent ethical behavior builds an increasingly positive public image, and to retain that image, companies must commit to operating on an ethical foundation in their treatment of employees, their respect for the surrounding environment, and fair market practices in pricing and consumer treatment.
4.3. Machine-Learning Approach Forecasting Financial Performance
We tested many machine-learning approaches and report only the best outcomes. In addition to accuracy (Acc), classification performance was measured by the averages of standard statistics used in classification tasks [70]: true-positive rate (TP), false-positive rate (FP), precision (Pre), recall (Re), F-measure (F-m), and ROC curve. F-m is the weighted harmonic mean of precision and recall. The ROC curve plots the true-positive rate against the false-positive rate for different cutpoints of a diagnostic test. To validate the accuracy, the experiments were performed using 10-fold cross-validation. The best results in terms of the percentage of correctly classified instances, Acc (%), are presented in Table 10 (for the financial performance classes).
The modeling results indicate that logistic regression is the most suitable method for forecasting financial performance, reaching the highest accuracy of 70.46%. This evidence suggests that the relationship between sentiment and financial performance may be approximately linear.
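The per-class statistics and the 10-fold split can be computed as follows. This is a sketch, not the evaluation code behind Table 10: the helper names `per_class_metrics` and `kfold_indices` are hypothetical, the F-measure is the unweighted (F1) form, and averaging across classes (for example, weighted by class frequency) is left to the caller.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, positive):
    """TP rate, FP rate, precision, recall and F-measure for one class treated as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    tp_rate = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0  # harmonic mean (F1)
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "recall": tp_rate, "f_measure": f_measure}


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation over n shuffled samples."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

When oversampling is combined with cross-validation, a common design choice is to apply SMOTE only to the training indices of each fold, so that synthetic samples never leak into the test fold.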
5. Conclusions
The CEO letter contains information about corporate social responsibility performance in the CSR report (CSRR) and is intended to help stakeholders make their investment decisions. In this study, sentiment analysis was applied to the evaluation of CEO letters from three perspectives: a sentiment dictionary (microlevel), sentimental themes (mesolevel), and machine learning (macrolevel). In the microlevel analysis, a dedicated sentiment dictionary was constructed for classifying sentiment attributes. The results show that regardless of whether companies are in a favorable or unfavorable economic situation, they use a large proportion of positive words to build their image and attract investors. In the mesolevel analysis, a comprehensive tree node structure was identified to discover the sentimental topics relevant to CSR. The CEO letters contain a large amount of ethics-related information, concentrating especially on the ethical activities that firms have organized. In the macrolevel analysis, the logistic regression approach achieved the best result in forecasting future financial performance, which suggests a linear relationship between sentiment and economic performance. In other words, sentiment information in CEO letters can be regarded as a vital determinant for forecasting financial performance.
The distinct contribution of this paper is threefold. First, a sentiment analysis technique based on appraisal theory was applied, which is potentially useful for detecting “concealed” information in letters. Second, a sentiment dictionary was constructed specifically for shareholders’ letters, which can significantly improve the accuracy of sentiment classification. Lastly, this research can guide companies in improving how they release nonfinancial information and can reveal the fundamental causes of differences among texts.
The current research has limitations. Using the Z-score as the quantitative assessment of companies’ economic situation may not be adequate. In the Z-score model, the stock market value is a static value at a single point in time and cannot reflect dynamic fluctuations. However, because the purpose of interpreting firms’ released information is to predict their future performance, the choice of economic model is less important than the influence of sentiment on how stakeholders perceive the company.
Future research could apply other economic models to predict financial performance. In particular, with the popularity and rapid development of sentiment analysis, many new text-mining approaches could be explored to detect more concealed information for investors’ decision-making and to extend the analysis to other languages.
Data Availability
All data, models, and code generated or used during the study appear in the submitted article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by the 2019 Project of the National Social Science Foundation of China: On the Overseas CSR Driving Forces and Influencing Mechanism for Chinese Enterprises (Grant no. 19BGL116).