Abstract
The stock market prices of a company vary daily. This paper determines how strongly a company’s social media activity, measured through sentiment scores computed on tweets, affects its stock price. Historical data for each unique company ID are taken from the Yahoo Finance API, and the probability of the stock being good or bad is determined. Tweets related to the company are also scanned and analyzed to find positive and negative scores. The factors used in modeling the stock include the concentration value connected to growth, the intensity of capital expenditure, and the volume of promotion. The paper also takes the yearly finances of the end user, based on LIC payments, medical insurance payments, and average rent, and classifies the user accordingly. Based on the user classification, companies are recommended to the end user in descending order of stock value. For each company, the average volume, average price, average market index, average daily turnover, and sentiment discrepancy index derived from its tweets are reported together with the predicted value of its performance. The user is classified with the random forest algorithm. The tweets are analyzed with the naïve Bayes algorithm, and stock classification is then done by mathematical modeling that includes the sentiment analysis index.
1. Introduction
The communication patterns of a company can be analyzed to understand its performance. Stock prices depend on multiple factors, such as communication on social media (for example, Twitter), price history, and other stock exchange factors. Publicly available data related to various companies and stock data obtained from Yahoo Finance can be used to analyze stock patterns. The tweets can be converted into a set of statements. Each statement is then analyzed to obtain the sentiment flow of the overall Twitter data, and the results are assembled into a matrix. The unique IDs of the products are identified, and the total sentiment for each product is determined. This data can help adjust the recommendations made to end users about which company stocks are more suitable to trade or purchase. There are interesting flows among users who make use of unique social media applications [1]. Most of the research on stock recommendation systems is concentrated in two areas: recommendation based on stock commentary and recommendation based on pricing [2, 3]. The work will help in the prediction of sales along with the computation of price [4]. Currently, stock recommendations based on price prediction mostly rely on numerical and statistical approaches, as well as time series models and machine learning models [5, 6]. Channels such as Jabber and Microsoft Teams are used in organizations for internal communication, and the interests of the members, along with other effects on company performance, can be learned from them [7]. The incorporation of various technologies, including data mining, deep learning, herd psychology, and other unorthodox technologies, into trade recommendations has become a hot topic in the present financial arena. Only a few studies use net money inflow as a stock recommendation strategy [8, 9].
Social media and e-mail data are used for both intra- and interorganization communication. The applications help capture behavior by mining the events and communications created by the company’s leadership and employees. Robustness and stability are among the factors that can be determined by analysis of the communication [10].
Data on stock market values is created in vast quantities and changes on a daily basis. The stock market is a complex and tough system in which people may either profit or lose their whole life savings [11]. In general, a stock’s rise on significant trading volume is believed to be related to a large volume of buy orders. To determine the large-order net flow of a stock, the trading volume and churn must be studied. Cash net flow and money flow are separate concepts [12, 13]. According to communication theory, the failure or success of a critical business function depends on the communication of employees. This also affects the performance of the employees contributing to the success or failure of the product [14]. It has been noted that money flow is often greater than zero while money net flow is less than zero; separately, money net flow may be greater than zero while the large-order net flow is below zero [15, 16]. Events such as a merger with another company, new startup acquisitions, process changes, and bankruptcy can be modeled to study the performance of the company on the stock trading platform [17]. Stock price prediction is significant because it is used by both businesspeople and ordinary people. Because price fluctuations are influenced by a wide range of elements such as news, social media data, fundamentals, corporate output, government bonds, historical pricing, and national economics, building an appropriate model is difficult [18]. Related approaches include historical data analysis of stocks [19] and rule-based stock formation [20]. Multiple variables are created for modeling stock performance, namely, pattern flow, feedback provided by employees and the public, and specific events.
The performance model involves a survey on job analysis, the performance of an employee, and the productivity of an individual as well as a group [21]. A few more models work on communication from subordinates, the supervisor of a team, quality of information, and autocorrelation computed between the set of employees [22].
2. Motivation
Because the market prices of any company change on a daily basis, a method for predicting these figures is needed. The novelty of this work lies in linking social platform data and Yahoo Finance data through the naïve Bayes algorithm; among the companies studied, Amazon shows the maximum prediction rate. Social media platforms are now used heavily and are of great importance for observing, analyzing, and speculating on stock market trends, because they are directly linked to the economic, political, and social activities that drive the ups and downs of the stock market. Predicting prices is a complex task because of the varying nature of prices. The value of a stock price is related to the sentiment expressed in the market and to the volatility of the stock. These values are helpful in finding the best price. Trading is responsible for the fall and rise of the economies of countries. Stock brokerage applications are the tools used to advise customers on buying or selling stocks; they take technical data and perform time series analysis on it. Media texts contain comments of a commercial nature, and their emotional content can be analyzed to find the sentiments. Client libraries such as Twitter4J can be used to retrieve comments specific to companies, and the data is then analyzed to generate a sentiment for each tweet. Then, across all the tweets, a total sentiment score is computed for each company.
The existing system has the following disadvantages, which can be rectified:
(i) Recommendations rely on historical data from the Yahoo Finance API and do not take into consideration the sentiments expressed by company profiles, employees, or the general public.
(ii) The users for whom suggestions are made are not classified by their monetary spending, so even a middle-income user may be shown very high-value stocks. The system is therefore not adaptive.
3. Background
The variables used in modeling the stock include the concentration value related to growth, the intensity of capital investment, and the amount of advertising. These factors help classify a company as highly attractive or less attractive. The profit or loss of the company can be determined by computing the firm’s variables. Studies of employee behavior take into account factors such as formal entities, informal entities, planning nature, reward points, skills, and manager influence, all of which are important for an organization’s success [23].
The technical performance of the projects will depend upon two models of communication. The first model is direct communication with the project team, and the second model consists of an intermediate person who communicates on behalf of the team. The communication values will involve the type of code used, communication between products, and models followed for collaboration within the organization and outside the organization [24]. Real-world settings are used in financial markets for prediction of stock performance. The weblogs are taken, the estimation of various emotions is done, and then stock market prices are determined. A variety of characteristics like anxiety value, worry level, and fear diagnostics can be used in finding the stock influence [25].
Development project groups vary in terms of group longevity, which is defined as the amount of time people work together [26]. This information can be revealed inside or outside the organization. Both technical and communication performance are very important for the tenure composition of groups and affect the stock performance of the company [27]. The quality of service of the organization depends upon the profits. A good impression on the stock price is created in two ways: the first is the communication and control of an efficient process to maintain the company, and the second is the results that the new process produces. These steps help in securing a better position in the stock market [28]. The search for information on the web is done by making use of accounting and market-based measures in order to compute the risk. The strategy is generally communicated to the stakeholders. Some companies, for example General Electric, had such a variety of divisions that analysts found them very difficult to assess, which had negative effects on the stock price [29]. Communication within a large organization has an impact on stock price behavior. The properties and structure of the data can be used for training the model. From the model, it is evident that there is more diverse communication. Mutual, interpersonal communication affects organizational crises and can lead to failure of the organization [30, 31].
4. Methodology
The methodology classifies users into four categories: GOLD, SILVER, PLATINUM, and BRONZE. The classification is based on annual gross income, health insurance, rent, and other financial factors. Financial data for the companies is taken from Yahoo Finance. Twitter data is collected, and company-wise positive and negative scores are computed from it. After that, the price computation is done, along with the classification of companies. For each kind of user, a different class of companies is recommended. The modules, shown in Figure 1, are described as follows.

4.1. Registration
This module is executed by the end customer. The customer enters details such as username, password, demographic details, and gender. If the username does not already exist, the user is allowed to register; otherwise, registration fails.
4.2. Login
This module is responsible for authentication of the end user. If the validation is successful, the user can access the application. The user can be either a customer or an admin. For a customer, the user is classified based on the financial data of the end user, and stock recommendations are made. For an admin, sentiment analysis, company-wise stock classification, classification of companies, and finally price-based recommendations are performed.
4.3. Classification Using Random Forest
Financial factors such as expenses, medical insurance, PF, life insurance, child education, and cost to the company are used as the input, along with the training dataset in userfinance.arff. A random forest of five decision trees, each built with the C4.5 algorithm, is then executed to predict the output class as GOLD, SILVER, PLATINUM, or BRONZE.
Figure 1 shows the steps of the presented method. The process begins with registration, followed by login with credentials. The negative and positive scores of a company depend on the tweets related to that company.
Random forest is a flexible, user-friendly machine learning approach that, in most circumstances, produces good results even when no hyperparameters are adjusted. It is also one of the most commonly used algorithms because of its simplicity and adaptability (it can be used for both classification and regression tasks). The steps involved in the random forest process are as follows:
(1) Read all the datasets from the history data. Each record has the attributes expenses, medical insurance, PF, life insurance, child education, and cost to the company. The admin is able to view the datasets.
(2) Calculate the number of instances in the historical data.
(3) Divide the entire dataset randomly into multiple groups.
(4) For each of the subsets, execute the C4.5 algorithm.
(5) After the decision trees have been executed, the class predicted by the largest number of trees is treated as the user class.
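The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the tree builder applies the gain ratio criterion of C4.5 to numeric thresholds but omits pruning and categorical attributes, the function names (`build_tree`, `train_forest`, `predict_class`) are hypothetical, and the sample records are made up. It follows the listed steps literally by partitioning the data randomly, whereas the standard random forest draws bootstrap samples.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels):
    """Grow an unpruned tree, choosing each split by gain ratio as C4.5 does."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    base, best = entropy(labels), None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows})[:-1]:  # keeps both sides non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            p = len(left) / len(labels)
            gain = base - p * entropy(left) - (1 - p) * entropy(right)
            ratio = gain / entropy([r[f] <= t for r in rows])  # gain / split info
            if best is None or ratio > best[0]:
                best = (ratio, f, t)
    if best is None or best[0] <= 0:
        return majority  # no informative split is left
    _, f, t = best
    low = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    high = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t, build_tree(*zip(*low)), build_tree(*zip(*high)))

def classify_row(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree

def train_forest(rows, labels, n_trees=5, seed=0):
    """Steps (2)-(4): count instances, split them randomly, fit one tree per group."""
    order = list(range(len(rows)))        # step (2): number of instances
    random.Random(seed).shuffle(order)    # step (3): random division
    groups = [g for g in (order[i::n_trees] for i in range(n_trees)) if g]
    return [build_tree([rows[i] for i in g], [labels[i] for i in g]) for g in groups]

def predict_class(forest, row):
    """Step (5): the class with the most votes across the trees wins."""
    votes = Counter(classify_row(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Made-up records: (expenses, medical insurance, PF, life insurance,
# child education, cost to company).
rows = [(20, 2, 3, 1, 0, 40), (25, 3, 4, 2, 1, 55),
        (60, 8, 9, 6, 5, 150), (70, 9, 10, 7, 6, 180)]
labels = ["BRONZE", "BRONZE", "GOLD", "GOLD"]
forest = train_forest(rows, labels, n_trees=2)
print(predict_class(forest, (65, 8, 9, 6, 5, 160)))
```

With so few records per group, individual trees can be weak; the vote in `predict_class` is what makes the combined prediction more stable as the number of records and trees grows.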
4.4. Hashtag Storage
This module is responsible for the storage of the hashtag. The hashtag will be stored against the company name. The hashtag must be a valid one that belongs to the Twitter data along the path.
4.5. OAuth API for Twitter
The OAuth API for Twitter handles communication between our application and the Twitter application. This communication is based on a secret key and an OAuth token. The Twitter application validates the OAuth token, and communication proceeds only if the token is valid.
4.6. Data Collection Using Twitter
The set of hashtags stored in the application is scanned. The count of hashtags is taken. For each of the hashtags, the list of tweets corresponding to the hashtag is taken and then stored in a valid format against the company name.
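As a rough sketch of this collection loop, the function below walks over the stored hashtags and files each retrieved tweet under the corresponding company name. The function name `collect_tweets`, the mapping from hashtag to company, and the `fetch_tweets` callable are all assumptions made for illustration; in the described system the tweets would come from the Twitter API after OAuth validation.

```python
from collections import defaultdict

def collect_tweets(hashtag_to_company, fetch_tweets):
    """Group the tweets found for each stored hashtag under its company name.

    hashtag_to_company: dict mapping a hashtag to a company name.
    fetch_tweets: hypothetical callable returning a list of tweet texts
                  for one hashtag (stands in for the real Twitter client).
    """
    store = defaultdict(list)
    for hashtag, company in hashtag_to_company.items():
        for text in fetch_tweets(hashtag):
            store[company].append(text.strip())  # normalise surrounding whitespace
    return dict(store)

# Stand-in data source so the sketch runs without network access.
sample = {"#amzn": ["Strong quarter ", "Prices up"], "#goog": ["Flat day"]}
tweets = collect_tweets({"#amzn": "Amazon", "#goog": "Google"}, lambda tag: sample[tag])
print(len(tweets["Amazon"]), len(tweets["Google"]))
```

Passing the fetcher in as a parameter keeps the storage logic independent of whichever Twitter client is used.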
4.7. Real-Time Stock History Datasets from Yahoo Finance
The real-time stock history datasets are obtained by calling the Yahoo Finance API. The API is called with the unique Yahoo Finance API key, and the output is real data from the finance API, averaged over the year, with values for volume, market index, daily turnover, and price. The process is repeated in the same way for all the keys related to the Yahoo Finance API.
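The yearly averaging can be illustrated as follows. This sketch assumes the API response has already been parsed into one dictionary per trading day; the field names and the function `yearly_averages` are hypothetical, and the two sample records are invented purely to show the shape of the data.

```python
from statistics import fmean

# Assumed field names for one day's record; the real API response may differ.
FIELDS = ("volume", "market_index", "daily_turnover", "price")

def yearly_averages(records):
    """Average each stock attribute over a list of per-day dictionaries."""
    return {f"avg_{name}": fmean(r[name] for r in records) for name in FIELDS}

days = [
    {"volume": 100, "market_index": 10, "daily_turnover": 1000, "price": 50},
    {"volume": 300, "market_index": 20, "daily_turnover": 3000, "price": 70},
]
print(yearly_averages(days))
```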
4.8. Positive Keywords
This is a set of keywords that are used for training positive sentiments.
4.9. Negative Keywords
This is a set of keywords responsible for training the negative sentiments for the naïve Bayes method.
4.10. Polarity Computation Tweet Wise
This module trains the naïve Bayes algorithm on the positive and negative classes. The output is a class that can be either POSITIVE or NEGATIVE. Algorithm 1, used for the classification of tweets, is described as follows:
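A tweet classifier of this kind might look like the sketch below. It is an illustration under assumptions rather than a reproduction of Algorithm 1: it uses a multinomial naïve Bayes model with add-one smoothing and equal class priors, trained directly on the positive and negative keyword lists, and the keyword lists and function names are made up for the example.

```python
import math
import re
from collections import Counter

def train_naive_bayes(keywords_by_class):
    """Estimate log P(word | class) with add-one smoothing from keyword lists."""
    vocab = {w for words in keywords_by_class.values() for w in words}
    log_likelihood = {}
    for label, words in keywords_by_class.items():
        counts = Counter(words)
        denom = len(words) + len(vocab)
        log_likelihood[label] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
    return log_likelihood

def classify_tweet(text, log_likelihood):
    """Return the class with the highest score, assuming equal class priors."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {
        label: sum(table[t] for t in tokens if t in table)  # unknown words are ignored
        for label, table in log_likelihood.items()
    }
    return max(scores, key=scores.get)

# Hypothetical keyword lists standing in for the stored training keywords.
model = train_naive_bayes({
    "POSITIVE": ["good", "great", "profit", "gain"],
    "NEGATIVE": ["bad", "loss", "drop", "fall"],
})
print(classify_tweet("Great profit this quarter!", model))
```

When a tweet contains no known keyword, both scores are zero and the first class in the dictionary is returned, so a production system would need an explicit rule for such ties.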
4.11. Company-Based Polarity Computation
This module will find the total positive and total negative sentiment for the entire company based on each individual tweet sentiment for the company. The company-based sentiments can be computed using the process described in Algorithm 2.
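The aggregation step can be sketched as a simple count per company. This is an assumed reading of the module, not a reproduction of Algorithm 2: each tweet is taken to carry a company name and a POSITIVE or NEGATIVE label, and the function name `company_polarity` is hypothetical.

```python
from collections import Counter, defaultdict

def company_polarity(tweet_labels):
    """Count POSITIVE and NEGATIVE tweets per company.

    tweet_labels: iterable of (company, label) pairs, one per classified tweet.
    """
    totals = defaultdict(Counter)
    for company, label in tweet_labels:
        totals[company][label] += 1
    return {
        company: {"positive": c["POSITIVE"], "negative": c["NEGATIVE"]}
        for company, c in totals.items()
    }

labelled = [("Amazon", "POSITIVE"), ("Amazon", "NEGATIVE"),
            ("Amazon", "POSITIVE"), ("Google", "NEGATIVE")]
print(company_polarity(labelled))
```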
4.12. Sentiment Index Computation
The sentiment index computation is performed based on the following equation for each of the tweets:
4.13. Sentiment Discrepancy Index
This module is responsible for computing the sentiment discrepancy index for the stocks of the companies based on the following equation:
4.14. Real-Time Yahoo Stocks for Company
This is responsible for retrieving the real-time stocks for the company based on its unique company name and then finding average values of various stock attributes like volume, market index, daily turnover, and price.
4.15. Stock Price Prediction
The stock price prediction for the company is done based on various real-time Yahoo stock parameters along with sentiment index values. The stock price is computed using the following formula:
4.16. Classification of Company into Different Categories
The stock price is computed for each company. The company names are sorted by stock price value. After that, the companies are segmented into four categories, namely, GOLD, SILVER, PLATINUM, and BRONZE.
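One way to picture the segmentation is to rank the companies by computed price and cut the ranking into four equal parts. The sketch below does that under two assumptions that the text does not state: the segments are equal-sized, and the tiers are ordered PLATINUM, GOLD, SILVER, BRONZE from the highest price to the lowest. The function name `assign_tiers` and the prices are illustrative only.

```python
# Assumed ordering from highest to lowest computed stock price.
TIERS = ["PLATINUM", "GOLD", "SILVER", "BRONZE"]

def assign_tiers(prices):
    """Sort companies by price (descending) and split them into four tiers."""
    ranked = sorted(prices, key=prices.get, reverse=True)
    n = len(ranked)
    # Integer arithmetic maps rank i to one of the four quarters of the list.
    return {name: TIERS[i * 4 // n] for i, name in enumerate(ranked)}

print(assign_tiers({"A": 400.0, "B": 300.0, "C": 200.0, "D": 100.0}))
```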
4.17. Recommendation System
The recommendation system provides the names of companies to end consumers based on customer classification and company classification. The user class is obtained from the random forest classification, and the company class from the classification of companies. First, the user class is retrieved based on the session. The companies belonging to that class are then filtered. Finally, the companies are recommended for the specific class in descending order of total stock price.
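The filter-and-sort logic described here is short enough to show directly. The sketch assumes the user class and the company classes are already available as plain values; the function name `recommend` and the sample data are hypothetical.

```python
def recommend(user_class, company_class, prices):
    """List the companies in the user's class, highest stock price first."""
    matches = [name for name, tier in company_class.items() if tier == user_class]
    return sorted(matches, key=lambda name: prices[name], reverse=True)

company_class = {"A": "GOLD", "B": "SILVER", "C": "GOLD"}
prices = {"A": 120.0, "B": 300.0, "C": 450.0}
print(recommend("GOLD", company_class, prices))
```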
5. Experimental Results
This section describes the application snapshots for the developed methodology. Figure 2 shows the end consumer performing the registration. The input will be user id, first name, last name, e-mail, username, and password. If no two consumers have the same user ID or same e-mail, then the registration process will be completed; otherwise, the registration process will fail. Figure 3 shows the sign-in process.


This module allows either the administrator or the consumer to authenticate. If the authentication succeeds, the user is allowed to access the functionality of the application according to role; otherwise, access fails.
Figure 4 shows the input for user classification. The inputs are expenses, PF, cost to company, medical insurance, life insurance, and child education. The dataset defaults to userfinance.arff, which contains the training data and is located on the class path.

Figure 5 shows the output of the user classification. The first panel shows the success message for the end consumer, indicating that the category has been predicted. The second panel is the label associated with the predicted class, and the third panel lists each class number along with its class label.

Figure 6 shows the grid in which an end user can see the class associated with him/her after the classification algorithm has been executed.

Figure 7 shows the Yahoo Finance output of real-time data retrieval for companies such as YATRA, Amazon, and Google; the attributes are volume, price, market index, and daily turnover. The figure compares these variables for the three companies. YATRA has the maximum values for all the real-time outputs. These values play a key role in predicting the stock market price.

Figure 8 shows the sentiment recommendations. The average volume, average price, average market index, average daily turnover, and sentiment discrepancy index based on the tweets of a company and the predicted value of performance are described.

Figure 9 shows the company’s prediction graphs. The more horizontal the graph is, the better the company’s performance. The y axis is the name of the company, and the x axis is the value of the prediction.

5.1. Analysis
Since the company’s stock market values fluctuate regularly, the company’s social media usage patterns are examined to establish the sentiment score values. How big an impact the social media tweet platform has on stock prices is governed by the dependence factor. A job analysis, employee performance, and individual and group productivity are all part of the performance model. The GOLD, SILVER, PLATINUM, and BRONZE classifications are used in the methodology. The categorization is based on a variety of financial parameters, including yearly gross income, health insurance, rent, and other expenses. The figures display the Yahoo Finance output of real-time data retrieval for YATRA, Amazon, and Google.
6. Conclusion
In this paper, random forest has been used to classify the user’s finances. The forecast and various stock attributes are obtained from real data extracted using the Yahoo Finance API. The company-based hashtags are found, and for each hashtag, the Twitter data is obtained. From the data, the sentiment is computed for each tweet. After that, the total sentiments are found. The sentiment index is computed for all companies. Finally, a prediction is made of which companies are good to invest in, based on real stock prices along with Twitter data. The companies are recommended based on their specific classification group and are listed in descending order of stock price.
Data Availability
The data used in this study will be available from the author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.