| | Input: Preprocessed tweets |
| | Output: Set of features |
| | (1) Read the data from the file |
| | (2) Create an empty list for each feature to be extracted. |
| | (3) For each tweet, ti, do the following: |
| | //Extracting various features |
| | 3.1 tagged_words = nltk.pos_tag(indiv_tokens) |
| | 3.2 nouns = ['NN', 'NNS', 'NNP', 'NNPS'] |
| | 3.3 verbs = ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'] |
| | 3.4 for each (word, tag) in tagged_words, if tag in nouns then |
| | increment noun_count |
| | 3.5 else if tag in verbs then |
| | increment verb_count |
| | 3.6 return the normalized noun and verb counts. |
| | 3.7 Initialize pos_int and neg_int to 0 |
| | 3.8 sent_id = SentimentIntensityAnalyzer() |
| | 3.9 for each token in tokens |
| | 3.10 if token in intensifier_list: |
| | score = sent_id.polarity_scores(token) |
| | 3.11 if score is negative then |
| | increment neg_int count |
| | 3.12 else |
| | increment pos_int count |
| | 3.13 return pos_int, neg_int |
| | 3.14 Initialize sk_value = 0 |
| | 3.15 var = [x for x in nltk.skipgrams(tokens, n, j)] |
| | // n is the degree of the n-grams and j is the skip distance |
| | 3.16 for i in range(len(var)): |
| | 3.17 for k in range(n): // k indexes words within a skipgram; must not shadow j |
| | 3.18 score = sent_id.polarity_scores(var[i][k]) |
| | 3.19 if the score is positive then |
| | increment the sk_value |
| | 3.20 else |
| | decrement the sk_value |
| | 3.21 return sk_value |
| | 3.22 Read a tweet from the input data set |
| | 3.23 Load the dictionary containing popular emojis |
| | 3.24 for i in emoji_list: |
| | 3.25 if i in tweet: |
| | 3.26 update the emoji list and increment the sentiment value based |
| | on the total occurrences of that particular emoji |
| | 3.27 return the normalized emoji_sentiment value |
| | 3.28 Initialize the interjection counter to 0 |
| | 3.29 Load the file containing the list of interjections; for each interjection in the list |
| | 3.30 if the tweet contains the corresponding interjection |
| | 3.31 increment the interjection count |
| | 3.32 for every word in tokens |
| | 3.33 if word.isupper() |
| | increment uppercase count |
| | //Apply a regular expression to find repeating letters |
| | 3.34 result = re.compile(r'(.)\1*') |
| | 3.35 for each text segment matched by result (repeating letters) |
| | 3.36 if the length of the text segment is at least 3 //minimum 3 consecutive |
| | occurrences of the same letter |
| | increment the repeated-letters count |
| | 3.37 Initialize pos_count, neg_count, flip_count to 0 |
| | 3.38 for each word in tokens: |
| | 3.39 sent_score = sent_id.polarity_scores(word) |
| | 3.40 if the score obtained is negative then |
| | Increment neg_count |
| | 3.41 if the previous word encountered is positive then |
| | Increment flip_count value |
| | 3.42 if the score obtained is positive then |
| | Increment pos_count |
| | 3.43 if the previous word encountered is negative then |
| | Increment flip_count value |
| | 3.44 return pos_count, neg_count, flip_count |
| | 3.45 punct = punctuations_counter(tweet, ['!', '?', '…']) |
| | 3.46 exclamation.append(punct['!']) |
| | 3.47 questionmark.append(punct['?']) |
| | (4) Extract the features and append them to the lists that were created initially. Illustrative Python sketches of the individual steps follow below. |
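
A minimal sketch of steps 3.1–3.6, assuming the tweet arrives as a pre-tokenized list and that "normalized" means dividing by the token count (the listing fixes neither choice):

```python
import nltk  # requires the averaged_perceptron_tagger model to be downloaded

NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}
VERB_TAGS = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

def noun_verb_features(indiv_tokens):
    """Steps 3.1-3.6: normalized noun and verb counts for one tweet."""
    noun_count = verb_count = 0
    for _word, tag in nltk.pos_tag(indiv_tokens):
        if tag in NOUN_TAGS:
            noun_count += 1
        elif tag in VERB_TAGS:
            verb_count += 1
    total = max(len(indiv_tokens), 1)  # guard against empty tweets
    return noun_count / total, verb_count / total
```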
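Steps 3.7–3.13 can be sketched as follows; using VADER's `compound` score as the polarity test is an assumption, since the listing only says "if score is negative", and the contents of `intensifier_list` come from the paper's own lexicon:

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# requires nltk.download('vader_lexicon')

def intensifier_counts(tokens, intensifier_list):
    """Steps 3.7-3.13: count positive and negative intensifiers."""
    sent_id = SentimentIntensityAnalyzer()
    pos_int = neg_int = 0
    for token in tokens:
        if token in intensifier_list:
            score = sent_id.polarity_scores(token)['compound']
            if score < 0:   # "score is negative" per step 3.11
                neg_int += 1
            else:
                pos_int += 1
    return pos_int, neg_int
```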
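For steps 3.14–3.21, the skipgram score can be sketched with `nltk.util.skipgrams`; the default values n=2 and j=2 are assumptions, and treating a positive `compound` score as "positive" is again an interpretation:

```python
from nltk.util import skipgrams
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def skipgram_score(tokens, n=2, j=2):
    """Steps 3.14-3.21: net sentiment over all (n, j)-skipgram words."""
    sent_id = SentimentIntensityAnalyzer()
    sk_value = 0
    for gram in skipgrams(tokens, n, j):
        for word in gram:
            if sent_id.polarity_scores(word)['compound'] > 0:
                sk_value += 1
            else:               # neutral and negative words both decrement,
                sk_value -= 1   # matching the pseudocode's else branch
    return sk_value
```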
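Steps 3.22–3.27 sketched below; `emoji_scores` stands in for the paper's dictionary of popular emojis mapped to sentiment scores, and normalizing by the total number of emoji hits is an assumption (the listing only says "normalized emoji_sentiment value"):

```python
def emoji_sentiment(tweet, emoji_scores):
    """Steps 3.22-3.27: normalized sentiment value from emoji occurrences."""
    sentiment = 0.0
    occurrences = 0
    for emoji, score in emoji_scores.items():
        count = tweet.count(emoji)
        if count:
            occurrences += count
            sentiment += score * count
    return sentiment / occurrences if occurrences else 0.0
```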
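Steps 3.28–3.33 reduce to two counters; lowercasing the tweet before matching interjections is an assumption, and the interjection lexicon is the file loaded in step 3.29:

```python
def interjection_and_upper_counts(tweet, tokens, interjections):
    """Steps 3.28-3.33: interjection hits and fully uppercase tokens."""
    interjection_count = sum(1 for ij in interjections if ij in tweet.lower())
    uppercase_count = sum(1 for word in tokens if word.isupper())
    return interjection_count, uppercase_count
```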
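The repeated-letters check of steps 3.34–3.36 compiles `r'(.)\1*'` and then filters matches by length; folding the length test directly into the quantifier, as below, is equivalent:

```python
import re

# 3+ consecutive occurrences of the same character, per the step 3.36 comment
REPEATS = re.compile(r'(.)\1{2,}')

def repeated_letter_count(tweet):
    """Steps 3.34-3.36: count runs of 3+ repeated letters (e.g. "soooo")."""
    return sum(1 for _ in REPEATS.finditer(tweet))
```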
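Steps 3.37–3.44 sketched below; note that neutral words neither count nor reset the flip tracker here, a detail the listing leaves open, and word polarity again assumes VADER's `compound` score:

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def polarity_flip_counts(tokens):
    """Steps 3.37-3.44: positive/negative word counts and polarity flips."""
    sent_id = SentimentIntensityAnalyzer()
    pos_count = neg_count = flip_count = 0
    previous = None  # polarity of the last sentiment-bearing word
    for word in tokens:
        score = sent_id.polarity_scores(word)['compound']
        if score < 0:
            neg_count += 1
            if previous == 'positive':
                flip_count += 1
            previous = 'negative'
        elif score > 0:
            pos_count += 1
            if previous == 'negative':
                flip_count += 1
            previous = 'positive'
    return pos_count, neg_count, flip_count
```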
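Finally, `punctuations_counter` in step 3.45 is not defined in the listing; a hypothetical version that simply counts each requested mark, together with the appends of steps 3.46–3.47, might look like this:

```python
def punctuations_counter(tweet, marks):
    """Steps 3.45-3.47: occurrences of each punctuation mark of interest."""
    return {mark: tweet.count(mark) for mark in marks}

exclamation, questionmark = [], []
punct = punctuations_counter("Really?! I waited all day… wow!!", ['!', '?', '…'])
# punct == {'!': 3, '?': 1, '…': 1}
exclamation.append(punct['!'])
questionmark.append(punct['?'])
```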