Abstract
Hadith judgment means checking the validity of a Hadith to decide whether it is correct (trustworthy) or false (bogus). “Matn” and “Isnad” are the main constituents of a Hadith: “Matn” is the saying of the prophet, whereas “Isnad” is the chain of narrators. The first step of Hadith judgment is the extraction of the narrators’ names. After that, the rules of judgment set out by Hadith scientists can be applied. Three of these rules relate directly to the chain of narrators: continuity of the transmission chain, trustworthiness of the narrators, and preciseness of the narrators. Therefore, to check the authenticity of a Hadith, these three conditions must be satisfied, and to do so, the narrators’ names must be extracted first. An Isnad contains many words and phrases called “Isnad-Phrases”; these phrases fall into categories called parts of Isnad (POIs), such as Narrator-Name, Prophet-Name, and Received-Method. Many computational research studies have sought to serve Hadith sciences by extracting the narrators’ names and other POIs using various approaches. This study presents a new hybrid approach founded on the hidden Markov model (HMM) and gazetteer lists to process “Isnad.” The objective of the approach is to predict all POIs in the Isnad, including the narrators’ names. The experiments were carried out on 1,000 Hadiths from “Sahih Muslim” (900 Hadiths as the training dataset and 100 Hadiths as the testing dataset), and the results show a noteworthy accuracy for the proposed hybrid approach.
1. Introduction
Hadith is the second source of legislation in Islam after the Quran; it represents the entire life of Prophet Muhammad, including his deeds, sayings, and actions. The main constituents of a Hadith are “Isnad” and “Matn”: “Isnad” is the chain of narrators, whereas “Matn” is the saying of the prophet. Since the first century AH (seventh century CE), Hadith scientists have set the rules of many Hadith sciences such as “Al-Jarh Wa Al-Ta’dil” and “Mustalah Al-Hadith.” Hundreds of glossaries and books were written in the service of these sciences, and the following are the essential books that gather the correct (authentic) Hadiths: “Sahih Muslim,” “Sahih Al-Bukhari,” “Sunan Ibn Majah,” “Sunan Abu Dawood,” “Sunan An-Nasa’i,” and “Sunan Al-Tirmizi” [1].
A fundamental science in Hadith is “Al-Jarh Wa Al-Ta’dil.” It provides a structured procedure to inspect the biographies of Hadith narrators, and it involves two principal sections: “Al-Jarh” (criticism) and “Al-Ta’dil” (praising). Criticism examines the sincerity, trustworthiness, and honesty of narrators, because a narrator may have a bad memory or may be fake, unknown, a liar, etc. This section ranks narrators into six levels. Rank 1 indicates the mildest level of criticism; many expressions are used to indicate this rank, such as “his Hadith is soft” and “not safe.” Rank 6 indicates that the narrator is a complete liar; many expressions are used to indicate this rank, such as “compulsive liar,” “he is a fabricator,” and “he lies.”
The second section, praising, examines the trustworthiness of narrators and also ranks them into six levels. Rank 1 indicates that the narrator has the highest standard of praise; many expressions are used to indicate this rank, such as “most reliable of the people” and “most established of the people.” Rank 6 indicates that the narrator has the lowest standard of praise; many expressions are used to indicate this rank, such as “acceptable” and “satisfactory in Hadith.”
Consequently, for Isnad judgment, the scientists of Hadith should have a robust understanding of the syntax and semantics of the Arabic language and of the “Al-Jarh Wa Al-Ta’dil” science [2].
Consider the following Hadith (Hadith 1), quoted from the “Sahih Muslim” book [3]:
“(حَدَّثَنَا قُتَيْبَةُ بْنُ سَعِيدِ بْنِ جَمِيلِ بْنِ طَرِيفٍ الثَّقَفِيُّ، وَزُهَيْرُ بْنُ حَرْبٍ، قَالاَ حَدَّثَنَا جَرِيرٌ، عَنْ عُمَارَةَ بْنِ الْقَعْقَاعِ، عَنْ أَبِي زُرْعَةَ، عَنْ أَبِي هُرَيْرَةَ، قَالَ جَاءَ رَجُلٌ إِلَى رَسُولِ اللَّهِ صلى الله عليه وسلم فَقَالَ مَنْ أَحَقُّ النَّاسِ بِحُسْنِ صَحَابَتِي قَالَ ” أُمُّكَ “. قَالَ ثُمَّ مَنْ قَالَ” ثُمَّ أُمُّكَ”. قَالَ ثُمَّ مَنْ قَالَ” ثُمَّ أُمُّكَ”. قَالَ ثُمَّ مَنْ قَالَ“ ثُمَّ أَبُوكَ”).”
The Isnad of Hadith 1 is as follows:
“(حَدَّثَنَا قُتَيْبَةُ بْنُ سَعِيدِ بْنِ جَمِيلِ بْنِ طَرِيفٍ الثَّقَفِيُّ، وَزُهَيْرُ بْنُ حَرْبٍ، قَالاَ حَدَّثَنَا جَرِيرٌ، عَنْ عُمَارَةَ بْنِ الْقَعْقَاعِ، عَنْ أَبِي زُرْعَةَ، عَنْ أَبِي هُرَيْرَةَ، قَالَ جَاءَ رَجُلٌ إِلَى رَسُولِ اللَّهِ صلى الله عليه وسلم).”
This Isnad can be split into words and phrases called “Isnad-Phrases.” Every phrase is categorized under one of the following parts of Isnad (POIs): Prophet-Name, Narrator-Name, Narrator-Name-Prefix, Received-Method, Received-Method-Prefix, Replacement, Title, and Others [4]. Table 1 shows the POIs of Hadith 1.
To judge a Hadith, the conditions and rules set by Hadith scholars must be followed, and part of these conditions is associated with the chain of narrators. First, the chain of narrators must not be broken; this implies that the first narrator (student) must have met the second narrator (teacher) and received the Hadith directly from him. Second, the trustworthiness of the narrators must be checked, which covers morality and religiousness. Morality certifies that the narrator respects norms and traditions and has good behavior and mentality; religiousness ensures that the narrator is free of obscenity, sin, polytheism, etc. The third condition is preciseness, which includes precision in writing Hadiths and precision in the narrator’s memory, to ensure that he could recall Hadiths correctly.
Therefore, to check the authenticity of Hadiths, the three conditions must be satisfied, and to do so, the narrators’ names must be extracted first. Accordingly, the importance of this study can be summarized as follows:
(1) One Hadith can receive different judgments from different scholars. To check which judgment is correct, the Hadith rules must be applied to investigate the reasons for the disagreement, and this requires extracting the narrators’ names as a first step.
(2) Not all Hadith scientists are at the same rank. Therefore, even when a Hadith has a single judgment, the validity of this judgment sometimes needs to be checked; doing so means applying the Hadith rules, which requires extracting the narrators’ names as a first step.
(3) Numerous narrations are not considered Hadiths; these narrations do not represent the sayings of the prophet but the sayings of the prophet’s companions and successors. The validity of these narrations also needs to be checked; the same Hadith rules can be applied to them, and this requires extracting the narrators’ names as a first step.
Consequently, the tagging system proposed in this study could be used in varied applications such as the tools that aim to automatically judge Hadiths, and the tools that aim to automatically extract narrators of Hadith.
The gazetteer-list method has clear and straightforward steps [5]: to tag a specific Isnad-Phrase in an Isnad, the approach simply searches through all lists for this Isnad-Phrase, so it does not need training. On the other hand, this method does not cover the entire corpus, and it also suffers from an ambiguity problem. Building gazetteer lists takes great time and effort from the expert, and the lists are restricted to one subject and cannot be reused in others. The most important issue with this method is its low prediction accuracy compared to other approaches, although the accuracy can be enhanced by increasing the number of entities in the lists [6].
Gazetteer lists handle multiple entities, and each of them is allocated to one of the following POIs: Prophet-Name, Narrator-Name, Narrator-Name-Prefix, Received-Method, Received-Method-Prefix, Replacement, Title, and Others. These lists contain all the Isnad-Phrases of the training dataset, which are obtained from the “Sahih Muslim” book. An expert in Hadith science formulated all these lists.
On the other hand, the hidden Markov model (HMM) is one of the significant methods used in natural language processing (NLP); it models a sequence of observations as being generated by a sequence of hidden states, in which each state depends on the previous state [7]. The transition probability is an essential factor that determines how the process moves from one state to another. HMM is used in various applications such as bioinformatics, classification, handwriting recognition, and voice recognition [8]. The HMM parameters are explained in Table 2.
These parameters are used by the HMM to generate a sequence of observations O1, O2, …, OM. Therefore, the HMM can be formally defined as follows:
$$\lambda = (\pi, A, B),$$
where π indicates the start probability, A indicates the transition probability matrix, and B indicates the emission probability matrix.
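For reference, the standard HMM quantities that follow from this definition can be written out explicitly. This is a textbook formulation rather than notation taken from this study: the symbol λ for the model, Q = q1, …, qM for a state (POI) sequence, a_{ij} for the entries of A, and b_j(·) for the entries of B are all assumed notation.

```latex
% Joint probability of an observation sequence O and a state sequence Q
P(O, Q \mid \lambda) = \pi_{q_1}\, b_{q_1}(O_1) \prod_{t=2}^{M} a_{q_{t-1} q_t}\, b_{q_t}(O_t)

% Probability of the observation sequence: sum over all state sequences
P(O \mid \lambda) = \sum_{Q} P(O, Q \mid \lambda)

% Most probable state sequence (the quantity found by the Viterbi algorithm)
Q^{*} = \arg\max_{Q} P(O, Q \mid \lambda)
```

In the setting of this study, the observations would be the Isnad-Phrases and the states would be the POIs.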
An HMM can be used to calculate the probability of an observation sequence by summing the probabilities of all state sequences that could generate it, and it can also estimate the probability of a state at a given position in the sequence by using the forward-backward algorithm. The Viterbi algorithm can be used to find the most probable state sequence that generates a certain observation sequence [9]. This study presents a new hybrid approach based on the hidden Markov model (HMM) and gazetteer lists to process “Isnad.” The rest of this study is organized as follows: Section 2 presents the related works, Section 3 explains the proposed approach, Section 4 discusses the experimental results, and Section 5 presents the conclusion.
2. Related Works
Many computational research studies have been introduced in the literature to support Arabic natural language processing (ANLP) and Islamic sciences. In [10], Harrag and Hamdi-Cherif presented a technique that classifies Hadiths depending on the user’s query. The study used the vector space model, cosine similarity, and TF-IDF; the experiments were performed on 60 Hadiths, and the precision and recall were 66% and 80%, respectively. In [11], the authors introduced a system that classifies the authenticity of Hadiths using a fuzzy technique. The experiments were conducted on the Al-Kafi book; the authors did not mention the number of Hadiths used, and the accuracy of the system was 94%. In [12], the researchers introduced a system to classify Hadiths using an artificial neural network (ANN) and singular value decomposition (SVD). The experiments were conducted on 453 Hadiths, and the F-measures were 85.75% and 88.33% for ANN and ANN with SVD, respectively.
Bounhas et al. [13] proposed a Naïve Bayes (NB) classifier for the reliability of the narration chain of Hadiths; the experiments were conducted on 1000 Hadiths, and the F-measure was 89.01%. In [14–16], the same researchers introduced a graph construction for narration chains based on context-free grammar (CFG), semantic web ontology, and memory-based learning; the experiments were conducted on 90 Hadiths from Sahih Muslim and Sahih Al-Bukhari, and the success rate was 86.7%. In [17], the authors proposed a new decision tree-based approach that categorizes Hadiths into four classes: “Maudo,” “Daief,” “Hasan,” and “Sahih.” The experiments were conducted on 999 Hadiths from Sahih Al-Bukhari, Sunan Al-Tirmzi, and the compilation of Al-Albani, and the accuracy of the proposed approach was 97.60%.
Harrag et al. [18] proposed an association rule-based approach to build a Hadith ontology from the Sahih Al-Bukhari book; the study did not mention how many Hadiths were used and did not report any results. In [19], the researchers introduced a graph construction for narration chains based on k-nearest neighbor, Naïve Bayes, and decision tree. The experiments were conducted on Hadiths from the Sahih Al-Bukhari book and the Musnad Ibn Hanbal book; the study did not mention how many Hadiths were used, and the F-measures were 85%, 80%, and 86% for the k-nearest neighbor, Naïve Bayes, and decision tree, respectively.
Azmi and AlOfaidly [20] introduced a heuristic rule-based approach to classify the authenticity of Hadiths; the experiments were conducted on 752 Hadiths from Sunan Al-Tirmzi and 2,180 Hadiths from Sahih Al-Bukhari, and the success rates were 93.6% and 99.6% for Sunan Al-Tirmzi and Sahih Al-Bukhari, respectively. Harrag [21] presented a text mining approach for knowledge extraction in Hadith; the approach was built using finite-state transducers (FSTs), the experiments were conducted on the complete set of Sahih Al-Bukhari’s Hadiths, and the F-measure was 52%. Abd Rahman et al. [22] proposed a rule-based system for recognizing narrators’ names in the Malay language; the experiments were conducted on 150 Hadiths from Sahih Al-Bukhari, and the study did not report any results.
Alhawarat [23] introduced a technique that combines a rule-based method with an n-grams model to extract narrators’ names from Hadiths; the experiments were conducted on the six books of Hadith, and the F-measures were 65.11% for the n-grams model and 70.76% for the n-grams model combined with the rule-based approach. Faidi et al. [24] combined numerous classifiers and stemmers to compare Hadith classification tools; the experiments were conducted on 795 Hadiths from Sahih Al-Bukhari, and the best combination was the Khoja stemmer with SVM, which achieved an accuracy of 57.50%. Balgasem and Zakaria [25] used the log-likelihood ratio (LLR) with a rule-based approach to identify narrators’ names; the experiments were conducted on 235 Hadiths from Sahih Al-Bukhari, and the F-measure was 82%.
Najib et al. [26] proposed a system to classify Hadiths (in the Malay language) based on k-nearest neighbor (k-NN), Naïve Bayes (NB), and support-vector machine (SVM); the experiments were conducted on 50 Hadiths from Sunan Al-Tirmzi and 50 Hadiths from Sahih Al-Bukhari, and the accuracies were 62%, 81%, and 82% for k-NN, NB, and SVM, respectively. Mahmood et al. [27] constructed an IR system based on the conditional random field (CRF) and finite-state transducers (FSTs); the experiments were conducted on 7,563 Hadiths (in the Urdu language) from Sahih Al-Bukhari, and the F-measure was 92.41%.
Sari et al. [28] proposed an HMM-based method for processing Hadiths written in the Indonesian language; the indexed data contain the person’s name, the Hadith collections, and the Hadith numbers. The experiments were conducted on 38,102 Hadiths from the books of Tirmzi, Nasai, Malik, Ibnu Majah, Darimi, Ahmad, Al-Bukhari, Muslim, and Abu Dawud, and the F-measure was 86%. Najeeb [4] suggested a new method to process Hadiths built on genetic algorithms (GAs); this method seeks to predict the narrators’ names of Hadiths as well as the remaining parts of Isnad (POIs). The experiments were conducted on 3,033 Hadiths from the Sahih Muslim book, and the accuracy was 81.44%.
In [29], Najeeb et al. presented the steps for launching an ANLP scientific laboratory in Al-Qunfudah college, Umm Al-Qura University, KSA, to serve Arabic and Islamic research studies. The same authors built a corpus-based lexicon to serve Hadith science [30]. An XML database for narrators and Hadiths was introduced in [31]. Saloot et al. [32] carried out a comparative study of data mining and classification techniques for Hadiths.
In [33], the authors presented a regular expression-based approach to process web pages that contain Hadiths, and the researchers built a conventional database by extracting data from Hadiths. Abdelaal et al. [34] used data mining and machine learning techniques to propose a classification system that predicts the topic of a Hadith, such as fasting and prayer. Najeeb [35] surveyed the use of deep learning in Hadith science. Fadele et al. [36] examined various techniques for detecting bogus Hadiths, including data-based and knowledge-based techniques.
Saeed et al. [37] introduced a social network analysis for narrators. They discussed the structure of these networks and their properties, examined the interaction patterns and the dominant narrators of these networks, and suggested a new ranking approach for narrators. Ubishat et al. [38] introduced a new classification approach for Hadiths based on the sine cosine algorithm (SCA). To improve the SCA exploitation, the simulated annealing (SA) algorithm was used, and to improve solution diversity, the Singer chaotic map was used. Many websites [39–45] provide various Hadith services such as e-books, translation tools, and search tools.
In [46], the author published an important review of the scientific methods applied to Hadiths in the following areas: natural language processing (NLP), information retrieval (IR), and knowledge extraction (KE). The researcher explained that comparing different research studies related to Hadith science is difficult for two reasons: first, these studies use different preprocessing steps, and second, they use different Hadiths as a corpus.
Azmi et al. [47] introduced a remarkable review of Hadith research studies; they investigated the computational and natural language processing research studies and classified them into three fields: Hadith content-based research studies, narration-based research studies, and overall research studies. Binbeshr et al. [48] conducted a systematic review of Hadith authentication and classification methods, comparing 27 studies: 13 on classification and 14 on authentication. All classification studies used the Matn, while most of the authentication studies used the Isnad. The authors pointed out that some kinds of feature extraction, such as syntactic and semantic features, remain underexplored.
Table 3 presents a comparison of research studies related to Hadith processing, ordered by time of publication.
As can be observed, various research studies have proposed numerous techniques to identify narrators’ names; however, no studies handle the Isnads of Arabic Hadiths using HMM and gazetteer lists. This study presents a new hybrid approach based on the hidden Markov model (HMM) and the gazetteer method to process Arabic Isnads. The objective of the approach is to predict the narrators’ names and the other POIs in the Isnad. This technique tags the Isnad-Phrases with predefined POIs: Prophet-Name, Narrator-Name, Narrator-Name-Prefix, Received-Method, Received-Method-Prefix, Replacement, Title, and Others.
3. The Proposed Approach
The main architecture of the proposed hybrid approach is shown in Figure 1; this approach combines the gazetteer lists with the HMM method. The proposed approach deals with 1,000 Hadiths from “Sahih Muslim”: 900 Hadiths as a training dataset and 100 Hadiths as a testing dataset. The testing dataset consists of the raw Hadiths, which are the input for this approach. The “preprocessing phase,” explained later, treats these raw Hadiths; then, the gazetteer lists are invoked to predict the POI categories for the Isnad-Phrases of all Hadiths. Subsequently, the HMM method tries to predict the most proper POI categories for all Isnad-Phrases that are not predicted by the gazetteer lists. The results are the “recognized POIs” for all Isnad-Phrases in all Hadiths.
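The combination step described above can be sketched as a simple fallback rule. This is a minimal sketch under stated assumptions: the function name `hybrid_tag` is hypothetical, an unassigned gazetteer prediction is represented as `None`, and the HMM is assumed to have tagged the whole Isnad already. The study does not specify these implementation details.

```python
from typing import List, Optional


def hybrid_tag(gazetteer_tags: List[Optional[str]], hmm_tags: List[str]) -> List[str]:
    """Keep the gazetteer POI where one exists; otherwise fall back to the HMM POI.

    Both lists are assumed to be aligned with the Isnad-Phrases of one Isnad.
    """
    if len(gazetteer_tags) != len(hmm_tags):
        raise ValueError("tag sequences must have the same length")
    return [g if g is not None else h for g, h in zip(gazetteer_tags, hmm_tags)]


# Illustrative tags only (not taken from the study's tables).
combined = hybrid_tag(
    ["Received-Method", None, "Narrator-Name"],
    ["Others", "Narrator-Name", "Others"],
)
```

With this rule, a gazetteer prediction always takes precedence, so any ambiguity error made by the gazetteer lists carries over into the combined output.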

The preprocessing phase is shown in Figure 2. The raw Hadiths are divided into “Matn” and “Isnad”; then, the Isnad-Phrase extraction process is conducted by an expert scientist in “Hadith science.” Subsequently, all Isnad-Phrases are assigned to the predefined POI categories. Finally, the Hadith-annotated file is constructed from all these Isnad-Phrases and their corresponding POIs.

Gazetteer lists use the Hadith-annotated file as a lookup file; when a new Isnad arrives, the gazetteer lists look for its Isnad-Phrases in the Hadith-annotated file and assign them to the proper POIs. Note that not all Isnad-Phrases will be assigned by this method, because some of them may not exist in the Hadith-annotated file. Also, some of the predictions will be incorrect due to the ambiguity problem.
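The lookup described above can be sketched as follows. This is an illustrative sketch, not the study's implementation: the Hadith-annotated file is assumed to be already loaded as a list of Isnads, each a list of (Isnad-Phrase, POI) pairs; the function names are hypothetical; and resolving an ambiguous phrase by its most frequent POI is an assumption, since the study does not state how ambiguity is handled. The phrases are transliterated placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Isnad = List[Tuple[str, str]]  # (Isnad-Phrase, POI) pairs, assumed layout


def build_gazetteer(annotated: List[Isnad]) -> Dict[str, Counter]:
    """Map every Isnad-Phrase to a count of the POIs it was annotated with."""
    gazetteer: Dict[str, Counter] = defaultdict(Counter)
    for isnad in annotated:
        for phrase, poi in isnad:
            gazetteer[phrase][poi] += 1
    return gazetteer


def gazetteer_tag(phrases: List[str], gazetteer: Dict[str, Counter]) -> List[Optional[str]]:
    """Look each phrase up; unknown phrases stay unassigned (None)."""
    tags: List[Optional[str]] = []
    for phrase in phrases:
        if phrase in gazetteer:
            # Assumption: an ambiguous phrase takes its most frequent POI.
            tags.append(gazetteer[phrase].most_common(1)[0][0])
        else:
            tags.append(None)
    return tags


annotated = [[("haddathana", "Received-Method"), ("Jarir", "Narrator-Name")]]
gazetteer = build_gazetteer(annotated)
tags = gazetteer_tag(["haddathana", "Zuhayr"], gazetteer)
```

The `None` entries are exactly the Isnad-Phrases that are left for the HMM method.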
HMM uses the Hadith-annotated file for training purposes; subsequently, the Viterbi algorithm is used to get the final POI results. In the training phase, HMM accepts the Hadith-annotated file as input and uses it to determine the following four HMM parameters: States-Array, Start-Probability (π) array, transition probability matrix (A), and emission probability matrix (B). To determine the States-Array, all the POIs in the Hadith-annotated file are checked and then inserted in the States-Array without duplication.
Algorithm 1 explains the process of finding the Start-Probability (π) array, which represents the probability of an Isnad starting with a certain POI. For every POI, the start probability equals the number of Isnads in the Hadith-annotated file that start with this POI divided by the total number of Isnads in the Hadith-annotated file, as shown from line 5 to line 8.
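The start-probability computation can be sketched in a few lines. This is a sketch under assumptions, not a reproduction of Algorithm 1: the Hadith-annotated file is assumed to be loaded as a list of Isnads, each a list of (Isnad-Phrase, POI) pairs, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Isnad = List[Tuple[str, str]]  # (Isnad-Phrase, POI) pairs, assumed layout


def start_probabilities(annotated: List[Isnad], states: List[str]) -> Dict[str, float]:
    """pi[s] = (number of Isnads starting with POI s) / (total number of Isnads)."""
    first_pois = Counter(isnad[0][1] for isnad in annotated if isnad)
    total = sum(first_pois.values())
    if total == 0:
        raise ValueError("the annotated data contains no Isnads")
    return {s: first_pois[s] / total for s in states}


# Illustrative data only: one Isnad starts with Received-Method, one with Narrator-Name.
annotated = [
    [("haddathana", "Received-Method"), ("Jarir", "Narrator-Name")],
    [("Jarir", "Narrator-Name")],
]
pi = start_probabilities(annotated, ["Received-Method", "Narrator-Name", "Others"])
```

A POI that never starts an Isnad in the training data receives a start probability of zero.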
Algorithm 2 explains the process of finding the transition probability matrix (A), and it aims to calculate the transition probabilities between states. The input file of this algorithm is the Hadith-annotated file, while the output is the matrix of transition probability (A). To calculate the probability of transition from state Si to state Sj (i.e., the probability of the sequence SiSj), the algorithm calculates the number of SiSj sequence occurrences in the Hadith-annotated file and then divides this number by the total number of Si occurrences in the Hadith-annotated file, as shown from line 7 to 9.
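The transition counts described above can be sketched as follows. As before, this is an illustrative sketch rather than Algorithm 2 itself: the annotated data layout (a list of Isnads, each a list of (Isnad-Phrase, POI) pairs) and the function name are assumptions. Transitions are counted only within an Isnad, never across two Isnads, which is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

Isnad = List[Tuple[str, str]]  # (Isnad-Phrase, POI) pairs, assumed layout


def transition_probabilities(annotated: List[Isnad], states: List[str]) -> Dict[str, Dict[str, float]]:
    """A[si][sj] = count(si followed by sj) / count(si)."""
    pair_counts: Counter = Counter()
    state_counts: Counter = Counter()
    for isnad in annotated:
        pois = [poi for _, poi in isnad]
        state_counts.update(pois)
        pair_counts.update(zip(pois, pois[1:]))  # adjacent POI pairs
    return {
        si: {
            sj: pair_counts[(si, sj)] / state_counts[si] if state_counts[si] else 0.0
            for sj in states
        }
        for si in states
    }


# Illustrative data only.
annotated = [[
    ("haddathana", "Received-Method"), ("Jarir", "Narrator-Name"),
    ("an", "Received-Method"), ("Umarah", "Narrator-Name"),
]]
A = transition_probabilities(annotated, ["Received-Method", "Narrator-Name"])
```

Because the denominator counts every occurrence of Si, including occurrences at the end of an Isnad, a row of A may sum to less than one.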
Algorithm 3 shows the process of finding the emission probability matrix (B); this process aims to calculate the probability of assigning an Isnad-Phrase to a specific POI. The algorithm checks all unique Isnad-Phrases in the Hadith-annotated file and calculates the emission probability of each of them for a specific POI. The emission probability is calculated by dividing the number of times that a certain Isnad-Phrase is assigned to a certain POI by the total number of occurrences of this POI in the Hadith-annotated file, as shown from line 6 to line 9.
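The emission computation can be sketched in the same style. This is a sketch under assumptions rather than Algorithm 3 itself: the data layout (a list of Isnads, each a list of (Isnad-Phrase, POI) pairs) and the function name are hypothetical, and B is stored as a nested dictionary instead of a dense matrix.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Isnad = List[Tuple[str, str]]  # (Isnad-Phrase, POI) pairs, assumed layout


def emission_probabilities(annotated: List[Isnad]) -> Dict[str, Dict[str, float]]:
    """B[poi][phrase] = count(phrase annotated as poi) / count(poi)."""
    pair_counts: Counter = Counter()
    poi_counts: Counter = Counter()
    for isnad in annotated:
        for phrase, poi in isnad:
            pair_counts[(phrase, poi)] += 1
            poi_counts[poi] += 1
    B: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (phrase, poi), count in pair_counts.items():
        B[poi][phrase] = count / poi_counts[poi]
    return dict(B)


# Illustrative data only.
annotated = [[
    ("haddathana", "Received-Method"), ("Jarir", "Narrator-Name"),
    ("Umarah", "Narrator-Name"), ("Jarir", "Narrator-Name"),
]]
B = emission_probabilities(annotated)
```

Phrase and POI pairs that never occur together are simply absent from the dictionary, which corresponds to a zero entry in the matrix.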
The following example shows the implementation of the proposed approach using two Hadiths from the “Sahih Muslim” book; these Hadiths are listed in Table 4. The POIs of the first and second Hadiths are shown in Tables 5 and 6, respectively. Together, these tables form the Hadith-annotated file for this example.
The HMM parameters can be calculated as follows:
(1) States-Array: the States-Array represents all POIs in the Hadith-annotated file without duplication. Therefore, the States-Array for this example is as follows: {Received-Method, Narrator-Name-Prefix, Narrator-Name, Received-Method-Prefix, Prophet-Name, Others}.
(2) Start-Probability (π) array: the Start-Probability (π) array for this example can be calculated as explained in Algorithm 1, and the results are shown in Table 7.
(3) Transition probability matrix (A): the transition probability matrix (A) for this example can be calculated as explained in Algorithm 2, and the results are shown in Table 8.
(4) Emission probability matrix (B): the emission probability matrix (B) for this example can be formed as explained in Algorithm 3, and the results are shown in Table 9. Note that the Hadith-annotated file contains many Isnad-Phrases that cannot all be displayed in this table; consequently, only a portion of the first Isnad appears in the table, and the emission probabilities of the remaining Isnad-Phrases can be calculated likewise.
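Once the four parameters are available, the decoding step can be sketched with a standard Viterbi implementation. This is a generic textbook version, not the study's code: the parameters are assumed to be stored as dictionaries, the function name is hypothetical, and assigning a small `floor` probability to Isnad-Phrases unseen in training is an assumption (the study does not say how unseen phrases are handled). The numbers below are made up for illustration and are not the values of Tables 7 to 9.

```python
from typing import Dict, List


def viterbi(observations: List[str], states: List[str],
            pi: Dict[str, float], A: Dict[str, Dict[str, float]],
            B: Dict[str, Dict[str, float]], floor: float = 1e-9) -> List[str]:
    """Return the most probable POI sequence for a list of Isnad-Phrases."""
    if not observations:
        return []

    def emit(state: str, phrase: str) -> float:
        # Assumption: unseen (state, phrase) pairs get a small floor probability.
        return B.get(state, {}).get(phrase, floor)

    best = [{s: pi.get(s, 0.0) * emit(s, observations[0]) for s in states}]
    back: List[Dict[str, str]] = []
    for phrase in observations[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            # Best predecessor state for reaching s at this position.
            prev = max(states, key=lambda p: best[-1][p] * A.get(p, {}).get(s, 0.0))
            scores[s] = best[-1][prev] * A.get(prev, {}).get(s, 0.0) * emit(s, phrase)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


states = ["Received-Method", "Narrator-Name"]
pi = {"Received-Method": 1.0, "Narrator-Name": 0.0}
A = {"Received-Method": {"Received-Method": 0.0, "Narrator-Name": 1.0},
     "Narrator-Name": {"Received-Method": 0.5, "Narrator-Name": 0.5}}
B = {"Received-Method": {"haddathana": 0.5, "an": 0.5},
     "Narrator-Name": {"Jarir": 1.0}}
path = viterbi(["haddathana", "Jarir", "an"], states, pi, A, B)
```

For long sequences, multiplying raw probabilities can underflow; summing log-probabilities is the usual remedy, omitted here for brevity.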
4. Experimental Results
The experiments were conducted on 1,000 Hadiths from the “Sahih Muslim” book [3]: 900 Hadiths as the training dataset and 100 Hadiths as the testing dataset. First, gazetteer lists alone were used to check the accuracy of predicting the correct POIs, and the results are shown in Table 10. The “Number of Isnad-Phrases” row in Table 10 gives the number of Isnad-Phrases that the expert in Hadith science assigned to each POI category. The “Correct” row gives the number of Isnad-Phrases that the gazetteer lists assigned to the right POIs. The last row, “Accuracy,” gives the accuracy percentage for each POI category. As shown in Table 11, the average accuracy of all POI predictions using gazetteer lists is 75%.
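The per-category accuracy described above can be computed as sketched below. The function name is hypothetical, the labels are made-up illustrations rather than the study's data, and the sketch deliberately does not compute an average, since the study does not state how the average across categories is formed.

```python
from collections import Counter
from typing import Dict, List, Optional


def per_poi_accuracy(expert: List[str], predicted: List[Optional[str]]) -> Dict[str, float]:
    """Accuracy (%) per POI = correctly predicted phrases / phrases the expert put in that POI."""
    if len(expert) != len(predicted):
        raise ValueError("label sequences must have the same length")
    totals = Counter(expert)  # the "Number of Isnad-Phrases" row
    correct = Counter(e for e, p in zip(expert, predicted) if e == p)  # the "Correct" row
    return {poi: 100.0 * correct[poi] / totals[poi] for poi in totals}


# Illustrative labels only; None marks a phrase the method left unassigned.
expert = ["Narrator-Name", "Narrator-Name", "Received-Method", "Received-Method"]
predicted = ["Narrator-Name", "Received-Method", "Received-Method", None]
accuracy = per_poi_accuracy(expert, predicted)
```

An unassigned phrase counts as incorrect for its category, which matches the way unmatched phrases lower the accuracy of the gazetteer lists.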
Subsequently, the HMM method alone was used to check the accuracy on the same Hadiths, and the results are shown in Table 12. The average accuracy of all POI predictions using the HMM method is 83%, as shown in Table 11. Clearly, the HMM method has better accuracy than gazetteer lists. The last experiment was conducted on the same Hadiths using the proposed hybrid approach, which combines gazetteer lists with the HMM method, and the results are shown in Table 13. As shown in Table 11, the average accuracy of all POI predictions using the proposed hybrid approach is 86%. Table 11 compares the average accuracy of these three methods. The results show that the proposed hybrid approach achieves higher accuracy than the other two methods: the gazetteer lists are first invoked to predict the POI categories for the Isnad-Phrases of all Hadiths, and the HMM method then predicts the most proper POI categories for all Isnad-Phrases that the gazetteer lists did not predict, which increases the accuracy of prediction.
5. Conclusions
A new hybrid approach to process the Isnads of Hadiths was introduced in this study. The proposed approach, which is based on the hidden Markov model (HMM) and gazetteer lists, is used to predict all parts of Isnad (POIs) in the Isnad, including the narrators’ names. The experiments were conducted on 1,000 Hadiths from “Sahih Muslim”: 900 Hadiths as the training dataset and 100 Hadiths as the testing dataset. The results show that the proposed hybrid approach achieves better accuracy than HMM or gazetteer lists alone. The accuracy of the proposed approach is 86%, which is lower than the desired objective; consequently, future work aims to enhance the training process of the HMM using genetic algorithms (GAs). Subsequently, the enhanced method will be used to introduce a novel “judgment” approach for Hadiths, which will differentiate the trustworthy (correct) Hadiths from the bogus (false) Hadiths.
Data Availability
The Hadiths of “Sahih Muslim” in the Arabic language are available from many websites (http://almeshkat.net/book/63).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The author would like to thank the Deanship of Scientific Research at Umm Al-Qura University for the financial support under grant no. 43508026. The author would also like to express his sincere thanks and gratitude to Dr. Naser Younes Sabra, Ph.D. in Islamic Studies and Hadith sciences, for his valuable guidance and support; without him, this research would not have been completed.