Abstract

Microalgae occur at every level of aquatic nutrient and food networks. They are an important part of the food chain of aquatic organisms, act as biological purifiers and regulators of water resources, and affect the pH of their environment. Because plants are the only organisms capable of synthesizing long-chain fatty acids, microalgae are the primary source of polyunsaturated fatty acids (PUFAs) for all organisms in the aquatic food chain. Many microalgae are also biological indicators of water quality and reflect the ecological status of the environment. Precise classification of microalgae currently depends on human observation. The present study proposes a new optimized technique for computer-aided classification of microalgae with higher accuracy. The method begins with image segmentation to determine the region of interest; the segmentation is optimized by a new metaheuristic to improve its accuracy. Features are then extracted and fed to a Support Vector Machine (SVM) for final classification. A comparison with several other methods shows that the proposed method, with a Kappa of 0.828 and minimum and maximum F1 values of 0.342 and 0.855, provides the highest accuracy.

1. Introduction

Long-chain polyunsaturated fatty acids (LC-PUFAs) such as eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) are essential components of various metabolic processes in human life [1]. Oily fish products, which are used to prevent heart disease and cancer, contain significant quantities of PUFAs [2]. Some efforts have been made to decrease the expense of these products by replacing fish oil ingredients with microalgae products. In the C. Cohn microalgae life cycle, two distinct cell types are considered: motile and nonmotile cells (cysts). Furthermore, the varying sizes of the microalgae signify different phases of the cell life cycle: small cells, for example, represent young offspring, and cysts represent adult cells. Motile cells typically have an oval form, while nonmotile cells are usually circular and appear larger than motile cells. Microalgae are the main producers in the aquatic food chain and the main oxygen makers in the aquatic environment, which makes them a substantial form of microscopic aquatic life [3]. Because algae are sensitive to environmental variations, they are used as a biological index of water quality in water resource management. This makes microalgae detection a significant subject in water resource management. However, the task is time-consuming because it requires expert biologists. A simpler way to reach this aim is to use image processing [4]. This study presents a new method for automatic segmentation of microalgae in microscopic images.

Although microalgae are highly significant, only a small number of research works address them. For example, OSA et al. [5] proposed a method for the segmentation of green microalgae images based on region growing. The method starts with morphological feature extraction, followed by a taxonomical classification of the species. It is based on the seeded region growing principle and a fine-tuned filtering preprocessing stage, which is used for smoothing the input image. Simulation results indicated that the proposed method provides proper precision for green microalgae image segmentation compared with some state-of-the-art methods.

Giraldo-Zuluaga et al. [6] proposed an automatic detection system for Scenedesmus polymorphic microalgae in microscopic images. In that study, the image was first preprocessed by contrast equalization and then segmented by active contours. Statistical features, together with texture features, were then evaluated to describe the algae.

Wei et al. [7] proposed an improved hyperspectral microscopic imaging system to acquire hyperspectral images of microalgae samples after pretreatment steps. Afterward, the Fisher algorithm was utilized for microalgae species identification. The method was assessed in terms of sensitivity and specificity, and the results showed high values for both indicators. The final results indicated that the proposed method provides fast and suitable results.

Liao et al. [8] presented a deep learning method to identify marine microalgae, using a convolutional neural network (CNN) as the classifier. The study addressed two sets of classification experiments: one categorized the algae into 5 classes by family, and the other categorized them into 11 classes by species. Simulation results indicated that the polarization information helped the proposed method achieve high and robust classification precision for low-resolution microalgal images.

Xu et al. [9] proposed an identification methodology for microalgae images based on a transmission hyperspectral microscopic imager (THMI). Hyperspectral imaging (HSI) of three species of microalgae was performed to confirm their absorption features. Principal Component Analysis (PCA) and peak ratio algorithms were used to analyze the transmission spectra for dimensionality reduction and feature extraction, and finally a Support Vector Machine (SVM) model was employed for classification. Simulation results indicated that the proposed method has good potential for the classification of microalgae.

As can be observed from the literature, many works have aimed to provide a proper identifier for microalgae. Among them, recent works show that optimized techniques, especially metaheuristic algorithms, provide higher efficiency for microalgae classification. This motivates us to provide a new modified metaheuristic as an efficient tool for the classification. Therefore, in this research, we develop a high-ability classifier for the identification of microalgae in images. The main contributions of this study can be highlighted as follows: (1) Providing a new technique for better segmentation of microalgae. (2) Designing a new improved version of a metaheuristic technique, called the Quantum Thermal Exchange Optimization (QTEO) algorithm, for this study. (3) Using the designed technique to optimize Kapur’s entropy thresholding method. (4) Validating the proposed method against some state-of-the-art methods. In the next sections, the method is described and validated in detail.

2. Histogram Equalization

Image histogram equalization is an image processing method used to improve the display of the output image or to prepare it for further analysis [10–16]. A simple example of histogram equalization for microalgae images is shown in Figure 1.

As can be observed from Figure 1, the probability density of the equalized image is spread more widely over the intensity range than that of the original image. This is helpful for image segmentation and for extracting useful features from the images.
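The equalization step can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as a list of rows of integers and uses the common mapping through the cumulative histogram; the function name `equalize_histogram` is hypothetical, and the sketch is not the implementation used in this study.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]
    n = len(pixels)

    # Histogram of the intensity levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to spread out.
        return [row[:] for row in image]

    # Look-up table that makes the cumulative distribution roughly linear.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]
```

For a low-contrast patch such as `[[100, 100], [101, 103]]`, the darkest level is mapped to 0 and the brightest to 255, while the ordering of the intensities is preserved.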

3. The Quantum-Based Thermal Exchange Optimization Algorithm

3.1. Conception

Optimization in engineering and science refers to selecting the best member from a set of feasible members [17, 18]. In its simplest form, it is an attempt to systematically select values from a feasible and computable set to obtain the optimal value (minimum or maximum) of a real function [19]. Many optimization problems in engineering are too complex to be solved by conventional optimization methods such as dynamic programming [20]. Several optimization problems today, including NP-hard problems, are only approximately solvable with existing computers [21]. One way to deal with such problems is to use approximate and inexact algorithms [22]. These algorithms do not guarantee that the solution obtained is optimal; a relatively accurate solution can be obtained only with a lot of time, and the accuracy of the solution changes with the time spent [23]. Another way to solve these kinds of problems is to enumerate all possible solutions, calculate the objective function for each of them, and select the best one [24]. Complete enumeration clearly leads to an exact solution, but in practice it is impossible to use because of the large number of possible solutions [25]. Because of these limitations, there has always been an emphasis on creating more effective and efficient methods [26]. In this regard, various algorithms have been developed, for example, the Genetic Algorithm (GA) [27], the Ant Lion Optimizer (ALO) algorithm [28], the equilibrium optimizer [29], the Mayfly Optimization Algorithm (MOA) [30], and the World Cup Optimization (WCO) algorithm [31].

Recently, a new metaheuristic called the Thermal Exchange Optimization (TEO) algorithm has been proposed; it is derived from the temperature behavior of objects, their positions, and their exchange between cold and warm parts [32]. In this study, an improved version of the TEO algorithm, called the Quantum-based TEO (QTEO) algorithm, is proposed to improve the precision and consistency of the original method. The main target is to utilize this new algorithm for microalgae segmentation. The main conception of the method is based on Newton's law of cooling. This law states that the rate of heat loss of a body is proportional to the difference between the temperature of the body and that of its surroundings. If the temperature difference between the body and its neighborhood is small, the heat exchanged per unit time between them by convection, infrared radiation, and conduction is almost proportional to that difference. The law of cooling is given by the following equation [33]:

$$\frac{dQ}{dt} = \alpha A \left(T_a - T_b\right),$$

where $\alpha$ describes the coefficient of heat transfer, which depends on the object geometry and surface state, $Q$ signifies the heat, $A$ states the surface area that transmits heat, and $T_a$ and $T_b$ represent the ambient and body temperatures.

The heat lost in time $dt$ is $\alpha A \left(T_b - T_a\right) dt$, which equals the change in the reserved heat as the temperature falls by $dT$, i.e.,

$$V \rho c \, dT = -\alpha A \left(T - T_a\right) dt,$$

where $T$ is the body temperature at time $t$, $\rho$ signifies the density (kg/m³), $V$ describes the volume (m³), and $c$ defines the specific heat (J/(kg·K)).

Henceforth [33],

$$\frac{T - T_a}{T_{eh} - T_a} = \exp\left(-\frac{\alpha A t}{V \rho c}\right),$$

where $T_{eh}$ represents the early high temperature. The above equation is correct when $\alpha$ does not depend on $T$.

Therefore, by considering $\beta = \alpha A / (V \rho c)$ as a constant,

$$\frac{T - T_a}{T_{eh} - T_a} = \exp\left(-\beta t\right).$$

Consequently,

$$T = T_a + \left(T_{eh} - T_a\right) \exp\left(-\beta t\right).$$
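Under a constant cooling coefficient, Newton's law of cooling gives an exponential approach of the body temperature to the ambient temperature. The short sketch below evaluates that closed form; the function and argument names are illustrative assumptions, not notation from the original study.

```python
import math


def cooling_temperature(t, t_ambient, t_initial, beta):
    """Body temperature after time t under Newton's law of cooling.

    beta is the (constant) cooling coefficient; a larger beta means the body
    reaches the ambient temperature faster.
    """
    return t_ambient + (t_initial - t_ambient) * math.exp(-beta * t)
```

At `t = 0` the function returns the initial temperature, and for large `t` it tends to the ambient temperature, which is the behavior the TEO update rule borrows.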

3.2. The Algorithm

In the Thermal Exchange Optimization (TEO) algorithm, some solution candidates are treated as cooling objects and the remaining candidates are treated as the environment. Afterward, the roles are reversed.

The TEO algorithm starts by initializing a certain number of randomly distributed solution candidates. This is done by the following equation [34]:

$$T_i^0 = T_{\min} + \mathrm{rand} \cdot \left(T_{\max} - T_{\min}\right), \quad i = 1, 2, \ldots, n,$$

where $\mathrm{rand}$ describes a random value in the range [0, 1], $T_i^0$ represents the initial population vector for the $i$th object, and $T_{\min}$ and $T_{\max}$ represent the minimum and maximum limits. The candidates are then evaluated on the cost function to verify their ability. Afterward, a number of the best candidate vectors are saved in a Thermal Memory (TM) to provide higher efficiency of the algorithm with lower complexity. Some of these saved candidates are then added to the main population, and the same number of the worst individuals are eliminated. The candidates are divided into two equal groups: the environment objects and the heat and cooling transfer objects. Figure 2 shows this state.

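As can be seen from Figure 2, the candidates are divided into two equal groups: the environment objects and the heat and cooling transfer objects. Assume $T_1$ as the environment object for the cooling object $T_{n/2+1}$, and vice versa. The coefficient $\beta$ controls how quickly an object exchanges temperature with its environment. $\beta$ is mathematically achieved by the following:

$$\beta = \frac{\mathrm{cost}(\mathrm{object})}{\mathrm{cost}(\mathrm{worst\ object})}.$$

To model the effect of time in the algorithm, the following formula has been considered:

$$t = \frac{\mathrm{iteration}}{\mathrm{maximum\ iteration}}.$$

To establish the global search, the variation of the environmental temperature is considered, which can be modeled by the following equation:

$$T_i^{\mathrm{env}} = \left(1 - \left(c_1 + c_2 \left(1 - t\right)\right) \mathrm{rand}\right) T_i^{\prime\,\mathrm{env}},$$

where $T_i^{\prime\,\mathrm{env}}$ defines the preceding object temperature, which is modified to $T_i^{\mathrm{env}}$, and $c_1$ and $c_2$ describe the control variables.

By considering the previous models, the new object temperature is mathematically updated as follows:

$$T_i^{\mathrm{new}} = T_i^{\mathrm{env}} + \left(T_i^{\mathrm{old}} - T_i^{\mathrm{env}}\right) \exp\left(-\beta t\right).$$

Finally, the parameter $Pro$ determines whether a component of a cooling object is changed.

For each candidate, $Pro$ is compared with a randomly distributed value $Ran(i)$ in the range [0, 1].

In this situation, if $Ran(i) < Pro$, one dimension of the $i$th candidate is randomly chosen and its value is regenerated as follows:

$$T_{i,j} = T_{j,\min} + \mathrm{rand} \cdot \left(T_{j,\max} - T_{j,\min}\right),$$

where $T_{i,j}$ represents the $j$th variable of candidate number $i$, and $T_{j,\min}$ and $T_{j,\max}$ describe the lower and the upper limits of the $j$th variable, respectively.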

The algorithm is stopped if the termination condition has been reached.
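The steps of the TEO algorithm described above can be summarized in a minimal sketch. It is not the implementation used in this study: it assumes the update rules of the original TEO formulation [32] (a cost-ratio coefficient, a time ratio equal to iteration over maximum iteration, and exponential cooling toward a randomly perturbed environment temperature), nonnegative costs, and an even population size. The name `teo_minimize` and the default values of `c1`, `c2`, `pro`, and `memory_size` are illustrative assumptions.

```python
import math
import random


def teo_minimize(cost, lower, upper, n_agents=20, n_iter=100,
                 c1=1.0, c2=1.0, pro=0.3, memory_size=2, seed=0):
    """Minimal Thermal Exchange Optimization sketch (minimization)."""
    rng = random.Random(seed)
    dim = len(lower)

    def random_value(j):
        # T = T_min + rand * (T_max - T_min) for dimension j.
        return lower[j] + rng.random() * (upper[j] - lower[j])

    agents = [[random_value(j) for j in range(dim)] for _ in range(n_agents)]
    memory = []  # thermal memory: best (cost, position) pairs seen so far

    for iteration in range(1, n_iter + 1):
        scored = sorted(((cost(a), a) for a in agents), key=lambda s: s[0])
        # Update the thermal memory, then let it replace the worst agents.
        memory = sorted(memory + [(c, a[:]) for c, a in scored[:memory_size]],
                        key=lambda s: s[0])[:memory_size]
        scored = sorted(scored[:n_agents - len(memory)]
                        + [(c, a[:]) for c, a in memory], key=lambda s: s[0])

        worst_cost = scored[-1][0]
        t = iteration / n_iter
        half = n_agents // 2
        new_agents = []
        for i, (c, agent) in enumerate(scored):
            # The paired agent from the other half acts as the environment.
            partner = scored[(i + half) % n_agents][1]
            beta = c / worst_cost if worst_cost > 0 else 1.0
            new = []
            for j in range(dim):
                env = (1.0 - (c1 + c2 * (1.0 - t)) * rng.random()) * partner[j]
                value = env + (agent[j] - env) * math.exp(-beta * t)
                new.append(min(max(value, lower[j]), upper[j]))  # keep in bounds
            if rng.random() < pro:
                # Regenerate one randomly chosen dimension.
                j = rng.randrange(dim)
                new[j] = random_value(j)
            new_agents.append(new)
        agents = new_agents

    final = sorted(memory + [(cost(a), a) for a in agents], key=lambda s: s[0])
    return final[0][1], final[0][0]
```

Because the thermal memory always keeps the best candidates seen so far, the returned cost can never be worse than the best cost in the initial population; how close it gets to the true optimum depends on the parameter choices.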

3.3. The Quantum Thermal Exchange Optimization (QTEO) Algorithm

Although the Thermal Exchange Optimization (TEO) algorithm is a well-organized optimization technique, it sometimes gets stuck in a local optimum. This study uses a quantum theory-based modification to resolve this issue as far as possible. Since a quantum position is not instantaneously definite, it is described by a wave function $\psi(x, t)$, where $x$ is the position of the candidate. The squared modulus $|\psi(x, t)|^2$ defines the probability density of finding the individual at position $x$. The equation for this case can be formulated as follows:

$$|\psi(x, t)|^2 \, dx \, dy \, dz = Q \, dx \, dy \, dz,$$

where $Q$ signifies the probability density function, which satisfies the following condition:

$$\int_{-\infty}^{+\infty} |\psi(x, t)|^2 \, dx \, dy \, dz = \int_{-\infty}^{+\infty} Q \, dx \, dy \, dz = 1.$$

The formulation for the position of the candidates involves the following quantities: the population size; the local attraction area for the current iteration; the mean value of the optimal situation; the weighted distance between the individuals and the mean optimal situation of the population; the maximum number of iterations; the global optimal position and the candidate position; two stochastic values in the range [0, 1]; and the shrinkage-expansion coefficient, whose two limits are set to 1 and 0.5. Based on the above explanations, the position of each candidate is then updated.
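The quantities listed above (local attractor, mean best position, two stochastic values, and a shrinkage-expansion coefficient between 1 and 0.5) match the ingredients of the standard quantum-behaved position update known from quantum-behaved particle swarm optimization. The sketch below assumes that standard form and a linear decrease of the coefficient from 1 to 0.5; both are assumptions made for illustration, and all names are hypothetical.

```python
import math
import random


def quantum_update(position, personal_best, global_best, mean_best,
                   iteration, max_iter, rng,
                   coeff_start=1.0, coeff_end=0.5):
    """One quantum-behaved position update (QPSO-style, assumed form)."""
    # Shrinkage-expansion coefficient, assumed to decrease linearly.
    coeff = coeff_end + (coeff_start - coeff_end) * (max_iter - iteration) / max_iter
    new_position = []
    for x, pb, gb, mb in zip(position, personal_best, global_best, mean_best):
        phi = rng.random()        # first stochastic value in [0, 1)
        u = 1.0 - rng.random()    # second stochastic value in (0, 1], avoids log(0)
        attractor = phi * pb + (1.0 - phi) * gb  # local attraction point
        step = coeff * abs(mb - x) * math.log(1.0 / u)
        # Move to either side of the attractor with equal probability.
        new_position.append(attractor + step if rng.random() < 0.5
                            else attractor - step)
    return new_position
```

The heavy-tailed step `log(1 / u)` occasionally produces long jumps, which is the mechanism usually credited with helping quantum-behaved variants leave local optima.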

4. Segmentation of Microalgae

4.1. Concept of Entropy Criterion (Kapur’s Method)

Image segmentation is one of the most important basic operations in image analysis. Its purpose is to divide the image into distinct, non-overlapping regions. Image segmentation is used in many applications of image processing and computer vision, especially medical imaging. In recent years, many methods have been proposed for image segmentation. These methods can be divided into two general categories: supervised and unsupervised. Unsupervised methods are preferred in real-time applications because they do not require manual segmentation. Thresholding is one of the most important segmentation methods. The main goal of threshold-based methods is to find a single threshold value for two-level thresholding, or several threshold values for multilevel thresholding. In two-level thresholding, only one threshold value is selected and the image pixels are divided into two groups. A popular thresholding method is Kapur thresholding, a nonparametric technique that provides optimal threshold values based on the image entropy and the histogram of the considered image. The key idea is to maximize the image entropy to get the optimal threshold values. To this end, it assumes $th$ as a vector of the image thresholds, i.e. [35],

$$th = \left[th_1, th_2, \ldots, th_k\right].$$

In this situation, the Kapur entropy method is defined by the following equation:

$$J_{\max} = f_{\mathrm{Kapur}}(th) = \sum_{i=1}^{k+1} H_i,$$

where $H_i$ describes the entropy of the $i$th class of the image and is obtained as follows:

$$H_i = -\sum_{j=th_{i-1}}^{th_i - 1} \frac{p_j}{\omega_i} \ln \frac{p_j}{\omega_i}, \qquad \omega_i = \sum_{j=th_{i-1}}^{th_i - 1} p_j,$$

with $th_0 = 0$ and $th_{k+1} = L$, where $p_j$ describes the normalized histogram value for intensity level $j$, $\omega_i$ signifies the cumulative histogram value of the intensity levels in the $i$th class, $L$ is the number of intensity levels, and $\ln$ defines the natural logarithm.

This study uses the previously described Quantum-based Thermal Exchange Optimization Algorithm to find the optimal threshold values by tuning the threshold points until $J_{\max}$ is met.
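The Kapur criterion can be written as an objective function that any optimizer may maximize. The sketch below assumes the usual definition (the sum of the entropies of the classes delimited by the thresholds, each computed from the normalized histogram); the function names are hypothetical, and the exhaustive search in `best_single_threshold` only stands in for the metaheuristic in the single-threshold case.

```python
import math


def kapur_entropy(hist, thresholds):
    """Sum of class entropies for a histogram split at the given thresholds.

    Class i covers the intensity levels from one threshold (inclusive)
    up to the next threshold (exclusive).
    """
    total = sum(hist)
    probs = [h / total for h in hist]
    bounds = [0] + sorted(thresholds) + [len(hist)]
    entropy = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        weight = sum(probs[lo:hi])  # cumulative probability of the class
        if weight <= 0:
            continue                # empty class contributes nothing
        for p in probs[lo:hi]:
            if p > 0:
                q = p / weight
                entropy -= q * math.log(q)
    return entropy


def best_single_threshold(hist):
    """Exhaustive search for one threshold (feasible only for few thresholds)."""
    return max(range(1, len(hist)), key=lambda t: kapur_entropy(hist, [t]))
```

For multilevel thresholding the number of threshold combinations grows quickly, which is why `kapur_entropy` is typically handed to a metaheuristic as the fitness function instead of being searched exhaustively.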

4.2. Image Thresholding Using Quantum-Based Thermal Exchange Optimization Algorithm

Each candidate in the proposed algorithm represents a set of decision variables, namely the threshold values used for image segmentation. The population can be indicated as follows:

$$AT = \left[AT_1, AT_2, \ldots, AT_N\right], \qquad AT_i = \left[th_1^c, th_2^c, \ldots, th_k^c\right]^T,$$

where $N$ describes the population size, $AT_i$ signifies the $i$th element of $AT$, $T$ indicates the transpose sign, and $c = 1, 2, 3$ is set for RGB images while $c = 1$ is selected for grayscale images. The boundaries of the search space in the present study are 0 and 255, which correspond to the image intensity levels.
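A candidate of this kind, and the segmentation it induces, can be sketched as follows for the grayscale case. The helper names `init_population` and `segment` are hypothetical, and the sketch assumes integer thresholds in the range 0 to 255.

```python
import random


def init_population(n_candidates, n_thresholds, low=0, high=255, seed=0):
    """Random population; each candidate is a sorted list of thresholds."""
    rng = random.Random(seed)
    return [sorted(rng.randint(low, high) for _ in range(n_thresholds))
            for _ in range(n_candidates)]


def segment(image, thresholds):
    """Label each pixel with the index of the class its intensity falls in."""
    ordered = sorted(thresholds)
    # The label is the number of thresholds the pixel reaches or exceeds.
    return [[sum(p >= t for t in ordered) for p in row] for row in image]
```

For example, `segment([[0, 100], [150, 255]], [128])` yields `[[0, 0], [1, 1]]`: pixels below the threshold form one region and the remaining pixels form the other.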

4.3. Image Segmentation Evaluation

The proposed QTEO-based segmentation method is implemented on the MATLAB R2017b platform on a laptop with an Intel® Core™ i7 processor and 16 GB of RAM. The parameters of the proposed QTEO-based segmentation method are initialized with the following values: the initial number of continents is selected as 5 and the number of individuals is set to 100. The two control variables are both set to 2, the dimension of the search space equals the number of chosen thresholds, and the maximum number of iterations is set to 200. The algorithm is terminated after maximizing the objective function ($J_{\max}$). To obtain a fair analysis, the method was run 30 times and the mean value is taken as the optimal threshold. In the present study, several databases from the Internet are tested and assessed to determine the performance of the suggested segmentation methodology. Figures 3 and 4 show the image segmentation based on QTEO-Kapur.

As can be observed from the figures, the method provides useful results for the segmentation of the microalgae.

5. Feature Extraction

The next step is feature extraction: features are extracted from the segmented image so that the image can be classified with higher accuracy and in less time. In this study, the Gray-Level Cooccurrence Matrix (GLCM) is utilized as the feature extraction method. Texture characteristics are evaluated through statistical analysis of the distribution of gray-level combinations observed at a specific position relative to each other. The GLCM is a square matrix whose numbers of rows and columns equal the number of gray levels of the image. In other words, if the number of gray levels of an image is $G$, the cooccurrence matrix is a $G \times G$ matrix. The current study assessed five features: entropy, contrast, homogeneity, energy, and correlation. The formulation of these features is given below.

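The contrast and correlation features are achieved as follows:

$$\mathrm{Contrast} = \sum_{i=1}^{M} \sum_{j=1}^{N} (i - j)^2 \, p(i, j),$$

$$\mathrm{Correlation} = \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{(i - \mu_i)(j - \mu_j) \, p(i, j)}{\sigma_i \sigma_j}.$$

The entropy of the features is achieved by the following:

$$\mathrm{Entropy} = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j) \ln p(i, j).$$

The energy of the features is obtained as follows:

$$\mathrm{Energy} = \sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j)^2,$$

and finally, the homogeneity is achieved by the following equation:

$$\mathrm{Homogeneity} = \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{p(i, j)}{1 + |i - j|},$$

where $\mu$ signifies the mean value and $\sigma$ describes the standard deviation (taken along the rows, $\mu_i$ and $\sigma_i$, and along the columns, $\mu_j$ and $\sigma_j$), $p(i, j)$ states the value at position $(i, j)$, and $M$ and $N$ represent the size.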
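The five texture features can be illustrated with a small sketch that builds a normalized GLCM for one pixel offset and evaluates the commonly used textbook definitions of contrast, correlation, entropy, energy, and homogeneity. These definitions, the single offset, and the function names are assumptions made for illustration; the exact variants used in this study may differ.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))  # assumes at least one valid pixel pair
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, entropy, energy, and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Correlation is undefined for a constant image; 0.0 is returned then.
        "correlation": covariance / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0,
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }
```

In practice the features are often averaged over several offsets (for example, horizontal, vertical, and the two diagonals) to reduce the dependence on direction.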

6. Simulation Results

Various databases have been proposed for assessing microalgae diagnosis systems. In this study, we use the popular AlgaeVision database for the analysis. The AlgaeVision database contains a virtual image collection of freshwater and terrestrial algae collected from Britain and Ireland. AlgaeVision includes 2,300 images of nearly 250 genera and 680 species of British and Irish algae [36]. The data include 19 different classes, each with at least 10 elements. The classes considered in this study are Bacillariophyceae-1, Spirotrichea, Peridiniales, Nanoplankton <5 µm, Nanoplankton 5–10 µm, Gymnodiniales, Prorocentrales, Cochlodinium, Ellipsoidal Microplankton, Ellipsoidal Nanoplankton 10–20 µm, Spherical Nanoplankton 10–20 µm, Spherical Microplankton, Pseudonitzschia, Chaetocerotales, Coscinodiscophyceae, Bacillariophyceae-2, Naviculales, Corethron, and Rhizosoleniales.

To recognize algae, the extracted features must be classified. In this study, the Support Vector Machine (SVM) algorithm is employed as a fast and simple classifier that separates positive and negative examples with a maximum margin. The SVM treats the samples as points in a multidimensional feature space and finds the boundary between the categories that keeps the largest possible distance from all of them. The idea is to recognize the best decision surface as follows:

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b,$$

where $x$ defines a $d$-dimensional test vector, $x_i$ describes the $i$th training vector, $y_i \in \{-1, 1\}$ represents its class label, $N$ is the number of training vectors, $\alpha = [\alpha_1, \ldots, \alpha_N]$ and $b$ represent the model parameters, and $K$ states a kernel function. Here, Sequential Minimal Optimization (SMO) is used for solving the quadratic programming (QP) problem of the SVM, which removes the need for an extra storage matrix. To validate the efficiency of the suggested approach, different measurement indicators have been adopted. Due to the presence of multiple classes, we used specific measures including the Kappa coefficient and the macro- and microaverages. The Kappa metric describes the level of agreement between two or more observers assigning items to the same classes.
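Once the model parameters have been found (for example, by SMO), classifying a new feature vector only requires evaluating the kernel expansion of the decision function. The sketch below shows that evaluation for a two-class problem; the RBF kernel, the value of `gamma`, and all function names are assumptions for illustration, and the training step itself is not shown.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def svm_decision(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Value of the decision function: sum of alpha * y * K(sv, x), plus bias."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias


def svm_predict(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Predicted class label (+1 or -1) from the sign of the decision value."""
    value = svm_decision(x, support_vectors, labels, alphas, bias, kernel)
    return 1 if value >= 0 else -1
```

A multiclass problem such as the 19-class task here is usually handled by combining several such binary classifiers, for instance in a one-versus-one scheme.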

Kappa’s value lies between −1 and 1, where 1 describes full agreement and 0 indicates that the achieved agreement is no better than the agreement expected by chance. The F-Score metric is the harmonic mean of recall and precision. MaxF1 and MinF1 describe the maximum and the minimum values of the F-Score. MinF1 is influenced more by the classifier efficiency on the classes with larger populations, which reflects the limitation of the imbalanced dataset. This metric is one of the most widely used indicators for assessing classifiers. Furthermore, MaxF1 is affected by the classes with a small number of elements, which has a notable impact on the evaluation of the results. A 10-fold cross-validation method is used for evaluating the models. Table 1 presents the classification results on the microalgae dataset.
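The evaluation measures named above can be computed directly from the true and predicted labels. The sketch below assumes the standard definitions of Cohen's Kappa, the per-class F1 score (from which the minimum and maximum are taken), and the macro- and microaveraged F1; the function names are hypothetical.

```python
def class_counts(y_true, y_pred):
    """True-positive, false-positive, and false-negative counts per class."""
    classes = sorted(set(y_true) | set(y_pred))
    stats = {c: {"tp": 0, "fp": 0, "fn": 0} for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            stats[t]["tp"] += 1
        else:
            stats[t]["fn"] += 1
            stats[p]["fp"] += 1
    return stats


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_summary(y_true, y_pred):
    """Minimum, maximum, macro-, and microaveraged F1 over all classes."""
    stats = class_counts(y_true, y_pred)
    per_class = {c: f1(**s) for c, s in stats.items()}
    pooled = {k: sum(s[k] for s in stats.values()) for k in ("tp", "fp", "fn")}
    return {
        "min": min(per_class.values()),
        "max": max(per_class.values()),
        "macro": sum(per_class.values()) / len(per_class),
        "micro": f1(**pooled),
    }


def cohen_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0
```

The macroaverage weights every class equally, so small classes influence it strongly, whereas the microaverage pools all decisions and is dominated by the large classes.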

To analyze the robustness of the algorithm and to implement the oversampling strategy, a random resampling method has been used. This method can over- or undersample the dataset to a given percentage of the original number of instances. It generates a random set of sample data by sampling with (or without) replacement. The uniformity of the resampled dataset is controlled by a parameter called the bias factor, which varies in the range [0, 1], where values near 1 make the class distribution uniform. The present study evaluates different bias factors. This method allows the classifier to retain smaller classes while optimizing accuracy. As can be observed from Table 1, the value of MinF1 is larger than the MaxF1 values, which is due to the high efficiency of the algorithm in correctly classifying the majority classes. MaxF1 is more affected by the smaller classes. Table 2 shows the resampling results, obtained with sample sizes from 25% to 200% to deal with the imbalance issue.
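The effect of the sample-size percentage and the bias factor can be illustrated with a minimal sketch. It assumes sampling with replacement and a class share that is interpolated linearly between the original class distribution (bias 0) and a uniform one (bias 1); the function name and this interpolation rule are illustrative assumptions, not a description of the tool used in this study.

```python
import random
from collections import defaultdict


def resample(samples, labels, percent=100.0, bias=0.0, seed=0):
    """Randomly resample a labeled dataset with replacement.

    percent: size of the output relative to the input (e.g. 200 doubles it).
    bias:    0 keeps the original class distribution, 1 makes it uniform.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target_size = len(samples) * percent / 100.0
    uniform_share = 1.0 / len(by_class)
    out_samples, out_labels = [], []
    for label in sorted(by_class):
        members = by_class[label]
        original_share = len(members) / len(samples)
        share = (1.0 - bias) * original_share + bias * uniform_share
        count = round(target_size * share)
        out_samples.extend(rng.choices(members, k=count))  # with replacement
        out_labels.extend([label] * count)
    return out_samples, out_labels
```

With `bias=1.0` a dataset of 90 and 10 instances in two classes is resampled to 50 and 50, so the small class is oversampled and the large class is undersampled.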

Table 2 illustrates the improvement of the results. The method shows a satisfactory improvement in terms of MaxF1; therefore, it can be observed that better results are obtained by the SMO. Moreover, the Kappa value achieved by the proposed SMO algorithm shows a larger growth slope, which is expected given its better efficiency in Gaussian-distributed spaces.

As mentioned in this paper, the proposed method uses an optimized version of the Kapur method for segmentation of the microalgae. The main advantage of this technique is that metaheuristic-based optimization can help escape the local optima of the problem. However, this comes at the cost of a higher complexity of the proposed method compared with the classical one.

7. Conclusions

The present study proposed a new optimized and pipelined method for in-line detection of microorganisms. The method first used histogram equalization to improve the quality of the image. Then an image segmentation based on an optimized Kapur method was designed and used for separating the microalgae. The optimization of the Kapur method in this study is based on a new improved algorithm, the Quantum-based Thermal Exchange Optimization Algorithm, which was designed to improve the segmentation accuracy. The Gray-Level Cooccurrence Matrix (GLCM) was then used for extracting the main features of the segmented area. A 10-fold cross-validation was then performed to evaluate the Sequential Minimal Optimization- (SMO-) based Support Vector Machine (SVM) classifier. The suggested method was compared with some state-of-the-art methods using three different metrics. The final results showed that the suggested technique, with a Kappa of 0.828 and minimum and maximum F1 values of 0.342 and 0.855, provides the best efficiency among the compared methods.

Data Availability

The data are available at https://algaevision.myspecies.info/

Conflicts of Interest

The authors declare that no conflicts of interest exist in this research.

Acknowledgments

This research was supported by the Educational Science Foundation of Jiangxi Province (# 41562019).