Abstract
With the rapid emergence of deep learning (DL), the technology has been applied successfully in many fields, including aquaculture. This change creates new opportunities as well as challenges for data processing in smart fish farms. This study focuses on deep learning applications and how they support aquaculture activities such as fish identification, species classification, feeding decisions, behavior analysis, size estimation, and water quality prediction. The proposed DL method for fish farming exploits the available computing power and performance to analyze the given data. The results show the significance of deep learning's contributions and of automatic feature extraction. Nevertheless, applying deep learning remains a major challenge in the era of artificial intelligence. The proposed method was trained on a large number of labeled images obtained from the Fish4Knowledge dataset. Based on suitable features extracted from the fish, the proposed method achieved good results in terms of recognition rate and accuracy.
1. Introduction
Given the acceleration of technology in all fields of knowledge, technological development has become the main way to keep pace with progress. The economy has had the largest share in technological development because of its impact on human life [1]. Fish are an important economic aquatic product for many countries, especially those with seaports [2]. Fish farms are among the oldest economic activities and date back many years. They have a direct impact on the economies of countries, especially those located in coastal areas [3]. Because fish farming needs to be controlled with new technology models, many people have begun to imitate nature and raise fish in conditions similar to their natural aquatic environment by farming fish within a body of water [4]. Deep learning is needed to increase production and thereby revive the economy. With deep learning, the feedback from a given result can be used to estimate the next output [5, 6]. Aquaculture hosts many kinds of organisms, so this study focuses on fish, which have a direct effect on the product economy [4]. Data influenced by a complicated environment are mostly nonlinear, which makes it difficult to control such a system accurately. A classification or recognition system is normally based on images from a camera, whose many properties or features are used to make a decision about a product [7]. The problem with recognition in fish farming is that the image is taken through the water, so its clarity is affected by the water quality, as shown in Figure 1.

Many factors affect underwater fish images, such as low capture quality, changes in luminosity, fast fish movement, and complex backgrounds [8, 9]. Some traditional classification techniques do not satisfy the requirements because the extracted features do not represent the real data. In addition, manually designed features are subject to human error. Deep learning is therefore considered the best solution, since the system can learn a hierarchical representation of the data automatically by simulating a neural network [10]. Despite fluctuations in supply and demand as fishery resources shift location, fish remain a major source of food. The reproduction and growth rates of farmed fish have decreased because of poor management, and the old methods of fish farming are no longer sufficient [11]. Fish play an important role in feeding the population and in protein intake on a global scale. Existing recognition research has mostly considered objects on land, whereas the great demand is for objects underwater. The challenge lies in capturing the image from the depths of the water and processing it in the proposed system. The difficulty is often due to impurities that obscure vision [12]. All of these problems and limitations therefore have to be addressed to support commercial fish farming.
However, recognizing fish underwater is a real challenge, especially in video. Video suffers from low quality because of the extreme conditions in the sea. Because a large number of devices that capture data underwater are designed with low resolution [13], many parameters have to be considered when designing such systems, such as absorption, illumination, scattering, and visibility. This paper addresses the recognition of objects underwater and proposes a system based on deep learning to classify and recognize fish in their natural environment. Features are learned from the Fish4Knowledge (F4K) training dataset, so no domain knowledge is needed [14]. Previous fish recognition research focused mainly on constrained environments, for example, matching the contour of a fish in a small tank. Some researchers used color and shape descriptors for fish recognition [15]. Others used shape and texture features of fish appearance to train an LDA classifier on a dataset of 108 images of three types of fish under certain conditions [16]. Some researchers obtained an accuracy below 60% because of the difficulty of extracting features and unsuitable classification methods. A standard dataset of 3179 underwater images covering 10 species has yielded high accuracy [17].
Existing research is unsatisfactory in four respects: first, most studies focus on images taken under limited conditions; second, the datasets used contain a small number of images with few kinds of fish; third, current recognition methods rely on handcrafted features, which are often combined to improve performance; and fourth, the suggested algorithms perform poorly, especially on large datasets, and are not suitable for certain conditions [18]. To meet the demand, recognition should therefore be accurate, robust, and efficient on a large constrained dataset.
2. Problem Statements
The performance of the fisheries sector in many countries is below expectation, with low supply. This is evident in the fact that most countries that once exported fish have begun importing it. Increasing aquatic fish production therefore strengthens the economies of coastal countries. Fish farming can contribute to the national income of countries wishing to develop their economies [19]. From this standpoint, this study tries to reduce the gaps that hinder fish farming and the use of modern technology that contributes to economic growth.
3. Deep Learning
Developments in data science and modern technology, such as big data and high-performance computers, have given machine learning the opportunity to understand data and its behavior in complex systems. Machine learning gives a machine the ability to learn through different algorithms without strict orders from a specific program or limited instructions [20].
Deep learning can be defined as a machine learning technique that learns useful features directly from given images, sound, and text. It exploits many layers of nonlinear data processing for unsupervised or supervised feature extraction in classification and pattern recognition [21]. The motivation for deep learning comes largely from the field of artificial intelligence (AI), which simulates the ability of the human brain to analyze, make decisions, and learn. The goal of deep learning is to emulate the hierarchical approach by which the human brain extracts features directly from unsupervised data.
The core of deep learning is the hierarchical computation of features and representations of information, with features defined from low level to high level. Standard machine learning techniques do not work well when run directly on images because they ignore the nature of image composition. In deep learning, features are extracted automatically from the given images, and these features are considered part of the learning in the system [22].
How input images are characterized as features is the key to successful medical image processing. Features extracted from medical images, such as Haar wavelets and HOG, are limited in how they organize the data [23]. For this reason, deep learning and its feature extraction can be used to overcome these limitations in medical images.
As mentioned before, the main difference between machine learning and deep learning is the feature selection method, as shown in Figure 2.

Features in deep learning are generated automatically to produce the appropriate results [22]. Different hidden layers take part in the decision by feeding a layer's output back to the previous layer, or by feeding the output of the final layer into the first layer [24]. DL enables computers to perform complex calculations by building on simpler ones, which improves computational efficiency. It is difficult for a computer to understand complex data such as an image (a set of pixels) or a data series of a complex nature, so deep learning algorithms are used instead of conventional learning methods [25].
4. Proposed Method
The proposed research relies on fish raised in aquaculture. First of all, aquatic fish are difficult to distinguish and classify because of the conditions that the fish experience in the underwater environment in which they live. The devices that capture images in water need to be of high quality and accuracy. The camera must be designed to cope with underwater conditions, fish movement, blurred vision, and so forth.
From this point, the proposed system is designed first to eliminate or reduce noise. Noise reduction is important in such a study: blurred images are unavoidable because motion underwater is difficult to handle, so the proposed system addresses this issue from the beginning. Figure 3 shows a fish image underwater.

The proposed system first applies noise reduction to prepare the image and the object inside it. This step is important because a clear object yields powerful features, and feature extraction is the core of every deep learning system. For this reason, the noise reduction process is compulsory. The mean of N images must be calculated for the training data. In a still image, the average of the eight neighboring pixels is calculated along the vertical, horizontal, and diagonal directions, as shown in Figure 4.
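The two averaging steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `mean_of_frames` and `neighbor_average` are hypothetical, images are assumed to be 2-D grayscale arrays, and border pixels are handled by edge replication, which the text does not specify.

```python
import numpy as np

def mean_of_frames(frames):
    """Pixel-wise mean of N equally sized frames (assumed grayscale)."""
    stack = np.asarray(frames, dtype=np.float64)
    return stack.mean(axis=0)

def neighbor_average(image):
    """Replace each pixel by the mean of its 8 neighbors
    (vertical, horizontal, and diagonal directions)."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    # Assumption: borders are padded by repeating the edge pixels.
    padded = np.pad(img, 1, mode="edge")
    total = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            total += padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return total / 8.0
```

A typical use would be to average several frames of the same scene with `mean_of_frames` and then smooth the result with `neighbor_average` before segmentation.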

Let A(N, x, y) denote the intensity of the pixel at coordinate (x, y); the averaging process can then be expressed as
A noise-free image is easy to segment, so segmentation is performed with the threshold method. This method is a simple, accurate, and powerful technique for segmenting images in which the foreground object differs from the background. Segmentation is based on the image regions and the connections among them, and it converts the image to a binary image with black and white areas. The image is normally not empty and contains objects such as fish; to obtain this information, the original image must be segmented into a simpler image u. The segmentation is done by an expression in which one term is the sum of the regions and the remaining terms are the boundaries.
The captured object often consists of several parts, which complicates segmentation and therefore feature extraction, so those parts must be combined before processing. This is performed under the following condition:
Every part of the object in the image affects the classification, so all segments must be collected without losing even the smallest segment details. Rich features depend on the details in the segmented area. During segmentation, the object is extracted as the foreground, and the rest of the image is left without detail and in a single color.
Choosing a proper threshold (T) separates the pixels into similar regions. A pixel at coordinate (x, y) belongs to the object if its intensity is equal to or greater than the threshold value (T); otherwise, it belongs to the background. With f(x, y) denoting the pixel intensity and g(x, y) the segmented binary image, the object is separated from the background by

$$ g(x, y) = \begin{cases} 1, & f(x, y) \ge T \\ 0, & \text{otherwise} \end{cases} \qquad (4) $$
Segmentation, or extraction of the Region of Interest (ROI), is important in image processing and machine learning. Image segmentation broadly falls into three main categories: edge detection, region determination, and pixel classification. Pixel classification is used for segmentation here and involves three main steps: color range selection, feature extraction, and clustering.
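The threshold rule used for segmentation (a pixel belongs to the object when its intensity is at least T, otherwise to the background) can be illustrated with a minimal sketch. The function name `threshold_segment` and the choice to set background pixels to zero are assumptions made for illustration, not details taken from the proposed system.

```python
import numpy as np

def threshold_segment(image, T):
    """Split a grayscale image into object and background with threshold T.

    Returns a binary mask (1 = object, 0 = background) and a copy of the
    image in which the background is set to a single value (0).
    """
    img = np.asarray(image)
    mask = img >= T                      # object pixels: intensity >= T
    foreground = np.where(mask, img, 0)  # background left without detail
    return mask.astype(np.uint8), foreground
```

For example, with T = 128 a pixel of intensity 200 is kept as part of the object, while a pixel of intensity 50 is assigned to the background.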
In Figure 5, the red circle marks a misclassified area in which the system could not recognize any object because of blurring and heavy noise; for this reason, noise reduction is the first process in the system. The yellow square in the same figure marks an unknown object, which may confuse the classifier because the extracted features are unlabeled.

The general framework describes the main stages of the system. Figure 6 shows these stages and the relations among them.

The first stage is preprocessing, which contains the two processes described above in detail: noise reduction and segmentation. The next stage is feature extraction, which is the core of the system because the following stage (classification) depends on the information it provides. Feature extraction is the process of translating the characteristics of the object into data vectors that the classifier can understand.
The most important issues in fish recognition are correct feature extraction and the structure of the neural network. In this research, the extracted features are informative; features such as height, length, direction, and weight (explained in detail in the next section) are extracted together with their weights to help build the structure of the system. The neural network in the proposed method is built under certain conditions, such as the number of hidden layers and feedback inputs. The extracted features determine the number of nodes in the neural network, and the proposed method uses new features to increase the efficiency of the system, which runs for more than 1500 iterations.
5. Feature Extraction
Two kinds of features can be extracted from a farmed fish image: texture and color. Color features are used mainly for segmentation and then yield special features, but the classifier has difficulty working with this type of feature. A texture feature is an attribute of local intensity that describes the visual appearance of an area or region. Texture features draw on both the spatial and the frequency domains, which helps the classifier achieve better results. The Gabor filter provides a joint spatial and frequency decomposition, in which the spatial filter g(x, y) and its Fourier transform G(u, v) are given in their standard form as follows:

$$ g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j W x\right], \qquad G(u, v) = \exp\left\{-\frac{1}{2}\left[\frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\} \qquad (5) $$

Here $\sigma_u = 1/(2\pi\sigma_x)$, $\sigma_v = 1/(2\pi\sigma_y)$, and W is the constant center frequency of the filter bank. The band-pass filter is controlled by the standard deviations of the Gaussian function; the filter bank covers varying bandwidths, orientations, and frequencies.
Given an image, its Gabor wavelet transform can be expressed as
Here, the filter response is evaluated at coordinate (x, y); m and n index a pixel of the image and run from 1 to M and N, respectively, which are the dimensions of the fish image, for example, the pixel at coordinate (176, 200). The standard deviation of the filter response can represent the region, which is derived from the classification of image regions:
The features are collected into a feature vector, which is constructed using the HT descriptor as follows:
In Figure 7, the texture feature starts as a set of filter channels applied to the given image and is then combined with the other features, as shown in the figure.
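As a rough illustration of how Gabor-based texture features of this kind are commonly computed, the sketch below builds a complex Gabor kernel, filters the image, and collects the mean and standard deviation of each response magnitude into a feature vector. It is not the authors' implementation: the function names, the kernel size, the use of a single isotropic sigma, and the circular FFT-based convolution are all assumptions, and the image is assumed to be at least as large as the kernel.

```python
import numpy as np

def gabor_kernel(size, sigma_x, sigma_y, W, theta=0.0):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid
    of center frequency W (cycles per pixel), rotated by theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xr**2 / sigma_x**2 + yr**2 / sigma_y**2))
    envelope /= 2.0 * np.pi * sigma_x * sigma_y
    carrier = np.exp(2j * np.pi * W * xr)
    return envelope * carrier

def filter_response(image, kernel):
    """Filter the image with the kernel via FFT (circular convolution)."""
    img = np.asarray(image, dtype=np.float64)
    K = np.fft.fft2(kernel, s=img.shape)  # zero-pad kernel to image size
    return np.fft.ifft2(np.fft.fft2(img) * K)

def texture_features(image, frequencies, orientations, size=15, sigma=3.0):
    """Mean and standard deviation of each response magnitude,
    concatenated into one feature vector."""
    feats = []
    for W in frequencies:
        for theta in orientations:
            k = gabor_kernel(size, sigma, sigma, W, theta)
            mag = np.abs(filter_response(image, k))
            feats.extend([mag.mean(), mag.std()])
    return np.array(feats)
```

With two frequencies and two orientations, `texture_features` returns a vector of eight values (one mean and one standard deviation per filter channel), which can then be combined with the other features.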

The resulting set of features is the input to the main process that recognizes the objects. The recognition rate is the ratio of the number of correctly detected fish to the total number of detections:

$$ \text{Recognition rate} = \frac{\text{number of correctly detected fish}}{\text{total number of detections}} \qquad (9) $$

The recognition rate in equation (9) is calculated in the standard way used by existing methods, and we follow the literature for benchmarking and evaluation. For classification, a more complex procedure is needed to adapt the extracted features to the neural network algorithm. The neural network suits more complex feature extraction because it lets the system build many hidden layers with a variable number of nodes. After the features are combined, the neural network distributes them according to their weights in the system; each layer of the network holds different percentage values of the weighted features. This map is described as follows:
For example,
So, the weighted features result in a vector:
This represents the corresponding hyperspectral vector. Many potential features can be derived by the deep learning system, such as height, length, size, estimated weight, and age. Figure 8 shows some of these features.
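To make the geometric features concrete, the following sketch derives height, length, and size (in pixels) from a binary segmentation mask using its bounding box. It is an assumption-laden illustration: the function name `shape_features` is hypothetical, the fish is assumed to lie roughly horizontally so that the bounding-box width approximates its length, and no conversion from pixels to physical units, weight, or age is attempted.

```python
import numpy as np

def shape_features(mask):
    """Bounding-box height and length and pixel area of a binary mask.

    Returns None when the mask contains no object pixels.
    """
    m = np.asarray(mask).astype(bool)
    if not m.any():
        return None
    rows = np.where(m.any(axis=1))[0]  # rows that contain object pixels
    cols = np.where(m.any(axis=0))[0]  # columns that contain object pixels
    return {
        "height": int(rows[-1] - rows[0] + 1),
        "length": int(cols[-1] - cols[0] + 1),
        "size": int(m.sum()),
    }
```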

The neural network configures its internal features and creates hidden layers to be processed according to the estimated subprocess. The number of hidden layers is variable; at the last iteration (1500) it reaches 97 layers, with around 39 nodes in each hidden layer. The transfer function of the hidden layers is the tangent sigmoid (tansig), while the output layer is linear. More than four replicates are included in the testing run. To match the neural network to its topology, two statistical error criteria are considered [26]: the coefficient of determination (R2) and the root mean square error (RMSE), given by the following equations:

$$ R^2 = 1 - \frac{\sum_{i=1}^{N} (y_{\mathrm{pre},i} - y_{\mathrm{exp},i})^2}{\sum_{i=1}^{N} (y_{\mathrm{exp},i} - \bar{y}_{\mathrm{exp}})^2} \qquad (13) $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{\mathrm{pre},i} - y_{\mathrm{exp},i})^2} \qquad (14) $$
Equations (13) and (14) give the two criteria, the coefficient of determination and the root mean square error, respectively, where ypre and yexp are the predicted and experimental variables, ȳexp is the average of the experimental variable, and N is the total number of runs. The neural network with the maximum R2 and the minimum RMSE is selected.
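The evaluation criteria used in this section (the recognition rate, the coefficient of determination, and the root mean square error) can be written out as a short sketch in their standard textbook forms. The function names and the tie-breaking rule in `select_network` (highest R2 first, then lowest RMSE) are assumptions for illustration.

```python
import math

def recognition_rate(correct, detections):
    """Ratio of correctly detected fish to the total number of detections."""
    if detections <= 0:
        raise ValueError("detections must be positive")
    return correct / detections

def r_squared(y_pre, y_exp):
    """Coefficient of determination of predictions against experimental values."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((p - e) ** 2 for p, e in zip(y_pre, y_exp))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    if ss_tot == 0:
        raise ValueError("experimental values must not all be equal")
    return 1.0 - ss_res / ss_tot

def rmse(y_pre, y_exp):
    """Root mean square error between predicted and experimental values."""
    n = len(y_exp)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(y_pre, y_exp)) / n)

def select_network(candidates):
    """Pick the candidate with the highest R2, breaking ties by lowest RMSE.

    candidates maps a network name to a (y_pre, y_exp) pair.
    """
    return max(
        candidates,
        key=lambda name: (r_squared(*candidates[name]), -rmse(*candidates[name])),
    )
```

For instance, a network whose predictions match the experimental values exactly has R2 = 1 and RMSE = 0 and would be preferred over any network with a larger error.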
A conventional neural network system handles one fixed set of features with a fixed number of hidden layers. In deep learning, the aim is for the system to learn to build many hidden layers from estimated sets of features, both those provided and those inferred by the neural network itself. The proposed deep learning system is illustrated in Figure 9.

The system starts by extracting features from a given image and combining them for integration into the system. While the system runs, newly estimated features have two effects: first, they feed the result back into earlier hidden layers and, second, they may produce a new combination of features through the hidden layers. The primary purpose of deep learning is to automate the use of a highly efficient computer in order to obtain high accuracy and satisfactory results with the proposed method. Most existing methods try to learn to extract full features from the given object, but fully automated extraction is still needed. Building hidden layers in the neural network requires variables that come from the extracted features. To increase the efficiency of the neural network and obtain good results, the network outputs are fed back and used as inputs again. Appropriate results make it possible to control fish farming and suggest better scheduling, and therefore to control the economic aquatic product.
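The layer structure and output feedback described in this section can be sketched as a small untrained network. This is a structural illustration only, under several assumptions: hidden layers use tanh, the output layer is linear, the previous output is simply concatenated to the feature vector as extra inputs, and the names `init_network`, `forward`, and `run_with_feedback` are hypothetical. No training procedure is shown.

```python
import numpy as np

def init_network(n_inputs, hidden_sizes, n_outputs, seed=0):
    """Create random (weights, bias) pairs for each layer of the network."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def forward(layers, x):
    """Forward pass: tanh hidden layers followed by a linear output layer."""
    a = np.asarray(x, dtype=np.float64)
    for W, b in layers[:-1]:
        a = np.tanh(a @ W + b)
    W, b = layers[-1]
    return a @ W + b

def run_with_feedback(layers, features, n_outputs, steps=3):
    """Feed the network output back as additional inputs for a few steps."""
    y = np.zeros(n_outputs)  # assumption: feedback starts from zeros
    for _ in range(steps):
        y = forward(layers, np.concatenate([features, y]))
    return y
```

In this sketch the input size must equal the number of features plus the number of outputs, for example `init_network(4 + 2, [39, 39], 2)` for four features, two outputs, and two hidden layers of 39 nodes each.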
6. Experimental Results
The Fish4Knowledge (F4K) dataset is used to train the proposed system. It contains 8487 images taken under different conditions, each covering different cases. Figure 10 shows samples of the fish used in training the system.

There are two kinds of conditions to control in fish farming: indoor and outdoor. The system tries to accommodate these circumstances. Outdoor conditions include the educational background of the farmers, whether they farm full or part time, weather, food type, and equipment maintenance. Indoor conditions include the quality of the water, the sex of the fish, the topography within the basin, and the type and shape of the basin. The proposed system tries to control some of these conditions based on the data obtained from the basin. Changes in the weight, movement, shape, and number of fish in the farm affect the system and give different results, and the farmers then act accordingly. The recognition rate achieved by the system indicates how accurate the system is in terms of classification, as in equation (9). Table 1 shows the results of the proposed system benchmarked against the most relevant existing research.
Good features allow recognition with high accuracy. Feature extraction determines how the system learns: the system is controlled by these features, and they reflect how deep the learning is. The basic principle is that the features extracted from the image imitate those extracted manually, and after training the system produces them itself through several automatic extraction operations. In deep learning, feature extraction is completely automatic, as the system draws the conclusions needed to increase the accuracy of the output. This is the main reason for using deep learning and the clear difference from conventional machine learning. When managing a fish farm, several factors must be monitored, such as size, weight, and quality. The quantitative assessment of the fish catch is the administrative basis for managing the farm and drawing up an appropriate strategy for fish production. Accuracy increases with the number of training images and iterations. Figure 11 illustrates how the accuracy of the system increases with training.

The results reflect the performance and value of the system. More complex feature extraction increases the number of hidden layers and their nodes in the neural network, and this can be achieved by proposing new features that make the system more valuable.
7. Conclusion
In this study, a method was designed to classify fish in aquatic fish farms. Many countries have economies that depend on fish production, so controlling fish farming is necessary. For system automation, the design was based on deep learning, specifically neural networks. The idea behind the deep neural network is to give the computer the ability to infer new features from previously given features, thus increasing the number of hidden layers in the network. Features are designed according to data taken from the environment and the feeding strategy followed in fish farming. Many factors control such a design, such as biomass, size, water quality, and the acquisition device. The proposed method proves its worth through satisfactory results in both accuracy (96%) and recognition rate (98%). We hope that this study will open the way for other studies to achieve better results and overcome obstacles such as the accuracy of data acquisition devices.
Data Availability
All data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.