Abstract
Deep learning (DL) has become a popular research topic in the field of artificial intelligence (AI) in recent years, owing to its significant role in various application areas. Leveraging the supercomputing capacity available in the era of big data, it uncovers high-level abstract representations in the original dataset and serves as decision support in application sectors by increasing the number of channels and the scale of parameters. This study designs and implements a heterogeneous medical education data analysis system based on DL technology. The proposed system adopts DL technology to model and analyze the heterogeneous medical education data, applies a decision-level fusion strategy to the data models, and designs and implements a voting method and a weighting method; the decision values are statistically combined to realize an improved DL-based method for analyzing medical college education data. In addition, this study uses the Alzheimer's disease public dataset, which contains medical education data of various structures and modalities, to compare and evaluate the performance of the system's data preprocessing models and the effect of the fusion methods. The experimental results validate the proposed model's performance, demonstrating that evaluating complete heterogeneous multimodal data is not only closer to the genuine diagnostic process but also helps clinicians grasp the patient's overall state and obtain sound outcomes. Further, the essential ideas and implementation techniques of the convolutional neural network (CNN) and the stacked autoencoder, as well as their application cases in medical college education data analysis, are thoroughly explained.
1. Introduction
Due to its ability to learn from supplied data, DL technology is now regarded as one of the hottest subjects in the fields of machine learning, artificial intelligence, data science, and analytics. Compared with shallow learning, the main feature of DL is that it greatly increases the depth of the learning model by extracting useful features from the data; in doing so, the learner also increases the number of channels and the scale of parameters. With the help of the supercomputing power of the big data era, it can express complex functions. Compared with traditional machine learning algorithms, DL technology emphasizes learning from massive data to overcome problems in the data such as high dimensionality, complexity, and high noise. Furthermore, DL technology incorporates feature learning into the model-building process and lowers the subjective influence of the artificially constructed features used in traditional ML methods. It has therefore largely taken the place of traditional ML algorithms in many application areas and has shown particularly good performance in the field of medical auxiliary diagnosis. Because of the advancement of DL technology, enterprises and initiatives concentrating on intelligent medical auxiliary diagnostics have sprung up both locally and internationally. By integrating machine learning and deep learning technology, these auxiliary diagnosis systems have made great achievements in the auxiliary screening and diagnosis of breast cancer, diabetes, and other diseases. However, medical education examination data usually have different structures and sources (such as text data generated by consultation, index data generated by laboratory examinations, and medical image data generated by auxiliary examinations); that is, they exhibit heterogeneous and multimodal phenomena. Most existing auxiliary diagnostic systems analyze the data of only one structure or modality in medical education examinations, which may miss important information in the other examination data and thus limits the reliability and accuracy of the extracted conclusions. As computer software and hardware technology advances, the medical college curriculum is becoming increasingly linked with numerous specific commercial disciplines, and the analysis, understanding, and knowledge discovery of such data are big challenges for the educational system of traditional medical colleges and universities. For example, using big data mining technology, the association between customers' purchasing habits and personal information may be discovered in a supermarket's sales data, enabling services for goods procurement, shelf layout, and promotional activity design. Similarly, in a medical image diagnosis system, applying DL to medical images and diagnosis results can transfer the doctor's diagnostic experience to the computer-aided diagnosis of medical images. Achieving the above objectives requires two prerequisites: the collection of large amounts of data and the use of sophisticated algorithms to extract information from them, both of which necessitate large amounts of processing power.
At present, DL technology has attracted the interest of a large number of ML and application-field researchers, and many valuable results have appeared in both theoretical and applied research. Hinton and Salakhutdinov [1] published an article in 2006 proposing an effective method for training deep neural networks, which is widely considered the point at which DL research began to mature. The success of CNN in image understanding has greatly encouraged the fields of ML and AI; its main representative is the AlexNet architecture proposed by Krizhevsky et al. [2] in 2012, which greatly improved the accuracy of ML models on the ImageNet image classification problem. The accuracy of CNNs in image interpretation continued to rise in the ImageNet competitions of the following years; among them, Google's GoogLeNet and Microsoft's ResNet both performed very well, although training these models requires a huge computational cost. AlphaGo, an AI program that uses DL technology, has defeated top human players, demonstrating the strength of DL algorithms. Compared with analyzing data of a single structure or modality, the advantage of employing heterogeneous multimodal medical education data to train deep models for an auxiliary diagnosis system is evident in the comprehensiveness of the data and the diversity of the analysis techniques. The comprehensiveness of the data enables the system to realize information complementation by analyzing the data generated in the various inspection processes [3]: when a certain type of data is missing, the system can still extract information from the remaining data, ensuring the normal operation of the auxiliary diagnosis system and improving its robustness. The diversity of analysis methods enables the system to observe the patient's disease from multiple angles, making the diagnosis results more reliable and accurate.
This study discusses the application of DL technology in the data analysis and knowledge discovery of information systems, expounds the basic principles of two commonly used DL models, namely, the CNN and the stacked autoencoder, and gives their applications in medical college education data analysis. More specifically, this study designs and implements a heterogeneous medical education data analysis system based on DL technology, and different use cases are considered for the experimental results and analysis. The system designs and implements data preprocessing methods and classification models according to the structure and modalities of medical education data and uses a decision-level fusion strategy to achieve fused data analysis, which effectively improves diagnostic accuracy. Therefore, a heterogeneous medical education data analysis system based on DL technology can promote the accurate diagnosis of diseases, lay a foundation for their early treatment and prevention, and reduce discrepancies in diagnosis results, which has practical application value and significance.
The remainder of the study is organized as follows: Section 2 discusses two DL models, the CNN and the stacked autoencoder. Section 3 covers the design and implementation of these models and a support vector machine (SVM); it further describes background information on medical diagnosis methods, their importance, and some of the challenges faced by current medical diagnosis systems, and it discusses the results obtained immediately after model implementation. Section 4 presents the demand analysis of the existing model for a better evaluation of the proposed models. Section 5 concludes the overall study.
2. Deep Learning (DL) Models
2.1. Convolutional Neural Network (CNN) Model
CNN is a well-known DL classification algorithm that shows good performance in the classification of image, audio, and video data. Its main principle is to stack a series of convolution layers, pooling layers, nonlinear activation layers, and random masking layers that apply nonlinear transformations, gradually extracting the essential features of the original input signals; the weights in the network are adjusted in a supervised way by backpropagating the error so that the model fits the training data and generalizes to the evaluation data. In a CNN, the most important operation is convolution [4]. In each convolution layer, a fixed-size convolution kernel continuously scans the input signal and performs the convolution operation, which effectively captures the locality in the input signal and makes the CNN especially suitable for problems related to image classification and labeling.
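To make the convolution operation concrete, the following is a minimal NumPy sketch (not the MatConvNet implementation used later in this study) of a single-channel two-dimensional convolution; the image and kernel sizes are illustrative assumptions:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid (no-padding) 2D convolution of a single-channel image.
    Like most DL frameworks, this computes cross-correlation and
    calls it convolution (the kernel is not flipped)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value summarizes one kh x kw neighborhood,
            # which is what captures the locality of the input signal.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(28, 28)        # a small grayscale image
kernel = np.random.rand(5, 5)         # a fixed-size convolution kernel
feature_map = conv2d(image, kernel)   # resulting feature map, shape (24, 24)
```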
For the activation layer, the sigmoid function and the rectified linear unit (ReLU) function are generally used, and their forms are shown in formulas (1) and (2):

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} \quad (1)$$

$$\mathrm{ReLU}(x) = \max(0, x) \quad (2)$$
In Figure 1, the CNN abstracts features layer by layer through the superposition of multiple blocks, in which each block is composed of a convolution layer, a nonlinear activation layer, and a pooling layer. The pooling layer summarizes the features in a neighborhood by averaging or taking the maximum value [5], extracting key features and reducing the feature dimension, as shown in Figure 2. The random masking (dropout) layer disables output units at a fixed percentage so that the output cannot depend heavily on a few units, which enhances the generalization capability of the model. The softmax output layer is a multiclassification function that produces a 1-of-k encoded output; alternatively, a multiclass SVM or backpropagation (BP) network can be used for multiclassification problems.
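The block structure described above can be sketched as follows (a minimal PyTorch illustration, not the MatConvNet configuration used in Section 3; the channel counts, input size, and number of classes are assumptions):

```python
import torch
import torch.nn as nn

# One block: convolution layer -> nonlinear activation layer -> pooling layer.
block = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),   # max pooling reduces the feature dimension
)

# After the stacked blocks: a random masking (dropout) layer and a
# softmax output layer that yields a 1-of-k encoded output.
head = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.2),   # masks 20% of units so no few units dominate
    nn.LazyLinear(3),    # fully connected layer; 3 classes assumed
    nn.Softmax(dim=1),
)

x = torch.randn(8, 1, 28, 28)   # a batch of 8 single-channel images
probs = head(block(x))          # class probabilities, shape (8, 3)
```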


2.2. Stacked Autoencoder
A stacked autoencoder is an unsupervised feature transformer in which each autoencoder is a three-layer network comprising an input layer, a middle layer, and an output layer, where the middle layer performs a nonlinear transformation. The goal of training is to make the output vector reproduce the input vector as closely as possible. By stacking multiple autoencoders, the original input features are transformed layer by layer, while each layer retains as much of the information in the original input as possible. Figure 3 shows the basic structure of a single autoencoder.
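The reconstruction objective can be illustrated with a short PyTorch sketch (the layer widths are assumptions, not the configuration used in the experiments):

```python
import torch
import torch.nn as nn

# A single three-layer autoencoder: input -> nonlinear middle -> output.
class AutoEncoder(nn.Module):
    def __init__(self, n_in: int, n_mid: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_mid), nn.Sigmoid())
        self.decoder = nn.Linear(n_mid, n_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

ae = AutoEncoder(n_in=32, n_mid=10)
x = torch.randn(64, 32)
loss = nn.MSELoss()(ae(x), x)   # output should reproduce the input
loss.backward()

# Stacking: after training, the 10-dimensional encoder output becomes
# the input of the next autoencoder, transforming features layer by layer.
codes = ae.encoder(x)
```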

3. Model Design and Implementation
The concept of DL was first proposed in 2006, originally under the name hierarchical learning, and was applied to research fields related to pattern recognition. DL technology mainly revolves around two key factors: multilevel nonlinear computing methods and supervised or unsupervised learning methods. A multilevel nonlinear computing method uses a nonlinear function to obtain the output of the current layer and feeds it as the input to the next layer, establishing a hierarchical structure between the levels that identifies the usefulness and importance of the data. Supervised or unsupervised learning describes whether DL techniques learn with or without target labels: supervised systems usually require labels as the task objective, while unsupervised systems do not. As far as auxiliary diagnostic tasks are concerned, both their complexity and their precision requirements make supervised learning more widely used. The specific architecture of a DL model is usually determined by the type of task, and most DL algorithms and architectures currently used in medical-aided diagnosis tasks are derived from the artificial neural network (ANN) [6, 7]. An ANN consists of many interconnected neurons, among which the neurons not included in the input or output layers are called hidden units. Each hidden unit stores a set of weights W, and the learning effect is achieved by iteratively updating these weights during training. The DL models applied to medical-aided diagnosis introduced in the subsequent parts of this section are mostly based on the architecture and optimization strategies of the ANN.
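The weight-update mechanism can be made concrete with a minimal NumPy sketch of one gradient step through a single hidden layer (the sizes, the tanh nonlinearity, and the learning rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))       # one input sample with 4 attributes
t = np.array([1.0])             # its supervised target label
W1 = rng.normal(size=(4, 8))    # weights W of 8 hidden units
W2 = rng.normal(size=(8, 1))    # weights W of the output unit

# Forward pass: each layer feeds a nonlinear transform of its input
# to the next layer, building the hierarchical structure.
h = np.tanh(x @ W1)
y = h @ W2

# Backward pass: backpropagate the squared error, then update W;
# repeating this iteratively achieves the learning effect.
err = y - t
grad_W2 = np.outer(h, err)
grad_W1 = np.outer(x, (err @ W2.T) * (1.0 - h ** 2))
lr = 0.01
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```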
By analyzing the shortcomings of existing auxiliary diagnosis systems, this study builds a comprehensive and easily scalable heterogeneous medical education data analysis system based on DL technology and ML algorithms, and the model and strategy are experimentally verified. In addition, a data preprocessing process is designed for the models in the system to improve system performance and diagnostic accuracy. The main research contents of this study are as follows: (1) aiming at the heterogeneous multimodal phenomenon in medical education data [8], this study divides the data by structure and adopts different algorithms for data with different structures to construct the classification and diagnosis models. For unstructured data such as medical images and electronic medical records, this study designs and implements multiple auxiliary diagnosis models based on DL technology; for structured medical education inspection data such as population characteristics and inspection indicators, the auxiliary diagnosis model is designed and implemented with ML algorithms. (2) This study adopts a decision-level fusion strategy to realize the fusion analysis of heterogeneous multimodal medical education data, uses the decision values output by multiple auxiliary diagnosis models as the input of the fusion strategy, and designs and implements a voting method and a weighting method, which further combine the model decision values to output comprehensive diagnostic results (see the sketch after this paragraph). In addition, this study uses the Alzheimer's disease public medical education dataset as experimental data to verify the designed models and fusion methods, confirming that fusing heterogeneous multimodal data with a decision-level fusion strategy can improve the correctness and validity of the diagnostic results.
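The two fusion strategies can be roughly sketched as follows, with hypothetical decision values; the system's actual fusion rules and weights may differ:

```python
import numpy as np

# Decision values (probabilities over AD / MCI / CN) output by three
# hypothetical auxiliary diagnosis models for one patient.
decisions = np.array([
    [0.7, 0.2, 0.1],   # e.g., medical-image model
    [0.4, 0.5, 0.1],   # e.g., electronic-medical-record text model
    [0.6, 0.3, 0.1],   # e.g., model on structured inspection indicators
])

# Voting method: each model votes for its most probable class, and
# the majority class becomes the comprehensive diagnostic result.
votes = decisions.argmax(axis=1)
fused_vote = np.bincount(votes, minlength=3).argmax()

# Weighting method: decision values are combined with per-model
# weights (assumed here, e.g., proportional to validation accuracy).
weights = np.array([0.5, 0.2, 0.3])
fused_weight = (weights[:, None] * decisions).sum(axis=0).argmax()

labels = ["AD", "MCI", "CN"]
print(labels[fused_vote], labels[fused_weight])   # AD AD
```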
The above DL models are tested on UCI datasets; two databases with clear information system application backgrounds are selected, namely, Nursery and Census Income. The description of the datasets is given in Table 1.
To check the performance of the model, each dataset is divided into a training set, validation set, and test set with a size ratio of 8 : 1 : 1. The training set is used for training the model; the validation set evaluates the training effect during the training process [9]; and the test set is used to evaluate the model's performance once training has finished. In the setting of this study, the test set is not visible during the training process.
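A minimal sketch of the 8 : 1 : 1 split (using scikit-learn on a synthetic stand-in for one of the UCI datasets; the stratification choice is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one of the two selected UCI datasets.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# Hold out 20% first, then split it half-and-half, giving 8 : 1 : 1.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)

# The validation set monitors the training effect during training;
# the test set stays unseen until training has finished.
print(len(X_train), len(X_val), len(X_test))   # 800 100 100
```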
Diagnostics is a discipline in the medical field that studies how to apply basic theory, basic knowledge, basic skills, and diagnostic thinking to diagnose diseases in patients. In diagnostic practice, medical education diagnosis refers to the process in which doctors collect medical history data by inquiring about patients' subjective symptoms and then apply inspection methods to obtain the diagnostic basis, so as to classify and identify the etiology and pathogenesis, which serves as the basis for developing a treatment plan. Because of the diversification of disease types [10], the specific diagnostic steps differ between diseases, but in general they can be divided into five processes: medical history collection, physical examination, laboratory examination, auxiliary examination, and diagnosis and medical record writing. The medical education data mentioned above refer to the data generated in the following medical education diagnosis processes:

(1) Forms and text data generated by the collection of medical history. The collection of medical history is a consultation that gathers questions and replies between doctors and patients. It provides doctors with information about the occurrence and development of the disease and usually collects data such as patient demographic characteristics, chief complaints, and past disease history. Demographic characteristics are usually filled in and stored in tabular form, while chief complaints and past disease history are usually recorded in the patient's medical records as natural language text.

(2) Sign data generated by physical examination. Physical examination is the systematic observation and examination of patients by doctors using their own senses or traditional auxiliary devices (such as sphygmomanometers and thermometers) to reveal normal and abnormal signs of the body. Most of the physical data obtained by physical examination are recorded in the patient's medical record in tabular form [11].

(3) Index data generated by laboratory examination. Laboratory examination is the examination of blood, body fluids, secretions, and tissue samples of patients through physical, chemical, and biological laboratory methods. Most laboratory examination data have fixed indicators and are stored in the form of data tables.

(4) Image data generated by auxiliary examinations. Auxiliary examinations apply various equipment and related examinations to patients (such as electrocardiographs, imaging equipment, and various endoscopes) to generate medical image data. However, auxiliary examinations are not a necessary step in diagnosis and are selected only on the basis of consultation, physical examination, and necessary laboratory examinations.

(5) Conclusive data generated by diagnosis. Diagnosis is the process in which a doctor integrates the evidence obtained in all inspection procedures to identify and characterize a patient's disease [12]. The conclusions generated by the diagnosis are the learning targets of the auxiliary diagnosis system.
In this use case, two DL models and an SVM classifier are implemented. The DL models use the MatConvNet framework, which is implemented on Matlab with its core written in C, offering a good user interface and excellent operating efficiency [13]. For the SVM classifier, the high-performing LibSVM library is utilized. The difficulty in analyzing medical education data is that they are mostly heterogeneous and multimodal. The concept of "heterogeneous data" describes not only data stored separately in different database systems but also data of different types, characteristics, or structures. Data with different structures, such as database tables, XML documents, image data, and audio data, are usually called structurally heterogeneous data. From this perspective, structurally heterogeneous data include structured data, semistructured data, and unstructured data. Structured data mostly describe the data stored in relational databases in the form of relational tables; such data can usually be expressed logically in a two-dimensional table structure, with the data represented as attributes of specific types, such as numbers and characters. Compared with structured data, unstructured data have no unified data model or operation methods; audio, text, and images are typical unstructured data. Semistructured data lie between the two: the structure and content of the data are mixed without an obvious distinction, and XML is a typical representative. Because medical education data include structured data such as patient demographic characteristics and laboratory tests, which are stored in data tables, as well as unstructured data such as patient complaints, family medical history, and medical images, they are typical structurally heterogeneous data. Heterogeneity divides data by structure, but data with the same structure may still contain multiple modalities. Modality refers to the source or form of the information. For instance, medical image data and medical record text data are data of different modalities according to their carriers; structured data, in turn, can be divided by source, so the demographic characteristics table, physical examination index table, and laboratory test index table can also be regarded as data of different modalities. Therefore, the medical education data described in this study belong to heterogeneous multimodal data.
Two experiments are designed to demonstrate the effect of the DL model. The first uses CNN to predict the classification labels of the two datasets. Specifically, the two datasets are converted to MatConvNet's built-in imdb object through its API functions and min–max normalized [14]. A network configuration script then connects five blocks in sequence, with the convolution kernel size fixed at 51 and max pooling in the pooling layers, followed by three fully connected layers, each with a 20% random masking (dropout) layer behind it. A total of 30 rounds of training were performed, with a learning rate of 0.01 for the first 15 rounds, 0.001 for the next 10, and 0.0005 for the final 5, and the best result of each round was recorded. The total number of parameters used for the AlexNet-style CNN model is summarized in Table 2. Table 3 lists the accuracy and variance results of the model on the two datasets selected for the experiments. Similarly, Figures 4 and 5 show the accuracy and variance of the model, respectively.
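A rough PyTorch rendering of the described configuration (the experiment itself used a MatConvNet script, so the channel counts, the one-dimensional treatment of the tabular attributes, and the kernel shape here are assumptions):

```python
import torch
import torch.nn as nn
import torch.optim as optim

def make_block(c_in, c_out):
    # One block: convolution -> nonlinear activation -> max pooling.
    return nn.Sequential(
        nn.Conv1d(c_in, c_out, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool1d(kernel_size=2),
    )

# Five blocks connected in sequence, then three fully connected
# layers, each followed by a 20% random masking (dropout) layer.
net = nn.Sequential(
    make_block(1, 8), make_block(8, 16), make_block(16, 32),
    make_block(32, 32), make_block(32, 32),
    nn.Flatten(),
    nn.LazyLinear(64), nn.Dropout(0.2),
    nn.LazyLinear(64), nn.Dropout(0.2),
    nn.LazyLinear(5),  nn.Dropout(0.2),   # output size = number of classes
)
_ = net(torch.randn(4, 1, 32))   # dummy batch materializes the lazy layers

# 30 rounds of training under the stepped learning-rate schedule:
# 0.01 for 15 rounds, 0.001 for 10 rounds, 0.0005 for 5 rounds.
for lr, rounds in [(0.01, 15), (0.001, 10), (0.0005, 5)]:
    optimizer = optim.SGD(net.parameters(), lr=lr)
    for _round in range(rounds):
        pass   # one training epoch over the imdb-style data goes here
```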


The second experiment encodes the datasets with a 7-layer stacked autoencoder and then trains an SVM classifier on the encoded features. The Nursery dataset's final output dimension is ten, whereas the Census Income dataset's final output dimension is twelve. The LibSVM software is used for the SVM implementation; the kernel function is the radial basis function (RBF) with default parameters and no penalty term. The SVM was trained both on the original attributes (without the autoencoder) and on the attributes encoded by the autoencoder, in order to analyze and compare their effectiveness. Table 4 and Figure 6 show the classification accuracy of the SVM model alone and of the SVM combined with the stacked autoencoder (SAE).
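A minimal sketch of this comparison, using scikit-learn's SVC (which wraps LibSVM) and a stand-in encoder in place of the trained stacked autoencoder:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC   # scikit-learn's SVC is built on LibSVM

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

def encode(X):
    # Stand-in for the trained stacked autoencoder that maps the
    # original attributes to a 10-dimensional code (as for Nursery).
    rng = np.random.default_rng(0)   # fixed seed: same mapping every call
    W = rng.normal(size=(X.shape[1], 10))
    return np.tanh(X @ W)

# RBF kernel with default parameters, as in the experiment.
svm_raw = SVC(kernel="rbf").fit(Xtr, ytr)
svm_enc = SVC(kernel="rbf").fit(encode(Xtr), ytr)

print(svm_raw.score(Xte, yte))          # accuracy on original attributes
print(svm_enc.score(encode(Xte), yte))  # accuracy on encoded attributes
```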

As can be seen from Figure 6, the SVM performs better on the dataset encoded by the stacked autoencoder, indicating that the DL model plays a significant role in the data analysis of medical college education.
4. Demand Analysis
In classification problems, the most prevalent model assessment criterion is accuracy. It represents the percentage of correctly predicted instances among all predicted instances and can effectively summarize the overall performance of the model. However, when the dataset is imbalanced or more attention is paid to the division of positive instances [15], accuracy cannot effectively evaluate the performance of the model. In our study, a data augmentation process was adopted for the dataset, solving the categorical imbalance problem to a certain extent. Nevertheless, when evaluating the classification performance of the model, we still paid more attention to the identification of positive samples (disease patients). As a result, accuracy is utilized in this work only as a criterion to determine whether the model training process is normal, not to assess the final performance of the model. Compared with the accuracy rate, which characterizes overall model performance, the precision rate pays more attention to the division of positive samples. The precision represents the percentage of true positive instances among all predicted positive instances and illustrates how accurate the model's positive predictions are; its calculation is shown in the following formula:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (3)$$
In contrast to the precision, the recall rate in equation (4) represents the fraction of positive samples that are correctly predicted among all actual positive instances:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (4)$$
The classification task in this study is multiclassification, and the individuals are divided into three groups: AD, MCI, and CN. In a multiclassification task, positive samples are defined by treating each category in turn as the positive class and all other categories as the negative class when calculating that category's evaluation indices [16]. Under this scheme, TP is the number of predicted samples whose predicted value is positive and whose true value is also positive; TN is the number whose predicted value is negative and whose true value is also negative; FN is the number whose predicted value is negative but whose true value is positive [17]; and, correspondingly, FP is the number whose predicted value is positive but whose true value is negative. In addition to the above evaluation indicators, the ROC-AUC value is also commonly used in classification tasks [18]. The ROC curve is drawn by taking the false positive rate (FPR) and the true positive rate (TPR) as the horizontal and vertical axes under different thresholds; the calculation of the true positive rate is the same as that of the recall rate. Formulas (5) and (6) show the calculation of the false positive rate and the true positive rate, respectively. The ROC-AUC value is the area enclosed by the ROC curve and the horizontal axis; for classification tasks, the closer the ROC-AUC value is to 1, the better the classification effect of the model:

$$\mathrm{FPR} = \frac{FP}{FP + TN} \quad (5)$$

$$\mathrm{TPR} = \frac{TP}{TP + FN} \quad (6)$$
The precision rate, recall rate, and ROC-AUC value of the model on test samples are used as comprehensive assessment indicators in the experiment, ensuring the model's recognition capacity for positive samples while also attending to the overall classification performance. Since the experiment is a multiclassification task, multiclass index calculation methods are required when computing the evaluation indices. This study uses the Scikit-learn library to calculate the above indicators; the commonly used multiclass calculation methods in the library include micro, macro, and weighted. Taking the precision index as an example, the micro method sums the TP values of all classes and divides by the sum of the TP and FP values of all classes. In the experimental environment [19, 20], baseline data and evaluation indicators are introduced; then the model architecture implemented in this study is tested and evaluated, and the performance of the model is verified by comparison. Finally, according to the experimental dataset, multiple diagnostic models are selected for fusion analysis to verify the effectiveness of the fusion method. The results confirm that, compared with using clinical data of a single structure and modality for auxiliary diagnosis, the diagnostic results obtained by combining heterogeneous multimodal data with an appropriate fusion method perform better on multiple evaluation indicators [21]. Figure 7 depicts the AUC-ROC curves for the Census Income dataset using both SVM and CNN models, while Figure 8 depicts the AUC-ROC curves for the Nursery dataset.
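A short sketch of these metric calculations with Scikit-learn (synthetic labels and probabilities for the three classes; the averaging choices are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # 0 = AD, 1 = MCI, 2 = CN
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])

# micro: sum TP over all classes, divide by the summed TP + FP.
p_micro = precision_score(y_true, y_pred, average="micro")
# macro: compute precision per class, then take the unweighted mean.
p_macro = precision_score(y_true, y_pred, average="macro")
r_micro = recall_score(y_true, y_pred, average="micro")

# ROC-AUC needs per-class scores; hypothetical probabilities are used
# here, with each category treated one-vs-rest as the positive class.
y_prob = np.array([
    [0.8, 0.1, 0.1], [0.3, 0.5, 0.2], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7], [0.5, 0.2, 0.3], [0.2, 0.2, 0.6], [0.2, 0.3, 0.5],
])
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")

print(p_micro, p_macro, r_micro, auc)
```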


5. Conclusion
Beginning with the importance of computer-aided diagnosis technology in the medical industry, a comparative analysis of existing medical-aided diagnosis systems reveals that most existing systems suffer from incomplete information and limited result accuracy because they analyze data of a single structure. Building on advanced computer-aided diagnosis technology and theoretical knowledge from home and abroad, this study combines DL, ML, multimodal fusion, data augmentation, and other related technologies to design and implement a heterogeneous medical education data analysis system. Finally, a comparative experiment is carried out on the designed model and fusion strategy using the Alzheimer's disease public medical education dataset, and multiple evaluation metrics prove the effective performance of the diagnostic model and fusion strategy. Based on this study, the application of data analysis in medical college education and its specific technical route are given. Further, the application of two DL models to two datasets demonstrates the significant role of these models in data analysis. In future research, we will combine DL models with big data analysis technology, introduce more of the business content carried by information systems, and establish a DL big data analysis platform oriented to industrial applications to provide better decision support for enterprises.
Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.