Abstract
Hypergraph learning is a new research hotspot in machine learning. The performance of a hypergraph learning model depends on the quality of the hypergraph structure built by a given feature extraction method, as well as on its incidence matrix. However, existing models all build the hypergraph structure from a single feature extraction method, which limits their feature extraction and abstract expression ability. This paper proposes a multimodal feature fusion method, which first builds single modal hypergraph structures based on different feature extraction methods and then extends the hypergraph incidence matrices and weight matrices of the different modals. The extended matrices fuse multimodal abstract features and expand the Markov random walk range during model learning, yielding stronger feature expression ability. However, the extended multimodal incidence matrix is large and computationally expensive. Therefore, the Laplacian matrix fusion method is proposed, which performs the Laplacian matrix transformation on the incidence matrix and weight matrix of every modal, respectively, and then conducts a weighted superposition of these Laplacian matrices for subsequent model training. Tests on four different types of datasets indicate that the hypergraph learning model obtained after multimodal feature fusion has better classification performance than the single modal model. After Laplacian matrix fusion, the average running time can be reduced by about 40% compared with the extended incidence matrix, the classification performance can be further improved, and the F1 index can be improved by 8.4%.
1. Introduction
In the machine learning field, the graph is an important data model. If the research objects have one-to-one relationships with each other, then they can be handled by an ordinary graph, as in social networks, gene data, and web page ranking problems [1]. However, in reality, objects often have complicated one-to-many or many-to-many relationships with each other [2]. Taking reference citation as an example, a thesis can cite multiple papers and can be cited by multiple papers. When solving with ordinary graphs, the multivariate relation is forcibly converted into a binary relation, causing information loss. Thus, the hypergraph, a variant of the ordinary graph, emerges [3]. Since hypergraphs can better describe the multivariate information between objects, in recent years, hypergraph-based machine learning has become a research hotspot of the machine learning field and has obtained good effects in object segmentation [4], disease diagnosis [5], image classification [6], recommendation systems [2, 7], etc.
There are two main methods of using the hypergraph learning model to solve multivariate relation problems. One method is to extend the hypergraph into an ordinary graph and then use ordinary graph methods to solve the hypergraph problem; representative methods include clique extension, star extension, and line extension [8]. However, in the process of extending a hypergraph to an ordinary graph, the multivariate relations between vertices are changed into binary relations, which may cause information loss. The other method is to work directly on the hypergraph structure and its incidence matrix and solve the optimal hypergraph cut after the Laplacian matrix transformation, that is to say, obtain several tangent vectors of the hypergraph Laplacian matrix and divide the hypergraph into different subsets for classification and clustering. In essence, this is a combinatorial optimization problem, and representative methods include Zhou's normalized Laplacian [3], hypergraph learning regularity optimization [6, 9, 10], the hypergraph multimodal structure [11], and hypergraph deep learning [12]. Such methods directly carry out the Laplacian transformation and solving on the hypergraph incidence matrix, preventing information loss due to structural transformation. However, all of them are based on one feature extraction method to build a single modal hypergraph structure. If the feature extraction method is not capable of fully reflecting the relations between objects, it will lead to a low-quality hypergraph and incidence matrix, finally affecting the performance of the hypergraph learning model.
Therefore, this paper proposes a multimodal feature fusion method, using incidence matrix extension and Laplacian matrix fusion to improve model performance. The specific work and innovations include the following. (1) When building a hypergraph, single modal hypergraph structures are first built based on different feature extraction methods; then, the different modals' hypergraph incidence matrices and their weight matrices are extended, and the extended multimodal incidence matrix is used for model training. The test results show that the extended multimodal incidence matrix can effectively improve the classification performance of the hypergraph learning model. (2) Because the dimension of the multimodal incidence matrix obtained by matrix extension is high, it leads to a high computational cost. Therefore, the Laplacian matrix fusion method is put forward, which first performs the Laplacian matrix transformation on the hypergraph incidence matrix and weight matrix of every modal, respectively; these Laplacian matrices are then weighted and accumulated for further model training. The test results indicate that the Laplacian matrix fusion method can not only reduce the computational cost of the multimodal incidence matrix but also improve the model performance.
2. Related Work
Since Zhou et al. first proposed the hypergraph learning model [3] and used the "Markov random walk" idea to explain it, two main approaches have emerged for solving multivariate relation problems with the hypergraph learning model: the extension method and the segmentation method.
The extension method expands the hypergraph into an ordinary graph and then uses ordinary graph methods to solve the hypergraph problem. Zien first raised the star extension approach [13], introducing a new node for every hyperedge of the hypergraph, in which the new node is connected to every vertex of the hyperedge by an edge. This approach does not take into account the connection relationships between vertexes within the same hyperedge. Afterwards, Agarwal put forward the clique extension approach [14], considering every hyperedge as a fully connected subgraph. Although the relationships between vertexes within the same hyperedge are built, it still cannot comprehensively express the connection information of vertexes between hyperedges [12, 15]. To further reduce the information loss during extension, Yang et al. proposed the line extension approach [8] in 2020, considering every hyperedge together with the nodes within it as a new vertex. This approach maximally retains the connection information of vertexes and hyperedges, but it may still bring information loss when dealing with some problems, such as the Fano plane problem [16]. To avoid the information loss caused by structural transformation, Gao introduced the multimodal concept [11], making each modal correspond to one weighted subhypergraph and training parameters for all subhypergraphs. However, this approach has too many parameters to be optimized, resulting in a high time cost for training. In 2019, Feng proposed the hypergraph Laplacian extension method, which optimized the hypergraph structure while reducing the time cost of machine learning [17]. However, this extension method merely extends the incidence matrix without considering the impact of the weight matrix on the model.
The segmentation method is mainly based on the hypergraph Laplacian matrix to solve the optimal hypergraph cut. Chen et al. applied L2 regularization to optimize the weight parameters of the hypergraph learning model [6, 9]. Chen and Luo used the alternating least squares (ALS) method [9] and the coordinate descent method [10] to optimize the weight parameters, respectively. Guo utilized the random matrix diffusion idea to optimize the hypergraph Laplacian optimal cut, but this was limited by a single hypergraph structure, needed to optimize an additional target function, and had too high a time cost [18]. After that, Zhang et al. raised a hypergraph inductive learning model using a category projection matrix to obtain the category labels of the samples [19]. Although the time complexity was brought down to a certain extent, not all dataset information was used in model training, so the classification performance was lowered slightly.
Hence, all existing hypergraph model solving methods have certain restrictions. The extension approach breaks the multivariate relations between objects. The segmentation approach does not break the limitation of a single hypergraph structure but only optimizes the model solving process, which easily brings a high time cost. The multimodal feature fusion method proposed in this paper fuses the abstract features of multiple modals by matrix extension and breaks the limitation of a single hypergraph structure. In model solving, the Laplacian matrix fusion method helps to reduce the time cost and further enhance the model performance.
3. Multimodal Feature Fusion Based Hypergraph Model
3.1. General Hypergraph Model
Let G = (V, E) denote a hypergraph, where V is the vertex set and E is the hyperedge set. Every hyperedge e ∈ E contains multiple vertexes v ∈ V, so the hypergraph can be expressed by an incidence matrix H of size |V| × |E|, as shown in Figure 1. In H, every element is defined as h(v, e) = 1 if v ∈ e and h(v, e) = 0 otherwise.

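As a concrete illustration, the incidence matrix defined above can be built from a list of hyperedges in a few lines. The function name and the toy hypergraph are illustrative choices, not from the paper:

```python
import numpy as np

def incidence_matrix(n_vertices, hyperedges):
    """Build the |V| x |E| incidence matrix H: h(v, e) = 1 if vertex v
    belongs to hyperedge e, and 0 otherwise."""
    H = np.zeros((n_vertices, len(hyperedges)))
    for j, edge in enumerate(hyperedges):
        for v in edge:
            H[v, j] = 1.0
    return H

# A toy hypergraph with 4 vertexes and 2 hyperedges: e1 = {v0, v1, v2}, e2 = {v2, v3}.
H = incidence_matrix(4, [{0, 1, 2}, {2, 3}])
```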
Every hyperedge e is given a positive weight w(e), and w(e) is calculated by the Gaussian kernel method [20], where d(u, v) represents the Euclidean distance between vertexes u and v, and σ represents the average of the distances between all vertexes.
The diagonal matrix W is defined, and the elements on its diagonal are the weights w(e) of every hyperedge, as shown in the following equation:
In the hypergraph, the degree of every vertex v is defined as d(v) = Σ_e w(e) h(v, e), as shown in the following equation:
The degree of a hyperedge e is defined as the number of vertexes included in this hyperedge, δ(e) = Σ_v h(v, e), as shown in the following equation:
Two diagonal matrices Dv and De are defined. Similar to equation (3), their diagonal elements are the degrees of every vertex and the degrees of every hyperedge in the hypergraph, respectively.
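The weight and degree definitions above can be sketched as follows. The Gaussian-kernel weight formula varies across implementations, so averaging exp(−d(u, v)²/σ²) over each hyperedge's vertex pairs is one plausible variant rather than necessarily the paper's exact form; the function names are illustrative:

```python
import numpy as np
from itertools import combinations

def hyperedge_weights(X, hyperedges):
    """Diagonal weight matrix W via a Gaussian kernel: sigma is the mean
    pairwise Euclidean distance over all vertexes, and each hyperedge's
    weight averages exp(-d(u, v)^2 / sigma^2) over its vertex pairs."""
    n = len(X)
    dists = [np.linalg.norm(X[u] - X[v]) for u, v in combinations(range(n), 2)]
    sigma = np.mean(dists)
    w = []
    for edge in hyperedges:
        pairs = list(combinations(sorted(edge), 2))
        w.append(np.mean([np.exp(-np.linalg.norm(X[u] - X[v]) ** 2 / sigma ** 2)
                          for u, v in pairs]))
    return np.diag(w)

def degree_matrices(H, W):
    """Dv holds vertex degrees d(v) = sum_e w(e) h(v, e); De holds
    hyperedge degrees delta(e) = number of vertexes in e."""
    Dv = np.diag(H @ np.diag(W))
    De = np.diag(H.sum(axis=0))
    return Dv, De
```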
The regularity optimization target function of the hypergraph learning model is shown in the following equation [6, 9], minimizing Ω(F) + λ R_emp(F) + μ Σ_e w(e)² over F and W, where λ and μ are regularization parameters that balance the items of equation (6). For a classification problem, F is the eventually solved tangent vector matrix, including the predicted sample category information, and Y is the vector containing the real sample labels: in case vertex v falls into the i-th category, then Y(v, i) = 1, and the other elements of that row are 0. R_emp(F) is the experience loss function, and Ω(F) is the standard loss function; see the definitions in equations (7) and (8).
In equation (8), L is the hypergraph standardized Laplacian matrix, defined in equation (9) as L = I − Dv^(−1/2) H W De^(−1) H^T Dv^(−1/2).
Equation (8) requires that, if two vertexes u and v are in the same hyperedge, their standardized labels f(u)/√d(u) and f(v)/√d(v) should be as similar as possible, so that the vertex labels can be predicted more accurately. If the hypergraph degenerates to an ordinary graph, then δ(e) degenerates to 2, and it is obtained from equation (10) that the hypergraph's Laplacian matrix equals the ordinary graph's Laplacian matrix up to a coefficient of 1/2, in which A represents the adjacency matrix of the ordinary graph.
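A minimal sketch of the normalized Laplacian of equation (9) (the function name is an illustrative choice):

```python
import numpy as np

def hypergraph_laplacian(H, W, Dv, De):
    """Normalized hypergraph Laplacian of equation (9):
    L = I - Dv^(-1/2) @ H @ W @ De^(-1) @ H.T @ Dv^(-1/2)."""
    Dv_isqrt = np.diag(1.0 / np.sqrt(np.diag(Dv)))
    De_inv = np.diag(1.0 / np.diag(De))
    return np.eye(H.shape[0]) - Dv_isqrt @ H @ W @ De_inv @ H.T @ Dv_isqrt
```

For a hypergraph in which every hyperedge contains exactly two vertexes with unit weights, this L equals one half of the ordinary normalized graph Laplacian I − D^(−1/2) A D^(−1/2), matching the degeneration noted above.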
In equation (6), the variables to be determined in the optimization target function are F and W. Because target function (6) is convex with respect to F and W separately, it is feasible to use the ALS method [9] and the coordinate descent method [10] to optimize F and W.
In the case of using ALS, W is first fixed; letting the partial derivative of the target function with respect to F be 0, F is optimized according to equation (11).
Second, F is fixed and W is optimized. By letting the partial derivative with respect to W be 0, W can be updated according to equation (12).
Third, the above two steps are repeated until F and W become stable.
In the case of using the coordinate descent method, in each iteration, two components w(e_i) and w(e_j) are selected from W for updating, and the update is conducted under the constraint conditions of (13) while keeping their sum fixed; see the updating process in (14). When W becomes stable, we finally obtain F, that is, the ultimate label vector, and the category of a vertex v corresponds to arg max_i F(v, i), that is, the subscript of the maximum.
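As a sketch of the F-step of the ALS procedure, assume the target function tr(FᵀLF) + λ‖F − Y‖² with W (hence L) fixed; setting the gradient with respect to F to zero gives a closed form. The W-step and the coordinate descent constraints are omitted, and the exact regularization layout is an assumption:

```python
import numpy as np

def solve_F(L, Y, lam):
    """With L fixed, the stationarity condition 2*L@F + 2*lam*(F - Y) = 0
    yields the closed-form update F = lam * (L + lam*I)^(-1) @ Y."""
    n = L.shape[0]
    return lam * np.linalg.solve(L + lam * np.eye(n), Y)
```

When L = 0 (no smoothness penalty), the update returns F = Y, as expected: the labels are reproduced exactly.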
3.2. Hypergraph Laplacian Matrix
The N classification problem solved by the hypergraph model is, in essence, an N-path hypergraph cut problem, while the solution to the N-path hypergraph cut is actually the N eigenvectors of the hypergraph Laplacian matrix [3]. Therefore, if the Laplacian matrix can describe more global and in-depth hypergraph structural information, the accuracy of the hypergraph cut will increase, and the corresponding classification performance will also improve. Zhou et al. first proposed the hypergraph learning model [3], adopted the Markov random walk idea, and analyzed the hypergraph Laplacian matrix from a probability perspective. The results indicate that the target function (6) of the hypergraph learning model is derived through the Markov random walk process. The Markov random walk of every vertex in the hypergraph follows this rule: given the current position u, first choose a hyperedge e from all hyperedges relevant to vertex u with a probability proportional to its weight w(e), and then uniformly at random choose a vertex v ∈ e. Equation (15) provides the probability of a hypergraph vertex's Markov random walk, p(u, v) = Σ_e w(e) (h(u, e)/d(u)) (h(v, e)/δ(e)), and equation (16) is the matrix expression of this calculation, P = Dv^(−1) H W De^(−1) H^T.
Let S denote a subset of the hypergraph vertex set V, and let S̄ be the complementary set of S; a cut of a hypergraph divides the hypergraph into two parts S and S̄. vol(S) represents the volume of the set S, and ∂S is the set of cut hyperedges, ∂S = {e ∈ E | e ∩ S ≠ ∅, e ∩ S̄ ≠ ∅}. Equations (17) and (18) define the calculation methods of vol(S) and vol(∂S).
Equations (19)–(20) prove that the stationary distribution of the Markov random walk is π(v) = d(v)/vol(V) [3].
It can be seen from equations (15)–(16) that the Markov random walk process conforms to the hypergraph's standard Laplacian operator in equation (9): if φ is an eigenvector of the transition matrix P, then Dv^(1/2) φ is an eigenvector of L. The Laplacian matrix can thus be deemed the expression of the hypergraph structure after a random walk. The target function of the hypergraph cut is (21), and equations (19)–(23) prove that the minimal cut of the Markov random walk makes the similarity of the edges connecting different clusters minimal (the vertex transition probability is minimal), while the vertex transition probability within the same cluster is maximal, gradually tending to a stably distributed partition. On account of the difficulty of directly solving (21), it is converted to the basic loss function (8) corresponding to (6). The solution F consists of the eigenvectors corresponding to the N minimal nonzero eigenvalues of the hypergraph Laplacian matrix. Thus, the Laplacian matrix is very critical to the solving of the hypergraph problem.
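The random walk quantities of equations (15)–(16) and the stationary distribution can be checked numerically. This is a sketch with illustrative function names; each row of P sums to 1, and π = d/vol(V) satisfies πP = π:

```python
import numpy as np

def transition_matrix(H, W, Dv, De):
    """Vertex-to-vertex Markov random walk of equation (16):
    P = Dv^(-1) @ H @ W @ De^(-1) @ H.T."""
    return (np.diag(1.0 / np.diag(Dv)) @ H @ W
            @ np.diag(1.0 / np.diag(De)) @ H.T)

def stationary_distribution(Dv):
    """pi(v) = d(v) / vol(V), the stationary distribution of the walk."""
    d = np.diag(Dv)
    return d / d.sum()
```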
3.3. Incidence Matrix Extension
According to equation (9), the hypergraph Laplacian matrix is determined by the hypergraph incidence matrix H and weight matrix W. Generally, in the construction of the hypergraph structure, the vectorized features of objects are first extracted, then similar vertexes are connected in the same hyperedge based on vertex similarity, and finally, the point-edge incidence matrix H and weight matrix W are used to represent the basic structure of the hypergraph model. The reason that hypergraphs can depict more information than ordinary graphs is that the matrices H and W express more relationships between vertexes and between vertexes and hyperedges.
According to equation (15), the probability of every vertex's random walk in the hypergraph is calculated based on the hypergraph incidence matrix. In this process, the hypergraph vertexes can fuse more neighborhood information to improve vertex classification accuracy. Since the Markov random walk is conducted on the incidence matrix, a reasonable incidence matrix extension not only makes the Markov random walk contain the vertexes' neighborhood information but also gives it the opportunity to reach farther vertexes and explore global information, finally obtaining a hypergraph Laplacian matrix that fuses more global information. After blending the different incidence matrices H together, two originally nonconnected vertexes are connected again by different short-path combinations, which achieves the effect of the "diffusion mapping" of the random walk matrix P [19]. Meanwhile, it becomes possible to discover geometry at different scales in the heterogeneous hypergraph space, and compared with the original space, the globality of the hypergraph geometry is also kept.
Because different feature extraction methods obtain different feature spaces with different corresponding hypergraph structures, the incidence matrix extension can be considered as the fusion of multimodal feature spaces. As shown in Figure 2, a single modal hypergraph incidence matrix Hi is built based on each feature extraction method, and then the matrices Hi are further blended to obtain the multimodal incidence matrix Hm. Hence, the hypergraph vertex probability transition formula can be rewritten as (24), with its matrix expression shown in equation (25).

It can be seen from equation (2) that the calculation of hyperedge weights is mainly affected by intervertex distances. If the sample feature extraction method or intersample similarity calculation method differs, the intervertex distances obtained will differ, so the weight matrices obtained from the different modals' incidence matrices will also differ. To fuse the hyperedge weight information, the weight matrices of the different modals are also blended to obtain the multimodal weight matrix Wm. As shown in Figure 2, Hm and Wm are substituted into (9) to obtain Lm. Lm is substituted into target function (6), and the gradient descent method is used to solve it. The test data in Table 3 show that the matrix extension method can improve the model classification performance to a certain extent. This indicates that the extended multimodal incidence matrix and weight matrix fuse more abstract features, and the corresponding Laplacian matrix contains richer information, so the hypergraph cut obtained is more accurate, with a better classification effect. However, as matrix extension results in rapid growth of the matrix dimension, Hm and Wm are expanded several times over the original matrices H and W, bringing a greater time cost. In the next section, a Laplacian matrix fusion method is proposed to solve this problem.
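One plausible realization of the extension, under the assumption that all modals share the same vertex set, concatenates the per-modal hyperedge sets: Hm is the horizontal concatenation of the per-modal incidence matrices, and Wm is the block-diagonal stack of the per-modal weight matrices. The paper's exact block layout is described in Figure 2, so this sketch and its function name are illustrative:

```python
import numpy as np

def extend_incidence(H_list, W_list):
    """Multimodal extension sketch: hyperedge sets of all modals are
    concatenated, so Hm = [H_1 | H_2 | ...] and Wm = blockdiag(W_1, W_2, ...)."""
    Hm = np.hstack(H_list)
    sizes = [W.shape[0] for W in W_list]
    Wm = np.zeros((sum(sizes), sum(sizes)))
    off = 0
    for W in W_list:
        k = W.shape[0]
        Wm[off:off + k, off:off + k] = W
        off += k
    return Hm, Wm
```

The extended matrices grow with the number of modals, which is exactly the dimensionality cost that motivates the Laplacian matrix fusion of the next section.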
3.4. Laplacian Matrix Fusion
Figure 3 describes the main process of Laplacian matrix fusion. Different modals of hypergraph structures correspond to different Laplacian matrices. On the basis of building every modal's hypergraph incidence matrix Hi and weight matrix Wi, we first figure out the corresponding Laplacian matrix Li under each modal and later perform a weighted sum denoted as Lf = Σ_{i=1}^{k} αi Li, where k represents the number of fused Laplacian matrices; then, the standardized loss function is rewritten with Lf in place of L.

In Figure 3, the matrix Li obtained from each modal hypergraph contains the information of a single modal incidence matrix and weight matrix, while the fused matrix Lf combines the matrices under all modals, containing more comprehensive and higher quality information. As in Section 3.3, the fused matrix is used with the gradient descent method to solve the target function (6) and train the ultimate model.
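The weighted superposition itself is a one-liner; equal fusion weights summing to 1 are assumed when none are supplied, since the paper does not specify the α values here:

```python
import numpy as np

def fuse_laplacians(L_list, alphas=None):
    """Weighted superposition Lf = sum_i alpha_i * L_i of per-modal
    Laplacian matrices; equal weights are used when none are given."""
    k = len(L_list)
    if alphas is None:
        alphas = np.full(k, 1.0 / k)
    return sum(a * L for a, L in zip(alphas, L_list))
```

Note that Lf keeps the |V| × |V| shape of a single modal Laplacian, which is what removes the dimensionality growth of the extended incidence matrix.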
4. Experiment and Analysis
4.1. Experimental Environment
Aiming at the classification problem, the experiments in this paper compared the hypergraph learning model trained with the proposed method against typical classification models, the ordinary graph model, and other hypergraph models. The experimental environment is as follows.
4.1.1. Hardware
The experiments were carried out on a GPU cluster with the following available resources: 96 CPU cores, 2 GeForce RTX 2080 Ti GPUs and 2 Tesla V100 SXM2 32 GB GPUs, 512 GB of memory, and 500 GB of disk space. The programming language is Python 3.7.
4.1.2. Parameters
The KNN method was used to build hyperedges [21]; that is to say, the vertexes included in a hyperedge are the central vertex and the K vertexes nearest to it. To ensure variable consistency, the regularization parameters λ and μ are set to 2; according to the best experimental result, the number of vertexes contained in a hyperedge is set to 25.
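The KNN hyperedge construction described above can be sketched as follows; the brute-force pairwise distance computation and the function name are for illustration only:

```python
import numpy as np

def knn_hyperedges(X, k):
    """Build one hyperedge per sample: the central vertex plus its k
    nearest neighbors by Euclidean distance (the KNN construction [21])."""
    n = len(X)
    # Full pairwise distance matrix (fine for small n; illustrative only).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    edges = []
    for i in range(n):
        nearest = np.argsort(D[i])[:k + 1]  # includes i itself (distance 0)
        edges.append(set(nearest.tolist()))
    return edges
```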
4.1.3. Datasets
To better verify the performance of the proposed model, four different fields of datasets were selected for the experiment, as shown in Table 1.
Cat & Dog. An image dataset sourced from the official Kaggle website [22], containing images of cats and dogs; it is often used for classification tasks.
Cifar 10. An image dataset, a typical computer vision dataset for object identification and classification [23], containing a total of 10 categories.
Ctrip. A text dataset using the hotel comment data of Ctrip in 2018 [24], in which every comment has been labeled with an emotional direction, such as positive or negative.
Spambase. Numerical data: the spam dataset provided by the official UCI website [25], mainly used to identify and classify spam.
4.1.4. Evaluation Metrics
In this paper, the evaluation indexes used are accuracy, precision, recall, and F1; see the calculation formulas in (28)–(30). TP represents the number of samples that are actually positive and predicted as positive, FP the number actually negative but predicted as positive, FN the number actually positive but predicted as negative, and TN the number actually negative and predicted as negative.
Accuracy refers to the ratio of correctly classified samples to total samples: Accuracy = (TP + TN)/(TP + TN + FP + FN).
Precision refers to the ratio of actually positive samples among the samples predicted as positive: Precision = TP/(TP + FP).
Recall refers to the ratio of correctly predicted positive samples to all actually positive samples: Recall = TP/(TP + FN).
F1 refers to the harmonic mean of precision and recall, which measures the robustness of the classification model: F1 = 2 × Precision × Recall/(Precision + Recall).
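The four indexes can be computed directly from the confusion counts (the function name is illustrative):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```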
4.2. Single Modal Hypergraph Model Performance
This section investigates the performance of single modal hypergraph models built based on different feature extraction methods and provides references for the experiments in Sections 4.3 and 4.4 about which modals should be selected for feature fusion. Table 2 compares the classification performance of single modal hypergraph models built based on different feature extraction methods on the image dataset Cat & Dog and the text dataset Ctrip.
On the Cat & Dog dataset, PHA, VGG, and ResNet represent extracting image features by the perceptual hash [26], VGG [27], and ResNet [28] methods. HSIFT and HVGG represent extracting image features by the SIFT [29] and VGG methods after the images are preprocessed with a color difference histogram. RVGG represents extracting image features by the VGG method after the images are Sobel-sharpened to enhance edge information [30]. After using the abovementioned methods to extract features, the Euclidean distance is applied to calculate sample similarity. SIFT represents using the number of matched key points to denote the similarity between samples after extracting the images' key point features with SIFT.
On the Ctrip dataset, TF-IDF, LSI, Word2Vec [31], and Doc2vec [32] represent using these methods to extract data features and using the Euclidean distance to calculate sample similarity. Jaccard represents using the Jaccard computing method to measure the similarity between text samples. After obtaining the sample similarities, the hyperedges and hypergraph structure are built according to the KNN method.
It is known from Table 2 that, on the Cat & Dog image dataset, the classification performance of the hypergraph model built based on the RVGG method is the best, and the model built based on the PHA method has the poorest quality. On the Ctrip text dataset, the model built based on the Doc2vec method is the best, and the model built based on the Jaccard method has the poorest quality. Thus, the single modal hypergraph models built with different feature extraction methods differ in classification ability. Therefore, it is necessary to find a suitable modal combination to perform incidence matrix extension and Laplacian matrix fusion for multimodal feature fusion hypergraphs.
4.3. Model Performance under Modal Combination
This section analyzes the impact of different modal combinations, incidence matrix extension, and Laplacian matrix fusion on the hypergraph model's classification performance on the text and image datasets, as shown in Table 3. In the table, "Poor + Poor/PHA + SIFT" indicates a new model obtained by combining the PHA and SIFT modals. According to Table 2, the single modal hypergraph models built by the PHA and SIFT methods have the poorest quality, so this belongs to a "poor + poor" combination approach. For each dataset, the second combination approach fuses the best quality modal with the poorest quality modal, and the third combination approach fuses the two best quality modals.
Comprehensively analyzing Tables 2 and 3, when the qualities of the two modals' hypergraph models are equivalent, the incidence matrix extension and Laplacian matrix fusion methods are beneficial for improving the classification performance of the new model. On the Cat & Dog dataset, Table 2 shows that the quality of the PHA and SIFT based single modal models is the poorest, and Table 3 shows that the performance of the PHA + SIFT combination is better than PHA or SIFT separately. Likewise, the quality of the RVGG and HVGG based single modal models is the best in Table 2, and the performance of the RVGG + HVGG combination in Table 3 is higher than RVGG or HVGG separately. However, when the two modals' model qualities differ too much, the incidence matrix extension and Laplacian matrix fusion no longer improve model performance. As shown in Table 3, the performance of the RVGG + PHA combination model is lower than RVGG, because the PHA based single modal model quality is the poorest, which hinders the performance of the fused new model. On the Ctrip dataset, the same conclusion holds as well.
According to the data in Table 3, it is also discovered that, after combining different modals, the performance of models receiving the double fusion of incidence matrix extension and Laplacian matrix fusion is better than that obtained by incidence matrix extension alone. Therefore, when choosing modal combinations, it is essential to choose single modal hypergraphs of higher and equivalent quality for the combination and then perform incidence matrix extension and Laplacian matrix fusion to train high-quality models.
4.4. Influence of Modal Quantity
This section investigates the influence of the fused modal quantity on hypergraph model performance, as shown in Figure 4. In the figure, the abscissa axis indicates the fused modal quantity, the data at the bottom of the figure reflect the model's classification indexes obtained after fusing multiple modals, and the four curves reflect the tendency of the classification indexes as the modal quantity increases. Based on the conclusion of Table 3, single modal combinations of higher and equivalent quality obtain higher classification performance, so 1-4 modals are selected for fusion, respectively, following a descending order of quality. On the Cat & Dog dataset, the 1-4 of the abscissa axis correspond to the four modal combinations RVGG, RVGG + HVGG, RVGG + HVGG + VGG, and RVGG + HVGG + VGG + ResNet, respectively. On the Ctrip dataset, the 1-4 of the abscissa axis correspond to the four modal combinations Doc2vec, Doc2vec + Word2vec, Doc2vec + Word2vec + LSI, and Doc2vec + Word2vec + LSI + TF-IDF. In Figure 4, the growth trend of the four curves shows that after the fused modal number is larger than 2, the model performance basically tends to be stable. Therefore, in the test of Table 4, all models fuse only the two optimal modals.

4.5. Time Cost
Table 4 compares the classification performance and time cost of the proposed model with the best quality single modal models on the four datasets. Considering that the Spambase dataset is numeric data and does not involve a feature extraction operation, the Euclidean distance and cosine distance are used to figure out the intersample similarity, respectively, to obtain two different single modal hypergraph models; then, the incidence matrix extension and Laplacian matrix fusion are applied, respectively, and compared with the classification performance of the single modal models. The experimental results in Table 4 indicate that the proposed multimodal feature fusion method can effectively promote the performance of the hypergraph learning model. Although solely using the incidence matrix extension method may result in a larger time cost due to the matrix dimension problem, applying the Laplacian matrix fusion method can effectively reduce the model's time cost to a level only slightly higher than that of the best quality single modal model. On the four datasets, Laplacian matrix fusion reduces the time cost by 44.2%, 44.6%, 30.3%, and 40.7% compared with the incidence matrix extension method, an average reduction of around 40%.
4.6. Comparison with Other Models
This section compares the classification performance of the proposed model with other typical machine learning models, such as KNN, SVM, the SVM evolutionary model, the ordinary graph model [34], hypergraph model CD, hypergraph model GD, and hypergraph model Feng. The SVM evolutionary model uses an evolutionary algorithm to search for the SVM variable values, while the variables of the SVM model are the default initial values of the software package. Hypergraph model CD refers to the hypergraph model obtained by solving the target function with the coordinate descent method, corresponding to equations (13)-(14). Hypergraph model GD refers to the hypergraph model obtained by solving the target function with the gradient descent method, corresponding to equations (11)-(12). Hypergraph model Feng refers to the model optimized by the incidence matrix extension proposed by Feng et al. [17]. For the Ctrip text dataset and the Cat & Dog image dataset, features are first extracted by the Word2vec and RVGG approaches, and then the comparison models are trained and compared with the proposed model (refer to Table 1 for the dataset division of the semisupervised models); for the classical classification models, 70% of the dataset is used as the training set.
It can be seen from Figure 5 that the proposed multimodal fused hypergraph learning model has better classification performance than typical models such as KNN, SVM, and the ordinary graph. Both hypergraph model CD and hypergraph model GD are single modal models built based on a single feature extraction method, with inferior performance to the proposed model. Although hypergraph model Feng optimizes the incidence matrix, it ignores the hyperedge weight information, so its effect is poorer than the proposed model's. From the comparison in Figure 5, the proposed multimodal feature fusion method, by virtue of matrix extension and Laplacian matrix fusion, breaks the limits of the single hypergraph structure, fuses multimodal abstract features, and promotes the performance of the hypergraph model.

5. Conclusion
Existing hypergraph learning models are all single modal models built based on one feature extraction method, with limited feature extraction and abstract expression ability. This paper proposed a multimodal feature fusion method, making the single modal hypergraph structures built based on different feature extraction methods fuse multimodal abstract features by extending the incidence matrix and its weight matrix. Then, by the Laplacian matrix fusion method, every modal's incidence matrix and weight matrix undergo the Laplacian matrix transformation, respectively, and are then weighted and accumulated for further model training. In this way, not only is the model's time cost reduced, but the model performance is also further improved. With the emergence of hypergraph neural network models, hypergraphs show efficient learning ability in the deep learning field. Therefore, future study will focus on the hypergraph neural network model and its optimization.
Data Availability
The data underlying the results presented in the study are available within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors acknowledge the National Natural Science Foundation of China (Grant nos. 61772356 and 61876117), Six Talent Peak Project of Jiangsu Province (XYDXX-084), Hong Kong Research Grant Council (GRF 11211519), CERNET Innovation Project (NGII20190314), China Postdoctoral Science Foundation (2020M671597), Jiangsu Postdoctoral Research Foundation (2020Z100), Suzhou Planning Project of Science and Technology (Grant nos. SNG2020073 and SS202023). And this work was also supported by Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.