Truncating Regular Vine Copula Based on Mutual Information: An Efficient Parsimonious Model for High-Dimensional Data

Alanazi, Fadhah Amer

doi:https://doi.org/10.1155/2021/4347957

Mathematical Problems in Engineering

Research Article | Open Access

Volume 2021 | Article ID 4347957 | https://doi.org/10.1155/2021/4347957

Fadhah Amer Alanazi¹

Academic Editor: Mohammad Amin Hariri-Ardebili

Received 06 Aug 2021

Revised 10 Sept 2021

Accepted 27 Sept 2021

Published 20 Oct 2021

Abstract

Built from (possibly different) bivariate copulas as simple building blocks for complex multivariate dependence patterns, vine copulas provide flexible multivariate models. However, they become increasingly difficult to handle as the dimension grows. Several attempts have been made to reduce model complexity by searching for a subclass of truncated vine copulas, in which only a limited number of vine trees are estimated. However, these methods are time-consuming, model-dependent, or require additional computational effort. Inspired by the relationship between copula parameters (and the corresponding Kendall's tau) and mutual information on the one hand, and between mutual information and copula entropy on the other, this study proposes a novel truncated vine copula model that uses only the mutual information values among variables. The proposed truncation method is evaluated in simulation studies and in a real application to a financial returns dataset. Both the simulated and real studies show that the method finds an appropriate truncation level while providing a good fit to the data.

1. Introduction

Copula models have become a popular statistical tool for dependence modeling. They gain their flexibility from (1) allowing the univariate marginal distribution functions to be modeled independently of the dependence pattern among variables and (2) modeling a wide range of complex nonlinear dependence patterns among variables. Formally, an $n$-dimensional copula is a multivariate distribution function on $[0,1]^n$ with standard uniform margins. Despite these advantages, copula models are known to be difficult to construct in high dimensions [1]. Based on the work of [2–4], vine copulas have become one of the most promising alternatives to standard multivariate copula models. In the vine copula model, also known as the pair-copula construction, an $n$-dimensional copula density is built from (conditional) bivariate copulas (pair-copulas) as building blocks, which need not belong to the same copula family. Therefore, vine copula models provide considerable flexibility for modeling a wide range of highly complicated dependence structures among variables.
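For illustration (the standard three-dimensional case, not an example from this study), the pair-copula construction writes a trivariate copula density as two unconditional pair-copulas in the first tree and one conditional pair-copula in the second tree, under the usual simplifying assumption that $c_{13;2}$ does not depend on the value of $u_2$:

$$
c_{123}(u_1,u_2,u_3) = c_{12}(u_1,u_2)\,c_{23}(u_2,u_3)\,
c_{13;2}\big(C_{1|2}(u_1\mid u_2),\,C_{3|2}(u_3\mid u_2)\big),
\qquad
C_{1|2}(u_1\mid u_2) = \frac{\partial C_{12}(u_1,u_2)}{\partial u_2},\quad
C_{3|2}(u_3\mid u_2) = \frac{\partial C_{23}(u_2,u_3)}{\partial u_2}.
$$

Each of the three factors may come from a different copula family, for example a t copula for $c_{12}$ and a Frank copula for $c_{13;2}$.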

Vine copula models have received considerable attention in a broad range of areas. For instance, in engineering, Amini et al. [5] demonstrated the importance of copula theory, using canonical (C-) and drawable (D-) vine copulas, for modeling nonlinear dependence patterns among variables in dam engineering. Torre et al. [6] proposed a general framework for uncertainty quantification (UQ) in which regular vine (R-vine) copula models, with various types of bivariate copulas, are used to handle complex non-elliptical dependence structures between variables. Their method was implemented with a MATLAB toolbox for vine copula inference.

In spatial applications, El Adlouni [7] applied a spatial quantile C-vine copula model to describe the spatial dependence among variables, illustrating the model's ability to estimate quantiles of spatial variables. Tian-Jian et al. [8] used vine copula models for the multivariate distribution of multiple soil parameters. Their study shows that the vine copula model captured the non-Gaussianity and diversity among these parameters better than the traditional multivariate normal distribution. Yu et al. [9] used C-, D-, and R-vine copulas to identify the key indicator of multivariate population risk related to water, and their results demonstrate the good performance of the model.

In finance, Syuhada and Hakim [10] applied a vine copula model to model risk dependence and to forecast portfolio value at risk. The authors concluded that the vine copula model provides better forecast accuracy and a well-diversified portfolio.

The computational complexity of vine copulas, however, increases dramatically with dimension: the higher the dimension, the more bivariate copulas need to be fitted, and hence the more parameters must be estimated. This difficulty motivates the search for subclasses of vine copula models that reduce model complexity yet still fit the data well. The number of estimated parameters can be reduced by either (1) truncation, in which only a limited number of vine trees are estimated, while conditional independence pair-copulas are used for all remaining trees, or (2) pruning, in which selected bivariate copulas (those with very weak dependence parameters, or those for which an independence test based on p-values does not reject independence) are replaced by conditional independence copulas. The first truncation method for vine copulas was proposed by [11], in which the truncation level is selected sequentially (tree-by-tree) using selection criteria such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), or the Vuong test. In their method, the value of the selection criterion at each tree measures the gain from adding that tree; if the gain is negligible, the procedure stops and the model is truncated at the current tree. This requires a complete specification of each vine tree (tree structure and identification and estimation of each bivariate copula). Moreover, the sequential nature of the method prevents an efficient exploration of the search space of truncation levels and may converge to a local rather than a global optimum [12]. Alanazi [13] combined the sequential truncation method of [11] with mixture R-vine copula models and showed that this results in a dramatic reduction in the computational complexity of the R-vine copula mixture model. Brechmann and Joe [12] proposed new truncation strategies based on fit indices. In their method, empirical partial correlation coefficients are computed from normalized observations of the original data, and the truncation level of the R-vine copula is selected separately from choosing the vine tree structure and identifying and estimating the bivariate copulas. The authors showed that their method for estimating the truncation level performs much better than that of [11]. However, for each tree, their approach requires selecting the best minimum spanning tree for the previous tree from among a large number of possible candidates. Furthermore, it requires the choice of control parameters, such as the number of best spanning trees and 1-neighbors to be searched at each step to generate new candidate solutions, which can be computationally demanding: as the dimension increases, the method requires additional computation.
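To make the parameter savings concrete, the sketch below counts pair-copulas. An $n$-dimensional R-vine has $n-1$ trees and tree $j$ has $n-j$ edges, so the full model has $n(n-1)/2$ pair-copulas, while truncation after tree $k$ estimates only those in the first $k$ trees. The helper name `pair_copula_counts` is hypothetical, and each counted pair-copula may carry one or two parameters depending on its family.

```python
def pair_copula_counts(n: int, k: int) -> tuple[int, int]:
    """Return (estimated, set_to_independence) pair-copula counts for an
    n-dimensional R-vine truncated after tree k (hypothetical helper)."""
    if not 1 <= k <= n - 1:
        raise ValueError("truncation level k must satisfy 1 <= k <= n - 1")
    total = n * (n - 1) // 2                          # one pair-copula per edge, all n - 1 trees
    estimated = sum(n - j for j in range(1, k + 1))   # tree j has n - j edges
    return estimated, total - estimated


if __name__ == "__main__":
    # 15 variables, truncated after tree 3
    print(pair_copula_counts(15, 3))   # (39, 66)
```

With 15 variables, as in the DAX application of Section 6, truncation after tree 3 leaves 66 pair-copulas set to independence, which matches the count reported there.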

Existing truncation methods for vine copulas are thus either model-dependent or time-consuming and computationally demanding. Finding an alternative subclass of vine copulas that requires less computation while still providing a good fit to the data therefore remains an open question.

Recently, there has been growing interest in applying mutual information (MI) to vine copula models. The relationship between copula parameters and MI for some copula families has been studied by [14, 15]. Ni et al. [16] developed a method for hydrological dependence that selects R-vine structures using MI computed from the original data rather than from copula data (transformed observations). They concluded that the MI-based method satisfactorily modeled various types of dependence among variables and, in comparison with the Kendall's tau-based method, provided more information about the variables. For truncation, they applied the approach of [11] using AIC. Sharma and Habib [17] studied the dependence between Indian stocks at high frequency using an MI-based approach and found that it models nonlinear dependence better than the traditional correlation-based method. Sharma and Sahni [18] built an R-vine copula model based on mutual information for high-frequency stock log returns and used it to estimate the value at risk (VaR) of a stock portfolio. For comparison, a similar model was built using Kendall's tau. The authors showed that the VaR predictions of the MI-based method outperform those of the Kendall's tau-based method and that the success rate of MI is higher than that of Kendall's tau.

In this study, inspired by [14, 15] and building on [16], we develop a novel, flexible, model-independent truncation approach to the problem of selecting the truncation level of an R-vine copula model. The proposed method is based solely on the mutual information values among variables and hence does not require the selection and estimation of pair-copulas. Moreover, its computation is feasible and fast and is not strongly affected by the data dimension (it is expected to be applicable in any dimension $n$). This study therefore contributes a highly flexible, model-independent truncation method that dramatically reduces the computational burden for high-dimensional datasets while still providing a good fit to the underlying data.

The remainder of this study is organised as follows. Section 2 introduces mutual information and its relationship with copula entropy, and discusses the relationship between mutual information values and copula parameters (and the corresponding Kendall's tau). Vine copulas are introduced and defined in Section 3. In Section 4, we propose the new truncation method based on mutual information values. The proposed approach is then evaluated in a simulation study in Section 5 and in a real application to a financial returns dataset in Section 6. Finally, concluding remarks are provided in Section 7.

2. Mutual Information and Copula Entropy

This section mainly discusses the association between mutual information and copula entropy. An $n$-dimensional copula is a multivariate distribution function on $[0,1]^n$ with standard uniform margins. The backbone of copula models is Sklar's theorem [19], which states that every multivariate distribution function can be expressed through a copula: let $\mathbf{X} = (X_1,\dots,X_n)$ be a random vector with joint cumulative distribution function $F$ and marginal distribution functions $F_1,\dots,F_n$; then there exists a copula function $C$ such that

$$F(x_1,\dots,x_n) = C\big(F_1(x_1),\dots,F_n(x_n)\big).$$

If the marginal distribution functions are continuous, then $C$ is unique. Interested readers are referred to [20] for more details on copulas; their use in finance and economic time series is reviewed in [21]. Like pairwise Kendall's $\tau$, Spearman's $\rho_S$, and the correlation matrix, mutual information measures the association between random variables. In addition, conditional mutual information (CMI) measures the relationship between two random variables given other variable(s) [22]. Moreover, MI is directly related to the copula [23, 24], and this relationship can be used to measure the uncertainty of dependence [16].
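Sklar's theorem suggests separating margins from dependence by mapping each variable to the unit interval with its distribution function. When the margins are unknown, a common choice is to use rescaled ranks (pseudo-observations), which is what "rank data (copula data)" refers to in the methodology. The function below is a minimal, hypothetical sketch of that transform; it does not include the GARCH filtering applied to the returns in Section 6.

```python
from typing import Sequence


def pseudo_observations(x: Sequence[float]) -> list[float]:
    """Rescaled ranks u_i = rank(x_i) / (N + 1), with average ranks for ties.

    Replaces the unknown margin F by its empirical version, so the result
    lies strictly inside (0, 1) and can serve as copula data.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # extend over a run of tied values
        while end + 1 < n and x[order[end + 1]] == x[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1        # 1-based average rank of the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    return [r / (n + 1) for r in ranks]


if __name__ == "__main__":
    print(pseudo_observations([3.0, 1.0, 2.0]))   # [0.75, 0.25, 0.5]
```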

2.1. Theoretical Relationship between MI and CE

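Ma and Sun [23] showed that MI is equivalent to negative copula entropy (CE): for a random vector $\mathbf{X} = (X_1,\dots,X_n)$ with copula density $c$,

$$H_c(\mathbf{u}) = -\int_{[0,1]^n} c(\mathbf{u})\,\log c(\mathbf{u})\,d\mathbf{u}, \qquad I(\mathbf{X}) = -H_c(\mathbf{u}).$$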

The relationship between the copula and CMI was discussed in [16] and is as follows.

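Let $D_{KL}$ denote the Kullback–Leibler divergence. For two variables $X_i$ and $X_j$ conditioned on a set of random variables $\mathbf{X}_D$, the CMI can be written as

$$I(X_i;X_j\mid\mathbf{X}_D) = \mathbb{E}_{\mathbf{X}_D}\Big[D_{KL}\big(f_{ij\mid D}\,\big\|\,f_{i\mid D}\,f_{j\mid D}\big)\Big],$$

where $f_{ij\mid D}$, $f_{i\mid D}$, and $f_{j\mid D}$ are the conditional joint and marginal densities given $\mathbf{X}_D$.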

Ni et al. [16] showed that CMI can be expressed as a weighted average of the negative conditional CE.
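Written out in a notation of our own (assumed here rather than taken from [16]), the identity follows from factorizing the conditional density as $f_{ij\mid D} = c_{ij;D}\big(F_{i\mid D}, F_{j\mid D}\mid \mathbf{x}_D\big)\,f_{i\mid D}\,f_{j\mid D}$, where $c_{ij;D}(\cdot,\cdot\mid\mathbf{x}_D)$ is the conditional copula density of $(X_i,X_j)$ given $\mathbf{X}_D=\mathbf{x}_D$ and $f_D$ is the density of $\mathbf{X}_D$:

$$
I(X_i;X_j\mid\mathbf{X}_D) = \int f_D(\mathbf{x}_D)\,\big[-H_c(\mathbf{x}_D)\big]\,d\mathbf{x}_D,
\qquad
H_c(\mathbf{x}_D) = -\int_{[0,1]^2} c_{ij;D}(u,v\mid\mathbf{x}_D)\,\log c_{ij;D}(u,v\mid\mathbf{x}_D)\,du\,dv.
$$

Under the simplifying assumption that $c_{ij;D}$ does not depend on $\mathbf{x}_D$, the weights integrate to one and the CMI reduces to the negative entropy of a single pair-copula, which is why CMI values can be compared with the MI values of unconditional bivariate copulas.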

3. Vine Copula

Mutual information has recently received increasing interest in vine copula models. The appeal of vine copulas lies mainly in their flexibility to model various bivariate dependencies among variables: vine copula models build a hierarchy of levels using bivariate copulas as building blocks, and they require (possibly different) bivariate copulas to be selected and estimated. Vine copulas were first introduced by [4]. Later, Bedford and Cooke [2, 3] introduced the so-called regular vine (R-vine), a graphical model consisting of a linked sequence of acyclic graphs known as trees. Each tree contains a set of nodes (the variables) connected by a set of edges (each of which defines a (conditional) pair-copula). Formally, the vine copula model can be defined as follows:

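Definition 1 (Regular vine (R-vine)). Let $N_1 = \{1,\dots,n\}$ be a set of nodes connected by a set of edges $E_1$, and let $\mathcal{V} = (T_1,\dots,T_{n-1})$ be a linked sequence of trees with $T_i = (N_i, E_i)$. Then $\mathcal{V}$ is an R-vine on $n$ elements if
(i) $T_1$ is a tree with node set $N_1 = \{1,\dots,n\}$ and edge set $E_1$;
(ii) for $i = 2,\dots,n-1$, $T_i$ is a tree with node set $N_i = E_{i-1}$ and edge set $E_i$;
(iii) for $i = 2,\dots,n-1$, if two nodes $a = \{a_1,a_2\}$ and $b = \{b_1,b_2\}$ in $N_i$ are connected by an edge, then exactly one of the $a_k$ equals one of the $b_k$. This is known as the proximity condition.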

Hence, an $n$-dimensional R-vine consists of $n(n-1)/2$ edges, each associated with a bivariate copula, known as a pair-copula, that models the dependence between the two (conditional) variables it connects. Assigning a pair-copula to every edge of $\mathcal{V}$ yields an R-vine copula. Building a vine copula model involves three steps: (1) constructing an appropriate tree structure (pairing the variables), (2) identifying the best-fitting copula family for each pair of variables, and (3) estimating the pair-copula parameters. Steps (1) and (2) make constructing an R-vine copula model challenging: different vine structures and choices of pair-copulas yield different statistical models, and the choice in (1) affects the selection of the bivariate copula families. Morales Napoles et al. [25] showed that there are $\frac{n!}{2}\cdot 2^{\binom{n-2}{2}}$ different $n$-dimensional R-vines, a huge number of possible R-vine tree structures. Fitting all these R-vine structures with their bivariate copula families, estimating their parameters, and selecting the most appropriate one is therefore infeasible in practice. For this reason, several approaches have been proposed in the literature. The most widely used approach builds the vine tree structure tree-by-tree using different criteria as edge weights; see, for example, [16, 26]. Besides selecting the best R-vine structure, there has been growing interest in reducing the number of model parameters, which grows dramatically with dimension, by constructing a subclass of R-vines that is more parsimonious but still fits the data of interest well. This subclass is known as the truncated R-vine copula, in which conditional independence copulas are assigned to all pairs in the trees beyond a chosen level. Hence, only a limited number of R-vine levels are estimated.
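The growth of the number of possible structures is easy to check numerically. The sketch below simply evaluates the count $\frac{n!}{2}\cdot 2^{\binom{n-2}{2}}$ of [25]; the function name `number_of_rvines` is hypothetical.

```python
from math import comb, factorial


def number_of_rvines(n: int) -> int:
    """Count the distinct regular vines on n labelled variables: n!/2 * 2**C(n-2, 2)."""
    if n < 2:
        raise ValueError("an R-vine needs at least two variables")
    return factorial(n) // 2 * 2 ** comb(n - 2, 2)


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 15):
        print(n, number_of_rvines(n))
```

For $n = 5$ this gives 480 structures, and for 15 variables the count already exceeds $10^{35}$, before any choice of pair-copula families is made.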

4. Truncation and Mutual Information

Truncated R-vine copulas aim to make R-vine copula models more tractable by reducing the number of model parameters. Brechmann et al. [11] and Brechmann and Joe [12] sought to capture the strongest dependencies between random variables in the first few trees of the R-vine copula model, so that all remaining trees can be fitted with independence pair-copulas, as the remaining dependence among variables is weak. However, selecting the best truncation level without ignoring the information in the remaining trees of the R-vine copula is still an open question. This section introduces the relationship between copula parameters (and the corresponding Kendall's tau) and MI, which leads to a truncation criterion based on mutual information. Blumentritt and Schmid [14] estimated MI values for the Frank and Clayton copulas using Monte Carlo simulations, and Ghalibaf [15] also estimated MI values for several copula functions. Both studies show that MI values are very small for small copula parameters (and small Kendall's tau): the closer the copula is to the independence case, the smaller the MI. As shown by [14, 15], MI values for weakly dependent copulas range roughly from 0 (independence copula) to 0.10 (Kendall's tau between 0 and 0.25). Although MI values differ across copula families, they remain very small when the dependence is close to independence. Building on [14–16], the present study rests on two facts: (1) strong dependencies are captured in the first R-vine levels (using MI/CMI as edge weights), and (2) very small MI/CMI values indicate independence or very weak copula dependence parameters. Hence, the proposed method identifies the most appropriate truncated R-vine copula model based only on MI/CMI values, so that (1) no R-vine tree is ignored unless its contribution to the model is very small; (2) the method provides a straightforward way to identify any strong dependencies that may lie beyond the truncation level, so all strong dependencies are taken into account; and (3) selecting the truncation level does not require selecting and estimating pair-copulas, so it is fast and requires less computation.
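For the Gaussian copula the link between Kendall's tau and MI has a closed form, $\rho = \sin(\pi\tau/2)$ and $I = -\tfrac{1}{2}\log(1-\rho^2)$, so the behaviour described above can be checked directly. The sketch below covers this family only (other families, such as those studied in [14, 15], have no comparably simple expression) and uses the natural logarithm, so values are in nats; the absolute MI scale depends on the logarithm base, which should match the estimator used. The function name is hypothetical.

```python
import math


def gaussian_copula_mi_from_tau(tau: float) -> float:
    """Mutual information (nats) of a bivariate Gaussian copula with Kendall's tau.

    Uses rho = sin(pi * tau / 2) and MI = -0.5 * log(1 - rho**2).
    """
    if not -1.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between -1 and 1")
    rho = math.sin(math.pi * tau / 2.0)
    return 0.5 * math.log(1.0 / (1.0 - rho * rho))


if __name__ == "__main__":
    for tau in (0.0, 0.05, 0.10, 0.15, 0.20, 0.25):
        print(f"tau = {tau:.2f} -> MI = {gaussian_copula_mi_from_tau(tau):.4f} nats")
```

For $\tau = 0.25$ it returns about 0.079 nats, inside the 0 to 0.10 range quoted above for weak dependence.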

4.1. Methodology

This section introduces a novel truncated vine copula model. The truncation method allows the regular vine copula model to be truncated after the tree structure is constructed and before the pair-copulas are selected and estimated. This adds considerable flexibility compared with existing methods, as choosing the truncation level requires no bivariate copula parameters to be estimated, only the R-vine structure selected using mutual information. The method consists of three steps. First, the vine tree structure is constructed based on mutual information. Second, the truncation level is selected based on the strength of the mutual information values. Third, the vine copula model is fitted with the selected truncation level; at this step, the bivariate copulas are selected and estimated for each pair of variables in the retained trees.

To end this section, we summarize the steps of the new method as follows:
(i) Calculate the mutual information between all pairs of variables of the original data and select the maximum spanning tree (Tree 1).
(ii) For Trees 2 to n−1, calculate the conditional mutual information for each pair of variables allowed by the proximity condition of the vine copula model, and construct each tree as a maximum spanning tree.
(iii) Construct two matrices: one for the R-vine tree structure and one for the MI/CMI values arranged according to the first matrix. In this step, we adopt the matrix notation of [27].
(iv) Determine the truncation level of the model based on the strength of the dependence (MI/CMI values) among the variables.
(v) Fit the truncated vine copula model to the rank data (copula data) using the truncation level specified in the previous step.
(vi) Select bivariate copulas for each pair of variables and estimate the model parameters.
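Steps (i) and (ii) rely on maximum spanning trees. The sketch below implements Prim's algorithm on a complete graph whose edge weights are pairwise MI values, which corresponds to Tree 1 in step (i); the function name and the example MI values are hypothetical. For Trees 2 to n−1, the same routine could, as an assumption on our part, be run on the CMI values of the candidate edges that satisfy the proximity condition, giving disallowed pairs the weight `float("-inf")`.

```python
from typing import Sequence


def maximum_spanning_tree(weights: Sequence[Sequence[float]]) -> list[tuple[int, int, float]]:
    """Maximum-weight spanning tree (Prim's algorithm) of a weighted complete graph.

    weights[i][j] is the symmetric edge weight (here, the MI between
    variables i and j). Returns n - 1 edges as (i, j, weight) with i < j.
    """
    n = len(weights)
    in_tree = [False] * n
    best = [float("-inf")] * n   # strongest known link from each node to the tree
    parent = [-1] * n            # tree node providing that link
    best[0] = 0.0                # start the tree from variable 0
    edges = []
    for _ in range(n):
        # add the outside node with the strongest link to the current tree
        v = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        if parent[v] >= 0:
            a, b = sorted((parent[v], v))
            edges.append((a, b, weights[a][b]))
        # update the strongest links of the remaining nodes
        for u in range(n):
            if not in_tree[u] and weights[v][u] > best[u]:
                best[u] = weights[v][u]
                parent[u] = v
    return edges


if __name__ == "__main__":
    # Hypothetical pairwise MI matrix for four variables (not data from the study).
    mi = [
        [0.00, 0.36, 0.05, 0.10],
        [0.36, 0.00, 0.20, 0.02],
        [0.05, 0.20, 0.00, 0.30],
        [0.10, 0.02, 0.30, 0.00],
    ]
    print(maximum_spanning_tree(mi))   # [(0, 1, 0.36), (1, 2, 0.2), (2, 3, 0.3)]
```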

5. Simulation Study

5.1. Bivariate Simulation Study: Relationship between Copula Dependency Parameters and Mutual Information

This simulation study aims to approximate the range of mutual information values for weak copula dependence parameters. To this end, this section shows the relationship between weak copula dependence parameters and mutual information values for commonly used copulas, namely the Frank, Clayton, Joe, Gumbel, and Gaussian copulas. The properties of these copulas are summarized in Table 1. As the Student t copula gives the same result as the Gaussian copula, it is not considered here. For more details, interested readers are referred to [20, 28, 29].

For this simulation study, we considered a different range of weak dependence parameters for each copula. Samples of size 500 and 1000 were simulated, each repeated 500 times. The results for each copula are given in Table 2. Both sample sizes give the same relationship between mutual information and the copula parameters, so only the results for sample size 1000 are shown. As expected, the rotated versions of the copulas give the same MI values, so their results are omitted. Table 2 shows that the MI values corresponding to very weak copula parameters (and the corresponding Kendall's $\tau$) range from about 0 to 0.11.
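A minimal Monte Carlo sketch of this kind of experiment is given below, restricted to the Gaussian copula and assuming an equal-frequency discretization with a plug-in MI estimator. The estimator, number of bins, replications, and seed are our illustrative choices, not those of the study (which uses 500 replications), and all function names are hypothetical. Discretization loses some information and the plug-in estimator is biased upward in finite samples, so the averages only approximate the closed-form Gaussian-copula MI; they are meant to show how small MI is for weak dependence.

```python
import math
import random
from collections import Counter


def equal_freq_bins(x, n_bins):
    """Label each observation with its equal-frequency bin (0 .. n_bins - 1) by rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    labels = [0] * len(x)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(x)
    return labels


def plugin_mi(a, b):
    """Empirical (plug-in) mutual information, in nats, of two discrete label lists."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum(c / n * math.log(c * n / (pa[i] * pb[j])) for (i, j), c in joint.items())


def mean_mi_gaussian_copula(tau, n_obs=1000, reps=50, n_bins=5, seed=1):
    """Average plug-in MI over `reps` Gaussian-copula samples with Kendall's tau."""
    rng = random.Random(seed)
    rho = math.sin(math.pi * tau / 2.0)          # Gaussian copula: tau -> rho
    total = 0.0
    for _ in range(reps):
        z1 = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        z2 = [rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for a in z1]
        # MI depends only on the copula, so ranks of the normal scores suffice
        total += plugin_mi(equal_freq_bins(z1, n_bins), equal_freq_bins(z2, n_bins))
    return total / reps


if __name__ == "__main__":
    for tau in (0.0, 0.05, 0.10, 0.15, 0.20, 0.25):
        print(f"tau = {tau:.2f}: mean MI estimate = {mean_mi_gaussian_copula(tau):.4f}")
```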

5.2. Simulation Study of Vine Copula

In this section, we test the performance of the proposed method in a simulation study. A 5-dimensional R-vine copula model is constructed in which all pairs at level 2 are simulated from conditional bivariate copulas with very weak dependence, and all pair-copulas at levels 3 and 4 are set to conditional independence copulas. Hence, the simulated R-vine copula model can be truncated at level 1. The tree structure, pair-copula families, and dependence parameters are stored in three separate matrices. Samples of size 2000 are simulated, and the simulation is repeated 200 times.

Having specified the simulated R-vine copula model, we apply the proposed method to the simulated data. The results are summarized in the mutual information matrix, which stores the MI and CMI values.

The mutual information matrix contains the (conditional) bivariate MI values among the variables of the simulated data, arranged according to the R-vine tree structure matrix. The MI values in the first tree are higher than those in all other trees, reflecting the strong association between variables at the first level, whereas the CMI values at levels 2, 3, and 4 are very small and lie within the range associated with very weak or independent bivariate copulas. In other words, the CMI values at these levels lie between 0.012 and 0.06, which indicates minimal or no association among these variables. Hence, the R-vine model can be truncated at level 1, as intended in the simulation design, which illustrates the performance of the newly proposed method.
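The truncation level here is read off the MI matrix by inspection. One simple way to automate that reading is sketched below, assuming a threshold of 0.1 on the (conditional) MI, chosen from the approximate weak-dependence range reported in Section 5.1; both the rule and the threshold are our illustrative assumptions, not part of the study, and the example values only mimic the pattern described above. The rule keeps a tree whenever any of its edges exceeds the threshold, which is more conservative than judging by the bulk of the values.

```python
from typing import Sequence


def select_truncation_level(mi_by_tree: Sequence[Sequence[float]], threshold: float = 0.1):
    """Hypothetical MI-based truncation rule.

    mi_by_tree[t] holds the MI/CMI edge weights of tree t + 1. The model is
    truncated after the last tree containing a value above `threshold`;
    weaker edges inside the retained trees are returned as pruning candidates.
    Returns (truncation_level, [(tree, edge_index), ...]).
    """
    level = 0
    for tree, values in enumerate(mi_by_tree, start=1):
        if any(v > threshold for v in values):
            level = tree
    prune = [
        (tree, e)
        for tree, values in enumerate(mi_by_tree[:level], start=1)
        for e, v in enumerate(values)
        if v <= threshold
    ]
    return level, prune


if __name__ == "__main__":
    # Hypothetical 5-dimensional pattern: strong first tree, weak later trees.
    example = [
        [0.45, 0.38, 0.52, 0.41],   # tree 1
        [0.06, 0.03, 0.05],         # tree 2
        [0.012, 0.02],              # tree 3
        [0.015],                    # tree 4
    ]
    print(select_truncation_level(example))   # (1, [])
```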

6. Real Data Application

In this section, we evaluate the performance of the proposed approach on real data. We consider the daily log returns of major German companies included in the DAX index. The time series run from January 2005 to August 2009, with 1158 observations. A GARCH(1, 1) model with Student t innovations has been applied to each univariate time series. The data are available in the R package [29] as "daxreturn." For comparison, we also evaluate our approach against the sequential truncation method of [11], as it is widely used in the literature [13, 16]. We chose this dataset because vine copulas are commonly applied in finance, and many financial datasets are high-dimensional, so a truncation method is strongly needed to reduce computational complexity. As the first step in building the R-vine copula model, we follow [16] to construct the R-vine tree structure. For the first tree, the MI value of each pair of variables is computed, and the maximum spanning tree is selected. From the second tree onward, the bivariate CMI values are calculated for each pair of variables and, as for the first tree, maximum spanning trees are selected (for MI and CMI, we use the R package "infotheo" [30]). Edge weights are rounded to one decimal place. The first four trees of the R-vine tree structure are shown in Figure 1, while the full R-vine structure is given in the R-vine structure matrix.
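The study computes MI and CMI with the "infotheo" R package. For readers who want to see what an empirical CMI estimate involves, the sketch below is an illustrative Python analogue, not the infotheo implementation: it assumes the data have already been discretized (for example into equal-frequency bins) and uses the identity $I(X;Y\mid Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)$ with plug-in entropies. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(*columns: Sequence[Hashable]) -> float:
    """Empirical joint entropy, in nats, of one or more aligned discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum(c / n * math.log(c / n) for c in counts.values())


def conditional_mi(x: Sequence[Hashable], y: Sequence[Hashable], z: Sequence[Hashable]) -> float:
    """Plug-in I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z).

    Each element of z may be a tuple, so several conditioning variables can be
    combined into one discrete label (e.g. z = list(zip(z1, z2))).
    """
    return entropy(x, z) + entropy(y, z) - entropy(z) - entropy(x, y, z)


if __name__ == "__main__":
    x = [0, 0, 1, 1, 0, 1, 1, 0]
    z = [0, 0, 0, 0, 1, 1, 1, 1]
    print(conditional_mi(x, x, z))   # equals H(X | Z), about log(2) = 0.693 here
```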

The MI and CMI values are then stored in a lower triangular matrix, following the matrix notation of R-vine copulas [27]: one matrix represents the R-vine tree structure, while the MIdax matrix stores the MI and CMI values between variables according to that structure. The highlighted values in the two matrices read as follows: the MI between variables 1 and 11 is 0.36, while the CMI between variables 6 and 11 given variable 1 is 0.21. The MIdax matrix shows weak dependence values in the second tree; however, they are still above the MI range associated with weak copula dependence parameters. From level 3, the association becomes even weaker, and the CMI values at these levels are very close to the range associated with weak copula dependence parameters; that is, they are almost within the truncation range, except for some pairs whose CMI values exceed it. Hence, the MIdax matrix suggests truncating the model at level 3, with pruning for certain pairs of variables at level 1 and six conditional pairs at level 3, as their MI and CMI values are very close to the range of weak or independent structures. Thus, conditional independence copulas are specified for levels 4 to 14. In addition, the MIdax matrix facilitates pruning: weak associations between pairs of variables can be quickly identified from the matrix, even within the retained levels. Using the mutual information matrix thus provides a flexible way to apply both pruning and truncation and hence reduce model complexity.
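Reading such a matrix by hand is error-prone, so the sketch below decodes a lower-triangular R-vine matrix into its edges. It assumes the lower-triangular convention commonly used for R-vine matrices (as we understand it, also the layout of the VineCopula package): for column k and a row i below the diagonal, the entry pairs the diagonal element of column k with the entry in row i, conditioned on the entries below it in the same column, and the bottom row corresponds to Tree 1. The function name and the 5-dimensional example matrix are illustrative only. Assuming the MI matrix shares the same layout, its entry at the same position holds the MI/CMI value of the decoded edge.

```python
from typing import Sequence


def rvine_edges(matrix: Sequence[Sequence[int]]) -> list[tuple[int, int, int, tuple[int, ...]]]:
    """Decode a lower-triangular R-vine matrix into (tree, a, b, conditioning set).

    Assumed convention (0-based indices): for column k and row i > k, the edge
    joins matrix[k][k] and matrix[i][k], conditioned on the entries below row i
    in column k, and belongs to tree n - i (the bottom row is tree 1).
    """
    n = len(matrix)
    edges = []
    for k in range(n - 1):
        for i in range(k + 1, n):
            cond = tuple(matrix[r][k] for r in range(i + 1, n))
            edges.append((n - i, matrix[k][k], matrix[i][k], cond))
    return sorted(edges, key=lambda edge: edge[0])   # stable sort: grouped by tree


if __name__ == "__main__":
    # Illustrative 5-dimensional structure matrix (rows listed top to bottom).
    m = [
        [5, 0, 0, 0, 0],
        [2, 2, 0, 0, 0],
        [3, 3, 3, 0, 0],
        [1, 4, 4, 4, 0],
        [4, 1, 1, 1, 1],
    ]
    for tree, a, b, cond in rvine_edges(m):
        given = f" | {', '.join(map(str, cond))}" if cond else ""
        print(f"Tree {tree}: {a}, {b}{given}")
```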

After the vine tree structure is constructed, a set of candidate copula families is fitted to each pair of variables, and the best-fitting copula for each pair is selected based on AIC (computed with the R package [29]):

$$\mathrm{AIC} = -2\sum_{t=1}^{N}\log c\big(\mathbf{u}_t;\hat{\boldsymbol{\theta}}\big) + 2P,$$

where $\hat{\boldsymbol{\theta}}$ denotes the estimated parameters, $N$ is the number of observations, and $P$ is the number of model parameters.
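Because the vine log-likelihood is the sum of the log-likelihood contributions of its pair-copulas, AIC for a (truncated) vine can be assembled from per-pair results. The sketch below assumes a list of hypothetical (log-likelihood, number of parameters) pairs, one per estimated pair-copula; independence copulas contribute zero to both terms and can be left out. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def vine_aic(pair_fits: Iterable[Tuple[float, int]]) -> float:
    """AIC = -2 * log-likelihood + 2 * P for a vine copula.

    pair_fits yields (log-likelihood, number of parameters) for each
    estimated pair-copula; their sums give the vine totals.
    """
    loglik, n_par = 0.0, 0
    for ll, p in pair_fits:
        loglik += ll
        n_par += p
    return -2.0 * loglik + 2.0 * n_par


if __name__ == "__main__":
    # Hypothetical fits: a t copula (2 parameters) and two one-parameter copulas.
    fits = [(120.5, 2), (80.0, 1), (3.2, 1)]
    print(round(vine_aic(fits), 1))   # -399.4
```

In this form, adding a one-parameter pair-copula lowers AIC only if it raises the log-likelihood by more than 1.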

Table 3 shows no significant differences in log-likelihood and AIC between the model truncated at level 3, the full model, and the model truncated at level 6 by the method of [11]. Therefore, levels 4 to 14 bring no notable gain: the 66 bivariate copulas at these levels add only 262.469 to the log-likelihood, which is small given the large number of pair-copulas estimated at these levels. That is, the strong dependence is captured in the first three trees. Even though AIC selects the full R-vine copula model, AIC tends to choose models with more parameters [31]. Moreover, the R-vine copula truncated at level 6 is very close to the full model, meaning that levels 7 to 14 contribute very little. Our proposed method therefore yields an even more parsimonious model, which adds extra flexibility.

Listing 1 shows the first six levels of the R-vine copula model estimated for the daxreturn data. The results show that the dependence structures at levels (trees) 4 to 6 range from very weak to independent. In addition, the results support pruning, based on MI and CMI, for some pairs at level 1 and at level 3, where ";" denotes conditioning as mentioned above. Therefore, truncation at level 3 is well supported, and the new approach ignores no strong dependencies. Even though the CMI value for one pair at level 2 was 0.2, all other pairs are accurately identified by the newly proposed approach. Hence, Listing 1 confirms that the model truncated at level 3 explains the data well, and the contribution of the 66 bivariate copulas at levels 4 to 14 is minor. Our model therefore provides a more parsimonious subclass of vine copula models that still offers a good fit to the data. The contour plot of the fitted copula is shown in Figure 2 and supports the results of the newly proposed truncation approach.

Figure 2 clearly shows that the dependence patterns among variables at the levels beyond the truncation level (bottom to top) are either weak or independent. This result is superior to that of the sequential truncation method of [11] (using AIC), which selects level 6 for these data.

Listing 1: the first six levels of the fitted R-vine copula model. Each block begins with the level (e.g., Tree 1); in each entry, ";" separates the conditioned pair from the conditioning variables.

Tree 1:
1, 11 t (par = 0.73, par2 = 4.22, tau = 0.52) ## the relationship between variables 1 and 11 is captured by a t-copula.
2, 3 t (par = 0.61, par2 = 4.62, tau = 0.42)
5, 15 t (par = 0.53, par2 = 11.27, tau = 0.35)
5, 4 t (par = 0.68, par2 = 8.77, tau = 0.48)
7, 9 Survival BB1 (par = 0.13, par2 = 1.21, tau = 0.22)
6, 7 Survival BB8 (par = 4.87, par2 = 0.56, tau = 0.34)
1, 13 t (par = 0.54, par2 = 6.92, tau = 0.37)
6, 5 t (par = 0.64, par2 = 5.64, tau = 0.44)
6, 1 t (par = 0.73, par2 = 4.52, tau = 0.52)
8, 12 t (par = 0.76, par2 = 4.63, tau = 0.55)
2, 8 t (par = 0.55, par2 = 5.44, tau = 0.37)
2, 10 t (par = 0.59, par2 = 11.66, tau = 0.4)
14, 2 t (par = 0.63, par2 = 4.11, tau = 0.43)
14, 6 t (par = 0.65, par2 = 5.36, tau = 0.45)

Tree 2:
6, 11; 1 t (par = 0.2, par2 = 17.01, tau = 0.13) ## the relationship between variables 6 and 11 given variable 1 is captured by a t-copula.
10, 3; 2 t (par = 0.25, par2 = 10.67, tau = 0.16)
4, 15; 5 Gaussian (par = 0.2, tau = 0.13)
6, 4; 5 Survival BB8 (par = 2, par2 = 0.81, tau = 0.2)
6, 9; 7 Survival BB8 (par = 1.3, par2 = 0.89, tau = 0.09)
14, 7; 6 Frank (par = 1.32, tau = 0.14)
6, 13; 1 BB8 (par = 2.59, par2 = 0.59, tau = 0.17)
14, 5; 6 t (par = 0.34, par2 = 8.16, tau = 0.22)
14, 1; 6 t (par = 0.32, par2 = 8.32, tau = 0.21)
2, 12; 8 BB8 (par = 1.5, par2 = 0.85, tau = 0.12)
14, 8; 2 t (par = 0.23, par2 = 9.38, tau = 0.15)
14, 10; 2 Survival Gumbel (par = 1.23, tau = 0.19)
6, 2; 14 Survival BB8 (par = 2.92, par2 = 0.64, tau = 0.23)

Tree 3:
14, 11; 6, 1 Frank (par = 0.77, tau = 0.08)
14, 3; 10, 2 Frank (par = 1.1, tau = 0.12)
6, 15; 4, 5 t (par = 0.09, par2 = 14.52, tau = 0.06)
14, 4; 6, 5 Frank (par = 0.87, tau = 0.1)
14, 9; 6, 7 Frank (par = 0.57, tau = 0.06)
1, 7; 14, 6 Survival BB8 (par = 1.46, par2 = 0.82, tau = 0.1)
14, 13; 6, 1 Survival BB8 (par = 2.87, par2 = 0.54, tau = 0.18)
1, 5; 14, 6 Frank (par = 1.29, tau = 0.14)
2, 1; 14, 6 t (par = 0.18, par2 = 10.3, tau = 0.11)
14, 12; 2, 8 Gumbel (par = 1.05, tau = 0.05)
6, 8; 14, 2 Frank (par = 0.94, tau = 0.1)
6, 10; 14, 2 Survival BB8 (par = 1.19, par2 = 0.95, tau = 0.07)

Tree 4:
7, 11; 14, 6, 1 t (par = 0.13, par2 = 14.49, tau = 0.08)
6, 3; 14, 10, 2 Frank (par = 0.73, tau = 0.08)
14, 15; 6, 4, 5 t (par = 0.05, par2 = 14.05, tau = 0.03)
1, 4; 14, 6, 5 Survival Gumbel (par = 1.06, tau = 0.06)
1, 9; 14, 6, 7 Gumbel (par = 1.03, tau = 0.03)
13, 7; 1, 14, 6 Frank (par = 0.82, tau = 0.09)
2, 13; 14, 6, 1 Frank (par = 0.8, tau = 0.09)
2, 5; 1, 14, 6 t (par = 0.14, par2 = 13.28, tau = 0.09)
10, 1; 2, 14, 6 Frank (par = 0.92, tau = 0.1)
6, 12; 14, 2, 8 Survival Clayton (par = 0.09, tau = 0.04)
10, 8; 6, 14, 2 Frank (par = 0.46, tau = 0.05)

Tree 5:
9, 11; 7, 14, 6, 1 Independence
8, 3; 6, 14, 10, 2 Frank (par = 0.96, tau = 0.11)
1, 15; 14, 6, 4, 5 Independence
2, 4; 1, 14, 6, 5 Frank (par = 0.45, tau = 0.05)
13, 9; 1, 14, 6, 7 Independence
2, 7; 13, 1, 14, 6 t (par = 0.05, par2 = 14.9, tau = 0.03)
5, 13; 2, 14, 6, 1 Frank (par = 0.52, tau = 0.06)
10, 5; 2, 1, 14, 6 Frank (par = 0.5, tau = 0.06)
8, 1; 10, 2, 14, 6 t (par = 0.04, par2 = 18.98, tau = 0.03)
10, 12; 6, 14, 2, 8 Independence

Tree 6:
13, 11; 9, 7, 14, 6, 1 Independence
12, 3; 8, 6, 14, 10, 2 Frank (par = 0.55, tau = 0.06)
2, 15; 1, 14, 6, 4, 5 Tawn type 2 (par = 6.2, par2 = 0, tau = 0)
10, 4; 2, 1, 14, 6, 5 Frank (par = 0.4, tau = 0.04)
2, 9; 13, 1, 14, 6, 7 Frank (par = 0.53, tau = 0.06)
5, 7; 2, 13, 1, 14, 6 Frank (par = 0.38, tau = 0.04)
10, 13; 5, 2, 14, 6, 1 Gumbel (par = 1.02, tau = 0.02)
8, 5; 10, 2, 1, 14, 6 Frank (par = 0.26, tau = 0.03)
12, 1; 8, 10, 2, 14, 6 Independence

7. Conclusion

In this study, we proposed a novel method for estimating the truncation level of R-vine copula models separately from the selection of the R-vine tree structure and the estimation of pair-copulas. The proposed approach uses (conditional) mutual information values among variables to select the most appropriate truncation level. It allows direct control of the truncation level without ignoring any level of the R-vine model: after constructing the R-vine structure using mutual information as edge weights, one can inspect the dependence at every level of the R-vine (both the strongest and the weakest pairs) based on the MI and CMI values. Thus, our model provides fast identification of the truncation level of the R-vine model and of any weak dependencies (pruning) independently of selecting and estimating bivariate copulas, and it avoids extra computation, such as computing selection criteria or tuning other control parameters.

The major conclusions regarding the novelty of this study are as follows:
(i) The R-vine copula model is truncated at an early tree level. Hence, only a few trees need to be estimated, which dramatically reduces the model's computations and complexity.
(ii) The model also identifies weak dependence structures between pairs of variables in the non-truncated levels, so no useful information is lost.
(iii) The newly proposed model identifies the truncation level accurately: all levels after the truncation level are either independent or exhibit very weak dependence patterns among variables.
(iv) The method provides a more parsimonious model and fits the given data efficiently.
(v) The results of the newly established method are superior to those of the existing sequential estimation truncation method.

Therefore, our newly established procedure makes a significant contribution to the state of the art in truncating R-vine copula models.

The main limitation of this work is the need to compute the mutual information, although this is less complex than fully specifying the R-vine copula model. A possible direction for future work is to extend the approach by using nonparametric copula methods to estimate the truncation level.
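To make the cost mentioned above concrete, the Python sketch below estimates MI nonparametrically from data. It rank-transforms each variable to pseudo-observations and applies a plug-in estimator on an equal-width grid; because MI is the negative copula entropy [23], working on the rank scale targets the same quantity as the copula-based MI. The bin count, the tie handling, and the helper names are assumptions for illustration, not the estimator used in this study, and plug-in histogram estimates are known to be biased in small samples.

```python
import math
from collections import Counter


def pseudo_obs(xs):
    """Rank-transform a sample to (0, 1) as rank / (n + 1); ties keep input order."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def histogram_mi(xs, ys, bins=5):
    """Plug-in MI estimate (nats) on an equal-width grid over the pseudo-observations."""
    n = len(xs)
    u, v = pseudo_obs(xs), pseudo_obs(ys)

    def cell(p):
        # Map a value in (0, 1) to one of `bins` equal-width cells.
        return min(int(p * bins), bins - 1)

    joint = Counter((cell(a), cell(b)) for a, b in zip(u, v))
    mu = Counter(cell(a) for a in u)
    mv = Counter(cell(b) for b in v)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((mu[i] / n) * (mv[j] / n)))
    return mi
```

Because only ranks enter the estimate, any strictly monotone transformation of either variable leaves it unchanged. Each pair evaluated in a vine tree requires one such pass over the data, and conditional MI for the higher trees needs a further conditioning step, which is where the computational cost arises.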

Data Availability

The dataset used to support the findings of this study is available in the “VineCopula” package [29] (https://cran.r-project.org/web/packages/VineCopula/VineCopula.pdf) for R [30] (https://www.r-project.org/).

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

The author would like to acknowledge the support of Prince Sultan University for paying the article processing charges (APC) of this publication.

References

[1] K. Aas and D. Berg, “Models for construction of multivariate dependence–a comparison study,” The European Journal of Finance, vol. 15, no. 7–8, pp. 639–659, 2009.
[2] T. Bedford and R. M. Cooke, “Probability density decomposition for conditionally dependent random variables modeled by vines,” Annals of Mathematics and Artificial Intelligence, vol. 32, no. 1, pp. 245–268, 2001.
[3] T. Bedford and R. M. Cooke, “Vines: a new graphical model for dependent random variables,” Annals of Statistics, vol. 30, no. 4, pp. 1031–1068, 2002.
[4] H. Joe, Families of m-variate distributions with given margins and m(m−1)/2 bivariate dependence parameters, Institute of Mathematical Statistics, Hayward, CA, 1996.
[5] A. Amini, A. Abdollahi, M. A. Hariri-Ardebili, and U. Lall, “Copula-based reliability and sensitivity analysis of aging dams: adaptive kriging and polynomial chaos kriging methods,” Applied Soft Computing, vol. 109, p. 107524, 2021.
[6] E. Torre, S. Marelli, P. Embrechts, and B. Sudret, “A general framework for data-driven uncertainty quantification under complex input dependencies using vine copulas,” Probabilistic Engineering Mechanics, vol. 55, pp. 1–16, 2019.
[7] S. El Adlouni, “Quantile regression C-vine copula model for spatial extremes,” Natural Hazards, vol. 94, no. 1, pp. 299–317, 2018.
[8] L. Tian-Jian, X.-S. Tang, D.-Q. Li, and X.-H. Qi, “Modeling multivariate distribution of multiple soil parameters using vine copula model,” Computers and Geotechnics, vol. 118, p. 103340, 2020.
[9] R. Yu, R. Yang, C. Zhang, M. Špoljar, N. Kuczyńska-Kippen, and G. Sang, “A vine copula-based modeling for identification of multivariate water pollution risk in an interconnected river system network,” Water, vol. 12, no. 10, p. 2741, 2020.
[10] K. Syuhada and A. Hakim, “Modeling risk dependence and portfolio VaR forecast through vine copula for cryptocurrencies,” PLoS One, vol. 15, no. 12, Article ID e0242102, 2020.
[11] E. C. Brechmann, C. Czado, and K. Aas, “Truncated regular vines in high dimensions with application to financial data,” Canadian Journal of Statistics, vol. 40, no. 1, pp. 68–85, 2012.
[12] E. C. Brechmann and H. Joe, “Truncation of vine copulas using fit indices,” Journal of Multivariate Analysis, vol. 138, pp. 19–33, 2015.
[13] F. A. Alanazi, “Sequential truncation of R-vine copula mixture model for high-dimensional datasets,” International Journal of Mathematics and Mathematical Sciences, vol. 2021, Article ID 3214262, 14 pages, 2021.
[14] T. Blumentritt and F. Schmid, “Mutual information as a measure of multivariate association: analytical properties and statistical estimation,” Journal of Statistical Computation and Simulation, vol. 82, no. 9, pp. 1257–1274, 2012.
[15] M. B. Ghalibaf, “Relationship between Kendall’s tau correlation and mutual information,” Revista Colombiana de Estadística, vol. 43, no. 1, pp. 3–20, 2020.
[16] L. Ni, D. Wang, J. Wu et al., “Vine copula selection using mutual information for hydrological dependence modeling,” Environmental Research, vol. 186, p. 109604, 2020.
[17] C. Sharma and A. Habib, “Mutual information based stock networks and portfolio selection for intraday traders using high frequency data: an Indian market case study,” PLoS One, vol. 14, no. 8, Article ID e0221910, 2019.
[18] C. Sharma and N. Sahni, “A mutual information based R-vine copula strategy to estimate VaR in high frequency stock market data,” PLoS One, vol. 16, no. 6, Article ID e0253307, 2021.
[19] A. Sklar, “Fonctions de répartition à n dimensions et leurs marges,” Publ. Inst. Statist. Univ. Paris, vol. 8, pp. 229–231, 1959.
[20] R. B. Nelsen, An Introduction to Copulas, Springer, New York, NY, USA, 2nd edition, 2006.
[21] A. J. Patton, “A review of copula models for economic time series,” Journal of Multivariate Analysis, vol. 110, pp. 4–18, 2012.
[22] T. M. Cover, Elements of Information Theory, John Wiley & Sons, Hoboken, NJ, USA, 1999.
[23] J. Ma and Z. Sun, “Mutual information is copula entropy,” Tsinghua Science and Technology, vol. 16, no. 1, pp. 51–54, 2011.
[24] L. Chen, V. P. Singh, S. Guo, A. K. Mishra, and J. Guo, “Drought analysis using copulas,” Journal of Hydrologic Engineering, vol. 18, no. 7, pp. 797–808, 2013.
[25] O. Morales Napoles, R. M. Cooke, and D. Kurowicka, About the Number of Vines and Regular Vines on N Nodes, 2010.
[26] C. Czado, S. Jeske, and M. Hofmann, “Selection strategies for regular vine copulae,” Journal de la Société Française de Statistique, vol. 154, no. 1, pp. 174–191, 2013.
[27] J. Dißmann, E. C. Brechmann, C. Czado, and D. Kurowicka, “Selecting and estimating regular vine copulae and application to financial returns,” Computational Statistics & Data Analysis, vol. 59, pp. 52–69, 2013.
[28] H. Joe, Dependence Modeling with Copulas, CRC Press, Boca Raton, FL, USA, 2014.
[29] U. Schepsmeier, J. Stoeber, E. C. Brechmann, B. Graeler, T. Nagler, and T. Erhardt, VineCopula: Statistical Inference of Vine Copulas, 2017.
[30] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008.
[31] H. Bozdogan, “Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions,” Psychometrika, vol. 52, no. 3, pp. 345–370, 1987.

Copyright

Copyright © 2021 Fadhah Amer Alanazi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
