Abstract

The Matthew effect is a desirable phenomenon for a ranking model in search engines and recommendation systems. However, most learning-to-rank (LTR) algorithms pay no attention to it. LambdaMART is a well-known LTR algorithm that can be further optimized with the Matthew effect in mind. Inspired by the Matthew effect, we distinguish queries with different effectiveness and assign a higher weight to a query with higher effectiveness. We improve the gradient in the LambdaMART algorithm so that it concentrates optimization on the queries with high effectiveness, that is, so that it highlights the Matthew effect of the produced ranking models. In addition, we propose strategies for evaluating a ranking model and for dynamically decreasing the learning rate, which both strengthen the Matthew effect of the ranking models and improve their effectiveness. We use the Gini coefficient, mean-variance, quantity statistics, and winning number to measure the performance of the ranking models. Experimental results on multiple benchmark datasets show that the ranking models produced by our improved LambdaMART algorithm exhibit a stronger Matthew effect and achieve higher effectiveness than those of the original algorithm and other state-of-the-art LTR algorithms.

1. Introduction

Ranking is an important component that directly affects the performance of information retrieval systems such as search engines, recommendation systems, and electronic commerce platforms. For instance, the PageRank algorithm [1, 2] of the Google search engine computes Web page scores based on a graph inferred from the link structure of the Web. The algorithm assigns a higher score to a Web page whose backlinks contribute a higher total score. In addition, it considers the votes cast by pages: a vote cast by a page that is itself important carries more weight, because it helps enhance the importance of the pages it links to [3]. The PageRank algorithm exhibits the Matthew effect [4], which refers to the phenomenon that the rich get richer and the poor get poorer. This is valuable and reasonable in a search service, since people prefer to follow links to high-quality pages from other high-quality pages.

In some application scenarios, for example the search of hot words in search engines and the recommendation of popular products in recommendation systems, the ranking methods are usually expected to show a Matthew effect. That is, the Matthew effect is a desirable phenomenon for their ranking models: it can make the ranking results more effective and better meet users' requirements. Therefore, it is of practical significance to develop a ranking model with a strong Matthew effect for these ranking scenarios. However, the existing LTR methods do not pay attention to the Matthew effect, and this is the motivation of our work.

In this paper, we improve the LambdaMART [5, 6] algorithm with several new strategies in order to strengthen the Matthew effect. Our improvements not only strengthen the Matthew effect of a ranking model but also enhance its effectiveness. To the best of our knowledge, this is the first attempt to improve a learning-to-rank algorithm by taking the Matthew effect into account.

The main contributions of this work are summarized as follows:

(1) We present a new gradient function for the LambdaMART algorithm that highlights the Matthew effect, and we prove that the function satisfies the consistency property.

(2) We propose two strategies to evaluate the ranking models and dynamically decrease the learning rate, respectively. They can not only strengthen the Matthew effect but also improve the effectiveness of the final ranking model.

(3) Experiments on multiple benchmark datasets are done to show the advantages of our improved LambdaMART algorithm compared to the original one and other state-of-the-art LTR algorithms.

The remainder of this paper is organized as follows. Section 2 reviews some related work. Section 3 describes the improvements of the LambdaMART algorithm via the idea of Matthew effect and some strategies. Section 4 presents the evaluation measures of the Matthew effect. Our experimental results are illustrated in Section 5. Section 6 concludes this paper.

2. Related Work

An algorithm of learning to rank (LTR) in the field of information retrieval uses a machine learning technique to train a ranking model for solving ranking problems. A large number of LTR algorithms have been proposed [5–19]. For example, RankNet [7] is a neural-network-based LTR algorithm that optimizes a cross-entropy cost function using a gradient descent algorithm to train a neural network model for ranking. LambdaRank [8] improves RankNet by defining the gradients of a given cost function only at the points of interest. It directly optimizes a cost function of effectiveness measures while avoiding the difficulties of working with nondifferentiable measures of information retrieval. LambdaMART [5, 6], denoted as λ-MART, improves LambdaRank by using multiple additive regression trees (MART) rather than a neural network and thus produces more effective models. λ-MART [5, 6] is the boosted regression tree version of LambdaRank. It models the gradients using the ranked positions of the documents with respect to a given query and utilizes these gradients to compute the optimal weights for combining the weak learners in a boosting model. λ-MART treats the force of upward or downward movement of each document unequally, through the gradient of the document, but treats the effectiveness of each query equally, which results in a relatively low proportion of queries with low effectiveness (i.e., poor queries) and also a relatively low proportion of queries with high effectiveness (i.e., rich queries). Therefore, the λ-MART algorithm cannot highlight the Matthew effect. To solve this problem, we improve λ-MART based on the Matthew effect. In other words, queries with different effectiveness are distinguished, those with higher effectiveness are given higher weights, and the differences of moving forces between different document-pairs are enhanced.

Several studies improve the λ-MART algorithm. Ganjisaffar et al. [9] combine bagging and boosting in a bagged λ-MART to achieve high prediction accuracy and low variance. Asadi et al. [10] explore the topology of tree-based ensembles and modify the λ-MART algorithm to perform stage-wise pruning for improving efficiency. Ferov et al. [19] propose a modification of the λ-MART algorithm that utilizes oblivious trees instead of standard regression trees in the learning process in order to improve effectiveness. This paper differs from these works. We not only improve the gradient but also propose two strategies: one evaluates the current ranking models and the other dynamically decreases the learning rate. These improvements both strengthen the Matthew effect and improve the effectiveness of the algorithm.

Some other studies, such as RankBoost [11] and AdaRank [12], are also based on boosting. In these two LTR approaches, document-pairs are reweighted by decreasing the weights of document-pairs that are correctly ranked and increasing the weights of those that are incorrectly ranked. Weak rankers are then repeatedly constructed on the reweighted training data, so that the learning at the next iteration focuses on training a weak ranker that works well on the poorly ranked queries. Our work differs from RankBoost [11] and AdaRank [12] because we strengthen the differences of the gradients of different document-pairs at each boosting iteration and highlight the importance of rich queries based on the Matthew effect.

Some LTR approaches treat the queries in the training dataset equally, ignoring the differences between any two queries during the training of ranking models. In general, two different queries have different effectiveness scores under a ranking model. Therefore, they should not be treated equally if we expect a ranking model to give a query with high effectiveness (i.e., a rich query) even higher effectiveness and a query with low effectiveness (i.e., a poor query) even lower effectiveness. Note that some studies do consider the differences between queries. Geng et al. [20] propose a K-Nearest Neighbor approach for query-dependent ranking, which employs different ranking models for different queries to improve ranking effectiveness. Cai et al. [21] propose two query weighting methods to measure query importance for ranking model adaptation. Li et al. [22] make a comprehensive examination of different query-level weighting strategies for two unsupervised ranking frameworks. Their aim in considering the differences between queries is to improve the effectiveness of ranking models, while our aim is to improve not only the effectiveness but also the Matthew effect of ranking models.

Most existing LTR algorithms focus on improving the effectiveness of the ranking model and rarely pursue the Matthew effect. Different queries usually have different effectiveness under a ranking model. Therefore, if a ranking model is expected to give a rich query higher effectiveness, then the rich query should be given a greater weight so that its optimization is emphasized in the training process of the ranking models.

3. The Improvements of the λ-MART Algorithm via the Idea of Matthew Effect and Some Strategies

We first recall the λ-MART algorithm, then improve its gradient and propose a strategy of evaluating the current ranking model that characterizes the Matthew effect. In addition, we propose a strategy of dynamically decreasing the learning rate. Finally, we analyze the computational complexity.

3.1. The Gradient of the λ-MART Algorithm

The basic idea of λ-MART [5, 6] is to train an ensemble of weak learners using MART with an approximate Newton step and to linearly combine their predictions into a stronger and more effective learner. λ-MART tunes the gradients of the regression trees using a gradient-based optimization method. The gradient function is computed from the selected effectiveness evaluation metric, and this metric is optimized directly during the training of a ranking model.

The key to the λ-MART algorithm is the gradient function, which is defined as a smooth approximation to the gradient of a target cost with respect to the score of a document. The gradient quantifies the direction (up or down) and the force with which a to-be-sorted document is adjusted in the next iteration. The two documents of any given document-pair have gradient values of equal magnitude but opposite moving directions. A gradient in the positive direction pushes a document toward a higher rank position of the sorted list, while a gradient in the negative direction pushes it toward a lower rank position.

The λ-MART algorithm optimizes the gradient λ_i of each document d_i for each query q. For a given query q, if the relevance grade of d_i with respect to q is high and the ranked position of d_i is close to the bottom of the ranked list, then the positive value of λ_i indicates a push toward the top of the ranked list, and a bigger value of λ_i represents a stronger force. The opposite case can be understood similarly.

The λ-MART algorithm integrates an effectiveness evaluation criterion of information retrieval (such as the measure of [23]) into the gradient computation. The gradient λ_i for each document d_i is obtained by summing λ_{ij} over all pairs of documents that d_i participates in for query q; that is, λ_i is computed as follows:

λ_i = Σ_{j:(i,j)∈I} λ_{ij} − Σ_{j:(j,i)∈I} λ_{ij},    (1)

where I denotes the set of document-pairs (i, j) for query q in which d_i is more relevant than d_j. In (1), λ_{ij} denotes the gradient of the document-pair (d_i, d_j) and is calculated as follows:

λ_{ij} = −σ / (1 + e^{σ(s_i − s_j)}) · |ΔZ_{ij}|,    (2)

where σ is a shape parameter for the sigmoid function, s_i and s_j represent the scores assigned to d_i and d_j by the ranking model, respectively, and |ΔZ_{ij}| represents the change in the effectiveness measure obtained by swapping the two documents d_i and d_j at the rank positions i and j (while keeping the rank positions of all other documents unchanged). The effectiveness can be measured by any common information retrieval metric, such as those of [23] and [24]. |ΔZ_{ij}| is calculated as follows:

|ΔZ_{ij}| = |Z − Z′|,    (3)

where Z denotes the effectiveness of the query q for a ranked list of all documents with respect to q and Z′ denotes the effectiveness of q after swapping d_i and d_j at the rank positions i and j in the ranked list. The λ-MART algorithm thus incorporates an effectiveness evaluation criterion into the gradient function during the training of the ranking models, and it therefore optimizes both the loss function and the effectiveness evaluation criterion.
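To make the computation in (1)–(3) concrete, the following sketch accumulates the per-document gradients for a single query. It is a minimal illustration, not the authors' implementation: the function names, the use of NDCG as the effectiveness measure Z, and the relevance-based pair enumeration are assumptions made for this example.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalized DCG; used here as the effectiveness measure Z."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def lambda_gradients(scores, relevances, sigma=1.0):
    """Per-document gradients lambda_i for one query, following (1)-(3)."""
    # Rank documents by the current model scores (descending).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked_rel = [relevances[i] for i in order]
    base_z = ndcg(ranked_rel)

    lambdas = [0.0] * len(scores)
    for a, i in enumerate(order):
        for b, j in enumerate(order):
            if relevances[i] <= relevances[j]:
                continue  # only pairs where d_i is more relevant than d_j
            # |delta Z_ij| as in (3): swap rank positions a and b, recompute Z
            swapped = ranked_rel[:]
            swapped[a], swapped[b] = swapped[b], swapped[a]
            delta_z = abs(base_z - ndcg(swapped))
            # lambda_ij as in (2)
            lam = -sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j]))) * delta_z
            lambdas[i] += lam  # pair contribution to d_i, per (1)
            lambdas[j] -= lam  # equal magnitude, opposite direction for d_j
    return lambdas

# Tiny hypothetical query with three documents.
print(lambda_gradients(scores=[2.0, 0.5, 1.0], relevances=[0, 2, 1]))
```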

3.2. Improvement of the Gradient Function

In our improved algorithm, we treat the queries unequally in the training process of the ranking models. We discriminate the effectiveness score of each query and assign different weights to different queries. In order to highlight the gradients of rich queries and enhance their effectiveness, the weights of rich queries should be given higher values, following the idea of the Matthew effect. Therefore, we define the weight of a query as the p-th power of the original effectiveness score of the query, where p is an integer bigger than 1. The greater the value of p, the greater the difference between the weights assigned to a rich query and a poor query. In this way, our method reflects the importance of rich queries in the ranking model, and the Matthew effect of the ranking model is highlighted to a certain extent. Accordingly, the new effectiveness score of a query is defined as the p-th power Z^p of its original effectiveness score Z, and the original effectiveness Z in the gradient is replaced by Z^p so that the new objective optimizes toward a rich ranking model. Therefore, |ΔZ_{ij}| in (2) is replaced by |ΔZ^p_{ij}|, which is defined as follows:

|ΔZ^p_{ij}| = |Z^p − (Z′)^p|.    (4)

|ΔZ^p_{ij}| denotes the difference of the p-th power of the effectiveness of q after swapping d_i and d_j at the rank positions i and j (while fixing the rank positions of all other documents). For example, when p = 2 we have |ΔZ^2_{ij}| = |Z − Z′| · (Z + Z′), and when p = 3 we have |ΔZ^3_{ij}| = |Z − Z′| · (Z^2 + Z·Z′ + (Z′)^2). When the hyperparameter p > 1, the power operation enhances the Matthew effect: because the ranking effectiveness measure Z is changed into Z^p, the difference in effectiveness becomes more prominent as p increases, and the Matthew effect becomes more evident in the learning process.
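As a small illustration of how the power operation in (4) amplifies effectiveness differences, the following snippet compares |ΔZ| with |ΔZ^p| for two hypothetical queries; the numeric values are invented purely for the example.

```python
def delta_z_p(z, z_swapped, p=2):
    """|delta Z^p| as in (4): difference of p-th powers of effectiveness."""
    return abs(z ** p - z_swapped ** p)

# A hypothetical rich query (high effectiveness) and poor query (low
# effectiveness), each with the same raw swap-induced change |Z - Z'| = 0.05.
rich = (0.90, 0.85)
poor = (0.20, 0.15)

for name, (z, z_sw) in [("rich", rich), ("poor", poor)]:
    print(name,
          round(abs(z - z_sw), 4),             # original |delta Z|
          round(delta_z_p(z, z_sw, p=2), 4),   # p = 2
          round(delta_z_p(z, z_sw, p=3), 4))   # p = 3
# The rich query's gradient contribution grows relative to the poor one's
# as p increases, which is the intended Matthew-effect amplification.
```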

Meanwhile, we incorporate the Gini coefficient G into the gradient in order to highlight the difference in effectiveness among different queries in the ranking model, yielding the new gradient that replaces (2), in which a positive constant is used to adjust the Gini term so that it is of the same order of magnitude as |ΔZ^p_{ij}|. G measures the degree of difference in effectiveness among all queries and will be introduced in Section 4. The document-pairs of each query are optimized according to the new gradient in the learning process of the ranking models, which strengthens the differences in upward or downward ranking force among the document-pairs of different queries at the next iteration; this enhances the optimization of rich queries, weakens the optimization of poor queries, and highlights the Matthew effect.

Since the gradient function of λ-MART is changed, we must prove its consistency, which guarantees that our improved algorithm can be used to produce the ranking models. We now prove that the new objective Z^p satisfies the desirable consistency property proposed in [5, 6]: when swapping the ranked positions of two documents d_i and d_j in a ranked list, where d_i is more relevant than d_j but is ranked after d_j, the optimization objective should increase. In other words, for any document-pair of the same query, a pairwise swap of correctly ranked documents d_i and d_j must lead to a decrease of the objective, and a pairwise swap of incorrectly ranked documents d_i and d_j must lead to an increase of the objective.

Theorem 1. The new optimization objective function satisfies the consistency property.

Proof. According to the previous analysis, Z is the effectiveness of q for a ranked list of all documents with respect to q; Z′ is the effectiveness of q after swapping d_i and d_j in the ranked list, where d_i is more relevant than d_j and d_i is ranked after d_j; and |ΔZ^p_{ij}| is the change of the new optimization objective caused by swapping d_i and d_j at the rank positions i and j while leaving all other ranks unchanged. Furthermore, Z^p − (Z′)^p can be expanded by the factorization formula for a difference of powers; namely, Z^p − (Z′)^p = (Z − Z′) · (Z^{p−1} + Z^{p−2}·Z′ + ⋯ + (Z′)^{p−1}). Therefore, (4) becomes

|ΔZ^p_{ij}| = |Z^p − (Z′)^p| = |Z − Z′| · (Z^{p−1} + Z^{p−2}·Z′ + ⋯ + (Z′)^{p−1}).    (6)

Let T = Z^{p−1} + Z^{p−2}·Z′ + ⋯ + (Z′)^{p−1}. Then (6) is rewritten according to (3) as follows:

|ΔZ^p_{ij}| = T · |ΔZ_{ij}|.    (7)

To demonstrate that Z^p satisfies the consistency property, we only need to show that the change in Z^p moves in the same direction as the change in Z of the original optimization objective when the positions of d_i and d_j are swapped. This is because the original optimization objective satisfies the consistency property and can be directly optimized by the λ-MART algorithm.
No matter which commonly used effectiveness measure of the information retrieval field is taken, the values of Z and Z′ are both greater than or equal to 0; that is, Z ≥ 0 and Z′ ≥ 0. Therefore, T > 0 when p > 1 (excluding the trivial case Z = Z′ = 0). Hence, by (7), the changes of Z^p and Z move in the same direction. A bigger value of Z indicates better effectiveness under the original λ-MART algorithm, and a bigger value of Z^p likewise indicates better effectiveness under our improved algorithm, so the optimization directions of Z^p and Z are the same. Therefore, the new optimization objective with the improved gradient satisfies the required consistency property, and it can be optimized by our improved algorithm.
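A quick numeric check of the argument above: for any Z, Z′ ≥ 0 and integer p > 1, the sign of Z^p − (Z′)^p matches the sign of Z − Z′. The grid of test values below is arbitrary and not taken from the paper.

```python
import itertools

def same_direction(z, z_sw, p):
    """Return True if Z^p - Z'^p has the same sign as Z - Z'."""
    d, dp = z - z_sw, z ** p - z_sw ** p
    return (d > 0) == (dp > 0) and (d < 0) == (dp < 0)

grid = [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]
assert all(same_direction(z, z_sw, p)
           for z, z_sw in itertools.product(grid, repeat=2)
           for p in (2, 3))
print("Z^p changes in the same direction as Z for all grid points.")
```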

3.3. A Strategy of Evaluating the Current Ranking Model

In traditional LTR algorithms, the current ranking model at each learning iteration is evaluated directly by an effectiveness measure such as those of [23] and [24], which makes it difficult to reflect the Matthew effect of the ranking models. To reflect the Matthew effect, we propose a strategy of evaluating the current ranking models by integrating the Gini coefficient G and the new optimization objective Z^p. At each iteration of training, we use this combined score to evaluate the current ranking model on the training and validation datasets and take the total as the measure of its performance. After training ends, the optimal ranking model is chosen using the training and validation datasets: the ranking model with the maximum total combined score on the training and validation datasets is selected as our final one.
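The following sketch illustrates one way this selection strategy could be implemented. Multiplying the Gini coefficient by the mean p-th-power effectiveness is an assumption made for illustration; only the two ingredients (the Gini coefficient and the objective Z^p) are fixed by the text, and all identifiers and inputs below are hypothetical.

```python
def combined_score(gini_value, per_query_effectiveness, p=2):
    """Combined evaluation score of a ranking model on one data split.
    Multiplying the Gini coefficient by the mean of Z^p is an assumption
    made for this sketch."""
    zp = [z ** p for z in per_query_effectiveness]
    return gini_value * (sum(zp) / len(zp))

def select_final_model(candidates):
    """`candidates` maps a model id to ((gini_train, eff_train),
    (gini_val, eff_val)); the model with the largest total combined
    score on training plus validation data is kept as the final model."""
    return max(candidates,
               key=lambda m: combined_score(*candidates[m][0])
                             + combined_score(*candidates[m][1]))

candidates = {  # hypothetical models saved at different iterations
    "iter_100": ((0.18, [0.55, 0.60, 0.42]), (0.15, [0.50, 0.58, 0.40])),
    "iter_200": ((0.30, [0.70, 0.65, 0.20]), (0.28, [0.66, 0.60, 0.22])),
}
print(select_final_model(candidates))
```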

3.4. A Strategy of Dynamically Decreasing the Learning Rate

How to choose a learning rate is a critical issue. A fixed learning rate may lead to slow convergence for a gradient descent algorithm [25]. Generally, the value of the learning rate (i.e., the shrinkage coefficient) influences the speed of convergence of LTR algorithms that use shrinkage, which directly affects their efficiency and effectiveness to a certain extent. In the λ-MART algorithm, the learning rate is unchanged during the gradient boosting process; that is, the learning rate is a constant over the training iterations. A fixed learning rate is not conducive to accelerating convergence when the optimization has not yet converged within the given number of iterations, so the performance of the ranking model suffers under a fixed learning rate if the iterations are insufficient. To overcome this shortcoming, we propose a strategy of dynamically decreasing the learning rate used to update the weights. The fixed learning rate in the λ-MART algorithm is replaced by a dynamic learning rate η_k that decreases monotonically from a given maximum learning rate η_max to a given minimum learning rate η_min, where M denotes the total number of trees (boosting iterations), k denotes the current iteration (the k-th tree), and η_k denotes the learning rate at the k-th iteration. This strategy starts the training process with the maximum learning rate η_max and gradually shrinks the learning rate while the trees are being established; when the learning rate decreases to the given minimum value η_min, the training process terminates.

η_k decreases gradually throughout the whole training process, which is beneficial for speeding up convergence and improving the effectiveness of the ranking model. In the early stages of training, the produced ranking models are still coarse, so a relatively high learning rate speeds up convergence and helps avoid local minima; a large shrinkage coefficient speeds up the initial improvement while the algorithm is far from convergence. In the late stage of training, the produced ranking models are already fine, so a relatively low learning rate refines them, improves the local search, and helps the algorithm converge to a better optimum; a small shrinkage coefficient leads to better performance as the algorithm gradually approaches convergence.
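As one concrete instantiation of such a schedule, the sketch below interpolates linearly between η_max and η_min over the M boosting iterations. The linear form and the default values are assumptions made for this example; the paper only requires a schedule that starts at η_max, decreases monotonically, and stops at η_min.

```python
def learning_rate(k, total_trees, eta_max=0.3, eta_min=0.01):
    """Learning rate for the k-th tree (1-based), decreasing from eta_max
    to eta_min; the linear decay and defaults are illustrative assumptions."""
    if total_trees <= 1:
        return eta_min
    frac = (k - 1) / (total_trees - 1)
    return eta_max - (eta_max - eta_min) * frac

# Example: a 1000-tree run starts at 0.3 and ends at 0.01.
rates = [learning_rate(k, 1000) for k in range(1, 1001)]
assert abs(rates[0] - 0.3) < 1e-12 and abs(rates[-1] - 0.01) < 1e-12
```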

3.5. Complexity Analysis

In the original λ-MART algorithm, the time complexity is O(n · M · L), where n represents the number of training samples, M represents the total number of trees, and L represents the number of leaves per tree. In our improved algorithm, we have improved the gradient and proposed the strategies of evaluating a ranking model and dynamically decreasing the learning rate, but these improvements only modify the related computations and do not affect the asymptotic cost; that is, the time complexity of our improved algorithm is also O(n · M · L).

In addition, in the original λ-MART algorithm, if no performance gain on the validation data is observed within a set number of rounds during learning, the training process terminates. Our strategy of dynamically decreasing the learning rate may lead to such an early stop with a greater probability and thus accelerate the convergence of the algorithm. Therefore, the total number of iterations, and hence the computational cost, of our improved algorithm is reduced in most cases, as shown in Figure 12.

4. Evaluation Measures of Matthew Effect

A ranking model is richer than another if (1) it has both more rich queries and more poor queries than the other, or (2) the distribution of the effectiveness of its queries is more dispersed than the other's. A richer model has a stronger Matthew effect. In order to measure the performance of the ranking models produced by our improved method, we introduce the following metrics, which characterize the Matthew effect of ranking models from different perspectives. Based on the effectiveness evaluation criteria of information retrieval, we use the Gini coefficient, mean-variance, and quantity statistics to measure performance.

4.1. Gini Coefficient

The Gini coefficient [26, 27] is a measure of statistical dispersion originally used to represent the income distribution of the residents of a nation. It was developed by the Italian statistician and sociologist Corrado Gini in 1912. The Gini coefficient is usually defined mathematically based on the Lorenz curve, and it is most easily calculated from unordered size data using the mean of the absolute differences between every possible pair of individuals and the mean size. The Gini coefficient is calculated using the following formula:

G = (Σ_{i=1}^{n} Σ_{j=1}^{n} |x_i − x_j|) / (2 n^2 μ),    (9)

where x_i denotes the income of individual i, |x_i − x_j| denotes the absolute value of the difference between x_i and x_j, μ denotes the mean income over all individuals, and n denotes the total number of individuals. The smaller the inequality of income, the smaller the value of the Gini coefficient, and vice versa.

The Matthew effect is measured with Gini coefficients in many economic areas, so the Gini coefficient can capture the Matthew effect and can be used to measure the performance of a ranking model. Drawing an analogy with the distribution of national income in the field of finance, a query in the LTR task resembles an individual in the income distribution, and the effectiveness of the query resembles the income of the individual. Therefore, the Gini coefficient of LTR can be defined as follows:

G = (Σ_{i=1}^{n} Σ_{j=1}^{n} |E(q_i) − E(q_j)|) / (2 n^2 Ē),    (10)

where q_i denotes the i-th query, n represents the total number of queries in the query set Q, E(q_i) represents the effectiveness of q_i, and Ē denotes the mean effectiveness over all queries in Q. G measures the degree of difference in effectiveness among all the queries under a ranking model and thus reflects the Matthew effect. The larger the value of G obtained by a ranking model, the stronger the Matthew effect of the ranking model.
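A direct implementation of (10) for a set of per-query effectiveness values might look as follows; the example effectiveness values at the bottom are hypothetical.

```python
def gini_ltr(effectiveness):
    """Gini coefficient over per-query effectiveness values, as in (10)."""
    n = len(effectiveness)
    mean_eff = sum(effectiveness) / n
    if mean_eff == 0:
        return 0.0
    total_diff = sum(abs(a - b) for a in effectiveness for b in effectiveness)
    return total_diff / (2 * n * n * mean_eff)

# Two hypothetical models evaluated on the same five queries:
balanced = [0.55, 0.50, 0.52, 0.48, 0.51]   # similar effectiveness everywhere
polarized = [0.95, 0.90, 0.10, 0.05, 0.55]  # mix of rich and poor queries
print(gini_ltr(balanced), gini_ltr(polarized))  # the second value is larger
```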

4.2. Mean-Variance

The mean is the average value of a set of random variables. In order to observe the effectiveness of a ranking model, the mean of a ranking model is defined as follows:

M = (1/n) Σ_{i=1}^{n} E(q_i).    (11)

For a ranking model, M measures the average effectiveness (under a measure such as those of [23] and [24]) of all queries in the query set; that is, it is the average effectiveness of the ranking model. The greater M is, the better the average effectiveness of the ranking model, and vice versa.

Variance is used to measure the degree of deviation between a set of random variables and their mean, and thus it is an important and common metric of dispersion. The greater the variance, the greater the degree of deviation, and vice versa.

In a ranking model, some queries have high effectiveness and some have low effectiveness. Therefore, in order to observe the degrees of their deviation, we divide the variance into the upside semivariance and the downside semivariance. The upside semivariance σ²_up and the downside semivariance σ²_down of a ranking model are defined as follows:

σ²_up = (1/|Q_up|) Σ_{q ∈ Q_up} (E(q) − M)²,    (12)

σ²_down = (1/|Q_down|) Σ_{q ∈ Q_down} (E(q) − M)²,    (13)

where Q_up and Q_down denote the sets of queries with above-mean effectiveness and below-mean effectiveness in the query set Q, respectively, and |Q_up| and |Q_down| denote the numbers of queries in Q_up and Q_down, respectively.

For a ranking model, σ²_up measures the dispersion of the effectiveness of the queries above the mean M, and σ²_down measures the dispersion of the effectiveness of the queries below M. The Matthew effect of a ranking model is reflected by its σ²_up and σ²_down: the higher the values of both σ²_up and σ²_down, the stronger the Matthew effect of the ranking model, and vice versa.
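The semivariances in (12) and (13) could be computed as in the following sketch; the split around the mean follows the definitions above, and the sample values are hypothetical.

```python
def semivariances(effectiveness):
    """Upside and downside semivariance of per-query effectiveness,
    following (12) and (13)."""
    mean_eff = sum(effectiveness) / len(effectiveness)
    up = [e for e in effectiveness if e > mean_eff]
    down = [e for e in effectiveness if e < mean_eff]
    var_up = sum((e - mean_eff) ** 2 for e in up) / len(up) if up else 0.0
    var_down = sum((e - mean_eff) ** 2 for e in down) / len(down) if down else 0.0
    return var_up, var_down

print(semivariances([0.95, 0.90, 0.10, 0.05, 0.55]))  # hypothetical values
```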

4.3. Quantity Statistics

Generally, for most of the effectiveness metrics in information retrieval, such as those of [23] and [24], the values range between 0 and 1. To compute the effectiveness distribution of all queries in a query set, the range of effectiveness values is divided into 4 intervals, ordered from low to high effectiveness. We count the number of queries falling in each interval according to the effectiveness values of the queries in a given query set for different ranking models, in order to evaluate the strengths of their Matthew effects. We use an array S = (s_1, s_2, s_3, s_4) to express the quantity statistics over these intervals, where s_k is defined as follows:

s_k = |{q ∈ Q : E(q) falls in the k-th interval}|,  k = 1, 2, 3, 4.    (14)

If the values of s_1 (the count in the lowest interval) and s_4 (the count in the highest interval) of a ranking model are larger, then the ranking model has a stronger Matthew effect.
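A sketch of the quantity statistics in (14), assuming four equal-width intervals over [0, 1]; the equal-width binning and the sample values are assumptions made for this example.

```python
def quantity_statistics(effectiveness, num_bins=4):
    """Count queries per effectiveness interval; equal-width bins over
    [0, 1] are an assumption made for this sketch."""
    counts = [0] * num_bins
    for e in effectiveness:
        k = min(int(e * num_bins), num_bins - 1)  # put e == 1.0 in the top bin
        counts[k] += 1
    return counts

# s_1 (lowest) and s_4 (highest) are large for a polarized, Matthew-like model.
print(quantity_statistics([0.95, 0.90, 0.10, 0.05, 0.55]))
```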

5. Experiments

In order to illustrate the advantages of our improved algorithm, we implement it based on the open-source RankLib library of LTR algorithms developed by Van Dang et al. (http://sourceforge.net/p/lemur/code/HEAD/tree/RankLib/trunk/). We conduct extensive experiments to evaluate and compare the ranking performance of our improved algorithm with three well-known LTR algorithms, λ-MART [5, 6], RankBoost [11], and AdaRank [12], over three standard benchmark LTR datasets: one large-scale LTR dataset, MSLR-WEB30K (http://research.microsoft.com/en-us/projects/mslr/download.aspx), with 30000 queries and a total of 136 features, from the Microsoft LTR datasets; and two small LTR datasets, MQ2007 (http://research.microsoft.com/en-us/um/beijing/projects/letor/letor4dataset.aspx), with 1692 queries and a total of 46 features, from the LETOR 4.0 datasets, and WCL2R, with 79 queries and a total of 29 features, released by Alcântara et al. [28], which includes click-through features. In order to make the order of magnitude of the gradient values and the learning rate of our improved algorithm the same as λ-MART's, we set its parameters, introduced in Section 3, accordingly, with p = 2 or 3; the two settings of p are reported as separate variants in the results. The values of the other parameters of our improved algorithm are the same as λ-MART's. We use the effectiveness measure of [23] as the optimization criterion during the training process for each LTR algorithm and each dataset. Based on the produced ranking models, we calculate the effectiveness measures and our introduced Matthew effect measures (including the Gini coefficient, the semivariances, and the quantity statistics, according to (10), (12), (13), and (14)). These results are shown in Figures 1–9. All experimental results are summed over the five folds of the test data. Meanwhile, comprehensive evaluations are conducted and the performance of the strategy of dynamically decreasing the learning rate is also evaluated. These results are shown in Figures 10–12.

5.1. Experiments on Large Dataset: MSLR-WEB30K

Figures 1–6 provide the total results of the five folds on the MSLR-WEB30K dataset in terms of the mean effectiveness M, the Gini coefficient G, the two semivariances, and the quantity statistics s_1 and s_4, respectively, for the different LTR algorithms.

From the mean effectiveness results in Figure 1, we can see that the values of M obtained by our improved algorithm are bigger than those of λ-MART, RankBoost, and AdaRank. These results show that our improved algorithm obtains the best effectiveness. There are two main reasons. One is that our improved algorithm incorporates the effectiveness evaluation measure into the gradient and combines the overall scores on the training and validation datasets to choose the final ranking model. The other is that it uses the strategy of dynamically decreasing the learning rate to obtain an optimal tree. The main reason why the effectiveness of the p = 2 variant outperforms the p = 3 variant is that enhancing the Matthew effect of a ranking model usually reduces its effectiveness, while the variant with p = 3 strengthens the differences of the gradients and the differences of the effectiveness of queries more than the variant with p = 2 does. Therefore, the p = 3 variant strengthens the optimization of the Matthew effect more than the p = 2 variant does.

From the Gini coefficient results in Figure 2, both the p = 2 and p = 3 variants obtain a bigger value of G than λ-MART. These results show that the effectiveness across individual queries differs more under our improved algorithm, which indicates that the ranking models trained by it have a stronger Matthew effect than those of λ-MART. Figure 2 also indicates that the values of G of the p = 3 variant are bigger than those of the p = 2 variant. The main reason is the larger value of p: the bigger the value of p, the greater the difference between the weights assigned to rich and poor queries, so rich queries receive more optimization than poor queries in the process of learning a ranking model. Therefore, the p = 3 variant strengthens the optimization of the Matthew effect compared with the p = 2 variant, and its Matthew effect is highlighted to a greater extent.

From Figures 3 and 4, the values of the semivariances σ²_up and σ²_down generated by the p = 2 and p = 3 variants are bigger than those of λ-MART, RankBoost, and AdaRank in most cases (except that, for some measures, the values generated by the two variants are smaller than those of RankBoost and AdaRank, and in one case the value generated by one variant is smaller than λ-MART's). These results demonstrate that the effectiveness across individual queries also differs more under our improved algorithm in general. Therefore, the ranking models trained by our improved algorithm usually exhibit a stronger Matthew effect than those of λ-MART, RankBoost, and AdaRank.

From Figures 5 and 6, the quantity statistics with regard to rich queries (s_4) and with regard to poor queries (s_1) obtained by the p = 2 and p = 3 variants are bigger than those of λ-MART, RankBoost, and AdaRank in most cases. These results show that our improved algorithm leads to a relative polarization of query effectiveness, thereby exhibiting a stronger Matthew effect than λ-MART in general.

Since our improved algorithm highlights the differences of upward or downward ranking force between the documents of rich queries (with high effectiveness) and the documents of poor queries (with low effectiveness), it strengthens the optimization of rich queries and weakens the optimization of poor queries during the training of the ranking models. Consequently, more force is used to optimize the ranked positions of the documents of rich queries and less for those of poor queries. As a result, the improved algorithm increases the degree of difference, or dispersion, in effectiveness across all queries. Therefore, the ranking models trained by our improved algorithm show a stronger Matthew effect than those of λ-MART.

5.2. Experiments on Small Datasets: WCL2R and MQ2007

Wu et al. [5] have pointed out that smaller trees and shrinkage may be used when the training dataset is small. WCL2R, with 79 queries, and MQ2007, with 1692 queries, are small datasets, so we reduce the number of trees and the shrinkage of our improved algorithm accordingly. We use the effectiveness measure of [23] as the optimization criterion during the training process on WCL2R and MQ2007 for these LTR algorithms. Figures 7–9 provide the overall results of the five folds on the two datasets in terms of the effectiveness and Matthew effect measures for these LTR algorithms. These results also show that our improved algorithm performs better than the others in most cases.

5.3. The Comprehensive Evaluation: Winning Number

We use the winning number to comprehensively evaluate the effectiveness and Matthew effect of LTR algorithms. The winning number was introduced by Liu [29] and counts the number of times a method beats the other ones over all datasets. We extend it to a cross-measure winning number for each LTR algorithm, which counts the number of times a method beats the other ones over all measures and datasets. We redefine the winning number WN_i as follows:

WN_i = Σ_{j=1}^{n} Σ_{d=1}^{D} Σ_{m=1}^{T} I(P_i(d, m) > P_j(d, m)),    (15)

where n is the number of compared methods, i and j are the indices of the compared methods, D is the number of LTR datasets, d is the index of an LTR dataset, T is the number of measures, m is the index of a measure, P_i(d, m) is the performance of the i-th method on the d-th dataset in terms of the m-th measure, and I(·) is the indicator function, which equals 1 if its argument holds and 0 otherwise. WN_i is a comprehensive evaluation of ranking performance: the larger WN_i is, the better the i-th method performs.
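The cross-measure winning number in (15) could be computed as in the following sketch; the method, dataset, and measure labels are placeholders invented for the example.

```python
def winning_numbers(perf):
    """Cross-measure winning number WN_i for each method, as in (15).
    `perf[method][dataset][measure]` is the method's score on that
    dataset under that measure (higher is better)."""
    methods = list(perf)
    datasets = list(next(iter(perf.values())))
    measures = list(next(iter(next(iter(perf.values())).values())))
    wn = {}
    for i in methods:
        wn[i] = sum(perf[i][d][m] > perf[j][d][m]
                    for j in methods if j != i
                    for d in datasets
                    for m in measures)
    return wn

perf = {  # placeholder scores: perf[method][dataset][measure]
    "ours":  {"MQ2007": {"G": 0.30, "M": 0.46}},
    "lmart": {"MQ2007": {"G": 0.22, "M": 0.44}},
}
print(winning_numbers(perf))  # {'ours': 2, 'lmart': 0}
```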

Figure 10 shows the winning numbers of these LTR algorithms on the three datasets in terms of effectiveness and in terms of the Matthew effect measures (the Gini coefficient, the two semivariances, and the quantity statistics s_1 and s_4). From this figure, we can observe the following facts:

(1) In terms of the winning number of effectiveness, the p = 2 and p = 3 variants are better than λ-MART, RankBoost, and AdaRank, and λ-MART is better than RankBoost and AdaRank.

(2) In terms of the winning number of the Matthew effect, the p = 2 and p = 3 variants are also better than λ-MART, RankBoost, and AdaRank, but λ-MART is worse than RankBoost and AdaRank.

5.4. Performance of the Strategy of Dynamically Decreasing the Learning Rate

We conduct experiments on the MSLR-WEB30K dataset to evaluate the advantage of the strategy of dynamically decreasing the learning rate; that is, to verify that the effectiveness and convergence efficiency are indeed enhanced when the dynamically decreasing learning rate replaces a fixed one. The fixed learning rate in the λ-MART algorithm is replaced with the dynamic learning rate η_k of Section 3.4. For this large-scale dataset, in order to keep the other parameter values consistent with those of the algorithm with the fixed learning rate, we set only the maximum and minimum learning rates of the dynamic schedule and leave the other parameters unchanged. We use the effectiveness measures of [23] and [24] as the optimization criteria during the training of the ranking models, respectively. Figure 11 reports the overall effectiveness results on the five folds, and Figure 12 reports the corresponding total numbers of iterations. Obviously, the strategy of dynamically decreasing the learning rate not only improves the effectiveness of the ranking models but also enhances the convergence efficiency of our improved algorithm.

In summary, the experimental results on three benchmark datasets demonstrate that our improved algorithm achieves better performance in terms of both the Matthew effect and effectiveness compared with state-of-the-art LTR algorithms. In addition, the experiments show that our proposed strategy of dynamically decreasing the learning rate is better than the fixed learning rate of the λ-MART algorithm.

6. Conclusion

In this paper, we integrate the idea of the Matthew effect into an LTR algorithm and present a new gradient function for the λ-MART algorithm that highlights the Matthew effect; we also prove that the function satisfies the consistency property. In addition, we propose two strategies to evaluate the ranking models and to dynamically decrease the learning rate, respectively. They not only strengthen the Matthew effect but also improve the effectiveness of the produced ranking models. The experiments confirm that our improvements work.

Some interesting issues can be explored in the future research.

(1) Other learning rate schedules, such as an optimal learning rate and an adaptive learning rate, can be considered in LTR.

(2) Different information enjoys different popularity in different time periods, so the popularity of users' queries keeps changing over time. Some queries (hot queries) gain huge popularity with numerous searchers, while other queries (cold queries) are just the opposite. We plan to integrate hot and cold queries into an LTR algorithm.

(3) The multiobjective evolutionary algorithms and reinforcement learning techniques can be incorporated into LTR to develop a multiobjective and online LTR algorithm.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants nos. 61762052 and 61572360, in part by the Natural Science Foundation of Jiangxi Province of China under Grant no. 20171BAB202010, in part by the Opening Foundation of Network and Data Security Key Laboratory of Sichuan Province under Grant no. NDSMS201602, and in part by the Doctoral Scientific Research Startup Foundation of Jinggangshan University under Grant no. JZB1804.