nips nips2012 nips2012-323 knowledge-graph by maker-knowledge-mining

323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space


Source: pdf

Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu

Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown to be very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications for some empirical findings on pairwise ranking methods that were previously unexplained, bridging the gap between theory and applications.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper is concerned with the statistical consistency of ranking methods. [sent-9, score-0.774]

2 Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. [sent-10, score-1.5]

3 This result is interesting but also surprising, given that the pairwise ranking methods have been shown to be very effective in practice. [sent-11, score-0.735]

4 In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. [sent-12, score-0.091]

5 We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. [sent-13, score-1.077]

6 What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. [sent-14, score-0.165]

7 Our studies provide theoretical justifications for some empirical findings on pairwise ranking methods that were previously unexplained, bridging the gap between theory and applications. [sent-15, score-0.87]

8 1 Introduction. Ranking is a central problem in many applications, such as document retrieval, meta search, and collaborative filtering. [sent-16, score-0.069]

9 In recent years, machine learning technologies called ‘learning to rank’ have been successfully applied. [sent-17, score-0.032]

10 In training, a number of sets (queries) of objects (documents) are given and within each set the objects are labeled by assessors, mainly based on multi-level ratings. [sent-19, score-0.295]

11 The target of learning is to create a model that provides a ranking over the objects that best respects the observed labels. [sent-20, score-0.672]

12 In testing, given a new set of objects, the trained model is applied to generate a ranked list of the objects. [sent-21, score-0.158]

13 Ideally, the learning process should be guided by minimizing a true loss such as the weighted pairwise disagreement loss (WPDL) [11], which encodes people’s knowledge on ranking evaluation. [sent-22, score-1.263]
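
To make the true loss in the sentence above concrete, here is a minimal sketch of a weighted pairwise disagreement loss in the spirit of [11]. The weight matrix, the half-penalty convention for ties, and the function name are assumptions of this note, not the paper's definitions.

```python
import numpy as np

def wpdl(scores, weights):
    """Sketch of a weighted pairwise disagreement loss (WPDL).

    weights[i, j] > 0 means object i should be ranked above object j, with
    the value measuring how costly it is to violate that preference.
    Counting ties as half an error is a common convention assumed here.
    """
    loss = 0.0
    m = len(scores)
    for i in range(m):
        for j in range(m):
            if weights[i, j] > 0:
                if scores[i] < scores[j]:
                    loss += weights[i, j]        # preferred object ranked lower
                elif scores[i] == scores[j]:
                    loss += 0.5 * weights[i, j]  # tie counted as half an error
    return loss
```

For instance, with scores = np.array([2.0, 1.0, 0.5]) and weights = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]]), the loss is 0 because the scores respect every weighted preference.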

14 However, the minimization can be very difficult due to the nonconvexity of the true loss. [sent-23, score-0.078]

15 For example, RankSVM [14], RankBoost [12], and RankNet [3] minimize the hinge loss, the exponential loss, and the cross-entropy loss, respectively. [sent-25, score-0.046]

16 In machine learning, statistical consistency is regarded as a desired property of a learning method [1, 21, 20], which reveals the statistical connection between a surrogate loss function and the true loss. [sent-26, score-0.681]
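
One standard way to state this connection (our paraphrase; the paper's formal statement is given in Section 2) is that a surrogate loss is consistent with the true loss if driving the expected surrogate risk to its infimum also drives the expected true risk to its infimum:

```latex
R_{\phi}(f) = \mathbb{E}_{(x,r)\sim P}\!\left[\ell_{\phi}(f;x,r)\right],
\qquad
R(f) = \mathbb{E}_{(x,r)\sim P}\!\left[\ell(f;x,r)\right];
\qquad
R_{\phi}(f_n)\to\inf_{f\in\mathcal{F}}R_{\phi}(f)
\;\Longrightarrow\;
R(f_n)\to\inf_{f\in\mathcal{F}}R(f).
```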

17 Statistical consistency in the context of ranking has been actively studied in recent years [8, 9, 19, 11, 2, 18]. [sent-27, score-0.77]

18 According to the studies in [11], many existing pairwise ranking methods are, surprisingly, inconsistent with WPDL, even in a low-noise setting. [sent-28, score-0.859]

19 However, as we know, the pairwise ranking methods have been shown to work very well in practice, and have been regarded as state-of-the-art even today [15, 16, 17]. [sent-29, score-0.829]

20 For example, the experimental results in [2] show that a weighted preorder loss in RankSVM [4] can outperform a consistent surrogate loss in terms of NDCG (See Table 2 in [2]). [sent-30, score-0.617]

21 The contradiction between theory and application inspires us to revisit the statistical consistency of pairwise ranking methods. [sent-31, score-1.095]

22 In particular, we will study whether there exists a new assumption on the probability space that can make statistical consistency naturally hold, and how this new assumption compares with the low-noise setting used in [11]. [sent-32, score-0.386]

23 To perform our study, we first derive a sufficient condition for statistical consistency of ranking methods called rank-consistency, which is very similar in nature to edge-consistency in [11] and order-preserving in [2]. [sent-33, score-0.748]

24 Then we give an assumption on the probability space where ratings (labels) of objects come from, which we call a rank-differentiable probability space (RDPS). [sent-34, score-0.428]

25 Intuitively, RDPS reveals the reason why an object (denoted as object A) should be ranked higher than another object (denoted as object B). [sent-35, score-0.512]

26 That is, the probability of any ratings consistent with the preference is larger than that of its dual ratings (obtained by exchanging the labels of object A and object B while keeping others unchanged). [sent-36, score-0.643]
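
As a reading aid, the informal condition above can be written as a small check. The sketch below assumes a finite probability space represented as a dictionary from rating tuples to probabilities; the function name and this exact formalization are ours, and the paper's formal definition of RDPS in Section 3 may differ in its details.

```python
def rdps_holds_for_pair(prob, a, b):
    """Illustrative check of the RDPS-style condition for one pair where
    object `a` is preferred to object `b`.

    prob: dict mapping a rating tuple (one label per object) to its probability.
    Every rating vector consistent with the preference (r[a] > r[b]) must have
    strictly larger probability than its dual, obtained by swapping the labels
    of a and b while keeping the other labels unchanged.
    """
    for r, p in prob.items():
        if r[a] > r[b]:
            dual = list(r)
            dual[a], dual[b] = dual[b], dual[a]
            if p <= prob.get(tuple(dual), 0.0):
                return False
    return True
```

For example, with two objects and prob = {(1, 0): 0.6, (0, 1): 0.4}, rdps_holds_for_pair(prob, 0, 1) returns True, since the rating consistent with preferring object 0 is more likely than its dual.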

27 We then prove that with the RDPS assumption, the weighted pairwise surrogate loss, which is a generalization of many surrogate loss functions used in existing pairwise ranking methods (e.g., [sent-37, score-1.439]

28 the preorder loss in RankSVM [2], the exponential loss in RankBoost [12], and the logistic loss in RankNet [3]), is statistically consistent with WPDL. [sent-39, score-0.592]
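
To make this generalization concrete, here is a minimal sketch of a weighted pairwise surrogate loss with the three surrogates named above (hinge, exponential, logistic) applied to score differences. The weighting scheme and the variable names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Pairwise surrogates phi(t) applied to the score margin t = f(x_i) - f(x_j);
# the correspondence to RankSVM / RankBoost / RankNet follows the text above.
SURROGATES = {
    "hinge":       lambda t: np.maximum(0.0, 1.0 - t),
    "exponential": lambda t: np.exp(-t),
    "logistic":    lambda t: np.logaddexp(0.0, -t),  # log(1 + exp(-t)), stable form
}

def weighted_pairwise_surrogate_loss(scores, weights, phi):
    """Sketch of sum_{i,j} weights[i, j] * phi(scores[i] - scores[j]) over all
    weighted pairs; weights[i, j] > 0 means object i should rank above j."""
    loss = 0.0
    m = len(scores)
    for i in range(m):
        for j in range(m):
            if weights[i, j] > 0:
                loss += weights[i, j] * phi(scores[i] - scores[j])
    return loss
```

Minimizing such a convex surrogate is what the pairwise methods actually do in place of the nonconvex WPDL, which is why their consistency with WPDL matters.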

29 Please note that our theoretical result contradicts the result obtained in [11], mainly due to the different assumptions used. [sent-40, score-0.132]

30 What is interesting, and to some extent inspiring, is that our RDPS assumption is not stronger than the low-noise setting used in [11], and in some sense they are very similar to each other (although they focus on different aspects of the probability space). [sent-41, score-0.149]

31 We then conducted detailed comparisons between them to gain more insights on what affects the consistency of ranking. [sent-42, score-0.248]

32 According to our theoretical analysis, we argue that it is not yet appropriate to draw any conclusion about the inconsistency of pairwise ranking methods, especially because it is hard to know what the probability space really is. [sent-43, score-0.987]

33 In this sense, we think the pairwise ranking methods are still good choices for real ranking applications, due to their good empirical performances. [sent-44, score-1.265]

34 Section 2 defines the consistency problem formally and provides a sufficient condition under which consistency with WPDL is achieved for ranking methods. [sent-46, score-0.901]

35 Section 3 gives the main theoretical results, including the formal definition of RDPS and conditions for the statistical consistency of pairwise ranking methods. [sent-47, score-1.025]

36 Further discussions on whether RDPS is a strong assumption and why our results contradict those in [11] are presented in Section 4. [sent-48, score-0.117]

37 2 Preliminaries of Statistical Consistency. Let x = {x_1, ..., x_m} be a set of objects to be ranked. [sent-50, score-0.154]

38 Suppose the labels of the objects are given as multi-level ratings r = (r_1, ..., r_m) from a space R, where r_i denotes the label of x_i. [sent-51, score-0.408]

39 Without loss of generality, we adopt the K-level ratings used in [7], that is, r_i ∈ {0, 1, ..., K−1}. [sent-52, score-0.376]

40 If r_i > r_j, then x_i should be ranked higher than x_j. [sent-53, score-0.198]

41 Assume that (x, r) is a random variable on the space X × R distributed according to a probability measure P. [sent-54, score-0.055]

42 Following the existing literature, let f be a ranking function that gives a score to each object to produce a ranked list, and let F denote the space of all ranking functions. [sent-55, score-1.315]
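
To make these preliminaries concrete, here is a minimal sketch of how a scoring function induces a ranked list and which ordered pairs the K-level ratings constrain. The helper names are ours and purely illustrative.

```python
def ranked_list(f, objects):
    """Indices of the objects sorted by descending score under ranking function f."""
    return sorted(range(len(objects)), key=lambda i: f(objects[i]), reverse=True)

def preferred_pairs(ratings):
    """All ordered pairs (i, j) with r_i > r_j, i.e. x_i should be ranked above x_j."""
    return [(i, j)
            for i, ri in enumerate(ratings)
            for j, rj in enumerate(ratings)
            if ri > rj]
```

With ratings (2, 0, 1), preferred_pairs returns [(0, 1), (0, 2), (2, 1)], the pairs any correct ranking of the three objects must respect.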


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ranking', 0.507), ('rdps', 0.465), ('wpdl', 0.349), ('pairwise', 0.228), ('consistency', 0.197), ('ranksvm', 0.154), ('ratings', 0.146), ('loss', 0.139), ('surrogate', 0.128), ('objects', 0.127), ('preorder', 0.116), ('ranked', 0.115), ('disagreement', 0.106), ('ranknet', 0.103), ('inspiring', 0.103), ('rankboost', 0.103), ('chinese', 0.093), ('object', 0.088), ('academy', 0.074), ('inconsistent', 0.069), ('ag', 0.064), ('regarded', 0.057), ('weighted', 0.057), ('ri', 0.055), ('nonconvexity', 0.051), ('labels', 0.049), ('inspires', 0.047), ('conclusive', 0.047), ('unexplained', 0.047), ('contradict', 0.047), ('assumption', 0.045), ('reveals', 0.045), ('argue', 0.045), ('inconsistency', 0.045), ('meta', 0.045), ('statistical', 0.044), ('list', 0.043), ('contradicts', 0.042), ('exchanging', 0.042), ('asia', 0.042), ('mainly', 0.041), ('ndcg', 0.041), ('technology', 0.039), ('consistent', 0.038), ('sciences', 0.038), ('respects', 0.038), ('guided', 0.038), ('stronger', 0.037), ('institute', 0.037), ('today', 0.037), ('revisit', 0.037), ('adopt', 0.036), ('years', 0.036), ('lan', 0.035), ('contradiction', 0.035), ('guo', 0.033), ('technologies', 0.032), ('cheng', 0.031), ('unchanged', 0.031), ('space', 0.031), ('ij', 0.031), ('studies', 0.031), ('actively', 0.03), ('bridge', 0.03), ('please', 0.03), ('really', 0.029), ('rj', 0.028), ('rank', 0.028), ('xm', 0.027), ('true', 0.027), ('theoretical', 0.027), ('queries', 0.026), ('concerned', 0.026), ('insights', 0.026), ('know', 0.026), ('hinge', 0.025), ('affects', 0.025), ('discussions', 0.025), ('especially', 0.025), ('existing', 0.024), ('probability', 0.024), ('collaborative', 0.024), ('preliminaries', 0.024), ('microsoft', 0.024), ('aforementioned', 0.024), ('ndings', 0.023), ('people', 0.023), ('think', 0.023), ('surprising', 0.022), ('ideally', 0.022), ('formal', 0.022), ('keeping', 0.022), ('extent', 0.022), ('assumptions', 0.022), ('denoted', 0.022), ('encodes', 0.022), ('statistically', 0.021), ('minimize', 0.021), ('sense', 0.021), ('alternatively', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu

Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown to be very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications for some empirical findings on pairwise ranking methods that were previously unexplained, bridging the gap between theory and applications.

2 0.24984808 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm

Author: Jingrui He, Hanghang Tong, Qiaozhu Mei, Boleslaw Szymanski

Abstract: Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

3 0.17315157 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborative filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from a very different item domain, without any retraining. 1

4 0.17022654 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

Author: Harish G. Ramaswamy, Shivani Agarwal

Abstract: We study consistency properties of surrogate loss functions for general multiclass classification problems, defined by a general loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be classification calibrated with respect to a loss matrix in this setting. We then introduce the notion of classification calibration dimension of a multiclass loss matrix, which measures the smallest ‘size’ of a prediction space for which it is possible to design a convex surrogate that is classification calibrated with respect to the loss matrix. We derive both upper and lower bounds on this quantity, and use these results to analyze various loss matrices. In particular, as one application, we provide a different route from the recent result of Duchi et al. (2010) for analyzing the difficulty of designing ‘low-dimensional’ convex surrogates that are consistent with respect to pairwise subset ranking losses. We anticipate the classification calibration dimension may prove to be a useful tool in the study and design of surrogate losses for general multiclass learning problems. 1

5 0.17000876 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1

6 0.16228752 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

7 0.11929671 165 nips-2012-Iterative ranking from pair-wise comparisons

8 0.11762369 148 nips-2012-Hamming Distance Metric Learning

9 0.10419806 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

10 0.1000898 286 nips-2012-Random Utility Theory for Social Choice

11 0.084100224 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

12 0.067551486 16 nips-2012-A Polynomial-time Form of Robust Regression

13 0.061364546 196 nips-2012-Learning with Partially Absorbing Random Walks

14 0.059958443 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

15 0.056105323 227 nips-2012-Multiclass Learning with Simplex Coding

16 0.053617664 60 nips-2012-Bayesian nonparametric models for ranked data

17 0.053138033 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

18 0.048895981 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

19 0.04717524 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

20 0.044654828 30 nips-2012-Accuracy at the Top


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.118), (1, 0.023), (2, -0.003), (3, -0.034), (4, 0.044), (5, -0.107), (6, 0.001), (7, 0.182), (8, 0.022), (9, 0.228), (10, -0.105), (11, 0.048), (12, -0.069), (13, 0.044), (14, 0.116), (15, 0.022), (16, 0.068), (17, 0.014), (18, 0.001), (19, 0.082), (20, 0.043), (21, -0.006), (22, -0.081), (23, 0.149), (24, 0.066), (25, 0.047), (26, 0.033), (27, 0.111), (28, -0.124), (29, -0.064), (30, 0.162), (31, -0.189), (32, 0.11), (33, 0.039), (34, 0.109), (35, -0.031), (36, 0.006), (37, -0.069), (38, 0.083), (39, -0.015), (40, 0.041), (41, -0.051), (42, 0.007), (43, 0.07), (44, -0.049), (45, -0.04), (46, -0.043), (47, 0.007), (48, -0.026), (49, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97475207 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu

Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown to be very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications for some empirical findings on pairwise ranking methods that were previously unexplained, bridging the gap between theory and applications.

2 0.886087 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm

Author: Jingrui He, Hanghang Tong, Qiaozhu Mei, Boleslaw Szymanski

Abstract: Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

3 0.80292046 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1

4 0.5962739 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

Author: Kevin Swersky, Brendan J. Frey, Daniel Tarlow, Richard S. Zemel, Ryan P. Adams

Abstract: In categorical data there is often structure in the number of variables that take on each label. For example, the total number of objects in an image and the number of highly relevant documents per query in web search both tend to follow a structured distribution. In this paper, we study a probabilistic model that explicitly includes a prior distribution over such counts, along with a count-conditional likelihood that defines probabilities over all subsets of a given size. When labels are binary and the prior over counts is a Poisson-Binomial distribution, a standard logistic regression model is recovered, but for other count distributions, such priors induce global dependencies and combinatorics that appear to complicate learning and inference. However, we demonstrate that simple, efficient learning procedures can be derived for more general forms of this model. We illustrate the utility of the formulation by exploring applications to multi-object classification, learning to rank, and top-K classification. 1

5 0.52988601 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborative filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from a very different item domain, without any retraining. 1

6 0.5210728 196 nips-2012-Learning with Partially Absorbing Random Walks

7 0.51922387 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

8 0.48480353 165 nips-2012-Iterative ranking from pair-wise comparisons

9 0.46709144 286 nips-2012-Random Utility Theory for Social Choice

10 0.44483832 280 nips-2012-Proper losses for learning from partial labels

11 0.42384049 22 nips-2012-A latent factor model for highly multi-relational data

12 0.41077176 148 nips-2012-Hamming Distance Metric Learning

13 0.39280167 330 nips-2012-Supervised Learning with Similarity Functions

14 0.38888919 30 nips-2012-Accuracy at the Top

15 0.36964405 338 nips-2012-The Perturbed Variation

16 0.35658774 217 nips-2012-Mixability in Statistical Learning

17 0.35650986 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

18 0.32549793 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

19 0.32348698 288 nips-2012-Rational inference of relative preferences

20 0.31754997 16 nips-2012-A Polynomial-time Form of Robust Regression


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.013), (21, 0.014), (38, 0.103), (39, 0.369), (42, 0.02), (74, 0.054), (76, 0.212), (80, 0.092), (92, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86838347 351 nips-2012-Transelliptical Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose a high dimensional semiparametric scale-invariant principal component analysis, named TCA, by utilizing the natural connection between the elliptical distribution family and principal component analysis. The elliptical distribution family includes many well-known multivariate distributions like the multivariate Gaussian, t and logistic, and it was extended to the meta-elliptical family by Fang et al. (2002) using copula techniques. In this paper we extend the meta-elliptical distribution family to an even larger family, called transelliptical. We prove that TCA can obtain a near-optimal s log d/n estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with an explicit rate is also provided. TCA is further implemented in both numerical simulations and large-scale stock data to illustrate its empirical usefulness. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost. 1

2 0.84162581 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

Author: Gal Elidan, Cobi Cario

Abstract: The empirical success of the belief propagation approximate inference algorithm has inspired numerous theoretical and algorithmic advances. Yet, for continuous non-Gaussian domains performing belief propagation remains a challenging task: recent innovations such as nonparametric or kernel belief propagation, while useful, come with a substantial computational cost and offer few theoretical guarantees, even for tree structured models. In this work we present Nonparanormal BP for performing efficient inference on distributions parameterized by a Gaussian copulas network and any univariate marginals. For tree structured networks, our approach is guaranteed to be exact for this powerful class of non-Gaussian models. Importantly, the method is as efficient as standard Gaussian BP, and its convergence properties do not depend on the complexity of the univariate marginals, even when a nonparametric representation is used. 1

3 0.83215988 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

Author: Xiaolong Wang, Liang Lin

Abstract: This paper studies a novel discriminative part-based model to represent and recognize object shapes with an “And-Or graph”. We define this model consisting of three layers: the leaf-nodes with collaborative edges for localizing local parts, the or-nodes specifying the switch of leaf-nodes, and the root-node encoding the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined with optimizing the multi-layer parameters during the iteration. The advantages of our method are two-fold. (i) The And-Or graph model enables us to handle well large intra-class variance and background clutters for object shape detection from images. (ii) The proposed learning algorithm is able to obtain the And-Or graph representation without requiring elaborate supervision and initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), and it outperforms the state-of-the-arts approaches. 1

same-paper 4 0.8051632 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu

Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown to be very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications for some empirical findings on pairwise ranking methods that were previously unexplained, bridging the gap between theory and applications.

5 0.8001008 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

Author: Tianbao Yang, Yu-feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou

Abstract: Both random Fourier features and the Nyström method have been successfully applied to efficient kernel learning. In this work, we investigate the fundamental difference between these two approaches, and how the difference could affect their generalization performances. Unlike approaches based on random Fourier features where the basis functions (i.e., cosine and sine functions) are sampled from a distribution independent from the training data, basis functions used by the Nyström method are randomly sampled from the training examples and are therefore data dependent. By exploring this difference, we show that when there is a large gap in the eigen-spectrum of the kernel matrix, approaches based on the Nyström method can yield impressively better generalization error bound than random Fourier features based approach. We empirically verify our theoretical findings on a wide range of large data sets. 1

6 0.79935133 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

7 0.75793409 352 nips-2012-Transelliptical Graphical Models

8 0.64780283 310 nips-2012-Semiparametric Principal Component Analysis

9 0.63086635 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

10 0.61947 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

11 0.6095137 163 nips-2012-Isotropic Hashing

12 0.60425323 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

13 0.60231119 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

14 0.601825 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

15 0.59529203 75 nips-2012-Collaborative Ranking With 17 Parameters

16 0.59385371 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

17 0.59299868 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

18 0.59151965 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

19 0.58758324 346 nips-2012-Topology Constraints in Graphical Models

20 0.58676767 280 nips-2012-Proper losses for learning from partial labels