nips nips2011 nips2011-20 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). [sent-4, score-0.417]
2 The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. [sent-5, score-0.458]
3 Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). [sent-6, score-0.53]
4 Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. [sent-7, score-0.281]
5 Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? [sent-9, score-0.331]
6 1 Introduction We study the problem of learning to rank from pairwise preferences, and solve an open problem that has led to the development of many heuristics but no provable results. [sent-10, score-0.146]
7 The input is a set V of n elements from some universe, and we wish to linearly order them given pairwise preference labels, each given as a response to a query of the form: which is preferred, u or v? [sent-11, score-0.361]
8 The goal is to linearly order the elements from the most preferred to the least preferred, while disagreeing with as few pairwise preference labels as possible. [sent-13, score-0.507]
9 Our performance is measured by two parameters: The loss (number of disagreements) and query complexity (number of preference responses we need). [sent-14, score-0.393]
10 2 The loss minimization problem given the entire n × n preference matrix is a well known NP-hard problem called MFAST (minimum feedback arc-set in tournaments) [5]. [sent-16, score-0.187]
11 In our case each edge from the input graph is given for a unit cost, hence we seek query efficiency. [sent-18, score-0.207]
12 Our algorithm samples preference labels non-uniformly and adaptively, hence we obtain an active learning algorithm. [sent-19, score-0.368]
13 A typical example of such labels is a 1-5 scale rating for restaurants, or a 0/1 rating (irrelevant/relevant) for candidate documents retrieved for a query (known as the binary ranking problem). [sent-27, score-0.229]
14 The preference graph induced from these labels is transitive, hence no combinatorial problems arise due to nontransitivity. [sent-28, score-0.281]
15 Some LTR literature does consider the pairwise preference label approach, and there is much justification for it (see [11, 22] and references therein). [sent-30, score-0.288]
16 [26]) discuss pairwise or higher order (listwise) approaches, but a close inspection reveals that they do not use pairwise (or listwise) labels, only pairwise (or listwise) loss functions. (Author footnote: Supported by a Marie Curie International Reintegration Grant PIRG07-GA-2010-268403.) [sent-33, score-0.303]
17 A key change to their algorithm, which is not query efficient, involves careful sampling followed by iterated sample refreshing steps. [sent-37, score-0.212]
18 General-purpose bounds are difficult to use: [8] provides general active learning bounds which are hard to instantiate in specific problems; the A2 algorithm [7], analyzed in [21] using the disagreement coefficient, is not useful here. [sent-44, score-0.159]
19 [10] work in a Bayesian setting; in [19], the input preference graph is transitive, and labels are nondeterministic. [sent-47, score-0.243]
20 In this work the input is worst case and not Bayesian, query responses are deterministic and elements do not necessarily have a latent value. [sent-49, score-0.242]
21 Paper Organization: Section 2 presents basic definitions and lemmata, and in particular defines what a good decomposition is and how it can be used in learning permutations from pairwise preferences. [sent-50, score-0.273]
22 Section 3 presents our main active learning algorithm which is, in fact, an algorithm for producing a good decomposition query efficiently. [sent-51, score-0.305]
23 We assume a preference function W on pairs of elements in V, which is unknown to us. [sent-56, score-0.26]
24 We wish to predict W using a hypothesis h from concept class H = Π(V ), where Π(V ) is the set of permutations π over V viewed equivalently as binary functions over V × V satisfying, for all u, v, w ∈ V , π(u, v) = 1 − π(v, u) and π(u, w) = 1 whenever π(u, v) = π(v, w) = 1. [sent-61, score-0.123]
25 Abusing notation, we also view permutations as injective functions from [n] to V , so that the element π(1) ∈ V is in the first, most preferred position and π(n) is the least preferred one. [sent-63, score-0.318]
26 In this paper we show an improved, almost optimal statistical learning theoretical bound using recent important breakthroughs in combinatorial optimization of a related problem called minimum feedback arc-set in tournaments (MFAST). [sent-69, score-0.139]
27 This PTAS is not useful however for the purpose of learning to rank from pairwise preferences because it is not query efficient. [sent-77, score-0.379]
28 Given a set V of size n, an ordered decomposition is a list of pairwise disjoint subsets V1 , . . . , Vk of V whose union is V . [sent-82, score-0.15]
29 For a permutation π ∈ Π(V ) we let π|Vi denote its restriction to the elements of Vi (hence, π|Vi ∈ Π(Vi )). [sent-90, score-0.203]
30 We denote the set of permutations π ∈ Π(V ) respecting the decomposition V1 , . . . , Vk by Π(V1 , . . . , Vk ). [sent-95, score-0.203]
31 We say that a subset U of V is small in V if |U | ≤ log n/ log log n, otherwise we say that U is big in V . [sent-102, score-0.322]
32 The empirical cost is CE (π, V, W ) = (n choose 2) · (1/|E|) · Σ_{(u,v)∈E} 1[u ≺π v] · W (v, u). (We assume throughout that E is chosen with repetitions and is hence a multiset; the accounting of parallel edges is clear.) [sent-116, score-0.103]
33 The VC dimension of the set of permutations on V , viewed as binary classifiers on pairs of elements, is n − 1. [sent-122, score-0.123]
34 It is easy to show that the VC dimension is at most O(n log n), which is the logarithm of the number of permutations. [sent-123, score-0.089]
35 If E is chosen uniformly at random (with repetitions) as a sample of m elements from V × V , where m > n, then with probability at least 1 − δ over the sample, all permutations π satisfy: |CE (π, V, W ) − C(π, V, W )| = n2 · O(√((n log m + log(1/δ))/m)). [sent-127, score-0.362]
36 Hence, if we want to minimize C(π, V, W ) over π to within an additive error of µn2 with probability at least 1 − δ, it suffices to choose a sample E of m = O(µ−2 (n log n + log δ −1 )) elements from V × V uniformly at random (with repetitions), and optimize CE (π, V, W ) instead. [sent-128, score-0.239]
37 3 Assume δ ≥ e−n , so that we get a more manageable sample bound of O(µ−2 n log n). [sent-129, score-0.132]
38 For two permutations π, σ, the Kendall-Tau metric dτ (π, σ) is defined as dτ (π, σ) = Σ_{u≠v} 1[(u ≺π v) ∧ (v ≺σ u)]. [sent-131, score-0.123]
39 The Spearman Footrule metric dfoot (π, σ) is defined as dfoot (π, σ) = Σ_{u∈V} |ρπ (u) − ρσ (u)|. [sent-132, score-0.412]
40 The following is well known [18]: dτ (π, σ) ≤ dfoot (π, σ) ≤ 2dτ (π, σ). (2.3) [sent-133, score-0.206]
41 Clearly C(·, V, ·) extends dτ (·, ·) to distances between permutations and binary tournaments, with the triangle inequality dτ (π, σ) ≤ C(π, V, W ) + C(σ, V, W ) satisfied for all W and π, σ ∈ Π(V ). [sent-135, score-0.172]
42 By definition of dfoot , this means that the average element v ∈ V is translated Ω(µn) positions away from its position in π ∗ . [sent-141, score-0.303]
43 Here (V choose 2) denotes the set of unordered pairs of distinct elements in V . [sent-145, score-0.11]
44 Assume E is chosen uniformly at random (with repetitions) as a sample of m elements from ∪_{i∈B} (Vi choose 2), where m > n. [sent-162, score-0.15]
45 Then with probability at least 1 − e−n over the sample, all permutations π ∈ Π(V ) satisfy: [sent-176, score-0.123]
46 |CE (π, {V1 , . . . , Vk }, W ) − Σ_{i∈B} C(π|Vi , Vi , W |Vi )| = (Σ_{i∈B} (ni choose 2)) · O(√((n log m + log(1/δ))/m)). [sent-179, score-0.147]
47 , n, and let E denote a random sample of O(ε−2 n log n) elements from ∪_{i∈B} (Vi choose 2), each element chosen uniformly at random with repetitions. [sent-193, score-0.324]
48 Given a set V of size n, a preference oracle W and an error tolerance parameter 0 < ε < 1, there exists a poly(n, ε−1 )-time algorithm returning, with constant probability, an ε-good partition of V , querying at most O(ε−6 n log5 n) locations in W on expectation. [sent-211, score-0.225]
49 We define πv→i to be the permutation obtained by moving the rank of v to i in π, and leaving the rest of the elements in the same order. [sent-216, score-0.184]
50 (v, u1 ), . . . , (v, uN ), drawn so that for each j ∈ [N ] the element uj is chosen uniformly at random from among the elements lying between v (exclusive) and position i (inclusive) in π. [sent-234, score-0.204]
51 Additionally, for any δ > 0, except with probability of failure δ, |TestMoveE (π, V, W, v, i) − TestMove(π, V, W, v, i)| = O(|i − ρπ (v)| · √(log(δ −1 )/N )). [sent-236, score-0.089]
52 It is a query efficient improvement of the PTAS in [23] with the following difference: here we are not interested in an approximation algorithm for MFAST, but just in an ε-good decomposition. [sent-239, score-0.169]
53 Whenever we reach a small block (line 3) or a big block with a probably approximately sufficiently high cost (line 8) in our recursion (Algorithm 2), we simply output it as a block in our partition. [sent-240, score-0.141]
54 Due to space limitations, we focus on the differences, and specifically on Procedure ApproxLocalImprove (Algorithm 3), replacing a greedy local improvement step in [23] which is not query efficient. [sent-252, score-0.169]
55 SampleAndRank (Algorithm 1) takes the following arguments: The set V , the preference matrix W and an accuracy argument ε. [sent-253, score-0.187]
56 It is implicitly understood that the argument W passed to SampleAndRank is given as a query oracle, incurring a unit cost upon each access. [sent-254, score-0.213]
57 The query complexity of this step is O(n log n) on expectation (see [3]). [sent-256, score-0.295]
58 The cost C(π, V, W ) of π computed in line 2 of SampleAndRank is O(1) times that of the optimal π ∗ , and the query cost incurred in the computation is O(n log n). [sent-260, score-0.419]
59 ApproxLocalImprove takes a set V of size N , W , a permutation π on V , two numbers C0 , ε and an integer n. [sent-266, score-0.101]
60 The procedure starts by creating a sample ensemble S = {Ev,i : v ∈ V, i ∈ [B, L]}, where B = log Θ(εN/ log n) and L = log N . [sent-272, score-0.359]
61 The size of each Ev,i ∈ S is Θ(ε−2 log2 n), and each element (v, x) ∈ Ev,i was added (with possible multiplicity) by uniformly at random selecting, with repetitions, an element x ∈ V positioned at distance at most 2^i from the position of v in π. [sent-273, score-0.185]
62 Let Dπ denote the distribution space from which S was drawn, and let PrX∼Dπ [X = S] denote the probability of obtaining a given sample ensemble S. [sent-274, score-0.154]
63 S will enable us to approximate the improvement in cost obtained by moving a single element u to position j. [sent-275, score-0.141]
64 Fix u ∈ V and j ∈ [n], and assume log |j − ρπ (u)| ≥ B. [sent-278, score-0.089]
65 6): |TestMoveEu,ℓ (π, V, W, u, j) − TestMove(π, V, W, u, j)| ≤ (1/2) · ε|j − ρπ (u)|/ log n. [sent-286, score-0.089]
66 S is a good approximation if it is successful and a good approximation at all u ∈ V , j ∈ [n] satisfying log |j − ρπ (u)| ∈ [B, L]. [sent-287, score-0.089]
67 In lines 15-18 we sought (using S) an element u and position j, such that moving u to j (giving rise to πu→j ) would considerably improve the cost w.r.t. π. [sent-295, score-0.141]
68 Unfortunately the sample ensemble S becomes stale: even if S was a good approximation, it is no longer necessarily so w.r.t. πu→j . [sent-299, score-0.092]
69 We refresh it in line 16 by applying a transformation ϕu→j on S, resulting in a new sample ensemble ϕu→j (S) approximately distributed by Dπu→j . [sent-303, score-0.174]
70 It can be easily shown that each of T1 and T2 contains O(1) elements that are not contained in the other, and it can be assumed (using a simple clipping argument - omitted) that this number is exactly 1, hence |T1 | = |T2 |. [sent-315, score-0.111]
71 Finally, Algorithm 3 ApproxLocalImprove(V, W, π, ε, n) (Note: π used as both input and output):
1: N ← |V |, B ← log(Θ(εN/ log n)), L ← log N
2: if N = O(ε−3 log3 n) then
3:     return
4: end if
5: for v ∈ V do
6:     r ← ρπ (v)
7:     for i = B . . . [sent-318, score-0.178]
72 (setting ℓ := log |j − ρπ (u)|): while ℓ ∈ [B, L] and TestMoveEu,ℓ (π, V, W, u, j) > ε|j − ρπ (u)|/ log n do ... 16: for v ∈ V , i ∈ [B, L], refresh Ev,i w.r.t. πu→j . [sent-325, score-0.219]
73 The difference between S and S′ , defined as dist(S, S′ ) := Σ_{v,i} |Ev,i ∆ E′v,i |, bounds the query complexity of computing mutations. [sent-352, score-0.242]
74 Let E2 denote the event that the cost approximations obtained in line 5 of SampleAndDecompose are successful at all recursive calls. [sent-367, score-0.116]
75 By Hoeffding tail bounds, this happens with probability 1 − O(n−4 ) for each call; there are O(n log n) calls, hence we can lower bound the probability of success of all executions by 0.6. [sent-368, score-0.163]
76 Let π ∗ denote the optimal permutation for the root call to SampleAndDecompose with V, W, ε. [sent-375, score-0.097]
77 By E1 we conclude that the cost of (πX )u→j is always an actual improvement compared to πX (for the current values of πX , u and j in the iteration), and the improvement in cost is of magnitude at least Ω(ε|ρπX (u) − j|/ log n), which is Ω(ε2 NX / log2 n) due to the use of B defined in line 1. [sent-394, score-0.218]
78 Since C(πX , VX , W|VX ) ≤ (NX choose 2), the number of iterations is hence at most O(ε−2 NX log2 n). [sent-396, score-0.127]
79 The expected query complexity incurred by the call to ApproxLocalImprove is therefore O(ε−5 NX log5 n). [sent-398, score-0.238]
80 Summing over the recursion tree, the total query complexity incurred by calls to ApproxLocalImprove is, on expectation, at most O(ε−5 n log6 n). [sent-399, score-0.312]
81 Let π1X denote the permutation obtained at that point, returned to SampleAndDecompose in line 10. [sent-401, score-0.138]
82 We know by assumption that the last sample ensemble S used in ApproxLocalImprove was a good approximation, hence for all u ∈ VX^long : (*) TestMove(π1X , VX , W|VX , u, ρπ∗X (u)) = O(ε|ρπ1X (u) − ρπ∗X (u)|/ log n). [sent-405, score-0.219]
83 Let VX^cross denote the (random) set of elements u ∈ VX that cross kX . [sent-407, score-0.166]
84 Following (*), the elements u ∈ VX^long can contribute at most O(ε Σ_{u∈VX^long} |ρπ1X (u) − ρπ∗X (u)|/ log n) to TX , which is O(ε dfoot (π1X , π∗X )/ log n). [sent-409, score-0.251]
85 By the triangle inequality and the definition of π∗X , the last expression is O(εC(π1X , VX , W|VX )/ log n). [sent-412, score-0.138]
86 Under the constraints |ρπ1X (u) − ρπ∗X (u)| ≤ dfoot (π1X , π∗X ) and |ρπ1X (u) − ρπ∗X (u)| = O(εNX / log n), this is O(dfoot (π1X , π∗X ) εNX /(NX log n)) = O(dfoot (π1X , π∗X ) ε/ log n). [sent-416, score-0.473]
87 By (2.3) and the triangle inequality, the bound becomes O(εC(π1X , VX , W|VX )/ log n). [sent-418, score-0.138]
88 4 Future Work We presented a statistical learning theoretical active learning result for pairwise ranking. [sent-427, score-0.188]
89 The main vehicle was a query (and time) efficient decomposition procedure, reducing the problem to smaller ones in which the optimal loss is high and uniform sampling suffices. [sent-428, score-0.218]
90 A typical example of such a subspace is the case in which each element v ∈ V has a corresponding feature vector in a real vector space, and we only seek permutations induced by linear score functions. [sent-430, score-0.177]
91 In follow-up work, Ailon, Begleiter and Ezra [1] show a novel technique achieving a slightly better query complexity than here with a simpler proof, while also admitting search in restricted spaces. [sent-431, score-0.247]
92 This also bounds the number of times a sample ensemble is created by O(n4 ), as required by E1 . [sent-435, score-0.128]
93 References [1] Nir Ailon, Ron Begleiter, and Esther Ezra, A new active learning scheme with applications to learning to rank from pairwise preferences, arXiv preprint. [sent-436, score-0.233]
94 [4] Nir Ailon and Kira Radinsky, Ranking from pairs and triplets: Information quality, evaluation methods and query complexity, WSDM, 2011. [sent-445, score-0.169]
95 [8] Maria-Florina Balcan, Steve Hanneke, and Jennifer Vaughan, The true sample complexity of active learning, Machine Learning 80 (2010), 111–139. [sent-467, score-0.167]
96 Sanjoy Dasgupta, Coarse sample complexity bounds for active learning, Advances in Neural Information Processing Systems 18, 2005. [sent-498, score-0.203]
97 [17] Sanjoy Dasgupta, Daniel Hsu, and Claire Monteleoni, A general agnostic active learning algorithm, NIPS, 2007. [sent-504, score-0.139]
98 Yoav Freund, H. Sebastian Seung, Eli Shamir, and Naftali Tishby, Selective sampling using the query by committee algorithm, Machine Learning 28 (1997), 133–168. [sent-518, score-0.169]
99 [21] Steve Hanneke, A bound on the label complexity of agnostic active learning, ICML, 2007. [sent-522, score-0.176]
100 [22] Eyke Hüllermeier, Johannes Fürnkranz, Weiwei Cheng, and Klaus Brinker, Label ranking by learning pairwise preferences, Artificial Intelligence 172 (2008), 1897–1916. [sent-524, score-0.161]
wordName wordTfidf (topN-words)
[('vx', 0.499), ('approxlocalimprove', 0.247), ('vi', 0.24), ('dfoot', 0.206), ('sampleanddecompose', 0.206), ('preference', 0.187), ('mfast', 0.185), ('query', 0.169), ('vk', 0.15), ('testmove', 0.144), ('nx', 0.133), ('sampleandrank', 0.123), ('testmovee', 0.123), ('permutations', 0.123), ('ptas', 0.116), ('ailon', 0.108), ('schudy', 0.103), ('tournaments', 0.103), ('pairwise', 0.101), ('ce', 0.097), ('log', 0.089), ('active', 0.087), ('listwise', 0.082), ('vl', 0.081), ('elements', 0.073), ('vr', 0.071), ('nir', 0.068), ('permutation', 0.066), ('repetitions', 0.065), ('preferences', 0.064), ('fix', 0.062), ('begleiter', 0.062), ('kenyon', 0.062), ('ltr', 0.062), ('kx', 0.061), ('ranking', 0.06), ('tx', 0.06), ('ni', 0.058), ('labels', 0.056), ('big', 0.055), ('element', 0.054), ('agnostic', 0.052), ('multiset', 0.05), ('preferred', 0.049), ('decomposition', 0.049), ('vc', 0.049), ('triangle', 0.049), ('omitted', 0.049), ('ensemble', 0.049), ('transitive', 0.047), ('rank', 0.045), ('cost', 0.044), ('sample', 0.043), ('position', 0.043), ('ei', 0.043), ('lemma', 0.043), ('recursion', 0.042), ('line', 0.041), ('cei', 0.041), ('disagreeing', 0.041), ('ezra', 0.041), ('followup', 0.041), ('footrule', 0.041), ('quicksort', 0.041), ('refresh', 0.041), ('testmoveeu', 0.041), ('balcan', 0.041), ('beygelzimer', 0.041), ('regret', 0.038), ('proposition', 0.038), ('hence', 0.038), ('poly', 0.038), ('querying', 0.038), ('hoeffding', 0.037), ('unordered', 0.037), ('complexity', 0.037), ('breakthroughs', 0.036), ('warren', 0.036), ('disagreements', 0.036), ('claire', 0.036), ('executions', 0.036), ('monteleoni', 0.036), ('scratch', 0.036), ('bounds', 0.036), ('integer', 0.035), ('uniformly', 0.034), ('ron', 0.033), ('lemmata', 0.033), ('spearman', 0.033), ('restriction', 0.033), ('calls', 0.032), ('dasgupta', 0.032), ('arguments', 0.032), ('nition', 0.032), ('incurred', 0.032), ('denote', 0.031), ('chaos', 0.031), ('sorting', 0.031), ('cross', 0.031), ('jk', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1
2 0.13615523 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
3 0.12135402 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
4 0.11821204 226 nips-2011-Projection onto A Nonnegative Max-Heap
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative maxheap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal roottree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p2 ) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1
5 0.10713975 260 nips-2011-Sparse Features for PCA-Like Linear Regression
Author: Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail
Abstract: Principal Components Analysis (PCA) is often used as a feature extraction procedure. Given a matrix X ∈ Rn×d , whose rows represent n data points with respect to d features, the top k right singular vectors of X (the so-called eigenfeatures), are arbitrary linear combinations of all available features. The eigenfeatures are very useful in data analysis, including the regularization of linear regression. Enforcing sparsity on the eigenfeatures, i.e., forcing them to be linear combinations of only a small number of actual features (as opposed to all available features), can promote better generalization error and improve the interpretability of the eigenfeatures. We present deterministic and randomized algorithms that construct such sparse eigenfeatures while provably achieving in-sample performance comparable to regularized linear regression. Our algorithms are relatively simple and practically efficient, and we demonstrate their performance on several data sets.
6 0.10576691 231 nips-2011-Randomized Algorithms for Comparison-based Search
7 0.087226868 162 nips-2011-Lower Bounds for Passive and Active Learning
8 0.079153448 256 nips-2011-Solving Decision Problems with Limited Information
9 0.077261284 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning
10 0.072375439 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time
11 0.067336664 90 nips-2011-Evaluating the inverse decision-making approach to preference learning
12 0.060873751 248 nips-2011-Semi-supervised Regression via Parallel Field Regularization
13 0.056303784 305 nips-2011-k-NN Regression Adapts to Local Intrinsic Dimension
14 0.054532804 28 nips-2011-Agnostic Selective Classification
15 0.054147694 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
16 0.051921554 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits
17 0.051800869 29 nips-2011-Algorithms and hardness results for parallel large margin learning
18 0.050072622 198 nips-2011-On U-processes and clustering performance
19 0.049822845 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
20 0.049521592 21 nips-2011-Active Learning with a Drifting Distribution
topicId topicWeight
[(0, 0.163), (1, -0.043), (2, -0.061), (3, -0.032), (4, 0.034), (5, 0.007), (6, -0.056), (7, -0.057), (8, 0.009), (9, -0.029), (10, 0.007), (11, 0.043), (12, -0.012), (13, -0.002), (14, 0.201), (15, -0.036), (16, 0.073), (17, -0.01), (18, -0.053), (19, 0.021), (20, 0.062), (21, -0.017), (22, 0.093), (23, -0.057), (24, -0.168), (25, 0.065), (26, 0.14), (27, 0.022), (28, -0.024), (29, -0.138), (30, 0.138), (31, -0.095), (32, 0.004), (33, 0.036), (34, -0.019), (35, 0.082), (36, -0.097), (37, -0.02), (38, -0.003), (39, 0.014), (40, -0.031), (41, -0.046), (42, 0.007), (43, -0.048), (44, 0.061), (45, -0.002), (46, 0.097), (47, -0.06), (48, 0.017), (49, -0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.933092 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1
2 0.79344726 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
3 0.69287366 231 nips-2011-Randomized Algorithms for Comparison-based Search
Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer
Abstract: This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which defines approximate triangle n inequalities on ranks. We present a lower bound of Ω(D log D + D2 ) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop 3 a randomized scheme for NN retrieval in O(D3 log2 n + D log2 n log log nD ) 3 questions. The learning requires asking O(nD3 log2 n + D log2 n log log nD ) questions and O(n log2 n/ log(2D)) bits to store.
4 0.65089649 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
5 0.53490925 162 nips-2011-Lower Bounds for Passive and Active Learning
Author: Maxim Raginsky, Alexander Rakhlin
Abstract: We develop unified information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander’s capacity function. The supremum of this function has been recently rediscovered by Hanneke in the context of active learning under the name of “disagreement coefficient.” For passive learning, our lower bounds match the upper bounds of Gin´ and Koltchinskii up to constants and generalize analogous results of Mase sart and N´ d´ lec. For active learning, we provide first known lower bounds based e e on the capacity function rather than the disagreement coefficient. 1
6 0.47990012 272 nips-2011-Stochastic convex optimization with bandit feedback
7 0.43450415 226 nips-2011-Projection onto A Nonnegative Max-Heap
8 0.43227106 256 nips-2011-Solving Decision Problems with Limited Information
9 0.42622724 64 nips-2011-Convergent Bounds on the Euclidean Distance
10 0.41537279 95 nips-2011-Fast and Accurate k-means For Large Datasets
11 0.41457152 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
12 0.41406506 21 nips-2011-Active Learning with a Drifting Distribution
13 0.4105241 260 nips-2011-Sparse Features for PCA-Like Linear Regression
14 0.39980125 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
15 0.39599651 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
16 0.38589668 198 nips-2011-On U-processes and clustering performance
17 0.38186616 90 nips-2011-Evaluating the inverse decision-making approach to preference learning
18 0.38118008 122 nips-2011-How Do Humans Teach: On Curriculum Learning and Teaching Dimension
19 0.34358338 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
20 0.3387326 177 nips-2011-Multi-armed bandits on implicit metric spaces
topicId topicWeight
[(0, 0.031), (4, 0.044), (20, 0.043), (26, 0.493), (31, 0.043), (33, 0.016), (43, 0.045), (45, 0.082), (56, 0.011), (57, 0.026), (74, 0.036), (83, 0.019), (99, 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.91286206 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1
2 0.87353075 162 nips-2011-Lower Bounds for Passive and Active Learning
Author: Maxim Raginsky, Alexander Rakhlin
Abstract: We develop unified information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander’s capacity function. The supremum of this function has been recently rediscovered by Hanneke in the context of active learning under the name of “disagreement coefficient.” For passive learning, our lower bounds match the upper bounds of Gin´ and Koltchinskii up to constants and generalize analogous results of Mase sart and N´ d´ lec. For active learning, we provide first known lower bounds based e e on the capacity function rather than the disagreement coefficient. 1
3 0.83575904 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter
Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1
4 0.80345869 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki
Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1
5 0.79419827 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
Author: Edouard Grave, Guillaume R. Obozinski, Francis R. Bach
Abstract: Using the 1 -norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net. 1
6 0.60428208 22 nips-2011-Active Ranking using Pairwise Comparisons
7 0.53087115 21 nips-2011-Active Learning with a Drifting Distribution
8 0.49690348 29 nips-2011-Algorithms and hardness results for parallel large margin learning
9 0.49634781 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
10 0.4907209 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
11 0.48177686 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits
12 0.47709978 226 nips-2011-Projection onto A Nonnegative Max-Heap
13 0.47196135 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
14 0.47170749 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
15 0.46778175 231 nips-2011-Randomized Algorithms for Comparison-based Search
16 0.46517858 294 nips-2011-Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning
17 0.46302411 222 nips-2011-Prismatic Algorithm for Discrete D.C. Programming Problem
18 0.44947681 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
19 0.44889832 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning
20 0.44275013 256 nips-2011-Solving Decision Problems with Limited Information