nips nips2008 nips2008-168 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon, Kristen Grauman
Abstract: Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. Existing online algorithms offer bounds on worst-case performance, but typically do not perform well in practice as compared to their offline counterparts. We present a new online metric learning algorithm that updates a learned Mahalanobis metric based on LogDet regularization and gradient descent. We prove theoretical worst-case performance bounds, and empirically compare the proposed method against existing online metric learning algorithms. To further boost the practicality of our approach, we develop an online locality-sensitive hashing scheme which leads to efficient updates to data structures used for fast approximate similarity search. We demonstrate our algorithm on multiple datasets and show that it outperforms relevant baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. [sent-4, score-0.3]
2 However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. [sent-5, score-0.364]
3 Existing online algorithms offer bounds on worst-case performance, but typically do not perform well in practice as compared to their offline counterparts. [sent-6, score-0.242]
4 We present a new online metric learning algorithm that updates a learned Mahalanobis metric based on LogDet regularization and gradient descent. [sent-7, score-0.771]
5 We prove theoretical worst-case performance bounds, and empirically compare the proposed method against existing online metric learning algorithms. [sent-8, score-0.44]
6 To further boost the practicality of our approach, we develop an online locality-sensitive hashing scheme which leads to efficient updates to data structures used for fast approximate similarity search. [sent-9, score-0.527]
7 1 Introduction A number of recent techniques address the problem of metric learning, in which a distance function between data objects is learned based on given (or inferred) similarity constraints between examples [4, 7, 11, 16, 5, 15]. [sent-11, score-0.428]
8 Most successful results have relied on having access to all constraints at the onset of the metric learning. [sent-13, score-0.281]
9 For instance, in image search applications on the internet, online click-through data that is continually collected may impact the desired distance function. [sent-15, score-0.417]
10 To address this need, recent work on online metric learning algorithms attempts to handle constraints that are received one at a time [13, 4]. [sent-16, score-0.475]
11 Thus a good online metric learner must also be able to support fast similarity search routines. [sent-20, score-0.576]
12 Existing fast similarity search structures (e.g., locality-sensitive hashing [6, 1] or kd-trees) assume a static distance function, and are expensive to update when the underlying distance function changes. [sent-23, score-0.349]
13 The goal of this work is to make metric learning practical for real-world learning tasks in which both constraints and queries must be handled efficiently in an online manner. [sent-24, score-0.515]
14 To that end, we first develop an online metric learning algorithm that uses LogDet regularization and exact gradient descent. [sent-25, score-0.496]
15 The new algorithm is inspired by the metric learning algorithm studied in [4]; however, while the loss bounds for the latter method are dependent on the input data, our loss bounds are independent of the sequence of constraints given to the algorithm. [sent-26, score-0.42]
16 We devise a method to incrementally update locality-sensitive hash keys during the updates of the metric learner, making it possible to perform accurate sub-linear time nearest neighbor searches over the data in an online manner. [sent-29, score-1.045]
17 We show that our method outperforms existing approaches, and even performs comparably to several offline metric learning algorithms. [sent-31, score-0.251]
18 To evaluate our approach for indexing a large-scale database, we include experiments with a set of 300,000 image patches; our online algorithm effectively learns to compare patches, and our hashing construction allows accurate fast retrieval for online queries. [sent-32, score-0.639]
19 The information-theoretic metric learning method of [4] includes an online variant that avoids eigenvector decomposition. [sent-37, score-0.426]
20 However, because of the particular form of the online update, positive-definiteness still must be carefully enforced, which impacts bound quality and empirical performance, making it undesirable for both theoretical and practical purposes. [sent-38, score-0.245]
21 There are a number of existing online algorithms for other machine learning problems outside of metric learning, e. [sent-40, score-0.44]
22 Locality-sensitive hashing [6] is an effective technique that performs approximate nearest neighbor searches in time that is sub-linear in the size of the database. [sent-44, score-0.26]
23 Most existing work has considered hash functions for Lp norms [3], inner product similarity [1], and other standard distances. [sent-45, score-0.457]
24 While recent work has shown how to generate hash functions for (offline) learned Mahalanobis metrics [9], we are not aware of any existing technique that allows incremental updates to locality-sensitive hash keys for online database maintenance, as we propose in this work. [sent-46, score-1.116]
25 2 Online Metric Learning In this section we introduce our model for online metric learning, develop an efficient algorithm to implement it, and prove regret bounds. [sent-47, score-0.445]
26 2.1 Formulation and Algorithm As in several existing metric learning methods, we restrict ourselves to learning a Mahalanobis distance function over our input data, which is a distance function parameterized by a $d \times d$ positive definite matrix $A$. [sent-49, score-0.417]
27 In general, one learns a Mahalanobis distance by learning the appropriate positive definite matrix A based on constraints over the distance function. [sent-53, score-0.259]
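To make the parameterization concrete, here is a minimal sketch of the Mahalanobis distance $d_A(u, v) = (u - v)^T A (u - v)$; the function name and the NumPy-based setup are our own illustration, not code from the paper.

```python
import numpy as np

def mahalanobis_distance(A, u, v):
    """d_A(u, v) = (u - v)^T A (u - v) for a d x d positive definite matrix A."""
    z = u - v
    return float(z @ A @ z)

# With A = I this reduces to the squared Euclidean distance between u and v.
```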
28 These constraints are typically distance or similarity constraints that arise from supervised information—for example, the distance between two points in the same class should be “small”. [sent-54, score-0.367]
29 In contrast to offline approaches, which assume all constraints are provided up front, online algorithms assume that constraints are received one at a time. [sent-55, score-0.351]
30 A constraint is received, encoded by the triple $(u_t, v_t, y_t)$, where $y_t$ is the target distance between $u_t$ and $v_t$ (we restrict ourselves to distance constraints, though other constraints are possible). [sent-57, score-1.007]
31 Using $A_t$, we first predict the distance $\hat{y}_t = d_{A_t}(u_t, v_t)$ using our current distance function, and incur a loss $\ell(\hat{y}_t, y_t)$. [sent-58, score-0.754]
32 One common choice is the squared loss: $\ell(\hat{y}_t, y_t) = \tfrac{1}{2}(\hat{y}_t - y_t)^2$. [sent-63, score-0.459]
33 We also consider a variant of the model where the input is a quadruple $(u_t, v_t, y_t, b_t)$, where $b_t = 1$ if we require that the distance between $u_t$ and $v_t$ be less than or equal to $y_t$, and $b_t = -1$ if we require that the distance between $u_t$ and $v_t$ be greater than or equal to $y_t$. [sent-64, score-1.562]
34 In that case, the corresponding loss function is $\ell(\hat{y}_t, y_t, b_t) = \max\!\big(0, \tfrac{1}{2}\, b_t(\hat{y}_t - y_t)\big)^2$. [sent-65, score-0.588]
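A small sketch of the two losses just described, written as plain functions; the exact placement of the 1/2 factor follows our reconstruction above and should be treated as an assumption.

```python
def squared_loss(y_hat, y):
    """l(y_hat, y) = 0.5 * (y_hat - y)^2."""
    return 0.5 * (y_hat - y) ** 2

def squared_hinge_loss(y_hat, y, b):
    """l(y_hat, y, b) = max(0, 0.5 * b * (y_hat - y))^2.

    b = +1 asks for the distance to be at most y (penalizes y_hat > y);
    b = -1 asks for the distance to be at least y (penalizes y_hat < y).
    """
    return max(0.0, 0.5 * b * (y_hat - y)) ** 2
```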
35 A typical approach [10, 4, 13] for the above online learning problem is to solve for $A_{t+1}$ by minimizing a regularized loss at each step: $A_{t+1} = \arg\min_A \, D(A, A_t) + \eta\,\ell(d_A(u_t, v_t), y_t)$ (2.1). [sent-66, score-0.587]
36 In existing methods, the gradient $\ell'(d_A(u_t, v_t), y_t)$ is approximated by $\ell'(d_{A_t}(u_t, v_t), y_t)$ [10, 13, 4]. [sent-74, score-0.648]
37 Since we use the exact gradient, our analysis will become more involved; however, the resulting algorithm will have several advantages over existing methods for online metric learning. [sent-81, score-0.44]
38 The resulting update for solving (2.1) is: $A_{t+1} = A_t - \frac{\eta(\bar{y} - y_t)\, A_t z_t z_t^T A_t}{1 + \eta(\bar{y} - y_t)\, z_t^T A_t z_t}$, (2.2) [sent-83, score-0.836]
39 where $z_t = u_t - v_t$ and $\bar{y} = d_{A_{t+1}}(u_t, v_t) = z_t^T A_{t+1} z_t$. [sent-84, score-0.932]
40 Multiplying (2.2) on the left by $z_t^T$ and on the right by $z_t$, and noting that $\hat{y}_t = z_t^T A_t z_t$, we obtain the following: $\bar{y} = \hat{y}_t - \frac{\eta(\bar{y} - y_t)\hat{y}_t^2}{1 + \eta(\bar{y} - y_t)\hat{y}_t}$, and so $\bar{y} = \frac{\eta y_t \hat{y}_t - 1 + \sqrt{(\eta y_t \hat{y}_t - 1)^2 + 4\eta \hat{y}_t^2}}{2\eta \hat{y}_t}$. [sent-88, score-2.332]
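Putting the closed form for $\bar{y}$ together with the rank-one update (2.2) gives the following sketch of a single online step; this is our reading of the equations above, with illustrative function and variable names, not the authors' code.

```python
import numpy as np

def lego_step(A, u, v, y, eta):
    """One exact-gradient update of the Mahalanobis matrix A for a target distance y.

    Computes y_hat = z^T A z, solves the quadratic for y_bar = z^T A_{t+1} z in closed
    form, then applies A_{t+1} = A - eta*(y_bar - y) A z z^T A / (1 + eta*(y_bar - y)*y_hat).
    Assumes eta > 0 and y_hat > 0 (i.e., u != v and A positive definite).
    """
    z = u - v
    y_hat = float(z @ A @ z)
    # Positive root of: eta*y_hat*y_bar^2 + (1 - eta*y*y_hat)*y_bar - y_hat = 0.
    y_bar = (eta * y * y_hat - 1.0
             + np.sqrt((eta * y * y_hat - 1.0) ** 2 + 4.0 * eta * y_hat ** 2)) / (2.0 * eta * y_hat)
    Az = A @ z
    coeff = eta * (y_bar - y) / (1.0 + eta * (y_bar - y) * y_hat)
    return A - coeff * np.outer(Az, Az)
```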
41 The final loss bound in [4] depends on the regularization parameter ηt from each iteration and is in turn dependent on the sequence of constraints, an undesirable property for online algorithms. [sent-96, score-0.362]
42 2.2 Analysis We now briefly analyze the regret bounds for our online metric learning algorithm. [sent-103, score-0.474]
43 To evaluate the online learner's quality, we want to compare the loss of the online algorithm (which has access to one constraint at a time in sequence) to the loss of the best possible offline algorithm (which has access to all constraints at once). [sent-105, score-0.633]
44 Let $\hat{d}_t = d_{A^*}(u_t, v_t)$ be the learned distance between points $u_t$ and $v_t$ with a fixed positive definite matrix $A^*$, and let $L_{A^*} = \sum_t \ell(\hat{d}_t, y_t)$ be the loss suffered over all $t$ time steps. [sent-106, score-0.735]
45 The goal is to demonstrate that the loss of the online algorithm LA is competitive with the loss of any offline algorithm. [sent-112, score-0.313]
46 At each step $t$, $\tfrac{1}{2}\alpha_t(\hat{y}_t - y_t)^2 - \tfrac{1}{2}\beta_t(d_{A^*}(u_t, v_t) - y_t)^2 \le D_{ld}(A^*, A_t) - D_{ld}(A^*, A_{t+1})$, where $0 \le \alpha_t \le \eta\Big/\Big(1 + \eta\big(\tfrac{R}{2} + \sqrt{\tfrac{R^2}{4} + \tfrac{1}{\eta}}\big)^{2}\Big)$, $\beta_t = \eta$, and $A^*$ is the optimal offline solution. [sent-117, score-0.544]
47 $L_A \le \Big(1 + \eta\big(\tfrac{R}{2} + \sqrt{\tfrac{R^2}{4} + \tfrac{1}{\eta}}\big)^{2}\Big) L_{A^*} + \Big(\big(\tfrac{R}{2} + \sqrt{\tfrac{R^2}{4} + \tfrac{1}{\eta}}\big)^{2} + \tfrac{1}{\eta}\Big) D_{ld}(A^*, A_0)$, where $L_A = \sum_t \ell(\hat{y}_t, y_t)$ is the loss incurred by the series of matrices $A_t$ generated by Equation (2.2). [sent-122, score-0.27]
48 Summing the per-step inequality of the lemma over all steps gives $\sum_t \big[\tfrac{1}{2}\alpha_t(\hat{y}_t - y_t)^2 - \tfrac{1}{2}\beta_t(d_{A^*}(u_t, v_t) - y_t)^2\big] \le \sum_t \big[D_{ld}(A^*, A_t) - D_{ld}(A^*, A_{t+1})\big]$. [sent-126, score-0.544]
49 For the squared hinge loss $\ell(\hat{y}_t, y_t, b_t) = \max\!\big(0, \tfrac{1}{2}\, b_t(\hat{y}_t - y_t)\big)^2$, the corresponding algorithm has the same bound. [sent-128, score-0.607]
50 An adversary can always provide an input $(u_t, v_t, y_t)$ so that the regularization parameter has to be decreased arbitrarily; that is, the need to maintain positive definiteness for each update can prevent ITML-Online from making progress towards an optimal metric. [sent-137, score-0.494]
51 In summary, we have proven a regret bound for the proposed LEGO algorithm, an online metric learning algorithm based on LogDet regularization and gradient descent. [sent-138, score-0.567]
52 3 Fast Online Similarity Searches In many applications, metric learning is used in conjunction with nearest-neighbor searching, and data structures to facilitate such searches are essential. [sent-141, score-0.255]
53 For online metric learning to be practical for large-scale retrieval applications, we must be able to efficiently index the data as updates to the metric are performed. [sent-142, score-0.69]
54 This poses a problem for most fast similarity searching algorithms, since each update to the online algorithm would require a costly update to their data structures. [sent-143, score-0.41]
55 We employ locality-sensitive hashing to enable fast queries; but rather than re-hash all database examples every time an online constraint alters the metric, we show how to incorporate a second level of hashing that determines which hash bits are changing during the metric learning updates. [sent-145, score-1.129]
56 This allows us to avoid costly exhaustive updates to the hash keys, though occasional updating is required after substantial changes to the metric are accumulated. [sent-146, score-0.608]
57 3.1 Background: Locality-Sensitive Hashing Locality-sensitive hashing (LSH) [6, 1] produces a binary hash key $H(u) = [h_1(u)\, h_2(u) \cdots h_b(u)]$. [sent-148, score-0.468]
58 Each individual bit $h_i(u)$ is obtained by applying the locality-sensitive hash function $h_i$ to the input $u$. [sent-152, score-0.404]
59 To allow sub-linear time approximate similarity search for a similarity function 'sim', a locality-sensitive hash function must satisfy the following property: $\Pr[h_i(u) = h_i(v)] = \mathrm{sim}(u, v)$, where 'sim' returns values between 0 and 1. [sent-153, score-0.542]
60 This means that the more similar examples are, the more likely they are to collide in the hash table. [sent-154, score-0.331]
61 A LSH function when ‘sim’ is the inner product was developed in [1], in which a hash bit is the sign of an input’s inner product with a random hyperplane. [sent-155, score-0.407]
62 The hash function in [1] was extended to accommodate a Mahalanobis similarity function in [9]: $A$ can be decomposed as $G^T G$, and the similarity function is then equivalently $\tilde{u}^T \tilde{v}$, where $\tilde{u} = Gu$ and $\tilde{v} = Gv$. [sent-157, score-0.469]
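A sketch of the resulting hash bit and key under this parameterization, i.e. $h_{r,A}(u) = \mathrm{sign}(r^T G u)$ with $A = G^T G$ and $r$ drawn from a standard Gaussian; the helper names are ours.

```python
import numpy as np

def mahalanobis_hash_bit(r, G, u):
    """h_{r,A}(u): 1 if r^T (G u) >= 0, else 0, where A = G^T G."""
    return 1 if float(r @ (G @ u)) >= 0.0 else 0

def hash_key(R, G, u):
    """b-bit key from a d x b matrix R whose columns are random hyperplanes."""
    return tuple(mahalanobis_hash_bit(R[:, i], G, u) for i in range(R.shape[1]))
```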
63 To perform sub-linear time nearest neighbor searches, a hash key is produced for all n data points in our database. [sent-160, score-0.412]
64 Given a query, its hash key is formed, and then an appropriate data structure can be used to extract potential nearest neighbors (see [6, 1] for details). [sent-161, score-0.415]
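For completeness, a toy single-table sketch of the lookup side — bucket the database by key and return a query's bucket as the candidate set; real LSH deployments use multiple keys and tables [6, 1], so this is only illustrative.

```python
from collections import defaultdict

def build_table(keys):
    """Map each b-bit key to the indices of the database points that hash to it."""
    table = defaultdict(list)
    for idx, key in enumerate(keys):
        table[key].append(idx)
    return table

def candidate_neighbors(table, query_key):
    """Indices in the query's bucket; these candidates are then re-ranked exactly."""
    return table.get(query_key, [])
```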
65 Standard LSH indexing amounts to drawing a set of hash functions $h_{r_1,A}, \ldots, h_{r_b,A}$ and storing the corresponding hash keys for each data point in our database. [sent-168, score-0.399]
66 These hash functions are dependent on the Mahalanobis distance; when we update our matrix $A_t$ to $A_{t+1}$, the corresponding hash functions, parameterized by $G_t$, must also change. [sent-170, score-0.383]
67 To update all hash keys in the database would require O(nd) time, which may be prohibitive. [sent-171, score-0.49]
68 Recall the update for $A$: $A_{t+1} = A_t - \frac{\eta(\bar{y} - y_t)\, A_t z_t z_t^T A_t}{1 + \eta(\bar{y} - y_t)\,\hat{y}_t}$, which we will write as $A_{t+1} = A_t + \beta_t A_t z_t z_t^T A_t$, where $\beta_t = -\eta(\bar{y} - y_t)/(1 + \eta(\bar{y} - y_t)\hat{y}_t)$. [sent-173, score-0.888]
69 Then $A_{t+1} = G_t^T (I + \beta_t G_t z_t z_t^T G_t^T) G_t$. [sent-175, score-0.396]
70 The square root of $I + \beta_t G_t z_t z_t^T G_t^T$ is $I + \alpha_t G_t z_t z_t^T G_t^T$, where $\alpha_t = \big(\sqrt{1 + \beta_t z_t^T A_t z_t} - 1\big)/\big(z_t^T A_t z_t\big)$. [sent-176, score-1.386]
71 Updating a hash bit from (3.1) then reduces to finding the sign of $r^T G_{t+1} u = r^T G_t u + \alpha_t\, r^T G_t z_t z_t^T A_t u$. [sent-179, score-0.426]
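The two identities above (the formula for $\alpha_t$ and the updated projection $r^T G_{t+1} u$) translate into the following small helpers; caching $r^T G_t u$, $r^T G_t z_t$, and $z_t^T A_t u$ as separate scalars is our simplification of the paper's bookkeeping.

```python
import numpy as np

def sqrt_coefficient(beta, y_hat):
    """alpha_t = (sqrt(1 + beta_t * z^T A z) - 1) / (z^T A z), with y_hat = z^T A z."""
    return (np.sqrt(1.0 + beta * y_hat) - 1.0) / y_hat

def updated_projection(rGu, rGz, zAu, alpha):
    """r^T G_{t+1} u = r^T G_t u + alpha_t * (r^T G_t z_t) * (z_t^T A_t u)."""
    return rGu + alpha * rGz * zAu

def bit_flipped(old_proj, new_proj):
    """A stored hash bit flips exactly when the sign of r^T G u changes."""
    return (old_proj >= 0.0) != (new_proj >= 0.0)
```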
72 Suppose that the hash functions have been updated in full at some time step $t_1$ in the past. [sent-181, score-0.331]
73 Now at time t, we want to determine which hash bits have flipped since t1 , or more precisely, which examples’ product with some r T Gt has changed from positive to negative, or vice versa. [sent-182, score-0.465]
74 In summary, when performing online metric learning updates, instead of updating all the hash keys at every step (which costs O(nd)), we delay updating the hash keys and instead determine approximately which bits have changed in the stored entries in the hash table since the last update. [sent-193, score-1.677]
75 Once many of the bits have changed, we perform a full update to our hash functions. [sent-195, score-0.438]
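One simple way to organize the delayed maintenance described here is a counter-and-threshold policy: accumulate (approximate) bit flips and trigger a full re-hash only once enough have occurred. The threshold value and the class below are illustrative assumptions, not the paper's exact procedure.

```python
class DelayedRehashPolicy:
    """Tracks how many stored hash bits are believed to have flipped since the
    last full re-hash, and signals when the keys should be recomputed."""

    def __init__(self, total_bits, rebuild_fraction=0.1):
        self.total_bits = total_bits          # n points * b bits per key
        self.rebuild_fraction = rebuild_fraction
        self.flipped = 0

    def record_flips(self, num_flipped):
        """Call after each metric update; returns True when a full re-hash is due."""
        self.flipped += num_flipped
        return self.flipped >= self.rebuild_fraction * self.total_bits

    def reset(self):
        """Call right after recomputing all hash keys."""
        self.flipped = 0
```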
76 4 Experimental Results In this section we evaluate the proposed algorithm (LEGO) over a variety of data sets, and examine both its online metric learning accuracy as well as the quality of its online similarity search updates. [sent-198, score-0.755]
77 As baselines, we consider the most relevant techniques from the literature: the online metric learners POLA [13] and ITML-Online [4]. [sent-199, score-0.432]
78 We also evaluate a baseline offline metric learner associated with our method. [sent-200, score-0.262]
79 The final metric (AT ) obtained by each online algorithm is used for testing (T is the total number of time-steps). [sent-206, score-0.406]
80 LEGO outperforms the Euclidean baseline as well as the other online learners, and even approaches the accuracy of the offline method (see [4] for additional comparable offline learning results using [7, 15]). [sent-208, score-0.314]
81 For each problem, 1,000 constraints are chosen and the final metric obtained is used for testing. [sent-216, score-0.262]
82 Figure 1: Comparison with existing online metric learning methods. [sent-261, score-0.44]
83 Left: On the UCI data sets, our method (LEGO) outperforms both the Euclidean distance baseline as well as existing metric learning methods, and even approaches the accuracy of the offline algorithm. [sent-262, score-0.39]
84 Right: On the Photo Tourism data, our online algorithm significantly outperforms the L2 baseline and ITML-Online, does well relative to POLA, and nearly matches the accuracy of the offline method. [sent-264, score-0.314]
85 The goal is to learn a metric that measures the distance between image patches better than L2 , so that patches of the same 3-d scene point will be matched together, and (ideally) others will not. [sent-267, score-0.429]
86 Since the database is large, we can also use it to demonstrate our online hash table updates. [sent-268, score-0.583]
87 We attribute the poor accuracy of ITML-Online to its need to continually adjust the regularization parameter to maintain positive definiteness; in practice, this often leads to significant drops in the regularization parameter, which prevents the method from improving over the Euclidean baseline. [sent-274, score-0.258]
88 Finally, we present results using our online metric learning algorithm together with our online hash table updates described in Section 3. [sent-277, score-1.015]
89 The results show that the accuracy achieved by our LEGO+LSH algorithm is comparable to the LEGO+linear scan (and similarly, L2 +LSH is comparable to L2 +linear scan), thus validating the effectiveness of our online hashing scheme. [sent-281, score-0.501]
90 Such a scenario is useful in many applications: for example, an image retrieval system such as Flickr continually acquires new image tags from users (which could be mapped to similarity constraints), but must also continually support intermittent user queries. [sent-285, score-0.27]
91 For the Photo Tourism setting, it would be useful in practice to allow new constraints indicating true-match patch pairs to stream in while users continually add photos that should participate in new 3-d reconstructions with the improved match distance functions. [sent-286, score-0.287]
92 To experiment with this scenario, we randomly mix online additions of 50,000 constraints with 10,000 queries, and measure performance by the recall value for 300 retrieved nearest neighbor examples. [sent-287, score-0.405]
93 Our method achieves similar accuracy to both the linear scan method (LEGO Linear) as well as the naive LSH method where the hash table is fully recomputed after every constraint update (LEGO Naive LSH). [sent-291, score-0.533]
94 62 0 1000 2000 4000 6000 Number of queries 8000 10000 Figure 2: Results with online hashing updates. [sent-314, score-0.39]
95 ‘LEGO LSH’ denotes LEGO metric learning in conjunction with online searches using our LSH updates, ‘LEGO Linear’ denotes LEGO learning with linear scan searches. [sent-316, score-0.562]
96 Our online similarity search updates make it possible to efficiently interleave online learning and querying. [sent-319, score-0.606]
97 (In this case the naive LSH scheme is actually slower than a linear scan, as updating the hash tables after every update incurs a large overhead cost.) [sent-324, score-0.456]
98 Conclusions: We have developed an online metric learning algorithm together with a method to perform online updates to fast similarity search structures, and have demonstrated their applicability and advantages on a variety of data sets. [sent-326, score-0.823]
99 We have proven regret bounds for our online learner that offer improved reliability over state-of-the-art methods, both in terms of the bounds themselves and in terms of empirical performance. [sent-327, score-0.351]
100 For future work, we hope to tune the LSH parameters automatically using a deeper theoretical analysis of our hash key updates in conjunction with the relevant statistics of the online similarity search task at hand. [sent-331, score-0.744]
wordName wordTfidf (topN-words)
[('lego', 0.483), ('hash', 0.331), ('pola', 0.287), ('lsh', 0.251), ('gt', 0.225), ('yt', 0.22), ('online', 0.213), ('zt', 0.198), ('dld', 0.196), ('metric', 0.193), ('hashing', 0.137), ('ut', 0.13), ('mahalanobis', 0.114), ('ine', 0.112), ('la', 0.107), ('vt', 0.104), ('scan', 0.094), ('niteness', 0.088), ('distance', 0.08), ('constraints', 0.069), ('photo', 0.069), ('logdet', 0.069), ('similarity', 0.069), ('keys', 0.068), ('regularization', 0.067), ('itml', 0.065), ('tourism', 0.065), ('updates', 0.065), ('patches', 0.065), ('bits', 0.055), ('continually', 0.052), ('update', 0.052), ('loss', 0.05), ('changed', 0.049), ('bt', 0.049), ('sim', 0.049), ('nearest', 0.049), ('da', 0.047), ('search', 0.046), ('searches', 0.042), ('euclidean', 0.042), ('queries', 0.04), ('regret', 0.039), ('database', 0.039), ('baseline', 0.038), ('patch', 0.038), ('naive', 0.035), ('neighbors', 0.035), ('kulis', 0.034), ('existing', 0.034), ('bound', 0.032), ('neighbor', 0.032), ('learner', 0.031), ('positive', 0.03), ('sign', 0.03), ('dat', 0.029), ('bounds', 0.029), ('jain', 0.028), ('hi', 0.027), ('retrieval', 0.026), ('learners', 0.026), ('image', 0.026), ('reconstructions', 0.025), ('recall', 0.024), ('fast', 0.024), ('plot', 0.024), ('outperforms', 0.024), ('gradient', 0.023), ('inner', 0.023), ('dame', 0.023), ('notre', 0.023), ('flickr', 0.023), ('photos', 0.023), ('quadruple', 0.023), ('mnist', 0.022), ('accuracy', 0.021), ('maintain', 0.021), ('offline', 0.021), ('indyk', 0.021), ('query', 0.02), ('percentile', 0.02), ('austin', 0.02), ('eigenvector', 0.02), ('conjunction', 0.02), ('access', 0.019), ('updating', 0.019), ('scheme', 0.019), ('squared', 0.019), ('user', 0.019), ('gu', 0.019), ('locality', 0.019), ('distances', 0.018), ('comparable', 0.018), ('uci', 0.018), ('retrieved', 0.018), ('matches', 0.018), ('metrics', 0.018), ('divergence', 0.017), ('learned', 0.017), ('nsf', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 168 nips-2008-Online Metric Learning and Fast Similarity Search
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon, Kristen Grauman
Abstract: Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. Existing online algorithms offer bounds on worst-case performance, but typically do not perform well in practice as compared to their offline counterparts. We present a new online metric learning algorithm that updates a learned Mahalanobis metric based on LogDet regularization and gradient descent. We prove theoretical worst-case performance bounds, and empirically compare the proposed method against existing online metric learning algorithms. To further boost the practicality of our approach, we develop an online locality-sensitive hashing scheme which leads to efficient updates to data structures used for fast approximate similarity search. We demonstrate our algorithm on multiple datasets and show that it outperforms relevant baselines.
2 0.18982843 219 nips-2008-Spectral Hashing
Author: Yair Weiss, Antonio Torralba, Rob Fergus
Abstract: Semantic hashing[1] seeks compact binary codes of data-points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel datapoint. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state-of-the art. 1
3 0.17595372 133 nips-2008-Mind the Duality Gap: Logarithmic regret algorithms for online optimization
Author: Shai Shalev-shwartz, Sham M. Kakade
Abstract: We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms. Our framework yields the tightest known logarithmic regret bounds for Follow-The-Leader and for the gradient descent algorithm proposed in Hazan et al. [2006]. We then show that one can interpolate between these two extreme cases. In particular, we derive a new algorithm that shares the computational simplicity of gradient descent but achieves lower regret in many practical situations. Finally, we further extend our framework for generalized strongly convex functions. 1
4 0.12560901 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms
Author: Sham M. Kakade, Ambuj Tewari
Abstract: This paper examines the generalization properties of online convex programming algorithms when the loss function is Lipschitz and strongly convex. Our main result is a sharp bound, that holds with high probability, on the excess risk of the output of an online algorithm in terms of the average regret. This allows one to use recent algorithms with logarithmic cumulative regret guarantees to achieve fast convergence rates for the excess risk with high probability. As a corollary, we characterize the convergence rate of P EGASOS (with high probability), a recently proposed method for solving the SVM optimization problem. 1
5 0.12305904 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems
Author: Alan S. Willsky, Erik B. Sudderth, Michael I. Jordan, Emily B. Fox
Abstract: Many nonlinear dynamical phenomena can be effectively modeled by a system that switches among a set of conditionally linear dynamical modes. We consider two such models: the switching linear dynamical system (SLDS) and the switching vector autoregressive (VAR) process. Our nonparametric Bayesian approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes. We develop a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences. The utility and flexibility of our model are demonstrated on synthetic data, sequences of dancing honey bees, and the IBOVESPA stock index.
6 0.12020868 112 nips-2008-Kernel Measures of Independence for non-iid Data
7 0.10931841 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging
8 0.10333303 184 nips-2008-Predictive Indexing for Fast Search
9 0.099137828 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions
10 0.097906455 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces
11 0.091701619 214 nips-2008-Sparse Online Learning via Truncated Gradient
12 0.090911955 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs
13 0.083525039 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
14 0.076976165 22 nips-2008-An Online Algorithm for Maximizing Submodular Functions
15 0.066764437 15 nips-2008-Adaptive Martingale Boosting
16 0.066415757 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation
17 0.066348918 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
18 0.064639002 62 nips-2008-Differentiable Sparse Coding
19 0.063390568 171 nips-2008-Online Prediction on Large Diameter Graphs
20 0.063303262 224 nips-2008-Structured ranking learning using cumulative distribution networks
topicId topicWeight
[(0, -0.17), (1, 0.013), (2, -0.122), (3, -0.034), (4, -0.129), (5, 0.15), (6, -0.012), (7, 0.033), (8, 0.098), (9, -0.112), (10, -0.054), (11, 0.038), (12, -0.046), (13, 0.129), (14, 0.104), (15, -0.162), (16, -0.131), (17, 0.008), (18, 0.052), (19, 0.026), (20, -0.046), (21, 0.073), (22, 0.066), (23, 0.078), (24, -0.005), (25, 0.068), (26, 0.045), (27, 0.036), (28, 0.003), (29, 0.025), (30, -0.085), (31, 0.035), (32, 0.035), (33, -0.091), (34, -0.003), (35, -0.069), (36, 0.132), (37, -0.027), (38, -0.047), (39, -0.103), (40, -0.027), (41, 0.13), (42, -0.118), (43, -0.101), (44, -0.028), (45, 0.008), (46, -0.09), (47, -0.041), (48, 0.058), (49, 0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.93549162 168 nips-2008-Online Metric Learning and Fast Similarity Search
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon, Kristen Grauman
Abstract: Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. Existing online algorithms offer bounds on worst-case performance, but typically do not perform well in practice as compared to their offline counterparts. We present a new online metric learning algorithm that updates a learned Mahalanobis metric based on LogDet regularization and gradient descent. We prove theoretical worst-case performance bounds, and empirically compare the proposed method against existing online metric learning algorithms. To further boost the practicality of our approach, we develop an online locality-sensitive hashing scheme which leads to efficient updates to data structures used for fast approximate similarity search. We demonstrate our algorithm on multiple datasets and show that it outperforms relevant baselines.
2 0.58677572 219 nips-2008-Spectral Hashing
Author: Yair Weiss, Antonio Torralba, Rob Fergus
Abstract: Semantic hashing[1] seeks compact binary codes of data-points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel datapoint. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state-of-the art. 1
3 0.5206899 22 nips-2008-An Online Algorithm for Maximizing Submodular Functions
Author: Matthew Streeter, Daniel Golovin
Abstract: We present an algorithm for solving a broad class of online resource allocation problems. Our online algorithm can be applied in environments where abstract jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submodular function of a set of pairs (v, τ ), where τ is the time invested in activity v. Under this assumption, our online algorithm performs near-optimally according to two natural metrics: (i) the fraction of jobs completed within time T , for some fixed deadline T > 0, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition. 1
4 0.51756245 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms
Author: Sham M. Kakade, Ambuj Tewari
Abstract: This paper examines the generalization properties of online convex programming algorithms when the loss function is Lipschitz and strongly convex. Our main result is a sharp bound, that holds with high probability, on the excess risk of the output of an online algorithm in terms of the average regret. This allows one to use recent algorithms with logarithmic cumulative regret guarantees to achieve fast convergence rates for the excess risk with high probability. As a corollary, we characterize the convergence rate of P EGASOS (with high probability), a recently proposed method for solving the SVM optimization problem. 1
5 0.50533706 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging
Author: Ofer Dekel
Abstract: We present cutoff averaging, a technique for converting any conservative online learning algorithm into a batch learning algorithm. Most online-to-batch conversion techniques work well with certain types of online learning algorithms and not with others, whereas cutoff averaging explicitly tries to adapt to the characteristics of the online algorithm being converted. An attractive property of our technique is that it preserves the efficiency of the original online algorithm, making it appropriate for large-scale learning problems. We provide a statistical analysis of our technique and back our theoretical claims with experimental results. 1
6 0.48730472 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems
7 0.48512554 169 nips-2008-Online Models for Content Optimization
8 0.48121569 214 nips-2008-Sparse Online Learning via Truncated Gradient
9 0.46756846 133 nips-2008-Mind the Duality Gap: Logarithmic regret algorithms for online optimization
10 0.43570137 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions
11 0.39615372 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
12 0.38727072 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
13 0.36574763 112 nips-2008-Kernel Measures of Independence for non-iid Data
14 0.35745192 15 nips-2008-Adaptive Martingale Boosting
15 0.35697225 73 nips-2008-Estimating Robust Query Models with Convex Optimization
16 0.35370106 184 nips-2008-Predictive Indexing for Fast Search
17 0.34511486 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces
18 0.34476733 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
19 0.2980434 167 nips-2008-One sketch for all: Theory and Application of Conditional Random Sampling
20 0.29475719 67 nips-2008-Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance
topicId topicWeight
[(6, 0.083), (7, 0.051), (12, 0.04), (28, 0.174), (50, 0.265), (57, 0.06), (59, 0.014), (63, 0.033), (71, 0.017), (73, 0.015), (77, 0.037), (83, 0.079), (89, 0.021)]
simIndex simValue paperId paperTitle
1 0.83486551 163 nips-2008-On the Efficient Minimization of Classification Calibrated Surrogates
Author: Richard Nock, Frank Nielsen
Abstract: Bartlett et al (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties up the minimization of the surrogates and classification risks, and left as an important problem the algorithmic questions about the minimization of these surrogates. In this paper, we propose an algorithm which provably minimizes any classification calibrated surrogate strictly convex and differentiable — a set whose losses span the exponential, logistic and squared losses —, with boosting-type guaranteed convergence rates under a weak learning assumption. A particular subclass of these surrogates, that we call balanced convex surrogates, has a key rationale that ties it to maximum likelihood estimation, zerosum games and the set of losses that satisfy some of the most common requirements for losses in supervised learning. We report experiments on more than 50 readily available domains of 11 flavors of the algorithm, that shed light on new surrogates, and the potential of data dependent strategies to tune surrogates. 1
same-paper 2 0.77332252 168 nips-2008-Online Metric Learning and Fast Similarity Search
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon, Kristen Grauman
Abstract: Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. Existing online algorithms offer bounds on worst-case performance, but typically do not perform well in practice as compared to their offline counterparts. We present a new online metric learning algorithm that updates a learned Mahalanobis metric based on LogDet regularization and gradient descent. We prove theoretical worst-case performance bounds, and empirically compare the proposed method against existing online metric learning algorithms. To further boost the practicality of our approach, we develop an online locality-sensitive hashing scheme which leads to efficient updates to data structures used for fast approximate similarity search. We demonstrate our algorithm on multiple datasets and show that it outperforms relevant baselines.
3 0.64611727 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
Author: Tong Zhang
Abstract: Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory. 1
4 0.64430958 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
5 0.64300591 194 nips-2008-Regularized Learning with Networks of Features
Author: Ted Sandler, John Blitzer, Partha P. Talukdar, Lyle H. Ungar
Abstract: For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, and when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should be expected to have similar weights in an accurate model. Here we present a framework for regularized learning when one has prior knowledge about which features are expected to have similar and dissimilar weights. The prior knowledge is encoded as a network whose vertices are features and whose edges represent similarities and dissimilarities between them. During learning, each feature’s weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using networks of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature networks constructed from declarative human knowledge significantly improve prediction accuracy. 1
6 0.64260471 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
7 0.64145362 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
8 0.64134341 202 nips-2008-Robust Regression and Lasso
9 0.64093083 62 nips-2008-Differentiable Sparse Coding
10 0.64034158 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
11 0.63888228 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
12 0.63878173 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
13 0.63853973 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
14 0.63851255 196 nips-2008-Relative Margin Machines
15 0.63817221 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
16 0.63691348 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
17 0.63635224 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
18 0.6361326 143 nips-2008-Multi-label Multiple Kernel Learning
19 0.6359455 239 nips-2008-Tighter Bounds for Structured Estimation
20 0.63593364 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition