nips nips2009 nips2009-74 knowledge-graph by maker-knowledge-mining

74 nips-2009-Efficient Bregman Range Search


Source: pdf

Author: Lawrence Cayton

Abstract: We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. [sent-3, score-0.366]

2 The range search task is to return all points in a potentially large database that are within some specified distance of a query. [sent-4, score-0.352]

3 In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. [sent-6, score-0.226]

4 Here we describe an algorithm for range search for an arbitrary Bregman divergence. [sent-7, score-0.273]

5 Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. [sent-9, score-0.384]

6 We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences. [sent-10, score-0.629]

7 1 Introduction Range search is a fundamental proximity task at the core of many learning problems. [sent-11, score-0.271]

8 The task of range search is to return all points in a database within a specified distance of a given query. [sent-12, score-0.352]

9 Neighborhood graphs—used in manifold learning, spectral algorithms, semisupervised algorithms, and elsewhere—can be built by connecting each point to all other points within a certain radius; doing so requires range search at each point. [sent-16, score-0.288]
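
As a small illustration (ours, not code from the paper) of the pattern just described, the sketch below builds a radius-based neighborhood graph on top of an assumed routine range_query(q, radius) that returns the indices of database points within the given radius of q, for example the BB-tree search outlined later on this page.

    def neighborhood_graph(points, radius, range_query):
        # Connect each point to every other point within `radius` of it;
        # this issues one range query per database point.
        graph = {}
        for i, p in enumerate(points):
            graph[i] = [j for j in range_query(p, radius) if j != i]
        return graph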

10 These methods achieve speedups by pruning out large portions of the search space with bounds derived from KD or metric trees that are augmented with statistics of the database. [sent-20, score-0.373]

11 Some of these algorithms are direct applications of range search; others rely on very similar pruning techniques. [sent-21, score-0.229]

12 One fairly substantial limitation of these methods is that they all derive bounds from the triangle inequality and thus only work for notions of distance that are metrics. [sent-22, score-0.19]

13 The present work is on performing range search efficiently when the notion of dissimilarity is not a metric, but a Bregman divergence. [sent-23, score-0.322]

14 The family of Bregman divergences includes the standard squared ℓ2 distance, Mahalanobis distance, KL-divergence, Itakura-Saito divergence, and a variety of matrix dissimilarity measures. [sent-24, score-0.373]

15 Because Bregman divergences can be asymmetric and need not satisfy the triangle inequality, the traditional metric methods cannot be applied. [sent-28, score-0.426]

16 In this work we present an algorithm for efficient range search when the notion of dissimilarity is an arbitrary Bregman divergence. [sent-29, score-0.341]

17 The task of efficient Bregman range search presents a technical challenge. [sent-32, score-0.254]

18 Our algorithm cannot rely on the triangle inequality, so bounds must be derived from geometric properties of Bregman divergences. [sent-33, score-0.175]

19 The algorithm makes use of a simple space decomposition scheme based on Bregman balls [8], but deploying this decomposition for the range search problem is not straightforward. [sent-34, score-0.406]

20 We derive properties of Bregman divergences that imply efficient algorithms for these problems. [sent-36, score-0.302]

21 2 Background In this section, we briefly review prior work on Bregman divergences and proximity search. [sent-37, score-0.381]

22 Bregman divergences originate in [7] and have become common in the machine learning literature. [sent-38, score-0.280]

23 The Bregman divergence based on f is df(x, y) ≡ f(x) − f(y) − ⟨∇f(y), x − y⟩. [sent-43, score-0.476]

24 Standard examples include f(x) = ½‖x‖₂², yielding the squared ℓ2 distance df(x, y) = ½‖x − y‖₂², and f(x) = Σi xi log xi, giving the KL-divergence df(x, y) = Σi xi log(xi/yi). The Itakura-Saito divergence and Mahalanobis distance are other examples of Bregman divergences. [sent-45, score-0.948]
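
The following minimal sketch (ours, not code from the paper) instantiates the definition df(x, y) ≡ f(x) − f(y) − ⟨∇f(y), x − y⟩ for the two examples above; the entropy case gives the generalized KL-divergence, which reduces to Σi xi log(xi/yi) when x and y lie on the simplex.

    import numpy as np

    def bregman(f, grad_f, x, y):
        # d_f(x, y) = f(x) - f(y) - <grad f(y), x - y>
        return f(x) - f(y) - np.dot(grad_f(y), x - y)

    # f(x) = (1/2)||x||^2  ->  d_f(x, y) = (1/2)||x - y||^2
    sq, sq_grad = lambda x: 0.5 * np.dot(x, x), lambda x: x

    # f(x) = sum_i x_i log x_i  ->  generalized KL divergence
    ent, ent_grad = lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1.0

    x = np.array([0.2, 0.3, 0.5])
    y = np.array([0.4, 0.4, 0.2])
    print(bregman(sq, sq_grad, x, y))    # squared Euclidean case
    print(bregman(ent, ent_grad, x, y))  # KL case; nonnegative, zero iff x == y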

25 Strict convexity of f implies that df (x, y) ≥ 0, with equality if, and only if, x = y. [sent-46, score-0.422]

26 Though Bregman divergences share this non-negativity property with metrics, the similarities to metrics end there. [sent-47, score-0.318]

27 In particular, a Bregman divergence need not satisfy the triangle inequality or be symmetric. [sent-48, score-0.195]

28 Bregman divergences do possess several geometric properties related to the convexity of the base function. [sent-49, score-0.326]

29 Most notably, df (x, y) is always convex in x (though not necessarily in y), implying that the Bregman ball Bf (µ, R) ≡ {x | df (x, µ) ≤ R} is a convex body. [sent-50, score-1.012]

30 Recently, work on a variety of geometric tasks with Bregman divergences has appeared. [sent-51, score-0.372]

31 [1] studies core-sets under Bregman divergences and gives a provably correct approximation algorithm for k-median clustering. [sent-53, score-0.299]

32 [8] describes the Bregman ball tree in the context of nearest neighbor search; we will describe this work further momentarily. [sent-55, score-0.174]

33 The present paper contributes an effective algorithm for range search, one of the core problems of computational geometry [2], to this repertoire. [sent-57, score-0.175]

34 The Bregman ball tree (BB-tree) was introduced in the context of nearest neighbor (NN) search [8]. [sent-58, score-0.321]

35 Though NN search has a similar flavor to range search, the bounds that suffice for NN search are not sufficient for range search. [sent-59, score-0.533]

36 Moreover, though the extension of metric trees to range search (and hence to the previously described statistical tasks) is fairly straightforward because of the triangle inequality, the extension of BB-trees is substantially more complex. [sent-61, score-0.445]

37 2 Several other papers on Bregman proximity search have appeared very recently. [sent-62, score-0.268]

38 study some improvements to the BB-tree [21] and develop a related data structure which can be used with symmetrized divergences [20]. [sent-64, score-0.305]

39 develop extensions of the VA-file and the R-tree for Bregman divergences [26]. [sent-66, score-0.305]

40 That chapter also contains theoretical results on the general Bregman range search problem attained by adapting known data structures via the lifting technique (also used in [26] and previously in [19]). [sent-70, score-0.284]

41 3 Range search with BB-trees In this section, we review the Bregman ball tree data structure and outline the range search algorithm. [sent-71, score-0.522]

42 The search algorithm relies on geometric properties of Bregman divergences, which we derive in section 4. [sent-72, score-0.212]

43 Each node i in the tree owns a subset of the points Xi and also defines a Bregman ball Bf (µ, R) such that Xi ⊂ Bf (µ, R). [sent-76, score-0.215]

44 Each leaf node contains some small number of points and the root node contains the entire database. [sent-79, score-0.174]

45 Since k-means has been generalized to arbitrary Bregman divergences [4], it is a natural choice for a clustering algorithm. [sent-83, score-0.28]
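
A hypothetical construction sketch (ours, not the paper's implementation): split each node's points with Bregman 2-means and record, per node, a center and the radius of the smallest Bregman ball around that center containing the node's points. The helper names and the leaf_size default are assumptions; the fact that the arithmetic mean is the optimal Bregman cluster representative is from [4].

    import numpy as np

    def bregman_two_means(X, d_f, iters=10, seed=0):
        # Lloyd-style 2-means; for any Bregman divergence the best cluster
        # representative is the arithmetic mean of the cluster [4].
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(len(X), size=2, replace=False)]
        assign = np.zeros(len(X), dtype=int)
        for _ in range(iters):
            assign = np.array([np.argmin([d_f(x, m) for m in mu]) for x in X])
            if len(set(assign)) < 2:
                break                                  # degenerate split: stop early
            mu = np.array([X[assign == k].mean(axis=0) for k in (0, 1)])
        if len(set(assign)) < 2:
            assign = np.arange(len(X)) % 2             # fall back to an arbitrary split
        return X[assign == 0], X[assign == 1]

    def build_bbtree(X, d_f, leaf_size=20):
        mu = X.mean(axis=0)
        node = {"mu": mu, "R": max(d_f(x, mu) for x in X), "X": X, "children": []}
        if len(X) > leaf_size:
            left, right = bregman_two_means(X, d_f)
            node["children"] = [build_bbtree(left, d_f, leaf_size),
                                build_bbtree(right, d_f, leaf_size)]
        return node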

46 1 Search algorithm We now turn to the search algorithm, which uses a branch-and-bound approach. [sent-85, score-0.166]

47 The search algorithm starts at the root node and recursively explores the tree. [sent-90, score-0.226]

48 At a node i, the algorithm compares the node’s Bregman ball Bx to Bq . [sent-91, score-0.168]

49 We can thus stop the recursion and return all the points associated with the node without explicitly computing the divergence to any of them. [sent-94, score-0.169]
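
The branch-and-bound recursion can be summarized with the following sketch (ours; it operates on nodes built as in the construction sketch above). The two pruning predicates stand in for the geometric tests developed in section 4 and default to "never prune", in which case the routine degenerates to brute force over the leaves.

    def range_search(node, q, gamma, d_f, out,
                     inside_test=lambda mu, R, q, g: False,     # Bx ⊂ Bq test (section 4)
                     disjoint_test=lambda mu, R, q, g: False):  # Bx ∩ Bq = ∅ test (section 4)
        if inside_test(node["mu"], node["R"], q, gamma):
            out.extend(node["X"])        # inclusion pruning: keep every point, no divergences computed
            return
        if disjoint_test(node["mu"], node["R"], q, gamma):
            return                       # exclusion pruning: discard the whole subtree
        if not node["children"]:
            out.extend(x for x in node["X"] if d_f(x, q) <= gamma)   # leaf: check points directly
            return
        for child in node["children"]:
            range_search(child, q, gamma, d_f, out, inside_test, disjoint_test)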

50 The dotted, shaded object is the query range and the other is the Bregman ball associated with a node of the BB-tree. [sent-110, score-0.286]

51 Although we focus on range search, these types of prunings are useful in a broad range of statistical problems. [sent-111, score-0.214]

52 This type of pruning is another form of inclusion pruning and can be accomplished with the same technique. [sent-113, score-0.241]

53 In order to manage high-dimensional datasets, practitioners often use approximate proximity search techniques [8, 10, 17]. [sent-115, score-0.248]

54 Determining whether two Bregman balls intersect, or whether one Bregman ball contains another, is non-trivial. [sent-117, score-0.162]

55 For the range search algorithm to be effective, it must be able to determine these relationships very quickly. [sent-118, score-0.273]

56 Since we cannot rely on the triangle inequality for an arbitrary Bregman divergence, we must develop novel techniques. [sent-120, score-0.147]

57 We develop algorithms for determining (1) whether one Bregman ball is contained in another and (2) whether two Bregman balls have non-empty intersection. [sent-122, score-0.239]

58 We wish to evaluate if Bx ⊂ Bq, i.e. whether df(x, µq) ≤ Rq for all x ∈ Bx. [sent-125, score-0.422]

59 Simplifying notation, the core problem is to determine max_x df(x, q) subject to: df(x, µ) ≤ R (maxP). Unfortunately, this problem is not convex. [sent-127, score-0.897]

60 This difficulty is particularly problematic in the case of range search, as the search algorithm will need to solve this problem repeatedly in the course of evaluating a single range query. [sent-129, score-0.380]

61 Returning an approximate solution (i.e. a point x ∈ Bf(µ, R) that is not the max) will render the solution to the range search incorrect. [sent-132, score-0.254]

62 The Lagrangian is ν(x, λ) ≡ df (x, q) − λ(df (x, µ) − R), (1) where λ ≥ 0. [sent-143, score-0.422]

63 The Lagrange dual is L(θ) ≡ df(xθ, q) + (θ/(1 − θ))(df(xθ, µ) − R). [sent-149, score-0.455]

64 Then for any θ ∈ (1, ∞), we have df(xp, q) ≤ L(θ) (3) by weak duality. [sent-150, score-0.422]

65 We now show that there is a θ∗ > 1 satisfying df(xθ∗, µ) = R. [sent-151, score-0.422]

66 One can check that the derivative of df(xθ, µ) with respect to θ is (θ − 1)(µ′ − q′)⊤∇²f∗(∇f(xθ))(µ′ − q′), writing µ′ for ∇f(µ) and q′ for ∇f(q). [sent-152, score-0.422]

67 We conclude that df (xθ , µ) is increasing at an increasing rate with θ. [sent-154, score-0.422]

68 Thus there must be some θ∗ > 1 such that df (xθ∗ , µ) = R. [sent-155, score-0.422]

69 Plugging this θ∗ into the dual, we get L(θ∗) = df(xθ∗, q) + (θ∗/(1 − θ∗))(df(xθ∗, µ) − R) = df(xθ∗, q). [sent-156, score-0.844]

70 Combining with (3), we have df(xp, q) ≤ df(xθ∗, q). [sent-157, score-0.844]
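
Consolidating the preceding steps (our own summary, in the notation above), the bound used for inclusion pruning reads:

    \max_{x:\, d_f(x,\mu) \le R} d_f(x, q)
        \;\le\; L(\theta)
        \;=\; d_f(x_\theta, q) + \frac{\theta}{1-\theta}\bigl(d_f(x_\theta, \mu) - R\bigr),
        \qquad \theta \in (1, \infty),

    \text{and picking } \theta^* \text{ with } d_f(x_{\theta^*}, \mu) = R \text{ gives }\;
        d_f(x_p, q) \;\le\; L(\theta^*) \;=\; d_f(x_{\theta^*}, q).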

71 Thus determining if Bx ⊂ Bq reduces to searching for θ∗ > 1 satisfying df (xθ∗ , µx ) = Rx and comparing df (xθ∗ , µq ) to Rq . [sent-159, score-0.874]

72 Without such an upper bound, one needs to use a line search method that does not require one, such as Newton’s method or the secant method. [sent-161, score-0.18]

73 Both of these line search methods will converge quickly (quadratic in the case of Newton’s method, slightly slower in the case of the secant method): since df (xθ , µx ) is monotonic in θ, there is a unique root. [sent-162, score-0.602]
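
A minimal sketch (ours) of this containment test for the generalized KL-divergence, assuming the dual-curve parameterization xθ = ∇f∗(θ∇f(µx) + (1 − θ)∇f(µq)) and substituting a bracketing-plus-bisection line search for Newton's or the secant method; the function names and the cap on θ (which simply skips inclusion pruning in degenerate or out-of-domain cases, as the experiments section also does for KL-balls) are assumptions. A function with this signature can be passed as inside_test to the range_search sketch above.

    import numpy as np

    def d_kl(x, y):
        # generalized KL divergence, the Bregman divergence of f(x) = sum_i x_i log x_i
        return np.sum(x * np.log(x / y) - x + y)

    def x_theta(theta, mu_x, mu_q):
        # grad f(x) = log x + 1, grad f*(y) = exp(y - 1)  =>  x_theta = mu_x^theta * mu_q^(1 - theta)
        return np.exp(theta * np.log(mu_x) + (1.0 - theta) * np.log(mu_q))

    def contained(mu_x, R_x, mu_q, R_q):
        # True if the node ball B(mu_x, R_x) is contained in the query ball B(mu_q, R_q).
        hi = 2.0
        while d_kl(x_theta(hi, mu_x, mu_q), mu_x) < R_x:   # bracket theta* by doubling
            hi *= 2.0
            if hi > 1e6:
                return False                               # degenerate curve: skip inclusion pruning
        lo = 1.0
        for _ in range(100):                               # bisection; Newton/secant would converge faster
            mid = 0.5 * (lo + hi)
            if d_kl(x_theta(mid, mu_x, mu_q), mu_x) < R_x:
                lo = mid
            else:
                hi = mid
        x_star = x_theta(0.5 * (lo + hi), mu_x, mu_q)
        # Bx ⊂ Bq iff the farthest point of Bx from mu_q is still within R_q of mu_q
        return d_kl(x_star, mu_q) <= R_q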

74 Then for all z, we have df (x, z) ≥ df (x, y) + df (y, z), where y ≡ argminy∈C df (y, z) is the projection of z onto C. [sent-169, score-1.764]

75 However, the inequality is actually the reverse of the standard triangle inequality and only applies to the very special case when y is the projection of z onto a convex set containing x. [sent-171, score-0.265]

76 Let x be the projection of µq onto Bx and let q be the projection of µx onto Bq . [sent-179, score-0.152]

77 The projection of x onto Bq lies on the dual curve between x and µq; thus projecting x onto Bq yields q and similarly projecting q onto Bx yields x. [sent-182, score-0.237]

78 By the Pythagorean theorem, df (z, x) ≥ df (z, q) + df (q, x), (5) since q is the projection of x onto Bq and since z ∈ Bq . [sent-183, score-1.342]

79 Inserting (5) into (6), we get df(z, q) ≥ df(z, q) + df(q, x) + df(x, q). [sent-185, score-1.688]

80 Rearranging, we get that df (q, x) + df (x, q) ≤ 0. [sent-186, score-0.844]

81 Thus both df (q, x) = 0 and df (x, q) = 0, implying that x = q. [sent-187, score-0.863]

82 The preceding claim yields a simple algorithm for determining whether two balls Bx and Bq are disjoint: project µx onto Bq using the line search algorithm discussed previously. [sent-190, score-0.394]

83 Otherwise, they are disjoint and exclusion pruning can be performed. [sent-192, score-0.166]
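
A corresponding sketch (ours) of the exclusion test for the KL case, reusing the same dual curve: bisect for the boundary point of Bq on the curve between µq (θ = 0) and µx (θ = 1), then check whether that point lies in Bx. The argument order of the divergence in the membership checks follows our reading of the claims above, and a function with this signature pairs with the contained sketch as the disjoint_test argument of range_search.

    import numpy as np

    def d_kl(x, y):
        return np.sum(x * np.log(x / y) - x + y)

    def x_theta(theta, mu_x, mu_q):
        return np.exp(theta * np.log(mu_x) + (1.0 - theta) * np.log(mu_q))

    def disjoint(mu_x, R_x, mu_q, R_q):
        # True if B(mu_x, R_x) and B(mu_q, R_q) are taken to be disjoint (exclusion pruning).
        if d_kl(mu_x, mu_q) <= R_q:
            return False                                   # mu_x already lies in Bq: the balls intersect
        lo, hi = 0.0, 1.0                                  # theta = 0 gives mu_q (inside Bq), theta = 1 gives mu_x (outside)
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if d_kl(x_theta(mid, mu_x, mu_q), mu_q) <= R_q:
                lo = mid
            else:
                hi = mid
        q_bar = x_theta(lo, mu_x, mu_q)                    # boundary point of Bq: the projection of mu_x onto Bq
        return d_kl(q_bar, mu_x) > R_x                     # projection outside Bx  =>  the balls are disjoint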

84 5 Experiments We compare the performance of the search algorithm to standard brute force search on several datasets. [sent-193, score-0.35]

85 We are particularly interested in text applications as histogram representations are common, datasets are often very large, and efficient search is broadly useful. [sent-194, score-0.166]

86 However, one can show that both projections will be in the intersection using the monotonicity of df (xθ , ·) in θ. [sent-204, score-0.457]

87 Application of the range search algorithm for the KL-divergence raises one technical point: Claim 1 requires that the KL-ball being investigated lies within the domain of the KL-divergence. [sent-254, score-0.301]

88 When it did occur (which can be checked by evaluating df (µ, xθ ) for large θ), we simply did not perform inclusion pruning for that node. [sent-256, score-0.563]

89 There are two regimes where range search is particularly useful: when the radius γ is very small and when it is large. [sent-257, score-0.298]

90 When γ is small, range search is useful in instance-based learning algorithms like locally weighted regression, which need to retrieve points close to each test point. [sent-258, score-0.332]

91 When γ is large enough that Bf (q, γ) will contain most of the database, range search is potentially useful for applications like distance-based outlier detection and anomaly detection. [sent-260, score-0.254]

92 For the small radius experiments, γ was chosen so that about 20 points would be inside the query ball (on average). [sent-263, score-0.197]

93 On the rcv datasets, the BB-tree range search algorithm is an order of magnitude faster than brute search except on the two datasets of highest dimensionality. [sent-265, score-0.476]

94 The algorithm provides a useful speedup on corel, but no speedup on semantic space. [sent-266, score-0.185]

95 These results reflect the widely observed phenomenon that the performance of spatial decomposition data structures degrades with dimensionality, but the algorithm still provides a useful speedup on several moderate-dimensional datasets. [sent-268, score-0.170]

96 dataset (dimensionality): corel (64), pubmed4 (4), pubmed8 (8), pubmed16 (16), pubmed32 (32), pubmed64 (64), pubmed128 (128), pubmed256 (256), rcv8 (8), rcv16 (16), rcv32 (32), rcv64 (64), rcv128 (128), rcv256 (256), semantic space (371); speedup over brute-force search reported separately for the small-radius and large-radius settings. [sent-270, score-0.289]

97 Here, we follow [18] and simply cut off the search process early. [sent-305, score-0.147]

98 Thus a practical way to deploy the range search algorithm is to run it until enough points are recovered. [sent-308, score-0.307]

99 6 Conclusion We presented the first algorithm for efficient ball range search when the notion of dissimilarity is an arbitrary Bregman divergence. [sent-313, score-0.43]

100 A different research goal is to develop efficient algorithms for proximity search that have rigorous guarantees on run-time; theoretical questions about proximity search with Bregman divergences remain largely open. [sent-317, score-0.823]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bregman', 0.604), ('df', 0.422), ('bq', 0.28), ('divergences', 0.28), ('bx', 0.229), ('search', 0.147), ('bf', 0.125), ('range', 0.107), ('proximity', 0.101), ('pruning', 0.1), ('xp', 0.09), ('ball', 0.089), ('triangle', 0.085), ('corel', 0.08), ('balls', 0.073), ('dissimilarity', 0.068), ('maxp', 0.067), ('exclusion', 0.066), ('speedup', 0.064), ('node', 0.06), ('pythagorean', 0.058), ('claim', 0.056), ('divergence', 0.054), ('nielsen', 0.053), ('alexander', 0.052), ('onto', 0.05), ('geometric', 0.046), ('radius', 0.044), ('andrew', 0.044), ('metric', 0.042), ('inclusion', 0.041), ('database', 0.039), ('mahalanobis', 0.038), ('semantic', 0.038), ('brute', 0.037), ('speedups', 0.037), ('inequality', 0.037), ('intersection', 0.035), ('points', 0.034), ('ctm', 0.033), ('paolo', 0.033), ('piro', 0.033), ('pubmed', 0.033), ('secant', 0.033), ('sketching', 0.033), ('dual', 0.033), ('tree', 0.032), ('rq', 0.032), ('structures', 0.03), ('decomposition', 0.03), ('query', 0.03), ('convex', 0.03), ('determining', 0.03), ('michel', 0.029), ('piotr', 0.029), ('ting', 0.029), ('lies', 0.028), ('neighbor', 0.027), ('degrades', 0.027), ('voronoi', 0.027), ('projection', 0.026), ('nearest', 0.026), ('geometry', 0.026), ('nn', 0.026), ('variety', 0.025), ('gray', 0.025), ('bounds', 0.025), ('develop', 0.025), ('frank', 0.025), ('distance', 0.025), ('documents', 0.025), ('though', 0.024), ('histograms', 0.024), ('lawrence', 0.023), ('core', 0.023), ('indyk', 0.022), ('rectangular', 0.022), ('algorithms', 0.022), ('ef', 0.022), ('trees', 0.022), ('topic', 0.022), ('decomposable', 0.022), ('retrieve', 0.022), ('tasks', 0.021), ('elsewhere', 0.021), ('getting', 0.021), ('rx', 0.021), ('multimedia', 0.021), ('recursion', 0.021), ('density', 0.02), ('papers', 0.02), ('leaf', 0.02), ('dimensionality', 0.019), ('metrics', 0.019), ('datasets', 0.019), ('algorithm', 0.019), ('implying', 0.019), ('satisfy', 0.019), ('fairly', 0.018), ('moore', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 74 nips-2009-Efficient Bregman Range Search

Author: Lawrence Cayton

Abstract: We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences. 1

2 0.26782373 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu

Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1

3 0.15854986 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties

Author: Sylvain Arlot, Francis R. Bach

Abstract: This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression or spline smoothing, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first estimates consistently the variance of the noise, based upon the concept of minimal penalty which was previously introduced in the context of model selection. Then, plugging our variance estimate in Mallows’ CL penalty is proved to lead to an algorithm satisfying an oracle inequality. Simulation experiments with kernel ridge regression and multiple kernel learning show that the proposed algorithm often improves significantly existing calibration procedures such as 10-fold cross-validation or generalized cross-validation. 1

4 0.10058384 98 nips-2009-From PAC-Bayes Bounds to KL Regularization

Author: Pascal Germain, Alexandre Lacasse, Mario Marchand, Sara Shanian, François Laviolette

Abstract: We show that convex KL-regularized objective functions are obtained from a PAC-Bayes risk bound when using convex loss functions for the stochastic Gibbs classifier that upper-bound the standard zero-one loss used for the weighted majority vote. By restricting ourselves to a class of posteriors, that we call quasi uniform, we propose a simple coordinate descent learning algorithm to minimize the proposed KL-regularized cost function. We show that standard ℓp-regularized objective functions currently used, such as ridge regression and ℓp-regularized boosting, are obtained from a relaxation of the KL divergence between the quasi uniform posterior and the uniform prior. We present numerical experiments where the proposed learning algorithm generally outperforms ridge regression and AdaBoost. 1

5 0.098525316 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

Author: Parikshit Ram, Dongryeol Lee, Hua Ouyang, Alexander G. Gray

Abstract: The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior. 1

6 0.080432966 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems

7 0.07175716 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

8 0.064578526 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

9 0.060794272 197 nips-2009-Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs

10 0.059444096 24 nips-2009-Adapting to the Shifting Intent of Search Queries

11 0.055793855 115 nips-2009-Individuation, Identification and Object Discovery

12 0.055569794 91 nips-2009-Fast, smooth and adaptive regression in metric spaces

13 0.055296399 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

14 0.052075665 177 nips-2009-On Learning Rotations

15 0.050809309 48 nips-2009-Bootstrapping from Game Tree Search

16 0.048504002 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness

17 0.044500761 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

18 0.042655639 220 nips-2009-Slow Learners are Fast

19 0.041920546 260 nips-2009-Zero-shot Learning with Semantic Output Codes

20 0.041360404 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.142), (1, 0.086), (2, -0.019), (3, 0.018), (4, 0.014), (5, -0.027), (6, -0.07), (7, 0.034), (8, 0.002), (9, 0.04), (10, 0.007), (11, -0.048), (12, 0.169), (13, 0.102), (14, 0.108), (15, -0.132), (16, 0.135), (17, -0.011), (18, 0.062), (19, 0.035), (20, -0.124), (21, 0.002), (22, 0.033), (23, -0.09), (24, 0.073), (25, -0.038), (26, -0.021), (27, 0.113), (28, 0.114), (29, -0.113), (30, -0.017), (31, 0.095), (32, -0.077), (33, -0.155), (34, -0.016), (35, 0.033), (36, 0.088), (37, -0.042), (38, -0.004), (39, 0.007), (40, 0.133), (41, 0.005), (42, 0.105), (43, -0.074), (44, 0.069), (45, -0.004), (46, -0.018), (47, -0.019), (48, 0.087), (49, -0.158)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95525318 74 nips-2009-Efficient Bregman Range Search

Author: Lawrence Cayton

Abstract: We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences. 1

2 0.69279808 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu

Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1

3 0.56250829 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

Author: Parikshit Ram, Dongryeol Lee, Hua Ouyang, Alexander G. Gray

Abstract: The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior. 1

4 0.52793634 91 nips-2009-Fast, smooth and adaptive regression in metric spaces

Author: Samory Kpotufe

Abstract: It was recently shown that certain nonparametric regressors can escape the curse of dimensionality when the intrinsic dimension of data is low ([1, 2]). We prove some stronger results in more general settings. In particular, we consider a regressor which, by combining aspects of both tree-based regression and kernel regression, adapts to intrinsic dimension, operates on general metrics, yields a smooth function, and evaluates in time O(log n). We derive a tight convergence rate of the form n−2/(2+d) where d is the Assouad dimension of the input space. 1

5 0.45398027 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems

Author: Parikshit Ram, Dongryeol Lee, William March, Alexander G. Gray

Abstract: Several key computational bottlenecks in machine learning involve pairwise distance computations, including all-nearest-neighbors (finding the nearest neighbor(s) for each point, e.g. in manifold learning) and kernel summations (e.g. in kernel density estimation or kernel machines). We consider the general, bichromatic case for these problems, in addition to the scientific problem of N-body simulation. In this paper we show for the first time O(N) worst case runtimes for practical algorithms for these problems based on the cover tree data structure [1]. 1

6 0.45277843 234 nips-2009-Streaming k-means approximation

7 0.42530698 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties

8 0.42340976 63 nips-2009-DUOL: A Double Updating Approach for Online Learning

9 0.37575161 249 nips-2009-Tracking Dynamic Sources of Malicious Activity at Internet Scale

10 0.36068711 24 nips-2009-Adapting to the Shifting Intent of Search Queries

11 0.35490566 48 nips-2009-Bootstrapping from Game Tree Search

12 0.34565485 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

13 0.32656208 197 nips-2009-Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs

14 0.30683208 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

15 0.29690963 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

16 0.2950013 129 nips-2009-Learning a Small Mixture of Trees

17 0.28668422 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

18 0.28325158 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models

19 0.28032076 55 nips-2009-Compressed Least-Squares Regression

20 0.27970645 182 nips-2009-Optimal Scoring for Unsupervised Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.01), (24, 0.058), (25, 0.058), (35, 0.053), (36, 0.099), (39, 0.047), (48, 0.02), (58, 0.068), (61, 0.028), (71, 0.045), (81, 0.044), (84, 0.275), (86, 0.081), (91, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86897403 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

Author: Tomoharu Iwata, Takeshi Yamada, Naonori Ueda

Abstract: We propose a probabilistic topic model for analyzing and extracting contentrelated annotations from noisy annotated discrete data such as web pages stored in social bookmarking services. In these services, since users can attach annotations freely, some annotations do not describe the semantics of the content, thus they are noisy, i.e. not content-related. The extraction of content-related annotations can be used as a preprocessing step in machine learning tasks such as text classification and image recognition, or can improve information retrieval performance. The proposed model is a generative model for content and annotations, in which the annotations are assumed to originate either from topics that generated the content or from a general distribution unrelated to the content. We demonstrate the effectiveness of the proposed method by using synthetic data and real social annotation data for text and images.

same-paper 2 0.78895617 74 nips-2009-Efficient Bregman Range Search

Author: Lawrence Cayton

Abstract: We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences. 1

3 0.75414085 90 nips-2009-Factor Modeling for Advertisement Targeting

Author: Ye Chen, Michael Kapralov, John Canny, Dmitry Y. Pavlov

Abstract: We adapt a probabilistic latent variable model, namely GaP (Gamma-Poisson) [6], to ad targeting in the contexts of sponsored search (SS) and behaviorally targeted (BT) display advertising. We also approach the important problem of ad positional bias by formulating a one-latent-dimension GaP factorization. Learning from click-through data is intrinsically large scale, even more so for ads. We scale up the algorithm to terabytes of real-world SS and BT data that contains hundreds of millions of users and hundreds of thousands of features, by leveraging the scalability characteristics of the algorithm and the inherent structure of the problem including data sparsity and locality. Specifically, we demonstrate two somewhat orthogonal philosophies of scaling algorithms to large-scale problems, through the SS and BT implementations, respectively. Finally, we report the experimental results using Yahoo's vast datasets, and show that our approach substantially outperforms the state-of-the-art methods in prediction accuracy. For BT in particular, the ROC area achieved by GaP exceeds 0.95, while one prior approach using Poisson regression [11] yielded 0.83. For computational performance, we compare a single-node sparse implementation with a parallel implementation using Hadoop MapReduce; the results are counterintuitive yet quite interesting. We therefore provide insights into the underlying principles of large-scale learning. 1

4 0.69559586 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors

Author: Liang Sun, Jun Liu, Jianhui Chen, Jieping Ye

Abstract: We consider the reconstruction of sparse signals in the multiple measurement vector (MMV) model, in which the signal, represented as a matrix, consists of a set of jointly sparse vectors. MMV is an extension of the single measurement vector (SMV) model employed in standard compressive sensing (CS). Recent theoretical studies focus on the convex relaxation of the MMV problem based on the ℓ2,1-norm minimization, which is an extension of the well-known ℓ1-norm minimization employed in SMV. However, the resulting convex optimization problem in MMV is significantly much more difficult to solve than the one in SMV. Existing algorithms reformulate it as a second-order cone programming (SOCP) or semidefinite programming (SDP) problem, which is computationally expensive to solve for problems of moderate size. In this paper, we propose a new (dual) reformulation of the convex optimization problem in MMV and develop an efficient algorithm based on the prox-method. Interestingly, our theoretical analysis reveals the close connection between the proposed reformulation and multiple kernel learning. Our simulation studies demonstrate the scalability of the proposed algorithm.

5 0.55389512 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

Author: Kyomin Jung, Pushmeet Kohli, Devavrat Shah

Abstract: We consider the question of computing Maximum A Posteriori (MAP) assignment in an arbitrary pair-wise Markov Random Field (MRF). We present a randomized iterative algorithm based on simple local updates. The algorithm, starting with an arbitrary initial assignment, updates it in each iteration by first, picking a random node, then selecting an (appropriately chosen) random local neighborhood and optimizing over this local neighborhood. Somewhat surprisingly, we show that this algorithm finds a near optimal assignment within n log 2 n iterations with high probability for any n node pair-wise MRF with geometry (i.e. MRF graph with polynomial growth) with the approximation error depending on (in a reasonable manner) the geometric growth rate of the graph and the average radius of the local neighborhood – this allows for a graceful tradeoff between the complexity of the algorithm and the approximation error. Through extensive simulations, we show that our algorithm finds extremely good approximate solutions for various kinds of MRFs with geometry.

6 0.55229849 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

7 0.55207652 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

8 0.55165917 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

9 0.54997098 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

10 0.54894429 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

11 0.5489071 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

12 0.54855412 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

13 0.54804933 129 nips-2009-Learning a Small Mixture of Trees

14 0.54760665 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

15 0.54725927 3 nips-2009-AUC optimization and the two-sample problem

16 0.54711193 104 nips-2009-Group Sparse Coding

17 0.54690707 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

18 0.5468322 70 nips-2009-Discriminative Network Models of Schizophrenia

19 0.54566205 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

20 0.54455769 72 nips-2009-Distribution Matching for Transduction