nips nips2007 nips2007-16 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. [sent-6, score-0.574]
2 We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. [sent-7, score-0.305]
3 We derive a generalization theory for these data structure classes and present simple learning algorithms for both. [sent-8, score-0.153]
4 1 Introduction Nearest neighbor (NN) searching is a fundamental operation in machine learning, databases, signal processing, and a variety of other disciplines. [sent-10, score-0.132]
5 We are given a database of points X = {x1 , . . . , xn }, and on an input query q, we hope to return the nearest (or approximately nearest, or k-nearest) point(s) to q in X using some similarity measure. [sent-14, score-0.342]
6 A tremendous amount of research has been devoted to designing data structures for fast NN retrieval. [sent-15, score-0.17]
7 Most of these structures are based on some clever partitioning of the space, and a few have bounds (typically worst-case) on the number of distance calculations needed to answer a query. [sent-16, score-0.44]
8 In contrast to the various data structures built using geometric intuitions, this learning framework allows one to construct a data structure by directly minimizing the cost of querying it. [sent-18, score-0.34]
9 In our framework, a sample query set guides the construction of the data structure containing the database. [sent-19, score-0.26]
10 In the absence of a sample query set, the database itself may be used as a reasonable prior. [sent-20, score-0.349]
11 The problem of building a NN data structure can then be cast as a learning problem: Learn a data structure that yields efficient retrieval times on the sample queries and is simple enough to generalize well. [sent-21, score-0.451]
12 A major benefit of this framework is that one can seamlessly handle situations where the query distribution is substantially different from the distribution of the database. [sent-22, score-0.212]
13 We consider two different function classes that have performed well in NN searching: KD-trees and the cell structures employed by locality sensitive hashing. [sent-23, score-0.39]
14 The known algorithms for these data structures do not, of course, use learning to choose the parameters. [sent-24, score-0.139]
15 Nevertheless, we can examine the generalization properties of a data structure learned from one of these classes. [sent-25, score-0.157]
16 2 Related work There is a voluminous literature on data structures for nearest neighbor search, spanning several academic communities. [sent-29, score-0.36]
17 Work on efficient NN data structures can be classified according to two criteria: whether they return exact or approximate answers to queries; and whether they merely assume the distance function is a metric or make a stronger assumption (usually that the data are Euclidean). [sent-30, score-0.2]
18 The framework we describe in this paper applies to all these methods, though we focus in particular on data structures for RD . [sent-31, score-0.177]
19 Perhaps the most popular data structure for nearest neighbor search in RD is the simple and convenient KD-tree [1], which has enjoyed success in a vast range of applications. [sent-32, score-0.323]
20 One recent line of work suggests randomly projecting points in the database down to a low-dimensional space, and then using KD-trees [3, 4]. [sent-35, score-0.172]
21 Locality sensitive hashing (LSH) has emerged as a promising option for high-dimensional NN search in RD [5]. [sent-36, score-0.131]
22 A recent trend is to look for data structures that are attuned to the intrinsic dimension, e. [sent-41, score-0.139]
23 There has been some work on building a data structure for a particular query distribution [11]; this line of work is perhaps most similar to ours. [sent-45, score-0.255]
24 Nevertheless, the learning theoretic approach in this paper is novel; the study of NN data structures through the lens of generalization ability provides a fundamentally different theoretical basis for NN search with important practical implications. [sent-47, score-0.225]
25 Let X = {x1 , . . . , xn } denote the database and Q the space from which queries are drawn. [sent-53, score-0.359]
26 We take a nearest neighbor data structure to be a mapping f : Q → 2^X ; the interpretation is that we compute distances only to f (q), not to all of X. [sent-55, score-0.325]
27 For example, the structure underlying LSH partitions RD into cells and a query is assigned to the subset of X that falls into the same cell. [sent-56, score-0.298]
28 We want to only compute distances to a small fraction of the database on a query; and, in the case of probabilistic algorithms, we want a high probability of success. [sent-58, score-0.226]
29 More precisely, we hope to minimize the following two quantities for a data structure f : • The fraction of X that we need to compute distances to: sizef (q) ≡ |f (q)|/n. [sent-59, score-0.567]
30 • The fraction of a query’s k nearest neighbors that are missed: missf (q) ≡ |Γk (q) \ f (q)|/k (Γk (q) denotes the k nearest neighbors of q in X). [sent-60, score-0.781]
31 In ε-approximate NN search, we only require a point x such that d(q, x) ≤ (1 + ε)d(q, X), so we instead use an approximate miss rate: missf,ε (q) ≡ 1[ there is no x ∈ f (q) such that d(q, x) ≤ (1 + ε)d(q, X) ]. [sent-61, score-0.619]
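As a concrete reading of these definitions, here is a minimal Python sketch (the paper provides no code; the function name and the interface assumed for f are ours) that computes the empirical size and miss rates of a candidate structure f over a set of sample queries.

```python
import numpy as np

def empirical_rates(f, queries, X, k=1):
    """Empirical averages of size_f and miss_f over sample queries.

    f(q) is assumed to return the indices of the database points the structure
    would compare q against; X is the (n, d) database array.
    """
    n = len(X)
    sizes, misses = [], []
    for q in queries:
        retrieved = set(f(q))                                      # f(q) ⊆ {0, ..., n-1}
        true_knn = np.argsort(np.linalg.norm(X - q, axis=1))[:k]   # Γ_k(q)
        sizes.append(len(retrieved) / n)                           # size_f(q) = |f(q)| / n
        misses.append(sum(int(i) not in retrieved for i in true_knn) / k)  # miss_f(q)
    return float(np.mean(sizes)), float(np.mean(misses))
```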
32 None of the previously discussed data structures are built by explicitly minimizing these quantities, though there are known bounds for some. [sent-62, score-0.231]
33 One reason is that research has typically focused on worst-case sizef and missf rates, which require minimizing these functions over all q ∈ Q. [sent-64, score-0.884]
34 In this work, we instead focus on average-case sizef and missf rates—i.e., their expectations over a distribution on queries. [sent-66, score-0.851]
35 To do so, we assume that we are given a sample query set Q = {q1 , . . . , qm }. [sent-69, score-0.206]
36 We attempt to build f minimizing the empirical size and miss rates, then resort to generalization bounds to relate these rates to the true ones. [sent-73, score-0.238]
37 The first is based on a splitting rule for KD-trees designed to minimize a greedy surrogate for the empirical sizef function. [sent-75, score-0.461]
38 The second is an algorithm that determines the boundary locations of the cell structure used in LSH so as to minimize a tradeoff of the empirical sizef and missf functions. [sent-76, score-1.131]
39 1 KD-trees KD-trees are a popular cell partitioning scheme for RD based on the binary search paradigm. [sent-78, score-0.194]
40 The data structure is built by picking a dimension, splitting the database along the median value in that dimension, and then recursing on both halves. [sent-79, score-0.414]
41 To find an NN for a query q, one first computes distances to all points in the same cell, then traverses up the tree. [sent-86, score-0.253]
42 (Figure: rule for deciding whether or not to explore the right subtree.) Typically the cells contain only a few points; a query is expensive when it lies close to many of the cell boundaries, since much of the tree must then be explored. [sent-89, score-0.419]
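To make the construction and query procedure just described concrete, here is a minimal Python sketch (illustrative only, not the authors' implementation; all names are ours) of a median-split KD-tree: a query first scans its leaf cell, then backtracks, exploring the far side of a split only when that boundary is closer than the best distance found so far.

```python
import numpy as np

class Node:
    def __init__(self, points=None, dim=None, split=None, left=None, right=None):
        self.points, self.dim, self.split = points, dim, split
        self.left, self.right = left, right

def build(X, depth=0, leaf_size=10):
    """Classical KD-tree: split on the median of one coordinate, recurse on both halves."""
    if len(X) <= leaf_size:
        return Node(points=X)
    dim = depth % X.shape[1]                       # cycle through the dimensions
    split = float(np.median(X[:, dim]))
    left, right = X[X[:, dim] <= split], X[X[:, dim] > split]
    if len(left) == 0 or len(right) == 0:          # degenerate split: stop here
        return Node(points=X)
    return Node(dim=dim, split=split,
                left=build(left, depth + 1, leaf_size),
                right=build(right, depth + 1, leaf_size))

def query(node, q, best=None):
    """Nearest neighbor of q: scan the leaf cell first, then backtrack, exploring the
    far side of a split only if its boundary is closer than the best distance so far."""
    if node.points is not None:                    # leaf: compute distances in this cell
        d = np.linalg.norm(node.points - q, axis=1)
        i = int(np.argmin(d))
        cand = (float(d[i]), node.points[i])
        return cand if best is None or cand[0] < best[0] else best
    near, far = (node.left, node.right) if q[node.dim] <= node.split else (node.right, node.left)
    best = query(near, q, best)
    if abs(q[node.dim] - node.split) < best[0]:    # the boundary may hide a closer point
        best = query(far, q, best)
    return best
```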
43 Learning method Rather than picking the median split at each level, we use the training queries qi to pick a split that greedily minimizes the expected cost. [sent-90, score-0.633]
44 A split s divides the sample queries (that are in the cell being split) into three sets: Qtc , those q that are “too close” to s—i.e., close enough to the split that both sides may need to be searched; Ql , those falling on the left of s; and Qr , those on the right. [sent-91, score-0.453]
45 The split also divides the database points (that are in the cell being split) into Xl and Xr . [sent-95, score-0.377]
46 cost(s) is a greedy surrogate for Σi sizef (qi ); evaluating the true average size would require a potentially costly recursion. [sent-97, score-0.399]
47 Using a sample set led us to a very simple, natural cost function that can be used to pick splits in a principled manner. [sent-99, score-0.167]
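The exact cost(s) used by the authors is not reproduced in this extract, so the sketch below should be read as one plausible instantiation of the idea (queries too close to a candidate split are charged the whole cell, other queries only their side); the surrogate formula, the fixed margin for "too close", and all names are assumptions of ours.

```python
import numpy as np

def choose_split(X_cell, Q_cell, dim, margin):
    """Pick a split along `dim` for the database points X_cell using the sample
    queries Q_cell that fall in this cell.  Queries "too close" to a candidate
    split (within `margin`) are charged the whole cell, since both sides would
    have to be searched; other queries are charged only their own side."""
    xs, qs = X_cell[:, dim], Q_cell[:, dim]
    best_s, best_cost = None, np.inf
    for s in np.unique(xs):                         # candidate split locations
        n_l = int(np.sum(xs <= s))
        n_r = len(xs) - n_l
        too_close = np.abs(qs - s) <= margin
        n_tc = int(np.sum(too_close))
        n_ql = int(np.sum((qs <= s) & ~too_close))
        n_qr = int(np.sum((qs > s) & ~too_close))
        # assumed surrogate (not necessarily the paper's exact formula):
        # cost(s) = |Q_tc|*|X_cell| + |Q_l|*|X_l| + |Q_r|*|X_r|
        cost = n_tc * len(xs) + n_ql * n_l + n_qr * n_r
        if 0 < n_l < len(xs) and cost < best_cost:
            best_s, best_cost = float(s), cost
    return best_s, best_cost
```

The split with the lowest surrogate cost would then replace the median split at that level, and the construction recurses as before.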
48 2 Locality sensitive hashing LSH was a tremendous breakthrough in NN search as it led to data structures with provably sublinear (in the database size) retrieval time for approximate NN searching. [sent-101, score-0.512]
49 It is built on an extremely simple space partitioning scheme which we refer to as a rectilinear cell structure (RCS). [sent-104, score-0.275]
50 1 Project database down to O(log n) dimensions: xi → Rxi . [sent-106, score-0.143]
51 On query q, one simply finds the cell that q belongs to, and returns the nearest x in that cell. [sent-109, score-0.414]
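The following small sketch (our own code and naming, and a simplification of LSH proper) illustrates the rectilinear cell structure described above: the database is projected onto a few random directions, each projected coordinate is bucketed into B bins, and a query is compared only against the points hashing to its cell. Here the bins are spaced uniformly; the learning algorithm described next would instead place the B − 1 boundaries per axis.

```python
import numpy as np

class RCS:
    """Rectilinear cell structure over a few random projections of the database."""
    def __init__(self, X, d_proj, B, seed=0):
        rng = np.random.default_rng(seed)
        self.X = X
        self.R = rng.normal(size=(X.shape[1], d_proj))      # random projection directions
        P = X @ self.R                                       # projected database, shape (n, d_proj)
        # B - 1 interior boundaries per axis, uniformly spaced here (a placeholder
        # for the tuned boundaries discussed in the text)
        self.bounds = [np.linspace(P[:, j].min(), P[:, j].max(), B + 1)[1:-1]
                       for j in range(d_proj)]
        self.cells = {}                                      # cell id -> database indices
        for i, p in enumerate(P):
            self.cells.setdefault(self._cell(p), []).append(i)

    def _cell(self, p):
        return tuple(int(np.searchsorted(b, v)) for v, b in zip(p, self.bounds))

    def query(self, q):
        """Return the index of the nearest database point within q's cell (None if empty)."""
        idx = self.cells.get(self._cell(q @ self.R), [])
        if not idx:
            return None
        d = np.linalg.norm(self.X[idx] - q, axis=1)
        return idx[int(np.argmin(d))]
```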
52 We wish to select boundary locations minimizing the cost Σi ( missf (qi ) + λ sizef (qi ) ), where λ is a tradeoff parameter (alternatively, one could fix a miss rate that is reasonable, say 5%, and minimize the size). [sent-117, score-0.817]
53 The cost of placing the boundaries at p1 , p2 , . . . , pB+1 can be decomposed as c[p1 , p2 ] + · · · + c[pB , pB+1 ], where c[pi , pi+1 ] = Σ_{q ∈ [pi , pi+1 ]} missf (q) + λ Σ_{q ∈ [pi , pi+1 ]} |{x ∈ [pi , pi+1 ]}|. [sent-121, score-0.153]
54 Let D be our dynamic programming table, where D[p, i] is defined as the cost of putting the ith boundary at position p and the remaining B + 1 − i boundaries to the right. [sent-122, score-0.564]
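Because the total cost decomposes over consecutive-boundary intervals, the boundary locations can be chosen by dynamic programming. The sketch below is our own formulation (written left-to-right rather than with the D[p, i] table above, and with the two end boundaries pinned to the extreme candidate positions); interval_cost(a, b) stands in for c[pa , pb ], the miss-plus-λ·size charge of the queries falling between candidates a and b.

```python
import numpy as np

def place_boundaries(candidates, n_bins, interval_cost):
    """Pick n_bins + 1 boundaries from the sorted `candidates`, minimizing the sum of
    interval_cost(i, j) over consecutive boundaries (candidates[i], candidates[j])."""
    P = len(candidates)
    INF = float("inf")
    D = np.full((n_bins + 1, P), INF)        # D[j, p]: best cost of j intervals ending at p
    back = np.zeros((n_bins + 1, P), dtype=int)
    D[0, 0] = 0.0                            # leftmost boundary fixed at candidates[0]
    for j in range(1, n_bins + 1):
        for p in range(j, P):
            for prev in range(j - 1, p):
                c = D[j - 1, prev] + interval_cost(prev, p)
                if c < D[j, p]:
                    D[j, p], back[j, p] = c, prev
    # read off the optimal boundaries, ending at candidates[-1]
    idx, p = [P - 1], P - 1
    for j in range(n_bins, 0, -1):
        p = back[j, p]
        idx.append(p)
    return [candidates[i] for i in reversed(idx)], float(D[n_bins, P - 1])
```

Following the text, interval_cost would count each sample query in the interval once for its miss and add λ times the number of database points falling in the interval, so it can be precomputed from the projected qi 's and xi 's.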
55 5 Generalization theory In our framework, a nearest neighbor data structure is learned by specifically designing it to perform well on a set of sample queries. [sent-124, score-0.372]
56 Recall the setting: there is a database X = {x1 , . [sent-126, score-0.143]
57 Sample queries q1 , . . . , qm are drawn iid from some distribution D on Q, and we wish to learn a data structure f : Q → 2^X from some class F of such structures. (A distribution Dp is p-stable if, for any v ∈ Rd and iid draws Z, X1 , . . . , Xd from Dp , the sum Σi vi Xi is distributed as ||v||p Z.) [sent-132, score-0.27]
58 We are interested in the generalization of sizef (q) ≡ |f (q)|/n and missf (q) ≡ |Γk (q) \ f (q)|/k, both of which have range [0, 1] (the approximate miss rate missf,ε (q) can be substituted for missf (q) throughout this section). [sent-139, score-1.849]
59 Suppose a data structure f is chosen from some class F, so as to have low empirical cost (1/m) Σi sizef (qi ) and (1/m) Σi missf (qi ). [sent-140, score-0.95]
60 Can we then conclude that data structure f will continue to perform well for subsequent queries drawn from the underlying distribution on Q? [sent-141, score-0.307]
61 In other words, are the empirical estimates above necessarily close to the true expected values Eq∼D sizef (q) and Eq∼D missf (q) ? [sent-142, score-0.851]
62 A (u1 , . . . , ud , B)-rectilinear cell structure (RCS) in RD is a partition of RD into B^d cells given by x → (h1 (x · u1 ), . . . , hd (x · ud )). [sent-162, score-0.33]
63 Fix u1 , . . . , ud ∈ RD , and, for some positive integer B, let the set of data structures F consist of all (u1 , . . . , ud , B)-rectilinear cell structures. [sent-173, score-0.278]
64 Suppose there is an underlying distribution over queries in RD , from which m sample queries q1 , . . . , qm are drawn. [sent-178, score-0.464]
65 Then, with probability at least 1 − δ, sup_{f ∈ F} ( E[missf ] − (1/m) Σi missf (qi ) ) ≤ 2 √( 2d(B − 1) log(m + n) / m ) + √( log(2/δ) / m ), and likewise for sizef . [sent-182, score-0.917]
66 Along each axis ui there are B − 1 boundaries to be chosen and only m + n distinct locations for each of these (as far as partitioning of the xi ’s and qi ’s is concerned). [sent-195, score-0.286]
67 This holds not only if u1 , . . . , ud are chosen randomly, but, more remarkably, even if they are chosen based on X (for instance, by running PCA on X). [sent-202, score-0.139]
68 Now let F be the set of (u1 , . . . , ud , B)-rectilinear cell structures for all choices of u1 , . . . , ud ∈ RD . [sent-208, score-0.278]
69 Then with probability at least 1 − δ, sup_{f ∈ F} ( E[missf ] − (1/m) Σi missf (qi ) ) ≤ 2 √( (2 + 2d(D + B − 2) log(m + n)) / m ) + √( log(2/δ) / m ), and likewise for sizef . [sent-212, score-0.917]
70 Let F be the set of all depth η KD-trees in RD and X ⊂ RD be a database of points. [sent-220, score-0.185]
71 Suppose there is an underlying distribution over queries in RD from which q1 , . . . , qm are drawn. [sent-221, score-0.216]
72 Then with probability at least 1 − δ, sup_{f ∈ F} ( E[missf ] − (1/m) Σi missf (qi ) ) ≤ 2 √( (2^(η+1) − 2) log(D(3m + n)) / m ) + √( log(2/δ) / m ). A KD-tree utilizing median splits has depth η ≤ log n. [sent-225, score-0.857]
73 The depth of a KD-tree with learned splits can be higher, though we found empirically that the depth was always much less than 2 log n (and can of course be restricted manually). [sent-226, score-0.241]
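For reference, the three bounds above share a single uniform-convergence shape. The display below is a hedged consolidation of the extracted statements (a plausible reading of the fragments; the exact constants and the placement of the log(2/δ) term should be checked against the original paper), with C_F standing for the complexity term of the respective class.

```latex
\sup_{f \in \mathcal{F}} \left( \mathbb{E}_{q \sim D}\!\left[\mathrm{miss}_f(q)\right]
  - \frac{1}{m} \sum_{i=1}^{m} \mathrm{miss}_f(q_i) \right)
\;\le\; 2\sqrt{\frac{C_{\mathcal{F}} \, \log T}{m}} \;+\; \sqrt{\frac{\log(2/\delta)}{m}}
```

Here C_F = 2d(B − 1) and T = m + n for RCSs with fixed projections, C_F = 2 + 2d(D + B − 2) and T = m + n for RCSs over arbitrary projections, and C_F = 2^(η+1) − 2 and T = D(3m + n) for depth-η KD-trees; the same statement holds with sizef in place of missf.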
74 1 KD-trees First let us look at a simple example comparing the learned splits to median splits. [sent-229, score-0.279]
75 Figure 1 shows a 2-dimensional dataset and the cell partitions produced by the learned splits and the median splits. [sent-230, score-0.424]
76 The KD-tree constructed with the median splitting rule places nearly all of the boundaries running right through the queries. [sent-231, score-0.294]
77 As a result, nearly the entire database will have to be searched for queries drawn from the center cluster distribution. [sent-232, score-0.396]
78 The KD-tree with the learned splits places most of the boundaries right around the actual database points, ensuring that fewer leaves will need to be examined for each query. [sent-233, score-0.379]
79 For this set of experiments, we used a randomly selected subset of the dataset as the database and a separate small subset as the test queries. [sent-237, score-0.143]
80 For the sample queries, we used the database itself—i.e., the database points also served as the training queries. [sent-238, score-0.175]
81 We compare performance in terms of the average number of database points we have to compute distances to on a test set. [sent-242, score-0.222]
data set (DB size, test pts): Corel (UCI) (32k, 5k); Covertype (UCI) (100k, 10k); Letter (UCI) (18k, 2k); Pen digits (UCI) (9k, 1k); Bio (KDD) (100k, 10k); Physics (KDD) (100k, 10k); the remaining columns give dim and the # distance calculations under the median split vs. the learned split (first entry 1035. …). [sent-243, score-0.403]
Figure 2: Percentage of DB examined as a function of ε (the approximation factor) for various query distributions. [sent-292, score-0.174]
84 Figure 3: Example RCSs, with panels showing random boundaries vs. tuned boundaries. [sent-293, score-0.431]
85 This dataset allows us to explore the effect of differing query and database distributions in a natural setting. [sent-304, score-0.317]
86 Figure 2 shows the results of running KD-trees using median and learned splits. [sent-306, score-0.216]
87 In each case, 4000 images were chosen for the database (from across all the classes) and images from select classes were chosen for the queries. [sent-307, score-0.204]
88 The “All” queries were chosen from all classes; the “Animals” were chosen from the 11 animal classes; the “N. [sent-308, score-0.243]
89 Standard KD-trees are performing somewhat better than brute force in these experiments; the learned KD-trees yield much faster retrieval times across a range of approximation errors. [sent-310, score-0.133]
90 Note also that the performance of the learned KD-tree seems to improve as the query distribution becomes simpler whereas the performance for the standard KD-tree actually degrades. [sent-311, score-0.239]
91 The queries and DB are drawn from the same distribution. [sent-314, score-0.253]
92 The learning algorithm adjusts the bin boundaries to the regions of density. [sent-315, score-0.149]
93 Experimenting with RCS structures is somewhat challenging since there are two parameters to set (number of projections and boundaries), an approximation factor ε, and two quantities to compare (size and miss). [sent-316, score-0.209]
94 Rather than minimizing a tradeoff between sizef and missf , we constrained the miss rate and optimized the sizef . [sent-319, score-1.421]
95 We suspect that the learning algorithm helps substantially for the physics data because the one-dimensional projections are highly nonuniform whereas the MNIST one-dimensional projections are much more uniform. [sent-325, score-0.141]
96 We used this framework to develop algorithms that learn RCSs and KD-trees optimized for a query distribution. [sent-346, score-0.212]
97 Possible future work includes applying the learning framework to other data structures, though we expect that even stronger results may be obtained by using this framework to develop a novel data structure from the ground up. [sent-347, score-0.13]
98 An algorithm for finding nearest neighbours in (approximately) constant average time. [sent-398, score-0.134]
99 The analysis of a probabilistic approach to nearest neighbor searching. [sent-419, score-0.221]
100 Analysis of approximate nearest neighbor searching with clustered point sets. [sent-424, score-0.266]
wordName wordTfidf (topN-words)
[('missf', 0.48), ('sizef', 0.371), ('nn', 0.249), ('queries', 0.216), ('query', 0.174), ('rcss', 0.153), ('median', 0.151), ('database', 0.143), ('structures', 0.139), ('miss', 0.139), ('ud', 0.139), ('nearest', 0.134), ('lsh', 0.131), ('lshp', 0.131), ('rcs', 0.131), ('rd', 0.13), ('qi', 0.111), ('qtc', 0.109), ('boundaries', 0.108), ('cell', 0.106), ('qm', 0.095), ('neighbor', 0.087), ('uild', 0.087), ('db', 0.07), ('retrieval', 0.068), ('mount', 0.065), ('learned', 0.065), ('physics', 0.065), ('split', 0.064), ('splits', 0.063), ('classes', 0.061), ('animals', 0.058), ('mnist', 0.057), ('pi', 0.056), ('structure', 0.054), ('locality', 0.053), ('hashing', 0.052), ('distances', 0.05), ('uci', 0.048), ('search', 0.048), ('iid', 0.047), ('databases', 0.047), ('searching', 0.045), ('cost', 0.045), ('eq', 0.044), ('arya', 0.044), ('lefttree', 0.044), ('maneewongvatana', 0.044), ('rasiwasia', 0.044), ('rectilinear', 0.044), ('righttree', 0.044), ('pb', 0.044), ('depth', 0.042), ('gm', 0.042), ('dasgupta', 0.042), ('ree', 0.042), ('bin', 0.041), ('partitioning', 0.04), ('bears', 0.04), ('boundary', 0.039), ('partitions', 0.039), ('framework', 0.038), ('generalization', 0.038), ('ql', 0.038), ('projections', 0.038), ('fix', 0.037), ('drawn', 0.037), ('kdd', 0.035), ('splitting', 0.035), ('divides', 0.035), ('qr', 0.035), ('xr', 0.035), ('sup', 0.034), ('return', 0.034), ('minimizing', 0.033), ('fraction', 0.033), ('likewise', 0.032), ('quantities', 0.032), ('sample', 0.032), ('calculations', 0.032), ('sensitive', 0.031), ('built', 0.031), ('cells', 0.031), ('tremendous', 0.031), ('trees', 0.03), ('ann', 0.029), ('diego', 0.029), ('zm', 0.029), ('log', 0.029), ('points', 0.029), ('surrogate', 0.028), ('bounds', 0.028), ('dimension', 0.028), ('distance', 0.027), ('tradeoff', 0.027), ('locations', 0.027), ('animal', 0.027), ('pick', 0.027), ('minimize', 0.027), ('building', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
2 0.17306174 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
3 0.078235917 136 nips-2007-Multiple-Instance Active Learning
Author: Burr Settles, Mark Craven, Soumya Ray
Abstract: We present a framework for active learning in the multiple-instance (MI) setting. In an MI learning problem, instances are naturally organized into bags and it is the bags, instead of individual instances, that are labeled for training. MI learners assume that every instance in a bag labeled negative is actually negative, whereas at least one instance in a bag labeled positive is actually positive. We consider the particular case in which an MI learner is allowed to selectively query unlabeled instances from positive bags. This approach is well motivated in domains in which it is inexpensive to acquire bag labels and possible, but expensive, to acquire instance labels. We describe a method for learning from labels at mixed levels of granularity, and introduce two active query selection strategies motivated by the MI setting. Our experiments show that learning from instance labels can significantly improve performance of a basic MI learning algorithm in two multiple-instance domains: content-based image retrieval and text classification. 1
4 0.077684842 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
5 0.070096955 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search
Author: Zhaohui Zheng, Hongyuan Zha, Tong Zhang, Olivier Chapelle, Keke Chen, Gordon Sun
Abstract: We present a general boosting method extending functional gradient boosting to optimize complex loss functions that are encountered in many machine learning problems. Our approach is based on optimization of quadratic upper bounds of the loss functions which allows us to present a rigorous convergence analysis of the algorithm. More importantly, this general framework enables us to use a standard regression base learner such as single regression tree for £tting any loss function. We illustrate an application of the proposed method in learning ranking functions for Web search by combining both preference data and labeled data for training. We present experimental results for Web search using data from a commercial search engine that show signi£cant improvements of our proposed methods over some existing methods. 1
6 0.065661773 160 nips-2007-Random Features for Large-Scale Kernel Machines
7 0.060724668 161 nips-2007-Random Projections for Manifold Learning
8 0.060614567 143 nips-2007-Object Recognition by Scene Alignment
10 0.058685053 19 nips-2007-Active Preference Learning with Discrete Choice Data
11 0.055649448 58 nips-2007-Consistent Minimization of Clustering Objective Functions
12 0.055144332 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
13 0.053705454 162 nips-2007-Random Sampling of States in Dynamic Programming
14 0.050567895 118 nips-2007-Learning with Transformation Invariant Kernels
15 0.049078442 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
16 0.047634054 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
17 0.04551468 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection
18 0.044103868 115 nips-2007-Learning the 2-D Topology of Images
19 0.043678295 197 nips-2007-The Infinite Markov Model
20 0.043537151 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
topicId topicWeight
[(0, -0.158), (1, 0.024), (2, -0.053), (3, 0.006), (4, -0.006), (5, 0.019), (6, 0.038), (7, 0.053), (8, 0.013), (9, 0.031), (10, -0.063), (11, 0.18), (12, 0.101), (13, -0.077), (14, -0.009), (15, 0.022), (16, 0.017), (17, -0.005), (18, 0.013), (19, -0.002), (20, 0.053), (21, 0.011), (22, 0.09), (23, 0.037), (24, 0.13), (25, -0.181), (26, -0.023), (27, 0.019), (28, -0.055), (29, -0.02), (30, -0.085), (31, -0.065), (32, -0.003), (33, -0.144), (34, -0.064), (35, 0.003), (36, -0.093), (37, -0.008), (38, -0.134), (39, -0.127), (40, -0.026), (41, -0.179), (42, 0.102), (43, -0.03), (44, -0.093), (45, -0.006), (46, -0.034), (47, 0.029), (48, 0.074), (49, -0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.94456059 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
2 0.76980537 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
3 0.53110486 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1
4 0.52558637 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
5 0.43850508 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
Author: Yee W. Teh, Daniel J. Hsu, Hal Daume
Abstract: We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics. 1
6 0.41650218 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
7 0.41530135 19 nips-2007-Active Preference Learning with Discrete Choice Data
8 0.41117814 160 nips-2007-Random Features for Large-Scale Kernel Machines
9 0.384076 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection
10 0.37981328 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
12 0.3702288 136 nips-2007-Multiple-Instance Active Learning
13 0.35414115 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
14 0.35108846 72 nips-2007-Discriminative Log-Linear Grammars with Latent Variables
15 0.34857234 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
16 0.33959737 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers
17 0.32800922 197 nips-2007-The Infinite Markov Model
18 0.32451788 58 nips-2007-Consistent Minimization of Clustering Objective Functions
19 0.32418966 161 nips-2007-Random Projections for Manifold Learning
20 0.31864828 193 nips-2007-The Distribution Family of Similarity Distances
topicId topicWeight
[(5, 0.057), (13, 0.036), (16, 0.029), (21, 0.086), (31, 0.039), (34, 0.03), (35, 0.069), (47, 0.077), (49, 0.016), (75, 0.27), (83, 0.112), (85, 0.014), (87, 0.025), (90, 0.06)]
simIndex simValue paperId paperTitle
1 0.80573958 113 nips-2007-Learning Visual Attributes
Author: Vittorio Ferrari, Andrew Zisserman
Abstract: We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
same-paper 2 0.74464691 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
3 0.56377172 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high dimensional measurements of more than thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible. 1
4 0.5632031 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
Author: Kai Yu, Wei Chu
Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1
5 0.55847138 156 nips-2007-Predictive Matrix-Variate t Models
Author: Shenghuo Zhu, Kai Yu, Yihong Gong
Abstract: It is becoming increasingly important to learn from a partially-observed random matrix and predict its missing elements. We assume that the entire matrix is a single sample drawn from a matrix-variate t distribution and suggest a matrixvariate t model (MVTM) to predict those missing elements. We show that MVTM generalizes a range of known probabilistic models, and automatically performs model selection to encourage sparse predictive models. Due to the non-conjugacy of its prior, it is difficult to make predictions by computing the mode or mean of the posterior distribution. We suggest an optimization method that sequentially minimizes a convex upper-bound of the log-likelihood, which is very efficient and scalable. The experiments on a toy data and EachMovie dataset show a good predictive accuracy of the model. 1
6 0.55567819 63 nips-2007-Convex Relaxations of Latent Variable Training
7 0.55392879 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
8 0.5535832 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
9 0.55332589 175 nips-2007-Semi-Supervised Multitask Learning
10 0.5520972 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
11 0.55120814 158 nips-2007-Probabilistic Matrix Factorization
12 0.55017233 69 nips-2007-Discriminative Batch Mode Active Learning
13 0.54981172 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
14 0.54929042 174 nips-2007-Selecting Observations against Adversarial Objectives
15 0.54923046 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
16 0.54836881 134 nips-2007-Multi-Task Learning via Conic Programming
17 0.54788166 97 nips-2007-Hidden Common Cause Relations in Relational Learning
18 0.54591298 86 nips-2007-Exponential Family Predictive Representations of State
19 0.54582667 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
20 0.54560894 7 nips-2007-A Kernel Statistical Test of Independence