nips nips2003 nips2003-136 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ting Liu, Andrew W. Moore, Alexander Gray
Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. [sent-11, score-0.435]
2 We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. [sent-12, score-0.12]
3 This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. [sent-13, score-0.048]
4 For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. [sent-14, score-0.035]
5 We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. [sent-15, score-0.559]
6 These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers. [sent-16, score-0.273]
7 Spatial methods such as kd-trees [6, 17], R-trees [9], metric trees [18, 4] and ball trees [15] have been proposed and tested as a way of alleviating the computational cost of such statistics without resorting to approximate answers. [sent-19, score-0.376]
8 They have been used in many different ways, with a variety of tree search algorithms and a variety of “cached sufficient statistics” decorating the internal nodes and leaves, for example in [14, 5, 16, 8]. [sent-20, score-0.184]
9 The main concern with such accelerations is the extent to which they can survive high dimensional data. [sent-21, score-0.097]
10 This paper is about the consequences of the fact that the following three questions do not have the same precise meaning: (a) “What are the k nearest neighbors of t? [sent-23, score-0.307]
11 ” (b) “How many of the k nearest neighbors of t are from the positive class? [sent-24, score-0.351]
12 ” and (c) “Are at least q of the k nearest neighbors from the positive class? [sent-25, score-0.351]
13 ” The computational geometry community has focused on question (a), but uses of proximity queries in statistics far more frequently require (b) and (c) types of computations. [sent-26, score-0.201]
14 Further, in addition to traditional K-NN, the same insight applies to many other statistical computations such as nonparametric density estimation, locally weighted regression, mixture models, k-means and the prediction phase of SVM classification. [sent-27, score-0.161]
15 2 Ball trees. A ball tree is a binary tree in which each node represents a set of points, called Points(Node). [sent-28, score-0.905]
16 Given a dataset, the root node of a ball tree represents the full set of points in the dataset. [sent-29, score-0.81]
17 A node can be either a leaf node or a non-leaf node. [sent-30, score-0.685]
18 A leaf node explicitly contains a list of the points represented by the node. [sent-31, score-0.483]
19 A non-leaf node does not explicitly contain a set of points; it owns the union of the points owned by its two children. [sent-32, score-0.304]
20 Each node has a distinguished point called a pivot. [sent-40, score-0.304]
21 Each node records the maximum distance from the points it owns to its pivot, called the node's radius: Node.Radius = max_{x ∈ Points(Node)} |Node.Pivot − x|. [sent-42, score-0.548]
22 Balls lower down the tree cover smaller volumes. [sent-45, score-0.131]
23 This is achieved by insisting, at tree construction time, that each point x ∈ Points(Node.Child) lies inside that child's ball, i.e., |x − Node.Child.Pivot| ≤ Node.Child.Radius. [sent-46, score-0.131]
24 Provided our distance function obeys the triangle inequality, this gives the ability to bound the distance from a target point t to any point in any ball tree node. [sent-56, score-0.441]
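To make this bound concrete, here is a minimal sketch in Python (our illustration, not the paper's code); the function names and the use of the Euclidean metric are assumptions.

import numpy as np

def min_possible_dist(t, pivot, radius):
    # Lower bound on |t - x| for any x inside the ball (pivot, radius);
    # valid for any distance obeying the triangle inequality.
    return max(float(np.linalg.norm(t - pivot)) - radius, 0.0)

def max_possible_dist(t, pivot, radius):
    # Upper bound on |t - x| for any x inside the ball.
    return float(np.linalg.norm(t - pivot)) + radius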
25 There are several ways to construct ball trees, and practical algorithms trade off the cost of construction (it would be useless to be O(R^2) given a dataset with R points, for example) against the tightness of the radius of the balls. [sent-62, score-0.052]
26 [13] describes one fast way of constructing a ball tree appropriate for computational statistics. [sent-63, score-0.349]
27 If a ball tree is balanced, then the construction time is O(CR log R), where C is the cost of a point-point distance computation (which is O(m) if there are m dense attributes, and O(f m) if the records are sparse with only a fraction f of attributes taking non-zero values). [sent-64, score-0.528]
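As an illustration of the data structure only, the following sketch builds a ball tree by naive top-down splitting on the widest dimension; it is not the fast construction of [13], and the BallNode field names (pivot, radius, points, child1, child2) are our own.

import numpy as np

class BallNode:
    def __init__(self, points):
        self.pivot = points.mean(axis=0)   # centroid pivot (one common choice)
        self.radius = float(np.max(np.linalg.norm(points - self.pivot, axis=1)))
        self.points = points               # kept explicitly only at leaves
        self.child1 = self.child2 = None

def build_ball_tree(points, leaf_size=30):
    node = BallNode(points)
    if len(points) <= leaf_size:
        return node                        # leaf node: stores its points explicitly
    widest = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
    order = np.argsort(points[:, widest])  # split at the median of the widest dimension
    mid = len(points) // 2
    node.child1 = build_ball_tree(points[order[:mid]], leaf_size)
    node.child2 = build_ball_tree(points[order[mid:]], leaf_size)
    node.points = None                     # non-leaf nodes do not store points
    return node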
28 2.1 KNS1: Conventional k nearest neighbor search with ball trees. In this paper, we call conventional ball-tree-based search [18] KNS1. [sent-66, score-0.763]
29 We begin with the following definition: say that PS consists of the k-NN of t in pointset V if and only if ((|V| ≥ k) ∧ (PS are the k-NN of t in V)) ∨ ((|V| < k) ∧ (PS = V)) (3). We now define a recursive procedure called BallKNN with the following inputs and output. [sent-68, score-0.16]
30 PSout = BallKNN(PSin, Node). Let V = the set of points searched so far, on entry. [sent-69, score-0.102]
31 Let Dsofar = ∞ if |PSin| < k, and Dsofar = max_{x ∈ PSin} |x − t| if |PSin| = k (4). Dsofar is the minimum distance within which points would become interesting to us. [sent-72, score-0.148]
32 Let DNode = max(|t − Node.Pivot| − Node.Radius, 0) if Node = Root, and DNode = max(DNode.parent, |t − Node.Pivot| − Node.Radius, 0) if Node ≠ Root (5). DNode is the minimum possible distance from any point in Node to t. [sent-78, score-0.046]
33 Procedure BallKNN(PSin, Node): begin if (DNode ≥ Dsofar) then exit, returning PSin unchanged. [sent-79, score-0.197]
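A minimal sketch of this style of recursion, reusing BallNode, build_ball_tree and min_possible_dist from the sketches above; the bookkeeping (a sorted list of (distance, point) pairs for PS) and the closer-child-first ordering are our assumptions rather than the paper's exact procedure.

import numpy as np

def ball_knn(ps_in, node, t, k):
    d_sofar = ps_in[-1][0] if len(ps_in) == k else np.inf
    d_node = min_possible_dist(t, node.pivot, node.radius)
    if d_node >= d_sofar:
        return ps_in                       # this ball cannot contain anything closer
    if node.child1 is None:                # leaf: scan its points explicitly
        ps = list(ps_in)
        for x in node.points:
            d = float(np.linalg.norm(t - x))
            if len(ps) < k or d < ps[-1][0]:
                ps.append((d, tuple(x)))
                ps.sort(key=lambda pair: pair[0])
                ps = ps[:k]
        return ps
    # visit the closer child first so that d_sofar shrinks as early as possible
    children = sorted([node.child1, node.child2],
                      key=lambda c: min_possible_dist(t, c.pivot, c.radius))
    for child in children:
        ps_in = ball_knn(ps_in, child, t, k)
    return ps_in

# Usage: ball_knn([], build_ball_tree(data), query, k=9) returns the exact 9-NN of query.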
34 2.2 KNS2: Faster k-NN classification for skewed-class data. In several binary classification domains, one class is much more frequent than the other. For example, in High Throughput Screening datasets [19], it is far more common for the result of an experiment to be negative than positive. [sent-82, score-0.104]
35 The new algorithm introduced in this section, KNS2, is designed to accelerate k-NN based classification beyond the speedups already available by using KNS1 (conventional ball-tree-based k-NN). [sent-84, score-0.096]
36 KNS2 attacks the problem by building two ball trees: Rootpos is the root of a (small) ball tree built from all the positive points in the dataset. [sent-85, score-0.768]
37 Rootneg is the root of a (large) ball tree built from all negative points. [sent-86, score-0.466]
38 • Step 1: use KNS1 on Rootpos to find the k nearest positive neighbors of t and record their distances. • Step 2 — “Insert negative”: Do sufficient search of the negative tree to prove that the number of positive datapoints among the k nearest neighbors of t is q for some value of q. [sent-88, score-0.661]
39 Step 2 is achieved using a new recursive search called NegCount. [sent-89, score-0.108]
40 Define an array Dists1, . . . , Distsk consisting of the distances to the k nearest positive neighbors of t, sorted in increasing order of distance. [sent-95, score-0.421]
41 Define pointset V as the set of points in the negative balls visited so far. [sent-98, score-0.327]
42 Say that (n,C) summarize interesting negative points for pointset V if and only if 1. [sent-100, score-0.304]
43 This simply declares that the length n of the C array is as short as possible while accounting for the k members of V that are nearest to t. [sent-103, score-0.239]
44 Step 2 of KNS2 is implemented by the recursive function (nout, Cout) = NegCount(nin, Cin, Node, Dists). Assume on entry that (nin, Cin) summarize interesting negative points for pointset V, where V is the set of points visited so far during the search. [sent-104, score-0.461]
45 This algorithm efficiently ensures that on exit (nout ,Cout ) summarize interesting negative points for V ∪ Points(Node). [sent-105, score-0.325]
46 The recursion is started at Rootneg with C set to C0, an array of zeroes, and with Dists as defined in Equation 6, obtained by applying KNS1 to the (small) positive ball tree. [sent-109, score-0.312]
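The brute-force function below (our illustration, not the NegCount recursion) shows what Step 2 ultimately establishes: given the sorted positive distances from Step 1 and the negative points, it returns how many of t's k nearest neighbors are positive. NegCount reaches the same count while visiting only as much of the negative ball tree as its pruning bounds require; ties at the k-th distance are resolved arbitrarily here.

import numpy as np

def positives_among_knn(dists, neg_points, t, k):
    # dists: sorted distances to the k nearest positive neighbors of t (from Step 1).
    neg_d = np.sort(np.linalg.norm(neg_points - t, axis=1))[:k]
    combined = np.sort(np.concatenate([np.asarray(dists), neg_d]))
    kth = combined[k - 1]                  # distance of the k-th nearest neighbor overall
    return int(np.sum(np.asarray(dists) <= kth))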
47 2.3 KNS3: Are at least q of the k nearest neighbors positive? [sent-111, score-0.307]
48 KNS3 removes KNS2's constraint of an assumed skew in the class distribution, while introducing a new constraint: we answer the binary question “are at least q of the k nearest neighbors positive?” [sent-113, score-0.423]
49 This is often the most statistically relevant question, for example during classification with known false positive and false negative costs. [sent-115, score-0.106]
50 2.4 SVP1: Faster Radial Basis SVM Prediction. After an SVM [3] has been trained, we hit the prediction phase. [sent-118, score-0.035]
51 Class(qj) = 1 if ASUM(qj) − BSUM(qj) ≥ −b, and Class(qj) = 0 if ASUM(qj) − BSUM(qj) < −b, where ASUM(qj) = Σ_{i ∈ posvecs} αi K(xi, qj), BSUM(qj) = Σ_{i ∈ negvecs} βi K(xi, qj), and the positive support vectors posvecs, the negative support vectors negvecs, the weights {αi}, {βi} and the constant term b are all obtained from SVM training. [sent-126, score-0.154]
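A direct, unaccelerated reading of this decision rule in Python; the RBF kernel form exp(-gamma * |x - q|^2) and the parameter names are our assumptions.

import numpy as np

def svm_predict(q, pos_sv, alpha, neg_sv, beta, b, gamma):
    k_pos = np.exp(-gamma * np.linalg.norm(pos_sv - q, axis=1) ** 2)
    k_neg = np.exp(-gamma * np.linalg.norm(neg_sv - q, axis=1) ** 2)
    asum = float(np.sum(alpha * k_pos))    # ASUM(q): weighted kernel sum over posvecs
    bsum = float(np.sum(beta * k_neg))     # BSUM(q): weighted kernel sum over negvecs
    return 1 if asum - bsum >= -b else 0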
52 We place the queries (not the support vectors) into a ball-tree. [sent-127, score-0.128]
53 We can then apply the same kinds of tricks as KNS2 and KNS3 in which we do not need to find the explicit values of the ASUM and BSUM terms, but merely find balls in the tree in which we can prove all query points satisfy one of the above inequalities. [sent-128, score-0.373]
54 To classify all the points in a node called Node we do the following (a sketch of the recursion is given after this list): 1. [sent-129, score-0.406]
55 Compute values (ASUMLO , ASUMHI ) such that we can be sure ∀q j ∈ Node : ASUMLO ≤ ASUM(q j ) ≤ ASUMHI (8) without iterating over the queries in Node. [sent-130, score-0.128]
56 If ASUMLO − BSUMHI ≥ −b we have proved that all queries in Node should be classified positively, and we can terminate this recursive call. [sent-140, score-0.219]
57 If ASUMHI − BSUMLO < −b we have proved that all queries in Node should be classified negatively, and we can terminate this recursive call. [sent-142, score-0.219]
58 Else we recurse and apply the same procedure to the two children of Node, unless Node is a leaf node in which case we must explicitly iterate over its members. [sent-144, score-0.381]
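A minimal sketch of this recursion over the query ball tree; it reuses svm_predict from the sketch above, assumes the BallNode layout from the construction sketch (points stored only at leaves), an RBF kernel with parameter gamma, and non-negative weights alpha and beta. All of these choices are our assumptions, not the paper's implementation.

import numpy as np

def iter_points(node):
    # Yield every query point stored in the subtree rooted at node.
    if node.points is not None:
        for q in node.points:
            yield q
    else:
        yield from iter_points(node.child1)
        yield from iter_points(node.child2)

def svp1_classify(node, pos_sv, alpha, neg_sv, beta, b, gamma, out):
    def kernel_sum_bounds(svs, w):
        d = np.linalg.norm(svs - node.pivot, axis=1)
        d_lo = np.maximum(d - node.radius, 0.0)   # closest any query in node can be to each SV
        d_hi = d + node.radius                    # farthest any query in node can be from each SV
        lo = float(np.sum(w * np.exp(-gamma * d_hi ** 2)))
        hi = float(np.sum(w * np.exp(-gamma * d_lo ** 2)))
        return lo, hi

    asum_lo, asum_hi = kernel_sum_bounds(pos_sv, alpha)
    bsum_lo, bsum_hi = kernel_sum_bounds(neg_sv, beta)
    if asum_lo - bsum_hi >= -b:                   # every query in node is provably positive
        out.extend((tuple(q), 1) for q in iter_points(node))
        return
    if asum_hi - bsum_lo < -b:                    # every query in node is provably negative
        out.extend((tuple(q), 0) for q in iter_points(node))
        return
    if node.child1 is None:                       # leaf: evaluate each query exactly
        out.extend((tuple(q), svm_predict(q, pos_sv, alpha, neg_sv, beta, b, gamma))
                   for q in node.points)
        return
    svp1_classify(node.child1, pos_sv, alpha, neg_sv, beta, b, gamma, out)
    svp1_classify(node.child2, pos_sv, alpha, neg_sv, beta, b, gamma, out)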
59 3 Experimental Results. Table 1 is a summary of the datasets in the empirical analysis. [sent-145, score-0.112]
60 Life Sciences: These were proprietary datasets (ds1 and ds2) similar to the publicly available Open Compound Database provided by the National Cancer Institute (NCI Open Compound Database, 2000). [sent-146, score-0.112]
61 We also present results on datasets derived from ds1, denoted ds1. [sent-148, score-0.112]
62 The goal is to predict whether J Lee (the most common name) was a collaborator for each work based on who else is listed for that work. [sent-153, score-0.073]
63 100pca to represent the linear projection of the data to 100 dimensions using PCA. [sent-155, score-0.065]
64 The second link detection dataset is derived from the Internet Movie Database (IMDB, 2002) and is denoted imdb; it uses a similar approach, but the goal is to predict the participation of Mel Blanc (again the most common participant). [sent-156, score-0.052]
65 UCI/KDD data: We use three large datasets from the KDD/UCI repository [2]. [sent-157, score-0.112]
66 Each categorical input attribute was converted into n binary attributes by a 1-of-n encoding (where n is the attribute’s arity). [sent-160, score-0.113]
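As a small worked example of this encoding (hypothetical helper, not tied to any dataset above):

def one_of_n(values):
    # Map each distinct category to its own binary attribute (n = the attribute's arity).
    cats = sorted(set(values))
    pos = {c: i for i, c in enumerate(cats)}
    return [[1 if pos[v] == i else 0 for i in range(len(cats))] for v in values]

# one_of_n(["red", "green", "red"]) -> [[0, 1], [1, 0], [0, 1]]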
67 We performed binary classification using the letter A as the positive class and “Not A” as negative. [sent-166, score-0.154]
68 This dataset comes from the shot boundary task of the TREC-2001 Video Track organized by NIST. [sent-169, score-0.048]
69 It consists of 4 hours of video, or 13 MPEG-1 video files, totaling slightly over 2 GB of data. [sent-170, score-0.104]
70 A datapoint is classified as positive iff the majority of its k nearest neighbors are positive. [sent-195, score-0.351]
71 Thus, each experiment required R k-NN classification queries (where R is the number of records in the dataset) and each query involved the k-NN among 0. [sent-197, score-0.306]
72 A naive implementation with no ball-trees would thus require 0. [sent-199, score-0.082]
73 Table 2 shows the computational cost of naive k-NN, both in terms of the number of distance computations and the wall-clock time on an unloaded 2 GHz Pentium. [sent-203, score-0.163]
74 We then examine the speedups of KNS1 (traditional use of Ball-trees) and our two new Ball-tree methods (KNS2 and KNS3). [sent-204, score-0.096]
75 It is notable that for some high dimensional datasets, KNS1 does not produce an acceleration over the naive method. [sent-205, score-0.063]
76 The first thing to notice is that conventional ball-trees (KNS1) were slightly worse than the naive O(R^2) algorithm. [sent-208, score-0.161]
77 In only one case was KNS2 inferior to the naive method, and KNS3 was always superior. [sent-209, score-0.082]
78 On some datasets KNS2 and KNS3 gave dramatic speedups. [sent-210, score-0.112]
79 Table 2: Number of distance computations and wall-clock time for naive k-NN classification (2nd column). [sent-215, score-0.081]
80 Table 3: Comparison between SVMlight and SVP1. [sent-344, score-0.033]
81 We show the total number of distance computations made during the prediction phase for each method, and total wall-clock time. [sent-345, score-0.116]
82 [Table 3, dataset rows: ...100pca, Blanc Mel, Letter, Ipums, Movie, Kddcup99(10%), with per-dataset SVMlight distance computations.] [sent-350, score-0.103]
83 [Table 3, remaining columns: SVMlight seconds 394, 60, 259, 2775, 31, 61, 21, 494, 371, 69; SVP1 seconds 171, 23, 92, 762, 7, 26, 11, 1, 136, 1; per-dataset speedup.] [sent-368, score-0.168]
84 4 Comments and related work. Applicability of other proximity query work. [sent-376, score-0.12]
85 For example, [7] gives a hashing method that was demonstrated to provide speedups over a ball-tree-based approach in 64 dimensions by a factor of 2-5, depending on how much error in the approximate answer was permitted. [sent-378, score-0.248]
86 Another approximate k-NN idea is in [1], one of the first k-NN approaches to use a priority queue of nodes, in this case achieving a 3-fold speedup with an approximation to the true k-NN. [sent-379, score-0.135]
87 However, these approaches are based on the notion that any points falling within a factor of (1 + ε) times the true nearest neighbor distance are acceptable substitutes for the true nearest neighbor. [sent-380, score-0.618]
88 Noting in particular that distances in high-dimensional spaces tend to occupy a decreasing range of continuous values [10], it remains an open question whether schemes based upon the absolute values of the distances rather than their ranks are relevant to the classification task. [sent-381, score-0.175]
89 Our approach, because it need not find the k-NN to answer the relevant statistical question, finds an answer without approximation. [sent-382, score-0.078]
90 An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. [sent-390, score-0.281]
91 M-tree: An efficient access method for similarity search in metric spaces. [sent-408, score-0.053]
92 The TREC-2001 video track organized by NIST: shot boundary task, 2001. [sent-452, score-0.136]
wordName wordTfidf (topN-words)
[('node', 0.304), ('dists', 0.29), ('ball', 0.218), ('nout', 0.218), ('psin', 0.194), ('nearest', 0.189), ('asum', 0.169), ('psout', 0.169), ('ballknn', 0.145), ('negcount', 0.145), ('nin', 0.145), ('posvecs', 0.145), ('speedup', 0.135), ('tree', 0.131), ('queries', 0.128), ('exit', 0.126), ('dsofar', 0.121), ('neighbors', 0.118), ('datasets', 0.112), ('pointset', 0.105), ('points', 0.102), ('asumhi', 0.097), ('asumlo', 0.097), ('blanc', 0.097), ('bsum', 0.097), ('dnode', 0.097), ('ipums', 0.097), ('records', 0.096), ('speedups', 0.096), ('movie', 0.096), ('neighbor', 0.092), ('classi', 0.082), ('query', 0.082), ('naive', 0.082), ('conventional', 0.079), ('trees', 0.079), ('leaf', 0.077), ('ps', 0.076), ('else', 0.073), ('cout', 0.073), ('ntemp', 0.073), ('minp', 0.071), ('distances', 0.07), ('svm', 0.069), ('child', 0.068), ('letter', 0.068), ('dimensions', 0.065), ('datapoints', 0.064), ('acceleration', 0.063), ('accelerations', 0.063), ('furthest', 0.063), ('mel', 0.063), ('negative', 0.062), ('balls', 0.058), ('nonparametric', 0.056), ('recursive', 0.055), ('root', 0.055), ('search', 0.053), ('video', 0.052), ('dataset', 0.052), ('array', 0.05), ('bsumhi', 0.048), ('bsumlo', 0.048), ('distsi', 0.048), ('distsj', 0.048), ('distsk', 0.048), ('hashing', 0.048), ('leeway', 0.048), ('negvecs', 0.048), ('pstemp', 0.048), ('shot', 0.048), ('twelfth', 0.048), ('vldb', 0.048), ('distance', 0.046), ('positive', 0.044), ('compound', 0.042), ('citeseer', 0.042), ('binary', 0.042), ('answer', 0.039), ('maxx', 0.038), ('proximity', 0.038), ('attributes', 0.037), ('database', 0.036), ('radial', 0.036), ('terminate', 0.036), ('nist', 0.036), ('question', 0.035), ('prediction', 0.035), ('summarize', 0.035), ('computations', 0.035), ('traditional', 0.035), ('acm', 0.035), ('cation', 0.034), ('survive', 0.034), ('attribute', 0.034), ('carnegie', 0.034), ('mellon', 0.034), ('pittsburgh', 0.034), ('light', 0.033), ('ci', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
Author: Ting liu, Andrew W. Moore, Alexander Gray
Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers.
2 0.14366706 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun
Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems. 1
3 0.10585114 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
Author: Clayton Scott, Robert Nowak
Abstract: This paper reports on a family of computationally practical classifiers that converge to the Bayes error at near-minimax optimal rates for a variety of distributions. The classifiers are based on dyadic classification trees (DCTs), which involve adaptively pruned partitions of the feature space. A key aspect of DCTs is their spatial adaptivity, which enables local (rather than global) fitting of the decision boundary. Our risk analysis involves a spatial decomposition of the usual concentration inequalities, leading to a spatially adaptive, data-dependent pruning criterion. For any distribution on (X, Y ) whose Bayes decision boundary behaves locally like a Lipschitz smooth function, we show that the DCT error converges to the Bayes error at a rate within a logarithmic factor of the minimax optimal rate. We also study DCTs equipped with polynomial classification rules at each leaf, and show that as the smoothness of the boundary increases their errors converge to the Bayes error at a rate approaching n−1/2 , the parametric rate. We are not aware of any other practical classifiers that provide similar rate of convergence guarantees. Fast algorithms for tree pruning are discussed. 1
4 0.087768383 172 nips-2003-Semi-Supervised Learning with Trees
Author: Charles Kemp, Thomas L. Griffiths, Sean Stromsten, Joshua B. Tenenbaum
Abstract: We describe a nonparametric Bayesian approach to generalizing from few labeled examples, guided by a larger set of unlabeled objects and the assumption of a latent tree-structure to the domain. The tree (or a distribution over trees) may be inferred using the unlabeled data. A prior over concepts generated by a mutation process on the inferred tree(s) allows efficient computation of the optimal Bayesian classification function from the labeled examples. We test our approach on eight real-world datasets. 1
5 0.080142118 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
Author: Jianxin Wu, James M. Rehg, Matthew D. Mullin
Abstract: Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as on-line learning. 1
6 0.078919046 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
7 0.076653741 42 nips-2003-Bounded Finite State Controllers
8 0.074404418 99 nips-2003-Kernels for Structured Natural Language Data
9 0.072139621 189 nips-2003-Tree-structured Approximations by Expectation Propagation
10 0.068973266 111 nips-2003-Learning the k in k-means
11 0.068785064 164 nips-2003-Ranking on Data Manifolds
12 0.063379169 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
13 0.05923238 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
14 0.058641493 1 nips-2003-1-norm Support Vector Machines
15 0.05813428 113 nips-2003-Learning with Local and Global Consistency
16 0.057214584 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
17 0.056497406 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
18 0.052753817 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
19 0.051884707 108 nips-2003-Learning a Distance Metric from Relative Comparisons
20 0.048034925 117 nips-2003-Linear Response for Approximate Inference
topicId topicWeight
[(0, -0.181), (1, -0.056), (2, -0.051), (3, 0.018), (4, -0.04), (5, -0.083), (6, -0.092), (7, 0.025), (8, 0.092), (9, 0.105), (10, 0.101), (11, -0.004), (12, 0.02), (13, -0.018), (14, 0.104), (15, -0.04), (16, -0.058), (17, 0.083), (18, 0.174), (19, 0.018), (20, 0.054), (21, -0.073), (22, -0.064), (23, 0.021), (24, 0.103), (25, 0.085), (26, -0.028), (27, 0.131), (28, 0.098), (29, -0.007), (30, -0.057), (31, 0.072), (32, 0.101), (33, 0.108), (34, -0.091), (35, -0.028), (36, -0.026), (37, 0.063), (38, -0.038), (39, 0.061), (40, 0.011), (41, 0.09), (42, 0.046), (43, -0.072), (44, 0.215), (45, -0.013), (46, -0.041), (47, -0.097), (48, -0.067), (49, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.96031886 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
Author: Ting liu, Andrew W. Moore, Alexander Gray
Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers.
2 0.74208432 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
Author: Clayton Scott, Robert Nowak
Abstract: This paper reports on a family of computationally practical classifiers that converge to the Bayes error at near-minimax optimal rates for a variety of distributions. The classifiers are based on dyadic classification trees (DCTs), which involve adaptively pruned partitions of the feature space. A key aspect of DCTs is their spatial adaptivity, which enables local (rather than global) fitting of the decision boundary. Our risk analysis involves a spatial decomposition of the usual concentration inequalities, leading to a spatially adaptive, data-dependent pruning criterion. For any distribution on (X, Y ) whose Bayes decision boundary behaves locally like a Lipschitz smooth function, we show that the DCT error converges to the Bayes error at a rate within a logarithmic factor of the minimax optimal rate. We also study DCTs equipped with polynomial classification rules at each leaf, and show that as the smoothness of the boundary increases their errors converge to the Bayes error at a rate approaching n−1/2 , the parametric rate. We are not aware of any other practical classifiers that provide similar rate of convergence guarantees. Fast algorithms for tree pruning are discussed. 1
3 0.63979506 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
Author: David Kauchak, Sanjoy Dasgupta
Abstract: We describe a procedure which finds a hierarchical clustering by hillclimbing. The cost function we use is a hierarchical extension of the k-means cost; our local moves are tree restructurings and node reorderings. We show these can be accomplished efficiently, by exploiting special properties of squared Euclidean distances and by using techniques from scheduling algorithms. 1
4 0.63949269 172 nips-2003-Semi-Supervised Learning with Trees
Author: Charles Kemp, Thomas L. Griffiths, Sean Stromsten, Joshua B. Tenenbaum
Abstract: We describe a nonparametric Bayesian approach to generalizing from few labeled examples, guided by a larger set of unlabeled objects and the assumption of a latent tree-structure to the domain. The tree (or a distribution over trees) may be inferred using the unlabeled data. A prior over concepts generated by a mutation process on the inferred tree(s) allows efficient computation of the optimal Bayesian classification function from the labeled examples. We test our approach on eight real-world datasets. 1
5 0.51998383 99 nips-2003-Kernels for Structured Natural Language Data
Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda
Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1
6 0.51818234 187 nips-2003-Training a Quantum Neural Network
7 0.51480871 3 nips-2003-AUC Optimization vs. Error Rate Minimization
8 0.51424575 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
9 0.50616562 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
10 0.44767088 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
11 0.42371583 181 nips-2003-Statistical Debugging of Sampled Programs
12 0.40500218 113 nips-2003-Learning with Local and Global Consistency
13 0.39748016 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
14 0.37681985 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian
15 0.37282372 44 nips-2003-Can We Learn to Beat the Best Stock
16 0.36032894 189 nips-2003-Tree-structured Approximations by Expectation Propagation
17 0.34757429 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process
18 0.32945511 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality
19 0.32775897 42 nips-2003-Bounded Finite State Controllers
20 0.32553229 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
topicId topicWeight
[(0, 0.057), (35, 0.039), (53, 0.071), (71, 0.038), (76, 0.031), (85, 0.623), (91, 0.049)]
simIndex simValue paperId paperTitle
1 0.98119271 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
Author: Felix A. Wichmann, Arnulf B. Graf
Abstract: We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and shape representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away. 1
2 0.9727574 193 nips-2003-Variational Linear Response
Author: Manfred Opper, Ole Winther
Abstract: A general linear response method for deriving improved estimates of correlations in the variational Bayes framework is presented. Three applications are given and it is discussed how to use linear response as a general principle for improving mean field approximations.
same-paper 3 0.9716695 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
Author: Ting liu, Andrew W. Moore, Alexander Gray
Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers.
4 0.92946571 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality
Author: Maxim Likhachev, Geoffrey J. Gordon, Sebastian Thrun
Abstract: In real world planning problems, time for deliberation is often limited. Anytime planners are well suited for these problems: they find a feasible solution quickly and then continually work on improving it until time runs out. In this paper we propose an anytime heuristic search, ARA*, which tunes its performance bound based on available search time. It starts by finding a suboptimal solution quickly using a loose bound, then tightens the bound progressively as time allows. Given enough time it finds a provably optimal solution. While improving its bound, ARA* reuses previous search efforts and, as a result, is significantly more efficient than other anytime search methods. In addition to our theoretical analysis, we demonstrate the practical utility of ARA* with experiments on a simulated robot kinematic arm and a dynamic path planning problem for an outdoor rover. 1
5 0.87754607 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
Author: Kazuyuki Samejima, Kenji Doya, Yasumasa Ueda, Minoru Kimura
Abstract: When we model a higher order functions, such as learning and memory, we face a difficulty of comparing neural activities with hidden variables that depend on the history of sensory and motor signals and the dynamics of the network. Here, we propose novel method for estimating hidden variables of a learning agent, such as connection weights from sequences of observable variables. Bayesian estimation is a method to estimate the posterior probability of hidden variables from observable data sequence using a dynamic model of hidden and observable variables. In this paper, we apply particle filter for estimating internal parameters and metaparameters of a reinforcement learning model. We verified the effectiveness of the method using both artificial data and real animal behavioral data. 1
6 0.82954544 124 nips-2003-Max-Margin Markov Networks
7 0.77666217 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
8 0.72917855 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
9 0.69600105 3 nips-2003-AUC Optimization vs. Error Rate Minimization
10 0.689596 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
11 0.65594339 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
12 0.63347483 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors
13 0.62053585 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
14 0.60569847 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
15 0.59907746 148 nips-2003-Online Passive-Aggressive Algorithms
16 0.59496927 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
17 0.59296507 41 nips-2003-Boosting versus Covering
18 0.58995914 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales
19 0.58828324 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
20 0.58079886 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates