jmlr jmlr2006 jmlr2006-63 knowledge-graph by maker-knowledge-mining

63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification


Source: pdf

Author: Ting Liu, Andrew W. Moore, Alexander Gray

Abstract: This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include data sets with up to 10^6 dimensions and 10^5 records, and demonstrate non-trivial speed-ups while giving exact answers. Keywords: ball-tree, k-NN classification

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Editor: Claire Cardie. Abstract: This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. [sent-8, score-0.4]

2 We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. [sent-9, score-0.284]

3 Given a data set V ⊂ R^D containing n points, it finds the k closest points to a query point q ∈ R^D, typically under the Euclidean distance, and chooses the label corresponding to the majority. [sent-19, score-0.198]
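
For concreteness, here is a minimal brute-force version of the classifier just described. This is only an illustrative sketch in Python; the array layout and the function name are our own choices, not the paper's code.

    import numpy as np
    from collections import Counter

    def knn_classify(V, labels, q, k):
        """V: (n, D) array of points; labels: length-n sequence; q: (D,) query point."""
        dists = np.linalg.norm(V - q, axis=1)      # Euclidean distance from q to every point
        nearest = np.argsort(dists)[:k]            # indices of the k closest points
        return Counter(labels[i] for i in nearest).most_common(1)[0][0]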

4 Our method achieves the exact classification that would be achieved by exhaustive search for the nearest neighbors. [sent-47, score-0.295]

5 Djouadi and Bouktache (1997) described both approximate and exact methods; however, a speedup of only about a factor of two over exhaustive search was reported for the exact case, on simulated, low-dimensional data. [sent-49, score-0.205]

6 Lee and Chae (1998) also achieves exact classifications, but only obtained a speedup over exhaustive search of about 1. [sent-50, score-0.205]

7 KNS2 and KNS3 share the same insight that the task of k-nearest-neighbor classification of a query q need not require us to explicitly find those k nearest neighbors. [sent-62, score-0.308]

8 To be more specific, there are three similar but in fact different questions: (a) “What are the k nearest neighbors of q? [sent-63, score-0.253]

9 ” (b) “How many of the k nearest neighbors of q are from the positive class? [sent-64, score-0.367]

10 ” and (c) “Are at least t of the k nearest neighbors from the positive class? [sent-65, score-0.367]

11 The triangle inequality underlying a ball-tree has the advantage of bounding the distances between data points, and can thus help us estimate the nearest neighbors without explicitly finding them. [sent-68, score-0.474]

12 We observe up to 100-fold speedup compared against highly optimized traditional ball-tree-based k-NN , in which the neighbors are found explicitly. [sent-72, score-0.277]

13 A ball-tree (..., 1997; Moore, 2000) is a binary tree where each node represents a set of points, called Points(Node). [sent-81, score-0.188]

14 Given a data set, the root node of a ball-tree represents the full set of points in the data set. [sent-82, score-0.304]

15 A node can be either a leaf node or a non-leaf node. [sent-83, score-0.312]

16 A leaf node explicitly contains a list of the points represented by the node. [sent-84, score-0.244]

17 Each node has a distinguished point called a Pivot. [sent-92, score-0.156]

18 Each node records the maximum distance of the points it owns to its pivot. [sent-94, score-0.378]
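
The node structure described in the last few sentences can be sketched as follows. The centroid pivot and the widest-dimension split below are common conventions and only assumptions here, since the summary does not specify how the tree is built.

    import numpy as np

    class BallTreeNode:
        def __init__(self, points, leaf_size=20):
            self.pivot = points.mean(axis=0)       # one common choice of distinguished point
            self.radius = np.max(np.linalg.norm(points - self.pivot, axis=1))
            if len(points) <= leaf_size:
                self.points, self.child1, self.child2 = points, None, None   # leaf node
            else:
                # non-leaf: split on the dimension of largest spread (a simple heuristic)
                d = np.argmax(points.max(axis=0) - points.min(axis=0))
                order = np.argsort(points[:, d])
                mid = len(points) // 2
                self.points = None
                self.child1 = BallTreeNode(points[order[:mid]], leaf_size)
                self.child2 = BallTreeNode(points[order[mid:]], leaf_size)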

19 Provided that our distance function satisfies the triangle inequality, we can bound the distance from a query point q to any point in any ball-tree node. [sent-108, score-0.201]

20 Node.Radius is the maximum distance of the points it owns to its pivot: |x − Node.Pivot| ≤ Node.Radius for every x ∈ Points(Node). [sent-118, score-0.148]
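
A hedged sketch of the resulting bounds, assuming a node object with .pivot and .radius as in the sketch above: for any point x owned by Node, |q − Node.Pivot| − Node.Radius ≤ |x − q| ≤ |q − Node.Pivot| + Node.Radius.

    import numpy as np

    def node_distance_bounds(q, node):
        """Lower/upper bound on |x - q| for any x owned by node (needs .pivot and .radius)."""
        d_pivot = np.linalg.norm(q - node.pivot)
        lo = max(d_pivot - node.radius, 0.0)   # no owned point can be closer than this
        hi = d_pivot + node.radius             # and none can be farther than this
        return lo, hi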

21 We define Dsofar by Equation (4): Dsofar = ∞ if |PSin| < k, and Dsofar = max_{x ∈ PSin} |x − q| if |PSin| == k. Dsofar is the minimum distance within which points become interesting to us. [sent-141, score-0.148]

22 This is computed using the minp bound given by Equation (1) and the fact that all points covered by a node must be covered by its parent. [sent-148, score-0.422]

23 This property implies that D^Node_minp will never be less than the minimum possible distance of its ancestors. [sent-149, score-0.238]

24 Experimental results show that KNS1 (conventional ball-tree-based k-NN search) achieves significant speedup over Naive k-NN when the dimension d of the data set is moderate (less than 30). [sent-152, score-0.193]

25 In the following sections we describe our new algorithms, KNS2 and KNS3. Both are based on the ball-tree structure, but by using different search strategies we explore how much speedup can be achieved beyond KNS1. [sent-156, score-0.205]

26 Various research efforts have focused on designing clever search procedures. Procedure BallKNN(PSin, Node): begin if (D^Node_minp ≥ Dsofar) then return PSin unchanged. [sent-166, score-0.178]

27 /* If this condition is satisfied, then it is impossible for a point in Node to be closer than the previously discovered kth nearest neighbor. [sent-167, score-0.28]

28 */ PStemp = BallKNN(PSin, node1); PSout = BallKNN(PStemp, node2); end. Figure 2: A call of BallKNN({}, Root) returns the k nearest neighbors of q in the ball-tree. [sent-170, score-0.367]
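
The recursion of Figure 2 can be sketched as below, reusing the BallTreeNode and node_distance_bounds sketches given earlier. The closer-child-first ordering is a standard heuristic and not necessarily the paper's exact procedure; a call of ball_knn([], root, q, k) returns the best k (distance, point) pairs.

    import numpy as np

    def ball_knn(ps, node, q, k):
        """ps: sorted list of (distance, point) pairs -- the best candidates found so far."""
        d_sofar = ps[-1][0] if len(ps) == k else float("inf")   # Equation (4)
        lo, _ = node_distance_bounds(q, node)
        if lo >= d_sofar:
            return ps                          # prune: Node cannot contain anything closer
        if node.points is not None:            # leaf: examine each owned point explicitly
            for x in node.points:
                d = np.linalg.norm(x - q)
                if len(ps) < k or d < d_sofar:
                    ps = sorted(ps + [(d, x)], key=lambda pair: pair[0])[:k]
                    d_sofar = ps[-1][0] if len(ps) == k else float("inf")
            return ps
        c1, c2 = node.child1, node.child2      # visit the child whose pivot is closer first
        if np.linalg.norm(q - c2.pivot) < np.linalg.norm(q - c1.pivot):
            c1, c2 = c2, c1
        ps = ball_knn(ps, c1, q, k)
        return ball_knn(ps, c2, q, k)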

29 KNS2 answers the type (b) question described in the introduction, namely, “How many of the k nearest neighbors are in the positive class? [sent-173, score-0.422]

30 KNS2 attacks the problem by building two ball-trees: A Postree for the points from the positive (small) class, and a Negtree for the points from the negative (large) class. [sent-175, score-0.222]

31 Since the number of points from the positive (small) class is so small, it is quite cheap to find the exact k nearest positive points of q by using KNS1. [sent-176, score-0.429]

32 The idea of KNS2 is to first search Postree using KNS1 to find the set Possetk of the k nearest positive neighbors, and then to search Negtree, using Possetk as bounds to prune nodes that are far away while estimating the number of negative points to be inserted into the true nearest-neighbor set. [sent-177, score-1.0]

33 The search can be stopped as soon as we get the answer to question (b). [sent-178, score-0.147]

34 Then we classify a new query point q in the following fashion. • Step 1 — “Find positive”: find the k nearest positive-class neighbors of q (and their distances to q) using conventional ball-tree search. [sent-181, score-0.525]

35 • Step 2 — “Insert negative”: do sufficient search on the negative tree to prove that the number of positive data points among the k nearest neighbors is n for some value of n. [sent-182, score-0.605]
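
The two-step idea can be illustrated with a deliberately simplified sketch that brute-forces the negative side; KNS2 itself prunes the negative tree using the bounds described below, and the function name here is our own.

    import numpy as np

    def count_positives_among_knn(pos_pts, neg_pts, q, k):
        """How many of q's k nearest neighbors are positive (brute force on both classes)."""
        pos_d = np.sort(np.linalg.norm(pos_pts - q, axis=1))[:k]   # Step 1: "find positive"
        neg_d = np.sort(np.linalg.norm(neg_pts - q, axis=1))
        kth = np.sort(np.concatenate([pos_d, neg_d]))[k - 1]       # distance of the k-th NN overall
        # Step 2 (brute-forced here): count positives no farther than the k-th neighbor.
        # Ties at the k-th distance are counted generously in this sketch.
        return int(np.sum(pos_d <= kth))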

36 Distsk consists of the distances from q to the k nearest positive neighbors found so far, sorted in increasing order of distance. [sent-189, score-0.418]

37 Define pointset V as the set of points in the negative balls visited so far in the search. [sent-192, score-0.254]

38 We say (n, C) summarizes the interesting negative points for pointset V if and only if the following conditions hold. [sent-198, score-0.193]

39 For i = 1, . . . , n, Ci = |V ∩ {x : |x − q| < Distsi}| (6). Intuitively, Ci is the number of points in V whose distance to q is less than Distsi. [sent-202, score-0.166]

40 In other words, Ci is the number of negative points in V closer than the ith positive neighbor to q. [sent-203, score-0.28]

41 This simply declares that the length n of the C array is as short as possible while accounting for the k members of V that are nearest to q. [sent-206, score-0.292]
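
A sketch of the counts array C of Equation (6), ignoring the truncation of n for simplicity; the function name and array layout are assumptions of ours.

    import numpy as np

    def counts_array(V, q, dists):
        """C_i = number of visited negative points closer to q than Dists_i.

        V: (m, D) array of visited negative points; dists: sorted positive distances (length k).
        (KNS2 additionally keeps the length n of this array as short as possible.)"""
        d_neg = np.linalg.norm(V - q, axis=1)
        return [int(np.sum(d_neg < dists[i])) for i in range(len(dists))]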

42 To make the problem interesting, we assume that the number of negative points and the number of positive points are both greater than k. [sent-208, score-0.222]

43 • D^Node_minp and D^Node_maxp: here we will continue to use D^Node_minp, which is defined in equation (4). [sent-209, score-0.299]

44 Symmetrically, we also define D^Node_maxp as follows: let D^Node_maxp = min(|q − Node.Pivot| + Node.Radius, D^Parent_maxp), where Parent denotes Node's parent. [sent-210, score-0.42]

45 This is computed using the bound in Equation (1) and the property of a ball-tree that all the points covered by a node must be covered by its parent. [sent-216, score-0.244]

46 This property implies that D^Node_maxp will never be greater than the maximum possible distance of its ancestors. [sent-217, score-0.181]
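
These two parent-tightened bounds can be sketched as follows, reusing the node_distance_bounds sketch given earlier; the function names are ours.

    def d_minp(q, node, parent_minp=0.0):
        """Lower bound on the distance from q to any point owned by node, tightened by its parent."""
        lo, _ = node_distance_bounds(q, node)
        return max(lo, parent_minp)            # never less than the parent's lower bound

    def d_maxp(q, node, parent_maxp=float("inf")):
        """Upper bound on the distance from q to any point owned by node, tightened by its parent."""
        _, hi = node_distance_bounds(q, node)
        return min(hi, parent_maxp)            # never greater than the parent's upper bound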

47 D^Node_minp and D^Node_maxp are used to estimate the counts array (n, C). [sent-226, score-0.16]

48 Again we take advantage of the triangle inequality of the ball-tree. [sent-227, score-0.325]

49 The function of D^Node_minp, as in KNS1, is to help prune uninteresting nodes. [sent-233, score-0.221]

50 Assume that on entry (nin, Cin) summarizes the interesting negative points for pointset V, where V is the set of points visited so far during the search. [sent-235, score-0.281]

51 We can stop the procedure when nout becomes 1 (which means all the k nearest neighbors of q are in the negative class) or when we run out of nodes. [sent-239, score-0.555]

52 nout represents the number of positive points in the k nearest neighbors of q. [sent-240, score-0.597]

53 First of all, when n = 1, we can stop and exit, since this means we have found at least k negative points closer than the nearest positive neighbor to q. [sent-244, score-0.533]

54 */ Set Dists_nout := ∞. (1) if (nout == 1) then Return(1, Cout) /* At least k negative points closer to q than the closest positive one: done! [sent-248, score-0.186]

55 The search is the same as conventional ball-tree search (KNS1), except that it uses the kth candidate negative point to bound the distance. [sent-264, score-0.182]

56 KNSV counts how many of the k nearest neighbors so far are from the negative class. [sent-266, score-0.413]

57 Also if the first guess of KNSV is correct and the k candidate points are good enough to prune away many nodes, it will be faster than conventional ball-tree search. [sent-271, score-0.183]

58 Second, using a greedy search to find the k candidate nearest neighbors has a high risk, since these candidates might not even be close to the true nearest neighbors. [sent-275, score-0.662]

59 Finally, we want to point out that KNSV claims it can perform well for many-class nearest neighbors, but this is based on the assumption that the winner class contains at least k/2 points within the nearest neighbors, which is often not true for the many-class case. [sent-278, score-0.594]

60 Compared to KNSV, KNS2's advantages are (i) it uses the skewness property of a data set, which can be robustly detected before the search, and (ii) its more careful design gives KNS2 more chance to speed up the search. [sent-279, score-0.229]

61 Instead, we answer a weaker question: “are at least t of the k nearest neighbors positive? [sent-283, score-0.446]

62 In KNS3, we define two important quantities: D^pos_t = distance of the t-th nearest positive neighbor of q (8), and D^neg_m = distance of the m-th nearest negative neighbor of q (9), where m + t = k + 1. [sent-286, score-0.91]

63 Proposition 1: D^pos_t ≤ D^neg_m if and only if at least t of the k nearest neighbors of q are from the positive class. [sent-288, score-0.367]

64 Proof: If D^pos_t ≤ D^neg_m, then there are at least t positive points closer to q than the m-th negative point. [sent-289, score-0.161]

65 This also implies that if we draw a ball centered at q with radius equal to D^neg_m, then there are exactly m negative points and at least t positive points within the ball. [sent-290, score-0.284]

66 Since t + m = k + 1, if we use Dk to denote the distance of the kth nearest neighbor, we get Dk ≤ D^neg_m, which means that there are at most m − 1 of the k nearest neighbors of q from the negative class. [sent-291, score-0.726]

67 Equivalently, at least t of the k nearest neighbors of q are from the positive class. [sent-292, score-0.367]

68 On the other hand, if at least t of the k nearest neighbors are from the positive class, then D^pos_t ≤ Dk; the number of negative points among them is at most k − t < m, so Dk ≤ D^neg_m. [sent-293, score-0.501]
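
A quick brute-force numerical check of Proposition 1 on random data; purely illustrative, and every concrete number below is our own choice.

    import numpy as np

    rng = np.random.default_rng(0)
    pos = rng.normal(size=(50, 5))             # positive-class points
    neg = rng.normal(size=(80, 5))             # negative-class points
    q = rng.normal(size=5)                     # query point
    k, t = 7, 4
    m = k + 1 - t                              # m + t = k + 1, as in the proposition
    dp = np.sort(np.linalg.norm(pos - q, axis=1))
    dn = np.sort(np.linalg.norm(neg - q, axis=1))
    lhs = dp[t - 1] <= dn[m - 1]               # D^pos_t <= D^neg_m ?
    is_pos = np.argsort(np.concatenate([dp, dn]))[:k] < len(dp)
    rhs = is_pos.sum() >= t                    # at least t positives among the k NN ?
    assert bool(lhs) == bool(rhs)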

69 (Figure 5: an example of D^pos_t and D^neg_m.) The reason to redefine the problem in KNS3 is to transform a k-nearest-neighbor search problem into a much simpler counting problem. [sent-298, score-0.462]

70 Below is the detailed description: At each stage of KNS3 we have two sets of balls in use called P and N, where P is a set of balls from Postree built from positive data points, and N consists of balls from Negtree built from negative data points. [sent-304, score-0.289]

71 This is possible because when a ball is split, we only require the pointsets of its children to be disjoint; the balls covering the child nodes may overlap. [sent-313, score-0.392]

72 To illustrate this, it is useful to depict a ball as an interval, where the two ends of the interval denote the minimum and maximum possible distances of a point owned by the ball to the query. [sent-317, score-0.175]

73 Notice that we also mark “+5” above the interval to denote the number of points owned by the ball B. [sent-319, score-0.15]

74 The value of Lo(D^pos_t) can be understood as the answer to the following question: if we slid all the positive points within their bounds as far to the left as possible, where would the t-th closest positive point lie? [sent-323, score-0.192]
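
A sketch of this slide-left/slide-right estimate, representing each ball of one class by a (lo, hi, count) triple; the function name mirrors Equation (10) below, but the triple representation is an assumption of ours.

    def estimate_bound(balls, i):
        """(Lo, Up) for the i-th smallest distance to q of a point owned by one class's balls.

        balls: list of (lo, hi, count) triples; assumes i <= total point count."""
        lows = sorted(lo for lo, hi, c in balls for _ in range(c))     # slide every point left
        highs = sorted(hi for lo, hi, c in balls for _ in range(c))    # slide every point right
        return lows[i - 1], highs[i - 1]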

75 Answer = PREDICT(P, N, t, m). The Answer, a boolean value, is TRUE if at least t of the k nearest neighbors are from the positive class, and FALSE otherwise. [sent-333, score-0.367]

76 Define: (Lo(D^S_i), Up(D^S_i)) = Estimate_bound(S, i) (10). Here S is either the set P or N, and we are interested in the i-th nearest neighbor of q from set S. [sent-339, score-0.372]

77 In this case, we will need to split a ball-tree node and re-estimate the bounds. [sent-343, score-0.156]

78 The function Pick(P, N) below is to choose one node, from either P or N, to split. [sent-345, score-0.156]

79 There are different strategies for picking a node; for simplicity, our implementation simply picks a node at random to split. [sent-346, score-0.156]

80 Define: split_node = Pick(P, N) (11). Here split_node is the node chosen to be split. [sent-347, score-0.468]

81 */ split_node = Pick(P, N); remove split_node from P or N; insert split_node's children. [sent-350, score-0.312]

82 Knowing exactly how many of the nearest neighbors are from the positive class can be especially useful when the threshold for deciding a class is not known. [sent-363, score-0.367]

83 We randomly took 1% of the negative records (881) and 50% of the positive records (105) as test data (986 points in total), and trained on the remaining 87372 data points. [sent-434, score-0.254]

84 For KNS3, we used t = ⌈k/2⌉: a data point is classified as positive iff the majority of its k nearest neighbors are positive. [sent-463, score-0.397]
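
As a tiny worked example of this choice of t (and the matching m from Proposition 1), with illustrative numbers of our own:

    import math

    k = 9
    t = math.ceil(k / 2)    # 5: classify as positive iff at least 5 of the 9 NN are positive
    m = k + 1 - t           # 5: the matching m used for D^neg_m in Proposition 1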

85 Since we use cross-validation, each experiment required R k-NN classification queries (where R is the number of records in the data set), and each query involved the k-NN among 0. [sent-464, score-0.26]

86 For Naive k-NN , all the queries take 87372 distance computations. [sent-472, score-0.161]

87 0 × 10^4 distance computations, while a few points take a longer time. [sent-477, score-0.148]

88 The figures show the distribution of the speedup obtained for each query. [sent-482, score-0.163]

89 Generally speaking, the speedup achieved in distance computations for all three algorithms is greater than the corresponding speedup in wall-clock time. [sent-492, score-0.386]

90 We can see that for the synthetic data sets, KNS1 and KNS2 yield 2- to 700-fold speedups over the naive method. [sent-494, score-0.232]

91 Notice that KNS2 can't beat KNS1 for these data sets, because KNS2 is designed to speed up k-NN search on data sets with unbalanced output classes. [sent-496, score-0.265]

92 (1999) gives a hashing method that was demonstrated to provide speedups over a ball-tree-based approach in 64 dimensions by a factor of 2-5 depending on how much error in the approximate answer was permitted. [sent-704, score-0.152]

93 (1998), one of the first k-NN approaches to use a priority queue of nodes, in this case achieving a 3-fold speedup with an approximation to the true k-NN . [sent-706, score-0.196]

94 , 2004a), we introduced a variant of ball-tree structures which allows non-backtracking search to speed up approximate nearest-neighbor search, and we observed up to 700-fold accelerations over conventional ball-tree-based k-NN. [sent-708, score-0.383]

95 However, these approaches are based on the notion that any points falling within a factor of (1 + ε) times the true nearest neighbor distance are acceptable substitutes for the true nearest neighbor. [sent-710, score-0.773]

96 Our approach, because it need not find the k-NN to answer the relevant statistical question, finds an answer without approximation. [sent-712, score-0.158]

97 An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. [sent-732, score-0.372]

98 The analysis of a probabilistic approach to nearest neighbor searching. [sent-999, score-0.372]

99 The labeled cell classifier: A fast approximation to k nearest neighbors. [sent-1037, score-0.253]

100 An algorithm for a selective nearest neighbor decision rule. [sent-1085, score-0.372]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dneg', 0.379), ('dtpos', 0.355), ('nearest', 0.253), ('dnode', 0.237), ('lo', 0.206), ('minp', 0.178), ('speedup', 0.163), ('node', 0.156), ('dists', 0.142), ('nout', 0.142), ('oore', 0.142), ('knsv', 0.13), ('onparametric', 0.13), ('iu', 0.121), ('maxp', 0.121), ('neighbor', 0.119), ('neighbors', 0.114), ('igh', 0.111), ('imensional', 0.111), ('ray', 0.108), ('psin', 0.107), ('queries', 0.101), ('ew', 0.099), ('fficient', 0.099), ('points', 0.088), ('distsi', 0.083), ('answer', 0.079), ('records', 0.074), ('ballknn', 0.071), ('negcount', 0.071), ('negtree', 0.071), ('nin', 0.071), ('postree', 0.071), ('psout', 0.071), ('lgorithms', 0.071), ('lassification', 0.068), ('video', 0.065), ('ball', 0.062), ('moore', 0.062), ('balls', 0.061), ('distance', 0.06), ('dsofar', 0.059), ('pointset', 0.059), ('query', 0.055), ('pos', 0.054), ('naive', 0.053), ('conventional', 0.052), ('distances', 0.051), ('dminp', 0.047), ('omachi', 0.047), ('omohundro', 0.047), ('classi', 0.047), ('negative', 0.046), ('prune', 0.043), ('search', 0.042), ('ps', 0.041), ('movie', 0.04), ('faloutsos', 0.04), ('dimensions', 0.04), ('array', 0.039), ('synthetic', 0.039), ('accelerations', 0.036), ('arya', 0.036), ('aso', 0.036), ('blanc', 0.036), ('ciaccia', 0.036), ('cout', 0.036), ('flickner', 0.036), ('fraud', 0.036), ('ipums', 0.036), ('neg', 0.036), ('ntemp', 0.036), ('preparata', 0.036), ('queues', 0.036), ('shot', 0.036), ('skewness', 0.036), ('cmu', 0.033), ('priority', 0.033), ('speedups', 0.033), ('tree', 0.032), ('child', 0.031), ('qsar', 0.03), ('mel', 0.03), ('cardie', 0.03), ('data', 0.03), ('liu', 0.029), ('answers', 0.029), ('nonparametric', 0.028), ('kdd', 0.027), ('children', 0.027), ('ui', 0.027), ('furthest', 0.027), ('exit', 0.027), ('fukunaga', 0.027), ('uhlmann', 0.027), ('closer', 0.027), ('triangle', 0.026), ('question', 0.026), ('hart', 0.026), ('closest', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification

Author: Ting Liu, Andrew W. Moore, Alexander Gray

Abstract: This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN . These results include data sets with up to 106 dimensions and 105 records, and demonstrate non-trivial speed-ups while giving exact answers. keywords: ball-tree, k-NN classification

2 0.064652629 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification

Author: Nicolò Cesa-Bianchi, Claudio Gentile, Luca Zaniboni

Abstract: We study the problem of classifying data in a given taxonomy when classifications associated with multiple and/or partial paths are allowed. We introduce a new algorithm that incrementally learns a linear-threshold classifier for each node of the taxonomy. A hierarchical classification is obtained by evaluating the trained node classifiers in a top-down fashion. To evaluate classifiers in our multipath framework, we define a new hierarchical loss function, the H-loss, capturing the intuition that whenever a classification mistake is made on a node of the taxonomy, then no loss should be charged for any additional mistake occurring in the subtree of that node. Making no assumptions on the mechanism generating the data instances, and assuming a linear noise model for the labels, we bound the H-loss of our on-line algorithm in terms of the H-loss of a reference classifier knowing the true parameters of the label-generating process. We show that, in expectation, the excess cumulative H-loss grows at most logarithmically in the length of the data sequence. Furthermore, our analysis reveals the precise dependence of the rate of convergence on the eigenstructure of the data each node observes. Our theoretical results are complemented by a number of experiments on texual corpora. In these experiments we show that, after only one epoch of training, our algorithm performs much better than Perceptron-based hierarchical classifiers, and reasonably close to a hierarchical support vector machine. Keywords: incremental algorithms, online learning, hierarchical classification, second order perceptron, support vector machines, regret bound, loss function

3 0.049184851 53 jmlr-2006-Learning a Hidden Hypergraph

Author: Dana Angluin, Jiang Chen

Abstract: We consider the problem of learning a hypergraph using edge-detecting queries. In this model, the learner may query whether a set of vertices induces an edge of the hidden hypergraph or not. We show that an r-uniform hypergraph with m edges and n vertices is learnable with O(2 4r m · poly(r, log n)) queries with high probability. The queries can be made in O(min(2 r (log m + r)2 , (log m + r)3 )) rounds. We also give an algorithm that learns an almost uniform hypergraph of ∆ ∆ dimension r using O(2O((1+ 2 )r) · m1+ 2 · poly(log n)) queries with high probability, where ∆ is the difference between the maximum and the minimum edge sizes. This upper bound matches our ∆ lower bound of Ω(( m∆ )1+ 2 ) for this class of hypergraphs in terms of dependence on m. The 1+ 2 queries can also be made in O((1 + ∆) · min(2r (log m + r)2 , (log m + r)3 )) rounds. Keywords: query learning, hypergraph, multiple round algorithm, sampling, chemical reaction network

4 0.043844637 36 jmlr-2006-In Search of Non-Gaussian Components of a High-Dimensional Distribution

Author: Gilles Blanchard, Motoaki Kawanabe, Masashi Sugiyama, Vladimir Spokoiny, Klaus-Robert Müller

Abstract: Finding non-Gaussian components of high-dimensional data is an important preprocessing step for efficient information processing. This article proposes a new linear method to identify the “nonGaussian subspace” within a very general semi-parametric framework. Our proposed method, called NGCA (non-Gaussian component analysis), is based on a linear operator which, to any arbitrary nonlinear (smooth) function, associates a vector belonging to the low dimensional nonGaussian target subspace, up to an estimation error. By applying this operator to a family of different nonlinear functions, one obtains a family of different vectors lying in a vicinity of the target space. As a final step, the target space itself is estimated by applying PCA to this family of vectors. We show that this procedure is consistent in the sense that the estimaton error tends to zero at a parametric rate, uniformly over the family, Numerical examples demonstrate the usefulness of our method.

5 0.039752975 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection

Author: Hichem Sahbi, Donald Geman

Abstract: We introduce a computational design for pattern detection based on a tree-structured network of support vector machines (SVMs). An SVM is associated with each cell in a recursive partitioning of the space of patterns (hypotheses) into increasingly finer subsets. The hierarchy is traversed coarse-to-fine and each chain of positive responses from the root to a leaf constitutes a detection. Our objective is to design and build a network which balances overall error and computation. Initially, SVMs are constructed for each cell with no constraints. This “free network” is then perturbed, cell by cell, into another network, which is “graded” in two ways: first, the number of support vectors of each SVM is reduced (by clustering) in order to adjust to a pre-determined, increasing function of cell depth; second, the decision boundaries are shifted to preserve all positive responses from the original set of training data. The limits on the numbers of clusters (virtual support vectors) result from minimizing the mean computational cost of collecting all detections subject to a bound on the expected number of false positives. When applied to detecting faces in cluttered scenes, the patterns correspond to poses and the free network is already faster and more accurate than applying a single pose-specific SVM many times. The graded network promotes very rapid processing of background regions while maintaining the discriminatory power of the free network. Keywords: statistical learning, hierarchy of classifiers, coarse-to-fine computation, support vector machines, face detection

6 0.039059542 13 jmlr-2006-Adaptive Prototype Learning Algorithms: Theoretical and Experimental Studies

7 0.038386773 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models

8 0.03555654 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

9 0.033299658 72 jmlr-2006-Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems     (Special Topic on Machine Learning and Optimization)

10 0.033041693 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation

11 0.03189227 85 jmlr-2006-Statistical Comparisons of Classifiers over Multiple Data Sets

12 0.031714838 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates

13 0.02894091 81 jmlr-2006-Some Discriminant-Based PAC Algorithms

14 0.028698949 12 jmlr-2006-Active Learning with Feedback on Features and Instances

15 0.028649531 38 jmlr-2006-Incremental Support Vector Learning: Analysis, Implementation and Applications     (Special Topic on Machine Learning and Optimization)

16 0.028139636 92 jmlr-2006-Toward Attribute Efficient Learning of Decision Lists and Parities

17 0.028049266 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation

18 0.0266759 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

19 0.025656343 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition

20 0.02512403 25 jmlr-2006-Distance Patterns in Structural Similarity


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.147), (1, -0.032), (2, -0.018), (3, 0.022), (4, -0.053), (5, 0.066), (6, 0.008), (7, -0.131), (8, -0.021), (9, 0.082), (10, 0.018), (11, 0.001), (12, 0.045), (13, -0.02), (14, -0.001), (15, -0.049), (16, 0.052), (17, -0.006), (18, 0.169), (19, 0.013), (20, -0.192), (21, -0.026), (22, -0.078), (23, 0.069), (24, -0.043), (25, 0.193), (26, -0.056), (27, -0.171), (28, -0.071), (29, -0.203), (30, -0.27), (31, 0.156), (32, -0.224), (33, 0.11), (34, 0.211), (35, -0.007), (36, -0.011), (37, -0.0), (38, 0.126), (39, -0.027), (40, -0.204), (41, -0.002), (42, -0.149), (43, 0.06), (44, -0.018), (45, 0.105), (46, -0.309), (47, -0.181), (48, -0.143), (49, -0.126)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95461714 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification

Author: Ting Liu, Andrew W. Moore, Alexander Gray

Abstract: This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN . These results include data sets with up to 106 dimensions and 105 records, and demonstrate non-trivial speed-ups while giving exact answers. keywords: ball-tree, k-NN classification

2 0.31541976 53 jmlr-2006-Learning a Hidden Hypergraph

Author: Dana Angluin, Jiang Chen

Abstract: We consider the problem of learning a hypergraph using edge-detecting queries. In this model, the learner may query whether a set of vertices induces an edge of the hidden hypergraph or not. We show that an r-uniform hypergraph with m edges and n vertices is learnable with O(2 4r m · poly(r, log n)) queries with high probability. The queries can be made in O(min(2 r (log m + r)2 , (log m + r)3 )) rounds. We also give an algorithm that learns an almost uniform hypergraph of ∆ ∆ dimension r using O(2O((1+ 2 )r) · m1+ 2 · poly(log n)) queries with high probability, where ∆ is the difference between the maximum and the minimum edge sizes. This upper bound matches our ∆ lower bound of Ω(( m∆ )1+ 2 ) for this class of hypergraphs in terms of dependence on m. The 1+ 2 queries can also be made in O((1 + ∆) · min(2r (log m + r)2 , (log m + r)3 )) rounds. Keywords: query learning, hypergraph, multiple round algorithm, sampling, chemical reaction network

3 0.28132853 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification

Author: Nicolò Cesa-Bianchi, Claudio Gentile, Luca Zaniboni

Abstract: We study the problem of classifying data in a given taxonomy when classifications associated with multiple and/or partial paths are allowed. We introduce a new algorithm that incrementally learns a linear-threshold classifier for each node of the taxonomy. A hierarchical classification is obtained by evaluating the trained node classifiers in a top-down fashion. To evaluate classifiers in our multipath framework, we define a new hierarchical loss function, the H-loss, capturing the intuition that whenever a classification mistake is made on a node of the taxonomy, then no loss should be charged for any additional mistake occurring in the subtree of that node. Making no assumptions on the mechanism generating the data instances, and assuming a linear noise model for the labels, we bound the H-loss of our on-line algorithm in terms of the H-loss of a reference classifier knowing the true parameters of the label-generating process. We show that, in expectation, the excess cumulative H-loss grows at most logarithmically in the length of the data sequence. Furthermore, our analysis reveals the precise dependence of the rate of convergence on the eigenstructure of the data each node observes. Our theoretical results are complemented by a number of experiments on texual corpora. In these experiments we show that, after only one epoch of training, our algorithm performs much better than Perceptron-based hierarchical classifiers, and reasonably close to a hierarchical support vector machine. Keywords: incremental algorithms, online learning, hierarchical classification, second order perceptron, support vector machines, regret bound, loss function

4 0.22631584 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models

Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky

Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation

5 0.21875688 13 jmlr-2006-Adaptive Prototype Learning Algorithms: Theoretical and Experimental Studies

Author: Fu Chang, Chin-Chin Lin, Chi-Jen Lu

Abstract: In this paper, we propose a number of adaptive prototype learning (APL) algorithms. They employ the same algorithmic scheme to determine the number and location of prototypes, but differ in the use of samples or the weighted averages of samples as prototypes, and also in the assumption of distance measures. To understand these algorithms from a theoretical viewpoint, we address their convergence properties, as well as their consistency under certain conditions. We also present a soft version of APL, in which a non-zero training error is allowed in order to enhance the generalization power of the resultant classifier. Applying the proposed algorithms to twelve UCI benchmark data sets, we demonstrate that they outperform many instance-based learning algorithms, the k-nearest neighbor rule, and support vector machines in terms of average test accuracy. Keywords: adaptive prototype learning, cluster-based prototypes, consistency, instance-based prototype, pattern classification 1

6 0.214931 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation

7 0.20471431 72 jmlr-2006-Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems     (Special Topic on Machine Learning and Optimization)

8 0.19404544 85 jmlr-2006-Statistical Comparisons of Classifiers over Multiple Data Sets

9 0.18042777 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation

10 0.17952068 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests

11 0.17576237 36 jmlr-2006-In Search of Non-Gaussian Components of a High-Dimensional Distribution

12 0.17576049 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection

13 0.1579887 27 jmlr-2006-Ensemble Pruning Via Semi-definite Programming     (Special Topic on Machine Learning and Optimization)

14 0.15453272 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error

15 0.15178092 47 jmlr-2006-Learning Image Components for Object Recognition

16 0.14421065 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

17 0.1435466 21 jmlr-2006-Computational and Theoretical Analysis of Null Space and Orthogonal Linear Discriminant Analysis

18 0.13809536 49 jmlr-2006-Learning Parts-Based Representations of Data

19 0.12869959 81 jmlr-2006-Some Discriminant-Based PAC Algorithms

20 0.12149356 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.031), (11, 0.013), (32, 0.43), (35, 0.01), (36, 0.07), (45, 0.014), (50, 0.047), (61, 0.023), (63, 0.037), (68, 0.014), (76, 0.015), (78, 0.021), (79, 0.029), (81, 0.033), (84, 0.015), (90, 0.013), (91, 0.026), (96, 0.074)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.72420287 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification

Author: Ting Liu, Andrew W. Moore, Alexander Gray

Abstract: This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN . These results include data sets with up to 106 dimensions and 105 records, and demonstrate non-trivial speed-ups while giving exact answers. keywords: ball-tree, k-NN classification

2 0.62320977 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation

Author: Greg Hamerly, Erez Perelman, Jeremy Lau, Brad Calder, Timothy Sherwood

Abstract: An essential step in designing a new computer architecture is the careful examination of different design options. It is critical that computer architects have efficient means by which they may estimate the impact of various design options on the overall machine. This task is complicated by the fact that different programs, and even different parts of the same program, may have distinct behaviors that interact with the hardware in different ways. Researchers use very detailed simulators to estimate processor performance, which models every cycle of an executing program. Unfortunately, simulating every cycle of a real program can take weeks or months. To address this problem we have created a tool called SimPoint that uses data clustering algorithms from machine learning to automatically find repetitive patterns in a program’s execution. By simulating one representative of each repetitive behavior pattern, simulation time can be reduced to minutes instead of weeks for standard benchmark programs, with very little cost in terms of accuracy. We describe this important problem, the data representation and preprocessing methods used by SimPoint, the clustering algorithm at the core of SimPoint, and we evaluate different options for tuning SimPoint. Keywords: k-means, random projection, Bayesian information criterion, simulation, SimPoint

3 0.2805998 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor

Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs

4 0.2773217 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

Author: Francis R. Bach, Michael I. Jordan

Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled data sets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram. Keywords: spectral clustering, blind source separation, computational auditory scene analysis

5 0.27383676 53 jmlr-2006-Learning a Hidden Hypergraph

Author: Dana Angluin, Jiang Chen

Abstract: We consider the problem of learning a hypergraph using edge-detecting queries. In this model, the learner may query whether a set of vertices induces an edge of the hidden hypergraph or not. We show that an r-uniform hypergraph with m edges and n vertices is learnable with O(2 4r m · poly(r, log n)) queries with high probability. The queries can be made in O(min(2 r (log m + r)2 , (log m + r)3 )) rounds. We also give an algorithm that learns an almost uniform hypergraph of ∆ ∆ dimension r using O(2O((1+ 2 )r) · m1+ 2 · poly(log n)) queries with high probability, where ∆ is the difference between the maximum and the minimum edge sizes. This upper bound matches our ∆ lower bound of Ω(( m∆ )1+ 2 ) for this class of hypergraphs in terms of dependence on m. The 1+ 2 queries can also be made in O((1 + ∆) · min(2r (log m + r)2 , (log m + r)3 )) rounds. Keywords: query learning, hypergraph, multiple round algorithm, sampling, chemical reaction network

6 0.27375561 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis

7 0.2721881 44 jmlr-2006-Large Scale Transductive SVMs

8 0.26911694 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error

9 0.26672596 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

10 0.26613969 51 jmlr-2006-Learning Sparse Representations by Non-Negative Matrix Factorization and Sequential Cone Programming     (Special Topic on Machine Learning and Optimization)

11 0.26567417 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms

12 0.26540625 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples

13 0.265073 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation

14 0.26468289 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting

15 0.2645677 73 jmlr-2006-Pattern Recognition for Conditionally Independent Data

16 0.26420018 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix

17 0.26332793 42 jmlr-2006-Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting     (Special Topic on Inductive Programming)

18 0.26286611 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification

19 0.26235485 66 jmlr-2006-On Model Selection Consistency of Lasso

20 0.26142025 61 jmlr-2006-Maximum-Gain Working Set Selection for SVMs     (Special Topic on Machine Learning and Optimization)