jmlr jmlr2008 jmlr2008-43 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Elena Marchiori
Abstract: In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection. HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these data sets show that accuracy of 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results indicate that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundance reduction in instance-based learning. Keywords: graph-based training set representation, nearest neighbor, instance selection for instance-based learning
Reference: text
sentIndex sentText sentNum sentScore
1 This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. [sent-5, score-0.834]
2 We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. [sent-6, score-0.523]
3 HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. [sent-9, score-0.491]
4 Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. [sent-12, score-0.509]
5 Keywords: graph-based training set representation, nearest neighbor, instance selection for instance-based learning 1. [sent-14, score-0.35]
6 Introduction In supervised learning, a machine learning algorithm is given a training set, consisting of training examples called labeled instances (here also called points). [sent-15, score-0.228]
7 In particular, the 1-NN rule classifies an unknown point into the class of the nearest of the training set points. [sent-18, score-0.283]
8 Moreover, when the training set contains noisy instances, generalization accuracy can be negatively affected if these instances are stored as well (see Wilson and Martinez, 2000). [sent-24, score-0.233]
9 Instance selection algorithms tackle these issues by selecting a subset of the training set in order to reduce storage and possibly also enhance accuracy of the 1-NN rule on new instances (generalization performance). [sent-25, score-0.477]
10 In an HMN, nodes are instances of the considered training set. [sent-27, score-0.206]
11 Edges are defined as follows: for each node x and for each class, there is a directed edge from x to its nearest neighbor among training set instances belonging to that class. [sent-28, score-0.521]
12 Thus HMN represents a ’more specific’ nearest neighbor relation, namely between points from each pair of classes. [sent-29, score-0.414]
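The construction just described lends itself to a short sketch. The snippet below is a minimal illustration, not the author's code: it assumes a NumPy feature matrix, Euclidean distance, and arbitrary tie-breaking, and the name build_hmn is hypothetical.

```python
import numpy as np

def build_hmn(X, y):
    """Hit miss network as an adjacency dict: for every point i and every class c,
    add a directed edge from i to its nearest neighbor among the points of class c.
    A sketch assuming Euclidean distance; ties are broken arbitrarily."""
    n = len(X)
    edges = {i: [] for i in range(n)}
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point never counts as its own neighbor
    for i in range(n):
        for c in np.unique(y):
            candidates = np.where(y == c)[0]
            candidates = candidates[candidates != i]
            if candidates.size == 0:
                continue  # class c contains no other point
            j = candidates[np.argmin(dists[i, candidates])]
            edges[i].append(j)  # directed edge i -> nearest point of class c
    return edges
```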
13 We show that structural properties of HMN’s correspond to properties of training instances related to the decision boundary of the 1-NN rule, such as being border or central point. [sent-33, score-0.209]
14 We prove that instance selection by means of this algorithm does not change the 1-NN classification of instances in the original training set. [sent-37, score-0.271]
15 Edited Nearest Neighbor (E-NN), designed for noise reduction (Wilson, 1972), and two state-of-the-art editing algorithms: Iterative Case Filtering (ICF) (Brighton and Mellish, 1999) and the best of the Decremental Reduction Optimization algorithms introduced in Wilson and Martinez (1997) (DROP3). [sent-44, score-0.321]
16 1 Related Work Graphs have been successfully used for representing relations between points of a given data set, such as functional interaction between proteins (protein-protein interaction networks) or proximity (nearest neighbor graphs) (Dorogovtsev and Mendes, 2003). [sent-48, score-0.35]
17 Proximity graphs are defined as graphs in which points close to each other by some definition of closeness are connected (Barnett, 1976). [sent-50, score-0.201]
18 The nearest neighbor graph (NNG) is a typical example of proximity graph, where each vertex is a data point that is joined by an edge to its nearest neighbor. [sent-51, score-0.639]
19 The Voronoi diagram and correspondingly the DT of a point set capture all the proximity information about the point set because they represent the original 1-NN rule decision boundary. [sent-57, score-0.2]
20 Second, HMN’s are directed graphs, while the above proximity graphs are not. [sent-61, score-0.207]
21 A class of directed proximity graphs, called class cover catch digraphs (CCCD’s), has been introduced in Marchette et al. [sent-62, score-0.227]
22 While both HMN’s and CCCD’s are directed graphs, they describe different relations: HMN’s describe the nearest neighbor relation between points of each pair of classes, while CCCD’s describe the relation between maximal covering balls and target instances of one class. [sent-73, score-0.581]
23 Representations of a data set based on proximity graphs have been used to define algorithms for reducing the size of the training set (for instance, Bhattacharya, 1982), for removing noisy instances (for instance, Sánchez et al. [sent-74, score-0.331]
24 , 1997), and for detecting critical instances close to the decision boundary (for instance, Bhattacharya and Kaller, 1998), in order to improve storage and accuracy of 1-NN. [sent-75, score-0.377]
25 (1984) a so-called Voronoi condensed data set is introduced, obtained by discarding all those points whose Voronoi cell shares faces only with cells that contain points of the same class. [sent-77, score-0.229]
26 Faster algorithms for instance selection based on the GG and the Reduced Neighborhood graph (Jaromczyk and Toussaint, 1992) have been proposed, for instance in Bhattacharya and Kaller (1998), Bhattacharya et al. [sent-82, score-0.201]
27 For a thorough survey of graph-based methods for nearest neighbor classification, the reader is referred to Toussaint (2002). [sent-91, score-0.317]
28 In an HMN of X, a directed edge from point x to y is defined if y is the nearest neighbor of x in the class of y. [sent-105, score-0.357]
29 When the classes of x and y are the same, we call x a hit of y, otherwise a miss of y. [sent-107, score-0.353]
30 The name hit miss network is derived from these terms. [sent-108, score-0.353]
31 A hit of x (respectively, miss of x) is any point y such that e = (y, x) is an edge of G and label(y) = label(x) (respectively, label(y) ≠ label(x)). [sent-131, score-0.353]
32 We call hit-degree (respectively miss-degree) of x the number of hit (respectively miss) nodes of x. [sent-132, score-0.223]
33 Hit(x) (respectively Miss(x)) denotes the set of hit (respectively miss) nodes of x. [sent-133, score-0.223]
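Given the edge structure, the hit and miss sets and their degrees follow directly from the in-edges of each node. A small sketch building on the hypothetical build_hmn() output above:

```python
def hit_miss_degrees(edges, y):
    """Hit(x): in-neighbors of x with the same label; Miss(x): in-neighbors of x
    with a different label. Hit- and miss-degree are the sizes of these sets."""
    hit = {i: [] for i in edges}
    miss = {i: [] for i in edges}
    for src, targets in edges.items():
        for dst in targets:
            (hit if y[src] == y[dst] else miss)[dst].append(src)
    hit_degree = {i: len(hit[i]) for i in edges}
    miss_degree = {i: len(miss[i]) for i in edges}
    return hit, miss, hit_degree, miss_degree
```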
34 Observe that the two points with zero in-degree are relatively isolated and far from points of the opposite class, while points with high miss-degree are closer to points of the opposite class and to the 1-NN decision boundary. [sent-135, score-0.466]
35 For instance, using kd trees, whose construction takes time proportional to n log(n), nearest neighbor search exhibits approximately O(n^{1/2}) behavior (Grother et al. [sent-138, score-0.317]
36 A recent fast all nearest neighbor algorithm for applications involving large point-clouds is introduced in Sankaranarayanan et al. [sent-140, score-0.317]
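As an illustration of how the all-nearest-neighbor computation can be sped up in practice, the sketch below uses SciPy's k-d tree; the library choice is an assumption of this example and is not prescribed by the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def all_nearest_neighbors(X):
    """Index of the nearest other point for every point, via a k-d tree
    instead of the O(n^2) brute-force distance matrix."""
    tree = cKDTree(X)
    # k=2: the closest match of each query point is the point itself,
    # so take the second column (assumes no exact duplicate points).
    _, idx = tree.query(X, k=2)
    return idx[:, 1]
```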
37 HMN’s describe the nearest neighbor relation over pairs of points from each pair of classes of the training set. [sent-143, score-0.478]
38 In particular, in the one nearest neighbor algorithm (1-NN) the class label of a new instance is that of the stored instance with minimum distance. [sent-172, score-0.483]
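A minimal sketch of the 1-NN rule itself, assuming Euclidean distance and NumPy arrays (the function name is illustrative):

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """Label each test point with the class of its closest stored instance."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]
```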
39 Instance selection techniques, here also called editing techniques, select a subset of the training set in order to improve the storage and possibly the generalization performance of an instance-based learning algorithm. [sent-190, score-0.507]
40 • Competence preservation algorithms compute a training set consistent subset by removing irrelevant points that do not affect the classification accuracy of the training set (see for instance Angiulli, 2007; Dasarathy, 1994). [sent-194, score-0.36]
41 Alternative methods use prototypes instead of instances of the training set (see for instance Pekalska et al. [sent-198, score-0.23]
42 1003 M ARCHIORI In particular, in Brighton and Mellish (2002) the authors compare experimentally Edited Nearest Neighbor (E-NN) and the state-of-the-art editing algorithms Iterative Case Filtering (ICF) and Decremental Reduction Optimization Procedure 3 (DROP3). [sent-202, score-0.239]
43 E-NN is generally included in comparative experimental analyses of editing methods, mainly because it provides useful information on the amount of ’noisy’ instances contained in the considered data sets and on the accuracy improvement obtained by their removal. [sent-203, score-0.408]
44 These algorithms, together with E-NN, are described in more detail below and used to assess comparatively the performance of the HMN-based editing algorithms we propose. [sent-207, score-0.239]
45 1 Edited Nearest Neighbor Wilson (1972) introduced the Edited Nearest Neighbor (E-NN), where each point x is removed from X if it does not agree with the majority of its K nearest neighbors. [sent-209, score-0.246]
46 This editing rule removes noisy points as well as points close to the decision boundary, yielding smoother decision boundaries. [sent-210, score-0.632]
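A hedged sketch of this editing rule; k=3 is an assumption of the example, not a value quoted from the paper, and majority ties are broken arbitrarily here.

```python
import numpy as np

def wilson_editing(X, y, k=3):
    """E-NN: keep a point only if its label matches the majority label of its
    k nearest neighbors, computed on the full training set in a single pass."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighborhood
    keep = []
    for i in range(len(X)):
        nn = np.argsort(d[i])[:k]
        labels, counts = np.unique(y[nn], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:
            keep.append(i)
    return np.array(keep)
```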
47 2 Iterative Case Filtering In Brighton and Mellish (1999) the Iterative Case Filtering algorithm (ICF) was proposed, which first applies E-NN iteratively until it cannot remove any point, and next iteratively removes other points as follows. [sent-212, score-0.202]
48 The reachability of a point x consists of the points inside the largest hyper-sphere containing only points of the same class as x. [sent-214, score-0.226]
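One common reading of the ICF filtering criterion removes a point whose reachability set is larger than its coverage set (the points that reach it). The sketch below implements a single pass under that reading; it is an illustration, not the authors' implementation, and assumes Euclidean distance and at least two classes present.

```python
import numpy as np

def icf_pass(X, y):
    """One ICF-style filtering pass: flag points whose reachability set is
    larger than their coverage set (assumed removal criterion)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    # distance from each point to its nearest enemy (closest other-class point)
    enemy_dist = np.array([d[i, y != y[i]].min() for i in range(n)])
    # reachability: same-class points strictly closer than the nearest enemy
    reach = [set(np.where((y == y[i]) & (d[i] < enemy_dist[i]))[0]) for i in range(n)]
    cover = [set() for _ in range(n)]
    for i in range(n):
        for j in reach[i]:
            cover[j].add(i)
    return [i for i in range(n) if len(reach[i]) > len(cover[i])]
```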
49 DROP1 is the basic removal rule, which removes a point x from X if the accuracy of the K-NN rule on the set of its associates does not decrease. [sent-219, score-0.283]
50 Each point has a list of K nearest neighbors and a list of associates, which are updated each time a point is removed from X. [sent-220, score-0.284]
51 A point y is an associate of x if x belongs to the set of K nearest neighbors of y. [sent-221, score-0.217]
52 If x is removed then the list of K nearest neighbors of each of its associates y is updated by adding a new neighbor point z, and y is added to the list of associates of z. [sent-222, score-0.52]
53 Moreover, for each of the K nearest neighbors y of x, x is removed from the list of associates of y. [sent-223, score-0.333]
54 DROP2 is obtained from DROP1 by discarding the last update step, hence it considers all associates in the entire training set when testing accuracy performance in the removal rule. [sent-224, score-0.238]
55 Moreover, the removal rule is applied to the points sorted in decreasing order of distance from their nearest neighbor from the other classes (nearest enemy). [sent-225, score-0.541]
56 In this way, points furthest from their nearest enemy are selected first. [sent-226, score-0.301]
57 DROP3 applies a pre-processing step which discards points of X misclassified by their K nearest neighbors, and then applies DROP2. [sent-227, score-0.309]
58 DROP4 uses a stronger pre-processing step which discards points of X misclassified by their K nearest neighbors if their removal does not hurt the classification of other instances. [sent-228, score-0.403]
59 Finally DROP5 modifies DROP2 by considering the reverse order of selection of points, in such a way that instances are considered for removal beginning with instances that are nearest to their nearest enemy. [sent-229, score-0.655]
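The nearest-enemy ordering used by DROP2 and DROP3 is simple to make concrete. The sketch below covers only that ordering step, not the full DROP removal procedure, and assumes Euclidean distance:

```python
import numpy as np

def nearest_enemy_order(X, y):
    """Indices sorted by decreasing distance to the nearest enemy, so points
    farthest from the decision boundary are considered for removal first."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    enemy_dist = np.array([d[i, y != y[i]].min() for i in range(len(X))])
    return np.argsort(-enemy_dist)
```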
60 DROP3 achieves the best mix of storage reduction and generalization accuracy of the DROP methods (see Wilson and Martinez, 2000). [sent-230, score-0.317]
61 Moreover, results of experiments conducted in Wilson and Martinez (1997, 2000) show that DROP3 achieves higher accuracy and smaller storage requirements than several other methods, such as CNN (Hart, 1968), SNN (Ritter et al. [sent-231, score-0.261]
62 Therefore we use DROP3 and ICF as representatives of the state-of-the-art, in order to assess the performance of the HMN-based editing algorithms introduced in the following section. [sent-234, score-0.239]
63 Instance Selection with Hit Miss Networks Zero in-degree nodes of HMN(X) include relatively isolated points, and points not too close to the decision boundary. [sent-236, score-0.217]
64 We call HMN-C (HMN for training set Consistent instance selection) the algorithm that removes from the training set all instances with zero in-degree. [sent-251, score-0.363]
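Since HMN-C is fully determined by the in-degrees of the network, it admits a very small sketch on top of the hypothetical build_hmn() output above (HMN-E and HMN-EI need the additional hit/miss-degree rules described in the paper and are not reproduced here):

```python
def hmn_c(edges, n):
    """HMN-C: keep only instances whose in-degree in the hit miss network
    is non-zero."""
    in_degree = [0] * n
    for targets in edges.values():
        for j in targets:
            in_degree[j] += 1
    return [i for i in range(n) if in_degree[i] > 0]
```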
65 Therefore, a more aggressive removal strategy is adopted in the following instance selection heuristic algorithm, called HMN-E (HMN for Editing), which compares the hit- and miss-degree of each node for deciding whether to remove it. [sent-253, score-0.199]
66 We consider this to be a reasonable class storage lower bound for the condensed 1-NN rule. [sent-273, score-0.198]
67 Observe that the three HMN-based editing algorithms are order independent, that is, their output does not depend on the order in which training points are processed. [sent-290, score-0.4]
68 Moreover, by construction, points removed by HMN-C are also removed by HMN-E, and points removed by HMN-E are also removed by HMN-EI. [sent-291, score-0.462]
69 1 Comparison of the Methods on the XOR Problem Figure 5 shows application of the considered editing algorithms to the training set of a XOR classification task. [sent-293, score-0.303]
70 As expected, points removed by E-NN are close to the decision boundary. [sent-295, score-0.209]
71 ICF and DROP3 also delete ’safe’ points far from the decision boundary (in order to reduce storage requirements). [sent-296, score-0.305]
72 HMN-C removes ’locally’ isolated points, while HMN-E also removes ’safe’ points as well as points close to the decision boundary. [sent-297, score-0.472]
73 Figure 6 plots the sorted in-degrees of the considered XOR training set, where in-degree of points removed by a method are marked with triangles. [sent-300, score-0.259]
74 The majority of points removed by E-NN have high in-degree, showing the tendency of ’noisy’ points to have high in-degree. [sent-302, score-0.261]
75 HMN-E removes more points with high in-degree than E-NN, and it selects points with low, but not zero, degree. [sent-303, score-0.263]
76 Figure 5: Effect of the algorithms on a XOR problem training set: removed points are shown with filled markers. [sent-404, score-0.228]
77 The performance measures used here are (average) test accuracy of the classifier and (average) percentage of the training set removed by the method. [sent-409, score-0.2]
78 For each partition of the data set, each editing algorithm is applied to the training set X from which a subset S is returned. [sent-435, score-0.303]
79 The one nearest neighbor classifier that uses only points of S is applied to the test set. [sent-436, score-0.414]
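The evaluation protocol can be summarized in a short loop. The sketch below assumes the one_nn_predict() helper from the earlier example; the fold count and random seed are assumptions, not the paper's exact experimental settings.

```python
import numpy as np

def evaluate_editing(X, y, edit_fn, n_folds=10, seed=0):
    """Cross-validated evaluation of an editing method: edit_fn returns the
    indices kept from the training fold, 1-NN on the kept subset is scored on
    the test fold, and mean accuracy and mean storage reduction are reported."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    accs, reductions = [], []
    for f in range(n_folds):
        test_idx = folds[f]
        train_idx = np.concatenate([folds[g] for g in range(n_folds) if g != f])
        keep = edit_fn(X[train_idx], y[train_idx])
        S_X, S_y = X[train_idx][keep], y[train_idx][keep]
        pred = one_nn_predict(S_X, S_y, X[test_idx])
        accs.append(np.mean(pred == y[test_idx]))
        reductions.append(100.0 * (1 - len(keep) / len(train_idx)))
    return np.mean(accs), np.mean(reductions)
```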
80 Average and median accuracy and training set reduction percentage for each algorithm over all the 22 data sets are reported near the bottom of the table. [sent-440, score-0.216]
81 A paired t-test is also applied to assess significance of differences in storage reduction percentages for each experiment. [sent-451, score-0.265]
82 • Second, in order to assess whether differences in accuracy and storage reduction on all runs of the entire group of data sets are significant, a non-parametric paired test, the Wilcoxon Signed Ranks test, is applied to compare HMN-EI with each of the other algorithms. [sent-452, score-0.334]
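Both tests are available in SciPy; the sketch below simply runs them on two paired score vectors (per-fold accuracies for the t-test, per-data-set averages for the Wilcoxon test, following the protocol above). The 0.05 significance level is an assumption of the example.

```python
from scipy.stats import ttest_rel, wilcoxon

def compare_paired_scores(scores_a, scores_b, alpha=0.05):
    """Paired t-test and Wilcoxon signed ranks test on two paired score
    vectors; note wilcoxon() raises if all paired differences are zero."""
    _, t_p = ttest_rel(scores_a, scores_b)
    _, w_p = wilcoxon(scores_a, scores_b)
    return {"t_test_p": t_p, "wilcoxon_p": w_p,
            "t_test_significant": t_p < alpha,
            "wilcoxon_significant": w_p < alpha}
```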
83 For these reasons, HMN-EI is chosen for further comparison with state-of-the-art editing algorithms. [sent-484, score-0.239]
84 • On data sets with more than three classes, HMN-EI has worse storage requirements than the other algorithms, but also generally higher accuracy, due to the more conservative editing strategy (Rule 3) HMN-EI uses on data sets with many classes. [sent-864, score-0.402]
85 Storage reduction of HMN-EI is 7, 22, and 11 times better, and 15, 0, and 8 worse, indicating better storage performance of ICF, according to this test. [sent-869, score-0.219]
86 • The three best performing instance selection algorithms, DROP3, ICF and HMN-EI have quadratic computational complexity in the number of instances (which can be reduced by using ad-hoc data structures such as kd-trees). [sent-874, score-0.207]
87 Comparison with results obtained by E-NN, ICF and DROP3 shows improved average accuracy and similar storage requirement of HMN-EI, ICF and DROP3 on these data sets. [sent-879, score-0.232]
88 Conclusions and Future Work This paper proposed a new graph-based representation of a training set and showed how local structural properties of nodes provide information about the closeness of the corresponding points to the decision boundary of the 1-NN rule. [sent-881, score-0.248]
89 We proved that HMN-C removes instances without affecting the accuracy of the 1-NN rule on the original training set (it computes a decision-boundary consistent subset). [sent-883, score-0.368]
90 Results of extensive experiments indicated that HMN-EI significantly improves the generalization accuracy of 1-NN and reduces significantly its storage requirements. [sent-885, score-0.232]
91 We compared experimentally HMN-EI with a popular noise reduction algorithm (E-NN), and two state-of-the-art editing algorithms (ICF and DROP3). [sent-886, score-0.321]
92 Results of extensive experiments on 19 real-life data sets and 3 artificial ones showed that HMN-EI achieved best average accuracy, and storage reduction similar to that of ICF and DROP3. [sent-887, score-0.219]
93 We conducted preliminary experiments to investigate whether using more than one neighbor to classify new points affects the accuracy performance of the condensing algorithms here considered. [sent-890, score-0.354]
94 Fast nearest neighbor condensation for large data sets classification. [sent-908, score-0.342]
95 Minimal consistent set (mcs) identification for optimal nearest neighbor decision systems design. [sent-967, score-0.362]
96 An algorithm for a selective nearest neighbor decision rule. [sent-1079, score-0.362]
97 Prototype selection for the nearest neighbour rule through proximity graphs. [sent-1087, score-0.375]
98 A fast all nearest neighbor algorithm for applications involving large point-clouds. [sent-1093, score-0.317]
99 Proximity graphs for nearest neighbor decision rules: recent progress. [sent-1109, score-0.414]
100 Asymptotic properties of nearest neighbor rules using edited data. [sent-1129, score-0.374]
wordName wordTfidf (topN-words)
[('hmn', 0.605), ('icf', 0.289), ('editing', 0.239), ('hit', 0.181), ('nearest', 0.179), ('miss', 0.172), ('storage', 0.163), ('neighbor', 0.138), ('wilson', 0.13), ('archiori', 0.126), ('iss', 0.126), ('brighton', 0.118), ('proximity', 0.115), ('bhattacharya', 0.113), ('mellish', 0.113), ('martinez', 0.107), ('voronoi', 0.101), ('instances', 0.1), ('points', 0.097), ('cccd', 0.088), ('etworks', 0.077), ('xor', 0.073), ('accuracy', 0.069), ('removes', 0.069), ('wilcoxon', 0.067), ('gabriel', 0.067), ('removed', 0.067), ('instance', 0.066), ('training', 0.064), ('edited', 0.057), ('reduction', 0.056), ('removal', 0.056), ('toussaint', 0.053), ('graphs', 0.052), ('condensing', 0.05), ('decremental', 0.05), ('devinney', 0.05), ('jankowski', 0.05), ('marchette', 0.05), ('priebe', 0.05), ('associates', 0.049), ('wl', 0.048), ('paired', 0.046), ('decision', 0.045), ('filtering', 0.044), ('catch', 0.044), ('benchmark', 0.043), ('bupa', 0.043), ('competence', 0.043), ('uspst', 0.043), ('nodes', 0.042), ('selection', 0.041), ('rule', 0.04), ('directed', 0.04), ('ag', 0.039), ('grother', 0.038), ('raetsch', 0.038), ('vezhnevets', 0.038), ('neighbors', 0.038), ('remove', 0.036), ('banana', 0.035), ('condensed', 0.035), ('label', 0.034), ('cance', 0.034), ('discards', 0.033), ('iris', 0.033), ('isolated', 0.033), ('diabetis', 0.032), ('cccp', 0.032), ('reachability', 0.032), ('chapelle', 0.031), ('cantly', 0.031), ('sorted', 0.031), ('splice', 0.031), ('achieves', 0.029), ('pima', 0.029), ('graph', 0.028), ('cover', 0.028), ('repository', 0.028), ('classi', 0.027), ('median', 0.027), ('covering', 0.027), ('thyroid', 0.026), ('titanic', 0.026), ('german', 0.026), ('affecting', 0.026), ('ringnorm', 0.026), ('hart', 0.026), ('noise', 0.026), ('aha', 0.025), ('barinova', 0.025), ('condensation', 0.025), ('dorogovtsev', 0.025), ('elena', 0.025), ('enemy', 0.025), ('enn', 0.025), ('fraser', 0.025), ('grochowski', 0.025), ('jaromczyk', 0.025), ('kaller', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
Author: Elena Marchiori
Abstract: In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection. HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these data sets show that accuracy of 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results indicate that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundance reduction in instance-based learning. Keywords: graph-based training set repre
2 0.068644032 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
Author: Eric Bax, Augusto Callejas
Abstract: This paper introduces a new PAC transductive error bound for classification. The method uses information from the training examples and inputs of working examples to develop a set of likely assignments to outputs of the working examples. A likely assignment with maximum error determines the bound. The method is very effective for small data sets. Keywords: error bound, transduction, nearest neighbor, dynamic programming
3 0.052524395 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
Author: Zongming Ma, Xianchao Xie, Zhi Geng
Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning
4 0.046771079 33 jmlr-2008-Evidence Contrary to the Statistical View of Boosting
Author: David Mease, Abraham Wyner
Abstract: The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that there remain many unanswered important questions. Furthermore, we provide examples that reveal crucial flaws in the many practical suggestions and new methods that are derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences. Keywords: boosting algorithms, LogitBoost, AdaBoost
5 0.042327393 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
Author: Arthur D. Szlam, Mauro Maggioni, Ronald R. Coifman
Abstract: Harmonic analysis and diffusion on discrete data has been shown to lead to state-of-the-art algorithms for machine learning tasks, especially in the context of semi-supervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learned, interpolated, etc.) are smooth with respect to the geometry of the data. In this paper we present a method for modifying the given geometry so the function(s) to be studied are smoother with respect to the modified geometry, and thus more amenable to treatment using harmonic analysis methods. Among the many possible applications, we consider the problems of image denoising and transductive classification. In both settings, our approach improves on standard diffusion based methods. Keywords: diffusion processes, diffusion geometry, spectral graph theory, image denoising, transductive learning, semi-supervised learning
6 0.042231243 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
7 0.04146434 58 jmlr-2008-Max-margin Classification of Data with Absent Features
9 0.036313485 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
10 0.033400215 54 jmlr-2008-Learning to Select Features using their Properties
11 0.032520313 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
12 0.031721957 45 jmlr-2008-JNCC2: The Java Implementation Of Naive Credal Classifier 2 (Machine Learning Open Source Software Paper)
13 0.031203635 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
14 0.030594988 96 jmlr-2008-Visualizing Data using t-SNE
15 0.030501092 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
16 0.030433292 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
17 0.03014957 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data
18 0.028959597 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs (Special Topic on Causality)
19 0.028677246 42 jmlr-2008-HPB: A Model for Handling BN Nodes with High Cardinality Parents
20 0.028139465 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
topicId topicWeight
[(0, 0.149), (1, 0.029), (2, 0.097), (3, -0.015), (4, -0.073), (5, 0.054), (6, 0.055), (7, 0.007), (8, 0.024), (9, -0.007), (10, -0.055), (11, 0.076), (12, -0.035), (13, 0.079), (14, -0.012), (15, 0.103), (16, 0.068), (17, 0.008), (18, -0.024), (19, 0.019), (20, -0.041), (21, 0.024), (22, -0.015), (23, 0.191), (24, -0.122), (25, 0.032), (26, -0.014), (27, 0.015), (28, -0.262), (29, -0.125), (30, 0.02), (31, -0.001), (32, 0.297), (33, -0.202), (34, -0.309), (35, -0.033), (36, -0.128), (37, -0.033), (38, -0.182), (39, 0.098), (40, 0.266), (41, 0.034), (42, 0.011), (43, -0.075), (44, -0.184), (45, 0.075), (46, -0.059), (47, 0.302), (48, 0.141), (49, 0.202)]
simIndex simValue paperId paperTitle
same-paper 1 0.94603193 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
Author: Elena Marchiori
Abstract: In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection. HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these data sets show that accuracy of 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results indicate that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundance reduction in instance-based learning. Keywords: graph-based training set repre
2 0.36546391 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
Author: Eric Bax, Augusto Callejas
Abstract: This paper introduces a new PAC transductive error bound for classification. The method uses information from the training examples and inputs of working examples to develop a set of likely assignments to outputs of the working examples. A likely assignment with maximum error determines the bound. The method is very effective for small data sets. Keywords: error bound, transduction, nearest neighbor, dynamic programming
3 0.30085105 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
Author: Arthur D. Szlam, Mauro Maggioni, Ronald R. Coifman
Abstract: Harmonic analysis and diffusion on discrete data has been shown to lead to state-of-the-art algorithms for machine learning tasks, especially in the context of semi-supervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learned, interpolated, etc.) are smooth with respect to the geometry of the data. In this paper we present a method for modifying the given geometry so the function(s) to be studied are smoother with respect to the modified geometry, and thus more amenable to treatment using harmonic analysis methods. Among the many possible applications, we consider the problems of image denoising and transductive classification. In both settings, our approach improves on standard diffusion based methods. Keywords: diffusion processes, diffusion geometry, spectral graph theory, image denoising, transductive learning, semi-supervised learning
4 0.20825544 33 jmlr-2008-Evidence Contrary to the Statistical View of Boosting
Author: David Mease, Abraham Wyner
Abstract: The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that there remain many unanswered important questions. Furthermore, we provide examples that reveal crucial flaws in the many practical suggestions and new methods that are derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences. Keywords: boosting algorithms, LogitBoost, AdaBoost
5 0.20111337 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, Sathiya S. Keerthi
Abstract: Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms have recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VM algorithms is studied together, under a common experimental setting. Keywords: semi-supervised learning, support vector machines, non-convex optimization, transductive learning
8 0.17297363 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
9 0.17088322 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
10 0.16746762 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
11 0.16698968 42 jmlr-2008-HPB: A Model for Handling BN Nodes with High Cardinality Parents
13 0.15649764 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
14 0.15326872 14 jmlr-2008-An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons
15 0.15312518 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
16 0.15035684 2 jmlr-2008-A Library for Locally Weighted Projection Regression (Machine Learning Open Source Software Paper)
17 0.14464177 46 jmlr-2008-LIBLINEAR: A Library for Large Linear Classification (Machine Learning Open Source Software Paper)
18 0.14446416 96 jmlr-2008-Visualizing Data using t-SNE
19 0.14193428 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
20 0.13915138 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
topicId topicWeight
[(0, 0.037), (5, 0.029), (31, 0.013), (40, 0.052), (43, 0.424), (54, 0.032), (58, 0.041), (66, 0.042), (76, 0.054), (88, 0.084), (92, 0.028), (94, 0.053), (99, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.69929999 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
Author: Elena Marchiori
Abstract: In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection. HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these data sets show that accuracy of 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results indicate that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundance reduction in instance-based learning. Keywords: graph-based training set repre
2 0.30452472 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
Author: Hsuan-Tien Lin, Ling Li
Abstract: Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble through embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with the stump kernel or the perceptron kernel performs similarly to SVM with the Gaussian radial basis function kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice. Keywords: ensemble learning, boosting, support vector machine, kernel
3 0.30228326 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
4 0.29964197 58 jmlr-2008-Max-margin Classification of Data with Absent Features
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers in structured domains, where some objects have a subset of features that are inherently absent due to complex relationships between the features. Unlike the case where a feature exists but its value is not observed, here we focus on the case where a feature may not even exist (structurally absent) for some of the samples. The common approach for handling missing features in discriminative models is to first complete their unknown values, and then use a standard classification procedure over the completed data. This paper focuses on features that are known to be non-existing, rather than have an unknown value. We show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate an objective function, based on the geometric interpretation of the margin, that aims to maximize the margin of each sample in its own relevant subspace. In this formulation, the linearly separable case can be transformed into a binary search over a series of second order cone programs (SOCP), a convex problem that can be solved efficiently. We also describe two approaches for optimizing the general case: an approximation that can be solved as a standard quadratic program (QP) and an iterative approach for solving the exact problem. By avoiding the pre-processing phase in which the data is completed, both of these approaches could offer considerable computational savings. More importantly, we show that the elegant handling of missing values by our approach allows it to both outperform other methods when the missing values have non-trivial structure, and be competitive with other methods when the values are missing at random. We demonstrate our results on several standard benchmarks and two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. Keywords: max margin, missing features, network reconstruction, metabolic pa
5 0.29726541 57 jmlr-2008-Manifold Learning: The Price of Normalization
Author: Yair Goldberg, Alon Zakai, Dan Kushnir, Ya'acov Ritov
Abstract: We analyze the performance of a class of manifold-learning algorithms that find their output by minimizing a quadratic form under some normalization constraints. This class consists of Locally Linear Embedding (LLE), Laplacian Eigenmap, Local Tangent Space Alignment (LTSA), Hessian Eigenmaps (HLLE), and Diffusion maps. We present and prove conditions on the manifold that are necessary for the success of the algorithms. Both the finite sample case and the limit case are analyzed. We show that there are simple manifolds in which the necessary conditions are violated, and hence the algorithms cannot recover the underlying manifolds. Finally, we present numerical results that demonstrate our claims. Keywords: dimensionality reduction, manifold learning, Laplacian eigenmap, diffusion maps, locally linear embedding, local tangent space alignment, Hessian eigenmap
6 0.29591671 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
8 0.29171002 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
9 0.29081786 83 jmlr-2008-Robust Submodular Observation Selection
10 0.29030797 86 jmlr-2008-SimpleMKL
11 0.2896125 96 jmlr-2008-Visualizing Data using t-SNE
12 0.28930789 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
13 0.28918505 56 jmlr-2008-Magic Moments for Structured Output Prediction
14 0.28885624 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
15 0.28779155 9 jmlr-2008-Active Learning by Spherical Subdivision
16 0.28683347 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
17 0.28653094 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
18 0.28488117 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
19 0.28229842 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
20 0.28040341 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration