jmlr jmlr2008 jmlr2008-43 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Elena Marchiori
Abstract: In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection. HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these data sets show that accuracy of 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results indicate that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundance reduction in instance-based learning. Keywords: graph-based training set representation, nearest neighbor, instance selection for instance-based learning
Reference: text
sentIndex sentText sentNum sentScore
1 This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. [sent-5, score-0.834]
2 We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. [sent-6, score-0.523]
3 HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. [sent-9, score-0.491]
4 Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. [sent-12, score-0.509]
5 Keywords: graph-based training set representation, nearest neighbor, instance selection for instance-based learning 1. [sent-14, score-0.35]
6 Introduction In supervised learning, a machine learning algorithm is given a training set, consisting of training examples called labeled instances (here also called points). [sent-15, score-0.228]
7 In particular, the 1-NN rule classifies an unknown point into the class of the nearest of the training set points. [sent-18, score-0.283]
8 Moreover, when the training set contains noisy instances, generalization accuracy can be negatively affected if these instances are stored as well (see Wilson and Martinez, 2000). [sent-24, score-0.233]
9 Instance selection algorithms tackle these issues by selecting a subset of the training set in order to reduce storage and possibly also enhance accuracy of the 1-NN rule on new instances (generalization performance). [sent-25, score-0.477]
10 In an HMN, nodes are instances of the considered training set. [sent-27, score-0.206]
11 Edges are defined as follows: for each node x and for each class, there is a directed edge from x to its nearest neighbor among training set instances belonging to that class. [sent-28, score-0.521]
12 Thus HMN represents a ’more specific’ nearest neighbor relation, namely between points from each pair of classes. [sent-29, score-0.414]
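The construction just described lends itself to a short sketch. The snippet below is a minimal illustration, not the author's code: it assumes a NumPy feature matrix, Euclidean distance, and arbitrary tie-breaking, and the name build_hmn is hypothetical.

```python
import numpy as np

def build_hmn(X, y):
    """Hit miss network as an adjacency dict: for every point i and every class c,
    add a directed edge from i to its nearest neighbor among the points of class c.
    A sketch assuming Euclidean distance; ties are broken arbitrarily."""
    n = len(X)
    edges = {i: [] for i in range(n)}
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point never counts as its own neighbor
    for i in range(n):
        for c in np.unique(y):
            candidates = np.where(y == c)[0]
            candidates = candidates[candidates != i]
            if candidates.size == 0:
                continue  # class c contains no other point
            j = candidates[np.argmin(dists[i, candidates])]
            edges[i].append(j)  # directed edge i -> nearest point of class c
    return edges
```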
13 We show that structural properties of HMN’s correspond to properties of training instances related to the decision boundary of the 1-NN rule, such as being border or central point. [sent-33, score-0.209]
14 We prove that instance selection by means of this algorithm does not change the 1-NN classification of instances in the original training set. [sent-37, score-0.271]
15 Edited Nearest Neighbor (E-NN), designed for noise reduction (Wilson, 1972), and two state-of-the-art editing algorithms: Iterative Case Filtering (ICF) (Brighton and Mellish, 1999) and the best of the Decremental Reduction Optimization algorithms introduced in Wilson and Martinez (1997) (DROP3). [sent-44, score-0.321]
16 1 Related Work Graphs have been successfully used for representing relations between points of a given data set, such as functional interaction between proteins (protein-protein interaction networks) or proximity (nearest neighbor graphs) (Dorogovtsev and Mendes, 2003). [sent-48, score-0.35]
17 Proximity graphs are defined as graphs in which points close to each other by some definition of closeness are connected (Barnett, 1976). [sent-50, score-0.201]
18 The nearest neighbor graph (NNG) is a typical example of proximity graph, where each vertex is a data point that is joined by an edge to its nearest neighbor. [sent-51, score-0.639]
19 The Voronoi diagram and correspondingly the DT of a point set capture all the proximity information about the point set because they represent the original 1-NN rule decision boundary. [sent-57, score-0.2]
20 Second, HMN’s are directed graphs, while the above proximity graphs are not. [sent-61, score-0.207]
21 A class of directed proximity graphs, called class cover catch digraphs (CCCD’s), has been introduced in Marchette et al. [sent-62, score-0.227]
22 While both HMN’s and CCCD’s are directed graphs, they describe different relations: HMN’s describe the nearest neighbor relation between points of each pair of classes, while CCCD’s describe the relation between maximal covering balls and target instances of one class. [sent-73, score-0.581]
23 Representations of a data set based on proximity graphs have been used to define algorithms for reducing the size of the training set (for instance, Bhattacharya, 1982), for removing noisy instances (for instance, Sánchez et al. [sent-74, score-0.331]
24 , 1997), and for detecting critical instances close to the decision boundary (for instance, Bhattacharya and Kaller, 1998), in order to improve storage and accuracy of 1-NN. [sent-75, score-0.377]
25 (1984) a so-called Voronoi condensed data set is introduced, obtained by discarding all those points whose Voronoi cell shares faces only with cells that contain points of the same class. [sent-77, score-0.229]
26 Faster algorithms for instance selection based on the GG and the Reduced Neighborhood graph (Jaromczyk and Toussaint, 1992) have been proposed, for instance in Bhattacharya and Kaller (1998), Bhattacharya et al. [sent-82, score-0.201]
27 For a thorough survey of graph-based methods for nearest neighbor classification, the reader is referred to Toussaint (2002). [sent-91, score-0.317]
28 In an HMN of X, a directed edge from point x to y is defined if y is the nearest neighbor of x in the class of y. [sent-105, score-0.357]
29 When the classes of x and y are the same, we call x a hit of y, otherwise a miss of y. [sent-107, score-0.353]
30 The name hit miss network is derived from these terms. [sent-108, score-0.353]
31 A hit of x (respectively, miss of x) is any point y such that e = (y, x) is an edge of G and label(y) = label(x) (respectively, label(y) ≠ label(x)). [sent-131, score-0.353]
32 We call hit-degree (respectively miss-degree) of x the number of hit (respectively miss) nodes of x. [sent-132, score-0.223]
33 Hit(x) (respectively Miss(x)) denotes the set of hit (respectively miss) nodes of x. [sent-133, score-0.223]
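Given the edge structure, the hit and miss sets and their degrees follow directly from the in-edges of each node. A small sketch building on the hypothetical build_hmn() output above:

```python
def hit_miss_degrees(edges, y):
    """Hit(x): in-neighbors of x with the same label; Miss(x): in-neighbors of x
    with a different label. Hit- and miss-degree are the sizes of these sets."""
    hit = {i: [] for i in edges}
    miss = {i: [] for i in edges}
    for src, targets in edges.items():
        for dst in targets:
            (hit if y[src] == y[dst] else miss)[dst].append(src)
    hit_degree = {i: len(hit[i]) for i in edges}
    miss_degree = {i: len(miss[i]) for i in edges}
    return hit, miss, hit_degree, miss_degree
```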
34 Observe that the two points with zero in-degree are relatively isolated and far from points of the opposite class, while points with high miss-degree are closer to points of the opposite class and to the 1-NN decision boundary. [sent-135, score-0.466]
35 For instance, using kd trees, whose construction takes time proportional to n log(n), nearest neighbor search exhibits approximately O(n^{1/2}) behavior (Grother et al. [sent-138, score-0.317]
36 A recent fast all nearest neighbor algorithm for applications involving large point-clouds is introduced in Sankaranarayanan et al. [sent-140, score-0.317]
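As an illustration of how the all-nearest-neighbor computation can be sped up in practice, the sketch below uses SciPy's k-d tree; the library choice is an assumption of this example and is not prescribed by the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def all_nearest_neighbors(X):
    """Index of the nearest other point for every point, via a k-d tree
    instead of the O(n^2) brute-force distance matrix."""
    tree = cKDTree(X)
    # k=2: the closest match of each query point is the point itself,
    # so take the second column (assumes no exact duplicate points).
    _, idx = tree.query(X, k=2)
    return idx[:, 1]
```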
37 HMN’s describe the nearest neighbor relation over pairs of points from each pair of classes of the training set. [sent-143, score-0.478]
38 In particular, in the one nearest neighbor algorithm (1-NN) the class label of a new instance is that of the stored instance with minimum distance. [sent-172, score-0.483]
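A minimal sketch of the 1-NN rule itself, assuming Euclidean distance and NumPy arrays (the function name is illustrative):

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """Label each test point with the class of its closest stored instance."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]
```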
39 Instance selection techniques, here also called editing techniques, select a subset of the training set in order to improve the storage and possibly the generalization performance of an instance-based learning algorithm. [sent-190, score-0.507]
40 • Competence preservation algorithms compute a training set consistent subset by removing irrelevant points that do not affect the classification accuracy of the training set (see for instance Angiulli, 2007; Dasarathy, 1994). [sent-194, score-0.36]
41 Alternative methods use prototypes instead of instances of the training set (see for instance Pekalska et al. [sent-198, score-0.23]
42 1003 M ARCHIORI In particular, in Brighton and Mellish (2002) the authors compare experimentally Edited Nearest Neighbor (E-NN) and the state-of-the-art editing algorithms Iterative Case Filtering (ICF) and Decremental Reduction Optimization Procedure 3 (DROP3). [sent-202, score-0.239]
43 E-NN is generally included in comparative experimental analyses of editing methods, mainly because it provides useful information on the amount of ’noisy’ instances contained in the considered data sets and on the accuracy improvement obtained by their removal. [sent-203, score-0.408]
44 These algorithms, together with E-NN, are described in more detail below and used to assess comparatively the performance of the HMN-based editing algorithms we propose. [sent-207, score-0.239]
45 1 Edited Nearest Neighbor Wilson (1972) introduced the Edited Nearest Neighbor (E-NN), where each point x is removed from X if it does not agree with the majority of its K nearest neighbors. [sent-209, score-0.246]
46 This editing rule removes noisy points as well as points close to the decision boundary, yielding smoother decision boundaries. [sent-210, score-0.632]
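A hedged sketch of this editing rule; k=3 is an assumption of the example, not a value quoted from the paper, and majority ties are broken arbitrarily here.

```python
import numpy as np

def wilson_editing(X, y, k=3):
    """E-NN: keep a point only if its label matches the majority label of its
    k nearest neighbors, computed on the full training set in a single pass."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighborhood
    keep = []
    for i in range(len(X)):
        nn = np.argsort(d[i])[:k]
        labels, counts = np.unique(y[nn], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:
            keep.append(i)
    return np.array(keep)
```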
47 2 Iterative Case Filtering In Brighton and Mellish (1999) the Iterative Case Filtering algorithm (ICF) was proposed, which first applies E-NN iteratively until it cannot remove any point, and next iteratively removes other points as follows. [sent-212, score-0.202]
48 The reachability of a point x consists of the points inside the largest hyper-sphere containing only points of the same class as x. [sent-214, score-0.226]
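One common reading of the ICF filtering criterion removes a point whose reachability set is larger than its coverage set (the points that reach it). The sketch below implements a single pass under that reading; it is an illustration, not the authors' implementation, and assumes Euclidean distance and at least two classes present.

```python
import numpy as np

def icf_pass(X, y):
    """One ICF-style filtering pass: flag points whose reachability set is
    larger than their coverage set (assumed removal criterion)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    # distance from each point to its nearest enemy (closest other-class point)
    enemy_dist = np.array([d[i, y != y[i]].min() for i in range(n)])
    # reachability: same-class points strictly closer than the nearest enemy
    reach = [set(np.where((y == y[i]) & (d[i] < enemy_dist[i]))[0]) for i in range(n)]
    cover = [set() for _ in range(n)]
    for i in range(n):
        for j in reach[i]:
            cover[j].add(i)
    return [i for i in range(n) if len(reach[i]) > len(cover[i])]
```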
49 DROP1 is the basic removal rule, which removes a point x from X if the accuracy of the K-NN rule on the set of its associates does not decrease. [sent-219, score-0.283]
50 Each point has a list of K nearest neighbors and a list of associates, which are updated each time a point is removed from X. [sent-220, score-0.284]
51 A point y is an associate of x if x belongs to the set of K nearest neighbors of y. [sent-221, score-0.217]
52 If x is removed then the list of K nearest neighbors of each of its associates y is updated by adding a new neighbor point z, and y is added to the list of associates of z. [sent-222, score-0.52]
53 Moreover, for each of the K nearest neighbors y of x, x is removed from the list of associates of y. [sent-223, score-0.333]
54 DROP2 is obtained from DROP1 by discarding the last update step, hence it considers all associates in the entire training set when testing accuracy performance in the removal rule. [sent-224, score-0.238]
55 Moreover, the removal rule is applied to the points sorted in decreasing order of distance from their nearest neighbor from the other classes (nearest enemy). [sent-225, score-0.541]
56 In this way, points furthest from their nearest enemy are selected first. [sent-226, score-0.301]
57 DROP3 applies a pre-processing step which discards points of X misclassified by their K nearest neighbors, and then applies DROP2. [sent-227, score-0.309]
58 DROP4 uses a stronger pre-processing step which discards points of X misclassified by their K nearest neighbors if their removal does not hurt the classification of other instances. [sent-228, score-0.403]
59 Finally DROP5 modifies DROP2 by considering the reverse order of selection of points, in such a way that instances are considered for removal beginning with instances that are nearest to their nearest enemy. [sent-229, score-0.655]
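The nearest-enemy ordering used by DROP2 and DROP3 is simple to make concrete. The sketch below covers only that ordering step, not the full DROP removal procedure, and assumes Euclidean distance:

```python
import numpy as np

def nearest_enemy_order(X, y):
    """Indices sorted by decreasing distance to the nearest enemy, so points
    farthest from the decision boundary are considered for removal first."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    enemy_dist = np.array([d[i, y != y[i]].min() for i in range(len(X))])
    return np.argsort(-enemy_dist)
```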
60 DROP3 achieves the best mix of storage reduction and generalization accuracy of the DROP methods (see Wilson and Martinez, 2000). [sent-230, score-0.317]
61 Moreover, results of experiments conducted in Wilson and Martinez (1997, 2000) show that DROP3 achieves higher accuracy and smaller storage requirements than several other methods, such as CNN (Hart, 1968), SNN (Ritter et al. [sent-231, score-0.261]
62 Therefore we use DROP3 and ICF as representatives of the state-of-the-art, in order to assess the performance of the HMN-based editing algorithms introduced in the following section. [sent-234, score-0.239]
63 Instance Selection with Hit Miss Networks Zero in-degree nodes of HMN(X) include relatively isolated points, and points not too close to the decision boundary. [sent-236, score-0.217]
64 We call HMN-C (HMN for training set Consistent instance selection) the algorithm that removes from the training set all instances with zero in-degree. [sent-251, score-0.363]
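Since HMN-C is fully determined by the in-degrees of the network, it admits a very small sketch on top of the hypothetical build_hmn() output above (HMN-E and HMN-EI need the additional hit/miss-degree rules described in the paper and are not reproduced here):

```python
def hmn_c(edges, n):
    """HMN-C: keep only instances whose in-degree in the hit miss network
    is non-zero."""
    in_degree = [0] * n
    for targets in edges.values():
        for j in targets:
            in_degree[j] += 1
    return [i for i in range(n) if in_degree[i] > 0]
```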
65 Therefore, a more aggressive removal strategy is adopted in the following instance selection heuristic algorithm, called HMN-E (HMN for Editing), which compares the hit- and miss-degree of each node for deciding whether to remove it. [sent-253, score-0.199]
66 We consider this to be a reasonable class storage lower bound for the condensed 1-NN rule. [sent-273, score-0.198]
67 Observe that the three HMN-based editing algorithms are order independent, that is, their output does not depend on the order in which training points are processed. [sent-290, score-0.4]
68 Moreover, by construction, points removed by HMN-C are also removed by HMN-E, and points removed by HMN-E are also removed by HMN-EI. [sent-291, score-0.462]
69 1 Comparison of the Methods on the XOR Problem Figure 5 shows application of the considered editing algorithms to the training set of a XOR classification task. [sent-293, score-0.303]
70 As expected, points removed by E-NN are close to the decision boundary. [sent-295, score-0.209]
71 ICF and DROP3 also delete ’safe’ points far from the decision boundary (in order to reduce storage requirements). [sent-296, score-0.305]
72 HMN-C removes ’locally’ isolated points, while HMN-E also removes ’safe’ points as well as points close to the decision boundary. [sent-297, score-0.472]
73 Figure 6 plots the sorted in-degrees of the considered XOR training set, where in-degree of points removed by a method are marked with triangles. [sent-300, score-0.259]
74 The majority of points removed by E-NN have high in-degree, showing the tendency of ’noisy’ points to have high in-degree. [sent-302, score-0.261]
75 HMN-E removes more points with high in-degree than E-NN, and it selects points with low, but not zero, degree. [sent-303, score-0.263]
76 Figure 5: Effect of the algorithms on a XOR problem training set: removed points are shown with filled markers. [sent-404, score-0.228]
77 The performance measures used here are (average) test accuracy of the classifier and (average) percentage of the training set removed by the method. [sent-409, score-0.2]
78 For each partition of the data set, each editing algorithm is applied to the training set X from which a subset S is returned. [sent-435, score-0.303]
79 The one nearest neighbor classifier that uses only points of S is applied to the test set. [sent-436, score-0.414]
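The evaluation protocol can be summarized in a short loop. The sketch below assumes the one_nn_predict() helper from the earlier example; the fold count and random seed are assumptions, not the paper's exact experimental settings.

```python
import numpy as np

def evaluate_editing(X, y, edit_fn, n_folds=10, seed=0):
    """Cross-validated evaluation of an editing method: edit_fn returns the
    indices kept from the training fold, 1-NN on the kept subset is scored on
    the test fold, and mean accuracy and mean storage reduction are reported."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    accs, reductions = [], []
    for f in range(n_folds):
        test_idx = folds[f]
        train_idx = np.concatenate([folds[g] for g in range(n_folds) if g != f])
        keep = edit_fn(X[train_idx], y[train_idx])
        S_X, S_y = X[train_idx][keep], y[train_idx][keep]
        pred = one_nn_predict(S_X, S_y, X[test_idx])
        accs.append(np.mean(pred == y[test_idx]))
        reductions.append(100.0 * (1 - len(keep) / len(train_idx)))
    return np.mean(accs), np.mean(reductions)
```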
80 Average and median accuracy and training set reduction percentage for each algorithm over all the 22 data sets are reported near the bottom of the table. [sent-440, score-0.216]
81 A paired t-test is also applied to assess significance of differences in storage reduction percentages for each experiment. [sent-451, score-0.265]
82 • Second, in order to assess whether differences in accuracy and storage reduction on all runs of the entire group of data sets are significant, a non-parametric paired test, the Wilcoxon Signed Ranks test, is applied to compare HMN-EI with each of the other algorithms. [sent-452, score-0.334]
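Both tests are available in SciPy; the sketch below simply runs them on two paired score vectors (per-fold accuracies for the t-test, per-data-set averages for the Wilcoxon test, following the protocol above). The 0.05 significance level is an assumption of the example.

```python
from scipy.stats import ttest_rel, wilcoxon

def compare_paired_scores(scores_a, scores_b, alpha=0.05):
    """Paired t-test and Wilcoxon signed ranks test on two paired score
    vectors; note wilcoxon() raises if all paired differences are zero."""
    _, t_p = ttest_rel(scores_a, scores_b)
    _, w_p = wilcoxon(scores_a, scores_b)
    return {"t_test_p": t_p, "wilcoxon_p": w_p,
            "t_test_significant": t_p < alpha,
            "wilcoxon_significant": w_p < alpha}
```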
83 For these reasons, HMN-EI is chosen for further comparison with state-of-the-art editing algorithms. [sent-484, score-0.239]
84 • On data sets with more than three classes, HMN-EI has worse storage requirements than the other algorithms, but also generally higher accuracy, due to the more conservative editing strategy (Rule 3) HMN-EI uses on data sets with many classes. [sent-864, score-0.402]
85 Storage reduction of HMN-EI is 7, 22, and 11 times better, and 15, 0, and 8 worse, indicating better storage performance of ICF, according to this test. [sent-869, score-0.219]
86 • The three best performing instance selection algorithms, DROP3, ICF and HMN-EI have quadratic computational complexity in the number of instances (which can be reduced by using ad-hoc data structures such as kd-trees). [sent-874, score-0.207]
87 Comparison with results obtained by E-NN, ICF and DROP3 shows improved average accuracy and similar storage requirement of HMN-EI, ICF and DROP3 on these data sets. [sent-879, score-0.232]
88 Conclusions and Future Work This paper proposed a new graph-based representation of a training set and showed how local structural properties of nodes provide information about the closeness of the corresponding points to the decision boundary of the 1-NN rule. [sent-881, score-0.248]
89 We proved that HMN-C removes instances without affecting the accuracy of the 1-NN rule on the original training set (it computes a decision-boundary consistent subset). [sent-883, score-0.368]
90 Results of extensive experiments indicated that HMN-EI significantly improves the generalization accuracy of 1-NN and reduces significantly its storage requirements. [sent-885, score-0.232]
91 We compared experimentally HMN-EI with a popular noise reduction algorithm (E-NN), and two state-of-the-art editing algorithms (ICF and DROP3). [sent-886, score-0.321]
92 Results of extensive experiments on 19 real-life data sets and 3 artificial ones showed that HMN-EI achieved best average accuracy, and storage reduction similar to that of ICF and DROP3. [sent-887, score-0.219]
93 We conducted preliminary experiments to investigate whether using more than one neighbor to classify new points affects the accuracy performance of the condensing algorithms here considered. [sent-890, score-0.354]
94 Fast nearest neighbor condensation for large data sets classification. [sent-908, score-0.342]
95 Minimal consistent set (mcs) identification for optimal nearest neighbor decision systems design. [sent-967, score-0.362]
96 An algorithm for a selective nearest neighbor decision rule. [sent-1079, score-0.362]
97 Prototype selection for the nearest neighbour rule through proximity graphs. [sent-1087, score-0.375]
98 A fast all nearest neighbor algorithm for applications involving large point-clouds. [sent-1093, score-0.317]
99 Proximity graphs for nearest neighbor decision rules: recent progress. [sent-1109, score-0.414]
100 Asymptotic properties of nearest neighbor rules using edited data. [sent-1129, score-0.374]
wordName wordTfidf (topN-words)
[('hmn', 0.605), ('icf', 0.289), ('editing', 0.239), ('hit', 0.181), ('nearest', 0.179), ('miss', 0.172), ('storage', 0.163), ('neighbor', 0.138), ('wilson', 0.13), ('archiori', 0.126), ('iss', 0.126), ('brighton', 0.118), ('proximity', 0.115), ('bhattacharya', 0.113), ('mellish', 0.113), ('martinez', 0.107), ('voronoi', 0.101), ('instances', 0.1), ('points', 0.097), ('cccd', 0.088), ('etworks', 0.077), ('xor', 0.073), ('accuracy', 0.069), ('removes', 0.069), ('wilcoxon', 0.067), ('gabriel', 0.067), ('removed', 0.067), ('instance', 0.066), ('training', 0.064), ('edited', 0.057), ('reduction', 0.056), ('removal', 0.056), ('toussaint', 0.053), ('graphs', 0.052), ('condensing', 0.05), ('decremental', 0.05), ('devinney', 0.05), ('jankowski', 0.05), ('marchette', 0.05), ('priebe', 0.05), ('associates', 0.049), ('wl', 0.048), ('paired', 0.046), ('decision', 0.045), ('filtering', 0.044), ('catch', 0.044), ('benchmark', 0.043), ('bupa', 0.043), ('competence', 0.043), ('uspst', 0.043), ('nodes', 0.042), ('selection', 0.041), ('rule', 0.04), ('directed', 0.04), ('ag', 0.039), ('grother', 0.038), ('raetsch', 0.038), ('vezhnevets', 0.038), ('neighbors', 0.038), ('remove', 0.036), ('banana', 0.035), ('condensed', 0.035), ('label', 0.034), ('cance', 0.034), ('discards', 0.033), ('iris', 0.033), ('isolated', 0.033), ('diabetis', 0.032), ('cccp', 0.032), ('reachability', 0.032), ('chapelle', 0.031), ('cantly', 0.031), ('sorted', 0.031), ('splice', 0.031), ('achieves', 0.029), ('pima', 0.029), ('graph', 0.028), ('cover', 0.028), ('repository', 0.028), ('classi', 0.027), ('median', 0.027), ('covering', 0.027), ('thyroid', 0.026), ('titanic', 0.026), ('german', 0.026), ('affecting', 0.026), ('ringnorm', 0.026), ('hart', 0.026), ('noise', 0.026), ('aha', 0.025), ('barinova', 0.025), ('condensation', 0.025), ('dorogovtsev', 0.025), ('elena', 0.025), ('enemy', 0.025), ('enn', 0.025), ('fraser', 0.025), ('grochowski', 0.025), ('jaromczyk', 0.025), ('kaller', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
Author: Elena Marchiori
Abstract: In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection. HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these data sets show that accuracy of 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results indicate that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundance reduction in instance-based learning. Keywords: graph-based training set repre
2 0.068644032 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
Author: Eric Bax, Augusto Callejas
Abstract: This paper introduces a new PAC transductive error bound for classification. The method uses information from the training examples and inputs of working examples to develop a set of likely assignments to outputs of the working examples. A likely assignment with maximum error determines the bound. The method is very effective for small data sets. Keywords: error bound, transduction, nearest neighbor, dynamic programming
3 0.052524395 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
Author: Zongming Ma, Xianchao Xie, Zhi Geng
Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning
4 0.046771079 33 jmlr-2008-Evidence Contrary to the Statistical View of Boosting
Author: David Mease, Abraham Wyner
Abstract: The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that there remain many unanswered important questions. Furthermore, we provide examples that reveal crucial flaws in the many practical suggestions and new methods that are derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences. Keywords: boosting algorithms, LogitBoost, AdaBoost
5 0.042327393 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
Author: Arthur D. Szlam, Mauro Maggioni, Ronald R. Coifman
Abstract: Harmonic analysis and diffusion on discrete data has been shown to lead to state-of-the-art algorithms for machine learning tasks, especially in the context of semi-supervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learned, interpolated, etc.) are smooth with respect to the geometry of the data. In this paper we present a method for modifying the given geometry so the function(s) to be studied are smoother with respect to the modified geometry, and thus more amenable to treatment using harmonic analysis methods. Among the many possible applications, we consider the problems of image denoising and transductive classification. In both settings, our approach improves on standard diffusion based methods. Keywords: diffusion processes, diffusion geometry, spectral graph theory, image denoising, transductive learning, semi-supervised learning
6 0.042231243 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
7 0.04146434 58 jmlr-2008-Max-margin Classification of Data with Absent Features
9 0.036313485 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
10 0.033400215 54 jmlr-2008-Learning to Select Features using their Properties
11 0.032520313 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
12 0.031721957 45 jmlr-2008-JNCC2: The Java Implementation Of Naive Credal Classifier 2 (Machine Learning Open Source Software Paper)
13 0.031203635 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
14 0.030594988 96 jmlr-2008-Visualizing Data using t-SNE
15 0.030501092 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
16 0.030433292 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
17 0.03014957 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data
18 0.028959597 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs (Special Topic on Causality)
19 0.028677246 42 jmlr-2008-HPB: A Model for Handling BN Nodes with High Cardinality Parents
20 0.028139465 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
topicId topicWeight
[(0, 0.149), (1, 0.029), (2, 0.097), (3, -0.015), (4, -0.073), (5, 0.054), (6, 0.055), (7, 0.007), (8, 0.024), (9, -0.007), (10, -0.055), (11, 0.076), (12, -0.035), (13, 0.079), (14, -0.012), (15, 0.103), (16, 0.068), (17, 0.008), (18, -0.024), (19, 0.019), (20, -0.041), (21, 0.024), (22, -0.015), (23, 0.191), (24, -0.122), (25, 0.032), (26, -0.014), (27, 0.015), (28, -0.262), (29, -0.125), (30, 0.02), (31, -0.001), (32, 0.297), (33, -0.202), (34, -0.309), (35, -0.033), (36, -0.128), (37, -0.033), (38, -0.182), (39, 0.098), (40, 0.266), (41, 0.034), (42, 0.011), (43, -0.075), (44, -0.184), (45, 0.075), (46, -0.059), (47, 0.302), (48, 0.141), (49, 0.202)]
simIndex simValue paperId paperTitle
same-paper 1 0.94603193 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
Author: Elena Marchiori
Abstract: In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection. HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these data sets show that accuracy of 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results indicate that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundance reduction in instance-based learning. Keywords: graph-based training set repre
2 0.36546391 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
Author: Eric Bax, Augusto Callejas
Abstract: This paper introduces a new PAC transductive error bound for classification. The method uses information from the training examples and inputs of working examples to develop a set of likely assignments to outputs of the working examples. A likely assignment with maximum error determines the bound. The method is very effective for small data sets. Keywords: error bound, transduction, nearest neighbor, dynamic programming
3 0.30085105 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
Author: Arthur D. Szlam, Mauro Maggioni, Ronald R. Coifman
Abstract: Harmonic analysis and diffusion on discrete data has been shown to lead to state-of-the-art algorithms for machine learning tasks, especially in the context of semi-supervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learned, interpolated, etc.) are smooth with respect to the geometry of the data. In this paper we present a method for modifying the given geometry so the function(s) to be studied are smoother with respect to the modified geometry, and thus more amenable to treatment using harmonic analysis methods. Among the many possible applications, we consider the problems of image denoising and transductive classification. In both settings, our approach improves on standard diffusion based methods. Keywords: diffusion processes, diffusion geometry, spectral graph theory, image denoising, transductive learning, semi-supervised learning
4 0.20825544 33 jmlr-2008-Evidence Contrary to the Statistical View of Boosting
Author: David Mease, Abraham Wyner
Abstract: The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that there remain many unanswered important questions. Furthermore, we provide examples that reveal crucial flaws in the many practical suggestions and new methods that are derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences. Keywords: boosting algorithms, LogitBoost, AdaBoost
5 0.20111337 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, Sathiya S. Keerthi
Abstract: Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms have recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VM algorithms is studied together, under a common experimental setting. Keywords: semi-supervised learning, support vector machines, non-convex optimization, transductive learning
8 0.17297363 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
9 0.17088322 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
10 0.16746762 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
11 0.16698968 42 jmlr-2008-HPB: A Model for Handling BN Nodes with High Cardinality Parents
13 0.15649764 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
14 0.15326872 14 jmlr-2008-An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons
15 0.15312518 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
16 0.15035684 2 jmlr-2008-A Library for Locally Weighted Projection Regression (Machine Learning Open Source Software Paper)
17 0.14464177 46 jmlr-2008-LIBLINEAR: A Library for Large Linear Classification (Machine Learning Open Source Software Paper)
18 0.14446416 96 jmlr-2008-Visualizing Data using t-SNE
19 0.14193428 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
20 0.13915138 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
topicId topicWeight
[(0, 0.037), (5, 0.029), (31, 0.013), (40, 0.052), (43, 0.424), (54, 0.032), (58, 0.041), (66, 0.042), (76, 0.054), (88, 0.084), (92, 0.028), (94, 0.053), (99, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.69929999 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
Author: Elena Marchiori
Abstract: In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection. HMN-C, which removes instances without affecting accuracy of 1-NN on the original training set, HMN-E, based on a more aggressive storage reduction, and HMN-EI, which applies iteratively HMN-E. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these data sets show that accuracy of 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results indicate that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundance reduction in instance-based learning. Keywords: graph-based training set repre
2 0.30452472 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
Author: Hsuan-Tien Lin, Ling Li
Abstract: Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble through embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with the stump kernel or the perceptron kernel performs similarly to SVM with the Gaussian radial basis function kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice. Keywords: ensemble learning, boosting, support vector machine, kernel
3 0.30228326 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
4 0.29964197 58 jmlr-2008-Max-margin Classification of Data with Absent Features
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers in structured domains, where some objects have a subset of features that are inherently absent due to complex relationships between the features. Unlike the case where a feature exists but its value is not observed, here we focus on the case where a feature may not even exist (structurally absent) for some of the samples. The common approach for handling missing features in discriminative models is to first complete their unknown values, and then use a standard classification procedure over the completed data. This paper focuses on features that are known to be non-existing, rather than have an unknown value. We show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate an objective function, based on the geometric interpretation of the margin, that aims to maximize the margin of each sample in its own relevant subspace. In this formulation, the linearly separable case can be transformed into a binary search over a series of second order cone programs (SOCP), a convex problem that can be solved efficiently. We also describe two approaches for optimizing the general case: an approximation that can be solved as a standard quadratic program (QP) and an iterative approach for solving the exact problem. By avoiding the pre-processing phase in which the data is completed, both of these approaches could offer considerable computational savings. More importantly, we show that the elegant handling of missing values by our approach allows it to both outperform other methods when the missing values have non-trivial structure, and be competitive with other methods when the values are missing at random. We demonstrate our results on several standard benchmarks and two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. Keywords: max margin, missing features, network reconstruction, metabolic pa
5 0.29726541 57 jmlr-2008-Manifold Learning: The Price of Normalization
Author: Yair Goldberg, Alon Zakai, Dan Kushnir, Ya'acov Ritov
Abstract: We analyze the performance of a class of manifold-learning algorithms that find their output by minimizing a quadratic form under some normalization constraints. This class consists of Locally Linear Embedding (LLE), Laplacian Eigenmap, Local Tangent Space Alignment (LTSA), Hessian Eigenmaps (HLLE), and Diffusion maps. We present and prove conditions on the manifold that are necessary for the success of the algorithms. Both the finite sample case and the limit case are analyzed. We show that there are simple manifolds in which the necessary conditions are violated, and hence the algorithms cannot recover the underlying manifolds. Finally, we present numerical results that demonstrate our claims. Keywords: dimensionality reduction, manifold learning, Laplacian eigenmap, diffusion maps, locally linear embedding, local tangent space alignment, Hessian eigenmap
6 0.29591671 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
8 0.29171002 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
9 0.29081786 83 jmlr-2008-Robust Submodular Observation Selection
10 0.29030797 86 jmlr-2008-SimpleMKL
11 0.2896125 96 jmlr-2008-Visualizing Data using t-SNE
12 0.28930789 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
13 0.28918505 56 jmlr-2008-Magic Moments for Structured Output Prediction
14 0.28885624 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
15 0.28779155 9 jmlr-2008-Active Learning by Spherical Subdivision
16 0.28683347 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
17 0.28653094 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
18 0.28488117 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
19 0.28229842 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
20 0.28040341 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration