jmlr jmlr2006 jmlr2006-85 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Janez Demšar
Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams. Keywords: comparative studies, statistical methods, Wilcoxon signed ranks test, Friedman test, multiple comparisons tests
Reference: text
sentIndex sentText sentNum sentScore
1 Keywords: comparative studies, statistical methods, Wilcoxon signed ranks test, Friedman test, multiple comparisons tests [sent-8, score-0.698]
2 Their message has been taken up by the community, and the overly confident paired t-tests over cross validation folds are giving way to the McNemar test and 5×2 cross validation. [sent-17, score-0.31]
3 We observed that many otherwise excellent and innovative machine learning papers end by drawing conclusions from a matrix of, for instance, McNemar’s tests comparing all pairs of classifiers, as if tests for multiple comparisons, such as ANOVA and the Friedman test, were yet to be invented. [sent-21, score-0.911]
4 The core of the paper is the study of the statistical tests that could be (or already are) used for comparing two or more classifiers on multiple data sets. [sent-22, score-0.52]
5 Although some of the tests are quite common in machine learning literature, many researchers seem ignorant about what the tests actually measure and which circumstances they are suitable for. [sent-29, score-0.622]
6 The tests used have, however, long been rather naive and unverified. [sent-34, score-0.371]
7 He examines five statistical tests and concludes the analysis by recommending the newly crafted 5×2cv t-test that overcomes the problem of underestimated variance and the consequently elevated Type I error of the more traditional paired t-test over folds of the usual k-fold cross validation. [sent-39, score-0.568]
8 As Salzberg himself notes, binomial testing lacks the power of the better non-parametric tests and the Bonferroni correction is overly radical. [sent-48, score-0.477]
9 Finally, for comparison of classifiers over multiple data sets, Hull (1994) was, to the best of our knowledge, the first to use non-parametric tests for comparing classifiers in information retrieval and assessment of relevance of documents (see also Schütze et al.). [sent-52, score-0.519]
10 There is a fundamental difference between the tests used to assess the difference between two classifiers on a single data set and the differences over multiple data sets. [sent-92, score-0.573]
11 Since these samples are usually related, a lot of care is needed in designing the statistical procedures and tests that avoid problems with biased estimations of variance. [sent-94, score-0.397]
12 Furthermore, the problem of correct statistical tests for comparing classifiers on a single data set is not related to the comparison on multiple data sets in the sense that we would first have to solve the former problem in order to tackle the latter. [sent-98, score-0.599]
13 Since running the algorithms on multiple data sets naturally gives a sample of independent measurements, such comparisons are even simpler than comparisons on a single data set. [sent-99, score-0.371]
14 Comparisons of Two Classifiers  In the discussion of the tests for comparisons of two classifiers over multiple data sets we will make two points. [sent-103, score-0.526]
15 Another, even more rarely used test is the sign test which is weaker than the Wilcoxon test but also has its distinct merits. [sent-106, score-0.288]
16 PAIRED T-TEST  A common way to test whether the difference between two classifiers’ results over various data sets is non-random is to compute a paired t-test, which checks whether the average difference in their performance over the data sets is significantly different from zero. [sent-121, score-0.283]
17 Ironically, the Kolmogorov-Smirnov and similar tests for testing the normality of distributions have little power on small samples, that is, they are unlikely to detect abnormalities and warn against using the t-test. [sent-136, score-0.466]
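Where the paired t-test over data sets is described above, a minimal sketch may make the computation concrete. This is not code from the paper; the accuracy values are invented placeholders, and SciPy's `ttest_rel` is shown only as a cross-check of the hand-computed statistic.

```python
# Hedged sketch: paired t-test over N data sets (placeholder accuracies, not the paper's tables).
import numpy as np
from scipy import stats

acc_a = np.array([0.85, 0.79, 0.91, 0.66, 0.73, 0.88, 0.70, 0.94])  # classifier A, one value per data set
acc_b = np.array([0.83, 0.80, 0.89, 0.64, 0.75, 0.86, 0.71, 0.92])  # classifier B

d = acc_a - acc_b                                   # per-data-set differences
n = len(d)
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))    # t statistic with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)     # two-sided p value

t_check, p_check = stats.ttest_rel(acc_a, acc_b)    # SciPy's built-in paired t-test, same result
print(t_stat, p_value, t_check, p_check)
```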
18 [Table residue: running head "Statistical Comparisons of Classifiers over Multiple Data Sets" and the data-set column of a results table (adult (sample), breast cancer, breast cancer wisconsin, cmc, ionosphere, iris, liver disorders, lung cancer, lymphography, mushroom, primary tumor, rheum, voting, wine); the classifier columns, starting with C4.5, are truncated in extraction.] [sent-138, score-0.483]
19 The differences are ranked according to their absolute values; average ranks are assigned in case of ties. [sent-195, score-0.295]
20 Let R+ be the sum of ranks for the data sets on which the second algorithm outperformed the first, and R− the sum of ranks for the opposite. [sent-196, score-0.424]
21 The ranks are assigned from the lowest to the highest absolute difference, and the equal differences (0. [sent-214, score-0.295]
22 The sum of ranks for the positive differences is R+ = 93, and the sum of ranks for the negative differences equals R− = 12. [sent-217, score-0.295] [sent-219, score-0.295]
24 The Wilcoxon signed ranks test is more sensible than the t-test. [sent-225, score-0.284]
25 The Wilcoxon test assumes continuous differences di , therefore they should not be rounded to, say, one or two decimals since this would decrease the power of the test due to a high number of ties. [sent-229, score-0.395]
26 When the assumptions of the paired t-test are met, the Wilcoxon signed-ranks test is less powerful than the paired t-test. [sent-230, score-0.344]
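A small sketch of the Wilcoxon signed-ranks procedure as described above: rank the absolute differences (average ranks for ties), sum the ranks of the positive and negative differences, and use the usual normal approximation for larger N. The data are placeholders, and splitting zero differences evenly between the two sums is an assumption of this sketch; SciPy's `wilcoxon` is included only as a cross-check and handles zeros slightly differently by default.

```python
# Hedged sketch of the Wilcoxon signed-ranks test (placeholder data, not the paper's tables).
import numpy as np
from scipy import stats

acc_a = np.array([0.85, 0.79, 0.91, 0.66, 0.73, 0.88, 0.70, 0.94, 0.61, 0.77])
acc_b = np.array([0.83, 0.80, 0.89, 0.64, 0.75, 0.86, 0.70, 0.92, 0.63, 0.74])
d = acc_a - acc_b

ranks = stats.rankdata(np.abs(d))                     # average ranks in case of ties
r_plus = ranks[d > 0].sum() + 0.5 * ranks[d == 0].sum()
r_minus = ranks[d < 0].sum() + 0.5 * ranks[d == 0].sum()
T = min(r_plus, r_minus)

n = len(d)
# Normal approximation, commonly used when N is not very small:
z = (T - n * (n + 1) / 4) / np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_value = 2 * stats.norm.sf(abs(z))

w_stat, p_check = stats.wilcoxon(acc_a, acc_b)        # SciPy's built-in version
print(r_plus, r_minus, T, z, p_value, w_stat, p_check)
```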
27 This test does not assume any commensurability of scores or differences nor does it assume normal distributions, and is thus applicable to any data (as long as the observations, i.e. the results on the different data sets, are independent). [sent-253, score-0.321]
28 Some authors prefer to count only the significant wins and losses, where the significance is determined using a statistical test on each data set, for instance Dietterich’s 5×2cv. [sent-258, score-0.281]
29 This would be a valid argument if statistical tests could distinguish between the random and non-random differences. [sent-260, score-0.343]
30 However, statistical tests only measure how improbable the obtained experimental result would be if the null hypothesis were correct, which is not the same as the (im)probability of the null-hypothesis itself. [sent-261, score-0.393]
31 Contrary to popular belief, counting only the significant wins and losses therefore does not make the tests more reliable but rather less so, since it draws an arbitrary threshold of p < 0.05 between what counts and what does not. [sent-265, score-0.38]
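The sign test mentioned earlier uses only the direction of each difference: under the null hypothesis each classifier wins on a data set with probability 1/2, so the number of wins follows a binomial distribution. A hedged sketch follows; the win/loss counts are illustrative rather than taken from the paper, and dropping ties is an assumption of the sketch.

```python
# Hedged sketch of the sign test on win/loss counts (illustrative numbers, ties dropped).
from math import comb

wins, losses = 11, 3                 # placeholder counts of data sets won by each classifier
n = wins + losses
k = max(wins, losses)

# Two-sided p value under Binomial(n, 0.5): probability of a split at least this uneven.
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
print(p_value)
```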
32 Comparisons of Multiple Classifiers  None of the above tests was designed for reasoning about the means of multiple random variables. [sent-268, score-0.37]
33 When so many tests are made, a certain proportion of the null hypotheses is rejected due to random chance, so listing them makes little sense. [sent-271, score-0.514]
34 Statistics offers more powerful specialized procedures for testing the significance of differences between multiple means. [sent-275, score-0.3]
35 If the between-classifiers variability is significantly larger than the error variability, we can reject the null-hypothesis and conclude that there are some differences between the classifiers. [sent-284, score-0.282]
36 To make pairwise comparisons between the classifiers, the corresponding differences in performances are divided by the standard error and compared with the critical value. [sent-288, score-0.392]
37 The two procedures are thus similar to a t-test, except that the critical values tabulated by Tukey and Dunnett are higher to ensure that there is at most 5 % chance that one of the pairwise differences will be erroneously found significant. [sent-289, score-0.306]
38 We will not describe ANOVA and its post-hoc tests in more detail due to our reservations about the parametric tests and, especially, since these tests are well known and described in the statistical literature (Zar, 1998; Sheskin, 2000). [sent-298, score-1.017]
39 It ranks the algorithms for each data set separately, the best performing algorithm getting the rank of 1, the second best rank 2, and so on. [sent-311, score-0.316]
40 The Friedman test compares the average ranks of the algorithms, R_j = (1/N) ∑_i r_i^j, where r_i^j is the rank of the j-th of k algorithms on the i-th of N data sets. [sent-317, score-0.284]
41 The performance of two classifiers is significantly different if the corresponding average ranks differ by at least the critical difference CD = q_α √(k(k+1)/(6N)). [sent-330, score-0.298]
42 The critical values q_α are based on the Studentized range statistic divided by √2 (Table 5: critical values for post-hoc tests after the Friedman test, part (a)). [sent-371, score-0.517]
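The critical difference above can be computed directly; a short sketch follows. The value of q_α has to come from a table of the Studentized range statistic divided by √2 (such as the paper's Table 5(a)); the 2.569 used below is an assumed illustrative value for α = 0.05 and four classifiers and should be checked against such a table before use.

```python
# Hedged sketch: Nemenyi critical difference CD = q_alpha * sqrt(k(k+1) / (6N)).
import math

def nemenyi_cd(k, n, q_alpha):
    """k = number of classifiers, n = number of data sets, q_alpha taken from a table."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

cd = nemenyi_cd(k=4, n=14, q_alpha=2.569)   # q_alpha is an assumed illustrative value
print(cd)  # two classifiers differ significantly if their average ranks differ by at least cd
```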
43 When all classifiers are compared with a control classifier, we can, instead of the Nemenyi test, use one of the general procedures for controlling the family-wise error in multiple hypothesis testing, such as the Bonferroni correction or similar procedures. [sent-372, score-0.295]
44 Although these methods are generally conservative and can have little power, they are in this specific case more powerful than the Nemenyi test, since the latter adjusts the critical value for making k(k − 1)/2 comparisons while when comparing with a control we only make k − 1 comparisons. [sent-373, score-0.318]
45 The tests differ in the way they adjust the value of α to compensate for multiple comparisons. [sent-376, score-0.37]
46 Although we here use these procedures only as post-hoc tests for the Friedman test, they can be used generally for controlling the family-wise error when multiple hypotheses of possibly various types are tested. [sent-406, score-0.486]
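For comparisons with a control, the rank differences are divided by the standard error √(k(k+1)/(6N)) and converted to p values, which a step procedure such as Holm's then compares against adjusted α levels. The sketch below uses invented average ranks and is an illustration of the general scheme rather than the paper's exact computation.

```python
# Hedged sketch: comparing k - 1 classifiers with a control, then Holm's step-down procedure.
import math
from scipy import stats

k, n = 4, 14                                       # classifiers, data sets (placeholders)
rank_control = 1.6                                 # invented average rank of the control
rank_others = {"A": 2.3, "B": 2.9, "C": 3.2}       # invented average ranks of the others

se = math.sqrt(k * (k + 1) / (6.0 * n))            # standard error of a rank difference
p_vals = {name: 2 * stats.norm.sf(abs(r - rank_control) / se)
          for name, r in rank_others.items()}

def holm(p_values, alpha=0.05):
    """Step-down: compare the smallest p with alpha/m, the next with alpha/(m-1), and so on."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    m, reject = len(p_values), [False] * len(p_values)
    for step, i in enumerate(order):
        if p_values[i] < alpha / (m - step):
            reject[i] = True
        else:
            break                                   # the first non-rejection stops the procedure
    return reject

print(p_vals, holm(list(p_vals.values())))
```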
47 The Friedman test checks whether the measured average ranks are significantly different from the mean rank R_j = 2.5 expected under the null hypothesis. [sent-434, score-0.324]
48 [Table residue: running head and the row labels of the table used in the Friedman example (adult (sample), breast cancer, breast cancer wisconsin, cmc, ionosphere, iris, liver disorders, lung cancer, lymphography, mushroom, primary tumor, rheum, voting, wine, average rank); the classifier columns, starting with C4.5, are truncated in extraction.] [sent-446, score-0.523]
49 The ranks in the parentheses are used in computation of the Friedman test and would usually not be published in an actual paper. [sent-523, score-0.284]
50 For the other tests we have to compute and order the corresponding statistics and p values. [sent-574, score-0.311]
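A sketch of the rank-based computation discussed in this example: rank the classifiers within each data set (rank 1 for the best, average ranks for ties), average the ranks, and compute the Friedman statistic. The accuracy matrix is a placeholder, not the paper's table, and SciPy's `friedmanchisquare` is shown only as a cross-check (it applies a tie correction, so the two numbers can differ slightly when ties occur).

```python
# Hedged sketch: per-data-set ranks, average ranks and the Friedman test (placeholder accuracies).
import numpy as np
from scipy import stats

acc = np.array([                       # rows: data sets, columns: classifiers (invented values)
    [0.82, 0.84, 0.83, 0.80],
    [0.71, 0.70, 0.74, 0.69],
    [0.90, 0.91, 0.91, 0.88],
    [0.65, 0.66, 0.64, 0.62],
    [0.78, 0.80, 0.79, 0.77],
    [0.88, 0.87, 0.89, 0.85],
])
n, k = acc.shape

ranks = np.apply_along_axis(stats.rankdata, 1, -acc)   # rank 1 = best accuracy, ties averaged
avg_ranks = ranks.mean(axis=0)

# Standard Friedman statistic, chi-square distributed with k - 1 degrees of freedom:
chi2_f = 12.0 * n / (k * (k + 1)) * (np.sum(avg_ranks ** 2) - k * (k + 1) ** 2 / 4.0)
p_value = stats.chi2.sf(chi2_f, df=k - 1)

stat_check, p_check = stats.friedmanchisquare(*acc.T)  # one measurement vector per classifier
print(avg_ranks, chi2_f, p_value, stat_check, p_check)
```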
51 GRAPHICAL PRESENTATION OF RESULTS  When multiple classifiers are compared, the results of the post-hoc tests can be visually represented with a simple diagram. [sent-620, score-0.37]
52 Similar graphs for the other post-hoc tests would need to plot a different adjusted critical interval for each classifier and specify the procedure used for testing and the corresponding order of comparisons, which could easily become confusing. [sent-628, score-0.471]
53 Figure 1: Visualization of post-hoc tests for data from Table 6. [sent-1521, score-0.359]
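A minimal matplotlib sketch in the spirit of such a diagram: classifiers are placed on a rank axis at their average ranks, and a bar marks the length of the critical difference. The numbers are placeholders, and this simplified version omits the grouping lines that connect classifiers whose ranks differ by less than CD.

```python
# Hedged sketch of a simplified CD-style diagram (placeholder ranks, not the paper's Figure 1).
import matplotlib.pyplot as plt

avg_ranks = {"clf A": 1.8, "clf B": 2.2, "clf C": 3.1, "clf D": 3.9}  # invented average ranks
cd = 1.25                                                             # invented critical difference

fig, ax = plt.subplots(figsize=(6, 2))
for i, (name, r) in enumerate(sorted(avg_ranks.items(), key=lambda kv: kv[1])):
    ax.plot([r, r], [0, 0.2 + 0.1 * i], color="black")                # tick up from the rank axis
    ax.text(r, 0.25 + 0.1 * i, f"{name} ({r})", ha="center", fontsize=8)
ax.plot([1.0, 1.0 + cd], [-0.15, -0.15], color="black", linewidth=3)  # bar showing the CD length
ax.text(1.0 + cd / 2, -0.3, "CD", ha="center", fontsize=8)
ax.axhline(0, color="black", linewidth=1)                             # the rank axis itself
ax.set_xlim(0.5, len(avg_ranks) + 0.5)
ax.set_ylim(-0.4, 0.8)
ax.get_yaxis().set_visible(False)
for side in ("left", "right", "top"):
    ax.spines[side].set_visible(False)
plt.tight_layout()
plt.show()
```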
54 Experimental Setup  We examined the behaviour of the studied tests through the experiments in which we repeatedly compared the learning algorithms on sets of ten randomly drawn data sets and recorded the p values returned by the tests. [sent-2130, score-0.359]
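The protocol just described can be sketched as follows for the unbiased case (no bias toward data sets with large differences): repeatedly draw ten items at random, run a test, and record the p value. The pool of per-data-set differences below is synthetic, standing in for the real collection of data sets and classifiers used in the paper.

```python
# Hedged sketch of the resampling protocol (synthetic differences, unbiased selection only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pool = rng.normal(loc=0.01, scale=0.03, size=40)    # synthetic per-data-set score differences

n_repeats, sample_size, alpha = 1000, 10, 0.05
p_values = []
for _ in range(n_repeats):
    d = rng.choice(pool, size=sample_size, replace=False)   # draw ten "data sets" at random
    p_values.append(stats.wilcoxon(d).pvalue)               # record the p value of one test
p_values = np.asarray(p_values)

print(np.mean(p_values), np.mean(p_values < alpha))  # average p and fraction of rejections
```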
55 Since the statistical tests which we use are theoretically well understood, we do not need to test the tests but the compliance of the real-world data to their assumptions. [sent-2158, score-0.798]
56 As already mentioned in the description of the t-test, the tests like the Kolmogorov-Smirnov test of normality are unreliable on small samples where they are very unlikely to detect abnormalities. [sent-2162, score-0.46]
57 And even if we did have suitable tests at our disposal, they would only compute the degree of (ab)normality of the distribution, non-homogeneity of variance, etc., and not the sample’s suitability for the t-test. [sent-2163, score-0.366]
58 MEASURES OF POWER AND REPLICABILITY  Formally, the power of a statistical test is defined as the probability that the test will (correctly) reject a false null-hypothesis. [sent-2169, score-0.397]
59 The two measures for assessing the power of the tests lead to two related measures of replicability. [sent-2176, score-0.363]
60 The disadvantage of this measure is that a statistical test will show a low replicability when the difference between the classifiers is marginally significant. [sent-2183, score-0.35]
61 When comparing two tests of different power, the one with results closer to the chosen α will usually be deemed as less reliable. [sent-2184, score-0.381]
62 To allow for comparisons with Bouckaert’s R(e), we define the replicability with respect to the variance of p as R(p) = 1 − 2·var(p) = 1 − 2·∑_i (p_i − p̄)² / (n − 1). [sent-2188, score-0.385]
63 A problem with this measure of replicability when used in our experimental procedure is that when the bias k increases, the variability of the data set selection decreases and so does the variance of p. [sent-2189, score-0.379]
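The variance-based replicability just defined is straightforward to compute; a small helper is sketched below. The p values passed in are placeholders.

```python
# Hedged sketch: R(p) = 1 - 2 * var(p), using the (n - 1)-denominator variance from the text.
import numpy as np

def replicability_p(p_values):
    p = np.asarray(p_values, dtype=float)
    return 1.0 - 2.0 * p.var(ddof=1)

print(replicability_p([0.03, 0.04, 0.06, 0.02, 0.05]))   # placeholder p values
```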
64 Comparisons of Two Classifiers  We have tested four statistics for comparisons of two classifiers: the paired t-test on absolute and on relative differences, the Wilcoxon test and the sign test. [sent-2198, score-0.295]
65 The graphs on the left hand side of Figure 3 show the average p values returned by the tests as a function of the bias k when comparing C4. [sent-2200, score-0.381]
66 To demonstrate the relation between power (as we measure it) and Bouckaert’s measure of replicability, we have added a right-hand axis that shows the R(e) corresponding to the number of rejected hypotheses. [sent-2207, score-0.47]
67 Therefore, at k = 0 the tests reflect the number of rejections of the null-hypothesis on a completely random selection of data sets from our collection. [sent-2210, score-0.417]
68 The numbers below the diagonal show the average p values and the related replicability R(p), and the numbers above the diagonal represent the number of experiments in which the null-hypothesis was rejected at α = 5% and the related R(e). [sent-2221, score-0.363]
69 The table again shows that the Wilcoxon test almost always returns lower p values than other tests and more often rejects the null hypothesis. [sent-2222, score-0.454]
70 R(e), on the other hand, again prefers other tests with p values farther from the critical 0.05. [sent-2224, score-0.421]
71 Overall, it is known that parametric tests are more likely to reject the null-hypothesis than the non-parametric unless their assumptions are violated. [sent-2226, score-0.52]
72 Figure 3: Power of statistical tests for comparison of two classifiers. [sent-5936, score-0.374]
73 Figure 4: Replicability of tests for comparison of two classifiers: variance-based R(p) (left) and Bouckaert’s R(e) (right). [sent-10070, score-0.342]
74 [Table residue: panel (b), paired t-test on relative differences, over the classifiers c45, c45-m, c45-cf, tree, bayes, disc-bayes, knn; the numeric entries are truncated in extraction.] [sent-10197, score-0.326]
75 [Table residue: panel (c), Wilcoxon signed-ranks test, over the same classifiers.] [sent-10260, score-0.315]
76 Altogether, replicability seems somewhat smaller than the replicability of the tests for comparisons of two classifiers. [sent-11624, score-0.863]
77 Therefore, as common sense would suggest, when comparing multiple classifiers, it is even more important to conduct the tests on as many data sets as possible. [sent-11625, score-0.488]
78 Figure 7 compares post hoc tests for comparisons with a control classifier, using the same two ways of selecting data sets as in Figure 6. [sent-11635, score-0.467]
79 When the differences are large, the power of all tests is comparable, while when they are smaller the number of rejections for the parametric test seems to lag behind (we have observed this same pattern on other combinations of algorithms). [sent-11636, score-0.676]
80 The order of the non-parametric tests is as expected from the theory, although it is interesting to note that the Holm and Hochberg tests give practically equal results. [sent-11637, score-0.622]
81 Figure 6: Power of statistical tests for comparison of multiple classifiers. [sent-12884, score-0.433]
82 These experiments again seem to favour the non-parametric tests over the parametric ones although not always as convincingly as in the case of comparisons of two classifiers. [sent-12888, score-0.501]
83 Due to its theoretical and practical advantages (ease of computation and interpretation, and the ability to present the overall performance of classifiers in the form of ranks instead of dubious averages), the Friedman test should be preferred over ANOVA. [sent-12889, score-0.38]
84 The corresponding non-parametric post-hoc tests give similar results, so it is upon the researcher to decide whether the slightly more powerful Hommel test is worth the complexity of its calculation as compared to the much simpler Holm test. [sent-12890, score-0.472]
85 Figure 7: Power of statistical tests for comparison of multiple classifiers with a control. [sent-15492, score-0.433]
86 There is, however, no gold standard for making such comparisons, and the tests performed often have dubious statistical foundations and lead to unwarranted and unverified conclusions. [sent-15498, score-0.451]
87 The problems with the multiple data set tests are quite different, even in a sense complementary: the measurements from different data sets are usually incommensurate, and the normality of their distributions and the homogeneity of variance is questionable at best. [sent-15501, score-0.604]
88 Based on the well known statistical properties of the tests and our knowledge of the machine learning data, we concluded that the non-parametric tests should be preferred over the parametric ones. [sent-15504, score-0.706]
89 We varied the differences between the classifiers by biasing the selection of data sets, and measured the likelihood of rejection of the null-hypothesis and the replicability of the test. [sent-15506, score-0.41]
90 We have indeed found that the non-parametric tests are more likely to reject the null-hypothesis, which hints at the presence of outliers or violations of assumptions of the parametric tests and confirms our theoretical misgivings about them. [sent-15507, score-0.831]
91 The empirical analysis also shows that replicability of the tests might be a problem, thus the actual experiments should be conducted on as many data sets as possible. [sent-15508, score-0.581]
92 They are safer than parametric tests since they do not assume normal distributions or homogeneity of variance. [sent-15515, score-0.393]
93 Empirical results suggest that they are also stronger than the other tests studied. [sent-15517, score-0.311]
94 There is an alternative opinion among statisticians that significance tests should not be performed at all since they are often misused, either due to misinterpretation or by putting too much stress on their results (Cohen, 1994; Schmidt, 1996; Harlow and Mulaik, 1997). [sent-15521, score-0.311]
95 Our stance is that statistical tests provide certain reassurance about the validity and non-randomness of the published results. [sent-15522, score-0.343]
96 On the other hand, statistical tests should not be the deciding factor for or against publishing the work. [sent-15524, score-0.343]
97 Evaluating the replicability of significance tests for comparing learning algorithms. [sent-15577, score-0.603]
98 Approximate statistical tests for comparing supervised classification learning algorithms. [sent-15607, score-0.413]
99 A comparison of alternative tests of significance for the problem of m rankings. [sent-15639, score-0.342]
100 A sharper Bonferroni procedure for multiple tests of significance. [sent-15655, score-0.37]