nips nips2003 nips2003-181 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alice X. Zheng, Michael I. Jordan, Ben Liblit, Alex Aiken
Abstract: We present a novel strategy for automatically debugging programs given sampled data from thousands of actual user runs. Our goal is to pinpoint those features that are most correlated with crashes. This is accomplished by maximizing an appropriately defined utility function. It has analogies with intuitive debugging heuristics, and, as we demonstrate, is able to deal with various types of bugs that occur in real programs. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We present a novel strategy for automatically debugging programs given sampled data from thousands of actual user runs. [sent-11, score-0.302]
2 This is accomplished by maximizing an appropriately defined utility function. [sent-13, score-0.211]
3 It has analogies with intuitive debugging heuristics, and, as we demonstrate, is able to deal with various types of bugs that occur in real programs. [sent-14, score-0.528]
4 Most users take software bugs for granted, and willingly run buggy programs every day with little complaint. [sent-16, score-0.681]
5 In some sense, these user runs of the program are the ideal test suite any software engineer could hope for. [sent-17, score-0.489]
6 User crash reports are used to direct debugging efforts toward those bugs which seem to affect the most people. [sent-19, score-0.72]
7 In earlier work [1] we presented a program sampling framework that collects data from users at minimal cost; the aggregated runs are then analyzed to isolate the bugs. [sent-22, score-0.527]
8 In this paper, we describe how to design a single classification utility function that integrates the various debugging heuristics. [sent-25, score-0.33]
9 In particular, determinism of some features is a significant issue in this domain, and an additional penalty term for false positives is included to deal with this aspect. [sent-26, score-0.593]
10 Furthermore, utility levels, while subjective, are robust: we offer simple guidelines for their selection, and demonstrate that results remain stable and strong across a wide range of reasonable parameter settings. [sent-27, score-0.247]
11 We start by briefly describing the program sampling framework in Section 2, and present the feature selection framework in Section 3. [sent-28, score-0.452]
12 2 Program Sampling Framework Our approach relies on being able to collect information about program behavior at runtime. [sent-30, score-0.244]
13 We scatter a large number of checks in the program code, but do not execute all of them during any single run. [sent-32, score-0.294]
14 At runtime, the program tosses a coin (with low heads probability) independently for each assertion it encounters, and decides whether or not to execute the assertion. [sent-38, score-0.449]
15 However, while generating a random coin toss is not expensive, doing so separately for each assertion would incur very large overhead; the program would run even slower than if it simply executed every assertion. [sent-39, score-0.5]
16 Each assertion decrements this countdown by 1; when it reaches 0, we perform the assertion and generate another geometric random variable. [sent-46, score-0.368]
17 1 However, checking to see if the counter has reached 0 at every assertion is still an expensive procedure. [sent-47, score-0.275]
18 For further code optimization, we analyze each contiguous acyclic code region (loops- and recursion-free) at compile time and count the maximum number of assertions on any path through that region. [sent-48, score-0.229]
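The countdown scheme described above can be sketched in a few lines. The following Python is an illustrative model only — the real framework instruments C code at compile time — and the names `Sampler` and `next_countdown`, as well as the 1/100 density, are assumptions made for this example.

```python
import random

SAMPLING_DENSITY = 0.01  # heads probability h; 1/100 as in the ccrypt runs (assumption)

def next_countdown(h=SAMPLING_DENSITY):
    # Draw a geometric random variable: the number of coin tosses,
    # each with heads probability h, until the first head.
    count = 1
    while random.random() >= h:
        count += 1
    return count

class Sampler:
    def __init__(self, h=SAMPLING_DENSITY):
        self.h = h
        self.countdown = next_countdown(h)
        self.counts = {}  # predicate id -> [times observed false, times observed true]

    def check(self, predicate_id, value):
        # Called at every instrumentation site. The predicate is recorded
        # only when the countdown reaches 0, after which a fresh geometric
        # countdown is drawn.
        self.countdown -= 1
        if self.countdown == 0:
            c = self.counts.setdefault(predicate_id, [0, 0])
            c[1 if value else 0] += 1
            self.countdown = next_countdown(self.h)
```

Drawing the gap to the next sample directly is statistically identical to tossing the coin at every site, but pays the random-number cost only once per sample rather than once per assertion.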
19 Samples are taken in chronological order as the program runs. [sent-50, score-0.244]
20 To save space, we instead record only the counts of how often each assertion is found to be true or false. [sent-52, score-0.258]
21 When the program finishes, these counts, along with the program exit status, are sent back to the central server for further analysis. [sent-53, score-0.528]
22 The program sampling framework is a non-trivial software analysis effort. [sent-54, score-0.365]
23 Knowing the final program exit status (crashed or successful) leaves us in a classification setting. (Footnote 1: The sampling density h controls the tradeoff between runtime overhead and data sparsity.) [sent-59, score-0.497]
24 This is not a problem for large programs like Mozilla and Windows with thousands of crash reports a day. [sent-61, score-0.343]
25 Good feature selection should be corroborated by classification performance, though in our case, we only care about features that correctly predict one of the two classes. [sent-64, score-0.246]
26 Hence, instead of working in the usual maximum likelihood setting for classification and regularization, we define and maximize a more appropriate utility function. [sent-65, score-0.211]
27 3.1 Some characteristics of the problem. We concentrate on isolating the bugs that are caused by the occurrence of a small set of features, i.e., [sent-71, score-0.364]
28 assertions that are always true when a crash occurs (footnote 2). [sent-73, score-0.328]
29 We want to identify the predicate counts that are positively correlated with the program crashing. [sent-74, score-0.367]
30 Due to sampling effects, it is quite possible that a feature responsible for the ultimate crash may not have been observed in a given run. [sent-77, score-0.392]
31 This is especially true in the case of “quick and painless” deaths, where a program crashes very soon after the actual bug occurs. [sent-78, score-0.659]
32 Normally this would be an easy bug to find, because one wouldn’t have to look very far beyond the crashing point at the top of the stack. [sent-79, score-0.343]
33 However, this is a challenge for our approach, because there may be only a single opportunity to sample the buggy feature before the program dies. [sent-80, score-0.39]
34 Thus many crashes may have an input feature profile that is very similar to that of a successful run. [sent-81, score-0.314]
35 At the other end of the spectrum, if we are dealing with a deterministic bug3 , false positives should have a probability of zero: if the buggy feature is observed to be true, then the program has to crash; if the program did not crash, then the bug must not have occurred. [sent-83, score-1.207]
36 Therefore, for a deterministic bug, any false positives during the training process should incur a much larger penalty compared to any false negatives. [sent-84, score-0.56]
37 3.2 Designing the utility function. Let (x, y) denote a data point, where x is an input vector of non-negative integer counts, and y ∈ {0, 1} is the output label. [sent-86, score-0.241]
38 The last two cases represent false negative and false positive, respectively. [sent-89, score-0.314]
39 In the general form of utility maximization for classification (see, e.g., …). [sent-90, score-0.211]
40 (Footnote 2) We do not focus on this type of bug in this paper. [sent-93, score-0.364]
41 (Footnote 3) A bug is deterministic if it crashes the program every time it is observed. [sent-94, score-0.732]
42 For example, dereferencing a null pointer would crash the program without exception. [sent-95, score-0.527]
43 Note that this notion of determinism is data-dependent: it is always predicated on the trial runs that we have seen. [sent-96, score-0.3]
44 To slightly simplify the formula, we choose the same functional form for u1 and u2, but add an extra penalty term for false positives:
u1(x; θ) = u3(x; θ) := δ1 (log2 µ(x; θ) + 1)    (4)
u2(x; θ) := δ2 (log2 (1 − µ(x; θ)) + 1)    (5)
u4(x; θ) := δ2 (log2 (1 − µ(x; θ)) + 1) − δ3 θT x
[sent-107, score-0.244]
45 Also, we can fold any multiplicative constants of the utility functions into δi , so the base of the log function is freely exchangeable. [sent-109, score-0.248]
46 We find that the expected utility function is equivalent to: E[U] = δ1 y log µ + δ2 (1 − y) log(1 − µ) − δ3 θT x (1 − y) I{µ>1/2} − λ ‖θ‖1. [sent-110, score-0.248]
47 In general, this expected utility function weighs each class separately using δi , and has an additional penalty term for false positives. [sent-113, score-0.455]
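As a concrete rendering, the expected utility above can be written as a NumPy function. The logistic form of µ, the parameter names, and the numerical smoothing are assumptions of this sketch, not the paper's exact implementation.

```python
import numpy as np

def expected_utility(theta, X, y, d1, d2, d3, lam):
    """Class-weighted expected utility with a false-positive penalty (sketch).

    X: (n, p) non-negative counts; y: (n,) labels, 1 = crashed run.
    d1, d2 weight the two classes, d3 penalizes confident false
    positives, and lam weights the L1 regularizer.
    """
    z = X @ theta
    mu = 1.0 / (1.0 + np.exp(-z))           # logistic link (assumed)
    eps = 1e-12                              # numerical floor for the logs
    tp_fn = d1 * y * np.log(mu + eps)        # δ1 y log µ
    tn = d2 * (1 - y) * np.log(1 - mu + eps) # δ2 (1−y) log(1−µ)
    fp_pen = d3 * z * (1 - y) * (mu > 0.5)   # δ3 θᵀx (1−y) I{µ>1/2}
    return np.sum(tp_fn + tn - fp_pen) - lam * np.abs(theta).sum()
```

At θ = 0 every run sits on the decision boundary (µ = 1/2), so the objective reduces to the two class-weighted log terms — a convenient sanity check when tuning the δi.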
48 3.3 Interpretation of the utility functions. Let us closely examine the utility functions defined in Eqns. (4) and (5). [sent-125, score-0.422]
49 It is positive when z is positive, and approaches 1 as z grows (footnote 4: assuming that the more abnormalities there are, the more likely it is for the program to crash, it is reasonable to use a classifier based on a linear combination of features). [sent-129, score-0.275]
50 Figure 1: (a) Plot of the true positive indicator function and the utility function log2 µ(z) + 1. [sent-144, score-0.291]
51 (b) Plot of the true negative indicator function, the utility function log2 (1 − µ(z)) + 1, and its asymptotic slopes −z/log 2 and −z/(2 log 2). [sent-145, score-0.334]
52 On the other hand, when z is negative, the utility function is negative, acting as a penalty for false negatives. [sent-148, score-0.455]
53 Hence, when the false positive is close to the decision boundary, the additional penalty of θT x = z in Eqn. [sent-156, score-0.275]
54 Most of the time a program exits successfully without crashing, so we have to deal with many more successful runs than crashed runs (see Section 5). [sent-160, score-0.714]
55 Finally, δ3 is the knob of determinism: if the bug is deterministic, then setting δ3 to a large value will severely penalize false positives; if the bug is not deterministic, then a small value for δ3 affords the necessary slack to accommodate runs which should have failed but did not. [sent-162, score-0.795]
56 As we shall see in Section 5, if the bug is truly deterministic, then the quality of the final features selected will be higher for large δ3 values. [sent-163, score-0.37]
57 In a previous paper [1], we outlined some simple feature elimination heuristics that can be used in the case of a deterministic bug. [sent-164, score-0.276]
58 Elimination by universal falsehood discards any counter that is always zero, because it likely represents an assertion that can never be true. [sent-165, score-0.325]
59 Elimination by lack of failing example discards any counter that is zero on all crashes, because what never happens cannot have caused the crash. [sent-167, score-0.221]
60 Elimination by successful counterexample discards any counter that is non-zero on any successful run, because these are assertions that can be true without a subsequent program failure. [sent-168, score-0.718]
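On a matrix of sampled counts, the three elimination heuristics reduce to boolean reductions over runs. A minimal sketch — the function name and data layout are assumptions for illustration:

```python
import numpy as np

def eliminate(X, y):
    """Apply the three elimination heuristics to a counts matrix (sketch).

    X: (n_runs, n_counters) observed true-counts per counter;
    y: (n_runs,) 1 if the run crashed. Returns a boolean mask of the
    counters that survive all three heuristics. As in the paper, these
    heuristics are only sound for deterministic bugs.
    """
    crashed = y == 1
    # 1. Universal falsehood: the counter is zero in every run,
    #    so the assertion can likely never be true.
    always_zero = (X == 0).all(axis=0)
    # 2. Lack of failing example: zero on all crashes —
    #    what never happens cannot have caused the crash.
    no_failing = (X[crashed] == 0).all(axis=0)
    # 3. Successful counterexample: non-zero on some successful run,
    #    so the assertion can be true without a subsequent failure.
    counterexample = (X[~crashed] > 0).any(axis=0)
    return ~(always_zero | no_failing | counterexample)
```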
61 Also, if a heavily weighted feature xi is positive on a successful run in the training set, then the classifier is more likely to result in a false positive. [sent-172, score-0.395]
62 The false positive penalty term will then decrease the weight θi , so that such a feature is unlikely to be chosen at the end. [sent-173, score-0.353]
63 Thus utility maximization also handles elimination by successful counterexample . [sent-174, score-0.452]
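The maximization itself can be carried out by simple (sub)gradient ascent: a smooth gradient for the class-weighted log terms plus an L1 subgradient for the regularizer. The toy sketch below assumes a logistic µ and omits the non-smooth false-positive penalty for simplicity; it is not the paper's optimizer.

```python
import numpy as np

def train(X, y, d1=1.0, d2=1.0, lam=0.01, lr=0.1, steps=500):
    """Maximize the class-weighted utility by (sub)gradient ascent (sketch)."""
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-(X @ theta)))
        # Gradient of d1*y*log(mu) + d2*(1-y)*log(1-mu) w.r.t. theta.
        g = X.T @ (d1 * y * (1 - mu) - d2 * (1 - y) * mu)
        # L1 subgradient pulls weights toward zero (sparse selection).
        theta += lr * (g - lam * np.sign(theta))
    return theta
```

On separable toy data this drives the weight of a crash-only counter positive and the weight of a success-only counter negative, which is exactly the sparse selection behavior the heuristics above aim for.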
64 4 Two Case Studies. As examples, we present two case studies of C programs with bugs at opposite ends of the determinism spectrum. [sent-176, score-0.607]
65 ccrypt 1.2 is known to contain a bug that involves overwriting existing files. [sent-179, score-0.257]
66 If the user responds to a confirmation prompt with EOF rather than yes or no, ccrypt consistently crashes. [sent-180, score-0.327]
67 We find that feeding bc nine megabytes of random input causes it to crash roughly one time in four while calling malloc() — a strong indication of heap corruption. [sent-183, score-0.45]
68 Such bugs are difficult to fix because they are inherently non-deterministic: there is no guarantee that a mangled heap will cause a crash soon, or indeed at all. [sent-184, score-0.723]
69 Our instrumentation adds a check after each function call to sample and record the number of times the return value is negative, zero, or positive. [sent-187, score-0.339]
70 Each run uses a randomly selected set of present or absent files, randomized command line flags, and randomized responses to ccrypt prompts including the occasional EOF. [sent-190, score-0.365]
71 We have collected 7204 trial runs at a sampling rate of 1/100, 1162 of which result in a crash. [sent-191, score-0.24]
72 Out of the 1710 counter features, 1542 are constant across all runs, leaving 168 counters to be considered in the training process. [sent-193, score-0.228]
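Filtering out the constant counters before training is a one-line reduction in NumPy. A sketch — the name `drop_constant` is made up here:

```python
import numpy as np

def drop_constant(X):
    """Remove counters that are constant across all runs (sketch).

    A counter whose value never varies carries no information for
    classification; X is (n_runs, n_counters)."""
    varying = (X != X[0]).any(axis=0)  # differs from the first run anywhere?
    return X[:, varying], varying
```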
73 Our bc data set consists of 3051 runs with distinct random inputs at a sampling rate of 1/1000. [sent-203, score-0.339]
74 The smaller ccrypt dataset requires just under 8 seconds. [sent-214, score-0.25]
75 The more important knobs are δ1 and δ3 : the former controls the relative importance of classification performance on crashed runs, the latter adjusts the believed level of determinism of the bug. [sent-217, score-0.27]
76 (1) To counter the effects of imbalanced datasets, the ratio δ1/δ2 should be at least around the ratio of successful to crashed runs. [sent-219, score-0.354]
77 This is especially crucial for the ccrypt data set, which contains roughly 32 successful runs for every crash. [sent-220, score-0.487]
78 (2) δ3 should not be higher than δ1, because it is ultimately more important to correctly classify crashes than to not have any false positives. [sent-221, score-0.388] [sent-239, score-0.315]
79 Figure 2: (a) ccrypt; (b) ccrypt, δ1 = 30; (c) bc; (d) bc, δ1 = 5. (a,b) Cross-validation scores for the ccrypt data set; (c,d) cross-validation scores for the bc data set. [sent-237, score-0.462]
81 Fig. 2(a) shows a plot of the cross-validation score (maximum over a number of settings for δ2 and δ3) for the ccrypt data set at various δ1 values. [sent-242, score-0.38]
82 The “smoking gun” which directly indicates the ccrypt bug is: traverse. [sent-248, score-0.507]
83 In all of the above mentioned safe settings for δ1 and δ3 , this feature is returned as the top feature. [sent-250, score-0.217]
84 This is to be expected: the bug in bc is non-deterministic, and therefore false positives do indeed exist in the training set. [sent-257, score-0.638]
85 As for the feature selection results for bc, for all reasonable parameter settings (and even those that do not have the best classification performance), the top features are a group of correlated counters that all point to the index of an array being abnormally big. [sent-259, score-0.566]
86 c:176: more_arrays(): indx > optopt
c:176: more_arrays(): indx > opterr
c:176: more_arrays(): indx > use_math
c:176: more_arrays(): indx > quiet
c:176: more_arrays(): indx > f_count
In Fig. [sent-270, score-1.057]
87 The top feature becomes a necessary but not sufficient condition for a crash – a false positive-inducing feature! [sent-273, score-0.512]
88 Hence the lesson is that if the bug is believed to be deterministic then δ3 should always be positive. [sent-274, score-0.364]
89 They also indicate that the variable indx seems to be abnormally big. [sent-277, score-0.273]
90 Indeed, indx is the array index that runs over the actual array length, which is contained in the integer variable a_count. [sent-278, score-0.475]
91 The program may crash long after the first array bound violation, which means that there are many opportunities for the sampling framework to observe the abnormally big value of indx. [sent-279, score-0.684]
92 Since there are many comparisons between indx and other integer variables, there is a large set of inter-correlated counters, any subset of which may be picked by our algorithm as the top features. [sent-280, score-0.275]
93 In the training run shown above, the smoking gun, indx > a_count, is ranked number 8. [sent-281, score-0.38]
94 But in general it could be ranked much lower, because the top features already suffice for predicting crashes and pointing us to the right line in the code. [sent-282, score-0.279]
95 6 Conclusions and Future Work Our goal is a system that automatically pinpoints the location of bugs in widely deployed software. [sent-283, score-0.364]
96 We tackle different types of bugs using a custom-designed utility function with a “determinism level” knob. [sent-284, score-0.575]
97 Our methods are shown to work on two real-world programs, and are able to locate the bugs in a range of parameter settings. [sent-285, score-0.364]
98 In on-going research, we are extending our approach to deal with the problem of multiple bugs in larger programs. [sent-288, score-0.409]
99 We are also working on modifying the program sampling framework to allow denser sampling in more important regions of the code. [sent-289, score-0.398]
100 This should alleviate the sparsity of features while reducing the number of runs required to yield useful results. [sent-290, score-0.237]
wordName wordTfidf (topN-words)
[('bugs', 0.364), ('bug', 0.257), ('ccrypt', 0.25), ('program', 0.244), ('crash', 0.237), ('utility', 0.211), ('indx', 0.205), ('crashes', 0.158), ('false', 0.157), ('assertion', 0.138), ('bc', 0.138), ('counter', 0.137), ('determinism', 0.137), ('runs', 0.124), ('debugging', 0.119), ('programs', 0.106), ('crashed', 0.099), ('classi', 0.097), ('elimination', 0.093), ('assertions', 0.091), ('counters', 0.091), ('subgradient', 0.091), ('penalty', 0.087), ('positives', 0.086), ('features', 0.081), ('arrays', 0.079), ('successful', 0.078), ('feature', 0.078), ('user', 0.077), ('sampling', 0.077), ('deterministic', 0.073), ('abnormally', 0.068), ('aiken', 0.068), ('buggy', 0.068), ('liblit', 0.068), ('score', 0.067), ('coin', 0.067), ('settings', 0.063), ('record', 0.063), ('pointers', 0.059), ('array', 0.058), ('counts', 0.057), ('runtime', 0.054), ('code', 0.053), ('selection', 0.053), ('run', 0.051), ('overhead', 0.05), ('checks', 0.05), ('discards', 0.05), ('uc', 0.05), ('indicator', 0.049), ('ascent', 0.048), ('users', 0.048), ('cation', 0.046), ('countdown', 0.046), ('crashing', 0.046), ('decrements', 0.046), ('eof', 0.046), ('gun', 0.046), ('pinpoint', 0.046), ('pointer', 0.046), ('smoking', 0.046), ('xreadline', 0.046), ('deal', 0.045), ('software', 0.044), ('inherently', 0.041), ('berkeley', 0.04), ('regularization', 0.04), ('top', 0.04), ('guesses', 0.04), ('heap', 0.04), ('counterexample', 0.04), ('imbalanced', 0.04), ('exit', 0.04), ('trial', 0.039), ('division', 0.039), ('cs', 0.038), ('log', 0.037), ('scores', 0.037), ('safe', 0.036), ('wild', 0.036), ('guidelines', 0.036), ('zheng', 0.036), ('roughly', 0.035), ('care', 0.034), ('correlated', 0.034), ('aggregated', 0.034), ('failing', 0.034), ('believed', 0.034), ('heuristics', 0.032), ('count', 0.032), ('selected', 0.032), ('identify', 0.032), ('return', 0.032), ('status', 0.032), ('command', 0.032), ('alleviate', 0.032), ('positive', 0.031), ('handles', 0.03), ('integer', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 181 nips-2003-Statistical Debugging of Sampled Programs
Author: Alice X. Zheng, Michael I. Jordan, Ben Liblit, Alex Aiken
Abstract: We present a novel strategy for automatically debugging programs given sampled data from thousands of actual user runs. Our goal is to pinpoint those features that are most correlated with crashes. This is accomplished by maximizing an appropriately defined utility function. It has analogies with intuitive debugging heuristics, and, as we demonstrate, is able to deal with various types of bugs that occur in real programs. 1
2 0.0985962 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
Author: Jianxin Wu, James M. Rehg, Matthew D. Mullin
Abstract: Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as on-line learning. 1
3 0.076490305 1 nips-2003-1-norm Support Vector Machines
Author: Ji Zhu, Saharon Rosset, Robert Tibshirani, Trevor J. Hastie
Abstract: The standard 2-norm SVM is known for its good performance in twoclass classi£cation. In this paper, we consider the 1-norm SVM. We argue that the 1-norm SVM may have some advantage over the standard 2-norm SVM, especially when there are redundant noise features. We also propose an ef£cient algorithm that computes the whole solution path of the 1-norm SVM, hence facilitates adaptive selection of the tuning parameter for the 1-norm SVM. 1
4 0.075295359 73 nips-2003-Feature Selection in Clustering Problems
Author: Volker Roth, Tilman Lange
Abstract: A novel approach to combining clustering and feature selection is presented. It implements a wrapper strategy for feature selection, in the sense that the features are directly selected by optimizing the discriminative power of the used partitioning algorithm. On the technical side, we present an efficient optimization algorithm with guaranteed local convergence property. The only free parameter of this method is selected by a resampling-based stability analysis. Experiments with real-world datasets demonstrate that our method is able to infer both meaningful partitions and meaningful subsets of features. 1
5 0.053596251 88 nips-2003-Image Reconstruction by Linear Programming
Author: Koji Tsuda, Gunnar Rätsch
Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1 -norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. Especially, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP leading a method which is easy to use. Experimental results impressively demonstrate the power of our approach.
6 0.052622054 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
7 0.051638354 51 nips-2003-Design of Experiments via Information Theory
8 0.051628649 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
9 0.050073087 124 nips-2003-Max-Margin Markov Networks
10 0.049132798 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting
11 0.048737872 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
12 0.048640881 122 nips-2003-Margin Maximizing Loss Functions
13 0.047764488 171 nips-2003-Semi-Definite Programming by Perceptron Learning
14 0.047598518 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
15 0.047539588 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
16 0.046364646 113 nips-2003-Learning with Local and Global Consistency
17 0.045683295 96 nips-2003-Invariant Pattern Recognition by Semi-Definite Programming Machines
18 0.044632711 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
19 0.044237062 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
20 0.044204302 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
topicId topicWeight
[(0, -0.171), (1, -0.032), (2, -0.004), (3, -0.076), (4, -0.021), (5, -0.036), (6, -0.047), (7, -0.021), (8, -0.005), (9, 0.086), (10, -0.007), (11, -0.037), (12, -0.017), (13, 0.01), (14, 0.043), (15, 0.044), (16, -0.07), (17, 0.007), (18, 0.089), (19, 0.007), (20, 0.016), (21, 0.007), (22, -0.035), (23, -0.029), (24, -0.002), (25, -0.023), (26, 0.041), (27, 0.008), (28, -0.036), (29, -0.015), (30, 0.0), (31, 0.036), (32, -0.038), (33, 0.039), (34, -0.003), (35, 0.023), (36, -0.024), (37, -0.081), (38, -0.023), (39, 0.1), (40, -0.094), (41, 0.003), (42, 0.056), (43, -0.023), (44, -0.04), (45, 0.098), (46, -0.046), (47, -0.037), (48, 0.022), (49, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.93923402 181 nips-2003-Statistical Debugging of Sampled Programs
Author: Alice X. Zheng, Michael I. Jordan, Ben Liblit, Alex Aiken
Abstract: We present a novel strategy for automatically debugging programs given sampled data from thousands of actual user runs. Our goal is to pinpoint those features that are most correlated with crashes. This is accomplished by maximizing an appropriately defined utility function. It has analogies with intuitive debugging heuristics, and, as we demonstrate, is able to deal with various types of bugs that occur in real programs. 1
2 0.64097995 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters
Author: Daniel B. Neill, Andrew W. Moore
Abstract: Given an N ×N grid of squares, where each square has a count and an underlying population, our goal is to find the square region with the highest density, and to calculate its significance by randomization. Any density measure D, dependent on the total count and total population of a region, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff’s spatial scan statistic DK to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N 3 ) time, and is generally computationally infeasible. We present a novel algorithm which partitions the grid into overlapping regions, bounds the maximum score of subregions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in optimal O(N 2 ) time, in practice resulting in significant (10-200x) speedups. 1
3 0.62823027 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
Author: Thomas R. Strohmann, Andrei Belitski, Gregory Z. Grudic, Dennis DeCoste
Abstract: The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω. The only assumptions that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e. kernels) one at a time – greedily selecting the next one that maximizes the accuracy bound Ω. SMPMC automatically chooses both kernel parameters and feature weights without using computationally expensive cross validation. Therefore the SMPMC algorithm simultaneously addresses the problem of kernel selection and feature selection (i.e. feature weighting), based solely on maximizing the accuracy bound Ω. Experimental results indicate that we can obtain reliable bounds Ω, as well as test set accuracies that are comparable to state of the art classification algorithms.
4 0.62622786 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
Author: Michael J. Quinlan, Stephan K. Chalup, Richard H. Middleton
Abstract: This article addresses the issues of colour classification and collision detection as they occur in the legged league robot soccer environment of RoboCup. We show how the method of one-class classification with support vector machines (SVMs) can be applied to solve these tasks satisfactorily using the limited hardware capacity of the prescribed Sony AIBO quadruped robots. The experimental evaluation shows an improvement over our previous methods of ellipse fitting for colour classification and the statistical approach used for collision detection.
5 0.61627066 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
Author: Jianxin Wu, James M. Rehg, Matthew D. Mullin
Abstract: Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as on-line learning. 1
6 0.6111455 3 nips-2003-AUC Optimization vs. Error Rate Minimization
7 0.61015964 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class
8 0.56208318 187 nips-2003-Training a Quantum Neural Network
9 0.52320951 196 nips-2003-Wormholes Improve Contrastive Divergence
10 0.52011639 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
11 0.5184018 1 nips-2003-1-norm Support Vector Machines
12 0.51827657 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
13 0.51734531 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
14 0.51229352 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
15 0.50758529 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
16 0.48050222 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering
17 0.47582799 73 nips-2003-Feature Selection in Clustering Problems
18 0.46930286 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
19 0.46477193 124 nips-2003-Max-Margin Markov Networks
20 0.46355587 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
topicId topicWeight
[(0, 0.05), (11, 0.029), (29, 0.015), (30, 0.017), (35, 0.059), (53, 0.089), (69, 0.023), (71, 0.039), (76, 0.046), (82, 0.369), (85, 0.078), (91, 0.105), (99, 0.018)]
simIndex simValue paperId paperTitle
1 0.87253577 21 nips-2003-An Autonomous Robotic System for Mapping Abandoned Mines
Author: David Ferguson, Aaron Morris, Dirk Hähnel, Christopher Baker, Zachary Omohundro, Carlos Reverte, Scott Thayer, Charles Whittaker, William Whittaker, Wolfram Burgard, Sebastian Thrun
Abstract: We present the software architecture of a robotic system for mapping abandoned mines. The software is capable of acquiring consistent 2D maps of large mines with many cycles, represented as Markov random £elds. 3D C-space maps are acquired from local 3D range scans, which are used to identify navigable paths using A* search. Our system has been deployed in three abandoned mines, two of which inaccessible to people, where it has acquired maps of unprecedented detail and accuracy. 1
same-paper 2 0.80069375 181 nips-2003-Statistical Debugging of Sampled Programs
Author: Alice X. Zheng, Michael I. Jordan, Ben Liblit, Alex Aiken
Abstract: We present a novel strategy for automatically debugging programs given sampled data from thousands of actual user runs. Our goal is to pinpoint those features that are most correlated with crashes. This is accomplished by maximizing an appropriately defined utility function. It has analogies with intuitive debugging heuristics, and, as we demonstrate, is able to deal with various types of bugs that occur in real programs.
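The paper maximizes a specific classification utility function with an extra penalty for false positives. As a hedged illustration of the general idea only — score each sampled feature by how often it fires in crashed runs, minus a penalty for firing in successful runs — here is a toy ranking function; the score form and `alpha` are assumptions, not the paper's definition:

```python
def score_features(runs, alpha=0.5):
    """Rank program features (e.g. sampled predicates) by how well
    they separate crashed runs from successful ones.

    runs: list of (observed_features: set, crashed: bool) pairs.
    Score = crash hit rate - alpha * false-positive rate, an
    illustrative heuristic only, not the paper's utility function.
    """
    crashes = [f for f, c in runs if c]
    successes = [f for f, c in runs if not c]
    features = set().union(*(f for f, _ in runs))
    scores = {}
    for feat in features:
        tp = sum(feat in f for f in crashes) / max(len(crashes), 1)
        fp = sum(feat in f for f in successes) / max(len(successes), 1)
        scores[feat] = tp - alpha * fp
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

A deterministic bug predictor would fire in every crashed run and no successful run, landing at the top of the ranking; features common to all runs are pushed down by the false-positive penalty.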
3 0.72696829 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression
Author: Roland Vollgraf, Michael Scholz, Ian A. Meinertzhagen, Klaus Obermayer
Abstract: Nonlinear filtering can solve very complex problems, but typically involves very time-consuming calculations. Here we show that for filters constructed as an RBF network with Gaussian basis functions, a decomposition into linear filters exists which can be computed efficiently in the frequency domain, yielding a dramatic improvement in speed. We present an application of this idea to image processing. In electron micrograph images of photoreceptor terminals of the fruit fly, Drosophila, synaptic vesicles containing neurotransmitter should be detected and labeled automatically. We use hand labels, provided by human experts, to learn an RBF filter using Support Vector Regression with Gaussian kernels. We show that the resulting nonlinear filter solves the task to a degree of accuracy close to what can be achieved by human experts. This allows the very time-consuming task of data evaluation to be done efficiently.
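The speedup in the abstract comes from evaluating linear filters in the frequency domain, where convolution becomes pointwise multiplication at O(N log N) cost independent of kernel width. A minimal sketch with a plain Gaussian blur (the paper's filters are learned RBF combinations; `sigma` here is illustrative):

```python
import numpy as np

def gaussian_blur_fft(image, sigma):
    """Gaussian blur via the frequency domain: FFT the image,
    multiply by the Gaussian's transfer function, inverse FFT.
    Circular boundary conditions are implied by the FFT."""
    h, w = image.shape
    y = np.fft.fftfreq(h)[:, None]
    x = np.fft.fftfreq(w)[None, :]
    # The Fourier transform of a Gaussian is another Gaussian;
    # its value at frequency 0 is 1, so the image mean is preserved.
    transfer = np.exp(-2 * (np.pi * sigma) ** 2 * (x ** 2 + y ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))
```

For a wide kernel, direct spatial convolution costs O(N * k) per output row of kernel width k, while the FFT route stays O(N log N), which is the essence of the claimed speed gain.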
4 0.4577677 38 nips-2003-Autonomous Helicopter Flight via Reinforcement Learning
Author: H. J. Kim, Michael I. Jordan, Shankar Sastry, Andrew Y. Ng
Abstract: Autonomous helicopter flight represents a challenging control problem with complex, noisy dynamics. In this paper, we describe a successful application of reinforcement learning to autonomous helicopter flight. We first fit a stochastic, nonlinear model of the helicopter dynamics. We then use the model to learn to hover in place, and to fly a number of maneuvers taken from an RC helicopter competition.
5 0.45775852 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
Author: Liva Ralaivola, Florence D'alché-buc
Abstract: We consider the question of predicting nonlinear time series. Kernel Dynamical Modeling (KDM), a new method based on kernels, is proposed as an extension to linear dynamical models. The kernel trick is used twice: first, to learn the parameters of the model, and second, to compute preimages of the time series predicted in the feature space by means of Support Vector Regression. Our model shows a strong connection with the classic Kalman Filter model, with the kernel feature space as hidden state space. Kernel Dynamical Modeling is tested against two benchmark time series and achieves high-quality predictions.
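KDM's actual formulation is not reproduced in this listing. As a generic stand-in for kernel-based time-series prediction, a kernel ridge regression sketch that maps a delay-embedded state to the next value (the Gaussian bandwidth `gamma`, the embedding `dim`, and the `ridge` term are assumed illustrative values):

```python
import numpy as np

def fit_predictor(series, dim=3, gamma=1.0, ridge=1e-6):
    """Kernel ridge regression for one-step-ahead prediction.

    Maps a delay-embedded state x_t = (s_{t-dim+1}, ..., s_t) to
    s_{t+1} with a Gaussian kernel k(x, y) = exp(-gamma ||x - y||^2).
    """
    X = np.array([series[t - dim + 1 : t + 1]
                  for t in range(dim - 1, len(series) - 1)])
    y = np.array(series[dim:])
    # Gram matrix of pairwise Gaussian kernel evaluations.
    K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    alpha = np.linalg.solve(K + ridge * np.eye(len(X)), y)

    def predict(window):
        w = np.asarray(window)
        k = np.exp(-gamma * ((X - w) ** 2).sum(-1))
        return float(k @ alpha)

    return predict
```

Unlike KDM, this sketch regresses directly in the input space and has no hidden-state or preimage step; it only illustrates the kernel-regression machinery such methods build on.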
6 0.44815275 78 nips-2003-Gaussian Processes in Reinforcement Learning
7 0.44516626 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
8 0.44180983 113 nips-2003-Learning with Local and Global Consistency
9 0.43903828 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
10 0.43759936 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
11 0.43683815 107 nips-2003-Learning Spectral Clustering
12 0.43539003 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
13 0.43445429 68 nips-2003-Eye Movements for Reward Maximization
14 0.43361595 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
15 0.43328467 158 nips-2003-Policy Search by Dynamic Programming
16 0.43294913 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
17 0.4321638 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model
18 0.43131343 126 nips-2003-Measure Based Regularization
19 0.43114668 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
20 0.43107289 81 nips-2003-Geometric Analysis of Constrained Curves