nips nips2007 nips2007-27 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Machine learning techniques are increasingly being used to produce a wide range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. [sent-7, score-0.413]
2 It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. [sent-11, score-0.248]
3 Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. [sent-12, score-0.43]
4 ACT is also shown to exhibit good anytime behavior with diminishing returns. [sent-16, score-0.257]
5 The center aims to obtain a comprehensible model, with low expected test costs (the costs of testing attribute values) and high expected accuracy. [sent-18, score-0.845]
6 Moreover, in many cases there are costs associated with the predictive errors. [sent-19, score-0.291]
7 In such a scenario, the task of the inducer is to produce a model with low expected test costs and low expected misclassification costs. [sent-20, score-0.451]
8 A good candidate for achieving the goals of comprehensibility and reduced costs is a decision tree model. [sent-21, score-0.488]
9 Several works proposed learners that consider different misclassification costs [7, 18, 6, 9, 10, 14, 1]. [sent-26, score-0.312]
10 The medical center scenario exemplifies the need for considering both types of cost together: doctors do not perform a test before considering both its cost and its importance to the diagnosis. [sent-30, score-0.61]
11 Minimal Cost trees, a method that attempts to minimize both types of costs simultaneously, has been proposed in [21]. [sent-31, score-0.314]
12 The immediate reduction in total cost that each split would yield is estimated, and the split with the maximal reduction is selected. [sent-33, score-0.303]
13 The cost of a9 and a10, however, is significantly higher than that of the others, but lower than the cost of misclassification. [sent-38, score-0.482]
14 In such a case, even if the costs were uniform, local measures would fail to recognize the relevance of a9 and a10, and other attributes might be preferred. [sent-43, score-0.382]
15 Algorithms that can exploit more time to produce solutions of better quality are called anytime algorithms [5]. [sent-48, score-0.262]
16 One way to exploit additional time when searching for a tree of lower costs is to widen the search space. [sent-49, score-0.473]
17 The ICET algorithm [24] was a pioneer in searching non-greedily for a tree that minimizes both costs together. [sent-52, score-0.422]
18 ICET uses genetic search to produce a new set of costs that reflects both the original costs and the contribution each attribute can make to reduce misclassification costs. [sent-53, score-0.797]
19 Then it builds a tree using the greedy EG2 algorithm but with the evolved costs instead of the original ones. [sent-54, score-0.483]
20 Because ICET assigns costs globally, they will have similar costs as well. [sent-67, score-0.582]
21 In addition ACT is shown to exhibit good anytime behavior with diminishing returns. [sent-76, score-0.257]
22 Note that costs may also be involved during example acquisition [12, 15]. [sent-78, score-0.291]
23 These two stages involve different types of cost [23]. [sent-84, score-0.284]
24 In a problem with |C| different classes, a classification cost matrix M is a |C| × |C| matrix whose Mi,j entry defines the penalty of assigning the class ci to an instance that actually belongs to the class cj . [sent-88, score-0.241]
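To make the cost-matrix notation concrete, here is a small, purely illustrative two-class example (the values and class names are assumptions, not taken from the paper):

```python
# Hypothetical 2-class misclassification cost matrix M.
# M[i][j] is the penalty for assigning class c_i to an instance that
# actually belongs to class c_j; correct predictions cost nothing.
classes = ["sick", "healthy"]
M = [
    [0.0,  50.0],   # predicted "sick":    0 if correct, 50 if the patient is healthy
    [800.0, 0.0],   # predicted "healthy": 800 if the patient is sick, 0 if correct
]
```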
25 To calculate the test costs of a particular case, we sum the cost of the tests along the path from the root to the appropriate leaf. [sent-89, score-0.635]
26 Grouped tests share a common cost that is charged only once per group. [sent-92, score-0.346]
27 Each test also has an extra cost charged when the test is actually made. [sent-93, score-0.357]
28 For example, consider a tree path with tests like cholesterol level and glucose level. [sent-94, score-0.25]
29 Clearly, once blood samples are taken to measure the cholesterol level, the cost for measuring the glucose level is lower. [sent-96, score-0.314]
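As a minimal sketch of how the test cost of a single case could be computed under this scheme (the data structures and names are illustrative assumptions, not the paper's implementation):

```python
def path_test_cost(path_tests, marginal_cost, group_of, group_cost):
    """Sum the costs of the tests along one root-to-leaf path.
    The shared cost of a group (e.g. drawing a blood sample) is charged only
    once; every test still pays its own extra (marginal) cost."""
    total, charged_groups = 0.0, set()
    for test in path_tests:
        group = group_of.get(test)
        if group is not None and group not in charged_groups:
            total += group_cost[group]   # common group cost, charged once
            charged_groups.add(group)
        total += marginal_cost[test]     # extra cost charged per individual test
    return total

# Example: cholesterol and glucose share the cost of taking a blood sample.
marginal_cost = {"age": 1.0, "cholesterol": 7.0, "glucose": 5.0}
group_of = {"cholesterol": "blood", "glucose": "blood"}
group_cost = {"blood": 20.0}
print(path_test_cost(["age", "cholesterol", "glucose"],
                     marginal_cost, group_of, group_cost))  # 1 + (20 + 7) + 5 = 33
```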
30 In this work we do not handle delayed costs but we do explain how to adapt our framework to scenarios that involve them. [sent-104, score-0.348]
31 Having measured the test costs and misclassification costs, an important question is how to combine them. [sent-105, score-0.329]
32 Following [24] we assume that both types of cost are given in the same scale. [sent-106, score-0.264]
33 [19] presented a method to handle the two kinds of cost scales by setting a maximal budget for one kind and minimizing the other. [sent-109, score-0.241]
34 ACT, our proposed anytime framework for induction of cost-sensitive trees, builds on the recently introduced LSID3 algorithm [11]. [sent-110, score-0.254]
35 LSID3 adopts the general top-down induction of decision trees scheme (TDIDT): it starts from the entire set of training examples, partitions it into subsets by testing the value of an attribute, and then recursively builds subtrees. [sent-111, score-0.259]
36 For every candidate split, LSID3 attempts to estimate the size of the subtree that would result were the split to take place, and, following Occam’s razor [4], it favors the one with the smallest expected size. [sent-113, score-0.24]
37 In SID3, rather than choosing an attribute that maximizes the information gain ∆I (as in ID3), the splitting attribute is chosen semi-randomly. [sent-116, score-0.261]
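A sketch of the semi-random choice: a candidate attribute is drawn with probability proportional to its information gain (this ignores SID3's special handling of attributes that yield a pure split, and the names are illustrative):

```python
import random

def sid3_style_choice(candidates, info_gain):
    """Pick the splitting attribute semi-randomly: each attribute is chosen
    with probability proportional to its information gain, so high-gain
    attributes are favored but not selected deterministically."""
    gains = [max(info_gain(a), 0.0) for a in candidates]
    if sum(gains) == 0.0:                 # no informative attribute: pick uniformly
        return random.choice(candidates)
    return random.choices(candidates, weights=gains, k=1)[0]
```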
38 LSID3 was shown to exhibit good anytime behavior with diminishing returns. [sent-122, score-0.257]
39 In ACT, however, we would like to bias our sample towards low-cost trees. [sent-129, score-0.241]
40 For this purpose, we designed a stochastic version of the EG2 algorithm that attempts to build low-cost trees greedily. [sent-130, score-0.377]
41 In EG2, a tree is built top-down, and the attribute that maximizes ICF (the Information Cost Function) is chosen for splitting a node, where ICF(a) = (2^{∆I(a)} − 1) / (cost(a) + 1)^w. [sent-131, score-0.276]
42 If an attribute has already been tested, its cost is zero, and if another attribute that belongs to the same group has been tested, a group discount is applied. [sent-137, score-0.473]
43 The parameter w controls the bias towards lower-cost attributes. [sent-138, score-0.241]
44 While ICET tunes this parameter using genetic search, we set w inversely proportional to the misclassification cost: a high misclassification cost results in a smaller w, reducing the effect of attribute costs. [sent-139, score-0.443]
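A hedged sketch of this split criterion, putting together the ICF formula, the context-dependent attribute cost, and a w that shrinks as the misclassification cost grows; the exact mapping from the misclassification cost to w is an assumption, since the text only states that w is set inversely proportional to it:

```python
def icf(delta_i, eff_cost, w):
    """EG2's Information Cost Function: (2^dI - 1) / (cost + 1)^w."""
    return (2.0 ** delta_i - 1.0) / ((eff_cost + 1.0) ** w)

def effective_cost(attr, tested, marginal_cost, group_of, group_cost):
    """Context-dependent cost: zero if the attribute was already tested; the
    shared group cost is added only if no same-group attribute was tested."""
    if attr in tested:
        return 0.0
    cost = marginal_cost[attr]
    group = group_of.get(attr)
    if group is not None and not any(group_of.get(t) == group for t in tested):
        cost += group_cost[group]
    return cost

def choose_w(mc_cost, base=1.0):
    """Illustrative mapping only: a larger misclassification cost yields a
    smaller w, reducing the influence of attribute costs on the choice."""
    return base / (1.0 + mc_cost)
```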
45 As a cost-insensitive learner, LSID3 aims mainly to maximize the expected accuracy of the learned tree. [sent-142, score-0.31]
46 Following Occam’s razor, it uses the tree size as a preference bias and favors splits that are expected to reduce the final tree size. [sent-143, score-0.325]
47 In a cost-sensitive setup, our goal is to minimize the expected cost of classification. [sent-144, score-0.282]
48 Therefore, given a tree, we need to come up with a procedure that estimates the expected costs when classifying a future case. [sent-147, score-0.332]
49 This cost consists of two components: the test cost and misclassification cost. [sent-148, score-0.52]
50 Assuming that the distribution of future cases is similar to that of the training examples, we can estimate the test costs using the training data. [sent-149, score-0.329]
51 Given a tree, we calculate the average test cost of the training examples and use it to approximate the test cost of new cases. [sent-150, score-0.602]
52 For a tree T and a set of training examples E, we denote the average cost of traversing T for an example from E (average testing cost) by tst-cost(T, E). [sent-151, score-0.443]
53 Note that group discounts and delayed cost penalties do not need special care because they are incorporated when calculating the average test costs. [sent-152, score-0.339]
54 Moreover, tree size is measured in a different currency than accuracy and hence cannot be easily incorporated in the cost function. [sent-156, score-0.4]
55 Assume a problem with |C| classes and a misclassification cost matrix M . [sent-166, score-0.241]
56 The expected misclassification cost in l is mc-cost(l) = EE(m, m − m_c, cf) · (1 / (|C| − 1)) · Σ_{i≠c} M_{c,i} = EE(m, m − m_c, cf) · mc, where the right-most expression assumes a uniform misclassification cost M_{i,j} = mc. The expected error of a tree is the sum of the expected errors in its leaves. [sent-169, score-1.297]
57 The expected total cost of T when classifying an instance is tst-cost(T, E) + (1/m) · Σ_{l∈L} mc-cost(l). [sent-177, score-0.282]
58 Having decided about the sampler and the tree utility function, we are ready to formalize the tree growing phase in ACT. [sent-178, score-0.262]
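A sketch of how this utility could be computed; the expected-error estimator ee stands in for the EE(m, m − m_c, cf) term above and is treated as a supplied function (its exact form, e.g. a C4.5-style pessimistic estimate, is an assumption here):

```python
def expected_total_cost(avg_test_cost, leaves, M, m, ee):
    """tst-cost(T, E) + (1/m) * sum of mc-cost(l) over the leaves.
    Each leaf is (n, n_default, c): its example count, the count of its
    default class c, and c itself; ee(n, errors) is an expected-error estimate."""
    k = len(M)
    mc_total = 0.0
    for n, n_default, c in leaves:
        expected_errors = ee(n, n - n_default)
        # average penalty of assigning class c to an example of another class
        avg_penalty = sum(M[c][i] for i in range(k) if i != c) / (k - 1)
        mc_total += expected_errors * avg_penalty
    return avg_test_cost + mc_total / m
```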
59 The selection procedure, as formalized in Figure 2 (left), needs to be slightly modified when an attribute is numeric: instead of iterating over the values the attribute can take, we examine r cutting points, each evaluated with a single invocation of EG2. [sent-181, score-0.255]
60 A subtree is pruned if the resulting tree is expected to yield a lower error. [sent-187, score-0.257]
61 When test costs are taken into account, pruning has another important role: reducing costs. [sent-188, score-0.363]
62 It is worthwhile to keep a subtree only if its expected reduction in the misclassification cost is larger than the cost of its tests. [sent-189, score-0.628]
63 If the misclassification cost were zero, it would make no sense to keep any split in the tree. [sent-190, score-0.272]
64 Assume that the cost of a in the current context is 1. [sent-196, score-0.241]
65 The estimated cost of a subtree rooted at a is therefore 1 + min(4. [sent-197, score-0.326]
66 If the misclassification cost were very large, we would expect behavior similar to that in the cost-insensitive setup. [sent-202, score-0.268]
67 For each subtree, we compare its expected total cost to that of a leaf. [sent-205, score-0.282]
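A minimal sketch of that comparison under a uniform misclassification cost (the leaf statistics and the error estimator ee are illustrative assumptions):

```python
def should_prune(subtree_total_cost, n_examples, n_errors, mc_cost, ee):
    """Prune when turning the subtree into a leaf is expected to cost no more.
    The subtree's total cost already includes its test costs and its expected
    misclassification cost; a leaf pays no test costs at all."""
    leaf_cost = ee(n_examples, n_errors) * mc_cost
    return leaf_cost <= subtree_total_cost
```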
68 The results are indeed similar, with the basic version of ICET achieving an average cost of 49. [sent-219, score-0.264]
69 Only five UCI datasets, however, have assigned test costs [24]. [sent-225, score-0.329]
70 To gain a wider perspective, we developed an automatic method that assigns costs to existing datasets randomly. [sent-226, score-0.332]
71 The method is parameterized with: (1) cr, the cost range; (2) g, the number of desired groups as a percentage of the number of attributes; and (3) sc, the shared group cost as a percentage of the maximal marginal cost in the group. [sent-227, score-0.723]
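A sketch of one way such a procedure could be realized under this parameterization; the concrete sampling choices (uniform marginal costs, random group membership) are assumptions, since the text does not spell them out:

```python
import random

def assign_random_costs(attributes, cr=(1, 100), g=0.2, sc=0.5, seed=None):
    """Draw a marginal cost for every attribute from the range cr, create
    roughly g * |attributes| groups, and give each non-empty group a shared
    cost equal to sc times the maximal marginal cost of its members."""
    rng = random.Random(seed)
    marginal = {a: rng.uniform(*cr) for a in attributes}
    n_groups = max(1, int(g * len(attributes)))
    group_of, members = {}, {i: [] for i in range(n_groups)}
    for a in attributes:
        gid = rng.randrange(n_groups)
        group_of[a] = gid
        members[gid].append(a)
    group_cost = {gid: sc * max(marginal[a] for a in attrs)
                  for gid, attrs in members.items() if attrs}
    return marginal, group_of, group_cost
```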
72 Using this method we assigned costs to 25 datasets: 21 arbitrarily chosen UCI datasets and 4 datasets that represent hard concepts and have been used in previous research. [sent-228, score-0.37]
73 Two versions of each dataset have been created, both with cost range of 1-100. [sent-230, score-0.241]
74 In total we have 55 datasets: 5 with costs assigned as in [24] and 50 with random costs. [sent-233, score-0.291]
75 Cost-insensitive learning algorithms focus on accuracy and therefore are expected to perform well (footnote 1: the default class is the one that minimizes the misclassification cost in the node). [sent-234, score-0.337]
76 Table 1: Average cost of classification as a percentage of the standard cost of classification. [sent-240, score-0.482]
77 Figure 3: Illustration of the differences in performance between ACT and ICET for misclassification costs (from left to right: 10, 100, 1000, and 10000). [sent-255, score-0.291]
78 The x-axis represents the cost of ICET while the y-axis represents that of ACT. [sent-257, score-0.241]
79 when testing costs are negligible relative to misclassification costs. [sent-260, score-0.318]
80 On the other hand, when testing costs are significant, ignoring them would result in expensive classifiers. [sent-261, score-0.318]
81 Therefore, to evaluate a cost-sensitive learner, a wide spectrum of misclassification costs is needed. [sent-262, score-0.314]
82 For each problem out of the 55, we created 4 instances, with uniform misclassification costs mc = 10, 100, 1000, 10000. [sent-263, score-0.442]
83 To overcome these problems, Turney suggests normalizing the average cost of classification by dividing it by the standard cost, defined as TC + min_i(1 − f_i) · max_{i,j}(M_{i,j}). The standard cost is an approximation of the maximal cost in a given problem. [sent-266, score-0.791]
84 It consists of two components: (1) TC, the cost of performing all tests, and (2) the misclassification cost if the classifier achieves only the baseline accuracy. [sent-267, score-0.482]
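A sketch of this normalization with illustrative variable names:

```python
def normalized_cost(avg_cost, total_test_cost, class_freqs, M):
    """Divide the average cost of classification by the standard cost
    TC + min_i(1 - f_i) * max_{i,j} M[i][j]: the cost of taking every test
    plus the misclassification cost of a baseline (majority-class) classifier."""
    baseline_error = min(1.0 - f for f in class_freqs)
    max_penalty = max(max(row) for row in M)
    return avg_cost / (total_test_cost + baseline_error * max_penalty)
```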
85 Table 1 lists the average results, Figure 3 illustrates the differences between ICET and ACT, and Figure 4 (left) plots the average cost for the different values of mc. [sent-282, score-0.287]
86 For mc set to 1000 and 10000 there are fewer significant wins, yet it is clear that ACT is dominating. (Footnote 4: Seeded ICET includes the true costs in the initial population and was reported to perform better [24].) [sent-287, score-0.442]
87 Average cost as a function of time for Breast-cancer-20 (mid-right) and Multi-XOR-80 (rightmost). [sent-290, score-0.241]
88 The Wilcoxon test indicates that for mc = 10, 100, and 10000, ACT is significantly better than ICET, and that for mc = 1000 no significant winner was found. [sent-292, score-0.321]
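A comparison of this kind can be reproduced with a Wilcoxon signed-rank test over the paired per-dataset costs, e.g. (assuming SciPy is available; the arrays below are placeholders, not the paper's results):

```python
from scipy.stats import wilcoxon

# Paired average costs of ACT and ICET on the same datasets (placeholder values).
act_costs  = [31.2, 40.5, 28.9, 55.1, 47.3]
icet_costs = [35.0, 44.1, 30.2, 54.8, 52.6]
stat, p_value = wilcoxon(act_costs, icet_costs)
print(f"Wilcoxon statistic = {stat:.3f}, p-value = {p_value:.3f}")
```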
89 When misclassification costs are low, an optimal algorithm would produce a very shallow tree. [sent-293, score-0.331]
90 When misclassification costs are dominant, an optimal algorithm would produce a highly accurate tree. [sent-294, score-0.331]
91 Hence, with the increase in the importance of accuracy the normalized cost increases: the predictive errors affect the cost more dramatically. [sent-296, score-0.529]
92 In addition, we compared the studied methods on nonuniform misclassification costs and found ACT’s advantage to be consistent. [sent-311, score-0.326]
93 ACT can use additional time to acquire larger samples and hence achieve better cost estimations. [sent-315, score-0.241]
94 ACT dominates ICET for both domains and is able to produce trees of lower costs in shorter time. [sent-329, score-0.467]
95 Allowing ICET to produce more and more generations (up to 128) does not result in trees comparable to those obtained by ACT. [sent-332, score-0.243]
96 4 Conclusions: Machine learning techniques are increasingly being used to produce a wide range of classifiers for real-world applications that involve nonuniform testing costs and misclassification costs. [sent-333, score-0.413]
97 Our framework has 4 major advantages: (1) it uses a non-greedy approach to build a decision tree and therefore is able to overcome local minima problems, (2) it evaluates entire trees and therefore can be adjusted to any cost scheme that is defined over trees. [sent-336, score-0.568]
98 (3) it exhibits good anytime behavior and produces significantly better trees when more time is available, and (4) it can be easily parallelized and hence can benefit from distributed computer power. [sent-337, score-0.361]
99 ACT also exhibited good anytime behavior: with the increase in time allocation, there was a decrease in the cost of the learned models. [sent-341, score-0.439]
100 Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. [sent-479, score-0.254]
wordName wordTfidf (topN-words)
[('icet', 0.618), ('act', 0.385), ('costs', 0.291), ('misclassi', 0.268), ('cost', 0.241), ('anytime', 0.198), ('mc', 0.151), ('trees', 0.136), ('tree', 0.131), ('attribute', 0.116), ('attributes', 0.091), ('subtree', 0.085), ('generations', 0.067), ('tests', 0.065), ('genetic', 0.059), ('turney', 0.054), ('cf', 0.054), ('cation', 0.05), ('mini', 0.045), ('occam', 0.043), ('expected', 0.041), ('datasets', 0.041), ('charged', 0.04), ('ost', 0.04), ('decision', 0.04), ('israel', 0.04), ('produce', 0.04), ('classi', 0.039), ('resources', 0.039), ('test', 0.038), ('lookahead', 0.038), ('concept', 0.038), ('delayed', 0.037), ('nonuniform', 0.035), ('razor', 0.035), ('pruning', 0.034), ('ei', 0.032), ('diminishing', 0.032), ('builds', 0.032), ('split', 0.031), ('wilcoxon', 0.03), ('uci', 0.03), ('greedy', 0.029), ('splitting', 0.029), ('accuracy', 0.028), ('behavior', 0.027), ('default', 0.027), ('testing', 0.027), ('cholesterol', 0.027), ('doctors', 0.027), ('esmeir', 0.027), ('glucose', 0.027), ('idx', 0.027), ('misclassification', 0.027), ('proportionally', 0.027), ('seeded', 0.027), ('totala', 0.027), ('widen', 0.027), ('ee', 0.027), ('wins', 0.027), ('runtime', 0.026), ('candidate', 0.026), ('minimal', 0.025), ('ijcai', 0.025), ('exploit', 0.024), ('induction', 0.024), ('haifa', 0.023), ('technion', 0.023), ('hoose', 0.023), ('icf', 0.023), ('invocation', 0.023), ('qin', 0.023), ('reimplementation', 0.023), ('wait', 0.023), ('average', 0.023), ('learner', 0.023), ('types', 0.023), ('favors', 0.022), ('allocate', 0.021), ('escape', 0.021), ('foreach', 0.021), ('provost', 0.021), ('examples', 0.021), ('learners', 0.021), ('medical', 0.021), ('ers', 0.021), ('setup', 0.02), ('involve', 0.02), ('outcome', 0.02), ('minima', 0.02), ('worthwhile', 0.02), ('intend', 0.02), ('diagnostic', 0.02), ('estimations', 0.02), ('exempli', 0.02), ('leaf', 0.019), ('cantly', 0.019), ('blood', 0.019), ('winner', 0.019), ('importance', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
2 0.099784374 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
3 0.065964624 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
Author: Stephane Ross, Joelle Pineau, Brahim Chaib-draa
Abstract: Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error. We provide a general theorem showing that these search heuristics are admissible, and lead to complete and ǫ-optimal algorithms. This is, to the best of our knowledge, the strongest theoretical result available for online POMDP solution methods. We also provide empirical evidence showing that our approach is also practical, and can find (provably) near-optimal solutions in reasonable time. 1
4 0.0656358 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
5 0.060202736 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
Author: Francis R. Bach, Zaïd Harchaoui
Abstract: We present a novel linear clustering framework (D IFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.
6 0.057139955 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
7 0.056921609 166 nips-2007-Regularized Boost for Semi-Supervised Learning
8 0.055102818 162 nips-2007-Random Sampling of States in Dynamic Programming
9 0.054968223 197 nips-2007-The Infinite Markov Model
10 0.053443369 39 nips-2007-Boosting the Area under the ROC Curve
11 0.052707266 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
12 0.051165666 113 nips-2007-Learning Visual Attributes
13 0.050962795 32 nips-2007-Bayesian Co-Training
14 0.048638359 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
15 0.044569258 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
16 0.041025225 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
17 0.040608551 16 nips-2007-A learning framework for nearest neighbor search
18 0.040513363 62 nips-2007-Convex Learning with Invariances
19 0.039391287 61 nips-2007-Convex Clustering with Exemplar-Based Models
20 0.037906196 206 nips-2007-Topmoumoute Online Natural Gradient Algorithm
topicId topicWeight
[(0, -0.128), (1, -0.015), (2, -0.047), (3, 0.014), (4, -0.006), (5, 0.005), (6, 0.029), (7, 0.007), (8, 0.04), (9, -0.047), (10, -0.054), (11, 0.101), (12, 0.072), (13, -0.098), (14, -0.014), (15, -0.01), (16, 0.011), (17, -0.043), (18, 0.053), (19, -0.048), (20, -0.017), (21, 0.098), (22, 0.022), (23, 0.1), (24, 0.127), (25, -0.082), (26, -0.054), (27, 0.016), (28, -0.033), (29, 0.018), (30, -0.069), (31, -0.051), (32, -0.01), (33, -0.052), (34, -0.094), (35, -0.1), (36, -0.095), (37, 0.037), (38, 0.054), (39, -0.017), (40, 0.01), (41, 0.018), (42, -0.004), (43, -0.022), (44, 0.083), (45, 0.04), (46, -0.036), (47, 0.024), (48, -0.075), (49, 0.092)]
simIndex simValue paperId paperTitle
same-paper 1 0.95072025 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
2 0.67658919 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
3 0.64256668 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1
4 0.56484038 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
5 0.49679241 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
6 0.48011491 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
7 0.47864434 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
8 0.42825496 119 nips-2007-Learning with Tree-Averaged Densities and Distributions
9 0.40999255 72 nips-2007-Discriminative Log-Linear Grammars with Latent Variables
10 0.3963393 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
11 0.38871485 197 nips-2007-The Infinite Markov Model
12 0.38704413 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
13 0.37756324 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
14 0.34158 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
15 0.33891982 39 nips-2007-Boosting the Area under the ROC Curve
16 0.33069912 171 nips-2007-Scan Strategies for Meteorological Radars
17 0.32112849 99 nips-2007-Hierarchical Penalization
18 0.31342864 206 nips-2007-Topmoumoute Online Natural Gradient Algorithm
19 0.3095156 136 nips-2007-Multiple-Instance Active Learning
20 0.30793446 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
topicId topicWeight
[(5, 0.461), (13, 0.026), (16, 0.016), (21, 0.058), (31, 0.019), (34, 0.018), (35, 0.017), (47, 0.086), (49, 0.017), (83, 0.106), (85, 0.02), (87, 0.018), (90, 0.036)]
simIndex simValue paperId paperTitle
1 0.94587839 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
Author: Charlie Frogner, Avi Pfeffer
Abstract: Dynamic Bayesian networks are structured representations of stochastic processes. Despite their structure, exact inference in DBNs is generally intractable. One approach to approximate inference involves grouping the variables in the process into smaller factors and keeping independent beliefs over these factors. In this paper we present several techniques for decomposing a dynamic Bayesian network automatically to enable factored inference. We examine a number of features of a DBN that capture different types of dependencies that will cause error in factored inference. An empirical comparison shows that the most useful of these is a heuristic that estimates the mutual information introduced between factors by one step of belief propagation. In addition to features computed over entire factors, for efficiency we explored scores computed over pairs of variables. We present search methods that use these features, pairwise and not, to find a factorization, and we compare their results on several datasets. Automatic factorization extends the applicability of factored inference to large, complex models that are undesirable to factor by hand. Moreover, tests on real DBNs show that automatic factorization can achieve significantly lower error in some cases. 1
same-paper 2 0.9033463 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
3 0.89910728 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: It has been shown that adapting a dictionary of basis functions to the statistics of natural images so as to maximize sparsity in the coefficients results in a set of dictionary elements whose spatial properties resemble those of V1 (primary visual cortex) receptive fields. However, the resulting sparse coefficients still exhibit pronounced statistical dependencies, thus violating the independence assumption of the sparse coding model. Here, we propose a model that attempts to capture the dependencies among the basis function coefficients by including a pairwise coupling term in the prior over the coefficient activity states. When adapted to the statistics of natural images, the coupling terms learn a combination of facilitatory and inhibitory interactions among neighboring basis functions. These learned interactions may offer an explanation for the function of horizontal connections in V1 in terms of a prior over natural images.
4 0.88436162 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
Author: Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis
Abstract: An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the “expressiveness” of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
5 0.59187889 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high dimensional measurements of more than thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible. 1
6 0.52906519 7 nips-2007-A Kernel Statistical Test of Independence
7 0.52517331 115 nips-2007-Learning the 2-D Topology of Images
8 0.51807839 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
9 0.51724517 146 nips-2007-On higher-order perceptron algorithms
10 0.51103407 96 nips-2007-Heterogeneous Component Analysis
11 0.50831699 187 nips-2007-Structured Learning with Approximate Inference
12 0.49845493 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
13 0.49807903 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
14 0.49311531 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
15 0.49054477 49 nips-2007-Colored Maximum Variance Unfolding
16 0.486137 164 nips-2007-Receptive Fields without Spike-Triggering
17 0.48330623 158 nips-2007-Probabilistic Matrix Factorization
18 0.48089811 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
19 0.48009738 87 nips-2007-Fast Variational Inference for Large-scale Internet Diagnosis
20 0.47978693 113 nips-2007-Learning Visual Attributes