nips nips2007 nips2007-27 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Machine learning techniques are increasingly being used to produce a wide range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. [sent-7, score-0.413]
2 It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. [sent-11, score-0.248]
3 Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. [sent-12, score-0.43]
4 ACT is also shown to exhibit good anytime behavior with diminishing returns. [sent-16, score-0.257]
5 The center aims to obtain a comprehensible model, with low expected test costs (the costs of testing attribute values) and high expected accuracy. [sent-18, score-0.845]
6 Moreover, in many cases there are costs associated with the predictive errors. [sent-19, score-0.291]
7 In such a scenario, the task of the inducer is to produce a model with low expected test costs and low expected misclassification costs. [sent-20, score-0.451]
8 A good candidate for achieving the goals of comprehensibility and reduced costs is a decision tree model. [sent-21, score-0.488]
9 Several works proposed learners that consider different misclassification costs [7, 18, 6, 9, 10, 14, 1]. [sent-26, score-0.312]
10 The medical center scenario exemplifies the need for considering both types of cost together: doctors do not perform a test before considering both its cost and its importance to the diagnosis. [sent-30, score-0.61]
11 Minimal Cost trees, a method that attempts to minimize both types of costs simultaneously, has been proposed in [21]. [sent-31, score-0.314]
12 The immediate reduction in total cost that each split would yield is estimated, and the split with the maximal reduction is selected. [sent-33, score-0.303]
13 The cost of a9 and a10, however, is significantly higher than that of the others, but lower than the cost of misclassification. [sent-38, score-0.482]
14 In such a case, even if the costs were uniform, local measures would fail to recognize the relevance of a9 and a10, and other attributes might be preferred. [sent-43, score-0.382]
15 Algorithms that can exploit more time to produce solutions of better quality are called anytime algorithms [5]. [sent-48, score-0.262]
16 One way to exploit additional time when searching for a tree of lower costs is to widen the search space. [sent-49, score-0.473]
17 The ICET algorithm [24] was a pioneer in searching non-greedily for a tree that minimizes both costs together. [sent-52, score-0.422]
18 ICET uses genetic search to produce a new set of costs that reflects both the original costs and the contribution each attribute can make to reduce misclassification costs. [sent-53, score-0.797]
19 Then it builds a tree using the greedy EG2 algorithm but with the evolved costs instead of the original ones. [sent-54, score-0.483]
20 Because ICET assigns costs globally, they will have similar costs as well. [sent-67, score-0.582]
21 In addition ACT is shown to exhibit good anytime behavior with diminishing returns. [sent-76, score-0.257]
22 Note that costs may also be involved during example acquisition [12, 15]. [sent-78, score-0.291]
23 These two stages involve different types of cost [23]. [sent-84, score-0.284]
24 In a problem with |C| different classes, a classification cost matrix M is a |C| × |C| matrix whose Mi,j entry defines the penalty of assigning the class ci to an instance that actually belongs to the class cj . [sent-88, score-0.241]
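To make the cost-matrix notation concrete, here is a small, purely illustrative two-class example (the values and class names are assumptions, not taken from the paper):

```python
# Hypothetical 2-class misclassification cost matrix M.
# M[i][j] is the penalty for assigning class c_i to an instance that
# actually belongs to class c_j; correct predictions cost nothing.
classes = ["sick", "healthy"]
M = [
    [0.0,  50.0],   # predicted "sick":    0 if correct, 50 if the patient is healthy
    [800.0, 0.0],   # predicted "healthy": 800 if the patient is sick, 0 if correct
]
```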
25 To calculate the test costs of a particular case, we sum the cost of the tests along the path from the root to the appropriate leaf. [sent-89, score-0.635]
26 Grouped tests share a common cost that is charged only once per group. [sent-92, score-0.346]
27 Each test also has an extra cost charged when the test is actually made. [sent-93, score-0.357]
28 For example, consider a tree path with tests like cholesterol level and glucose level. [sent-94, score-0.25]
29 Clearly, once blood samples are taken to measure the cholesterol level, the cost for measuring the glucose level is lower. [sent-96, score-0.314]
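As a minimal sketch of how the test cost of a single case could be computed under this scheme (the data structures and names are illustrative assumptions, not the paper's implementation):

```python
def path_test_cost(path_tests, marginal_cost, group_of, group_cost):
    """Sum the costs of the tests along one root-to-leaf path.
    The shared cost of a group (e.g. drawing a blood sample) is charged only
    once; every test still pays its own extra (marginal) cost."""
    total, charged_groups = 0.0, set()
    for test in path_tests:
        group = group_of.get(test)
        if group is not None and group not in charged_groups:
            total += group_cost[group]   # common group cost, charged once
            charged_groups.add(group)
        total += marginal_cost[test]     # extra cost charged per individual test
    return total

# Example: cholesterol and glucose share the cost of taking a blood sample.
marginal_cost = {"age": 1.0, "cholesterol": 7.0, "glucose": 5.0}
group_of = {"cholesterol": "blood", "glucose": "blood"}
group_cost = {"blood": 20.0}
print(path_test_cost(["age", "cholesterol", "glucose"],
                     marginal_cost, group_of, group_cost))  # 1 + (20 + 7) + 5 = 33
```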
30 In this work we do not handle delayed costs but we do explain how to adapt our framework to scenarios that involve them. [sent-104, score-0.348]
31 Having measured the test costs and misclassification costs, an important question is how to combine them. [sent-105, score-0.329]
32 Following [24] we assume that both types of cost are given in the same scale. [sent-106, score-0.264]
33 [19] presented a method to handle the two kinds of cost scales by setting a maximal budget for one kind and minimizing the other. [sent-109, score-0.241]
34 ACT, our proposed anytime framework for induction of cost-sensitive trees, builds on the recently introduced LSID3 algorithm [11]. [sent-110, score-0.254]
35 LSID3 adopts the general top-down induction of decision trees scheme (TDIDT): it starts from the entire set of training examples, partitions it into subsets by testing the value of an attribute, and then recursively builds subtrees. [sent-111, score-0.259]
36 For every candidate split, LSID3 attempts to estimate the size of the subtree that would result were the split to take place, and, following Occam’s razor [4], it favors the one with the smallest expected size. [sent-113, score-0.24]
37 In SID3, rather than choosing an attribute that maximizes the information gain ∆I (as in ID3), the splitting attribute is chosen semi-randomly. [sent-116, score-0.261]
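A sketch of the semi-random choice: a candidate attribute is drawn with probability proportional to its information gain (this ignores SID3's special handling of attributes that yield a pure split, and the names are illustrative):

```python
import random

def sid3_style_choice(candidates, info_gain):
    """Pick the splitting attribute semi-randomly: each attribute is chosen
    with probability proportional to its information gain, so high-gain
    attributes are favored but not selected deterministically."""
    gains = [max(info_gain(a), 0.0) for a in candidates]
    if sum(gains) == 0.0:                 # no informative attribute: pick uniformly
        return random.choice(candidates)
    return random.choices(candidates, weights=gains, k=1)[0]
```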
38 LSID3 was shown to exhibit good anytime behavior with diminishing returns. [sent-122, score-0.257]
39 In ACT, however, we would like to bias our sample towards low-cost trees. [sent-129, score-0.241]
40 For this purpose, we designed a stochastic version of the EG2 algorithm that attempts to build low-cost trees greedily. [sent-130, score-0.377]
41 In EG2, a tree is built top-down, and the attribute that maximizes ICF (the Information Cost Function) is chosen for splitting a node, where ICF(a) = (2^{∆I(a)} − 1) / (cost(a) + 1)^w. [sent-131, score-0.276]
42 If an attribute has already been tested, its cost is zero, and if another attribute that belongs to the same group has been tested, a group discount is applied. [sent-137, score-0.473]
43 The parameter w controls the bias towards lower-cost attributes. [sent-138, score-0.241]
44 While ICET tunes this parameter using genetic search, we set w inversely proportional to the misclassification cost: a high misclassification cost results in a smaller w, reducing the effect of attribute costs. [sent-139, score-0.443]
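A hedged sketch of this split criterion, putting together the ICF formula, the context-dependent attribute cost, and a w that shrinks as the misclassification cost grows; the exact mapping from the misclassification cost to w is an assumption, since the text only states that w is set inversely proportional to it:

```python
def icf(delta_i, eff_cost, w):
    """EG2's Information Cost Function: (2^dI - 1) / (cost + 1)^w."""
    return (2.0 ** delta_i - 1.0) / ((eff_cost + 1.0) ** w)

def effective_cost(attr, tested, marginal_cost, group_of, group_cost):
    """Context-dependent cost: zero if the attribute was already tested; the
    shared group cost is added only if no same-group attribute was tested."""
    if attr in tested:
        return 0.0
    cost = marginal_cost[attr]
    group = group_of.get(attr)
    if group is not None and not any(group_of.get(t) == group for t in tested):
        cost += group_cost[group]
    return cost

def choose_w(mc_cost, base=1.0):
    """Illustrative mapping only: a larger misclassification cost yields a
    smaller w, reducing the influence of attribute costs on the choice."""
    return base / (1.0 + mc_cost)
```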
45 As a cost-insensitive learner, LSID3 aims mainly to maximize the expected accuracy of the learned tree. [sent-142, score-0.31]
46 Following Occam’s razor, it uses the tree size as a preference bias and favors splits that are expected to reduce the final tree size. [sent-143, score-0.325]
47 In a cost-sensitive setup, our goal is to minimize the expected cost of classification. [sent-144, score-0.282]
48 Therefore, given a tree, we need to come up with a procedure that estimates the expected costs when classifying a future case. [sent-147, score-0.332]
49 This cost consists of two components: the test cost and misclassification cost. [sent-148, score-0.52]
50 Assuming that the distribution of future cases is similar to that of the training examples, we can estimate the test costs using the training data. [sent-149, score-0.329]
51 Given a tree, we calculate the average test cost of the training examples and use it to approximate the test cost of new cases. [sent-150, score-0.602]
52 For a tree T and a set of training examples E, we denote the average cost of traversing T for an example from E (average testing cost) by tst-cost(T, E). [sent-151, score-0.443]
53 Note that group discounts and delayed cost penalties do not need special care because they are incorporated when calculating the average test costs. [sent-152, score-0.339]
54 Moreover, tree size is measured in a different currency than accuracy and hence cannot be easily incorporated in the cost function. [sent-156, score-0.4]
55 Assume a problem with |C| classes and a misclassification cost matrix M . [sent-166, score-0.241]
56 The expected misclassification cost in l is mc-cost(l) = EE(m, m − m_c, cf) · (1 / (|C| − 1)) · Σ_{i≠c} M_{c,i} = EE(m, m − m_c, cf) · mc, where the right-most expression assumes a uniform misclassification cost M_{i,j} = mc. The expected error of a tree is the sum of the expected errors in its leaves. [sent-169, score-1.297]
57 The expected total cost of T when classifying an instance is tst-cost(T, E) + (1/m) · Σ_{l∈L} mc-cost(l). [sent-177, score-0.282]
58 Having decided about the sampler and the tree utility function, we are ready to formalize the tree growing phase in ACT. [sent-178, score-0.262]
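A sketch of how this utility could be computed; the expected-error estimator ee stands in for the EE(m, m − m_c, cf) term above and is treated as a supplied function (its exact form, e.g. a C4.5-style pessimistic estimate, is an assumption here):

```python
def expected_total_cost(avg_test_cost, leaves, M, m, ee):
    """tst-cost(T, E) + (1/m) * sum of mc-cost(l) over the leaves.
    Each leaf is (n, n_default, c): its example count, the count of its
    default class c, and c itself; ee(n, errors) is an expected-error estimate."""
    k = len(M)
    mc_total = 0.0
    for n, n_default, c in leaves:
        expected_errors = ee(n, n - n_default)
        # average penalty of assigning class c to an example of another class
        avg_penalty = sum(M[c][i] for i in range(k) if i != c) / (k - 1)
        mc_total += expected_errors * avg_penalty
    return avg_test_cost + mc_total / m
```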
59 The selection procedure, as formalized in Figure 2 (left), needs to be slightly modified when an attribute is numeric: instead of iterating over the values the attribute can take, we examine r cutting points, each evaluated with a single invocation of EG2. [sent-181, score-0.255]
60 A subtree is pruned if the resulting tree is expected to yield a lower error. [sent-187, score-0.257]
61 When test costs are taken into account, pruning has another important role: reducing costs. [sent-188, score-0.363]
62 It is worthwhile to keep a subtree only if its expected reduction in the misclassification cost is larger than the cost of its tests. [sent-189, score-0.628]
63 If the misclassification cost were zero, it would make no sense to keep any split in the tree. [sent-190, score-0.272]
64 Assume that the cost of a in the current context is 1. [sent-196, score-0.241]
65 The estimated cost of a subtree rooted at a is therefore 1 + min(4. [sent-197, score-0.326]
66 If the misclassification cost were very large, we would expect behavior similar to that in the cost-insensitive setup. [sent-202, score-0.268]
67 For each subtree, we compare its expected total cost to that of a leaf. [sent-205, score-0.282]
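A minimal sketch of that comparison under a uniform misclassification cost (the leaf statistics and the error estimator ee are illustrative assumptions):

```python
def should_prune(subtree_total_cost, n_examples, n_errors, mc_cost, ee):
    """Prune when turning the subtree into a leaf is expected to cost no more.
    The subtree's total cost already includes its test costs and its expected
    misclassification cost; a leaf pays no test costs at all."""
    leaf_cost = ee(n_examples, n_errors) * mc_cost
    return leaf_cost <= subtree_total_cost
```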
68 The results are indeed similar, with the basic version of ICET achieving an average cost of 49. [sent-219, score-0.264]
69 Only five UCI datasets, however, have assigned test costs [24]. [sent-225, score-0.329]
70 To gain a wider perspective, we developed an automatic method that assigns costs to existing datasets randomly. [sent-226, score-0.332]
71 The method is parameterized with: (1) cr, the cost range; (2) g, the number of desired groups as a percentage of the number of attributes; and (3) sc, the shared group cost as a percentage of the maximal marginal cost in the group. [sent-227, score-0.723]
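A sketch of one way such a procedure could be realized under this parameterization; the concrete sampling choices (uniform marginal costs, random group membership) are assumptions, since the text does not spell them out:

```python
import random

def assign_random_costs(attributes, cr=(1, 100), g=0.2, sc=0.5, seed=None):
    """Draw a marginal cost for every attribute from the range cr, create
    roughly g * |attributes| groups, and give each non-empty group a shared
    cost equal to sc times the maximal marginal cost of its members."""
    rng = random.Random(seed)
    marginal = {a: rng.uniform(*cr) for a in attributes}
    n_groups = max(1, int(g * len(attributes)))
    group_of, members = {}, {i: [] for i in range(n_groups)}
    for a in attributes:
        gid = rng.randrange(n_groups)
        group_of[a] = gid
        members[gid].append(a)
    group_cost = {gid: sc * max(marginal[a] for a in attrs)
                  for gid, attrs in members.items() if attrs}
    return marginal, group_of, group_cost
```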
72 Using this method we assigned costs to 25 datasets: 21 arbitrarily chosen UCI datasets and 4 datasets that represent hard concepts and have been used in previous research. [sent-228, score-0.37]
73 Two versions of each dataset have been created, both with cost range of 1-100. [sent-230, score-0.241]
74 In total we have 55 datasets: 5 with costs assigned as in [24] and 50 with random costs. [sent-233, score-0.291]
75 Cost-insensitive learning algorithms focus on accuracy and therefore are expected to perform well (footnote 1: the default class is the one that minimizes the misclassification cost in the node). [sent-234, score-0.337]
76 Table 1: Average cost of classification as a percentage of the standard cost of classification. [sent-240, score-0.482]
77 Figure 3: Illustration of the differences in performance between ACT and ICET for misclassification costs (from left to right: 10, 100, 1000, and 10000). [sent-255, score-0.291]
78 The x-axis represents the cost of ICET while the y-axis represents that of ACT. [sent-257, score-0.241]
79 when testing costs are negligible relative to misclassification costs. [sent-260, score-0.318]
80 On the other hand, when testing costs are significant, ignoring them would result in expensive classifiers. [sent-261, score-0.318]
81 Therefore, to evaluate a cost-sensitive learner, a wide spectrum of misclassification costs is needed. [sent-262, score-0.314]
82 For each problem out of the 55, we created 4 instances, with uniform misclassification costs mc = 10, 100, 1000, 10000. [sent-263, score-0.442]
83 To overcome these problems, Turney suggests normalizing the average cost of classification by dividing it by the standard cost, defined as TC + min_i(1 − f_i) · max_{i,j}(M_{i,j}). The standard cost is an approximation of the maximal cost in a given problem. [sent-266, score-0.791]
84 It consists of two components: (1) TC, the cost of performing all tests, and (2) the misclassification cost if the classifier achieves only the baseline accuracy. [sent-267, score-0.482]
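A sketch of this normalization with illustrative variable names:

```python
def normalized_cost(avg_cost, total_test_cost, class_freqs, M):
    """Divide the average cost of classification by the standard cost
    TC + min_i(1 - f_i) * max_{i,j} M[i][j]: the cost of taking every test
    plus the misclassification cost of a baseline (majority-class) classifier."""
    baseline_error = min(1.0 - f for f in class_freqs)
    max_penalty = max(max(row) for row in M)
    return avg_cost / (total_test_cost + baseline_error * max_penalty)
```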
85 Table 1 lists the average results, Figure 3 illustrates the differences between ICET and ACT, and Figure 4 (left) plots the average cost for the different values of mc. [sent-282, score-0.287]
86 For mc set to 1000 and 10000 there are fewer significant wins, yet it is clear that ACT is dominating. (Footnote 4: Seeded ICET includes the true costs in the initial population and was reported to perform better [24].) [sent-287, score-0.442]
87 Average cost as a function of time for Breast-cancer-20 (mid-right) and Multi-XOR-80 (rightmost). [sent-290, score-0.241]
88 The Wilcoxon test indicates that for mc = 10, 100, and 10000, ACT is significantly better than ICET, and that for mc = 1000 no significant winner was found. [sent-292, score-0.321]
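A comparison of this kind can be reproduced with a Wilcoxon signed-rank test over the paired per-dataset costs, e.g. (assuming SciPy is available; the arrays below are placeholders, not the paper's results):

```python
from scipy.stats import wilcoxon

# Paired average costs of ACT and ICET on the same datasets (placeholder values).
act_costs  = [31.2, 40.5, 28.9, 55.1, 47.3]
icet_costs = [35.0, 44.1, 30.2, 54.8, 52.6]
stat, p_value = wilcoxon(act_costs, icet_costs)
print(f"Wilcoxon statistic = {stat:.3f}, p-value = {p_value:.3f}")
```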
89 When misclassification costs are low, an optimal algorithm would produce a very shallow tree. [sent-293, score-0.331]
90 When misclassification costs are dominant, an optimal algorithm would produce a highly accurate tree. [sent-294, score-0.331]
91 Hence, with the increase in the importance of accuracy the normalized cost increases: the predictive errors affect the cost more dramatically. [sent-296, score-0.529]
92 In addition, we compared the studied methods on nonuniform misclassification costs and found ACT’s advantage to be consistent. [sent-311, score-0.326]
93 ACT can use additional time to acquire larger samples and hence achieve better cost estimations. [sent-315, score-0.241]
94 ACT dominates ICET for both domains and is able to produce trees of lower costs in shorter time. [sent-329, score-0.467]
95 Allowing ICET to produce more and more generations (up to 128) does not result in trees comparable to those obtained by ACT. [sent-332, score-0.243]
96 4 Conclusions: Machine learning techniques are increasingly being used to produce a wide range of classifiers for real-world applications that involve nonuniform testing costs and misclassification costs. [sent-333, score-0.413]
97 Our framework has 4 major advantages: (1) it uses a non-greedy approach to build a decision tree and therefore is able to overcome local minima problems, (2) it evaluates entire trees and therefore can be adjusted to any cost scheme that is defined over trees. [sent-336, score-0.568]
98 (3) it exhibits good anytime behavior and produces significantly better trees when more time is available, and (4) it can be easily parallelized and hence can benefit from distributed computer power. [sent-337, score-0.361]
99 ACT also exhibited good anytime behavior: with the increase in time allocation, there was a decrease in the cost of the learned models. [sent-341, score-0.439]
100 Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. [sent-479, score-0.254]
wordName wordTfidf (topN-words)
[('icet', 0.618), ('act', 0.385), ('costs', 0.291), ('misclassi', 0.268), ('cost', 0.241), ('anytime', 0.198), ('mc', 0.151), ('trees', 0.136), ('tree', 0.131), ('attribute', 0.116), ('attributes', 0.091), ('subtree', 0.085), ('generations', 0.067), ('tests', 0.065), ('genetic', 0.059), ('turney', 0.054), ('cf', 0.054), ('cation', 0.05), ('mini', 0.045), ('occam', 0.043), ('expected', 0.041), ('datasets', 0.041), ('charged', 0.04), ('ost', 0.04), ('decision', 0.04), ('israel', 0.04), ('produce', 0.04), ('classi', 0.039), ('resources', 0.039), ('test', 0.038), ('lookahead', 0.038), ('concept', 0.038), ('delayed', 0.037), ('nonuniform', 0.035), ('razor', 0.035), ('pruning', 0.034), ('ei', 0.032), ('diminishing', 0.032), ('builds', 0.032), ('split', 0.031), ('wilcoxon', 0.03), ('uci', 0.03), ('greedy', 0.029), ('splitting', 0.029), ('accuracy', 0.028), ('behavior', 0.027), ('default', 0.027), ('testing', 0.027), ('cholesterol', 0.027), ('doctors', 0.027), ('esmeir', 0.027), ('glucose', 0.027), ('idx', 0.027), ('misclassification', 0.027), ('proportionally', 0.027), ('seeded', 0.027), ('totala', 0.027), ('widen', 0.027), ('ee', 0.027), ('wins', 0.027), ('runtime', 0.026), ('candidate', 0.026), ('minimal', 0.025), ('ijcai', 0.025), ('exploit', 0.024), ('induction', 0.024), ('haifa', 0.023), ('technion', 0.023), ('hoose', 0.023), ('icf', 0.023), ('invocation', 0.023), ('qin', 0.023), ('reimplementation', 0.023), ('wait', 0.023), ('average', 0.023), ('learner', 0.023), ('types', 0.023), ('favors', 0.022), ('allocate', 0.021), ('escape', 0.021), ('foreach', 0.021), ('provost', 0.021), ('examples', 0.021), ('learners', 0.021), ('medical', 0.021), ('ers', 0.021), ('setup', 0.02), ('involve', 0.02), ('outcome', 0.02), ('minima', 0.02), ('worthwhile', 0.02), ('intend', 0.02), ('diagnostic', 0.02), ('estimations', 0.02), ('exempli', 0.02), ('leaf', 0.019), ('cantly', 0.019), ('blood', 0.019), ('winner', 0.019), ('importance', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
2 0.099784374 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
3 0.065964624 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
Author: Stephane Ross, Joelle Pineau, Brahim Chaib-draa
Abstract: Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error. We provide a general theorem showing that these search heuristics are admissible, and lead to complete and ǫ-optimal algorithms. This is, to the best of our knowledge, the strongest theoretical result available for online POMDP solution methods. We also provide empirical evidence showing that our approach is also practical, and can find (provably) near-optimal solutions in reasonable time. 1
4 0.0656358 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
5 0.060202736 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
Author: Francis R. Bach, Zaïd Harchaoui
Abstract: We present a novel linear clustering framework (D IFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.
6 0.057139955 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
7 0.056921609 166 nips-2007-Regularized Boost for Semi-Supervised Learning
8 0.055102818 162 nips-2007-Random Sampling of States in Dynamic Programming
9 0.054968223 197 nips-2007-The Infinite Markov Model
10 0.053443369 39 nips-2007-Boosting the Area under the ROC Curve
11 0.052707266 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
12 0.051165666 113 nips-2007-Learning Visual Attributes
13 0.050962795 32 nips-2007-Bayesian Co-Training
14 0.048638359 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
15 0.044569258 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
16 0.041025225 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
17 0.040608551 16 nips-2007-A learning framework for nearest neighbor search
18 0.040513363 62 nips-2007-Convex Learning with Invariances
19 0.039391287 61 nips-2007-Convex Clustering with Exemplar-Based Models
20 0.037906196 206 nips-2007-Topmoumoute Online Natural Gradient Algorithm
topicId topicWeight
[(0, -0.128), (1, -0.015), (2, -0.047), (3, 0.014), (4, -0.006), (5, 0.005), (6, 0.029), (7, 0.007), (8, 0.04), (9, -0.047), (10, -0.054), (11, 0.101), (12, 0.072), (13, -0.098), (14, -0.014), (15, -0.01), (16, 0.011), (17, -0.043), (18, 0.053), (19, -0.048), (20, -0.017), (21, 0.098), (22, 0.022), (23, 0.1), (24, 0.127), (25, -0.082), (26, -0.054), (27, 0.016), (28, -0.033), (29, 0.018), (30, -0.069), (31, -0.051), (32, -0.01), (33, -0.052), (34, -0.094), (35, -0.1), (36, -0.095), (37, 0.037), (38, 0.054), (39, -0.017), (40, 0.01), (41, 0.018), (42, -0.004), (43, -0.022), (44, 0.083), (45, 0.04), (46, -0.036), (47, 0.024), (48, -0.075), (49, 0.092)]
simIndex simValue paperId paperTitle
same-paper 1 0.95072025 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
2 0.67658919 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
3 0.64256668 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1
4 0.56484038 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
5 0.49679241 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
6 0.48011491 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
7 0.47864434 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
8 0.42825496 119 nips-2007-Learning with Tree-Averaged Densities and Distributions
9 0.40999255 72 nips-2007-Discriminative Log-Linear Grammars with Latent Variables
10 0.3963393 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
11 0.38871485 197 nips-2007-The Infinite Markov Model
12 0.38704413 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
13 0.37756324 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
14 0.34158 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
15 0.33891982 39 nips-2007-Boosting the Area under the ROC Curve
16 0.33069912 171 nips-2007-Scan Strategies for Meteorological Radars
17 0.32112849 99 nips-2007-Hierarchical Penalization
18 0.31342864 206 nips-2007-Topmoumoute Online Natural Gradient Algorithm
19 0.3095156 136 nips-2007-Multiple-Instance Active Learning
20 0.30793446 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
topicId topicWeight
[(5, 0.461), (13, 0.026), (16, 0.016), (21, 0.058), (31, 0.019), (34, 0.018), (35, 0.017), (47, 0.086), (49, 0.017), (83, 0.106), (85, 0.02), (87, 0.018), (90, 0.036)]
simIndex simValue paperId paperTitle
1 0.94587839 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
Author: Charlie Frogner, Avi Pfeffer
Abstract: Dynamic Bayesian networks are structured representations of stochastic processes. Despite their structure, exact inference in DBNs is generally intractable. One approach to approximate inference involves grouping the variables in the process into smaller factors and keeping independent beliefs over these factors. In this paper we present several techniques for decomposing a dynamic Bayesian network automatically to enable factored inference. We examine a number of features of a DBN that capture different types of dependencies that will cause error in factored inference. An empirical comparison shows that the most useful of these is a heuristic that estimates the mutual information introduced between factors by one step of belief propagation. In addition to features computed over entire factors, for efficiency we explored scores computed over pairs of variables. We present search methods that use these features, pairwise and not, to find a factorization, and we compare their results on several datasets. Automatic factorization extends the applicability of factored inference to large, complex models that are undesirable to factor by hand. Moreover, tests on real DBNs show that automatic factorization can achieve significantly lower error in some cases. 1
same-paper 2 0.9033463 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
3 0.89910728 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: It has been shown that adapting a dictionary of basis functions to the statistics of natural images so as to maximize sparsity in the coefficients results in a set of dictionary elements whose spatial properties resemble those of V1 (primary visual cortex) receptive fields. However, the resulting sparse coefficients still exhibit pronounced statistical dependencies, thus violating the independence assumption of the sparse coding model. Here, we propose a model that attempts to capture the dependencies among the basis function coefficients by including a pairwise coupling term in the prior over the coefficient activity states. When adapted to the statistics of natural images, the coupling terms learn a combination of facilitatory and inhibitory interactions among neighboring basis functions. These learned interactions may offer an explanation for the function of horizontal connections in V1 in terms of a prior over natural images.
4 0.88436162 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
Author: Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis
Abstract: An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the “expressiveness” of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
5 0.59187889 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high dimensional measurements of more than thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible. 1
6 0.52906519 7 nips-2007-A Kernel Statistical Test of Independence
7 0.52517331 115 nips-2007-Learning the 2-D Topology of Images
8 0.51807839 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
9 0.51724517 146 nips-2007-On higher-order perceptron algorithms
10 0.51103407 96 nips-2007-Heterogeneous Component Analysis
11 0.50831699 187 nips-2007-Structured Learning with Approximate Inference
12 0.49845493 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
13 0.49807903 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
14 0.49311531 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
15 0.49054477 49 nips-2007-Colored Maximum Variance Unfolding
16 0.486137 164 nips-2007-Receptive Fields without Spike-Triggering
17 0.48330623 158 nips-2007-Probabilistic Matrix Factorization
18 0.48089811 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
19 0.48009738 87 nips-2007-Fast Variational Inference for Large-scale Internet Diagnosis
20 0.47978693 113 nips-2007-Learning Visual Attributes