nips nips2007 nips2007-78 knowledge-graph by maker-knowledge-mining

78 nips-2007-Efficient Principled Learning of Thin Junction Trees


Source: pdf

Author: Anton Chechetka, Carlos Guestrin

Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. [sent-3, score-0.751]

2 An attractive solution is to use junction trees (JTs) of limited treewidth – a subclass of PGMs that permits efficient exact inference. [sent-23, score-0.856]

3 For treewidth k = 1 (trees), the most likely (MLE) structure of a junction tree can be learned efficiently using the Chow-Liu algorithm [6], but the representational power of trees is often insufficient. [sent-24, score-1.092]
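The Chow-Liu step mentioned here is easy to make concrete: estimate pairwise mutual information from data and take a maximum-weight spanning tree over it. The sketch below is a minimal Python illustration, not the authors' code; the `samples` array (discrete data, one row per datapoint) is an assumed input format.

```python
# Minimal Chow-Liu sketch (the treewidth-1 case referenced above).
# Assumes discrete data as an integer numpy array `samples` of shape (N, n).
import itertools
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(x)
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = {a: np.mean(x == a) for a in set(x)}
    py = {b: np.mean(y == b) for b in set(y)}
    mi = 0.0
    for (a, b), c in joint.items():
        pab = c / n
        mi += pab * np.log(pab / (px[a] * py[b]))
    return mi

def chow_liu_tree(samples):
    """Return the edges of a maximum-MI spanning tree (Prim's algorithm)."""
    n_vars = samples.shape[1]
    weights = {(i, j): mutual_information(samples[:, i], samples[:, j])
               for i, j in itertools.combinations(range(n_vars), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        # Greedily attach the outside variable with the strongest MI link.
        (i, j) = max(((i, j) for (i, j) in weights
                      if (i in in_tree) != (j in in_tree)),
                     key=lambda e: weights[e])
        edges.append((i, j))
        in_tree.update((i, j))
    return edges

# Example usage on toy data:
rng = np.random.default_rng(0)
samples = rng.integers(0, 2, size=(500, 5))
print(chow_liu_tree(samples))
```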

4 We address the problem of learning JTs for fixed treewidth k > 1. [sent-25, score-0.28]

5 While there are algorithms with global guarantees for learning fixed-treewidth JTs [10, 13], there has been no polynomial algorithm with PAC guarantees. [sent-27, score-0.145]

6 In contrast, we provide a truly polynomial algorithm with PAC guarantees. [sent-31, score-0.113]

7 The contributions of this paper are as follows: • A theoretical result (Lemma 4) that upper bounds the conditional mutual information of arbitrarily large sets of random variables in polynomial time. [sent-32, score-0.239]

8 In particular, we do not assume that an efficiently computable mutual information oracle exists. [sent-33, score-0.22]

9 • The first polynomial algorithm for PAC-learning the structure of limited-treewidth junction trees with strong intra-clique dependencies. [sent-34, score-0.638]

10 We provide graceful degradation guarantees for distributions that are only approximately representable by JTs with fixed treewidth. [sent-35, score-0.16]

11 Algorithm 1 (naïve approach to structure learning). Input: V, oracle I (·, · | ·), treewidth k, threshold δ. Line 1: L ← ∅ // L is a set of “useful components”; line 2: for S ⊂ V s.t. [sent-36, score-0.421]

12 |S| = k do; line 3: for Q ⊂ V-S do; line 4: if I (Q, V-SQ | S) ≤ δ then; line 5: L ← L ∪ (S, Q). (Figure 1: a junction tree; clique labels read from the extracted figure: {x1,x4,x5}, {x4,x5,x6}, {x1,x2,x5}, {x2,x3,x5}, {x1,x2,x7}, with separators {x4,x5}, {x1,x5}, {x2,x5}, {x1,x2}.) [sent-38, score-0.458]

13 Line 6: return FindConsistentTree(L). • A lazy heuristic that makes the algorithm practical. [sent-40, score-0.12]
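Restated as code, Algorithm 1 (sentences 11–13 above) enumerates every candidate separator S of size k and every subset Q of the remaining variables, keeping the pairs whose conditional mutual information falls below δ. The sketch below assumes an externally supplied oracle `cmi(A, B, S)` for I(A, B | S); FindConsistentTree is only named here, so it is left out.

```python
# Sketch of Algorithm 1 (naive structure learning) as described above.
# `cmi(A, B, S)` is an assumed conditional-mutual-information oracle I(A, B | S).
import itertools

def naive_structure_learning(variables, cmi, k, delta):
    """Collect all (separator, component) pairs with I(Q, V-S-Q | S) <= delta."""
    V = set(variables)
    L = []  # list of "useful components"
    for S in itertools.combinations(sorted(V), k):
        rest = V - set(S)
        # Enumerate every nonempty proper subset Q of the remaining variables
        # (exponentially many -- this is exactly why Alg. 1 is impractical).
        for r in range(1, len(rest)):
            for Q in itertools.combinations(sorted(rest), r):
                other = rest - set(Q)
                if cmi(set(Q), other, set(S)) <= delta:
                    L.append((set(S), set(Q)))
    return L  # FindConsistentTree(L) would then assemble a junction tree
```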

14 (Section 2: Bounded-treewidth graphical models.) In general, even to represent a probability distribution P (V ) over discrete variables V we need space exponential in the size n of V . [sent-42, score-0.312]

15 However, junction trees of limited treewidth allow compact representation and tractable exact inference. [sent-43, score-0.903]

16 We briefly review junction trees (for details see [7]). [sent-44, score-0.522]

17 Let T be a set of edges connecting pairs of cliques such that (T, C) is a tree. [sent-50, score-0.13]

18 Tree (T, C) is a junction tree iff it satisfies the running intersection property (RIP): ∀Ci , Cj ∈ C and ∀Ck on the (unique) simple path between Ci and Cj , x ∈ Ci ∩ Cj ⇒ x ∈ Ck . [sent-52, score-0.76]

19 A set Sij ≡ Ci ∩ Cj is called the separator corresponding to an edge (i−j) from T . [sent-53, score-0.225]

20 The size of a largest clique in a junction tree minus one is called the treewidth of that tree. [sent-54, score-1.134]

21 In Fig. 1, variable x2 is contained in both cliques 3 and 5, so it has to be contained in clique 2, because 2 is on the simple path between 3 and 5. [sent-56, score-0.324]

22 The largest clique in Fig. 1 has size 3, so the treewidth of that junction tree is 2. [sent-58, score-0.972]
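Both the running intersection property and the treewidth can be checked mechanically. The sketch below does so for the Figure 1 junction tree, using one consistent reading of the clique labels quoted in the Figure 1 caption above; the dictionary-plus-edge-list representation and the clique numbering are assumptions made for illustration.

```python
# Sketch: verify the running intersection property (RIP) and compute the
# treewidth of a clique tree given as {clique_id: frozenset_of_variables}
# plus a tree edge list. Representation and numbering are illustrative.

def tree_path(edges, start, goal):
    """Return the unique simple path between two nodes of a tree (DFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return []

def satisfies_rip(cliques, edges):
    """RIP: every clique on the path between Ci and Cj contains Ci ∩ Cj."""
    ids = list(cliques)
    for i in ids:
        for j in ids:
            if i == j:
                continue
            shared = cliques[i] & cliques[j]
            for k in tree_path(edges, i, j):
                if not shared <= cliques[k]:
                    return False
    return True

def treewidth(cliques):
    """Largest clique size minus one."""
    return max(len(c) for c in cliques.values()) - 1

# One reading of the Figure 1 junction tree (numbering is our own):
cliques = {1: frozenset({'x1', 'x4', 'x5'}), 2: frozenset({'x1', 'x2', 'x5'}),
           3: frozenset({'x1', 'x2', 'x7'}), 4: frozenset({'x4', 'x5', 'x6'}),
           5: frozenset({'x2', 'x3', 'x5'})}
edges = [(1, 2), (1, 4), (2, 3), (2, 5)]
print(satisfies_rip(cliques, edges), treewidth(cliques))  # True 2
```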

23 A distribution P (V ) is representable using junction tree (T, C) if instantiating all variables in a separator Sij renders the variables on different sides of Sij independent. [sent-59, score-1.081]

24 Let C^i_ij be the set of cliques that can be reached from Ci in (T, C) without using edge (i−j), and denote these reachable variables by V^i_ij ≡ V^i_ji ≡ (∪_{Ck ∈ C^i_ij} Ck) \ Sij. [sent-61, score-0.184]

25 P (V ) factors according to junction tree (T, C) iff ∀(i − j) ∈ T : V^i_ij ⊥ V^j_ij | Sij. [sent-65, score-0.76]

26 If a distribution P (V ) factors according to some junction tree of treewidth k, we will say that P (V ) is k-JT representable. [sent-66, score-0.972]

27 For clarity, we will only consider maximal junction trees, where all separators have size k. [sent-68, score-0.485]

28 If P is k-JT representable, it also factors according to some maximal JT of treewidth k. [sent-69, score-0.28]

29 Instead, a natural relaxation is to require sets of variables to have low conditional mutual information I. [sent-71, score-0.153]

30 (T, C) is an ε-junction tree for P (V ) iff ∀(i − j) ∈ T : I (V^i_ij, V^j_ij | Sij) ≤ ε. [sent-75, score-0.302]

31 If there exists an ε-junction tree (T, C) for P (V ), we will say that P is k-JT ε-representable. [sent-77, score-0.262]

32 This bound (Eq. 2 in the paper) means that if we have an ε-junction tree for P (V ), then instead of P we can use its tractable principled approximation P_(T,C) for inference. [sent-79, score-0.268]
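The projection P_(T,C) is the usual junction-tree factorization: a product of clique marginals divided by separator marginals. A minimal sketch using empirical marginals follows; the `samples` array and the clique/separator index tuples are assumed formats, and the toy chain at the end is hypothetical.

```python
# Sketch of the projection P_(T,C): product of empirical clique marginals
# divided by empirical separator marginals. `samples` is an assumed integer
# array of shape (N, n); cliques/separators are tuples of column indices.
import numpy as np
from collections import Counter

def empirical_marginal(samples, vars_idx):
    """Empirical joint distribution over the given columns, as a dict."""
    counts = Counter(tuple(row) for row in samples[:, list(vars_idx)])
    total = samples.shape[0]
    return {assignment: c / total for assignment, c in counts.items()}

def jt_probability(samples, cliques, separators, assignment):
    """Evaluate P_(T,C)(assignment) = prod P(C_i) / prod P(S_ij)."""
    p = 1.0
    for c in cliques:                       # one factor per clique
        marg = empirical_marginal(samples, c)
        p *= marg.get(tuple(assignment[i] for i in c), 0.0)
    for s in separators:                    # divide by each separator marginal
        marg = empirical_marginal(samples, s)
        denom = marg.get(tuple(assignment[i] for i in s), 0.0)
        if denom == 0.0:
            return 0.0
        p /= denom
    return p

# Toy usage: a chain junction tree over 3 binary variables with cliques
# {0,1}, {1,2} and separator {1} (hypothetical example).
rng = np.random.default_rng(1)
samples = rng.integers(0, 2, size=(1000, 3))
print(jt_probability(samples, cliques=[(0, 1), (1, 2)], separators=[(1,)],
                     assignment=[0, 1, 0]))
```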

33 In this paper, we address the problem of learning the structure of such a junction tree from data (samples from P ). [sent-80, score-0.722]

34 (Section 3: Structure learning.) In this paper, we address the following problem: given data, such as multiple temperature readings from sensors in a sensor network, we treat each datapoint as an instantiation of the random variables V and seek to find a good approximation of P (V ). [sent-81, score-0.135]

35 We will assume that P (V ) is k-JT ε-representable for some ε and aim to find an ε̂-junction tree for P with the same treewidth k and with ε̂ as small as possible. [sent-82, score-0.514]

36 Note that the maximal treewidth k is considered to be a constant and not a part of the problem input. [sent-83, score-0.28]

37 Let us initially assume that we have an oracle I (·, · | ·) that can compute the mutual information I (A, B | C) exactly for any disjoint subsets A, B, C ⊂ V . (Algorithm 2, LTCI: find Conditional Independencies in Low-Treewidth distributions. Input: V , separator S, oracle I (·, · | ·), …) [sent-85, score-0.577]

38 Using the oracle I, a naïve approach would be to evaluate I(Q, V-QS | S) for all Q ⊂ V-S. (Algorithm 2, line 1: QS ← ∪x∈V {x} // QS is a set of singletons; line 2: for A ⊂ V-S s.t. …) [sent-87, score-0.111]

39 We will say that a junction tree (T, C) is consistent with a list L iff for every separator Sij of (T, C) it holds that (Sij, V^i_ij) ∈ L. (Algorithm 2, line 5: return QS.) [sent-95, score-0.807]
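Piecing the extracted lines of Algorithm 2 (LTCI) back together, the procedure starts from singleton components of V-S and merges the components touched by any small set A whose variables cannot be split cheaply given S, i.e. whose minimum over partitions of I(X, A-X | S) exceeds δ. The sketch below follows that reading; the subset-size limit `q` and the oracle `cmi` are assumed inputs, and details may differ from the paper's exact pseudocode.

```python
# Sketch of LTCI (Algorithm 2) as reconstructed from the text: maintain a
# partition Q_S of V - S, and merge the components touched by any small set A
# whose variables cannot be separated cheaply given S. `cmi(A, B, S)` is an
# assumed oracle for I(A, B | S); q bounds the size of the tested sets.
import itertools

def ltci(variables, S, cmi, q, delta):
    V_minus_S = [v for v in variables if v not in S]
    components = [{v} for v in V_minus_S]        # Q_S starts as singletons

    def min_partition_cmi(A):
        """min over nonempty proper subsets X of A of I(X, A - X | S)."""
        A = set(A)
        best = float('inf')
        for r in range(1, len(A)):
            for X in itertools.combinations(sorted(A), r):
                best = min(best, cmi(set(X), A - set(X), set(S)))
        return best

    for size in range(2, q + 1):
        for A in itertools.combinations(V_minus_S, size):
            if min_partition_cmi(A) > delta:
                # A acts as a hyper-edge: merge all components it touches.
                touched = [c for c in components if c & set(A)]
                untouched = [c for c in components if not (c & set(A))]
                components = untouched + [set().union(*touched)]
    return components
```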

40 After L is formed, any junction tree consistent with L would be an ε-junction tree for P (V ). [sent-96, score-0.926]

41 Such a tree would be found by some FindConsistentTree procedure, implemented, e.g., … [sent-97, score-0.234]

42 These algorithms use mutual information tests to constrain the set of possible structures and return one that is consistent with the constraints. [sent-103, score-0.183]

43 Applying Alg. 1 directly is impractical because its complexity is exponential in the total number of variables n. [sent-105, score-0.115]

44 First, for each separator we need to call the oracle exponentially many times (2^{n−k−1}, once for every Q ⊂ V-S). [sent-111, score-0.311]

45 Second, the mutual information oracle, I (A, B | S), is called on subsets A and B of size O(n). [sent-113, score-0.155]

46 Unfortunately, the best known way of computing mutual information (and estimating I from data) has time and sample complexity exponential in |A|+|B|+|S|. [sent-114, score-0.18]
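To make the cost concrete, a plug-in estimator of I(A, B | S) for discrete data is sketched below; it builds a joint table over A ∪ B ∪ S, which is exactly where the exponential dependence on |A| + |B| + |S| comes from. The integer `samples` array format is an assumption.

```python
# Sketch: plug-in estimator of conditional mutual information I(A, B | S)
# from discrete samples (shape (N, n) integer array). The joint table it
# builds has one cell per configuration of A ∪ B ∪ S, which is the source
# of the exponential time/sample cost discussed above.
from collections import Counter
import numpy as np

def conditional_mi(samples, A, B, S):
    n = samples.shape[0]
    key = lambda row, idx: tuple(row[i] for i in idx)
    c_abs = Counter((key(r, S), key(r, A), key(r, B)) for r in samples)
    c_as = Counter((key(r, S), key(r, A)) for r in samples)
    c_bs = Counter((key(r, S), key(r, B)) for r in samples)
    c_s = Counter(key(r, S) for r in samples)
    mi = 0.0
    for (s, a, b), c in c_abs.items():
        # p(a,b,s) * log[ p(a,b,s) p(s) / (p(a,s) p(b,s)) ], all from counts.
        mi += (c / n) * np.log((c * c_s[s]) / (c_as[(s, a)] * c_bs[(s, b)]))
    return mi

# Usage: I({0}, {1} | {2}) on toy binary data.
rng = np.random.default_rng(2)
data = rng.integers(0, 2, size=(2000, 3))
print(conditional_mi(data, A=[0], B=[1], S=[2]))
```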

47 Our first new result states that we can limit ourselves to computing mutual information over small subsets of variables: Lemma 4. [sent-117, score-0.155]

48 …(i.e., polynomially many) calls to the oracle I (·, · | ·), and each call will involve at most |S| + k + 1 variables. [sent-125, score-0.111]

49 Lemma 4 also bounds the quality of approximation of P by a projection on any junction tree (T, C): Corollary 5. [sent-126, score-0.692]

50 If the conditions of Lemma 4 hold for P (V ) with S = Sij and A = V^i_ij for every separator Sij of a junction tree (T, C), then (T, C) is an n(ε + δ)-junction tree for P (V ). [sent-127, score-1.126]

51 Algorithm 3 (efficient approach to structure learning). Input: V , oracle I (·, · | ·), treewidth k, threshold ε; L = ∅. Line 1: for S ⊂ V s.t. [sent-132, score-0.421]

52 …(Alg. 2), in contrast, has polynomial complexity for q = O(1). [sent-140, score-0.125]

53 To gain intuition for LTCI, suppose there exists an ε-junction tree for P (V ), such that S is a separator and subsets B and C are on different sides of S in the junction tree. [sent-142, score-1.0]

54 If none of the possible partitionings have I (X, A-X | S) ≤ ε, we can conclude that all variables in A are on the same side of separator S in any ε-junction tree that includes S as a separator. [sent-145, score-0.478]

55 It is important that the partitioning algorithm returns partitions that are similar to the connected components V^i_ij of the true junction tree for P (V ). [sent-153, score-0.84]

56 Suppose (T, C) is an ε-junction tree for P (V ), and QSij is an output of the algorithm for separator Sij and threshold δ. [sent-155, score-0.434]

57 For small α, an α-weak algorithm puts variables on different sides of a separator only if corresponding mutual information between those variables is not too large. [sent-159, score-0.431]

58 Ideally, we want a correct and δ-weak algorithm; for δ = ε it would separate variables that are on different sides of S in a true junction tree, but not introduce any spurious independencies. [sent-160, score-0.536]

59 Let Sij be a separator in (T, C) and C^i_ij be the set of cliques reachable from Ci without using edge (i − j). [sent-170, score-0.34]

60 Denote by T^i_ij the set of edges from T that connect cliques from C^i_ij. [sent-171, score-0.13]

61 If (T, C) is an ε-junction tree for P (V ), then (C^i_ij, T^i_ij) is an ε-junction tree for P (V^i_ij ∪ Sij). [sent-172, score-0.52]

62 Moreover, the subtree (C^i_ij, T^i_ij) consists of a clique Ci and several sub-subtrees that are each connected to Ci. [sent-173, score-0.284]

63 In Fig. 1, the subtree over cliques 1, 2, 4, 5 can be decomposed into clique 2 and two sub-subtrees: one including cliques {1, 4} and one with clique 5. [sent-175, score-0.537]

64 …each subcomponent can be connected directly to the clique (S, x); 3. [sent-184, score-0.249]

65 …8 holds, return the decomposition D; otherwise, return that no decomposition exists. [sent-196, score-0.18]

66 For separator size k, the time complexity of FindConsistentTreeDPGreedy is O(n^{k+2}). Combining Alg. [sent-199, score-0.239]

67 In general, FindConsistentTreeDP with greedy decomposition checks may miss a junction tree that is consistent with the list of components L, but there is a class of distributions for which Alg. [sent-205, score-0.79]

68 Intuitively, we require that for every (Sij, V^i_ij) from an ε-junction tree (T, C), Alg. [sent-207, score-0.234]

69 This requirement is guaranteed for distributions where variables inside every clique of the junction tree are sufficiently strongly interdependent (have a certain level of mutual information): Lemma 10. [sent-209, score-1.066]

70 …no two edges of T have the same separator, and for every separator S and clique C ∈ C, min_{X⊂C-S} I (X, C-XS | S) > (k + 3)ε (we will call (T, C) (k + 3)ε-strongly connected), then Alg. [sent-212, score-0.403]

71 (Section 4: Sample complexity.) So far we have assumed that a mutual information oracle I (·, · | ·) exists for the distribution P (V ) and can be efficiently queried. [sent-214, score-0.287]

72 If we employ this oracle in our algorithms, the performance guarantee becomes probabilistic: Theorem 12. [sent-221, score-0.136]

73 If there exists a (k + 3)(ε + 2∆)-strongly connected ε-junction tree for P (V ), then Alg. [sent-222, score-0.349]

74 …11, using U ≡ F (k, R, ∆, n^{2k+2}) samples and O(n^{2k+3} U) time, will find a kn(ε+2∆)-junction tree for P (V ) with probability at least (1−γ). [sent-224, score-0.234]

75 …ε = 0), and the corresponding junction tree is strongly connected, then we can let both ∆ and γ go to zero and use Alg. [sent-227, score-0.139]

76 3 to find, with probability arbitrarily close to one, a junction tree that approximates P arbitrarily well, in time polynomial in 1/∆ and log(1/γ), i.e. [sent-228, score-0.778]

77 …the class of strongly connected k-junction trees is probably approximately correctly learnable. [sent-230, score-0.182]

78 If there exists an α-strongly connected junction tree for P (V ) with α > 0, then for β < αn, Alg. [sent-233, score-0.807]

79 3 will learn a β-junction tree for P with probability at least 1 − γ using O((n²/β⁴) log² n · log n) samples from P (V ) and O((n^{2k+7}/β²) log² n · log n) computation time. [sent-234, score-0.234]

80 Alg. 3 is guaranteed to find a junction tree (with all cliques connected to the same separator). [sent-242, score-0.896]

81 …2 as a set of connected components of a graph with variables as vertices, and a hyper-edge connecting all variables from A whenever min_{X⊂A} I (X, A-X | S) > δ. [sent-248, score-0.201]

82 As δ increases, some of the hyper-edges disappear, and the number of connected components (or independent sets) may increase. [sent-249, score-0.113]

83 More specifically, a graph QS is maintained for each separator S. [sent-250, score-0.2]

84 This issue is addressed by our second insight: If we find a junction tree for a particular value of δ, we only need to recheck the components used in this tree. [sent-261, score-0.746]

85 These insights lead to a simple, lazy procedure: If FindConsistentTree returns a tree (T, C), we check the hyper-edges that intersect the components used to form (T, C). [sent-262, score-0.333]
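The lazy control flow described here can be outlined roughly as follows; this is a loose sketch only, since the extracted text does not pin down the interfaces, so `find_consistent_tree`, `components_used_by`, and `recheck_hyperedges` are assumed callables.

```python
# Loose sketch of the lazy procedure described above: relax the threshold
# delta step by step, and when FindConsistentTree proposes a tree, recheck
# only the hyper-edges that intersect the components the tree actually uses.
# find_consistent_tree, components_used_by and recheck_hyperedges are assumed
# callables; L and the returned violations are assumed to be sets of
# (separator, component) pairs.
def lazy_structure_search(initial_L, delta_schedule, find_consistent_tree,
                          components_used_by, recheck_hyperedges):
    L = set(initial_L)
    for delta in delta_schedule:          # gradually relax the threshold
        tree = find_consistent_tree(L)
        if tree is None:
            continue                      # no candidate tree at this delta yet
        used = components_used_by(tree)   # only these components get rechecked
        violations = recheck_hyperedges(used, delta)
        if not violations:
            return tree                   # candidate survives the lazy checks
        L -= violations                   # drop invalidated components, retry
    return None
```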

86 This discrete-valued data was sampled from a known Bayesian network with treewidth 4. [sent-272, score-0.28]

87 We learned models with treewidth 3 because of computational concerns. [sent-273, score-0.306]

88 Chow-Liu performs much worse, since it is limited to models with treewidth 1. [sent-280, score-0.305]

89 Hill-climbing had two kinds of moves available: replace variable x with variable y in a connected sub-junction tree, or replace a leaf clique Ci with another clique (Ci \ Sij) ∪ Smr connected to a separator Smr. [sent-284, score-0.698]

90 In 2(c), nodes denote variables and edges connect variables that belong to the same clique; green edges belong to both the true and learned models, blue edges belong only to the learned model, and red edges only to the true one. [sent-295, score-0.219]

91 Each variable was discretized into 4 bins and we learned models of treewidth 2. [sent-298, score-0.306]

92 Since the locations of the sensors have an ∞-like shape with two loops, the problem of learning a thin junction tree for this data is hard. [sent-299, score-0.762]

93 We selected 32 locations in San Francisco Bay area for the experiments, discretized traffic flow values into 4 bins and learned models of treewidth 3. [sent-307, score-0.306]

94 Unlike our approach, [1] does not guarantee low treewidth of the result, instead settling for compactness. [sent-312, score-0.305]

95 Our approach has polynomial complexity and quality guarantees that hold for strongly connected k-JT ε-representable distributions, while those of [13] only hold for ε = 0. [sent-319, score-0.302]

96 We have presented the first truly polynomial algorithm for learning junction trees with limited treewidth. [sent-320, score-0.66]

97 Based on a new upper bound for conditional mutual information that can be computed using polynomial time and number of samples, our algorithm is guaranteed to find a junction tree that is close in KL divergence to the true distribution, for strongly connected k-JT ε-representable distributions. [sent-321, score-1.033]

98 As a special case of these guarantees, we show PAC-learnability of strongly connected k-JT representable distributions. [sent-322, score-0.185]

99 “Tractable” means that the result is guaranteed to be of limited treewidth; “compact” means limited connectivity of the graph. [sent-328, score-0.12]

100 The † superscript means per-iteration complexity; “poly” means O(n^{O(k)}); “exp‡” means exponential in general, but poly for special cases. [sent-331, score-0.256]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('junction', 0.458), ('lpacjt', 0.319), ('treewidth', 0.28), ('tree', 0.234), ('qs', 0.234), ('sij', 0.224), ('vij', 0.218), ('separator', 0.2), ('clique', 0.162), ('pac', 0.139), ('ltci', 0.124), ('findconsistenttree', 0.117), ('poly', 0.112), ('oracle', 0.111), ('mutual', 0.109), ('ci', 0.108), ('cliques', 0.089), ('obs', 0.088), ('connected', 0.087), ('polynomial', 0.086), ('qi', 0.085), ('queyranne', 0.084), ('minx', 0.08), ('lazy', 0.073), ('iff', 0.068), ('findconsistenttreedpgreedy', 0.067), ('jts', 0.067), ('llh', 0.067), ('representable', 0.067), ('trees', 0.064), ('alarm', 0.062), ('guarantees', 0.059), ('karger', 0.058), ('srebro', 0.053), ('chow', 0.053), ('temperature', 0.053), ('fs', 0.051), ('hyperedges', 0.05), ('pgms', 0.05), ('qsij', 0.05), ('jt', 0.05), ('return', 0.047), ('subsets', 0.046), ('variables', 0.044), ('submodular', 0.044), ('tij', 0.044), ('decomposition', 0.043), ('compact', 0.042), ('edges', 0.041), ('complexity', 0.039), ('cj', 0.038), ('sensor', 0.038), ('subtree', 0.035), ('tractable', 0.034), ('sides', 0.034), ('graceful', 0.034), ('logr', 0.034), ('narasimhan', 0.034), ('teyssier', 0.034), ('traffic', 0.034), ('evolution', 0.033), ('score', 0.033), ('exponential', 0.032), ('abbeel', 0.032), ('traf', 0.032), ('guestrin', 0.032), ('thin', 0.032), ('strongly', 0.031), ('partitioning', 0.031), ('mi', 0.03), ('partitions', 0.03), ('structure', 0.03), ('mle', 0.029), ('checks', 0.029), ('subclass', 0.029), ('smr', 0.029), ('bay', 0.029), ('ck', 0.029), ('loglikelihood', 0.029), ('liu', 0.029), ('addressed', 0.028), ('exists', 0.028), ('si', 0.028), ('guaranteed', 0.028), ('local', 0.027), ('truly', 0.027), ('tests', 0.027), ('separators', 0.027), ('nathan', 0.027), ('viability', 0.027), ('learned', 0.026), ('kl', 0.026), ('ij', 0.026), ('components', 0.026), ('nk', 0.026), ('lemma', 0.025), ('limited', 0.025), ('guarantee', 0.025), ('edge', 0.025), ('evaluations', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 78 nips-2007-Efficient Principled Learning of Thin Junction Trees

Author: Anton Chechetka, Carlos Guestrin

Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1

2 0.12071918 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu

Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1

3 0.11343576 116 nips-2007-Learning the structure of manifolds using random projections

Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma

Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1

4 0.096580639 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging

Author: Tim V. Erven, Steven D. Rooij, Peter Grünwald

Abstract: Bayesian model averaging, model selection and their approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods. Based on this analysis we define the switch-distribution, a modification of the Bayesian model averaging distribution. We prove that in many situations model selection and prediction based on the switch-distribution is both consistent and achieves optimal convergence rates, thereby resolving the AIC-BIC dilemma. The method is practical; we give an efficient algorithm. 1

5 0.092054531 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis

Author: Venkat Chandrasekaran, Alan S. Willsky, Jason K. Johnson

Abstract: We consider the estimation problem in Gaussian graphical models with arbitrary structure. We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. Our analysis is based on the recently developed walk-sum interpretation of Gaussian estimation. We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. 1

6 0.074572153 197 nips-2007-The Infinite Markov Model

7 0.07008414 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs

8 0.066050708 61 nips-2007-Convex Clustering with Exemplar-Based Models

9 0.059864566 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

10 0.058198199 105 nips-2007-Infinite State Bayes-Nets for Structured Domains

11 0.057139955 27 nips-2007-Anytime Induction of Cost-sensitive Trees

12 0.052141294 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs

13 0.050901718 174 nips-2007-Selecting Observations against Adversarial Objectives

14 0.047762815 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching

15 0.046926141 97 nips-2007-Hidden Common Cause Relations in Relational Learning

16 0.04665662 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

17 0.045102201 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

18 0.044166464 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations

19 0.043172132 110 nips-2007-Learning Bounds for Domain Adaptation

20 0.042423528 42 nips-2007-CPR for CSPs: A Probabilistic Relaxation of Constraint Propagation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.151), (1, -0.016), (2, -0.056), (3, -0.012), (4, -0.027), (5, -0.104), (6, 0.032), (7, 0.065), (8, 0.046), (9, -0.069), (10, -0.082), (11, 0.109), (12, 0.086), (13, -0.126), (14, -0.008), (15, -0.026), (16, 0.047), (17, -0.023), (18, 0.023), (19, -0.043), (20, 0.05), (21, 0.027), (22, 0.019), (23, 0.052), (24, 0.179), (25, -0.029), (26, -0.048), (27, 0.008), (28, -0.021), (29, -0.029), (30, -0.12), (31, 0.028), (32, -0.081), (33, 0.0), (34, -0.151), (35, -0.015), (36, -0.059), (37, -0.009), (38, -0.047), (39, -0.069), (40, -0.068), (41, 0.069), (42, -0.049), (43, 0.04), (44, 0.024), (45, -0.02), (46, 0.046), (47, -0.082), (48, 0.013), (49, -0.045)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9424727 78 nips-2007-Efficient Principled Learning of Thin Junction Trees

Author: Anton Chechetka, Carlos Guestrin

Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1

2 0.70422095 116 nips-2007-Learning the structure of manifolds using random projections

Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma

Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1

3 0.66988432 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis

Author: Venkat Chandrasekaran, Alan S. Willsky, Jason K. Johnson

Abstract: We consider the estimation problem in Gaussian graphical models with arbitrary structure. We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. Our analysis is based on the recently developed walk-sum interpretation of Gaussian estimation. We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. 1

4 0.66251463 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu

Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1

5 0.64320976 27 nips-2007-Anytime Induction of Cost-sensitive Trees

Author: Saher Esmeir, Shaul Markovitch

Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.

6 0.6079483 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

7 0.55245233 16 nips-2007-A learning framework for nearest neighbor search

8 0.54478151 197 nips-2007-The Infinite Markov Model

9 0.50593644 119 nips-2007-Learning with Tree-Averaged Densities and Distributions

10 0.46828526 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

11 0.41591281 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs

12 0.39856631 159 nips-2007-Progressive mixture rules are deviation suboptimal

13 0.38602674 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs

14 0.38453412 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging

15 0.37015843 72 nips-2007-Discriminative Log-Linear Grammars with Latent Variables

16 0.36961043 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

17 0.33711562 141 nips-2007-New Outer Bounds on the Marginal Polytope

18 0.33631444 131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data

19 0.33530033 61 nips-2007-Convex Clustering with Exemplar-Based Models

20 0.32029381 105 nips-2007-Infinite State Bayes-Nets for Structured Domains


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.057), (13, 0.026), (16, 0.017), (21, 0.036), (31, 0.035), (34, 0.025), (35, 0.465), (47, 0.065), (83, 0.116), (85, 0.015), (87, 0.01), (90, 0.045)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8540051 130 nips-2007-Modeling Natural Sounds with Modulation Cascade Processes

Author: Richard Turner, Maneesh Sahani

Abstract: Natural sounds are structured on many time-scales. A typical segment of speech, for example, contains features that span four orders of magnitude: Sentences (∼ 1 s); phonemes (∼ 10−1 s); glottal pulses (∼ 10−2 s); and formants ( 10−3 s). The auditory system uses information from each of these time-scales to solve complicated tasks such as auditory scene analysis [1]. One route toward understanding how auditory processing accomplishes this analysis is to build neuroscienceinspired algorithms which solve similar tasks and to compare the properties of these algorithms with properties of auditory processing. There is however a discord: Current machine-audition algorithms largely concentrate on the shorter time-scale structures in sounds, and the longer structures are ignored. The reason for this is two-fold. Firstly, it is a difficult technical problem to construct an algorithm that utilises both sorts of information. Secondly, it is computationally demanding to simultaneously process data both at high resolution (to extract short temporal information) and for long duration (to extract long temporal information). The contribution of this work is to develop a new statistical model for natural sounds that captures structure across a wide range of time-scales, and to provide efficient learning and inference algorithms. We demonstrate the success of this approach on a missing data task. 1

2 0.83827418 160 nips-2007-Random Features for Large-Scale Kernel Machines

Author: Ali Rahimi, Benjamin Recht

Abstract: To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shiftinvariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines. 1

same-paper 3 0.83263344 78 nips-2007-Efficient Principled Learning of Thin Junction Trees

Author: Anton Chechetka, Carlos Guestrin

Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1

4 0.79413372 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning

Author: Andreas Argyriou, Massimiliano Pontil, Yiming Ying, Charles A. Micchelli

Abstract: Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state of the art statistical performance. 1

5 0.52846026 16 nips-2007-A learning framework for nearest neighbor search

Author: Lawrence Cayton, Sanjoy Dasgupta

Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1

6 0.51337302 96 nips-2007-Heterogeneous Component Analysis

7 0.49069008 134 nips-2007-Multi-Task Learning via Conic Programming

8 0.48916245 175 nips-2007-Semi-Supervised Multitask Learning

9 0.47467378 49 nips-2007-Colored Maximum Variance Unfolding

10 0.47361815 156 nips-2007-Predictive Matrix-Variate t Models

11 0.46344256 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems

12 0.4616054 63 nips-2007-Convex Relaxations of Latent Variable Training

13 0.45805439 116 nips-2007-Learning the structure of manifolds using random projections

14 0.45628631 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

15 0.45290595 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

16 0.45240048 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

17 0.44903436 70 nips-2007-Discriminative K-means for Clustering

18 0.44515908 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

19 0.44330385 158 nips-2007-Probabilistic Matrix Factorization

20 0.44328463 146 nips-2007-On higher-order perceptron algorithms