nips nips2012 nips2012-260 knowledge-graph by maker-knowledge-mining

260 nips-2012-Online Sum-Product Computation Over Trees


Source: pdf

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-product-like algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. [sent-9, score-0.402]

2 In our online model we are given a tree and a fixed set of parameters. [sent-17, score-0.409]

3 A prediction request indicates a vertex for which we then return the posterior marginal at that vertex. [sent-19, score-0.402]

4 Classical belief propagation requires time linear in the size of the tree for this task. [sent-21, score-0.413]
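
For concreteness, here is a minimal sum-product sketch of this baseline: the exact posterior marginal at one vertex of a tree-structured Markov random field, computed in time linear in the size of the tree. All names (adj, theta, data, marginal_bp) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def marginal_bp(adj, theta, data, v):
    """Posterior marginal at vertex v of a tree-structured MRF.

    adj: dict vertex -> list of neighbours (a tree).
    theta: dict mapping one orientation of each edge (i, j) to a k x k potential.
    data: dict vertex -> length-k belief vector.
    """
    def edge_pot(i, j):
        return theta[(i, j)] if (i, j) in theta else theta[(j, i)].T

    def message(child, parent):
        # Combine the local belief with all incoming messages, then
        # push the result through the edge potential towards the parent.
        m = data[child].copy()
        for u in adj[child]:
            if u != parent:
                m *= message(u, child)
        return edge_pot(child, parent).T @ m

    p = data[v].copy()
    for u in adj[v]:
        p *= message(u, v)
    return p / p.sum()

# Tiny example: the path 0 - 1 - 2 with k = 2 labels and evidence at vertex 0.
adj = {0: [1], 1: [0, 2], 2: [1]}
I = np.array([[0.9, 0.1], [0.1, 0.9]])
theta = {(0, 1): I, (1, 2): I}
data = {0: np.array([0.8, 0.2]), 1: np.ones(2), 2: np.ones(2)}
print(marginal_bp(adj, theta, data, 2))
```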

5 Our algorithm requires time linear in the height of an optimal hierarchical cover of this tree. [sent-22, score-0.728]

6 The height of the cover is in the worst case logarithmic in the size of the tree. [sent-23, score-0.573]

7 In Section 4 we show how we may use this cover as a structure to cache computations in our sum-product-like algorithm. [sent-28, score-0.387]

8 In [6] an algorithm for the online setting was given for a Bayes net on a tree T which required O(log |V (T )|) time per marginalization step, where |V (T )| is the number of vertices in the tree. [sent-32, score-0.662]

9 The term χ∗(T) is the height of our optimal hierarchical cover, which is upper bounded by O(min(log |V(T)|, diameter(T))), but may in fact be exponentially smaller. [sent-35, score-0.728]

10 Section 2, “Hierarchical cover of a tree”: In this section we introduce our notion of a hierarchical cover of a tree and its dual, the decomposition tree. [sent-36, score-1.609]

11 With slight abuse of notation, NG(v) := NG({v}), and thus the degree of a vertex v is |NG(v)|. [sent-44, score-0.379]

12 A tree T is a graph in which for all v, w ∈ T there exists a unique path connecting v with w. [sent-51, score-0.481]

13 In this paper we will only consider trees with a non-empty edge set and thus the vertex set will always have a cardinality of at least 2. [sent-52, score-0.467]

14 The pair (T, r) denotes a rooted tree T with root vertex r. [sent-54, score-0.854]

15 Given a rooted tree (T, r) and any vertex i ∈ V (T ), the (proper) descendants of i are all vertices that can be connected with r via paths P ⊆ T containing i (excluding i). [sent-55, score-1.031]

16 Given a tree T we use the notation S ⊆ T only if S is a tree and a subgraph of T. [sent-65, score-0.78]

17 The height of a rooted tree (T, r) is the maximum length of a path P ⊆ T connecting the root to any vertex: h_r(T) := max_{v∈T} d_T(v, r). [sent-66, score-0.794]

18 The diameter ∆(T ) of a tree T is defined as the length of the longest path between any two vertices in T . [sent-67, score-0.657]
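
Both size measures just defined are computable by breadth-first search; below is a small sketch under the assumption that the tree is given as an adjacency dict. The double-BFS diameter trick relies on the fact that in a tree the farthest vertex from any start vertex is an endpoint of a longest path.

```python
from collections import deque

def bfs_dist(adj, src):
    # Distances from src to every vertex of the tree.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def height(adj, r):
    # h_r(T) = max over v of d_T(v, r)
    return max(bfs_dist(adj, r).values())

def diameter(adj):
    # A second BFS from the farthest vertex gives Delta(T).
    start = next(iter(adj))
    d = bfs_dist(adj, start)
    far = max(d, key=d.get)
    return max(bfs_dist(adj, far).values())

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(height(path, 0), diameter(path))  # 3 3
```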

19 Section 2.1, “The hierarchical cover of a tree”: In this section we describe a splitting process that recursively decomposes a given tree T. [sent-69, score-1.32]

20 A (decomposition) tree (D, r) identifies this splitting process, generating a tree-structured collection S of subtrees that hierarchically cover the given tree T . [sent-70, score-1.306]

21 More precisely, a subtree S ⊆ T is split into two or more subcomponents, and the decomposition of S depends only on the choice of a vertex v ∈ S•, which we call the splitting vertex, in the following way. [sent-72, score-0.798]

22 The splitting vertex v ∈ S• of S induces the split set Ω(S, v) = {S1, . . . , S_{|NS(v)|}}, [sent-73, score-0.516]

23 which is the unique set of S’s subtrees that overlap only at the vertex v and that together form a cover of S. [sent-76, score-0.911]

24 Thus the split may be visualized by considering the forest F resulting from removing the vertex v from S, with v then added back to each resulting component S1, . . . , S_{|NS(v)|}. [sent-79, score-0.437]
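
A sketch of this split under an illustrative adjacency-dict representation: remove the splitting vertex, collect the components of the resulting forest, and add the vertex back into each, so the subtrees of Ω(S, v) overlap exactly at v.

```python
def split(adj, v):
    """Compute Omega(S, v) for a tree given as dict vertex -> set of neighbours."""
    components = []
    seen = {v}
    for start in adj[v]:
        if start in seen:
            continue
        # One component of S minus v, collected by depth-first search.
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w != v and w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        components.append(comp | {v})  # add the splitting vertex back
    return components

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(split(star, 0))  # three subtrees, each containing the central vertex 0
```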

25 We indicate with S v ⊆ T the component subtree whose splitting vertex is v, and we denote atomic components by S (i,j) , where E(S (i,j) ) = {(i, j)}. [sent-84, score-0.781]

26 Since the method is recursive, we can associate a rooted tree (D, r) with T’s decomposition into a hierarchical cover, whose internal vertices are the splitting vertices of the splitting process. [sent-86, score-1.255]

27 Its leaves correspond to the single edges (of E(T)) of each atomic component, and a vertex “parent-child” relation c ∈ ↓r_D(p) corresponds to the “splits-into” relation S^c ∈ Ω(S^p, p) (see Figure 1). [sent-87, score-0.553]

28 We will now formalize the splitting process by defining the hierarchical cover S of a tree T , which is a key concept used by our algorithm. [sent-88, score-0.942]

29 A hierarchical cover S of a tree T is a tree-structured collection of subtrees that hierarchically cover the tree T, satisfying the following three properties. [sent-90, score-1.691]

30 The splitting process that generates a hierarchical cover S of T is formalized as a rooted tree (D, r) in the following definition. [sent-95, score-1.037]

31 If S is a hierarchical cover of T we define the associated decomposition tree (D, r) as a rooted tree whose vertex set is V(D) := T• ∪ E(T), where D• = T• and leaves(D) = E(T), such that the following three properties hold. [sent-97, score-1.401]

32 The following lemma shows that with any given hierarchical cover S it is possible to associate a unique decomposition tree (D, r). [sent-101, score-0.952]

33 A hierarchical cover S of T defines a unique decomposition tree (D, r) such that if S ∈ S there exists a v ∈ V(D) with S = S^v, and if v, w ∈ V(D) and v ≠ w, then S^v ≠ S^w. [sent-103, score-0.952]

34 For a given hierarchical cover S, we define in the following the height and the exposure: two properties which measure different senses of the “size” of a cover. [sent-104, score-0.728]

35 The height of a hierarchical cover S is the height of the associated decomposition tree D. [sent-105, score-1.403]

36 Note that the height of a decomposition tree D may be exponentially smaller than the height of T , since, for example, it is not difficult to show that there exists a decomposition tree isomorphic to a binary tree when the input tree T is a path graph. [sent-106, score-2.156]
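
A sketch of why the gap can be exponential: splitting a path at its midpoint recursively yields a balanced decomposition tree whose height is logarithmic in the number of edges. The representation below is illustrative, not the paper's construction.

```python
def decompose_path(lo, hi):
    """Decomposition of the path lo - lo+1 - ... - hi into nested subpaths.

    Returns (splitting vertex or atomic edge, list of children); an atomic
    component, i.e. a single edge, is a leaf of the decomposition tree.
    """
    if hi - lo == 1:
        return ((lo, hi), [])
    mid = (lo + hi) // 2  # splitting vertex: the midpoint of the subpath
    return (mid, [decompose_path(lo, mid), decompose_path(mid, hi)])

def decomposition_height(node):
    _, children = node
    return 0 if not children else 1 + max(decomposition_height(c) for c in children)

# A path with 1024 edges has a decomposition tree of height 10 = log2(1024),
# while the path itself has height 1024 when rooted at an endpoint.
print(decomposition_height(decompose_path(0, 1024)))
```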

37 If R ⊆ T and SR is a hierarchical cover of R, we define the exposure of SR (with respect to tree T ) as maxQ∈SR |∂T (Q)|. [sent-107, score-0.926]

38 Thus the exposure is a measure relative to a “containing” tree (which can be the input tree T itself) and the height is independent of any containing tree. [sent-108, score-1.021]

39 In Section 4 the covering subtrees correspond to cached “joint distributions,” which are defined on the boundary vertices of the subtrees, and require memory exponential in the boundary size. [sent-109, score-0.658]
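
A sketch of these boundary and exposure computations, assuming ∂T(Q) denotes the vertices of the subtree Q that have a T-neighbour outside Q (the boundary definition itself is not restated in this extract).

```python
def boundary(adj, Q):
    # Vertices of Q adjacent in T to at least one vertex outside Q.
    return {v for v in Q if any(w not in Q for w in adj[v])}

def exposure(adj, cover):
    # The exposure of a cover is its largest boundary size.
    return max(len(boundary(adj, Q)) for Q in cover)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cover = [{0, 1}, {1, 2, 3}, {0, 1, 2, 3}]
print(exposure(path, cover))  # 1: every covering subtree has at most one boundary vertex
```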

40 We now define a measure of the optimal height with respect to a given exposure value. [sent-111, score-0.327]

41 A hierarchical cover with exposure at most k is called a k-hierarchical cover. [sent-113, score-0.579]

42 Given any subtree R ⊆ T, the k-decomposition potential χk(R) of R is the minimum height over all hierarchical covers SR of R with exposure (with respect to T) not larger than k. [sent-114, score-0.743]

43 The ∗-decomposition potential χ∗ (R) is the minimum height of all hierarchical covers of R. [sent-115, score-0.462]

44 For a star, i.e., a graph with a single central vertex and any number of adjacent vertices, there is in fact only one possible hierarchical cover, obtained by splitting the central vertex, so that χ∗(star) = 1. [sent-120, score-1.381]

45 In this case we have χ∗ (star-path) = O(log log(|star-path|)); as each path has a hierarchical cover of height O(log log(|star-path|)), each of these path covers may then be joined to create a cover of the star-path. [sent-128, score-1.325]

46 Thus we may restrict our algorithm to hierarchical covers with an exposure of 2 at very little cost in efficiency. [sent-132, score-0.313]

47 Given any element Q ≠ T in a 2-hierarchical cover of T, we have |∂T(Q)| ∈ {1, 2}. [sent-135, score-0.335]

48 the two vertices v, w, and is defined as follows: Q := T^w_v := argmax_{S⊆T}(|V(S)| : v, w ∈ leaves(S)), that is, the maximal subtree of T having v and w among its leaves. [sent-140, score-0.371]

49 Q is now defined as the subtree of T containing vertex w together with all the descendants ⇓w(z) where z ∈ NT(w). [sent-144, score-0.571]

50 We now introduce the notion of (2, s)-hierarchical covers (which, for simplicity, we shall also call (2, s)-covers) with respect to a rooted tree (T, s). [sent-149, score-0.534]

51 This notion explicitly depends on a given vertex s ∈ V (T ), which, for the sake of simplicity, will be assumed to be a leaf of T . [sent-150, score-0.445]

52 (2, s)-Hierarchical covers are guaranteed to not be much larger than a 2-hierarchical cover (see Theorem 6). [sent-151, score-0.404]

53 Given any subtree R ⊆ T, a 2-hierarchical cover SR is a (2, s)-hierarchical cover of R if, for all S ∈ SR \ {T}, there exist v, w ∈ S with v ∈ ⇓s(w) such that either S = T^w (case 1: |∂T(Q)| = 1) or S = T^w_v (case 2: |∂T(Q)| = 2). [sent-154, score-0.862]

54 We define χ2_s(R) to be the minimal height of any possible (2, s)-hierarchical cover of R ⊆ T. [sent-156, score-0.573]

55 Thus every subtree of a (2, s)-hierarchical cover is necessarily “oriented” with respect to a root s. [sent-157, score-0.56]

56 Section 3, “Computing an optimal hierarchical cover”: From a “big picture” perspective, a (2, s)-hierarchical cover G is recursively constructed in a bottom-up fashion: in the initialization phase G contains only the atomic components covering T, i.e. the single-edge subtrees. [sent-158, score-0.972]

57 All the subtrees added at each step t must strictly contain only subtrees added before step t. [sent-164, score-0.402]

58 We now introduce the formal description of our method for constructing a (2, s)-hierarchical cover G. [sent-165, score-0.335]

59 At each step t the method operates on a tree Tt , whose vertices are part of V (T ). [sent-167, score-0.526]

60 These additional edges are not part of E(T) and link each vertex v with its grandparent in Tt if vertex v’s parent was deleted during the construction of Tt+1 from Tt (see the dashed-line edges in Figure 1). [sent-171, score-0.847]

61 This representation is not necessarily unique: if ℓ1, ℓ2 ∈ leaves(T) ∩ Q, more than one choice is possible. Observe that s is the unique vertex belonging to V(Tt) for all time steps t ≥ 0. [sent-177, score-0.404]

62 Given a rooted tree (T, s), the algorithm in Figure 1 outputs G, an optimal (2, s)-hierarchical cover of height χ2_s(T), in time linear in |V(T)|, where χ2_s(T) is bounded as χ∗(T) ≤ χ2(T) ≤ χ2_s(T) ≤ 2χ∗(T) ≤ O(min(log |V(T)|, ∆(T))). [sent-180, score-1.015]

63 Before we provide the detailed description of the algorithm for constructing an optimal (2, s)-hierarchical cover we need some ancillary definitions. [sent-181, score-0.335]

64 We call a vertex v ∈ V (Tt ) \ {s} mergeable (at time t) if and only if either (i) v ∈ leaves(Tt ) or (ii) v has a single child in Tt and that child is not mergeable. [sent-182, score-0.523]
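
A sketch of the mergeable predicate just defined, assuming the rooted tree (Tt, s) is given as a hypothetical children dict; the memoized recursion mirrors clause (ii).

```python
from functools import lru_cache

def make_mergeable(children, s):
    @lru_cache(maxsize=None)
    def mergeable(v):
        if v == s:
            return False
        kids = children.get(v, [])
        if not kids:                  # (i) v is a leaf of T_t
            return True
        # (ii) v has a single child and that child is not mergeable
        return len(kids) == 1 and not mergeable(kids[0])
    return mergeable

# Path rooted at s = 0: the leaf 3 is mergeable, hence 2 is not, hence 1 is.
children = {0: [1], 1: [2], 2: [3], 3: []}
m = make_mergeable(children, 0)
print([m(v) for v in (1, 2, 3)])  # [True, False, True]
```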

65 We also use the following shorthands to make our notation more intuitive: we set c^t_v := ↓^s_{Tt}(v) when |↓^s_{Tt}(v)| = 1, p^t_v := ↑^s_{Tt}(v) when v ≠ s, and g^t_v := ↑^s_{Tt}(p^t_v) when v, p^t_v ≠ s. [sent-184, score-0.322]

66 E(Tt+1) ← {(v, p^t_v) : v, p^t_v ∈ V(Tt+1)} ∪ {(v, g^t_v) : v, g^t_v ∈ V(Tt+1), p^t_v ∉ V(Tt+1)}. [sent-195, score-0.506]
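
A sketch of this edge update, assuming a hypothetical parent map over V(Tt) and taking survivors to be V(Tt+1): a surviving vertex keeps the edge to its parent if the parent survives, and is otherwise shortcut to its grandparent.

```python
def next_edges(survivors, parent):
    edges = set()
    for v in survivors:
        p = parent.get(v)
        if p is None:
            continue                  # v is the root s
        if p in survivors:
            edges.add((v, p))         # ordinary edge (v, p_v^t)
        else:
            g = parent.get(p)
            if g is not None and g in survivors:
                edges.add((v, g))     # shortcut edge (v, g_v^t)
    return edges

parent = {1: 0, 2: 1, 3: 2}           # a rooted path 0 -> 1 -> 2 -> 3
print(next_edges({0, 1, 3}, parent))  # {(1, 0), (3, 1)}: vertex 2 was deleted
```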

67 Output: Optimal (2, s)-hierarchical cover G of T. [sent-235, score-0.335]

68 In order to clarify the method, we describe some of the details of the cover and some merge operations that are performed in the diagram. [sent-242, score-0.377]

69 For example, at time t = 0, S^2 is formed by merging the three atomic subtrees S^(1,2), S^(2,3) and S^(2,4), which were added in the initialization step. [sent-252, score-0.375]

70 These three subtrees overlap only at vertex 2, which is depicted in black because it is mergeable in T0. [sent-253, score-0.657]

71 Regarding the decomposition tree (D, r), we have ↓r_D(5) = {(4, 5), 6}, which implies that S^5 is formed by the atomic component S^(4,5) and the non-atomic component S^6. [sent-254, score-0.594]

72 Observe that in T1 vertex 12 is a leaf and the z variable in step 2 of the while-loop is assigned to vertex 10 (v and p^t_v are respectively vertices 12 and 8). [sent-256, score-0.609]

73 Finally, notice that the height of the (2, s)-hierarchical cover of S^v is equal to t + 1 iff v is depicted in black in Tt. [sent-259, score-0.639]

74 Section 4, “Online marginalization”: In this section we introduce our algorithm for efficiently computing marginals by summing over products of variables in a tree topology. [sent-260, score-0.471]

75 In a probabilistic setting it is natural to view each normalized θe as a stochastic symmetric “transition” matrix and the “data” D as a right stochastic matrix corresponding to “beliefs” about k different labels at each vertex in T . [sent-262, score-0.379]

76 Using the hierarchical cover for efficient online marginalization. [sent-275, score-0.552]

77 In the previous section we discussed a method to compute a hierarchical cover of a tree T with optimal height χ2_s(T) in time linear in T. [sent-276, score-1.075]

78 For each tree in our hierarchical cover S ∈ S we will have an associated potential function. [sent-281, score-0.837]

79 This clarifies our motivation to find a cover with both small height and exposure. [sent-284, score-0.573]

80 Given a tree T and a hierarchical cover S, the cover is isomorphic to a decomposition tree (D, r). [sent-288, score-1.305]

81 First, each vertex z ∈ D will serve as a “name” for a tree S z ∈ S. [sent-290, score-0.726]

82 Second, in the same way that the “message passing” in belief propagation follows the topology of the input tree, the structure of our computations follows the decomposition tree D. [sent-291, score-0.531]

83 As the cover has trees with one or two boundary vertices (excepting T which has none) we define the corresponding vertices of the decomposition tree, Ci := {z ∈ D : |∂T (S z )| = i} for i ∈ {1, 2}. [sent-293, score-0.891]

84 We also need notation for the up to two boundary vertices of a tree S^v ∈ S when v ∈ D \ {r}. [sent-297, score-0.637]

85 Observe that for v ∈ C1 ∪ C2 one boundary vertex of S^v is necessarily v̇ := ↑v, and if v ∈ C2 there exists an ancestor v̈ of v in D so that {v̇, v̈} = ∂T(S^v). [sent-298, score-0.475]

86 The indices a, b ∈ INk, and thus the memory requirements of our algorithm are linear in the cardinality of the tree and quadratic in the number of labels. [sent-310, score-0.347]

87 Given a vertex v ∈ V(T) and a label a ∈ INk, we extend the labelling µ ∈ L(X) to the labelling µ_a ∈ L(X ∪ {v}), which satisfies µ_a(v) = a and µ_a(X) = µ. [sent-313, score-0.429]

88 A number of our identities assume that a given vertex is in the interior of the tree and hence in the interior of the decomposition tree. [sent-319, score-0.904]

89 Thus before we find the hierarchical cover of our input tree we extend the tree by adding a “dummy” edge from each leaf of the tree to a new dummy vertex. [sent-320, score-1.721]

90 The hierarchical cover is then found on this enlarged tree; the cover height may at most only increase by one. [sent-322, score-1.063]

91 Setting the values of the dummy edges and vertices in Θ and D to one ensures that all marginal computations are unchanged. [sent-323, score-0.332]
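
A sketch of this dummy-edge extension, reusing the illustrative adj/theta/data representation of the earlier belief-propagation sketch and assuming integer vertex names; with all-ones potentials and beliefs on the additions, every message through a dummy edge is a constant vector and normalizes away.

```python
import numpy as np

def add_dummies(adj, theta, data, k):
    """Attach a new dummy vertex to every leaf so original vertices are interior."""
    adj = {v: list(ns) for v, ns in adj.items()}
    leaves = [v for v, ns in adj.items() if len(ns) == 1]
    nxt = max(adj) + 1
    for v in leaves:
        d, nxt = nxt, nxt + 1
        adj[v].append(d)
        adj[d] = [v]
        theta[(v, d)] = np.ones((k, k))  # all-ones "transition": uninformative
        data[d] = np.ones(k)             # all-ones belief: uninformative
    return adj, theta, data

adj, theta, data = add_dummies({0: [1], 1: [0]},
                               {(0, 1): np.ones((2, 2))},
                               {0: np.ones(2), 1: np.ones(2)}, 2)
print(adj)  # {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
```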

92 The update and marginalization are linear in cover height χ∗ (T ). [sent-326, score-0.681]

93 Thus, for example, if the set of possible labels is linear in the size of the tree, classical belief propagation may be faster. [sent-328, score-0.413]

94 Finally we observe that we may reduce the cubic dependence to a quadratic dependence on k via a cover with the height bounded by the diameter of T as opposed to χ∗ (T ). [sent-329, score-0.647]

95 We accomplish this by modifying the cover algorithm (Figure 1) to only merge leaf vertices. [sent-332, score-0.443]

96 Observe that the height of this cover is now O(diameter(T )); and we have a hierarchical factorization into α-potentials and only atomic β-potentials. [sent-333, score-0.807]

97 The inspiration is that we have multiple tasks and a given tree structure that describes our prior expectation of “relatedness” between tasks (see e. [sent-335, score-0.347]

98 The construction of the decomposition tree may be simultaneously accomplished with the same complexity. [sent-342, score-0.468]

99 Output: ρ_a(v)/(Σ_b ρ_b(v)). Initialization: The α, β and γ weights are initialised in a bottom-up fashion on the decomposition tree: we initialise the weights of a vertex after we have initialised the weights of all its children. [sent-363, score-0.979]

100 Update: D_{t+1} = D_t; D_{t+1}(v^t) = (p̂_t(a) e^{−η y^t(a)})_{a∈INk}. (Figure 3: TREE-HEDGE.) Thus each vertex represents a task, and if we have an edge between vertices then a priori we expect those tasks to be related. [sent-385, score-0.609]
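
A sketch of this multiplicative update in isolation; D, p_hat, y_t and eta are illustrative stand-ins for the stored beliefs, the trial-t prediction, the loss vector and the learning rate, not the paper's pseudocode symbols.

```python
import numpy as np

def hedge_update(D, v_t, p_hat, y_t, eta):
    # D_{t+1}(v^t)(a) = p_hat(a) * exp(-eta * y^t(a)) for each label a.
    D = dict(D)
    D[v_t] = p_hat * np.exp(-eta * y_t)
    return D

D = {0: np.array([0.5, 0.5])}
print(hedge_update(D, 0, np.array([0.6, 0.4]), np.array([1.0, 0.0]), 0.5))
```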


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vertex', 0.379), ('tt', 0.357), ('tree', 0.347), ('cover', 0.335), ('height', 0.238), ('subtree', 0.192), ('vertices', 0.179), ('subtrees', 0.172), ('ink', 0.171), ('hierarchical', 0.155), ('pt', 0.138), ('dva', 0.133), ('splitting', 0.105), ('cached', 0.101), ('rooted', 0.095), ('decomposition', 0.09), ('exposure', 0.089), ('path', 0.081), ('atomic', 0.079), ('mergeable', 0.076), ('marginalization', 0.074), ('dummy', 0.073), ('boundary', 0.071), ('ab', 0.07), ('covers', 0.069), ('sr', 0.068), ('leaves', 0.066), ('leaf', 0.066), ('covering', 0.064), ('online', 0.062), ('lmix', 0.057), ('massimiliano', 0.055), ('edge', 0.051), ('marginals', 0.05), ('diameter', 0.05), ('herbster', 0.05), ('dv', 0.049), ('dt', 0.048), ('gv', 0.046), ('ree', 0.046), ('subgraph', 0.046), ('ng', 0.044), ('merge', 0.042), ('else', 0.041), ('notation', 0.04), ('judea', 0.038), ('milan', 0.038), ('vm', 0.037), ('receive', 0.037), ('triple', 0.037), ('initialization', 0.037), ('trees', 0.037), ('iff', 0.036), ('mt', 0.036), ('weights', 0.035), ('update', 0.034), ('child', 0.034), ('fabio', 0.034), ('lever', 0.034), ('belief', 0.033), ('ns', 0.033), ('root', 0.033), ('propagation', 0.033), ('star', 0.032), ('merging', 0.032), ('interior', 0.032), ('split', 0.032), ('recursively', 0.031), ('descendants', 0.031), ('joined', 0.031), ('isomorphic', 0.031), ('construction', 0.031), ('graphs', 0.03), ('black', 0.03), ('allocation', 0.03), ('added', 0.029), ('initialised', 0.029), ('edges', 0.029), ('computations', 0.028), ('graph', 0.028), ('ancestors', 0.028), ('clari', 0.028), ('formed', 0.026), ('component', 0.026), ('ancestor', 0.025), ('factorizations', 0.025), ('unique', 0.025), ('guy', 0.025), ('cb', 0.025), ('border', 0.025), ('labelling', 0.025), ('pontil', 0.025), ('observe', 0.024), ('pseudocode', 0.024), ('cache', 0.024), ('identities', 0.024), ('mark', 0.023), ('marginal', 0.023), ('trial', 0.023), ('shall', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 260 nips-2012-Online Sum-Product Computation Over Trees

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-product-like algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting.

2 0.16891059 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

Author: Levi Boyles, Max Welling

Abstract: We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman’s coalescent, providing a prior over tree structures which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the TMC achieves competitive experimental results.

3 0.14728986 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of pvariate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

4 0.13794222 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

Author: Odalric-ambrym Maillard

Abstract: This paper aims to take a step forwards making the term “intrinsic motivation” from reinforcement learning theoretically well founded, focusing on curiosity-driven learning. To that end, we consider the setting where, a fixed partition P of a continuous space X being given, and a process ν defined on X being unknown, we are asked to sequentially decide which cell of the partition to select as well as where to sample ν in that cell, in order to minimize a loss function that is inspired from previous work on curiosity-driven learning. The loss on each cell consists of one term measuring a simple worst case quadratic sampling error, and a penalty term proportional to the range of the variance in that cell. The corresponding problem formulation extends the setting known as active learning for multi-armed bandits to the case when each arm is a continuous region, and we show how an adaptation of recent algorithms for that problem and of hierarchical optimistic sampling algorithms for optimization can be used in order to solve this problem. The resulting procedure, called Hierarchical Optimistic Region SElection driven by Curiosity (HORSE.C), is provided together with a finite-time regret analysis.

5 0.12675016 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalty levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones.

6 0.11100725 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

7 0.10734544 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

8 0.10277354 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

9 0.091804422 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

10 0.087989196 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

11 0.087472431 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

12 0.086826704 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

13 0.086485222 253 nips-2012-On Triangular versus Edge Representations --- Towards Scalable Modeling of Networks

14 0.085716948 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

15 0.085262567 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

16 0.081677534 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

17 0.079959743 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

18 0.079076782 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

19 0.078215033 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

20 0.077898741 233 nips-2012-Multiresolution Gaussian Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.161), (1, 0.021), (2, 0.03), (3, -0.043), (4, -0.054), (5, -0.066), (6, -0.028), (7, -0.073), (8, -0.206), (9, 0.13), (10, 0.008), (11, 0.039), (12, -0.008), (13, -0.047), (14, -0.101), (15, 0.037), (16, -0.037), (17, 0.076), (18, 0.039), (19, -0.001), (20, -0.097), (21, 0.133), (22, 0.04), (23, -0.027), (24, -0.074), (25, 0.159), (26, -0.128), (27, -0.041), (28, 0.078), (29, -0.014), (30, 0.037), (31, -0.064), (32, -0.073), (33, -0.103), (34, 0.043), (35, -0.005), (36, 0.032), (37, -0.002), (38, -0.149), (39, 0.026), (40, -0.008), (41, -0.012), (42, 0.008), (43, 0.076), (44, -0.039), (45, 0.009), (46, -0.005), (47, -0.025), (48, -0.098), (49, -0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9784767 260 nips-2012-Online Sum-Product Computation Over Trees

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-product-like algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting.

2 0.64255255 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

Author: Levi Boyles, Max Welling

Abstract: We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman’s coalescent, providing a prior over tree structures which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the TMC achieves competitive experimental results.

3 0.59506279 215 nips-2012-Minimizing Uncertainty in Pipelines

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable.

4 0.59099954 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V |3/2 ) by querying O(|V |3/2 ) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V | + (|V |/k)3/2 edge labels. The running time of this algorithm is at most of order |E| + |V | log |V |.

5 0.57749951 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of pvariate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

6 0.57722443 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

7 0.57375956 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

8 0.54167253 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

9 0.49918094 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

10 0.48418501 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

11 0.48414961 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

12 0.47702894 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

13 0.47254464 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

14 0.44287452 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

15 0.44084758 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

16 0.43412131 206 nips-2012-Majorization for CRFs and Latent Likelihoods

17 0.43004075 233 nips-2012-Multiresolution Gaussian Processes

18 0.41699833 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

19 0.40995803 102 nips-2012-Distributed Non-Stochastic Experts

20 0.40900394 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.056), (21, 0.039), (38, 0.106), (39, 0.014), (42, 0.026), (53, 0.022), (54, 0.017), (55, 0.022), (74, 0.135), (76, 0.11), (80, 0.084), (92, 0.099), (95, 0.181)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83491516 260 nips-2012-Online Sum-Product Computation Over Trees

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-product-like algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting.

2 0.82761925 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

Author: Philip Sterne, Joerg Bornschein, Abdul-saboor Sheikh, Joerg Luecke, Jacquelyn A. Shelton

Abstract: Modelling natural images with sparse coding (SC) has faced two main challenges: flexibly representing varying pixel intensities and realistically representing low-level image components. This paper proposes a novel multiple-cause generative model of low-level image statistics that generalizes the standard SC model in two crucial points: (1) it uses a spike-and-slab prior distribution for a more realistic representation of component absence/intensity, and (2) the model uses the highly nonlinear combination rule of maximal causes analysis (MCA) instead of a linear combination. The major challenge is parameter optimization because a model with either (1) or (2) results in strongly multimodal posteriors. We show for the first time that a model combining both improvements can be trained efficiently while retaining the rich structure of the posteriors. We design an exact piecewise Gibbs sampling method and combine this with a variational method based on preselection of latent dimensions. This combined training scheme tackles both analytical and computational intractability and enables application of the model to a large number of observed and hidden dimensions. Applying the model to image patches we study the optimal encoding of images by simple cells in V1 and compare the model’s predictions with in vivo neural recordings. In contrast to standard SC, we find that the optimal prior favors asymmetric and bimodal activity of simple cells. Testing our model for consistency we find that the average posterior is approximately equal to the prior. Furthermore, we find that the model predicts a high percentage of globular receptive fields alongside Gabor-like fields. Similarly high percentages are observed in vivo. Our results thus argue in favor of improvements of the standard sparse coding model for simple cells by using flexible priors and nonlinear combinations.

3 0.77612233 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions.

4 0.76993114 26 nips-2012-A nonparametric variable clustering model

Author: Konstantina Palla, Zoubin Ghahramani, David A. Knowles

Abstract: Factor analysis models effectively summarise the covariance structure of high dimensional data, but the solutions are typically hard to interpret. This motivates attempting to find a disjoint partition, i.e. a simple clustering, of observed variables into highly correlated subsets. We introduce a Bayesian non-parametric approach to this problem, and demonstrate advantages over heuristic methods proposed to date. Our Dirichlet process variable clustering (DPVC) model can discover block-diagonal covariance structures in data. We evaluate our method on both synthetic and gene expression analysis problems.

5 0.75569314 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

Author: Daniel Zoran, Yair Weiss

Abstract: Simple Gaussian Mixture Models (GMMs) learned from pixels of natural image patches have been recently shown to be surprisingly strong performers in modeling the statistics of natural images. Here we provide an in-depth analysis of this simple yet rich model. We show that such a GMM model is able to compete with even the most successful models of natural images in log likelihood scores, denoising performance and sample quality. We provide an analysis of what such a model learns from natural images as a function of number of mixture components including covariance structure, contrast variation and intricate structures such as textures, boundaries and more. Finally, we show that the salient properties of the GMM learned from natural images can be derived from a simplified Dead Leaves model which explicitly models occlusion, explaining its surprising success relative to other models. 1 GMMs and natural image statistics models: Many models for the statistics of natural image patches have been suggested in recent years. Finding good models for natural images is important to many different research areas - computer vision, biological vision and neuroscience among others. Recently, there has been a growing interest in comparing different aspects of models for natural images such as log-likelihood and multi-information reduction performance, and much progress has been achieved [1, 2, 3, 4, 5, 6]. Out of these results there is one which is particularly interesting: simple, unconstrained Gaussian Mixture Models (GMMs) with a relatively small number of mixture components learned from image patches are extraordinarily good in modeling image statistics [6, 4]. This is a surprising result due to the simplicity of GMMs and their ubiquity. Another surprising aspect of this result is that many of the current models may be thought of as GMMs with an exponential or infinite number of components, having different constraints on the covariance structure of the mixture components. In this work we study the nature of GMMs learned from natural image patches. We start with a thorough comparison to some popular and cutting edge image models. We show that indeed, GMMs are excellent performers in modeling natural image patches. We then analyze what properties of natural images these GMMs capture, their dependence on the number of components in the mixture and their relation to the structure of the world around us. Finally, we show that the learned GMM suggests a strong connection between natural image statistics and a simple variant of the dead leaves model [7, 8], explicitly modeling occlusions and explaining some of the success of GMMs in modeling natural images.

6 0.75519997 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

7 0.75505596 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

8 0.7488417 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

9 0.74397492 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

10 0.7420854 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

11 0.73905182 8 nips-2012-A Generative Model for Parts-based Object Segmentation

12 0.73564351 168 nips-2012-Kernel Latent SVM for Visual Recognition

13 0.73527884 201 nips-2012-Localizing 3D cuboids in single-view images

14 0.73455769 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

15 0.73176509 210 nips-2012-Memorability of Image Regions

16 0.72889686 202 nips-2012-Locally Uniform Comparison Image Descriptor

17 0.72748327 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

18 0.7274816 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

19 0.72627741 193 nips-2012-Learning to Align from Scratch

20 0.72505742 40 nips-2012-Analyzing 3D Objects in Cluttered Images