nips nips2012 nips2012-191 knowledge-graph by maker-knowledge-mining

191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables


Source: pdf

Author: Aaron Dennis, Dan Ventura

Abstract: The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. [sent-4, score-0.292]

2 Designing an SPN network architecture that is suitable for the task at hand is an open question. [sent-5, score-0.272]

3 We propose an algorithm for learning the SPN architecture from data. [sent-6, score-0.272]

4 Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture. [sent-9, score-0.272]

5 The architecture of these latent variables (the size of the layers, the number of variables, the connections between variables) can dramatically affect the performance of these models. [sent-21, score-0.379]

6 Selecting a reasonable architecture is often done by hand. [sent-22, score-0.272]

7 This paper proposes an algorithm that automatically learns a deep architecture from data for a sum-product network (SPN), a recently-proposed deep model that takes advantage of the simplifying effect of latent variables [2]. [sent-23, score-0.531]

8 The leaf node λā takes value 1 if A = 0 and 0 otherwise, while the leaf node λa takes value 1 if A = 1 and 0 otherwise. [sent-25, score-0.552]

9 If the value of A is not known then both leaf nodes take value 1. [sent-26, score-0.312]

10 Weights on the edges connecting sum nodes with their children are not shown. [sent-28, score-0.326]

11 Figure 3: The Poon architecture with m = 1 sum nodes per region. [sent-31, score-0.524]

12 Three product nodes are introduced because the 2×3-pixel image patch can be split vertically and horizontally in three different ways. [sent-32, score-0.314]

13 In general the Poon architecture has number-of-splits × m² product nodes per region. [sent-33, score-0.502]

14 Selecting an appropriate architecture for a traditional deep model can be challenging [3, 4], but the nature of SPNs lends itself to a remarkably simple, fast, and effective architecture-learning algorithm. [sent-34, score-0.348]

15 In proposing SPNs, Poon & Domingos introduce a general scheme for building an initial SPN architecture; the experiments they run all use one particular instantiation of this scheme to build an initial “fixed” architecture that is suitable for image data. [sent-35, score-0.314]

16 We will refer to this architecture as the Poon architecture. [sent-36, score-0.272]

17 In this way both the weights and architecture are learned from data. [sent-38, score-0.272]

18 We take this a step further by also learning the initial SPN architecture from data. [sent-39, score-0.272]

19 Our experiments show that learning the initial SPN architecture in this way improves its performance. [sent-42, score-0.272]

20 Edges connecting sum nodes to their children are weighted using non-negative weights. [sent-44, score-0.308]

21 The value of a sum node is computed as the dot product of its weights with the values of its child nodes. [sent-45, score-0.312]

22 The value of a product node is computed by multiplying the values of its child nodes. [sent-46, score-0.252]

23 The leaf nodes act as indicator functions, taking the value 1 when the variable takes on the value that the leaf node is responsible for and 0 otherwise. [sent-51, score-0.619]
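
As a minimal sketch of these evaluation rules (illustrative Python, not the authors' implementation; the class and variable names are hypothetical), the fragment below evaluates an SPN bottom-up: a leaf indicator returns 1 when its variable is unobserved or matches its assigned value, a sum node computes the dot product of its weights with its children's values, and a product node multiplies its children's values.

```python
# Minimal SPN evaluation sketch; class names are illustrative, not from the paper.
class Leaf:
    def __init__(self, var, value):
        self.var, self.value = var, value            # indicator for "var == value"
    def eval(self, evidence):
        v = evidence.get(self.var)                   # unobserved variables set all their indicators to 1
        return 1.0 if v is None or v == self.value else 0.0

class Sum:
    def __init__(self, children, weights):
        self.children, self.weights = children, weights   # non-negative weights
    def eval(self, evidence):
        # Dot product of weights with child values.
        return sum(w * c.eval(evidence) for w, c in zip(self.weights, self.children))

class Product:
    def __init__(self, children):
        self.children = children
    def eval(self, evidence):
        out = 1.0
        for c in self.children:                      # multiply child values
            out *= c.eval(evidence)
        return out

# Example: a sum node over the two indicators of a single binary variable A.
la0, la1 = Leaf("A", 0), Leaf("A", 1)
root = Sum([la0, la1], [0.3, 0.7])
print(root.eval({"A": 1}))   # 0.7
print(root.eval({}))         # 1.0 (A unobserved: both indicators take value 1)
```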

24 An SPN can be constructed such that it is a representation of some probability distribution, with the value of its root node and certain partial derivatives with respect to the root node having probabilistic meaning. [sent-52, score-0.362]

25 However, Poon & Domingos proved that if the architecture of an SPN follows two simple rules then it will be valid. [sent-56, score-0.272]

26 The scope of an SPN node n is a subset of the input variables. [sent-60, score-0.301]

27 This subset can be determined by looking at the leaf nodes of the subgraph rooted at n. [sent-61, score-0.403]

28 All input variables that have one or more of their associated leaf nodes in this subgraph are included in the scope of the node. [sent-62, score-0.549]

29 The first rule is that all children of a sum node must have the same scope. [sent-64, score-0.272]

30 The second rule is that for every pair of children, (ci, cj), of a product node, there must not be contradictory leaf nodes in the subgraphs rooted at ci and cj. [sent-66, score-0.46]

31 For example, if the leaf node corresponding to the variable X taking on value x is in the subgraph rooted at ci, then the leaf nodes corresponding to the variable X taking on any other value may not appear in the subgraph rooted at cj. [sent-67, score-0.876]

32 A decomposable SPN is one in which the scopes of the children of each product node are disjoint. [sent-71, score-0.337]
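
These two structural rules can be verified with a short recursive pass. The sketch below assumes the illustrative Leaf/Sum/Product classes above and is not code from the paper; it tests completeness for sum nodes and decomposability for product nodes.

```python
# Validity sketch: completeness for sum nodes, decomposability for product nodes.
def scope(node):
    # The scope of a node is the set of variables of the leaves below it.
    if isinstance(node, Leaf):
        return {node.var}
    return set().union(*(scope(c) for c in node.children))

def is_valid(node):
    if isinstance(node, Leaf):
        return True
    child_scopes = [scope(c) for c in node.children]
    if isinstance(node, Sum):
        # Completeness: all children of a sum node have the same scope.
        ok = all(s == child_scopes[0] for s in child_scopes)
    else:
        # Decomposability: children of a product node have pairwise disjoint scopes.
        ok = len(set().union(*child_scopes)) == sum(len(s) for s in child_scopes)
    return ok and all(is_valid(c) for c in node.children)
```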

33 The Poon architecture is suited for modeling probability distributions over images, or other domains with local dependencies among variables. [sent-76, score-0.272]

34 For every possible axis-aligned rectangular region in the image, the Poon architecture includes a set of m sum nodes, all of whose scope is the set of variables associated with the pixels in that region. [sent-78, score-0.734]

35 For each pair of subregions, and for every possible pairing of sum nodes (one taken from each subregion), a product node is introduced and made the parent of the pair of sum nodes. [sent-80, score-0.541]

36 The product node is also added as a child to all of the top region’s sum nodes. [sent-81, score-0.33]
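
A sketch of this wiring for a single split of a region into two subregions follows (illustrative Python built on the classes above; the helper name and the zero-weight initialization are assumptions consistent with the training set-up described later, not the paper's actual code). With m sum nodes per subregion, each split contributes m × m product nodes, which is where the number-of-splits × m² count per region comes from.

```python
# One split of a region into a (left, right) pair of subregions: a product node
# for every pairing of subregion sum nodes, added as a child of every sum node
# of the parent region.
def wire_split(region_sums, left_sums, right_sums):
    for s_left in left_sums:
        for s_right in right_sums:
            p = Product([s_left, s_right])       # one product node per pairing
            for s in region_sums:                # child of all of the region's sum nodes
                s.children.append(p)
                s.weights.append(0.0)            # zero initial weight (assumption; cf. Section 4)
```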

37 Figure 3 shows a fragment of a Poon architecture SPN modeling a 2 × 3 image patch. [sent-82, score-0.335]

38 Since the effect of a latent variable is to help explain the interactions between its child variables [7], it makes little sense to add a latent variable as the parent of two statistically independent variables. [sent-84, score-0.398]

39 Figure 4: Latent variables explain the interaction between child variables, causing the children to be independent given the latent variable parent. [sent-91, score-0.293]

40 3 In the probabilistic interpretation of an SPN, sum nodes are associated with latent variables. [sent-93, score-0.311]

41 (The evaluation of a sum node is equivalent to summing out its associated latent variable.) [sent-94, score-0.275]

42 Each latent variable helps the SPN explain interactions between variables in the scope of the sum nodes. [sent-95, score-0.436]
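
A compact way to state this correspondence (generic notation, not taken from the paper): if a sum node s has children c1, ..., cK with normalized non-negative weights, and each child value equals the conditional probability of the evidence given a hypothetical latent selector variable Zs, then

\[ S_s(x) \;=\; \sum_{k=1}^{K} w_k \, S_{c_k}(x) \;=\; \sum_{k=1}^{K} P(Z_s = k)\, P(x \mid Z_s = k) \;=\; P(x), \]

so evaluating the sum node is exactly a marginalization over Zs, and placing s over a set of variables amounts to introducing a latent variable that explains their interactions.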

43 Just as in the example, then, we would like to place sum nodes over sets of variables with strong interactions. [sent-96, score-0.3]

44 Taking advantage of this prior knowledge, the Poon architecture chooses to place sum nodes over local spatial neighborhoods that are rectangular in shape. [sent-99, score-0.56]

45 One problem is that the Poon architecture includes many rectangular regions that are long and skinny. [sent-101, score-0.353]

46 Some grouping of weakly-interacting pixels is inevitable, but the Poon architecture probably does this more than is needed. [sent-103, score-0.306]

47 Another problem is that the Poon architecture has no way of explaining strongly-interacting, non-rectangular local spatial regions. [sent-104, score-0.328]

48 Additionally, if the data does not exhibit strong spatially-local interactions then the Poon architecture could perform poorly. [sent-106, score-0.358]

49 Our proposed architecture (we will call it the cluster architecture) avoids these problems. [sent-107, score-0.379]

50 Sum nodes can be placed over spatially-local, non-rectangular regions; we are not restricted to rectangular regions, but can explain arbitrarily-shaped blob-like regions. [sent-109, score-0.252]

51 In fact, the regions found by the cluster architecture are not required to exhibit spatial locality. [sent-110, score-0.441]

52 This makes our architecture suitable for modeling data that does not exhibit strong spatially-local interactions between variables. [sent-111, score-0.358]

53 As was described earlier, a sum node s in an SPN has the task of explaining the interactions between all the variables in its scope. [sent-113, score-0.389]

54 Then, for each subset Si in the partition, an additional sum node si is introduced into the SPN and is given the task of explaining the interactions between all the variables in Si. [sent-119, score-0.43]

55 The original sum node s is then given a new child product node p, and the product node becomes the parent of each sum node si. [sent-120, score-0.938]
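
This local transformation can be written down directly. The sketch below uses the illustrative classes from earlier; in the real architecture each new sum node si would also receive children covering its subset Si, which is omitted here.

```python
# Delegate a partition {S_1, ..., S_k} of the scope of sum node s to new sum
# nodes s_i, connected to s through a new product node p. Illustrative only.
def split_sum_node(s, partition):
    sub_sums = [Sum(children=[], weights=[]) for _ in partition]  # one s_i per subset S_i
    p = Product(sub_sums)        # p becomes the parent of each s_i ...
    s.children.append(p)         # ... and a new child of the original sum node s
    s.weights.append(0.0)
    return sub_sums              # each s_i can itself be split again, recursively
```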

56 In this example the node s is analogous to the variable C in Figure 4 and the nodes si are analogous to the variables A and B. [sent-121, score-0.468]

57 So this partitioning process allows s to focus on explaining the interactions between the nodes si and frees it from needing to explain everything about the interactions between the variables {V1, ..., Vn}. [sent-122, score-0.521]

58 And, of course, the partitioning process can be repeated recursively, with any of the nodes si taking the place of s. [sent-123, score-0.255]

59 This is the main idea behind the algorithm for building a cluster architecture (see Algorithm 1 and Algorithm 2). [sent-124, score-0.379]

60 However, due to the architectural flexibility of an SPN, discussing this algorithm in terms of sum and product nodes quickly becomes tedious and confusing. [sent-125, score-0.306]

61 A region graph is a rooted DAG consisting of region nodes and partition nodes. [sent-128, score-0.617]

62 Partition nodes are restricted to being the children of region nodes and vice versa. [sent-130, score-0.579]

63 Region and partition nodes have scopes just like nodes in an SPN. [sent-131, score-0.518]

64 The scope of a node n in a region graph is denoted scope(n). [sent-132, score-0.477]

65 Region nodes can be thought of as playing the role of sum nodes (explaining interactions among variables) and partition nodes can be thought of as playing the role of product nodes (delegating responsibilities). [sent-133, score-0.998]

66 Using the definition of the region graph may not appear to have made things any simpler, but its benefits will become more clear when discussing the conversion of region graphs to SPNs (see Figure 5). [sent-134, score-0.372]

67 At a high level the algorithm for building a cluster architecture is simple: build a region graph (Algorithm 1 and Algorithm 2), then convert it to an SPN (Algorithm 3). [sent-135, score-0.555]

68 Figure 5: Subfigure (a) shows a region graph fragment consisting of region nodes R1, R2, R3, R4, and R5. [sent-149, score-0.528]

69 The product nodes in R1 are surrounded by two rectangles labeled P1 and P2; they correspond to the partition nodes in the region graph. [sent-153, score-0.624]

70 The remainder of the algorithm creates a single-node region graph G and then adds nodes and edges to G using k calls to Algorithm 2 (ExpandRegionGraph). [sent-158, score-0.386]

71 At a high level, Algorithm 2 partitions scopes into sub-scopes recursively, adding region and partition nodes to G along the way. [sent-160, score-0.486]

72 The initial call to ExpandRegionGraph partitions the scope of the root region node. [sent-161, score-0.33]

73 A corresponding partition node is added as a child of the root node. [sent-162, score-0.32]

74 Two sub-region nodes (whose scopes form the partition) are then added as children to the partition node. [sent-163, score-0.4]

75 Algorithm 2 is then called recursively with each of these sub-region nodes as arguments (unless the scope of the sub-region node is too small). [sent-164, score-0.516]

76 If node n does not already have a child partition node representing the partition {S1, S2} then one is created (p in line 15); p is then connected to child region nodes n1 and n2, whose scopes are S1 and S2, respectively. [sent-176, score-0.977]

77 Note that n1 and n2 may be newly-created region nodes or they may be nodes that were created during a previous call to Algorithm 2. [sent-177, score-0.544]

78 We recursively call ExpandRegionGraph only on newly-created nodes; the recursive call is also not made if the node is a leaf node (|Si| = 1) since partitioning a leaf node is not helpful (see lines 19 through 22). [sent-178, score-0.788]
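
The following is a self-contained sketch of this recursion under simplifying assumptions (it is not the paper's pseudocode): the region graph is kept as a dictionary from region scopes to their sets of partition nodes, and partition_scope stands in for the paper's clustering-on-variables step that actually chooses how to split a scope.

```python
# Sketch of the recursive region-graph expansion. Scopes are frozensets of
# variable indices; a partition node is represented as a frozenset of the two
# child scopes. partition_scope is a placeholder for clustering on variables.
def expand_region_graph(regions, region_scope, partition_scope, min_size=2):
    S1, S2 = partition_scope(region_scope)             # split the scope into two sub-scopes
    part = frozenset([S1, S2])
    if part in regions.setdefault(region_scope, set()):
        return                                         # this partition node already exists
    regions[region_scope].add(part)                    # new partition node under region_scope
    for Si in (S1, S2):
        newly_created = Si not in regions
        regions.setdefault(Si, set())                  # sub-region node with scope Si
        if newly_created and len(Si) >= min_size:      # recurse only on new, non-leaf regions
            expand_region_graph(regions, Si, partition_scope, min_size)

# Toy usage with a deterministic split; the real clustering step varies across
# calls, so the k top-level calls of Algorithm 1 add different partitions.
def split_in_half(region_scope):
    vs = sorted(region_scope)
    return frozenset(vs[:len(vs) // 2]), frozenset(vs[len(vs) // 2:])

regions = {}
root = frozenset(range(8))
expand_region_graph(regions, root, split_in_half)
```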

79 Figure 5 shows a small subgraph from a region graph and its conversion into an SPN; this example demonstrates the basic pattern that can be applied to all region nodes in G in order to generate an SPN. [sent-183, score-0.576]

80 In this algorithm the assumption is made (noted in the comments) that certain sum nodes are inserted before others. [sent-185, score-0.268]

81 This assumption can be guaranteed if the algorithm performs a postorder traversal of the region nodes in G in the outermost loop. [sent-186, score-0.331]

82 Also note that the ConnectProductsToSums method connects product nodes of the current region with sum nodes from its subregions; the children of a product node consist of a single node drawn from each subregion, and there is a product node for every possible combination of such sum nodes. [sent-187, score-1.281]
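
In code, this cross-product wiring might look like the sketch below (illustrative Python using the classes above; the function name merely echoes the ConnectProductsToSums subroutine described in the text and is not the paper's implementation). Unlike the two-subregion example earlier, it handles any number of subregions under a partition.

```python
from itertools import product as iproduct  # aliased to avoid confusion with the Product node class

# One product node per combination of sum nodes drawn from the subregions,
# each attached as a child to every sum node of the parent region.
def connect_products_to_sums(region_sums, subregion_sum_lists):
    for combo in iproduct(*subregion_sum_lists):   # one sum node from each subregion
        p = Product(list(combo))
        for s in region_sums:
            s.children.append(p)
            s.weights.append(0.0)
```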

83 SPNs produced the best results; our experiments show that the cluster architecture significantly improves SPN performance. [sent-191, score-0.379]

84 We matched the experimental set-up reported in [2] in order to isolate the effect of changing the initial SPN architecture and to make their reported results directly comparable to several of our results. [sent-192, score-0.272]

85 They add 20 sum nodes for each non-unit and non-root region. [sent-193, score-0.252]

86 The root region has one sum node and the unit regions have four sum nodes, each of which functions as a Gaussian over pixel values. [sent-194, score-0.511]

87 All sum node weights are initialized to zero; weight values are decreased after each training epoch using an L0 prior; add-one smoothing on sum node weights is used during network evaluation. [sent-198, score-0.454]

88 The cluster architecture is compared to the Poon architectures using the negative log-likelihood (LLH) of the training and test sets as well as the MSE of the image completion results for the left half and bottom half of the images. [sent-207, score-0.599]

89 We train ten cluster architecture SPNs and ten Poon architecture SPNs. [sent-208, score-0.702]

90 On the Olivetti and Caltech-101 Faces datasets the Poon architecture resulted in better training set LLH, but the cluster architecture generalized better, getting a better test set LLH (see Table 1). [sent-210, score-0.673]

91 The cluster architecture was also clearly better at the image completion tasks as measured by MSE. [sent-211, score-0.474]

92 The cluster architecture outperforms the Poon architecture across all measures on this dataset (see Table 1); this is due to its ability to focus resources on non-rectangular regions. [sent-214, score-0.703]

93 To demonstrate that the cluster architecture does not rely on the presence of spatially-local, strong interactions between the variables, we repeated the Olivetti experiment with the pixels in the images having been shuffled. [sent-215, score-0.518]

94 In this experiment (see Table 1) the cluster architecture was, as expected, relatively unaffected by the pixel shuffling. [sent-216, score-0.405]

95 On the other hand, the performance of the Poon architecture dropped considerably because it was no longer able to take advantage of strong correlations between neighboring pixels. [sent-219, score-0.272]

96 Figure 7 visually demonstrates the difference between the rectangular-regions Poon architecture and the arbitrarily-shaped-regions cluster architecture. [sent-220, score-0.379]

97 Artifacts of the different region shapes can be seen in subfigure (a), where some regions are shaded lighter or darker, revealing region boundaries. [sent-221, score-0.323]

98 Note how the Poon architecture produces results that look “blocky”, whereas the cluster architecture produces results that are smoother-looking. [sent-223, score-0.651]

99 Figure 7: The completion results in subfigure (a) highlight the difference between the rectangular-shaped regions of the Poon architecture (top image) and the blob-like regions of the cluster architecture (bottom image), artifacts of which can be seen in the completions. [sent-224, score-0.814]

100 The algorithm for learning a cluster architecture is simple, fast, and effective. [sent-244, score-0.379]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('spn', 0.618), ('poon', 0.299), ('architecture', 0.272), ('llh', 0.257), ('olivetti', 0.2), ('nodes', 0.192), ('spns', 0.177), ('pgm', 0.164), ('node', 0.156), ('scope', 0.145), ('region', 0.139), ('leaf', 0.12), ('expandregiongraph', 0.114), ('cluster', 0.107), ('mse', 0.095), ('sr', 0.085), ('deep', 0.076), ('shuf', 0.073), ('scopes', 0.071), ('completions', 0.07), ('interactions', 0.069), ('partition', 0.063), ('sum', 0.06), ('latent', 0.059), ('child', 0.058), ('explaining', 0.056), ('children', 0.056), ('completion', 0.053), ('variables', 0.048), ('rooted', 0.047), ('regions', 0.045), ('sub', 0.045), ('subgraph', 0.044), ('partitionscope', 0.043), ('image', 0.042), ('faces', 0.042), ('architectures', 0.041), ('si', 0.041), ('sn', 0.04), ('product', 0.038), ('graph', 0.037), ('rectangular', 0.036), ('images', 0.036), ('pixels', 0.034), ('experimenting', 0.033), ('variable', 0.031), ('interact', 0.03), ('addchildtopartitionnode', 0.029), ('addsumnodestospn', 0.029), ('brigham', 0.029), ('connectproductstosums', 0.029), ('getorcreateregionnode', 0.029), ('provo', 0.029), ('dataset', 0.027), ('bottom', 0.026), ('pixel', 0.026), ('ci', 0.025), ('resources', 0.025), ('ventura', 0.025), ('responsibilities', 0.025), ('conversion', 0.025), ('root', 0.025), ('explain', 0.024), ('recursively', 0.023), ('subregion', 0.023), ('subsets', 0.023), ('layers', 0.023), ('partitioning', 0.022), ('horizontally', 0.022), ('subregions', 0.022), ('domingos', 0.022), ('symptoms', 0.022), ('training', 0.022), ('created', 0.021), ('gure', 0.021), ('partitions', 0.021), ('young', 0.021), ('violates', 0.021), ('fragment', 0.021), ('vertically', 0.02), ('rotated', 0.02), ('artifacts', 0.02), ('lines', 0.019), ('cj', 0.019), ('train', 0.019), ('parent', 0.019), ('caltech', 0.019), ('arrow', 0.019), ('added', 0.018), ('instances', 0.018), ('edges', 0.018), ('half', 0.018), ('shallow', 0.017), ('exhibit', 0.017), ('causing', 0.017), ('ten', 0.016), ('decomposable', 0.016), ('made', 0.016), ('discussing', 0.016), ('parents', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

Author: Aaron Dennis, Dan Ventura

Abstract: The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture. 1

2 0.62944859 100 nips-2012-Discriminative Learning of Sum-Product Networks

Author: Robert Gens, Pedro Domingos

Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset. 1

3 0.11835581 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency −δη(η+1)−2 when the number of samples n scales as n = Ω(θmin log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

4 0.083800204 215 nips-2012-Minimizing Uncertainty in Pipelines

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

5 0.080761351 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

6 0.079178639 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

7 0.077669129 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

8 0.073921181 298 nips-2012-Scalable Inference of Overlapping Communities

9 0.069525376 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

10 0.068657681 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification

11 0.065456495 182 nips-2012-Learning Networks of Heterogeneous Influence

12 0.064055242 197 nips-2012-Learning with Recursive Perceptual Representations

13 0.061839648 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

14 0.059914488 180 nips-2012-Learning Mixtures of Tree Graphical Models

15 0.057099003 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

16 0.055026636 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

17 0.054470275 200 nips-2012-Local Supervised Learning through Space Partitioning

18 0.049712062 126 nips-2012-FastEx: Hash Clustering with Exponential Families

19 0.048248354 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

20 0.047899783 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.139), (1, 0.059), (2, -0.102), (3, -0.047), (4, -0.032), (5, -0.064), (6, -0.016), (7, -0.139), (8, -0.209), (9, 0.097), (10, 0.017), (11, 0.024), (12, -0.02), (13, 0.041), (14, -0.167), (15, -0.124), (16, -0.078), (17, 0.014), (18, 0.074), (19, -0.089), (20, -0.149), (21, -0.014), (22, 0.344), (23, 0.041), (24, -0.027), (25, -0.146), (26, 0.018), (27, 0.204), (28, -0.116), (29, 0.122), (30, -0.006), (31, 0.116), (32, 0.344), (33, 0.227), (34, -0.026), (35, 0.083), (36, -0.11), (37, -0.079), (38, 0.028), (39, -0.06), (40, 0.069), (41, 0.113), (42, 0.029), (43, -0.023), (44, 0.002), (45, -0.021), (46, -0.067), (47, 0.12), (48, -0.035), (49, -0.066)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95089394 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

Author: Aaron Dennis, Dan Ventura

Abstract: The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture. 1

2 0.86842102 100 nips-2012-Discriminative Learning of Sum-Product Networks

Author: Robert Gens, Pedro Domingos

Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset. 1

3 0.42616889 215 nips-2012-Minimizing Uncertainty in Pipelines

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

4 0.36443624 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

Author: Wei Bi, James T. Kwok

Abstract: In hierarchical classification, the prediction paths may be required to always end at leaf nodes. This is called mandatory leaf node prediction (MLNP) and is particularly useful when the leaf nodes have much stronger semantic meaning than the internal nodes. However, while there have been a lot of MLNP methods in hierarchical multiclass classification, performing MLNP in hierarchical multilabel classification is much more difficult. In this paper, we propose a novel MLNP algorithm that (i) considers the global hierarchy structure; and (ii) can be used on hierarchies of both trees and DAGs. We show that one can efficiently maximize the joint posterior probability of all the node labels by a simple greedy algorithm. Moreover, this can be further extended to the minimization of the expected symmetric loss. Experiments are performed on a number of real-world data sets with tree- and DAG-structured label hierarchies. The proposed method consistently outperforms other hierarchical and flat multilabel classification methods. 1

5 0.36277783 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernelbased method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities. 1

6 0.32214761 39 nips-2012-Analog readout for optical reservoir computers

7 0.30728078 170 nips-2012-Large Scale Distributed Deep Networks

8 0.30577412 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

9 0.2750704 298 nips-2012-Scalable Inference of Overlapping Communities

10 0.27409923 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

11 0.26332381 267 nips-2012-Perceptron Learning of SAT

12 0.2617791 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification

13 0.24909081 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

14 0.24681434 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

15 0.24612688 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

16 0.2404491 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

17 0.23902813 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

18 0.23651169 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

19 0.23520115 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

20 0.22950885 81 nips-2012-Context-Sensitive Decision Forests for Object Detection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.444), (21, 0.019), (38, 0.076), (42, 0.01), (54, 0.023), (55, 0.022), (74, 0.057), (76, 0.083), (80, 0.125), (92, 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.88081324 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

Author: Michael Paul, Mark Dredze

Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors. 1

same-paper 2 0.86636382 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

Author: Aaron Dennis, Dan Ventura

Abstract: The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture. 1

3 0.78999966 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

Author: Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim, Chang D. Yoo

Abstract: For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). The nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure and long-range dependency of speech to be better represented than that by an HMM. In addition, a variance constraint in the VGPDS is introduced to eliminate the sparse approximation error in the kernel matrix. The effectiveness of the proposed model is demonstrated with three experimental results, including parameter estimation and classification performance, on the synthetic and benchmark datasets. 1

4 0.7884447 233 nips-2012-Multiresolution Gaussian Processes

Author: David B. Dunson, Emily B. Fox

Abstract: We propose a multiresolution Gaussian process to capture long-range, nonMarkovian dependencies while allowing for abrupt changes and non-stationarity. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element of a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes. Due to the inherent conjugacy of the GPs, one can analytically marginalize the GPs and compute the marginal likelihood of the observations given the partition tree. This property allows for efficient inference of the partition itself, for which we employ graph-theoretic techniques. We apply the multiresolution GP to the analysis of magnetoencephalography (MEG) recordings of brain activity.

5 0.77457833 192 nips-2012-Learning the Dependency Structure of Latent Factors

Author: Yunlong He, Yanjun Qi, Koray Kavukcuoglu, Haesun Park

Abstract: In this paper, we study latent factor models with dependency structure in the latent space. We propose a general learning framework which induces sparsity on the undirected graphical model imposed on the vector of latent factors. A novel latent factor model SLFA is then proposed as a matrix factorization problem with a special regularization term that encourages collaborative reconstruction. The main benefit (novelty) of the model is that we can simultaneously learn the lowerdimensional representation for data and model the pairwise relationships between latent factors explicitly. An on-line learning algorithm is devised to make the model feasible for large-scale learning problems. Experimental results on two synthetic data and two real-world data sets demonstrate that pairwise relationships and latent factors learned by our model provide a more structured way of exploring high-dimensional data, and the learned representations achieve the state-of-the-art classification performance. 1

6 0.76811123 282 nips-2012-Proximal Newton-type methods for convex optimization

7 0.70656115 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

8 0.68937302 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis

9 0.67777729 12 nips-2012-A Neural Autoregressive Topic Model

10 0.61635584 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

11 0.55553401 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

12 0.55236918 72 nips-2012-Cocktail Party Processing via Structured Prediction

13 0.5500471 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

14 0.54756308 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

15 0.54278445 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

16 0.54259217 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

17 0.53699505 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

18 0.53658146 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

19 0.53249502 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

20 0.52886915 274 nips-2012-Priors for Diversity in Generative Latent Variable Models