nips nips2012 nips2012-215 knowledge-graph by maker-knowledge-mining

215 nips-2012-Minimizing Uncertainty in Pipelines


Source: pdf

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. [sent-9, score-0.346]

2 We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. [sent-11, score-0.364]

3 In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. [sent-12, score-0.13]

4 However, the output uncertainties are highly correlated since different tuples share their lineage. [sent-23, score-0.128]

5 In this paper, we consider the following interesting and non-trivial problem : given a budget of k tuples, choose the k tuples to inspect that minimize the total uncertainty in the output. [sent-25, score-0.217]

6 We will formalize the notion of a data pipeline and uncertainty in Section 2. [sent-26, score-0.171]

7 We want to choose k tuples to inspect that minimize the total output uncertainty. [sent-54, score-0.17]

8 Resolving e3 completely resolves the uncertainty in w2 , which, in turn, completely resolves the uncertainty in e4 and reduces the uncertainty in e5 . [sent-75, score-0.333]

9 In general, one can also query intermediate nodes in addition to the output tuples, and choosing the best node is non-trivial. [sent-77, score-0.456]

10 In this paper, we consider the general setting of a data pipeline given by a directed acyclic graph that can capture both the motivating scenarios. [sent-78, score-0.141]

11 We define a measure of total uncertainty of the final output based on how close the probabilities are to either 0 or 1. [sent-79, score-0.173]

12 We give efficient algorithms to find the set of data items to query that minimizes the total uncertainty of the output, both under interactive and batch settings. [sent-80, score-0.317]

13 1 Related Work Our problem is an instance of active learning [27, 13, 12, 17, 2, 15, 5, 4, 3] since our goal is to infer probability values of the nodes being true in the DAG, by asking for tags of example nodes. [sent-82, score-0.254]

14 Unlike traditional active learning where we want to learn the underlying probabilistic model from iid samples, in our problem, we already know the underlying model and want to gain information about non-iid items with known correlations. [sent-85, score-0.16]

15 Our work is closely related to the field of active diagnosis [28, 19, 20], where the goal is to infer the state of unknown nodes in a network by selecting suitable “test probes”. [sent-93, score-0.214]

16 Our work is also related to the work on graph search [23], where the goal is to identify hidden nodes while asking questions to humans. [sent-99, score-0.244]

17 2 Problem Statement Execution Graph: Let G be a directed acyclic graph (dag), where each node n in G has a label from the set {∧, ∨} and a probability p(n). [sent-101, score-0.23]

18 The parents of n, i.e., the set of nodes having an outgoing edge to n, denote the set of data items which were input to the instance of the operator that produced n. [sent-109, score-0.224]

19 We further assume that, conditioned on the parents being correct, nodes are correct independently. [sent-114, score-0.264]

20 To state the semantics formally, we associate a set of independent Boolean random variables X(n) for each node n in G with probability p(n). [sent-115, score-0.149]

21 We also associate another set of random variables Y (n), which denotes whether the result at node n is correct (unconditionally). [sent-116, score-0.175]
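
To make the semantics concrete, here is a small Monte Carlo sketch (not from the paper). It assumes the natural reading that Y(n) holds when the node's own Boolean X(n) fires and the parents' Y values satisfy the node's ∧/∨ label, with root nodes depending only on X(n); all names and the sampling approach are illustrative.

```python
import random
from collections import deque

def topological_order(parents):
    """Order nodes so that every node appears after all of its parents."""
    indegree = {n: len(ps) for n, ps in parents.items()}
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for q in ps:
            children[q].append(n)
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order

def estimate_correctness(parents, label, p, n_samples=20000, seed=0):
    """Monte Carlo estimate of Pr(Y(n)) for every node n.

    parents : dict node -> list of its parent nodes (inputs to the operator)
    label   : dict node -> '∧' or '∨'
    p       : dict node -> p(n), probability that the node's own step is correct
    """
    rng = random.Random(seed)
    order = topological_order(parents)
    counts = {n: 0 for n in parents}
    for _ in range(n_samples):
        y = {}
        for n in order:
            x = rng.random() < p[n]                 # X(n)
            ps = parents[n]
            if not ps:
                y[n] = x
            elif label[n] == '∧':
                y[n] = x and all(y[q] for q in ps)  # AND node
            else:
                y[n] = x and any(y[q] for q in ps)  # OR node
            counts[n] += y[n]
    return {n: counts[n] / n_samples for n in counts}
```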

22 For trees, i.e., when all nodes have a single parent, the labels of nodes do not have any effect, since Y(n) is the same for both ∧ and ∨ nodes. [sent-121, score-0.344]

23 We want all the final probabilities of L to be close to either 0 or 1, as the closer the probability to 1/2, the more uncertain the correctness of the given node is. [sent-128, score-0.278]

24 Then, we define the total output uncertainty of the DAG as I = ∑_{n∈L} f(Pr(Y(n))) (1). Our results continue to hold when different n ∈ L are weighted differently, i.e. [sent-130, score-0.142]
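
A minimal sketch of Eq. (1). The particular choice f(p) = p(1 − p), which is zero at 0 and 1 and largest at 1/2, is an assumption; the summary only requires f to penalize probabilities near 1/2, and the function names below are illustrative.

```python
def f(p):
    # one admissible uncertainty function: 0 at p = 0 or 1, maximal at p = 1/2
    return p * (1.0 - p)

def total_uncertainty(output_probs):
    """I = sum over output nodes n in L of f(Pr(Y(n)))."""
    return sum(f(p) for p in output_probs)

# example: one near-certain output and two uncertain ones
print(total_uncertainty([0.95, 0.5, 0.6]))   # 0.0475 + 0.25 + 0.24 = 0.5375
```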

25 Now, our goal is to query a set of nodes Q that minimize the expected total output uncertainty conditioned on observing Q. [sent-135, score-0.452]

26 Given a G ∈ DAG, find the node q that minimizes the expected uncertainty I({q}). [sent-141, score-0.31]

27 Given a G ∈ DAG, find the set of nodes Q of size k that minimizes I(Q). [sent-143, score-0.222]

28 Suppose we have already issued a set of queries Q0 and obtained a vector v0 of their correctness values. [sent-145, score-0.128]

29 Given a new set of queries, we define the conditioned uncertainty as I(Q | Q0 = v0) = ∑_v Pr(Q = v | Q0 = v0) · ∑_{n∈L} f(Pr(Y(n) | Q = v ∧ Q0 = v0)). [sent-146, score-0.145]
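
A brute-force sketch of this quantity, enumerating all answer vectors v for the new queries. The two probability callables are assumed to be supplied by whatever inference routine fits the graph at hand; they are placeholders, not the paper's API.

```python
from itertools import product

def conditioned_uncertainty(Q, f, prob_of_answers, posterior_output_probs):
    """I(Q | Q0 = v0) = sum_v Pr(Q = v | Q0 = v0) * sum_{n in L} f(Pr(Y(n) | Q = v, Q0 = v0)).

    prob_of_answers(v)        -> Pr(Q = v | Q0 = v0) for a 0/1 tuple v
    posterior_output_probs(v) -> list of Pr(Y(n) | Q = v, Q0 = v0) over output nodes n
    """
    total = 0.0
    for v in product((0, 1), repeat=len(Q)):
        pv = prob_of_answers(v)
        if pv > 0.0:
            total += pv * sum(f(p) for p in posterior_output_probs(v))
    return total
```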

30 Given a G ∈ DAG, and a set of already issued queries Q0 with answer v0 , find the best node q to query next that minimizes I({q} | Q0 = v0 ). [sent-148, score-0.391]

31 For any sets of queries Q1, Q2, I(Q1) ≥ I(Q1 ∪ Q2). Thus, expected uncertainty cannot increase with more queries. [sent-160, score-0.17]

32 Lastly, for the rest of the paper, we will assume that the query nodes Q are selected from only among the leaves of G. [sent-164, score-0.353]

33 There is a simple reduction of the general problem to this problem, where we attach a new leaf node to every internal node, and set their probabilities to 1. [sent-166, score-0.296]

34 Thus, for any internal node, we can equivalently query the corresponding leaf node (we will need to use the weighted form of Eq. [sent-167, score-0.338]

35 (1), described in the extended technical report [11], to ensure that new leaf nodes have weight 0 in the objective function). [sent-168, score-0.281]

36 Let DAG(∧) and DAG(∨) denote the subclasses of DAG where all the node labels are ∧ and ∨ respectively. [sent-170, score-0.149]

37 (We define the depth to be the number of nodes in the longest root to leaf directed path in the dag. [sent-172, score-0.311]

38 ) Similarly, we define the class TREE where the dag is restricted to a tree, and TREE(d), consisting of depth-d trees. [sent-173, score-0.672]

39 For trees, since each node has a single parent, the labels of the nodes do not matter. [sent-174, score-0.321]

40 The parameter n is the number of nodes in the graph. [sent-183, score-0.172]

41 4 Best-1 Problem We start with the most basic problem: given a probabilistic DAG G, find the node to query that minimizes the resulting uncertainty. [sent-188, score-0.303]

42 ) Subsequently, we show that finding the best node to query is intractable for DAG(∨) of depth greater than 2, and is thus intractable for DAG as well. [sent-190, score-0.297]

43 , those nodes that have a directed path to n, including n itself. [sent-193, score-0.197]

44 1 TREE(2) Consider a simple tree graph G with root r, having p(r) = pr, and having children l1, · · · , ln with p(li) = pi. [sent-196, score-0.39]

45 Given a node x, let ex denote the event Y(x) = 1, and ēx denote the event that Y(x) = 0. [sent-197, score-0.303]

46 We want to find the leaf q that minimizes I({q}), where: I({q}) = ∑_{l∈L} [ Pr(eq) f(Pr(el | eq)) + Pr(ēq) f(Pr(el | ēq)) ] (4). By a slight abuse of notation, we will use I(q) to denote the quantity I({q}). [sent-198, score-0.372]

47 It is easy to see the following (let l ≠ q): Pr(el | eq) = pl, Pr(eq) = pr pq, Pr(el | ēq) = pr pl(1 − pq)/(1 − pr pq). Substituting these expressions back in Eq. [sent-199, score-2.292]

48 We first compute ∑_l pl and ∑_l pl², and then compute the objective function for all q in linear time. [sent-202, score-0.171]
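
A sketch of this linear-time scan for the depth-2 case. It assumes f(p) = p(1 − p), and the closed form for I(q) below is our own algebraic simplification of Eq. (4) under that assumption (each l ≠ q contributes pr·pl − pl²·c(q), and l = q contributes 0), so treat it as illustrative rather than as the paper's exact formula.

```python
def best_single_query_depth2(pr, leaf_p):
    """Best leaf to query in a depth-2 tree: root probability pr, leaf probabilities leaf_p.

    Assumes f(p) = p * (1 - p) and pr * p_q < 1 for every leaf q.
    Runs in O(n) using the precomputed sums of p_l and p_l**2.
    """
    s1 = sum(leaf_p)                    # sum_l p_l
    s2 = sum(p * p for p in leaf_p)     # sum_l p_l^2
    best_q, best_val = None, float('inf')
    for q, pq in enumerate(leaf_p):
        c = pr * (pq + pr - 2.0 * pr * pq) / (1.0 - pr * pq)
        val = pr * (s1 - pq) - (s2 - pq * pq) * c   # contribution of all l != q
        if val < best_val:
            best_q, best_val = q, val
    return best_q, best_val

# sanity check: with a certain root, querying one of two p = 0.5 leaves
# leaves uncertainty 0.5 * 0.5 = 0.25 from the other leaf
print(best_single_query_depth2(1.0, [0.5, 0.5]))    # (0, 0.25)
```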

49 Now we consider the case when G is any general tree with the set of leaves L. [sent-203, score-0.127]

50 Thus, Px is the product of p(y) over all nodes y that are the ancestors of x (including x itself). [sent-206, score-0.234]

51 Given nodes x and y, let lca(x, y) denote the least common ancestor of x and y. [sent-207, score-0.22]

52 The following is immediate: Pr(eq) = Pq, Pr(el | eq) = Pl/Plca(l,q), Pr(el | ēq) = Pl(1 − Pq/Plca(l,q))/(1 − Pq). However, if we directly plug this in Eq. [sent-210, score-0.204]

53 Instead, we group all the leaves into equivalence classes based on their lowest common ancestor with q as shown in Fig. [sent-213, score-0.125]

54 Consider all leaves in the set Li such that their lowest common ancestor with q is ai . [sent-216, score-0.178]

55 Given a node x, let S(x) denote the sum of Pl² over all leaves l reachable from x. [sent-217, score-0.226]

56 (4) over all leaves in Li, we get the following expression: ∑_{l∈Li} Pl − (S(ai) − S(ai−1)) · (Pq + Pai² − 2Pq Pai)/(Pai²(1 − Pq)). Define ∆1(ai) = S(ai) − S(ai−1) and ∆2(ai) = (S(ai) − S(ai−1)) · (1 − 2Pai)/Pai². [sent-219, score-0.13]

57 First, using a single top-down dynamic programming over the tree, we can compute Px for all nodes x. [sent-222, score-0.172]

58 Next, using a single bottom-up dynamic programming over G, we can compute S(x) for all nodes x. [sent-223, score-0.172]

59 In the third step, we compute ∆1 (x) and ∆2 (x) for all nodes in the tree. [sent-224, score-0.172]

60 In the fourth step, we compute ∑_{a∈anc(x)} ∆i(a) for all nodes x in the graph using another top-down dynamic programming. [sent-225, score-0.204]
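
A sketch of the first two of these passes (the top-down products Px and the bottom-up leaf sums S(x)); the ∆ bookkeeping and the final accumulation over anc(x) follow the same pattern. Names and the dictionary-based representation are illustrative, not the paper's.

```python
def tree_products_and_leaf_sums(children, p, root):
    """Top-down pass: P[x] = product of p(y) over ancestors of x, including x.
    Bottom-up pass:  S[x] = sum of P[l]**2 over leaves l reachable from x.
    `children` maps each node to its list of children; `p` maps nodes to p(n).
    """
    P, order, stack = {}, [], [(root, 1.0)]
    while stack:                          # preorder DFS: parents before children
        x, acc = stack.pop()
        P[x] = acc * p[x]
        order.append(x)
        for c in children.get(x, []):
            stack.append((c, P[x]))
    S = {}
    for x in reversed(order):             # reverse preorder: children before parents
        kids = children.get(x, [])
        S[x] = P[x] ** 2 if not kids else sum(S[c] for c in kids)
    return P, S
```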

61 Given a tree G with n nodes, we can compute the node q that minimizes I(q) in time O(n). [sent-230, score-0.249]

62 As before, we want to find the best node q that minimizes I(q) as given by Eq. [sent-233, score-0.232]

63 However, the expressions for probabilities Pr(eq ) and Pr(el | eq ) are more complex for DAG(2, ∨). [sent-235, score-0.133]

64 Subsequently, finding the best candidate node would require O(n²) time, giving us an overall O(n³) algorithm to find the best node. [sent-241, score-0.149]

65 5 Incremental Node Selection In this section, we consider the problem of picking the next best node to query after a set of nodes Q0 has already been queried. [sent-270, score-0.501]

66 We next pick a leaf node q that minimizes I({q} | Q0 = v0 ). [sent-272, score-0.339]

67 Here, we prove that incremental picking is intractable for DAG(∧) itself. [sent-276, score-0.144]

68 Thus, after each query, we can incorporate the new evidence by updating the probabilities of all the nodes along the path from the query node to the root. [sent-291, score-0.456]

69 Thus, finding the next best node to query can still be computed in linear time. [sent-292, score-0.253]

70 If the result is 0, p(q) is set to 0 and p(r) is set to pr(1 − pq)/(1 − pr pq). [sent-296, score-0.582]
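
A sketch of this update for the depth-2 case. The answer-1 branch, where both probabilities jump to 1 because a correct leaf certifies the root, is our reading of the symmetric case, which is not quoted above.

```python
def update_after_answer(pr, pq, answer):
    """New (p(r), p(q)) for a depth-2 tree after a human labels leaf q.

    answer = 1 means the leaf's output was judged correct, 0 means incorrect.
    """
    if answer == 1:
        return 1.0, 1.0                   # a correct leaf implies a correct root
    new_pr = pr * (1.0 - pq) / (1.0 - pr * pq)   # Bayes update quoted above
    return new_pr, 0.0
```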

71 Then, we can subsequently compute the leaf q that minimizes Eq. [sent-300, score-0.135]

72 For a fixed pr, ∑_l pl, and ∑_l pl², this expression can be treated as a rational polynomial in a single variable pq. [sent-302, score-0.809]

73 If we take the derivative, the numerator is a quartic in pq . [sent-303, score-0.349]

74 Using 4 binary searches, we can find the two pq closest to each of these roots (giving us 8 candidates for pq , plus two more which are the smallest and the largest pq ), and evaluate I(q) for each of those 10 candidates. [sent-306, score-1.044]
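
A sketch of this candidate-selection step, assuming the real roots of the quartic and an objective callable I(q) are already available; only the bracketing via binary search is shown, and all names are illustrative.

```python
import bisect

def pick_candidate(sorted_p, quartic_roots, objective):
    """Return the candidate p_q with the smallest objective value.

    sorted_p      : leaf probabilities in sorted order
    quartic_roots : real roots of the derivative's numerator (at most 4)
    objective     : callable mapping a p_q value to I(q)
    """
    candidates = {sorted_p[0], sorted_p[-1]}          # smallest and largest p_q
    for r in quartic_roots:
        i = bisect.bisect_left(sorted_p, r)
        if i < len(sorted_p):
            candidates.add(sorted_p[i])               # closest value >= root
        if i > 0:
            candidates.add(sorted_p[i - 1])           # closest value <  root
    return min(candidates, key=objective)
```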

75 , the answer to each subsequent query), we can update the pr probability and the sum ∑_l pl² in constant time. [sent-310, score-0.256]

76 If the p values of the leaf nodes are provided in sorted order, then, for a Depth-2 tree, the next best node to query can be computed in O(log n). [sent-314, score-0.51]

77 3 DAG(∧) For DAG(∧), while we can pick the best-1 node in O(n³) time, we have the surprising result that the problem of picking subsequent nodes becomes intractable. [sent-316, score-0.452]

78 In particular, we show that given a set S of queried nodes, the problem of finding the next best node is intractable in the size of S. [sent-318, score-0.201]

79 Our reduction, shown in the extended technical report [11], is a weakly parsimonious reduction involving monotone-2-CNF, which is known to be hard to approximate; thus we have the following result: Theorem 5. [sent-324, score-0.154]

80 6 Best-K In this section, we consider the problem of picking the best k nodes to minimize uncertainty. [sent-329, score-0.248]

81 [19] give a log n approximation algorithm for a similar problem under the conditions of super-modularity: super-modularity states that the marginal decrease in uncertainty when adding a single query node to an existing set of query nodes decreases as the set becomes larger. [sent-331, score-0.64]

82 If we pick any leaf node with p = 1, the expected uncertainty is n/8. [sent-339, score-0.4]

83 If we pick a node with p = 1/2, the expected uncertainty is 25n/16 − 4/16. [sent-340, score-0.315]

84 Thus, if we sort nodes by their expected uncertainty, all the p = 1 nodes appear before all the p = 1/2 nodes. [sent-341, score-0.344]

85 If we pick greedily based on their expected uncertainty, we pick all the p = 1 nodes. [sent-343, score-0.142]

86 Thus, expected uncertainty after querying all p = 1 nodes is still n/8. [sent-345, score-0.283]

87 On the other hand, if we pick a single p = 1 node, and n − 1 nodes with p = 1/2, the resulting uncertainty is a constant. [sent-346, score-0.338]

88 Thus, picking nodes greedily can be O(n) worse than the optimal. [sent-347, score-0.28]

89 Consider a G ∈ DAG(2, ∧) having two nodes u and v on the top layer and three nodes a, b, and c in the bottom layer. [sent-349, score-0.344]

90 Now consider the expected uncertainty Ic at node c. [sent-353, score-0.26]

91 Super-modularity condition implies that Ic ({b, a}) − Ic ({b}) ≥ Ic ({a}) − Ic ({}) (since marginal decrease in expected uncertainty of c on picking an additional node a should be less for set {} compared to {b}). [sent-354, score-0.336]

92 This can be used to show that conditioned on Y (b), expected uncertainty in c drops when conditioning on Y (a). [sent-359, score-0.145]

93 Then there exists a tree T ∈ T REE(d) such that for leaf nodes a , b, and c in T the following holds: Ic ({b, a}) − Ic ({b}) < Ic ({a}) − Ic ({}). [sent-367, score-0.307]

94 As in Section 4, assume the root r with p(r) to be pr, while the leaves are L = {l1, . . . , ln}. [sent-370, score-0.362]

95 ) It is easy to check that the first term is minimized when Q consists of nodes with p(l) closest to 1/2, and the second term is minimized by nodes with p(l) closest to 1. [sent-378, score-0.41]

96 Intuitively, the first term prefers nodes that are as uncertain as possible, while the second term prefers nodes that reveal as much about the root as possible. [sent-379, score-0.444]

97 This immediately gives us a 2-approximation in the number of queries: by picking at most 2k nodes, k closest to 1/2 and k closest to 1, we can do at least as well as the optimal solution for best-k. [sent-380, score-0.201]

98 This also makes intuitive sense, since the second term prefers nodes that reveal more about the root, and once we use sufficiently many nodes to infer the correctness of the root, we do not get any gain from asking additional questions. [sent-382, score-0.447]
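
A sketch of the 2k-node selection rule described above (k leaves with p closest to 1/2 plus k leaves with p closest to 1); ties and duplicates are handled naively, so fewer than 2k indices may come back. The function name is illustrative.

```python
def two_approx_queries(leaf_p, k):
    """Pick the k leaves with p closest to 1/2 and the k leaves with p closest to 1."""
    idx = range(len(leaf_p))
    near_half = sorted(idx, key=lambda i: abs(leaf_p[i] - 0.5))[:k]
    near_one = sorted(idx, key=lambda i: abs(leaf_p[i] - 1.0))[:k]
    return sorted(set(near_half) | set(near_one))
```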

99 Our reduction is a weakly parsimonious reduction involving monotone-partitioned-2-CNF, which is known to be hard to approximate; thus we have the following result: Theorem 6. [sent-391, score-0.161]

100 7 Conclusion In this work, we performed a detailed complexity analysis for the problem of finding the optimal set of query nodes for various classes of graphs. [sent-394, score-0.297]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dag', 0.672), ('pq', 0.326), ('pr', 0.256), ('ree', 0.233), ('nodes', 0.172), ('pl', 0.171), ('ic', 0.15), ('node', 0.149), ('uncertainty', 0.111), ('query', 0.104), ('eq', 0.102), ('leaf', 0.085), ('leaves', 0.077), ('picking', 0.076), ('parent', 0.075), ('anc', 0.072), ('tuples', 0.068), ('ancestors', 0.062), ('pipeline', 0.06), ('queries', 0.059), ('el', 0.058), ('dalvi', 0.057), ('nilesh', 0.057), ('ex', 0.056), ('pick', 0.055), ('extractor', 0.055), ('pipelines', 0.055), ('ai', 0.053), ('items', 0.052), ('alina', 0.051), ('parameswaran', 0.051), ('tree', 0.05), ('minimizes', 0.05), ('hard', 0.049), ('ancestor', 0.048), ('incremental', 0.046), ('dags', 0.044), ('attaches', 0.043), ('inapproximability', 0.043), ('ptime', 0.043), ('active', 0.042), ('correctness', 0.04), ('asking', 0.04), ('inspect', 0.038), ('vibhor', 0.038), ('pai', 0.038), ('krause', 0.036), ('aditya', 0.035), ('conditioned', 0.034), ('want', 0.033), ('roots', 0.033), ('closest', 0.033), ('greedily', 0.032), ('trees', 0.032), ('graph', 0.032), ('parents', 0.032), ('probabilities', 0.031), ('reduction', 0.031), ('output', 0.031), ('queried', 0.03), ('execution', 0.03), ('root', 0.029), ('polynomial', 0.029), ('uncertainties', 0.029), ('bohannon', 0.029), ('debugging', 0.029), ('issued', 0.029), ('ramakrishnan', 0.029), ('srujana', 0.029), ('agnostic', 0.027), ('rational', 0.027), ('item', 0.026), ('beygelzimer', 0.026), ('correct', 0.026), ('hardness', 0.026), ('parsimonious', 0.026), ('directed', 0.025), ('operators', 0.025), ('pvldb', 0.025), ('rastogi', 0.025), ('raghu', 0.025), ('incr', 0.025), ('extractions', 0.025), ('uncertain', 0.025), ('px', 0.025), ('technical', 0.024), ('weakly', 0.024), ('theorem', 0.024), ('acyclic', 0.024), ('philip', 0.023), ('quartic', 0.023), ('pi', 0.023), ('prefers', 0.023), ('sanjoy', 0.022), ('intractable', 0.022), ('li', 0.021), ('web', 0.021), ('submodularity', 0.021), ('complexity', 0.021), ('event', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 215 nips-2012-Minimizing Uncertainty in Pipelines

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

2 0.13266297 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

Author: Charles Blundell, Jeff Beck, Katherine A. Heller

Abstract: We present a Bayesian nonparametric model that discovers implicit social structure from interaction time-series data. Social groups are often formed implicitly, through actions among members of groups. Yet many models of social networks use explicitly declared relationships to infer social structure. We consider a particular class of Hawkes processes, a doubly stochastic point process, that is able to model reciprocity between groups of individuals. We then extend the Infinite Relational Model by using these reciprocating Hawkes processes to parameterise its edges, making events associated with edges co-dependent through time. Our model outperforms general, unstructured Hawkes processes as well as structured Poisson process-based models at predicting verbal and email turn-taking, and military conflicts among nations. 1

3 0.13255304 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

Author: Wei Bi, James T. Kwok

Abstract: In hierarchical classification, the prediction paths may be required to always end at leaf nodes. This is called mandatory leaf node prediction (MLNP) and is particularly useful when the leaf nodes have much stronger semantic meaning than the internal nodes. However, while there have been a lot of MLNP methods in hierarchical multiclass classification, performing MLNP in hierarchical multilabel classification is much more difficult. In this paper, we propose a novel MLNP algorithm that (i) considers the global hierarchy structure; and (ii) can be used on hierarchies of both trees and DAGs. We show that one can efficiently maximize the joint posterior probability of all the node labels by a simple greedy algorithm. Moreover, this can be further extended to the minimization of the expected symmetric loss. Experiments are performed on a number of real-world data sets with tree- and DAG-structured label hierarchies. The proposed method consistently outperforms other hierarchical and flat multilabel classification methods. 1

4 0.10843609 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency −δη(η+1)−2 when the number of samples n scales as n = Ω(θmin log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

5 0.10049191 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernelbased method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities. 1

6 0.094713129 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

7 0.086848818 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

8 0.083800204 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

9 0.082496434 298 nips-2012-Scalable Inference of Overlapping Communities

10 0.072613351 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

11 0.068461113 180 nips-2012-Learning Mixtures of Tree Graphical Models

12 0.063596874 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification

13 0.062047582 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

14 0.061912429 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

15 0.06000964 286 nips-2012-Random Utility Theory for Social Choice

16 0.057809915 260 nips-2012-Online Sum-Product Computation Over Trees

17 0.056515057 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

18 0.052296035 100 nips-2012-Discriminative Learning of Sum-Product Networks

19 0.051615272 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

20 0.049533106 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.128), (1, -0.001), (2, -0.006), (3, -0.039), (4, -0.069), (5, -0.069), (6, -0.012), (7, -0.02), (8, -0.155), (9, 0.116), (10, -0.023), (11, 0.014), (12, 0.019), (13, -0.005), (14, -0.07), (15, -0.011), (16, -0.034), (17, 0.002), (18, 0.055), (19, 0.023), (20, -0.058), (21, 0.126), (22, 0.077), (23, 0.005), (24, 0.041), (25, -0.046), (26, 0.024), (27, -0.001), (28, -0.033), (29, -0.025), (30, -0.039), (31, -0.052), (32, -0.009), (33, -0.019), (34, -0.024), (35, 0.096), (36, 0.032), (37, -0.062), (38, -0.022), (39, 0.008), (40, -0.022), (41, 0.036), (42, 0.018), (43, 0.049), (44, -0.024), (45, 0.049), (46, 0.003), (47, 0.002), (48, 0.036), (49, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95092642 215 nips-2012-Minimizing Uncertainty in Pipelines

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

2 0.76126283 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernelbased method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities. 1

3 0.75824755 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

Author: Wei Bi, James T. Kwok

Abstract: In hierarchical classification, the prediction paths may be required to always end at leaf nodes. This is called mandatory leaf node prediction (MLNP) and is particularly useful when the leaf nodes have much stronger semantic meaning than the internal nodes. However, while there have been a lot of MLNP methods in hierarchical multiclass classification, performing MLNP in hierarchical multilabel classification is much more difficult. In this paper, we propose a novel MLNP algorithm that (i) considers the global hierarchy structure; and (ii) can be used on hierarchies of both trees and DAGs. We show that one can efficiently maximize the joint posterior probability of all the node labels by a simple greedy algorithm. Moreover, this can be further extended to the minimization of the expected symmetric loss. Experiments are performed on a number of real-world data sets with tree- and DAG-structured label hierarchies. The proposed method consistently outperforms other hierarchical and flat multilabel classification methods. 1

4 0.68057418 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

Author: Levi Boyles, Max Welling

Abstract: We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman’s coalescent, providing a prior over tree structures which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the TMC achieves competitive experimental results. 1

5 0.63098413 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

Author: Jeremy Weiss, Sriraam Natarajan, David Page

Abstract: Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability.

6 0.61939871 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

7 0.60809058 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

8 0.60145038 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

9 0.59588253 260 nips-2012-Online Sum-Product Computation Over Trees

10 0.58327991 180 nips-2012-Learning Mixtures of Tree Graphical Models

11 0.56391793 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

12 0.53359354 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

13 0.52901763 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification

14 0.52728254 298 nips-2012-Scalable Inference of Overlapping Communities

15 0.49293867 194 nips-2012-Learning to Discover Social Circles in Ego Networks

16 0.48529899 100 nips-2012-Discriminative Learning of Sum-Product Networks

17 0.48321423 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

18 0.46900576 327 nips-2012-Structured Learning of Gaussian Graphical Models

19 0.44464689 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

20 0.44042817 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.057), (21, 0.023), (38, 0.089), (39, 0.012), (42, 0.019), (54, 0.021), (55, 0.345), (74, 0.069), (76, 0.139), (80, 0.098), (92, 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89647013 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

Author: Francisco Ruiz, Isabel Valera, Carlos Blanco, Fernando Pérez-Cruz

Abstract: The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, etc., of a representative sample of the U.S. population. In this paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model the subjects using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the nature of the data, we need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. Finally, the experiments over the NESARC database show that our model properly captures some of the hidden causes that model suicide attempts. 1

2 0.8944822 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

3 0.88135499 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

Author: Francesco Dinuzzo, Bernhard Schölkopf

Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1

4 0.87031329 155 nips-2012-Human memory search as a random walk in a semantic network

Author: Joseph L. Austerweil, Joshua T. Abbott, Thomas L. Griffiths

Abstract: The human mind has a remarkable ability to store a vast amount of information in memory, and an even more remarkable ability to retrieve these experiences when needed. Understanding the representations and algorithms that underlie human memory search could potentially be useful in other information retrieval settings, including internet search. Psychological studies have revealed clear regularities in how people search their memory, with clusters of semantically related items tending to be retrieved together. These findings have recently been taken as evidence that human memory search is similar to animals foraging for food in patchy environments, with people making a rational decision to switch away from a cluster of related information as it becomes depleted. We demonstrate that the results that were taken as evidence for this account also emerge from a random walk on a semantic network, much like the random web surfer model used in internet search engines. This offers a simpler and more unified account of how people search their memory, postulating a single process rather than one process for exploring a cluster and one process for switching between clusters. 1

5 0.8246724 211 nips-2012-Meta-Gaussian Information Bottleneck

Author: Melanie Rey, Volker Roth

Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possibles applications of IB to continuous data and provides a solution more robust to outliers. 1

same-paper 6 0.77187282 215 nips-2012-Minimizing Uncertainty in Pipelines

7 0.7495423 95 nips-2012-Density-Difference Estimation

8 0.66293073 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

9 0.6497286 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

10 0.64258558 193 nips-2012-Learning to Align from Scratch

11 0.63200665 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

12 0.62670094 210 nips-2012-Memorability of Image Regions

13 0.62643975 231 nips-2012-Multiple Operator-valued Kernel Learning

14 0.62529224 298 nips-2012-Scalable Inference of Overlapping Communities

15 0.6227591 188 nips-2012-Learning from Distributions via Support Measure Machines

16 0.60641801 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

17 0.6046828 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

18 0.60263002 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

19 0.60193026 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

20 0.59949476 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model