nips nips2005 nips2005-70 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Brigham S. Anderson, Andrew W. Moore
Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.
Reference: text
sentIndex sentText sentNum sentScore
1 Previously, pairwise information gain calculation has involved a cost quadratic in network size. [sent-9, score-0.311]
2 In this work, we show how to perform a similar computation with cost linear in network size. [sent-10, score-0.239]
3 The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. [sent-12, score-0.081]
4 1 Introduction In a diagnosis problem, one wishes to select the best test (or observation) to make in order to learn the most about a system of interest. [sent-14, score-0.068]
5 Medical settings and disease diagnosis immediately come to mind, but sensor management (Krishnamurthy, 2002), sensitivity analysis (Kjrulff & van der Gaag, 2000), and active learning (Anderson & Moore, 2005) all make use of similar computations. [sent-15, score-0.272]
6 A common technique in the field of diagnosis is to compute the mutual information between each query and target, then select the query that is expected to provide the most information (Agostak & Weiss, 1999). [sent-17, score-0.345]
7 Likewise, a sensitivity analysis between the query variable and the target variables can be performed (Laskey, 1995; Kjrulff & van der Gaag, 2000). [sent-18, score-0.281]
8 However, both suffer from a quadratic blowup with respect to the number of queries and targets. [sent-19, score-0.102]
9 In the current paper we present a loss function which can be used in a message-passing framework to perform the all-pairs computation with cost linear in network size. [sent-20, score-0.276]
10 We describe the loss function in Section 2, give a polynomial expression for network-wide expected loss in Section 3, and in Section 4 present a message-passing scheme to perform this computation efficiently for each node in the network. [sent-21, score-0.405]
11 Section 5 shows the empirical speedups and accuracy gains achieved by the algorithm. [sent-22, score-0.119]
12 The independence graph G = (X, E) is a directed acyclic graph (DAG) in which X is a set of N discrete random variables {x1, x2, …, xN}. [sent-27, score-0.089]
13 We will denote the marginal distribution of a single node P (x|B) by πx , where (πx )i is P (x = i). [sent-31, score-0.282]
14 We indicate the number of states a node x can assume as |x|. [sent-33, score-0.275]
15 Additionally, each node x is assigned a cost matrix Cx , in which (Cx )ij is the cost of believing x = j when in fact the true value x∗ = i. [sent-34, score-0.663]
16 A cost matrix of all zeros indicates that one is not interested in the node’s value. [sent-35, score-0.203]
17 The cost matrix C is useful because inhomogeneous costs are a common feature in most realistic domains. [sent-36, score-0.274]
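To make the cost-matrix convention concrete, here is a minimal sketch; the 3-state node and the particular numbers are hypothetical, not taken from the paper:

```python
import numpy as np

# (C)[i, j] is the cost of believing the node is in state j when its true state is i.
# Uniform 0-1 costs: every confusion costs 1, correct beliefs cost 0.
C_care = np.ones((3, 3)) - np.eye(3)

# A node whose value is irrelevant gets an all-zeros cost matrix,
# so it contributes nothing to the risk.
C_ignore = np.zeros((3, 3))

print(C_care)
print(C_ignore)
```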
18 This ubiquity results from the fact that information almost always has a purpose, so that some variables are more relevant than others, some states of a variable are more relevant than others, and confusion between some pairs of states is more relevant than between other pairs. [sent-37, score-0.058]
19 For our task, we are given B, and wish to estimate P (X ) accurately by iteratively selecting the next node to observe. [sent-38, score-0.246]
20 Although typically only a subset of the nodes are queryable, we will for the purposes of this paper assume that any node can be queried. [sent-39, score-0.372]
21 How do we select the most informative node to query? [sent-40, score-0.246]
22 We must first define our objective function, which is determined by our definition of error. [sent-41, score-0.064]
23 2 Risk Due to Uncertainty The underlying error function for the information gain computation will be denoted Error(P (X )||X ∗ ), which quantifies the loss associated with the current belief state, P (X ) given the true values X ∗ . [sent-42, score-0.18]
24 Error_log(P(X)||X∗) = −log P(X∗)  (1); Error_mlog(P(X)||X∗) = −∑_{u∈X} log P(u∗)  (2); Error_01(P(X)||X∗) = −∑_{u∈X} P(u∗)  (3), where X is the set of nodes and u∗ is the true value of node u. [sent-45, score-0.286]
25 The error function of Equation 1 will prove insufficient for our needs as it cannot target individual node errors, while the error function of Equation 2 results in an objective function that is quadratic in cost to compute. [sent-46, score-0.592]
26 For instance, we would like to specify that misclassifying a node’s state as 0 when it is actually 1 is different from misclassifying it as 0 when it is actually in state 2. [sent-48, score-0.086]
27 Different costs for each node can be specified with cost matrices Cu for u ∈ X . [sent-49, score-0.503]
28 The final error function is Error(P(X)||X∗) = ∑_{u∈X} ∑_{i=1}^{|u|} P(u = i) C_u[u∗, i]  (4), where C[i, j] is the ij-th element of the matrix C, and |u| is the number of states that the node u can assume. [sent-50, score-0.343]
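A direct transcription of Equation 4 might look as follows; this is a sketch under the assumption that beliefs are stored as per-node marginal vectors and true values as integer states, and all names are mine:

```python
import numpy as np

def error_given_truth(marginals, cost_matrices, true_states):
    # Equation 4: sum over nodes u and states i of P(u = i) * C_u[u*, i].
    total = 0.0
    for u, pi_u in marginals.items():
        C_u, u_star = cost_matrices[u], true_states[u]
        total += sum(pi_u[i] * C_u[u_star, i] for i in range(len(pi_u)))
    return total

# Tiny hypothetical example: one ternary node "a" with 0-1 costs and true state 0.
marginals = {"a": np.array([0.7, 0.2, 0.1])}
costs = {"a": np.ones((3, 3)) - np.eye(3)}
print(error_given_truth(marginals, costs, {"a": 0}))  # 0.2 + 0.1 = 0.3
```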
29 The presence of the cost matrix Cu in Equation 4 constitutes a significant advantage in real applications, as they often need to specify inhomogeneous costs. [sent-51, score-0.239]
30 There is a separate consideration, that of query cost, or cost(x), which is the cost incurred by the action of observing x (e.g., …). [sent-52, score-0.289]
31 If both the query cost and the misclassification cost C are formulated in the same units, e.g., … [sent-55, score-0.463]
32 The query costs will be omitted from this presentation for clarity. [sent-58, score-0.15]
33 In general, one does not actually know the true values X ∗ of the nodes, so one cannot directly minimize the error function as described. [sent-59, score-0.079]
34 Instead, we minimize the expectation of the error, the risk: Risk(P(X)) = ∑_x P(x) Error(P(X)||x)  (5), which for the error function of Equation 4 reduces to Risk(P(X)) = ∑_{u∈X} ∑_j ∑_k P(u = j) P(u = k) C_u[j, k]  (6) = ∑_{u∈X} π_u^T C_u π_u  (7), where (π_u)_i = P(u = i). [sent-61, score-0.039]
35 It quantifies “On average, how much is our current ignorance going to cost us?” [sent-63, score-0.174]
36 For comparison, note that the log-loss function, Error_log, results in an entropy risk function Risk_log(P(X)) = H(X), and the log-loss function over the marginals, Error_mlog, results in the risk function Risk_mlog(P(X)) = ∑_{u∈X} H(u). [sent-64, score-0.45]
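The risk of Equation 7 is a quadratic form per node; a minimal sketch (variable names and the example marginal are mine):

```python
import numpy as np

def network_risk(marginals, cost_matrices):
    # Equation 7: Risk(P(X)) = sum over nodes u of pi_u^T C_u pi_u.
    return sum(marginals[u] @ cost_matrices[u] @ marginals[u] for u in marginals)

pi_a = np.array([0.7, 0.2, 0.1])
C_a = np.ones((3, 3)) - np.eye(3)             # uniform 0-1 confusion costs
print(network_risk({"a": pi_a}, {"a": C_a}))  # 1 - (0.49 + 0.04 + 0.01) = 0.46
```

With 0-1 costs this is one minus the sum of squared marginals; with a general C_u it weights each kind of confusion by its cost.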
37 Ultimately, we want to find the nodes that have the greatest effect on Risk(P (X )), so we must condition Risk(P (X )) on the beliefs at each node. [sent-65, score-0.261]
38 In other words, if we learned that the true marginal probabilities of node x were πx , what effect would that have on our current risk, or rather, what is Risk(P (X )|P (x) = πx )? [sent-66, score-0.359]
39 Discouragingly, however, any change in πx propagates to all the other beliefs in the network. [sent-67, score-0.098]
40 It seems as if we must perform several network evaluations for each node, a prohibitive cost for networks of any appreciable size. [sent-68, score-0.239]
41 However, we will show that in fact dynamic programming can perform this computation for all nodes in only two passes through the network. [sent-69, score-0.126]
42 3 Risk Calculation To clarify our objective, we wish to construct a function Ra (π) for each node a, where Ra (π) = Risk(P (X )|P (a) = π). [sent-70, score-0.246]
43 Suppose, for instance, that we learn that the value of node a is equal to 3. [sent-71, score-0.246]
44 If we had the function Ra in hand, we could simply evaluate Ra (πa ) to immediately compute our new network-wide risk, which would account for all the changes in beliefs to all the other nodes due to learning that a = 3. [sent-73, score-0.224]
45 Define R_a(π) = Risk(P(X)|P(a) = π)  (8) = ∑_{u∈X} π_u^T C_u π_u with the π_u computed under P(a) = π  (9). This simply restates the risk definition of Equation 7 under the condition that P(a) = π. [sent-75, score-0.225]
46 For any node x, the function R_x(π) is a second-degree polynomial function of the elements of π. Proof: [sent-79, score-0.277]
47 Define the matrix P_{u|v} for every pair of nodes (u, v), such that (P_{u|v})_{ij} = P(u = i|v = j). [sent-80, score-0.155]
48 Recall that the beliefs at node x have a strictly linear relationship to the beliefs of node u, since (π_u)_i = ∑_k P(u = i|x = k) P(x = k)  (10), which is equivalent to π_u = P_{u|x} π_x. [sent-81, score-0.688]
49 From Equation 12, we see a simple equation for computing these Θ_x directly (though expensively): Θ_x = ∑_{u∈X} P_{u|x}^T C_u P_{u|x}  (14). Example #1: Given the 2-node network a → b, how do we calculate R_a(π), the total risk associated with our beliefs about the value of node a? [sent-84, score-0.701]
50 Our objective is thus to determine R_a(π) = Risk(P(a, b)|P(a) = π)  (15) = π^T Θ_a π  (16). Equation 14 will give Θ_a as Θ_a = P_{a|a}^T C_a P_{a|a} + P_{b|a}^T C_b P_{b|a}  (17) = C_a + P_{b|a}^T C_b P_{b|a}  (18), with P_{a|a} = I by definition. [sent-85, score-0.064]
51 The individual coefficients of Θ_a are thus (Θ_a)_{ij} = (C_a)_{ij} + ∑_k ∑_l P(b = k|a = i) P(b = l|a = j) (C_b)_{kl}  (19). Now we can compute the relation between any marginal π at node a and our total network-wide risk via R_a(π). [sent-86, score-0.561]
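To make Example #1 concrete, here is a small sketch that builds Θ_a = C_a + P_{b|a}^T C_b P_{b|a} for a hypothetical binary network a → b; the CPT and cost values are invented for illustration:

```python
import numpy as np

# Convention assumed here: (P_b_given_a)[i, j] = P(b = i | a = j), so pi_b = P_b_given_a @ pi_a.
P_b_given_a = np.array([[0.9, 0.2],
                        [0.1, 0.8]])
C_a = np.array([[0.0, 1.0],
                [1.0, 0.0]])                 # 0-1 costs on a
C_b = np.array([[0.0, 5.0],
                [1.0, 0.0]])                 # asymmetric costs on b

# Equations 17-18: Theta_a = C_a + P_{b|a}^T C_b P_{b|a}  (P_{a|a} = I drops out).
Theta_a = C_a + P_b_given_a.T @ C_b @ P_b_given_a

# Equation 16: R_a(pi) = pi^T Theta_a pi for any candidate marginal on a.
pi = np.array([0.5, 0.5])
print(Theta_a)
print(pi @ Theta_a @ pi)
```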
52 However, using Equation 14 to compute all the Θ would require evaluating the entire network once per node. [sent-87, score-0.065]
53 1 Recursion To create an efficient message-passing algorithm for computing Θ_x for all x ∈ X, we will introduce Θ_x^W, where W is a subset of the network over which Risk(P(X)) is summed. [sent-90, score-0.065]
54 More importantly, these matrices can be usefully decomposed as follows. [sent-93, score-0.048]
55 Θ_x^W = P_{y|x}^T Θ_y^W P_{y|x} if x and W are conditionally independent given y. [sent-96, score-0.042]
56 Note that P_{u|x} = P_{u|y} P_{y|x} for u ∈ X, since (P_{u|y} P_{y|x})_{ij} = ∑_{k=1}^{|y|} P(u = i|y = k) P(y = k|x = j)  (21) = P(u = i|x = j)  (22) = (P_{u|x})_{ij}  (23). Step (21) is only true if x and u are conditionally independent given y. [sent-98, score-0.082]
57 Our objective is to compute R_a(π) = Risk(P(a, b, c)|P(a) = π)  (28) = π^T Θ_a π  (29), where Θ_a is by definition Θ_a = Θ_a^{abc}  (30) = C_a + P_{b|a}^T C_b P_{b|a} + P_{c|a}^T C_c P_{c|a}  (31). Using Theorem 3.1, … [sent-100, score-0.064]
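Assuming the three-node example is the chain a → b → c (my reading of the example, not stated explicitly in the excerpt), the decomposition can be checked numerically: folding C_c into node b first and then mapping once through P_{b|a} gives the same Θ_a as the direct Equation 14. A sketch with made-up CPTs:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(n):
    # Column-stochastic matrix: column j is P(child | parent = j).
    M = rng.random((n, n))
    return M / M.sum(axis=0, keepdims=True)

P_b_a = random_cpt(3)                  # (P_b_a)[i, j] = P(b = i | a = j)
P_c_b = random_cpt(3)
P_c_a = P_c_b @ P_b_a                  # Equations 21-23 for the chain a -> b -> c
C = {v: rng.random((3, 3)) for v in "abc"}

# Direct (Equation 14): map every node's cost matrix back to a and sum.
Theta_a_direct = C["a"] + P_b_a.T @ C["b"] @ P_b_a + P_c_a.T @ C["c"] @ P_c_a

# Recursive (Theorem 3.1): fold c's contribution into b, then map once through P_b_a.
Theta_b_down = C["b"] + P_c_b.T @ C["c"] @ P_c_b
Theta_a_recursive = C["a"] + P_b_a.T @ Theta_b_down @ P_b_a

print(np.allclose(Theta_a_direct, Theta_a_recursive))   # True
```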
58 4 Message Passing We are now ready to define message passing. [sent-102, score-0.108]
59 Out-messages µ are passed from parent to child, and in-messages λ are passed from child to parent. [sent-105, score-0.289]
60 The messages from x to y will be denoted as µxy and λxy . [sent-106, score-0.164]
61 In the discrete case, µxy and λxy will both be matrices of size |y| × |y|. [sent-107, score-0.048]
62 The messages summarize the effect that y has on the part of the network that y is d-separated from by x. [sent-108, score-0.266]
63 Messages relate to the Θ coefficients by the following definition: λ_yx = Θ_x^{\y}  (36) and µ_yx = Θ_x^{\y}  (37), where the (nonstandard) notation Θ_x^{\y} indicates the matrix Θ_x^V for which V is the set of all the nodes in X that are reachable by x if y were removed from the graph. [sent-109, score-0.155]
64 In other words, Θ_x^{\y} summarizes the effect that x has on the entire network except for the part of the network that x can only reach through y. [sent-110, score-0.167]
65 Propagation: The message-passing scheme is organized to recursively compute the Θ matrices using Theorem 3.1. [sent-111, score-0.048]
66 As can be seen from Equations 36 and 37, the two types of messages are very similar in meaning. [sent-113, score-0.164]
67 The µ-message from x to child c is created from all other messages entering x except those from c. [sent-116, score-0.286]
68 The definition is µ_xc = P_{x|c}^T (C_x + ∑_{u∈pa(x)} µ_ux + ∑_{v∈ch(x)\c} λ_vx) P_{x|c}  (38). The λ-messages from x to parent u are only slightly more involved. [sent-117, score-0.107]
69 To account for the “explaining away” effect, we must construct λxu directly from the parents of x. [sent-118, score-0.06]
70 λ_xu = P_{x|u}^T (C_x + ∑_{c∈ch(x)} λ_cx) P_{x|u} + ∑_{w∈pa(x)\u} P_{w|u}^T (C_w + ∑_{v∈pa(w)} µ_vw + ∑_{c∈ch(w)\x} λ_cw) P_{w|u}  (39). Messages are constructed (or “sent”) whenever all of the required incoming messages are present and that particular message has not already been sent. [sent-119, score-0.272]
71 For example, the out-message µ_xc can be sent only when messages from all the parents of x and all the children of x (save c) are present. [sent-120, score-0.254]
72 The overall effect of this constraint is a single leaves-inward propagation followed by a single root-outward propagation. [sent-121, score-0.086]
73 Initialization and Termination: Initialization occurs naturally at any singly-connected (leaf) node x, where the message is by definition Cx . [sent-122, score-0.354]
74 Termination occurs when no more messages meet the criteria for sending. [sent-123, score-0.164]
75 Propagation runs in time linear in the number of nodes once the initial local probabilities are calculated. [sent-125, score-0.126]
76 The local probabilities required are the matrices P for each parent-child probability, e.g. [sent-126, score-0.048]
77 P(child = j|parent = i), and for each pair (not set) of parents that share a child, P(parent = j|coparent = i). [sent-128, score-0.06]
78 These are all immediately available from a junction tree, or they can be obtained with a run of belief propagation. [sent-129, score-0.091]
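For intuition about the two passes, here is a minimal sketch restricted to a simple chain x_0 → x_1 → … → x_{n-1}, where Equations 38-39 lose their explaining-away terms and each message is just a mapped, accumulated cost matrix. The per-edge conditionals in both directions are computed directly from the chain's CPTs here, standing in for what a junction tree or belief propagation run would provide; all names are mine and this is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, S = 5, 3                                    # hypothetical chain of 5 ternary nodes

def random_cpt(k):
    M = rng.random((k, k))
    return M / M.sum(axis=0, keepdims=True)    # column j is P(child | parent = j)

prior = rng.random(S)
prior /= prior.sum()
fwd = [random_cpt(S) for _ in range(n - 1)]    # fwd[k][i, j] = P(x_{k+1} = i | x_k = j)
C = [rng.random((S, S)) for _ in range(n)]     # per-node cost matrices

# Current marginals and reversed-edge conditionals P(x_k | x_{k+1}) via Bayes' rule.
pi = [prior]
for k in range(n - 1):
    pi.append(fwd[k] @ pi[k])
bwd = [fwd[k].T * pi[k][:, None] / pi[k + 1][None, :] for k in range(n - 1)]

# Leaves-inward pass: lam[k] summarizes everything downstream of node k.
lam = [np.zeros((S, S)) for _ in range(n)]
for k in range(n - 2, -1, -1):
    lam[k] = fwd[k].T @ (C[k + 1] + lam[k + 1]) @ fwd[k]

# Root-outward pass: mu[k] summarizes everything upstream of node k.
mu = [np.zeros((S, S)) for _ in range(n)]
for k in range(1, n):
    mu[k] = bwd[k - 1].T @ (C[k - 1] + mu[k - 1]) @ bwd[k - 1]

Theta = [C[k] + lam[k] + mu[k] for k in range(n)]   # one quadratic form per node, two passes total

# Sanity check against the direct (per-node) Equation 14 for the root node 0.
Theta0_direct, P_u_given_0 = C[0].copy(), np.eye(S)
for k in range(n - 1):
    P_u_given_0 = fwd[k] @ P_u_given_0
    Theta0_direct += P_u_given_0.T @ C[k + 1] @ P_u_given_0
print(np.allclose(Theta[0], Theta0_direct))          # True
```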
79 It is worth noting that the apparent complexity of the λ, µ message propagation equations is due to the Bayes Net representation. [sent-130, score-0.157]
80 5 Experiments The performance of the message-passing algorithm (hereafter CostProp) was compared with a standard information gain algorithm which uses mutual information (hereafter MI). [sent-132, score-0.08]
81 The error function used by MI is from Equation 2, where Error_mlog(P(X)||X∗) = −∑_{x∈X} log P(x∗), with a corresponding risk function Risk(P(X)) = ∑_{x∈X} H(x). [sent-133, score-0.264]
82 This corresponds to selecting the node x that has the highest summed mutual information with each of the target nodes (in this case, the set of target nodes is X and the set of query nodes is also X). [sent-134, score-0.846]
83 The computational cost of MI grows quadratically as the product of the number of queries and of targets. [sent-135, score-0.276]
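For contrast, a brute-force sketch of the MI criterion on a toy network with an explicitly enumerated joint; the doubly nested loop over (query, target) pairs is where the quadratic cost comes from. The joint here is random and tiny, and self-information (a node as its own target) is skipped, which is a simplification of my own:

```python
import numpy as np

rng = np.random.default_rng(2)
n, S = 3, 2
joint = rng.random((S,) * n)
joint /= joint.sum()                     # explicit joint over 3 binary nodes (toy only)

def pairwise(q, t):
    m = joint.sum(axis=tuple(a for a in range(n) if a not in (q, t)))
    return m if q < t else m.T           # rows index q, columns index t

def mutual_information(q, t):
    P = pairwise(q, t)
    Pq, Pt = P.sum(axis=1), P.sum(axis=0)
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / np.outer(Pq, Pt)[mask])).sum())

# Quadratic in queries x targets: score every candidate query against every target.
scores = {q: sum(mutual_information(q, t) for t in range(n) if t != q) for q in range(n)}
print(scores, "-> best query:", max(scores, key=scores.get))
```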
84 In order to test the speed and relative accuracy of CostProp, we generated random polytrees with varying numbers of trinary nodes. [sent-136, score-0.199]
85 Speed: We generated polytrees of sizes ranging from 2 to 1000 nodes and ran the MI algorithm, the CostProp algorithm, and a random-query algorithm on each. [sent-139, score-0.287]
86 The two nonrandom algorithms were run using a junction tree, the build time of which was not included in the reported run times of either algorithm. [sent-140, score-0.06]
87 Accuracy: Due to the slow running time of MI, the accuracy comparison was performed on polytrees of size 20. [sent-143, score-0.199]
88 For each run, a true assignment X∗ was generated from the tree but was initially hidden from the algorithms. [sent-144, score-0.04]
89 Each algorithm would then determine for itself the best node to observe, receive the true value of that node, then select the next node to observe, et cetera. [sent-145, score-0.532]
90 The true error at each step was computed as the 0-1 error of Equation 3. [sent-146, score-0.118]
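A sketch of this evaluation loop on a toy chain, with the joint enumerated explicitly so that conditioning is exact. The selection rule here greedily picks the unobserved node whose observation minimizes the expected posterior risk of Equation 7, which I believe is the spirit of the cost-based criterion, but it is computed naively rather than with the message-passing algorithm; everything here is illustrative:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n, S = 4, 3
prior = rng.random(S)
prior /= prior.sum()
cpts = [rng.random((S, S)) for _ in range(n - 1)]
for M in cpts:
    M /= M.sum(axis=0, keepdims=True)               # toy chain x0 -> x1 -> x2 -> x3

joint = np.zeros((S,) * n)
for states in product(range(S), repeat=n):
    p = prior[states[0]]
    for k in range(n - 1):
        p *= cpts[k][states[k + 1], states[k]]
    joint[states] = p

costs = [np.ones((S, S)) - np.eye(S) for _ in range(n)]   # uniform 0-1 costs

def marginal(P, u):
    return P.sum(axis=tuple(a for a in range(n) if a != u))

def risk(P):
    return sum(marginal(P, u) @ costs[u] @ marginal(P, u) for u in range(n))

def condition(P, u, value):
    keep = np.zeros_like(P)
    idx = [slice(None)] * n
    idx[u] = value
    keep[tuple(idx)] = P[tuple(idx)]
    return keep / keep.sum()

def expected_posterior_risk(P, q):
    pq = marginal(P, q)
    return sum(pq[v] * risk(condition(P, q, v)) for v in range(S) if pq[v] > 0)

# Sample a hidden true assignment X*, then query greedily and track the error.
x_star = np.unravel_index(rng.choice(joint.size, p=joint.ravel()), joint.shape)
belief, observed = joint.copy(), set()
for step in range(n):
    q = min((u for u in range(n) if u not in observed),
            key=lambda u: expected_posterior_risk(belief, u))
    belief = condition(belief, q, x_star[q])         # receive the true value of the queried node
    observed.add(q)
    err = sum(1.0 - marginal(belief, u)[x_star[u]] for u in range(n))  # 0-1 style error (cf. Equation 3)
    print(step + 1, q, round(err, 3))
```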
91 The reduction in error plotted against number of queries is shown in Figure 2. [sent-147, score-0.141]
92 [Figure 2: Performance on synthetic polytrees with symmetric costs; True Cost vs. Number of Queries for Random, MI, and Cost Prop.] [sent-148, score-0.193]
93 [Figure 3: Performance on synthetic polytrees with asymmetric costs; True Cost vs. Number of Queries for Random, MI, and Cost Prop.] [sent-149, score-0.193]
94 With uniform cost matrices, the performance of MI and CostProp is approximately equal on this task, but both are better than random. [sent-150, score-0.174]
95 We next made the cost matrices asymmetric by initializing them such that confusing one pair of states was 100 times more costly than confusing the other two pairs. [sent-151, score-0.337]
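The asymmetric cost matrices described here could be set up along these lines (which pair gets the 100x penalty and the base cost of 1 are my guesses; the text only says one pair of states is 100 times more costly to confuse than the other two):

```python
import numpy as np

# Start from uniform 0-1 confusion costs on a trinary node, then make
# confusing states 0 and 1 (in either direction) 100 times more costly.
C = np.ones((3, 3)) - np.eye(3)
C[0, 1] = C[1, 0] = 100.0
print(C)
```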
96 The results of Figure 3 show that CostProp reduces error faster than MI, presumably because it can accommodate the cost matrix information. [sent-152, score-0.242]
97 6 Discussion We have described an all-pairs information gain calculation that scales linearly with network size. [sent-153, score-0.137]
98 The objective function used has a polynomial form that allows for an efficient message-passing algorithm. [sent-154, score-0.095]
99 Empirical results demonstrate large speedups and even improved accuracy in cost-sensitive domains. [sent-155, score-0.119]
100 Future work will explore other applications of this method, including sensitivity analysis and active learning. [sent-156, score-0.103]
wordName wordTfidf (topN-words)
[('pt', 0.41), ('pu', 0.32), ('node', 0.246), ('risk', 0.225), ('ra', 0.206), ('cu', 0.192), ('cost', 0.174), ('messages', 0.164), ('costprop', 0.161), ('polytrees', 0.161), ('cx', 0.153), ('mi', 0.153), ('pb', 0.14), ('py', 0.139), ('nodes', 0.126), ('child', 0.122), ('query', 0.115), ('message', 0.108), ('parent', 0.107), ('queries', 0.102), ('beliefs', 0.098), ('cb', 0.093), ('ch', 0.093), ('gaag', 0.081), ('kjrulff', 0.081), ('speedups', 0.081), ('pa', 0.07), ('prop', 0.07), ('xy', 0.068), ('diagnosis', 0.068), ('equation', 0.067), ('pc', 0.066), ('sensitivity', 0.065), ('network', 0.065), ('objective', 0.064), ('junction', 0.06), ('parents', 0.06), ('rx', 0.056), ('agostak', 0.054), ('brigham', 0.054), ('errorlog', 0.054), ('errormlog', 0.054), ('krishnamurthy', 0.054), ('laskey', 0.054), ('networkwide', 0.054), ('yx', 0.054), ('anderson', 0.053), ('propagation', 0.049), ('matrices', 0.048), ('mutual', 0.047), ('cpt', 0.047), ('misclassi', 0.044), ('confusing', 0.043), ('xc', 0.043), ('misclassifying', 0.043), ('abc', 0.043), ('kohavi', 0.043), ('conditionally', 0.042), ('moore', 0.041), ('net', 0.041), ('true', 0.04), ('cc', 0.04), ('wolpert', 0.04), ('toolbox', 0.04), ('substituting', 0.04), ('der', 0.04), ('error', 0.039), ('calculation', 0.039), ('accuracy', 0.038), ('active', 0.038), ('ux', 0.037), ('effect', 0.037), ('loss', 0.037), ('marginal', 0.036), ('murphy', 0.036), ('inhomogeneous', 0.036), ('costs', 0.035), ('bc', 0.034), ('gain', 0.033), ('ij', 0.033), ('xu', 0.033), ('termination', 0.033), ('synthetic', 0.032), ('px', 0.032), ('hereafter', 0.032), ('polynomial', 0.031), ('van', 0.031), ('belief', 0.031), ('independence', 0.031), ('passed', 0.03), ('marginals', 0.03), ('sent', 0.03), ('management', 0.03), ('nition', 0.03), ('target', 0.03), ('states', 0.029), ('graphical', 0.029), ('matrix', 0.029), ('graph', 0.029), ('explaining', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 70 nips-2005-Fast Information Value for Graphical Models
Author: Brigham S. Anderson, Andrew W. Moore
Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.
2 0.16118552 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
Author: O. P. Kreidl, Alan S. Willsky
Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1
3 0.15754187 125 nips-2005-Message passing for task redistribution on sparse graphs
Author: K. Wong, Zhuo Gao, David Tax
Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.
4 0.13796599 134 nips-2005-Neural mechanisms of contrast dependent receptive field size in V1
Author: Jim Wielaard, Paul Sajda
Abstract: Based on a large scale spiking neuron model of the input layers 4Cα and β of macaque, we identify neural mechanisms for the observed contrast dependent receptive field size of V1 cells. We observe a rich variety of mechanisms for the phenomenon and analyze them based on the relative gain of excitatory and inhibitory synaptic inputs. We observe an average growth in the spatial extent of excitation and inhibition for low contrast, as predicted from phenomenological models. However, contrary to phenomenological models, our simulation results suggest this is neither sufficient nor necessary to explain the phenomenon.
5 0.13665445 46 nips-2005-Consensus Propagation
Author: Benjamin V. Roy, Ciamac C. Moallemi
Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1
6 0.12620004 147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis
7 0.10845879 95 nips-2005-Improved risk tail bounds for on-line algorithms
8 0.10142334 2 nips-2005-A Bayes Rule for Density Matrices
9 0.091538973 69 nips-2005-Fast Gaussian Process Regression using KD-Trees
10 0.090055831 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
11 0.089190237 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
12 0.084142514 121 nips-2005-Location-based activity recognition
13 0.071554691 187 nips-2005-Temporal Abstraction in Temporal-difference Networks
14 0.070533633 33 nips-2005-Bayesian Sets
15 0.066955127 58 nips-2005-Divergences, surrogate loss functions and experimental design
16 0.066587456 47 nips-2005-Consistency of one-class SVM and related algorithms
17 0.064144671 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
18 0.061407208 114 nips-2005-Learning Rankings via Convex Hull Separation
19 0.060094208 41 nips-2005-Coarse sample complexity bounds for active learning
20 0.058321938 140 nips-2005-Nonparametric inference of prior probabilities from Bayes-optimal behavior
topicId topicWeight
[(0, 0.19), (1, 0.037), (2, 0.032), (3, -0.085), (4, 0.013), (5, 0.032), (6, -0.283), (7, 0.144), (8, -0.01), (9, 0.236), (10, 0.127), (11, -0.002), (12, -0.041), (13, -0.039), (14, 0.054), (15, 0.105), (16, -0.022), (17, -0.143), (18, -0.12), (19, 0.159), (20, -0.056), (21, -0.036), (22, 0.082), (23, 0.049), (24, 0.001), (25, 0.048), (26, 0.003), (27, 0.143), (28, -0.096), (29, -0.091), (30, -0.032), (31, 0.033), (32, -0.081), (33, -0.088), (34, -0.161), (35, -0.138), (36, -0.119), (37, -0.101), (38, 0.111), (39, -0.036), (40, -0.033), (41, 0.015), (42, 0.054), (43, -0.003), (44, 0.045), (45, -0.068), (46, -0.03), (47, 0.0), (48, -0.027), (49, -0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.96073663 70 nips-2005-Fast Information Value for Graphical Models
Author: Brigham S. Anderson, Andrew W. Moore
Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.
2 0.64559132 125 nips-2005-Message passing for task redistribution on sparse graphs
Author: K. Wong, Zhuo Gao, David Tax
Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.
3 0.60076612 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
Author: O. P. Kreidl, Alan S. Willsky
Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1
4 0.52491403 46 nips-2005-Consensus Propagation
Author: Benjamin V. Roy, Ciamac C. Moallemi
Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1
5 0.49197698 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson
Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. 1
6 0.45786035 134 nips-2005-Neural mechanisms of contrast dependent receptive field size in V1
7 0.45067111 147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis
8 0.4316321 121 nips-2005-Location-based activity recognition
9 0.36155456 33 nips-2005-Bayesian Sets
10 0.34958822 59 nips-2005-Dual-Tree Fast Gauss Transforms
11 0.34890774 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition
12 0.33139297 95 nips-2005-Improved risk tail bounds for on-line algorithms
13 0.31808659 69 nips-2005-Fast Gaussian Process Regression using KD-Trees
14 0.30305207 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
15 0.29003337 187 nips-2005-Temporal Abstraction in Temporal-difference Networks
16 0.28423032 2 nips-2005-A Bayes Rule for Density Matrices
17 0.2809988 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
18 0.27684081 160 nips-2005-Query by Committee Made Real
19 0.26635179 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
20 0.26002967 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours
topicId topicWeight
[(3, 0.038), (10, 0.032), (27, 0.023), (31, 0.041), (34, 0.075), (41, 0.015), (55, 0.539), (69, 0.044), (73, 0.029), (88, 0.054), (91, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.93064994 70 nips-2005-Fast Information Value for Graphical Models
Author: Brigham S. Anderson, Andrew W. Moore
Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.
2 0.8938337 192 nips-2005-The Information-Form Data Association Filter
Author: Brad Schumitsch, Sebastian Thrun, Gary Bradski, Kunle Olukotun
Abstract: This paper presents a new filter for online data association problems in high-dimensional spaces. The key innovation is a representation of the data association posterior in information form, in which the “proximity” of objects and tracks are expressed by numerical links. Updating these links requires linear time, compared to exponential time required for computing the exact posterior probabilities. The paper derives the algorithm formally and provides comparative results using data obtained by a real-world camera array and by a large-scale sensor network simulation.
3 0.88945138 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
Author: Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov, Nicolas Vayatis
Abstract: We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under the ℓ1 constraint. For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weights. Mirror descent algorithms have been developed in different contexts and they are known to be particularly efficient in high dimensional problems. Moreover their implementation is adapted to the online setting. The main result of the paper is the upper bound on the convergence rate for the generalization error. 1
4 0.50526094 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
Author: O. P. Kreidl, Alan S. Willsky
Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1
5 0.49970159 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
Author: Jin Yu, Douglas Aberdeen, Nicol N. Schraudolph
Abstract: Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods. 1
6 0.47863388 20 nips-2005-Affine Structure From Sound
7 0.45062387 46 nips-2005-Consensus Propagation
8 0.44876578 125 nips-2005-Message passing for task redistribution on sparse graphs
9 0.4312425 173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception
10 0.42829257 86 nips-2005-Generalized Nonnegative Matrix Approximations with Bregman Divergences
11 0.42286658 58 nips-2005-Divergences, surrogate loss functions and experimental design
12 0.41040927 50 nips-2005-Convex Neural Networks
13 0.39998066 69 nips-2005-Fast Gaussian Process Regression using KD-Trees
14 0.3994112 197 nips-2005-Unbiased Estimator of Shape Parameter for Spiking Irregularities under Changing Environments
15 0.39929706 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
16 0.3992267 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
17 0.39905179 121 nips-2005-Location-based activity recognition
18 0.39667785 47 nips-2005-Consistency of one-class SVM and related algorithms
19 0.39481696 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
20 0.39089948 178 nips-2005-Soft Clustering on Graphs