nips nips2005 nips2005-96 knowledge-graph by maker-knowledge-mining

96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach


Source: pdf

Author: O. P. Kreidl, Alan S. Willsky

Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. [sent-5, score-0.565]

2 The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. [sent-6, score-0.122]

3 We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. [sent-7, score-0.096]

4 1 Introduction Given a probabilistic model with discrete-valued hidden variables, Belief Propagation (BP) and related graph-based algorithms are commonly employed to solve for the Maximum A-Posteriori (MAP) assignment (i.e., the mode of the joint distribution of all hidden variables) and the Maximum-Posterior-Marginal (MPM) assignment (i.e., the modes of the marginal distributions of every hidden variable) [1]. [sent-8, score-0.164] [sent-10, score-0.164] [sent-12, score-0.097]

7 Specifically, each computation event corresponds to a node evaluating its local processing rule, or a function by which all messages received in the preceding communication event map to messages sent in the next communication event. [sent-14, score-1.057]

8 Practically, the viability of BP appears to rest upon an implicit assumption that network communication resources are abundant. [sent-15, score-0.326]

9 In a general network, because termination of the algorithm is in question, the required communication resources are a-priori unbounded. [sent-16, score-0.21]

10 Even when termination can be guaranteed, transmission of exact messages presumes communication channels with infinite capacity (in bits per observation), or at least of sufficiently high bandwidth such that the resulting finite message precision is essentially error-free. [sent-17, score-0.507]

11 In resource-limited settings (e.g., energy-limited wireless sensor networks), it may be prohibitively costly to justify such idealized online communications. [sent-20, score-0.054]

12 While recent evidence suggests substantial but “small-enough” message errors will not alter the behavior of BP [2], [3], it also suggests BP may perform poorly when communication is very constrained. [sent-21, score-0.226]

13 Assuming communication constraints are severe, we examine the extent to which alternative processing rules can avoid a loss in (MAP or MPM) performance. [sent-22, score-0.325]

14 Specifically, given a directed graphical model with binary-valued hidden variables and real-valued noisy observations, we assume each node may broadcast only to its children a single binary-valued message. [sent-23, score-0.573]

15 We cast the problem within a variational formulation [4], seeking to minimize a decision-theoretic penalty function subject to such online communication constraints. [sent-24, score-0.423]

16 To our knowledge, the fact that this relaxation permits analytical progress for any directed acyclic network is new. [sent-28, score-0.354]

17 Moreover, for MPM assignment in a tree-structured network, we discover an added convenience with respect to the envisioned distributed processor setting: the offline computation itself admits an efficient message-passing interpretation. [sent-29, score-0.107]

18 Section 2 details the decision-theoretic variational formulation for discrete-variable assignment. [sent-31, score-0.113]

19 Section 3 summarizes the main results derived from its connection to decentralized detection, culminating in the offline message-passing algorithm and the assumptions that guarantee convergence and maximal efficiency. [sent-32, score-0.156]

20 The ensuing optimization problem is expressed by J(γ*) = min_{γ∈Γ} J(γ) subject to γ ∈ ΓG, (1) where γ* then represents an optimal network-constrained strategy for discrete-variable assignment. [sent-36, score-0.132]

21 1 Decision-Theoretic Penalty Function Let U = γ(Y ) denote the decision process induced from the observation process Y by any candidate assignment strategy γ ∈ Γ. [sent-39, score-0.35]

22 If we associate a numeric “cost” c(u, x) to every possible joint realization of (U, X), then the expected cost is a well-posed penalty function: J(γ) = E [c (γ(Y ), X)] = E [E [c(γ(Y ), X) | Y ]] . [sent-40, score-0.191]

23 (2) Expanding the inner expectation and recognizing p(x|y) to be proportional to p(x)p(y|x) for every y such that p(y) > 0, it follows that γ̄* minimizes (2) over Γ if and only if γ̄*(Y) = arg min_{u∈{0,1}^N} Σ_{x∈{0,1}^N} p(x) c(u, x) p(Y|x) with probability one. [sent-41, score-0.113]
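The unconstrained minimizer in (3) can be made concrete by brute-force enumeration for small N; the prior, cost, and likelihood below are illustrative assumptions, not taken from the paper.

```python
import itertools
import math

def bayes_decision(y, p_x, cost, lik):
    """Unconstrained optimum of (3): the u in {0,1}^N minimizing
    sum_x p(x) c(u, x) p(y | x), found here by brute-force enumeration."""
    N = len(y)
    configs = list(itertools.product([0, 1], repeat=N))
    def risk(u):
        return sum(p_x(x) * cost(u, x) * lik(y, x) for x in configs)
    return min(configs, key=risk)

# Toy N = 2 problem; every modeling choice below is an illustrative assumption.
p_x = lambda x: 0.25                      # uniform prior on {0,1}^2
cost = lambda u, x: int(u != x)           # word-error cost (Example 1) -> MAP
lik = lambda y, x: math.exp(-sum((yi - xi) ** 2 / 2.0 for yi, xi in zip(y, x)))

print(bayes_decision((0.9, 0.8), p_x, cost, lik))  # strongly positive y favors (1, 1)
```

Note the cost of the enumeration: both sums range over all 2^N configurations, which is exactly why the paper seeks a finite parameterization instead.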

24 (3) Of note are (i) the likelihood function p(Y|x) is a finite-dimensional sufficient statistic of Y, (ii) the real-valued coefficients b̄(u, x) provide a finite parameterization of the function space Γ and (iii) the optimal coefficient values b̄*(u, x) = p(x)c(u, x) are computable offline. [sent-42, score-0.092]

25 Before introducing communication constraints, we illustrate by examples how the decision-theoretic penalty function relates to familiar discrete-variable assignment problems. [sent-43, score-0.363]

26 Then (2) and (3) specialize to, respectively, the word error rate (viewing each x as an N-bit word) and the MAP strategy: γ̄*(Y) = arg max_{x∈{0,1}^N} p(x|Y) with probability one. [sent-45, score-0.082]

27 Example 2: Let c(u, x) = Σ_{n=1}^{N} cn(un, xn), where each cn indicates whether un = xn. [sent-46, score-1.114]

28 Then (2) and (3) specialize to, respectively, the bit error rate and the MPM strategy: γ̄*(Y) = (arg max_{x1∈{0,1}} p(x1|Y), …, arg max_{xN∈{0,1}} p(xN|Y)) with probability one. [sent-47, score-0.079] [sent-51, score-0.037]
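The contrast between the two specializations can be checked numerically; the posterior table below is an assumed toy example, chosen so that the joint mode and the marginal modes disagree.

```python
# Assumed joint posterior p(x | Y = y) over x in {0,1}^2, chosen so that the
# joint mode and the marginal modes disagree (numbers are illustrative).
post = {(0, 0): 0.35, (0, 1): 0.30, (1, 0): 0.05, (1, 1): 0.30}

# MAP (word-error cost): mode of the joint posterior.
u_map = max(post, key=post.get)

# MPM (bit-error cost): componentwise modes of the marginal posteriors.
u_mpm = tuple(
    int(sum(p for x, p in post.items() if x[n] == 1) > 0.5)
    for n in range(2)
)

print(u_map, u_mpm)  # the two criteria pick different assignments here
```

Here the marginal of x2 puts 0.6 on the value 1 even though the joint mode is (0, 0), so the MAP and MPM assignments differ.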

30 Network Communication Constraints Let G(V, E) be any directed acyclic graph with vertex set V = {1, …, N} and edge set E = {(i, j) ∈ V × V | i ∈ π(j) ⇔ j ∈ χ(i)}, where the index sets π(n) ⊂ V and χ(n) ⊂ V indicate, respectively, the parents and children of each node n ∈ V. [sent-52, score-0.185] [sent-55, score-0.359]

32 Without loss of generality, we assume the node labels respect the natural partial order implied by the graph G; specifically, we assume every node n has parent nodes π(n) ⊂ {1, …, n − 1}. [sent-56, score-0.625]

33 Local to each node n ∈ V are the respective components Xn and Yn of the joint process (X, Y ). [sent-63, score-0.224]

34 Standard message-passing algorithms (e.g., max-product in Example 1, sum-product in Example 2) require at least 2|E| real-valued messages per observation Y = y, one per direction along each edge in G. [sent-66, score-0.241]

35 In contrast, we insist upon a single forward-pass through G where each node n broadcasts to its children (if any) a single binary-valued message. [sent-67, score-0.413]

36 This yields communication overhead of only |E| bits per observation Y = y, but also renders the minimizing strategy of (3) infeasible. [sent-68, score-0.376]

37 Specifically, we now translate the stipulated restrictions on communication into explicit constraints on the function space Γ over which to minimize (2). [sent-70, score-0.25]

38 The simplest such translation assumes the binary-valued message produced by node n also determines the respective component un in decision vector u = γ(y). [sent-71, score-0.946]

39 Recognizing that every node n receives the messages uπ(n) from its parents (if any) as side information to yn , any function of the form γn : R × {0, 1}|π(n)| → {0, 1} is a feasible processing rule; we denote the set of all such rules by Γn . [sent-72, score-0.849]

40 Then, every strategy in the set ΓG = Γ1 × · · · × ΓN respects the constraints. [sent-73, score-0.172]

41 3 Summary of Main Results As stated in Section 1, the variational formulation presented in Section 2 can be viewed as an extension of the optimization problem underlying decentralized Bayesian detection [5], [6]. [sent-74, score-0.242]

42 Even in simple networks (e.g., the N-node chain), it is known that exact solution to (1) is NP-hard, stemming from the absence of a guarantee that γ* ∈ ΓG possesses a finite parameterization. [sent-77, score-0.036]

43 Also known is that analytical progress can be made for a relaxation of (1), which is based on the following intuition: if strategy γ* = (γ1*, …, γN*) is optimal over ΓG, then for each n, and assuming all components i ∈ V\n are fixed at the rules γi*, the component rule γn* must be optimal over Γn. [sent-78, score-0.256] [sent-81, score-0.23]

45 Decentralized detection has roots in team decision theory [7], a subset of game theory, in which the relaxation is named person-by-person (pbp) optimality. [sent-82, score-0.169]

46 Nonetheless, pbp-optimality (along with a specialized observation process) justifies a particular finite parameterization for the function space ΓG , leading to a nonlinear fixed-point equation and an iterative algorithm with favorable convergence properties. [sent-84, score-0.174]

47 Example 3: Consider the MPM assignment problem in Example 2, assuming N = 2 and distribution p(x, y) defined by positive-valued parameters α, β1 and β2 as follows: p(x) ∝ 1 if x1 = x2, p(x) ∝ α if x1 ≠ x2, and p(y|x) = ∏_{n=1}^{N} (1/√(2π)) exp(−(yn − βn xn)²/2). [sent-86, score-0.272]

48 Note that X1 and X2 are marginally uniform and α captures their correlation (positive, zero, or negative when α is less than, equal to, or greater than unity, respectively), while Y captures the presence of additive white Gaussian noise with signal-to-noise ratio at node n equal to βn. [sent-87, score-0.224]
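The Example 3 model is easy to simulate; the parameter values below are assumed for illustration, and the empirical agreement rate confirms the stated role of α.

```python
import random

# Sampler for the Example 3 model: p(x) is proportional to 1 when x1 = x2 and
# to alpha otherwise, and y_n = beta_n * x_n plus unit-variance Gaussian noise.
# Parameter values below are assumed for illustration.
def sample(alpha, betas, rng):
    z = 2.0 + 2.0 * alpha            # two configs of weight 1, two of weight alpha
    r = rng.random() * z
    if r < 2.0:
        x = (0, 0) if r < 1.0 else (1, 1)
    else:
        x = (0, 1) if r < 2.0 + alpha else (1, 0)
    y = tuple(b * xi + rng.gauss(0.0, 1.0) for b, xi in zip(betas, x))
    return x, y

rng = random.Random(0)
draws = [sample(0.25, (1.0, 2.0), rng) for _ in range(4000)]
agree = sum(x[0] == x[1] for x, _ in draws)
print(agree / 4000)  # with alpha = 0.25, P(x1 = x2) = 2 / 2.5 = 0.8, so about 0.8
```

With α < 1 the equal configurations carry more mass, matching the positive-correlation reading in the sentence above.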

49 (i.e., as if the myopic decision by node 1 is always correct). [sent-91, score-0.344]

50 Figure 1 compares these strategies with a pbp-optimal strategy γ^k; only γ^k is both feasible and consistently "hedging" against all uncertainty. [sent-92, score-0.207]

51 Example 4: Extend Example 3 to N > 2 nodes, but assuming X is equally likely to be all zeros or all ones (i.e., …). [sent-106, score-0.074]

52 The MPM strategy employs thresholds ηn = ∏_{i∈V\n} 1/Li(yi) for all n, leading to U = γ̄*(Y) also being all zeros or all ones; thus, its cost distribution, or the probability mass function for c(γ̄*(Y), X), has mass only on the values 0 and N. [sent-109, score-0.39]

53 The myopic strategy employs thresholds ηn = 1 for all n, leading to N independent and identically-distributed (binary-valued) random variables cn(γn^0(Yn), Xn); thus, its cost distribution, approaching a normal shape as N gets large, has mass on all values 0, 1, …, N. [sent-110, score-0.529]
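The contrast in cost distributions described in these two sentences can be reproduced by Monte Carlo; βn = 1 and the sample sizes below are assumed simplifications of Example 4.

```python
import random

# Monte Carlo contrast of the two cost distributions: X is all zeros or all
# ones with equal probability and Y_n = X_n + unit Gaussian noise (beta_n = 1
# is an assumed simplification of Example 4).
def trial(N, rng):
    x = rng.randrange(2)                               # shared hidden bit
    y = [x + rng.gauss(0.0, 1.0) for _ in range(N)]
    myopic = [int(y_n > 0.5) for y_n in y]             # eta_n = 1 in likelihood-ratio form
    u_mpm = int(sum(y) > 0.5 * N)                      # MPM pools all observations
    return sum(u != x for u in myopic), N * (u_mpm != x)

rng = random.Random(1)
N = 12
costs = [trial(N, rng) for _ in range(2000)]
print(sorted({c for _, c in costs}))  # MPM cost concentrates on the extremes 0 and N
```

The myopic per-node threshold y_n > 0.5 is the likelihood-ratio test with ηn = 1 for unit-variance Gaussians with means 0 and 1, while the MPM rule reduces to comparing the pooled sum against N/2 under full correlation.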


55 Illustration of the iterative offline computation given p(x, y) as described in Example 4 and the directed network shown (N = 12). [sent-128, score-0.188]

56 Thus, with a total of just |E| = 11 bits of communication, the pbp-optimal strategy γ^3 recovers roughly 28% of the loss J(γ^0) − J(γ̄*). [sent-131, score-0.201]

57 Lemma 1 The minimum penalty J(γ*) defined in (1) is, firstly, achievable by a deterministic strategy and, secondly, equivalently defined by J(γ*) = min_{p(u|y)} Σ_{x∈{0,1}^N} p(x) Σ_{u∈{0,1}^N} c(u, x) ∫_{y∈R^N} p(u|y) p(y|x) dy subject to p(u|y) = ∏_{n∈V} p(un | yn, uπ(n)). [sent-134, score-0.221]

58 Lemma 1 is primarily of conceptual value, establishing a correspondence between fixing a component rule γn ∈ Γn and inducing a decision process Un from the information (Yn , Uπ(n) ) local to node n. [sent-135, score-0.543]

59 The following assumption permits analytical progress towards a finite parameterization for each function space Γn and the basis of an offline algorithm. [sent-136, score-0.199]

60 Assumption 1 The observation process Y satisfies p(y|x) = ∏_{n∈V} p(yn | x). [sent-137, score-0.04]

61 Upon fixing a deterministic rule γn ∈ Γn local to node n (in correspondence with p(un | yn, uπ(n)) by virtue of Lemma 1), we have the identity p(un | x, uπ(n)) = ∫_{yn∈R} p(un | yn, uπ(n)) p(yn | x) dyn. [sent-139, score-0.702]

62 (4) Moreover, upon fixing a deterministic strategy γ ∈ ΓG, we have the identity p(u|x) = ∏_{n∈V} p(un | x, uπ(n)). [sent-140, score-0.253]
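For a threshold rule and a unit-variance Gaussian likelihood, identity (4) reduces in closed form to a Gaussian tail probability; the threshold and means below are assumed numbers, not the paper's.

```python
import math

# Identity (4) for a threshold rule: if gamma_n decides 1 exactly when
# y_n > eta(u_pi(n)) and p(y_n | x) is Gaussian with mean mu(x) and unit
# variance, the induced p(u_n = 1 | x, u_pi(n)) is a Gaussian tail mass.
# The threshold and means below are assumed numbers, not the paper's.
def gaussian_tail(t):
    """P(Z > t) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def p_un_given_x(eta, mu):
    """p(u_n = 1 | x, u_pi(n)) induced by the rule y_n > eta, y_n ~ N(mu, 1)."""
    return gaussian_tail(eta - mu)

print(round(p_un_given_x(0.5, 0.0), 4))  # x_n = 0: mean 0
print(round(p_un_given_x(0.5, 1.0), 4))  # x_n = 1: mean beta_n * x_n with beta_n = 1
```

This is the local transformation that, per the surrounding sentences, each node can carry out on its own from p(yn|x) and its current rule.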

63 (5) Lemma 2 implies fixing component rule γn ∈ Γn is in correspondence with inducing the conditional distribution p(un |x, uπ(n) ), now a probabilistic description that persists local to node n no matter the rule γi at any other node i ∈ V\n. [sent-141, score-0.751]

64 That deterministic strategies suffice, however, justifies “post-hoc” our initial abuse of notation for elements in the set Γ. [sent-143, score-0.119]

65 We now argue that, despite these simplifications, the component rules of γ ∗ continue to be globally coupled. [sent-145, score-0.134]

66 Starting with any deterministic strategy γ ∈ ΓG , consider optimizing the nth component rule γn over Γn assuming all other components stay fixed. [sent-146, score-0.316]

67 With γn a degree-of-freedom, decision process Un is no longer well-defined so each un ∈ {0, 1} merely represents a candidate decision local to node n. [sent-147, score-0.972]

68 Online, each local decision will be made only upon receiving both the local observation Yn = yn and all parents’ local decisions Uπ(n) = uπ(n) . [sent-148, score-0.631]

69 It follows that node n, upon deciding a particular un, may assert that random vector U is restricted to values in the subset U[uπ(n), un] = {u′ ∈ {0,1}^N | u′π(n) = uπ(n), u′n = un}. [sent-149, score-2.523]

70 Then, viewing (Yn , Uπ(n) ) as a composite local observation and proceeding in the manner by which (3) is derived, the pbp-optimal relaxation of (1) reduces to the following form. [sent-150, score-0.199]

71 2 Offline Message-Passing Algorithm Let fn map from coefficients {bi ; i ∈ V\n} to coefficients bn by the following operations: 1. [sent-155, score-0.179]

72 compute bn via (7) given p(x), c(u, x) and {p(ui |x, uπ(i) ); i ∈ V\n}. [sent-157, score-0.094]

73 Then, the simultaneous satisfaction of Proposition 1 at all N nodes can be viewed as a system of 2^{N+1} Σ_{n∈V} 2^{|π(n)|} nonlinear equations in as many unknowns, bn = fn({bi ; i ∈ V\n}). [sent-158, score-0.191]

74 Initialize b^0 = (b1^0, …, bN^0) and generate the sequence {b^k} using a component-wise iterative application of f in (8). [sent-172, score-0.045]

75 Direct implementation of (9) is clearly imprudent from a computational perspective, because the transformation from fixed coefficients bn^k to the corresponding distribution p^k(un | x, uπ(n)) need not be repeated within every component evaluation of f. [sent-188, score-0.27]
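The component-wise iteration on b = f(b) can be sketched generically; the two scalar maps below are placeholder contractions standing in for the paper's update (7), not its actual coefficient computation.

```python
# Generic sketch of the component-wise offline iteration: sweep the nodes,
# replacing each block b_n by f_n(current blocks), until a sweep changes
# nothing. The maps f_n are placeholder contractions, not the paper's (7).
def coordinate_fixed_point(f, b0, max_sweeps=100, tol=1e-10):
    b = list(b0)
    for _ in range(max_sweeps):
        delta = 0.0
        for n, f_n in enumerate(f):
            new = f_n(b)                 # uses the freshest values of the others
            delta = max(delta, abs(new - b[n]))
            b[n] = new
        if delta < tol:
            break
    return b

# Toy system with fixed point b0 = (b1 + 1)/2, b1 = b0/2, i.e. (2/3, 1/3):
f = [lambda b: (b[1] + 1.0) / 2.0, lambda b: b[0] / 2.0]
print([round(v, 6) for v in coordinate_fixed_point(f, [0.0, 0.0])])
```

The Gauss-Seidel flavor of the sweep, where each update immediately sees the others' newest values, mirrors the component-wise application of f described in the surrounding sentences.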

76 In fact, assuming every node n stores in memory its own likelihood function p(yn |x), this transformation can be accomplished locally (cf. [sent-189, score-0.336]

77 (4) and (6)) and, also assuming the resulting distribution is broadcast to all other nodes before they proceed with their subsequent component evaluation of f , the termination guarantee of Proposition 2 is retained. [sent-190, score-0.29]

78 Requiring every node to perform a network-wide broadcast within every iteration k makes (9) a decidedly global algorithm, not to mention that each node n must also store in memory p(x, yn ) and c(u, x) to carry forth the supporting local computations. [sent-191, score-1.009]

79 Assumption 2 The cost function satisfies c(u, x) = Σ_{n∈V} cn(un, x) for some collection of functions {cn : {0, 1}^{N+1} → R} and the directed graph G is tree-structured. [sent-192, score-0.354]

80 An intuitive interpretation of Proposition 3, from the perspective of node n, is as follows. [sent-195, score-0.224]

81 From (10) in the forward pass, the messages received from each parent define what, during subsequent online operation, that parent’s local decision means (in a likelihood sense) about its ancestors’ outputs and the hidden process. [sent-196, score-0.495]

82 From (12) in the reverse pass, the messages received from each child define what the local decision will mean (in an expected cost sense) to that child and its descendants. [sent-197, score-0.5]

83 From (11), both types of incoming messages impact the local rule update and, in turn, the outgoing messages to both types of neighbors. [sent-198, score-0.515]

84 While Proposition 3 alleviates the need for the iterative global broadcast of the distributions p^k(un | x, uπ(n)), the explicit dependence of (10)-(12) on the full vector x implies the memory and computation requirements local to each node can still be exponential in N. [sent-199, score-0.49]

85 Assumption 3 The hidden process X is Markov on G, or p(x) = ∏_{n∈V} p(xn | xπ(n)), and all component likelihoods/costs satisfy p(yn | x) = p(yn | xn) and cn(un, x) = cn(un, xn). [sent-200, score-0.543]

86 Proposition 4 Under Assumption 3, the iterates in Proposition 3 specialize to the form bn^k(un, xn; uπ(n)), P_{n→j}^k(un | xn) and C_{n→i}^k(ui, xi), k = 0, 1, …. [sent-201, score-0.298]

87 and each node n need only store in memory p(xπ(n) , xn , yn ) and cn (un , xn ) to carry forth the supporting local computations. [sent-204, score-1.037]

88 Proposition 4 implies the convergence properties of Proposition 2 are upheld with maximal efficiency (linear in N) when G is tree-structured and the global distribution and costs satisfy p(x, y) = ∏_{n∈V} p(xn | xπ(n)) p(yn | xn) and c(u, x) = Σ_{n∈V} cn(un, xn), respectively. [sent-206, score-0.313]

89 Note that these conditions hold for the MPM assignment problems in Examples 3 & 4. [sent-207, score-0.107]

90 4 Discussion Our decision-theoretic variational approach reflects several departures from existing methods for communication-constrained inference. [sent-208, score-0.071]

91 Firstly, instead of imposing the constraints on an algorithm derived from an ideal model, we explicitly model the constraints and derive a different algorithm. [sent-209, score-0.072]

92 Secondly, our penalty function drives the approximation by the desired application of inference (e.g., …). [sent-210, score-0.089]

93 Thirdly, the necessary offline computation gives rise to a downside, namely less flexibility against time-varying statistical environments, decision objectives or network conditions. [sent-215, score-0.113]

94 Similar to the sum-product version of Belief Propagation (BP), our message-passing algorithm originates assuming a tree structure, an additive cost and a synchronous message schedule. [sent-217, score-0.162]

95 That we solve for correlated equilibria and depend on probabilistic structure commensurate with cost structure for efficiency is in common with graphical games [10], which, distinctly, are formulated on undirected graphs and without hidden variables. [sent-223, score-0.205]

96 Finally, our offline computation resembles learning a conditional random field [11], in the sense that factors of p(u|x) are iteratively modified to reduce penalty J(γ); online computation via strategy u = γ(y), repeated per realization Y = y, is then viewed as sampling from this distribution. [sent-224, score-0.275]

97 Along the learning thread, a special case of our formulation appears in [12], but assuming p(x, y) is unknown. [sent-225, score-0.083]

98 Acknowledgments This work was supported by the Air Force Office of Scientific Research under contract FA955004-1 and by the Army Research Office under contract DAAD19-00-1-0466. [sent-226, score-0.06]

99 Data association based on optimization in graphical models with application to sensor networks. [sent-234, score-0.051]

100 Posterior assignment in directed graphical models with minimal online communication. [sent-273, score-0.313]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('un', 0.548), ('mpm', 0.259), ('yn', 0.224), ('node', 0.224), ('messages', 0.201), ('communication', 0.167), ('cn', 0.159), ('ine', 0.142), ('strategy', 0.132), ('bk', 0.129), ('ui', 0.126), ('proposition', 0.126), ('xn', 0.124), ('assignment', 0.107), ('directed', 0.101), ('bn', 0.094), ('rules', 0.09), ('decentralized', 0.09), ('penalty', 0.089), ('bp', 0.088), ('upon', 0.077), ('strategies', 0.075), ('broadcast', 0.075), ('decision', 0.071), ('variational', 0.071), ('parents', 0.07), ('lemma', 0.066), ('children', 0.065), ('cost', 0.062), ('message', 0.059), ('relaxation', 0.059), ('parameterization', 0.059), ('local', 0.058), ('pk', 0.057), ('hidden', 0.057), ('rule', 0.055), ('online', 0.054), ('xing', 0.054), ('child', 0.054), ('parent', 0.054), ('acyclic', 0.052), ('store', 0.052), ('nodes', 0.051), ('graphical', 0.051), ('correspondence', 0.05), ('employs', 0.05), ('myopic', 0.049), ('coef', 0.049), ('broadcasts', 0.047), ('kreidl', 0.047), ('stipulated', 0.047), ('virtue', 0.047), ('fn', 0.046), ('specialize', 0.045), ('receiving', 0.045), ('iterative', 0.045), ('deterministic', 0.044), ('component', 0.044), ('termination', 0.043), ('cients', 0.043), ('formulation', 0.042), ('network', 0.042), ('viewing', 0.042), ('thresholds', 0.041), ('inducing', 0.041), ('forth', 0.041), ('assuming', 0.041), ('observation', 0.04), ('every', 0.04), ('assumption', 0.04), ('map', 0.039), ('detection', 0.039), ('iii', 0.037), ('arg', 0.037), ('bits', 0.037), ('constraints', 0.036), ('mass', 0.036), ('guarantee', 0.036), ('belief', 0.036), ('pass', 0.036), ('recognizing', 0.036), ('analytical', 0.036), ('permits', 0.035), ('equilibria', 0.035), ('ce', 0.035), ('bit', 0.034), ('computable', 0.033), ('zeros', 0.033), ('loss', 0.032), ('graph', 0.032), ('memory', 0.031), ('rn', 0.031), ('nite', 0.031), ('illustrative', 0.03), ('deciding', 0.03), ('contract', 0.03), ('respectively', 0.03), ('convergence', 0.03), ('progress', 0.029), ('realized', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

Author: O. P. Kreidl, Alan S. Willsky

Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1

2 0.16118552 70 nips-2005-Fast Information Value for Graphical Models

Author: Brigham S. Anderson, Andrew W. Moore

Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.

3 0.13833986 46 nips-2005-Consensus Propagation

Author: Benjamin V. Roy, Ciamac C. Moallemi

Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1

4 0.13222867 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations

Author: Aurelie C. Lozano, Sanjeev R. Kulkarni, Robert E. Schapire

Abstract: We study the statistical convergence and consistency of regularized Boosting methods, where the samples are not independent and identically distributed (i.i.d.) but come from empirical processes of stationary β-mixing sequences. Utilizing a technique that constructs a sequence of independent blocks close in distribution to the original samples, we prove the consistency of the composite classifiers resulting from a regularization achieved by restricting the 1-norm of the base classifiers’ weights. When compared to the i.i.d. case, the nature of sampling manifests in the consistency result only through generalization of the original condition on the growth of the regularization parameter.

5 0.12835592 125 nips-2005-Message passing for task redistribution on sparse graphs

Author: K. Wong, Zhuo Gao, David Tax

Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.

6 0.099007986 121 nips-2005-Location-based activity recognition

7 0.095859841 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models

8 0.091235429 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

9 0.090779923 58 nips-2005-Divergences, surrogate loss functions and experimental design

10 0.088388398 183 nips-2005-Stimulus Evoked Independent Factor Analysis of MEG Data with Large Background Activity

11 0.083246432 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

12 0.076502711 69 nips-2005-Fast Gaussian Process Regression using KD-Trees

13 0.07579869 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models

14 0.075514503 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting

15 0.072729871 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

16 0.071776688 187 nips-2005-Temporal Abstraction in Temporal-difference Networks

17 0.066247158 184 nips-2005-Structured Prediction via the Extragradient Method

18 0.064750381 38 nips-2005-Beyond Gaussian Processes: On the Distributions of Infinite Networks

19 0.063850708 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging

20 0.063815899 85 nips-2005-Generalization to Unseen Cases


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.222), (1, 0.054), (2, 0.037), (3, -0.058), (4, 0.059), (5, -0.055), (6, -0.289), (7, 0.141), (8, 0.008), (9, 0.114), (10, 0.039), (11, -0.077), (12, -0.077), (13, -0.01), (14, 0.046), (15, 0.0), (16, -0.058), (17, 0.033), (18, 0.058), (19, 0.078), (20, -0.043), (21, 0.012), (22, -0.017), (23, -0.015), (24, -0.059), (25, -0.088), (26, -0.073), (27, 0.121), (28, -0.027), (29, -0.118), (30, -0.024), (31, -0.087), (32, 0.054), (33, -0.031), (34, -0.134), (35, -0.083), (36, 0.013), (37, 0.097), (38, 0.076), (39, -0.056), (40, -0.009), (41, -0.033), (42, -0.039), (43, 0.026), (44, 0.059), (45, 0.074), (46, 0.031), (47, 0.033), (48, 0.05), (49, -0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97214919 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

Author: O. P. Kreidl, Alan S. Willsky

Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1

2 0.71200281 46 nips-2005-Consensus Propagation

Author: Benjamin V. Roy, Ciamac C. Moallemi

Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1

3 0.69948536 125 nips-2005-Message passing for task redistribution on sparse graphs

Author: K. Wong, Zhuo Gao, David Tax

Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.

4 0.68826437 70 nips-2005-Fast Information Value for Graphical Models

Author: Brigham S. Anderson, Andrew W. Moore

Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.

5 0.67840016 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson

Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs.
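The identity behind the walk-sum view can be checked numerically. If the information matrix is written J = I - R, where R collects the edgewise partial correlations, and R has spectral radius below 1 (walk-summability), then the covariance J⁻¹ equals the Neumann series I + R + R² + …, i.e., a sum over walks of every length. The matrix values below are illustrative, not taken from the paper.

```python
import numpy as np

# A 3-node chain with partial correlations 0.3 and 0.2 on its two edges.
R = np.array([[0.0, 0.3, 0.0],
              [0.3, 0.0, 0.2],
              [0.0, 0.2, 0.0]])
J = np.eye(3) - R          # information matrix, J = I - R

walk_sum = np.zeros((3, 3))
Rk = np.eye(3)             # R^0: the length-0 walks
for _ in range(200):       # truncate the series at walks of length 200
    walk_sum += Rk
    Rk = Rk @ R

print(np.allclose(walk_sum, np.linalg.inv(J)))  # True
```

Each entry of Rᵏ accumulates the weights (products of edge partial correlations) of all length-k walks between the corresponding pair of nodes, which is why the truncated series matches the covariance.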

6 0.66849816 121 nips-2005-Location-based activity recognition

7 0.54425496 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations

8 0.40156579 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

9 0.39023381 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting

10 0.38537464 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations

11 0.3525559 187 nips-2005-Temporal Abstraction in Temporal-difference Networks

12 0.35001513 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

13 0.34733903 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

14 0.34606856 183 nips-2005-Stimulus Evoked Independent Factor Analysis of MEG Data with Large Background Activity

15 0.33815393 184 nips-2005-Structured Prediction via the Extragradient Method

16 0.33343607 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions

17 0.31319165 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models

18 0.31229919 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections

19 0.31154859 69 nips-2005-Fast Gaussian Process Regression using KD-Trees

20 0.30439028 153 nips-2005-Policy-Gradient Methods for Planning


similar papers computed by the LDA model

LDA topic weights for this paper:

topicId topicWeight

[(3, 0.047), (10, 0.041), (27, 0.035), (29, 0.171), (31, 0.09), (34, 0.075), (39, 0.024), (41, 0.01), (50, 0.013), (55, 0.103), (69, 0.091), (73, 0.018), (77, 0.016), (88, 0.073), (91, 0.095)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89983898 53 nips-2005-Cyclic Equilibria in Markov Games

Author: Martin Zinkevich, Amy Greenwald, Michael L. Littman

Abstract: Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration based on a new (non-stationary) equilibrium concept that we call “cyclic equilibria.” We prove that value iteration identifies cyclic equilibria in a class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.
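For background, here is a sketch of standard single-agent value iteration, the procedure whose general-sum-game variants the abstract analyzes. The 2-state, 2-action MDP below is an illustrative assumption, not an example from the paper.

```python
def value_iteration(P, R, gamma=0.9, iters=500):
    """P[a][s][s2]: transition probabilities; R[a][s]: immediate reward.
    Iterates the Bellman optimality update to a fixed point."""
    n = len(R[0])
    V = [0.0] * n
    for _ in range(iters):
        V = [max(R[a][s] + gamma * sum(P[a][s][s2] * V[s2]
                                       for s2 in range(n))
                 for a in range(len(R)))
             for s in range(n)]
    return V

# Action 0 stays put; action 1 swaps states. Being in state 1 pays 1.
P = [[[1.0, 0.0], [0.0, 1.0]],   # a=0: identity
     [[0.0, 1.0], [1.0, 0.0]]]   # a=1: swap
R = [[0.0, 1.0], [0.0, 1.0]]
V = value_iteration(P, R)
print([round(v, 3) for v in V])  # [9.0, 10.0]
```

With gamma = 0.9 the optimal values satisfy V(1) = 1/(1 - 0.9) = 10 and V(0) = 0.9 · V(1) = 9. The paper's point is that the natural extensions of this update to general-sum Markov games need not converge to stationary equilibrium policies.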

same-paper 2 0.88986754 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

Author: O. P. Kreidl, Alan S. Willsky

Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas.

3 0.70485479 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

Author: Jin Yu, Douglas Aberdeen, Nicol N. Schraudolph

Abstract: Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.

4 0.69680971 181 nips-2005-Spiking Inputs to a Winner-take-all Network

Author: Matthias Oster, Shih-Chii Liu

Abstract: Recurrent networks that perform a winner-take-all computation have been studied extensively. Although some of these studies include spiking networks, they consider only analog input rates. We present results of this winner-take-all computation on a network of integrate-and-fire neurons which receives spike trains as inputs. We show how we can configure the connectivity in the network so that the winner is selected after a pre-determined number of input spikes. We discuss spiking inputs with both regular frequencies and Poisson-distributed rates. The robustness of the computation was tested by implementing the winner-take-all network on an analog VLSI array of 64 integrate-and-fire neurons which have an innate variance in their operating parameters.
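A minimal functional sketch of the "winner after a pre-determined number of input spikes" idea (the spike trains and threshold below are assumed for illustration, not taken from the paper): each unit counts its input spikes, and the first unit to accumulate N spikes is declared the winner.

```python
def spike_count_wta(spike_trains, n_spikes):
    """spike_trains: list of sorted spike-time lists, one per neuron.
    Returns the index of the first neuron to accumulate n_spikes spikes."""
    # Time of the n-th spike for each neuron (inf if it never gets there).
    times = [t[n_spikes - 1] if len(t) >= n_spikes else float("inf")
             for t in spike_trains]
    return min(range(len(times)), key=lambda i: times[i])

trains = [[1.0, 3.0, 5.0, 7.0],   # regular, slow
          [0.5, 1.0, 1.5, 2.0],   # regular, fast -> reaches 3 spikes first
          [0.2, 4.0, 9.0]]
print(spike_count_wta(trains, n_spikes=3))  # 1
```

In the actual network, this counting is realized by integrate-and-fire dynamics with recurrent inhibition rather than an explicit comparison, which is what makes the analog VLSI implementation and its parameter variance interesting.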

5 0.69393659 46 nips-2005-Consensus Propagation

Author: Benjamin V. Roy, Ciamac C. Moallemi

Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge.

6 0.69373614 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

7 0.68529785 78 nips-2005-From Weighted Classification to Policy Search

8 0.68523395 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations

9 0.6832298 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks

10 0.68316311 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

11 0.6827811 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models

12 0.67973495 30 nips-2005-Assessing Approximations for Gaussian Process Classification

13 0.67265147 197 nips-2005-Unbiased Estimator of Shape Parameter for Spiking Irregularities under Changing Environments

14 0.67212152 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

15 0.6719324 192 nips-2005-The Information-Form Data Association Filter

16 0.67015028 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation

17 0.66256529 153 nips-2005-Policy-Gradient Methods for Planning

18 0.66034269 173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception

19 0.66029704 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity

20 0.65737188 58 nips-2005-Divergences, surrogate loss functions and experimental design