nips nips2008 nips2008-152 knowledge-graph by maker-knowledge-mining

152 nips-2008-Non-stationary dynamic Bayesian networks


Source: pdf

Author: Joshua W. Robinson, Alexander J. Hartemink

Abstract: A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process—an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). [sent-5, score-0.468]

2 In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. [sent-7, score-0.304]

3 Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. [sent-8, score-0.444]

4 1 Introduction Structure learning of dynamic Bayesian networks allows conditional dependencies to be identified in time-series data under the assumption that the data are generated by a distribution that does not change with time (i.e., by a stationary process). [sent-10, score-0.398]

5 As one example, structure learning of DBNs has been used widely in reconstructing transcriptional regulatory networks from gene expression data [1]. [sent-15, score-0.464]

6 But during development, these regulatory networks are evolving over time, with certain conditional dependencies between gene products being created as the organism develops, while others are destroyed. [sent-16, score-0.435]

7 As another example, dynamic Bayesian networks have been used to identify the networks of neural information flow that operate in the brains of songbirds [2]. [sent-17, score-0.432]

8 The roads upon which traffic passes do not change on a daily basis, but the dynamic utilization of those roads changes daily during morning rush, lunch, evening rush, and weekends. [sent-20, score-0.243]

9 Here, we introduce a new class of graphical model called a non-stationary dynamic Bayesian network (nsDBN), in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. [sent-22, score-0.44]

10 1 Previous work In this paper, we are interested in identifying how the conditional dependencies between time-series change over time; thus, we focus on the task of inferring network structure as opposed to parameters of the graphical model. [sent-25, score-0.446]

11 Here we describe the few previous approaches to identifying non-stationary networks and discuss the advantages and disadvantages of each. [sent-27, score-0.224]

12 Recent work modeling the temporal progression of networks from the social networks community includes an extension to the discrete temporal network model [3], in which the networks are latent (unobserved) variables that generate observed time-series data [4]. [sent-29, score-0.616]

13 Unfortunately, this technique has certain drawbacks: the variable correlations remain constant over time, only undirected edges can be identified, and segment or epoch divisions must be identified a priori. [sent-30, score-0.302]

14 However, some limitations of this method include: the network evolution is restricted to changing at most a single edge at a time and the total number of segments is assumed known a priori. [sent-32, score-0.313]

15 This approach is fast, has no single-edge-change restriction, and the number of segments is calculated a posteriori; however, it does require that the graph structure be decomposable. [sent-34, score-0.292]

16 Additionally, both of the aforementioned approaches only identify undirected edges and assume that the networks in each segment are independent, preventing data and parameters from being shared between segments. [sent-35, score-0.378]

17 2 Brief review of structure learning of Bayesian networks Bayesian networks are directed acyclic graphical models that represent conditional dependencies between variables as edges. [sent-36, score-0.625]

18 The posterior probability of a given network G (i. [sent-39, score-0.231]

19 The structure prior P(G) can be used to incorporate prior knowledge about the network structure, either about the existence of specific edges or the topology more generally (e. [sent-42, score-0.408]
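For reference, the decomposition these two sentences rely on is the standard Bayesian structure-learning score; a reconstruction in LaTeX (the displayed equation itself is not preserved in this extraction) is

    P(G \mid D) \;=\; \frac{P(D \mid G)\, P(G)}{P(D)} \;\propto\; P(D \mid G)\, P(G)

where P(D | G) is the marginal likelihood of the data given the structure and P(G) is the structure prior discussed above.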

20 Alternatively, sampling methods may be used to estimate a posterior over all networks [8]. [sent-53, score-0.29]

21 Ideally, the move set includes changes that allow posterior modes to be frequently visited. [sent-60, score-0.291]

22 For example, it is reasonable to assume that networks that differ by a single edge will have similar likelihoods. [sent-61, score-0.337]

23 DBNs are an extension of Bayesian networks to time-series data, enabling cyclic dependencies between variables to be modeled across time. [sent-64, score-0.241]

24 3 Learning non-stationary dynamic Bayesian networks We would like to extend the dynamic Bayesian network model to account for non-stationarity. [sent-68, score-0.438]

25 The process is non-stationary in the sense that the network of conditional dependencies prevailing at any given time is itself changing over time. [sent-72, score-0.328]

26 We call the initial network of conditional dependencies G1, and subsequent networks Gi for i = 2, 3, . . . [sent-73, score-0.425]

27 The number of edge changes specified in ∆gi is Si . [sent-78, score-0.233]

28 We define the transition time ti to be the time at which Gi is replaced by Gi+1 in the data-generation process. [sent-79, score-0.33]

29 We call the period of time between consecutive transition times—during which a single network of conditional dependencies is operative—an epoch. [sent-80, score-0.486]

30 We will refer to the entire series of prevailing networks as the structure of the nsDBN. [sent-82, score-0.3]

31 Since we wish to learn a set of networks instead of a single network, we must derive a new expression for the marginal likelihood. [sent-83, score-0.344]

32 Assume that there exist m different epochs with m − 1 transition times T = {t1, . . . , tm−1}. [sent-84, score-0.487]

33 The network Gi+1 prevailing in epoch i + 1 differs from network Gi prevailing in epoch i by a set of edge changes we call ∆gi . [sent-88, score-0.881]
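To make the bookkeeping concrete, the following is a minimal sketch of how an nsDBN structure could be represented in code; the class name, field names, and the edge-toggling reading of ∆gi are illustrative assumptions, not the authors' implementation.

    from dataclasses import dataclass
    from typing import FrozenSet, List, Tuple

    Edge = Tuple[int, int]  # directed edge (parent, child) between node indices

    @dataclass
    class NsDBNStructure:
        """Sketch of an nsDBN structure: an initial network G1 plus edge-change sets.

        Each Delta g_i edits the previous network; t_i is the time at which
        G_i is replaced by G_{i+1}, so m epochs have m - 1 transition times.
        """
        g1: FrozenSet[Edge]                  # initial network G1
        edge_changes: List[FrozenSet[Edge]]  # Delta g_1, ..., Delta g_{m-1}
        transition_times: List[int]          # t_1 < t_2 < ... < t_{m-1}

        def network_at_epoch(self, i: int) -> FrozenSet[Edge]:
            """Return G_{i+1} by applying the first i edge-change sets to G1.

            Here an edge listed in a Delta g is toggled: removed if present,
            added otherwise (one simple reading of an "edge change").
            """
            g = set(self.g1)
            for delta in self.edge_changes[:i]:
                g.symmetric_difference_update(delta)
            return frozenset(g)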

34 We assume the prior over networks P(G1, ∆g1, . . . , ∆gm−1) can be further split into independent components describing the initial network and subsequent edge changes, as demonstrated in Equation (4). [sent-113, score-0.549]

35 As in the stationary setting, if prior knowledge about particular edges or overall topology is available, an informative prior can be placed on G1 . [sent-114, score-0.229]

36 To encode this prior knowledge, we place an exponential prior with rate λs on the total number of edge changes s = Σi Si. [sent-118, score-0.317]

37 We also encode the belief that transitions are relatively rare (i.e., a transition does not occur at every observation) by placing another exponential prior with rate λm on the number of epochs m. [sent-121, score-0.442]
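Putting the last few sentences together, one plausible reading of the prior (reconstructed here in LaTeX, since the displayed Equation (4) is not preserved in this extraction) is

    P(G_1, \Delta g_1, \ldots, \Delta g_{m-1}) \;=\; P(G_1)\, \prod_{i=1}^{m-1} P(\Delta g_i),
    \qquad P(s) \;\propto\; e^{-\lambda_s s} \;\; \text{with } s = \sum_i S_i,
    \qquad P(m) \;\propto\; e^{-\lambda_m m},

so that larger values of λs and λm penalize, respectively, structures with many edge changes and structures with many epochs.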

38 We will assume that any other sources of non-stationarity are either small enough to not alter edges in the predicted network or large enough to be approximated by edge changes in the predicted network. [sent-129, score-0.582]

39 However, in an nsDBN, a node may have multiple parent sets operative at different times. [sent-131, score-0.215]

40 Specifically, an epoch is defined between adjacent transition times while an interval is defined over the epochs during which a particular parent set is operative (which may include all epochs). [sent-134, score-0.86]

41 For each node i, the previous parent set πi in the BDe metric is replaced by a set of parent sets πih , where h indexes the interval Ih during which parent set πih is operative for node i. [sent-135, score-0.508]
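For orientation, the standard BDe marginal likelihood being modified here has the form below; the nsDBN variant described in the sentence above additionally indexes parent sets and counts by the interval Ih. This is a reconstruction from standard references, not the paper's exact equation.

    P(D \mid G) \;=\; \prod_{i=1}^{n} \prod_{j=1}^{q_i}
        \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
        \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}

Here N_{ijk} counts the observations in which node i takes its k-th value while its parents take their j-th configuration, N_{ij} = \sum_k N_{ijk}, and the α's are Dirichlet hyperparameters; in the nsDBN the inner products run over the q_{ih} configurations of each interval-specific parent set π_{ih}, with counts accumulated only over the observations falling in interval I_h.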

42 Additionally, sampling allows us to answer questions like “what are the most likely transition times?” [sent-146, score-0.256]

43 To achieve quick convergence, we want to ensure that every move in the move set efficiently jumps between posterior modes. [sent-149, score-0.375]

44 4 Different settings regarding the number and times of transitions An nsDBN can be identified under a variety of settings that differ in the level of uncertainty about the number of transitions and whether the transition times are known. [sent-151, score-0.695]

45 The different settings are abbreviated according to the type of uncertainty: whether the number of transitions is known (KN) or unknown (UN) and whether the transition times themselves are known (KT) or unknown (UT). [sent-152, score-0.458]

46 When the number and times of transitions are known a priori (KNKT setting), we only need to identify the most likely initial network G1 and sets of edge changes ∆g1, . . . , ∆gm−1. [sent-153, score-0.6]

47 To create a move set that results in an effectively mixing chain, we consider which types of local moves result in jumps between posterior modes. [sent-158, score-0.28]

48 As mentioned earlier, structures that differ by a single edge will probably have similar likelihoods. [sent-159, score-0.21]

49 Additionally, structures that have slightly different edge change sets will have similar likelihoods. [sent-160, score-0.248]

50 The add edge, remove edge, add to edge set, remove from edge set, and move from edge set moves are listed as (M1)–(M5) in Table 1 in the Appendix. [sent-161, score-0.748]
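A minimal Metropolis–Hastings skeleton built around such a move set might look as follows; score_structure and the move functions are hypothetical placeholders for the nsDBN posterior and the moves (M1)–(M5), not code from the paper.

    import math
    import random

    def mh_sample_structures(init_structure, score_structure, moves, n_iter=100000):
        """Metropolis-Hastings over nsDBN structures.

        score_structure(s) returns the log unnormalized posterior (log marginal
        likelihood plus log prior); each entry of `moves` is a proposal function
        returning (new_structure, log_proposal_ratio).
        """
        current = init_structure
        current_score = score_structure(current)
        samples = []
        for _ in range(n_iter):
            move = random.choice(moves)            # e.g. add/remove edge, edit an edge-change set
            proposal, log_q_ratio = move(current)  # log q(cur|prop) - log q(prop|cur)
            proposal_score = score_structure(proposal)
            if math.log(random.random()) < proposal_score - current_score + log_q_ratio:
                current, current_score = proposal, proposal_score
            samples.append(current)
        return samples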

51 When the number of transitions is known but the times are unknown a priori (KNUT setting), the transition times T must also be estimated a posteriori. [sent-163, score-0.498]

52 C: Posterior probabilities of transition times when learning an nsDBN in the UNUT setting (with λs = 1 and λm = 5). [sent-170, score-0.37]

53 The blue triangles represent the true transition times and the red dots represent one standard deviation from the mean probability obtained from several runs. [sent-171, score-0.361]

54 Structures with the same edge sets but slightly different transition times will probably have similar likelihoods. [sent-173, score-0.485]

55 Therefore, we can add a new move that proposes a local shift to one of the transition times: let d be some small positive integer and let the new value of ti be drawn from a discrete uniform distribution DU(ti − d, ti + d) centered on its current value, with the constraint that ti−1 < ti < ti+1. [sent-174, score-0.797]
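A sketch of this local-shift proposal, assuming integer observation indices and the ordering constraint stated above (the function and argument names are illustrative):

    import random

    def propose_shift(transition_times, i, d=3):
        """Redraw transition time t_i uniformly from {t_i - d, ..., t_i + d},
        clipped so that the ordering t_{i-1} < t_i < t_{i+1} is preserved.

        transition_times is a sorted list [t_1, ..., t_{m-1}]; i is 0-based.
        (An upper bound from the total number of observations would also apply.)
        """
        t = list(transition_times)
        lo = t[i - 1] + 1 if i > 0 else 1
        hi = t[i + 1] - 1 if i + 1 < len(t) else t[i] + d
        t[i] = random.randint(max(t[i] - d, lo), min(t[i] + d, hi))
        return t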

56 Initially, we set the m − 1 transition times so that the epochs are roughly equal in length. [sent-175, score-0.487]

57 The complete move set for this setting includes all of the moves described previously as well as the new local shift move, listed as (M6) in Table 1 in the Appendix. [sent-176, score-0.249]

58 Using the reversible jump Markov chain Monte Carlo sampling technique [11], we can further augment the move set to allow for the number of transitions to change. [sent-179, score-0.308]

59 Since the number of epochs m is allowed to vary, this is the only setting that incorporates the prior on m. [sent-180, score-0.253]

60 To allow the number of transitions to change during sampling, we introduce merge and split operations to the move set. [sent-181, score-0.317]

61 For the merge operation, two adjacent edge sets (∆gi and ∆gi+1 ) are combined to create a new edge set. [sent-182, score-0.39]

62 The transition time of the new edge set is selected to be the mean of the previous locations weighted by the size of each edge set: ti = (Si ti + Si+1 ti+1 )/(Si + Si+1 ). [sent-183, score-0.793]

63 For the split operation, an edge set ∆gi is randomly chosen and randomly partitioned into two new edge sets ∆gi and ∆gi+1 with all subsequent edge sets re-indexed appropriately. [sent-184, score-0.531]

64 Each new transition time is selected as described above. [sent-185, score-0.221]
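As a rough sketch of the merge and split proposals just described (again using the illustrative list-of-edge-sets layout from the earlier structure sketch, not the authors' code):

    import random

    def merge_edge_sets(edge_changes, transition_times, i):
        """Merge Delta g_i and Delta g_{i+1} (0-based i) into one edge set.

        The new transition time is the size-weighted mean of the two old ones:
        t = (S_i * t_i + S_{i+1} * t_{i+1}) / (S_i + S_{i+1}). Taking the union
        is a simplification of "combining" the two edge-change sets.
        """
        a, b = edge_changes[i], edge_changes[i + 1]
        s_a, s_b = len(a), len(b)
        t_new = round((s_a * transition_times[i] + s_b * transition_times[i + 1]) / (s_a + s_b))
        new_changes = edge_changes[:i] + [a | b] + edge_changes[i + 2:]
        new_times = transition_times[:i] + [t_new] + transition_times[i + 2:]
        return new_changes, new_times

    def split_edge_set(edge_changes, transition_times, i, new_time):
        """Randomly partition Delta g_i into two edge sets; new_time is the extra
        transition time (its selection rule is not fully specified in this
        extraction, so it is left as an argument here)."""
        edges = list(edge_changes[i])
        random.shuffle(edges)
        cut = random.randint(1, max(1, len(edges) - 1))
        first, second = frozenset(edges[:cut]), frozenset(edges[cut:])
        new_changes = edge_changes[:i] + [first, second] + edge_changes[i + 1:]
        new_times = sorted(transition_times + [new_time])
        return new_changes, new_times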

65 The move set is completed with the inclusion of the add transition time and delete transition time operations. [sent-186, score-0.582]

66 These moves are similar to the split and merge operations except they also increase or decrease s, the total number of edge changes in the structure. [sent-187, score-0.314]

67 The first experiment is on a simulated ten-node network with six single-edge changes between seven epochs, where the length of each epoch varies between 20 and 400 observations. [sent-190, score-0.563]

68 To obtain a consensus (model averaged) structure prediction, an edge is considered present at a particular time if the posterior probability of the edge is greater than 0. [sent-196, score-0.561]
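A minimal sketch of that model-averaging step; since the threshold value is truncated in this extraction ("greater than 0."), it is left as a parameter here rather than filled in.

    from collections import Counter

    def consensus_network(sampled_networks, threshold):
        """Keep an edge if its posterior frequency across samples exceeds threshold.

        sampled_networks is a list of edge sets, one per MCMC sample, each giving
        the network prevailing at the time point of interest.
        """
        counts = Counter(edge for net in sampled_networks for edge in net)
        n = len(sampled_networks)
        return {edge for edge, c in counts.items() if c / n > threshold}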

69 In the KNUT setting, transition times are unknown and must be estimated a posteriori. [sent-204, score-0.308]

70 The estimated structure and transition times are very close to the truth. [sent-209, score-0.385]

71 All edges are correct, with the exception of two missing edges in G1 , and the predicted transition times are all within 10 of the true transition times. [sent-210, score-0.802]

72 This implies that the posterior over transition times is quite smooth; therefore, the mixing rate is not greatly affected when sampling transition times. [sent-212, score-0.659]

73 We can examine the posterior probabilities of transition times over all sampled structures, shown in Figure 1C. [sent-215, score-0.433]

74 Highly probable transition times correspond closely with the true transition times indicated by blue triangles; nevertheless, some uncertainty exists about the exact locations of t3 and t4 since the fourth epoch is exceedingly short. [sent-216, score-0.772]

75 The most probable posterior number of epochs is six, close to the true number of seven. [sent-218, score-0.305]

76 To evaluate the scalability of our technique, we also simulated data from a 100-variable network with an average of fifty edges over five epochs spanning 4800 observations, with one to three edges changing between consecutive epochs. [sent-223, score-0.604]

77 6 Results on Drosophila muscle development gene expression data We also apply our method to identify non-stationary networks using Drosophila development gene expression data from [12]. [sent-227, score-0.78]

78 This dataset contains expression measurements over 66 time steps of 4028 Drosophila genes throughout development and growth during the embryonic, larval, pupal, and adult stages of life. [sent-228, score-0.315]

79 Using a subset of the genes involved in muscle development, some researchers have identified a single directed network [13], while others have learned a time-varying undirected network [4]. [sent-229, score-0.668]

80 Unfortunately, no other techniques predict non-stationary directed networks, so our prediction in Figure 2C is compared to the stationary directed network in Figure 2A and the non-stationary undirected network in Figure 2B. [sent-231, score-0.5]

81 All of these genes except up are in the myosin family. (Figure 2 caption: Learning nsDBNs from the Drosophila muscle development data.) [sent-234, score-0.324]

82 Only the edges that occurred in greater than 50 percent of the samples are shown, with thicker edges representing connections that occurred more frequently. [sent-242, score-0.222]

83 Posterior probabilities of transition times using λm = λs = 2 under the UNUT setting. [sent-244, score-0.338]

84 The myosin family contains genes involved in muscle contraction. [sent-248, score-0.332]

85 Within the directed predictions, msp-300 primarily serves as a hub gene that regulates the other myosin family genes. [sent-249, score-0.281]

86 First, we predict interactions from myo61f to both prm and up, neither of which is predicted by the other methods, suggesting a greater role for myo61f during muscle development. [sent-253, score-0.321]

87 During muscle development in Drosophila, twi acts as a regulator of mef2, which in turn regulates some myosin family genes, including mlc1 and mhc [14]; our prediction of no direct connection from twi mirrors this biological behavior. [sent-255, score-0.559]

88 Finally, we note that in our predicted structure, actn never connects as a regulator (parent) to any other genes, unlike in the network in Figure 2A. [sent-256, score-0.297]

89 Since actn (actinin) only binds actin, we do not expect it to regulate other muscle development genes, even indirectly. [sent-257, score-0.267]

90 We can also look at the posterior probabilities of transition times and epochs under the UNUT setting. [sent-258, score-0.612]

91 Also, we see that the most probable number of epochs is three or four, mirroring closely the total number of developmental stages. [sent-261, score-0.24]

92 We generated data from an nsDBN with 66 observations and transition times at 30, 40, and 58 to mirror the number of observations in embryonic, larval, pupal, and adult stages of the experimental fly data. [sent-263, score-0.416]

93 75 across all signal-to-noise ratios and experimental replicates) were obtained when λm = λs = 2, which is why we used those values to analyze the Drosophila muscle network data. [sent-267, score-0.301]

94 7 Discussion Non-stationary dynamic Bayesian networks provide a useful framework for learning Bayesian networks when the generating processes are non-stationary. [sent-268, score-0.391]

95 Using the move sets described in this paper, nsDBN learning is efficient even for networks of 100 variables, generalizable to situations of varying uncertainty (KNKT, KNUT, and UNUT), and the predictions are stable over many choices of hyper-parameters. [sent-269, score-0.334]

96 Although the predicted fly muscle development networks are difficult to verify, simulated experiments of a similar scale demonstrate highly accurate predictions, even with noisy data and few replicates. [sent-272, score-0.501]

97 Non-stationary DBNs offer all of the advantages of DBNs (identifying directed non-linear interactions between multivariate time-series) and are additionally able to identify non-stationarities in the interactions between time-series. [sent-273, score-0.221]

98 In future work, we hope to analyze data from other fields that have traditionally used dynamic Bayesian networks and instead use nsDBNs to identify and model previously unknown or uncharacterized non-stationary behavior. [sent-274, score-0.272]

99 Inferring gene regulatory networks from time series data using the minimum description length principle. [sent-323, score-0.306]

100 A temporal map of transcription factor activity: mef2 directly regulates target genes at all stages of muscle development. [sent-326, score-0.346]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nsdbn', 0.309), ('ih', 0.23), ('transition', 0.221), ('epochs', 0.179), ('gi', 0.177), ('edge', 0.177), ('unut', 0.177), ('ijk', 0.174), ('muscle', 0.165), ('gm', 0.165), ('networks', 0.16), ('knkt', 0.155), ('nijk', 0.155), ('dbns', 0.149), ('move', 0.14), ('network', 0.136), ('epoch', 0.125), ('bde', 0.124), ('drosophila', 0.116), ('edges', 0.111), ('knut', 0.11), ('larval', 0.11), ('nsdbns', 0.11), ('operative', 0.11), ('pupal', 0.11), ('ti', 0.109), ('parent', 0.105), ('transitions', 0.103), ('gene', 0.101), ('genes', 0.101), ('nij', 0.097), ('posterior', 0.095), ('times', 0.087), ('dependencies', 0.081), ('adult', 0.078), ('embryonic', 0.077), ('structure', 0.077), ('dynamic', 0.071), ('simulated', 0.067), ('mhc', 0.066), ('myosin', 0.066), ('prm', 0.066), ('regulator', 0.066), ('bayesian', 0.066), ('undirected', 0.066), ('directed', 0.064), ('dbn', 0.063), ('traf', 0.063), ('prevailing', 0.063), ('ow', 0.058), ('development', 0.058), ('changes', 0.056), ('triangles', 0.053), ('predicted', 0.051), ('metric', 0.05), ('regulates', 0.05), ('ri', 0.049), ('conditional', 0.048), ('expression', 0.048), ('si', 0.048), ('settings', 0.047), ('moves', 0.045), ('regulatory', 0.045), ('actn', 0.044), ('hartemink', 0.044), ('prevails', 0.044), ('qih', 0.044), ('twi', 0.044), ('prior', 0.042), ('identify', 0.041), ('interactions', 0.039), ('roads', 0.039), ('rush', 0.039), ('additionally', 0.038), ('change', 0.038), ('merge', 0.036), ('sampling', 0.035), ('consensus', 0.035), ('steve', 0.035), ('hanneke', 0.035), ('permitted', 0.035), ('graphical', 0.035), ('describing', 0.034), ('predictions', 0.034), ('stationary', 0.034), ('structures', 0.033), ('interval', 0.033), ('transcriptional', 0.033), ('disadvantages', 0.033), ('ij', 0.033), ('heuristic', 0.032), ('setting', 0.032), ('qi', 0.032), ('listed', 0.032), ('identifying', 0.031), ('probable', 0.031), ('stages', 0.03), ('probabilities', 0.03), ('developmental', 0.03), ('reversible', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999917 152 nips-2008-Non-stationary dynamic Bayesian networks

Author: Joshua W. Robinson, Alexander J. Hartemink

Abstract: A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process—an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. 1

2 0.11399356 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables

Author: David Danks, Clark Glymour, Robert E. Tillman

Abstract: In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable procedures for distributed data with overlapping variables. We present a novel, asymptotically correct procedure that discovers a minimal equivalence class of causal DAG structures using local independence information from distributed data of this form and evaluate its performance using synthetic and real-world data against causal discovery algorithms for single datasets and applying Structural EM, a heuristic DAG structure learning procedure for data with missing values, to the concatenated data.

3 0.1109345 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets

Author: Gerald Quon, Yee W. Teh, Esther Chan, Timothy Hughes, Michael Brudno, Quaid D. Morris

Abstract: We address the challenge of assessing conservation of gene expression in complex, non-homogeneous datasets. Recent studies have demonstrated the success of probabilistic models in studying the evolution of gene expression in simple eukaryotic organisms such as yeast, for which measurements are typically scalar and independent. Models capable of studying expression evolution in much more complex organisms such as vertebrates are particularly important given the medical and scientific interest in species such as human and mouse. We present Brownian Factor Phylogenetic Analysis, a statistical model that makes a number of significant extensions to previous models to enable characterization of changes in expression among highly complex organisms. We demonstrate the efficacy of our method on a microarray dataset profiling diverse tissues from multiple vertebrate species. We anticipate that the model will be invaluable in the study of gene expression patterns in other diverse organisms as well, such as worms and insects. 1

4 0.09822648 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel

Author: Benjamin Yackley, Eduardo Corona, Terran Lane

Abstract: Many interesting problems, including Bayesian network structure-search, can be cast in terms of finding the optimum value of a function over the space of graphs. However, this function is often expensive to compute exactly. We here present a method derived from the study of Reproducing Kernel Hilbert Spaces which takes advantage of the regular structure of the space of all graphs on a fixed number of nodes to obtain approximations to the desired function quickly and with reasonable accuracy. We then test this method on both a small testing set and a real-world Bayesian network; the results suggest that not only is this method reasonably accurate, but that the BDe score itself varies quadratically over the space of all graphs. 1

5 0.095474765 235 nips-2008-The Infinite Hierarchical Factor Regression Model

Author: Piyush Rai, Hal Daume

Abstract: We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman’s coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis. 1

6 0.094192035 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models

7 0.091173783 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns

8 0.087052092 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing

9 0.078239076 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes

10 0.077867381 115 nips-2008-Learning Bounded Treewidth Bayesian Networks

11 0.075710006 194 nips-2008-Regularized Learning with Networks of Features

12 0.072748959 233 nips-2008-The Gaussian Process Density Sampler

13 0.072309546 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

14 0.069853105 67 nips-2008-Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance

15 0.069602169 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

16 0.066191278 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

17 0.065403946 214 nips-2008-Sparse Online Learning via Truncated Gradient

18 0.064353488 104 nips-2008-Improved Moves for Truncated Convex Models

19 0.063639343 170 nips-2008-Online Optimization in X-Armed Bandits

20 0.0594621 166 nips-2008-On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.195), (1, 0.046), (2, 0.071), (3, 0.029), (4, 0.072), (5, -0.098), (6, 0.011), (7, 0.075), (8, 0.049), (9, -0.071), (10, -0.114), (11, 0.079), (12, 0.051), (13, -0.081), (14, 0.092), (15, -0.048), (16, 0.005), (17, -0.067), (18, -0.012), (19, -0.077), (20, -0.102), (21, 0.143), (22, -0.081), (23, -0.028), (24, -0.006), (25, 0.03), (26, -0.076), (27, -0.039), (28, 0.032), (29, 0.002), (30, 0.052), (31, 0.024), (32, -0.004), (33, -0.056), (34, 0.065), (35, 0.09), (36, -0.013), (37, -0.082), (38, 0.034), (39, -0.119), (40, -0.091), (41, 0.062), (42, 0.053), (43, -0.035), (44, 0.079), (45, -0.008), (46, -0.045), (47, -0.042), (48, -0.087), (49, -0.039)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9510963 152 nips-2008-Non-stationary dynamic Bayesian networks

Author: Joshua W. Robinson, Alexander J. Hartemink

Abstract: A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process—an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. 1

2 0.6571579 115 nips-2008-Learning Bounded Treewidth Bayesian Networks

Author: Gal Elidan, Stephen Gould

Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. 1

3 0.61688107 18 nips-2008-An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering

Author: Dilan Gorur, Yee W. Teh

Abstract: We propose an efficient sequential Monte Carlo inference scheme for the recently proposed coalescent clustering model [1]. Our algorithm has a quadratic runtime while those in [1] are cubic. In experiments, we were surprised to find that in addition to being more efficient, it is also a better sequential Monte Carlo sampler than the best in [1], when measured in terms of variance of estimated likelihood and effective sample size. 1

4 0.60187495 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing

Author: Benjamin Schrauwen, Lars Buesing, Robert A. Legenstein

Abstract: Randomly connected recurrent neural circuits have proven to be very powerful models for online computations when a trained memoryless readout function is appended. Such Reservoir Computing (RC) systems are commonly used in two flavors: with analog or binary (spiking) neurons in the recurrent circuits. Previous work showed a fundamental difference between these two incarnations of the RC idea. The performance of a RC system built from binary neurons seems to depend strongly on the network connectivity structure. In networks of analog neurons such dependency has not been observed. In this article we investigate this apparent dichotomy in terms of the in-degree of the circuit nodes. Our analyses based amongst others on the Lyapunov exponent reveal that the phase transition between ordered and chaotic network behavior of binary circuits qualitatively differs from the one in analog circuits. This explains the observed decreased computational performance of binary circuits of high node in-degree. Furthermore, a novel mean-field predictor for computational performance is introduced and shown to accurately predict the numerically obtained results. 1

5 0.59268367 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees

Author: Alexandre Bouchard-côté, Dan Klein, Michael I. Jordan

Abstract: Accurate and efficient inference in evolutionary trees is a central problem in computational biology. While classical treatments have made unrealistic site independence assumptions, ignoring insertions and deletions, realistic approaches require tracking insertions and deletions along the phylogenetic tree—a challenging and unsolved computational problem. We propose a new ancestry resampling procedure for inference in evolutionary trees. We evaluate our method in two problem domains—multiple sequence alignment and reconstruction of ancestral sequences—and show substantial improvement over the current state of the art. 1

6 0.56179601 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response

7 0.54300898 236 nips-2008-The Mondrian Process

8 0.52945185 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets

9 0.51869404 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables

10 0.48789346 235 nips-2008-The Infinite Hierarchical Factor Regression Model

11 0.47925222 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel

12 0.4694345 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks

13 0.46382716 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence

14 0.4624922 204 nips-2008-Self-organization using synaptic plasticity

15 0.46010631 233 nips-2008-The Gaussian Process Density Sampler

16 0.44066784 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models

17 0.43724355 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks

18 0.42930889 134 nips-2008-Mixed Membership Stochastic Blockmodels

19 0.42296711 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets

20 0.42141992 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.267), (6, 0.067), (7, 0.072), (12, 0.025), (28, 0.216), (57, 0.094), (59, 0.035), (63, 0.023), (71, 0.017), (77, 0.05), (78, 0.017), (83, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90022659 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets

Author: Jean-Philippe Pellet, André Elisseeff

Abstract: Causal structure-discovery techniques usually assume that all causes of more than one variable are observed. This is the so-called causal sufficiency assumption. In practice, it is untestable, and often violated. In this paper, we present an efficient causal structure-learning algorithm, suited for causally insufficient data. Similar to algorithms such as IC* and FCI, the proposed approach drops the causal sufficiency assumption and learns a structure that indicates (potential) latent causes for pairs of observed variables. Assuming a constant local density of the data-generating graph, our algorithm makes a quadratic number of conditional-independence tests w.r.t. the number of variables. We show with experiments that our algorithm is comparable to the state-of-the-art FCI algorithm in accuracy, while being several orders of magnitude faster on large problems. We conclude that MBCS* makes a new range of causally insufficient problems computationally tractable. Keywords: Graphical Models, Structure Learning, Causal Inference. 1 Introduction: Task Definition & Related Work The statistical definition of causality pioneered by Pearl (2000) and Spirtes et al. (2001) has shed new light on how to detect causation. Central in this approach is the automated detection of cause-effect relationships using observational (i.e., non-experimental) data. This can be a necessary task, as in many situations, performing randomized controlled experiments to unveil causation can be impossible, unethical, or too costly. When the analysis deals with variables that cannot be manipulated, being able to learn from data collected by observing the running system is the only possibility. It turns out that learning the full causal structure of a set of variables is, in its most general form, impossible. If we suppose that the

same-paper 2 0.81357133 152 nips-2008-Non-stationary dynamic Bayesian networks

Author: Joshua W. Robinson, Alexander J. Hartemink

Abstract: A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process—an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. 1

3 0.69769937 118 nips-2008-Learning Transformational Invariants from Natural Movies

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

4 0.69352347 231 nips-2008-Temporal Dynamics of Cognitive Control

Author: Jeremy Reynolds, Michael C. Mozer

Abstract: Cognitive control refers to the flexible deployment of memory and attention in response to task demands and current goals. Control is often studied experimentally by presenting sequences of stimuli, some demanding a response, and others modulating the stimulus-response mapping. In these tasks, participants must maintain information about the current stimulus-response mapping in working memory. Prominent theories of cognitive control use recurrent neural nets to implement working memory, and optimize memory utilization via reinforcement learning. We present a novel perspective on cognitive control in which working memory representations are intrinsically probabilistic, and control operations that maintain and update working memory are dynamically determined via probabilistic inference. We show that our model provides a parsimonious account of behavioral and neuroimaging data, and suggest that it offers an elegant conceptualization of control in which behavior can be cast as optimal, subject to limitations on learning and the rate of information processing. Moreover, our model provides insight into how task instructions can be directly translated into appropriate behavior and then efficiently refined with subsequent task experience. 1

5 0.69351768 138 nips-2008-Modeling human function learning with Gaussian processes

Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish

Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1

6 0.69214392 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

7 0.69197351 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

8 0.69175845 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

9 0.6897822 4 nips-2008-A Scalable Hierarchical Distributed Language Model

10 0.68955415 104 nips-2008-Improved Moves for Truncated Convex Models

11 0.68903106 31 nips-2008-Bayesian Exponential Family PCA

12 0.68835163 200 nips-2008-Robust Kernel Principal Component Analysis

13 0.68812925 235 nips-2008-The Infinite Hierarchical Factor Regression Model

14 0.68780631 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

15 0.68732297 62 nips-2008-Differentiable Sparse Coding

16 0.68717343 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

17 0.68711841 24 nips-2008-An improved estimator of Variance Explained in the presence of noise

18 0.68695819 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

19 0.68690842 92 nips-2008-Generative versus discriminative training of RBMs for classification of fMRI images

20 0.68680954 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference