jmlr jmlr2010 jmlr2010-64 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods
Reference: text
sentIndex sentText sentNum sentScore
1 Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. [sent-10, score-0.39]
2 Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. [sent-11, score-0.34]
3 But during development, these regulatory networks are evolving over time, with certain conditional dependencies between gene products being created as the organism develops, and others being destroyed. [sent-32, score-0.367]
4 While the transition model of an htERGM allows for nearly unlimited generality in characterizing how the network structure changes with time, it is restricted to functions of temporally adjacent network structures. [sent-56, score-0.611]
5 Additionally, the total number of segments or epochs in the piecewise stationary process is assumed known a priori, thereby limiting application of the method in situations where the number of epochs is not known. [sent-68, score-0.594]
6 Learning the structure (i.e., the set of conditional dependencies) of a Bayesian network is typically expressed using Bayes’ rule, where the posterior probability of a given network structure G is proportional to the product of the likelihood of the data given G and the prior over G. [sent-131, score-0.403]
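As a point of reference, the structure posterior referred to in the preceding sentence has the standard Bayesian form below (a generic sketch in my notation; the paper's specific scoring metric, e.g., a BDe-style marginal likelihood, is not reproduced here):

P(G \mid D) \;=\; \frac{P(D \mid G)\, P(G)}{P(D)} \;\propto\; P(D \mid G)\, P(G).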
7 The prior over networks P(G) can be used to incorporate prior knowledge about the existence of specific edges (Bernard and Hartemink, 2005) or the overall topology of the network. [sent-135, score-0.398]
8 While multiple moves may result in a transition from state x to state x′ (and vice versa), typically there is only a single move for each transition. [sent-187, score-0.466]
9 In such a case, the sums over M and M ′ each only include one term, and the proposal ratio can be split into two terms: one is the ratio of the proposal probabilities for move types and the other is the ratio of selecting a particular state given the current state and the move type. [sent-188, score-0.375]
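A hedged sketch of the factorization just described, in my own notation: if M is the single move type taking state x to x′ and M′ the single move type taking x′ back to x, the Metropolis-Hastings proposal ratio splits as

\frac{q(x \mid x')}{q(x' \mid x)} \;=\; \frac{P(M')}{P(M)} \cdot \frac{P(x \mid x', M')}{P(x' \mid x, M)},

where the first factor is the ratio of move-type proposal probabilities and the second is the ratio of the probabilities of selecting the particular state given the current state and the move type.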
10 Figure 1: An example nsDBN with labeled components (namely, transition times t1 and t2 and edge change sets ∆g1 and ∆g2 ). [sent-203, score-0.482]
11 Note that the network at each epoch is actually a DBN drawn in compact form, where each edge represents a statistical dependence between a node at time t and its parent at the previous time t − 1. [sent-205, score-0.398]
12 We define the transition time ti to be the time at which Gi is replaced by Gi+1 in the data-generation process. [sent-220, score-0.326]
13 We call the period of time between consecutive transition times—during which a single network of conditional dependencies is operative—an epoch. [sent-221, score-0.482]
14 Consider the simplest case where m = 2 and the transition time t1 at which the structure of the nsDBN evolves from network G1 to network G2 is known. [sent-227, score-0.519]
15 If prior knowledge about the problem dictates that the networks will not vary dramatically across adjacent epochs, information about the networks learned in adjacent epochs can be leveraged to increase the accuracy of the network learned in the current epoch. [sent-234, score-0.666]
16 Expanding the simple formulation to multiple epochs, assume there exist m different epochs with transition times T = {t1, . . . , t_{m−1}}. [sent-235, score-0.65]
17 The network Gi+1 prevailing in epoch i + 1 differs from network Gi prevailing in epoch i by a set of edge changes we call ∆gi . [sent-239, score-0.655]
18 We assume the prior over networks can be further split into components describing the initial network G1 and the subsequent edge change sets ∆g1, . . . , ∆g_{m−1}. [sent-264, score-0.389]
19 To encode this prior knowledge, we place a truncated geometric prior with parameter p = 1 − e^{−λs} on the number of changes (si) in each edge change set (∆gi), with the distribution truncated for si > smax. [sent-285, score-0.487]
20 Therefore, a (truncated) geometric prior on each si is essentially equivalent to a (truncated) geometric prior on the sufficient statistic s, the total number of edge changes. [sent-300, score-0.344]
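A minimal Python sketch of this prior, assuming the truncated-geometric form described above with p = 1 − e^{−λs} and truncation at smax; the function names, support convention, and normalization are illustrative assumptions, not the paper's code:

import math

def log_trunc_geom(k, lam, k_max):
    """Log-probability of a geometric prior with p = 1 - exp(-lam), truncated to
    the support {0, 1, ..., k_max}.  Sketch only: the exact support convention
    used in the paper may differ."""
    if not (0 <= k <= k_max):
        return float("-inf")
    p = 1.0 - math.exp(-lam)
    log_unnorm = math.log(p) - lam * k   # log[p * (1-p)^k], since 1 - p = exp(-lam)
    log_z = math.log(sum(p * math.exp(-lam * j) for j in range(k_max + 1)))
    return log_unnorm - log_z

def log_prior_edge_changes(change_set_sizes, lam_s, s_max):
    # The per-set factors multiply, so the prior is essentially a function of the
    # total number of edge changes s = sum(change_set_sizes), as stated above.
    return sum(log_trunc_geom(s_i, lam_s, s_max) for s_i in change_set_sizes)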
21 When the transition times are a priori unknown, we would like to estimate them. [sent-301, score-0.388]
22 We assume that the joint prior over the evolutionary behavior of the network and the locations of transition times can be decomposed into two independent components. [sent-324, score-0.504]
23 Any choice for P(T ) can be made, but for the purposes of this paper, we have no prior knowledge about the transition times; therefore, we assume a uniform prior on T so that all settings of transition times are equally likely for a given value of m. [sent-326, score-0.76]
24 Finally, when neither the transition times nor the number of epochs are known a priori, both the number and times of transitions must be estimated a posteriori. [sent-327, score-0.774]
25 If we believe that transitions occur relatively infrequently (i.e., a transition does not occur at every observation), we can include this knowledge by placing a non-uniform prior on m, for example a (truncated) geometric prior with parameter p = 1 − e^{−λm} on the number of epochs m, with the distribution truncated for m > N. [sent-330, score-0.704]
26 A geometric prior on the number of epochs is equivalent to a geometric prior on epoch lengths (see Appendix A). [sent-331, score-0.527]
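A corresponding Python sketch for the prior on the number of epochs m; the support convention (1 ≤ m ≤ N) and normalization over the truncated range are my assumptions:

import math

def log_prior_num_epochs(m, lam_m, N):
    """Truncated geometric prior on the number of epochs m with p = 1 - exp(-lam_m),
    truncated for m > N because there cannot be more epochs than observations (sketch)."""
    if not (1 <= m <= N):
        return float("-inf")
    p = 1.0 - math.exp(-lam_m)
    log_unnorm = math.log(p) - lam_m * (m - 1)   # log[p * (1-p)^(m-1)]
    log_z = math.log(sum(p * math.exp(-lam_m * (j - 1)) for j in range(1, N + 1)))
    return log_unnorm - log_z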
27 Large values of λm encode the strong prior belief that the structure changes infrequently (i.e., few epochs exist) and large values of λs encode the strong prior belief that the structure changes smoothly (i.e., few edges change at each transition). [sent-335, score-0.452]
28 Fortunately, the likelihood component of the posterior does not change whether we know the transition times a priori or not. [sent-338, score-0.517]
29 Therefore, any uncertainty about the transition times can be incorporated into the evaluation of the prior, leaving the evaluation of the likelihood unchanged. [sent-339, score-0.353]
30 An epoch is defined between adjacent transition times while an interval is defined as the union of consecutive epochs during which a particular parent set is operative (which may include all epochs). [sent-348, score-0.869]
31 In particular, when the transition times are not known a priori, the posterior is highly multimodal because structures with slightly different transition times likely have similar posterior probabilities. [sent-374, score-0.964]
32 To provide quick convergence, we want to ensure that every move in the move set efficiently jumps between posterior modes. [sent-377, score-0.381]
33 The different settings will be abbreviated according to the type of uncertainty: whether the number of transitions is known (KN) or unknown (UN) and whether the transition times themselves are known (KT) or unknown (UT). [sent-381, score-0.413]
34 Known Number and Known Times of Transitions (KNKT): In the KNKT setting, we know all of the transition times a priori; therefore, we only need to identify the most likely initial network G1 and sets of edge changes ∆g1, . . . , ∆g_{m−1}. [sent-384, score-0.624]
35 In the KNKT setting, the transition times t j are known a priori, but they must be estimated in the two other settings. [sent-399, score-0.353]
36 In the KNUT setting, the number of epochs m is still known, but the transition times themselves are not. [sent-400, score-0.65]
37 Finally, in the UNUT setting, even the number of epochs is unknown; instead a truncated geometric prior is placed on m. [sent-401, score-0.356]
38 Structures with the same edge sets but slightly different transition times will probably have similar likelihoods. [sent-409, score-0.482]
39 Therefore, we can add a new move that proposes a local shift to one of the transition times: let d be some small positive integer and let the new time ti′ be drawn from a discrete uniform distribution ti′ ∼ DU(ti − d,ti + d) with the constraint that ti−1 < ti′ < ti+1 . [sent-410, score-0.415]
40 Initially, we set the m − 1 transition times so that the epochs are roughly equal in length. [sent-411, score-0.65]
41 This placement allows the transition times ample room to locally shift without “bumping” into each other too early in the sampling procedure. [sent-412, score-0.39]
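A Python sketch of the equal-length initialization and the local shift proposal described above; the discrete-uniform draw, boundary handling, and rounding are illustrative assumptions:

import random

def init_transition_times(m, N):
    # Place the m - 1 transition times so that the m epochs are roughly equal in length.
    return [round(i * N / m) for i in range(1, m)]

def propose_shift(times, i, d, N):
    """Propose t_i' ~ DU(t_i - d, t_i + d), constrained to lie strictly between the
    neighbouring transition times (sketch)."""
    lo = times[i - 1] + 1 if i > 0 else 1
    hi = times[i + 1] - 1 if i + 1 < len(times) else N - 1
    candidates = [t for t in range(times[i] - d, times[i] + d + 1) if lo <= t <= hi]
    new_times = list(times)
    new_times[i] = random.choice(candidates)
    return new_times

# Example: 5 epochs over 1020 observations
times = init_transition_times(5, 1020)          # -> [204, 408, 612, 816]
times = propose_shift(times, 2, d=5, N=1020)    # locally shift the third transition time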
42 As with the last setting, the number of epochs does not change; therefore, only the prior on the number of edge changes s is used. [sent-414, score-0.535]
43 Unknown Number and Unknown Times of Transitions (UNUT): In the most general UNUT setting, both the transition times T and the number of transitions are unknown and must be estimated. [sent-416, score-0.413]
44 Since both the number of edge changes s and the number of epochs m are allowed to vary, we need to incorporate both priors mentioned in Section 3. [sent-421, score-0.512]
45 To allow the number of epochs m to change during sampling, we introduce merge and split operations to the move set. [sent-423, score-0.423]
46 The transition time of the new edge set is selected to be the mean of the previous locations weighted by the size of each edge set: t_i′ = (s_i t_i + s_{i+1} t_{i+1}) / (s_i + s_{i+1}). [sent-425, score-0.547]
47 For the split operation, an edge set ∆gi is randomly chosen and randomly partitioned into two new edge sets ∆g′_i and ∆g′_{i+1}, with all subsequent edge sets re-indexed appropriately. [sent-426, score-0.387]
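A Python sketch of the merge and split operations, using illustrative data structures (parallel lists of transition times and of edge change sets represented as Python sets of (parent, child) pairs); combining merged sets via symmetric difference, and passing in the two new transition times for a split, are my assumptions rather than details taken from the paper:

import random

def merge_change_sets(times, change_sets, i):
    """Merge adjacent edge change sets i and i+1, placing the merged set at the
    size-weighted mean of their transition times, as quoted above."""
    s_i, s_j = len(change_sets[i]), len(change_sets[i + 1])
    t_new = round((s_i * times[i] + s_j * times[i + 1]) / (s_i + s_j))
    new_times = times[:i] + [t_new] + times[i + 2:]
    # Symmetric difference: an edge toggled in both sets cancels out (one plausible convention).
    new_sets = change_sets[:i] + [change_sets[i] ^ change_sets[i + 1]] + change_sets[i + 2:]
    return new_times, new_sets

def split_change_set(times, change_sets, i, t_left, t_right):
    """Randomly partition edge change set i into two new sets; subsequent sets are
    re-indexed implicitly by the list construction.  A single-edge set would leave
    one half empty, which a real sampler would need to guard against."""
    edges = list(change_sets[i])
    random.shuffle(edges)
    cut = random.randint(1, max(1, len(edges) - 1))
    left, right = set(edges[:cut]), set(edges[cut:])
    new_times = times[:i] + [t_left, t_right] + times[i + 1:]
    new_sets = change_sets[:i] + [left, right] + change_sets[i + 1:]
    return new_times, new_sets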
48 The move set is completed with the inclusion of the add transition time and delete transition time operations. [sent-428, score-0.737]
49 For example, if either move M1 or move M2 is randomly selected, the sampling procedure is as follows: random variables xi and xj are selected; if the edge xi → xj exists in G1, it is deleted; otherwise, it is added (subject to the constraint of at most pmax parents). [sent-433, score-0.468]
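A Python sketch of this edge add/delete move on the initial network G1; representing the network as a dict mapping each node to its parent set is an assumption made for illustration:

import random

def toggle_edge(parents, nodes, p_max):
    """Randomly pick an ordered pair (x_i, x_j); delete the edge x_i -> x_j if it
    exists, otherwise add it, subject to at most p_max parents per node (sketch)."""
    xi, xj = random.choice(nodes), random.choice(nodes)
    pa = parents.setdefault(xj, set())
    if xi in pa:
        pa.remove(xi)
    elif len(pa) < p_max:
        pa.add(xi)
    # else: x_j already has p_max parents, so the proposed addition is not made
    return parents

# Example usage with a 4-node network
nodes = ["x1", "x2", "x3", "x4"]
G1 = {"x2": {"x1"}, "x3": {"x1", "x2"}}
G1 = toggle_edge(G1, nodes, p_max=3)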
50 E1 is the total number of edges in G1 , pmax is the maximum parent set size, smax is the maximum number of edge changes allowed in a single transition time, and si is the number of edge changes in the set ∆gi . [sent-445, score-1.031]
51 The epochs in which the various networks are active are shown in the horizontal bars, roughly to scale. [sent-462, score-0.406]
52 The horizontal bars represent the segmentation of the 1020 observations, with the transition times labeled below. [sent-463, score-0.353]
53 The learned networks and most likely transition times are highly accurate (only two edges are missing in G1, and all predicted transition times are close to the truth). [sent-466, score-0.947]
54 To obtain a consensus (model-averaged) structure prediction, an edge is considered present at a particular time if the posterior probability of the edge is greater than 0.5. [sent-473, score-0.433]
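A Python sketch of the consensus-structure computation: estimate each edge's posterior probability at a fixed time point from the sampled structures and keep those above a threshold (0.5 is assumed here):

from collections import Counter

def consensus_edges(sampled_edge_sets, threshold=0.5):
    """sampled_edge_sets: one set of (parent, child) edges per posterior sample,
    all evaluated at the same time point (sketch)."""
    counts = Counter(e for edges in sampled_edge_sets for e in edges)
    n = len(sampled_edge_sets)
    return {e for e, c in counts.items() if c / n > threshold}

# Example: an edge present in 3 of 4 samples is kept, one present in 2 of 4 is not
samples = [{("x1", "x2"), ("x2", "x3")}, {("x1", "x2")}, {("x1", "x2")}, {("x2", "x3")}]
print(consensus_edges(samples))   # {('x1', 'x2')}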
55 KNUT Setting: In this setting, transition times are unknown and must be estimated a posteriori. [sent-481, score-0.353]
56 Figure 4: Posterior probability of transition times when learning an nsDBN in the KNUT setting. [sent-484, score-0.353]
57 The blue triangles on the baseline represent the true transition times and the red dots represent one standard deviation from the mean probability, which is drawn as a black line. [sent-485, score-0.353]
58 The highly probable transition times correspond closely with the true transition times. [sent-487, score-0.642]
59 The estimated structure and transition times are very close to the truth. [sent-491, score-0.399]
60 All edges are correct, with the exception of two missing edges in G1 , and the predicted transition times are all within 10 of the true transition times. [sent-492, score-0.853]
61 We can also examine the posterior probabilities of transition times over all sampled structures. [sent-493, score-0.519]
62 The blue triangles on the baseline represent the true transition times, and spikes represent transition times that frequently occurred in the sampled structures. [sent-495, score-0.642]
63 While the highest probability regions do occur near the true transition times, some uncertainty exists about the exact locations of t3 and t4 since the fourth epoch is exceedingly short. [sent-496, score-0.401]
64 To obtain more accurate measures of the posterior quantities of interest (such as the locations of transition times), we generate samples from multiple chains; we use 25 in the KNUT setting. [sent-497, score-0.418]
65 Combining the samples from several chains allows us to estimate both the probability of a transition occurring at a certain time and the variance of that estimate. [sent-498, score-0.327]
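A Python/NumPy sketch of this chain-combination step; the array shapes and the per-chain frequency estimator are assumptions made for illustration:

import numpy as np

def transition_probability_per_time(chains, N):
    """chains: a list over chains, each a list over samples of that sample's
    transition-time list.  Returns, for every time point, the mean and standard
    deviation across chains of the within-chain frequency of a transition (sketch)."""
    per_chain = []
    for samples in chains:
        freq = np.zeros(N)
        for times in samples:
            for t in times:
                freq[t] += 1
        per_chain.append(freq / len(samples))
    per_chain = np.stack(per_chain)            # shape: (num_chains, N)
    return per_chain.mean(axis=0), per_chain.std(axis=0)

# Example: 2 chains of 3 samples each on a 10-point series
chains = [[[3, 7], [3, 6], [3, 7]], [[3, 7], [4, 7], [3, 7]]]
mean_p, std_p = transition_probability_per_time(chains, N=10)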
66 This unexpected result implies that the posterior over transition times is rather smooth; therefore, the mixing rate is not greatly affected when sampling transition times. [sent-501, score-0.808]
67 Figure 5 shows the posterior probabilities of transition times for various settings of λs and λm . [sent-509, score-0.519]
68 Essentially, when λm is large, only the few transition times that best characterize the non-stationary behavior of the data will be identified. [sent-511, score-0.353]
69 On the other hand, when λm is very small, noise within the data begins to be identified as transitions, leading to poor estimates of the transition times. [sent-512, score-0.578]
70 A smaller λm results in more predicted epochs and less confidence about the most probable value of m. [sent-515, score-0.35]
71 Larger Simulated Data Set: To evaluate the scalability of our technique in the most difficult UNUT setting, we also simulate data from a 100-variable network with an average of 50 edges over five epochs spanning 4800 observations, with one to three edges changing between adjacent epochs. [sent-524, score-0.547]
72 The posterior probabilities of transition times and the number of epochs (corresponding to Figures 5 and 6) for one of the simulated data sets are shown in Figure 8. [sent-527, score-0.875]
73 The number of epochs with the highest posterior probability is five for all choices of priors, which is exactly the true number of epochs for this data set. [sent-529, score-0.723]
74 Figure 5: The posterior probabilities of transition times from the sampled structures in the UNUT setting for various values of λs and λm. [sent-531, score-0.519]
75 As in Figure 4, the blue triangles on the baseline represent the true transition times and the red dots represent one standard deviation from the mean probability obtained from several runs, which is drawn as a black line. [sent-532, score-0.353]
76 Figure 6 (UNUT setting): The posterior probabilities of the number of epochs for various values of λs and λm. [sent-536, score-0.463]
77 The posterior estimates on the number of epochs m are closest to the true value of 7 when λm is 1 or 2. [sent-539, score-0.426]
78 Figure 8: The posterior probabilities of transition times and number of epochs from one of the larger (100 variables, 5 epochs, and 4800 observations) simulated data sets under the UNUT setting for various values of λs and λm. [sent-576, score-0.875]
79 Top: The blue triangles on the baseline represent the true transition times and the red dots represent one standard deviation from the mean probability obtained from several runs, which is drawn as a black line. [sent-578, score-0.353]
80 The nsDBN in Figure 10C was learned using the KNKT setting with transition times defined at the borders between the embryonic, larval, pupal, and adult stages. [sent-604, score-0.398]
81 If we transition to the UNUT setting, we can also examine the posterior probabilities of transition times and epochs. [sent-628, score-0.808]
82 We generate data from an nsDBN with 66 observations and transition times at 30, 40, and 58 to mirror the number of observations in embryonic, larval, pupal, and adult stages of the experimental fly data. [sent-634, score-0.398]
83 Figure 11: Learning nsDBN structure in the UNUT setting using the Drosophila muscle development data. [sent-648, score-0.484]
84 A: Posterior probabilities of transition times using λm = λs = 2. [sent-649, score-0.39]
85 Figure 12: An nsDBN was learned on simulated data that mimicked the number of nodes, connectivity, and transition behavior of the experimental fly data. [sent-655, score-0.348]
86 As discussed earlier, to obtain posterior estimates of quantities of interest, such as the number of epochs or transition times, we generate many samples from several chains; averaging over chains provides a more efficient exploration of the sample space. [sent-660, score-0.753]
87 Figure 13: Posterior results of learning nsDBNs under the UNUT setting for two birds presented with two different stimuli (white noise and song). [sent-712, score-0.389]
88 The estimated number of epochs is three or four, with strong support for the value four when the bird is presented with a song. [sent-716, score-0.343]
89 The posterior transition time probabilities and the posterior number of epochs for two birds presented with two different stimuli under the UNUT setting (λs = λm = 2) can be seen in Figure 13. [sent-717, score-0.973]
90 An additional transition is predicted 300–400 ms after the onset of a song, but no such transition is predicted after the onset of white noise. [sent-720, score-0.455]
91 For example, recording just the transition times and number of epochs adds only a few seconds to the runtime on the larger simulated data set. [sent-746, score-0.745]
92 We have demonstrated the feasibility of learning an nsDBN in all three settings using simulated data sets with various numbers of transition times, observations, variables, and epochs, and various connection densities. [sent-755, score-0.348]
93 The predicted transition times and number of epochs also correspond to the known times of large developmental changes. [sent-758, score-0.767]
94 Although each connection on the predicted Drosophila muscle development network is difficult to verify, simulated experiments of a similar scale demonstrate highly accurate predictions, even with moderately noisy data and one replicate. [sent-759, score-0.345]
95 While we decided to place a (truncated) geometric prior on the number of epochs m, other priors may be considered. [sent-789, score-0.392]
96 Here, we prove that a geometrically distributed prior on epoch lengths is equivalent to a geometrically distributed prior on the number of epochs. [sent-794, score-0.328]
97 Letting l_i be the length of epoch i and N be the total number of observations, we can write a geometrically distributed prior on epoch lengths as: \prod_{i=1}^{m} (1-p)^{l_i - 1}\, p = p^m \prod_{i=1}^{m} (1-p)^{l_i - 1} = p^m (1-p)^{\sum_{i=1}^{m} l_i - m} = p^m (1-p)^{N-m} = \left( \frac{p}{1-p} \right)^{m} (1-p)^{N}. [sent-795, score-0.4]
98 Since N is the same for every nsDBN, we see that the geometrically distributed prior is simply a function of the number of epochs m. [sent-796, score-0.405]
99 Therefore, a geometrically distributed prior on epoch lengths with success probability p = (1−q)/(2−q) < 1/2 is equivalent to a geometrically distributed prior on the number of epochs with success probability q. [sent-799, score-0.625]
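To make the claimed relationship explicit (my notation): dropping the constant factor (1 − p)^N, the induced prior on m is proportional to (p/(1 − p))^m, while a geometric prior on m with success probability q is proportional to (1 − q)^m, so matching the two ratios gives

\frac{p}{1-p} \;=\; 1-q \quad\Longleftrightarrow\quad p \;=\; \frac{1-q}{2-q},

which is less than 1/2 for any q in (0, 1).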
100 Informative structure priors: Joint learning of dynamic regulatory networks from multiple types of data. [sent-823, score-0.33]
wordName wordTfidf (topN-words)
[('nsdbn', 0.307), ('epochs', 0.297), ('transition', 0.289), ('unut', 0.258), ('gm', 0.171), ('artemink', 0.169), ('drosophila', 0.169), ('knut', 0.159), ('tationary', 0.159), ('knkt', 0.149), ('robinson', 0.134), ('posterior', 0.129), ('edge', 0.129), ('move', 0.126), ('bde', 0.119), ('epoch', 0.112), ('networks', 0.109), ('muscle', 0.106), ('ih', 0.105), ('bayesian', 0.104), ('etworks', 0.099), ('dynamic', 0.098), ('gi', 0.098), ('si', 0.097), ('smax', 0.093), ('network', 0.092), ('nsdbns', 0.089), ('guo', 0.085), ('edges', 0.079), ('npmax', 0.079), ('pupal', 0.079), ('regulatory', 0.077), ('jk', 0.075), ('dbns', 0.074), ('parent', 0.065), ('times', 0.064), ('transitions', 0.06), ('larval', 0.059), ('pae', 0.059), ('simulated', 0.059), ('ow', 0.059), ('prior', 0.059), ('dependencies', 0.057), ('dbn', 0.056), ('earning', 0.054), ('predicted', 0.053), ('additionally', 0.052), ('moves', 0.051), ('pde', 0.051), ('changes', 0.05), ('embryonic', 0.05), ('pmax', 0.05), ('stimuli', 0.05), ('sampler', 0.05), ('geometrically', 0.049), ('metric', 0.048), ('structure', 0.046), ('bird', 0.046), ('evolving', 0.045), ('adult', 0.045), ('mcmc', 0.044), ('ni', 0.044), ('conditional', 0.044), ('proposal', 0.043), ('birds', 0.042), ('operative', 0.042), ('replicates', 0.042), ('traf', 0.042), ('sep', 0.042), ('temporally', 0.042), ('genes', 0.042), ('giudici', 0.04), ('songbird', 0.04), ('talih', 0.04), ('tergm', 0.04), ('thrown', 0.04), ('undirected', 0.039), ('directed', 0.039), ('zhao', 0.039), ('hartemink', 0.038), ('onset', 0.038), ('chains', 0.038), ('song', 0.037), ('ti', 0.037), ('sampling', 0.037), ('probabilities', 0.037), ('smith', 0.036), ('priors', 0.036), ('seconds', 0.036), ('priori', 0.035), ('gene', 0.035), ('development', 0.035), ('pd', 0.034), ('pag', 0.034), ('prevailing', 0.034), ('pdg', 0.034), ('pm', 0.034), ('delete', 0.033), ('jan', 0.032), ('decomposable', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods
2 0.14844665 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
Author: Yu Fan, Jing Xu, Christian R. Shelton
Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing
3 0.13985273 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman
Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation
4 0.073473595 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
Author: James Henderson, Ivan Titov
Abstract: We propose a class of Bayesian networks appropriate for structured prediction problems where the Bayesian network’s model structure is a function of the predicted output structure. These incremental sigmoid belief networks (ISBNs) make decoding possible because inference with partial output structures does not require summing over the unboundedly many compatible model structures, due to their directed edges and incrementally specified model structure. ISBNs are specifically targeted at challenging structured prediction problems such as natural language parsing, where learning the domain’s complex statistical dependencies benefits from large numbers of latent variables. While exact inference in ISBNs with large numbers of latent variables is not tractable, we propose two efficient approximations. First, we demonstrate that a previous neural network parsing model can be viewed as a coarse mean-field approximation to inference with ISBNs. We then derive a more accurate but still tractable variational approximation, which proves effective in artificial experiments. We compare the effectiveness of these models on a benchmark natural language parsing task, where they achieve accuracy competitive with the state-of-the-art. The model which is a closer approximation to an ISBN has better parsing accuracy, suggesting that ISBNs are an appropriate abstract model of natural language grammar learning. Keywords: Bayesian networks, dynamic Bayesian networks, grammar learning, natural language parsing, neural networks
5 0.067150004 10 jmlr-2010-An Exponential Model for Infinite Rankings
Author: Marina Meilă, Le Bao
Abstract: This paper presents a statistical model for expressing preferences through rankings, when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not even examine the whole set of items, or know how many they are. Similarly, a user presented with the ranked output of a search engine, will only consider the highest ranked items. We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. We give algorithms to estimate this model from data, and demonstrate that it has sufficient statistics, being thus an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. Then, we extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant and practical. Keywords: permutations, partial orderings, Mallows model, distance based ranking model, exponential family, non-parametric clustering, branch-and-bound
6 0.065968849 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
7 0.065654188 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
8 0.064306214 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
9 0.056869883 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
10 0.056687884 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
11 0.055994809 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models
12 0.054588456 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning
13 0.053581756 63 jmlr-2010-Learning Instance-Specific Predictive Models
14 0.050510924 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars
15 0.050305616 44 jmlr-2010-Graph Kernels
16 0.050270882 69 jmlr-2010-Lp-Nested Symmetric Distributions
17 0.045874886 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?
19 0.041617408 23 jmlr-2010-Classification with Incomplete Data Using Dirichlet Process Priors
20 0.040878482 71 jmlr-2010-Matched Gene Selection and Committee Classifier for Molecular Classification of Heterogeneous Diseases
topicId topicWeight
[(0, -0.218), (1, 0.098), (2, -0.146), (3, -0.208), (4, 0.025), (5, 0.069), (6, 0.002), (7, -0.008), (8, 0.012), (9, -0.034), (10, -0.004), (11, 0.095), (12, 0.017), (13, -0.098), (14, -0.037), (15, -0.055), (16, 0.003), (17, 0.097), (18, 0.018), (19, -0.006), (20, 0.026), (21, -0.066), (22, -0.047), (23, 0.002), (24, -0.135), (25, -0.129), (26, 0.008), (27, -0.025), (28, 0.025), (29, -0.084), (30, 0.01), (31, 0.02), (32, -0.054), (33, -0.161), (34, 0.01), (35, 0.005), (36, 0.118), (37, 0.007), (38, 0.001), (39, 0.01), (40, 0.155), (41, 0.03), (42, -0.035), (43, -0.082), (44, -0.088), (45, -0.086), (46, 0.117), (47, -0.021), (48, 0.196), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.94339472 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods
2 0.55519682 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman
Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation
3 0.54087448 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
Author: Yu Fan, Jing Xu, Christian R. Shelton
Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing
4 0.42296669 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
Author: James Henderson, Ivan Titov
Abstract: We propose a class of Bayesian networks appropriate for structured prediction problems where the Bayesian network’s model structure is a function of the predicted output structure. These incremental sigmoid belief networks (ISBNs) make decoding possible because inference with partial output structures does not require summing over the unboundedly many compatible model structures, due to their directed edges and incrementally specified model structure. ISBNs are specifically targeted at challenging structured prediction problems such as natural language parsing, where learning the domain’s complex statistical dependencies benefits from large numbers of latent variables. While exact inference in ISBNs with large numbers of latent variables is not tractable, we propose two efficient approximations. First, we demonstrate that a previous neural network parsing model can be viewed as a coarse mean-field approximation to inference with ISBNs. We then derive a more accurate but still tractable variational approximation, which proves effective in artificial experiments. We compare the effectiveness of these models on a benchmark natural language parsing task, where they achieve accuracy competitive with the state-of-the-art. The model which is a closer approximation to an ISBN has better parsing accuracy, suggesting that ISBNs are an appropriate abstract model of natural language grammar learning. Keywords: Bayesian networks, dynamic Bayesian networks, grammar learning, natural language parsing, neural networks
5 0.41136664 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
Author: Ryo Yoshida, Mike West
Abstract: We describe a class of sparse latent factor models, called graphical factor models (GFMs), and relevant sparse learning algorithms for posterior mode estimation. Linear, Gaussian GFMs have sparse, orthogonal factor loadings matrices, that, in addition to sparsity of the implied covariance matrices, also induce conditional independence structures via zeros in the implied precision matrices. We describe the models and their use for robust estimation of sparse latent factor structure and data/signal reconstruction. We develop computational algorithms for model exploration and posterior mode search, addressing the hard combinatorial optimization involved in the search over a huge space of potential sparse configurations. A mean-field variational technique coupled with annealing is developed to successively generate “artificial” posterior distributions that, at the limiting temperature in the annealing schedule, define required posterior modes in the GFM parameter space. Several detailed empirical studies and comparisons to related approaches are discussed, including analyses of handwritten digit image and cancer gene expression data. Keywords: annealing, graphical factor models, variational mean-field method, MAP estimation, sparse factor analysis, gene expression profiling
6 0.40906152 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
7 0.40301594 63 jmlr-2010-Learning Instance-Specific Predictive Models
8 0.40063995 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
9 0.39106759 10 jmlr-2010-An Exponential Model for Infinite Rankings
10 0.37653783 14 jmlr-2010-Approximate Riemannian Conjugate Gradient Learning for Fixed-Form Variational Bayes
11 0.36551398 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
12 0.35404289 19 jmlr-2010-Characterization, Stability and Convergence of Hierarchical Clustering Methods
13 0.34069544 8 jmlr-2010-A Surrogate Modeling and Adaptive Sampling Toolbox for Computer Based Design
14 0.33365563 34 jmlr-2010-Erratum: SGDQN is Less Careful than Expected
15 0.32959855 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
16 0.32344005 71 jmlr-2010-Matched Gene Selection and Committee Classifier for Molecular Classification of Heterogeneous Diseases
17 0.31393373 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
18 0.31143358 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
19 0.28705585 62 jmlr-2010-Learning Gradients: Predictive Models that Infer Geometry and Statistical Dependence
20 0.28289446 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars
topicId topicWeight
[(3, 0.012), (4, 0.013), (8, 0.013), (21, 0.014), (32, 0.03), (36, 0.614), (37, 0.04), (75, 0.116), (85, 0.036), (96, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.87814683 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods
2 0.82895833 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman
Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation
3 0.48758402 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
Author: Yu Fan, Jing Xu, Christian R. Shelton
Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing
4 0.42478934 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small samples sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
5 0.41987312 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning
Author: Evangelos Theodorou, Jonas Buchli, Stefan Schaal
Abstract: With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests to use the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-JacobiBellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. The update equations have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12 degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI2 ) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs. Keywords: stochastic optimal control, reinforcement learning, parameterized policies
6 0.41887796 37 jmlr-2010-Evolving Static Representations for Task Transfer
7 0.41243827 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
9 0.38671643 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning
10 0.38459513 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
11 0.38137895 2 jmlr-2010-A Convergent Online Single Time Scale Actor Critic Algorithm
12 0.3792147 69 jmlr-2010-Lp-Nested Symmetric Distributions
13 0.37529382 63 jmlr-2010-Learning Instance-Specific Predictive Models
14 0.36191431 106 jmlr-2010-Stability Bounds for Stationary φ-mixing and β-mixing Processes
15 0.35895938 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
16 0.35187241 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
18 0.34181195 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models
19 0.34019777 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars
20 0.33563855 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation