nips nips2002 nips2002-7 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Eric P. Xing, Michael I. Jordan, Richard M. Karp, Stuart Russell
Abstract: We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves over previous models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. [sent-6, score-1.398]
2 Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. [sent-7, score-0.187]
3 Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. [sent-8, score-0.367]
4 Our model improves over previous models that ignore biological priors and positional dependence. [sent-10, score-0.094]
5 It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns. [sent-11, score-0.727]
6 1 Introduction The identification of motif structures in biopolymer sequences such as proteins and DNA is an important task in computational biology and is essential in advancing our knowledge about biological systems. [sent-12, score-0.988]
7 For example, the gene regulatory motifs in DNA provide key clues about the regulatory network underlying the complex control and coordination of gene expression in response to physiological or environmental changes in living cells [11]. [sent-13, score-0.548]
8 Most motif models assume independence of position-specific multinomial distributions of monomers such as nucleotides (nt) and amino acids (aa). [sent-15, score-1.004]
9 Such strategies contradict our intuition that the sites in motifs naturally possess spatial dependencies for functional reasons. [sent-16, score-0.356]
10 Furthermore, the vague Dirichlet prior used in some of these models acts as no more than a smoother, taking little account of the rich prior knowledge in biologically identified motifs. [sent-17, score-0.094]
11 The distribution of the monomers is a continuous mixture of position-specific multinomials which admit a Dirichlet prior according to the hidden Markov states, introducing both multi-modal prior information and dependencies. [sent-20, score-0.153]
12 We also propose a framework for decomposing the general motif model into a local alignment model for motif pattern and a global model for motif instance distribution, which allows complex models to be developed in a modular way. [sent-21, score-2.718]
13 To simplify our discussion, we use DNA motif modeling as a running example in this paper, though it should be clear that the model is applicable to other sequence modeling problems. [sent-22, score-0.929]
14 2 Preliminaries DNA motifs are short (about 6-30 bp) stochastic string patterns (Figure 1) in the regulatory sequences of genes that facilitate control functions by interacting with specific transcriptional regulatory proteins. [sent-23, score-0.599]
15 Each motif typically appears once or multiple times in the control regions of a small set of genes. [sent-24, score-0.841]
16 The goal of motif detection is to identify instances of possible motifs hidden in sequences and learn a model for each motif for future prediction. [sent-27, score-2.168]
17 A regulatory DNA sequence can be fully specified by a character string over the alphabet {A, T, C, G} and an indicator string that signals the locations of the motif occurrences. [sent-28, score-1.035]
18 A motif is called a stochastic string pattern rather than a word because of the variability in the “spellings” of different instances of the same motif in the genome. [sent-29, score-1.748]
19 Conventionally, biologists display a motif pattern (of a given length) by a multi-alignment of all its instances. [sent-30, score-0.862]
20 The stochasticity of motif patterns is reflected in the heterogeneity of nucleotide species appearing in each column (corresponding to a position or site in the motif) of the multi-alignment. [sent-31, score-0.988]
21 We denote the multi-alignment of all instances of a motif, as specified by the indicator string in a sequence, by a dedicated alignment variable. [sent-32, score-0.962]
22 Since any such alignment can be characterized by the nucleotide counts in each column, we define a counting matrix in which each column is an integer vector with four elements, giving the number of occurrences of each nucleotide at the corresponding position of the motif. [sent-33, score-0.186]
23 With these settings, one can model the nt-distribution at each position of the motif by a position-specific multinomial distribution. [sent-35, score-0.981]
24 Formally, the problem of inferring the motif locations and the position-specific multinomial parameters (often called a position-weight matrix, or PWM), given a sequence set, is motif detection in a nutshell (see Footnote 1). [sent-36, score-0.916]
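For concreteness, here is a minimal sketch (our own illustration, not code from the paper) of how a counting matrix and the corresponding PWM could be built from a toy multi-alignment; the nucleotide coding and the pseudocount smoothing are our assumptions.

```python
import numpy as np

NT = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def counting_matrix(aligned_instances):
    """4 x L matrix of nucleotide counts per column of a motif multi-alignment."""
    L = len(aligned_instances[0])
    counts = np.zeros((4, L), dtype=int)
    for seq in aligned_instances:
        for pos, base in enumerate(seq):
            counts[NT[base], pos] += 1
    return counts

def pwm(counts, pseudocount=1.0):
    """Position-weight matrix: per-column multinomial estimates,
    smoothed with a vague (symmetric Dirichlet) pseudocount."""
    smoothed = counts + pseudocount
    return smoothed / smoothed.sum(axis=0, keepdims=True)

# toy multi-alignment of one motif's instances
instances = ["TGACTC", "TGACTA", "TGACTC", "TTACTC"]
theta = pwm(counting_matrix(instances))
```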
25 The horizontal axis indexes position and the vertical axis represents the information content of the multinomial distribution of nt at each position. [sent-45, score-0.194]
26 Figure 2: (Left) A general motif model is a Bayesian multinet. [sent-47, score-0.861]
27 (Right) The HMDM model for motif instances specified by a given indicator string. [sent-49, score-0.903]
28 Footnote 1: Multiple motif detection can be formulated in a similar way, but for simplicity, we omit this elaboration. [sent-51, score-0.931]
29 3 Generative models for regulatory DNA sequences [sent-54, score-0.129]
3.1 General setting and related work. Without loss of generality, assume that the occurrences of motifs in a DNA sequence, as indicated by the indicator string, are governed by a global distribution; for each type of motif, the nucleotide sequence pattern shared by all its instances admits a local alignment model. [sent-55, score-0.612]
(Usually, the background non-motif sequences are modeled by a simple conditional model in which the background nt-distribution is assumed to be learned a priori from the entire sequence and supplied as constants in the motif detection process.) [sent-56, score-0.988]
Thus, the likelihood of a regulatory sequence $\mathbf{y}$ is $p(\mathbf{y}) = \sum_{\mathbf{x}} p(\mathbf{x} \mid \theta_g)\, p(\mathbf{y} \mid \mathbf{x}, \theta_l, \theta_{bg})$, summing over the latent indicator string $\mathbf{x}$. [sent-58, score-0.129]
Note that the parameter here is not necessarily equivalent to the position-specific multinomial parameters in Eq. [sent-74, score-0.087]
34 2 below, but is a generic symbol for the parameters of a general model of aligned motif instances. [sent-75, score-0.861]
35 The model captures properties such as the frequencies of different motifs and the dependencies between motif occurrences. [sent-76, score-1.217]
36 Although specifying this model is an important aspect of motif detection and remains largely unexplored, we defer this issue to future work. [sent-77, score-0.898]
37 In the current paper, our focus is on capturing the intrinsic properties within motifs that can help to improve sensitivity and specificity to genuine motif patterns. [sent-78, score-1.207]
For this, the key lies in the local alignment model, which determines the PWM of the motif. [sent-79, score-0.1]
Depending on the value of the latent indicator (motif or not at a given position), the corresponding sequence position admits different probabilistic models, such as a motif alignment model or a background model. [sent-80, score-1.872]
Thus a sequence is characterized by a Bayesian multinet [6], a mixture model in which each component of the mixture is a specific nt-distribution model corresponding to sequences of a particular nature. [sent-81, score-0.169]
41 Our goal in this paper is to develop an expressive local alignment model capable of capturing characteristic site-dependencies in motifs. [sent-82, score-0.137]
42 £ 54 3 ¥¨ ¨ ¡ ¨ £ # (2) G "DA BE FEC 3 B ¦ ¦09 A 0@93 ¡ ¡ £ h 1¥# 98 Y8 X Although a popular model for many motif finders, PM nevertheless is sensitive to noise and random or trivial recurrent patterns, and is unable to capture potential site-dependencies inside the motifs. [sent-89, score-0.861]
Extensions of the PM model (e.g., splitting a ’two-block’ motif into two coupled sub-motifs [9, 1]) have been developed to handle special patterns such as the U-shaped motifs, but they are inflexible and difficult to generalize. [sent-94, score-0.876]
We depart from the PM model and introduce a dynamic hierarchical Bayesian model for motif alignment, which captures site dependencies inside the motif so that we can predict biologically more plausible motifs and incorporate prior knowledge of nucleotide frequencies of general motif sites. [sent-96, score-2.8]
In order to keep the local alignment model our main focus and to simplify the presentation, we adopt an idealized global motif distribution model called “one-per-sequence” [8], which, as the name suggests, assumes each sequence harbors one motif instance (at an unknown location). [sent-97, score-1.911]
3.2 Hidden Markov Dirichlet-Multinomial (HMDM) Model. In the HMDM model, we assume that there are underlying latent nt-distribution prototypes, according to which position-specific multinomial distributions of nt are determined, and that each prototype is represented by a Dirichlet distribution. [sent-100, score-0.168]
47 Furthermore, the choice of prototype at each position in the motif is governed by a first-order Markov process. [sent-101, score-0.903]
48 More precisely, a multi-alignment containing motif instances is generated by the following process. [sent-102, score-0.883]
(1) First, we sample a sequence of prototype indicators from a first-order Markov process with a given initial distribution and transition matrix. [sent-103, score-0.109]
(2) A multinomial distribution is sampled according to the probability density defined by the chosen Dirichlet component over all such distributions. [sent-106, score-0.103]
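The generative process just described is easy to spell out in code. The sketch below is our own illustration under assumed dimensions (M Dirichlet prototypes, motif length L, N instances); the final step of drawing nucleotides from the sampled multinomials is made explicit even though the prose above stops at step (2), and all function and argument names are hypothetical.

```python
import numpy as np

def sample_hmdm_alignment(pi, A, alphas, L, N, rng=np.random.default_rng(0)):
    """Generate a motif multi-alignment under the HMDM generative process.

    pi     : (M,)   initial distribution over Dirichlet prototypes
    A      : (M, M) prototype transition matrix (first-order Markov)
    alphas : (M, 4) Dirichlet parameters of each prototype
    L, N   : motif length and number of motif instances
    """
    M = len(pi)
    # (1) sample a sequence of prototype indicators from the Markov chain
    q = np.zeros(L, dtype=int)
    q[0] = rng.choice(M, p=pi)
    for l in range(1, L):
        q[l] = rng.choice(M, p=A[q[l - 1]])
    # (2) sample a position-specific multinomial from the chosen Dirichlet component
    theta = np.array([rng.dirichlet(alphas[q[l]]) for l in range(L)])   # (L, 4)
    # (3) sample N motif instances, one nucleotide per position drawn from theta[l]
    instances = np.array([[rng.choice(4, p=theta[l]) for l in range(L)]
                          for _ in range(N)])
    return q, theta, instances
```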
The complete likelihood of a motif alignment characterized by counting matrix $\mathbf{n}$ is: $p(\mathbf{n}, \theta, \mathbf{q}) = \Big[ p(q_1) \prod_{l=2}^{L} p(q_l \mid q_{l-1}) \Big] \prod_{l=1}^{L} p(\theta_l \mid \alpha_{q_l}) \prod_{l=1}^{L} \prod_{k} \theta_{l,k}^{\,n_{l,k}}$. [sent-111, score-0.954]
In such a model the transition would be between the emission models (i.e., the multinomial distributions themselves). [sent-114, score-0.09]
53 In HMDM, the transitions are between different priors of the emission models, and the direct output of the HMM is the parameter vector of a generative model, which will be sampled multiple times at each position to generate random instances. [sent-117, score-0.096]
For example, in the case of motifs, biological evidence shows that conserved positions (manifested by a low-entropy multinomial nt-distribution) are likely to concatenate, and perhaps the less conserved positions do as well. [sent-119, score-0.246]
55 However, it is unlikely that conserved and less conserved positions are interpolated [4]. [sent-120, score-0.131]
4.1 Variational Bayesian Learning. In order to do Bayesian estimation of the motif parameters, and to predict the locations of motif instances, we need to be able to compute the posterior distribution, which is infeasible in a complex motif model. [sent-123, score-2.607]
We seek to approximate the joint posterior over parameters and hidden states with a simpler factorized distribution, whose factors can be, for the time being, thought of as free distributions to be optimized. [sent-125, score-0.084]
Thus, maximizing the lower bound of the log likelihood with respect to the free distributions is equivalent to minimizing the KL divergence between the true joint posterior and its variational approximation. [sent-142, score-0.086]
In our motif model, the prior and the conditional submodels form a conjugate-exponential pair (Dirichlet-Multinomial). [sent-147, score-0.883]
It can be shown that in this case we can essentially recover the same form of the original conditional and prior distributions in their variational approximations, except that the parameterization is augmented with appropriate Bayesian and posterior updates, respectively (Eqs. 7 and 8). [sent-148, score-0.107]
As Eqs. 7 and 8 make clear, the locality of inference and marginalization on the latent variables is preserved in the variational approximation, which means probabilistic calculations can be performed in the prior and the conditional models separately and iteratively. [sent-158, score-0.132]
62 For motif modeling, this modular property means that the motif alignment model and motif distribution model can be treated separately with a simple interface of the posterior mean for the motif parameters and expected sufficient statistics for the motif instances. [sent-159, score-4.388]
We next compute the expectation of the natural parameters (which, for multinomial parameters, is the expected logarithm of the multinomial probabilities). [sent-168, score-0.087]
Given the posterior means of the multinomial parameters, computing the expected counting matrix under the one-per-sequence global model for each sequence is straightforward. [sent-179, score-0.225]
Variational M step: compute the expected natural parameters via inference in the local motif alignment model, given the expected counting matrices. [sent-183, score-0.941]
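To make the E-step/M-step interplay concrete, here is a simplified sketch of the variational EM loop under the one-per-sequence assumption. It is our own illustration: it uses a single fixed Dirichlet prior per motif position in place of the full HMM over Dirichlet components, a uniform prior over motif start positions, and an assumed nucleotide coding (0-3); the function and argument names are hypothetical.

```python
import numpy as np
from scipy.special import digamma

def variational_em(sequences, L, alpha_prior, bg_logp, n_iter=50):
    """Simplified variational EM for one motif under the one-per-sequence model.

    sequences   : list of 1-D int arrays (nucleotides coded 0..3)
    L           : motif length
    alpha_prior : (L, 4) Dirichlet prior per motif position (a single fixed
                  prior here; the full HMDM mixes M priors via an HMM)
    bg_logp     : (4,) log background nucleotide probabilities
    """
    alpha_post = alpha_prior.copy()
    for _ in range(n_iter):
        # expected natural parameters of the position-specific multinomials: E[log theta]
        e_log_theta = digamma(alpha_post) - digamma(alpha_post.sum(axis=1, keepdims=True))

        # variational E step: posterior over the motif start in each sequence,
        # then the expected counting matrix under the one-per-sequence model
        exp_counts = np.zeros((L, 4))
        for y in sequences:
            n_starts = len(y) - L + 1
            log_post = np.array([
                sum(e_log_theta[l, y[s + l]] - bg_logp[y[s + l]] for l in range(L))
                for s in range(n_starts)])
            post = np.exp(log_post - log_post.max())
            post /= post.sum()
            for s, w in enumerate(post):
                for l in range(L):
                    exp_counts[l, y[s + l]] += w

        # variational M step: conjugate Dirichlet update with the expected counts
        alpha_post = alpha_prior + exp_counts
    return alpha_post
```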
66 For example, the motif distribution model can be made more sophisticated so as to model complex properties of multiple motifs such as motif-level dependencies (e. [sent-185, score-1.253]
67 , co-occurrence, overlaps and concentration within regulatory modules) without complicating the inference in the local alignment model. [sent-187, score-0.207]
68 Similarly, the motif alignment model can also be more expressive (e. [sent-188, score-0.947]
69 , a mixture of HMDMs) without interfering with inference in the motif distribution model. [sent-190, score-0.913]
70 5 Experiments We test the HMDM model on a motif collection from The Promoter Database of Saccharomyces cerevisiae (SCPD). [sent-192, score-0.861]
The posterior distribution of the position-specific multinomial parameters, reflected in the parameters of the Dirichlet mixtures learned from data, can reveal the nt-distribution patterns of the motifs. [sent-195, score-0.164]
72 (c) Boxplots of hit and mishit rate of HMDM(1) and PM(2) on two motifs used during HMDM training. [sent-221, score-0.392]
73 Are the motif properties captured in HMDM useful in motif detection? [sent-222, score-1.682]
74 We first examine an HMDM trained on the complete dataset for its ability to detect motifs used in training in the presence of a “decoy”: a permuted motif. [sent-223, score-0.343]
75 By randomly permuting the positions in the motif, the shapes of the “U-shaped” motifs (e. [sent-224, score-0.341]
Footnote 2: We insert each instance of a motif/decoy pair into a 300-500 bp random background sequence at random positions. [sent-227, score-0.127]
We allow a 3 bp offset as a tolerance window, and score a hit when the position at which a motif instance is found falls within this window around the true motif location (and a mis-hit when it falls within the window around the decoy location). [sent-228, score-0.964]
78 The (mis)hit rate is the proportion of (mis)hits to the total number of motif instances to be found in an experiment. [sent-229, score-0.883]
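The hit/mis-hit bookkeeping is simple enough to illustrate directly; the function name and the exact comparison against the decoy position below are our reading of the protocol, not the authors' code.

```python
def hit_and_mishit_rate(found, true_pos, decoy_pos, tol=3):
    """Hit rate: fraction of predictions within `tol` bp of the planted motif;
    mis-hit rate: fraction within `tol` bp of the planted decoy."""
    n = len(found)
    hits = sum(abs(f - t) <= tol for f, t in zip(found, true_pos))
    mishits = sum(abs(f - d) <= tol for f, d in zip(found, decoy_pos))
    return hits / n, mishits / n
```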
79 Figure 3(c) shows a boxplot of the hit and mishit rate of HMDM on abf1 and gal4 over 50 randomly generated experiments. [sent-230, score-0.093]
Note the dramatic contrast between the sensitivity of the HMDM to true motifs and that of the PM model (which is essentially the MEME model). [sent-231, score-0.355]
81 2 0 0 0 0 0 0 0 1 2 3 4 1 mat−a2 2 3 4 1 mcb 2 3 4 1 2 mig1 3 4 1 crp 2 3 4 1 mat−a2 2 3 4 0. [sent-263, score-0.085]
82 2 0 1 mcb 2 3 4 1 2 mig1 3 4 3 4 crp 1 1 1 1 1 1 1 1 0. [sent-264, score-0.085]
83 2 0 1 2 3 4 0 1 2 3 4 1 2 (a) true motif only (b) true motif + decoy Figure 4: Motif detection on an independent test dataset (the 8 motifs in Figure 1(a)). [sent-296, score-2.06]
84 In the first motif finding task, we are given sequences each of which has only one true motif instance at a random position. [sent-305, score-1.734]
85 In three other cases they are comparable, but for motif mcb, all HMDM models lose. [sent-308, score-0.841]
The second task is more challenging and biologically more realistic: we have both the true motifs and the permuted “decoys.” [sent-314, score-0.364]
87 6 Conclusions We have presented a generative probabilistic framework for modeling motifs in biopolymer sequences. [sent-317, score-0.388]
88 Naively, categorical random variables with spatial/temporal dependencies can be modeled by a standard HMM with multinomial emission models. [sent-318, score-0.18]
89 However, the limited flexibility of each multinomial distribution and the concomitant need for a potentially large number of states to model complex domains may require a large parameter count and lead to overfitting. [sent-319, score-0.123]
Footnote 3: We resisted the temptation to use biological background sequences because we would not know whether and how many other motifs are in such sequences, which renders them ill-suited for purposes of evaluation. [sent-322, score-0.403]
Furthermore, when the output of the HMM involves hidden variables (as in the case of motif detection), inference and learning are further complicated. [sent-324, score-0.906]
92 HMDM assumes that positional dependencies are induced at a higher level among the finite number of informative Dirichlet priors rather than between the multinomials themselves. [sent-325, score-0.121]
93 In motif modeling, such a strategy was used to capture different distribution patterns of nucleotides (homogeneous and heterogeneous) and transition properties between patterns (site clustering). [sent-327, score-0.995]
94 Such a prior proves to be beneficial in searching for unseen motifs in our experiment and helps to distinguish more probable motifs from biologically meaningless random recurrent patterns. [sent-328, score-0.682]
95 This divide and conquer strategy makes it much easier to develop more sophisticated models for various aspects of motif analysis without being overburdened by the somewhat daunting complexity of the full motif problem. [sent-330, score-1.682]
96 Unsupervised learning of multiple motifs in biopolymers using EM. [sent-335, score-0.32]
97 The value of prior knowledge in discovering motifs with MEME. [sent-341, score-0.355]
98 Bioprospector: Discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. [sent-397, score-0.466]
99 Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. [sent-407, score-0.118]
100 Deciphering genetic regulatory codes: A challenge for functional genomics. [sent-414, score-0.091]
wordName wordTfidf (topN-words)
[('motif', 0.841), ('motifs', 0.32), ('hmdm', 0.263), ('regulatory', 0.091), ('multinomial', 0.087), ('dirichlet', 0.085), ('alignment', 0.066), ('pm', 0.058), ('conserved', 0.055), ('biopolymer', 0.053), ('mcb', 0.053), ('nucleotide', 0.053), ('hit', 0.051), ('dna', 0.049), ('variational', 0.047), ('emission', 0.044), ('nucleotides', 0.042), ('instances', 0.042), ('sequence', 0.038), ('sequences', 0.038), ('detection', 0.037), ('dependencies', 0.036), ('inference', 0.036), ('hmm', 0.035), ('patterns', 0.035), ('counting', 0.034), ('mis', 0.033), ('position', 0.033), ('crp', 0.032), ('hmdms', 0.032), ('mat', 0.032), ('pwm', 0.032), ('heterogeneous', 0.031), ('hidden', 0.029), ('prototype', 0.029), ('bayesian', 0.029), ('biological', 0.028), ('positional', 0.027), ('transition', 0.026), ('site', 0.026), ('posterior', 0.026), ('multinomials', 0.025), ('nt', 0.025), ('bp', 0.025), ('liu', 0.025), ('homogeneous', 0.024), ('string', 0.024), ('admits', 0.023), ('permuted', 0.023), ('gene', 0.023), ('bailey', 0.021), ('biologists', 0.021), ('bioprospector', 0.021), ('boxplot', 0.021), ('decoy', 0.021), ('harbors', 0.021), ('meme', 0.021), ('mishit', 0.021), ('modular', 0.021), ('monomer', 0.021), ('monomers', 0.021), ('neuwald', 0.021), ('recurring', 0.021), ('submodels', 0.021), ('uhsq', 0.021), ('ah', 0.021), ('positions', 0.021), ('biologically', 0.021), ('prior', 0.021), ('expressive', 0.02), ('hd', 0.02), ('mixture', 0.02), ('model', 0.02), ('global', 0.02), ('priors', 0.019), ('ql', 0.018), ('markov', 0.018), ('background', 0.017), ('pi', 0.017), ('indicator', 0.017), ('capturing', 0.017), ('vague', 0.017), ('distribution', 0.016), ('prototypes', 0.016), ('sensitivity', 0.015), ('modeling', 0.015), ('latent', 0.014), ('instance', 0.014), ('knowledge', 0.014), ('biology', 0.014), ('informative', 0.014), ('yt', 0.014), ('genuine', 0.014), ('picked', 0.014), ('local', 0.014), ('separately', 0.014), ('boxes', 0.013), ('categorical', 0.013), ('distributions', 0.013), ('characterized', 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
Author: Eric P. Xing, Michael I. Jordan, Richard M. Karp, Stuart Russell
Abstract: We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves over previous models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
2 0.062517822 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
Author: Harald Steck, Tommi S. Jaakkola
Abstract: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter - often interpreted as
3 0.054227073 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
Author: Eleazar Eskin, Jason Weston, William S. Noble, Christina S. Leslie
Abstract: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of -length subsequences, counted with up to mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
4 0.050916832 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables
Author: David Barber
Abstract: The application of latent/hidden variable Dynamic Bayesian Networks is constrained by the complexity of marginalising over latent variables. For this reason either small latent dimensions or Gaussian latent conditional tables linearly dependent on past states are typically considered in order that inference is tractable. We suggest an alternative approach in which the latent variables are modelled using deterministic conditional probability tables. This specialisation has the advantage of tractable inference even for highly complex non-linear/non-Gaussian visible conditional probability tables. This approach enables the consideration of highly complex latent dynamics whilst retaining the benefits of a tractable probabilistic model. 1
5 0.042908154 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
Author: Peter Sykacek, Stephen J. Roberts
Abstract: We propose in this paper a probabilistic approach for adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques. We regard the parameters of the classifier as latent states in a first order Markov process and propose an algorithm which can be regarded as variational generalization of standard Kalman filtering. The variational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of infering competitive classifiers both in stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.
6 0.042540975 25 nips-2002-An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition
7 0.038480867 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
8 0.03117943 36 nips-2002-Automatic Alignment of Local Representations
9 0.030417392 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition
10 0.030321224 69 nips-2002-Discriminative Learning for Label Sequences via Boosting
11 0.029462151 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior
12 0.029027835 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
13 0.027412765 191 nips-2002-String Kernels, Fisher Kernels and Finite State Automata
14 0.026669355 98 nips-2002-Going Metric: Denoising Pairwise Data
15 0.026052235 113 nips-2002-Information Diffusion Kernels
16 0.02603716 53 nips-2002-Clustering with the Fisher Score
17 0.025941817 93 nips-2002-Forward-Decoding Kernel-Based Phone Recognition
18 0.025339542 10 nips-2002-A Model for Learning Variance Components of Natural Images
19 0.025182473 163 nips-2002-Prediction and Semantic Association
20 0.024219502 140 nips-2002-Margin Analysis of the LVQ Algorithm
topicId topicWeight
[(0, -0.085), (1, -0.014), (2, -0.01), (3, 0.015), (4, -0.04), (5, 0.047), (6, -0.035), (7, 0.013), (8, 0.02), (9, -0.017), (10, 0.024), (11, -0.003), (12, -0.0), (13, 0.031), (14, -0.108), (15, -0.024), (16, -0.018), (17, 0.058), (18, 0.022), (19, 0.016), (20, 0.039), (21, -0.075), (22, -0.032), (23, 0.004), (24, -0.097), (25, 0.052), (26, -0.016), (27, 0.004), (28, -0.027), (29, -0.042), (30, -0.057), (31, -0.011), (32, -0.026), (33, -0.026), (34, -0.004), (35, -0.016), (36, 0.051), (37, 0.011), (38, -0.068), (39, 0.021), (40, 0.066), (41, -0.023), (42, 0.025), (43, 0.007), (44, -0.037), (45, 0.02), (46, 0.033), (47, 0.101), (48, -0.106), (49, -0.161)]
simIndex simValue paperId paperTitle
same-paper 1 0.88060552 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
Author: Eric P. Xing, Michael I. Jordan, Richard M. Karp, Stuart Russell
Abstract: We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves over previous models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
2 0.56600595 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
Author: Harald Steck, Tommi S. Jaakkola
Abstract: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter - often interpreted as
3 0.49690762 25 nips-2002-An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition
Author: Samy Bengio
Abstract: This paper presents a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same event. It is based on two other Markovian models, namely Asynchronous Input/ Output Hidden Markov Models and Pair Hidden Markov Models. An EM algorithm to train the model is presented, as well as a Viterbi decoder that can be used to obtain the optimal state sequence as well as the alignment between the two sequences. The model has been tested on an audio-visual speech recognition task using the M2VTS database and yielded robust performances under various noise conditions. 1
4 0.44945911 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition
Author: Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda
Abstract: In this paper, we propose a Bayesian framework, which constructs shared-state triphone HMMs based on a variational Bayesian approach, and recognizes speech based on the Bayesian prediction classification; variational Bayesian estimation and clustering for speech recognition (VBEC). An appropriate model structure with high recognition performance can be found within a VBEC framework. Unlike conventional methods, including BIC or MDL criterion based on the maximum likelihood approach, the proposed model selection is valid in principle, even when there are insufficient amounts of data, because it does not use an asymptotic assumption. In isolated word recognition experiments, we show the advantage of VBEC over conventional methods, especially when dealing with small amounts of data.
5 0.40383983 53 nips-2002-Clustering with the Fisher Score
Author: Koji Tsuda, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
6 0.39562583 150 nips-2002-Multiple Cause Vector Quantization
7 0.38086724 137 nips-2002-Location Estimation with a Differential Update Network
8 0.37670702 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables
9 0.34310496 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
10 0.34078383 22 nips-2002-Adaptive Nonlinear System Identification with Echo State Networks
11 0.33759853 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
12 0.33080977 69 nips-2002-Discriminative Learning for Label Sequences via Boosting
13 0.32569969 191 nips-2002-String Kernels, Fisher Kernels and Finite State Automata
14 0.31864813 195 nips-2002-The Effect of Singularities in a Learning Machine when the True Parameters Do Not Lie on such Singularities
15 0.31344551 114 nips-2002-Information Regularization with Partially Labeled Data
16 0.30597061 84 nips-2002-Fast Exact Inference with a Factored Model for Natural Language Parsing
17 0.27346507 101 nips-2002-Handling Missing Data with Variational Bayesian Learning of ICA
18 0.27309644 1 nips-2002-"Name That Song!" A Probabilistic Approach to Querying on Music and Text
19 0.269427 98 nips-2002-Going Metric: Denoising Pairwise Data
20 0.263197 117 nips-2002-Intrinsic Dimension Estimation Using Packing Numbers
topicId topicWeight
[(11, 0.024), (14, 0.012), (23, 0.026), (42, 0.033), (44, 0.011), (54, 0.088), (55, 0.032), (57, 0.016), (67, 0.024), (68, 0.037), (74, 0.079), (77, 0.318), (87, 0.02), (92, 0.029), (98, 0.108)]
simIndex simValue paperId paperTitle
same-paper 1 0.75067896 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
Author: Eric P. Xing, Michael I. Jordan, Richard M. Karp, Stuart Russell
Abstract: We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves over previous models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
2 0.55278438 53 nips-2002-Clustering with the Fisher Score
Author: Koji Tsuda, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
3 0.46770969 11 nips-2002-A Model for Real-Time Computation in Generic Neural Microcircuits
Author: Wolfgang Maass, Thomas Natschläger, Henry Markram
Abstract: A key challenge for neural modeling is to explain how a continuous stream of multi-modal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real-time. We propose a new computational model that is based on principles of high dimensional dynamical systems in combination with statistical learning theory. It can be implemented on generic evolved or found recurrent circuitry.
4 0.46522319 135 nips-2002-Learning with Multiple Labels
Author: Rong Jin, Zoubin Ghahramani
Abstract: In this paper, we study a special kind of learning problem in which each training instance is given a set of (or distribution over) candidate class labels and only one of the candidate labels is the correct one. Such a problem can occur, e.g., in an information retrieval setting where a set of words is associated with an image, or if classes labels are organized hierarchically. We propose a novel discriminative approach for handling the ambiguity of class labels in the training examples. The experiments with the proposed approach over five different UCI datasets show that our approach is able to find the correct label among the set of candidate labels and actually achieve performance close to the case when each training instance is given a single correct label. In contrast, naIve methods degrade rapidly as more ambiguity is introduced into the labels. 1
5 0.46511593 10 nips-2002-A Model for Learning Variance Components of Natural Images
Author: Yan Karklin, Michael S. Lewicki
Abstract: We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a non-linear generalization of independent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higherorder image structure and provides a way to learn coarse-coded, sparsedistributed representations of abstract image properties such as object location, scale, and texture.
6 0.46506298 44 nips-2002-Binary Tuning is Optimal for Neural Rate Coding with High Temporal Resolution
7 0.46437269 93 nips-2002-Forward-Decoding Kernel-Based Phone Recognition
8 0.46360195 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
9 0.46204028 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
10 0.46195403 28 nips-2002-An Information Theoretic Approach to the Functional Classification of Neurons
11 0.46183944 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
12 0.46134502 48 nips-2002-Categorization Under Complexity: A Unified MDL Account of Human Learning of Regular and Irregular Categories
13 0.4613297 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals
14 0.46002644 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
15 0.45968992 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
16 0.45860577 41 nips-2002-Bayesian Monte Carlo
17 0.45803529 27 nips-2002-An Impossibility Theorem for Clustering
18 0.45776731 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition
19 0.45773482 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
20 0.45739457 2 nips-2002-A Bilinear Model for Sparse Coding