nips nips2008 nips2008-35 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Phil Blunsom, Trevor Cohn, Miles Osborne
Abstract: We present a novel method for inducing synchronous context free grammars (SCFGs) from a corpus of parallel string pairs. SCFGs can model equivalence between strings in terms of substitutions, insertions and deletions, and the reordering of sub-strings. We develop a non-parametric Bayesian model and apply it to a machine translation task, using priors to replace the various heuristics commonly used in this field. Using a variational Bayes training procedure, we learn the latent structure of translation equivalence through the induction of synchronous grammar categories for phrasal translations, showing improvements in translation performance over maximum likelihood models. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a novel method for inducing synchronous context free grammars (SCFGs) from a corpus of parallel string pairs. [sent-4, score-0.477]
2 SCFGs can model equivalence between strings in terms of substitutions, insertions and deletions, and the reordering of sub-strings. [sent-5, score-0.226]
3 We develop a non-parametric Bayesian model and apply it to a machine translation task, using priors to replace the various heuristics commonly used in this field. [sent-6, score-0.696]
4 Using a variational Bayes training procedure, we learn the latent structure of translation equivalence through the induction of synchronous grammar categories for phrasal translations, showing improvements in translation performance over maximum likelihood models. [sent-7, score-1.843]
5 1 Introduction A recent trend in statistical machine translation (SMT) has been the use of synchronous grammar based formalisms, permitting polynomial algorithms for exploring exponential forests of translation options. [sent-8, score-1.608]
6 Current state-of-the-art synchronous grammar translation systems rely upon heuristic relative frequency parameter estimates borrowed from phrase-based machine translation[1, 2]. [sent-9, score-1.076]
7 In this work we draw upon recent Bayesian models of monolingual parsing [3, 4] to develop a generative synchronous grammar model of translation using a hierarchical Dirichlet process (HDP) [5]. [sent-10, score-1.304]
8 Our first contribution is that we include sparse priors over the model parameters, encoding the intuition that source phrases will have few translations, and also addressing the problem of overfitting when using long multi-word translation pairs. [sent-12, score-0.283]
9 In addition, we investigate different priors based on standard machine translation models. [sent-14, score-0.631]
10 Our second contribution is the induction of categories for the synchronous grammar using an HDP prior. [sent-16, score-0.614]
11 Such categories allow the model to learn the latent structure of translational equivalence between strings, such as a preference to reorder adjectives and nouns when translating from French to English, or to encode that a phrase pair should be used at the beginning or end of a sentence. [sent-17, score-0.279]
12 Automatically induced non-terminal symbols give synchronous grammar models increased power over single non-terminal systems such as [2], while avoiding the problems of relying on noisy domain-specific parsers, as in [7]. [sent-18, score-0.488]
13 As the model is non-parametric, the HDP prior will provide a bias towards parameter distributions using as many, or as few, non-terminals as necessary to model the training data. [sent-19, score-0.148]
14 We focus on modelling the generation of a translation for a source sentence, putting aside for further work integration with common components of a state-of-the-art translation system, such as a language model and minimum error rate training [6]. [sent-22, score-1.29]
15 Figure 1: An example SCFG derivation from a Chinese source sentence which yields the English sentence: “Standing tall on Taihang Mountain is the Monument to the Hundred Regiment Offensive.” [sent-24, score-0.273]
16 [8] described the ITG subclass of SCFGs and performed many experiments using MLE training to induce translation models on small corpora. [sent-27, score-0.56]
17 Most subsequent work with ITG grammars has focused on the sub-task of word alignment [9], rather than actual translation, and has continued to use MLE trained models. [sent-28, score-0.224]
18 Our results clearly indicate that MLE models considerably overfit when used to estimate synchronous grammars, while the judicious use of priors can alleviate this problem. [sent-30, score-0.302]
19 This result raises the prospect that many MLE trained models of translation (e. [sent-31, score-0.56]
20 2 Synchronous context free grammar A synchronous context free grammar (SCFG, [13]) describes the generation of pairs of strings. [sent-34, score-0.803]
21 A string pair is generated by applying a series of paired context-free rewrite rules of the form X → ⟨γ, φ⟩, where X is a non-terminal, γ and φ are strings of terminals and non-terminals, and a one-to-one alignment is specified between the non-terminals in γ and φ. [sent-35, score-0.2]
22 In the context of SMT, by assigning the source and target languages to the respective sides of a SCFG it is possible to describe translation as the process of parsing the source sentence, while generating the target translation [2]. [sent-36, score-1.39]
23 In this paper we only consider binary normal-form SCFGs which allow productions to rewrite as either a pair of non-terminals, or a pair of non-empty terminal strings (these may span multiple words). [sent-37, score-0.365]
24 Such grammars are equivalent to the inversion transduction grammars presented in [8]. [sent-38, score-0.43]
25 Note however that our approach is general and could be used with other synchronous grammar transducers (e. [sent-39, score-0.488]
26 The binary non-terminal productions can specify that the order of the child non-terminals is the same in both languages (a monotone production), or is reversed (a reordering production). [sent-42, score-0.424]
27 Monotone and reordering rules are written Z → ⟨X₁ Y₂, X₁ Y₂⟩ and Z → ⟨X₁ Y₂, Y₂ X₁⟩ respectively, where X, Y and Z are non-terminals and the subscript indices denote the alignment. [sent-43, score-0.151]
28 Without loss of generality, here we add the restriction that non-terminals on the source and target sides of the grammar must have the same category. [sent-44, score-0.377]
29 Although conceptually simple, binary normal-form SCFGs can still represent a wide range of linguistic phenomena required for translation [8]. [sent-45, score-0.589]
30 The grammar in this example has non-terminals A and B which distinguish between translation phrases which permit re-orderings. [sent-47, score-0.868]
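As a concrete illustration of the binary normal-form rules described above, here is a minimal sketch (not the authors' code; the class names, the toy rules and the yield_pair helper are invented for illustration) showing one way to represent emission, monotone and reordering rules and to read the aligned string pair off a derivation.

```python
# Minimal sketch of binary normal-form SCFG rules (emission, monotone, reordering)
# and of reading the source/target yield off a derivation tree.
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass
class Emission:               # X -> <source phrase, target phrase>
    lhs: str
    src: Tuple[str, ...]
    tgt: Tuple[str, ...]

@dataclass
class Binary:                 # Z -> <X Y, X Y> (monotone) or Z -> <X Y, Y X> (reordering)
    lhs: str
    left: str
    right: str
    monotone: bool

@dataclass
class Node:                   # a derivation node: a rule plus its child sub-derivations
    rule: Union[Emission, Binary]
    children: Tuple["Node", ...] = ()

def yield_pair(node: Node) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Read the aligned (source, target) strings off a complete derivation."""
    r = node.rule
    if isinstance(r, Emission):
        return r.src, r.tgt
    (ls, lt), (rs, rt) = (yield_pair(c) for c in node.children)
    if r.monotone:            # same child order on both sides
        return ls + rs, lt + rt
    return ls + rs, rt + lt   # reordering: target-side order of the children is swapped

# Tiny illustrative derivation: A -> <B C, C B> over two emitted phrase pairs.
d = Node(Binary("A", "B", "C", monotone=False),
         (Node(Emission("B", ("chapeau",), ("hat",))),
          Node(Emission("C", ("rouge",), ("red",)))))
print(yield_pair(d))          # (('chapeau', 'rouge'), ('red', 'hat'))
```

The reordering production here captures exactly the noun/adjective swap between French and English mentioned earlier as an example of what the induced categories can learn.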
31 3 Generative Model A sequence of SCFG rule applications which produces both a source and a target sentence is referred to as a derivation, denoted z. [sent-48, score-0.266]
32 This rule type determines if the symbol will rewrite as a source-target translation pair, or a pair of non-terminals with either monotone or reversed order. [sent-51, score-0.865]
33 This continues until no non-terminals are remaining, at which point the derivation is complete and the source and target sentences can be read off. [sent-54, score-0.212]
34 When expanding a production each decision is drawn from a multinomial distribution specific to the non-terminal, zi . [sent-55, score-0.367]
35 This allows different non-terminals to rewrite in different ways – as an emission, reordering or monotone production. [sent-56, score-0.355]
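The generative story just described (each non-terminal first draws a rule type from its own multinomial, then either emits a phrase pair or expands into two non-terminals in monotone or reversed order) can be sketched as a simple top-down sampler. The parameter tables below are toy values, not learned parameters, and the function names are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative top-down sampler for the generative process: each non-terminal draws a
# rule type from a multinomial specific to that symbol, then either emits a phrase
# pair or expands into two non-terminals. All parameter tables are made up.
import random

rule_type = {"A": {"emit": 0.6, "mono": 0.25, "reorder": 0.15}}          # per non-terminal
binary    = {"A": {("A", "A"): 1.0}}                                      # child-pair distributions
emission  = {"A": {(("chapeau",), ("hat",)): 0.5, (("rouge",), ("red",)): 0.5}}

def draw(dist):
    """Draw one outcome from a dict mapping outcome -> probability."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome            # guard against floating-point rounding

def generate(sym):
    """Expand `sym` recursively, returning (source words, target words)."""
    kind = draw(rule_type[sym])
    if kind == "emit":
        src, tgt = draw(emission[sym])
        return list(src), list(tgt)
    left, right = draw(binary[sym])
    ls, lt = generate(left)
    rs, rt = generate(right)
    return (ls + rs, lt + rt) if kind == "mono" else (ls + rs, rt + lt)

random.seed(1)
print(generate("A"))
```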
36 The prior distribution for each binary production is parametrised by π, the top-level stick-breaking weights, thereby ensuring that each production draws its children from a shared inventory of category labels. [sent-57, score-0.767]
37 For instance, we can encode a preference towards longer or shorter derivations using αY , and a preference for sparse or dense translation lexicons with αE . [sent-60, score-0.714]
38 In addition to allowing for the incorporation of prior knowledge about sparsity, the priors have been chosen to be conjugate to the multinomial distribution. [sent-64, score-0.18]
39 1 Rule type distribution The rule type distribution determines the relative likelihood of generating a terminal string pair, a monotone production, or a reordering. [sent-67, score-0.254]
40 Synchronous grammars that allow multiple words to be emitted at the leaves of a derivation are prone to focusing probability mass on only the longest translation pairs, i.e. [sent-68, score-0.825]
41 if a training set sentence pair can be explained by many short translation pairs or a few long ones, the maximum likelihood solution will be to use the longest pairs. [sent-70, score-0.751]
42 We can counter this tendency by assuming a prior distribution that allows us to temper the model’s preference for short derivations with large translation pairs. [sent-72, score-0.682]
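A small numerical sketch of this tempering effect, under the assumption of a symmetric Dirichlet prior with hyperparameter αY over the three rule types; the expected counts below are invented for illustration.

```python
# Sketch: how the Dirichlet hyperparameter on the rule-type distribution tempers the
# MLE tendency to put nearly all mass on emissions (i.e. on short derivations built
# from long phrase pairs). The counts below are invented for illustration.
counts = {"emit": 980.0, "mono": 15.0, "reorder": 5.0}   # hypothetical expected counts

def posterior_mean(counts, alpha):
    """Mean of the rule-type multinomial under a symmetric Dirichlet(alpha) prior."""
    total = sum(counts.values()) + alpha * len(counts)
    return {k: (v + alpha) / total for k, v in counts.items()}

for alpha in (0.0, 1.0, 100.0, 10000.0):
    probs = posterior_mean(counts, alpha)
    print(f"alpha={alpha:>8}: " + ", ".join(f"{k}={p:.3f}" for k, p in probs.items()))
# alpha = 0 recovers the MLE; as alpha grows the estimate is pulled towards uniform,
# moving probability mass back onto the binary rules and hence onto longer derivations.
```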
43 2 Emission distribution The Dirichlet process prior on the terminal emission distribution serves two purposes. [sent-75, score-0.316]
44 Firstly the prior allows us to encode the intuition that our model should have few translation pairs. [sent-76, score-0.675]
45 The translation pairs in our system are induced from noisy data and thus many of them will be of little use. [sent-77, score-0.618]
46 Therefore a sparse prior should lead to these noisy translation pairs being assigned probabilities close to zero. [sent-78, score-0.672]
47 Secondly, the base distribution P0 of the Dirichlet process can be used to include sophisticated prior distributions over translation pairs from other popular models of translation. [sent-79, score-0.672]
48 We use a unigram language model for the probability P(e), and train the parameters p(f_j | e_i) using a variational approximation, similar to that described in Section 3. [sent-83, score-0.162]
49 Model 1 allows us to assign a prior probability to each translation pair in our model. [sent-85, score-0.664]
50 This prior suggests that lexically similar translation pairs should have similar probabilities. [sent-86, score-0.672]
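A sketch of the kind of IBM Model 1 style base distribution described above: a unigram language model over the target words times Model 1 scores for the source words. The toy probability tables, the NULL word, the uniform 1/(|e|+1) alignment factor and the floor for unseen word pairs are illustrative assumptions following the usual Model 1 conventions, not the paper's trained parameters.

```python
# Sketch of an IBM Model 1 style base distribution over a phrase pair (f, e):
# P(e) from a unigram language model times Model 1 lexical scores for the source side.
# All probability tables here are toy values.
from math import prod

unigram = {"hat": 0.01, "red": 0.02, "NULL": 1.0}                 # toy P(e)
t_table = {("chapeau", "hat"): 0.7, ("rouge", "red"): 0.6,
           ("chapeau", "NULL"): 0.01, ("rouge", "NULL"): 0.01}     # toy p(f_j | e_i)

def model1_prior(f_words, e_words, floor=1e-9):
    """P0(f, e) = P(e) * prod_j [ (1 / (|e|+1)) * sum_i p(f_j | e_i) ], with a NULL word."""
    p_e = prod(unigram[e] for e in e_words)
    e_plus_null = list(e_words) + ["NULL"]
    p_f_given_e = prod(
        sum(t_table.get((f, e), floor) for e in e_plus_null) / len(e_plus_null)
        for f in f_words)
    return p_e * p_f_given_e

# Lexically related pairs receive related scores, which is the point of this prior.
print(model1_prior(["chapeau", "rouge"], ["red", "hat"]))
print(model1_prior(["chapeau"], ["hat"]))
```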
51 RF Relative frequency (P0 ) Most statistical machine translation models currently in use estimate the probabilities for translation pairs using a simple relative frequency estimator. [sent-88, score-1.234]
52 Although this estimator doesn’t take into account any generative process for how the translation pairs were observed, and by extension of the arguments for tree substitution grammars is biased and inconsistent [15], it has proved effective in many state-of-the-art translation systems. [sent-90, score-1.389]
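For comparison, a sketch of the relative frequency estimator over extracted phrase pairs; the extracted-pair list is invented, and the conditional variant is included because a footnote in the text notes it is the more common choice in current systems.

```python
# Sketch of the relative-frequency estimator commonly used for phrase pairs:
# counts of extracted (source, target) pairs normalised into joint and conditional
# distributions. The extracted-pair list is invented for illustration.
from collections import Counter

extracted = [("chapeau rouge", "red hat"), ("chapeau rouge", "red hat"),
             ("chapeau", "hat"), ("rouge", "red")]

counts = Counter(extracted)
total = sum(counts.values())
p_joint = {pair: c / total for pair, c in counts.items()}           # joint estimate

e_totals = Counter(e for _, e in extracted)                          # target-side counts
p_cond = {(f, e): c / e_totals[e] for (f, e), c in counts.items()}   # conditional p(f | e)

print(p_joint[("chapeau rouge", "red hat")], p_cond[("chapeau rouge", "red hat")])
```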
53 3 Non-terminal distributions We employ a structured prior for binary production rules inspired by similar approaches in monolingual grammar induction [3, 4]. [sent-92, score-0.705]
54 In addition, both the monotone and reordering production parameters are drawn from a Dirichlet process parameterised by the matrix of the expectations for each pair of non-terminals, ππᵀ, assuming independence in the prior. [sent-96, score-0.612]
55 This allows the model to prefer grammars with few non-terminal labels and where each non-terminal has a sparse distribution over productions. [sent-97, score-0.216]
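One way to picture this structured prior is the numerical sketch below: truncated GEM stick-breaking weights π over the categories, and per-parent production distributions drawn from a Dirichlet whose base measure is the outer product ππᵀ. The truncation level, concentration values and the renormalisation of the truncated stick are illustrative choices, not the paper's settings.

```python
# Sketch of the structured prior on binary productions: truncated stick-breaking
# weights pi over non-terminal categories, and a production distribution drawn from
# a Dirichlet whose base measure is pi pi^T (children independent in the prior).
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(gamma, truncation):
    """Draw truncated GEM(gamma) weights and renormalise the truncated stick."""
    betas = rng.beta(1.0, gamma, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    pi = betas * remaining
    return pi / pi.sum()

K = 5                               # truncation level on non-terminal categories
pi = stick_breaking(gamma=1.0, truncation=K)

base = np.outer(pi, pi)             # base measure over ordered child pairs, sums to 1

alpha_B = 10.0                      # concentration for the production distribution
monotone_params = rng.dirichlet((alpha_B * base).ravel()).reshape(K, K)

print(pi.round(3))                  # sparse weights favour a small set of categories
print(monotone_params.round(3))     # child-pair probabilities for one parent symbol
```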
56 4 Inference Previous work with monolingual HDP-CFG grammars has employed either Gibbs sampling [4] or variational Bayes [3] approaches to inference. [sent-99, score-0.309]
57 In this work we follow the mean-field approximation presented in [16, 3], truncating the top-level stick-breaking prior on the non-terminals and optimising a variational bound on the probability of the training sample. [sent-100, score-0.181]
58 First we start with our objective, the likelihood of the observed string pairs, x = {(e, f)}: log p(x) = log ∫dθ Σ_z p(θ) p(x, z|θ) ≥ ∫dθ Σ_z q(θ, z) log [p(θ) p(x, z|θ) / q(θ, z)]. (Footnote: Current translation systems more commonly use the conditional, rather than joint, estimator.) [sent-102, score-0.594]
59 The starred rewrites in the denominators indicate a sum over any monotone or reordering production, respectively. [sent-108, score-0.292]
60 The weights for the rule-type and emission distributions are defined similarly. [sent-109, score-0.219]
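The re-estimated weights W are assumed here to take the standard mean-field form for multinomials under Dirichlet priors, an exponentiated difference of digamma functions; the sketch below also assumes SciPy for the digamma function and uses invented expected counts, so it illustrates the general recipe rather than the paper's exact update.

```python
# Sketch of mean-field multinomial weights used in place of probabilities during the
# inside-outside pass: W = exp(digamma(alpha + expected count) - digamma(total)).
# The hyperparameter and expected counts below are invented.
from math import exp
from scipy.special import digamma

alpha = 0.75
expected_counts = {"emit": 40.0, "mono": 12.0, "reorder": 3.0}   # E_q[count] for one symbol

total = sum(expected_counts.values()) + alpha * len(expected_counts)
weights = {r: exp(digamma(c + alpha) - digamma(total)) for r, c in expected_counts.items()}

# For these counts the weights sum to slightly less than one: the sub-normalisation
# penalises rule types with little expected support.
print(weights, sum(weights.values()))
```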
61 The variational training cycles between optimising the q(θ) distribution by re-estimating the weights W and the stick-breaking prior π, then using these estimates, with the inside-outside dynamic programming algorithm, to calculate the q(z) distribution. [sent-110, score-0.181]
62 Optimising the top-level stick-breaking weights has no closed form solution as a dependency is induced between the GEM prior and production distributions. [sent-111, score-0.324]
63 5 Prediction The predictive distribution under our Bayesian model is given by: p(z|x, f) = ∫dθ p(θ|x) p(z|f, θ) ≈ ∫dθ q(θ) p(z|f, θ) ≥ exp ∫dθ q(θ) log p(z|f, θ), where x is the training set of parallel sentence pairs, f is a testing source sentence and z its derivation. [sent-115, score-0.337]
64 4 Evaluation We evaluate our HDP-SCFG model on both synthetic and real-world translation tasks. [sent-118, score-0.634]
65 Recovering a synthetic grammar This experiment investigates the ability of our model to recover a simple synthetic grammar, using the minimum number of constituent categories. [sent-119, score-0.397]
67 Figure 2: Synthetic grammar experiments; the two panels show the binary production posterior distribution and the emission posterior distribution over categories 1–5. [sent-135, score-0.257]
68 The HDP model correctly allocates a single binary production non-terminal and three equally weighted emission non-terminals. [sent-136, score-0.586]
69 Table 2: Chinese to English translation corpus statistics. [sent-138, score-0.59]
                       Training             Development          Test
                       Chinese   English    Chinese   English    Chinese   English
   Sentences           33164                500                  506
   Words               253724    279104     3464      3752       3784      3823
   Sentence Length     7         8          6         7          7         7
   Longest Sentence    41        45         58        62         61        56
70 Figure 2 shows the emission and production distributions produced by the HDP-SCFG model,3 as well as an EM trained maximum likelihood (MLE) model. [sent-139, score-0.489]
71 The variational inference for the HDP model was truncated at five categories, likewise the MLE model was trained with five categories. [sent-140, score-0.17]
72 It allocates category 2 to the S category, giving it a 2/3 probability of generating a monotone production (A,C), versus 1/3 for a reordering (B). [sent-142, score-0.706]
73 For the emission distribution the HDP model assigns category 1 to A, 3 to B and 5 to C, each of which has a posterior probability of 1/3. [sent-143, score-0.363]
74 The stick-breaking prior biases the model towards using a small set of categories, and therefore the model correctly uses only four categories, assigning zero posterior probability mass to category 4. [sent-144, score-0.258]
75 The MLE model has no bias for small grammars and therefore uses all available categories to model the data. [sent-145, score-0.333]
76 For the production distribution it creates two categories with equal posteriors to model the S category, while for emissions the model collapses categories A and C into category 1, and splits category B over 3 and 5. [sent-146, score-0.752]
77 This grammar is more expressive than the target grammar, over-generating but including the target grammar as a subset. [sent-147, score-0.588]
78 The particular grammar found by the MLE model is dependent on the (random) initialisation and the fact that the EM algorithm can only find a local maximum, however it will always use all available categories to model the data. [sent-148, score-0.408]
79 Chinese-English machine translation The real-world translation experiment aims to determine whether the model can learn and generalise from a noisy large-scale parallel machine translation corpus, and provide performance benefits on the standard evaluation metrics. [sent-149, score-1.714]
80 We evaluate our model on the IWSLT 2005 Chinese to English translation task [17], using the 2004 test set as development data for tuning the hyperparameters. [sent-150, score-0.624]
81 The translation phrase pairs that form the base of our grammar are induced using the standard alignment and translation phrase pair extraction heuristics used in phrase-based translation models [6]. [sent-153, score-2.222]
82 As these heuristics aren’t based on a generative model, and don’t guarantee that the target translation will be reachable from the source, we discard those sentence pairs for which we cannot produce a derivation, leaving 33,164 sentences for training. [sent-154, score-0.865]
83 (Footnote 3) No structured P0 was used in this model; rather, a simple Dirichlet prior with uniform αE was employed for the emission distribution. [sent-156, score-0.273]
84 Figure 3: Tuning the Dirichlet α parameters (αE and αY ) for the emission and rule type distributions (development set). [sent-169, score-0.255]
85 Table 3: Test results for the model with a single non-terminal category and various emission priors (BLEU). [sent-174, score-0.434]
86 Table 4: Test set results for the hierarchical model with the variational distribution truncated at five non-terminal categories (BLEU). [sent-177, score-0.26]
87 We first evaluate our model using a grammar with a single non-terminal category (rendering the hierarchical prior redundant) and vary the prior P0 used for the emission parameters. [sent-178, score-0.769]
88 For this model we investigate the effect that the emission and rule-type priors have on translation performance. [sent-179, score-0.884]
89 The optimal value found for αE on the development set was around 0.75, indicating that the emission distribution benefits from a slightly sparse distribution, but not far from the uniform value of 1. [sent-183, score-0.219]
90 The sharp curve for the αY rule-type distribution hyperparameter confirms our earlier hypothesis that the model requires considerable smoothing in order to force it to place probability mass on long derivations rather than simply placing it all on the largest translation pairs. [sent-185, score-0.666]
91 The optimal hyperparameter values on the development data for the two structured emission distribution priors, Model 1 (M1) and relative frequency (RF), also provide insight into the underlying models. [sent-186, score-0.314]
92 The M1 prior has a heavy bias towards smaller translation pairs, countering the model’s inherent bias. [sent-187, score-0.64]
93 Conversely, the RF prior is biased towards larger translation pairs, reinforcing the model’s bias; thus a very large value (10^6) for the αY parameter gives optimal development set performance. [sent-190, score-0.728]
94 Table 3 shows the performance of the single category models with each of the priors on the test set. [sent-191, score-0.181]
95 5 Conclusion We have proposed a Bayesian model for inducing synchronous grammars and demonstrated its efficacy on both synthetic and real machine translation tasks. [sent-200, score-1.047]
96 The sophisticated priors over the model’s parameters address limitations of MLE models, most notably overfitting, and effectively model the nature of the translation task. [sent-201, score-0.665]
97 In addition, the incorporation of a hierarchical prior opens the door to the unsupervised induction of grammars capable of representing the latent structure of translation. [sent-202, score-0.32]
98 Our Bayesian model of translation using synchronous grammars provides a basis upon which more sophisticated models can be built, enabling a move away from the current heuristically engineered translation systems. [sent-203, score-1.567]
99 Scalable inference and training of context-rich syntactic translation models. [sent-236, score-0.56]
100 Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. [sent-240, score-0.278]
wordName wordTfidf (topN-words)
[('translation', 0.56), ('production', 0.27), ('grammar', 0.257), ('synchronous', 0.231), ('emission', 0.219), ('grammars', 0.182), ('mle', 0.178), ('zr', 0.154), ('reordering', 0.151), ('zl', 0.146), ('monotone', 0.141), ('hdp', 0.116), ('category', 0.11), ('sentence', 0.11), ('rf', 0.097), ('scfgs', 0.086), ('source', 0.083), ('categories', 0.083), ('dirichlet', 0.078), ('chinese', 0.076), ('variational', 0.075), ('scfg', 0.075), ('english', 0.074), ('priors', 0.071), ('draw', 0.07), ('bleu', 0.069), ('productions', 0.069), ('pairs', 0.058), ('multinomial', 0.055), ('linguistics', 0.055), ('prior', 0.054), ('language', 0.053), ('derivation', 0.052), ('phrase', 0.052), ('itg', 0.052), ('leu', 0.052), ('monolingual', 0.052), ('optimising', 0.052), ('phrases', 0.051), ('pair', 0.05), ('ti', 0.046), ('meeting', 0.045), ('symbol', 0.045), ('translations', 0.044), ('terminal', 0.043), ('daniel', 0.043), ('induction', 0.043), ('zi', 0.042), ('alignment', 0.042), ('strings', 0.041), ('hierarchical', 0.041), ('sentences', 0.04), ('synthetic', 0.04), ('transduction', 0.039), ('target', 0.037), ('hyperparameter', 0.037), ('ibm', 0.036), ('rule', 0.036), ('annual', 0.036), ('derivations', 0.035), ('dan', 0.035), ('model', 0.034), ('allocates', 0.034), ('chapeau', 0.034), ('iwslt', 0.034), ('marcu', 0.034), ('maximised', 0.034), ('monument', 0.034), ('parametrised', 0.034), ('phrasal', 0.034), ('rouge', 0.034), ('taihang', 0.034), ('wz', 0.034), ('string', 0.034), ('child', 0.034), ('preference', 0.033), ('rewrite', 0.033), ('longest', 0.031), ('acl', 0.031), ('heuristics', 0.031), ('nonterminals', 0.03), ('smt', 0.03), ('development', 0.03), ('parsing', 0.03), ('dp', 0.03), ('corpus', 0.03), ('binary', 0.029), ('generative', 0.029), ('frequency', 0.028), ('tall', 0.028), ('constituents', 0.028), ('emissions', 0.028), ('prague', 0.028), ('republic', 0.028), ('encode', 0.027), ('inversion', 0.027), ('truncated', 0.027), ('towards', 0.026), ('pietra', 0.026), ('constituent', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 35 nips-2008-Bayesian Synchronous Grammar Induction
Author: Phil Blunsom, Trevor Cohn, Miles Osborne
Abstract: We present a novel method for inducing synchronous context free grammars (SCFGs) from a corpus of parallel string pairs. SCFGs can model equivalence between strings in terms of substitutions, insertions and deletions, and the reordering of sub-strings. We develop a non-parametric Bayesian model and apply it to a machine translation task, using priors to replace the various heuristics commonly used in this field. Using a variational Bayes training procedure, we learn the latent structure of translation equivalence through the induction of synchronous grammar categories for phrasal translations, showing improvements in translation performance over maximum likelihood models. 1
2 0.29526073 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
Author: Shay B. Cohen, Kevin Gimpel, Noah A. Smith
Abstract: We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors. 1
3 0.17454478 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
Author: Shuang Wu, Hongjing Lu, Alan L. Yuille
Abstract: Psychophysical experiments show that humans are better at perceiving rotation and expansion than translation. These findings are inconsistent with standard models of motion integration which predict best performance for translation [6]. To explain this discrepancy, our theory formulates motion perception at two levels of inference: we first perform model selection between the competing models (e.g. translation, rotation, and expansion) and then estimate the velocity using the selected model. We define novel prior models for smooth rotation and expansion using techniques similar to those in the slow-and-smooth model [17] (e.g. Green functions of differential operators). The theory gives good agreement with the trends observed in human experiments. 1
4 0.10233942 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters
Author: Roger P. Levy, Florencia Reali, Thomas L. Griffiths
Abstract: Language comprehension in humans is significantly constrained by memory, yet rapid, highly incremental, and capable of utilizing a wide range of contextual information to resolve ambiguity and form expectations about future input. In contrast, most of the leading psycholinguistic models and fielded algorithms for natural language parsing are non-incremental, have run time superlinear in input length, and/or enforce structural locality constraints on probabilistic dependencies between events. We present a new limited-memory model of sentence comprehension which involves an adaptation of the particle filter, a sequential Monte Carlo method, to the problem of incremental parsing. We show that this model can reproduce classic results in online sentence comprehension, and that it naturally provides the first rational account of an outstanding problem in psycholinguistics, in which the preferred alternative in a syntactic ambiguity seems to grow more attractive over time even in the absence of strong disambiguating information. 1
5 0.099070139 229 nips-2008-Syntactic Topic Models
Author: Jordan L. Boyd-graber, David M. Blei
Abstract: We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents. 1
6 0.075469486 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
7 0.059418168 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
8 0.059389256 65 nips-2008-Domain Adaptation with Multiple Sources
9 0.057361901 113 nips-2008-Kernelized Sorting
10 0.055644911 4 nips-2008-A Scalable Hierarchical Distributed Language Model
11 0.053902581 28 nips-2008-Asynchronous Distributed Learning of Topic Models
12 0.052661519 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems
13 0.051246624 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
14 0.049813017 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
15 0.049632888 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
16 0.049074192 118 nips-2008-Learning Transformational Invariants from Natural Movies
17 0.047287967 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
18 0.046042621 233 nips-2008-The Gaussian Process Density Sampler
19 0.045940295 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
20 0.045901202 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
topicId topicWeight
[(0, -0.146), (1, -0.049), (2, 0.086), (3, -0.079), (4, -0.001), (5, -0.035), (6, 0.03), (7, 0.159), (8, -0.077), (9, 0.021), (10, -0.081), (11, 0.028), (12, 0.071), (13, 0.137), (14, 0.01), (15, 0.123), (16, -0.037), (17, -0.012), (18, -0.149), (19, 0.034), (20, 0.02), (21, 0.059), (22, -0.02), (23, -0.096), (24, -0.091), (25, 0.074), (26, 0.144), (27, -0.217), (28, 0.011), (29, 0.06), (30, -0.018), (31, -0.125), (32, -0.217), (33, -0.011), (34, 0.147), (35, 0.04), (36, -0.014), (37, -0.022), (38, -0.118), (39, 0.03), (40, 0.006), (41, 0.133), (42, -0.069), (43, 0.044), (44, 0.049), (45, 0.043), (46, -0.002), (47, -0.124), (48, 0.122), (49, -0.083)]
simIndex simValue paperId paperTitle
same-paper 1 0.95102268 35 nips-2008-Bayesian Synchronous Grammar Induction
Author: Phil Blunsom, Trevor Cohn, Miles Osborne
Abstract: We present a novel method for inducing synchronous context free grammars (SCFGs) from a corpus of parallel string pairs. SCFGs can model equivalence between strings in terms of substitutions, insertions and deletions, and the reordering of sub-strings. We develop a non-parametric Bayesian model and apply it to a machine translation task, using priors to replace the various heuristics commonly used in this field. Using a variational Bayes training procedure, we learn the latent structure of translation equivalence through the induction of synchronous grammar categories for phrasal translations, showing improvements in translation performance over maximum likelihood models. 1
2 0.8249113 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
Author: Shay B. Cohen, Kevin Gimpel, Noah A. Smith
Abstract: We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors. 1
3 0.65603882 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters
Author: Roger P. Levy, Florencia Reali, Thomas L. Griffiths
Abstract: Language comprehension in humans is significantly constrained by memory, yet rapid, highly incremental, and capable of utilizing a wide range of contextual information to resolve ambiguity and form expectations about future input. In contrast, most of the leading psycholinguistic models and fielded algorithms for natural language parsing are non-incremental, have run time superlinear in input length, and/or enforce structural locality constraints on probabilistic dependencies between events. We present a new limited-memory model of sentence comprehension which involves an adaptation of the particle filter, a sequential Monte Carlo method, to the problem of incremental parsing. We show that this model can reproduce classic results in online sentence comprehension, and that it naturally provides the first rational account of an outstanding problem in psycholinguistics, in which the preferred alternative in a syntactic ambiguity seems to grow more attractive over time even in the absence of strong disambiguating information. 1
4 0.46022987 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
Author: Shuang Wu, Hongjing Lu, Alan L. Yuille
Abstract: Psychophysical experiments show that humans are better at perceiving rotation and expansion than translation. These findings are inconsistent with standard models of motion integration which predict best performance for translation [6]. To explain this discrepancy, our theory formulates motion perception at two levels of inference: we first perform model selection between the competing models (e.g. translation, rotation, and expansion) and then estimate the velocity using the selected model. We define novel prior models for smooth rotation and expansion using techniques similar to those in the slow-and-smooth model [17] (e.g. Green functions of differential operators). The theory gives good agreement with the trends observed in human experiments. 1
5 0.39330581 4 nips-2008-A Scalable Hierarchical Distributed Language Model
Author: Andriy Mnih, Geoffrey E. Hinton
Abstract: Neural probabilistic language models (NPLMs) have been shown to be competitive with and occasionally superior to the widely-used n-gram language models. The main drawback of NPLMs is their extremely long training and testing times. Morin and Bengio have proposed a hierarchical language model built around a binary tree of words, which was two orders of magnitude faster than the nonhierarchical model it was based on. However, it performed considerably worse than its non-hierarchical counterpart in spite of using a word tree created using expert knowledge. We introduce a fast hierarchical language model along with a simple feature-based algorithm for automatic construction of word trees from the data. We then show that the resulting models can outperform non-hierarchical neural models as well as the best n-gram models. 1
6 0.39051667 229 nips-2008-Syntactic Topic Models
7 0.34767428 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
8 0.33133778 52 nips-2008-Correlated Bigram LSA for Unsupervised Language Model Adaptation
9 0.29798737 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
10 0.286053 65 nips-2008-Domain Adaptation with Multiple Sources
11 0.28369239 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
12 0.27707696 249 nips-2008-Variational Mixture of Gaussian Process Experts
13 0.27557364 124 nips-2008-Load and Attentional Bayes
14 0.27124536 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
15 0.2674377 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
16 0.26322836 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
17 0.25822386 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
18 0.2563481 134 nips-2008-Mixed Membership Stochastic Blockmodels
19 0.25057879 216 nips-2008-Sparse probabilistic projections
20 0.25014037 29 nips-2008-Automatic online tuning for fast Gaussian summation
topicId topicWeight
[(6, 0.064), (7, 0.049), (11, 0.301), (12, 0.058), (15, 0.01), (28, 0.112), (45, 0.049), (57, 0.126), (59, 0.019), (63, 0.016), (71, 0.014), (77, 0.043), (78, 0.014), (83, 0.035), (87, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.78567457 35 nips-2008-Bayesian Synchronous Grammar Induction
Author: Phil Blunsom, Trevor Cohn, Miles Osborne
Abstract: We present a novel method for inducing synchronous context free grammars (SCFGs) from a corpus of parallel string pairs. SCFGs can model equivalence between strings in terms of substitutions, insertions and deletions, and the reordering of sub-strings. We develop a non-parametric Bayesian model and apply it to a machine translation task, using priors to replace the various heuristics commonly used in this field. Using a variational Bayes training procedure, we learn the latent structure of translation equivalence through the induction of synchronous grammar categories for phrasal translations, showing improvements in translation performance over maximum likelihood models. 1
2 0.53615195 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
Author: Mehmet K. Muezzinoglu, Alexander Vergara, Ramon Huerta, Thomas Nowotny, Nikolai Rulkov, Henry Abarbanel, Allen Selverston, Mikhail Rabinovich
Abstract: The odor transduction process has a large time constant and is susceptible to various types of noise. Therefore, the olfactory code at the sensor/receptor level is in general a slow and highly variable indicator of the input odor in both natural and artificial situations. Insects overcome this problem by using a neuronal device in their Antennal Lobe (AL), which transforms the identity code of olfactory receptors to a spatio-temporal code. This transformation improves the decision of the Mushroom Bodies (MBs), the subsequent classifier, in both speed and accuracy. Here we propose a rate model based on two intrinsic mechanisms in the insect AL, namely integration and inhibition. Then we present a MB classifier model that resembles the sparse and random structure of insect MB. A local Hebbian learning procedure governs the plasticity in the model. These formulations not only help to understand the signal conditioning and classification methods of insect olfactory systems, but also can be leveraged in synthetic problems. Among them, we consider here the discrimination of odor mixtures from pure odors. We show on a set of records from metal-oxide gas sensors that the cascade of these two new models facilitates fast and accurate discrimination of even highly imbalanced mixtures from pure odors. 1
3 0.52998203 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
Author: Shay B. Cohen, Kevin Gimpel, Noah A. Smith
Abstract: We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors. 1
4 0.52494597 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
Author: Erik B. Sudderth, Michael I. Jordan
Abstract: We develop a statistical framework for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman–Yor (PY) process. This nonparametric prior distribution leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY processes, we use Gaussian processes to discover spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes. 1
5 0.52443182 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
Author: Jing Xu, Thomas L. Griffiths
Abstract: Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the capacities of the agents involved. In the 1930s, Sir Frederic Bartlett explored the influence of memory biases in “serial reproduction” of information, in which one person’s reconstruction of a stimulus from memory becomes the stimulus seen by the next person. These experiments were done using relatively uncontrolled stimuli such as pictures and stories, but suggested that serial reproduction would transform information in a way that reflected the biases inherent in memory. We formally analyze serial reproduction using a Bayesian model of reconstruction from memory, giving a general result characterizing the effect of memory biases on information transmission. We then test the predictions of this account in two experiments using simple one-dimensional stimuli. Our results provide theoretical and empirical justification for the idea that serial reproduction reflects memory biases. 1
6 0.52103645 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
7 0.51892662 236 nips-2008-The Mondrian Process
8 0.51263994 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
9 0.50799638 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks
10 0.5078876 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
11 0.50782073 148 nips-2008-Natural Image Denoising with Convolutional Networks
12 0.5069387 200 nips-2008-Robust Kernel Principal Component Analysis
13 0.50668663 66 nips-2008-Dynamic visual attention: searching for coding length increments
14 0.50436223 233 nips-2008-The Gaussian Process Density Sampler
15 0.50207353 118 nips-2008-Learning Transformational Invariants from Natural Movies
16 0.50024563 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
17 0.49938759 62 nips-2008-Differentiable Sparse Coding
18 0.4989571 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
19 0.49890253 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
20 0.49800733 234 nips-2008-The Infinite Factorial Hidden Markov Model