nips nips2013 nips2013-260 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Benigno Uria, Iain Murray, Hugo Larochelle
Abstract: We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case. 1
Reference: text
sentIndex sentText sentNum sentScore
1 RNADE: The real-valued neural autoregressive density-estimator Benigno Uria and Iain Murray School of Informatics University of Edinburgh {b. [sent-1, score-0.333]
2 uk Hugo Larochelle Département d’informatique Université de Sherbrooke hugo. [sent-5, score-0.069]
3 ca Abstract We introduce RNADE, a new model for joint density estimation of real-valued vectors. [sent-7, score-0.14]
4 Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. [sent-8, score-0.907]
5 RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. [sent-9, score-0.161]
6 A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. [sent-10, score-0.155]
7 We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case. [sent-11, score-0.377]
8 1 Introduction Probabilistic approaches to machine learning involve modeling the probability distributions over large collections of variables. [sent-12, score-0.181]
9 The number of parameters required to describe a general discrete distribution grows exponentially in its dimensionality, so some structure or regularity must be imposed, often through graphical models [e. [sent-13, score-0.194]
10 Graphical models are also used to describe probability densities over collections of real-valued variables. [sent-16, score-0.212]
11 Often parts of a task-specific probabilistic model are hard to specify, and are learned from data using generic models. [sent-17, score-0.096]
12 For example, the natural probabilistic approach to image restoration tasks (such as denoising, deblurring, inpainting) requires a multivariate distribution over uncorrupted patches of pixels. [sent-18, score-0.326]
13 It has long been appreciated that large classes of densities can be estimated consistently by kernel density estimation [2], and a large mixture of Gaussians can closely represent any density. [sent-19, score-0.566]
14 In practice, a parametric mixture of Gaussians seems to fit the distribution over patches of pixels and obtains state-of-the-art restorations [3]. [sent-20, score-0.46]
15 It may not be possible to fit small image patches significantly better, but alternative models could further test this claim. [sent-21, score-0.205]
16 Moreover, competitive alternatives to mixture models might improve performance in other applications that have insufficient training data to fit mixture models well. [sent-22, score-0.647]
17 Restricted Boltzmann Machines (RBMs), which are undirected graphical models, fit samples of binary vectors from a range of sources better than mixture models [4, 5]. [sent-23, score-0.433]
18 One explanation is that RBMs form a distributed representation: many hidden units are active when explaining an observation, which is a better match to most real data than a single mixture component. [sent-24, score-0.501]
19 Another explanation is that RBMs are mixture models, but the number of components is exponential in the number of hidden units. [sent-25, score-0.403]
20 Parameter tying among components allows these more flexible models to generalize better from small numbers of examples. [sent-26, score-0.188]
21 There are two practical difficulties with RBMs: the likelihood of the model must be approximated, and samples can only be drawn from the model approximately by Gibbs sampling. [sent-27, score-0.028]
22 The Neural Autoregressive Distribution Estimator (NADE) overcomes these difficulties [5]. [sent-28, score-0.06]
23 NADE is a directed graphical model, or feed-forward neural network, initially derived as an approximation to an RBM, but then fitted as a model in its own right. [sent-29, score-0.176]
24 An autoregressive model expresses the density of a vector as an ordered product of one-dimensional distributions, each conditioned on the values of previous dimensions in the (perhaps arbitrary) ordering. [sent-31, score-0.599]
25 We use the parameter sharing previously introduced by NADE, combined with mixture density networks [6], an existing flexible approach to modeling real-valued distributions with neural networks. [sent-32, score-0.551]
26 By construction, the density of a test point under RNADE is cheap to compute, unlike RBM-based models. [sent-33, score-0.185]
27 The neural network structure provides a flexible way to alter the mean and variance of a mixture component depending on context, potentially modeling non-linear or heteroscedastic data with fewer components than unconstrained mixture models. [sent-34, score-0.815]
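To make the model description above concrete: RNADE factorizes the joint density as p(x) = ∏_{d=1}^{D} p(x_d | x_{<d}) and models each one-dimensional conditional as a K-component Gaussian mixture whose weights, means and scales are produced by a neural network that reuses NADE-style shared input weights across all D conditionals. The NumPy sketch below is ours, not the authors' code: the parameter names and shapes are illustrative assumptions, and the paper's exact activation rescaling and output parameterization may differ.

import numpy as np

def rnade_log_density(x, W, c, V_pi, b_pi, V_mu, b_mu, V_s, b_s):
    """Log-density log p(x) = sum_d log p(x_d | x_<d) under an RNADE-style model.

    Illustrative shapes: x (D,); W (H, D) shared input weights; c (H,) hidden bias;
    V_pi, V_mu, V_s (D, H, K) and b_pi, b_mu, b_s (D, K) produce the mixture
    weights, means and log-scales of the K-component Gaussian conditional for
    each dimension d. The paper's activation rescaling is omitted for brevity.
    """
    D = x.shape[0]
    a = c.copy()                                   # running pre-activation, shared across conditionals
    log_p = 0.0
    for d in range(D):
        h = 1.0 / (1.0 + np.exp(-a))               # hidden units encoding x_<d
        log_pi = V_pi[d].T @ h + b_pi[d]           # (K,) unnormalised mixture logits
        log_pi = log_pi - np.logaddexp.reduce(log_pi)      # log-softmax
        mu = V_mu[d].T @ h + b_mu[d]               # (K,) component means
        sigma = np.exp(V_s[d].T @ h + b_s[d])      # (K,) component standard deviations
        log_gauss = (-0.5 * ((x[d] - mu) / sigma) ** 2
                     - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
        log_p += np.logaddexp.reduce(log_pi + log_gauss)   # log sum_k pi_k N(x_d; mu_k, sigma_k^2)
        a = a + x[d] * W[:, d]                     # O(H) update: reuse computation for the next conditional
    return log_p

# Hypothetical usage with random parameters
D, H, K = 5, 16, 3
rng = np.random.default_rng(0)
params = dict(
    W=rng.normal(size=(H, D)), c=np.zeros(H),
    V_pi=rng.normal(size=(D, H, K)), b_pi=np.zeros((D, K)),
    V_mu=rng.normal(size=(D, H, K)), b_mu=np.zeros((D, K)),
    V_s=rng.normal(size=(D, H, K)) * 0.01, b_s=np.zeros((D, K)),
)
print(rnade_log_density(rng.normal(size=D), **params))

Because the pre-activation a is updated incrementally after each dimension, computing all D conditionals costs O(DHK) overall rather than D separate forward passes, which is what makes the exact density, and hence gradient-based maximum-likelihood training, tractable.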
wordName wordTfidf (topN-words)
[('rnade', 0.63), ('nade', 0.36), ('autoregressive', 0.301), ('mixture', 0.244), ('rbms', 0.225), ('density', 0.14), ('patches', 0.124), ('exible', 0.092), ('collections', 0.088), ('appreciated', 0.079), ('gradientbased', 0.079), ('densities', 0.076), ('graphical', 0.075), ('onedimensional', 0.073), ('deblurring', 0.073), ('explanation', 0.073), ('inpainting', 0.069), ('partement', 0.069), ('tying', 0.069), ('culties', 0.069), ('gaussians', 0.067), ('uncorrupted', 0.065), ('heteroscedastic', 0.065), ('datapoint', 0.065), ('iain', 0.065), ('hugo', 0.063), ('informatique', 0.06), ('conditionals', 0.06), ('overcomes', 0.06), ('restoration', 0.06), ('edinburgh', 0.058), ('calculates', 0.055), ('explaining', 0.055), ('informatics', 0.053), ('alter', 0.052), ('factorizes', 0.052), ('rbm', 0.052), ('expresses', 0.051), ('murray', 0.051), ('product', 0.049), ('larochelle', 0.049), ('models', 0.048), ('tractable', 0.048), ('components', 0.046), ('regularity', 0.046), ('imposed', 0.046), ('xd', 0.045), ('cheap', 0.045), ('insuf', 0.044), ('probabilistic', 0.044), ('universit', 0.043), ('perceptual', 0.043), ('tted', 0.043), ('heterogeneous', 0.042), ('estimator', 0.041), ('hidden', 0.04), ('rule', 0.039), ('denoising', 0.039), ('modeling', 0.039), ('unconstrained', 0.037), ('initially', 0.037), ('boltzmann', 0.036), ('alternatives', 0.036), ('sharing', 0.036), ('school', 0.035), ('sources', 0.035), ('units', 0.034), ('obtains', 0.033), ('image', 0.033), ('directed', 0.032), ('ordered', 0.032), ('neural', 0.032), ('pixels', 0.031), ('undirected', 0.031), ('networks', 0.031), ('calculation', 0.03), ('distributions', 0.029), ('learns', 0.029), ('distributed', 0.029), ('network', 0.029), ('likelihood', 0.028), ('machines', 0.028), ('parametric', 0.028), ('specify', 0.027), ('competitive', 0.027), ('consistently', 0.027), ('gibbs', 0.027), ('parts', 0.027), ('fewer', 0.027), ('perhaps', 0.027), ('match', 0.026), ('shared', 0.026), ('chain', 0.026), ('conditioned', 0.026), ('exponentially', 0.025), ('generic', 0.025), ('involve', 0.025), ('representation', 0.025), ('generalize', 0.025), ('modeled', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
Author: Benigno Uria, Iain Murray, Hugo Larochelle
Abstract: We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case. 1
2 0.10319473 315 nips-2013-Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs
Author: Yann Dauphin, Yoshua Bengio
Abstract: Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised. Unfortunately, this creates a computational bottleneck for unsupervised feature learning algorithms such as those based on auto-encoders and RBMs, because they involve a reconstruction step where the whole input vector is predicted from the current feature values. An algorithm was recently developed to successfully handle the case of auto-encoders, based on an importance sampling scheme stochastically selecting which input elements to actually reconstruct during training for each particular example. To generalize this idea to RBMs, we propose a stochastic ratio-matching algorithm that inherits all the computational advantages and unbiasedness of the importance sampling scheme. We show that stochastic ratio matching is a good estimator, allowing the approach to beat the state-of-the-art on two bag-of-word text classification benchmarks (20 Newsgroups and RCV1), while keeping computational cost linear in the number of non-zeros. 1
3 0.093624093 331 nips-2013-Top-Down Regularization of Deep Belief Networks
Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim
Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results. 1
4 0.07458546 36 nips-2013-Annealing between distributions by averaging moments
Author: Roger B. Grosse, Chris J. Maddison, Ruslan Salakhutdinov
Abstract: Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and the intractable target distribution. The near-universal practice is to use geometric averages of the initial and target distributions, but alternative paths can perform substantially better. We present a novel sequence of intermediate distributions for exponential families defined by averaging the moments of the initial and target distributions. We analyze the asymptotic performance of both the geometric and moment averages paths and derive an asymptotically optimal piecewise linear schedule. AIS with moment averaging performs well empirically at estimating partition functions of restricted Boltzmann machines (RBMs), which form the building blocks of many deep learning models. 1
5 0.072705559 221 nips-2013-On the Expressive Power of Restricted Boltzmann Machines
Author: James Martens, Arkadev Chattopadhya, Toni Pitassi, Richard Zemel
Abstract: This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)? We characterize the RBM’s unnormalized log-likelihood function as a type of neural network, and through a series of simulation results relate these networks to ones whose representational properties are better understood. We show the surprising result that RBMs can efficiently capture any distribution whose density depends on the number of 1’s in their input. We also provide the first known example of a particular type of distribution that provably cannot be efficiently represented by an RBM, assuming a realistic exponential upper bound on the weights. By formally demonstrating that a relatively simple distribution cannot be represented efficiently by an RBM our results provide a new rigorous justification for the use of potentially more expressive generative models, such as deeper ones. 1
6 0.072338425 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
7 0.069493189 344 nips-2013-Using multiple samples to learn mixture models
8 0.066636443 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
9 0.063263677 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
10 0.063238099 160 nips-2013-Learning Stochastic Feedforward Neural Networks
11 0.062495131 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
12 0.061991617 167 nips-2013-Learning the Local Statistics of Optical Flow
13 0.058806598 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits
14 0.056463916 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
15 0.055148188 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
16 0.049550787 5 nips-2013-A Deep Architecture for Matching Short Texts
17 0.04881981 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
18 0.04774864 251 nips-2013-Predicting Parameters in Deep Learning
19 0.046146628 75 nips-2013-Convex Two-Layer Modeling
20 0.046133988 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
topicId topicWeight
[(0, 0.105), (1, 0.06), (2, -0.056), (3, -0.028), (4, 0.04), (5, 0.026), (6, 0.052), (7, 0.03), (8, 0.03), (9, -0.058), (10, 0.039), (11, 0.015), (12, -0.045), (13, 0.051), (14, 0.048), (15, 0.065), (16, 0.016), (17, -0.093), (18, -0.047), (19, -0.011), (20, -0.007), (21, 0.062), (22, 0.06), (23, -0.03), (24, 0.0), (25, -0.015), (26, -0.103), (27, 0.006), (28, -0.063), (29, -0.008), (30, 0.055), (31, 0.073), (32, 0.028), (33, -0.031), (34, -0.016), (35, -0.021), (36, 0.065), (37, 0.073), (38, -0.09), (39, 0.112), (40, -0.033), (41, 0.003), (42, -0.076), (43, -0.052), (44, -0.036), (45, -0.08), (46, -0.02), (47, 0.023), (48, 0.124), (49, -0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.90935254 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
Author: Benigno Uria, Iain Murray, Hugo Larochelle
Abstract: We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case. 1
2 0.60911375 36 nips-2013-Annealing between distributions by averaging moments
Author: Roger B. Grosse, Chris J. Maddison, Ruslan Salakhutdinov
Abstract: Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and the intractable target distribution. The near-universal practice is to use geometric averages of the initial and target distributions, but alternative paths can perform substantially better. We present a novel sequence of intermediate distributions for exponential families defined by averaging the moments of the initial and target distributions. We analyze the asymptotic performance of both the geometric and moment averages paths and derive an asymptotically optimal piecewise linear schedule. AIS with moment averaging performs well empirically at estimating partition functions of restricted Boltzmann machines (RBMs), which form the building blocks of many deep learning models. 1
3 0.55937159 315 nips-2013-Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs
Author: Yann Dauphin, Yoshua Bengio
Abstract: Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised. Unfortunately, this creates a computational bottleneck for unsupervised feature learning algorithms such as those based on auto-encoders and RBMs, because they involve a reconstruction step where the whole input vector is predicted from the current feature values. An algorithm was recently developed to successfully handle the case of auto-encoders, based on an importance sampling scheme stochastically selecting which input elements to actually reconstruct during training for each particular example. To generalize this idea to RBMs, we propose a stochastic ratio-matching algorithm that inherits all the computational advantages and unbiasedness of the importance sampling scheme. We show that stochastic ratio matching is a good estimator, allowing the approach to beat the state-of-the-art on two bag-of-word text classification benchmarks (20 Newsgroups and RCV1), while keeping computational cost linear in the number of non-zeros. 1
4 0.52242458 221 nips-2013-On the Expressive Power of Restricted Boltzmann Machines
Author: James Martens, Arkadev Chattopadhya, Toni Pitassi, Richard Zemel
Abstract: This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)? We characterize the RBM’s unnormalized log-likelihood function as a type of neural network, and through a series of simulation results relate these networks to ones whose representational properties are better understood. We show the surprising result that RBMs can efficiently capture any distribution whose density depends on the number of 1’s in their input. We also provide the first known example of a particular type of distribution that provably cannot be efficiently represented by an RBM, assuming a realistic exponential upper bound on the weights. By formally demonstrating that a relatively simple distribution cannot be represented efficiently by an RBM our results provide a new rigorous justification for the use of potentially more expressive generative models, such as deeper ones. 1
5 0.51835299 167 nips-2013-Learning the Local Statistics of Optical Flow
Author: Dan Rosenbaum, Daniel Zoran, Yair Weiss
Abstract: Motivated by recent progress in natural image statistics, we use newly available datasets with ground truth optical flow to learn the local statistics of optical flow and compare the learned models to prior models assumed by computer vision researchers. We find that a Gaussian mixture model (GMM) with 64 components provides a significantly better model for local flow statistics when compared to commonly used models. We investigate the source of the GMM’s success and show it is related to an explicit representation of flow boundaries. We also learn a model that jointly models the local intensity pattern and the local optical flow. In accordance with the assumptions often made in computer vision, the model learns that flow boundaries are more likely at intensity boundaries. However, when evaluated on a large dataset, this dependency is very weak and the benefit of conditioning flow estimation on the local intensity pattern is marginal. 1
6 0.51834625 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
7 0.50097954 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
8 0.48215881 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
9 0.47050029 331 nips-2013-Top-Down Regularization of Deep Belief Networks
10 0.45980051 160 nips-2013-Learning Stochastic Feedforward Neural Networks
11 0.45314223 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
12 0.45191932 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
13 0.44450906 5 nips-2013-A Deep Architecture for Matching Short Texts
14 0.42384899 344 nips-2013-Using multiple samples to learn mixture models
15 0.4234961 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
16 0.42107895 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
17 0.41413265 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models
18 0.403413 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
19 0.40190026 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
20 0.38911662 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
topicId topicWeight
[(16, 0.034), (33, 0.192), (34, 0.109), (41, 0.028), (49, 0.083), (56, 0.066), (70, 0.019), (75, 0.269), (85, 0.042), (89, 0.024), (93, 0.027)]
simIndex simValue paperId paperTitle
1 0.83023334 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
Author: Stefan Mathe, Cristian Sminchisescu
Abstract: Human eye movements provide a rich source of information into the human visual information processing. The complex interplay between the task and the visual stimulus is believed to determine human eye movements, yet it is not fully understood, making it difficult to develop reliable eye movement prediction systems. Our work makes three contributions towards addressing this problem. First, we complement one of the largest and most challenging static computer vision datasets, VOC 2012 Actions, with human eye movement recordings collected under the primary task constraint of action recognition, as well as, separately, for context recognition, in order to analyze the impact of different tasks. Our dataset is unique among the eyetracking datasets of still images in terms of large scale (over 1 million fixations recorded in 9157 images) and different task controls. Second, we propose Markov models to automatically discover areas of interest (AOI) and introduce novel sequential consistency metrics based on them. Our methods can automatically determine the number, the spatial support and the transitions between AOIs, in addition to their locations. Based on such encodings, we quantitatively show that given unconstrained real-world stimuli, task instructions have significant influence on the human visual search patterns and are stable across subjects. Finally, we leverage powerful machine learning techniques and computer vision features in order to learn task-sensitive reward functions from eye movement data within models that allow to effectively predict the human visual search patterns based on inverse optimal control. The methodology achieves state of the art scanpath modeling results. 1
same-paper 2 0.79306406 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
Author: Benigno Uria, Iain Murray, Hugo Larochelle
Abstract: We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case. 1
3 0.78786939 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
Author: Tuo Zhao, Han Liu
Abstract: We propose a semiparametric method for estimating the sparse precision matrix of a high-dimensional elliptical distribution. The proposed method calibrates regularizations when estimating each column of the precision matrix. Thus it is not only asymptotically tuning-free, but also achieves improved finite-sample performance. Theoretically, we prove that the proposed method achieves the parametric rates of convergence in both parameter estimation and model selection. We present numerical results on both simulated and real datasets to support our theory and illustrate the effectiveness of the proposed estimator. 1
4 0.7738499 347 nips-2013-Variational Planning for Graph-based MDPs
Author: Qiang Cheng, Qiang Liu, Feng Chen, Alex Ihler
Abstract: Markov Decision Processes (MDPs) are extremely useful for modeling and solving sequential decision making problems. Graph-based MDPs provide a compact representation for MDPs with large numbers of random variables. However, the complexity of exactly solving a graph-based MDP usually grows exponentially in the number of variables, which limits their application. We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. In particular, by exploiting the graph structure of graph-based MDPs, we propose a factored variational value iteration algorithm in which the value function is first approximated by the multiplication of local-scope value functions, then solved by minimizing a Kullback-Leibler (KL) divergence. The KL divergence is optimized using the belief propagation algorithm, with complexity exponential in only the cluster size of the graph. Experimental comparison on different models shows that our algorithm outperforms existing approximation algorithms at finding good policies. 1
5 0.77131802 166 nips-2013-Learning invariant representations and applications to face verification
Author: Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Abstract: One approach to computer object recognition and modeling the brain’s ventral stream involves unsupervised learning of representations that are invariant to common transformations. However, applications of these ideas have usually been limited to 2D affine transformations, e.g., translation and scaling, since they are easiest to solve via convolution. In accord with a recent theory of transformation-invariance [1], we propose a model that, while capturing other common convolutional networks as special cases, can also be used with arbitrary identity-preserving transformations. The model’s wiring can be learned from videos of transforming objects—or any other grouping of images into sets by their depicted object. Through a series of successively more complex empirical tests, we study the invariance/discriminability properties of this model with respect to different transformations. First, we empirically confirm theoretical predictions (from [1]) for the case of 2D affine transformations. Next, we apply the model to non-affine transformations; as expected, it performs well on face verification tasks requiring invariance to the relatively smooth transformations of 3D rotation-in-depth and changes in illumination direction. Surprisingly, it can also tolerate clutter “transformations” which map an image of a face on one background to an image of the same face on a different background. Motivated by these empirical findings, we tested the same model on face verification benchmark tasks from the computer vision literature: Labeled Faces in the Wild, PubFig [2, 3, 4] and a new dataset we gathered—achieving strong performance in these highly unconstrained cases as well. 1
7 0.6663667 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
8 0.66550779 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization
9 0.66366881 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
10 0.66339904 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
11 0.6600861 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes
12 0.66005969 301 nips-2013-Sparse Additive Text Models with Low Rank Background
13 0.65743124 294 nips-2013-Similarity Component Analysis
14 0.65699339 331 nips-2013-Top-Down Regularization of Deep Belief Networks
15 0.65603113 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models
16 0.65455639 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
17 0.65354609 64 nips-2013-Compete to Compute
18 0.65345335 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
19 0.65245652 173 nips-2013-Least Informative Dimensions
20 0.65176386 183 nips-2013-Mapping paradigm ontologies to and from the brain