fast_ml fast_ml-2013 fast_ml-2013-46 knowledge-graph by maker-knowledge-mining

46 fast ml-2013-12-07-13 NIPS papers that caught our eye


meta info for this blog

Source: html

Introduction: Recently Rob Zinkov published his selection of interesting-looking NIPS papers. Inspired by this, we list some more. Rob seems to like Bayesian stuff; we’re more into neural networks. If you feel like browsing, Andrej Karpathy has a page with all NIPS 2013 papers. They are categorized by topics discovered by running LDA. When you see an interesting paper, you can discover ones ranked as similar by TF-IDF. Here’s what we found. Understanding Dropout (Pierre Baldi, Peter J. Sadowski): Dropout is a relatively new algorithm for training neural networks which relies on stochastically dropping out neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characte
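
The dropout mechanism described above translates into very little code. Here is a minimal NumPy sketch of the basic scheme (drop units at random during training, scale activations at test time); the layer shape, the 0.5 drop probability and the random seed are our own illustrative choices, not anything taken from the paper.

import numpy as np

rng = np.random.RandomState(0)

def dropout_forward(h, p_drop=0.5, train=True):
    # During training each unit is kept with probability 1 - p_drop;
    # at test time activations are scaled by the keep probability so
    # that expected values match (the averaging property discussed above).
    if train:
        mask = rng.binomial(1, 1.0 - p_drop, size=h.shape)
        return h * mask
    return h * (1.0 - p_drop)

# toy batch: 4 examples, 10 hidden activations each
h = rng.rand(4, 10)
h_train = dropout_forward(h, train=True)    # units randomly zeroed out
h_test = dropout_forward(h, train=False)    # deterministic, scaled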


Summary: the most important sentences generated by the tfidf model

Format of each entry: sentIndex sentText [sent-sentNum, score-sentScore]

1 We show that they reach state-of-the-art performance for recurrent networks in character-level language modelling when trained with simple stochastic gradient descent. [sent-19, score-0.617]

2 RNADE: The real-valued neural autoregressive density-estimator (Benigno Uria, Iain Murray, Hugo Larochelle): We introduce RNADE, a new model for joint density estimation of real-valued vectors. [sent-21, score-0.448]

3 Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. [sent-22, score-0.491] (this decomposition is spelled out as a formula after this list)

4 We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case. [sent-25, score-0.743]

5 We describe experiments on real-world datasets which sometimes show improvements of several orders of magnitude over the classical algorithm. [sent-35, score-0.515]

6 The MP-DBM can be seen as a single probabilistic model trained to maximize a variational approximation to the generalized pseudolikelihood, or as a family of recurrent nets that share parameters and approximately solve different inference problems. [sent-39, score-0.654]

7 Using Dirichlet process mixture models for image clustering and denoising, we demonstrate major improvements in robustness and accuracy. [sent-48, score-0.53]

8 We propose a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation. [sent-51, score-0.597]

9 We also investigate several model types and find that the embeddings learned by the simpler models perform at least as well as those learned by the more complex ones. [sent-55, score-0.713]

10 Learning Stochastic Feedforward Neural Networks (Yichuan Tang, Ruslan Salakhutdinov): Multilayer perceptrons (MLPs) or neural networks are popular models used for nonlinear regression and classification tasks. [sent-56, score-0.510]

11 As regressors, MLPs model the conditional distribution of the predictor variables Y given the input variables X. [sent-57, score-0.464]

12 In this paper, we propose a stochastic feedforward network with hidden layers having both deterministic and stochastic variables. [sent-64, score-0.553]

13 We demonstrate the superiority of our model to conditional Restricted Boltzmann Machines and Mixture Density Networks on synthetic datasets and on modeling facial expressions. [sent-66, score-0.451]

14 Corrado, Jeff Dean: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. [sent-69, score-0.477]

15 We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude. [sent-82, score-0.524]

16 Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. [sent-84, score-0.66]

17 Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. [sent-85, score-0.494]

18 Approximate inference in latent Gaussian-Markov models from continuous time observations (Botond Cseke, Manfred Opper, Guido Sanguinetti): We propose an approximate inference algorithm for continuous time Gaussian-Markov process models with both discrete and continuous time likelihoods. [sent-88, score-1.821]

19 We show that the continuous time limit of the expectation propagation algorithm exists and results in a hybrid fixed point iteration consisting of (1) expectation propagation updates for the discrete time terms and (2) variational updates for the continuous time term. [sent-89, score-0.685]

20 This approach extends the classical Kalman-Bucy smoothing procedure to non-Gaussian observations, enabling continuous-time inference in a variety of models, including spiking neuronal models (state-space models with point process observations) and box likelihood models. [sent-91, score-0.805]
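
Sentence 3 above compresses the whole RNADE model into one clause; spelled out, the product-of-conditionals decomposition it refers to reads as follows (the notation and the number of mixture components K are ours, not copied from the paper):

p(\mathbf{x}) = \prod_{d=1}^{D} p(x_d \mid \mathbf{x}_{<d}), \qquad
p(x_d \mid \mathbf{x}_{<d}) = \sum_{k=1}^{K} \pi_{d,k}(\mathbf{x}_{<d}) \, \mathcal{N}\!\left( x_d \mid \mu_{d,k}(\mathbf{x}_{<d}), \sigma_{d,k}^{2}(\mathbf{x}_{<d}) \right)

with the mixture weights, means and variances of each one-dimensional conditional produced by a mixture density network whose parameters are shared across the D dimensions.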


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('models', 0.234), ('inference', 0.233), ('stochastic', 0.183), ('conditional', 0.156), ('variational', 0.156), ('algorithm', 0.149), ('several', 0.145), ('continuous', 0.137), ('recurrent', 0.137), ('embeddings', 0.13), ('propose', 0.125), ('rnade', 0.125), ('latent', 0.124), ('introduce', 0.114), ('mixture', 0.114), ('word', 0.108), ('show', 0.106), ('density', 0.104), ('distribution', 0.104), ('classical', 0.104), ('local', 0.104), ('networks', 0.103), ('dropout', 0.099), ('approaches', 0.093), ('outperforms', 0.093), ('improvements', 0.091), ('demonstrate', 0.091), ('representations', 0.088), ('performance', 0.088), ('regression', 0.087), ('neural', 0.086), ('significant', 0.078), ('observations', 0.078), ('capture', 0.078), ('estimation', 0.078), ('hierarchy', 0.078), ('views', 0.078), ('algorithms', 0.069), ('datasets', 0.069), ('learned', 0.069), ('variables', 0.069), ('boltzmann', 0.069), ('modeling', 0.069), ('model', 0.066), ('air', 0.062), ('approximation', 0.062), ('cca', 0.062), ('dbm', 0.062), ('deterministic', 0.062), ('efficiently', 0.062)]
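
These are the kind of weights behind the similarity lists that follow. As a rough sketch of how such a list can be produced (scikit-learn is used here purely for illustration; the actual pipeline behind this page may differ), one can vectorize the posts with TF-IDF and rank them by cosine similarity:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# toy stand-ins for raw blog post texts, with the current post at index 0
posts = ["dropout neural networks nips",
         "kaggle competition deep learning",
         "vowpal wabbit linear models"]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)          # one sparse TF-IDF row per post

sims = cosine_similarity(X[0], X).ravel()    # similarity of post 0 to every post
ranking = sims.argsort()[::-1]               # most similar first; index 0 is the post itself
print(list(zip(ranking, sims[ranking])))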

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

2 0.18242559 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit . We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v

3 0.12533799 27 fast ml-2013-05-01-Deep learning made easy

Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to the ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple of benchmarks for this competition and the best one is unusually hard to beat - only less than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available at Github, and has one or two very appealing properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers. Geoff Hinton from Toronto talks about two ends

4 0.12117848 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

Introduction: Last time we explored dimensionality reduction in practice using Gensim’s LSI and LDA. Now, having spent some time researching the subject matter, we will give an overview of other options. UPDATE: We now consider the topic quite irrelevant, because sparse high-dimensional data is precisely where linear models shine. See Amazon aspires to automate access control, Predicting advertised salaries and Predicting closed questions on Stack Overflow. And the few most popular methods are: LSI/LSA - a multinomial PCA; LDA - Latent Dirichlet Allocation; matrix factorization, in particular non-negative variants: NMF; ICA, or Independent Components Analysis; mixtures of Bernoullis; stacked RBMs; correlated topic models, an extension of LDA. We tried the first two before. As regards matrix factorization, you do the same stuff as with movie recommendations (think Netflix challenge). The difference is, now all the matrix elements are known and we are only interested in

5 0.11766096 22 fast ml-2013-03-07-Choosing a machine learning algorithm

Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting, however we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is training and re-training over and over, even if automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor. *By “everything else” we mean “everything popular”, mostly things like

6 0.11763088 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

7 0.11550124 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

8 0.097853512 18 fast ml-2013-01-17-A very fast denoising autoencoder

9 0.097306192 58 fast ml-2014-04-12-Deep learning these days

10 0.088751696 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

11 0.087713316 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition

12 0.084529556 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

13 0.08386416 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

14 0.080602266 20 fast ml-2013-02-18-Predicting advertised salaries

15 0.077678643 40 fast ml-2013-10-06-Pylearn2 in practice

16 0.076147266 19 fast ml-2013-02-07-The secret of the big guys

17 0.073573276 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

18 0.072387785 43 fast ml-2013-11-02-Maxing out the digits

19 0.072376691 31 fast ml-2013-06-19-Go non-linear with Vowpal Wabbit

20 0.071752332 15 fast ml-2013-01-07-Machine learning courses online


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.314), (1, 0.148), (2, 0.159), (3, 0.075), (4, 0.128), (5, 0.05), (6, -0.03), (7, -0.102), (8, -0.162), (9, -0.006), (10, -0.002), (11, -0.088), (12, 0.127), (13, 0.004), (14, -0.144), (15, -0.07), (16, 0.016), (17, -0.018), (18, -0.297), (19, -0.091), (20, 0.04), (21, -0.063), (22, -0.176), (23, -0.046), (24, 0.147), (25, -0.09), (26, -0.121), (27, -0.013), (28, -0.133), (29, -0.159), (30, 0.047), (31, 0.183), (32, 0.169), (33, -0.182), (34, -0.083), (35, -0.202), (36, -0.175), (37, 0.11), (38, 0.049), (39, 0.015), (40, 0.103), (41, -0.262), (42, 0.187), (43, 0.167), (44, -0.026), (45, 0.22), (46, 0.148), (47, -0.142), (48, 0.12), (49, 0.117)]
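
The topic weights above come from an LSI decomposition. A minimal gensim sketch of building such a model and querying it for similar posts (the token lists, the topic count and the variable names are made up for illustration; the page's own pipeline may differ) could look like this:

from gensim import corpora, models, similarities

# toy stand-ins: one token list per blog post (real text would be preprocessed)
tokenized_posts = [["dropout", "neural", "networks", "nips"],
                   ["kaggle", "competition", "deep", "learning"],
                   ["vowpal", "wabbit", "linear", "models"]]

dictionary = corpora.Dictionary(tokenized_posts)
corpus = [dictionary.doc2bow(tokens) for tokens in tokenized_posts]

lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)   # topic count is arbitrary here
index = similarities.MatrixSimilarity(lsi[corpus], num_features=lsi.num_topics)

query = lsi[corpus[0]]                                        # this post in LSI topic space
print(sorted(enumerate(index[query]), key=lambda x: -x[1]))   # (blog index, similarity), best first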

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97766733 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

2 0.38822019 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

3 0.22434537 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

4 0.22082917 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

Introduction: We continue with the CIFAR-10-based competition at Kaggle to get to know DropConnect. It’s supposed to be an improvement over dropout. And dropout is certainly one of the bigger steps forward in neural network development. Is DropConnect really better than dropout? TL;DR DropConnect seems to offer results similar to dropout. State of the art scores reported in the paper come from model ensembling. Dropout Dropout, by Hinton et al., is perhaps the biggest invention in the field of neural networks in recent years. It addresses the main problem in machine learning, that is overfitting. It does so by “dropping out” some unit activations in a given layer, that is setting them to zero. Thus it prevents co-adaptation of units and can also be seen as a method of ensembling many networks sharing the same weights. For each training example a different set of units to drop is randomly chosen. The idea has a biological inspiration. When a child is conceived, it receives half its genes f

5 0.21141857 22 fast ml-2013-03-07-Choosing a machine learning algorithm

6 0.19881062 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

7 0.19215406 27 fast ml-2013-05-01-Deep learning made easy

8 0.17875049 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

9 0.169101 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition

10 0.16619147 19 fast ml-2013-02-07-The secret of the big guys

11 0.16513145 18 fast ml-2013-01-17-A very fast denoising autoencoder

12 0.15412594 20 fast ml-2013-02-18-Predicting advertised salaries

13 0.14861906 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

14 0.14713685 15 fast ml-2013-01-07-Machine learning courses online

15 0.14499006 43 fast ml-2013-11-02-Maxing out the digits

16 0.14210038 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

17 0.13411681 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

18 0.13253544 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python

19 0.1325154 31 fast ml-2013-06-19-Go non-linear with Vowpal Wabbit

20 0.12739153 40 fast ml-2013-10-06-Pylearn2 in practice


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(6, 0.015), (22, 0.014), (26, 0.028), (30, 0.017), (31, 0.062), (35, 0.033), (55, 0.022), (69, 0.102), (71, 0.051), (72, 0.013), (75, 0.01), (78, 0.035), (79, 0.025), (84, 0.027), (96, 0.416), (99, 0.03)]
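
The LDA list below is built the same way, only from per-post topic distributions like the one shown above. A short gensim sketch (again with made-up tokens and an arbitrary topic count) differs from the LSI sketch mainly in the model class:

from gensim import corpora, models

tokenized_posts = [["dropout", "neural", "networks", "nips"],
                   ["kaggle", "competition", "deep", "learning"]]
dictionary = corpora.Dictionary(tokenized_posts)
corpus = [dictionary.doc2bow(tokens) for tokens in tokenized_posts]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10, random_state=0)
print(lda[corpus[0]])   # [(topicId, topicWeight), ...] for the first post;
                        # ranking similar posts then proceeds as in the LSI sketch above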

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.92423582 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

2 0.84266526 36 fast ml-2013-08-23-A bag of words and a nice little network

Introduction: In this installment we will demonstrate how to turn text into numbers by a method known as a bag of words. We will also show how to train a simple neural network on the resulting sparse data for binary classification. We will achieve the first feat with Python and scikit-learn, the second one with sparsenn. The example data comes from a Kaggle competition, specifically Stumbleupon Evergreen. The subject of the contest is to classify webpage content as either evergreen or not. The train set consists of about 7k examples. For each example we have a title and a body for a webpage and then about 20 numeric features describing the content (usually, because some tidbits are missing). We will use text only: extract the body and turn it into a bag of words in libsvm format. In case you don’t know, libsvm is pretty popular for storing sparse data as text. It looks like this: 1 94:1 298:1 474:1 492:1 1213:1 1536:1 (...) First goes the label and then indexes of non-zero features. It’
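
Since the snippet above walks through turning page bodies into a bag of words and writing them out in libsvm format, here is a small scikit-learn sketch of that conversion; the toy texts, labels and the output filename are invented for the example, and the original post may have done the details differently.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import dump_svmlight_file

# stand-ins for the webpage bodies and the binary evergreen labels
bodies = ["recipe for a classic chocolate cake",
          "breaking news about last night's game"]
labels = [1, 0]

vectorizer = CountVectorizer(binary=True)   # plain bag of words with 0/1 counts
X = vectorizer.fit_transform(bodies)        # sparse matrix, one row per page

# writes "label index:value ..." lines, the libsvm format quoted above
dump_svmlight_file(X, labels, "train.libsvm", zero_based=False)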

3 0.32944849 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

4 0.30115756 19 fast ml-2013-02-07-The secret of the big guys

Introduction: Are you interested in linear models, or K-means clustering? Probably not much. These are very basic techniques with fancier alternatives. But here’s the bomb: when you combine those two methods for supervised learning, you can get better results than from a random forest. And maybe even faster. We have already written about Vowpal Wabbit, a fast linear learner from Yahoo/Microsoft. Google’s response (or at least, a Google guy’s response) seems to be Sofia-ML. The software consists of two parts: a linear learner and K-means clustering. We found Sofia a while ago and wondered about K-means: who needs K-means? Here’s a clue: This package can be used for learning cluster centers (…) and for mapping a given data set onto a new feature space based on the learned cluster centers. Our eyes only opened when we read a certain paper, namely An Analysis of Single-Layer Networks in Unsupervised Feature Learning (PDF). The paper, by Coates, Lee and Ng, is about object recogni
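
The "mapping a given data set onto a new feature space based on the learned cluster centers" mentioned above has a direct scikit-learn analogue: KMeans.transform returns the distance of each example to every learned center. The sketch below uses random stand-in data, an arbitrary cluster count, and a plain logistic regression in place of Sofia-ML's linear learner.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(200, 10)                       # stand-in features
y = (X[:, 0] + X[:, 1] > 1).astype(int)     # stand-in binary labels

kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X)
X_mapped = kmeans.transform(X)              # 200 x 20: distances to the learned cluster centers

clf = LogisticRegression(max_iter=1000).fit(X_mapped, y)   # a linear model on the new features
print(clf.score(X_mapped, y))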

5 0.29524499 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

6 0.28931448 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

7 0.28913406 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

8 0.28025681 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network

9 0.27918053 18 fast ml-2013-01-17-A very fast denoising autoencoder

10 0.27508748 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

11 0.27370617 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python

12 0.27246234 9 fast ml-2012-10-25-So you want to work for Facebook

13 0.27183044 13 fast ml-2012-12-27-Spearmint with a random forest

14 0.27036151 27 fast ml-2013-05-01-Deep learning made easy

15 0.26820156 40 fast ml-2013-10-06-Pylearn2 in practice

16 0.26731321 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

17 0.26586741 16 fast ml-2013-01-12-Intro to random forests

18 0.26263463 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

19 0.26244965 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

20 0.26165178 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction