nips nips2013 nips2013-85 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen
Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. Most recommender systems rely on collaborative filtering. However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Deep content-based music recommendation Aäron van den Oord, Sander Dieleman, Benjamin Schrauwen, Electronics and Information Systems department (ELIS), Ghent University {aaron. [sent-1, score-0.575]
2 Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. [sent-5, score-1.003]
3 However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. [sent-7, score-0.194]
4 In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. [sent-8, score-1.253]
5 We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. [sent-9, score-0.556]
6 We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. [sent-10, score-0.796]
7 We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach. [sent-11, score-0.78]
8 1 Introduction In recent years, the music industry has shifted more and more towards digital distribution through online music stores and streaming services such as iTunes, Spotify, Grooveshark and Google Play. [sent-12, score-0.848]
9 As a result, automatic music recommendation has become an increasingly relevant problem: it allows listeners to discover new music that matches their tastes, and enables online music stores to target their wares to the right audience. [sent-13, score-1.387]
10 Although recommender systems have been studied extensively, the problem of music recommendation in particular is complicated by the sheer variety of different styles and genres, as well as social and geographic factors that influence listener preferences. [sent-14, score-0.672]
11 automatic playlist generation), and it disregards the fact that the repertoire of an artist is rarely homogeneous: listeners may enjoy particular songs more than others. [sent-18, score-0.4]
12 Many recommender systems rely on usage patterns: the combinations of items that users have consumed or rated provide information about the users’ preferences, and how the items relate to each other. [sent-19, score-0.431]
13 The consensus is that collaborative filtering will generally outperform content-based recommendation [1]. [sent-22, score-0.193]
14 Additionally, items that are only of interest to a niche audience are more difficult to recommend because usage data is scarce. [sent-25, score-0.208]
15 The factors seem to discriminate between different styles, such as indie rock, electronic music and classic rock. [sent-27, score-0.482]
16 1 Content-based music recommendation Music can be recommended based on available metadata: information such as the artist, album and year of release is usually known. [sent-31, score-0.539]
17 For example, recommending songs by artists that the user is known to enjoy is not particularly useful. [sent-33, score-0.601]
18 One can also attempt to recommend music that is perceptually similar to what the user has previously listened to, by measuring the similarity between audio signals [3, 4]. [sent-34, score-0.988]
19 Such metrics are often defined ad hoc, based on prior knowledge about music audio, and as a result they are not necessarily optimal for the task of music recommendation. [sent-36, score-0.848]
20 The former methods rely on a similarity measure between users or items: they recommend items consumed by other users with similar preferences, or similar items to the ones that the user has already consumed. [sent-40, score-0.461]
21 Modelbased methods on the other hand attempt to model latent characteristics of the users and items, which are usually represented as vectors of latent factors. [sent-41, score-0.302]
22 3 The semantic gap in music Latent factor vectors form a compact description of the different facets of users’ tastes, and the corresponding characteristics of the items. [sent-44, score-0.465]
23 To demonstrate this, we computed latent factors for a small set of usage data, and listed some artists whose songs have very positive and very negative values for each factor in Table 1. [sent-45, score-0.79]
24 Therefore it would be useful to be able to predict them from music audio content. [sent-49, score-0.789]
25 There is a large semantic gap between the characteristics of a song that affect user preference, and the corresponding audio signal. [sent-50, score-0.535]
26 Extracting high-level properties such as genre, mood, instrumentation and lyrical themes from audio signals requires powerful models that are capable of capturing the complex hierarchical structure of music. [sent-51, score-0.379]
27 Additionally, some properties are impossible to obtain from audio signals alone, such as the popularity of the artist, their reputation and their location. [sent-52, score-0.379]
28 Researchers in the domain of music information retrieval (MIR) concern themselves with extracting these high-level properties from music. [sent-53, score-0.486]
29 They have grown to rely on a particular set of engineered audio features, such as mel-frequency cepstral coefficients (MFCCs), which are used as input to simple classifiers or regressors, such as SVMs and linear regression [9]. [sent-54, score-0.329]
30 2 In this paper, we strive to bridge the semantic gap in music by training deep convolutional neural networks to predict latent factors from music audio. [sent-56, score-1.237]
31 We evaluate our approach on an industrial-scale dataset with audio excerpts of over 380,000 songs, and compare it with a more conventional approach using a bag-of-words feature representation for each song. [sent-57, score-0.329]
32 We assess to what extent it is possible to extract characteristics that affect user preference directly from audio signals, and evaluate the predictions from our models in a music recommendation setting. [sent-58, score-1.006]
33 2 The dataset The Million Song Dataset (MSD) [13] is a collection of metadata and precomputed audio features for one million contemporary songs. [sent-59, score-0.379]
34 Several other datasets linked to the MSD are also available, featuring lyrics, cover songs, tags and user listening data. [sent-60, score-0.325]
35 This makes the dataset suitable for a wide range of different music information retrieval tasks. [sent-61, score-0.486]
36 Two linked datasets are of interest for our experiments: • The Echo Nest Taste Profile Subset provides play counts for over 380,000 songs in the MSD, gathered from 1 million users. [sent-62, score-0.418]
37 Traditionally, research in music information retrieval (MIR) on large-scale datasets was limited to industry, because large collections of music audio cannot be published easily due to licensing issues. [sent-66, score-1.239]
38 Unfortunately, the audio features provided with the MSD are of limited use, and the process by which they were obtained is not very well documented. [sent-68, score-0.329]
39 [15], but the absence of raw audio data, or at least a mid-level representation, is still an issue. [sent-70, score-0.329]
40 However, we were able to obtain 29-second audio clips for over 99% of the dataset from 7digital. [sent-71, score-0.329]
41 Due to its size, the MSD allows for the music recommendation problem to be studied in a more realistic setting than was previously possible. [sent-73, score-0.539]
42 We know how many times the users have listened to each of the songs in the dataset, but they have not explicitly rated them. [sent-76, score-0.488]
43 However, we can assume that users will probably listen to songs more often if they enjoy them. [sent-77, score-0.434]
44 [16], to learn latent factor representations of all users and items in the Taste Profile Subset. [sent-81, score-0.304]
45 For each user-item pair, we define a preference variable p_ui and a confidence variable c_ui, where I(x) is the indicator function and α and ε are hyperparameters: p_ui = I(r_ui > 0), (1) and c_ui = 1 + α log(1 + ε⁻¹ r_ui). (2) [sent-84, score-0.224]
46 The preference variable indicates whether user u has ever listened to song i. [sent-85, score-0.306]
47 It is a function of the play count, because songs with higher play counts are more likely to be preferred. [sent-88, score-0.368]
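As a concrete sketch of equations (1) and (2), the mapping from raw play counts to preference and confidence variables takes only a few lines; the α and ε values below are illustrative defaults, not the ones tuned in the paper.

```python
import numpy as np

def preference_confidence(play_counts, alpha=2.0, eps=1e-6):
    """Map raw play counts r_ui to WMF preference/confidence variables.

    p_ui = I(r_ui > 0)
    c_ui = 1 + alpha * log(1 + r_ui / eps)
    """
    r = np.asarray(play_counts, dtype=float)
    p = (r > 0).astype(float)                 # binary preference (eq. 1)
    c = 1.0 + alpha * np.log1p(r / eps)       # confidence grows with plays (eq. 2)
    return p, c

# Example: three songs with play counts 0, 1 and 10 for one user.
p, c = preference_confidence([0, 1, 10])
```

Note how an unplayed song still gets the minimum confidence of 1, while higher play counts only increase the confidence, never the binary preference itself.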
48 The WMF objective function is given by: min_{x*, y*} Σ_{u,i} c_ui (p_ui − x_u^T y_i)² + λ (Σ_u ||x_u||² + Σ_i ||y_i||²), (3) where λ is a regularization parameter, x_u is the latent factor vector for user u, and y_i is the latent factor vector for song i. [sent-90, score-0.564]
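Objective (3) is typically minimized by alternating least squares: with the item vectors fixed, each user vector has a closed-form solution, and vice versa. The sketch below shows those standard updates; the dense confidence matrix and tiny problem sizes are purely illustrative, since practical implementations exploit the sparsity of the play-count matrix.

```python
import numpy as np

def wmf_als(P, C, n_factors=2, reg=0.1, n_iters=20, seed=0):
    """Alternating least squares for the WMF objective (3).

    P: (n_users, n_items) binary preference matrix.
    C: (n_users, n_items) confidence matrix.
    Returns user factors X and item factors Y.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = P.shape
    X = 0.1 * rng.standard_normal((n_users, n_factors))
    Y = 0.1 * rng.standard_normal((n_items, n_factors))
    I = reg * np.eye(n_factors)
    for _ in range(n_iters):
        for u in range(n_users):           # solve for x_u with Y fixed
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):           # solve for y_i with X fixed
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

# Toy example: two users, three songs.
P = np.array([[1., 0., 1.],
              [0., 1., 0.]])
C = 1 + 5 * P                              # higher confidence on observed plays
X, Y = wmf_als(P, C)
pred = X @ Y.T
```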
49 4 Predicting latent factors from music audio Predicting latent factors for a given song from the corresponding audio signal is a regression problem. [sent-96, score-1.548]
50 We evaluate two methods to achieve this: one follows the conventional approach in MIR by extracting local features from audio signals and aggregating them into a bag-of-words (BoW) representation. [sent-98, score-0.379]
51 1 Bag-of-words representation Many MIR systems rely on the following feature extraction pipeline to convert music audio signals into a fixed-size representation that can be used as input to a classifier or regressor [5, 17, 18, 19, 20]: • Extract MFCCs from the audio signals. [sent-105, score-1.132]
52 We computed 13 MFCCs from windows of 1024 audio frames, corresponding to 23 ms at a sampling rate of 22050 Hz, and a hop size of 512 samples. [sent-106, score-0.329]
53 This was used as a baseline for our music recommendation experiments, which are described in Section 5. [sent-115, score-0.539]
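The vector-quantization step of this pipeline can be sketched as follows: each MFCC frame is assigned to its nearest codeword and the normalized counts form the fixed-size song representation. The random codebook below stands in for one learned with k-means, and the codebook size and frame counts are illustrative only.

```python
import numpy as np

def bag_of_words(mfcc_frames, codebook):
    """Quantize each MFCC frame to its nearest codeword and count
    occurrences, yielding a fixed-size bag-of-words histogram.

    mfcc_frames: (n_frames, 13) array of MFCC vectors.
    codebook:    (K, 13) array of centroids (e.g. from k-means).
    """
    # squared distance between every frame and every codeword
    d = ((mfcc_frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d.argmin(axis=1)
    counts = np.bincount(assignments, minlength=len(codebook))
    return counts / counts.sum()           # normalize to a distribution

rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 13))    # stand-in for real MFCC frames
codebook = rng.standard_normal((8, 13))    # stand-in for a learned codebook
bow = bag_of_words(frames, codebook)
```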
54 We first extracted an intermediate time-frequency representation from the audio signals to use as input to the network. [sent-125, score-0.379]
55 We used log-compressed mel-spectrograms with 128 components and the same window size and hop size that we used for the MFCCs (1024 and 512 audio frames respectively). [sent-126, score-0.329]
56 The networks were trained on windows of 3 seconds sampled randomly from the audio clips. [sent-127, score-0.329]
57 To predict the latent factors for an entire clip, we averaged over the predictions for consecutive windows. [sent-129, score-0.212]
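The window-averaging step above can be sketched in a few lines; the stand-in "model" here is a hypothetical linear map used only to illustrate the shape of the computation.

```python
import numpy as np

def predict_clip(windows, predict_window):
    """Predict a clip-level latent factor vector by averaging the
    window-level predictions over consecutive windows."""
    preds = np.stack([predict_window(w) for w in windows])
    return preds.mean(axis=0)

# Hypothetical stand-in model: project each 2-dim window onto 4 factors.
proj = np.arange(8.0).reshape(2, 4)
model = lambda w: w @ proj

clip = [np.ones(2), 3 * np.ones(2)]        # two toy "3-second windows"
vec = predict_clip(clip, model)
```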
58 Let y_i be the latent factor vector for song i, obtained with WMF, and y′_i the corresponding prediction by the model. [sent-134, score-0.273]
59 The weighted prediction error objective is minimized over the model parameters θ: min_θ Σ_{u,i} c_ui (p_ui − x_u^T y′_i)². (5) 5 Experiments 5.1 Versatility of the latent factor representation To demonstrate the versatility of the latent factor vectors, we compared them with audio features in a tag prediction task. [sent-137, score-0.647]
60 We ran WMF to obtain 50-dimensional latent factor vectors for all 9,330 songs in the subset, and trained a logistic regression model to predict the 50 most popular Last.fm tags from the latent factor vectors. [sent-139, score-0.597]
61 We also trained a logistic regression model on a bag-of-words representation of the audio signals, which was first reduced in size using PCA (see Section 4. [sent-141, score-0.329]
62 2 Latent factor prediction: quantitative evaluation To assess quantitatively how well we can predict latent factors from music audio, we used the predictions from our models for music recommendation. [sent-148, score-1.101]
63 For every user u and for every song i in the test set, we computed the score xT yi , and recommended the songs with the highest scores first. [sent-149, score-0.574]
64 In this case, scores for a given user are computed by averaging similarity scores across all the songs that the user has listened to. [sent-151, score-0.645]
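Scoring every song as x_u^T y_i and recommending the highest-scoring ones first can be sketched as below; the user and song vectors are toy values for illustration.

```python
import numpy as np

def recommend(x_u, Y, k=3):
    """Score every song as x_u^T y_i and return the indices of the
    k highest-scoring songs, best first."""
    scores = Y @ x_u
    return np.argsort(-scores)[:k]

x_u = np.array([1.0, 0.0])                 # toy user factor vector
Y = np.array([[0.1, 9.0],                  # song 0
              [0.9, 0.0],                  # song 1
              [0.5, 0.0]])                 # song 2 (factor vectors per song)
top = recommend(x_u, Y, k=2)
```

Song 0 scores low for this user despite its large second factor, because the user's weight on that factor is zero: the dot product only rewards factors the user actually cares about.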
65 The following models were used to predict latent factor vectors: • Linear regression trained on the bag-of-words representation described in Section 4. [sent-152, score-0.195]
66 For the other experiments, we used all available data: we used all songs that we have usage data for and that we were able to download an audio clip for (382,410 songs and 1 million users in total, 46,728 songs were used for testing). [sent-159, score-1.678]
67 We evaluated all models on the subset, using only the 9,330 most popular songs, and latent factor vectors learned from listening data for 20,000 users. [sent-171, score-0.196]
68 We compared the convolutional neural network with linear regression on the bag-of-words representation on the full dataset as well, using latent factor vectors with 400 dimensions. [sent-173, score-0.272]
69 Presumably this is because the weighting causes the importance of the songs to be proportional to their popularity. [sent-180, score-0.368]
70 In other words, the model will be encouraged to predict latent factor vectors for popular songs from the training set very well, at the expense of all other songs. [sent-181, score-0.563]
71 We also included results for when the latent factor vectors are obtained from usage data. [sent-184, score-0.288]
72 There is a large gap between our best result and this theoretical maximum, but this is to be expected: as we mentioned before, many aspects of the songs that influence user preference cannot possibly be extracted from audio signals only. [sent-186, score-0.885]
73 Table 3: Results for linear regression on a bag-of-words representation of the audio signals, and a convolutional neural network trained with the MSE objective, on the full dataset (382,410 songs and 1 million users). [sent-197, score-0.86]
74 Also shown are the scores achieved when the latent factor vectors are randomized, and when they are learned from usage data using WMF (upper bound). [sent-198, score-0.288]
75 For each song, we searched for similar songs by measuring the cosine similarity between the predicted usage patterns. [sent-201, score-0.575]
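The cosine similarity used for this qualitative comparison is the standard one; a minimal sketch on toy usage-pattern vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two predicted usage-pattern vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])              # same direction as a
c = np.array([0.0, 0.0, 5.0])              # orthogonal to a
```

Because the measure depends only on direction, two songs predicted to appeal to the same users in the same proportions are maximally similar regardless of how popular each is predicted to be.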
76 We compared the usage patterns predicted using the latent factors obtained with WMF (50 dimensions), with those using latent factors predicted with a convolutional neural network. [sent-202, score-0.672]
77 A few songs and their closest matches according to both models are shown in Table 4. [sent-203, score-0.368]
78 When the predicted latent factors are used, the matches are mostly different, but the results are quite reasonable in the sense that the matched songs are likely to appeal to the same audience. [sent-204, score-0.583]
79 Figure 1: t-SNE visualization of the distribution of predicted usage patterns, using latent factors predicted from audio (close-up artist labels include Oizo, Flying Lotus, Jem, Radiohead, Four Tet and Thievery Corporation). [sent-250, score-0.383]
80 A few close-ups show artists whose songs are projected in specific areas. [sent-251, score-0.444]
81 We can discern hip-hop (red), rock (green), pop (yellow) and electronic music (blue). [sent-252, score-0.424]
82 Clusters of songs that appeal to the same audience seem to be preserved quite well, even though the latent factor vectors for all songs were predicted from audio. [sent-257, score-0.934]
83 Topic proportions obtained from the content of the articles are used instead of latent factors when no usage data is available. [sent-262, score-0.338]
84 Our approach is sequential instead: we first obtain latent factor vectors for songs for which usage data is available, and use these to train a regression model. [sent-264, score-0.656]
85 Because we reduce the incorporation of content information to a regression problem, we are able to use a deep convolutional network. [sent-265, score-0.21]
86 [5] define an artist-level content-based similarity measure for music learned from a sample of collaborative filter data using metric learning to rank [21]. [sent-267, score-0.541]
87 They use a variation on the typical bag-of-words approach for audio feature extraction (see section 4. [sent-268, score-0.329]
88 For audio data they used the CAL10K dataset, which consists of 10,832 songs, so it is comparable in size to the subset of the MSD that we used for our initial experiments. [sent-271, score-0.329]
89 [17] investigate the problem of recommending items to a user given another item as a query, which they call ‘collaborative retrieval’. [sent-273, score-0.236]
90 They also use the bag-of-words approach to extract audio features and evaluate this method on a large proprietary dataset. [sent-275, score-0.329]
91 7 Conclusion In this paper, we have investigated the use of deep convolutional neural networks to predict latent factors from music audio when they cannot be obtained from usage data. [sent-284, score-1.271]
92 We evaluated the predictions by using them for music recommendation on an industrial-scale dataset. [sent-285, score-0.539]
93 Even though a lot of characteristics of songs that affect user preference cannot be predicted from audio signals, the resulting recommendations seem to be sensible. [sent-286, score-0.91]
94 We can conclude that predicting latent factors from music audio is a viable method for recommending new and unpopular music. [sent-287, score-0.994]
95 We also showed that recent advances in deep learning translate very well to the music recommendation setting in combination with this approach, with deep convolutional neural networks significantly outperforming a more traditional approach using bag-of-words representations of audio signals. [sent-288, score-1.109]
96 Moving beyond feature design: Deep architectures and automatic feature learning in music informatics. [sent-324, score-0.424]
97 Learning features from music audio with deep belief networks. [sent-327, score-0.817]
98 Unsupervised feature learning for audio classification using convolutional deep belief networks. [sent-330, score-0.506]
99 Audio-based music classification with a e pretrained convolutional network. [sent-334, score-0.537]
100 Large-scale music annotation and retrieval: Learning to rank in joint semantic spaces. [sent-358, score-0.424]
wordName wordTfidf (topN-words)
[('music', 0.424), ('songs', 0.368), ('audio', 0.329), ('featuring', 0.162), ('punk', 0.159), ('daft', 0.147), ('wmf', 0.147), ('beyonc', 0.135), ('usage', 0.129), ('latent', 0.118), ('recommendation', 0.115), ('song', 0.114), ('convolutional', 0.113), ('eminem', 0.098), ('user', 0.092), ('boys', 0.086), ('msd', 0.086), ('brothers', 0.086), ('mariah', 0.086), ('items', 0.079), ('collaborative', 0.078), ('artists', 0.076), ('keys', 0.076), ('ismir', 0.076), ('lil', 0.076), ('coldplay', 0.073), ('users', 0.066), ('recommending', 0.065), ('mcfee', 0.065), ('mir', 0.065), ('deep', 0.064), ('carey', 0.062), ('retrieval', 0.062), ('alicia', 0.061), ('clearwater', 0.061), ('creedence', 0.061), ('flying', 0.061), ('lotus', 0.061), ('noize', 0.061), ('rihanna', 0.061), ('wpe', 0.061), ('jonas', 0.06), ('factors', 0.058), ('listened', 0.054), ('revival', 0.054), ('million', 0.05), ('signals', 0.05), ('wayne', 0.05), ('kanye', 0.049), ('pui', 0.049), ('soundsystem', 0.049), ('swift', 0.049), ('yeah', 0.049), ('preference', 0.046), ('lcd', 0.043), ('dmx', 0.043), ('mfccs', 0.043), ('prince', 0.043), ('steve', 0.043), ('ltering', 0.043), ('factor', 0.041), ('west', 0.041), ('consumed', 0.04), ('cui', 0.04), ('notorious', 0.04), ('similarity', 0.039), ('predicted', 0.039), ('recommender', 0.038), ('auc', 0.038), ('taste', 0.037), ('styles', 0.037), ('brian', 0.037), ('akon', 0.037), ('basshunter', 0.037), ('beatz', 0.037), ('bieber', 0.037), ('bonobo', 0.037), ('calle', 0.037), ('chromeo', 0.037), ('cyrus', 0.037), ('fiasco', 0.037), ('girl', 0.037), ('listening', 0.037), ('lupe', 0.037), ('miley', 0.037), ('peas', 0.037), ('shaggy', 0.037), ('spears', 0.037), ('stevens', 0.037), ('twins', 0.037), ('recommendations', 0.036), ('predict', 0.036), ('ron', 0.036), ('chris', 0.034), ('tags', 0.034), ('content', 0.033), ('band', 0.033), ('artist', 0.032), ('nash', 0.032), ('crystal', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 85 nips-2013-Deep content-based music recommendation
Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen
Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. Most recommender systems rely on collaborative filtering. However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach. 1
2 0.13247012 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
Author: Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M. Bronstein, Guillermo Sapiro
Abstract: In this paper, we propose a new computationally efficient framework for learning sparse models. We formulate a unified approach that contains as particular cases models promoting sparse synthesis and analysis type of priors, and mixtures thereof. The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e.g., reconstruction or classification. By restricting the operators to be shift invariant, our approach can be thought as a way of learning sparsity-promoting convolutional operators. Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. In the shift-invariant case, this leads to a principled way of constructing a form of taskspecific convolutional networks. We illustrate the proposed models on several experiments in music analysis and image processing applications. 1
3 0.10264266 7 nips-2013-A Gang of Bandits
Author: Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella
Abstract: Multi-armed bandit problems formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, content may be served to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global recommendation strategy which allocates a bandit algorithm to each network node (user) and allows it to “share” signals (contexts and payoffs) with the neghboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a consistent increase in prediction performance obtained by exploiting the network structure. 1
4 0.07969705 251 nips-2013-Predicting Parameters in Deep Learning
Author: Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas
Abstract: We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy. 1
5 0.076080784 75 nips-2013-Convex Two-Layer Modeling
Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans
Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1
6 0.070411704 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors
7 0.068422876 148 nips-2013-Latent Maximum Margin Clustering
8 0.066770986 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
9 0.065168567 290 nips-2013-Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?
10 0.056956705 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
11 0.049231265 205 nips-2013-Multisensory Encoding, Decoding, and Identification
12 0.048120651 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model
13 0.046147861 331 nips-2013-Top-Down Regularization of Deep Belief Networks
14 0.045800418 294 nips-2013-Similarity Component Analysis
15 0.044962067 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
16 0.043422516 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
17 0.042743608 201 nips-2013-Multi-Task Bayesian Optimization
18 0.042008642 5 nips-2013-A Deep Architecture for Matching Short Texts
19 0.04161622 274 nips-2013-Relevance Topic Model for Unstructured Social Group Activity Recognition
20 0.041605782 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
topicId topicWeight
[(0, 0.114), (1, 0.047), (2, -0.054), (3, -0.032), (4, 0.046), (5, -0.065), (6, -0.012), (7, 0.005), (8, 0.012), (9, -0.043), (10, -0.003), (11, -0.017), (12, 0.014), (13, -0.022), (14, -0.027), (15, 0.028), (16, 0.021), (17, 0.049), (18, -0.015), (19, -0.039), (20, 0.026), (21, -0.067), (22, 0.017), (23, 0.004), (24, 0.033), (25, -0.071), (26, 0.014), (27, -0.017), (28, 0.043), (29, -0.041), (30, -0.042), (31, 0.009), (32, 0.068), (33, 0.0), (34, -0.058), (35, 0.004), (36, 0.025), (37, -0.061), (38, 0.015), (39, -0.007), (40, 0.04), (41, 0.007), (42, 0.074), (43, -0.109), (44, -0.017), (45, 0.027), (46, 0.015), (47, -0.002), (48, -0.097), (49, 0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.91121608 85 nips-2013-Deep content-based music recommendation
Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen
Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. Most recommender systems rely on collaborative filtering. However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach. 1
2 0.5650993 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
Author: Yacine Jernite, Yonatan Halpern, David Sontag
Abstract: We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data. We obtain an efficient learning algorithm for a family of Bayesian networks that we call quartet-learnable. For each latent variable, the existence of a singly-coupled quartet allows us to uniquely identify and learn all parameters involving that latent variable. We give a proof of the polynomial sample complexity of our learning algorithm, and experimentally compare it to variational EM. 1
3 0.56487972 75 nips-2013-Convex Two-Layer Modeling
Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans
Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1
4 0.55172253 251 nips-2013-Predicting Parameters in Deep Learning
Author: Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas
Abstract: We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy.
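The redundancy claim — a few observed weights suffice to predict the rest — can be illustrated with a toy version of the dictionary idea: if a weight vector lies in the span of a small smooth basis (here by construction, so the example is exactly solvable), a handful of observed entries determine it by least squares. The cosine dictionary, sizes, and sampling pattern below are illustrative assumptions, not the paper's learned dictionaries.

```python
import numpy as np

rng = np.random.default_rng(1)

# A "weight vector" built to be spatially smooth, like filters learned on
# natural images: a random combination of low-frequency cosine basis vectors.
n, k = 100, 8
t = np.linspace(0, 1, n)
U = np.stack([np.cos(np.pi * j * t) for j in range(k)], axis=1)
alpha_true = rng.normal(size=k)
w = U @ alpha_true

# Observe only 15 of the 100 entries and predict the rest by least squares
# in the smooth dictionary: a few values per feature pin down the whole vector.
obs = rng.choice(n, size=15, replace=False)
alpha, *_ = np.linalg.lstsq(U[obs], w[obs], rcond=None)
w_hat = U @ alpha

err = np.linalg.norm(w_hat - w) / np.linalg.norm(w)
print(err)
```

Real learned weights only lie approximately in such a span, so the paper's reconstructions are approximate rather than exact as here.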
5 0.54017371 148 nips-2013-Latent Maximum Margin Clustering
Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches.
6 0.53886485 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
7 0.48206425 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks
8 0.48103338 274 nips-2013-Relevance Topic Model for Unstructured Social Group Activity Recognition
9 0.47078386 294 nips-2013-Similarity Component Analysis
10 0.46971935 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation
11 0.46712905 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
12 0.46083057 7 nips-2013-A Gang of Bandits
13 0.45285669 84 nips-2013-Deep Neural Networks for Object Detection
14 0.45171192 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
15 0.44257915 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
16 0.44060677 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
17 0.43733293 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
18 0.43481547 5 nips-2013-A Deep Architecture for Matching Short Texts
19 0.42914996 27 nips-2013-Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising
20 0.42373848 65 nips-2013-Compressive Feature Learning
topicId topicWeight
[(2, 0.015), (16, 0.022), (27, 0.454), (33, 0.084), (34, 0.074), (41, 0.015), (49, 0.026), (56, 0.058), (65, 0.012), (70, 0.03), (85, 0.063), (89, 0.027), (93, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.77985579 85 nips-2013-Deep content-based music recommendation
Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen
2 0.50330478 69 nips-2013-Context-sensitive active sensing in humans
Author: Sheeraz Ahmad, He Huang, Angela J. Yu
Abstract: Humans and animals readily utilize active sensing, or the use of self-motion, to focus sensory and cognitive resources on the behaviorally most relevant stimuli and events in the environment. Understanding the computational basis of natural active sensing is important both for advancing brain sciences and for developing more powerful artificial systems. Recently, we proposed a goal-directed, context-sensitive, Bayesian control strategy for active sensing, C-DAC (Context-Dependent Active Controller) (Ahmad & Yu, 2013). In contrast to previously proposed algorithms for human active vision, which tend to optimize abstract statistical objectives and therefore cannot adapt to changing behavioral context or task goals, C-DAC directly minimizes behavioral costs and thus, automatically adapts itself to different task conditions. However, C-DAC is limited as a model of human active sensing, given its computational/representational requirements, especially for more complex, real-world situations. Here, we propose a myopic approximation to C-DAC, which also takes behavioral costs into account, but achieves a significant reduction in complexity by looking only one step ahead. We also present data from a human active visual search experiment, and compare the performance of the various models against human behavior. We find that C-DAC and its myopic variant both achieve better fit to human data than Infomax (Butko & Movellan, 2010), which maximizes expected cumulative future information gain. In summary, this work provides novel experimental results that differentiate theoretical models for human active sensing, as well as a novel active sensing algorithm that retains the context-sensitivity of the optimal controller while achieving significant computational savings.
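The myopic one-step-lookahead rule can be sketched for a minimal two-location search: stop when the expected cost of deciding now is no worse than paying one more sampling cost and then deciding. All costs and observation probabilities below are made-up illustrative values, not the paper's fitted model, and the single-location Bernoulli observation model is a deliberate simplification.

```python
import numpy as np

rng = np.random.default_rng(2)

# The target is at location 0 or 1; each fixation of location 0 yields a
# Bernoulli observation that is 1 with probability p_hit iff the target is
# there. Belief b = P(target at 0). Costs are illustrative.
p_hit, c_sample, c_error = 0.7, 0.05, 1.0

def posterior(b, obs):
    """Bayes update of b after one observation at location 0."""
    l0 = p_hit if obs else 1 - p_hit        # likelihood if target at 0
    l1 = (1 - p_hit) if obs else p_hit      # likelihood if target at 1
    return b * l0 / (b * l0 + (1 - b) * l1)

def myopic_stop(b):
    """Stop iff deciding now costs no more than one more sample + deciding."""
    stop_now = c_error * min(b, 1 - b)      # pick the more likely location
    p1 = b * p_hit + (1 - b) * (1 - p_hit)  # predictive P(obs = 1)
    cont = (c_sample
            + p1 * c_error * min(posterior(b, 1), 1 - posterior(b, 1))
            + (1 - p1) * c_error * min(posterior(b, 0), 1 - posterior(b, 0)))
    return stop_now <= cont

# Simulate one search episode with the target actually at location 0.
b, steps = 0.5, 0
while not myopic_stop(b) and steps < 50:
    obs = rng.random() < p_hit              # sample an observation
    b = posterior(b, int(obs))
    steps += 1
print(b, steps)
```

The full C-DAC controller would instead optimize over the whole remaining horizon (and over which location to fixate); the one-step rule above is the complexity reduction the abstract describes.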
3 0.44545263 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
Author: Srikrishna Sridhar, Stephen Wright, Christopher Re, Ji Liu, Victor Bittorf, Ce Zhang
Abstract: Many problems in machine learning can be solved by rounding the solution of an appropriate linear program (LP). This paper shows that we can recover solutions of comparable quality by rounding an approximate LP solution instead of the exact one. These approximate LP solutions can be computed efficiently by applying a parallel stochastic-coordinate-descent method to a quadratic-penalty formulation of the LP. We derive worst-case runtime and solution quality guarantees of this scheme using novel perturbation and convergence analysis. Our experiments demonstrate that on such combinatorial problems as vertex cover, independent set and multiway-cut, our approximate rounding scheme is up to an order of magnitude faster than Cplex (a commercial LP solver) while producing solutions of similar quality.
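The rounding half of this pipeline is easy to make concrete for vertex cover, one of the problems the abstract evaluates: given any fractional solution with x[u] + x[v] >= 1 on every edge, thresholding at 1/2 yields a feasible cover of cost at most twice the fractional objective. The graph and fractional values below are illustrative; a real run would obtain x from an (exact or approximate) LP solver rather than writing it by hand.

```python
# Classic LP-rounding for minimum vertex cover. Edges and the feasible
# fractional solution x are hand-picked illustrative values.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
x = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}

# Threshold at 1/2: each edge constraint x[u] + x[v] >= 1 forces at least
# one endpoint to reach 0.5, so the rounded set covers every edge.
cover = {v for v, val in x.items() if val >= 0.5}

# Feasibility: every edge has an endpoint in the cover.
assert all(u in cover or v in cover for u, v in edges)
# 2-approximation: each picked vertex has x[v] >= 0.5, so the integral
# cost is at most twice the fractional cost.
assert len(cover) <= 2 * sum(x.values())
print(cover)
```

The paper's point is that the same rounding remains sound when x only approximately satisfies the constraints, with a quantified loss.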
4 0.31427205 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
Author: Stefan Mathe, Cristian Sminchisescu
Abstract: Human eye movements provide a rich source of information into the human visual information processing. The complex interplay between the task and the visual stimulus is believed to determine human eye movements, yet it is not fully understood, making it difficult to develop reliable eye movement prediction systems. Our work makes three contributions towards addressing this problem. First, we complement one of the largest and most challenging static computer vision datasets, VOC 2012 Actions, with human eye movement recordings collected under the primary task constraint of action recognition, as well as, separately, for context recognition, in order to analyze the impact of different tasks. Our dataset is unique among the eyetracking datasets of still images in terms of large scale (over 1 million fixations recorded in 9157 images) and different task controls. Second, we propose Markov models to automatically discover areas of interest (AOI) and introduce novel sequential consistency metrics based on them. Our methods can automatically determine the number, the spatial support and the transitions between AOIs, in addition to their locations. Based on such encodings, we quantitatively show that given unconstrained real-world stimuli, task instructions have significant influence on the human visual search patterns and are stable across subjects. Finally, we leverage powerful machine learning techniques and computer vision features in order to learn task-sensitive reward functions from eye movement data within models that allow to effectively predict the human visual search patterns based on inverse optimal control. The methodology achieves state-of-the-art scanpath modeling results.
5 0.310709 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noise present in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenging problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints.
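The non-convex rank constraint in this formulation is handled by a projection step, and that step has a closed form: truncated SVD, which by the Eckart-Young theorem gives the nearest matrix of rank at most r in Frobenius norm. The sketch below shows just this projection on a synthetic corrupted matrix; it is one ingredient of a proximal projected gradient scheme, not the paper's full joint two-matrix algorithm, and all sizes and magnitudes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def project_rank(M, r):
    """Best rank-r approximation of M in Frobenius norm (Eckart-Young):
    keep the top-r singular triplets. This is the projection onto the
    non-convex rank constraint inside projected gradient descent."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# A rank-2 matrix corrupted by a few errors of large magnitude, the regime
# the abstract targets.
L = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 40))
X = L.copy()
idx = rng.choice(X.size, size=20, replace=False)
X.flat[idx] += 10 * rng.normal(size=20)

X2 = project_rank(X, 2)

# Sanity checks: the result has rank <= 2, and by SVD optimality it is at
# least as close to X as the true low-rank matrix L is.
assert np.linalg.matrix_rank(X2) <= 2
assert np.linalg.norm(X2 - X) <= np.linalg.norm(L - X) + 1e-8
```

In the full method this projection alternates with gradient steps on the convex data-fit objective over both data matrices.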
6 0.30845916 168 nips-2013-Learning to Pass Expectation Propagation Messages
7 0.30808485 196 nips-2013-Modeling Overlapping Communities with Node Popularities
8 0.30758488 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
9 0.30426076 99 nips-2013-Dropout Training as Adaptive Regularization
10 0.30415535 5 nips-2013-A Deep Architecture for Matching Short Texts
11 0.30410087 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
12 0.30292779 173 nips-2013-Least Informative Dimensions
13 0.30283508 350 nips-2013-Wavelets on Graphs via Deep Learning
14 0.3025797 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
15 0.30255327 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
16 0.30162418 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
17 0.30160666 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
18 0.30152917 232 nips-2013-Online PCA for Contaminated Data
19 0.30137488 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
20 0.3013171 251 nips-2013-Predicting Parameters in Deep Learning