nips nips2013 nips2013-148 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We present a maximum margin framework that clusters data using latent variables. [sent-3, score-0.508]
2 Using latent representations enables our framework to model unobserved information embedded in the data. [sent-4, score-0.28]
3 We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. [sent-6, score-2.006]
4 Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. [sent-7, score-0.62]
5 Popular clustering approaches include the k-means algorithm [7], mixture models [22], normalized cuts [27], and spectral clustering [18]. [sent-10, score-0.374]
6 Recent progress has been made using maximum margin clustering (MMC) [32], which extends the supervised large margin theory (e.g., SVMs). [sent-11, score-0.586]
7 MMC performs clustering by simultaneously optimizing cluster-specific models and instance-specific labeling assignments, and often achieves better performance than conventional methods [33, 29, 37, 38, 16, 6]. [sent-14, score-0.307]
8 For example, DPMs are learned in a latent SVM framework [5] for object detection; similar models have been shown to improve human action recognition [31]. [sent-25, score-0.332]
9 Motivated by their success in supervised learning, we believe latent variable models can also help in unsupervised clustering – data instances with similar latent representations should be grouped together in one cluster. [sent-27, score-0.833]
10 As the latent variables are unobserved in the original data, we need a learning framework to handle this latent knowledge. [sent-28, score-0.559]
11 To implement this idea, we develop a novel clustering algorithm based on MMC that incorporates latent variables – we call this latent maximum margin clustering (LMMC). [sent-29, score-1.094]
12 Each iteration involves three steps: inferring latent variables for each sample point, optimizing cluster assignments, and updating cluster model parameters. [sent-31, score-0.543]
13 To evaluate the efficacy of this clustering algorithm, we instantiate LMMC for tag-based video clustering, where each video is modeled with latent variables controlling the presence or absence of a set of descriptive tags. [sent-32, score-0.931]
14 We describe tag-based video clustering in Section 4, followed by experimental results reported in Section 5. [sent-37, score-0.405]
15 Felzenszwalb et al. [5] formulate latent SVM by extending binary linear SVM with latent variables. [sent-46, score-0.49]
16 All of this work demonstrates the power of max-margin latent variable models for supervised learning; our framework conducts unsupervised clustering while modeling data with latent variables. [sent-53, score-0.79]
17 Our framework deals with multi-cluster clustering, and we model data instances with latent variables to exploit rich representations. [sent-69, score-0.322]
18 Hoai and Zisserman [8] form a joint framework of maximum margin classification and clustering to improve sub-categorization. [sent-77, score-0.383]
19 Tagging videos with relevant concepts or attributes is common in video analysis. [sent-79, score-0.394]
20 The authors of [20] predict multiple correlative tags in a structural SVM framework. [sent-81, score-0.403]
21 Yang and Toderici [34] exploit latent sub-categories of tags in large-scale videos. [sent-82, score-0.619]
22 Izadinia and Shah [10] model low-level event tags [sent-90, score-0.374]
23 (e.g., people dancing, animal eating) as latent variables to recognize complex video events. [sent-92, score-0.560]
24 Instead of supervised recognition of tags or video categories, we focus on unsupervised tag-based video clustering. [sent-95, score-0.897]
25 In fact, recent research collects various sources of tags for video clustering. [sent-96, score-0.592]
26 Our paper uses latent tag models, and our LMMC framework is general enough to handle various types of tags. [sent-101, score-0.508]
27 3 Latent Maximum Margin Clustering. As stated above, modeling data with latent variables can be beneficial in a variety of supervised applications. [sent-102, score-0.321]
28 For unsupervised clustering, we believe it also helps to group data instances based on latent representations. [sent-103, score-0.333]
29 When fitting an instance to a cluster, we find the optimal values for latent variables and use the corresponding latent representation of the instance. [sent-106, score-0.524]
30 Furthermore, as the latent variables are unobserved in the original data, we need a learning framework to exploit this latent knowledge. [sent-110, score-0.559]
31 3.1 Maximum Margin Clustering. MMC [32, 37, 38] extends the maximum margin principle popularized by supervised SVMs to unsupervised clustering, where the input instances are unlabeled. [sent-118, score-0.326]
32 Suppose there are N instances {x_i}_{i=1}^N to be clustered into K clusters. MMC is formulated as follows [33, 38]: min_{W,Y,ξ≥0} (1/2) Σ_{t=1}^K ||w_t||² + (C/K) Σ_{i=1}^N Σ_{r=1}^K ξ_ir (1), s.t. Σ_{t=1}^K y_it w_t^⊤ x_i − w_r^⊤ x_i ≥ 1 − y_ir − ξ_ir, ∀i, r; [sent-120, score-0.443]
33 y_it ∈ {0, 1}, ∀i, t; Σ_{t=1}^K y_it = 1, ∀i, where W = {w_t}_{t=1}^K are the linear model parameters for each cluster and ξ = {ξ_ir} (i ∈ {1, . . . , N}, r ∈ {1, . . . , K}) are slack variables. [sent-122, score-0.422]
34 Y = {y_it} (i ∈ {1, . . . , N}, t ∈ {1, . . . , K}) is the labeling assignment, where y_it = 1 indicates that the instance x_i is clustered into the t-th cluster, and y_it = 0 otherwise. [sent-135, score-0.464]
35 Eq. 1 enforces a large margin between clusters by constraining the score of x_i on the assigned cluster to be sufficiently larger than its score on any other cluster. [sent-141, score-0.496]
36 Note that MMC is an unsupervised clustering method, which jointly estimates the model parameters W and finds the best labeling Y. [sent-142, score-0.302]
37 Note that we explicitly enforce cluster balance using a hard constraint on the cluster sizes. [sent-148, score-0.291]
38 Formally, we denote h as the latent variable of an instance x associated with a cluster parameterized by w. [sent-156, score-0.403]
39 We call the resultant framework latent maximum margin clustering (LMMC). [sent-167, score-0.672]
40 LMMC finds clusters via the following optimization: min_{W,Y,ξ≥0} (1/2) Σ_{t=1}^K ||w_t||² + (C/K) Σ_{i=1}^N Σ_{r=1}^K ξ_ir (4), s.t. Σ_{t=1}^K y_it f_{w_t}(x_i) − f_{w_r}(x_i) ≥ 1 − y_ir − ξ_ir, ∀i, r; [sent-168, score-0.453]
41 y_it ∈ {0, 1}, ∀i, t; Σ_{t=1}^K y_it = 1, ∀i; L ≤ Σ_{i=1}^N y_it ≤ U, ∀t. We adopt the notation Y from the MMC formulation to denote the labeling assignment. [sent-170, score-0.703]
42 Eq. 4 enforces the large margin criterion: the score of fitting x_i to the assigned cluster must be larger, by a margin, than the score of fitting x_i to any other cluster. [sent-172, score-0.429]
43 Note that LMMC jointly optimizes the model parameters W and finds the best labeling assignment Y, while inferring the optimal latent variables. [sent-175, score-0.344]
44 Eq. 4 is non-convex due to the optimization over the labeling assignment variables Y and the latent variables H = {h_it} (i ∈ {1, . . . , N}, t ∈ {1, . . . , K}). [sent-178, score-0.412]
45 We rewrite Eq. 4 equivalently as: min_W (1/2) Σ_{t=1}^K ||w_t||² + (C/K) R(W) (5), where R(W) is the risk function defined by: [sent-186, score-0.416]
46 R(W) = min_Y Σ_{i=1}^N Σ_{r=1}^K max(0, 1 − y_ir + f_{w_r}(x_i) − Σ_{t=1}^K y_it f_{w_t}(x_i)) (6), s.t. y_it ∈ {0, 1}, ∀i, t; Σ_{t=1}^K y_it = 1, ∀i; L ≤ Σ_{i=1}^N y_it ≤ U, ∀t. [sent-188, score-0.633]
47 Note that Eq. 6 minimizes over the labeling assignment variables Y while inferring the latent variables H. [sent-190, score-0.412]
48 We first infer the latent variables H and then optimize the labeling assignment Y. [sent-195, score-0.378]
49 Following Eq. 3, the latent variable h_it of an instance x_i associated with cluster t can be obtained via: h_it = argmax_{h_it} w_t^⊤ Φ(x_i, h_it). [sent-197, score-0.601]
50 For our latent tag model, we present an efficient inference method in Section 4. [sent-199, score-0.508]
51 After obtaining the latent variables H, we optimize the labeling assignment Y from Eq. 6. [sent-200, score-0.378]
52 Intuitively, this minimizes the total risk of labeling all instances while maintaining the cluster balance constraints. [sent-202, score-0.302]
53 They are added for a better understanding of the video content and the latent tag representations. [sent-212, score-0.726]
54 Our goal is to jointly learn video clusters and tags in a single framework. [sent-229, score-0.659]
55 We treat tags of a video as latent variables and capture the correlations between clusters and tags. [sent-230, score-0.938]
56 Intuitively, videos with a similar set of tags should be assigned to the same cluster. [sent-231, score-0.55]
57 We assume there exists a separate training dataset of videos with ground-truth tag labels, from which we train tag detectors independently. [sent-232, score-0.702]
58 During clustering, we are given a set of new videos without the ground-truth tag labels, and our goal is to assign cluster labels to these videos. [sent-233, score-0.571]
59 We employ a latent tag model to represent videos. [sent-234, score-0.508]
60 We are particularly interested in tags which describe different aspects of videos. [sent-235, score-0.374]
61 For example, a video from the cluster “feeding animal” (see Figure 1) may be annotated with “dog”, “food”, “man”, etc. [sent-236, score-0.35]
62 When a video is assigned to a particular cluster, it may carry a number of tags from T describing visual content related to that cluster. [sent-238, score-0.592]
63 However, we do not know which tags are present in the video. [sent-239, score-0.374]
64 To address this problem, we associate latent variables to the video to denote the presence and absence of tags. [sent-240, score-0.497]
65 Formally, given a cluster parameterized by w, we associate a latent variable h with a video x, where h = {h_t}_{t∈T} and h_t ∈ {0, 1} is a binary variable denoting the presence/absence of each tag t. [sent-241, score-0.972]
66 ht = 1 means x has the tag t, while ht = 0 means x does not have the tag t. [sent-242, score-0.65]
67 Figure 1 shows the latent tag representations of two sample videos. [sent-243, score-0.508]
68 Following Eq. 3, f_w(x) = max_h w^⊤Φ(x, h), where the potential function w^⊤Φ(x, h) is defined as follows: w^⊤Φ(x, h) = (1/|T|) Σ_{t∈T} h_t · ω_t^⊤ φ_t(x) (8). This potential function measures the compatibility between the video x and each tag t under the current cluster. [sent-245, score-0.583]
69 Note that w = {ωt }t∈T are the cluster-specific model parameters, and Φ = {ht · φt (x)}t∈T is the feature vector depending on the video x and its tags h. [sent-246, score-0.592]
70 Here φt (x) ∈ Rd is the feature vector extracted from the video x, and the parameter ωt is a template for tag t. [sent-247, score-0.54]
71 In our current implementation, instead of keeping φ_t(x) as a high-dimensional vector of video features, we simply represent it as a scalar score of detecting tag t on x by a pre-trained binary tag detector. [sent-248, score-0.77]
72 In Eq. 8, the term corresponding to tag t is h_t · ω_t φ_t(x). [sent-253, score-0.325]
73 TRECVID MED 11 dataset [19]: This dataset contains web videos collected by the Linguistic Data Consortium from various web video hosting sites. [sent-258, score-0.394]
74 By removing 13 short videos that contain no visual content, we finally have a total of 2,379 videos for clustering. [sent-265, score-0.352]
75 We use tags that were generated in Vahdat and Mori [28] for the TRECVID MED 11 dataset. [sent-266, score-0.374]
76 In [28], text analysis tools are employed to extract binary tags based on frequent nouns in the judgment files. [sent-270, score-0.403]
77 Examples of 74 frequent tags used in this work are: “music”, “person”, “food”, “kitchen”, “bird”, “bike”, “car”, “street”, “boat”, “water”, etc. [sent-271, score-0.374]
78 The complete list of tags is available on our website. [sent-272, score-0.374]
79 To train tag detectors, we use the DEV-T and DEV-O videos that belong to the 15 event categories. [sent-273, score-0.439]
80 To remove biases between tag detectors, we normalize the detection scores by z-score normalization. [sent-278, score-0.34]
81 Note that we make no use of the ground-truth tags on the Event-Kit videos that are to be clustered. [sent-279, score-0.55]
82 We use Action Bank [24] to generate tags for this dataset. [sent-282, score-0.374]
83 Specifically, on each video and for each template action, we use the set of Action Bank action detection scores collected at different spatiotemporal scales and correlation volumes. [sent-286, score-0.415]
84 We perform max-pooling on the scores to obtain the corresponding tag detection score. [sent-287, score-0.34]
85 The tags and tag detection scores are generated from Action Bank, in the same way as for KTH Actions. [sent-291, score-0.714]
86 Baselines: To evaluate the efficacy of LMMC, we implement three conventional clustering methods for comparison, including the k-means algorithm (KM), normalized cut (NC) [27], and spectral clustering (SC) [18]. [sent-292, score-0.424]
87 Therefore, for a fair comparison with LMMC, they are directly performed on the data where each video is represented by a vector of tag detection scores. [sent-358, score-0.523]
88 In order to show the benefits of incorporating latent variables, we further develop a baseline called MMC by replacing the latent variable model f_w(x) in Eq. 4 with a linear model w^⊤x. [sent-362, score-0.556]
89 This is equivalent to running an ordinary maximum margin clustering algorithm on the video data represented by tag detection scores. [sent-364, score-0.906]
90 Performance measures: Following the convention of maximum margin clustering [32, 33, 29, 37, 38, 16, 6], we set the number of clusters to be the ground-truth number of classes for all the compared methods. [sent-374, score-0.45]
91 This demonstrates that learning the latent presence and absence of tags can exploit rich representations of videos, and boost clustering performance. [sent-382, score-0.806]
92 This provides evidence for the effectiveness of maximum margin clustering as well as the proposed alternating descent algorithm for optimizing the non-convex objective. [sent-386, score-0.411]
93 6 Conclusion We have presented a latent maximum margin framework for unsupervised clustering. [sent-389, score-0.486]
94 By representing instances with latent variables, our method features the ability to exploit the unobserved information embedded in data. [sent-390, score-0.323]
95 We label each cluster by its dominating video class. [sent-392, score-0.35]
96 A "✓" sign indicates that the video label is consistent with the cluster label; otherwise, a "✗" sign is used. [sent-395, score-0.35]
97 Below each video, we show the top eight inferred tags sorted by the potential calculated from Eq. 8. [sent-397, score-0.374]
98 We instantiate our framework with tag-based video clustering, where each video is represented by a latent tag model with latent presence and absence of video tags. [sent-400, score-1.436]
99 Our framework is general and could be instantiated with other latent models, e.g., video clustering with latent key segments, image clustering with latent regions-of-interest, etc. [sent-404, score-1.082]
100 Discriminative tag learning on YouTube videos with latent sub-tags. [sent-625, score-0.684]
wordName wordTfidf (topN-words)
[('tags', 0.374), ('mmc', 0.346), ('lmmc', 0.334), ('tag', 0.263), ('latent', 0.245), ('video', 0.218), ('yit', 0.211), ('clustering', 0.187), ('videos', 0.176), ('trecvid', 0.174), ('margin', 0.161), ('cluster', 0.132), ('med', 0.106), ('man', 0.09), ('wedding', 0.09), ('fwt', 0.087), ('ucf', 0.087), ('ceremony', 0.077), ('pur', 0.073), ('labeling', 0.07), ('clusters', 0.067), ('svm', 0.065), ('beach', 0.064), ('birthday', 0.064), ('actions', 0.063), ('animal', 0.063), ('ht', 0.062), ('wt', 0.061), ('action', 0.061), ('nmi', 0.059), ('template', 0.059), ('sc', 0.057), ('km', 0.056), ('feeding', 0.053), ('dpms', 0.051), ('vahdat', 0.051), ('ir', 0.051), ('sports', 0.051), ('conventional', 0.05), ('party', 0.049), ('board', 0.049), ('kth', 0.048), ('boat', 0.047), ('fm', 0.047), ('unsupervised', 0.045), ('resultant', 0.044), ('bike', 0.044), ('fwr', 0.044), ('woodworking', 0.044), ('yir', 0.044), ('instances', 0.043), ('bank', 0.043), ('sh', 0.043), ('nc', 0.043), ('xi', 0.042), ('supervised', 0.042), ('detection', 0.042), ('fw', 0.04), ('food', 0.039), ('indoors', 0.038), ('swinging', 0.038), ('lady', 0.038), ('ilp', 0.038), ('parade', 0.038), ('boy', 0.035), ('unobserved', 0.035), ('scores', 0.035), ('maximum', 0.035), ('variables', 0.034), ('cvpr', 0.034), ('baby', 0.033), ('hit', 0.033), ('street', 0.032), ('cacy', 0.031), ('risk', 0.03), ('assignment', 0.029), ('instantiate', 0.029), ('kitchen', 0.029), ('wood', 0.029), ('argmaxhit', 0.029), ('clapping', 0.029), ('correlative', 0.029), ('grooming', 0.029), ('hoai', 0.029), ('izadinia', 0.029), ('judgment', 0.029), ('men', 0.029), ('parkour', 0.029), ('valizadegan', 0.029), ('walking', 0.028), ('alternating', 0.028), ('balance', 0.027), ('dog', 0.027), ('balanced', 0.026), ('human', 0.026), ('score', 0.026), ('variable', 0.026), ('water', 0.026), ('landing', 0.026), ('dancing', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 148 nips-2013-Latent Maximum Margin Clustering
Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. 1
2 0.1544304 75 nips-2013-Convex Two-Layer Modeling
Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans
Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1
3 0.15233055 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
Author: Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori
Abstract: We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we utilize human gaze data in the form of a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths. 1
4 0.14133644 274 nips-2013-Relevance Topic Model for Unstructured Social Group Activity Recognition
Author: Fang Zhao, Yongzhen Huang, Liang Wang, Tieniu Tan
Abstract: Unstructured social group activity recognition in web videos is a challenging task due to 1) the semantic gap between class labels and low-level visual features and 2) the lack of labeled training data. To tackle this problem, we propose a “relevance topic model” for jointly learning meaningful mid-level representations upon bag-of-words (BoW) video representations and a classifier with sparse weights. In our approach, sparse Bayesian learning is incorporated into an undirected topic model (i.e., Replicated Softmax) to discover topics which are relevant to video classes and suitable for prediction. Rectified linear units are utilized to increase the expressive power of topics so as to explain better video data containing complex contents and make variational inference tractable for the proposed model. An efficient variational EM algorithm is presented for model parameter estimation and inference. Experimental results on the Unstructured Social Activity Attribute dataset show that our model achieves state of the art performance and outperforms other supervised topic model in terms of classification accuracy, particularly in the case of a very small number of labeled training videos. 1
5 0.13108855 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
Author: Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin
Abstract: This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. 1
6 0.10293508 7 nips-2013-A Gang of Bandits
7 0.085221693 355 nips-2013-Which Space Partitioning Tree to Use for Search?
8 0.083183348 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
9 0.077825896 158 nips-2013-Learning Multiple Models via Regularized Weighting
10 0.07770016 149 nips-2013-Latent Structured Active Learning
11 0.07754641 119 nips-2013-Fast Template Evaluation with Vector Quantization
12 0.075308599 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
13 0.075189158 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation
14 0.073981948 63 nips-2013-Cluster Trees on Manifolds
15 0.071860395 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting
16 0.069955625 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
17 0.068422876 85 nips-2013-Deep content-based music recommendation
18 0.06533663 202 nips-2013-Multiclass Total Variation Clustering
19 0.063612401 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits
20 0.059487879 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
topicId topicWeight
[(0, 0.157), (1, 0.046), (2, -0.053), (3, -0.034), (4, 0.082), (5, -0.036), (6, 0.04), (7, 0.012), (8, 0.055), (9, 0.043), (10, -0.111), (11, 0.032), (12, 0.074), (13, -0.038), (14, -0.004), (15, 0.01), (16, 0.012), (17, -0.015), (18, 0.107), (19, -0.096), (20, 0.019), (21, 0.108), (22, 0.1), (23, 0.001), (24, 0.051), (25, -0.119), (26, -0.03), (27, -0.116), (28, -0.048), (29, -0.079), (30, -0.004), (31, -0.103), (32, 0.261), (33, -0.026), (34, 0.039), (35, -0.014), (36, -0.064), (37, -0.098), (38, -0.047), (39, 0.098), (40, 0.076), (41, 0.078), (42, 0.116), (43, -0.086), (44, -0.035), (45, 0.005), (46, 0.033), (47, 0.008), (48, -0.003), (49, 0.132)]
simIndex simValue paperId paperTitle
same-paper 1 0.96669358 148 nips-2013-Latent Maximum Margin Clustering
Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. 1
2 0.71397102 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
Author: Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori
Abstract: We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we utilize human gaze data in the form of a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths. 1
3 0.58409756 294 nips-2013-Similarity Component Analysis
Author: Soravit Changpinyo, Kuan Liu, Fei Sha
Abstract: Measuring similarity is crucial to many learning tasks. To this end, metric learning has been the dominant paradigm. However, similarity is a richer and broader notion than what metrics entail. For example, similarity can arise from the process of aggregating the decisions of multiple latent components, where each latent component compares data in its own way by focusing on a different subset of features. In this paper, we propose Similarity Component Analysis (SCA), a probabilistic graphical model that discovers those latent components from data. In SCA, a latent component generates a local similarity value, computed with its own metric, independently of other components. The final similarity measure is then obtained by combining the local similarity values with a (noisy-)OR gate. We derive an EM-based algorithm for fitting the model parameters with similarity-annotated data from pairwise comparisons. We validate the SCA model on synthetic datasets where SCA discovers the ground-truth about the latent components. We also apply SCA to a multiway classification task and a link prediction task. For both tasks, SCA attains significantly better prediction accuracies than competing methods. Moreover, we show how SCA can be instrumental in exploratory analysis of data, where we gain insights about the data by examining patterns hidden in its latent components’ local similarity values. 1
4 0.57871968 85 nips-2013-Deep content-based music recommendation
Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen
Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. Most recommender systems rely on collaborative filtering. However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach. 1
5 0.57457048 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
Author: George H. Chen, Stanislav Nikolov, Devavrat Shah
Abstract: For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. 1
6 0.57382143 75 nips-2013-Convex Two-Layer Modeling
7 0.55613714 274 nips-2013-Relevance Topic Model for Unstructured Social Group Activity Recognition
8 0.54665095 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
9 0.50299269 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
10 0.44236055 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation
11 0.42792004 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
12 0.41338745 63 nips-2013-Cluster Trees on Manifolds
13 0.40812898 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
14 0.39352548 158 nips-2013-Learning Multiple Models via Regularized Weighting
15 0.392396 202 nips-2013-Multiclass Total Variation Clustering
16 0.39015967 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks
17 0.38827369 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models
18 0.36889893 344 nips-2013-Using multiple samples to learn mixture models
19 0.35914099 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
20 0.35792315 277 nips-2013-Restricting exchangeable nonparametric distributions
topicId topicWeight
[(16, 0.043), (18, 0.011), (19, 0.011), (33, 0.102), (34, 0.123), (41, 0.028), (49, 0.089), (56, 0.065), (70, 0.04), (81, 0.275), (85, 0.036), (89, 0.027), (93, 0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.76705021 148 nips-2013-Latent Maximum Margin Clustering
Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. 1
2 0.62809169 184 nips-2013-Marginals-to-Models Reducibility
Author: Tim Roughgarden, Michael Kearns
Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1
3 0.5868367 121 nips-2013-Firing rate predictions in optimal balanced networks
Author: David G. Barrett, Sophie Denève, Christian K. Machens
Abstract: How are firing rates in a spiking network related to neural input, connectivity and network function? This is an important problem because firing rates are a key measure of network activity, in both the study of neural computation and neural network dynamics. However, it is a difficult problem, because the spiking mechanism of individual neurons is highly non-linear, and these individual neurons interact strongly through connectivity. We develop a new technique for calculating firing rates in optimal balanced networks. These are particularly interesting networks because they provide an optimal spike-based signal representation while producing cortex-like spiking activity through a dynamic balance of excitation and inhibition. We can calculate firing rates by treating balanced network dynamics as an algorithm for optimising signal representation. We identify this algorithm and then calculate firing rates by finding the solution to the algorithm. Our firing rate calculation relates network firing rates directly to network input, connectivity and function. This allows us to explain the function and underlying mechanism of tuning curves in a variety of systems. 1
4 0.58087862 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
Author: David Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin
Abstract: With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-the-art. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain. 1
5 0.58058351 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
Author: Nikhil Rao, Christopher Cox, Rob Nowak, Timothy T. Rogers
Abstract: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multisubject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso. 1
6 0.57046127 64 nips-2013-Compete to Compute
7 0.56925732 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit
8 0.567837 70 nips-2013-Contrastive Learning Using Spectral Methods
9 0.56775934 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
10 0.56750381 221 nips-2013-On the Expressive Power of Restricted Boltzmann Machines
11 0.56476349 5 nips-2013-A Deep Architecture for Matching Short Texts
12 0.56413686 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization
13 0.56365865 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models
14 0.56248862 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
15 0.56232756 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
16 0.56200546 86 nips-2013-Demixing odors - fast inference in olfaction
17 0.56076306 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
18 0.56002361 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
19 0.55821168 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
20 0.55729342 173 nips-2013-Least Informative Dimensions