nips nips2010 nips2010-89 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yangqing Jia, Mathieu Salzmann, Trevor Darrell
Abstract: Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing non-convex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multiview learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. We show that our approach outperforms state-of-the-art methods on the task of human pose estimation. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. [sent-4, score-1.368]
2 In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. [sent-6, score-0.287]
3 In particular, we show that structured sparsity allows us to address the multiview learning problem by alternately solving two convex optimization problems. [sent-7, score-0.391]
4 Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. [sent-8, score-1.991]
5 Given these multiple views, an important problem therefore is that of learning a latent representation of the data that best leverages the information contained in each input view. [sent-12, score-0.358]
6 Multiple kernel learning [2, 24] methods have proven successful under the assumption that the views are independent. [sent-14, score-0.447]
7 In contrast, techniques that learn a latent space shared across the views (Fig. 1(a)), [sent-15, score-1.081]
8 such as Canonical Correlation Analysis (CCA) [12, 3], the shared Kernel Information Embedding model (sKIE) [23], and the shared Gaussian Process Latent Variable Model (shared GPLVM) [21, 6, 15], have proven particularly effective at modeling the dependencies between the modalities. [sent-16, score-0.605]
9 However, they do not account for the independent parts of the views, and therefore either totally fail to represent them, or mix them with the information shared by all views. [sent-17, score-0.348]
10 To this end, these methods factorize the latent space into a shared part common to all views and a private part for each modality (Fig. 1(b)). [sent-19, score-1.44]
11 In particular, [20] proposed to encourage the shared-private factorization to be nonredundant while simultaneously discovering the dimensionality of the latent space. [sent-22, score-0.485]
12 FOLS also lacks an efficient inference method, and extension from two views to multiple views is not straightforward since the number of shared/latent spaces that need to be explicitly modeled grows exponentially with the number of views. [sent-25, score-0.811]
13 In this paper, we propose a novel approach to finding a latent space in which the information is correctly factorized into shared and private parts, while avoiding the computational burden of previous techniques [14, 20]. [sent-26, score-1.171]
14 Due to structured sparsity, rows Πs of α are shared across the views, whereas rows Π1 and Π2 are private to view 1 and 2, respectively. [sent-28, score-1.004]
15 In particular, we represent each view as a linear combination of view-dependent dictionary entries. [sent-30, score-0.298]
16 While the dictionaries are specific to each view, the weights of these dictionaries act as latent variables and are the same for all the views. [sent-31, score-0.598]
17 As shown in Fig. 1(c), the data is embedded in a latent space that generates all the views. [sent-33, score-0.399]
18 By exploiting the idea of structured sparsity [26, 18, 4, 17, 9], we encourage each view to only use a subset of the latent variables, and at the same time encourage the whole latent space to be low-dimensional. [sent-34, score-1.24]
19 As illustrated in Fig. 1(d), the latent space is factorized into shared parts which represent information common to multiple views, and private parts which model the remaining information of the individual views. [sent-36, score-1.261]
20 We demonstrate the effectiveness of our approach on the problem of human pose estimation where the existence of shared and private spaces has been shown [7]. [sent-38, score-0.947]
21 We show that our approach correctly factorizes the latent space and outperforms state-of-the-art techniques. [sent-39, score-0.435]
22 2 Learning a Latent Space with Structured Sparsity In this section, we first formulate the problem of learning a latent space for multi-view modeling. [sent-40, score-0.399]
23 We then briefly review the concepts of sparse coding and structured sparsity, and finally introduce our approach within this framework. [sent-41, score-0.292]
We aim to find an embedding $\alpha \in \mathbb{R}^{N_d \times N}$ of the data into an $N_d$-dimensional latent space and a set of dictionaries D = {D(1) , D(2) , · · · , D(V ) }, with $D^{(v)} \in \mathbb{R}^{P_v \times N_d}$ the dictionary entries for view v, such that X(v) is generated by D(v) α, as depicted in Fig. 1(c). [sent-44, score-0.972]
25 More specifically, we seek the latent embedding α and the dictionaries that best reconstruct the data in the least-squares sense by solving the optimization problem $\min_{D,\alpha} \sum_{v=1}^{V} \| X^{(v)} - D^{(v)}\alpha \|_{\text{Fro}}^2$ (1). [sent-46, score-0.567]
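To make the shapes and the data term of Eq. (1) concrete, here is a minimal numpy sketch; the function and variable names are ours, not the paper's.

```python
import numpy as np

def reconstruction_error(X_views, D_views, alpha):
    """Sum over views of the squared Frobenius error, as in Eq. (1).

    X_views : list of V observation matrices X^(v), each of shape (P_v, N)
    D_views : list of V dictionaries D^(v), each of shape (P_v, N_d)
    alpha   : latent embedding of shape (N_d, N), shared by all views
    """
    return sum(np.linalg.norm(X - D @ alpha, 'fro') ** 2
               for X, D in zip(X_views, D_views))
```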
26 Furthermore, as explained in Section 1, we aim to find a latent space that naturally separates the information shared among several views from the information private to each view. [sent-47, score-1.44]
27 Sparse coding techniques represent the observations (e.g., image features) as a linear combination of dictionary entries, while encouraging each observation vector to only employ a subset of all the available dictionary entries. [sent-53, score-0.524]
28 Sparse coding aims to find a set of dictionary entries $D \in \mathbb{R}^{P \times N_d}$ and the corresponding linear combination weights $\alpha \in \mathbb{R}^{N_d \times N}$ by solving the optimization problem $\min_{D,\alpha} \frac{1}{N} \|X - D\alpha\|_{\text{Fro}}^2 + \lambda\,\phi(\alpha)$, subject to constraints on the norms of the dictionary entries. [sent-55, score-0.389]
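As a hedged illustration of this sparse coding problem, the sketch below uses the common choice φ(α) = ||α||₁ and a unit-norm constraint on the dictionary columns; both are standard defaults and are assumptions here, since the excerpt does not spell them out.

```python
import numpy as np

def sparse_coding_objective(X, D, alpha, lam):
    """(1/N) * ||X - D @ alpha||_Fro^2 + lam * ||alpha||_1 (phi taken to be the L1 norm)."""
    N = X.shape[1]
    return np.linalg.norm(X - D @ alpha, 'fro') ** 2 / N + lam * np.sum(np.abs(alpha))

def project_dictionary(D):
    """Project each dictionary entry (column) onto the unit L2 ball, the usual constraint."""
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D / norms
```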
29 This problem has been addressed by structured sparse coding techniques [26, 4, 9], which encode the structure of the problem in the regularizer. [sent-62, score-0.292]
30 Typically, these methods rely on the notion of groups among the training examples to encourage members of the same group to use the same dictionary entries. [sent-63, score-0.377]
31 In general, structured sparsity can lead to more meaningful latent embeddings than sparse coding. [sent-68, score-0.673]
32 For example, [4] showed that the dictionary learned by grouping local image descriptors into images or classes achieved better accuracy than sparse coding for small dictionary sizes. [sent-69, score-0.691]
33 Here, we propose an approach to multi-view learning inspired by structured sparse coding techniques. [sent-72, score-0.32]
34 To correctly account for the dependencies and independencies of the views, we cast the problem as that of finding a factorization of the latent space into subspaces that are shared across several views and subspaces that are private to the individual views. [sent-73, score-1.715]
35 In essence, this can be seen as having each view exploit only a subset of the dimensions of the global latent space, as depicted in Fig. 1(d). [sent-74, score-0.579]
36 Note that this definition is in fact more general than the usual definition of shared-private factorizations [7, 14, 20], since it allows latent dimensions to be shared across any subset of the views rather than across all views only. [sent-76, score-1.653]
37 More formally, to find a shared-private factorization of the latent embedding α that represents the multiple input modalities, we adopt the idea of structured sparsity and aim to find a set of dictionaries D = {D(1) , D(2) , · · · , D(V ) }, each of which uses only a subspace of the latent space. [sent-77, score-1.216]
38 Note that, here, we enforce structured sparsity on the dictionary entries instead of on the weights α. [sent-83, score-0.509]
39 Furthermore, note that this sparsity encourages the columns of the individual D(v) to be zeroed-out, instead of the rows as in the usual formulation. [sent-84, score-0.292]
40 The intuition behind this is that we expect each view X(v) to only depend on a subset of the latent dimensions. [sent-85, score-0.47]
41 Since X(v) is generated by D(v) α, having zero-valued columns of D(v) removes the influence of the corresponding latent dimensions on the reconstruction. [sent-86, score-0.456]
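The following sketch assumes ψ is the L1,∞ norm that the paper adopts when rewriting the problem as Eq. (7) below; applying it to the transposed dictionary sums one max-absolute value per column of D(v), so a column can be driven to zero, switching that latent dimension off for view v.

```python
import numpy as np

def l1_inf(A):
    """L1,inf norm: sum over the rows of A of the largest absolute entry in each row."""
    return np.sum(np.max(np.abs(A), axis=1))

def view_column_penalty(D_v):
    """psi((D^(v))^T): rows of the transpose are columns of D^(v), so whole columns
    of the view-specific dictionary are encouraged to be zeroed out."""
    return l1_inf(D_v.T)
```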
42 While Eq. (6) encourages each view to only use a limited number of latent dimensions, it does not guarantee that parts of the latent space will be shared across the views. [sent-88, score-1.256]
43 With a sufficiently large number Nd of dictionary entries, the same information can be represented in several parts of the dictionary. [sent-89, score-0.276]
44 A simple approach would be to manually choose the dimension of the latent space, but this introduces an additional hyperparameter to tune. [sent-91, score-0.384]
45 In spirit, the motivation is similar to [8, 20] that use a relaxation of rank constraints to discover the dimensionality of the latent space. [sent-93, score-0.358]
46 Here, we further exploit structured sparsity and re-write problem (6) as $\min_{D,\alpha} \frac{1}{N} \sum_{v=1}^{V} \| X^{(v)} - D^{(v)}\alpha \|_{\text{Fro}}^2 + \lambda \sum_{v=1}^{V} \psi\big((D^{(v)})^T\big) + \gamma\,\psi(\alpha)$ (7), where we replaced the constraints on α by an L1,∞ norm regularizer that encourages rows of α to be zeroed-out. [sent-94, score-0.45]
47 This lets us automatically discover the dimensionality of the latent space α. [sent-95, score-0.399]
48 Furthermore, if there is shared information between several views, this regularizer will favor representing it in a single latent dimension, instead of having redundant parts of the latent space. [sent-96, score-1.122]
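To make the full formulation concrete, here is a minimal numpy sketch that evaluates the objective of Eq. (7); the L1,∞ penalty is repeated from the previous sketch to keep the snippet self-contained, and this is only an evaluator, not the optimizer used in the paper.

```python
import numpy as np

def l1_inf(A):
    return np.sum(np.max(np.abs(A), axis=1))

def objective_eq7(X_views, D_views, alpha, lam, gamma):
    """Eq. (7): (1/N) * data term + lam * sum_v psi((D^(v))^T) + gamma * psi(alpha)."""
    N = alpha.shape[1]
    data = sum(np.linalg.norm(X - D @ alpha, 'fro') ** 2
               for X, D in zip(X_views, D_views)) / N
    dict_penalty = sum(l1_inf(D.T) for D in D_views)   # encourages zero dictionary columns
    return data + lam * dict_penalty + gamma * l1_inf(alpha)  # encourages zero rows of alpha
```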
49 Furthermore, to speed up the process, after each iteration, we remove the latent dimensions whose norm is less than a pre-defined threshold. [sent-99, score-0.461]
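The dimension-pruning step described here is simple to write down; in this sketch a latent dimension is dropped when the corresponding row of α has small norm, and the matching dictionary columns are removed as well. The choice of the 2-norm and of the threshold value are ours.

```python
import numpy as np

def prune_latent_dimensions(D_views, alpha, threshold=1e-3):
    """Drop latent dimensions whose row of alpha has norm below the threshold,
    together with the corresponding column of every view's dictionary."""
    keep = np.linalg.norm(alpha, axis=1) > threshold
    return [D[:, keep] for D in D_views], alpha[keep, :]
```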
50 2.4 Inference At inference, given a new observation $\{x_*^{(1)}, \cdots, x_*^{(V)}\}$, the corresponding latent embedding $\alpha_*$ can be obtained by solving the convex problem $\min_{\alpha_*} \sum_{v=1}^{V} \| x_*^{(v)} - D^{(v)}\alpha_* \|_2^2 + \gamma \|\alpha_*\|_1$ (8), where the regularizer lets us deal with noise in the observations. [sent-102, score-0.536]
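Once the views are stacked, Eq. (8) is an ordinary Lasso problem in α∗. The sketch below uses scikit-learn's Lasso purely for convenience (any L1 solver would do); note that sklearn's objective carries a 1/(2n) factor on the data term, hence the rescaling of the regularization weight.

```python
import numpy as np
from sklearn.linear_model import Lasso

def infer_latent(x_views, D_views, gamma):
    """Solve Eq. (8): min_a  sum_v ||x^(v) - D^(v) a||_2^2 + gamma * ||a||_1."""
    x = np.concatenate(x_views)       # stack the observed views into one vector
    D = np.vstack(D_views)            # stack the corresponding dictionaries
    n = x.shape[0]
    # sklearn minimizes 1/(2n)||x - D a||^2 + alpha*||a||_1, so rescale gamma accordingly.
    model = Lasso(alpha=gamma / (2.0 * n), fit_intercept=False, max_iter=10000)
    model.fit(D, x)
    return model.coef_
```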
51 Another advantage of our model is that it easily allows us to address the case where only a subset of the views are observed at test time. [sent-103, score-0.396]
52 This scenario arises, for example, in human pose estimation, where view X(1) corresponds to image features and view X(2) contains the 3D poses. [sent-104, score-0.534]
53 At inference, the goal is to estimate the pose $x_*^{(2)}$ given new image features $x_*^{(1)}$. [sent-105, score-0.301]
54 To this end, we seek to estimate the latent variables α∗ , as well as the unknown views from the available views. [sent-106, score-0.727]
55 The remaining unobserved views, i.e. those not in the set $V_a$ of available views, are then estimated as $x_*^{(v)} = D^{(v)}\alpha_*$. [sent-108, score-0.369]
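For this partially observed setting, the same solver is run on the available views only, and the missing views are then read off from their dictionaries; a short sketch building on the infer_latent helper above (the function name and argument split are illustrative).

```python
def predict_unobserved_views(x_observed, D_observed, D_unobserved, gamma):
    """Estimate alpha* from the observed views only, then reconstruct each
    unobserved view as x*^(v) = D^(v) @ alpha* (e.g. 3D pose from image features)."""
    alpha_star = infer_latent(x_observed, D_observed, gamma)
    return [D @ alpha_star for D in D_unobserved]
```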
56 3 Related Work While our method is closely related to the shared-private factorization algorithms which we discussed in Section 1, it was inspired by the existing sparse coding literature and therefore is also closely related to existing sparse coding algorithms. [Table 1: the choices of ψ(D) and φ(α) for PCA (none, none), SC, and the other methods listed below.] [sent-109, score-0.375]
57 $\min_{D,\alpha} \frac{1}{N}\|X - D\alpha\|_{\text{Fro}}^2 + \lambda\,\psi(D) + \gamma\,\phi(\alpha)$ s.t. $D \in C_D,\ \alpha \in C_\alpha$ (10), where $C_D$ and $C_\alpha$ are the domains of the dictionary D and of the latent embedding α, respectively. [sent-120, score-0.633]
58 Several existing algorithms, such as PCA, sparse coding (SC), group SC, structured sparse PCA (SSPCA) and group Lasso, can be considered as special cases of this general framework. [sent-122, score-0.448]
59 Algorithms relying on structured sparsity exploit different types of matrix norms to impose sparsity and different ways of grouping the rows or columns of D and α using algorithm-specific knowledge. [sent-124, score-0.476]
60 As a result, while group sparse coding finds dictionary entries that encode class-related information, our method finds latent spaces factorized into subspaces shared among different views and subspaces private to the individual views. [sent-126, score-2.129]
61 Furthermore, while structured sparsity is typically enforced on α, our method employs it on the dictionary. [sent-127, score-0.251]
62 Their intuition for doing so was to encourage dictionary entries to represent the variability of parts of the observation space, such as the variability of the eyes in the context of face images. [sent-130, score-0.381]
63 Finally, it is worth noting that imposing structured sparsity regularization on both D and α naturally yields a multi-view, multi-class latent space learning algorithm that can be seen as a generalization of several algorithms summarized here. [sent-131, score-0.65]
64 4 Experimental Evaluation In this section, we show the results of our approach on learning factorized latent spaces from multiview inputs. [sent-132, score-0.553]
65 This shows our method’s ability to correctly factorize a latent space into shared and private parts. [sent-136, score-1.107]
66 Fully white columns correspond to zero-valued vectors; note that the dictionary for each view uses only the shared dimension and its own private dimension. [sent-153, score-1.006]
67 We generated two toy views from one shared signal and one private signal per view, as depicted in Fig. 2. [sent-154, score-1.147]
68 This yields a 3-dimensional ground-truth latent space, with 1 shared dimension and 2 private dimensions. [sent-157, score-1.029]
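To make this toy setup concrete, here is a hedged generator: one shared latent signal and one private signal per view, mixed into two higher-dimensional observed views with additive noise. The sinusoidal signal shapes, dimensions, mixing matrices and noise level are illustrative assumptions, not the exact settings of the experiment.

```python
import numpy as np

def make_toy_views(N=200, P=20, noise_std=0.05, seed=0):
    """Two views generated from a 3D ground-truth latent space:
    one shared dimension plus one private dimension per view."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 4 * np.pi, N)
    shared = np.sin(t)              # dimension shared by both views
    private1 = np.sin(2 * t + 0.5)  # private to view 1
    private2 = np.cos(3 * t)        # private to view 2
    W1 = rng.normal(size=(P, 2))    # random linear mixing for view 1
    W2 = rng.normal(size=(P, 2))    # random linear mixing for view 2
    X1 = W1 @ np.vstack([shared, private1]) + noise_std * rng.normal(size=(P, N))
    X2 = W2 @ np.vstack([shared, private2]) + noise_std * rng.normal(size=(P, N))
    return X1, X2
```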
69 Note that the redundancy of this latent space is handled by our regularization on α. [sent-166, score-0.399]
70 We then alternately optimized D and α, and let the algorithm determine the optimal latent dimensionality. [sent-167, score-0.409]
71 Fig. 2(e,f) depicts the reconstructed latent spaces for both views, as well as the learned dictionaries, which clearly show the shared-private factorization. [sent-169, score-0.457]
72 Note that our approach correctly discovered the original generative signals and discarded the noise, whereas CCA recovered the shared signal, but also the correlated noise and an additional noise. [sent-172, score-0.446]
73 4.2 Human Pose Estimation We then applied our method to the problem of human pose estimation, in which the task is to recover 3D poses from 2D image features. [sent-175, score-0.3]
74 These two types of observations can be seen as two views of the same problem from which we can learn a latent space. [sent-178, score-0.727]
75 We also compare our results with those obtained with the FOLS-GPLVM [20], which also proposes a shared-private factorization of the latent space. [sent-181, score-0.425]
76 Note that we did not compare against other shared-private factorizations [7, 14], or against purely shared latent space models. [Table 2: pose estimation errors on the Jogging and Walking data for Lin-Reg and the other baselines.] [sent-182, score-0.384]
77 Figure 3: Dictionaries learned from the HumanEva data for (a) jogging, (b) walking, and (c) walking with multiple features (PHOG, RT, 3D Pose). [sent-195, score-0.415]
78 Note that in (c) our model found latent dimensions shared among all views, but also shared between the image features only. [sent-198, score-1.151]
79 To initialize the latent spaces for our model and for the FOLS-GPLVM, we proceeded similarly to the toy example; we applied PCA on both views separately, as well as on the concatenated views, and retained the components representing 95% of the variance. [sent-200, score-0.901]
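A sketch of this initialization for the two-view case, using scikit-learn's PCA with 95% retained variance; stacking the per-view and joint PCA coefficients into a single (redundant) initial α is our reading of the text and is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

def initialize_alpha(X1, X2):
    """X1, X2: views of shape (P_v, N). Returns an initial latent embedding (N_d, N)."""
    def pca_coefficients(X):
        # PCA over the N samples, keeping components explaining 95% of the variance.
        return PCA(n_components=0.95).fit_transform(X.T).T   # shape (n_components, N)
    return np.vstack([pca_coefficients(X1),
                      pca_coefficients(X2),
                      pca_coefficients(np.vstack([X1, X2]))])
```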
80 For the FOLS-GPLVM, we initialized the shared latent space with the coefficients of the joint PCA, and the private spaces with those of the individual PCAs. [sent-202, score-1.117]
81 At inference for human pose estimation, only one of the views (i.e., the image features) is available. [sent-206, score-0.598]
82 As explained in Section 2.4, our model provides a natural way to deal with this case by computing the latent variables from the image features first, and then recovering the 3D coordinates using the learned dictionary. [sent-210, score-0.519]
83 For the FOLS-GPLVM, we followed the same strategy as in [20]; we computed the nearest-neighbor among the training examples in image feature space and took the corresponding shared and private latent variables that we mapped to the pose. [sent-211, score-1.142]
84 As a first case, we used hierarchical features [10] computed on the walking and jogging video sequences of the first subject seen from a single camera. [sent-213, score-0.292]
85 In Fig. 3(a,b), we show the factorization of the latent space obtained by our approach by displaying the learned dictionaries. [sent-218, score-0.612]
86 For the jogging case, our algorithm automatically found a low-dimensional latent space of 10 dimensions, with a 4D private space for the image features, a 4D shared space, and a 2D private space for the 3D pose. [sent-219, score-1.688]
87 Note that the latent space per se is a dense, low-dimensional space, and whether a dimension is private or shared among multiple views is determined by the corresponding dictionary entries. [sent-220, score-1.776]
88 A latent dimension is considered private if the norm of the corresponding dictionary entry in the other view is smaller than 10% of the average norm of the dictionary entries for that view. [sent-221, score-1.384]
89 For the walking case, the private space for the image features was found to be higher-dimensional. [sent-260, score-0.536]
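The 10%-of-average-norm rule just stated for deciding whether a latent dimension is private or shared is easy to put in code. This two-view sketch uses the column 2-norm of each dictionary; the exact norm and the handling of dimensions unused by both views are assumptions.

```python
import numpy as np

def label_latent_dimensions(D1, D2, frac=0.1):
    """Label each latent dimension as shared, private to one view, or unused,
    based on the norms of the corresponding dictionary entries (two-view case)."""
    n1 = np.linalg.norm(D1, axis=0)          # norm of each dictionary entry in view 1
    n2 = np.linalg.norm(D2, axis=0)          # norm of each dictionary entry in view 2
    unused1 = n1 < frac * n1.mean()          # dimension effectively switched off in view 1
    unused2 = n2 < frac * n2.mean()
    labels = []
    for u1, u2 in zip(unused1, unused2):
        if u1 and u2:
            labels.append('unused')
        elif u2:
            labels.append('private to view 1')   # only view 1 uses this dimension
        elif u1:
            labels.append('private to view 2')
        else:
            labels.append('shared')
    return labels
```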
90 Note that, with the RT features that were designed to eliminate the ambiguities in pose estimation, GP regression with an RBF kernel performs slightly better than our approach. [sent-266, score-0.269]
91 To show the ability of our method to model more than two views, we learned a latent space by simultaneously using RT features, PHOG features and 3D poses. [sent-268, score-0.489]
92 We computed the NN in feature space for both features individually and took the mean of the corresponding shared latent variables. [sent-273, score-0.748]
93 We then obtained the private part by computing the NN in shared space and taking the corresponding private variables. [sent-274, score-1.046]
94 Also, notice in Table 3 that the performance drops when structured sparsity is only imposed on either D’s or α, showing the advantage of our model over simple structured sparsity approaches. [sent-276, score-0.502]
95 Note that our approach allowed us to find latent dimensions shared among all views, as well as shared among the image features only. [sent-279, score-1.178]
96 5 Conclusion In this paper, we have proposed an approach to learning a latent space factorized into dimensions shared across subsets of the views and dimensions private to each individual view. [sent-286, score-1.654]
97 We have demonstrated the effectiveness of our approach on the task of estimating 3D human pose from image features. [sent-288, score-0.3]
98 To this end, we would extend our approach by incorporating an additional group sparsity regularizer on the latent variables to encode class membership. [sent-290, score-0.589]
99 Gaussian process latent variable models for human pose estimation. [sent-328, score-0.587]
100 Learning shared latent structure for image synthesis and robotic imitation. [sent-426, score-0.714]
wordName wordTfidf (topN-words)
[('views', 0.369), ('private', 0.36), ('latent', 0.358), ('shared', 0.285), ('phog', 0.219), ('dictionary', 0.213), ('fols', 0.175), ('pose', 0.166), ('jogging', 0.131), ('sparsity', 0.127), ('structured', 0.124), ('dictionaries', 0.12), ('fro', 0.115), ('coding', 0.104), ('nn', 0.103), ('factorizations', 0.099), ('walking', 0.097), ('factorized', 0.091), ('rt', 0.086), ('view', 0.085), ('gp', 0.078), ('spaces', 0.073), ('image', 0.071), ('factorization', 0.067), ('rmf', 0.066), ('sspca', 0.066), ('sparse', 0.064), ('features', 0.064), ('human', 0.063), ('parts', 0.063), ('cca', 0.063), ('embedding', 0.062), ('dimensions', 0.061), ('rows', 0.061), ('encourage', 0.06), ('pca', 0.059), ('rbf', 0.058), ('toy', 0.058), ('regularizer', 0.058), ('humaneva', 0.058), ('scotland', 0.058), ('none', 0.056), ('sc', 0.054), ('alternately', 0.051), ('squared', 0.048), ('depicted', 0.048), ('subspaces', 0.047), ('group', 0.046), ('va', 0.045), ('entries', 0.045), ('salzmann', 0.044), ('sharedprivate', 0.044), ('concatenated', 0.043), ('recovered', 0.043), ('norm', 0.042), ('independencies', 0.042), ('space', 0.041), ('ek', 0.04), ('mse', 0.039), ('kernel', 0.039), ('proven', 0.039), ('factorizing', 0.038), ('sardinia', 0.038), ('encourages', 0.038), ('columns', 0.037), ('vision', 0.037), ('correctly', 0.036), ('concatenation', 0.036), ('sigal', 0.035), ('dependencies', 0.035), ('cd', 0.035), ('sin', 0.034), ('lin', 0.033), ('nd', 0.033), ('groups', 0.031), ('pv', 0.031), ('west', 0.031), ('multiview', 0.031), ('torr', 0.031), ('convex', 0.031), ('urtasun', 0.03), ('usual', 0.029), ('generative', 0.029), ('quattoni', 0.029), ('across', 0.028), ('inspired', 0.028), ('multimodal', 0.028), ('signals', 0.027), ('among', 0.027), ('modalities', 0.027), ('synchronized', 0.027), ('factorize', 0.027), ('subset', 0.027), ('solving', 0.027), ('canonical', 0.026), ('dimension', 0.026), ('italy', 0.026), ('learned', 0.026), ('furthermore', 0.026), ('correlated', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
Author: Yangqing Jia, Mathieu Salzmann, Trevor Darrell
Abstract: Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing non-convex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multiview learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. We show that our approach outperforms state-of-the-art methods on the task of human pose estimation. 1
2 0.14315258 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
Author: Ning Chen, Jun Zhu, Eric P. Xing
Abstract: Learning from multi-view data is important in many applications, such as image classification and annotation. In this paper, we present a large-margin learning framework to discover a predictive latent subspace representation shared by multiple views. Our approach is based on an undirected latent space Markov network that fulfills a weak conditional independence assumption that multi-view observations and response variables are independent given a set of latent variables. We provide efficient inference and parameter estimation methods for the latent subspace model. Finally, we demonstrate the advantages of large-margin learning on real video and web image data for discovering predictive latent representations and improving the performance on image classification, annotation and retrieval.
3 0.13483585 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
Author: Taehwan Kim, Gregory Shakhnarovich, Raquel Urtasun
Abstract: Sparse coding has recently become a popular approach in computer vision to learn dictionaries of natural images. In this paper we extend the sparse coding framework to learn interpretable spatio-temporal primitives. We formulated the problem as a tensor factorization problem with tensor group norm constraints over the primitives, diagonal constraints on the activations that provide interpretability as well as smoothness constraints that are inherent to human motion. We demonstrate the effectiveness of our approach to learn interpretable representations of human motion from motion capture data, and show that our approach outperforms recently developed matching pursuit and sparse coding algorithms. 1
4 0.12517157 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models
Author: Iain Murray, Ryan P. Adams
Abstract: The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes. 1
5 0.12493502 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
Author: Koray Kavukcuoglu, Pierre Sermanet, Y-lan Boureau, Karol Gregor, Michael Mathieu, Yann L. Cun
Abstract: We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redundancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasisparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1
6 0.10829341 59 nips-2010-Deep Coding Network
7 0.10761323 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
8 0.1066457 103 nips-2010-Generating more realistic images using gated MRF's
9 0.09972582 235 nips-2010-Self-Paced Learning for Latent Variable Models
10 0.097097903 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
11 0.09563946 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
12 0.093225919 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
13 0.092896104 216 nips-2010-Probabilistic Inference and Differential Privacy
14 0.091490574 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
15 0.090395197 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
16 0.089850694 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
17 0.089622386 181 nips-2010-Network Flow Algorithms for Structured Sparsity
18 0.089471176 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
19 0.082812659 99 nips-2010-Gated Softmax Classification
20 0.080468446 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
topicId topicWeight
[(0, 0.216), (1, 0.093), (2, -0.065), (3, -0.055), (4, 0.023), (5, -0.08), (6, 0.051), (7, 0.143), (8, -0.152), (9, -0.03), (10, 0.004), (11, 0.077), (12, 0.051), (13, -0.046), (14, -0.035), (15, -0.015), (16, -0.04), (17, 0.087), (18, 0.042), (19, -0.012), (20, 0.002), (21, -0.022), (22, -0.068), (23, -0.135), (24, -0.059), (25, -0.043), (26, -0.061), (27, -0.016), (28, -0.029), (29, -0.001), (30, 0.279), (31, -0.051), (32, -0.095), (33, 0.074), (34, -0.022), (35, -0.046), (36, -0.005), (37, -0.086), (38, -0.085), (39, 0.007), (40, -0.095), (41, -0.019), (42, -0.073), (43, 0.059), (44, -0.116), (45, 0.082), (46, 0.072), (47, 0.054), (48, -0.086), (49, -0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.95819104 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
Author: Yangqing Jia, Mathieu Salzmann, Trevor Darrell
Abstract: Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing non-convex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multiview learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. We show that our approach outperforms state-of-the-art methods on the task of human pose estimation. 1
2 0.67152566 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
Author: Ning Chen, Jun Zhu, Eric P. Xing
Abstract: Learning from multi-view data is important in many applications, such as image classification and annotation. In this paper, we present a large-margin learning framework to discover a predictive latent subspace representation shared by multiple views. Our approach is based on an undirected latent space Markov network that fulfills a weak conditional independence assumption that multi-view observations and response variables are independent given a set of latent variables. We provide efficient inference and parameter estimation methods for the latent subspace model. Finally, we demonstrate the advantages of large-margin learning on real video and web image data for discovering predictive latent representations and improving the performance on image classification, annotation and retrieval.
3 0.65087169 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
Author: Taehwan Kim, Gregory Shakhnarovich, Raquel Urtasun
Abstract: Sparse coding has recently become a popular approach in computer vision to learn dictionaries of natural images. In this paper we extend the sparse coding framework to learn interpretable spatio-temporal primitives. We formulated the problem as a tensor factorization problem with tensor group norm constraints over the primitives, diagonal constraints on the activations that provide interpretability as well as smoothness constraints that are inherent to human motion. We demonstrate the effectiveness of our approach to learn interpretable representations of human motion from motion capture data, and show that our approach outperforms recently developed matching pursuit and sparse coding algorithms. 1
4 0.58493453 262 nips-2010-Switched Latent Force Models for Movement Segmentation
Author: Mauricio Alvarez, Jan R. Peters, Neil D. Lawrence, Bernhard Schölkopf
Abstract: Latent force models encode the interaction between multiple related dynamical systems in the form of a kernel or covariance function. Each variable to be modeled is represented as the output of a differential equation and each differential equation is driven by a weighted sum of latent functions with uncertainty given by a Gaussian process prior. In this paper we consider employing the latent force model framework for the problem of determining robot motor primitives. To deal with discontinuities in the dynamical systems or the latent driving force we introduce an extension of the basic latent force model, that switches between different latent functions and potentially different dynamical systems. This creates a versatile representation for robot movements that can capture discrete changes and non-linearities in the dynamics. We give illustrative examples on both synthetic data and for striking movements recorded using a Barrett WAM robot as haptic input device. Our inspiration is robot motor primitives, but we expect our model to have wide application for dynamical systems including models for human motion capture data and systems biology. 1
5 0.547997 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
Author: Koray Kavukcuoglu, Pierre Sermanet, Y-lan Boureau, Karol Gregor, Michael Mathieu, Yann L. Cun
Abstract: We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redundancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasisparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1
6 0.53607589 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
7 0.51907873 103 nips-2010-Generating more realistic images using gated MRF's
8 0.51349658 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
9 0.51114094 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
10 0.51100743 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
11 0.49694413 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models
12 0.47314683 284 nips-2010-Variational bounds for mixed-data factor analysis
13 0.46702501 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
14 0.45323819 216 nips-2010-Probabilistic Inference and Differential Privacy
15 0.44795287 235 nips-2010-Self-Paced Learning for Latent Variable Models
16 0.44480509 99 nips-2010-Gated Softmax Classification
17 0.44273216 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
18 0.43869659 59 nips-2010-Deep Coding Network
19 0.43551382 45 nips-2010-CUR from a Sparse Optimization Viewpoint
20 0.41267475 101 nips-2010-Gaussian sampling by local perturbations
topicId topicWeight
[(13, 0.168), (17, 0.048), (27, 0.079), (30, 0.065), (33, 0.142), (35, 0.046), (45, 0.22), (50, 0.04), (52, 0.055), (60, 0.02), (77, 0.022), (90, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.89074469 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
Author: Yangqing Jia, Mathieu Salzmann, Trevor Darrell
Abstract: Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing non-convex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multiview learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. We show that our approach outperforms state-of-the-art methods on the task of human pose estimation. 1
2 0.88457376 258 nips-2010-Structured sparsity-inducing norms through submodular functions
Author: Francis R. Bach
Abstract: Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the ℓ1 -norm. In this paper, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovász extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as non-factorial priors for supervised learning.
3 0.87865806 45 nips-2010-CUR from a Sparse Optimization Viewpoint
Author: Jacob Bien, Ya Xu, Michael W. Mahoney
Abstract: The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.
4 0.87203729 284 nips-2010-Variational bounds for mixed-data factor analysis
Author: Mohammad E. Khan, Guillaume Bouchard, Kevin P. Murphy, Benjamin M. Marlin
Abstract: We propose a new variational EM algorithm for fitting factor analysis models with mixed continuous and categorical observations. The algorithm is based on a simple quadratic bound to the log-sum-exp function. In the special case of fully observed binary data, the bound we propose is significantly faster than previous variational methods. We show that EM is significantly more robust in the presence of missing data compared to treating the latent factors as parameters, which is the approach used by exponential family PCA and other related matrix-factorization methods. A further benefit of the variational approach is that it can easily be extended to the case of mixtures of factor analyzers, as we show. We present results on synthetic and real data sets demonstrating several desirable properties of our proposed method. 1
5 0.86906046 221 nips-2010-Random Projections for $k$-means Clustering
Author: Christos Boutsidis, Anastasios Zouzias, Petros Drineas
Abstract: This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ ℝ^{n×d}) can be projected into t = Ω(k/ε²) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε⁻²k/log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + ε. The projection is done by post-multiplying A with a d × t random matrix R having entries +1/√t or −1/√t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
6 0.86827832 102 nips-2010-Generalized roof duality and bisubmodular functions
7 0.86792004 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
8 0.86783379 261 nips-2010-Supervised Clustering
9 0.85765916 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms
10 0.85636264 139 nips-2010-Latent Variable Models for Predicting File Dependencies in Large-Scale Software Development
11 0.85376668 192 nips-2010-Online Classification with Specificity Constraints
12 0.85328865 262 nips-2010-Switched Latent Force Models for Movement Segmentation
13 0.8413952 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
14 0.84023166 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
15 0.83600247 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
16 0.83307028 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
17 0.82818335 117 nips-2010-Identifying graph-structured activation patterns in networks
18 0.82472569 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
19 0.82455522 265 nips-2010-The LASSO risk: asymptotic results and real world examples
20 0.82433492 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models