Author: Sumit Shekhar, Vishal M. Patel, Hien V. Nguyen, Rama Chellappa
Abstract: Data-driven dictionaries have produced state-of-the-art results in various classification tasks. However, when the target data has a different distribution than the source data, the learned sparse representation may not be optimal. In this paper, we investigate if it is possible to optimally represent both source and target by a common dictionary. Specifically, we describe a technique which jointly learns projections of data in the two domains, and a latent dictionary which can succinctly represent both the domains in the projected low-dimensional space. An efficient optimization technique is presented, which can be easily kernelized and extended to multiple domains. The algorithm is modified to learn a common discriminative dictionary, which can be further used for classification. The proposed approach does not require any explicit correspondence between the source and target domains, and shows good results even when there are only a few labels available in the target domain. Various recognition experiments show that the method performs on par or better than competitive state-of-the-art methods.
1 Nguyen Rama Chellappa University of Maryland, College Park, USA {sshekha, pvishalm, hien, rama}@umiacs. [sent-3, score-0.109]
2 edu Abstract Data-driven dictionaries have produced state-of-the-art results in various classification tasks. [sent-5, score-0.17]
3 However, when the target data has a different distribution than the source data, the learned sparse representation may not be optimal. [sent-6, score-0.346]
4 In this paper, we investigate if it is possible to optimally represent both source and target by a common dictionary. [sent-7, score-0.278]
5 Specifically, we describe a technique which jointly learns projections of data in the two domains, and a latent dictionary which can succinctly represent both the domains in the projected low-dimensional space. [sent-8, score-0.962]
6 The algorithm is modified to learn a common discriminative dictionary, which can be further used for classification. [sent-10, score-0.111]
7 The proposed approach does not require any explicit correspondence between the source and target domains, and shows good results even when there are only a few labels available in the target domain. [sent-11, score-0.359]
8 Introduction The study of sparse representation of signals and images has attracted tremendous interest in the last few years. [sent-14, score-0.111]
9 Sparse representations of signals and images require learning an over-complete set of bases called a dictionary, along with a linear decomposition of signals and images as a combination of a few atoms from the learned dictionary. [sent-15, score-0.861]
10 Olshausen and Field [16] in their seminal work introduced the idea of learning a dictionary from data instead of using off-the-shelf bases. [sent-16, score-0.598]
11 Since then, data-driven dictionaries have been shown to work well for both image restoration [3] and classification tasks [26]. [sent-17, score-0.17]
12 The efficiency of dictionaries in this wide range of applications can be attributed to the robust discriminant rep... [sent-18, score-0.17]
13 However, the learned dictionary may not be optimal if the target data has a different distribution than the data used for training. [sent-22, score-0.719]
14 Adapting dictionaries to new domains is a challenging task, but has hardly been explored in the vision literature. [sent-25, score-0.443]
15 [12] considered a special case where corresponding samples from each domain were available, and learned a dictionary for each domain. [sent-27, score-0.747]
16 [19] proposed a method for adapting dictionaries for smoothly varying domains using regression. [sent-29, score-0.443]
17 However, in practical applications, target domains are scarcely labeled, and domain shifts may result in abrupt feature changes (e. [sent-30, score-0.574]
18 Hence, learning a separate dictionary for each domain imposes a severe space constraint, rendering it infeasible for many practical applications. [sent-34, score-0.744]
19 In view of the above challenges, we propose a robust method for learning a single dictionary to optimally represent both source and target data. [sent-35, score-0.839]
20 As the features may not be correlated well in the original space, we project data from both the domains onto a common low-dimensional space, while maintaining the manifold structure of data. [sent-36, score-0.343]
21 Simultaneously, we learn a compact dictionary which represents projected data from both the domains well. [sent-37, score-0.899]
22 Firstly, learning a separate projection matrix for each domain makes it easy to handle any changes in feature dimension and type in different domains. [sent-40, score-0.268]
23 Further, learning the dictionary on a low-dimensional space makes the algorithm faster, and irrelevant information in the original features is discarded. [sent-42, score-0.598]
24 Moreover, joint learning of dictionary and projections ensures that the common internal structure of data in both the domains is extracted, which can be represented well by sparse linear combinations of dictionary atoms. [sent-43, score-1.589]
25 We will see that by constraining the projection matrices to be orthonormal matrices, convenient forms for optimal dictionary and projection matrices can be obtained. [sent-45, score-0.764]
26 The classification scheme for the learned dictionary is described in Section 5. [sent-52, score-0.601]
27 Related Work The problem of adapting classifiers to new visual domains has recently gained importance in the vision community and several methods have been proposed [21, 13, 6, 5, 11]. [sent-55, score-0.361]
28 [11] learnt a transformation of source data onto target space, such that the joint representation is low-rank. [sent-57, score-0.274]
29 On the other hand, our method jointly learns projections of both the domains, while utilizing the available labels to learn a discriminative dictionary. [sent-59, score-0.139]
30 [10] suggested learning a shared embedding for different domains, along with a sparsity constraint on the representation. [sent-61, score-0.17]
31 Similarly, methods for joint dimensionality reduction and sparse representation have also been proposed [29, 4, 14, 15]. [sent-66, score-0.125]
32 Problem Framework The classical dictionary learning approach minimizes the representation error of the given set of data samples subject to a sparsity constraint [1]. [sent-69, score-0.643]
33 $X = [x_1, \ldots, x_N] \in \mathbb{R}^{K \times N}$ is the sparse representation of $Y$ over $D$, and $T_0$ is the sparsity level. [sent-81, score-0.105]
34 We wish to learn a shared $K$-atom dictionary, $D \in \mathbb{R}^{n \times K}$, and mappings $P_1 \in \mathbb{R}^{n \times n_1}$, $P_2 \in \mathbb{R}^{n \times n_2}$ onto a common low-dimensional space, which will minimize the representation error in the projected space. [sent-96, score-0.161]
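The classical objective described above can be illustrated with a minimal sketch. It assumes the standard formulation in which the representation error is the squared Frobenius norm of $Y - DX$ and sparsity is the number of nonzeros per column of $X$; the function names are illustrative and do not come from the paper.

```python
import numpy as np

def representation_error(Y, D, X):
    """Squared Frobenius norm of the residual Y - D X."""
    return float(np.linalg.norm(Y - D @ X, ord="fro") ** 2)

def satisfies_sparsity(X, T0):
    """True if every column of X has at most T0 nonzero entries."""
    return bool(np.all(np.count_nonzero(X, axis=0) <= T0))

# Toy example: 3-dimensional data, K = 4 atoms, N = 2 samples.
D = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
X = np.array([[2.0, 0.0],
              [0.0, 0.0],
              [0.0, 3.0],
              [0.0, 1.0]])
Y = D @ X  # data that the codes in X reproduce exactly
```

Here the first sample uses one atom and the second uses two, so the codes satisfy a sparsity level of 2 but not of 1.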
35 Regularization: It will be desirable if the projections, while bringing the data from two domains to a shared subspace, do not lose too much information available in the original domains. [sent-105, score-0.327]
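A minimal sketch of how the two cost terms could be evaluated is given here. It assumes that the representation term sums the squared residuals of the projected data over the shared dictionary, and that the regularizer is a PCA-like penalty on what each projection fails to reconstruct in its original domain. Both forms are assumptions made for illustration, as are the names `c1`, `c2`, and `total_cost`.

```python
import numpy as np

def c1(D, Ps, Ys, Xs):
    """Assumed representation term: sum_i ||P_i Y_i - D X_i||_F^2."""
    return float(sum(np.linalg.norm(P @ Y - D @ X, "fro") ** 2
                     for P, Y, X in zip(Ps, Ys, Xs)))

def c2(Ps, Ys):
    """Assumed regularizer: sum_i ||Y_i - P_i^T P_i Y_i||_F^2."""
    return float(sum(np.linalg.norm(Y - P.T @ (P @ Y), "fro") ** 2
                     for P, Y in zip(Ps, Ys)))

def total_cost(D, Ps, Ys, Xs, lam):
    """Weighted sum of the two terms, with weight lam on the regularizer."""
    return c1(D, Ps, Ys, Xs) + lam * c2(Ps, Ys)

# Two domains with different feature dimensions (3 and 4), shared n = 2.
P1, P2 = np.eye(2, 3), np.eye(2, 4)           # orthonormal rows
Y1 = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
Y2 = np.array([[1.0], [0.0], [2.0], [0.0]])
D = np.eye(2)
X1 = P1 @ Y1                                   # exact codes for domain 1
X2 = np.zeros((2, 1))                          # deliberately poor codes
cost = total_cost(D, [P1, P2], [Y1, Y2], [X1, X2], lam=0.5)
```

In this toy case the first domain contributes nothing to either term, while the second contributes 1 to the representation term and 4 to the regularizer.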
36 Multiple domains The above formulation can be extended so that it can handle multiple domains. [sent-123, score-0.273]
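37 For an $M$-domain problem, we simply construct the matrices $\tilde{Y}$, $\tilde{P}$, $\tilde{X}$ as: $\tilde{P} = [P_1, \cdots, P_M]$, $\tilde{Y} = \ldots$ [sent-124, score-0.207]
38 With these definitions, (3) can be extended to multiple domains as follows: $\{D^*, \tilde{P}^*, \tilde{X}^*\} = \arg\min_{D, \tilde{P}, \tilde{X}} C_1(D, \tilde{P}, \tilde{X}) + \lambda C_2(\tilde{P})$ s. [sent-130, score-0.273]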
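The stacking idea can be sketched as follows. The sketch assumes that $\tilde{Y}$ is block diagonal in the per-domain data matrices and that $\tilde{X}$ concatenates the per-domain codes column-wise. The source text does not show these definitions in full, so treat both as assumptions; `stack_domains` is a hypothetical helper.

```python
import numpy as np

def stack_domains(Ps, Ys, Xs):
    """Build stacked matrices so that P_tilde @ Y_tilde = [P_1 Y_1, ..., P_M Y_M]."""
    P_tilde = np.hstack(Ps)                    # n x (n_1 + ... + n_M)
    rows = sum(Y.shape[0] for Y in Ys)
    cols = sum(Y.shape[1] for Y in Ys)
    Y_tilde = np.zeros((rows, cols))           # assumed block-diagonal layout
    r = c = 0
    for Y in Ys:
        Y_tilde[r:r + Y.shape[0], c:c + Y.shape[1]] = Y
        r += Y.shape[0]
        c += Y.shape[1]
    X_tilde = np.hstack(Xs)                    # K x (N_1 + ... + N_M)
    return P_tilde, Y_tilde, X_tilde

# Two toy domains with feature dimensions 3 and 4.
Ps = [np.eye(2, 3), np.eye(2, 4)]
Ys = [np.arange(6.0).reshape(3, 2), np.arange(4.0).reshape(4, 1)]
Xs = [np.ones((2, 2)), np.ones((2, 1))]
P_tilde, Y_tilde, X_tilde = stack_domains(Ps, Ys, Xs)
```

With this layout, a single product `P_tilde @ Y_tilde` yields all projected domains side by side, so the two-domain cost carries over unchanged in form.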
39 Discriminative Dictionary The dictionary learned in (3) can reconstruct the two domains well, but it cannot discriminate between the data from different classes. [sent-136, score-0.874]
40 Recent advances in learning discriminative dictionaries [20, 28] suggest that learning class-wise, mutually incoherent dictionaries works better for discrimination. [sent-137, score-0.468]
41 To incorporate this into our framework, we write the dictionary D as D = [D1, · · · , DC], where C is the total number of classes. [sent-138, score-0.556]
42 We modify the cost function similar to [28], which encourages reconstruction of samples of a given class by the dictionary of the corresponding class, and penalizes reconstruction by out-of-class dictionaries. [sent-139, score-0.67]
43 ... (5), where $\mu$ and $\nu$ are the weights given to the discriminative terms, and the matrices $\tilde{X}_{in}$ and $\tilde{X}_{out}$ are given as: $\tilde{X}_{in}[i,j] = \ldots$ [sent-146, score-0.105]
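One plausible reading of the in-class and out-of-class code matrices is sketched here. The source definition is cut off, so the masking rule is an assumption: an entry is kept in the in-class matrix when atom $i$ and sample $j$ carry the same class label, and in the out-of-class matrix otherwise. The helper `split_codes` is hypothetical.

```python
import numpy as np

def split_codes(X, atom_labels, sample_labels):
    """Split codes into in-class and out-of-class parts (assumed masking rule)."""
    same_class = atom_labels[:, None] == sample_labels[None, :]
    X_in = np.where(same_class, X, 0.0)   # coefficients on same-class atoms
    X_out = X - X_in                      # coefficients on other-class atoms
    return X_in, X_out

# D = [D_1, D_2] with two atoms per class; three samples.
atom_labels = np.array([0, 0, 1, 1])
sample_labels = np.array([0, 1, 1])
X = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.0, 0.2],
              [0.3, 2.0, 0.0],
              [0.0, 0.0, 4.0]])
X_in, X_out = split_codes(X, atom_labels, sample_labels)
```

Under this reading, a discriminative cost would reward a small residual when reconstructing with `X_in` and penalize the energy carried by `X_out`.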
44 Unlabeled data can be handled using semisupervised approaches to dictionary learning [18]. [sent-150, score-0.598]
45 Also, note that we do not need to modify the forms of the projection matrices, since they capture the overall domain shift, and hence are independent of class variations. [sent-152, score-0.198]
46 Update step for $\tilde{B}$, $\tilde{X}$: For a fixed $\tilde{A}$, the problem becomes that of discriminative dictionary learning, with data as $Z = \tilde{A}^T \tilde{K}$ and dictionary $D = \tilde{A}^T \tilde{K} \tilde{B}$. [sent-189, score-0.602]
47 To jointly learn the dictionary, $D$, and the sparse code, $\tilde{X}$, we use the framework of the discriminative dictionary learning approach presented in [28]. [sent-190, score-0.732]
48 Classification Given a test sample, $y_{te}$, from domain $k$, we propose the following steps for classification, similar to [15]. [sent-214, score-0.341]
49 Compute the embedding of the sample in the common subspace, $z_{te}$, using the projection, $P_k^*$. [sent-217, score-0.169]
50 Compute the sparse coefficients, $x_{te}$, of the embedded sample over dictionary $D$ using the OMP algorithm [17]. [sent-222, score-0.616]
51 Now, the sample can be assigned to class $i$ if the reconstruction error using the class dictionary, $D_i$, and the sparse code corresponding to the atoms of that dictionary, $x_{te}^i$, is minimum. [sent-231, score-0.311]
52 So, we project the dictionary, $D_i$, into the feature space, and assign the test sample to the class with the minimum error in the original feature space: Output class = $\arg\min_{i = 1, \cdots, C} \ldots$ [sent-236, score-0.104]
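The classification steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the OMP routine is a plain textbook version that assumes unit-norm atoms, the back-projection into feature space is assumed to use the transpose of the projection (reasonable when its rows are orthonormal), and `omp` and `classify` are hypothetical names.

```python
import numpy as np

def omp(D, z, T0):
    """Greedy orthogonal matching pursuit with at most T0 atoms."""
    x = np.zeros(D.shape[1])
    support, coef = [], np.zeros(0)
    residual = z.astype(float).copy()
    for _ in range(T0):
        k = int(np.argmax(np.abs(D.T @ residual)))  # best-correlated atom
        if k in support:
            break
        support.append(k)
        # Refit all selected atoms jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
        residual = z - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x[support] = coef
    return x

def classify(y_te, P_k, D, atom_labels, T0):
    """Embed, sparse-code, then pick the class with the smallest residual."""
    z_te = P_k @ y_te                      # step 1: embed in the common subspace
    x_te = omp(D, z_te, T0)                # step 2: sparse code over D
    errors = {}
    for c in np.unique(atom_labels):       # step 3: class-wise residuals
        mask = atom_labels == c
        recon = P_k.T @ (D[:, mask] @ x_te[mask])  # assumed back-projection
        errors[int(c)] = np.linalg.norm(y_te - recon)
    return min(errors, key=errors.get)

# Toy setup: 3-D features, 2-D subspace, one atom per class.
P_k = np.eye(2, 3)
D = np.eye(2)
atom_labels = np.array([0, 1])
pred = classify(np.array([2.0, 0.1, 0.0]), P_k, D, atom_labels, T0=1)
```

Measuring the residual in the original feature space, as the text describes, lets the error reflect information that the low-dimensional embedding may have discarded.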
53 First, we demonstrate some synthesis and recognition results on the CMU MultiPie dataset for face recognition across pose and illumination variations. [sent-244, score-0.189]
54 Next we show the performance of our method on domain adaptation databases and compare it with existing adaptation algorithms. [sent-246, score-0.378]
55 Frontal faces were taken as the source domain, while different off-frontal poses were taken as target domains. [sent-252, score-0.27]
56 Dictionaries were trained using illuminations {1, 4, 7, 12, 17} from the source and the target poses, in Session 1 per subject. [sent-253, score-0.278]
57 All the illumination images from Session 2, for the target pose, were taken as probe images. [sent-254, score-0.154]
58 1 Pose Alignment First we consider the problem of pose alignment using the proposed dictionary learning framework. [sent-258, score-0.694]
59 Images at the extreme pose of 60° were taken as the target pose. [sent-259, score-0.158]
60 A shared discriminative dictionary was learned using the approach described in this paper. [sent-261, score-0.699]
61 Given the probe image, it was projected on the latent subspace and reconstructed using the dictionary. [sent-262, score-0.117]
62 The reconstruction was back-projected onto the source pose domain, to give the aligned image. [sent-263, score-0.227]
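The alignment pipeline described above (project the probe, reconstruct it over the shared dictionary, back-project to the source pose) can be sketched as below. The sketch assumes back-projection uses the transpose of the source projection, and it swaps in a plain least-squares coder for the sparse solver to stay short; `align_pose` and `least_squares_coder` are hypothetical names.

```python
import numpy as np

def least_squares_coder(D, z):
    """Stand-in for a sparse solver such as OMP: least squares, no sparsity."""
    x, *_ = np.linalg.lstsq(D, z, rcond=None)
    return x

def align_pose(y_probe, P_target, P_source, D, coder=least_squares_coder):
    """Map a target-pose probe into the source-pose feature space."""
    z = P_target @ y_probe        # embed the probe in the latent subspace
    x = coder(D, z)               # code the embedding over the shared dictionary
    z_hat = D @ x                 # reconstruction in the latent subspace
    return P_source.T @ z_hat     # assumed back-projection to the source domain

# Toy setup: target features are 3-D, source features are 4-D, subspace is 2-D.
P_target = np.eye(2, 3)
P_source = np.eye(2, 4)
D = np.eye(2)
aligned = align_pose(np.array([1.0, 2.0, 5.0]), P_target, P_source, D)
```

Any coder with the same `(D, z)` signature can be passed in, so a sparse solver can replace the least-squares stand-in without changing the rest of the pipeline.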
63 Firstly, it can be seen that there is an optimal dictionary size, K = 5, where the best alignment is achieved. [sent-266, score-0.612]
64 For K = 7, the alignment is not good, as the learned dictionary is not able to successfully correlate the two domains when there are more atoms in the dictionary. [sent-268, score-1.046]
65 Moreover, the learned projection matrices (Figure 2(b)) show that our method can learn the internal structure of the two domains. [sent-274, score-0.179]
66 (b) First few components of the learned projection matrices for the two poses. [sent-279, score-0.149]
67 The dictionary learning algorithm, FDDL [28] is not optimal here as it is not able to efficiently represent the non-linear changes introduced by the pose variation. [sent-284, score-0.675]
68 In the second set-up, we evaluate the methods for adaptation using multiple domains. [sent-300, score-0.116]
69 For both the cases, we use 20 training samples per class for Amazon/Caltech, and 8 samples per class for DSLR/Webcam when used as source, and 3 training samples for all of them when used for target domain. [sent-302, score-0.222]
70 Rest of the data in the target domain is used for testing. [sent-303, score-0.264]
71 Dictionary size, K = 4 atoms per class and final dimension, n = 60 for the first set-up. [sent-315, score-0.168]
72 For the second set-up, K = 6 atoms per class and n = 90. [sent-316, score-0.168]
73 For FDDL, the parameters, μ and ν are the same as SDDL, and we learn K = 8 atoms per class for the first set-up and K = 10 atoms per class for the second. [sent-317, score-0.366]
74 The FDDL dictionary was trained using both the source and the target domain features, as it was found to give the best results. [sent-318, score-0.943]
75 2 Results using single source Table 2(a) shows a comparison of the results of different methods on 8 source-target pairs. [sent-322, score-0.123]
76 The proposed algorithm gives the best performance for 5 domain pairs, and is the second best for 2 pairs. [sent-323, score-0.146]
77 For Caltech-DSLR and Amazon-Webcam domain pairs, there is more than 15% improvement over the GFK algorithm [5]. [sent-324, score-0.146]
78 Table 2(a): Performance comparison on single source four domains benchmark (C: caltech, A: amazon, D: dslr, W: webcam). [sent-325, score-0.396]
79 3 Results using multiple sources As our proposed framework can also handle multiple domains, we also experimented with multiple source adaptation. [sent-357, score-0.123]
80 However, [7] reports higher numbers on webcam and amazon as targets, using boosted classifiers. [sent-360, score-0.277]
81 4 Ease of adaptation A rank of domain (ROD) metric was introduced in [5] to measure the adaptability of different domains. [sent-364, score-0.262]
82 It was shown that ROD correlates with the performance of adaptation algorithm. [sent-365, score-0.116]
83 This is the case because by learning projections along with the common dictionary, we can achieve a better alignment of the datasets. [sent-370, score-0.2]
84 Number of source images: Here, we choose the Amazon/Webcam domain pair, as it is "difficult" to adapt. [sent-376, score-0.269]
85 We increased the number of source images and studied the performance of SDDL and compared it with FDDL. [sent-377, score-0.123]
86 It can be seen that while FDDL's performance decreases sharply with more source images, the SDDL method shows an increase in performance. [sent-378, score-0.123]
87 Hence, by adapting the source to the target domain, our method can use the source information to increase the accuracy of target recognition, even when their distributions are very different. [sent-379, score-0.57]
88 Dictionary size: All the domain pairs show an initial sharp increase in the performance, and then become almost flat after the dictionary size of 3 or 4. [sent-381, score-0.743]
89 The flat region indicates that alignment of the source and the target data is limited by the number of available target samples. [sent-382, score-0.456]
90 But also, on a positive note, it can be seen that even a smaller dictionary can give the optimal performance. [sent-383, score-0.556]
91 Common subspace dimension: Similar to the previous case, we get an initial sharp increase followed by a flat recognition curve. [sent-385, score-0.118]
92 Conclusion We have proposed a novel framework for adapting dictionaries to testing domains under arbitrary domain shifts. [sent-388, score-0.677]
93 Future work will include studying the effect of using unlabeled data during training, and other relevant problems like large-scale and online adaptation of dictionaries. [sent-393, score-0.116]
94 K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. [sent-398, score-0.26]
95 Recognition performance under different: (a) number of source images, (b) dictionary size, and (c) common subspace dimension. [sent-400, score-0.757]
96 Image denoising via sparse and redundant representations over learned dictionaries. [sent-410, score-0.105]
97 Unsupervised adaptation across domain shift by generating intermediate data representations. [sent-434, score-0.262]
98 What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. [sent-480, score-0.148]
99 Classification and clustering via dictionary learning with structured incoherence and shared features. [sent-530, score-0.652]
100 On the dimensionality reduction for sparse representation based face recognition. [sent-596, score-0.163]
