cvpr cvpr2013 cvpr2013-201 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hua Wang, Feiping Nie, Heng Huang, Chris Ding
Abstract: To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. How to integrate these heterogeneous visual features and identify the important ones among them for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using joint structured sparsity regularizations to learn the feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of popularly used image features. Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. [sent-5, score-0.039]
2 How to integrate these heterogeneous visual features and identify the important ones from them for specific vision tasks has become an increasingly critical problem. [sent-6, score-0.337]
3 In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using joint structured sparsity regularizations to learn the feature importance for the vision tasks from both group-wise and individual points of view. [sent-7, score-0.648]
4 A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. [sent-8, score-0.079]
5 We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. [sent-9, score-0.226]
6 For each data set we integrate six different types of popularly used image features. [sent-10, score-0.219]
7 Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better performance. [sent-11, score-0.285]
8 Introduction. The scene categorization and visual recognition problem analyzes and classifies images into semantically meaningful categories. [sent-13, score-0.074]
9 To enhance the visual understanding, computer vision researchers have proposed many feature representation methods to describe the visual objects in different types of images [12, 18, 3]. (∗Corresponding author.) [sent-15, score-0.039]
10 (This work was partially supported by NSF IIS1117965.) [sent-16, score-0.064]
11 However, it is usually not clear what the best feature descriptor to solve a given application problem is. [sent-17, score-0.039]
12 Because different features describe different aspects of the visual characteristics, one descriptor can be better than others under certain circumstances. [sent-18, score-0.062]
13 How to integrate heterogeneous features is becoming a challenging as well as attractive problem nowadays. [sent-19, score-0.285]
14 Considering that different feature representations give rise to different kernel functions, Multiple Kernel Learning (MKL) approaches [19, 9, 13] have recently been studied and employed to integrate heterogeneous features or data and to select multi-modal features. [sent-20, score-0.215]
15 Particularly, [5] surveyed several MKL methods, as well as their variants using boosting, for computer vision tasks and applied them to object classification. [sent-21, score-0.085]
16 However, such models train a weight for each type of features, i.e., [sent-22, score-0.109]
17 when multiple types of features are combined together, all features from the same type are weighted equally. [sent-24, score-0.297]
18 To address the integration of heterogeneous image features for visual recognition tasks, in this paper we propose a novel Sparse Multimodal Learning (SMML) method by utilizing new mixed structured sparsity norms. [sent-26, score-0.413]
19 In our method, we concatenate all features of one image together as its feature vector and learn the weight of each feature in the classification decision functions. [sent-27, score-0.233]
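As a minimal illustration of this concatenation step (the names and shapes below are illustrative assumptions, not taken from the paper's released code):

import numpy as np

# Sketch: concatenate k per-modality feature arrays into one
# d-dimensional representation per image, with d = d_1 + ... + d_k.
def concat_modalities(feature_list):
    """feature_list: list of k arrays, each of shape (n, d_j)."""
    return np.hstack(feature_list)  # shape (n, d)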
20 The main difficulty of integrating heterogeneous image features is to simultaneously consider the feature group property (e.g., [sent-28, score-0.293]
21 GIST is good at detecting natural scenes, so the GIST features should have large weights for classes related to outdoor natural scenes) and the individual feature property, i.e., [sent-30, score-0.204]
22 although a type of features may not be useful for categorizing certain specific classes, a small number of individual features from the same type can still be discriminative for these classes. [sent-32, score-0.381]
23 We propose to utilize two structured sparsity regularizers to capture both group and individual properties of different modalities (types) of features. [sent-33, score-0.376]
24 Meanwhile, the sparse weight matrix provides natural feature selection results. [sent-34, score-0.087]
25 Our new objective, employing the hinge loss and the two non-smooth regularizers, is highly non-smooth and difficult to solve in general. [sent-35, score-0.076]
26 Thus, we derive a new efficient algorithm to solve it with rigorously proved global convergence. [sent-36, score-0.05]
27 We applied our new sparse multimodal learning method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. [sent-37, score-0.44]
28 For each data set we integrate six different types of popularly used image features. [sent-38, score-0.219]
29 In all experimental results, our new method always achieves better object and scene categorization performance than traditional classification methods utilizing each single-type descriptor and the widely used MKL-based feature integration methods. [sent-39, score-0.368]
30. Sparse Multimodal Learning. In recent research, sparse regularizations have been widely investigated and applied to different computer vision and machine learning studies [1, 16, 11]. [sent-41, score-0.173]
31 The sparse representations are typically achieved by imposing nonsmooth norms as regularizers in the optimization problems. [sent-42, score-0.141]
32 Because the structured sparsity regularizers can capture the structures existing in data points and features, they are useful in discovering the underlying patterns. [sent-43, score-0.234]
33 We will use new joint structured sparsity regularizers to explore both the group-wise and the individual importance of each feature for different classes. [sent-44, score-0.312]
34 Joint Structured Sparsity Regularizations. In the supervised learning setting, we are given n training images {(xi, yi)}_{i=1}^n, where xi denotes the feature vector of the i-th image and yi its label. [sent-47, score-0.033]
35 The features come from a total of k modalities, and each modality j has dj features (d = Σ_j dj). [sent-59, score-0.388]
36 Different from MKL, we directly learn a d×c feature weight matrix W = [w_1^1, ..., w_c^1; ...; w_1^k, ..., w_c^k] ∈ R^{d×c}, [sent-72, score-0.039]
37 where w_p^q ∈ R^{d_q} weighs the features from the q-th modality in the classification decision function of the p-th class. [sent-75, score-0.349]
38 Typically, we can use a convex loss function L(X, W) to measure the loss incurred by W on the training images. [sent-76, score-0.064]
39 We choose to use the hinge loss, because the hinge loss based SVM has shown state-of-the-art performance in classification. [sent-77, score-0.149]
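A minimal sketch of such a loss in Python, assuming class labels are coded as a matrix Y ∈ {−1, +1}^{n×c} (one column per class) with a bias vector b; these names are illustrative, and the paper's solver optimizes this loss jointly with the regularizers introduced below:

import numpy as np

# Sketch of the hinge loss L(X, W) incurred by the weight matrix W.
def hinge_loss(X, Y, W, b):
    """X: (n, d) features; Y: (n, c) labels in {-1, +1}; W: (d, c); b: (c,)."""
    scores = X @ W + b                         # decision values, shape (n, c)
    margins = np.maximum(0.0, 1.0 - Y * scores)
    return margins.sum()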
40 Compared to MKL approaches that learn weight matrices in an unsupervised way, our method learns all feature weights using a supervised learning model. [sent-78, score-0.125]
41 Because the heterogeneous features come from different visual descriptors, we cannot use only the loss function, which would be equivalent to assigning uniform weights to all features. [sent-79, score-0.281]
42 From the multimodal viewpoint, the features of a specific modality can be more or less discriminative for specific classes. [sent-93, score-0.422]
43 For instance, color features substantially increase the detection of stop signs, while they are almost irrelevant for finding cars in images. [sent-94, score-0.062]
44 Because the group-sparsity regularizer applies the ℓ1-norm between modalities, it enforces sparsity between different modalities, i.e., [sent-118, score-0.092]
45 if one modality of features is not discriminative for certain tasks, the objective in Eq. [sent-120, score-0.289]
46 (2) will assign zeros (in the ideal case; usually they are very small values) to them for the corresponding tasks; otherwise, their weights are large. [sent-121, score-0.03]
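The following sketch shows one plausible form of such a group regularizer, under the assumption that the rows of W are stacked modality by modality and dims holds the per-modality feature counts d_1, ..., d_k; the exact norm used in the paper may differ in detail:

import numpy as np

# Group regularizer sketch: l2-norm within each modality block of each
# class column, summed (l1) across modality/class pairs.
def g1_norm(W, dims):
    """W: (d, c) weight matrix; dims: [d_1, ..., d_k] per-modality sizes."""
    total, start = 0.0, 0
    for dj in dims:
        block = W[start:start + dj, :]                 # (d_j, c) modality block
        total += np.linalg.norm(block, axis=0).sum()   # sum_p ||w_p^j||_2
        start += dj
    return total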
47 However, in certain cases, even if most features in one modality are not discriminative for certain visual categories, a small number of features from the same modality can still be highly discriminative. [sent-124, score-0.578]
48 From the multi-task learning point of view, such important features should be shared by all tasks. [sent-125, score-0.095]
49 Because the ℓ2,1-norm regularizer imposes sparsity between all features and non-sparsity between classes, the features that are discriminative for all classes will get large weights. [sent-136, score-0.309]
50 Because of the imposed sparsity, the weights of most features are close to zero, and only the features important to the classification tasks have large weights. [sent-137, score-0.328]
51 Thus, our SMML method automatically performs feature selection. [sent-138, score-0.039]
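Putting the pieces together, the sketch below evaluates an objective of the assumed form hinge loss + λ1 · group norm + λ2 · ℓ2,1-norm, reusing hinge_loss and g1_norm from the sketches above (λ1 and λ2 are illustrative trade-off parameters), and reads the selected features off the row norms of W:

import numpy as np

# l2,1-norm: sum of the l2-norms of the rows of W, coupling each
# feature's weights across all classes.
def l21_norm(W):
    return np.linalg.norm(W, axis=1).sum()

def smml_objective(X, Y, W, b, dims, lam1, lam2):
    """Assumed form: hinge loss plus the two structured-sparsity terms."""
    return hinge_loss(X, Y, W, b) + lam1 * g1_norm(W, dims) + lam2 * l21_norm(W)

def selected_features(W, tol=1e-6):
    """Indices of features with (numerically) nonzero weight rows."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)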
52 Experimental Results. In this section, we experimentally evaluate the proposed Sparse Multi-Modal Learning (SMML) approach in both single-label and multi-label image classification tasks. [sent-264, score-0.238]
53 We experiment with the following three benchmark single-label multi-modal image data sets, which are broadly used in computer vision studies. [sent-268, score-0.059]
54 Following [4], we select a subset of 26 classes in our experiments. [sent-270, score-0.034]
55 Following [2], we refine the data set to get 7 classes, including tree, building, airplane, cow, face, car, and bicycle; each refined class has 30 images. [sent-273, score-0.034]
56 We classify the images in the above three data sets using the proposed methods by integrating the six types of image features of each of them. [sent-276, score-0.171]
57 Table 1: Image feature descriptors of the image data sets used in the experiments. [sent-289, score-0.039]
58 In addition, we also report the classification performances by SVM on each individual type of features and a straightforward concatenation of all six types of features as baselines. [sent-298, score-0.632]
59 The ℓ2,1-norm regularization alone thereby selects features shared across tasks, yet the modality structure is not considered. [sent-305, score-0.318]
60 We denote this degenerate version of the proposed method as “Our method (ℓ2,1)”. [sent-306, score-0.032]
61 For each of the 5 trials, within the training data, an internal 5-fold cross-validation is performed to fine-tune the parameters. [sent-311, score-0.03]
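A sketch of this evaluation protocol using scikit-learn; the linear SVM, the parameter grid, and the value ranges are illustrative stand-ins rather than the paper's actual configuration:

from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# One outer trial: an internal 5-fold grid search tunes hyper-parameters
# on the training split only; the tuned model is scored on the test split.
def run_trial(X_train, y_train, X_test, y_test):
    grid = {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}        # illustrative range
    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(kernel="linear"), grid, cv=inner_cv)
    search.fit(X_train, y_train)
    return search.score(X_test, y_test)                # test accuracy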
62 We implement the compared MKL methods using the codes published by [20]. [sent-334, score-0.152]
63 In the LSSVM ℓ2 methods, the regularization parameter λ is estimated jointly as the kernel coefficient of an identity matrix; in the LSSVM [sent-337, score-0.048]
64 ℓ1 method, λ is set to 1; in all other SVM approaches, the C parameter of the box constraint is fine-tuned in the same range as γ. [sent-338, score-0.03]
65 For LPBoost-β and LPBoostB methods, we use the codes published by the authors4. [sent-339, score-0.084]
66 We use the LIBSVM5 software package to implement SVM in all our experiments. [sent-582, score-0.068]
67 Because we are concerned with single-label image classification, we employ the most widely used metric, classification accuracy, to assess the classification performance. [sent-584, score-0.186]
68 The results of all compared methods in the three image classification tasks are reported in Table 2. [sent-585, score-0.145]
69 In addition, the methods using multiple data sources are significantly better than SVM using a single type of data. [sent-587, score-0.109]
70 This confirms the usefulness of data integration in image classification. [sent-588, score-0.053]
71 In contrast, the MKL methods and boosting-enhanced MKL methods only address the former and are not able to take into account the latter. [sent-590, score-0.063]
72 As with the MSRC-v1 data set, we extract six types of image features for these two data sets, as detailed in the last column of Table 1, following [2]. [sent-603, score-0.171]
73 We still implement the three versions of our method and compare them against the MKL methods used in the above experiments with the same settings. [sent-605, score-0.102]
74 For our method and MKL methods, we conduct classification for every class individually. [sent-606, score-0.093]
75 For each class, we consider it as a binary classification task using a one-vs.-rest strategy.
76 In addition, we also implement two recent multi-label classification methods: the multi-label correlated Green’s function (MCGF) method [15] and the Multi-Label Least Square (MLLS) method [6]. [sent-610, score-0.161]
77 However, these two methods are designed for data with a single type of features; therefore, we use the concatenation of all types of features as their input. [sent-611, score-0.335]
78 We implement the two multi-label classification methods using the codes published by the authors. [sent-612, score-0.245]
79 The conventional classification performance metrics in statistical learning, precision and F1 score, are used to evaluate the compared methods. [sent-613, score-0.135]
80 For every class, the precision and F1 score are computed following the standard definition for a binary classification problem. [sent-614, score-0.135]
81 To address the multilabel scenario, following [10], macro average and micro average of precision and F1 score are computed to assess the overall performance across multiple labels. [sent-615, score-0.172]
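A sketch of these multi-label metrics, assuming Y_true and Y_pred are n×c binary indicator matrices: macro averaging means the per-label scores, while micro averaging pools the counts across labels:

import numpy as np

def macro_micro_precision_f1(Y_true, Y_pred):
    """Y_true, Y_pred: (n, c) binary indicator matrices over c labels."""
    tp = (Y_true * Y_pred).sum(axis=0).astype(float)
    fp = ((1 - Y_true) * Y_pred).sum(axis=0).astype(float)
    fn = (Y_true * (1 - Y_pred)).sum(axis=0).astype(float)
    prec = tp / np.maximum(tp + fp, 1e-12)              # per-label precision
    rec = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
    micro_p = tp.sum() / max(tp.sum() + fp.sum(), 1e-12)
    micro_r = tp.sum() / max(tp.sum() + fn.sum(), 1e-12)
    micro_f1 = 2 * micro_p * micro_r / max(micro_p + micro_r, 1e-12)
    return prec.mean(), f1.mean(), micro_p, micro_f1   # macro P, macro F1, micro P, micro F1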
82 The classification results by standard 5-fold cross-validation on the TRECVID 2005 data set 6http://www-nlpir.
83 The bolded and underlined labels can only be correctly predicted by our method. [sent-629, score-0.116]
84 Besides, MKL methods using multiple types of features are generally better than the methods using single type of features. [sent-632, score-0.235]
85 Although Mk-NN, MCGF and MLLS methods are designed for multi-label data, they can only work with one type of features. [sent-633, score-0.109]
86 When using the feature concatenation as input, they treat all types of features as homogeneous with no distinction, which, however, is not true in our experiments or in most real-world applications. [sent-634, score-0.265]
87 Although our method and the MKL methods do not explicitly address the multi-label setting, we still achieve better performance by properly making use of the available information from various types of features. [sent-635, score-0.064]
88 The bolded and underlined labels can only be correctly predicted by our method, but not by the other compared methods. [sent-637, score-0.116]
89 All these observations, again, confirm the effectiveness of the proposed approach in feature integration for multi-label image classification tasks. [sent-638, score-0.185]
90 Conclusions We proposed a novel Sparse Multimodal Learning method to integrate different types of visual features for scene and object classifications. [sent-640, score-0.192]
91 Instead of learning one parameter for all features from one modality as in multiple kernel learning, our method learned the parameters for each feature on different classes via the joint structured sparsity regularizations. [sent-641, score-0.584]
92 Our new combined convex regularizations consider the importance of both feature modalities and individual features. [sent-642, score-0.397]
93 The natural property of sparse regularization automatically identifies the important visual features for different visual recognition tasks. [sent-643, score-0.11]
94 Multi-label classification performances (± std) of the compared methods on each data set and method. [sent-646, score-0.151]
95 Extensive experiments have been performed on both single-label and multi-label image categorization tasks; our approach outperforms other related methods on all benchmark data sets. [sent-1328, score-0.074]
96 Multi-layer group sparse coding for concurrent image classification and annotation. [sent-1351, score-0.176]
97 Rcv1 : A new benchmark collection for text categorization research. [sent-1392, score-0.074]
98 A new sparse multi-task regression and feature selection method to identify brain imaging predictors for memory performance. [sent-1436, score-0.087]
99 Identifying disease sensitive and quantitative trait-relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning. [sent-1447, score-0.367]
100 L 2-norm multiple kernel learning and its application to biomedical data fusion. [sent-1469, score-0.081]
wordName wordTfidf (topN-words)
[('mkl', 0.623), ('lssvm', 0.276), ('modality', 0.227), ('smml', 0.162), ('heterogeneous', 0.157), ('multimodal', 0.133), ('svm', 0.126), ('type', 0.109), ('wt', 0.105), ('nie', 0.104), ('trecvid', 0.101), ('concatenation', 0.1), ('mcgf', 0.097), ('mlls', 0.097), ('mwini', 0.097), ('classification', 0.093), ('regularizers', 0.093), ('sparsity', 0.092), ('regularizations', 0.092), ('ij', 0.08), ('categorization', 0.074), ('std', 0.069), ('micro', 0.069), ('implement', 0.068), ('modalities', 0.068), ('integrate', 0.066), ('risacher', 0.065), ('saykin', 0.065), ('types', 0.064), ('features', 0.062), ('macro', 0.061), ('broadly', 0.059), ('regularizer', 0.059), ('gist', 0.059), ('performances', 0.058), ('outdo', 0.058), ('bolded', 0.058), ('arlington', 0.058), ('underlined', 0.058), ('integration', 0.053), ('yji', 0.053), ('centrist', 0.053), ('witxj', 0.053), ('featur', 0.053), ('tasks', 0.052), ('published', 0.05), ('heng', 0.05), ('suykens', 0.05), ('rigorously', 0.05), ('structured', 0.049), ('gp', 0.048), ('sparse', 0.048), ('colorado', 0.048), ('kernel', 0.048), ('wti', 0.046), ('bt', 0.045), ('six', 0.045), ('animal', 0.045), ('popularly', 0.044), ('bioinformatics', 0.044), ('hinge', 0.044), ('huang', 0.043), ('texas', 0.043), ('precision', 0.042), ('individual', 0.039), ('feature', 0.039), ('group', 0.035), ('jmlr', 0.034), ('classes', 0.034), ('codes', 0.034), ('fi', 0.034), ('versions', 0.034), ('boosting', 0.033), ('learning', 0.033), ('ding', 0.032), ('moment', 0.032), ('degenerate', 0.032), ('loss', 0.032), ('dj', 0.031), ('enhanced', 0.03), ('weights', 0.03), ('fine', 0.03), ('objective', 0.029), ('feea', 0.029), ('zeroes', 0.029), ('tshse', 0.029), ('ernel', 0.029), ('doubts', 0.029), ('wren', 0.029), ('genetics', 0.029), ('chqding', 0.029), ('cinl', 0.029), ('correlogram', 0.029), ('feiping', 0.029), ('gestel', 0.029), ('neous', 0.029), ('nonsparse', 0.029), ('sonnenburg', 0.029), ('ures', 0.029), ('gmai', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine
Author: Hua Wang, Feiping Nie, Heng Huang, Chris Ding
Abstract: To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. How to integrate these heterogeneous visual features and identify the important ones among them for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using joint structured sparsity regularizations to learn the feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of popularly used image features. Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better performance.
2 0.14571866 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources
Author: Lin Chen, Lixin Duan, Dong Xu
Abstract: In this work, we propose to leverage a large number of loosely labeled web videos (e.g., from YouTube) and web images (e.g., from Google/Bing image search) for visual event recognition in consumer videos without requiring any labeled consumer videos. We formulate this task as a new multi-domain adaptation problem with heterogeneous sources, in which the samples from different source domains can be represented by different types of features with different dimensions (e.g., the SIFTfeaturesfrom web images and space-time (ST) features from web videos) while the target domain samples have all types of features. To effectively cope with the heterogeneous sources where some source domains are more relevant to the target domain, we propose a new method called Multi-domain Adaptation with Heterogeneous Sources (MDA-HS) to learn an optimal target classifier, in which we simultaneously seek the optimal weights for different source domains with different types of features as well as infer the labels of unlabeled target domain data based on multiple types of features. We solve our optimization problem by using the cutting-plane algorithm based on group-based multiple kernel learning. Comprehensive experiments on two datasets demonstrate the effectiveness of MDA-HS for event recognition in consumer videos.
3 0.12904067 179 cvpr-2013-From N to N+1: Multiclass Transfer Incremental Learning
Author: Ilja Kuzborskij, Francesco Orabona, Barbara Caputo
Abstract: Since the seminal work of Thrun [17], the learning to learn paradigm has been defined as the ability of an agent to improve its performance at each task with experience, with the number of tasks. Within the object categorization domain, the visual learning community has actively declined this paradigm in the transfer learning setting. Almost all proposed methods focus on category detection problems, addressing how to learn a new target class from few samples by leveraging over the known source. But if one thinks of learning over multiple tasks, there is a need for multiclass transfer learning algorithms able to exploit previous source knowledge when learning a new class, while at the same time optimizing their overall performance. This is an open challenge for existing transfer learning algorithms. The contribution of this paper is a discriminative method that addresses this issue, based on a Least-Squares Support Vector Machine formulation. Our approach is designed to balance between transferring to the new class and preserving what has already been learned on the source models. Extensive experiments on subsets of publicly available datasets prove the effectiveness of our approach.
4 0.12340665 237 cvpr-2013-Kernel Learning for Extrinsic Classification of Manifold Features
Author: Raviteja Vemulapalli, Jaishanker K. Pillai, Rama Chellappa
Abstract: In computer vision applications, features often lie on Riemannian manifolds with known geometry. Popular learning algorithms such as discriminant analysis, partial least squares, support vector machines, etc., are not directly applicable to such features due to the non-Euclidean nature of the underlying spaces. Hence, classification is often performed in an extrinsic manner by mapping the manifolds to Euclidean spaces using kernels. However, for kernel based approaches, poor choice of kernel often results in reduced performance. In this paper, we address the issue of kernel selection for the classification of features that lie on Riemannian manifolds using the kernel learning approach. We propose two criteria for jointly learning the kernel and the classifier using a single optimization problem. Specifically, for the SVM classifier, we formulate the problem of learning a good kernel-classifier combination as a convex optimization problem and solve it efficiently following the multiple kernel learning approach. Experimental results on image set-based classification and activity recognition clearly demonstrate the superiority of the proposed approach over existing methods for classification of manifold features.
5 0.089565895 238 cvpr-2013-Kernel Methods on the Riemannian Manifold of Symmetric Positive Definite Matrices
Author: Sadeep Jayasumana, Richard Hartley, Mathieu Salzmann, Hongdong Li, Mehrtash Harandi
Abstract: Symmetric Positive Definite (SPD) matrices have become popular to encode image information. Accounting for the geometry of the Riemannian manifold of SPD matrices has proven key to the success of many algorithms. However, most existing methods only approximate the true shape of the manifold locally by its tangent plane. In this paper, inspired by kernel methods, we propose to map SPD matrices to a high dimensional Hilbert space where Euclidean geometry applies. To encode the geometry of the manifold in the mapping, we introduce a family of provably positive definite kernels on the Riemannian manifold of SPD matrices. These kernels are derived from the Gaussian kernel, but exploit different metrics on the manifold. This lets us extend kernel-based algorithms developed for Euclidean spaces, such as SVM and kernel PCA, to the Riemannian manifold of SPD matrices. We demonstrate the benefits of our approach on the problems of pedestrian detection, object categorization, texture analysis, 2D motion segmentation and Diffusion Tensor Imaging (DTI) segmentation.
6 0.086977199 302 cvpr-2013-Multi-task Sparse Learning with Beta Process Prior for Action Recognition
7 0.074864335 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification
8 0.074162975 143 cvpr-2013-Efficient Large-Scale Structured Learning
9 0.072507218 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit
10 0.07249254 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
11 0.071187019 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation
12 0.069263183 130 cvpr-2013-Discriminative Color Descriptors
13 0.06849511 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition
14 0.066641837 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
15 0.064974934 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
16 0.064019881 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
17 0.063454263 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition
18 0.063429549 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
19 0.062913798 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors
20 0.060913805 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos
topicId topicWeight
[(0, 0.154), (1, -0.057), (2, -0.042), (3, 0.025), (4, 0.021), (5, 0.027), (6, -0.028), (7, -0.02), (8, -0.026), (9, -0.003), (10, -0.0), (11, -0.05), (12, -0.008), (13, -0.032), (14, -0.068), (15, -0.028), (16, -0.013), (17, -0.03), (18, -0.001), (19, -0.051), (20, -0.035), (21, -0.011), (22, 0.037), (23, -0.026), (24, -0.009), (25, 0.089), (26, -0.01), (27, 0.071), (28, -0.016), (29, -0.054), (30, -0.021), (31, 0.016), (32, 0.0), (33, 0.02), (34, 0.025), (35, 0.011), (36, -0.005), (37, 0.084), (38, 0.033), (39, 0.029), (40, -0.009), (41, -0.023), (42, -0.075), (43, -0.068), (44, 0.064), (45, 0.031), (46, 0.072), (47, 0.054), (48, -0.049), (49, 0.08)]
simIndex simValue paperId paperTitle
same-paper 1 0.90420216 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine
Author: Hua Wang, Feiping Nie, Heng Huang, Chris Ding
Abstract: To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. How to integrate these heterogeneous visual features and identify the important ones among them for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using joint structured sparsity regularizations to learn the feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of popularly used image features. Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better performance.
2 0.72334874 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit
Author: Yan Pan, Hanjiang Lai, Cong Liu, Shuicheng Yan
Abstract: Data fusion, which effectively fuses multiple prediction lists from different kinds of features to obtain an accurate model, is a crucial component in various computer vision applications. Robust late fusion (RLF) is a recently proposed method that fuses multiple output score lists from different models via pursuing a shared low-rank latent matrix. Despite showing promising performance, the repeated full Singular Value Decomposition operations in RLF’s optimization algorithm limit its scalability in real-world vision datasets, which usually have a large number of test examples. To address this issue, we provide a scalable solution for large-scale low-rank latent matrix pursuit by a divide-and-conquer method. The proposed method divides the original low-rank latent matrix learning problem into two size-reduced subproblems, which may be solved via any base algorithm, and combines the results from the subproblems to obtain the final solution. Our theoretical analysis shows that with fixed probability, the proposed divide-and-conquer method has recovery guarantees comparable to those of its base algorithm. Moreover, we develop an efficient base algorithm for the corresponding subproblems by factorizing a large matrix into the product of two size-reduced matrices. We also provide high probability recovery guarantees of the base algorithm. The proposed method is evaluated on various fusion problems in object categorization and video event detection. Under comparable accuracy, the proposed method performs more than 180 times faster than the state-of-the-art baselines on the CCV dataset with about 4,500 test examples for video event detection.
3 0.68611848 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition
Author: Peng Wang, Jingdong Wang, Gang Zeng, Weiwei Xu, Hongbin Zha, Shipeng Li
Abstract: In visual recognition tasks, the design of low-level image feature representation is fundamental. The advent of local patch features from pixel attributes, such as SIFT and LBP, has precipitated dramatic progress. Recently, a kernel view of these features, called kernel descriptors (KDES) [1], generalizes the feature design in an unsupervised fashion and yields impressive results. In this paper, we present a supervised framework to embed the image-level label information into the design of patch-level kernel descriptors, which we call supervised kernel descriptors (SKDES). Specifically, we adopt the broadly applied bag-of-words (BOW) image classification pipeline and a large margin criterion to learn the low-level patch representation, which makes the patch features much more compact and achieves better discriminative ability than KDES. With this method, we achieve competitive results over several public datasets compared with state-of-the-art methods.
4 0.68400139 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
Author: Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang
Abstract: Late fusion addresses the problem of combining the prediction scores of multiple classifiers, in which each score is predicted by a classifier trained with a specific feature. However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples. In this paper, we propose a sample-specific late fusion method to address this issue. Specifically, we cast the problem into an information propagation process which propagates the fusion weights learned on the labeled samples to individual unlabeled samples, while enforcing that positive samples have higher fusion scores than negative samples. In this process, we identify the optimal fusion weights for each sample and push positive samples to top positions in the fusion score rank list. We formulate our problem as a L∞ norm constrained optimization problem and apply the Alternating Direction Method of Multipliers for the optimization. Extensive experiment results on various visual categorization tasks show that the proposed method consistently and significantly beats the state-of-the-art late fusion methods. To the best of our knowledge, this is the first method supporting sample-specific fusion weight learning.
5 0.6783523 239 cvpr-2013-Kernel Null Space Methods for Novelty Detection
Author: Paul Bodesheim, Alexander Freytag, Erik Rodner, Michael Kemmler, Joachim Denzler
Abstract: Detecting samples from previously unknown classes is a crucial task in object recognition, especially when dealing with real-world applications where the closed-world assumption does not hold. We present how to apply a null space method for novelty detection, which maps all training samples of one class to a single point. Beside the possibility of modeling a single class, we are able to treat multiple known classes jointly and to detect novelties for a set of classes with a single model. In contrast to modeling the support of each known class individually, our approach makes use of a projection in a joint subspace where training samples of all known classes have zero intra-class variance. This subspace is called the null space of the training data. To decide about novelty of a test sample, our null space approach allows for solely relying on a distance measure instead of performing density estimation directly. Therefore, we derive a simple yet powerful method for multi-class novelty detection, an important problem not studied sufficiently so far. Our novelty detection approach is assessed in comprehensive multi-class experiments using the publicly available datasets Caltech-256 and ImageNet. The analysis reveals that our null space approach is perfectly suited for multi-class novelty detection since it outperforms all other methods.
6 0.65905565 179 cvpr-2013-From N to N+1: Multiclass Transfer Incremental Learning
8 0.63727969 452 cvpr-2013-Vantage Feature Frames for Fine-Grained Categorization
9 0.620588 83 cvpr-2013-Classification of Tumor Histology via Morphometric Context
10 0.61786592 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors
11 0.60966992 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
13 0.58805507 237 cvpr-2013-Kernel Learning for Extrinsic Classification of Manifold Features
14 0.58597618 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition
15 0.57647145 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
16 0.564991 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
17 0.56040376 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification
19 0.5556249 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
20 0.55148029 130 cvpr-2013-Discriminative Color Descriptors
topicId topicWeight
[(10, 0.084), (16, 0.03), (26, 0.03), (28, 0.01), (33, 0.237), (67, 0.073), (69, 0.043), (76, 0.324), (77, 0.017), (87, 0.073), (99, 0.011)]
simIndex simValue paperId paperTitle
1 0.73391891 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm
Author: Erik Ask, Olof Enqvist, Fredrik Kahl
Abstract: This paper is concerned with model fitting in the presence of noise and outliers. Previously it has been shown that the number of outliers can be minimized with polynomial complexity in the number of measurements. This paper improves on these results in two ways. First, it is shown that for a large class of problems, the statistically more desirable truncated L2-norm can be optimized with the same complexity. Then, with the same methodology, it is shown how to transform multi-model fitting into a purely combinatorial problem—with worst-case complexity that is polynomial in the number of measurements, though exponential in the number of models. We apply our framework to a series of hard registration and stitching problems demonstrating that the approach is not only of theoretical interest. It gives a practical method for simultaneously dealing with measurement noise and large amounts of outliers for fitting problems with low-dimensional models.
same-paper 2 0.72454649 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine
Author: Hua Wang, Feiping Nie, Heng Huang, Chris Ding
Abstract: To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. How to integrate these heterogeneous visual features and identify the important ones among them for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using joint structured sparsity regularizations to learn the feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of popularly used image features. Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better performance.
3 0.72387123 341 cvpr-2013-Procrustean Normal Distribution for Non-rigid Structure from Motion
Author: Minsik Lee, Jungchan Cho, Chong-Ho Choi, Songhwai Oh
Abstract: Non-rigid structure from motion is a fundamental problem in computer vision, which is yet to be solved satisfactorily. The main difficulty of the problem lies in choosing the right constraints for the solution. In this paper, we propose new constraints that are more effective for non-rigid shape recovery. Unlike the other proposals which have mainly focused on restricting the deformation space using rank constraints, our proposal constrains the motion parameters so that the 3D shapes are most closely aligned to each other, which makes the rank constraints unnecessary. Based on these constraints, we define a new class of probability distribution called the Procrustean normal distribution and propose a new NRSfM algorithm, EM-PND. The experimental results show that the proposed method outperforms the existing methods, and it works well even if there is no temporal dependence between the observed samples.
4 0.71620572 426 cvpr-2013-Tensor-Based Human Body Modeling
Author: Yinpeng Chen, Zicheng Liu, Zhengyou Zhang
Abstract: In this paper, we present a novel approach to model 3D human body with variations on both human shape and pose, by exploring a tensor decomposition technique. 3D human body modeling is important for 3D reconstruction and animation of realistic human body, which can be widely used in Tele-presence and video game applications. It is challenging due to a wide range of shape variations over different people and poses. The existing SCAPE model [4] is popular in computer vision for modeling 3D human body. However, it considers shape and pose deformations separately, which is not accurate since pose deformation is person-dependent. Our tensor-based model addresses this issue by jointly modeling shape and pose deformations. Experimental results demonstrate that our tensor-based model outperforms the SCAPE model quite significantly. We also apply our model to capture human body using Microsoft Kinect sensors with excellent results.
5 0.7134009 434 cvpr-2013-Topical Video Object Discovery from Key Frames by Modeling Word Co-occurrence Prior
Author: Gangqiang Zhao, Junsong Yuan, Gang Hua
Abstract: A topical video object refers to an object that is frequently highlighted in a video. It could be, e.g., the product logo and the leading actor/actress in a TV commercial. We propose a topic model that incorporates a word co-occurrence prior for efficient discovery of topical video objects from a set of key frames. Previous work using topic models, such as Latent Dirichlet Allocation (LDA), for video object discovery often takes a bag-of-visual-words representation, which ignored important co-occurrence information among the local features. We show that such data driven co-occurrence information from bottom-up can conveniently be incorporated in LDA with a Gaussian Markov prior, which combines top down probabilistic topic modeling with bottom up priors in a unified model. Our experiments on challenging videos demonstrate that the proposed approach can discover different types of topical objects despite variations in scale, view-point, color and lighting changes, or even partial occlusions. The efficacy of the co-occurrence prior is clearly demonstrated when comparing with topic models without such priors.
6 0.70440602 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos
7 0.64928639 413 cvpr-2013-Story-Driven Summarization for Egocentric Video
8 0.64218831 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
10 0.64147776 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
11 0.64144796 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
12 0.64114416 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
13 0.64087123 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
14 0.64075887 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
15 0.6404286 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
16 0.63975936 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
17 0.63950789 202 cvpr-2013-Hierarchical Saliency Detection
18 0.63932639 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
19 0.63880748 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
20 0.63867152 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image