iccv iccv2013 iccv2013-194 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
Abstract: Automatic image categorization has become increasingly important with the development of the Internet and the growth in the size of image databases. Although image categorization can be formulated as a typical multiclass classification problem, real-world images raise two major challenges. On the one hand, although using more labeled training data may improve prediction performance, obtaining image labels is a time-consuming and biased process. On the other hand, more and more visual descriptors have been proposed to describe the objects and scenes appearing in images, and different features describe different aspects of their visual characteristics. Therefore, integrating heterogeneous visual features for semi-supervised learning is crucial for categorizing large-scale image data. In this paper, we propose a novel approach to integrate heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. Treating each type of feature as one modality and taking advantage of the large amount of unlabeled data, our new adaptive multimodal semi-supervised classification (AMMSS) algorithm simultaneously learns a commonly shared class indicator matrix and the weights for different modalities (image features).
Reference: text
sentIndex sentText sentNum sentScore
1 Although image categorization can be formulated as a typical multiclass classification problem, real-world images raise two major challenges. [sent-6, score-0.122]
2 On the one hand, although using more labeled training data may improve prediction performance, obtaining image labels is a time-consuming and biased process. [sent-7, score-0.057]
3 Therefore, integrating heterogeneous visual features for semi-supervised learning is crucial for categorizing large-scale image data. [sent-9, score-0.338]
4 In this paper, we propose a novel approach to integrate heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. [sent-10, score-0.598]
5 As a multi-class classification problem, image categorization has been widely studied in the computer vision community. [sent-14, score-0.122]
6 It is a challenging task not only because the number of labeled images in the real world is much smaller than the number of unlabeled images, but also because of the variability, ambiguity, and wide range of illumination conditions in images. [sent-27, score-0.251]
7 In the traditional supervised learning paradigm, increasing the quantity and diversity of labeled images enhances the performance of the learned classifier. [sent-28, score-0.102]
8 To address the classification problem caused by scarce or expensive labeled data, we resort to semi-supervised learning, which exploits both labeled and unlabeled images. [sent-31, score-0.379]
9 The most popular way to perform semi-supervised learning for image categorization is to use low-level image descriptors. [sent-32, score-0.096]
10 If we integrate all the descriptors via a proper learning method, we can create a descriptor that is generally more accurate and more robust than any single one. [sent-35, score-0.125]
11 In this paper, we propose a novel semi-supervised learning approach to integrate heterogeneous features from both labeled and unlabeled as well as unsegmented images. [sent-36, score-0.629]
12 We applied our AMMSS method to integrate multiple widely used image features, which describe the image content from different perspectives, and evaluated its performance on four benchmark datasets. [sent-38, score-0.117]
13 Compared with existing semi-supervised scene and object categorization methods, our approach consistently achieves superior performance in terms of both macro and micro classification accuracy. [sent-39, score-0.466]
14 Related Work. As the most widely used semi-supervised learning models, graph-based semi-supervised methods define a graph whose nodes include both labeled and unlabeled data and whose (possibly weighted) edges reflect the similarity between data points [26, 27, 28, 7]. [sent-41, score-0.421]
15 However, these methods use the affinity matrix or graph Laplacian matrix extracted from only one visual descriptor. [sent-42, score-0.217]
16 Co-training can fuse the descriptors by learning a separate classifier on each visual feature and iteratively augmenting the training examples of each classifier based on the output of the other classifier. [sent-44, score-0.141]
17 Each classifier then classifies the unlabeled data and teaches the other classifier with the few unlabeled examples on which it is most confident. [sent-45, score-0.388]
18 However, the drawback of co-training is that it requires all the classifiers over the separate feature sets to be accurate. [sent-47, score-0.057]
19 In other words, co-training is not robust to an outlier feature set whose performance is far below average, since the outlier provides erroneous information to the other classifiers and degrades the overall classification result. [sent-48, score-0.128]
20 How to properly integrate heterogeneous features is an emerging topic. [sent-49, score-0.293]
21 As a multi-kernel learning algorithm, the heterogeneous feature machine (HFM) [3] was recently proposed; it uses a logistic regression loss and group LASSO regularization to fuse multiple types of features in a supervised manner for visual classification. [sent-50, score-0.354]
22 Although many supervised and unsupervised methods have been proposed to integrate multi-modal features, there is no semi-supervised learning model that integrates heterogeneous visual image features. [sent-52, score-0.388]
23 Structured sparsity-inducing norms have also been used for feature integration in different applications [21, 1, 20, 19]. [sent-53, score-0.107]
24 In this paper, we tackle this problem with a novel graph-based semi-supervised learning model that adaptively fuses different visual features for semi-supervised image categorization. [sent-54, score-0.194]
25 Each data point $x_i$ belongs to one of $K$ classes $C = \{c_1, \ldots, c_K\}$, represented by $\mathbf{y}_i \in \{0, 1\}^K$ such that $y_i(k) = 1$ if $x_i$ is classified into the $k$-th class and $0$ otherwise. [sent-61, score-0.13]
26 Our task is to learn a function $f: X \to \{0, 1\}^K$ from $T$ that classifies each unlabeled data point $x_i$ ($l + 1 \le i \le n$) into exactly one class in $C$. [sent-64, score-0.262]
27 For simplicity, we use $u$ to denote the number of unlabeled data points. [sent-65, score-0.194]
28 That is, $l + u = n$, and we split the label matrix $Y = [\mathbf{y}_1, \mathbf{y}_2, \ldots$ [sent-66, score-0.109]
29 t X, all the image data, including labeled and unlabeled ones, are abstracted as vertices of a K-NN graph. [sent-74, score-0.301]
30 To be specific, we connect $x_i$ and $x_j$ if either one is among the other's K nearest neighbors in Euclidean distance, and define the corresponding edge weight $w_{ij}$ as follows. [sent-75, score-0.079]
31 The normalized graph Laplacian matrix $L$ is defined as $L = I - D^{-1/2} W D^{-1/2}$ (2), where $D$ is the diagonal degree matrix with $d_{ii} = \sum_j w_{ij}$. [sent-86, score-0.1]
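A minimal NumPy sketch of the K-NN graph and the normalized Laplacian of Eq. (2). The exact edge-weight formula is not recoverable from this excerpt, so the sketch assumes a Gaussian heat kernel $\exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ with a hypothetical bandwidth heuristic; `k = 7` mirrors the 7-nearest-neighbor setting reported in the experiments. The function names are illustrative, not the authors' code.

```python
import numpy as np

def knn_affinity(X, k=7, sigma=None):
    """Symmetric k-NN affinity matrix with heat-kernel edge weights (assumed kernel)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 to absorb round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    if sigma is None:
        # Hypothetical bandwidth heuristic: RMS distance to the k nearest neighbors.
        sigma = np.sqrt(np.mean(np.take_along_axis(d2, nn, axis=1)))
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nn] = True
    mask |= mask.T  # connect x_i, x_j if either is among the other's k-NN
    return np.where(mask, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} with d_ii = sum_j w_ij (assumes no isolated vertex)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
```

For example, `normalized_laplacian(knn_affinity(X, k=7))` gives one Laplacian per feature modality when called on each modality's feature matrix `X`.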
32 Label Propagation for Single Modality. According to graph theory, if the edge weight between two vertices in the affinity matrix is large, the class labels of these two instances should be similar. [sent-88, score-0.308]
33 Based on this assumption, let $G \in \mathbb{R}^{n \times K}$ denote the class label matrix; for each feature modality, we propagate the class label information from labeled data to unlabeled data as follows: $\min_{G} \operatorname{tr}(G^{T} L G)$ s.t. [sent-89, score-0.55]
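For a single modality, and assuming the truncated constraint fixes the labeled rows ($G_l = Y_l$, with the labeled points ordered first), setting the gradient with respect to the unlabeled block to zero gives the harmonic solution $G_u = -L_{uu}^{-1} L_{ul} Y_l$. The NumPy sketch below computes it and then applies the argmax decision rule $y_i = \arg\max_j G_{ij}$ used later in the section; the function names are hypothetical.

```python
import numpy as np

def propagate_single_modality(L, Y_l):
    """Minimize tr(G^T L G) subject to G_l = Y_l (the first l rows are labeled).

    Zeroing the gradient w.r.t. G_u gives L_uu G_u + L_ul Y_l = 0,
    so G_u = -L_uu^{-1} L_ul Y_l.
    """
    l = Y_l.shape[0]
    L_uu = L[l:, l:]
    L_ul = L[l:, :l]
    G_u = -np.linalg.solve(L_uu, L_ul @ Y_l)
    return np.vstack([Y_l, G_u])

def assign_labels(G, l):
    """Decision rule y_i = argmax_j G_ij for the unlabeled rows i = l+1, ..., n."""
    return np.argmax(G[l:], axis=1)
```

On a connected graph with at least one labeled vertex, $L_{uu}$ is positive definite, so the linear solve is well posed.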
34 By adding a weight factor for each feature modality, we adaptively learn the weight of each modality, assigning higher weights to more discriminative modalities. [sent-106, score-0.711]
35 $\sum_{v=1}^{V} \alpha^{(v)} = 1$, where $V$ is the number of image visual features, $\alpha^{(v)}$ is the non-negative normalized weight factor for the $v$-th modality, and $L^{(v)}$ and $G^{(v)}$ are the normalized Laplacian matrix and class label matrix for the $v$-th feature modality, respectively. [sent-116, score-0.772]
36 $G$ is the shared consensus class label matrix that we are interested in. [sent-117, score-0.212]
37 We use the scalar $r$ to control the distribution of weights across feature modalities, and $\lambda$ is the regularization parameter that balances the first and second terms. [sent-118, score-0.265]
38 (6) as the following three subproblems and solve them alternately and iteratively. [sent-125, score-0.089]
39 The predicted labels for the unlabeled images $\mathbf{y}_i, \forall i = l+1, l+2, \ldots$ [sent-153, score-0.259]
40 Initialize the weight for each modality, $\alpha^{(v)} = 1/V$, $\forall v = 1, 2, \ldots, V$. [sent-164, score-0.079]
41 Calculate the normalized Laplacian matrix for each feature modality, $L_t^{(v)} = I - (D_t^{(v)})^{-1/2} W_t^{(v)} (D_t^{(v)})^{-1/2}$. Procedure: repeat 1. [sent-174, score-0.102]
42 Calculate the class indicator matrix for each modality, $G_t^{(v)} = \lambda(\tilde{L}_t^{(v)} + \lambda I)^{-1} G_t$. 3. [sent-176, score-0.527]
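The per-modality update in this step is a regularized linear solve. A minimal NumPy sketch, taking $\tilde{L}_t^{(v)}$ as a given input (its definition is not shown in this excerpt and is assumed symmetric positive semidefinite), uses `np.linalg.solve` instead of forming the inverse explicitly; the function name is hypothetical.

```python
import numpy as np

def update_modality_indicator(L_tilde, G, lam):
    """G^(v) = lam * (L_tilde^(v) + lam * I)^{-1} G, for lam > 0.

    Solving (L_tilde + lam*I) X = lam*G avoids an explicit matrix inverse.
    """
    n = L_tilde.shape[0]
    return lam * np.linalg.solve(L_tilde + lam * np.eye(n), G)
```

A larger $\lambda$ keeps $G^{(v)}$ closer to the consensus $G$; as $\lambda \to \infty$ the update returns $G$ unchanged.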
43 Update $t = t + 1$ until convergence. Assign a single class label to each unlabeled image by Eq. [sent-190, score-0.315]
44 To obtain the optimal solution of the above subproblem, we set the derivative of Eq. [sent-200, score-0.062]
45 To compute the class label matrix for the unlabeled images explicitly in terms of matrix operations, we split the matrix $H$ into four blocks at the $l$-th row and $l$-th column: $H = \begin{bmatrix} H_{ll} & H_{lu} \\ H_{ul} & H_{uu} \end{bmatrix}$. [sent-245, score-0.483]
46 (18) to zero with respect to $G_u$, we get $G_u = -H_{uu}^{-1} H_{ul} Y_l$ (19). With the above three steps, we alternately update $\alpha^{(v)}$, $G^{(v)}$, and $G$, and repeat them until the objective function converges. [sent-255, score-0.063]
47 Finally, we use the following decision function to assign a single class label to each unlabeled image: $y_i = \arg\max_j G_{ij}, \forall i = l+1, l+2, \ldots$ [sent-256, score-0.315]
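The block solution in Eq. (19) can be checked by expanding the trace. The derivation below is a sketch that assumes the subproblem has the form $\min_G \operatorname{tr}(G^T H G)$ with $G_l = Y_l$ held fixed and $H$ symmetric; the exact form of Eq. (18) is not shown in this excerpt.

```latex
% Assumption: G = [Y_l; G_u], H symmetric, so H_{lu} = H_{ul}^{T}.
\operatorname{tr}(G^{T} H G)
  = \operatorname{tr}(Y_l^{T} H_{ll} Y_l)
  + 2\,\operatorname{tr}(G_u^{T} H_{ul} Y_l)
  + \operatorname{tr}(G_u^{T} H_{uu} G_u),
\qquad
\frac{\partial}{\partial G_u}\operatorname{tr}(G^{T} H G)
  = 2 H_{ul} Y_l + 2 H_{uu} G_u = 0
\;\Longrightarrow\;
G_u = -H_{uu}^{-1} H_{ul} Y_l .
```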
48 (1) into three subproblems, each of which is convex. [sent-270, score-0.058]
49 Since the original problem is not jointly convex, by solving the subproblems alternately, Alg. [sent-271, score-0.058]
50 1 converges to a local solution, and we use 1/V as the initial weight for each modality. [sent-272, score-0.079]
51 Discussion of the Parameter r. In AMMSS, we use a single parameter $r$ to control the distribution of weight factors across feature modalities. [sent-276, score-0.172]
52 (11), we can see that as $r \to \infty$, the weight factors become equal. [sent-278, score-0.111]
53 As $r \to 1$, the modality with the smallest $p^{(v)}$ receives weight 1 and all other modalities receive weight 0. [sent-279, score-0.518]
54 With this strategy, on the one hand, we avoid the trivial weight distribution across modalities, that is, the solution obtained when $r \to 1$. [sent-280, score-0.079]
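The two limits described above are what one obtains from a weight of the form $\alpha^{(v)} \propto (p^{(v)})^{1/(1-r)}$, normalized so that the weights sum to 1. Eq. (11) is not reproduced in this excerpt, so the sketch below assumes that form (with $r > 1$ and positive $p^{(v)}$) purely to illustrate the behavior; it works in log space to stay numerically stable near $r = 1$. The function name is hypothetical.

```python
import numpy as np

def modality_weights(p, r):
    """alpha^(v) proportional to (p^(v))^(1/(1-r)), normalized to sum to 1.

    Assumed closed form for illustration; requires r > 1 and p^(v) > 0.
    """
    if r <= 1:
        raise ValueError("r must be greater than 1")
    p = np.asarray(p, dtype=float)
    logw = np.log(p) / (1.0 - r)  # log of (p^(v))^(1/(1-r))
    logw -= logw.max()            # stabilize before exponentiating
    w = np.exp(logw)
    return w / w.sum()
```

With `r = 2`, for instance, the weights are inversely proportional to $p^{(v)}$; very large `r` gives nearly uniform weights, and `r` just above 1 puts almost all weight on the modality with the smallest $p^{(v)}$.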
55 The image classification performance is evaluated in terms of average macro and micro classification accuracy. [sent-284, score-0.486]
56 The 20 classes include ... by the six weighted feature modalities, where the weight of each feature modality is learned from the training images. [sent-291, score-0.638]
57 MSRC-v1 Images. We follow Lee and Grauman's approach [13] to refine the dataset, obtaining 7 classes (tree, building, airplane, cow, face, car, and bicycle), each with 30 images. [sent-293, score-0.068]
58 Since no image descriptors have been published for the Caltech-101 and MSRC-v1 datasets, we extract the following six popular visual features for each image. On the one hand, we extract three holistic visual features for each image, i.e., [sent-295, score-0.132]
59 45-dimensional color moment (CMT) [24]; 512-dimensional GIST [17]; 1302-dimensional CENTRIST [23]. [sent-297, score-0.422]
60 256-dimensional local binary pattern (LBP) [16]; 576-dimensional HOG; and the well-known 128-dimensional DoG-SIFT descriptor [15]. [sent-300, score-0.333]
61 Handwritten numerals (HW). The handwritten numerals dataset consists of 2000 data points from ten digit classes (0 to 9). [sent-301, score-0.398]
62 We use the six published visual features [10] extracted from each image. [sent-303, score-0.072]
63 Animals with Attributes (AwA). The Animals with Attributes dataset is the largest dataset; it is an image dataset with 6 feature types and 50 classes. [sent-305, score-0.057]
64 We randomly sample 50 images from each class, obtaining 2500 images in total. [sent-306, score-0.1]
65 We use all the published features, that is, 2688-dimensional Color Histogram (CQ), 2000-dimensional Local Self-Similarity (LSS), 252-dimensional Pyramid HOG (PHOG), 2000-dimensional SIFT, 2000-dimensional colorSIFT (RGSIFT), and 2000-dimensional SURF features. [sent-307, score-0.669]
66 (1) with 7 nearest neighbors to obtain the affinity matrices for the different visual features. [sent-311, score-0.138]
67 r is the parameter to control the distribution of the weights for different feature modalities, which we will discuss in detail later. [sent-318, score-0.129]
68 Classification Results Comparison. First, to test the feature integration power of our method, we compare the classification performance using all feature modalities with that using only one feature modality. [sent-325, score-0.428]
69 We also compare our method with several state-of-the-art graph-based semi-supervised learning methods: (a) the harmonic function (HF) approach [28], (b) learning with local and global consistency (LGC) [26], and (c) the random walk approach (RW) [27]. [sent-328, score-0.134]
70 For each of these three methods, we use as input either kernel addition (KA), that is, the simple equally weighted average of the Laplacian matrices, or the graph Laplacian of the concatenated features of all modalities (FC). [sent-329, score-0.296]
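The two baseline inputs differ only in where the modalities are combined: KA averages per-modality Laplacians, while FC builds one graph on the stacked features. The sketch below illustrates both under the same assumptions as a generic K-NN construction (Gaussian heat-kernel weights and a hypothetical bandwidth heuristic, neither taken from the paper); all function names are illustrative.

```python
import numpy as np

def knn_laplacian(X, k=7, sigma=None):
    """Normalized Laplacian I - D^{-1/2} W D^{-1/2} of a symmetric k-NN graph.

    Edge weights use a Gaussian heat kernel (an assumption for this sketch).
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-loops
    nn = np.argsort(d2, axis=1)[:, :k]
    if sigma is None:
        # Hypothetical heuristic: RMS distance to the k nearest neighbors.
        sigma = np.sqrt(np.mean(np.take_along_axis(d2, nn, axis=1)))
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nn] = True
    mask |= mask.T  # symmetrize the neighbor relation
    W = np.where(mask, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)
    s = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(n) - s[:, None] * W * s[None, :]

def kernel_addition(features, k=7):
    """KA: equally weighted average of the per-modality Laplacians."""
    return sum(knn_laplacian(X, k) for X in features) / len(features)

def feature_concatenation(features, k=7):
    """FC: a single Laplacian built from the column-wise concatenation of all modalities."""
    return knn_laplacian(np.hstack(features), k)
```

Either Laplacian can then be fed to a single-graph method such as HF. In FC, modalities with more dimensions or larger scale dominate the Euclidean distances unless each block is normalized first; this sketch does no such normalization.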
71 Since Multiple Kernel Learning (MKL) approaches [18] can also perform feature integration if each feature modality is treated as one kernel, we report their classification results as well. [sent-332, score-0.638]
72 Moreover, since our method learns the weight for each feature modality adaptively, we also report the results of our model with equal weights (MMSS). [sent-333, score-0.618]
73 For performance evaluation, we use two widely used metrics: average macro classification accuracy and average micro classification accuracy for each class. [sent-335, score-0.486]
74 Average macro classification accuracy is shown in Table 4, and micro accuracy for all datasets is shown in Fig. [sent-336, score-0.415]
75 Our method consistently achieves better results than the other state-of-the-art methods in terms of average macro classification accuracy, and learning different weights for different features further boosts the performance of multi-modal semi-supervised learning. [sent-338, score-0.794]
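The section does not define the two metrics precisely. One plausible reading, assumed here and not confirmed by the excerpt, is that "macro" accuracy is the overall fraction of correctly labeled unlabeled images and "micro" accuracy is reported per class, each averaged over repeated random splits. A minimal NumPy sketch of that reading, with hypothetical function names:

```python
import numpy as np

def overall_and_per_class_accuracy(y_true, y_pred, n_classes):
    """Overall accuracy plus per-class accuracy (hypothetical reading of 'macro' and 'micro')."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    overall = float(np.mean(y_true == y_pred))
    per_class = np.full(n_classes, np.nan)  # NaN for classes absent from y_true
    for c in range(n_classes):
        in_class = y_true == c
        if in_class.any():
            per_class[c] = np.mean(y_pred[in_class] == c)
    return overall, per_class

def average_over_runs(results):
    """Average (overall, per_class) pairs over repeated random labeled/unlabeled splits."""
    overall = np.mean([r[0] for r in results])
    per_class = np.nanmean(np.stack([r[1] for r in results]), axis=0)
    return float(overall), per_class
```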
76 As for average micro classification accuracy, the results of AMMSS are the best for most classes. [sent-339, score-0.206]
77 The confusion matrices of MSRCV1, Caltech101-7 and Handwritten numerals are shown in Fig. [sent-340, score-0.244]
78 Moreover, since our method can learn the weight for each feature modality after convergence, we add the generalization ability of the objective function Eq. [sent-342, score-0.539]
79 Instead of treating all feature modalities equally, our method weights each feature modality and performs classification simultaneously. [sent-348, score-0.991]
80 Figure: confusion matrices on (a) MSRC-v1, (b) Caltech101-7, (c) Handwritten numerals. [sent-349, score-0.185]
81 The average macro classification accuracy compared with single view on Caltech101-7, Caltech101-20 and MSRCV1 datasets. [sent-384, score-0.28]
82 The average macro classification accuracy compared with single view on Handwritten numerals dataset. [sent-389, score-0.479]
83 Finally, we test the convergence speed of our AMMSS algorithm, which is shown in Fig. [sent-397, score-0.057]
84 Convergence of AMMSS on five datasets: (a) Caltech101-7, (b) Caltech101-20, (c) MSRC-v1, (d) Handwritten numerals, (e) AwA. Table 3. [sent-412, score-0.292]
85 The average macro classification accuracy compared with single view on the Animals with Attributes dataset. [sent-413, score-0.367]
86 The average macro classification accuracy compared with baseline methods on all datasets. [sent-422, score-0.28]
87 Conclusion. In this paper, we proposed a novel adaptive multi-modal semi-supervised learning method (AMMSS) that jointly learns the weight of each feature modality and the common class labels of the unlabeled data. [sent-425, score-0.846]
88 By decomposing the original problem into three convex subproblems, our algorithm can solve the proposed model. [sent-426, score-0.057]
89 Figure 4. The learned weight factors for different modalities on five datasets. [sent-427, score-0.251]
90 For the Caltech101-7, Caltech101-20, and MSRC-v1 datasets, feature indices 1 to 6 on the x-axis stand for CMT, LBP, GIST, HOG, CENTRIST, and DoG-SIFT, respectively. [sent-428, score-0.096]
91 For the Handwritten numerals dataset, indices 1 to 6 on the x-axis stand for FOU, FAC, KAR, PIX, ZER, and MOR, respectively. [sent-429, score-0.238]
92 We use a single parameter $r$ to control the weights of different feature modalities, avoiding the burden of tuning many parameters. [sent-432, score-0.129]
93 Our method has been evaluated on five benchmark datasets and achieves the best performance compared with five state-of-the-art methods in terms of macro and micro classification accuracy. [sent-433, score-0.487]
94 In future work, we will extend the proposed method to additional text or attribute feature modalities and evaluate it on recently popular large image datasets with associated tags or attributes. [sent-434, score-0.492]
95 With the help of more general text or attribute representations rather than visual features alone, we hope to learn a more powerful shared label matrix and improve state-of-the-art performance on the corresponding benchmark datasets. [sent-435, score-0.171]
96 A learning framework using function and kernel regularization with application to recommender system. [sent-497, score-0.086]
97 The pyramid match kernel: Discriminative classification with sets of image features. [sent-518, score-0.071]
98 Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. [sent-549, score-0.071]
99 Identifying disease sensitive and quantitative trait-relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning. [sent-585, score-0.224]
100 Learning from labeled and unlabeled data using random walks. [sent-623, score-0.251]
wordName wordTfidf (topN-words)
[('ammss', 0.454), ('modality', 0.403), ('macro', 0.209), ('numerals', 0.199), ('unlabeled', 0.194), ('handwritten', 0.185), ('heterogeneous', 0.183), ('gt', 0.148), ('modalities', 0.136), ('nie', 0.135), ('micro', 0.135), ('tr', 0.111), ('awa', 0.11), ('cai', 0.103), ('dimension', 0.092), ('integrate', 0.08), ('tl', 0.08), ('weight', 0.079), ('laplacian', 0.076), ('centrist', 0.076), ('yl', 0.074), ('classification', 0.071), ('rp', 0.068), ('class', 0.068), ('yi', 0.065), ('affinity', 0.061), ('subproblems', 0.058), ('feature', 0.057), ('labeled', 0.057), ('convergency', 0.057), ('garfield', 0.057), ('gut', 0.057), ('gyul', 0.057), ('lss', 0.057), ('motorbikes', 0.057), ('propogation', 0.057), ('rgsift', 0.057), ('snoopy', 0.057), ('windsorchair', 0.057), ('yyul', 0.057), ('matrix', 0.056), ('pages', 0.055), ('animal', 0.055), ('label', 0.053), ('sch', 0.051), ('categorization', 0.051), ('abstracted', 0.05), ('lgc', 0.05), ('kar', 0.05), ('phog', 0.05), ('zer', 0.05), ('integration', 0.05), ('mor', 0.047), ('rtr', 0.047), ('fou', 0.047), ('calculate', 0.047), ('gu', 0.046), ('learning', 0.045), ('matrices', 0.045), ('international', 0.044), ('graph', 0.044), ('tv', 0.043), ('cq', 0.042), ('cmt', 0.042), ('arlington', 0.042), ('six', 0.042), ('kernel', 0.041), ('multimodal', 0.041), ('heng', 0.04), ('unsegmented', 0.04), ('dii', 0.04), ('fuse', 0.039), ('incremental', 0.039), ('stands', 0.039), ('fac', 0.037), ('popularly', 0.037), ('adaptively', 0.036), ('pix', 0.036), ('weights', 0.036), ('control', 0.036), ('gist', 0.036), ('five', 0.036), ('consensus', 0.035), ('sydney', 0.035), ('substitute', 0.034), ('hw', 0.034), ('get', 0.032), ('attribute', 0.032), ('fixing', 0.032), ('moment', 0.032), ('texas', 0.032), ('gi', 0.032), ('hf', 0.031), ('appendix', 0.031), ('conference', 0.031), ('alternatively', 0.031), ('features', 0.03), ('derivative', 0.03), ('proof', 0.03), ('holistic', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
Abstract: Automatic image categorization has become increasingly important with the development of the Internet and the growth in the size of image databases. Although image categorization can be formulated as a typical multiclass classification problem, real-world images raise two major challenges. On the one hand, although using more labeled training data may improve prediction performance, obtaining image labels is a time-consuming and biased process. On the other hand, more and more visual descriptors have been proposed to describe the objects and scenes appearing in images, and different features describe different aspects of their visual characteristics. Therefore, integrating heterogeneous visual features for semi-supervised learning is crucial for categorizing large-scale image data. In this paper, we propose a novel approach to integrate heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. Treating each type of feature as one modality and taking advantage of the large amount of unlabeled data, our new adaptive multimodal semi-supervised classification (AMMSS) algorithm simultaneously learns a commonly shared class indicator matrix and the weights for different modalities (image features).
2 0.17290446 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations
Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
Abstract: In multi-label image annotations, because each image is associated to multiple categories, the semantic terms (label classes) are not mutually exclusive. Previous research showed that such label correlations can largely boost the annotation accuracy. However, all existing methods only directly apply the label correlation matrix to enhance the label inference and assignment without further learning the structural information among classes. In this paper, we model the label correlations using the relational graph, and propose a novel graph structured sparse learning model to incorporate the topological constraints of relation graph in multi-label classifications. As a result, our new method will capture and utilize the hidden class structures in relational graph to improve the annotation results. In proposed objective, a large number of structured sparsity-inducing norms are utilized, thus the optimization becomes difficult. To solve this problem, we derive an efficient optimization algorithm with proved convergence. We perform extensive experiments on six multi-label image annotation benchmark data sets. In all empirical results, our new method shows better annotation results than the state-of-the-art approaches.
3 0.15457143 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
Author: Suyog Dutt Jain, Kristen Grauman
Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and ease of use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more specific, yet they may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image's visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
4 0.13329983 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching
Author: Kaiye Wang, Ran He, Wei Wang, Liang Wang, Tieniu Tan
Abstract: Cross-modal matching has recently drawn much attention due to the widespread existence of multimodal data. It aims to match data from different modalities, and generally involves two basic problems: the measure of relevance and coupled feature selection. Most previous works mainly focus on solving the first problem. In this paper, we propose a novel coupled linear regression framework to deal with both problems. Our method learns two projection matrices to map multimodal data into a common feature space, in which cross-modal data matching can be performed. And in the learning procedure, the ℓ21-norm penalties are imposed on the two projection matrices separately, which leads to select relevant and discriminative features from coupled feature spaces simultaneously. A trace norm is further imposed on the projected data as a low-rank constraint, which enhances the relevance of different modal data with connections. We also present an iterative algorithm based on half-quadratic minimization to solve the proposed regularized linear regression problem. The experimental results on two challenging cross-modal datasets demonstrate that the proposed method outperforms the state-of-the-art approaches.
5 0.12944886 6 iccv-2013-A Convex Optimization Framework for Active Learning
Author: Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sasrty
Abstract: In many image/video/web classification problems, we have access to a large number of unlabeled samples. However, it is typically expensive and time consuming to obtain labels for the samples. Active learning is the problem of progressively selecting and annotating the most informative unlabeled samples, in order to obtain a high classification performance. Most existing active learning algorithms select only one sample at a time prior to retraining the classifier. Hence, they are computationally expensive and cannot take advantage of parallel labeling systems such as Mechanical Turk. On the other hand, algorithms that allow the selection of multiple samples prior to retraining the classifier, may select samples that have significant information overlap or they involve solving a non-convex optimization. More importantly, the majority of active learning algorithms are developed for a certain classifier type such as SVM. In this paper, we develop an efficient active learning framework based on convex programming, which can select multiple samples at a time for annotation. Unlike the state of the art, our algorithm can be used in conjunction with any type of classifiers, including those of the family of the recently proposed Sparse Representation-based Classification (SRC). We use the two principles of classifier uncertainty and sample diversity in order to guide the optimization program towards selecting the most informative unlabeled samples, which have the least information overlap. Our method can incorporate the data distribution in the selection process by using the appropriate dissimilarity between pairs of samples. We show the effectiveness of our framework in person detection, scene categorization and face recognition on real-world datasets.
6 0.12347181 142 iccv-2013-Ensemble Projection for Semi-supervised Image Classification
7 0.11782884 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
8 0.090057358 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces
9 0.087778099 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
10 0.080975905 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation
11 0.078282483 238 iccv-2013-Learning Graphs to Match
12 0.075425357 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
13 0.070712656 271 iccv-2013-Modeling the Calibration Pipeline of the Lytro Camera for High Quality Light-Field Image Reconstruction
14 0.069483295 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
15 0.068454131 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
16 0.066776581 31 iccv-2013-A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects
17 0.066664115 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent
18 0.066096343 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition
19 0.064431757 81 iccv-2013-Combining the Right Features for Complex Event Recognition
20 0.063876383 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
topicId topicWeight
[(0, 0.175), (1, 0.076), (2, -0.037), (3, -0.063), (4, -0.018), (5, 0.03), (6, -0.029), (7, 0.005), (8, 0.054), (9, -0.047), (10, -0.014), (11, -0.042), (12, -0.025), (13, -0.017), (14, 0.055), (15, -0.01), (16, -0.037), (17, -0.025), (18, -0.045), (19, -0.003), (20, 0.001), (21, 0.023), (22, -0.077), (23, 0.015), (24, 0.043), (25, 0.015), (26, 0.073), (27, 0.079), (28, 0.101), (29, 0.041), (30, 0.031), (31, 0.075), (32, 0.026), (33, -0.024), (34, -0.068), (35, 0.064), (36, 0.009), (37, -0.02), (38, -0.023), (39, 0.032), (40, 0.085), (41, -0.087), (42, -0.069), (43, 0.007), (44, 0.069), (45, -0.092), (46, 0.059), (47, -0.001), (48, 0.001), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.93071806 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
Abstract: Automatic image categorization has become increasingly important with the development of the Internet and the growth in the size of image databases. Although image categorization can be formulated as a typical multiclass classification problem, real-world images raise two major challenges. On the one hand, although using more labeled training data may improve prediction performance, obtaining image labels is a time-consuming and biased process. On the other hand, more and more visual descriptors have been proposed to describe the objects and scenes appearing in images, and different features describe different aspects of their visual characteristics. Therefore, integrating heterogeneous visual features for semi-supervised learning is crucial for categorizing large-scale image data. In this paper, we propose a novel approach to integrate heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. Treating each type of feature as one modality and taking advantage of the large amount of unlabeled data, our new adaptive multimodal semi-supervised classification (AMMSS) algorithm simultaneously learns a commonly shared class indicator matrix and the weights for different modalities (image features).
2 0.86572248 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
Author: Bo Wang, Zhuowen Tu, John K. Tsotsos
Abstract: In graph-based semi-supervised learning approaches, the classification rate is highly dependent on the size of the available labeled data, as well as on the accuracy of the similarity measures. Here, we propose a semi-supervised multi-class/multi-label classification scheme, dynamic label propagation (DLP), which performs transductive learning through propagation in a dynamic process. Existing semi-supervised classification methods often have difficulty dealing with multi-class/multi-label problems because they do not consider label correlation; our algorithm instead emphasizes dynamic metric fusion with label information. Significant improvement over the state-of-the-art methods is observed on benchmark datasets for both multi-class and multi-label tasks.
3 0.8520211 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations
Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
Abstract: In multi-label image annotation, because each image is associated with multiple categories, the semantic terms (label classes) are not mutually exclusive. Previous research showed that such label correlations can largely boost annotation accuracy. However, all existing methods only directly apply the label correlation matrix to enhance label inference and assignment without further learning the structural information among classes. In this paper, we model the label correlations using a relational graph, and propose a novel graph structured sparse learning model to incorporate the topological constraints of the relational graph in multi-label classification. As a result, our new method captures and utilizes the hidden class structures in the relational graph to improve annotation results. In the proposed objective, a large number of structured sparsity-inducing norms are utilized, which makes the optimization difficult. To solve this problem, we derive an efficient optimization algorithm with proven convergence. We perform extensive experiments on six multi-label image annotation benchmark data sets. In all empirical results, our new method shows better annotation results than the state-of-the-art approaches.
4 0.78663605 142 iccv-2013-Ensemble Projection for Semi-supervised Image Classification
Author: Dengxin Dai, Luc Van_Gool
Abstract: This paper investigates the problem of semi-supervised classification. Unlike previous methods that regularize classification boundaries with unlabeled data, our method learns a new image representation from all available data (labeled and unlabeled) and performs plain supervised learning with the new feature. In particular, an ensemble of image prototype sets is sampled automatically from the available data to represent a rich set of visual categories/attributes. Discriminative functions are then learned on these prototype sets, and images are represented by the concatenation of their projected values onto the prototypes (similarities to them) for further classification. Experiments on four standard datasets show three interesting phenomena: (1) our method consistently outperforms previous methods for semi-supervised image classification; (2) our method combines well with these methods; and (3) our method works well for self-taught image classification, where unlabeled data do not come from the same distribution as labeled ones but rather from a random collection of images.
5 0.72997558 332 iccv-2013-Quadruplet-Wise Image Similarity Learning
Author: Marc T. Law, Nicolas Thome, Matthieu Cord
Abstract: This paper introduces a novel similarity learning framework. Working with inequality constraints involving quadruplets of images, our approach aims at efficiently modeling similarity from rich or complex semantic label relationships. From these quadruplet-wise constraints, we propose a similarity learning framework relying on a convex optimization scheme. We then study how our metric learning scheme can exploit specific class relationships, such as class ranking (relative attributes), and class taxonomy. We show that classification using the learned metrics gets improved performance over state-of-the-art methods on several datasets. We also evaluate our approach in a new application to learn similarities between webpage screenshots in a fully unsupervised way.
6 0.71484995 227 iccv-2013-Large-Scale Image Annotation by Efficient and Robust Kernel Metric Learning
7 0.70240808 222 iccv-2013-Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers
8 0.69257611 431 iccv-2013-Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias
9 0.68925381 6 iccv-2013-A Convex Optimization Framework for Active Learning
10 0.68249911 177 iccv-2013-From Point to Set: Extend the Learning of Distance Metrics
11 0.67959124 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent
12 0.6600495 125 iccv-2013-Drosophila Embryo Stage Annotation Using Label Propagation
14 0.64008224 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds
15 0.63890988 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching
16 0.63090843 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
17 0.60346365 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
19 0.59580839 191 iccv-2013-Handling Uncertain Tags in Visual Recognition
20 0.59383553 248 iccv-2013-Learning to Rank Using Privileged Information
topicId topicWeight
[(2, 0.079), (7, 0.025), (26, 0.078), (31, 0.053), (35, 0.013), (42, 0.103), (61, 0.204), (64, 0.071), (73, 0.028), (77, 0.016), (78, 0.059), (89, 0.141), (98, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.79789358 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
2 0.78597558 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.
3 0.72747201 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations
4 0.72407985 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
5 0.71996891 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding
Author: Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai
Abstract: We present a multi-attributed dictionary learning algorithm for sparse coding. Considering training samples with multiple attributes, a new distance matrix is proposed by jointly incorporating data and attribute similarities. Then, an objective function is presented to learn category-dependent dictionaries that are compact (closeness of dictionary atoms based on data distance and attribute similarity), reconstructive (low reconstruction error with correct dictionary) and label-consistent (encouraging the labels of dictionary atoms to be similar). We have demonstrated our algorithm on action classification and face recognition tasks on several publicly available datasets. Experimental results with improved performance over previous dictionary learning methods are shown to validate the effectiveness of the proposed algorithm.
6 0.71966606 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition
7 0.71935385 180 iccv-2013-From Where and How to What We See
8 0.71854717 131 iccv-2013-EVSAC: Accelerating Hypotheses Generation by Modeling Matching Scores with Extreme Value Theory
9 0.71766162 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
10 0.71659136 150 iccv-2013-Exemplar Cut
11 0.71114278 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
12 0.71107376 252 iccv-2013-Line Assisted Light Field Triangulation and Stereo Matching
13 0.71095145 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
14 0.70961839 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
15 0.70949328 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
16 0.70846343 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
17 0.7078523 277 iccv-2013-Multi-channel Correlation Filters
18 0.70766556 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
19 0.70525497 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
20 0.70493841 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning