cvpr cvpr2013 cvpr2013-377 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang
Abstract: Late fusion addresses the problem of combining the prediction scores of multiple classifiers, in which each score is predicted by a classifier trained with a specific feature. However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples. In this paper, we propose a sample-specific late fusion method to address this issue. Specifically, we cast the problem into an information propagation process which propagates the fusion weights learned on the labeled samples to individual unlabeled samples, while enforcing that positive samples have higher fusion scores than negative samples. In this process, we identify the optimal fusion weights for each sample and push positive samples to top positions in the fusion score rank list. We formulate our problem as an L∞ norm constrained optimization problem and apply the Alternating Direction Method of Multipliers for the optimization. Extensive experiment results on various visual categorization tasks show that the proposed method consistently and significantly beats the state-of-the-art late fusion methods. To the best of our knowledge, this is the first method supporting sample-specific fusion weight learning.
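The contrast drawn in the abstract between a fixed weight per classifier and a weight vector per sample can be sketched as follows. This is only an illustration in Python with NumPy: the score values, the weights, and the function names are hypothetical, and the weights are set by hand here, whereas the paper learns them through optimization.

```python
import numpy as np

def fixed_weight_fusion(S, w):
    """Fuse with one weight per classifier.

    S: (m, n) array, scores of m classifiers on n samples (one column per sample).
    w: (m,) array, the same weight vector applied to every sample.
    """
    return w @ S

def sample_specific_fusion(S, W):
    """Fuse with one weight vector per sample.

    W: (m, n) array whose i-th column is the weight vector w_i of sample i.
    Returns the vector of fusion scores w_i^T s_i.
    """
    return np.sum(W * S, axis=0)

# Illustrative scores only: 2 classifiers, 3 samples.
S = np.array([[0.9, 0.2, 0.6],
              [0.4, 0.8, 0.5]])
w = np.array([0.5, 0.5])            # hypothetical fixed weights
W = np.array([[1.0, 0.0, 0.5],      # hypothetical per-sample weights
              [0.0, 1.0, 0.5]])

fixed = fixed_weight_fusion(S, w)
specific = sample_specific_fusion(S, W)
```

With per-sample weights, each sample can lean on whichever classifier is more trustworthy for it, which a single shared weight vector cannot express.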
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Late fusion addresses the problem of combining the prediction scores of multiple classifiers, in which each score is predicted by a classifier trained with a specific feature. [sent-8, score-1.04]
2 However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples. [sent-9, score-1.556]
3 In this paper, we propose a sample-specific late fusion method to address this issue. [sent-10, score-1.055]
4 Specifically, we cast the problem into an information propagation process which propagates the fusion weights learned on the labeled samples to individual unlabeled samples, while enforcing that positive samples have higher fusion scores than negative samples. [sent-11, score-2.086]
5 In this process, we identify the optimal fusion weights for each sample and push positive samples to top positions in the fusion score rank list. [sent-12, score-1.873]
6 Extensive experiment results on various visual categorization tasks show that the proposed method consistently and significantly beats the state-of-the-art late fusion methods. [sent-14, score-1.105]
7 To the best of our knowledge, this is the first method supporting sample-specific fusion weight learning. [sent-15, score-0.703]
8 Introduction Recently, “multi-feature late fusion” has been advocated in the computer vision community, and its effectiveness has been demonstrated in various applications such as object recognition [22, 24], biometric analysis [15], and video event detection [14, 24]. [sent-17, score-0.529]
9 Given multiple classifiers trained with different low-level features, late fusion tries to combine the prediction scores of all classifiers (the prediction score of each sample generated by a classifier indicates the confidence of classifying the sample as positive). [sent-18, score-1.833]
10 Such a fusion method is expected to assign positive samples higher fusion scores than the negative ones so that the overall performance can be improved. [sent-19, score-1.641]
11 An illustration of the proposed sample-specific late fusion method. [sent-21, score-1.055]
12 Given n images and their prediction score vectors si (i = 1, . [sent-22, score-0.287]
13 , n), where the images with green and red borders are respectively labeled as positive and negative while the others are unlabeled, we want to learn a fusion weight vector wi for each sample. [sent-25, score-0.921]
14 The problem is cast into an information propagation procedure which propagates the fusion weights of the labeled images to the individual unlabeled ones along a graph built on low-level features. [sent-26, score-0.975]
15 During the propagation, we use an infinite push constraint to ensure the positive samples have higher fusion scores than the negative samples. [sent-27, score-1.183]
16 si can be used to rank the images where the positive images will appear at the top positions of the rank list. [sent-29, score-0.239]
17 individual classifier and also produces results highly comparable to multi-feature early fusion methods [21, 24]. [sent-31, score-0.741]
18 The simplest approach to late fusion is to estimate a fixed weight for each classifier and then use a weighted summation of the prediction scores as the fusion result. [sent-32, score-2.043]
19 Obviously, this assumes all the prediction scores of a classifier share the same weight and fails to consider the differences of the classifier’s prediction capability on individual samples. [sent-33, score-0.507]
20 A classifier, in fact, does have different prediction capabilities on different samples, where some samples are correctly predicted while others are not. [sent-34, score-0.255]
21 Therefore, instead of using a fixed weight for each classifier, a promising alternative is to estimate the specific fusion weights for each sample to alleviate the individual prediction errors from the imperfect classifiers and achieve robust fusion. [sent-35, score-1.074]
22 Discovering the sample specific fusion weights is a nontrivial task due to the following issues. [sent-36, score-0.806]
23 First, given the prediction scores of a test sample, since its label information is unavailable, it is unknown how to determine the sample specific fusion weights for such an unlabeled sample. [sent-37, score-1.119]
24 Second, to get a robust late fusion result, we need to maximally ensure positive samples have the highest fusion scores in the fusion result. [sent-38, score-2.647]
25 Indeed, the visual recognition task can be seen as a ranking process that aims at assigning positive samples higher scores than the negative samples. [sent-39, score-0.385]
26 In this paper, we address the above issues by proposing the Sample Specific Late Fusion (SSLF) method, which learns the optimal sample-specific fusion weights from supervision information while directly enforcing that positive samples have the highest fusion scores in the fusion result. [sent-40, score-2.377]
27 Specifically, we define the fusion process as an information propagation procedure which propagates the fusion weights learned on the individual labeled samples to the individual unlabeled ones. [sent-42, score-1.786]
28 The propagation is guided by a graph built on low-level features of all samples, which enforces that visually similar samples have similar fusion scores and offers the capability to infer fusion weights for unlabeled samples. [sent-43, score-1.758]
29 To ensure that as many positive samples as possible have the highest fusion scores, we use the L∞ norm infinite push constraint to minimize the number of positive samples scored lower than the highest-scored negative sample. [sent-44, score-1.4]
30 By this propagation process, we identify the optimal sample-specific fusion weights and push positive samples to have the highest fusion scores. [sent-45, score-1.726]
31 [15] employed the Gaussian mixture model to approximate the score distributions of the classifier, and then performed score fusion using likelihood ratio test. [sent-49, score-0.811]
32 [22] developed a supervised late fusion method which tried to minimize the classification error rates under L1 constraints on the fusion weights. [sent-51, score-1.712]
33 However, these works focus on classifier-level fusion which determines a fixed weight for all prediction scores of a specific classifier. [sent-52, score-0.952]
34 Such fusion methods blindly treat the prediction scores of a classifier as equally important and cannot optimally determine the fusion weights for each sample. [sent-53, score-1.734]
35 [14] proposed a local expert forest model for late fusion, which partitioned the score space into local regions and learned the local fusion weights in each region. [sent-55, score-1.199]
36 However, the learning can only be performed on the training samples whose label information is provided, and hence cannot be applied to learn the fusion weights on the test samples. [sent-56, score-0.873]
37 One promising work that tries to obtain sample specific fusion scores is the low rank late fusion method proposed by Ye et al [24]. [sent-58, score-1.979]
38 Specifically, they converted the prediction score vectors of multiple classifiers into various pairwise relation matrices and then extracted a shared rank-2 matrix by decomposing each original matrix into a common rank-2 matrix and a sparse residual matrix. [sent-59, score-0.371]
39 Finally, a score vector is extracted from the rank-2 matrix as the late fusion result. [sent-60, score-1.158]
40 As a result, it totally depends on the agreement of different classifiers, which may blindly bring the common prediction errors shared across different classifiers into the final fusion. [sent-62, score-0.261]
41 Instead, we focus on learning the optimal fusion weights for the individual samples by exploiting the supervision information, which accounts for the differences in the classifiers’ prediction abilities on the individual samples, and hence achieve robust fusion. [sent-63, score-1.094]
42 We are motivated by the recent infinite push ranking method in machine learning. [sent-64, score-0.257]
43 One representative work is the support vector infinite push method [1], which introduces the L∞ push loss function into the learning-to-rank problem with the purpose of maximizing the number of positive samples on the absolute top of the list [20]. [sent-65, score-0.449]
44 [19] further developed a sparse support vector infinite push method, which incorporated feature selection into the support vector infinite push method. [sent-67, score-0.398]
45 However, these methods can only learn a uniform ranking function for all the test samples, and cannot be applied to the sample specific fusion weight learning. [sent-68, score-0.843]
46 Our method is related to instance-specific metric learning [26], which aims at deriving a proper distance metric for each instance rather than optimally determining the fusion weights of each instance for ranking. [sent-70, score-0.747]
47 The proposed method works in a transductive setting in which l labeled samples $\{x_i, y_i\}_{i=1}^{l}$ and u unlabeled samples $\{x_i\}_{i=l+1}^{l+u}$ are available, where $y_i \in \{0, 1\}$ is the label of sample $x_i$. [sent-80, score-0.431]
48 Specifically, the labeled samples are responsible for providing supervision information while the unlabeled samples correspond to test samples whose prediction confidences are expected from the fusion. [sent-81, score-0.708]
49 Since our method works on the prediction scores of the classifiers, it is important to note that the labeled samples employed in our method should be disjoint from the training samples used for classifier training. [sent-82, score-0.614]
50 This is due to the fact that the ground-truth labels of the training samples have been exploited by the classifiers, making the prediction scores on these training samples biased toward the ground-truth labels. [sent-83, score-0.526]
51 Such prediction scores cannot reflect the classifier’s prediction capabilities on unseen samples, defeating the value of a fusion method. [sent-84, score-1.013]
52 Even when the validation set is not available, we can also obtain such samples by splitting from the training samples before classifier training or crawling online resources. [sent-88, score-0.386]
53 By applying the classifiers on the labeled samples and unlabeled samples, we obtain a labeled score vector set {si , yi}il=1 and unlabeled score vector set where si = [si1, . [sent-89, score-0.706]
54 denotes the prediction score vector of sample xi (i = 1, . [sent-93, score-0.3]
55 , l + u) with being the prediction score of the j-th classifier Cj . [sent-96, score-0.262]
56 , sl+u], where the positive samples are placed before the negative samples and all unlabeled samples are placed in the last columns of the matrix. [sent-101, score-0.566]
57 Problem Formulation We want to learn a sample-specific fusion function $f_i(s_i) = w_i^\top s_i$ [sent-104, score-0.657]
58 is a non-negative fusion weight vector with being the fusion weight of . [sent-112, score-1.406]
59 Obviously, we can directly derive the fusion weights of the labeled samples based on their label information. [sent-113, score-0.904]
60 However, it is non-trivial to learn fusion weights for the unlabeled samples since there is no supervision information that can be directly applied. [sent-114, score-0.997]
61 Our late fusion method is formulated as follows: $\min_W$ s.t. [sent-124, score-1.055]
62 , wl+u] consists of l + u fusion weight vectors to be derived for both labeled and unlabeled samples, and λ is a trade-off parameter between the two competing terms. [sent-133, score-0.841]
63 The first term is a regularization term designed for the purpose of fusion weight propagation: [sent-134, score-0.703]
64 sion score propagation over the graph structure, making similar samples have similar fusion scores. [sent-160, score-0.905]
65 However, this ignores the prediction scores of the [sent-168, score-0.228]
66 individual test samples and does not fully take advantage of the prediction capability of the trained classifiers. [sent-170, score-0.276]
67 The second term is an infinite push loss function [1], which tries to minimize the number of positive samples scored below the highest-scored negative. [sent-171, score-0.448]
68 In fact, the number of positive samples scored below the highest-scored negative is exactly the maximum number of positives scored below any negative, which is defined as: ? [sent-172, score-0.305]
69 ≥ 0, a− ≥ 0, (17) where a = a− and Gj denotes the indices of the positive samples in vector a that are coupled with the negative sample sj. [sent-288, score-0.323]
70 Experiments In this section, we will evaluate the proposed late fusion method by applying it to various visual recognition tasks including object categorization and video event detection. [sent-333, score-1.187]
71 This is actually the most common method for early fusion of multiple features and has been shown to achieve results highly comparable to multiple kernel learning [9]. [sent-335, score-0.657]
72 (2) Average Late Fusion (ALF), we directly average the prediction scores from all the classifiers as the fusion results. [sent-337, score-0.952]
73 (3) Low Rank Late Fusion (LRLF), in this method, the prediction scores of each classifier are first converted into a binary comparative relationship matrix and a shared rank-2 matrix is then discovered across all matrices. [sent-338, score-0.358]
74 The final fusion score vector can be extracted from the rank-2 matrix by matrix decomposition. [sent-339, score-0.786]
75 (4) Fixed Weight Late Fusion (FWLF), instead of learning sample-specific fusion functions, we learn a fixed fusion function $f(s) = w^\top s$ [sent-340, score-1.314]
76 To achieve this, we replace the fusion function $f_i(s_i) = w_i^\top s_i$ [sent-343, score-0.657]
77 Following previous work on late fusion [24], we employ the probabilistic outputs of the one-vs-all SVM classifier as the prediction scores, in which each value measures the possibility of classifying a sample as positive. [sent-350, score-1.301]
78 To generate the labeled sample set for late fusion, [sent-376, score-0.512]
79 Example images and their rank positions in the fusion score rank list obtained from different fusion methods. [sent-427, score-0.787]
80 For each method, the rank list is obtained by ranking all 4, 952 test images in descending order based on the fusion scores. [sent-428, score-0.768]
81 we uniformly divide the training samples of each category into 5 folds, and then use 4 folds as the training data for SVM training while using the remaining 1 fold as the labeled sample set for late fusion. [sent-429, score-0.763]
82 From the results, we have the following observations: (1) The proposed SSLF method consistently beats all the other baseline methods by a large margin, which demonstrates its effectiveness in determining the optimal fusion weights for each sample. [sent-434, score-0.748]
83 (2) The LRLF, FWLF and SSLF late fusion methods all outperform the ALF method. [sent-435, score-1.055]
84 This is due to the fact that the former methods take advantage of additional knowledge (either consistent score patterns across the classifiers or supervision information) while the latter only blindly averages the scores from different classifiers without accounting for their differences. [sent-436, score-0.417]
85 (3) The sample level late fusion methods including LRLF and SSLF outperform the FWLF. [sent-437, score-1.116]
86 The reason may be that FWLF only tries to learn uniform fusion weights for all the samples and hence cannot discover the optimal fusion weights for each sample. [sent-438, score-1.607]
87 This clearly demonstrates that our method is able to assign higher fusion scores to the positive samples. [sent-442, score-0.808]
88 In our experiments, we also observe that prediction scores from more reliable classifiers tend to have higher fusion weights than the scores from the less reliable classifiers. [sent-443, score-1.119]
89 Figure 3 shows the rank positions of some example images after ranking the 4, 952 test images based on fusion scores of different methods. [sent-444, score-0.868]
90 Figure 4 shows the image ranking results of different fusion methods. [sent-457, score-0.715]
91 Top 15 images ranked with the fusion scores of different methods. [sent-478, score-0.757]
92 Following the experiment setting on PASCAL VOC’07, we uniformly split the training samples into 5 folds and use 4 folds for SVM training and 1 fold for learning fusion weights. [sent-500, score-0.902]
93 Discussion We note that we can apply the classical out-of-sample extension method in transductive learning to estimate the fusion score of a new sample [5, 10, 24]. [sent-533, score-0.826]
94 Based on the neighborhood set, the late fusion score can be determined as f(z) = ? [sent-535, score-1.132]
95 si is the fusion score of xi obtained on the original dataset. [sent-541, score-0.85]
96 In this way, we obtain the fusion score for the unseen sample. [sent-542, score-0.734]
97 Conclusions We have introduced a sample-specific late fusion method to learn the optimal fusion weights for each sample. [sent-544, score-1.779]
98 The proposed method works in a transductive setting, propagating the fusion weights of the labeled samples to the individual unlabeled samples while leveraging the infinite push constraint to enforce that positive samples have higher fusion scores than negative samples. [sent-545, score-2.23]
99 For future work, we will pursue the sample-specific late fusion for multi-class and multi-label visual recognition tasks. [sent-548, score-1.055]
100 Local expert forest of score fusion for video event classification. [sent-617, score-0.84]
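The extracted sentences state that the number of positive samples scored below the highest-scored negative equals the maximum number of positives scored below any negative. A small Python/NumPy sketch can check that identity on toy data. The function names and score values are hypothetical, and this computes only the raw count, not the relaxed L∞ objective that the paper optimizes with ADMM.

```python
import numpy as np

def positives_below_top_negative(pos_scores, neg_scores):
    """Count positives whose score is below the highest-scored negative."""
    return int(np.sum(pos_scores < np.max(neg_scores)))

def max_positives_below_any_negative(pos_scores, neg_scores):
    """For each negative, count positives scored below it; return the largest count."""
    # below[i, j] is True when positive i is scored below negative j.
    below = pos_scores[:, None] < neg_scores[None, :]
    counts_per_negative = np.sum(below, axis=0)
    return int(np.max(counts_per_negative))

# Illustrative fusion scores only.
pos = np.array([0.9, 0.4, 0.7, 0.2])
neg = np.array([0.5, 0.3, 0.1])

count_top = positives_below_top_negative(pos, neg)
count_max = max_positives_below_any_negative(pos, neg)
```

Both functions return the same value because the highest-scored negative dominates every other negative, so it always has the largest number of positives beneath it.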
wordName wordTfidf (topN-words)
[('fusion', 0.657), ('late', 0.398), ('sslf', 0.237), ('fwlf', 0.218), ('lrlf', 0.146), ('prediction', 0.128), ('samples', 0.127), ('push', 0.123), ('scores', 0.1), ('admm', 0.094), ('unlabeled', 0.085), ('si', 0.082), ('alf', 0.081), ('score', 0.077), ('infinite', 0.076), ('event', 0.074), ('weights', 0.067), ('classifiers', 0.067), ('wi', 0.065), ('wedding', 0.064), ('supervision', 0.061), ('sample', 0.061), ('ranking', 0.058), ('classifier', 0.057), ('labeled', 0.053), ('rank', 0.053), ('positive', 0.051), ('ak', 0.051), ('negative', 0.049), ('mja', 0.048), ('bow', 0.047), ('wj', 0.047), ('weight', 0.046), ('trecvid', 0.046), ('ka', 0.046), ('flower', 0.045), ('blindly', 0.045), ('med', 0.044), ('propagation', 0.044), ('propagates', 0.042), ('taiwan', 0.042), ('xk', 0.041), ('ceremony', 0.04), ('aij', 0.039), ('mwin', 0.039), ('scored', 0.039), ('folds', 0.037), ('mfcc', 0.036), ('woodworking', 0.036), ('ja', 0.036), ('oxford', 0.036), ('wk', 0.036), ('kinds', 0.036), ('sj', 0.035), ('feeding', 0.034), ('xi', 0.034), ('nandakumar', 0.032), ('terrades', 0.032), ('video', 0.032), ('tries', 0.032), ('transductive', 0.031), ('validation', 0.031), ('trick', 0.031), ('voc', 0.03), ('ccv', 0.03), ('landing', 0.03), ('gij', 0.028), ('multipliers', 0.027), ('alternating', 0.027), ('individual', 0.027), ('il', 0.027), ('matrix', 0.026), ('categorization', 0.026), ('pascal', 0.025), ('animal', 0.025), ('biometric', 0.025), ('videos', 0.024), ('beats', 0.024), ('sij', 0.024), ('calculate', 0.024), ('ye', 0.024), ('fi', 0.024), ('fish', 0.023), ('optimally', 0.023), ('training', 0.022), ('ip', 0.022), ('stip', 0.022), ('xn', 0.022), ('svm', 0.022), ('hsv', 0.021), ('fold', 0.021), ('eigenmaps', 0.021), ('deviations', 0.021), ('capability', 0.021), ('iq', 0.021), ('ul', 0.021), ('shared', 0.021), ('specific', 0.021), ('columbia', 0.02), ('belkin', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
Author: Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang
Abstract: Late fusion addresses the problem of combining the prediction scores of multiple classifiers, in which each score is predicted by a classifier trained with a specific feature. However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples. In this paper, we propose a sample-specific late fusion method to address this issue. Specifically, we cast the problem into an information propagation process which propagates the fusion weights learned on the labeled samples to individual unlabeled samples, while enforcing that positive samples have higher fusion scores than negative samples. In this process, we identify the optimal fusion weights for each sample and push positive samples to top positions in the fusion score rank list. We formulate our problem as an L∞ norm constrained optimization problem and apply the Alternating Direction Method of Multipliers for the optimization. Extensive experiment results on various visual categorization tasks show that the proposed method consistently and significantly beats the state-of-the-art late fusion methods. To the best of our knowledge, this is the first method supporting sample-specific fusion weight learning.
2 0.33057517 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit
Author: Yan Pan, Hanjiang Lai, Cong Liu, Shuicheng Yan
Abstract: Data fusion, which effectively fuses multiple prediction lists from different kinds of features to obtain an accurate model, is a crucial component in various computer vision applications. Robust late fusion (RLF) is a recently proposed method that fuses multiple output score lists from different models via pursuing a shared low-rank latent matrix. Despite showing promising performance, the repeated full Singular Value Decomposition operations in RLF’s optimization algorithm limit its scalability in real world vision datasets which usually have a large number of test examples. To address this issue, we provide a scalable solution for large-scale low-rank latent matrix pursuit by a divide-and-conquer method. The proposed method divides the original low-rank latent matrix learning problem into two size-reduced subproblems, which may be solved via any base algorithm, and combines the results from the subproblems to obtain the final solution. Our theoretical analysis shows that with fixed probability, the proposed divide-and-conquer method has recovery guarantees comparable to those of its base algorithm. Moreover, we develop an efficient base algorithm for the corresponding subproblems by factorizing a large matrix into the product of two size-reduced matrices. We also provide high probability recovery guarantees of the base algorithm. The proposed method is evaluated on various fusion problems in object categorization and video event detection. Under comparable accuracy, the proposed method performs more than 180 times faster than the state-of-the-art baselines on the CCV dataset with about 4,500 test examples for video event detection.
3 0.25281852 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
Author: Chunfeng Yuan, Xi Li, Weiming Hu, Haibin Ling, Stephen Maybank
Abstract: Spatio-temporal interest points serve as an elementary building block in many modern action recognition algorithms, and most of them exploit the local spatio-temporal volume features using a Bag of Visual Words (BOVW) representation. Such representation, however, ignores potentially valuable information about the global spatio-temporal distribution of interest points. In this paper, we propose a new global feature to capture the detailed geometrical distribution of interest points. It is calculated by using the ℛ transform, which is defined as an extended 3D discrete Radon transform, followed by applying a two-directional two-dimensional principal component analysis. Such an ℛ feature captures the geometrical information of the interest points and keeps invariant to geometry transformation and robust to noise. In addition, we propose a new fusion strategy to combine the ℛ feature with the BOVW representation for further improving recognition accuracy. We utilize a context-aware fusion method to capture both the pairwise similarities and higher-order contextual interactions of the videos. Experimental results on several publicly available datasets demonstrate the effectiveness of the proposed approach for action recognition.
4 0.11029947 249 cvpr-2013-Learning Compact Binary Codes for Visual Tracking
Author: Xi Li, Chunhua Shen, Anthony Dick, Anton van_den_Hengel
Abstract: A key problem in visual tracking is to represent the appearance of an object in a way that is robust to visual changes. To attain this robustness, increasingly complex models are used to capture appearance variations. However, such models can be difficult to maintain accurately and efficiently. In this paper, we propose a visual tracker in which objects are represented by compact and discriminative binary codes. This representation can be processed very efficiently, and is capable of effectively fusing information from multiple cues. An incremental discriminative learner is then used to construct an appearance model that optimally separates the object from its surrounds. Furthermore, we design a hypergraph propagation method to capture the contextual information on samples, which further improves the tracking accuracy. Experimental results on challenging videos demonstrate the effectiveness and robustness of the proposed tracker.
5 0.10535467 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources
Author: Lin Chen, Lixin Duan, Dong Xu
Abstract: In this work, we propose to leverage a large number of loosely labeled web videos (e.g., from YouTube) and web images (e.g., from Google/Bing image search) for visual event recognition in consumer videos without requiring any labeled consumer videos. We formulate this task as a new multi-domain adaptation problem with heterogeneous sources, in which the samples from different source domains can be represented by different types of features with different dimensions (e.g., the SIFT features from web images and space-time (ST) features from web videos) while the target domain samples have all types of features. To effectively cope with the heterogeneous sources where some source domains are more relevant to the target domain, we propose a new method called Multi-domain Adaptation with Heterogeneous Sources (MDA-HS) to learn an optimal target classifier, in which we simultaneously seek the optimal weights for different source domains with different types of features as well as infer the labels of unlabeled target domain data based on multiple types of features. We solve our optimization problem by using the cutting-plane algorithm based on group-based multiple kernel learning. Comprehensive experiments on two datasets demonstrate the effectiveness of MDA-HS for event recognition in consumer videos.
6 0.10234138 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
7 0.091234699 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
8 0.090725832 219 cvpr-2013-In Defense of 3D-Label Stereo
9 0.090588391 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
10 0.083397895 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
11 0.079109192 95 cvpr-2013-Continuous Inference in Graphical Models with Polynomial Energies
12 0.078072689 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
13 0.075356305 364 cvpr-2013-Robust Object Co-detection
14 0.074722923 34 cvpr-2013-Adaptive Active Learning for Image Classification
15 0.073465712 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
16 0.072460286 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields
17 0.070958205 387 cvpr-2013-Semi-supervised Domain Adaptation with Instance Constraints
18 0.070928201 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities
19 0.069227271 390 cvpr-2013-Semi-supervised Node Splitting for Random Forest Construction
20 0.068530194 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding
topicId topicWeight
[(0, 0.163), (1, -0.061), (2, -0.03), (3, -0.007), (4, 0.029), (5, 0.024), (6, -0.055), (7, -0.038), (8, -0.036), (9, 0.025), (10, 0.006), (11, -0.061), (12, -0.005), (13, -0.045), (14, -0.095), (15, -0.051), (16, -0.024), (17, -0.059), (18, 0.011), (19, -0.078), (20, -0.118), (21, -0.03), (22, -0.023), (23, -0.038), (24, 0.056), (25, 0.039), (26, -0.014), (27, 0.083), (28, 0.028), (29, -0.046), (30, -0.008), (31, 0.007), (32, -0.011), (33, -0.068), (34, -0.047), (35, 0.051), (36, 0.06), (37, -0.078), (38, 0.063), (39, -0.096), (40, 0.048), (41, -0.198), (42, -0.03), (43, -0.005), (44, 0.108), (45, 0.067), (46, 0.038), (47, 0.181), (48, -0.126), (49, 0.159)]
simIndex simValue paperId paperTitle
same-paper 1 0.94731116 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
Author: Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang
Abstract: Late fusion addresses the problem of combining the prediction scores of multiple classifiers, in which each score is predicted by a classifier trained with a specific feature. However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples. In this paper, we propose a sample-specific late fusion method to address this issue. Specifically, we cast the problem into an information propagation process which propagates the fusion weights learned on the labeled samples to individual unlabeled samples, while enforcing that positive samples have higher fusion scores than negative samples. In this process, we identify the optimal fusion weights for each sample and push positive samples to top positions in the fusion score rank list. We formulate our problem as an L∞ norm constrained optimization problem and apply the Alternating Direction Method of Multipliers for the optimization. Extensive experiment results on various visual categorization tasks show that the proposed method consistently and significantly beats the state-of-the-art late fusion methods. To the best of our knowledge, this is the first method supporting sample-specific fusion weight learning.
2 0.84973598 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit
Author: Yan Pan, Hanjiang Lai, Cong Liu, Shuicheng Yan
Abstract: Data fusion, which effectively fuses multiple prediction lists from different kinds of features to obtain an accurate model, is a crucial component in various computer vision applications. Robust late fusion (RLF) is a recently proposed method that fuses multiple output score lists from different models by pursuing a shared low-rank latent matrix. Despite showing promising performance, the repeated full Singular Value Decomposition operations in RLF’s optimization algorithm limit its scalability on real-world vision datasets, which usually have a large number of test examples. To address this issue, we provide a scalable solution for large-scale low-rank latent matrix pursuit by a divide-and-conquer method. The proposed method divides the original low-rank latent matrix learning problem into two size-reduced subproblems, which may be solved via any base algorithm, and combines the results from the subproblems to obtain the final solution. Our theoretical analysis shows that, with fixed probability, the proposed divide-and-conquer method has recovery guarantees comparable to those of its base algorithm. Moreover, we develop an efficient base algorithm for the corresponding subproblems by factorizing a large matrix into the product of two size-reduced matrices. We also provide high-probability recovery guarantees for the base algorithm. The proposed method is evaluated on various fusion problems in object categorization and video event detection. Under comparable accuracy, the proposed method performs more than 180 times faster than the state-of-the-art baselines on the CCV dataset with about 4,500 test examples for video event detection.
3 0.57192737 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
Author: Chunfeng Yuan, Xi Li, Weiming Hu, Haibin Ling, Stephen Maybank
Abstract: Spatio-temporal interest points serve as an elementary building block in many modern action recognition algorithms, and most of them exploit the local spatio-temporal volume features using a Bag of Visual Words (BOVW) representation. Such representation, however, ignores potentially valuable information about the global spatio-temporal distribution of interest points. In this paper, we propose a new global feature to capture the detailed geometrical distribution of interest points. It is calculated by using the ℛ transform, which is defined as an extended 3D discrete Radon transform, followed by applying a two-directional two-dimensional principal component analysis. Such ℛ feature captures the geometrical information of the interest points and keeps invariant to geometry transformation and robust to noise. In addition, we propose a new fusion strategy to combine the ℛ feature with the BOVW representation for further improving recognition accuracy. We utilize a context-aware fusion method to capture both the pairwise similarities and higher-order contextual interactions of the videos. Experimental results on several publicly available datasets demonstrate the effectiveness of the proposed approach for action recognition.
4 0.55794716 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine
Author: Hua Wang, Feiping Nie, Heng Huang, Chris Ding
Abstract: To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as the shape, the color, the texture, etc. How to integrate these heterogeneous visual features and identify the important ones from them for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using joint structured sparsity regularizations to learn the feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of popularly used image features. Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better measured performance.
5 0.53765106 239 cvpr-2013-Kernel Null Space Methods for Novelty Detection
Author: Paul Bodesheim, Alexander Freytag, Erik Rodner, Michael Kemmler, Joachim Denzler
Abstract: Detecting samples from previously unknown classes is a crucial task in object recognition, especially when dealing with real-world applications where the closed-world assumption does not hold. We present how to apply a null space method for novelty detection, which maps all training samples of one class to a single point. Besides the possibility of modeling a single class, we are able to treat multiple known classes jointly and to detect novelties for a set of classes with a single model. In contrast to modeling the support of each known class individually, our approach makes use of a projection in a joint subspace where training samples of all known classes have zero intra-class variance. This subspace is called the null space of the training data. To decide about novelty of a test sample, our null space approach allows for solely relying on a distance measure instead of performing density estimation directly. Therefore, we derive a simple yet powerful method for multi-class novelty detection, an important problem not studied sufficiently so far. Our novelty detection approach is assessed in comprehensive multi-class experiments using the publicly available datasets Caltech-256 and ImageNet. The analysis reveals that our null space approach is perfectly suited for multi-class novelty detection since it outperforms all other methods.
6 0.53382498 261 cvpr-2013-Learning by Associating Ambiguously Labeled Images
7 0.52553558 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
8 0.50317788 143 cvpr-2013-Efficient Large-Scale Structured Learning
9 0.49805957 41 cvpr-2013-An Iterated L1 Algorithm for Non-smooth Non-convex Optimization in Computer Vision
10 0.4834874 134 cvpr-2013-Discriminative Sub-categorization
11 0.48040065 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
12 0.47460848 179 cvpr-2013-From N to N+1: Multiclass Transfer Incremental Learning
13 0.46854383 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition
14 0.46101749 95 cvpr-2013-Continuous Inference in Graphical Models with Polynomial Energies
15 0.44777989 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
16 0.44139439 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets
17 0.43991897 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors
18 0.41733348 34 cvpr-2013-Adaptive Active Learning for Image Classification
19 0.41201174 249 cvpr-2013-Learning Compact Binary Codes for Visual Tracking
20 0.41094589 442 cvpr-2013-Transfer Sparse Coding for Robust Image Representation
topicId topicWeight
[(10, 0.137), (16, 0.035), (19, 0.163), (26, 0.036), (28, 0.011), (33, 0.281), (65, 0.02), (67, 0.065), (69, 0.055), (77, 0.032), (87, 0.068)]
simIndex simValue paperId paperTitle
1 0.96597862 463 cvpr-2013-What's in a Name? First Names as Facial Attributes
Author: Huizhong Chen, Andrew C. Gallagher, Bernd Girod
Abstract: This paper introduces a new idea in describing people using their first names, i.e., the name assigned at birth. We show that describing people in terms of similarity to a vector of possible first names is a powerful description of facial appearance that can be used for face naming and building facial attribute classifiers. We build models for 100 common first names used in the United States and for each pair, construct a pairwise first-name classifier. These classifiers are built using training images downloaded from the internet, with no additional user interaction. This gives our approach important advantages in building practical systems that do not require additional human intervention for labeling. We use the scores from each pairwise name classifier as a set of facial attributes. We show several surprising results. Our name attributes predict the correct first names of test faces at rates far greater than chance. The name attributes are applied to gender recognition and to age classification, outperforming state-of-the-art methods with all training images automatically gathered from the internet.
2 0.95254105 356 cvpr-2013-Representing and Discovering Adversarial Team Behaviors Using Player Roles
Author: Patrick Lucey, Alina Bialkowski, Peter Carr, Stuart Morgan, Iain Matthews, Yaser Sheikh
Abstract: In this paper, we describe a method to represent and discover adversarial group behavior in a continuous domain. In comparison to other types of behavior, adversarial behavior is heavily structured as the location of a player (or agent) is dependent both on their teammates and adversaries, in addition to the tactics or strategies of the team. We present a method which can exploit this relationship through the use of a spatiotemporal basis model. As players constantly change roles during a match, we show that employing a “role-based” representation instead of one based on player “identity” can best exploit the playing structure. As vision-based systems currently do not provide perfect detection/tracking (e.g. missed or false detections), we show that our compact representation can effectively “denoise” erroneous detections as well as enabling temporal analysis, which was previously prohibitive due to the dimensionality of the signal. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras and evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector and compared it to manually labelled data.
3 0.93446487 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
Author: Yun Jiang, Hema Koppula, Ashutosh Saxena
Abstract: For scene understanding, one popular approach has been to model the object-object relationships. In this paper, we hypothesize that such relationships are only an artifact of certain hidden factors, such as humans. For example, the objects, monitor and keyboard, are strongly spatially correlated only because a human types on the keyboard while watching the monitor. Our goal is to learn this hidden human context (i.e., the human-object relationships), and also use it as a cue for labeling the scenes. We present Infinite Factored Topic Model (IFTM), where we consider a scene as being generated from two types of topics: human configurations and human-object relationships. This enables our algorithm to hallucinate the possible configurations of the humans in the scene parsimoniously. Given only a dataset of scenes containing objects but not humans, we show that our algorithm can recover the human-object relationships. We then test our algorithm on the task of attribute and object labeling in 3D scenes and show consistent improvements over the state-of-the-art.
same-paper 4 0.90227401 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
Author: Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang
Abstract: Late fusion addresses the problem of combining the prediction scores of multiple classifiers, in which each score is predicted by a classifier trained with a specific feature. However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples. In this paper, we propose a sample-specific late fusion method to address this issue. Specifically, we cast the problem into an information propagation process which propagates the fusion weights learned on the labeled samples to individual unlabeled samples, while enforcing that positive samples have higher fusion scores than negative samples. In this process, we identify the optimal fusion weights for each sample and push positive samples to top positions in the fusion score rank list. We formulate our problem as an L∞ norm constrained optimization problem and apply the Alternating Direction Method of Multipliers for the optimization. Extensive experimental results on various visual categorization tasks show that the proposed method consistently and significantly beats the state-of-the-art late fusion methods. To the best of our knowledge, this is the first method supporting sample-specific fusion weight learning.
5 0.90150249 66 cvpr-2013-Block and Group Regularized Sparse Modeling for Dictionary Learning
Author: Yu-Tseh Chi, Mohsen Ali, Ajit Rajwade, Jeffrey Ho
Abstract: This paper proposes a dictionary learning framework that combines the proposed block/group (BGSC) or reconstructed block/group (R-BGSC) sparse coding schemes with the novel Intra-block Coherence Suppression Dictionary Learning (ICS-DL) algorithm. An important and distinguishing feature of the proposed framework is that all dictionary blocks are trained simultaneously with respect to each data group, while the intra-block coherence is explicitly minimized as an important objective. We provide both empirical evidence and heuristic support for this feature, which can be considered a direct consequence of incorporating both the group structure for the input data and the block structure for the dictionary in the learning process. The optimization problems for both the dictionary learning and sparse coding can be solved efficiently using block-gradient descent, and the details of the optimization algorithms are presented. We evaluate the proposed methods using well-known datasets, and favorable comparisons with state-of-the-art dictionary learning methods demonstrate the viability and validity of the proposed framework.
7 0.88857323 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
8 0.88777173 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
9 0.88532931 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
10 0.88450676 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
11 0.8844831 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
12 0.88428009 414 cvpr-2013-Structure Preserving Object Tracking
13 0.88411093 325 cvpr-2013-Part Discovery from Partial Correspondence
14 0.88386291 364 cvpr-2013-Robust Object Co-detection
15 0.88317823 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
16 0.88284683 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
17 0.88257599 80 cvpr-2013-Category Modeling from Just a Single Labeling: Use Depth Information to Guide the Learning of 2D Models
18 0.88202566 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models
19 0.8817659 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
20 0.88166654 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
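The listings above pair sparse (topicId, topicWeight) vectors with ranked simValue scores. The page does not say how simValue is computed, so the following is only a minimal sketch under the assumption that it is a cosine similarity between sparse topic vectors. The names `sparse_cosine` and `rank_by_similarity`, and the toy corpus entries `paper_a` and `paper_b`, are hypothetical and used for illustration only.

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity between two sparse vectors given as (topic_id, weight) pairs."""
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db[t] for t, w in da.items() if t in db)
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (na * nb)

def rank_by_similarity(query, corpus):
    """Return (score, paper_id) pairs sorted from most to least similar."""
    scores = [(sparse_cosine(query, vec), pid) for pid, vec in corpus.items()]
    return sorted(scores, reverse=True)

# A few of the topic weights listed for paper 377; the corpus is a toy assumption.
query = [(10, 0.137), (19, 0.163), (33, 0.281)]
corpus = {
    "paper_a": [(10, 0.1), (33, 0.3)],  # shares two topics with the query
    "paper_b": [(16, 0.5)],             # shares no topic with the query
    "self": query,                      # the query paper itself
}
ranking = rank_by_similarity(query, corpus)
```

Under this assumption the query paper ranks itself first, which would mirror the "same-paper" rows in the lists above; in the second list the same-paper row is not first, so the real pipeline may well use a different measure.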