cvpr cvpr2013 cvpr2013-7 knowledge-graph by maker-knowledge-mining

7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit


Source: pdf

Author: Yan Pan, Hanjiang Lai, Cong Liu, Shuicheng Yan

Abstract: Data fusion, which effectively fuses multiple prediction lists from different kinds of features to obtain an accurate model, is a crucial component in various computer vision applications. Robust late fusion (RLF) is a recent proposed method that fuses multiple output score lists from different models via pursuing a shared low-rank latent matrix. Despite showing promising performance, the repeated full Singular Value Decomposition operations in RLF’s optimization algorithm limits its scalability in real world vision datasets which usually have large number of test examples. To address this issue, we provide a scalable solution for large-scale low-rank latent matrix pursuit by a divide-andconquer method. The proposed method divides the original low-rank latent matrix learning problem into two sizereduced subproblems, which may be solved via any base algorithm, and combines the results from the subproblems to obtain the final solution. Our theoretical analysis shows that withfixedprobability, theproposed divide-and-conquer method has recovery guarantees comparable to those of its base algorithm. Moreover, we develop an efficient base algorithm for the corresponding subproblems by factorizing a large matrix into the product of two size-reduced matrices. We also provide high probability recovery guarantees of the base algorithm. The proposed method is evaluated on various fusion problems in object categorization and video event detection. Under comparable accuracy, the proposed method performs more than 180 times faster than the stateof-the-art baselines on the CCV dataset with about 4,500 test examples for video event detection.

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Data fusion, which effectively fuses multiple prediction lists from different kinds of features to obtain an accurate model, is a crucial component in various computer vision applications. [sent-4, score-0.191]

2 Robust late fusion (RLF) is a recent proposed method that fuses multiple output score lists from different models via pursuing a shared low-rank latent matrix. [sent-5, score-0.926]

3 Despite showing promising performance, the repeated full Singular Value Decomposition operations in RLF’s optimization algorithm limits its scalability in real world vision datasets which usually have large number of test examples. [sent-6, score-0.265]

4 To address this issue, we provide a scalable solution for large-scale low-rank latent matrix pursuit by a divide-andconquer method. [sent-7, score-0.255]

5 The proposed method divides the original low-rank latent matrix learning problem into two sizereduced subproblems, which may be solved via any base algorithm, and combines the results from the subproblems to obtain the final solution. [sent-8, score-0.54]

6 Our theoretical analysis shows that withfixedprobability, theproposed divide-and-conquer method has recovery guarantees comparable to those of its base algorithm. [sent-9, score-0.42]

7 Moreover, we develop an efficient base algorithm for the corresponding subproblems by factorizing a large matrix into the product of two size-reduced matrices. [sent-10, score-0.428]

8 We also provide high probability recovery guarantees of the base algorithm. [sent-11, score-0.32]

9 The proposed method is evaluated on various fusion problems in object categorization and video event detection. [sent-12, score-0.313]

10 Under comparable accuracy, the proposed method performs more than 180 times faster than the stateof-the-art baselines on the CCV dataset with about 4,500 test examples for video event detection. [sent-13, score-0.206]

11 Introduction Effectively integrating multiple cues/features to obtain an accurate model, which is known as data fusion, is a popular approach to solve various tasks in computer vision and multimedia analysis, such as object classification and detection [4, 17, 19], video event detection [7, 8]. [sent-15, score-0.098]

12 Various methods that fuse multiple kinds of features have been proposed in the literature (see, e. [sent-16, score-0.099]

13 The second stream is Late Fusion, which firstly learns models from different kinds of features, and then combines the intermediate output scores of the learned models to yield a final model. [sent-24, score-0.196]

14 In this paper, we focus on the problem ofRobust Late Fusion (RLF) proposed in [21], in which the authors formulated late fusion as an optimization problem of Low-rank Latent Matrix Pursuit (LLMP). [sent-25, score-0.557]

15 As shown in Figure 1(a), RLF fuses multiple models via seeking a shared low-rank comparison matrix. [sent-26, score-0.107]

16 The formulation of LLMP is general and can also be applied to (with simple extensions) other important problems, such as rank aggregation [5] for information retrieval and collaborative filtering. [sent-27, score-0.092]

17 The main drawback of RLF-ALM is that it needs repeated operations of full Singular Value Decomposition (SVD), whose time complexity is cubic w. [sent-29, score-0.131]

18 This limits RLF-ALM’s scalability on large-scale datasets with thousands of test examples. [sent-33, score-0.107]

19 Our method has two building blocks: (1) Inspired by the recent advance in divide-and-conquer matrix factorization, we develop a divide-and-conquer ap- proach for LLMP. [sent-35, score-0.089]

20 As illustrated in Figure 1(b), our method divides the original LLMP problem into reduced-size problems, solves the subproblems sub- via any base algorithm × × for LLMP, and then combines the results from the subproblems to obtain the final solution. [sent-36, score-0.555]

21 Our theoretical analysis shows that with fixed probability, the proposed divide-andconquer algorithm has recovery those of the base algorithm. [sent-37, score-0.307]

22 scalability, guarantees comparable (2) To further improve to the we propose an efficient base algorithm for the 0Corresponding author: Cong Liu 555222224 Given 푛 score lists 푠푖 (푖 = 1푡표 푛) from different models, RLF constructs 푛 matrices 푇(푖) (푖 = 1, 2, . [sent-38, score-0.475]

23 ,푛), where each 푇(푖) encodes the pairwise relations of scores of every two test images in score list 푠푖. [sent-41, score-0.222]

24 , 푛) are used as input of LLMP to learn a shared, low-rank and skew-symmetric matrix 푇. [sent-45, score-0.089]

25 Each 푇(푖) can be reconstructed by 푇 combining with an additive sparse matrix 퐸(푖) . [sent-46, score-0.089]

26 At last, RLF can recover a final and more accurate score list from 푇. [sent-47, score-0.156]

27 Then in step 1-2, we recover 푇푆 and 푇퐴 by solving two size-reduced subproblems like the original LLMP problem, respectively. [sent-50, score-0.202]

28 In step 4, we calculate 푇퐶 by simple matrix algebra. [sent-52, score-0.117]

29 Finally, we simply combine the four parts to obtain the final matrix 푇. [sent-53, score-0.089]

30 In each iteration, the base al- gorithm factorizes a large comparison matrix 푇 ∈ ℝ푚1 ×푚2 ignotroi hthme product osf a tw laorg esi czeo-mrepdaurciseodn m maatrtircixes 푇 ( ∈i. [sent-55, score-0.259]

31 We further provide recovery guarantees of the proposed base algorithm. [sent-58, score-0.32]

32 The results on the Columbia Consumer Video (CCV) dataset with 4,658 test examples show that, under comparable accuracy, the propose method performs more than 180 times faster than the state-of-the-art RLF-ALM algorithm. [sent-60, score-0.108]

33 The strategies of feature combination can be divided into early fusion and late fusion. [sent-63, score-0.565]

34 A representative method of early fusion is MKL [1], which simultaneously learns kernel matrices and their associated combination weights. [sent-64, score-0.371]

35 [14] used finite Gaussian mixture to model the distribution of the intermediate output scores, and then combined the scores via likelihood ratio test. [sent-69, score-0.106]

36 [18] used a non-Bayesian probabilistic framework to fuse multiple output scores by minimizing the misclassification rates under the ℓ1 constraint. [sent-71, score-0.113]

37 However, the optimization algorithm in [21] needs repeated full SVD operations whose time complexity is cubic w. [sent-73, score-0.162]

38 the number of text examples, which limits its scalability on large-scale real world applications. [sent-76, score-0.117]

39 Our work is motivated by the recent progress in efficient algorithms for large-scale low-rank matrix learning. [sent-77, score-0.089]

40 The main idea is to divide a large-scale matrix learning problem into several size-reduced subproblems, solve the subproblems in parallel, and then combine the results from the subproblems. [sent-80, score-0.258]

41 With fixed probability, their divide-and-conquer algorithm has estimation error bound guarantees compared to its base algorithm that directly solves the original large-scale problem. [sent-81, score-0.236]

42 In contrast to RPCA with a single low-rank and sparse decomposition constraint, our work focus on a divide-and-conquer 555222335 approach for the LLMP problem that seeks to learn a shared × skew-symmetric comparison matrix with multiple low-rank and sparse decomposition constraints. [sent-82, score-0.213]

43 Problem Formulation The robust late fusion problem [21] is to fuse 푛 confidence score lists {푠(푖) }푖푛=1 . [sent-84, score-0.698]

44 For each 푠(푖), we convert it into a 푚 푚 comparison (푠1(푖), 푠2(푖), matrix 푇(푖), 푠푚(푖))푇 in which each 푇푗(,푖푘) 푇푗(,푖푘) = is defined as: 푠푖푔푛(푠(푗푖) − 푠푘(푖)). [sent-92, score-0.089]

45 (1) Each comparison matrix 푇(푖) is an isotonic representation that encodes the relative relationship among the test examples. [sent-93, score-0.148]

46 Some entries in 푇(푖) correctly reflect the relative relationship among the test examples, while other entries may be incorrect due to the wrong predictions in the score list 푠(푖) . [sent-94, score-0.241]

47 푇(푖) can be viewed as a combination of two parts: a shared low-rank part that reflects the correct relationship among the test examples, and a sparse part 퐸(푖) that encodes the irregular corruptions made by the incorrect entries. [sent-95, score-0.151]

48 Ideally, assume there exists a score list 푠 that correctly explains the relations of the test examples, then 푇 can be viewed as a matrix which encodes the relations in 푠, i. [sent-97, score-0.271]

49 However, if there is large variation in the scores, the matrix 푇 might have an unknown rank higher than 2. [sent-101, score-0.145]

50 This motivates us to formulate RLF as an optimization problem of pursuing a shared, low-rank and skew-symmetric latent matrix: ∑푛 푇,{퐸m(i푖n)}푖푛=1푟푎푛푘(푇) + 휆푖∑=1∣∣퐸(푖)∣∣0 푠. [sent-102, score-0.155]

51 Gereive ∣∣n푇 ∣th∣e learned matrix 푇, we can easily obtain the score list 푠 by 푠 = 푚1푇푒푇. [sent-115, score-0.212]

52 More importantly, LLMP is general and can also be applied to other family of problems, such as rank aggregation [5] for information retrieval and collaborative filtering. [sent-117, score-0.092]

53 A Divide-and-Conquer Solution The existing algorithms for LLMP, such as RLFALM [21], need repeated SVD operations on the whole comparison matrix, which have cubic time complexity w. [sent-120, score-0.131]

54 Inspired by the recent advance in divide-and-conquer algorithms [13] for RPCA problems [3], we propose a scalable divide-and-conquer method for the LLMP optimization problem. [sent-125, score-0.091]

55 The main idea is to divide the optimization problem (3) into smallsize subproblems, each of which can be easily solved by a base algorithm or by simple algebra, and then combine the results to get a low-rank and skew-symmetric comparison matrix. [sent-126, score-0.201]

56 푘 }W, ea sample the submatrices 푇푆, and and then each matrix is partitioned into four parts: 푇푆(푖) 퐸푆(푖), [푇푇푆퐵 푇 퐶퐴 ],(4) T(i)=[푇푇푆퐵((푖푖)) 푇푇퐴퐶((푖푖))], E(i)=[퐸퐸(푆퐵(푖푖)) 퐸퐸((퐴퐶푖푖))]. [sent-143, score-0.124]

57 (2) Recovering the seed matrix We recover 푇the seed matrix 푇푆 by solving the following subproblem: 푇푆,{m퐸푆i(푖n)}푖푛=1∣∣푇푆∣∣∗+ 휆푖∑=푛1∣∣퐸푆(푖)∣∣1 푠. [sent-145, score-0.285]

58 Obviously this is a size-reduced problem of (3) which can be solved by any base algorithm for LLMP (i. [sent-151, score-0.17]

59 Randomly sample a set of 푘 indices to partition each (푇푆(푖), 푇퐴(푖), 푇퐵(푖), 푇퐶(푖)); 퐸푆(푖) by (5); matrix of 푇(푖) into 4 parts 2. [sent-161, score-0.089]

60 , kernel matrices learning) [20], and it has been generalized for arbitrary matrices [6]. [sent-187, score-0.175]

61 It is worth noting that, although our framework is based on Generalized Nystrom Method, the involved subproblems (i. [sent-188, score-0.169]

62 update the+ multiplier 푌푡+(푖1) 푌푡(푖) + 휇푡(푇(푖) 8. [sent-208, score-0.09]

63 , the algorithm in [21]), a challenging issue arising in solving (9) is that, when applied to subproblems with large 푚1 and 푚2, the repeated full SVD operations are still computationally expensive. [sent-214, score-0.3]

64 To address this issue, we propose to factorize the large comparison matrix ∈ ℝ푚1 ×푚2 into tfahec product eo lfa trwgoe csiozmep-raerdiusocend m maatrtirxice 푇s ∈(i. [sent-215, score-0.089]

65 This kind of fixed ra×nk 푟 prior i푟s ×us 푚ed i,n 푟 r ≪ece 푚nt푖 푛w(o푚rks [12, 11] to speed up low-rank matrix learning problems such as RPCA [3] and LRR [10]. [sent-218, score-0.089]

66 There are two advantages from the factorization: (1) the resulting optimization problem only has 푂((푚1 +푚2)푟) vari- 푇 × ables, and (2) only size-reduced SVD operations are needed in each iteration, which is much cheaper than the full SVD operations. [sent-219, score-0.118]

67 Theoretical Analysis A natural question arising here is whether there is any theoretical guarantee on the recovery errors of the proposed divide-and-conquer method compared to the errors of the base algorithm. [sent-241, score-0.411]

68 Our analysis is based on the standard matrix coherence assumption. [sent-243, score-0.089]

69 certain conditions, the solution to (11) guarantees low recovery errors with high probability. [sent-255, score-0.181]

70 The second theorem shows that, with fixed probability, the recovery errors of the proposed divide-and-conquer method is bounded by the errors of the solution to (11). [sent-263, score-0.186]

71 Let 푇0 be the ground truth of the rank푟0 latent matrix, and 푇 be the output of Algorithm 1. [sent-265, score-0.1]

72 푇 is partitioned as in (4), and correspondingly 푇0 is partitioned into (푇0,푆 , 푇0,퐴 , 푇0,퐵 , 푇0,퐶). [sent-266, score-0.101]

73 Experimental Evaluation In this section, we evaluate the accuracy and running time of the proposed divide-and-conquer method on various real world and simulated datasets. [sent-271, score-0.147]

74 We compare the performance of three algorithms for the LLMP problem, including the proposed base algorithm LLMP-Base, the proposed divide-and-conquer algorithm DC-LLMP, and the state-of- RLF-ALM1 the-art algorithm [21]. [sent-272, score-0.17]

75 Note that the original RLF-ALM algorithm in [21] enforces the learned low-rank matrix to be rank-2 as a convergence condition. [sent-276, score-0.089]

76 As explained in Section 3, in practice, the matrix 푇 might have an unknown rank higher than 2 due to large variation in scores. [sent-277, score-0.145]

77 Experiments on real world datasets We evaluate the proposed method on object categorization and video event detection tasks. [sent-283, score-0.136]

78 We use one-vs-all SVM to generate the output score lists from different kinds of features. [sent-285, score-0.23]

79 In all the experiments, the running time results of DC-LLMP is the average of 5 trials. [sent-294, score-0.074]

80 MAP and running time comparison results on Oxford Flower 17 dataset. [sent-298, score-0.074]

81 These distance matrices are computed from seven kinds of features: HOG, color, shape, texture, clustered HSV values, SIFT on the foreground internal region (SIFTint) and on the foreground boundary (SIFTbdy) (see [15] for details). [sent-330, score-0.119]

82 The running time and MAP are first 2Although we can simply set 휆 = 1/√푚2 by Theorem 1, we found that choosing 휆 by cross validation leads to better performance. [sent-332, score-0.105]

83 And all the three robust late fusion methods have superior performance on MAP over the other three early and late fusion baselines. [sent-335, score-1.091]

84 Results on video event detection We evaluate the proposed method on video event detection using the Columbia Consumer Video dataset [7] which contains 9,3 17 video over 20 classes of semantic events. [sent-347, score-0.231]

85 (2) Under comparable MAP, DCLLMP performs more than 30 times faster than its base algorithm LLMP-Base, and it performs more than 180 times faster than the state-of-the-art RLF-ALM. [sent-359, score-0.283]

86 We can conclude that DC-LLMP has superior running time performance in real world vision tasks which usually have thousands of test examples, and it is a practical and scalable method for largescale low-rank latent matrix pursuit. [sent-360, score-0.354]

87 To answer this question, We conducted experiments on the CCV dataset to observe the effects of different 푟 and 푘. [sent-391, score-0.082]

88 The results in Figure 2(b) indicate that when 푟 ≥ 10, DC-LLMP and its base algoirintdhimca tLeL tMhaPt- wBhaesne h푟a v≥e comparable MMPA aPn dva iltuse bsa stoe athlgosoeof RLF-ALM. [sent-393, score-0.217]

89 This justifies that DCLLMP has comparable performance with the algorithms directly working on full matrices under relative small values of 푘, which is a key factor that leads to the speed-ups of DC-LLMP. [sent-401, score-0.105]

90 Simulation results To further investigate the characteristics of the proposed algorithms, we conduct simulated experiments to observe the effects on the running time and MAP with (1) different input data size (i. [sent-404, score-0.22]

91 In our simulation, we firstly generate 푛 score lists 푓1 , 푓2, . [sent-407, score-0.134]

92 Then we add random noise (sampled uniformly from [−100, 100]) to 푠% entries ofeach score lliestd. [sent-411, score-0.102]

93 After that, we use each score list to construct the corresponding input matrix 푇(푖) . [sent-413, score-0.212]

94 Effects of different input data size The first simulation is to explore the effects on the running time w. [sent-415, score-0.211]

95 In practice, we usually assume matrix 푇 have small rank 푟0, and set 푘 to be 푟0 + 푝 with a small constant 푝. [sent-419, score-0.145]

96 Effects of different proportion of corruptions Figure 4(c) show the comparison results on MAP with different proportion of corrupted entries in each score list, assuming other parameters being fixed. [sent-430, score-0.175]

97 Conclusions In this paper, we developed DC-LLMP, a scalable divideand-conquer method for large low-rank latent matrix pursuit. [sent-434, score-0.214]

98 DC-LLMP divides the original low-rank latent matrix learning problem into size-reduced subproblems which can be solved cheaply by a base algorithm, and obtains the fi555232880 )imeTc(s1250 20DLRCMF−LPA nBLMu4maP0sebrofsa6m0ples(m)801 )(cesmiT32105 0 RLD5CFM−PLA nBMuaPmsebr1o0fmdels(n1)520AMP0 . [sent-435, score-0.54]

99 The theoretical analysis showed that the recovery errors of DC-LLMP are bounded by the recovery errors of the base algorithm with fixed probability. [sent-451, score-0.453]

100 Furthermore, we developed an efficient base algorithm to solve the corresponding subproblems. [sent-452, score-0.17]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('llmp', 0.618), ('late', 0.311), ('fusion', 0.215), ('rlf', 0.215), ('base', 0.17), ('subproblems', 0.169), ('svd', 0.137), ('ccv', 0.11), ('simplemkl', 0.107), ('nystrom', 0.104), ('matrix', 0.089), ('recovery', 0.084), ('rpca', 0.083), ('effects', 0.082), ('llmpbase', 0.081), ('rlfalm', 0.081), ('lists', 0.077), ('running', 0.074), ('list', 0.066), ('guarantees', 0.066), ('latent', 0.065), ('event', 0.063), ('mkl', 0.063), ('kinds', 0.061), ('scalable', 0.06), ('pursuing', 0.059), ('kernel', 0.059), ('matrices', 0.058), ('multiplier', 0.058), ('score', 0.057), ('rank', 0.056), ('operations', 0.055), ('simulation', 0.055), ('dcllmp', 0.054), ('lpa', 0.054), ('methodmaptime', 0.054), ('shared', 0.054), ('theoretical', 0.053), ('fuses', 0.053), ('scalability', 0.049), ('mackey', 0.048), ('nandakumar', 0.048), ('pla', 0.048), ('terrades', 0.048), ('divides', 0.047), ('comparable', 0.047), ('bow', 0.046), ('entries', 0.045), ('columbia', 0.045), ('cong', 0.044), ('map', 0.043), ('arising', 0.042), ('cubic', 0.042), ('province', 0.042), ('pursuit', 0.041), ('scores', 0.04), ('singular', 0.04), ('flower', 0.04), ('theorem', 0.04), ('early', 0.039), ('fuse', 0.038), ('world', 0.038), ('alm', 0.038), ('corruptions', 0.038), ('seed', 0.037), ('lagrange', 0.037), ('aggregation', 0.036), ('consumer', 0.036), ('decomposition', 0.035), ('video', 0.035), ('factorization', 0.035), ('partitioned', 0.035), ('simulated', 0.035), ('corrupted', 0.035), ('output', 0.035), ('guangdong', 0.035), ('candes', 0.035), ('repeated', 0.034), ('algebra', 0.034), ('faster', 0.033), ('liu', 0.033), ('recover', 0.033), ('cheaper', 0.032), ('update', 0.032), ('china', 0.032), ('cross', 0.031), ('optimization', 0.031), ('encodes', 0.031), ('intermediate', 0.031), ('errors', 0.031), ('obviously', 0.031), ('correspondingly', 0.031), ('subproblem', 0.031), ('limits', 0.03), ('arxiv', 0.029), ('conduct', 0.029), ('stream', 0.029), ('calculate', 0.028), ('singapore', 0.028), ('test', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit

Author: Yan Pan, Hanjiang Lai, Cong Liu, Shuicheng Yan

Abstract: Data fusion, which effectively fuses multiple prediction lists from different kinds of features to obtain an accurate model, is a crucial component in various computer vision applications. Robust late fusion (RLF) is a recent proposed method that fuses multiple output score lists from different models via pursuing a shared low-rank latent matrix. Despite showing promising performance, the repeated full Singular Value Decomposition operations in RLF’s optimization algorithm limits its scalability in real world vision datasets which usually have large number of test examples. To address this issue, we provide a scalable solution for large-scale low-rank latent matrix pursuit by a divide-andconquer method. The proposed method divides the original low-rank latent matrix learning problem into two sizereduced subproblems, which may be solved via any base algorithm, and combines the results from the subproblems to obtain the final solution. Our theoretical analysis shows that withfixedprobability, theproposed divide-and-conquer method has recovery guarantees comparable to those of its base algorithm. Moreover, we develop an efficient base algorithm for the corresponding subproblems by factorizing a large matrix into the product of two size-reduced matrices. We also provide high probability recovery guarantees of the base algorithm. The proposed method is evaluated on various fusion problems in object categorization and video event detection. Under comparable accuracy, the proposed method performs more than 180 times faster than the stateof-the-art baselines on the CCV dataset with about 4,500 test examples for video event detection.

2 0.33057517 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition

Author: Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang

Abstract: Late fusion addresses the problem of combining the prediction scores of multiple classifiers, in which each score is predicted by a classifier trained with a specific feature. However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples. In this paper, we propose a sample-specific late fusion method to address this issue. Specifically, we cast the problem into an information propagation process which propagates the fusion weights learned on the labeled samples to individual unlabeled samples, while enforcing that positive samples have higher fusion scores than negative samples. In this process, we identify the optimal fusion weights for each sample and push positive samples to top positions in the fusion score rank list. We formulate our problem as a L∞ norm constrained optimization problem and apply the Alternating Direction Method of Multipliers for the optimization. Extensive experiment results on various visual categorization tasks show that the proposed method consis- tently and significantly beats the state-of-the-art late fusion methods. To the best knowledge, this is the first method supporting sample-specific fusion weight learning.

3 0.10447717 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition

Author: Chunfeng Yuan, Xi Li, Weiming Hu, Haibin Ling, Stephen Maybank

Abstract: Spatio-temporal interest points serve as an elementary building block in many modern action recognition algorithms, and most of them exploit the local spatio-temporal volume features using a Bag of Visual Words (BOVW) representation. Such representation, however, ignorespotentially valuable information about the global spatio-temporal distribution of interest points. In this paper, we propose a new global feature to capture the detailed geometrical distribution of interest points. It is calculated by using the ℛ transform which is defined as an extended 3D discrete Rℛa tdroann transform, followed by applying a tewdo 3-dDir decitsicorneatel two-dimensional principal component analysis. Such ℛ feature captures the geometrical information of the Sinuctehre ℛst points and keeps invariant to geometry transformation and robust to noise. In addition, we propose a new fusion strategy to combine the ℛ feature with the BOVW representation for further improving recognition accuracy. Wpree suetnilitzaea context-aware fusion method to capture both the pairwise similarities and higher-order contextual interactions of the videos. Experimental results on several publicly available datasets demonstrate the effectiveness of the proposed approach for action recognition.

4 0.08391346 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources

Author: Lin Chen, Lixin Duan, Dong Xu

Abstract: In this work, we propose to leverage a large number of loosely labeled web videos (e.g., from YouTube) and web images (e.g., from Google/Bing image search) for visual event recognition in consumer videos without requiring any labeled consumer videos. We formulate this task as a new multi-domain adaptation problem with heterogeneous sources, in which the samples from different source domains can be represented by different types of features with different dimensions (e.g., the SIFTfeaturesfrom web images and space-time (ST) features from web videos) while the target domain samples have all types of features. To effectively cope with the heterogeneous sources where some source domains are more relevant to the target domain, we propose a new method called Multi-domain Adaptation with Heterogeneous Sources (MDA-HS) to learn an optimal target classifier, in which we simultaneously seek the optimal weights for different source domains with different types of features as well as infer the labels of unlabeled target domain data based on multiple types of features. We solve our optimization problem by using the cutting-plane algorithm based on group-based multiple kernel learning. Comprehensive experiments on two datasets demonstrate the effectiveness of MDA-HS for event recognition in consumer videos.

5 0.077938125 364 cvpr-2013-Robust Object Co-detection

Author: Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang

Abstract: Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. The related image set may contain a mixture of annotated objects and candidate objects generated by automatic detectors. Co-detection differs from the conventional object detection paradigm in which detection over each test image is determined one-by-one independently without taking advantage of common patterns in the data pool. In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. The idea is analogous to that of the well-known Robust PCA [28], but has not been explored in object co-detection so far. The representation is based on a linear reconstruction over the entire data set and the low-rank approach enables effective removal of noisy and outlier samples. The extracted low-rank representation can be used to detect the target objects by spectral clustering. Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object codetection method and the generic object detection methods without co-detection formulations.

6 0.072507218 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine

7 0.070962206 95 cvpr-2013-Continuous Inference in Graphical Models with Polynomial Energies

8 0.068669558 453 cvpr-2013-Video Editing with Temporal, Spatial and Appearance Consistency

9 0.063180722 237 cvpr-2013-Kernel Learning for Extrinsic Classification of Manifold Features

10 0.061951276 164 cvpr-2013-Fast Convolutional Sparse Coding

11 0.060888629 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

12 0.05763717 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation

13 0.056612696 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities

14 0.052814886 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition

15 0.052521884 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

16 0.051960148 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

17 0.051143587 257 cvpr-2013-Learning Structured Low-Rank Representations for Image Classification

18 0.050467432 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

19 0.050242409 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding

20 0.050047267 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.145), (1, -0.021), (2, -0.031), (3, 0.021), (4, 0.023), (5, 0.022), (6, -0.03), (7, -0.037), (8, -0.039), (9, 0.003), (10, 0.001), (11, -0.027), (12, -0.02), (13, -0.051), (14, -0.054), (15, -0.022), (16, -0.009), (17, -0.025), (18, 0.01), (19, -0.059), (20, -0.068), (21, 0.001), (22, 0.022), (23, -0.045), (24, 0.076), (25, 0.048), (26, -0.03), (27, 0.054), (28, 0.023), (29, -0.036), (30, 0.033), (31, -0.007), (32, 0.002), (33, -0.11), (34, -0.042), (35, 0.058), (36, 0.072), (37, -0.013), (38, 0.058), (39, -0.054), (40, 0.075), (41, -0.142), (42, -0.05), (43, -0.059), (44, 0.099), (45, 0.055), (46, 0.055), (47, 0.175), (48, -0.133), (49, 0.133)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90499365 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit

Author: Yan Pan, Hanjiang Lai, Cong Liu, Shuicheng Yan

Abstract: Data fusion, which effectively fuses multiple prediction lists from different kinds of features to obtain an accurate model, is a crucial component in various computer vision applications. Robust late fusion (RLF) is a recent proposed method that fuses multiple output score lists from different models via pursuing a shared low-rank latent matrix. Despite showing promising performance, the repeated full Singular Value Decomposition operations in RLF’s optimization algorithm limits its scalability in real world vision datasets which usually have large number of test examples. To address this issue, we provide a scalable solution for large-scale low-rank latent matrix pursuit by a divide-andconquer method. The proposed method divides the original low-rank latent matrix learning problem into two sizereduced subproblems, which may be solved via any base algorithm, and combines the results from the subproblems to obtain the final solution. Our theoretical analysis shows that withfixedprobability, theproposed divide-and-conquer method has recovery guarantees comparable to those of its base algorithm. Moreover, we develop an efficient base algorithm for the corresponding subproblems by factorizing a large matrix into the product of two size-reduced matrices. We also provide high probability recovery guarantees of the base algorithm. The proposed method is evaluated on various fusion problems in object categorization and video event detection. Under comparable accuracy, the proposed method performs more than 180 times faster than the stateof-the-art baselines on the CCV dataset with about 4,500 test examples for video event detection.

2 0.85233164 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition

Author: Dong Liu, Kuan-Ting Lai, Guangnan Ye, Ming-Syan Chen, Shih-Fu Chang

Abstract: Late fusion addresses the problem of combining the prediction scores of multiple classifiers, in which each score is predicted by a classifier trained with a specific feature. However, the existing methods generally use a fixed fusion weight for all the scores of a classifier, and thus fail to optimally determine the fusion weight for the individual samples. In this paper, we propose a sample-specific late fusion method to address this issue. Specifically, we cast the problem into an information propagation process which propagates the fusion weights learned on the labeled samples to individual unlabeled samples, while enforcing that positive samples have higher fusion scores than negative samples. In this process, we identify the optimal fusion weights for each sample and push positive samples to top positions in the fusion score rank list. We formulate our problem as a L∞ norm constrained optimization problem and apply the Alternating Direction Method of Multipliers for the optimization. Extensive experiment results on various visual categorization tasks show that the proposed method consis- tently and significantly beats the state-of-the-art late fusion methods. To the best knowledge, this is the first method supporting sample-specific fusion weight learning.

3 0.56351084 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine

Author: Hua Wang, Feiping Nie, Heng Huang, Chris Ding

Abstract: To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as the shape, the color, the texture, etc. How to integrate these heterogeneous visual features and identify the important ones from them for specific vision tasks has become an increasingly critical problem. In this paper, We propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using the joint structured sparsity regularizations to learn the feature importance of for the vision tasks from both group-wise and individual point of views. A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both singlelabel and multi-label image classification tasks. For each data set we integrate six different types of popularly used image features. Compared to existing scene and object cat- egorization methods using either single modality or multimodalities of features, our approach always achieves better performances measured.

4 0.55286825 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition

Author: Chunfeng Yuan, Xi Li, Weiming Hu, Haibin Ling, Stephen Maybank

Abstract: Spatio-temporal interest points serve as an elementary building block in many modern action recognition algorithms, and most of them exploit the local spatio-temporal volume features using a Bag of Visual Words (BOVW) representation. Such representation, however, ignorespotentially valuable information about the global spatio-temporal distribution of interest points. In this paper, we propose a new global feature to capture the detailed geometrical distribution of interest points. It is calculated by using the ℛ transform which is defined as an extended 3D discrete Rℛa tdroann transform, followed by applying a tewdo 3-dDir decitsicorneatel two-dimensional principal component analysis. Such ℛ feature captures the geometrical information of the Sinuctehre ℛst points and keeps invariant to geometry transformation and robust to noise. In addition, we propose a new fusion strategy to combine the ℛ feature with the BOVW representation for further improving recognition accuracy. Wpree suetnilitzaea context-aware fusion method to capture both the pairwise similarities and higher-order contextual interactions of the videos. Experimental results on several publicly available datasets demonstrate the effectiveness of the proposed approach for action recognition.

5 0.54323852 41 cvpr-2013-An Iterated L1 Algorithm for Non-smooth Non-convex Optimization in Computer Vision

Author: Peter Ochs, Alexey Dosovitskiy, Thomas Brox, Thomas Pock

Abstract: Natural image statistics indicate that we should use nonconvex norms for most regularization tasks in image processing and computer vision. Still, they are rarely used in practice due to the challenge to optimize them. Recently, iteratively reweighed ?1 minimization has been proposed as a way to tackle a class of non-convex functions by solving a sequence of convex ?2-?1 problems. Here we extend the problem class to linearly constrained optimization of a Lipschitz continuous function, which is the sum of a convex function and a function being concave and increasing on the non-negative orthant (possibly non-convex and nonconcave on the whole space). This allows to apply the algorithm to many computer vision tasks. We show the effect of non-convex regularizers on image denoising, deconvolution, optical flow, and depth map fusion. Non-convexity is particularly interesting in combination with total generalized variation and learned image priors. Efficient optimization is made possible by some important properties that are shown to hold.

6 0.51044482 95 cvpr-2013-Continuous Inference in Graphical Models with Polynomial Energies

7 0.4776732 143 cvpr-2013-Efficient Large-Scale Structured Learning

8 0.47497249 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation

9 0.46985972 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment

10 0.46785811 239 cvpr-2013-Kernel Null Space Methods for Novelty Detection

11 0.45531315 9 cvpr-2013-A Fast Semidefinite Approach to Solving Binary Quadratic Problems

12 0.43851352 312 cvpr-2013-On a Link Between Kernel Mean Maps and Fraunhofer Diffraction, with an Application to Super-Resolution Beyond the Diffraction Limit

13 0.43714565 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets

14 0.43547946 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition

15 0.43466157 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition

16 0.4326486 341 cvpr-2013-Procrustean Normal Distribution for Non-rigid Structure from Motion

17 0.42135105 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors

18 0.42032889 261 cvpr-2013-Learning by Associating Ambiguously Labeled Images

19 0.41578496 134 cvpr-2013-Discriminative Sub-categorization

20 0.41141248 164 cvpr-2013-Fast Convolutional Sparse Coding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.138), (16, 0.032), (26, 0.04), (33, 0.248), (65, 0.263), (67, 0.056), (69, 0.037), (77, 0.038), (87, 0.056)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90101391 435 cvpr-2013-Towards Contactless, Low-Cost and Accurate 3D Fingerprint Identification

Author: Ajay Kumar, Cyril Kwong

Abstract: In order to avail the benefits of higher user convenience, hygiene, and improved accuracy, contactless 3D fingerprint recognition techniques have recently been introduced. One of the key limitations of these emerging 3D fingerprint technologies to replace the conventional 2D fingerprint system is their bulk and high cost, which mainly results from the use of multiple imaging cameras or structured lighting employed in these systems. This paper details the development of a contactless 3D fingerprint identification system that uses only single camera. We develop a new representation of 3D finger surface features using Finger Surface Codes and illustrate its effectiveness in matching 3D fingerprints. Conventional minutiae representation is extended in 3D space to accurately match the recovered 3D minutiae. Multiple 2D fingerprint images (with varying illumination profile) acquired to build 3D fingerprints can themselves be used recover 2D features for further improving 3D fingerprint identification and has been illustrated in this paper. The experimental results are shown on a database of 240 client fingerprints and confirm the advantages of the single camera based 3D fingerprint identification.

2 0.85693449 139 cvpr-2013-Efficient 3D Endfiring TRUS Prostate Segmentation with Globally Optimized Rotational Symmetry

Author: Jing Yuan, Wu Qiu, Eranga Ukwatta, Martin Rajchl, Xue-Cheng Tai, Aaron Fenster

Abstract: Segmenting 3D endfiring transrectal ultrasound (TRUS) prostate images efficiently and accurately is of utmost importance for the planning and guiding 3D TRUS guided prostate biopsy. Poor image quality and imaging artifacts of 3D TRUS images often introduce a challenging task in computation to directly extract the 3D prostate surface. In this work, we propose a novel global optimization approach to delineate 3D prostate boundaries using its rotational resliced images around a specified axis, which properly enforces the inherent rotational symmetry of prostate shapes to jointly adjust a series of 2D slicewise segmentations in the global 3D sense. We show that the introduced challenging combinatorial optimization problem can be solved globally and exactly by means of convex relaxation. In this regard, we propose a novel coupled continuous max-flow model, which not only provides a powerful mathematical tool to analyze the proposed optimization problem but also amounts to a new and efficient duality-basedalgorithm. Ex- tensive experiments demonstrate that the proposed method significantly outperforms the state-of-art methods in terms ofefficiency, accuracy, reliability and less user-interactions, and reduces the execution time by a factor of 100.

3 0.85626489 176 cvpr-2013-Five Shades of Grey for Fast and Reliable Camera Pose Estimation

Author: Adam Herout, István Szentandrási, Michal Zachariáš, Markéta Dubská, Rudolf Kajan

Abstract: We introduce here an improved design of the Uniform Marker Fields and an algorithm for their fast and reliable detection. Our concept of the marker field is designed so that it can be detected and recognized for camera pose estimation: in various lighting conditions, under a severe perspective, while heavily occluded, and under a strong motion blur. Our marker field detection harnesses the fact that the edges within the marker field meet at two vanishing points and that the projected planar grid of squares can be defined by a detectable mathematical formalism. The modules of the grid are greyscale and the locations within the marker field are defined by the edges between the modules. The assumption that the marker field is planar allows for a very cheap and reliable camera pose estimation in the captured scene. The detection rates and accuracy are slightly better compared to state-of-the-art marker-based solutions. At the same time, and more importantly, our detector of the marker field is several times faster and the reliable real-time detection can be thus achieved on mobile and low-power devices. We show three targeted applications where theplanarity is assured and where thepresented marker field design and detection algorithm provide a reliable and extremely fast solution.

4 0.84459376 177 cvpr-2013-FrameBreak: Dramatic Image Extrapolation by Guided Shift-Maps

Author: Yinda Zhang, Jianxiong Xiao, James Hays, Ping Tan

Abstract: We significantly extrapolate the field of view of a photograph by learning from a roughly aligned, wide-angle guide image of the same scene category. Our method can extrapolate typical photos into complete panoramas. The extrapolation problem is formulated in the shift-map image synthesis framework. We analyze the self-similarity of the guide image to generate a set of allowable local transformations and apply them to the input image. Our guided shift-map method preserves to the scene layout of the guide image when extrapolating a photograph. While conventional shiftmap methods only support translations, this is not expressive enough to characterize the self-similarity of complex scenes. Therefore we additionally allow image transformations of rotation, scaling and reflection. To handle this in- crease in complexity, we introduce a hierarchical graph optimization method to choose the optimal transformation at each output pixel. We demonstrate our approach on a variety of indoor, outdoor, natural, and man-made scenes.

5 0.8311938 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning

Author: Babak Saleh, Ali Farhadi, Ahmed Elgammal

Abstract: When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting. We argue that abnormalities and deviations from typicalities are among the most important components that form what is worth mentioning. In this paper we introduce the abnormality detection as a recognition problem and show how to model typicalities and, consequently, meaningful deviations from prototypical properties of categories. Our model can recognize abnormalities and report the main reasons of any recognized abnormality. We also show that abnormality predictions can help image categorization. We introduce the abnormality detection dataset and show interesting results on how to reason about abnormalities.

6 0.81459993 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation

same-paper 7 0.80668932 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit

8 0.80631202 80 cvpr-2013-Category Modeling from Just a Single Labeling: Use Depth Information to Guide the Learning of 2D Models

9 0.74827409 195 cvpr-2013-HDR Deghosting: How to Deal with Saturation?

10 0.7441119 414 cvpr-2013-Structure Preserving Object Tracking

11 0.74362993 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

12 0.74231428 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection

13 0.74201524 325 cvpr-2013-Part Discovery from Partial Correspondence

14 0.74201357 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

15 0.74185139 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking

16 0.74122107 342 cvpr-2013-Prostate Segmentation in CT Images via Spatial-Constrained Transductive Lasso

17 0.74089217 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration

18 0.74073529 314 cvpr-2013-Online Object Tracking: A Benchmark

19 0.74053782 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

20 0.7386443 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning