cvpr cvpr2013 cvpr2013-144 knowledge-graph by maker-knowledge-mining

144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection


Source: pdf

Author: Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan

Abstract: In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets. Most current object detection methods focus on improving accuracy of large-scale object detection with efficiency being an afterthought. In this paper, we present the Efficient Maximum Appearance Search (EMAS) model which is an order of magnitude faster than the existing state-of-the-art large-scale object detection approaches, while maintaining comparable accuracy. Our EMAS model consists of representing an image as an ensemble of densely sampled feature points with the proposed Pointwise Fisher Vector encoding method, so that the learnt discriminative scoring function can be applied locally. Consequently, the object detection problem is transformed into searching an image sub-area for maximum local appearance probability, thereby making EMAS an order of magnitude faster than the traditional detection methods. In addition, the proposed model is also suitable for incorporating global context at a negligible extra computational cost. EMAS can also incorporate fusion of multiple features, which greatly improves its performance in detecting multiple object categories. Our experiments show that the proposed algorithm can perform detection of 1000 object classes in less than one minute per image on the ImageNet ILSVRC2012 dataset and for 107 object classes in less than 5 seconds per image for the SUN09 dataset using a single CPU.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets. [sent-9, score-0.452]

2 Most current object detection methods focus on improving accuracy of large-scale object detection with efficiency being an afterthought. [sent-10, score-0.452]

3 In this paper, we present the Efficient Maximum Appearance Search (EMAS) model which is an order of magnitude faster than the existing state-of-the-art large-scale object detection approaches, while maintaining comparable accuracy. [sent-11, score-0.236]

4 Our EMAS model consists of representing an image as an ensemble of densely sampled feature points with the proposed Pointwise Fisher Vector encoding method, so that the learnt discriminative scoring function can be applied locally. [sent-12, score-0.259]

5 Consequently, the object detection problem is transformed into searching an image sub-area for maximum local appearance probability, thereby making EMAS an order of magnitude faster than the traditional detection methods. [sent-13, score-0.505]

6 Our experiments show that the proposed algorithm can perform detection of 1000 object classes in less than one minute per image on the ImageNet ILSVRC2012 dataset and for 107 object classes in less than 5 seconds per image for the SUN09 dataset using a single CPU. [sent-16, score-0.47]

7 Introduction Large-scale object detection is an important vision problem concerned with detecting a large number of object categories and localizing them in a large number of images. [sent-18, score-0.355]

8 The local feature xi is first mapped to a high-dimensional sparse vector φ(xi); then the detection model can be applied locally to get the local confidence map. [sent-28, score-0.392]

9 The model inference is achieved by an efficient maximum subarray search. [sent-29, score-0.271]

10 A common thread that ties most of these state-of-the-art approaches together is detection models that are designed to discriminate object shape from background on densely sampled sub-windows of images. [sent-32, score-0.207]

11 Since templates are sensitive to sampling scale and the pose of objects, inference of such models often entails exhaustively searching for the best template configuration regarding pose, scale, rotation, etc. [sent-34, score-0.173]

12 This enriched local representation enables us to transform the object detection problem into searching for an image sub-window with maximum sum of object possibility, which can be performed extremely efficiently. [sent-43, score-0.458]

13 The advantage of low computation complexity enables us to explore the large-scale object detection problem with a huge number of categories. [sent-44, score-0.271]

14 Our contributions are thus as follows: • We propose an efficient maximum appearance search (EMAS) model for large-scale object detection. [sent-46, score-0.189]

15 • Our proposed EMAS applies the model locally to each transformed local point, and the inference problem is transferred to searching for the sub-window with the maximum sum. [sent-47, score-0.214]

16 As far as we know, this is the first model specifically designed for object detection with a large number of categories, which makes it different from other works that focus on improving the DPM model for efficiency [4, 31, 29]. [sent-48, score-0.245]

17 • We propose the Pointwise Fisher Vector coding as the enriched local representation of our detection model. [sent-49, score-0.152]

18 In this work, we propose to maintain a local feature coding which benefits the discriminative power of the local patches. [sent-52, score-0.281]

19 Moreover, this representation is able to construct a global form for multi-class detection and thus has the potential to search objects very efficiently in a large scale setting. [sent-54, score-0.238]

20 General Object Detection: Shape-based object detection models rely on discriminative shape templates using histograms of oriented gradients. [sent-62, score-0.236]

21 Initially, Dalal and Triggs [1] used a single rigid template to build a detection model for pedestrians. [sent-63, score-0.173]

22 The MKL object detection [16], which uses kernel-based models and spatial pyramid (SP) feature combination, achieves promising results, but the computation cost is very high. [sent-69, score-0.339]

23 The computation cost consists of two parts: the local feature coding step and inference (dot-product) over the linear model. [sent-81, score-0.299]

24 The cost of the local feature coding step often increases with the codebook size K, which is independent of the category. [sent-82, score-0.315]

25 For multiclass object detection, the only cost addition is the inference cost which depends on the sparseness E of the coding. [sent-83, score-0.293]

26 The sparseness E is around 3% for Fisher Vector coding (FV) [10] in our experiments. [sent-85, score-0.152]

27 Feature Encoding: Recent feature encoding approaches, such as Sparse Coding [14] and Locality-constrained Linear Coding (LLC) [15], introduce soft assignment for local feature coding. (Figure 2: Framework illustration of Efficient Maximum Appearance Search.) [sent-91, score-0.271]

28 For the recognition problem, these two coding methods benefit from large size codebooks as demonstrated in a recent survey [17]. [sent-93, score-0.152]

29 Recently, aggregation coding, such as the Fisher Vector coding or the Super Vector coding, has demonstrated increased discriminative power of local features [17]. [sent-95, score-0.208]

30 Fisher encoding [10] captures the average first- and second-order differences between local features and the centers of a mixture of Gaussian distributions learnt from general datasets, while the Super Vector encoding [18] only focuses on the first-order difference. [sent-96, score-0.225]
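
As a reference for the coding vectors uik, vik used later, the per-point Fisher gradient blocks take the following form. This is a sketch following the common improved-FV conventions of [10]; the paper's exact signs and normalization may differ:

```latex
% First- and second-order Fisher gradient blocks for point x_i and
% Gaussian component k (diagonal covariance). Conventions follow the
% usual improved-FV formulation and are an assumption here.
u_{ik} = \frac{\gamma_{ik}}{\sqrt{\pi_k}}\,\frac{x_i - \mu_k}{\sigma_k},
\qquad
v_{ik} = \frac{\gamma_{ik}}{\sqrt{2\pi_k}}
         \left[\frac{(x_i - \mu_k)^2}{\sigma_k^2} - 1\right].
```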

31 Csurka et al. [25] extended the Fisher Vector coding to the patch level for the semantic segmentation task. [sent-99, score-0.152]

32 ESS with branch-and-bound search [5] was proposed to reduce the cost of subwindow search by finding bounds on subwindow scores. [sent-108, score-0.424]

33 Model: The proposed Efficient Maximum Appearance Search (EMAS) model proceeds through four stages to perform large-scale object detection, as shown in Fig. 2. [sent-110, score-0.207]

34 For an image, these features are then used to encode the image with a pointwise feature representation during the second stage. [sent-113, score-0.315]

35 In the third stage, we obtain the object confidence maps using a combination of appearance detection models and global context models to look for specific objects within a global context. [sent-114, score-0.369]

36 Finally, the object confidence values are combined to find the highly confident object locations for each object category using maximum subarray search. [sent-115, score-0.517]

37 Probabilistic Prediction over Point Ensemble: Similar to Bag-of-Words-like models, where the probabilistic prediction is conducted over the word ensemble contained by the inference body, the EMAS model also estimates the object probabilities using the point ensemble contained within an image area. [sent-121, score-0.248]

38 the figure-ground detection for each object category, which formulates the discriminative probabilities as the likelihood ratio P(X|l = 1) / P(X|l = −1) = Πi p(xi|l = 1) / p(xi|l = −1). [sent-126, score-0.236]

39 Thus, the probability for an image area to be an object foreground depends on the sum of the pointwise inference in this area. [sent-135, score-0.421]

40 Representation: Pointwise Fisher Vector. The performance of the EMAS model relies heavily on the design of the pointwise feature representation. [sent-138, score-0.315]

41 In this work, we choose to extend the Fisher Vector (FV) feature coding method [10] to derive Pointwise Fisher Vector (PFV) coding. [sent-139, score-0.198]

42 Similar to the Fisher Vector coding method, the PFV coding uses a Gaussian mixture model (GMM) Uλ(x) = Σk πk uk(x) trained on local features of a large image set. [sent-140, score-0.331]

43 For a local feature xi extracted from an image, the soft assignment of the descriptor xi to the kth Gaussian component γik is computed by γik = πk uk(xi) / Σj πj uj(xi). [sent-144, score-0.222]
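
To make the coding step concrete, below is a minimal NumPy sketch of the soft assignment and the per-point gradient blocks. The function name `pointwise_fisher_vectors` and the threshold argument `gamma_th` are illustrative; the exact normalization and threshold value used in the paper may differ:

```python
import numpy as np

def pointwise_fisher_vectors(X, priors, mu, sigma2, gamma_th=0.01):
    """Sketch of Pointwise Fisher Vector (PFV) coding.

    X:      (N, D) local descriptors x_i
    priors: (K,)   GMM mixture weights pi_k
    mu:     (K, D) GMM means
    sigma2: (K, D) GMM diagonal variances
    Returns soft assignments gamma (N, K) and the per-point
    first/second-order gradient blocks u, v, each (N, K, D).
    """
    diff = X[:, None, :] - mu[None, :, :]                      # (N, K, D)
    # log pi_k + log N(x_i | mu_k, sigma_k) for diagonal Gaussians
    log_w = (np.log(priors)[None, :]
             - 0.5 * np.log(2.0 * np.pi * sigma2).sum(axis=1)[None, :]
             - 0.5 * (diff ** 2 / sigma2[None, :, :]).sum(axis=2))
    log_w -= log_w.max(axis=1, keepdims=True)                  # stability
    gamma = np.exp(log_w)
    gamma /= gamma.sum(axis=1, keepdims=True)                  # gamma_ik
    gamma[gamma < gamma_th] = 0.0    # sparsify: few non-zero components
    sigma = np.sqrt(sigma2)
    u = gamma[:, :, None] * diff / sigma[None] / np.sqrt(priors)[None, :, None]
    v = (gamma[:, :, None] * (diff ** 2 / sigma2[None] - 1.0)
         / np.sqrt(2.0 * priors)[None, :, None])
    return gamma, u, v
```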

44 The pointwise representation can also be flexibly merged back to the Fisher Vector global image representation as aforementioned. [sent-154, score-0.297]

45 For VQ, each local feature is mapped to a codebook index, while in PFV, xi is mapped to each GMM component and the gradient vectors enable local model learning. [sent-156, score-0.251]

46 The pointwise representation φ(xi) is sparse, since each feature point only has a few non-zero GMM component assignment values γik. [sent-158, score-0.347]

47 lm wT (Σi∈ym φ(xi) / Zm) > 1 − ξm, ξm ≥ 0, ∀lm ∈ {1, −1}, (5) where φ(xi) is the ith pointwise feature in the image area y; we use the ground-truth object area as the positive training sample for l = 1, and image areas with less than a small overlap with the object as negative samples. [sent-177, score-0.393]

48 The normalization factor Zm is applied to the sum of the pointwise features in order to fit the SVM optimization. [sent-179, score-0.269]

49 ŷ = argmaxy f(I, y, w) = argmaxy Σi∈y wTφ(xi), (6) where φ(xi) is the ith pointwise feature in the image area y. [sent-184, score-0.315]
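
A minimal sketch of the resulting pointwise confidence mapping, assuming the PFV blocks from the sketch above; the function and argument names are hypothetical:

```python
import numpy as np

def confidence_map(points, u, v, w_u, w_v, H, W):
    """Scatter per-point scores w^T phi(x_i) onto an H x W grid.

    points:   (N, 2) integer (row, col) locations of the descriptors
    u, v:     (N, K, D) PFV first/second-order blocks
    w_u, w_v: (K, D) weights of the learnt linear model w
    """
    scores = ((u * w_u[None]).sum(axis=(1, 2))
              + (v * w_v[None]).sum(axis=(1, 2)))
    cmap = np.zeros((H, W))
    np.add.at(cmap, (points[:, 0], points[:, 1]), scores)  # accumulate ties
    return cmap
```

The score of any sub-window y is then just the sum of the map entries inside y, which is what makes the subsequent subarray search applicable.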

50 We denote an appearance-based detection model as w = {wu1, wv1, · · · , wuK, wvK}, where wku, wkv correspond to the weights for the coding vectors uik, vik respectively. [sent-185, score-0.401]

51 Consequently the object detection task is converted to the following optimization problem regarding the scoring function f(I, y, w) in Equation 7. [sent-198, score-0.258]

52 This optimization problem is called the 2D maximum subarray sum search: ŷ = argmaxy∈Y f(I, y, w). [sent-199, score-0.197]
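
For illustration, here is the classic O(H²W) Kadane-based solver for this problem. Note that the paper instead adopts a faster algorithm [19] of roughly O(N^1.5) complexity, so treat this only as a readable baseline of the same optimization:

```python
import numpy as np

def max_subarray_2d(cmap):
    """Find the axis-aligned sub-window with maximum summed confidence.

    Classic O(H^2 * W) algorithm: fix a band of rows [top, bottom],
    collapse it into column sums, and run 1D Kadane on the result.
    Returns (best_sum, (top, bottom, left, right)).
    """
    H, W = cmap.shape
    best, best_box = -np.inf, None
    for top in range(H):
        col_sums = np.zeros(W)
        for bottom in range(top, H):
            col_sums += cmap[bottom]          # rows top..bottom collapsed
            cur, left = 0.0, 0
            for right in range(W):            # 1D Kadane over columns
                if cur <= 0.0:
                    cur, left = 0.0, right
                cur += col_sums[right]
                if cur > best:
                    best, best_box = cur, (top, bottom, left, right)
    return best, best_box
```

On a per-class confidence map, the detected window ŷ is simply `max_subarray_2d(cmap)[1]`.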

53 In our experiment, the solution from [19], of complexity O(N^1.5), takes several milliseconds to search one confidence map, and the total subarray search for the 107 object categories of the SUN09 [20] dataset costs less than one second per image. [sent-212, score-0.441]

54 Therefore, the computation cost in this subarray search is not a bottleneck of our proposed model. [sent-213, score-0.278]

55 Contextual Detection: In this work, we propose a natural way to embed global contextual detection into our detection model. [sent-216, score-0.377]

56 As demonstrated in [2, 22], the object detection performance can be greatly enhanced using the knowledge of global context information in a multi-class setting. [sent-217, score-0.265]

57 The global context is normally the probability values describing how likely the image contains certain object categories, which can provide a reference to the detection results. [sent-218, score-0.265]

58 Suppose there are nc classes in the training dataset; we define the context feature for image I as φctx(I) = {c1, · · · , cnc}, where ci is the object existence probability predicted by the global classifier. [sent-221, score-0.182]
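
A hedged sketch of one natural way to combine the two signals; the paper learns the contextual model jointly rather than using a fixed blend, so the linear form and the weight `alpha` below are purely assumed for illustration:

```python
def contextual_score(window_score, context_prob, alpha=1.0):
    """Fuse the appearance score of the best sub-window for category m
    with c_m, the global classifier's probability that the image
    contains category m (one entry of phi_ctx(I)). The linear blend
    with weight alpha is an assumption, not the paper's exact model."""
    return window_score + alpha * context_prob
```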

59 It is worth noting that the contextual detection has several good properties: (1) Stability in the multi-class setting. [sent-225, score-0.248]

60 Predictions using additional contextual information are more stable and accurate in problems with a large number of object categories and clear object relations. [sent-227, score-0.317]

61 In the EMAS model, it is easy to fuse multiple features to boost the detection accuracy as well as the effectiveness of the global classification model. [sent-236, score-0.186]

62 We perform independent coding for each kind of local feature. [sent-237, score-0.179]

63 Table 1: Average running time (s) for 107-class detection on SUN09. [sent-239, score-0.163]

64 SPM can be easily added by applying more spatially-structured local models and the maximum subarray search with a more complex optimization algorithm. [sent-248, score-0.277]

65 PFV encoding includes two parts: soft assignment calculation and the pointwise encoding. [sent-255, score-0.421]

66 The pointwise encoding takes O(E(γth)ND), where E(γth) represents the average number of GMM assignments with probability higher than the threshold γth for each feature point. [sent-257, score-0.4]

67 Hence the overall computation complexity for PFV coding is near O(KND), which is equal to that of the prevalently used Vector Quantization (VQ). [sent-262, score-0.254]

68 The computation in the model inference contains three parts: pointwise confidence mapping, maximum subwindow search and contextual detection. [sent-265, score-0.753]

69 For nc classes, the complexity of pointwise confidence mapping is O(ncE(γ)ND). [sent-266, score-0.351]

70 And the maximum subwindow search we adopt has a complexity of O(N^1.5). [sent-269, score-0.275]

71 Finally, compared to the other two parts, the contextual detection cost is trivial, since it is only of O(2ncKD) complexity. [sent-271, score-0.276]
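
Collecting the terms quoted in the last few sentences, the per-image inference cost can be summarized as follows (a sketch; the paper's exact accounting may group the terms differently):

```latex
% n_c classes, N points, D dims, K mixtures, sparsity E(\gamma)
T_{\text{inference}} \;\approx\;
    \underbrace{O(KND)}_{\text{PFV coding}}
  + \underbrace{O\big(n_c\,E(\gamma)\,ND\big)}_{\text{confidence mapping}}
  + \underbrace{O\big(n_c\,N^{1.5}\big)}_{\text{subwindow search}}
  + \underbrace{O(2\,n_c K D)}_{\text{contextual detection}}
```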

72 To be more clear, we demonstrate an example computation cost for EMAS in a large-scale detection task. [sent-272, score-0.215]

73 As shown in Tab. 1, the total cost for 107-class detection is about 4 seconds. [sent-275, score-0.219]

74 For one object detector, the per-category model inference cost is around 0.03 seconds. [sent-278, score-0.273]

75 Namely, the additional cost for one more detection model is only about 30ms. [sent-281, score-0.185]

76 The validation and test data for this competition consists of 150,000 photographs, collected from Flickr and other search engines, hand-labeled with the presence or absence of 1000 object categories. [sent-293, score-0.159]

77 We also use the SUN09 dataset introduced in [20] for object detection evaluation of 107 object categories, which contains 4,367 training images and 4,317 testing images. [sent-298, score-0.285]

78 The number of mixtures in the GMM model in PFV coding is set to 128 for the SUN dataset and 256 for the ILSVRC dataset. [sent-317, score-0.152]

79 For all experiments, we only output the maximum subwindow for one image per class at the testing stage; namely, we use a precision-preferred detector. [sent-320, score-0.243]

80 Efficiency Comparison We compare the computational cost of EMAS with three other object detection models in a multi-class setting: 1) Multiple kernel learning for object detection (MKL) [16] using three-stage linear and non-linear detection, 2) Deformable Part Model [2], and 3) Cascade DPM [4]. [sent-328, score-0.47]

81 Figure 3 shows the computation cost of the various approaches on the ILSVRC 2012 dataset with a varying number of object categories. [sent-329, score-0.164]

82 9 seconds for feature extraction and feature encoding, and about 56. [sent-332, score-0.155]

83 And it takes 500ms and 5s respectively for model inference per category per image (this may change for different settings). [sent-336, score-0.166]

84 Additionally, the cost for MKL reported in [16] is 67 seconds per category for one image. [sent-337, score-0.184]

85 When the number of categories is small, however, it can be observed that EMAS is not the fastest due to the cost of feature encoding. [sent-339, score-0.172]

86 The initialization of the detection model is trained using the object feature and a large number of negative images. [sent-366, score-0.253]

87 For detection, we compare our results with the challenge entries: (1) Oxford DPM is the result from DPM detection over baseline classification scores. [sent-368, score-0.158]

88 (2) Oxford Mix uses the detection result from DPM and retrains the foreground model with a complicated classification model, which is also the best result from Oxford. [sent-369, score-0.158]

89 (3) ISI CasDPM is the result using cascade object detection with deformable part models, restricting the sizes of bounding boxes. [sent-370, score-0.299]

90 Moreover, it is worth noting that the detection result of ILSVRC2012 heavily relies on the performance of classification. [sent-376, score-0.157]

91 Usually, detection will be performed on the top-ranked images with high classification confidence. [sent-377, score-0.158]

92 Thus the error rate can be approximately interpreted as errordet = 1 − (1 − errorcls) · accdet, where accdet is the real detection accuracy of each detection model. [sent-380, score-0.313]
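
A worked example with hypothetical numbers to make the relation concrete: if the classifier errs on 30% of images and the detector localizes half of the correctly classified ones, then

```latex
\mathrm{error}_{det} = 1 - (1 - \mathrm{error}_{cls}) \cdot \mathrm{acc}_{det}
                     = 1 - (1 - 0.3) \times 0.5 = 0.65 .
```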

93 Object Detection with Large Appearance Variation: Our appearance-based model is appealing for object detection with large variation of appearance. [sent-429, score-0.207]

94 Here, we show 20-class amorphous object detection results from SUN09 and compare with the DPM [2] in Tab. [sent-430, score-0.241]

95 Conclusion: In this paper, we designed an efficient large-scale object detection approach by extending Fisher Vector encoding to the point level. [sent-454, score-0.292]

96 This enabled us to transform the object detection problem into searching for a sub-window with the maximum sum, leading to an order-of-magnitude speed-up over the state-of-the-art approaches while maintaining comparable accuracy on the major large-scale object detection benchmarks. [sent-455, score-0.556]

97 It is our belief that this significant speed-up makes large-scale object detection practical. [sent-456, score-0.207]

98 Moreover, the proposed approach could further integrate global object contextual information into the detection model with little extra computational cost, which may make it very effective for object detection under difficult conditions, such as occluded objects. [sent-464, score-0.533]

99 : Efficient algorithms for subwindow search in object detection and localization. [sent-516, score-0.39]

100 : Image classification using super-vector coding of local image descriptors. [sent-565, score-0.208]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('emas', 0.602), ('pfv', 0.288), ('pointwise', 0.269), ('fisher', 0.199), ('coding', 0.152), ('subarray', 0.139), ('subwindow', 0.13), ('detection', 0.129), ('ilsvrc', 0.126), ('dpm', 0.117), ('gmms', 0.115), ('contextual', 0.091), ('encoding', 0.085), ('object', 0.078), ('uik', 0.076), ('inference', 0.074), ('singapore', 0.072), ('categories', 0.07), ('accdet', 0.069), ('casdpm', 0.069), ('pfvs', 0.069), ('wku', 0.069), ('wkv', 0.069), ('imagenet', 0.068), ('seconds', 0.063), ('maximum', 0.058), ('xi', 0.057), ('cost', 0.056), ('deformable', 0.055), ('searching', 0.055), ('vq', 0.054), ('search', 0.053), ('vik', 0.051), ('spm', 0.051), ('scoring', 0.051), ('river', 0.049), ('confidence', 0.048), ('ensemble', 0.048), ('sofa', 0.047), ('chal', 0.046), ('errorcls', 0.046), ('errordet', 0.046), ('psv', 0.046), ('feature', 0.046), ('template', 0.044), ('venkatesh', 0.041), ('curtain', 0.041), ('lsvrc', 0.041), ('peursum', 0.041), ('towel', 0.041), ('zhongyang', 0.041), ('cm', 0.041), ('mkl', 0.04), ('released', 0.039), ('category', 0.038), ('fv', 0.038), ('bowl', 0.038), ('ctx', 0.038), ('ner', 0.038), ('efficiency', 0.038), ('bed', 0.037), ('cascade', 0.037), ('knd', 0.036), ('sky', 0.036), ('soft', 0.035), ('complexity', 0.034), ('codebook', 0.034), ('oxford', 0.034), ('stage', 0.034), ('classes', 0.034), ('ibm', 0.033), ('enriched', 0.033), ('zisserman', 0.033), ('buildings', 0.032), ('assignment', 0.032), ('road', 0.032), ('nx', 0.031), ('othe', 0.031), ('mapped', 0.03), ('context', 0.03), ('computation', 0.03), ('discriminative', 0.029), ('magnitude', 0.029), ('classification', 0.029), ('sparseness', 0.029), ('pp', 0.029), ('windows', 0.029), ('global', 0.028), ('csurka', 0.028), ('noting', 0.028), ('ny', 0.028), ('complementary', 0.028), ('objects', 0.028), ('vector', 0.028), ('validation', 0.028), ('namely', 0.028), ('blaschko', 0.028), ('per', 0.027), ('local', 0.027), ('sink', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999934 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection

Author: Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan

Abstract: In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets. Most current object detection methods focus on improving accuracy of large-scale object detection with efficiency being an afterthought. In this paper, we present the Efficient Maximum Appearance Search (EMAS) model which is an order of magnitude faster than the existing state-of-the-art large-scale object detection approaches, while maintaining comparable accuracy. Our EMAS model consists of representing an image as an ensemble of densely sampled feature points with the proposed Pointwise Fisher Vector encoding method, so that the learnt discriminative scoring function can be applied locally. Consequently, the object detection problem is transformed into searching an image sub-area for maximum local appearance probability, thereby making EMAS an order of magnitude faster than the traditional detection methods. In addition, the proposed model is also suitable for incorporating global context at a negligible extra computational cost. EMAS can also incorporate fusion of multiple features, which greatly improves its performance in detecting multiple object categories. Our experiments show that the proposed algorithm can perform detection of 1000 object classes in less than one minute per image on the Image Net ILSVRC2012 dataset and for 107 object classes in less than 5 seconds per image for the SUN09 dataset using a single CPU.

2 0.13440576 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification

Author: Takumi Kobayashi

Abstract: Image classification methods have been significantly developed in the last decade. Most methods stem from bagof-features (BoF) approach and it is recently extended to a vector aggregation model, such as using Fisher kernels. In this paper, we propose a novel feature extraction method for image classification. Following the BoF approach, a plenty of local descriptors are first extracted in an image and the proposed method is built upon the probability density function (p.d.f) formed by those descriptors. Since the p.d.f essentially represents the image, we extract the features from the p.d.f by means of the gradients on the p.d.f. The gradients, especially their orientations, effectively characterize the shape of the p.d.f from the geometrical viewpoint. We construct the features by the histogram of the oriented p.d.f gradients via orientation coding followed by aggregation of the orientation codes. The proposed image features, imposing no specific assumption on the targets, are so general as to be applicable to any kinds of tasks regarding image classifications. In the experiments on object recog- nition and scene classification using various datasets, the proposed method exhibits superior performances compared to the other existing methods.

3 0.12638754 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors ’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.

4 0.12076243 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification

Author: Amirreza Shaban, Hamid R. Rabiee, Mehrdad Farajtabar, Marjan Ghazvininejad

Abstract: Bag of words models for feature extraction have demonstrated top-notch performance in image classification. These representations are usually accompanied by a coding method. Recently, methods that code a descriptor giving regard to its nearby bases have proved efficacious. These methods take into account the nonlinear structure of descriptors, since local similarities are a good approximation of global similarities. However, they confine their usage of the global similarities to nearby bases. In this paper, we propose a coding scheme that brings into focus the manifold structure of descriptors, and devise a method to compute the global similarities of descriptors to the bases. Given a local similarity measure between bases, a global measure is computed. Exploiting the local similarity of a descriptor and its nearby bases, a global measure of association of a descriptor to all the bases is computed. Unlike the locality-based and sparse coding methods, the proposed coding varies smoothly with respect to the underlying manifold. Experiments on benchmark image classification datasets substantiate the superiority oftheproposed method over its locality and sparsity based rivals.

5 0.11176763 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun

Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends ” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.

6 0.11107033 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

7 0.11056153 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

8 0.099002622 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings

9 0.097936101 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

10 0.097280815 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection

11 0.097009644 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

12 0.095187858 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels

13 0.092495263 129 cvpr-2013-Discriminative Brain Effective Connectivity Analysis for Alzheimer's Disease: A Kernel Learning Approach upon Sparse Gaussian Bayesian Network

14 0.089565881 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

15 0.087202892 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

16 0.082364939 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

17 0.082304314 325 cvpr-2013-Part Discovery from Partial Correspondence

18 0.082258411 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

19 0.081028767 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors

20 0.080207691 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.206), (1, -0.075), (2, -0.01), (3, -0.014), (4, 0.081), (5, 0.028), (6, 0.028), (7, 0.042), (8, -0.055), (9, -0.036), (10, -0.076), (11, -0.031), (12, 0.041), (13, -0.072), (14, 0.028), (15, -0.023), (16, -0.012), (17, 0.001), (18, 0.031), (19, 0.005), (20, 0.005), (21, -0.003), (22, 0.092), (23, 0.012), (24, 0.016), (25, 0.074), (26, -0.066), (27, 0.065), (28, -0.009), (29, -0.034), (30, 0.005), (31, 0.008), (32, -0.009), (33, -0.019), (34, -0.006), (35, -0.018), (36, -0.011), (37, 0.01), (38, 0.045), (39, -0.041), (40, -0.004), (41, -0.041), (42, 0.005), (43, -0.065), (44, 0.019), (45, -0.039), (46, 0.008), (47, -0.027), (48, -0.056), (49, -0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92665201 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection

Author: Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan

Abstract: In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets. Most current object detection methods focus on improving accuracy of large-scale object detection with efficiency being an afterthought. In this paper, we present the Efficient Maximum Appearance Search (EMAS) model which is an order of magnitude faster than the existing state-of-the-art large-scale object detection approaches, while maintaining comparable accuracy. Our EMAS model consists of representing an image as an ensemble of densely sampled feature points with the proposed Pointwise Fisher Vector encoding method, so that the learnt discriminative scoring function can be applied locally. Consequently, the object detection problem is transformed into searching an image sub-area for maximum local appearance probability, thereby making EMAS an order of magnitude faster than the traditional detection methods. In addition, the proposed model is also suitable for incorporating global context at a negligible extra computational cost. EMAS can also incorporate fusion of multiple features, which greatly improves its performance in detecting multiple object categories. Our experiments show that the proposed algorithm can perform detection of 1000 object classes in less than one minute per image on the Image Net ILSVRC2012 dataset and for 107 object classes in less than 5 seconds per image for the SUN09 dataset using a single CPU.

2 0.83298439 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman

Abstract: The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously to learn the part appearance and also to identify the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. We apply the new representation to the task of scene categorisation on the MIT Scene 67 benchmark. We show that our method can learn parts which are significantly more informative and for a fraction of the cost, compared to previouspart-learning methods such as Singh et al. [28]. We also show that a well constructed bag of words or Fisher vector model can substantially outperform the previous state-of- the-art classification performance on this data.

3 0.81212068 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification

Author: Takumi Kobayashi

Abstract: Image classification methods have been significantly developed in the last decade. Most methods stem from bagof-features (BoF) approach and it is recently extended to a vector aggregation model, such as using Fisher kernels. In this paper, we propose a novel feature extraction method for image classification. Following the BoF approach, a plenty of local descriptors are first extracted in an image and the proposed method is built upon the probability density function (p.d.f) formed by those descriptors. Since the p.d.f essentially represents the image, we extract the features from the p.d.f by means of the gradients on the p.d.f. The gradients, especially their orientations, effectively characterize the shape of the p.d.f from the geometrical viewpoint. We construct the features by the histogram of the oriented p.d.f gradients via orientation coding followed by aggregation of the orientation codes. The proposed image features, imposing no specific assumption on the targets, are so general as to be applicable to any kinds of tasks regarding image classifications. In the experiments on object recog- nition and scene classification using various datasets, the proposed method exhibits superior performances compared to the other existing methods.

4 0.78135258 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

Author: Xiaolong Wang, Liang Lin, Lichao Huang, Shuicheng Yan

Abstract: This paper proposes a reconfigurable model to recognize and detect multiclass (or multiview) objects with large variation in appearance. Compared with well acknowledged hierarchical models, we study two advanced capabilities in hierarchy for object modeling: (i) “switch” variables(i.e. or-nodes) for specifying alternative compositions, and (ii) making local classifiers (i.e. leaf-nodes) shared among different classes. These capabilities enable us to account well for structural variabilities while preserving the model compact. Our model, in the form of an And-Or Graph, comprises four layers: a batch of leaf-nodes with collaborative edges in bottom for localizing object parts; the or-nodes over bottom to activate their children leaf-nodes; the andnodes to classify objects as a whole; one root-node on the top for switching multiclass classification, which is also an or-node. For model training, we present an EM-type algorithm, namely dynamical structural optimization (DSO), to iteratively determine the structural configuration, (e.g., leaf-node generation associated with their parent or-nodes and shared across other classes), along with optimizing multi-layer parameters. The proposed method is valid on challenging databases, e.g., PASCAL VOC2007and UIUCPeople, and it achieves state-of-the-arts performance.

5 0.77619952 417 cvpr-2013-Subcategory-Aware Object Classification

Author: Jian Dong, Wei Xia, Qiang Chen, Jianshi Feng, Zhongyang Huang, Shuicheng Yan

Abstract: In this paper, we introduce a subcategory-aware object classification framework to boost category level object classification performance. Motivated by the observation of considerable intra-class diversities and inter-class ambiguities in many current object classification datasets, we explicitly split data into subcategories by ambiguity guided subcategory mining. We then train an individual model for each subcategory rather than attempt to represent an object category with a monolithic model. More specifically, we build the instance affinity graph by combining both intraclass similarity and inter-class ambiguity. Visual subcategories, which correspond to the dense subgraphs, are detected by the graph shift algorithm and seamlessly integrated into the state-of-the-art detection assisted classification framework. Finally the responses from subcategory models are aggregated by subcategory-aware kernel regression. The extensive experiments over the PASCAL VOC 2007 and PASCAL VOC 2010 databases show the state-ofthe-art performance from our framework.

6 0.75409907 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors

7 0.75195915 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings

8 0.74996883 204 cvpr-2013-Histograms of Sparse Codes for Object Detection

9 0.7312969 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

10 0.72978348 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition

11 0.71905643 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition

12 0.71571803 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

13 0.71035159 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

14 0.70941722 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification

15 0.70725989 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine

16 0.70578921 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

17 0.70323682 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

18 0.69916427 325 cvpr-2013-Part Discovery from Partial Correspondence

19 0.69680959 364 cvpr-2013-Robust Object Co-detection

20 0.68925136 83 cvpr-2013-Classification of Tumor Histology via Morphometric Context


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.079), (16, 0.026), (26, 0.058), (28, 0.01), (33, 0.249), (39, 0.01), (67, 0.113), (69, 0.06), (85, 0.223), (87, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8415696 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition

Author: Ziheng Wang, Shangfei Wang, Qiang Ji

Abstract: Spatial-temporal relations among facial muscles carry crucial information about facial expressions yet have not been thoroughly exploited. One contributing factor for this is the limited ability of the current dynamic models in capturing complex spatial and temporal relations. Existing dynamic models can only capture simple local temporal relations among sequential events, or lack the ability for incorporating uncertainties. To overcome these limitations and take full advantage of the spatio-temporal information, we propose to model the facial expression as a complex activity that consists of temporally overlapping or sequential primitive facial events. We further propose the Interval Temporal Bayesian Network to capture these complex temporal relations among primitive facial events for facial expression modeling and recognition. Experimental results on benchmark databases demonstrate the feasibility of the proposed approach in recognizing facial expressions based purely on spatio-temporal relations among facial muscles, as well as its advantage over the existing methods.

same-paper 2 0.83082354 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection

Author: Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan

Abstract: In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets. Most current object detection methods focus on improving accuracy of large-scale object detection with efficiency being an afterthought. In this paper, we present the Efficient Maximum Appearance Search (EMAS) model which is an order of magnitude faster than the existing state-of-the-art large-scale object detection approaches, while maintaining comparable accuracy. Our EMAS model consists of representing an image as an ensemble of densely sampled feature points with the proposed Pointwise Fisher Vector encoding method, so that the learnt discriminative scoring function can be applied locally. Consequently, the object detection problem is transformed into searching an image sub-area for maximum local appearance probability, thereby making EMAS an order of magnitude faster than the traditional detection methods. In addition, the proposed model is also suitable for incorporating global context at a negligible extra computational cost. EMAS can also incorporate fusion of multiple features, which greatly improves its performance in detecting multiple object categories. Our experiments show that the proposed algorithm can perform detection of 1000 object classes in less than one minute per image on the Image Net ILSVRC2012 dataset and for 107 object classes in less than 5 seconds per image for the SUN09 dataset using a single CPU.

3 0.7944839 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu

Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods[24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplarbased face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other facerelated tasks, such as attribute recognition, as well as general object detection.

4 0.79394114 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

Author: Jianguo Li, Yimin Zhang

Abstract: This paper presents a novel learning framework for training boosting cascade based object detector from large scale dataset. The framework is derived from the wellknown Viola-Jones (VJ) framework but distinguished by three key differences. First, the proposed framework adopts multi-dimensional SURF features instead of single dimensional Haar features to describe local patches. In this way, the number of used local patches can be reduced from hundreds of thousands to several hundreds. Second, it adopts logistic regression as weak classifier for each local patch instead of decision trees in the VJ framework. Third, we adopt AUC as a single criterion for the convergence test during cascade training rather than the two trade-off criteria (false-positive-rate and hit-rate) in the VJ framework. The benefit is that the false-positive-rate can be adaptive among different cascade stages, and thus yields much faster convergence speed of SURF cascade. Combining these points together, the proposed approach has three good properties. First, the boosting cascade can be trained very efficiently. Experiments show that the proposed approach can train object detectors from billions of negative samples within one hour even on personal computers. Second, the built detector is comparable to the stateof-the-art algorithm not only on the accuracy but also on the processing speed. Third, the built detector is small in model-size due to short cascade stages.

5 0.78745013 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen

Abstract: Weakly supervised image segmentation is a challenging problem in computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image where a graphlet is a smallsized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.

6 0.78663045 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

7 0.78578937 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

8 0.78406578 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

9 0.78383404 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video

10 0.78359991 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

11 0.7827664 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision

12 0.78207797 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

13 0.7813108 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns

14 0.781299 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

15 0.78068948 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

16 0.77990383 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

17 0.77980065 438 cvpr-2013-Towards Pose Robust Face Recognition

18 0.77893573 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

19 0.77873504 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data

20 0.77855921 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification