iccv iccv2013 iccv2013-390 knowledge-graph by maker-knowledge-mining

390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection


Source: pdf

Author: Iasonas Kokkinos

Abstract: We present a method to identify and exploit structures that are shared across different object categories, by using sparse coding to learn a shared basis for the ‘part’ and ‘root’ templates of Deformable Part Models (DPMs). Our first contribution consists in using Shift-Invariant Sparse Coding (SISC) to learn mid-level elements that can translate during coding. This results in systematically better approximations than those attained using standard sparse coding. To emphasize that the learned mid-level structures are shiftable we call them shufflets. Our second contribution consists in using the resulting score to construct probabilistic upper bounds to the exact template scores, instead of taking them ‘at face value ’ as is common in current works. We integrate shufflets in DualTree Branch-and-Bound and cascade-DPMs and demonstrate that we can achieve a substantial acceleration, with practically no loss in performance.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Shufflets: shared mid-level parts for fast object detection. Iasonas Kokkinos, Ecole Centrale de Paris, INRIA Center for Visual Computing, Galen Team, INRIA-Saclay. iasonas. [sent-1, score-0.247]

2 Abstract: We present a method to identify and exploit structures that are shared across different object categories, by using sparse coding to learn a shared basis for the ‘part’ and ‘root’ templates of Deformable Part Models (DPMs). [sent-3, score-0.585]

3 This results in systematically better approximations than those attained using standard sparse coding. [sent-5, score-0.175]

4 To emphasize that the learned mid-level structures are shiftable we call them shufflets. [sent-6, score-0.221]

5 Our second contribution consists in using the resulting score to construct probabilistic upper bounds to the exact template scores, instead of taking them ‘at face value ’ as is common in current works. [sent-7, score-0.465]

6 We integrate shufflets in DualTree Branch-and-Bound and cascade-DPMs and demonstrate that we can achieve a substantial acceleration, with practically no loss in performance. [sent-8, score-0.437]

7 In this work we use the models of [8] and focus on accelerating multiple-category detection by sharing computation. [sent-11, score-0.152]

8 Part score computation is the first and most time-consuming step in the pipeline of object detection with DPMs; at this stage Histogram-of-Gradient (HOG) features [3] are convolved with part filters to provide local part scores, which are then combined to deliver object scores. [sent-12, score-0.669]

9 … Dual-Tree Branch-and-Bound (DTBB) [16]; the real bottleneck is the front-end convolution with the part filters. [sent-15, score-0.185]

10 In this work we exploit the redundancy that exists among the part filters of multiple categories to reduce the cost of computing part scores. [sent-16, score-0.435]

11 For this, we learn a common, ‘shared’, basis to reconstruct the part and root filters; this basis serves as a mid-level interface between parts and HOG features. [sent-17, score-0.689]

12 This approach was recently advocated in [29, 12], while [17, 33, 27] have pursued similar ideas, either by replacing the 32-dimensional inner products of HOG cells with lookup-based approximations [17, 33] or by building a common basis for multiple part filters [27]. [sent-18, score-0.568]

13 Along a complementary path, [6] developed a highly-optimized frequency domain acceleration technique for part score computation. [sent-19, score-0.295]

14 Figure 2: A dictionary of 128 3 × 3 shufflets learned with Shift-Invariant Sparse Coding (SISC). [sent-20, score-0.423]

15 More recently [4] used hashing to efficiently retrieve the approximate top-K scoring templates at every candidate part position. [sent-21, score-0.144]

16 Our work presents advances with respect to these works in the following aspects: First, regarding quality of approximation, compared to [29] we obtain a better approximation for the same number of terms, by allowing the mid-level parts to translate. [sent-22, score-0.133]

17 Second, regarding detection accuracy, unlike [27, 33, 29] who take the midlevel-based approximation ‘at face value’, we only use it as a rough initial estimate of the part scores, which is then refined around a subset of locations shortlisted based on the approximation. [sent-23, score-0.4]

18 For this we use the probabilistic bounding technique of [17], and thereby attain virtually identical performance to the DPMs of [8]. [sent-24, score-0.197]

19 Third, unlike [6] who compute the part scores densely, we have the option of computing part scores ‘on demand’, i.e. [sent-25, score-0.556]

20 Fourth, by virtue of being ‘shiftable’, our mid-level parts can be used as a common basis to reconstruct both the part and the root filters. [sent-28, score-0.483]

21 Finally, even though our method is linear in the number of categories (unlike the constant complexity of [4]), it has the advantage of coming with controllable accuracy, and we demonstrate that it yields virtually identical results to those of [8]. [sent-30, score-0.201]

22 In particular we combine the proposed shufflets with the Dual-Tree Branch-and-Bound technique [16, 17] which was originally developed to accelerate the stages following part computation. [sent-31, score-0.608]

23 We use shufflets to construct probabilistic upper bounds for the part scores, and use these bounds to drive branch-and-bound to a small set of candidate object locations. [sent-32, score-1.022]

24 Our algorithm eventually computes the exact values of the part scores and the correct object score, but only around the locations shortlisted by the preliminary bounding stage. [sent-33, score-0.565]

25 This spares us from computing the exact part scores at locations that do not merit it. [sent-34, score-0.466]

26 Previous Work. The idea of using shared parts to detect multiple categories has its roots in the earliest days of recognition [7, 14, 37, 5], but its application to detection with statistical object models started later. [sent-38, score-0.295]

27 A ground-breaking work was [32], which proposed the sharing of weak learners for multi-view and multi-class object detection, using region-based features as inputs to decision stumps. [sent-39, score-0.133]

28 These works have consistently demonstrated that the learned shared parts correspond to generic grouping rules at the lowest levels and more semantic configurations at the higher parts of the hierarchy. [sent-43, score-0.197]

29 The sparselets work of [29] introduced a sparse coding-based approach to express multiple part filters on a common basis and thereby accelerate detection with DPMs; in [12] this was shown to improve accuracy in large-scale category classification, if properly integrated during training. [sent-45, score-0.76]

30 In [27] the steerability of the part filters was enforced during training, and was shown to result in both better training and faster testing. [sent-46, score-0.28]

31 Even though shufflets could potentially be integrated in training, we have not explored this direction yet. [sent-47, score-0.379]

32 In that respect, when it comes to part sharing for DPMs, the most relevant works are those covered in the introduction. [sent-48, score-0.237]

33 Shufflets: shiftable mid-level parts. We start by describing the operation that we want to accelerate through our mid-level basis. [sent-50, score-0.355]

34 The score of a filter p at position y is expressed as the inner product of a weight vector wp and a HOG feature hy, sp(y) = ⟨wp, hy⟩, with wp ∈ R^D. [sent-51, score-0.409]

35 For a part filter of size v × h = 6 × 6 we have D = v · h · d. [sent-52, score-0.187]

36 So every part requires roughly a thousand multiplications per candidate location. [sent-54, score-0.207]
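
To make that cost concrete, here is a minimal numpy sketch (not the paper's code) of this baseline: dense part-score computation as a correlation of a HOG feature map with a single part filter. The map size, the 6 × 6 × 32 filter shape and all variable names are illustrative assumptions.

```python
import numpy as np

def dense_part_scores(hog, w):
    """hog: (H, W, d) HOG feature map; w: (v, h, d) part filter weights."""
    H, W, d = hog.shape
    v, h, _ = w.shape
    scores = np.zeros((H - v + 1, W - h + 1))
    for y in range(H - v + 1):
        for x in range(W - h + 1):
            # v * h * d multiplications per candidate location (~1000 for 6x6x32)
            scores[y, x] = np.sum(hog[y:y + v, x:x + h, :] * w)
    return scores

hog = np.random.randn(40, 60, 32)   # one HOG pyramid level (illustrative size)
w = np.random.randn(6, 6, 32)       # one part filter
s = dense_part_scores(hog, w)       # repeated for every part of every category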

37 Our goal is to recover a basis b of B vectors to approximate a large set of weight vectors wp, p = 1. [sent-55, score-0.206]

38 Coming now to finding a good basis b, we consider first the basis that minimizes the ℓ2 norm of the distortion, while using expansion vectors of bounded ℓ0 norm. [sent-62, score-0.319]

39 This results in the following sparse coding problem (P1): minimize over the basis elements b_j and the expansion vectors α_p the total distortion Σ_{p=1}^{P} ‖wp − Σ_{j=1}^{B} α_{p,j} b_j‖²_2 (Eq. 3), s.t. ‖α_p‖_0 ≤ L (Eq. 4) and ‖b_j‖_2 = 1 (Eq. 5). [sent-63, score-0.141]

40 Eq. 3 penalizes the distortion of the basis elements, the constraint in Eq. [sent-76, score-0.206]

41 4 enforces the sparsity of the expansion vector and the constraint in Eq. [sent-77, score-0.113]

42 5 ensures that the basis elements will have unit ℓ2 norm. [sent-78, score-0.275]

43 This standard sparse coding formulation was used for sparselets [29]. [sent-80, score-0.204]
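
As a concrete illustration of the coding step in (P1), the following sketch approximates one flattened filter with at most L atoms of a shared basis via Orthogonal Matching Pursuit. The basis here is random and all sizes are illustrative assumptions, whereas in the paper (and in [29]) the basis is learned jointly with the expansion coefficients.

```python
import numpy as np

def omp(w, basis, L):
    """Greedy OMP: basis is (D, B) with unit-l2 columns; returns a B-dim code
    with at most L non-zero coefficients approximating w."""
    D, B = basis.shape
    alpha = np.zeros(B)
    support, residual = [], w.copy()
    for _ in range(L):
        corr = basis.T @ residual
        if support:
            corr[support] = 0.0                # do not reselect chosen atoms
        support.append(int(np.argmax(np.abs(corr))))
        sub = basis[:, support]
        coef, *_ = np.linalg.lstsq(sub, w, rcond=None)
        residual = w - sub @ coef
    alpha[support] = coef
    return alpha

D, B, L = 6 * 6 * 32, 128, 8                   # illustrative sizes
basis = np.random.randn(D, B)
basis /= np.linalg.norm(basis, axis=0)         # unit l2 norm, as in Eq. 5
w_p = np.random.randn(D)                       # one flattened part filter
alpha_p = omp(w_p, basis, L)                   # ||alpha_p||_0 <= L, as in Eq. 4
w_hat = basis @ alpha_p                        # reconstruction penalized by Eq. 3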

44 We propose instead to use Shift-Invariant Sparse coding, which intuitively amounts to having ‘shiftable’ basis elements. [sent-81, score-0.206]

45 Namely, instead of using a basis with arbitrary vectors of dimension v · h · d, we first consider a kernel k for our basis elements, defined on a smaller domain of size v0 · h0 · d. [sent-83, score-0.412]

46 We then allow the kernel to be displaced, requiring the displaced kernel to be fully contained in the original basis domain. [sent-85, score-0.284]

47 This illustrates the merit of this scheme: during the approximation of wp we are free to use any of the replicas kν,η of k; but at test time we only need to convolve with a single replica k0,0, and then use appropriately displaced convolution scores. [sent-91, score-0.471]
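
The following hedged sketch shows that test-time benefit of shiftable atoms: each small kernel is correlated with the HOG map once, shared across all parts and categories, and a part's approximate score is then read off as a weighted sum of displaced entries of these precomputed maps. The kernel sizes, offsets and coefficients are illustrative assumptions, not values from the paper.

```python
import numpy as np

def kernel_score_map(hog, k):
    """Dense correlation of one small kernel k of shape (v0, h0, d) with the HOG map."""
    H, W, d = hog.shape
    v0, h0, _ = k.shape
    out = np.zeros((H - v0 + 1, W - h0 + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(hog[y:y + v0, x:x + h0, :] * k)
    return out

def shufflet_part_score(kernel_maps, atoms, y, x):
    """atoms: list of (kernel_id, dy, dx, alpha) describing w_hat for one part;
    the approximate score at (y, x) is a weighted sum of displaced map entries."""
    return sum(a * kernel_maps[kid][y + dy, x + dx] for kid, dy, dx, a in atoms)

hog = np.random.randn(40, 60, 32)
kernels = [np.random.randn(3, 3, 32) for _ in range(4)]    # e.g. 3x3 shufflets
kernel_maps = [kernel_score_map(hog, k) for k in kernels]  # computed once, shared
atoms = [(0, 0, 0, 0.7), (2, 3, 1, -0.4), (1, 1, 2, 0.2)]  # one part's expansion
s_hat = shufflet_part_score(kernel_maps, atoms, y=10, x=20)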

48 In Eqs. 10–11, the last constraint indicates that the replica k_{νb,ηb} of kernel kb is used to form the b-th basis element. [sent-109, score-0.31]

49 For known basis elements, the expansion coefficients are obtained through OMP, as in P1. [sent-111, score-0.319]

50 However the estimation of the basis elements is more challenging, since we no longer optimize over an arbitrary basis b, but rather constrain the basis to be ‘shiftable’, through the set of constraints in Eq. [sent-112, score-0.687]

51 Eq. 11 becomes decoupled in the different frequencies; we refer to [13] for further details, since we rely entirely on their approach for basis learning. [sent-116, score-0.206]

52 Our main difference with the method of [13] is that we use the ℓ0 instead of the ℓ1 sparsity penalty on the expansion vectors. [sent-117, score-0.113]

53 We demonstrate in Table 1 the decrease in reconstruction error attained by our method, when compared to the simpler sparse coding technique employed in [29]. [sent-119, score-0.199]

54 Actually we used the sparse coding results of [29] as initialization for our optimization procedure, which practically ensures a better approximation quality. [sent-120, score-0.283]

55 Furthermore, the approximation of the root scores can be effortlessly achieved by our method; qualitative results are provided in Fig. [sent-122, score-0.302]

56 From approximate scores to score bounds. We now describe how shufflets can be used to provide not only an estimate of the part score, but also a probabilistic upper bound to it. [sent-125, score-1.125]

57 This allows us to avoid a pitfall of (figure caption: Root filters for components 1 and 3 of class ‘aeroplane’; reconstructions using shufflets (SISC)) [sent-126, score-0.478]

58 most of the existing works on fast approximate part score computation, which take the approximated score ‘at face value’ and thereby incur performance degradation as faster (and cruder) approximations are used. [sent-129, score-0.443]

59 Instead, as in [17], we consider that we should be using the shufflet-based scores only as proxies to the exact part scores; these proxies serve to reduce the set of locations that are considered for a subsequent, more refined estimation of the exact part score. [sent-130, score-0.724]

60 For this, we efficiently compute upper bounds to the part scores, and use them in bounding-based detection with DPMs. [sent-131, score-0.451]

61 This guarantees accuracy, and allows us to get the best of both worlds, namely a fast ‘bottom-up’ detection of objects with quickly computable scores, which is then complemented by an exact, ‘top-down’ score refinement around promising locations [17, 18]. [sent-132, score-0.251]

62 Turning to how this can be achieved, we construct bounds on the approximation error ε: [sent-133, score-0.254]

63 ε = s − ŝ = ⟨w, h⟩ − ⟨ŵ, h⟩ (12), between the exact score s obtained by convolving with the exact filter w and the approximate score ŝ obtained with the shufflet-based reconstruction ŵ. [sent-134, score-0.289]

64 We treat the approximation error ε as a random variable, and construct probabilistic bounds that are valid with controllable accuracy. [sent-140, score-0.258]

65 which provides upper and lower bounds for s in terms of ŝ and the second moment of ε. What remains is to express the second moment of ε. [sent-150, score-0.166]

66 ε_c = Σ_{d=1}^{32} (w[c,d] − ŵ[c,d]) h[c,d] (16), with c ranging over the C HOG cells comprising a part filter, and d ranging over the 32 HOG cell dimensions. [sent-156, score-0.189]

67 Namely, for every cell c we are taking the inner product between the part score approximation error w[c, ·] − ŵ[c, ·] and the associated HOG cell, h[c, ·]. [sent-157, score-0.427]

68 What we gain in this way is that instead of computing a 32-dimensional inner product per HOG cell, we only need to multiply the two quantities on the right of Eq. [sent-167, score-0.102]

69 Eq. 14 is then used to construct the upper and lower bounds to the score. [sent-174, score-0.248]
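
A minimal sketch of this bounding step, under simplifying assumptions: the per-cell approximation-error norms are precomputed offline, only one multiplication per HOG cell is needed at test time, and a Chebyshev-style radius derived from a second-moment surrogate yields the probabilistic lower/upper bounds. This is not the paper's exact estimator, and all names are hypothetical.

```python
import numpy as np

def probabilistic_bounds(s_hat, dw_norms, h_norms, p_e):
    """s_hat: shufflet-based score approximation at one location;
    dw_norms[c] = ||w[c,:] - w_hat[c,:]||_2 (precomputed per filter);
    h_norms[c]  = ||h[c,:]||_2 at the candidate location;
    p_e: allowed probability that the bound fails."""
    eps_sq = np.sum((dw_norms * h_norms) ** 2)   # crude surrogate for E[eps^2]
    delta = np.sqrt(eps_sq / p_e)                # Chebyshev-style radius
    return s_hat - delta, s_hat + delta          # lower and upper bound on s

dw_norms = np.random.rand(36)                    # 6 x 6 = 36 HOG cells (illustrative)
h_norms = np.random.rand(36)
lo, hi = probabilistic_bounds(1.2, dw_norms, h_norms, p_e=0.1)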

70 The quantities involved in this bounding scheme are illustrated in Fig. [sent-175, score-0.104]

71 Figure 4: Part score approximation and bounding: our goal is to rapidly bound the value of the part score s shown on the top right. [sent-179, score-0.506]

72 The bound of Eq. 14 is formed from two quantities, a shufflet-based approximation in (d) and an estimate of the approximation error’s second moment, shown in (c). [sent-181, score-0.168]

73 The values of the lower and upper bounds for pe = 0. [sent-183, score-0.328]

74 Integration with DPM detection. We now describe how we integrate the bound obtained above with the Dual-Tree Branch-and-Bound (DTBB) method [16, 17]; for lack of space, we refer to [16, 17] for details, and intend to provide a thorough presentation in a forthcoming technical report. [sent-187, score-0.139]

75 In [16] Branch-and-Bound is used to bypass Generalized Distance Transforms (GDTs); this can result in substantial speedups for the part combination phase. [sent-188, score-0.144]

76 However in [16] we consider the part scores to be computed in advance of DTBB, while this is actually the main bottleneck in detection with DPMs. [sent-189, score-0.337]

77 Instead, following [17], we now integrate into DTBB our efficient upper bound of the part scores. [sent-190, score-0.302]

78 The computation of the bounds relevant to the geometric term B[x0, x] exploits the fact that Xs, Xd are rectangular, and is detailed in [16]. [sent-194, score-0.17]

79 Coming to bounding the unary terms, the approach of [16] has been to compute the exact part scores s[x] at every location and then obtain the interval bounds S̄, S̲. [sent-195, score-0.412]

80 Instead we propose to accelerate the computation of S̄, S̲ by initially sidestepping the computation of s[x], using the probabilistic bound of Eq. 14. [sent-197, score-0.294]

81 The resulting quantities s̲[x], s̄[x] are, with probability 1 − pe, lower and upper bounds of s[x] respectively. [sent-199, score-0.328]

82 Based on these we can upper and lower bound S as follows: max_{x0 ∈ Xs} s[x0] ≤ max_{x0 ∈ Xs} s̄[x0] and min_{x0 ∈ Xs} s[x0] ≥ min_{x0 ∈ Xs} s̲[x0], and we thereby use s̄, s̲ as surrogates for s[x] in DTBB. [sent-200, score-0.272]
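
A minimal sketch of how such per-location bounds plug into interval-based reasoning in DTBB, with illustrative map sizes: the max of s[x] over a rectangular interval is upper-bounded by the max of the per-location upper bounds, and the min is lower-bounded by the min of the lower bounds.

```python
import numpy as np

def interval_bounds(s_lo, s_hi, y0, y1, x0, x1):
    """s_lo, s_hi: 2D maps of per-location lower/upper score bounds;
    [y0:y1, x0:x1] is the rectangular interval currently being branched on."""
    S_hi = s_hi[y0:y1, x0:x1].max()   # upper-bounds max of s[x] over the interval
    S_lo = s_lo[y0:y1, x0:x1].min()   # lower-bounds min of s[x] over the interval
    return S_lo, S_hi

s_hi = np.random.randn(40, 60) + 0.5
s_lo = s_hi - 1.0
S_lo, S_hi = interval_bounds(s_lo, s_hi, 8, 16, 20, 36)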

83 As soon as Branch-and-Bound converges to singleton intervals, we evaluate the exact part scores, s[x]; as we show in the experimental section, this boosts performance when compared to using only the midlevel-based approximation. [sent-202, score-0.221]

84 This more elaborate computation however is performed around a drastically reduced set of points, namely around those image locations that survive the first, quick bounding phase. [sent-203, score-0.232]

85 Our method thereby combines the speed of shared part filter computation with the accuracy of DPMs. [sent-204, score-0.381]

86 The remedy used in [8] is to use separate conservative thresholds for the PCA-based part scores, and estimate them from the training data. [sent-210, score-0.202]

87 Even though we have the option of using our bounding-based scheme to compute upper bounds throughout, we can now avoid it altogether, since the empirically computed thresholds for cascades can take into account the distortion due to the shufflet approximation. [sent-213, score-0.438]
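
For the cascade variant, here is a hedged sketch of the pruning logic described here, with entirely hypothetical scoring functions and thresholds: hypotheses whose running approximate score drops below a conservatively calibrated per-part threshold are discarded, and only the survivors are re-scored exactly.

```python
def cascade_detect(hypotheses, approx_score, exact_score, thresholds):
    """hypotheses: candidate root locations; approx_score(h, i) / exact_score(h, i)
    give the i-th part score for hypothesis h; thresholds[i] are calibrated on
    training data so that they absorb the shufflet approximation distortion."""
    survivors = []
    for h in hypotheses:
        total, alive = 0.0, True
        for i, t in enumerate(thresholds):
            total += approx_score(h, i)
            if total < t:        # prune before all parts have been scored
                alive = False
                break
        if alive:                # exact re-scoring only for shortlisted hypotheses
            survivors.append((h, sum(exact_score(h, i) for i in range(len(thresholds)))))
    return survivors

hyps = [(y, x) for y in range(0, 40, 8) for x in range(0, 60, 8)]
dets = cascade_detect(hyps,
                      approx_score=lambda h, i: 0.05 * (h[0] % 7) - 0.1 * i,
                      exact_score=lambda h, i: 0.05 * (h[0] % 7),
                      thresholds=[-0.3, -0.5, -0.7])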

88 We report Average Precision scores on all classes in Table 2. [sent-217, score-0.134]

89 In Fig. 5 we provide precision-recall curves for bicycle detection when using (a) the ‘golden standard’ Truncated Distance Transform (TDT) detector of [11] and (b) the raw shufflet score for different combinations of basis size, B, and expansion size, L (we use 2L terms for the larger root filters). [sent-219, score-0.751]

90 As expected, increasing the size of the basis and the expansion improves performance, but there still remains a gap in performance. [sent-220, score-0.319]

91 Moreover we observe that the exact value of the probability of error pe is not that important, and the performance is robust even for relatively large values of pe. [sent-222, score-0.157]

92 The first row indicates the time spent to compute part scores by the different methods, and the following rows indicate detection times. [sent-224, score-0.337]

93 For more conservative threshold values the part score is fully evaluated at more points, and the merits of having a quick first pass are eliminated. [sent-225, score-0.339]

94 … our shufflet-based approximation turns out to be faster than both DTBB and TDT for moderate values of the threshold θ. [sent-228, score-0.121]

95 We note that these timings are (i) for a single-threaded implementation and (ii) do not include steps that are shared (Figure 5 caption: Precision-recall curves for bicycles on PASCAL VOC 2007; (a) Shufflet parameter variation, (b) Raw vs bounding-based score) [sent-233, score-0.281]

96 The left plot shows the performance of the raw, shufflet-based score for different shufflet parameter combinations, and the right plot shows the performance of the bounding-based scheme. [sent-235, score-0.289]

97 … by all classes, namely HOG pyramid construction and shufflet convolutions. [sent-258, score-0.233]

98 Conclusion. In this work we have introduced shufflets, a shiftable basis for mid-level part representation, demonstrated its usefulness for part sharing in DPMs, and introduced probabilistic [sent-264, score-0.808]

99 bounds to accommodate the effects of distortions due to approximations on this basis, thereby enabling fast and accurate detection with DPMs. [sent-285, score-0.344]

100 In future work we intend to explore how this basis can be exploited during training [2, 12, 27], incorporated in hierarchical models [18, 34] and used for scalable object detection [4], while also exploring connections with convolutional models [35, 20, 15, 19]. [sent-286, score-0.422]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('shufflets', 0.379), ('shiftable', 0.221), ('basis', 0.206), ('dpms', 0.203), ('dtbb', 0.19), ('shufflet', 0.19), ('bounds', 0.17), ('part', 0.144), ('scores', 0.134), ('expansion', 0.113), ('tdt', 0.112), ('wp', 0.106), ('sisc', 0.104), ('shared', 0.099), ('filters', 0.099), ('score', 0.099), ('sharing', 0.093), ('hog', 0.09), ('moment', 0.088), ('coding', 0.088), ('accelerate', 0.085), ('approximation', 0.084), ('root', 0.084), ('hy', 0.083), ('timings', 0.083), ('pe', 0.08), ('bound', 0.08), ('convolutional', 0.078), ('upper', 0.078), ('displaced', 0.078), ('maxx', 0.078), ('hyi', 0.078), ('exact', 0.077), ('bx', 0.075), ('bb', 0.073), ('reconstructions', 0.071), ('aeroplane', 0.07), ('elements', 0.069), ('cascaded', 0.066), ('approximations', 0.064), ('xd', 0.063), ('gdts', 0.063), ('hwp', 0.063), ('ppx', 0.063), ('rpme', 0.063), ('shortlisted', 0.063), ('sparselets', 0.063), ('surrogates', 0.063), ('multiplications', 0.063), ('ip', 0.061), ('merit', 0.061), ('kb', 0.059), ('detection', 0.059), ('deformable', 0.058), ('coming', 0.058), ('practically', 0.058), ('attained', 0.058), ('conservative', 0.058), ('bounding', 0.057), ('dualtree', 0.056), ('replicas', 0.056), ('inner', 0.055), ('sparse', 0.053), ('acceleration', 0.052), ('thereby', 0.051), ('locations', 0.05), ('parts', 0.049), ('proxies', 0.049), ('virtually', 0.048), ('categories', 0.048), ('quantities', 0.047), ('xc', 0.047), ('golden', 0.047), ('controllable', 0.047), ('omp', 0.047), ('cell', 0.045), ('grosse', 0.045), ('replica', 0.045), ('voc', 0.044), ('dictionary', 0.044), ('computation', 0.044), ('sz', 0.043), ('filter', 0.043), ('namely', 0.043), ('leibe', 0.043), ('xs', 0.043), ('operations', 0.042), ('probabilistic', 0.041), ('convolution', 0.041), ('object', 0.04), ('song', 0.039), ('scalable', 0.039), ('pascal', 0.039), ('quick', 0.038), ('girshick', 0.038), ('turning', 0.038), ('hw', 0.038), ('faster', 0.037), ('deep', 0.037), ('kokkinos', 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection

Author: Iasonas Kokkinos

Abstract: We present a method to identify and exploit structures that are shared across different object categories, by using sparse coding to learn a shared basis for the ‘part’ and ‘root’ templates of Deformable Part Models (DPMs). Our first contribution consists in using Shift-Invariant Sparse Coding (SISC) to learn mid-level elements that can translate during coding. This results in systematically better approximations than those attained using standard sparse coding. To emphasize that the learned mid-level structures are shiftable we call them shufflets. Our second contribution consists in using the resulting score to construct probabilistic upper bounds to the exact template scores, instead of taking them ‘at face value ’ as is common in current works. We integrate shufflets in DualTree Branch-and-Bound and cascade-DPMs and demonstrate that we can achieve a substantial acceleration, with practically no loss in performance.

2 0.14107373 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

Author: Ross Girshick, Jitendra Malik

Abstract: In this paper, we show how to train a deformable part model (DPM) fast—typically in less than 20 minutes, or four times faster than the current fastest method—while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and partbased models, and have practical implications for speeding up tasks such as model selection.

3 0.11142086 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell

Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.

4 0.10762506 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables

Author: Daozheng Chen, Dhruv Batra, William T. Freeman

Abstract: Latent variables models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ?1-?2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of latent space without any significant loss in accuracy of the learnt model.

5 0.10694591 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies

Author: Min Sun, Wan Huang, Silvio Savarese

Abstract: Many methods have been proposed to solve the image classification problem for a large number of categories. Among them, methods based on tree-based representations achieve good trade-off between accuracy and test time efficiency. While focusing on learning a tree-shaped hierarchy and the corresponding set of classifiers, most of them [11, 2, 14] use a greedy prediction algorithm for test time efficiency. We argue that the dramatic decrease in accuracy at high efficiency is caused by the specific design choice of the learning and greedy prediction algorithms. In this work, we propose a classifier which achieves a better trade-off between efficiency and accuracy with a given tree-shaped hierarchy. First, we convert the classification problem as finding the best path in the hierarchy, and a novel branchand-bound-like algorithm is introduced to efficiently search for the best path. Second, we jointly train the classifiers using a novel Structured SVM (SSVM) formulation with additional bound constraints. As a result, our method achieves a significant 4.65%, 5.43%, and 4.07% (relative 24.82%, 41.64%, and 109.79%) improvement in accuracy at high efficiency compared to state-of-the-art greedy “tree-based” methods [14] on Caltech-256 [15], SUN [32] and ImageNet 1K [9] dataset, respectively. Finally, we show that our branch-and-bound-like algorithm naturally ranks the paths in the hierarchy (Fig. 8) so that users can further process them.

6 0.10378783 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images

7 0.10317552 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection

8 0.1010516 189 iccv-2013-HOGgles: Visualizing Object Detection Features

9 0.10021653 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation

10 0.099887051 379 iccv-2013-Semantic Segmentation without Annotating Segments

11 0.092680864 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary

12 0.092354052 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition

13 0.091467865 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency

14 0.089814954 34 iccv-2013-Abnormal Event Detection at 150 FPS in MATLAB

15 0.089241751 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization

16 0.088129148 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

17 0.087797381 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking

18 0.085399508 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition

19 0.081804834 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration

20 0.080882058 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.212), (1, 0.041), (2, -0.033), (3, -0.055), (4, -0.003), (5, -0.033), (6, -0.061), (7, 0.028), (8, -0.068), (9, -0.083), (10, 0.022), (11, 0.008), (12, -0.04), (13, -0.06), (14, -0.006), (15, 0.017), (16, 0.042), (17, 0.076), (18, 0.079), (19, 0.013), (20, -0.02), (21, 0.031), (22, -0.023), (23, 0.007), (24, -0.042), (25, -0.03), (26, 0.013), (27, -0.044), (28, 0.027), (29, 0.004), (30, 0.016), (31, -0.002), (32, -0.021), (33, 0.009), (34, 0.05), (35, -0.023), (36, 0.006), (37, 0.018), (38, 0.081), (39, 0.015), (40, -0.047), (41, 0.008), (42, 0.101), (43, 0.033), (44, -0.04), (45, -0.025), (46, 0.009), (47, -0.047), (48, -0.069), (49, 0.089)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94494462 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection

Author: Iasonas Kokkinos

Abstract: We present a method to identify and exploit structures that are shared across different object categories, by using sparse coding to learn a shared basis for the ‘part’ and ‘root’ templates of Deformable Part Models (DPMs). Our first contribution consists in using Shift-Invariant Sparse Coding (SISC) to learn mid-level elements that can translate during coding. This results in systematically better approximations than those attained using standard sparse coding. To emphasize that the learned mid-level structures are shiftable we call them shufflets. Our second contribution consists in using the resulting score to construct probabilistic upper bounds to the exact template scores, instead of taking them ‘at face value ’ as is common in current works. We integrate shufflets in DualTree Branch-and-Bound and cascade-DPMs and demonstrate that we can achieve a substantial acceleration, with practically no loss in performance.

2 0.79172641 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

Author: Ross Girshick, Jitendra Malik

Abstract: In this paper, we show how to train a deformable part model (DPM) fast—typically in less than 20 minutes, or four times faster than the current fastest method—while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and partbased models, and have practical implications for speeding up tasks such as model selection.

3 0.70808923 189 iccv-2013-HOGgles: Visualizing Object Detection Features

Author: Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba

Abstract: We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on ‘HOG goggles ’ and perceive the visual world as a HOG based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector’s failures. For example, when we visualize the features for high scoring false alarms, we discovered that, although they are clearly wrong in image space, they do look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and indicates that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors. By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.

4 0.69141382 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation

Author: Jian Sun, Jean Ponce

Abstract: In this paper, we address the problem of learning discriminative part detectors from image sets with category labels. We propose a novel latent SVM model regularized by group sparsity to learn these part detectors. Starting from a large set of initial parts, the group sparsity regularizer forces the model to jointly select and optimize a set of discriminative part detectors in a max-margin framework. We propose a stochastic version of a proximal algorithm to solve the corresponding optimization problem. We apply the proposed method to image classification and cosegmentation, and quantitative experiments with standard benchmarks show that it matches or improves upon the state of the art.

5 0.69089532 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition

Author: João F. Henriques, João Carreira, Rui Caseiro, Jorge Batista

Abstract: Competitive sliding window detectors require vast training sets. Since a pool of natural images provides a nearly endless supply of negative samples, in the form of patches at different scales and locations, training with all the available data is considered impractical. A staple of current approaches is hard negative mining, a method of selecting relevant samples, which is nevertheless expensive. Given that samples at slightly different locations have overlapping support, there seems to be an enormous amount of duplicated work. It is natural, then, to ask whether these redundancies can be eliminated. In this paper, we show that the Gram matrix describing such data is block-circulant. We derive a transformation based on the Fourier transform that block-diagonalizes the Gram matrix, at once eliminating redundancies and partitioning the learning problem. This decomposition is valid for any dense features and several learning algorithms, and takes full advantage of modern parallel architectures. Surprisingly, it allows training with all the potential samples in sets of thousands of images. By considering the full set, we generate in a single shot the optimal solution, which is usually obtained only after several rounds of hard negative mining. We report speed gains on Caltech Pedestrians and INRIA Pedestrians of over an order of magnitude, allowing training on a desktop computer in a couple of minutes.

6 0.68504781 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables

7 0.67546934 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry

8 0.67403948 104 iccv-2013-Decomposing Bag of Words Histograms

9 0.66104931 241 iccv-2013-Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection

10 0.66006696 277 iccv-2013-Multi-channel Correlation Filters

11 0.65418971 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition

12 0.64308584 220 iccv-2013-Joint Deep Learning for Pedestrian Detection

13 0.64052117 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

14 0.63589603 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?

15 0.63497525 349 iccv-2013-Regionlets for Generic Object Detection

16 0.60831076 258 iccv-2013-Low-Rank Sparse Coding for Image Classification

17 0.60650671 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

18 0.60434145 388 iccv-2013-Shape Index Descriptors Applied to Texture-Based Galaxy Analysis

19 0.59822702 21 iccv-2013-A Method of Perceptual-Based Shape Decomposition

20 0.59676415 401 iccv-2013-Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.035), (7, 0.011), (12, 0.015), (26, 0.074), (31, 0.036), (34, 0.015), (35, 0.015), (40, 0.011), (42, 0.086), (48, 0.013), (64, 0.041), (73, 0.025), (89, 0.367), (93, 0.148), (95, 0.013), (98, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95598716 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection

Author: Iasonas Kokkinos

Abstract: We present a method to identify and exploit structures that are shared across different object categories, by using sparse coding to learn a shared basis for the ‘part’ and ‘root’ templates of Deformable Part Models (DPMs). Our first contribution consists in using Shift-Invariant Sparse Coding (SISC) to learn mid-level elements that can translate during coding. This results in systematically better approximations than those attained using standard sparse coding. To emphasize that the learned mid-level structures are shiftable we call them shufflets. Our second contribution consists in using the resulting score to construct probabilistic upper bounds to the exact template scores, instead of taking them ‘at face value ’ as is common in current works. We integrate shufflets in DualTree Branch-and-Bound and cascade-DPMs and demonstrate that we can achieve a substantial acceleration, with practically no loss in performance.

2 0.94534779 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation

Author: David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof

Abstract: In this work we present a novel method for the challenging problem of depth image upsampling. Modern depth cameras such as Kinect or Time of Flight cameras deliver dense, high quality depth measurements but are limited in their lateral resolution. To overcome this limitation we formulate a convex optimization problem using higher order regularization for depth image upsampling. In this optimization an anisotropic diffusion tensor, calculated from a high resolution intensity image, is used to guide the upsampling. We derive a numerical algorithm based on a primaldual formulation that is efficiently parallelized and runs at multiple frames per second. We show that this novel upsampling clearly outperforms state of the art approaches in terms of speed and accuracy on the widely used Middlebury 2007 datasets. Furthermore, we introduce novel datasets with highly accurate groundtruth, which, for the first time, enable to benchmark depth upsampling methods using real sensor data.

3 0.94072086 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition

Author: Ling Wang, Hichem Sahbi

Abstract: One of the trends of action recognition consists in extracting and comparing mid-level features which encode visual and motion aspects of objects into scenes. However, when scenes contain high-level semantic actions with many interacting parts, these mid-level features are not sufficient to capture high level structures as well as high order causal relationships between moving objects resulting into a clear drop in performances. In this paper, we address this issue and we propose an alternative action recognition method based on a novel graph kernel. In the main contributions of this work, we first describe actions in videos using directed acyclic graphs (DAGs), that naturally encode pairwise interactions between moving object parts, and then we compare these DAGs by analyzing the spectrum of their sub-patterns that capture complex higher order interactions. This extraction and comparison process is computationally tractable, re- sulting from the acyclic property of DAGs, and it also defines a positive semi-definite kernel. When plugging the latter into support vector machines, we obtain an action recognition algorithm that overtakes related work, including graph-based methods, on a standard evaluation dataset.

4 0.92617071 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions

Author: Bernhard Zeisl, Kevin Köser, Marc Pollefeys

Abstract: We address the problem of wide-baseline registration of RGB-D data, such as photo-textured laser scans without any artificial targets or prediction on the relative motion. Our approach allows to fully automatically register scans taken in GPS-denied environments such as urban canyon, industrial facilities or even indoors. We build upon image features which are plenty, localized well and much more discriminative than geometry features; however, they suffer from viewpoint distortions and request for normalization. We utilize the principle of salient directions present in the geometry and propose to extract (several) directions from the distribution of surface normals or other cues such as observable symmetries. Compared to previous work we pose no requirements on the scanned scene (like containing large textured planes) and can handle arbitrary surface shapes. Rendering the whole scene from these repeatable directions using an orthographic camera generates textures which are identical up to 2D similarity transformations. This ambiguity is naturally handled by 2D features and allows to find stable correspondences among scans. For geometric pose estimation from tentative matches we propose a fast and robust 2 point sample consensus scheme integrating an early rejection phase. We evaluate our approach on different challenging real world scenes.

5 0.92587107 2 iccv-2013-3D Scene Understanding by Voxel-CRF

Author: Byung-Soo Kim, Pushmeet Kohli, Silvio Savarese

Abstract: Scene understanding is an important yet very challenging problem in computer vision. In the past few years, researchers have taken advantage of the recent diffusion of depth-RGB (RGB-D) cameras to help simplify the problem of inferring scene semantics. However, while the added 3D geometry is certainly useful to segment out objects with different depth values, it also adds complications in that the 3D geometry is often incorrect because of noisy depth measurements and the actual 3D extent of the objects is usually unknown because of occlusions. In this paper we propose a new method that allows us to jointly refine the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the 3D reconstruction. This is achieved by introducing a new model which we called Voxel-CRF. The Voxel-CRF model is based on the idea of constructing a conditional random field over a 3D volume of interest which captures the semantic and 3D geometric relationships among different elements (voxels) of the scene. Such model allows to jointly estimate (1) a dense voxel-based 3D reconstruction and (2) the semantic labels associated with each voxel even in presence of par- tial occlusions using an approximate yet efficient inference strategy. We evaluated our method on the challenging NYU Depth dataset (Version 1and 2). Experimental results show that our method achieves competitive accuracy in inferring scene semantics and visually appealing results in improving the quality of the 3D reconstruction. We also demonstrate an interesting application of object removal and scene completion from RGB-D images.

6 0.92585915 129 iccv-2013-Dynamic Scene Deblurring

7 0.92457998 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos

8 0.92348671 302 iccv-2013-Optimization Problems for Fast AAM Fitting in-the-Wild

9 0.92334378 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects

10 0.92303729 103 iccv-2013-Deblurring by Example Using Dense Correspondence

11 0.92248088 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces

12 0.92192137 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera

13 0.92072296 317 iccv-2013-Piecewise Rigid Scene Flow

14 0.91804028 174 iccv-2013-Forward Motion Deblurring

15 0.91790068 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction

16 0.91749763 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search

17 0.91713929 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching

18 0.91688597 81 iccv-2013-Combining the Right Features for Complex Event Recognition

19 0.91681743 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation

20 0.91606236 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences