iccv iccv2013 iccv2013-77 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhenyang Li, Efstratios Gavves, Koen E.A. van_de_Sande, Cees G.M. Snoek, Arnold W.M. Smeulders
Abstract: In this paper we aim for segmentation and classification of objects. We propose codemaps that are a joint formulation of the classification score and the local neighborhood it belongs to in the image. We obtain the codemap by reordering the encoding, pooling and classification steps over lattice elements. Other than existing linear decompositions that emphasize only the efficiency benefits for localized search, we make three novel contributions. As a preliminary, we provide a theoretical generalization of the sufficient mathematical conditions under which image encodings and classification become locally decomposable. As a first novelty we introduce ℓ2 normalization for arbitrarily shaped image regions, which is fast enough for semantic segmentation using our Fisher codemaps. Second, using the same lattice across images, we propose kernel pooling which embeds nonlinearities into codemaps for object classification by explicit or approximate feature mappings. Results demonstrate that ℓ2 normalized Fisher codemaps improve the state-of-the-art in semantic segmentation for PASCAL VOC. For object classification the addition of nonlinearities brings us on par with the state-of-the-art, but is 3x faster. Because of the codemaps' inherent efficiency, we can reach significant speed-ups for localized search as well. We exploit the efficiency gain for our third novelty: object segment retrieval using a single query image only.
Reference: text
sentIndex sentText sentNum sentScore
1 We propose codemaps that are a joint formulation of the classification score and the local neighborhood it belongs to in the image. [sent-8, score-0.899]
2 We obtain the codemap by reordering the encoding, pooling and classification steps over lattice elements. [sent-9, score-0.534]
3 As first novelty we introduce ℓ2 normalization for arbitrarily shaped image regions, which is fast enough for semantic segmentation using our Fisher codemaps. [sent-12, score-0.287]
4 Second, using the same lattice across images, we propose kernel pooling which embeds nonlinearities into codemaps for object classification by explicit or approximate feature mappings. [sent-13, score-1.365]
5 Results demonstrate that ℓ2 normalized Fisher codemaps improve the state-of-the-art in semantic segmentation for PASCAL VOC. [sent-14, score-0.96]
6 Because of the codemaps' inherent efficiency, we can reach significant speed-ups for localized search as well. [sent-16, score-0.863]
7 Codemaps segment, classify and search objects locally by reordering the encoding, pooling and classification steps of object classification. [sent-22, score-0.318]
8 Different from existing linear decompositions for specific pipelines, codemaps are generic, embed fast ℓ2 normalization, include nonlinearities by local kernel pooling and allow for segment retrieval using a single query image only. [sent-23, score-1.291]
9 Pushing localization with state-of-the-art encodings to an extreme, [8] classifies pixels with a Fisher kernel for semantic segmentation, as application of Fisher on regions would be practically infeasible. [sent-33, score-0.294]
10 We show that reordering the processing steps for object type classification into local pooling before classification has considerable advantages. [sent-36, score-0.349]
11 Where [17, 34] have shown the efficiency benefits of such decompositions for unnormalized bag-of-words with a linear classifier, our codemaps make three novel contributions. [sent-37, score-0.949]
12 In the first novelty, we use this result to introduce codemaps with ℓ2 normalization for arbitrarily shaped image regions (Section 4), essential to reach a better than state-of-the-art performance in semantic segmentation [1, 5]. [sent-39, score-1.122]
13 In the second novelty, using the same lattice across images, we include nonlinearity in the decomposition by local kernel pooling (Section 5), to bring us on par with the state-of-the-art in object classification [23], but 3x faster. [sent-40, score-0.446]
14 Thirdly, we demonstrate the effectiveness of codemaps in object segment retrieval from a single query image (Section 6). [sent-41, score-0.941]
15 Related work We structure our discussion on related work by the subsequent steps of (localized) object type segmentation and classification: semantic segmentation, feature encoding, feature pooling and kernel classification. [sent-43, score-0.428]
16 The feature pooling spatially aggregates the relevant local feature encodings into a global image representation. [sent-60, score-0.339]
17 We show that pooling over a region of interest is equivalent to a simpler two-level pooling for a particular family of mathematical functions. [sent-65, score-0.394]
18 Last but not least, we propose kernel pooling which embeds nonlinearities by explicit or approximate feature mappings [21,32] to assure state-of-the-art competitiveness [23]. [sent-77, score-0.397]
19 We call our approach codemaps and we will now highlight its theoretical foundation in a preliminary. [sent-78, score-0.811]
20 To ensure good generalization and flexibility, we consider that a) each node gi of the lattice is arbitrarily sized, shaped, and nonoverlapping, i. [sent-84, score-0.239]
21 gi ∩ gj = ∅, ∀gi, gj ∈ G, i ≠ j, and b) each area R where we search for the objects of interest is composed of multiple nodes R = g1 . [sent-86, score-0.229]
22 The pooling function h(R) combines these local codes within the region R to arrive at its global feature encoding. [sent-101, score-0.25]
23 , f(h(gl))), (2) where q is a classification pooling function that aggregates the localized classifier decisions over a region of interest. [sent-115, score-0.319]
24 (2) we see that the pooling function h needs to be applied to each of the lexes gi separately. [sent-117, score-0.488]
25 (1), we arrive at the first condition for obtaining a valid codemap: Condition 1 The pooling function h : A → B must be homomorphic from the space A to space B, so that h(R) = h(U_A(g1, g2, ..., gl)) = U_B(h(g1), h(g2), ..., h(gl)), (3) [sent-119, score-0.278]
26 where A refers to the spatial domain formed by lexes {gi}, and B refers to the code pooling space defined by h. [sent-121, score-0.4]
27 When h stands for sum pooling or max pooling, UB is the sum operator or max operator. [sent-127, score-0.237]
28 In practice a homomorphic pooling means that we can first locally pool the encodings from each lex gi separately, then combine them to get the global feature encoding as if we operated on R in the first place. [sent-128, score-0.618]
29 In addition, we want the classifier f to also operate on each of the lexes gi individually. [sent-129, score-0.33]
30 Having a homomorphic function for the classifier f, one only needs to consider the individual scores of the lexes within R. [sent-136, score-0.277]
31 Normally, when classifying a region we first perform a global pooling on all the feature encodings contained in the region, and then we apply the classifier. [sent-137, score-0.33]
32 1, codemaps first break the global pooling into a collection of local feature poolings over lexes. [sent-139, score-1.048]
33 2, codemaps apply the classifier on the local feature poolings and perform a global pooling on the classification scores of the lexes. [sent-141, score-1.125]
34 ℓ2 normalization for arbitrary regions Modern feature encodings, such as Fisher vector, VLAD or bag-of-words, usually include a summation operator in the feature pooling function h. [sent-147, score-0.354]
35 However, feature encodings also profit highly from normalization before classification [23, 32]. [sent-151, score-0.27]
36 (7), to calculate the ℓ2 norm of a region R we only need to know the sum of the pair-wise dot products h(gi)^T h(gj) between feature encodings of the lexes within the region. [sent-164, score-0.374]
37 We observe in Figure 2(a) that Fisher codemaps allow for a considerable speed-up when classifying a number of arbitrarily sized and shaped regions. [sent-181, score-0.885]
38 For evaluating 1,000 regions the Fisher codemap needs 23 seconds per image, as compared to 22 minutes when using the traditional Fisher vectors. [sent-182, score-0.235]
39 Moreover, for a large number of classes, codemaps still have a clear advantage. [sent-184, score-0.811]
40 , N for codemaps is shared across all the object classes. [sent-188, score-0.832]
41 Therefore, for classifying 1,000 object categories over 1,000 image regions, the Fisher codemaps are still 45x faster than the Fisher vectors. [sent-189, score-0.832]
42 We conclude that our ℓ2 normalized Fisher codemaps are mathematically equivalent to Fisher vectors, but much faster. [sent-190, score-0.847]
43 Nonlinear kernel pooling for classification In principle, codemaps work with arbitrary lattices for images. [sent-195, score-1.154]
44 We now approach codemaps from a kernel point of view, aiming to introduce nonlinearities for object classification using the same lattice across images. [sent-196, score-1.128]
45 Given two codemaps ΦX and ΦZ for images X and Z, the similarity between the two images using a linear kernel is: K_L(X, Z) = h(X)^T h(Z) = Σ_{gx∈X} Σ_{gz∈Z} h(gx)^T h(gz), (11) which is equivalent to the sum of the linear-kernel similarities between all pairs of lexes. [sent-197, score-1.091]
46 (a) ℓ2 normalized Fisher codemaps (with 400 lexes per image) are up to 56x faster than traditional Fisher vectors, depending on the number of regions analyzed (note the log10/log10 scales). [sent-210, score-1.089]
47 For the 400 to 500 lexes per image that usually suffice for semantic segmentation [2, 5, 33], the unnormalized and the normalized Fisher codemaps are practically as efficient, but the normalized Fisher codemaps are much more effective as shown in Table 1. [sent-212, score-2.145]
48 (c) Depending on the number of lexes, computing Fisher codemaps costs up to 600 MB memory per image, while storing them only needs less than 30 MB. [sent-213, score-0.826]
49 where h̃(X) = Σ_{gx∈X} ψ(h(gx)) indicates the nonlinear feature pooling function for image X, which is the sum over the pooled lexes after applying the nonlinear kernel mapping ψ. [sent-214, score-0.389]
50 Thus we still use the sum operator for global feature pooling and a linear classifier, which leads to a valid codemap. [sent-216, score-0.235]
51 For normalization we need K̃(X, X) = 1, which is equivalent to applying ℓ2 normalization to h̃(X). Then the resulting codemap with nonlinear kernel pooling is defined as: [sent-217, score-0.368]
52 Applying nonlinear kernel pooling for each lex makes the global image feature encoding dependent on the partition of the lattice elements placed on the image. [sent-222, score-0.551]
53 Consequently, codemaps with kernel pooling have a strong connection with spatial pyramid kernels. [sent-224, score-1.111]
54 For the spatial pyramid kernel we compute the similarity of each lex in an image only with itself, whereas for codemaps all the pair-wise similarities between lexes are considered. [sent-225, score-1.212]
55 Hence, one could view our codemaps with kernel pooling as an extension of the spatial pyramid kernels. [sent-226, score-1.111]
56 However, for our kernel pooling the final classification score is computed from a single lattice based on all the partitions of spatial pyramids without any redundancy, whereas spatial pyramids require multiple layouts. [sent-227, score-0.483]
57 As a result, codemaps with kernel pooling allow for the inclusion of richer spatial information in the final classification score at a nearly zero cost. [sent-228, score-1.178]
58 Experiments We demonstrate the efficiency and effectiveness of the proposed codemaps by experiments on semantic segmentation, [Table: region normalization (none or ℓ2) vs. mAP for bag-of-words and Fisher codemaps] [sent-230, score-1.795]
59 Since these tasks all require repetitive computations on overlapping regions, performing them once with ℓ2 normalized codemaps and nonlinear kernel pooling leads to a considerable speedup. [sent-237, score-1.163]
60 ℓ2 normalized semantic segmentation In the first experiment we quantify the value of codemaps with ℓ2 normalization for semantic segmentation, where several image regions need to be evaluated for the presence of objects and their type. [sent-241, score-0.33]
61 We use the Fisher codemaps from Section 4, with dense sampling of basic intensity SIFT descriptors per pixel at multiple scales and a Gaussian mixture model of 128 components. [sent-243, score-0.826]
62 We also consider the unnormalized Fisher codemap version and unnormalized bag-of-words features using a visual codebook of size 4,000, similar to the ones used in [34]. [sent-245, score-0.357]
63 Adding normalized Fisher codemaps on top of the CPMC-O2P improves the state-of-the-art in semantic segmentation for 8 out of 21 object categories. [sent-271, score-0.981]
64 Adding normalized Fisher codemaps (bottom row) on top of the CPMC-O2P [5] (top row) appears to be beneficial when multiple objects appear simultaneously in the image. [sent-275, score-0.811]
65 Note for example the difficult case in the last column, where codemaps help to better segment the motorbike on the poster and the motorbike in the right part of the image. [sent-276, score-0.811]
66 We observe that ℓ2 normalized Fisher codemaps outperform the unnormalized ones by far. [sent-281, score-0.94]
67 9 mAP (mean Average Precision), where the unnormalized Fisher codemaps obtain only 7. [sent-283, score-0.904]
68 While unnormalized Fisher codemaps outperform bag-of-words, the ℓ2 normalization is critical for linear regression, since we have to ensure that the overlap between each segment and itself is largest and equal to 1. [sent-285, score-1.017]
69 Calculating the normalized Fisher codemap is as efficient as the unnormalized version for up to 500 lexes. [sent-287, score-0.3]
70 For semantic segmentation in particular, since 400 to 500 lexes per image usually suffice [2, 5, 33], calculating the ℓ2 normalized Fisher codemaps is practically as efficient as the unnormalized one, but much more accurate. [sent-288, score-1.298]
71 in [5] 3 features and in [1] 58 features are used, we embed Fisher codemaps into the multi-feature approach of CPMC-O2P [5] to improve the state-of-the-art in semantic segmentation. [sent-292, score-0.902]
72 2 mAP on the VOC 2011 val set respectively, where Fisher codemaps score 26. [sent-298, score-0.862]
73 Since both the features in [5] and ℓ2 normalized Fisher codemaps use a linear regressor, we rely on late fusion with linear weights learned on the val set to combine them. [sent-301, score-0.877]
74 Adding Fisher codemaps brings more precision to the image region representations. [sent-304, score-0.855]
75 We observe that Fisher codemaps are particularly helpful when multiple objects are present simultaneously. [sent-307, score-0.811]
76 We conclude that a combination of CPMC-O2P with our ℓ2 normalized Fisher codemaps improves the state-of-the-art in semantic segmentation. [sent-308, score-0.914]
77 Nonlinear kernel pooling for classification In the second experiment we quantify the value of codemaps with ℓ2 normalization and nonlinear kernel pooling for object classification. [sent-311, score-1.52]
78 Since power normalization has shown to work particularly well for Fisher vectors [23], we implement Fisher codemaps with local Hellinger kernel pooling. [sent-316, score-0.984]
79 We therefore implement bag-of-words codemaps with local χ2 and histogram intersection kernel poolings using explicit feature maps [32]. [sent-318, score-0.972]
80 For both bag-of-words and Fisher, a codemap with ℓ2 normalization is mathematically equivalent to the regular ℓ2 normalized linear models. [sent-321, score-0.288]
81 Both bag-of-words and Fisher codemaps with the proposed ℓ2 normalization and nonlinear kernel pooling have the same accuracy as the state-of-the-art. [sent-333, score-1.192]
82 Where the best Fisher vectors require 18 seconds per image for evaluating all 20 classes, our codemaps require only 6 seconds. [sent-334, score-0.862]
83 With histogram intersection kernel pooling using approximate feature maps we obtain practically the same result for codemaps: 54. [sent-339, score-0.314]
84 With Hellinger kernel pooling we reach the same result. [sent-343, score-0.273]
85 However, our codemaps only need a single-resolution lattice, as compared to the multiple lattices required by spatial pyramid kernels. [sent-344, score-0.873]
86 Since it also costs around 6 seconds for Fisher vectors to test an image without any spatial pyramids, codemaps can include full spatial pyramids with nearly zero additional cost, but increase the mAP from 57. [sent-346, score-0.921]
87 Codemaps with our proposed ℓ2 normalization and nonlinear kernel pooling are as good as the state-of-the-art, but 3x more efficient to compute. [sent-351, score-0.381]
88 Codemaps for segmented object retrieval In the last experiment we take advantage of the efficiency benefits of ℓ2 normalized codemaps to revisit the old challenge of object segment retrieval [6] and we suggest a new solution. [sent-354, score-1.038]
89 We propose to apply codemaps for segmented object retrieval in a query-by-example setting. [sent-355, score-0.895]
90 We extract normalized Fisher codemaps in the same way as the previous experiment. [sent-358, score-0.847]
91 We first look for those lexes most similar to the segmented query, as the seeds. [sent-369, score-0.228]
92 Although no object classifier is available at query time, codemaps find satisfactory segments in the retrieved images using a single query image only. [sent-379, score-0.987]
93 Conclusions In this paper, we propose codemaps to segment, classify and search objects locally. [sent-381, score-0.83]
94 Codemaps reorder the encoding, pooling and classification steps of object classification. [sent-382, score-0.274]
95 Our first contribution is the introduction of codemaps with ℓ2 normalization for arbitrarily shaped image regions. [sent-384, score-0.959]
96 Depending on the number of regions analyzed the normalized codemaps are up to 56x faster than traditional Fisher vectors. [sent-385, score-0.88]
97 The fast normalization enables us to reach a better than state-of-the-art performance in semantic segmentation [5] by inclusion of Fisher codemaps. [sent-386, score-0.228]
98 Our second contribution is the embedding of nonlinearities in the codemap decomposition by local kernel pooling. [sent-387, score-0.314]
99 Finally, we demonstrate that the efficiency gains of codemaps facilitate object segment retrieval from a single query image. [sent-390, score-0.966]
100 Besides segmentation, classification and search, we anticipate that other computer vision challenges may profit from codemaps as well. [sent-391, score-0.88]
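The decomposition described in the extracted sentences above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the random lex codes, the weight vector `w`, the function names, and the chosen region are all hypothetical, and the signed square root is assumed here as the explicit feature map for Hellinger kernel pooling. The sketch checks that an ℓ2 normalized linear score for a region can be computed from per-lex scores and pairwise dot products alone, and that kernel-pooled codes satisfy K̃(X, X) = 1 after ℓ2 normalization.

```python
import numpy as np

def region_score_direct(lex_codes, w, region):
    """Pool the lexes of a region, l2-normalize, then apply the linear classifier."""
    pooled = lex_codes[region].sum(axis=0)            # sum pooling h(R)
    return float(w @ pooled / np.linalg.norm(pooled))

def region_score_codemap(lex_scores, gram, region):
    """Same score, using only quantities precomputed per lex.

    lex_scores[i] = w . h(g_i)        (classifier applied to each lex)
    gram[i, j]    = h(g_i) . h(g_j)   (pairwise dot products between lexes)
    """
    idx = np.asarray(region)
    # ||h(R)||^2 equals the sum of all pairwise dot products inside the region.
    norm = np.sqrt(gram[np.ix_(idx, idx)].sum())
    return float(lex_scores[idx].sum() / norm)

def hellinger_map(x):
    """Assumed explicit feature map for the Hellinger kernel: signed square root."""
    return np.sign(x) * np.sqrt(np.abs(x))

def kernel_pooled_code(lex_codes):
    """Apply the map to each lex, sum over the lattice, then l2-normalize."""
    pooled = hellinger_map(lex_codes).sum(axis=0)
    return pooled / np.linalg.norm(pooled)

rng = np.random.default_rng(0)
lex_codes = rng.normal(size=(6, 8))   # 6 hypothetical lexes with 8-dimensional codes
w = rng.normal(size=8)                # hypothetical linear classifier weights

lex_scores = lex_codes @ w            # computed once per image and class
gram = lex_codes @ lex_codes.T        # computed once per image, independent of the class
region = [1, 3, 4]                    # a region composed of three lexes

direct = region_score_direct(lex_codes, w, region)
fast = region_score_codemap(lex_scores, gram, region)
code = kernel_pooled_code(lex_codes)
```

In this sketch the Gram matrix does not depend on the classifier, which mirrors the statement above that the normalization term is shared across object classes; only `lex_scores` has to be recomputed per class.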
wordName wordTfidf (topN-words)
[('codemaps', 0.811), ('fisher', 0.281), ('lexes', 0.194), ('pooling', 0.184), ('codemap', 0.171), ('gi', 0.11), ('lattice', 0.102), ('encodings', 0.101), ('unnormalized', 0.093), ('lex', 0.091), ('normalization', 0.081), ('kernel', 0.072), ('nonlinearities', 0.071), ('semantic', 0.067), ('gz', 0.061), ('gx', 0.059), ('homomorphic', 0.057), ('classification', 0.051), ('gj', 0.05), ('query', 0.048), ('segmentation', 0.046), ('nonlinear', 0.044), ('vlad', 0.04), ('shaped', 0.04), ('voc', 0.04), ('encoding', 0.039), ('gl', 0.039), ('guesses', 0.037), ('normalized', 0.036), ('pascal', 0.035), ('poolings', 0.034), ('segmented', 0.034), ('regions', 0.033), ('retrieved', 0.033), ('kernels', 0.032), ('segment', 0.032), ('val', 0.03), ('pyramids', 0.03), ('retrieval', 0.029), ('arbitrarily', 0.027), ('reordering', 0.026), ('classifier', 0.026), ('novelty', 0.026), ('region', 0.026), ('efficiency', 0.025), ('embed', 0.024), ('amsterdam', 0.023), ('blobworld', 0.023), ('usages', 0.023), ('wdhd', 0.023), ('spatial', 0.022), ('pyramid', 0.022), ('pipelines', 0.022), ('mr', 0.022), ('object', 0.021), ('arrive', 0.021), ('practically', 0.021), ('score', 0.021), ('ub', 0.021), ('stands', 0.021), ('smeulders', 0.02), ('tpami', 0.02), ('vectors', 0.02), ('decompositions', 0.02), ('feature', 0.019), ('search', 0.019), ('brings', 0.018), ('ke', 0.018), ('explicit', 0.018), ('intersection', 0.018), ('arbitrary', 0.018), ('sized', 0.018), ('dot', 0.018), ('hellinger', 0.018), ('lattices', 0.018), ('profit', 0.018), ('reorder', 0.018), ('inclusion', 0.017), ('reach', 0.017), ('superpixels', 0.017), ('zi', 0.017), ('mappings', 0.017), ('netherlands', 0.017), ('locally', 0.017), ('hypotheses', 0.017), ('neighborhood', 0.016), ('considerable', 0.016), ('valid', 0.016), ('localized', 0.016), ('par', 0.016), ('embeds', 0.016), ('aggregates', 0.016), ('sum', 0.016), ('seconds', 0.016), ('wth', 0.016), ('uijlings', 0.015), ('optimizations', 0.015), ('iarpa', 0.015), ('suffice', 0.015), ('per', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally
2 0.14669605 169 iccv-2013-Fine-Grained Categorization by Alignments
Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
3 0.12158781 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
4 0.09984719 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
Author: Weixin Li, Qian Yu, Ajay Divakaran, Nuno Vasconcelos
Abstract: The problem of adaptively selecting pooling regions for the classification of complex video events is considered. Complex events are defined as events composed of several characteristic behaviors, whose temporal configuration can change from sequence to sequence. A dynamic pooling operator is defined so as to enable a unified solution to the problems of event specific video segmentation, temporal structure modeling, and event detection. Video is decomposed into segments, and the segments most informative for detecting a given event are identified, so as to dynamically determine the pooling operator most suited for each sequence. This dynamic pooling is implemented by treating the locations of characteristic segments as hidden information, which is inferred, on a sequence-by-sequence basis, via a large-margin classification rule with latent variables. Although the feasible set of segment selections is combinatorial, it is shown that a globally optimal solution to the inference problem can be obtained efficiently, through the solution of a series of linear programs. Besides the coarse-level location of segments, a finer model of video structure is implemented by jointly pooling features of segment-tuples. Experimental evaluation demonstrates that the resulting event detector has state-of-the-art performance on challenging video datasets.
5 0.089597613 396 iccv-2013-Space-Time Robust Representation for Action Recognition
Author: Nicolas Ballas, Yi Yang, Zhen-Zhong Lan, Bertrand Delezoide, Françoise Prêteux, Alexander Hauptmann
Abstract: We address the problem of action recognition in unconstrained videos. We propose a novel content driven pooling that leverages space-time context while being robust toward global space-time transformations. Being robust to such transformations is of primary importance in unconstrained videos where the action localizations can drastically shift between frames. Our pooling identifies regions of interest using video structural cues estimated by different saliency functions. To combine the different structural information, we introduce an iterative structure learning algorithm, WSVM (weighted SVM), that determines the optimal saliency layout of an action model through a sparse regularizer. A new optimization method is proposed to solve the WSVM's highly non-smooth objective function. We evaluate our approach on standard action datasets (KTH, UCF50 and HMDB). Most noticeably, the accuracy of our algorithm reaches 51.8% on the challenging HMDB dataset, which outperforms the state-of-the-art by 7.3% relative.
6 0.088288493 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection
7 0.071601458 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
8 0.068638645 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
9 0.06863524 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
10 0.064030424 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
11 0.060152963 379 iccv-2013-Semantic Segmentation without Annotating Segments
12 0.059716001 39 iccv-2013-Action Recognition with Improved Trajectories
13 0.058665127 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
14 0.057468966 295 iccv-2013-On One-Shot Similarity Kernels: Explicit Feature Maps and Properties
15 0.057128884 104 iccv-2013-Decomposing Bag of Words Histograms
16 0.057050567 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval
17 0.056148309 10 iccv-2013-A Framework for Shape Analysis via Hilbert Space Embedding
18 0.055742308 4 iccv-2013-ACTIVE: Activity Concept Transitions in Video Event Classification
19 0.049670208 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search
20 0.047253378 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
topicId topicWeight
[(0, 0.108), (1, 0.042), (2, 0.011), (3, -0.03), (4, 0.031), (5, 0.059), (6, -0.02), (7, 0.004), (8, -0.037), (9, -0.064), (10, 0.04), (11, 0.006), (12, 0.022), (13, -0.025), (14, -0.05), (15, -0.021), (16, 0.047), (17, -0.034), (18, 0.048), (19, -0.043), (20, 0.011), (21, 0.001), (22, -0.001), (23, 0.026), (24, -0.038), (25, 0.07), (26, 0.007), (27, 0.015), (28, -0.01), (29, 0.074), (30, 0.007), (31, -0.039), (32, -0.075), (33, -0.036), (34, -0.07), (35, 0.007), (36, -0.006), (37, -0.098), (38, 0.054), (39, 0.065), (40, -0.062), (41, -0.045), (42, -0.017), (43, 0.039), (44, 0.01), (45, 0.029), (46, 0.025), (47, -0.03), (48, 0.015), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.91385925 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally
2 0.68717664 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
Author: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
Abstract: As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention these years. Different from traditional image classification tasks in which objects have large inter-class variation, the visual concepts in the fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating really discriminative features, therefore it becomes more important for the algorithm to make full use of the part information in order to train a robust model. In this paper, we propose a powerful flowchart named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules to integrate into image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves the state-of-the-art classification accuracy in the Caltech-UCSD-Birds-200-2011 dataset by making full use of the ground-truth part annotations.
3 0.66817153 169 iccv-2013-Fine-Grained Categorization by Alignments
Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
4 0.64866626 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search
Author: Giorgos Tolias, Yannis Avrithis, Hervé Jégou
Abstract: This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the VLAD descriptor and matching techniques such as Hamming Embedding. Making the bridge between these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. Finally, the representation underpinning this kernel is approximated, providing a large scale image search both precise and scalable, as shown by our experiments on several benchmarks.
5 0.61369002 288 iccv-2013-Nested Shape Descriptors
Author: Jeffrey Byrne, Jianbo Shi
Abstract: In this paper, we propose a new family of binary local feature descriptors called nested shape descriptors. These descriptors are constructed by pooling oriented gradients over a large geometric structure called the Hawaiian earring, which is constructed with a nested correlation structure that enables a new robust local distance function called the nesting distance. This distance function is unique to the nested descriptor and provides robustness to outliers from order statistics. In this paper, we define the nested shape descriptor family and introduce a specific member called the seed-of-life descriptor. We perform a trade study to determine optimal descriptor parameters for the task of image matching. Finally, we evaluate performance compared to state-of-the-art local feature descriptors on the VGG-Affine image matching benchmark, showing significant performance gains. Our descriptor is the first binary descriptor to outperform SIFT on this benchmark.
6 0.59497535 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
7 0.58911532 104 iccv-2013-Decomposing Bag of Words Histograms
8 0.58669531 202 iccv-2013-How Do You Tell a Blackbird from a Crow?
9 0.58096564 48 iccv-2013-An Adaptive Descriptor Design for Object Recognition in the Wild
10 0.57489812 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
11 0.56971407 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection
12 0.56787717 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
13 0.55585945 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint
14 0.54896808 388 iccv-2013-Shape Index Descriptors Applied to Texture-Based Galaxy Analysis
15 0.54608041 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval
17 0.51057464 401 iccv-2013-Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology
18 0.49603838 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
19 0.49300542 258 iccv-2013-Low-Rank Sparse Coding for Image Classification
20 0.49295735 193 iccv-2013-Heterogeneous Auto-similarities of Characteristics (HASC): Exploiting Relational Information for Classification
topicId topicWeight
[(2, 0.069), (4, 0.015), (7, 0.014), (13, 0.024), (24, 0.012), (26, 0.089), (31, 0.029), (32, 0.232), (35, 0.014), (40, 0.01), (42, 0.088), (64, 0.029), (73, 0.022), (77, 0.014), (89, 0.194), (95, 0.011), (98, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.81152177 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally
Author: Zhenyang Li, Efstratios Gavves, Koen E.A. van_de_Sande, Cees G.M. Snoek, Arnold W.M. Smeulders
Abstract: In this paper we aim for segmentation and classification of objects. We propose codemaps that are a joint formulation of the classification score and the local neighborhood it belongs to in the image. We obtain the codemap by reordering the encoding, pooling and classification steps over lattice elements. Other than existing linear decompositions who emphasize only the efficiency benefits for localized search, we make three novel contributions. As a preliminary, we provide a theoretical generalization of the sufficient mathematical conditions under which image encodings and classification becomes locally decomposable. As first novelty we introduce ℓ2 normalization for arbitrarily shaped image regions, which is fast enough for semantic segmentation using our Fisher codemaps. Second, using the same lattice across images, we propose kernel pooling which embeds nonlinearities into codemaps for object classification by explicit or approximate feature mappings. Results demonstrate that ℓ2 normalized Fisher codemaps improve the state-of-the-art in semantic segmentation for PASCAL VOC. For object classification the addition of nonlinearities brings us on par with the state-of-the-art, but is 3x faster. Because of the codemaps' inherent efficiency, we can reach significant speed-ups for localized search as well. We exploit the efficiency gain for our third novelty: object segment retrieval using a single query image only.
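The abstract above describes reordering encoding, pooling and classification over lattice elements, and ℓ2 normalization for arbitrarily shaped regions. The following is a minimal sketch of one way to read that idea, not the authors' implementation: it assumes sum pooling and a linear classifier, so the score of a region is the sum of per-cell scores, and the squared ℓ2 norm of the pooled region encoding is the sum of pairwise dot products between its cells. All names (`cell_scores`, `gram`, `region_score`, `direct_score`) and the toy data are hypothetical.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cell_scores(cells, w):
    """Classify first: one linear classifier response per lattice cell."""
    return [dot(phi, w) for phi in cells]

def gram(cells):
    """Pairwise dot products between cell encodings, computed once per image."""
    return [[dot(a, b) for b in cells] for a in cells]

def region_score(region, scores, G):
    """L2-normalized score of an arbitrary set of cells, using only
    the precomputed per-cell scores and Gram matrix."""
    raw = sum(scores[c] for c in region)
    sq_norm = sum(G[a][b] for a in region for b in region)
    return raw / math.sqrt(sq_norm) if sq_norm > 0 else 0.0

def direct_score(region, cells, w):
    """Reference: pool the region, normalize, then classify."""
    pooled = [sum(cells[c][d] for c in region) for d in range(len(w))]
    norm = math.sqrt(dot(pooled, pooled))
    return dot(pooled, w) / norm if norm > 0 else 0.0

# Toy data (assumed): four lattice cells with 3-dimensional encodings.
cells = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
w = [0.5, -1.0, 2.0]
scores = cell_scores(cells, w)
G = gram(cells)
region = [0, 2, 3]  # an arbitrarily shaped region given as cell indices
```

Under these assumptions, `region_score(region, scores, G)` and `direct_score(region, cells, w)` agree up to floating-point error. The decomposed form never touches the full encodings again once `scores` and `G` are cached, which is where the efficiency for evaluating many candidate regions would come from.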
2 0.80750191 255 iccv-2013-Local Signal Equalization for Correspondence Matching
Author: Derek Bradley, Thabo Beeler
Abstract: Correspondence matching is one of the most common problems in computer vision, and it is often solved using photo-consistency of local regions. These approaches typically assume that the frequency content in the local region is consistent in the image pair, such that matching is performed on similar signals. However, in many practical situations this is not the case, for example with low depth of field cameras a scene point may be out of focus in one view and in-focus in the other, causing a mismatch of frequency signals. Furthermore, this mismatch can vary spatially over the entire image. In this paper we propose a local signal equalization approach for correspondence matching. Using a measure of local image frequency, we equalize local signals using an efficient scale-space image representation such that their frequency contents are optimally suited for matching. Our approach allows better correspondence matching, which we demonstrate with a number of stereo reconstruction examples on synthetic and real datasets.
3 0.75222069 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
Author: Zhiwu Huang, Xiaowei Zhao, Shiguang Shan, Ruiping Wang, Xilin Chen
Abstract: The Still-to-Video (S2V) face recognition systems typically need to match faces in low-quality videos captured under unconstrained conditions against high quality still face images, which is very challenging because of noise, image blur, low face resolutions, varying head pose, complex lighting, and alignment difficulty. To address the problem, one solution is to select the frames of ‘best quality’ from videos (hereinafter called quality alignment in this paper). Meanwhile, the faces in the selected frames should also be geometrically aligned to the still faces offline well-aligned in the gallery. In this paper, we discover that the interactions among the three tasks–quality alignment, geometric alignment and face recognition–can benefit from each other, thus should be performed jointly. With this in mind, we propose a Coupling Alignments with Recognition (CAR) method to tightly couple these tasks via low-rank regularized sparse representation in a unified framework. Our method makes the three tasks promote mutually by a joint optimization in an Augmented Lagrange Multiplier routine. Extensive experiments on two challenging S2V datasets demonstrate that our method outperforms the state-of-the-art methods impressively.
4 0.74299031 223 iccv-2013-Joint Noise Level Estimation from Personal Photo Collections
Author: Yichang Shih, Vivek Kwatra, Troy Chinen, Hui Fang, Sergey Ioffe
Abstract: Personal photo albums are heavily biased towards faces of people, but most state-of-the-art algorithms for image denoising and noise estimation do not exploit facial information. We propose a novel technique for jointly estimating noise levels of all face images in a photo collection. Photos in a personal album are likely to contain several faces of the same people. While some of these photos would be clean and high quality, others may be corrupted by noise. Our key idea is to estimate noise levels by comparing multiple images of the same content that differ predominantly in their noise content. Specifically, we compare geometrically and photometrically aligned face images of the same person. Our estimation algorithm is based on a probabilistic formulation that seeks to maximize the joint probability of estimated noise levels across all images. We propose an approximate solution that decomposes this joint maximization into a two-stage optimization. The first stage determines the relative noise between pairs of images by pooling estimates from corresponding patch pairs in a probabilistic fashion. The second stage then jointly optimizes for all absolute noise parameters by conditioning them upon relative noise levels, which allows for a pairwise factorization of the probability distribution. We evaluate our noise estimation method using quantitative experiments to measure accuracy on synthetic data. Additionally, we employ the estimated noise levels for automatic denoising using “BM3D”, and evaluate the quality of denoising on real-world photos through a user study.
5 0.7166512 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
6 0.71660185 404 iccv-2013-Structured Forests for Fast Edge Detection
7 0.7165826 258 iccv-2013-Low-Rank Sparse Coding for Image Classification
8 0.71632576 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
9 0.71630335 21 iccv-2013-A Method of Perceptual-Based Shape Decomposition
10 0.71589762 6 iccv-2013-A Convex Optimization Framework for Active Learning
11 0.71540177 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
12 0.71462482 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
13 0.71411037 169 iccv-2013-Fine-Grained Categorization by Alignments
14 0.71397454 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
15 0.71382964 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
16 0.71371222 396 iccv-2013-Space-Time Robust Representation for Action Recognition
17 0.71362019 414 iccv-2013-Temporally Consistent Superpixels
18 0.71361333 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
19 0.71329904 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
20 0.71315223 18 iccv-2013-A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution