cvpr cvpr2013 cvpr2013-247 knowledge-graph by maker-knowledge-mining

247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings


Source: pdf

Author: Guang-Tong Zhou, Tian Lan, Weilong Yang, Greg Mori

Abstract: We conduct image classification by learning a class-to-image distance function that matches objects. The set of objects in training images for an image class is treated as a collage. When presented with a test image, the best matching between this collage of training image objects and those in the test image is found. We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. State-of-the-art image classification results are obtained, and qualitative results demonstrate that objects can be accurately matched.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We conduct image classification by learning a class-to-image distance function that matches objects. [sent-3, score-0.305]

2 When presented with a test image, the best matching between this collage of training image objects and those in the test image is found. [sent-5, score-0.37]

3 We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. [sent-6, score-0.509]

4 We formulate a class-to-image distance for matching to an unseen image that looks for a set of similar objects in similar spatial arrangements to those found in a set of training images. [sent-13, score-0.429]

5 The distance between this collage of objects and a test image is used to classify the test image. [sent-14, score-0.417]

6 Detailed reasoning about object segmentation can also assist in image classification [3]. [sent-26, score-0.331]

7 Object matchings between the airport class and a test image. [sent-32, score-1.149]

8 There are four major object categories in the training airport images: “sky”, “airplane”, “road” and “tree”. [sent-33, score-0.481]

9 We match the dashed objects from the training side to the objects in the test image, from which the class-to-image distance is calculated. [sent-34, score-0.426]

10 We are inspired by two recent lines of work: Object Bank [20], which takes a statistical view of object presence, and exemplar SVM [22], which considers matching individual exemplar objects. [sent-42, score-0.329]

11 [20] showed that a large bank of object detectors is an effective feature for image classification, building a feature vector that captures the statistics of object detector responses. [sent-44, score-0.555]

12 We present a novel latent variable distance function learning framework that considers matchings of objects between a test image and a set of training images from one class. [sent-48, score-0.991]

13 We develop efficient representations for the relationships between objects in this latent variable framework. [sent-49, score-0.307]

14 We show empirically that this method is effective, and that reasoning about objects and their relations in images can lead to high quality classification performance. [sent-50, score-0.3]

15 Malisiewicz and Efros [22] learn per-exemplar distance functions for data association based object detection. [sent-55, score-0.318]

16 [20] tackle scene classification by representing an image as Object Bank – a feature vector that captures the statistics of object detectors. [sent-57, score-0.329]

17 Wang and Forsyth [29] jointly learn object categories and visual attributes in a multiple instance learning framework. [sent-60, score-0.299]

18 [27] exploit contextual relevance of objects by modeling object co-occurrences. [sent-63, score-0.3]

19 [33] also measure image-to-class distance by learning Mahalanobis distance metrics. [sent-75, score-0.324]

20 The Object Matching Based Distance Model Our goal is to learn a class-to-image distance function that jointly captures object matchings, the pairwise interactions among objects, as well as the global image appearance. [sent-83, score-0.484]

21 We start with an example (Figure 1) that illustrates calculating the class-to-image distance from the airport class to a test image. [sent-84, score-0.486]

22 The airport class is represented as a collage of object sets (i. [sent-85, score-0.526]

23 In essence, our distance model matches to a test image with a set of similar objects in similar spatial arrangements from training images. [sent-88, score-0.464]

24 Our model consists of three components: the unary object distance, the pairwise object distance, and the global image appearance distance. [sent-89, score-0.815]

25 The unary object distance measures the object-level distance from an image class to a test image. [sent-90, score-0.971]

26 In our example, we match one object from each of the four object sets (“sky”, “airplane”, “road” and “tree”) to the test image. [sent-91, score-0.4]

27 The unary object distance is a summation over the four distances calculated from the four object matchings. [sent-93, score-0.891]

28 The pairwise object distance measures the distance of spatial arrangements of objects from an image class to a test image. [sent-94, score-0.887]

29 In our example, the matched objects in the test image satisfy the three popular spatial relations found in the training airport images. [sent-95, score-0.568]

30 Thus, we further pull the test image close to the airport scene. [sent-96, score-0.295]

31 Finally, our distance model takes the global image features into account and calculates the global image appearance distance accordingly. [sent-97, score-0.426]

32 For an image class C, we gather together all the objects in the training images belonging to this class to make up the object sets O = {Oi}i∈V, where V denotes all the object categories and Oi is the set of objects annotated with category i ∈ V. [sent-104, score-0.919]

33 Given an image x, our model is a distance function Dθ(C, x) (here θ are the parameters of this function) that measures the class-to-image distance from C to x based on object matchings. [sent-106, score-0.494]

34 First, even though the ground-truth object bounding boxes are readily available in the training images, we do not have annotated objects on the test image set. [sent-109, score-0.454]

35 We model the location/scale configurations of the “hypothesized” objects as latent variables and infer them implicitly in our model. [sent-111, score-0.336]

36 The latent variables are denoted as H = {Hi}i∈V, where Hi is the set of “hypothesized” object configurations in category i. [sent-112, score-0.536]

37 A second challenge lies in finding the optimal object matchings from O to H. [sent-117, score-0.641]

38 If we only consider the unary object distance, we can find [sent-118, score-0.508]

39 the optimal object matching separately within each object category by choosing the closest pair over the bipartite matchings between Oi and Hi. [sent-119, score-1.015]

40 Therefore, we need to jointly consider the unary object distance as well as the pairwise interactions. [sent-121, score-0.753]

41 To address the problem, we model the object matchings as a set of latent variables M = {(ui, vi)}i∈V, where ui and vi are both object indices, and the pair (ui, vi) indicates that object Ouii is matched to object Hvii for category i. [sent-122, score-1.502]

42 Given the class C and the image x, we can find the optimal settings of H and M by minimizing the distance over all possible object configurations and all possible object matchings. [sent-123, score-0.462]

43 Φ(O, H, M, x) is a linear function measuring the distance from C to x according to putative object configurations H and putative object matchings M. [sent-127, score-1.123]
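
As an illustration only (hypothetical Python, not the authors' code), the distance can be read as a minimum of a linear score over candidate (H, M) pairs, assuming each candidate's feature vector Φ(O, H, M, x) has been precomputed:

```python
def model_distance(phi_values, theta):
    """D_theta(C, x) = min over (H, M) of theta . Phi(O, H, M, x).

    phi_values: one feature vector Phi(O, H, M, x) per putative
    (configuration, matching) pair; theta: the learned parameters."""
    return min(sum(t * p for t, p in zip(theta, phi))
               for phi in phi_values)

# Two candidate (H, M) pairs; the second yields the smaller distance.
d = model_distance([[0.5, 0.2], [0.1, 0.3]], theta=[1.0, 2.0])
```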

44 ψ(O, H, M): This function measures the unary object distance between O and H based on the object matchings M. [sent-136, score-1.179]

45 The unary object distance is then calculated as a weighted summation over all base distances. [sent-138, score-0.689]

46 Note that αit is a scalar parameter that weights the t-th distance measure for all the category-i objects – high weights indicate discriminative object categories. [sent-145, score-0.442]
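
A minimal sketch of this weighted summation (the dict layout, category names, and toy numbers are illustrative assumptions, not from the paper):

```python
def unary_distance(base_dists, alpha):
    """Weighted sum of base distances over the matched object pairs.

    base_dists[i][t]: t-th base distance for category i's matched pair;
    alpha[i][t]: learned scalar weight for that distance measure."""
    return sum(alpha[i][t] * base_dists[i][t]
               for i in base_dists
               for t in range(len(base_dists[i])))

# Two categories, two base distance measures each (toy numbers).
bd = {"sky": [0.2, 0.4], "airplane": [0.1, 0.3]}
al = {"sky": [1.0, 0.5], "airplane": [2.0, 1.0]}
d = unary_distance(bd, al)
```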

47 Given two object categories (i, j) and the matched objects (Hvii , Hvjj ) in the image x, we define ρk (Hvii , Hvjj ) = −1 if the spatial relation between Hvii and Hvjj is consistent with a spatial relation k, and ρk (Hvii , Hvjj ) = 0 otherwise. [sent-150, score-0.588]

48 The pairwise object distance is parameterized as Σ(i,j) Σk βijk ρk(Hvii, Hvjj). [sent-151, score-0.386]

49 where βijk is a scalar parameter that weights the spatial relation k between object categories i and j – high weights indicate discriminative spatial relations. [sent-158, score-0.46]

50 This function implements the idea that we should pull the image x close to the class C if the spatial relations between the matched objects in the image x are discriminative for the class C. [sent-160, score-0.512]
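
A sketch of this term (hypothetical encoding; the relation names and weights are made up for illustration):

```python
def pairwise_distance(beta, rho):
    """Sum of beta[i, j, k] * rho_k over object-category pairs (i, j)
    and spatial relations k. rho is -1 when the matched objects'
    spatial relation is consistent with relation k and 0 otherwise,
    so a discriminative relation (large beta) lowers the distance."""
    return sum(beta[key] * rho[key] for key in beta)

# "sky above airplane" holds in the test image; "road above tree" does not.
beta = {("sky", "airplane", "above"): 0.7, ("road", "tree", "above"): 0.4}
rho = {("sky", "airplane", "above"): -1.0, ("road", "tree", "above"): 0.0}
d = pairwise_distance(beta, rho)
```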

51 In our experiments, we use the bag-of-words features [4] for object classification on PASCAL 07, and the GIST descriptors [25] for scene classification on SUN 09. [sent-168, score-0.424]

52 locations and scales) for each object category, search over all the possible object matchings, and find the complete configurations and object matchings that jointly minimize the objective function. [sent-178, score-1.11]

53 If we only consider the unary object distance, this results in inferring the optimal object configuration and object matching within each object category independently. [sent-179, score-1.224]

54 First, we reduce the search space of location/scale configurations for the objects in an object category. [sent-185, score-0.355]

55 In our experiments, we use 5 and 10 candidate configurations per object category for each PASCAL 07 and SUN 09 image, respectively. [sent-187, score-0.444]

56 We keep using the notation Hi to denote the candidate configurations of object category i. [sent-188, score-0.444]

57 1, we restrict the selected object for object category i to one of its corresponding candidate configurations in Hi. [sent-190, score-0.616]

58 Given the candidate configurations Hi, there are |Oi| × |Hi| possible object matchings for the object category i. [sent-192, score-1.085]

59 It is costly to consider all of them, especially since we need to jointly regard all the object categories in finding the optimal set of object matchings. [sent-193, score-0.439]

60 In detail, for each candidate object configuration Hiv ∈ Hi, we compute the distance from all the objects in Oi to it. [sent-195, score-0.469]

61 We then assign a candidate object matching by pairing Hiv to its closest object Oui∗ in Oi. [sent-196, score-0.458]
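
This pruning step can be sketched as follows (toy 1-D "features" stand in for real object descriptors; the function name is an assumption):

```python
def candidate_matchings(train_objects, candidates):
    """For each candidate configuration Hiv, keep only its closest
    training object Oui*, shrinking the |Oi| x |Hi| matchings for a
    category down to just |Hi| candidate matchings."""
    pairs = []
    for v, cand in enumerate(candidates):
        u_star = min(range(len(train_objects)),
                     key=lambda u: abs(train_objects[u] - cand))
        pairs.append((u_star, v))
    return pairs

# Two training objects, two candidate configurations (1-D toy features).
m = candidate_matchings([0.0, 5.0], [4.2, 0.3])
```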

62 (6) Note that the candidate object matchings are still latent (i. [sent-199, score-0.855]

63 1, we require each object category to select one object matching from the candidate set. [sent-203, score-0.575]

64 Each node i in the MRF has |Hi| possible states, where the unary energy for each state is the distance calculated by Eq. [sent-207, score-0.371]

65 An edge (i, j) in the MRF corresponds to the relation between object categories iand j. [sent-209, score-0.298]

66 when the relation between object categories is represented by a complete graph. [sent-213, score-0.298]

67 In detail, we first assume that only one spatial relation matters for a given pair of object categories, and we choose it as the most frequent spatial relation. [sent-215, score-0.389]
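
On a toy graph, the resulting inference problem can be sketched by brute force (the paper uses standard MRF inference; this exhaustive search is for illustration only):

```python
from itertools import product

def mrf_map(unary, pairwise, edges):
    """MAP assignment for the matching MRF: one node per object
    category, one state per candidate matching; minimizes the sum of
    unary energies plus pairwise energies on the given edges."""
    best_states, best_energy = None, float("inf")
    for states in product(*(range(len(u)) for u in unary)):
        e = sum(unary[i][s] for i, s in enumerate(states))
        e += sum(pairwise[(i, j)][states[i]][states[j]]
                 for (i, j) in edges)
        if e < best_energy:
            best_states, best_energy = states, e
    return best_states, best_energy

# Two categories with two candidate matchings each, one edge.
states, energy = mrf_map(
    unary=[[1.0, 0.5], [0.2, 0.4]],
    pairwise={(0, 1): [[0.0, -0.3], [-0.1, 0.0]]},
    edges=[(0, 1)])
```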

68 Eq. 7 constrains that the class-to-image distance from class C to a negative image xn should be larger than the distance to a positive image xp by a large margin. [sent-228, score-0.444]
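
A hinge-loss reading of this constraint (illustrative numbers; the margin value and helper name are assumptions):

```python
def hinge(d_neg, d_pos, margin=1.0):
    """Loss for one learning constraint: the class-to-image distance
    to a negative image x_n should exceed the distance to a positive
    image x_p by a margin; violated constraints drive the subgradient
    update of the model parameters."""
    return max(0.0, margin - (d_neg - d_pos))

satisfied = hinge(d_neg=2.0, d_pos=0.5)  # margin met
violated = hinge(d_neg=1.0, d_pos=0.8)   # distances too close
```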

69 It is also possible to learn our distance model by using the ground-truth object bounding boxes annotated in the training images without inferring the latent “hypothesized” configurations. [sent-241, score-0.608]

70 The goal is to predict the presence of an object category in a test image. [sent-257, score-0.345]

71 A typical image has around 3 object instances in 2 object categories. [sent-258, score-0.386]

72 On average, an object category contains 783 object instances in the training image set. [sent-259, score-0.549]

73 A typical image has around 11 object instances in 5 object categories. [sent-264, score-0.386]

74 On average, there are 417 object instances per object category in the training image set. [sent-265, score-0.549]

75 We perform classification tasks on 58 scene classes each containing at least 10 training and 10 test images1 . [sent-266, score-0.299]

76 First, the number of object instances per category in SUN 09 is significantly larger than that in SUN (417 as compared to around 65). [sent-270, score-0.331]

77 Local object features: We select or design several stateof-the-art features that are potentially useful for representing object categories. [sent-272, score-0.344]

78 A 128-dimensional texton histogram is built for each object. (Footnote 1: We manually extract the scene labels for the SUN 09 images as they are not included in the original release.) [sent-280, score-0.299]

79 We further develop two unary models based on Eqs. [sent-301, score-0.336]

80 5 and 3: Global+ Unary, where object matchings are inferred using Eq. [sent-302, score-0.641]

81 6; and Global+ Unary-Latent, where object matchings are fixed by setting αit = 1 in Eq. [sent-303, score-0.641]

82 The two unary models are designed to test the efficacy of latent object matchings. [sent-305, score-0.802]

83 We also build our own object bank representations for PASCAL 07. [sent-410, score-0.353]

84 For an image, the representation is a 20-dimensional feature vector, where each dimension corresponds to an object category in PASCAL 07, and its value is the maximum response of an object detector. [sent-411, score-0.461]
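
This max-pooled representation can be sketched as follows (hypothetical detector scores):

```python
def object_bank_vector(responses):
    """Object-bank feature: one dimension per object category, holding
    the maximum detector response for that category in the image
    (0.0 when the detector never fires)."""
    return [max(r) if r else 0.0 for r in responses]

# Three categories: two detections, one detection, none.
v = object_bank_vector([[0.1, 0.9], [0.3], []])
```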

85 This demonstrates that the object matchings learned by local object models (i. [sent-417, score-0.813]

86 Now we consider Global+ Unary-Latent, Global+ Unary and Full to evaluate the efficacy of latent object matchings. [sent-420, score-0.41]

87 This is reasonable since the goal of PASCAL 07 classification is to decide the presence of an object category in a given test image. [sent-424, score-0.44]

88 Once the object detector fires on the test image, matching the detected object to a particular object in the class does not significantly affect the overall classification performance. [sent-425, score-0.805]

89 The only difference is that here we employ a 111-dimensional object bank representation, where each dimension corresponds to an object category in SUN 09. [sent-430, score-0.577]

90 Now we evaluate the efficacy of latent object matchings. [sent-438, score-0.41]

91 Recall that Global+ Unary-Latent uses fixed object matchings, Global+ Unary uses latent object matchings based on the unary object distance, and our Full model uses latent object matchings inferred by the combination of unary and pairwise object distances. [sent-439, score-2.844]

92 The color of the bounding box shows the relative importance of the objects in distance calculation (sorted by the unary object distance): red > blue > green > yellow. [sent-450, score-0.773]

93 This shows the efficacy of our latent object matching method on scene classification. [sent-452, score-0.525]

94 As compared to object classification on PASCAL 07, where the class label is purely determined by one object in the image, scene classification on SUN 09 is more complicated because we need to consider a collection of objects and their correlations to correctly classify a test image. [sent-454, score-0.826]

95 Conclusion We have presented a discriminative model to learn class-to-image distances for image classification by considering the object matchings between a test image and a set of training images from one class. [sent-460, score-0.903]

96 The model integrates three types of complementary distances: the unary object distance, the pairwise object distance, and the global image appearance distance. [sent-461, score-1.107]

97 Our experiments validate the efficacy of our model in object classification and scene classification tasks. [sent-463, score-0.543]

98 image retrieval with object matchings or video classification/retrieval with action matchings. [sent-466, score-0.641]

99 Image retrieval with structured object queries using latent ranking svm. [sent-577, score-0.325]

100 A discriminative latent model of image region and object tag correspondence. [sent-682, score-0.36]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('matchings', 0.469), ('unary', 0.336), ('hvii', 0.212), ('airport', 0.199), ('pascal', 0.175), ('object', 0.172), ('sun', 0.168), ('latent', 0.153), ('distance', 0.146), ('hvjj', 0.121), ('category', 0.117), ('bank', 0.116), ('classification', 0.095), ('configurations', 0.094), ('hiv', 0.091), ('ouii', 0.091), ('objects', 0.089), ('relations', 0.087), ('class', 0.085), ('efficacy', 0.085), ('full', 0.075), ('hypothesized', 0.074), ('hi', 0.073), ('ijk', 0.07), ('collage', 0.07), ('gist', 0.069), ('pairwise', 0.068), ('global', 0.067), ('texton', 0.065), ('representations', 0.065), ('categories', 0.064), ('scene', 0.062), ('relation', 0.062), ('frome', 0.062), ('candidate', 0.061), ('nrbm', 0.061), ('weilong', 0.061), ('pn', 0.06), ('test', 0.056), ('matching', 0.053), ('lbp', 0.053), ('exemplar', 0.052), ('airplane', 0.052), ('svm', 0.051), ('tjhecet', 0.05), ('spatial', 0.048), ('lan', 0.048), ('inference', 0.048), ('arrangements', 0.047), ('unannotated', 0.047), ('harzallah', 0.047), ('training', 0.046), ('weighed', 0.045), ('matched', 0.043), ('malisiewicz', 0.042), ('instances', 0.042), ('wang', 0.042), ('road', 0.042), ('oi', 0.042), ('rabinovich', 0.041), ('pull', 0.04), ('classes', 0.04), ('encoding', 0.04), ('contextual', 0.039), ('tree', 0.038), ('xp', 0.037), ('fisher', 0.037), ('ap', 0.036), ('assist', 0.035), ('mori', 0.035), ('sky', 0.035), ('discriminative', 0.035), ('putative', 0.035), ('singer', 0.035), ('boiman', 0.035), ('calculated', 0.035), ('validates', 0.034), ('subgradient', 0.033), ('labelme', 0.033), ('pair', 0.032), ('boxes', 0.032), ('mrf', 0.032), ('learning', 0.032), ('examine', 0.032), ('matches', 0.032), ('inn', 0.032), ('jointly', 0.031), ('scalar', 0.031), ('berg', 0.031), ('negative', 0.03), ('bounding', 0.03), ('configuration', 0.03), ('measures', 0.03), ('distances', 0.03), ('histograms', 0.029), ('reasoning', 0.029), ('annotated', 0.029), ('significance', 0.028), ('formally', 0.027), ('frequent', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings

Author: Guang-Tong Zhou, Tian Lan, Weilong Yang, Greg Mori

Abstract: We conduct image classification by learning a class-to-image distance function that matches objects. The set of objects in training images for an image class is treated as a collage. When presented with a test image, the best matching between this collage of training image objects and those in the test image is found. We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. State-of-the-art image classification results are obtained, and qualitative results demonstrate that objects can be accurately matched.

2 0.14049532 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

Author: Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese

Abstract: Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.

3 0.13996366 80 cvpr-2013-Category Modeling from Just a Single Labeling: Use Depth Information to Guide the Learning of 2D Models

Author: Quanshi Zhang, Xuan Song, Xiaowei Shao, Ryosuke Shibasaki, Huijing Zhao

Abstract: An object model base that covers a large number of object categories is of great value for many computer vision tasks. As artifacts are usually designed to have various textures, their structure is the primary distinguishing feature between different categories. Thus, how to encode this structural information and how to start the model learning with a minimum of human labeling become two key challenges for the construction of the model base. We design a graphical model that uses object edges to represent object structures, and this paper aims to incrementally learn this category model from one labeled object and a number of casually captured scenes. However, the incremental model learning may be biased due to the limited human labeling. Therefore, we propose a new strategy that uses the depth information in RGBD images to guide the model learning for object detection in ordinary RGB images. In experiments, the proposed method achieves superior performance as good as the supervised methods that require the labeling of all target objects.

4 0.12736408 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects have prevailed over the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

5 0.12689717 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters

Author: Matthieu Guillaumin, Luc Van_Gool, Vittorio Ferrari

Abstract: Pairwise discrete energies defined over graphs are ubiquitous in computer vision. Many algorithms have been proposed to minimize such energies, often concentrating on sparse graph topologies or specialized classes of pairwise potentials. However, when the graph is fully connected and the pairwise potentials are arbitrary, the complexity of even approximate minimization algorithms such as TRW-S grows quadratically both in the number of nodes and in the number of states a node can take. Moreover, recent applications are using more and more computationally expensive pairwise potentials. These factors make it very hard to employ fully connected models. In this paper we propose a novel, generic algorithm to approximately minimize any discrete pairwise energy function. Our method exploits tractable sub-energies to filter the domain of the function. The parameters of the filter are learnt from instances of the same class of energies with good candidate solutions. Compared to existing methods, it efficiently handles fully connected graphs, with many states per node, and arbitrary pairwise potentials, which might be expensive to compute. We demonstrate experimentally on two applications that our algorithm is much more efficient than other generic minimization algorithms such as TRW-S, while returning essentially identical solutions.

6 0.12460856 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

7 0.12425089 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

8 0.12261891 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

9 0.12170238 364 cvpr-2013-Robust Object Co-detection

10 0.11946128 156 cvpr-2013-Exploring Compositional High Order Pattern Potentials for Structured Output Learning

11 0.11687324 335 cvpr-2013-Poselet Conditioned Pictorial Structures

12 0.11238357 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People

13 0.1115566 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

14 0.11029638 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation

15 0.10604471 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences

16 0.10484955 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

17 0.10334413 325 cvpr-2013-Part Discovery from Partial Correspondence

18 0.10280176 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

19 0.10194194 25 cvpr-2013-A Sentence Is Worth a Thousand Pixels

20 0.099322729 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.254), (1, -0.078), (2, 0.007), (3, -0.035), (4, 0.111), (5, 0.059), (6, 0.021), (7, 0.105), (8, -0.049), (9, -0.018), (10, -0.024), (11, -0.01), (12, -0.01), (13, -0.006), (14, -0.032), (15, -0.012), (16, 0.043), (17, 0.04), (18, 0.017), (19, -0.055), (20, 0.011), (21, -0.04), (22, 0.106), (23, 0.029), (24, 0.06), (25, -0.007), (26, -0.024), (27, 0.073), (28, -0.033), (29, -0.068), (30, -0.078), (31, -0.024), (32, 0.032), (33, -0.026), (34, 0.004), (35, 0.046), (36, 0.046), (37, 0.008), (38, -0.021), (39, -0.02), (40, 0.0), (41, 0.017), (42, 0.019), (43, 0.015), (44, 0.063), (45, -0.047), (46, -0.027), (47, -0.021), (48, -0.014), (49, 0.005)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94525129 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings

Author: Guang-Tong Zhou, Tian Lan, Weilong Yang, Greg Mori

Abstract: We conduct image classification by learning a class-to-image distance function that matches objects. The set of objects in training images for an image class is treated as a collage. When presented with a test image, the best matching between this collage of training image objects and those in the test image is found. We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. State-of-the-art image classification results are obtained, and qualitative results demonstrate that objects can be accurately matched.

2 0.80998057 25 cvpr-2013-A Sentence Is Worth a Thousand Pixels

Author: Sanja Fidler, Abhishek Sharma, Raquel Urtasun

Abstract: We are interested in holistic scene understanding where images are accompanied with text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well as image information as input. We automatically parse the sentences and extract objects and their relationships, and incorporate them into the model, both via potentials as well as by re-ranking candidate detections. We demonstrate the effectiveness of our approach in the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual only model and detection improvements of 5% AP over deformable part-based models [8].

3 0.80067134 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

Author: Xiaolong Wang, Liang Lin, Lichao Huang, Shuicheng Yan

Abstract: This paper proposes a reconfigurable model to recognize and detect multiclass (or multiview) objects with large variation in appearance. Compared with well acknowledged hierarchical models, we study two advanced capabilities in hierarchy for object modeling: (i) “switch” variables (i.e. or-nodes) for specifying alternative compositions, and (ii) making local classifiers (i.e. leaf-nodes) shared among different classes. These capabilities enable us to account well for structural variabilities while preserving the model compact. Our model, in the form of an And-Or Graph, comprises four layers: a batch of leaf-nodes with collaborative edges in bottom for localizing object parts; the or-nodes over bottom to activate their children leaf-nodes; the and-nodes to classify objects as a whole; one root-node on the top for switching multiclass classification, which is also an or-node. For model training, we present an EM-type algorithm, namely dynamical structural optimization (DSO), to iteratively determine the structural configuration, (e.g., leaf-node generation associated with their parent or-nodes and shared across other classes), along with optimizing multi-layer parameters. The proposed method is validated on challenging databases, e.g., PASCAL VOC2007 and UIUCPeople, and it achieves state-of-the-art performance.

4 0.78965747 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection

Author: Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan

Abstract: In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets. Most current object detection methods focus on improving accuracy of large-scale object detection with efficiency being an afterthought. In this paper, we present the Efficient Maximum Appearance Search (EMAS) model which is an order of magnitude faster than the existing state-of-the-art large-scale object detection approaches, while maintaining comparable accuracy. Our EMAS model consists of representing an image as an ensemble of densely sampled feature points with the proposed Pointwise Fisher Vector encoding method, so that the learnt discriminative scoring function can be applied locally. Consequently, the object detection problem is transformed into searching an image sub-area for maximum local appearance probability, thereby making EMAS an order of magnitude faster than the traditional detection methods. In addition, the proposed model is also suitable for incorporating global context at a negligible extra computational cost. EMAS can also incorporate fusion of multiple features, which greatly improves its performance in detecting multiple object categories. Our experiments show that the proposed algorithm can perform detection of 1000 object classes in less than one minute per image on the Image Net ILSVRC2012 dataset and for 107 object classes in less than 5 seconds per image for the SUN09 dataset using a single CPU.

5 0.78828865 417 cvpr-2013-Subcategory-Aware Object Classification

Author: Jian Dong, Wei Xia, Qiang Chen, Jianshi Feng, Zhongyang Huang, Shuicheng Yan

Abstract: In this paper, we introduce a subcategory-aware object classification framework to boost category level object classification performance. Motivated by the observation of considerable intra-class diversities and inter-class ambiguities in many current object classification datasets, we explicitly split data into subcategories by ambiguity guided subcategory mining. We then train an individual model for each subcategory rather than attempt to represent an object category with a monolithic model. More specifically, we build the instance affinity graph by combining both intra-class similarity and inter-class ambiguity. Visual subcategories, which correspond to the dense subgraphs, are detected by the graph shift algorithm and seamlessly integrated into the state-of-the-art detection assisted classification framework. Finally, the responses from subcategory models are aggregated by subcategory-aware kernel regression. The extensive experiments over the PASCAL VOC 2007 and PASCAL VOC 2010 databases show the state-of-the-art performance from our framework.

6 0.77620715 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

7 0.77221143 80 cvpr-2013-Category Modeling from Just a Single Labeling: Use Depth Information to Guide the Learning of 2D Models

8 0.76507545 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

9 0.75956273 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

10 0.75116694 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

11 0.73772597 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

12 0.73554897 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters

13 0.73528564 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations

14 0.73257571 364 cvpr-2013-Robust Object Co-detection

15 0.72295272 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

16 0.71341896 382 cvpr-2013-Scene Text Recognition Using Part-Based Tree-Structured Character Detection

17 0.7079885 325 cvpr-2013-Part Discovery from Partial Correspondence

18 0.7069 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes

19 0.70134038 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

20 0.70100391 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.128), (16, 0.032), (26, 0.043), (28, 0.011), (33, 0.277), (67, 0.078), (69, 0.07), (72, 0.012), (77, 0.012), (84, 0.187), (87, 0.064)]
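The line above is this paper's sparse LDA topic vector in (topicId, topicWeight) form. A minimal sketch of how such vectors could be compared to rank similar papers (an assumption about the site's mechanism; the second vector is invented for illustration):

```python
# Cosine similarity between two sparse {topicId: weight} LDA vectors.
import math

def cosine(u, v):
    """Cosine similarity of two sparse topic-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# This paper's topic weights, copied from the listing above.
paper = dict([(10, 0.128), (16, 0.032), (26, 0.043), (28, 0.011),
              (33, 0.277), (67, 0.078), (69, 0.07), (72, 0.012),
              (77, 0.012), (84, 0.187), (87, 0.064)])
other = {10: 0.2, 33: 0.3, 84: 0.1}  # hypothetical second paper

sim = cosine(paper, other)
print(sim)
```

Ranking every candidate paper by this score against the query paper would yield a simValue list like the one below.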

similar papers list:

simIndex simValue paperId paperTitle

1 0.91094267 51 cvpr-2013-Auxiliary Cuts for General Classes of Higher Order Functionals

Author: Ismail Ben Ayed, Lena Gorelick, Yuri Boykov

Abstract: Several recent studies demonstrated that higher order (non-linear) functionals can yield outstanding performances in the contexts of segmentation, co-segmentation and tracking. In general, higher order functionals result in difficult problems that are not amenable to standard optimizers, and most of the existing works investigated particular forms of such functionals. In this study, we derive general bounds for a broad class of higher order functionals. By introducing auxiliary variables and invoking the Jensen's inequality as well as some convexity arguments, we prove that these bounds are auxiliary functionals for various non-linear terms, which include but are not limited to several affinity measures on the distributions or moments of segment appearance and shape, as well as soft constraints on segment volume. From these general-form bounds, we state various non-linear problems as the optimization of auxiliary functionals by graph cuts. The proposed bound optimizers are derivative-free, and consistently yield very steep functional decreases, thereby converging within a few graph cuts. We report several experiments on color and medical data, along with quantitative comparisons to state-of-the-art methods. The results demonstrate competitive performances of the proposed algorithms in regard to accuracy and convergence speed, and confirm their potential in various vision and medical applications.
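The generic bound step behind the abstract above can be stated compactly (an illustration of the Jensen-based construction, not the paper's own derivation):

```latex
% For a concave function $f$ and weights $w_i \ge 0$ with $\sum_i w_i = 1$,
% Jensen's inequality gives
\[
  f\Bigl(\sum_i w_i x_i\Bigr) \;\ge\; \sum_i w_i\, f(x_i) .
\]
% The right-hand side decomposes over the $x_i$, so its negation is a
% unary (graph-cut-friendly) auxiliary upper bound on the non-linear
% term $-f\bigl(\sum_i w_i x_i\bigr)$; minimizing the bound cannot
% increase the original higher-order functional.
```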

2 0.8898266 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration

Author: Peter Welinder, Max Welling, Pietro Perona

Abstract: How many labeled examples are needed to estimate a classifier’s performance on a new dataset? We study the case where data is plentiful, but labels are expensive. We show that by making a few reasonable assumptions on the structure of the data, it is possible to estimate performance curves, with confidence bounds, using a small number of ground truth labels. Our approach, which we call Semisupervised Performance Evaluation (SPE), is based on a generative model for the classifier’s confidence scores. In addition to estimating the performance of classifiers on new datasets, SPE can be used to recalibrate a classifier by reestimating the class-conditional confidence distributions.

same-paper 3 0.87165684 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings

Author: Guang-Tong Zhou, Tian Lan, Weilong Yang, Greg Mori

Abstract: We conduct image classification by learning a class-to-image distance function that matches objects. The set of objects in training images for an image class are treated as a collage. When presented with a test image, the best matching between this collage of training image objects and those in the test image is found. We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. State-of-the-art image classification results are obtained, and qualitative results demonstrate that objects can be accurately matched.
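A toy sketch of the matching idea in this paper's abstract: objects from a class collage are put in one-to-one correspondence with objects in a test image, and the class-to-image distance is the cost of the best matching. Everything below is a hedged simplification (made-up 2-D "appearance features", squared Euclidean cost, brute-force assignment), not the paper's actual model:

```python
# Minimal class-to-image distance via best one-to-one object matching.
from itertools import permutations

def class_to_image_distance(class_objects, test_objects):
    """Minimum total cost over one-to-one matchings of class objects
    to test-image objects (requires len(test) >= len(class))."""
    def cost(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = float("inf")
    for perm in permutations(range(len(test_objects)), len(class_objects)):
        total = sum(cost(class_objects[i], test_objects[j])
                    for i, j in enumerate(perm))
        best = min(best, total)
    return best

# Hypothetical object features for an "airport" class collage and a test image.
collage = [(0.9, 0.1), (0.2, 0.8)]      # e.g. aeroplane-like, person-like
test_img = [(0.25, 0.75), (1.0, 0.0)]   # candidate objects in the test image
print(class_to_image_distance(collage, test_img))
```

Brute force is exponential in the number of objects; for real object sets a Hungarian-style assignment solver would replace the permutation loop.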

4 0.86372542 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors' ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.

5 0.86088884 325 cvpr-2013-Part Discovery from Partial Correspondence

Author: Subhransu Maji, Gregory Shakhnarovich

Abstract: We study the problem of part discovery when partial correspondence between instances of a category are available. For visual categories that exhibit high diversity in structure such as buildings, our approach can be used to discover parts that are hard to name, but can be easily expressed as a correspondence between pairs of images. Parts naturally emerge from point-wise landmark matches across many instances within a category. We propose a learning framework for automatic discovery of parts in such weakly supervised settings, and show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing.

6 0.85972035 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

7 0.85805929 414 cvpr-2013-Structure Preserving Object Tracking

8 0.8577522 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

9 0.85730922 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

10 0.85710877 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

11 0.85655916 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

12 0.85636461 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection

13 0.85490239 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

14 0.85488325 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning

15 0.85460842 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

16 0.85396016 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

17 0.85378414 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking

18 0.85375339 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

19 0.8534013 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

20 0.85290402 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval