cvpr cvpr2013 cvpr2013-136 knowledge-graph by maker-knowledge-mining

136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection


Source: pdf

Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu

Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly-supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. [sent-3, score-0.318]

2 To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. [sent-4, score-0.913]

3 We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. [sent-5, score-0.52]

4 The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i. [sent-6, score-0.064]

5 , branches of an object Or-node) with the latent structures in AOG being integrated out. [sent-8, score-0.208]

6 , seeking the globally optimal parse trees in AOG for each sub-category). [sent-11, score-0.29]

7 To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. [sent-12, score-0.388]

8 (iii) Joint appearance and structural parameters training under latent structural SVM framework. [sent-13, score-0.344]

9 Motivations, objectives and overview In recent literature of object detection, compositional hierarchy and deformable templates are widely used and have shown improved performance. [sent-18, score-0.215]

10 Most state-of-the-art methods focus on weakly-supervised latent structure learning such as the deformable part-based model (DPM) [10] and the stochastic And-Or templates [18]. [sent-19, score-0.336]

11 By weakly-supervised learning, or learning from weakly annotated data, we mean that only the bounding boxes for whole objects, and no parts, are available in training, e. [sent-20, score-0.252]

12 latent structure learning (top) and corresponding examples of learned car models from PASCAL VOC2007 dataset [7] (bottom). [sent-25, score-0.241]

13 (a) shows the greedy pursuit method used by the deformable part-based model [10, 12], (b) the 3-layer quad-tree like decomposition method used in [27], and (c) the proposed method in this paper. [sent-26, score-0.217]

14 incorporating deformable parts is the major factor improving accuracy performance, however, how to find good part configurations (i. [sent-32, score-0.265]

15 , part shapes, sizes and locations which are latent variables given weakly annotated data) has not been addressed well in the literature. [sent-34, score-0.429]

16 In existing work, there are two types of methods specifying latent part configurations: (i) The greedy pursuit method used by the DPM [10] where, as illustrated in Fig. [sent-35, score-0.422]

17 1 (a), only a single part type (square with predefined size, e. [sent-36, score-0.233]

18 , 6 × 6) is adopted and the part configuration consists [sent-38, score-0.251]

19 of a fixed number (often 8) of part instances placed by heuristic search [9, 12]. [sent-39, score-0.198]

20 (ii) The quad-tree like decomposition adopted in [27] where the part configuration is predefined and the part types are fixed accordingly (see Fig. [sent-40, score-0.494]

21 Beside part configurations, another issue is how to learn sub-categories in an unsupervised manner to account for intra-class variations. [sent-42, score-0.147]

22 Most existing work adopts the k-means clustering method based on aspect ratios of labeled object bounding boxes with k predefined (often k = 3) [10, 12, 27]. [sent-43, score-0.199]

23 In this paper, we address the learning of sub-categories and part configurations from weakly annotated data in a [sent-44, score-0.345]

24 principled way by learning reconfigurable And-Or Tree (AOT) models discriminatively. [sent-47, score-0.125]

25 We unfold the space of all latent part configurations using a directed acyclic AOG (the solid gray triangle), and then seek the globally optimal parse tree in AOG using a DP algorithm. [sent-56, score-1.07]

26 The globally optimal part configuration for a subcategory is obtained by collapsing the corresponding globally optimal parse trees onto image lattice (see the rightmost one). [sent-57, score-0.842]

27 So, in the final model, each subcategory is represented by a root terminal node (red rectangle), and a collection of part terminal nodes (collapsed from the parse tree). [sent-58, score-0.648]

28 1 (c) shows the learned part configuration for a sub-category of cars (side-view). [sent-60, score-0.199]

29 We first quantize the image lattice using an overcomplete set of shape primitives (e. [sent-64, score-0.254]

30 , all rectangles with different sizes and aspect ratios enumerated in a given image lattice, see Fig. [sent-66, score-0.292]

31 3 (a) and (b)), and then organize them into a directed acyclic AOG by exploiting their compositional relations (see Fig. [sent-67, score-0.461]

32 The constructed AOG can generate all possible part configurations. [sent-70, score-0.106]

33 For example, without considering overlaps, the number of part configurations is listed in Table. [sent-71, score-0.236]

34 The number will further increase geometrically if overlaps are considered. [sent-73, score-0.105]

35 Then, the learning of an AOT model consists of three components as follows. [sent-74, score-0.032]

36 We propose a method of measuring the similarity between any two positive examples by integrating out all the latent structures in AOG. [sent-77, score-0.208]

37 We first search the globally optimal parse tree by a DP algorithm, and then obtain the part configuration by collapsing the terminal nodes in the parse tree onto image lattice. [sent-80, score-1.084]

38 The proposed DP algorithm consists of two phases: (1) The bottom-up phase factorizes the scoring function based on the depth-first search (DFS) of AOG. [sent-81, score-0.067]

39 Appearance templates are discriminatively trained for terminal-nodes and their error rates on the validation dataset are calculated. [sent-82, score-0.107]

40 Then, each encountered Or-node selects the child node with the minimal error rate, and encountered And-nodes are treated as local deformable part-based models to calculate their error rates. [sent-83, score-0.404]

41 (2) In the top-down phase, we retrieve the globally optimal parse tree using the error rates of nodes in AOG to guide the breadth-first search (BFS). [sent-84, score-0.472]

42 Given the discovered sub-categories and their corresponding part configurations (i. [sent-86, score-0.198]

43 , an AOT), we train the parameters jointly using latent structural SVM [10, 24]. [sent-88, score-0.225]

44 Many works extended the DPM [10] by incorporating other types of features complementary to HOG, such as local binary pattern (LBP) features [21, 25], irregular-shaped image patches [15] and color attributes [14], which often increase the model complexity significantly. [sent-94, score-0.037]

45 Different types of contextual information are explored, such as the multi-category object layout context [6, 10, 15] and the image classification context [5, 21]. [sent-98, score-0.037]

46 Since introducing deformable parts is the major factor improving performance, some work extends DPM by providing strong supervision for parts, instead of treating them as latent variables. [sent-101, score-0.268]

47 2D semantic part annotations for animals are used in [2], keypoint annotations are used by the poselets [3], and 3D CAD car models are adopted by [16]. [sent-102, score-0.202]

48 Beside the two methods listed above, the geometric And-Or quantization method was proposed for scene modeling recently in [23, 26], where non-overlapped shape primitives and generative learning are used. [sent-105, score-0.139]

49 In this paper, we follow a similar idea of And-Or quantization of the image grid, but incorporate overlapping to account for the appearance “Or” implicitly, and adopt a discriminative learning method. [sent-107, score-0.091]

50 This paper makes four main contributions to the weakly-supervised latent structure learning for object detection. [sent-109, score-0.235]

51 (i) It presents a directed acyclic AOG for exploring the space of latent structures effectively. [sent-110, score-0.57]

52 (ii) It presents an unsupervised method of learning subcategories of an object class. [sent-111, score-0.116]

53 (iii) It presents a DP algorithm to learn the reconfigurable part configuration efficiently. [sent-112, score-0.335]

54 Unfolding the space of latent structures In this section, we present how to construct the directed acyclic AOG to explore the space of latent structures. [sent-115, score-0.692]

55 Let Λ be the image grid with W × H cells, and assume rectangular shapes are used for parts. [sent-116, score-0.048]

56 To decompose Λ, we need to specify (i) what the part types are (i. [sent-117, score-0.214]

57 sizes and aspect ratios), (ii) where to place them, and (iii) how many instances each part type should have. [sent-119, score-0.327]

58 Without posing some structural constraints, it is a combinatorial problem. [sent-120, score-0.095]

59 As stated above, this is simplified by either adopting the greedy pursuit method with a single part type or using some predefined and fixed structure in existing work. [sent-121, score-0.347]

60 A part type t is defined by its width and height (wt, ht). [sent-124, score-0.169]

61 Starting from some minimal size (such as 2 × 2 cells), we enumerate all possible part types which fit the grid Λ, i. [sent-125, score-0.191]

62 , 2 ≤ wt ≤ W and 2 ≤ ht ≤ H (see A, B, C, D in Fig. [sent-127, score-0.137]

63 3 (a)). An instance of a part type t, [sent-130, score-0.169]

64 denoted by ti, is obtained by placing t at a position (xti, yti) ∈ Λ. [sent-131, score-0.254]

65 So, it is defined by a bounding box in Λ, (xti, yti, wt, ht). [sent-132, score-0.181]

66 The set of all valid instances of a part type t is then defined by {(xti, yti, wt, ht) | (xti, yti) ∈ Λ, (xti + wt, yti + ht) ∈ Λ}. [sent-133, score-0.9]

67 , all rectangles with different sizes and aspect ratios enumerated in a given image grid), and (b) part instances generated by placing a part type in image grid. [sent-139, score-0.694]

68 Given the part instances, (c) shows how a sub-grid can be decomposed in different ways. [sent-140, score-0.106]

69 We allow overlap between child nodes (see (3) in (c)). [sent-141, score-0.19]

70 3 (b) shows the example of placing part type D (2 × 5 cells [sent-144, score-0.242]

71 , (2, 3, 5, 2) in the right-top of Fig. [sent-149, score-0.074]

72 3 (c), we can either terminate it directly to the corresponding part instance (see Fig. [sent-150, score-0.106]

73 1)), or decompose it into two smaller subgrids using either horizontal or vertical cut. [sent-152, score-0.036]

74 Depending on the side length of (w, h), we may have multiple valid cuts along both directions (see Fig. [sent-153, score-0.039]

75 When cutting either side we allow overlaps between the two sub-grids up to some ratio (see Fig. [sent-156, score-0.105]

76 Then, we represent the subgrid as an Or-node, which has a set of child nodes including a terminal-node (i. [sent-159, score-0.19]

77 the part instance directly terminated from it), and a number of And-nodes (each of which represents a valid decomposition). [sent-161, score-0.18]

78 By starting from the whole grid Λ and using BFS, we construct the AOG. [sent-163, score-0.048]

79 Denote by G = <V, E> an AOG where the node set V = V. [sent-164, score-0.055]
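
The construction and search described in the sentences above can be illustrated with a small sketch. It is a simplified reading, not the authors' implementation: cells are 0-indexed, overlaps between sub-grids are omitted, and the And-node cost (an area-weighted average of its children's best costs) is a stand-in assumption, whereas the paper treats And-nodes as local deformable part-based models. The names `part_types`, `instances`, `build_aog`, `count_parse_trees` and `best_parse` are hypothetical.

```python
from collections import deque

def part_types(W, H, min_size=2):
    """All part types (w, h) with min_size <= w <= W and min_size <= h <= H."""
    return [(w, h) for w in range(min_size, W + 1) for h in range(min_size, H + 1)]

def instances(W, H, w, h):
    """All placements (x, y, w, h) of a w x h part lying fully inside the grid."""
    return [(x, y, w, h) for x in range(W - w + 1) for y in range(H - h + 1)]

def build_aog(W, H, min_size=2):
    """BFS from the whole grid. Each sub-grid is an Or-node whose children are
    its terminal-node (implicit) and one And-node per valid cut.
    Assumption: no overlap between the two halves of a cut."""
    aog, queue = {}, deque([(0, 0, W, H)])
    while queue:
        node = queue.popleft()
        if node in aog:
            continue
        x, y, w, h = node
        cuts = []
        for k in range(min_size, w - min_size + 1):   # vertical cuts
            cuts.append(((x, y, k, h), (x + k, y, w - k, h)))
        for k in range(min_size, h - min_size + 1):   # horizontal cuts
            cuts.append(((x, y, w, k), (x, y + k, w, h - k)))
        aog[node] = cuts
        for a, b in cuts:
            queue.extend((a, b))
    return aog

def count_parse_trees(aog, node):
    """Number of parse trees rooted at an Or-node. Different trees can collapse
    to the same part configuration, so this is an upper bound on configurations."""
    return 1 + sum(count_parse_trees(aog, a) * count_parse_trees(aog, b)
                   for a, b in aog[node])

def best_parse(aog, root, terminal_cost):
    """Two-phase DP: bottom-up DFS keeps the cheapest child per Or-node,
    top-down BFS collects the terminal nodes of the selected parse tree."""
    best = {}  # Or-node -> (cost, chosen cut, or None if terminated)

    def visit(node):
        if node not in best:
            cost, choice = terminal_cost(node), None
            for a, b in aog[node]:
                area_a, area_b = a[2] * a[3], b[2] * b[3]
                # Assumed And-node cost: area-weighted average of the children.
                c = (visit(a) * area_a + visit(b) * area_b) / (area_a + area_b)
                if c < cost:
                    cost, choice = c, (a, b)
            best[node] = (cost, choice)
        return best[node][0]

    visit(root)
    parts, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        choice = best[node][1]
        if choice is None:
            parts.append(node)
        else:
            queue.extend(choice)
    return best[root][0], parts
```

With a toy cost such as `lambda n: 0.1 if n[2:] == (2, 2) else 0.5` on a 4 × 4 grid, `best_parse` terminates only at the four 2 × 2 sub-grids. In a real setting the terminal costs would come from validation error rates of trained appearance templates, as the summary describes.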


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('aog', 0.582), ('aot', 0.277), ('acyclic', 0.192), ('yti', 0.181), ('parse', 0.165), ('latent', 0.165), ('xti', 0.155), ('directed', 0.127), ('dp', 0.109), ('part', 0.106), ('child', 0.106), ('overlaps', 0.105), ('tree', 0.098), ('lattice', 0.098), ('wt', 0.095), ('bfs', 0.094), ('tianfu', 0.094), ('configuration', 0.093), ('reconfigurable', 0.093), ('configurations', 0.092), ('globally', 0.092), ('terminal', 0.086), ('nodes', 0.084), ('ht', 0.081), ('dpm', 0.079), ('compositional', 0.076), ('ratios', 0.074), ('placing', 0.073), ('enumerated', 0.073), ('weaklysupervised', 0.073), ('pursuit', 0.072), ('templates', 0.072), ('weakly', 0.071), ('primitives', 0.069), ('iii', 0.069), ('deformable', 0.067), ('beside', 0.066), ('subcategory', 0.066), ('organize', 0.066), ('pascal', 0.065), ('predefined', 0.064), ('collapsing', 0.064), ('type', 0.063), ('aspect', 0.061), ('structural', 0.06), ('appearance', 0.059), ('dpms', 0.056), ('node', 0.055), ('instances', 0.054), ('ii', 0.052), ('adopted', 0.052), ('overlapped', 0.049), ('grid', 0.048), ('encountered', 0.046), ('overcomplete', 0.045), ('annotated', 0.044), ('car', 0.044), ('presents', 0.043), ('sizes', 0.043), ('structures', 0.043), ('greedy', 0.042), ('quantize', 0.042), ('tty', 0.042), ('ither', 0.042), ('yofs', 0.042), ('parent', 0.041), ('rectangles', 0.041), ('unsupervised', 0.041), ('implicitly', 0.04), ('valid', 0.039), ('innc', 0.038), ('imsb', 0.038), ('unfolding', 0.038), ('enriching', 0.038), ('plementary', 0.038), ('tributions', 0.038), ('listed', 0.038), ('types', 0.037), ('decomposition', 0.036), ('bthe', 0.036), ('inb', 0.036), ('shoe', 0.036), ('decompose', 0.036), ('introducing', 0.036), ('phase', 0.035), ('discriminatively', 0.035), ('text', 0.035), ('collapsed', 0.035), ('terminated', 0.035), ('wis', 0.035), ('posing', 0.035), ('coe', 0.035), ('vga', 0.033), ('iv', 0.033), ('optimal', 0.033), ('cells', 0.032), ('learning', 0.032), ('factorizes', 0.032), ('tahtieo', 0.032), ('figu', 0.032)]
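
As an illustration of how a similarity value between two papers might be derived from word weights like those above, here is a minimal cosine-similarity sketch over sparse term-weight dictionaries. How this page actually computes its `simValue` column is not stated, so the choice of cosine similarity, the `cosine` helper, and the second vector `other_paper` are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# First few entries of the list above; the remaining terms are omitted here.
this_paper = {'aog': 0.582, 'aot': 0.277, 'acyclic': 0.192, 'parse': 0.165}

# Hypothetical weights for another paper, for illustration only.
other_paper = {'parse': 0.3, 'grammar': 0.4, 'aog': 0.1}

similarity = cosine(this_paper, other_paper)
```

Only terms shared by both papers contribute to the dot product, so papers with no vocabulary in common score 0 and a paper compared with itself scores 1.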

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu

Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly-supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.

2 0.26227254 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu

Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.

3 0.14867966 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

4 0.13611192 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

Author: Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese

Abstract: Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.

5 0.13235041 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

Author: Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu

Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions respectively. We assign the nouns to the nodes (parts) in HST using non-maximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. Then update the HST and attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.

6 0.12698241 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification

7 0.12687792 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

8 0.11286139 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning

9 0.10112836 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

10 0.094894528 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations

11 0.090439335 417 cvpr-2013-Subcategory-Aware Object Classification

12 0.090395704 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

13 0.090298891 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

14 0.090104073 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings

15 0.088527024 325 cvpr-2013-Part Discovery from Partial Correspondence

16 0.0855866 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

17 0.085540585 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

18 0.083929919 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

19 0.077972531 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models

20 0.076609202 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.163), (1, -0.053), (2, 0.009), (3, -0.05), (4, 0.081), (5, 0.031), (6, 0.074), (7, 0.097), (8, -0.019), (9, -0.025), (10, -0.09), (11, 0.041), (12, -0.061), (13, -0.06), (14, -0.015), (15, 0.008), (16, 0.022), (17, 0.023), (18, -0.002), (19, -0.04), (20, -0.031), (21, 0.058), (22, 0.096), (23, -0.044), (24, 0.098), (25, 0.044), (26, 0.079), (27, -0.051), (28, -0.027), (29, 0.138), (30, -0.073), (31, 0.068), (32, 0.111), (33, -0.027), (34, -0.01), (35, -0.052), (36, -0.005), (37, -0.032), (38, 0.029), (39, -0.101), (40, 0.056), (41, 0.063), (42, -0.103), (43, -0.072), (44, -0.028), (45, -0.044), (46, -0.043), (47, -0.031), (48, 0.06), (49, 0.005)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93119788 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu

Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly-supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.

2 0.77975333 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu

Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.

3 0.72777921 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling

Author: Andelo Martinovic, Luc Van_Gool

Abstract: Within the fields of urban reconstruction and city modeling, shape grammars have emerged as a powerful tool for both synthesizing novel designs and reconstructing buildings. Traditionally, a human expert was required to write grammars for specific building styles, which limited the scope of method applicability. We present an approach to automatically learn two-dimensional attributed stochastic context-free grammars (2D-ASCFGs) from a set of labeled building facades. To this end, we use Bayesian Model Merging, a technique originally developed in the field of natural language processing, which we extend to the domain of two-dimensional languages. Given a set of labeled positive examples, we induce a grammar which can be sampled to create novel instances of the same building style. In addition, we demonstrate that our learned grammar can be used for parsing existing facade imagery. Experiments conducted on the dataset of Haussmannian buildings in Paris show that our parsing with learned grammars not only outperforms bottom-up classifiers but is also on par with approaches that use a manually designed style grammar.

4 0.67098176 228 cvpr-2013-Is There a Procedural Logic to Architecture?

Author: Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, Luc Van_Gool

Abstract: Urban models are key to navigation, architecture and entertainment. Apart from visualizing façades, a number of tedious tasks remain largely manual (e.g. compression, generating new façade designs and structurally comparing façades for classification, retrieval and clustering). We propose a novel procedural modelling method to automatically learn a grammar from a set of façades, generate new façade instances and compare façades. To deal with the difficulty of grammatical inference, we reformulate the problem. Instead of inferring a compromising, one-size-fits-all, single grammar for all tasks, we infer a model whose successive refinements are production rules tailored for each task. We demonstrate our automatic rule inference on datasets of two different architectural styles. Our method supersedes manual expert work and cuts the time required to build a procedural model of a façade from several days to a few milliseconds.

5 0.64302176 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

Author: Xiaolong Wang, Liang Lin, Lichao Huang, Shuicheng Yan

Abstract: This paper proposes a reconfigurable model to recognize and detect multiclass (or multiview) objects with large variation in appearance. Compared with well acknowledged hierarchical models, we study two advanced capabilities in hierarchy for object modeling: (i) “switch” variables (i.e. or-nodes) for specifying alternative compositions, and (ii) making local classifiers (i.e. leaf-nodes) shared among different classes. These capabilities enable us to account well for structural variabilities while preserving the model compact. Our model, in the form of an And-Or Graph, comprises four layers: a batch of leaf-nodes with collaborative edges in bottom for localizing object parts; the or-nodes over bottom to activate their children leaf-nodes; the and-nodes to classify objects as a whole; one root-node on the top for switching multiclass classification, which is also an or-node. For model training, we present an EM-type algorithm, namely dynamical structural optimization (DSO), to iteratively determine the structural configuration, (e.g., leaf-node generation associated with their parent or-nodes and shared across other classes), along with optimizing multi-layer parameters. The proposed method is valid on challenging databases, e.g., PASCAL VOC2007 and UIUC People, and it achieves state-of-the-art performance.

6 0.62822533 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models

7 0.61721164 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

8 0.61098546 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

9 0.58681411 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

10 0.58662516 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

11 0.58401424 325 cvpr-2013-Part Discovery from Partial Correspondence

12 0.57612425 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification

13 0.55364597 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

14 0.53345686 190 cvpr-2013-Graph-Based Optimization with Tubularity Markov Tree for 3D Vessel Segmentation

15 0.52765578 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

16 0.49336144 417 cvpr-2013-Subcategory-Aware Object Classification

17 0.49078742 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

18 0.48361924 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

19 0.48294047 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization

20 0.48157877 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.137), (16, 0.012), (26, 0.068), (33, 0.193), (39, 0.302), (67, 0.089), (69, 0.031), (87, 0.09)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.80668384 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models

Author: Yibiao Zhao, Song-Chun Zhu

Abstract: Indoor functional objects exhibit large view and appearance variations, and are thus difficult to recognize with the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. “a chair to sit on”; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance.
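The acceptance test mentioned in step iv) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a symmetric proposal distribution, a scalar energy (negative log-posterior) per parse tree, and a temperature parameter for simulated annealing. The function name and arguments are hypothetical.

```python
import math


def acceptance_probability(energy_old, energy_new, temperature):
    """Metropolis acceptance probability for a symmetric proposal.

    Assumption: lower energy means a better parse tree, and the
    proposal is symmetric, so the Hastings correction ratio is 1.
    """
    if energy_new <= energy_old:
        # A move that does not increase the energy is always accepted.
        return 1.0
    # A worse move is accepted with probability exp(-delta / T).
    return math.exp(-(energy_new - energy_old) / temperature)
```

As the temperature is lowered during annealing, moves that increase the energy are accepted less often, so the search gradually concentrates near a MAP solution.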

same-paper 2 0.79189849 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu

Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly-supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.
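The DP search in component (ii) can be illustrated with a generic max-sum recursion over a directed acyclic And-Or Graph. This is a sketch under stated assumptions, not the paper's algorithm: the toy graph, node names and scores are hypothetical, And-nodes are assumed to sum their children's scores, and Or-nodes are assumed to select their best-scoring child.

```python
from functools import lru_cache

# Hypothetical toy AOG: node name -> (node type, payload).
# Terminal payload is a score; And/Or payload is a list of child names.
AOG = {
    "root": ("or", ["whole", "split"]),
    "whole": ("terminal", 1.0),
    "split": ("and", ["left", "right"]),
    "left": ("or", ["left_a", "left_b"]),
    "left_a": ("terminal", 0.25),
    "left_b": ("terminal", 0.75),
    "right": ("terminal", 0.5),
}


@lru_cache(maxsize=None)
def best_parse(node):
    """Return (score, parse tree) of the best parse rooted at `node`.

    Memoization ensures each shared node of the DAG is solved once.
    """
    kind, payload = AOG[node]
    if kind == "terminal":
        return payload, (node,)
    children = [best_parse(child) for child in payload]
    if kind == "and":
        # An And-node composes all of its children.
        score = sum(s for s, _ in children)
        return score, (node,) + tuple(t for _, t in children)
    # An Or-node switches to its single best child.
    score, tree = max(children, key=lambda st: st[0])
    return score, (node, tree)


score, tree = best_parse("root")
```

In this toy graph the Or-node at the root prefers the two-part configuration over the single whole-object template, because the summed part scores are higher.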

3 0.77070713 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation

Author: Samuele Salti, Alessandro Lanza, Luigi Di Stefano

Abstract: The paper conjectures and demonstrates that repeatable keypoints based on salient symmetries at different scales can be detected by a novel analysis grounded on the wave equation rather than the heat equation underlying traditional Gaussian scale-space theory. While the image structures found by most state-of-the-art detectors, such as blobs and corners, occur typically on planar highly textured surfaces, salient symmetries are widespread in diverse kinds of images, including those related to untextured objects, which are hardly dealt with by current feature-based recognition pipelines. We provide experimental results on standard datasets and also contribute a new dataset focused on untextured objects. Based on the positive experimental results, we hope to foster further research on the promising topic of scale-invariant analysis through the wave equation.

4 0.71824354 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

Author: Byung-soo Kim, Shili Xu, Silvio Savarese

Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D, together with a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.

5 0.68381923 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models

Author: Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic

Abstract: We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameter updates. The experiments, conducted on the Multi-PIE, XM2VTS and LFPW databases, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code and the pretrained models for research purposes.

6 0.67754251 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer

7 0.67462087 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera

8 0.67266202 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

9 0.65709156 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

10 0.652915 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

11 0.64724308 414 cvpr-2013-Structure Preserving Object Tracking

12 0.64562321 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

13 0.64268339 311 cvpr-2013-Occlusion Patterns for Object Class Detection

14 0.64151508 314 cvpr-2013-Online Object Tracking: A Benchmark

15 0.6407389 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems

16 0.64033359 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking

17 0.63963389 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection

18 0.63894629 325 cvpr-2013-Part Discovery from Partial Correspondence

19 0.63846678 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

20 0.63652194 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation