cvpr cvpr2013 cvpr2013-381 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, and are thus difficult to recognize with the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. “a chair to sit on”; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance.
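The search loop named in the abstract, simulated annealing with a Metropolis-Hastings accept/reject test, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the names accept_probability and anneal, the geometric cooling schedule, and the toy integer state are all hypothetical. In the paper the state would be a parse tree and the energy its negative log posterior.

```python
import math
import random


def accept_probability(energy_current, energy_proposed, q_forward, q_reverse, temperature):
    """Metropolis-Hastings acceptance probability for a target p(x) proportional to exp(-E(x)/T).

    q_forward is the probability of proposing the candidate from the current state,
    q_reverse the probability of proposing the current state from the candidate.
    """
    # log of [p(x') / p(x)] * [q(x | x') / q(x' | x)]
    log_ratio = -(energy_proposed - energy_current) / temperature
    log_ratio += math.log(q_reverse) - math.log(q_forward)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def anneal(initial_state, energy, propose, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Return the lowest-energy state visited by an annealed Metropolis-Hastings chain.

    propose(state, rng) must return (candidate, q_forward, q_reverse).
    """
    rng = random.Random(seed)
    state = best = initial_state
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate, q_fwd, q_rev = propose(state, rng)
        p_accept = accept_probability(energy(state), energy(candidate), q_fwd, q_rev, t)
        if rng.random() < p_accept:
            state = candidate
        if energy(state) < energy(best):
            best = state
    return best


if __name__ == "__main__":
    # Toy problem: minimize (x - 3)^2 over the integers with symmetric +/-1 proposals.
    toy_energy = lambda x: (x - 3) ** 2
    toy_propose = lambda s, rng: (s + rng.choice((-1, 1)), 0.5, 0.5)
    print(anneal(0, toy_energy, toy_propose))
```

Because downhill moves are always accepted and uphill moves become rarer as the temperature drops, the chain behaves like a random explorer early on and like a greedy optimizer late, which is the usual reason for pairing annealing with a MAP objective.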
Reference: text
sentIndex sentText sentNum sentScore
1 Indoor functional objects exhibit large view and appearance variations, and are thus difficult to recognize with the traditional appearance-based classification paradigm. [sent-2, score-0.683]
2 In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. [sent-3, score-0.522]
3 “a chair to sit on”; ii) The geometry (3D shape) of an object is designed to serve its function. [sent-5, score-0.176]
4 We formulate the nature of the object function into a stochastic grammar model. [sent-6, score-0.127]
5 The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. [sent-8, score-2.036]
6 The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance. [sent-13, score-0.85]
7 However, the detection of indoor objects and segmentation of indoor scenes are still challenging tasks. [sent-16, score-0.428]
8 Other indoor objects, like the sofa and the dining table, are among [...] Song-Chun Zhu, Department of Statistics and Computer Science, University of California, Los Angeles. [sent-20, score-0.218]
9 The functional objects are defined by the affordance: how likely an object's 3D shape is able to afford a human action. [sent-23, score-0.828]
10 1(b), despite the appearances, people can immediately recognize objects to sit on (chair), to sleep on (bed) and to store in (cabinet) based on their 3D shapes. [sent-32, score-0.187]
11 For example, a cuboid 18 inches tall could be comfortable to sit on as a chair. [sent-33, score-0.189]
12 Moreover, the functional context is helpful to identify objects with similar shapes, such as the chair on the left and the nightstand on the right. [sent-34, score-0.883]
13 Although they are in similar shape, the nightstand is more likely to be placed beside the bed. [sent-35, score-0.172]
14 The bed and the nightstand offer a joint functional group to serve the activity of sleeping. [sent-36, score-0.973]
15 Based on the above observations, we propose an algorithm to tackle the problem of indoor scene parsing by modeling the object function, the 3D geometry and the local appearance (FGA). [sent-37, score-0.324]
16 There has been a recent surge in the detection of rectangular structures, typically modeled by planar surfaces or cuboids, in the indoor environment. [sent-38, score-0.213]
17 [19] adopted different approaches to model the geometric layout of the background and/or foreground blocks with the Structured SVM (or Latent SVM). [sent-43, score-0.19]
18 i) Function: An indoor scene is designed to serve a handful of human activities inside. [sent-56, score-0.229]
19 The indoor objects (furniture) in the scenes are designed to support human poses/actions, e. [sent-57, score-0.245]
20 In the functional space, we model the probabilistic derivation of functional labels including scene categories (bedroom), functional groups (sleeping area), functional objects (bed and nightstand), and functional parts (the mattress and the headboard of a bed). [sent-60, score-3.449]
21 ii) Geometry: The 3D size (dimension) can be sufficient to evaluate how likely an object is able to afford a human action, known as the affordance [7]. [sent-61, score-0.145]
22 a rectangular cabinet, therefore the detection of these objects is tractable by inferring their geometric affordance. [sent-64, score-0.155]
23 For objects like sofas and beds, we use a more fine-grained geometric model with compositional parts, i. [sent-65, score-0.229]
24 For example, the bed with a headboard better explains the image signal as shown at the bottom of Fig. [sent-68, score-0.283]
25 In the geometric space, each 3D shape is directly linked to a functional part in the functional space. [sent-70, score-1.335]
26 The contextual relations are also involved when multiple objects are assigned to a same functional group, e. [sent-71, score-0.713]
27 A bottom-up appearance-geometry (AG) step groups noisy line segments in the A space into 3D primitive shapes, i. [sent-95, score-0.156]
28 A bottom-up geometry-function (GF) step assigns functional labels in the F space to detected 3D primitive shapes, e. [sent-98, score-0.811]
29 A top-down function-geometry (FG) step further fills in the missing objects and the missing parts in the G space according to the assigned functional labels, e. [sent-101, score-0.865]
30 a missing nightstand of a sleeping group, a missing headboard of a bed; iv). [sent-103, score-0.424]
31 A top-down geometry-appearance (GA) step synthesizes 2D segmentation maps in the A space, and makes an accept/reject decision of a current proposal by the Metropolis-Hastings acceptance probability. [sent-104, score-0.127]
32 A collection of indoor functional objects from the Google 3D Warehouse Figure 4. [sent-106, score-0.866]
33 The distribution of the 3D sizes of the functional objects (in unit of inch). [sent-107, score-0.683]
34 A stochastic scene grammar in FGA space. We present a stochastic scene grammar model [26] to compute a parse tree pt for an input image I on the FGA hierarchy. [sent-109, score-0.588]
35 2: The functional space F contains the scene categories Fs, the functional groups Fg, the functional objects Fo, and the functional parts Fp. [sent-111, score-1.425]
36 All the variables in functional space take discrete labels; the geometric space G contains the 3D geometric primitives Gp. [sent-112, score-0.898]
37 The parse tree is an instance of the hierarchy pt ∈ {F, G, A} as illustrated in Fig. [sent-116, score-0.294]
38 (1) We specify a hierarchy of an indoor scene over the functional space F, the geometric space G and the appearance space A sp. [sent-118, score-0.995]
39 The function model P(F) The function model characterizes the prior distribution of the functional labels. [sent-121, score-0.671]
40 Given a functional parse containing the production rules (branching → β) [...] [sent-128, score-0.189]
41 In this paper, we manually designed the grammar structure and learned the parameters of the production rules based on the labels of thousands of images in the SUN dataset [23] under the “bedroom” and the “living room” categories. [sent-134, score-0.232]
42 The geometric model P(G|F) In the geometric space, we model the distribution of 3D size (dimension) for each geometric primitive Gp given its functional labels F, e. [sent-137, score-1.06]
43 the distance distribution between a bed and a nightstand. [sent-144, score-0.213]
44 Suppose we have k primitives in the scene Gp = {vi : i = 1, ..., k}; these geometric shapes form a graph G = (V, E) in the G space, where each primitive is a graph node vi ∈ V, and each contextual relation is a graph edge e ∈ E. [sent-145, score-0.46]
45 The probability measures how likely an object is able to afford the action given its geometry. [sent-160, score-0.128]
46 5ft tall is comfortable to sit on despite its appearance, and a ”table” of 6ft tall loses its original function – to place objects on while sitting in front of. [sent-163, score-0.254]
47 We model the 3D sizes of the functional parts by a mixture of Gaussians. [sent-164, score-0.655]
48 The model characterizes the Gaussian nature of the object sizes and allows the alternatives of canonical sizes at the same time, such as king size bed, full size bed etc. [sent-165, score-0.263]
49 In order to learn a better affordance model, we collected a dataset of functional indoor furniture, as shown in Fig. [sent-169, score-0.903]
50 The functional objects in the dataset are modeled with the real-world measurement, therefore we can generalize our model to the real images by learning from this dataset. [sent-171, score-0.683]
51 As we can see, these functional categories are quite distinguishable solely based on their sizes as shown in Fig. [sent-173, score-0.621]
52 3D compositional models of functional objects and functional groups, ϕ2(ei|Fo) and ϕ3(ei|Fg), are defined by the distributions of the 3D relative relations among the parts of an object Fo or the objects of a functional group Fg. [sent-178, score-2.1]
53 This term enables the top-down prediction of the missing parts or missing objects as we will discuss in Sect. [sent-182, score-0.208]
54 General physical constraints ϕ4 (ei) avoid invalid geometric configurations that violate the physical laws: Two objects can not penetrate each other; the objects must be contained in the room. [sent-184, score-0.217]
55 The model penalizes the penetrating area between foreground objects Λf and the exceeding area beyond the background room borders Λb as 1/z exp{−λ(Λf + Λb)}, where we take λ as a large number, and Λf = Λ(vi) ? [sent-185, score-0.17]
56 In the functional space and the geometric space, we specify how the underlying causes generate a scene image. [sent-192, score-0.76]
57 Put objects back to the 3D world Another important component of our model is the recovery of a real world 3D geometry from the parse tree (Fig. [sent-205, score-0.281]
58 If all the three vanishing points are visible and finite in the same image, the optical center can be estimated as the orthocenter of the triangle formed by the three vanishing points. [sent-212, score-0.138]
59 The compositional structure of the continuous geometric parameters introduces a large solution space, which is infeasible to enumerate all the possible explanations. [sent-229, score-0.131]
60 In each iteration, the algorithm proposes a new parse tree [sent-232, score-0.154]
61 pt∗ based on the current one pt according to the proposal probability. [sent-233, score-0.161]
62 A bottom-up appearance-geometry (AG) step detects possible geometric primitives Gp as bottom-up proposals, i. [sent-235, score-0.184]
63 The proposal probability for a new geometric primitive g∗ is defined as Q1(g∗|IΛ) =? [sent-240, score-0.328]
64 It is worth noting that this proposal probability Q1 is independent of the current parse tree pt. [sent-255, score-0.287]
65 Therefore we can precompute the proposal probability for each possible geometric proposal, which dramatically reduces the computational cost of the chain search. [sent-256, score-0.259]
66 A bottom-up geometry-function (GF) step assigns functional labels given the 3D shapes detected in the G space. [sent-258, score-0.739]
67 The proposal probability of switching a functional label f∗ on the functional tree is defined as Q2(f∗|pa,cl) =? [sent-259, score-1.406]
68 n of f∗, and pa is the parent of f∗ on the current parse tree pt. [sent-261, score-0.2]
69 In this way, the probability describes the compatibility of the functional label f∗ with its parent pa and its children cl on the tree. [sent-262, score-0.761]
70 With the geometric primitives fixed at the bottom, this proposal makes the chain jump in the functional space to find a better functional explanation for these primitives. [sent-263, score-1.517]
71 A top-down function-geometry (FG) step fills in the missing object in a functional group or the missing part in a functional object. [sent-267, score-1.39]
72 For example, once a bed is detected, the algorithm will try to propose nightstands beside it by drawing samples from the geometric prior and the contextual relations. [sent-268, score-0.415]
73 The proposal probability Q3 (g∗ |F) of a new geometric primitive g∗ is defined by Eq. [sent-273, score-0.328]
74 Here, we can see that Q1(I → G) proposes g∗ by the bottom-up image detection, and Q3(F → G) proposes g∗ by the top-down functional prediction. [sent-275, score-0.683]
75 On the other hand, the algorithm also proposes to delete a geometric primitive with uniform probability. [sent-277, score-0.239]
76 Both the add and the delete operations trigger step ii, which reassigns a functional label. [sent-278, score-0.621]
77 Experiments Our algorithm has been evaluated on the UIUC indoor dataset [12], the UCB dataset [4], and the SUN dataset [23]. [sent-289, score-0.183]
78 The UCB dataset contains 340 images and covers four cubic objects (bed, cabinet, table and sofa) and three planar objects (picture, window and door). [sent-290, score-0.19]
79 The UIUC indoor dataset contains 314 cluttered indoor images and the ground-truth is two label maps of the background layout with/without foreground objects. [sent-292, score-0.494]
80 We picked two categories in the SUN dataset: the bedroom with 2119 images and the living room with 2385 images. [sent-293, score-0.196]
81 This dataset contains thousands of object labels and was used to train our functional model as discussed in Sect. [sent-294, score-0.666]
82 Quantitative evaluation: We first compared the confusion matrix of functional object classification rates among the successfully detected objects on the UCB dataset as shown in Fig. [sent-297, score-0.713]
83 This is mainly attributed to our fine-grained part model and functional groups model. [sent-301, score-0.662]
84 It is worth noting that our method reduced the confusion between the bed and the sofa. [sent-302, score-0.213]
85 Because we also introduced the hidden variables of scene categories, which help to distinguish between the bedroom and living room according to the objects inside. [sent-303, score-0.304]
86 The confusion matrix of functional object classification on the UCB dataset. [sent-410, score-0.621]
87 The precision (and recall) of functional object detection on the UCB dataset. [sent-412, score-0.621]
88 The results show the pixel-level segmentation accuracy of proposed algorithms not only significantly widens the scope of indoor scene parsing algorithm from the segmentation and 3D recovery to the functional object recognition, but also yields improved overall performance. [sent-424, score-0.939]
89 The green cuboids are cubic objects proposed by the bottom-up AG step, and the cyan cuboids are the cubic objects proposed by the top-down FG step. [sent-427, score-0.39]
90 The functional labels are given to the right of each image. [sent-429, score-0.666]
91 Our method has detected most of the indoor objects, and recovered their functional labels very well. [sent-430, score-0.879]
92 In the middle bottom image, the algorithm failed to accurately locate the mattress for this bed with a curtain. [sent-434, score-0.259]
93 As a result, the algorithm detected a much larger bed instead. [sent-438, score-0.243]
94 Our approach parses an indoor image by inferring the object function and the 3D geometry. [sent-441, score-0.183]
95 The functionality defines an indoor object by evaluating its “affordance”. [sent-442, score-0.216]
96 As a result, a parsed scene with functional labels defines a human action space, and it also helps to predict people’s behavior by making use of the function cues. [sent-448, score-0.76]
97 On the other hand, given observed action sequence, it is very obvious to accurately recognize the functional objects associated with the rational actions. [sent-449, score-0.731]
98 prediction), planar objects (blue rectangles), background layout (red box). [sent-495, score-0.15]
99 The parse tree is shown to the right of each image. [sent-496, score-0.167]
100 Discriminative learning with latent variables for cluttered indoor scene understanding. [sent-605, score-0.26]
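Sentence 58 in the table above says that, when all three vanishing points are finite and visible, the optical center can be estimated as the orthocenter of the triangle they form. The sketch below illustrates that computation. It assumes square pixels, zero skew, and three mutually orthogonal scene directions; the function names are hypothetical, and the focal-length relation f^2 = -(v1 - p)·(v2 - p) is a standard companion result that the extracted text does not spell out.

```python
import math


def orthocenter(a, b, c):
    """Orthocenter of the triangle with 2D vertices a, b, c.

    Solves the two altitude conditions
        (H - a) . (c - b) = 0   and   (H - b) . (c - a) = 0
    as a 2x2 linear system using Cramer's rule.
    """
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    a11, a12 = cx - bx, cy - by
    a21, a22 = cx - ax, cy - ay
    r1 = a11 * ax + a12 * ay
    r2 = a21 * bx + a22 * by
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("vanishing points are collinear or coincide")
    hx = (r1 * a22 - a12 * r2) / det
    hy = (a11 * r2 - a21 * r1) / det
    return hx, hy


def focal_length(v1, v2, principal_point):
    """Focal length in pixels from two orthogonal vanishing points and the principal point."""
    px, py = principal_point
    dot = (v1[0] - px) * (v2[0] - px) + (v1[1] - py) * (v2[1] - py)
    if dot >= 0.0:
        raise ValueError("vanishing points are inconsistent with orthogonal directions")
    return math.sqrt(-dot)


if __name__ == "__main__":
    # Synthetic vanishing points built from a camera with f = 500, p = (320, 240)
    # and the orthogonal directions (1, 2, 2), (2, 1, -2), (2, -2, 1).
    vps = [(570.0, 740.0), (-180.0, -10.0), (1320.0, -760.0)]
    p = orthocenter(*vps)
    print(p, focal_length(vps[0], vps[1], p))
```

In practice the vanishing points come from noisy line segments, so the recovered center is only as reliable as the vanishing point estimates; when one vanishing point is at infinity the triangle degenerates and this shortcut does not apply.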
wordName wordTfidf (topN-words)
[('functional', 0.621), ('fga', 0.232), ('bed', 0.213), ('indoor', 0.183), ('nightstand', 0.139), ('fg', 0.136), ('ucb', 0.133), ('parse', 0.123), ('pero', 0.12), ('primitive', 0.115), ('sleeping', 0.103), ('affordance', 0.099), ('cuboids', 0.097), ('geometric', 0.093), ('gp', 0.092), ('primitives', 0.091), ('grammar', 0.091), ('del', 0.089), ('proposal', 0.086), ('bedroom', 0.077), ('pt', 0.075), ('cl', 0.073), ('ei', 0.073), ('cabinet', 0.072), ('headboard', 0.07), ('vanishing', 0.069), ('room', 0.069), ('production', 0.066), ('uiuc', 0.064), ('sit', 0.063), ('objects', 0.062), ('sleep', 0.062), ('chair', 0.061), ('fp', 0.06), ('furniture', 0.06), ('fo', 0.058), ('layout', 0.058), ('hedau', 0.057), ('missing', 0.056), ('geometry', 0.052), ('hierarchy', 0.052), ('rectangles', 0.051), ('characterizes', 0.05), ('living', 0.05), ('action', 0.048), ('mattress', 0.046), ('nightstands', 0.046), ('pptt', 0.046), ('widens', 0.046), ('afford', 0.046), ('chain', 0.046), ('scene', 0.046), ('labels', 0.045), ('tree', 0.044), ('tall', 0.044), ('shapes', 0.043), ('parsing', 0.043), ('satkin', 0.042), ('vi', 0.042), ('comfortable', 0.041), ('burges', 0.041), ('contacting', 0.041), ('inch', 0.041), ('pcfg', 0.041), ('groups', 0.041), ('acceptance', 0.041), ('ag', 0.041), ('fs', 0.04), ('foreground', 0.039), ('beds', 0.038), ('mcmc', 0.038), ('compositional', 0.038), ('ga', 0.037), ('cubic', 0.036), ('stochastic', 0.036), ('sofas', 0.036), ('yeh', 0.036), ('fills', 0.036), ('sofa', 0.035), ('embodied', 0.034), ('probability', 0.034), ('parts', 0.034), ('pa', 0.033), ('hejrati', 0.033), ('beside', 0.033), ('functionality', 0.033), ('pepik', 0.032), ('camera', 0.032), ('coffee', 0.031), ('cluttered', 0.031), ('proposes', 0.031), ('detected', 0.03), ('contextual', 0.03), ('planar', 0.03), ('rules', 0.03), ('posteriori', 0.03), ('feet', 0.03), ('pereira', 0.03), ('annealing', 0.03), ('bottou', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, and are thus difficult to recognize with the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. “a chair to sit on”; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance.
2 0.2317937 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
Author: Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese
Abstract: Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.
3 0.21396735 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
Author: Luca Del_Pero, Joshua Bowdish, Bonnie Kermgard, Emily Hartley, Kobus Barnard
Abstract: We develop a comprehensive Bayesian generative model for understanding indoor scenes. While it is common in this domain to approximate objects with 3D bounding boxes, we propose using strong representations with finer granularity. For example, we model a chair as a set of four legs, a seat and a backrest. We find that modeling detailed geometry improves recognition and reconstruction, and enables more refined use of appearance for scene understanding. We demonstrate this with a new likelihood function that rewards 3D object hypotheses whose 2D projection is more uniform in color distribution. Such a measure would be confused by background pixels if we used a bounding box to represent a concave object like a chair. Complex objects are modeled using a set of re-usable 3D parts, and we show that this representation captures much of the variation among object instances with relatively few parameters. We also designed specific data-driven inference mechanisms for each part that are shared by all objects containing that part, which helps make inference transparent to the modeler. Further, we show how to exploit contextual relationships to detect more objects, by, for example, proposing chairs around and underneath tables. We present results showing the benefits of each of these innovations. The performance of our approach often exceeds that of state-of-the-art methods on the two tasks of room layout estimation and object recognition, as evaluated on two benchmark data sets used in this domain. [...] work. 1) Detailed geometric models, such as tables with legs and top (bottom left), provide better reconstructions than plain boxes (top right), when supported by image features such as geometric context [5] (top middle), or an approach to using color introduced here. 2) Non-convex models allow for complex configurations, such as a chair under a table (bottom middle). 3) 3D contextual relationships, such as chairs being around a table, allow identifying objects supported by little image evidence, like the chair behind the table (bottom right). Best viewed in color.
4 0.16858655 51 cvpr-2013-Auxiliary Cuts for General Classes of Higher Order Functionals
Author: Ismail Ben Ayed, Lena Gorelick, Yuri Boykov
Abstract: Several recent studies demonstrated that higher order (non-linear) functionals can yield outstanding performances in the contexts of segmentation, co-segmentation and tracking. In general, higher order functionals result in difficult problems that are not amenable to standard optimizers, and most of the existing works investigated particular forms of such functionals. In this study, we derive general bounds for a broad class of higher order functionals. By introducing auxiliary variables and invoking Jensen's inequality as well as some convexity arguments, we prove that these bounds are auxiliary functionals for various non-linear terms, which include but are not limited to several affinity measures on the distributions or moments of segment appearance and shape, as well as soft constraints on segment volume. From these general-form bounds, we state various non-linear problems as the optimization of auxiliary functionals by graph cuts. The proposed bound optimizers are derivative-free, and consistently yield very steep functional decreases, thereby converging within a few graph cuts. We report several experiments on color and medical data, along with quantitative comparisons to state-of-the-art methods. The results demonstrate competitive performances of the proposed algorithms in regard to accuracy and convergence speed, and confirm their potential in various vision and medical applications.
5 0.16434753 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu
Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.
6 0.1458046 20 cvpr-2013-A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems
7 0.11971512 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
8 0.11789064 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
9 0.11595773 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling
10 0.10854905 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
11 0.10324638 278 cvpr-2013-Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes
12 0.097786941 16 cvpr-2013-A Linear Approach to Matching Cuboids in RGBD Images
13 0.0973536 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
14 0.095120706 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
15 0.085655637 187 cvpr-2013-Geometric Context from Videos
16 0.085381292 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
17 0.077972531 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
18 0.075716823 84 cvpr-2013-Cloud Motion as a Calibration Cue
19 0.071486801 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
20 0.069490016 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
topicId topicWeight
[(0, 0.183), (1, 0.027), (2, 0.027), (3, -0.052), (4, 0.034), (5, -0.009), (6, 0.01), (7, 0.102), (8, -0.031), (9, -0.005), (10, 0.012), (11, 0.018), (12, -0.026), (13, -0.024), (14, -0.04), (15, -0.027), (16, 0.068), (17, 0.137), (18, -0.029), (19, 0.033), (20, -0.021), (21, 0.085), (22, 0.093), (23, -0.031), (24, 0.082), (25, -0.022), (26, 0.1), (27, -0.106), (28, -0.072), (29, 0.075), (30, -0.127), (31, 0.103), (32, 0.044), (33, 0.059), (34, -0.129), (35, -0.093), (36, -0.048), (37, 0.096), (38, 0.025), (39, -0.162), (40, 0.018), (41, -0.002), (42, 0.075), (43, -0.139), (44, 0.043), (45, 0.086), (46, 0.082), (47, 0.046), (48, 0.089), (49, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.92097616 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, and are thus difficult to recognize with the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. “a chair to sit on”; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance.
2 0.73216146 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling
Author: Andelo Martinovic, Luc Van_Gool
Abstract: Within the fields of urban reconstruction and city modeling, shape grammars have emerged as a powerful tool for both synthesizing novel designs and reconstructing buildings. Traditionally, a human expert was required to write grammars for specific building styles, which limited the scope of method applicability. We present an approach to automatically learn two-dimensional attributed stochastic context-free grammars (2D-ASCFGs) from a set of labeled building facades. To this end, we use Bayesian Model Merging, a technique originally developed in the field of natural language processing, which we extend to the domain of two-dimensional languages. Given a set of labeled positive examples, we induce a grammar which can be sampled to create novel instances of the same building style. In addition, we demonstrate that our learned grammar can be used for parsing existing facade imagery. Experiments conducted on the dataset of Haussmannian buildings in Paris show that our parsing with learned grammars not only outperforms bottom-up classifiers but is also on par with approaches that use a manually designed style grammar.
3 0.7216059 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
Author: Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese
Abstract: Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.
4 0.69062364 228 cvpr-2013-Is There a Procedural Logic to Architecture?
Author: Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, Luc Van Gool
Abstract: Urban models are key to navigation, architecture and entertainment. Apart from visualizing façades, a number of tedious tasks remain largely manual (e.g. compression, generating new façade designs and structurally comparing façades for classification, retrieval and clustering). We propose a novel procedural modelling method to automatically learn a grammar from a set of façades, generate new façade instances and compare façades. To deal with the difficulty of grammatical inference, we reformulate the problem. Instead of inferring a compromising, one-size-fits-all, single grammar for all tasks, we infer a model whose successive refinements are production rules tailored for each task. We demonstrate our automatic rule inference on datasets of two different architectural styles. Our method supersedes manual expert work and cuts the time required to build a procedural model of a façade from several days to a few milliseconds.
5 0.64717525 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
Author: Luca Del Pero, Joshua Bowdish, Bonnie Kermgard, Emily Hartley, Kobus Barnard
Abstract: We develop a comprehensive Bayesian generative model for understanding indoor scenes. While it is common in this domain to approximate objects with 3D bounding boxes, we propose using strong representations with finer granularity. For example, we model a chair as a set of four legs, a seat and a backrest. We find that modeling detailed geometry improves recognition and reconstruction, and enables more refined use of appearance for scene understanding. We demonstrate this with a new likelihood function that rewards 3D object hypotheses whose 2D projection is more uniform in color distribution. Such a measure would be confused by background pixels if we used a bounding box to represent a concave object like a chair. Complex objects are modeled using a set of re-usable 3D parts, and we show that this representation captures much of the variation among object instances with relatively few parameters. We also designed specific data-driven inference mechanisms for each part that are shared by all objects containing that part, which helps make inference transparent to the modeler. Further, we show how to exploit contextual relationships to detect more objects, by, for example, proposing chairs around and underneath tables. We present results showing the benefits of each of these innovations. The performance of our approach often exceeds that of state-of-the-art methods on the two tasks of room layout estimation and object recognition, as evaluated on two benchmark data sets used in this domain.
Figure caption (beginning truncated in source): 1) Detailed geometric models, such as tables with legs and top (bottom left), provide better reconstructions than plain boxes (top right), when supported by image features such as geometric context [5] (top middle), or an approach to using color introduced here. 2) Non-convex models allow for complex configurations, such as a chair under a table (bottom middle). 3) 3D contextual relationships, such as chairs being around a table, allow identifying objects supported by little image evidence, like the chair behind the table (bottom right). Best viewed in color.
6 0.63864392 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
7 0.60207129 16 cvpr-2013-A Linear Approach to Matching Cuboids in RGBD Images
8 0.59716129 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
9 0.57003963 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
10 0.55530655 278 cvpr-2013-Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes
11 0.49843854 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
12 0.47485369 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
13 0.44742742 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
14 0.43030712 20 cvpr-2013-A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems
15 0.42871404 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
16 0.40536752 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
17 0.39857432 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
18 0.38789839 440 cvpr-2013-Tracking People and Their Objects
19 0.38627651 406 cvpr-2013-Spatial Inference Machines
20 0.37763697 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
topicId topicWeight
[(10, 0.107), (16, 0.028), (26, 0.04), (33, 0.183), (39, 0.301), (55, 0.024), (65, 0.01), (67, 0.066), (69, 0.075), (87, 0.066)]
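A minimal sketch, not taken from the page: the topicId/topicWeight row above can be read as a sparse topic vector, and one common way to compare two such vectors is cosine similarity. The function names below are hypothetical, and the choice of cosine similarity is an assumption; the page does not state how its simValue scores were computed.

```python
import ast
import math


def parse_topic_vector(text):
    """Parse a '[(topicId, weight), ...]' string into a {topicId: weight} dict."""
    return {int(topic): float(weight) for topic, weight in ast.literal_eval(text)}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts.

    Topics missing from a vector are treated as weight 0.
    """
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


# The topic row listed above for paper 381.
row = ("[(10, 0.107), (16, 0.028), (26, 0.04), (33, 0.183), (39, 0.301), "
       "(55, 0.024), (65, 0.01), (67, 0.066), (69, 0.075), (87, 0.066)]")
paper_381 = parse_topic_vector(row)
```

Under this assumed measure a vector compared with itself scores 1, and two papers with no topics in common score 0.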
simIndex simValue paperId paperTitle
same-paper 1 0.79559845 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, thus are difficult to be recognized by the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. “a chair to sit on”; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate the proposed approach not only significantly widens the scope of indoor scene parsing algorithm from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.
2 0.75661713 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu
Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly-supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.
3 0.75316703 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation
Author: Samuele Salti, Alessandro Lanza, Luigi Di Stefano
Abstract: The paper conjectures and demonstrates that repeatable keypoints based on salient symmetries at different scales can be detected by a novel analysis grounded on the wave equation rather than the heat equation underlying traditional Gaussian scale–space theory. While the image structures found by most state-of-the-art detectors, such as blobs and corners, occur typically on planar highly textured surfaces, salient symmetries are widespread in diverse kinds of images, including those related to untextured objects, which are hardly dealt with by current feature-based recognition pipelines. We provide experimental results on standard datasets and also contribute with a new dataset focused on untextured objects. Based on the positive experimental results, we hope to foster further research on the promising topic of scale invariant analysis through the wave equation.
4 0.69470727 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
Author: Byung-soo Kim, Shili Xu, Silvio Savarese
Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D as well as the definition of a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.
5 0.66575384 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models
Author: Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic
Abstract: We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameters updates. The experiments, conducted on Multi-PIE, XM2VTS and LFPW database, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code and the pretrained models for research purposes.
6 0.65309179 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer
7 0.65016764 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera
8 0.63810098 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
9 0.63344574 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
10 0.63241321 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
11 0.62720996 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
12 0.61593455 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
13 0.61290359 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
14 0.60701573 414 cvpr-2013-Structure Preserving Object Tracking
15 0.60691452 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
16 0.60540068 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
17 0.6045171 172 cvpr-2013-Finding Group Interactions in Social Clutter
18 0.60438323 325 cvpr-2013-Part Discovery from Partial Correspondence
19 0.6042037 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
20 0.60358208 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
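
The abstract of paper 381 (Scene Parsing by Integrating Function, Geometry and Appearance Models) listed above mentions a simulated annealing MCMC search verified by the Metropolis-Hastings acceptance probability. The sketch below illustrates that general technique on a toy one-dimensional problem. It is not the authors' implementation: the energy function, the proposal move, the geometric cooling schedule and all names are assumptions made for illustration, and the paper's parse-tree state space and data-driven proposals are not modeled.

```python
import math
import random


def accept_probability(energy_current, energy_proposed, temperature,
                       q_forward=1.0, q_reverse=1.0):
    """Metropolis-Hastings acceptance probability for a target ~ exp(-E / T).

    q_forward is the probability of proposing the move, q_reverse the
    probability of proposing the reverse move; both are 1.0 for a
    symmetric proposal.
    """
    log_ratio = ((energy_current - energy_proposed) / temperature
                 + math.log(q_reverse) - math.log(q_forward))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def anneal(initial_state, energy, propose, steps=2000,
           t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    state = best = initial_state
    e_state = e_best = energy(state)
    for k in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        candidate = propose(state, rng)
        e_candidate = energy(candidate)
        if rng.random() < accept_probability(e_state, e_candidate, t):
            state, e_state = candidate, e_candidate
            if e_state < e_best:
                best, e_best = state, e_state
    return best, e_best


# Toy problem: minimise (x - 3)^2 over the integers with +/-1 moves.
best, e_best = anneal(
    initial_state=0,
    energy=lambda x: (x - 3) ** 2,
    propose=lambda x, rng: x + rng.choice((-1, 1)),
)
```

Minimising the energy E corresponds to maximising a posterior proportional to exp(-E), which is how an annealed sampler of this kind can be used to search for a MAP solution.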