cvpr cvpr2013 cvpr2013-57 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andelo Martinovic, Luc Van_Gool
Abstract: Within the fields of urban reconstruction and city modeling, shape grammars have emerged as a powerful tool for both synthesizing novel designs and reconstructing buildings. Traditionally, a human expert was required to write grammars for specific building styles, which limited the scope of method applicability. We present an approach to automatically learn two-dimensional attributed stochastic context-free grammars (2D-ASCFGs) from a set of labeled building facades. To this end, we use Bayesian Model Merging, a technique originally developed in the field of natural language processing, which we extend to the domain of two-dimensional languages. Given a set of labeled positive examples, we induce a grammar which can be sampled to create novel instances of the same building style. In addition, we demonstrate that our learned grammar can be used for parsing existing facade imagery. Experiments conducted on the dataset of Haussmannian buildings in Paris show that our parsing with learned grammars not only outperforms bottom-up classifiers but is also on par with approaches that use a manually designed style grammar.
Reference: text
sentIndex sentText sentNum sentScore
1 Bayesian Grammar Learning for Inverse Procedural Modeling Anđelo Martinović and Luc Van Gool Abstract Within the fields of urban reconstruction and city modeling, shape grammars have emerged as a powerful tool for both synthesizing novel designs and reconstructing buildings. [sent-1, score-0.463]
2 Traditionally, a human expert was required to write grammars for specific building styles, which limited the scope of method applicability. [sent-2, score-0.389]
3 We present an approach to automatically learn two-dimensional attributed stochastic context-free grammars (2D-ASCFGs) from a set of labeled building facades. [sent-3, score-0.399]
4 Given a set of labeled positive examples, we induce a grammar which can be sampled to create novel instances of the same building style. [sent-5, score-0.734]
5 In addition, we demonstrate that our learned grammar can be used for parsing existing facade imagery. [sent-6, score-0.962]
6 Experiments conducted on the dataset of Haussmannian buildings in Paris show that our parsing with learned grammars not only outperforms bottom-up classifiers but is also on par with approaches that use a manually designed style grammar. [sent-7, score-0.607]
7 In urban procedural modeling, the knowledge of the building style and layout is most commonly encoded as a shape grammar [17]. [sent-17, score-1.043]
8 Some approaches have used shape grammars as higher-order knowledge models for reconstruction of buildings. [sent-21, score-0.386]
9 Inverse procedural modeling (IPM) is an umbrella term for approaches that attempt to discover the parametrized rules and the parameter values of the procedural model. [sent-22, score-0.545]
10 [24] used a simple grammar for buildings that follow the Manhattan world assumption. [sent-26, score-0.729]
11 A grammar was fitted from laser-scan data in [23]. [sent-27, score-0.678]
12 [10] reconstructed Greek Doric temples using template procedural models. [sent-29, score-0.232]
13 An approach using reversible jump Markov Chain Monte Carlo (rjMCMC) for fitting split grammars to data was described in [16]. [sent-30, score-0.502]
14 [21] presented an efficient parsing scheme for Haussmannian shape grammars using Reinforcement Learning. [sent-32, score-0.542]
15 They assume that a manually designed grammar is available from the outset. [sent-34, score-0.678]
16 This is a serious constraint, as it limits the reconstruction techniques to a handful of building styles for which pre-written grammars exist. [sent-35, score-0.389]
17 Creating style-specific grammars is a tedious and time-consuming process, which is usually performed only by a few experts in the field. [sent-36, score-0.36]
18 So, a natural question arises: can we learn procedural grammars from data? [sent-37, score-0.592]
19 [2] learned deterministic shape grammar rules from triangle meshes and point clouds. [sent-43, score-0.818]
20 Attribute graph grammars [5] were presented as a method of top-down/bottom-up image parsing, though restricting the detected objects in scenes to rectangles. [sent-44, score-0.36]
21 In the field of formal grammar learning, a famous conclusion of Gold [3] states that no superfinite family of deterministic languages (including regular and context-free languages) can be identified in the limit. [sent-45, score-0.781]
22 However, Horning [6] showed that the picture is not so grim for statistical grammar learning, and demonstrated that stochastic context-free grammars (SCFGs) can be learned from positive examples. [sent-46, score-1.077]
23 Currently, one of the popular methods for learning SCFGs from data is Bayesian Model Merging [18], which makes the grammar induction problem tractable by introducing a Minimum Description Length (MDL) prior on the grammar structure. [sent-47, score-1.416]
24 Our Approach Inspired by recent successes of Bayesian Model Merging outside computer vision, we propose a novel approach of inducing procedural models, particularly split grammars, from a set of labeled images. [sent-50, score-0.294]
25 In the first step we create a stochastic grammar which generates only the input examples with equal probabilities. [sent-55, score-0.744]
26 However, we want to find a grammar that can also generalize to create novel designs. [sent-56, score-0.705]
27 We formulate this problem as a search in the space of grammars, where the quality of a grammar is defined by its posterior probability given the data. [sent-57, score-0.735]
28 5, this requires an optimal trade-off between the grammar description length (smaller grammars are preferred) and the likelihood of the input data. [sent-59, score-1.038]
29 Previous work has shown that image parsing with a known set of grammar rules is a difficult problem by itself [15, 22]. [sent-61, score-0.915]
30 On the other hand, our grammar search procedure typically needs to evaluate a huge number of candidate grammars. [sent-62, score-0.75]
31 This means that we have to parse the input examples in a very short time, lest the grammar search last indefinitely. [sent-63, score-0.767]
32 Different authors have tackled this curse of dimensionality during parsing in different ways: assuming that the building exhibits a highly regular structure [12], using approximate inference such as MCMC [16], or exploiting grammar factorization [22]. [sent-64, score-0.863]
33 Following their example, we transform all of our input images into irregular lattices, casting our grammar search procedure into a lower dimensional space. [sent-68, score-0.748]
34 In this space we use our own, modified version of the Earley-Stolcke parser [18], a technique from natural language processing adapted to parse 2D lattices instead of 1D strings. [sent-69, score-0.224]
35 This dimensionality reduction enables the grammar search procedure to run within a reasonable time. [sent-70, score-0.725]
36 Finally, in order to perform image parsing, the induced grammar is cast into the original space. [sent-71, score-0.715]
37 This stochastic, parameterized grammar can either be used as a graphics tool for sampling building designs, or as a vision tool to alleviate image parsing of actual buildings. [sent-72, score-0.863]
38 Our contributions are: (1) A novel approach for inducing procedural split grammars from data. [sent-73, score-0.654]
39 To the best of our knowledge, we are the first to present a principled approach for learning probabilistic two-dimensional split grammars from labeled images. [sent-74, score-0.392]
40 (4) An experimental evaluation suggesting that learned grammars can be as effective as human-written grammars for the task of facade parsing. [sent-77, score-0.848]
41 Starting from the axiom, production rules subdivide the starting shape either in horizontal or vertical directions. [sent-81, score-0.295]
42 These productions correspond to standard horizontal and vertical split operators in split grammars. [sent-84, score-0.268]
43 For the grammar to be well-formed, the productions with X as LHS must satisfy the condition $\sum_{r:\,\mathrm{LHS}(r)=X} P(r) = 1$. [sent-87, score-0.86]
44 The parse tree is obtained by applying a sequence of rules on the axiom and non-terminal nodes. [sent-103, score-0.271]
45 A derivation from the grammar consists of the parse tree and the selected attributes at each node: δ = (τ, α). [sent-104, score-0.904]
46 We define the likelihood of the grammar G generating a lattice l as L(l |G) = ? [sent-112, score-0.735]
47 Bayesian Model Merging To cast our grammar learning as an instance of Bayesian Model Merging, we need to define several methods: • Data incorporation: given a body of data, build an initial grammar which generates only the input examples. [sent-116, score-1.356]
48 • Model merging: propose a candidate grammar by altering the structure of the currently best grammar. [sent-117, score-0.703]
49 • Model evaluation: evaluate the fitness of the candidate grammar compared to the currently best grammar. [sent-118, score-0.703]
50 • Search: use model merging to explore the grammar space, searching for the optimal grammar. [sent-119, score-1.438]
51 Data Incorporation We start with a set of nf facade images, with each pixel labeled as one of the nl terminal classes (window, wall, balcony, etc. [sent-121, score-0.234]
52 1, grammar induction would be infeasible in the image space due to the curse of dimensionality. [sent-123, score-0.715]
53 For each lattice in the input set, we create an instance-specific split grammar, with terminal symbols corresponding to image labels. [sent-126, score-0.325]
54 Non-terminal productions are created by alternately splitting the image in horizontal and vertical directions, starting with the latter. [sent-127, score-0.235]
55 All production probabilities are set to 1; all attributes are initialized to the relative sizes of right-hand side elements. [sent-128, score-0.218]
56 For example, the first production splits the axiom into horizontal regions represented by newly instantiated non-terminals and parametrized by their height: S → Xi . [sent-129, score-0.241]
57 productions with a single terminal on the RHS: X → label, p = 1, A = {{1}}. [sent-138, score-0.288]
58 Lexical productions remain deterministic, as they only label the entire shape of the parent with the given terminal class. [sent-139, score-0.314]
59 Now we have a set of deterministic grammars Gi, each producing exactly one input lattice. [sent-140, score-0.393]
60 The next step is to merge them into a single grammar by setting all of their axioms to the same symbol and aggregating all symbols and productions: G0 = (∪Ni, ∪Ti, S, ∪Ri, ∪Pi, ∪Ai). [sent-141, score-0.824]
61 The probabilities of the rules starting from the axiom are changed to 1/nf, which means that the grammar G0 generates each of the input examples with the same probability. [sent-142, score-0.753]
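The data incorporation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Production` record, the `incorporate` helper and the toy symbol names are hypothetical, and the attribute sets are left out for brevity.

```python
from collections import namedtuple

# Hypothetical production record: left-hand side, right-hand side symbols, probability.
Production = namedtuple("Production", ["lhs", "rhs", "prob"])

def incorporate(instance_grammars, axiom="S"):
    """Merge per-example deterministic grammars into one initial grammar G0.

    Each instance grammar is a list of Productions that already shares the
    axiom name `axiom`; all of its rules carry probability 1.
    """
    n_f = len(instance_grammars)
    merged = []
    for rules in instance_grammars:
        for r in rules:
            # Rules leaving the shared axiom get probability 1/n_f, so that
            # every input example is generated with the same probability.
            prob = 1.0 / n_f if r.lhs == axiom else r.prob
            merged.append(Production(r.lhs, r.rhs, prob))
    return merged

# Two toy instance grammars; non-terminal names are unique per example.
g1 = [Production("S", ("A1", "B1"), 1.0),
      Production("A1", ("wall",), 1.0),
      Production("B1", ("window",), 1.0)]
g2 = [Production("S", ("A2",), 1.0),
      Production("A2", ("door",), 1.0)]
g0 = incorporate([g1, g2])
```

With two input examples, both axiom rules in `g0` end up with probability 1/2, while all other rules stay deterministic.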
62 Merging A new grammar is proposed by selecting two nonterminals X1 and X2 from the current grammar and replacing them with a new non-terminal Y . [sent-145, score-1.387]
63 ing grammar two previously different symbols may be used interchangeably, although with different probabilities. [sent-159, score-0.781]
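A merge of two non-terminals can be illustrated as follows. This is a hedged sketch rather than the paper's procedure: it assumes that productions are stored with usage counts and that probabilities are re-estimated by normalizing those counts per left-hand side, and the helper names `merge_nonterminals` and `rule_probabilities` are hypothetical.

```python
from collections import defaultdict

def merge_nonterminals(rules, x1, x2, y):
    """Replace non-terminals x1 and x2 by a new symbol y.

    `rules` maps (lhs, rhs_tuple) -> usage count (an assumed representation).
    Productions that become identical after renaming are pooled.
    """
    def rename(symbol):
        return y if symbol in (x1, x2) else symbol

    merged = defaultdict(float)
    for (lhs, rhs), count in rules.items():
        key = (rename(lhs), tuple(rename(s) for s in rhs))
        merged[key] += count
    return dict(merged)

def rule_probabilities(rules):
    """Normalize counts so that the rules of each LHS sum to one."""
    totals = defaultdict(float)
    for (lhs, _), count in rules.items():
        totals[lhs] += count
    return {key: count / totals[key[0]] for key, count in rules.items()}

rules = {
    ("S", ("X1", "W")): 1,
    ("S", ("X2", "W")): 1,
    ("X1", ("window",)): 1,
    ("X2", ("door",)): 1,
    ("W", ("wall",)): 2,
}
merged = merge_nonterminals(rules, "X1", "X2", "Y")
probs = rule_probabilities(merged)
```

After the merge, the two axiom rules collapse into a single rule S → Y W, and Y can expand to either `window` or `door`, which is the generalization effect the text describes.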
64 Evaluating Candidate Grammars Our goal is to find the grammar model G that yields the best trade-off between the fit to the input data D and a general preference for simpler models. [sent-167, score-0.678]
65 From a Bayesian perspective, we want to maximize the posterior P(G|D), which is proportional to the product of the grammar prior P(G) and a likelihood term P(D|G). [sent-168, score-0.732]
66 We can decompose the grammar model into a structure part GS (representing grammar symbols and rules) and the parameter part θg (rule probabilities): G = (GS, θg). [sent-169, score-1.459]
67 To define the prior over the grammar structure we follow a Minimum Description Length (MDL) principle. [sent-171, score-0.701]
68 2D Earley Parsing In order to find the Viterbi derivations of each input lattice in the E-step, we use a modified version of the Earley-Stolcke parser [18], which we extended from parsing strings to parsing 2D lattices. [sent-188, score-0.48]
69 Using Earley’s parser instead of more common CKY parsing [28] has a number of advantages. [sent-191, score-0.237]
70 Its worst-case complexity is cubic in the size of the input, but it can perform substantially better for many well-known grammar classes. [sent-192, score-0.678]
71 This sets us apart from previous work, which requires either that the grammar be in Chomsky Normal Form [21] or that the rules satisfy the optimal substructure property [15]. [sent-194, score-0.759]
72 The influence of global prior weight w on induced grammar size is shown in Table 1. [sent-200, score-0.738]
73 Starting from the initial grammar, we follow a greedy best-first approach: in each iteration, every pair of nonterminals is considered for merging, and all of the candidate grammars are evaluated. [sent-201, score-0.416]
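The greedy best-first loop can be sketched generically. The code below is a minimal illustration and not the authors' search: the stopping rule (stop when no candidate improves the score) is an assumption, and the callbacks `nonterminals_of`, `merge` and `score` as well as the toy grammar are hypothetical stand-ins for the real merge operator and posterior.

```python
from itertools import combinations

def greedy_merge_search(grammar, nonterminals_of, merge, score):
    """Greedy best-first search over grammars.

    In each iteration every pair of non-terminals is merged, all candidates
    are scored, and the best one is kept if it improves on the current score.
    """
    best, best_score = grammar, score(grammar)
    while True:
        pairs = combinations(sorted(nonterminals_of(best)), 2)
        candidates = [merge(best, a, b) for a, b in pairs]
        if not candidates:
            return best
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best_score:  # assumed stopping rule
            return best
        best, best_score = top, top_score

# Toy stand-ins: a "grammar" is a frozenset of symbol names, merging two
# symbols concatenates their names, and the score prefers fewer symbols but
# heavily penalizes symbols that mix window ("w") and door ("d") parts.
def toy_symbols(g):
    return g

def toy_merge(g, a, b):
    return (g - {a, b}) | {a + b}

def toy_score(g):
    mixed = sum(1 for s in g if "w" in s and "d" in s)
    return -len(g) - 10 * mixed

result = greedy_merge_search(frozenset({"w1", "w2", "d1"}),
                             toy_symbols, toy_merge, toy_score)
```

In the toy run the two window symbols are merged, while merging the door symbol into them is rejected because it lowers the score. This mirrors the trade-off between a smaller grammar and the fit to the data.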
74 Of course, one may imagine more intricate ways of searching through the grammar space, e. [sent-206, score-0.678]
75 Final Model The grammar resulting from the search procedure is still limited to the lattice space. [sent-212, score-0.782]
76 To cast the grammar back in the image space, we perform two post-processing steps. [sent-213, score-0.678]
77 First, we collapse sequences of the same non-terminal symbol in a production to a single symbol with correspondingly modified attributes, for example: X → λY Y μ Collapse X → λY μ ? [sent-214, score-0.242]
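The collapse step might look like the sketch below. It rests on an assumption that the source does not spell out here: attributes are treated as relative sizes, and the "correspondingly modified" attribute of a collapsed run is taken to be the sum of the sizes in that run. The function name `collapse_production` is hypothetical.

```python
from itertools import groupby

def collapse_production(rhs, sizes):
    """Collapse runs of the same symbol on a production's right-hand side.

    `rhs` is a list of symbols and `sizes` the matching relative sizes.
    Each run of identical symbols becomes one symbol whose size is the
    sum of the run (an assumed reading of the attribute update).
    """
    collapsed_rhs, collapsed_sizes = [], []
    for symbol, run in groupby(zip(rhs, sizes), key=lambda pair: pair[0]):
        collapsed_rhs.append(symbol)
        collapsed_sizes.append(sum(size for _, size in run))
    return collapsed_rhs, collapsed_sizes

new_rhs, new_sizes = collapse_production(
    ["A", "Y", "Y", "B"], [0.25, 0.25, 0.25, 0.25])
```

Only adjacent repetitions are merged, so a right-hand side such as `Y A Y` would be left unchanged.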
78 Table 1: Size comparison: initial grammar created by data incorporation, and two inferred grammars with prior weights of w = 0. [sent-233, score-1.739]
79 Parsing in Image Space The grammar induced in the previous section is now amenable to image-scale parsing. [sent-241, score-0.715]
80 However, their method requires that only terminal symbols of the grammar contain descriptive continuous parameters. [sent-248, score-0.887]
81 Grammar Parsing via rjMCMC For a given test image, our task is to find the derivation from the grammar that has the best fit to the image. [sent-253, score-0.759]
82 factorized the prior into a rule and attribute term over all non-terminal nodes s of the derivation tree. [sent-280, score-0.235]
83 The rule term is calculated by summing up the negative log probabilities of all rules rs selected in the derivation. [sent-290, score-0.285]
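As a small worked example of the rule term, the snippet below sums the negative log probabilities of the rules used in a derivation. The function name `rule_term` and the sample probabilities are illustrative assumptions; the attribute term is not modelled.

```python
import math

def rule_term(rule_probs):
    """Sum of negative log probabilities of the rules selected in a derivation."""
    return -sum(math.log(p) for p in rule_probs)

# A derivation that uses one rule with probability 0.5 and one with 0.25.
energy = rule_term([0.5, 0.25])
```

The value equals the negative log of the product of the rule probabilities, so derivations built from likely rules receive a low rule term.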
84 In the jump move, again a random node h is selected in the derivation tree, and a new rule is sampled from all rules applicable to the current LHS. [sent-316, score-0.354]
85 Results In all grammar learning experiments, the training set was limited to 30 images to keep the induction time within reasonable bounds. [sent-346, score-0.715]
86 Parsing Existing Facades To show that our grammar learning is usable on real-world examples, we use the well-established Ecole Centrale Paris (ECP) facade parsing dataset [13], which contains 104 images of Haussmannian-style buildings. [sent-354, score-0.962]
87 The results that we obtain show that learned grammars can be just as effective in facade parsing as their manually written counterparts, even outperforming them in some cases. [sent-362, score-0.644]
88 To put the results in context, we also show the performance of the state of the art (SOA) method in facade parsing [8]. [sent-363, score-0.284]
89 A promising direction for future work would be to learn grammars from the output of methods such as [8], eliminating the need for labeled ground truth images. [sent-365, score-0.36]
90 Generating Novel Designs The advantage of having a grammar for a certain style of buildings is that we can easily sample new designs from it. [sent-368, score-0.808]
91 In this scenario, we generate a random derivation from the grammar by starting from the grammar axiom as the first node of the tree. [sent-369, score-1.582]
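Sampling a derivation top-down from the axiom can be sketched as below. This is an illustrative toy under stated assumptions, not the paper's sampler: the dictionary layout of `grammar`, the helper names and the example rules are hypothetical, the grammar is assumed to be non-recursive so that sampling terminates, and the sampling of attributes is omitted.

```python
import random

def sample_derivation(grammar, symbol, rng):
    """Expand `symbol` recursively by drawing rules according to their probabilities.

    `grammar` maps a non-terminal to a list of (rhs_tuple, probability) pairs;
    any symbol without an entry is treated as a terminal.
    """
    if symbol not in grammar:
        return symbol
    options = grammar[symbol]
    rhs = rng.choices([r for r, _ in options],
                      weights=[p for _, p in options], k=1)[0]
    return (symbol, [sample_derivation(grammar, s, rng) for s in rhs])

def leaves(tree):
    """Collect the terminal symbols of a sampled derivation tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in leaves(child)]

toy_grammar = {
    "S": [(("Floor", "Floor"), 0.5), (("Floor",), 0.5)],
    "Floor": [(("wall", "window", "wall"), 0.7), (("wall",), 0.3)],
}
tree = sample_derivation(toy_grammar, "S", random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; each call with a different seed yields a different design from the same style grammar.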
92 We rendered a whole street of buildings sampled from our induced grammar in CityEngine [14]. [sent-382, score-0.766]
93 Conclusion and Future Work In this work we introduced a principled way of learning procedural split grammars from labeled data. [sent-390, score-0.624]
94 Our induced procedural grammar not only generates new buildings of the same style, but also achieves exceptional results in facade parsing, outperforming similar approaches which require a manually designed set of grammar rules. [sent-392, score-1.804]
95 Furthermore, more complex shape grammars could be inferred by extending the Earley parser, which is currently limited to grid-like designs. [sent-394, score-0.386]
96 In each step of this approach, a more refined grammar is inferred through initial labeling, augmenting in turn the labeling in subsequent iterations. [sent-396, score-0.678]
97 (b) The grammar is over-generalizing due to high prior weight. [sent-405, score-0.701]
98 A connection between partial symmetry and inverse procedural modeling. [sent-420, score-0.232]
99 Procedural 3D building reconstruction using shape grammars and detectors. [sent-469, score-0.415]
100 Reconstruction of façade structures using a formal grammar and rjmcmc. [sent-512, score-0.706]
wordName wordTfidf (topN-words)
[('grammar', 0.678), ('grammars', 0.36), ('procedural', 0.232), ('productions', 0.182), ('parsing', 0.156), ('production', 0.135), ('facade', 0.128), ('terminal', 0.106), ('gs', 0.105), ('symbols', 0.103), ('rule', 0.095), ('axiom', 0.084), ('merging', 0.082), ('rules', 0.081), ('parser', 0.081), ('derivation', 0.081), ('earley', 0.079), ('rjmcmc', 0.073), ('rhs', 0.07), ('jump', 0.067), ('ipm', 0.063), ('lhs', 0.063), ('parse', 0.063), ('aliaga', 0.058), ('lattice', 0.057), ('martinovi', 0.056), ('lattices', 0.052), ('wonka', 0.052), ('buildings', 0.051), ('chain', 0.05), ('vanegas', 0.047), ('viterbi', 0.046), ('probabilities', 0.044), ('move', 0.044), ('symbol', 0.043), ('reversible', 0.043), ('bayesian', 0.043), ('tree', 0.043), ('languages', 0.042), ('facades', 0.042), ('style', 0.04), ('designs', 0.039), ('stochastic', 0.039), ('attributes', 0.039), ('talton', 0.039), ('urban', 0.038), ('induced', 0.037), ('induction', 0.037), ('soa', 0.037), ('teboul', 0.037), ('rs', 0.036), ('attribute', 0.036), ('mathias', 0.033), ('deterministic', 0.033), ('split', 0.032), ('earleystolcke', 0.031), ('nonterminals', 0.031), ('prf', 0.031), ('scfgs', 0.031), ('posterior', 0.031), ('ller', 0.031), ('starting', 0.031), ('inducing', 0.03), ('node', 0.03), ('derivations', 0.03), ('paris', 0.029), ('building', 0.029), ('log', 0.029), ('bene', 0.028), ('centrale', 0.028), ('haussmannian', 0.028), ('weissenberg', 0.028), ('language', 0.028), ('formal', 0.028), ('create', 0.027), ('shape', 0.026), ('search', 0.026), ('subtree', 0.026), ('ecp', 0.026), ('koutsourakis', 0.026), ('reinforcement', 0.026), ('riemenschneider', 0.026), ('mcmc', 0.025), ('candidate', 0.025), ('bokeloh', 0.024), ('lexical', 0.024), ('carlo', 0.024), ('incorporation', 0.024), ('siggraph', 0.023), ('mdl', 0.023), ('irregular', 0.023), ('monte', 0.023), ('rectangular', 0.023), ('prior', 0.023), ('horizontal', 0.022), ('cities', 0.022), ('ecole', 0.022), ('diffusion', 0.021), ('collapse', 0.021), ('procedure', 
0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling
2 0.43184084 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu
Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state of the art on both benchmarks.
3 0.30110765 228 cvpr-2013-Is There a Procedural Logic to Architecture?
Author: Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, Luc Van_Gool
Abstract: Urban models are key to navigation, architecture and entertainment. Apart from visualizing façades, a number of tedious tasks remain largely manual (e.g. compression, generating new façade designs and structurally comparing façades for classification, retrieval and clustering). We propose a novel procedural modelling method to automatically learn a grammar from a set of façades, generate new façade instances and compare façades. To deal with the difficulty of grammatical inference, we reformulate the problem. Instead of inferring a compromising, one-size-fits-all, single grammar for all tasks, we infer a model whose successive refinements are production rules tailored for each task. We demonstrate our automatic rule inference on datasets of two different architectural styles. Our method supersedes manual expert work and cuts the time required to build a procedural model of a façade from several days to a few milliseconds.
4 0.11595773 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, and are thus difficult to recognize with the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. "a chair to sit on"; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance.
5 0.11036395 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
Author: Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu
Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions respectively. We assign the nouns to the nodes (parts) in HST using non-maximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. We then update the HST and attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.
6 0.091874354 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
7 0.067321822 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
8 0.05859746 351 cvpr-2013-Recovering Line-Networks in Images by Junction-Point Processes
9 0.057264246 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
10 0.057000022 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
11 0.054288469 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition
12 0.046230327 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
13 0.045154035 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
14 0.044689205 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
15 0.042837318 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
16 0.04154465 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
17 0.040505428 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
18 0.040501934 453 cvpr-2013-Video Editing with Temporal, Spatial and Appearance Consistency
19 0.040413499 28 cvpr-2013-A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching
20 0.040184293 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
topicId topicWeight
[(0, 0.109), (1, -0.014), (2, 0.004), (3, -0.026), (4, 0.064), (5, 0.024), (6, -0.019), (7, 0.056), (8, -0.003), (9, 0.01), (10, -0.009), (11, 0.078), (12, -0.046), (13, -0.018), (14, -0.005), (15, 0.012), (16, 0.042), (17, 0.04), (18, 0.003), (19, -0.044), (20, -0.026), (21, 0.067), (22, 0.026), (23, -0.032), (24, 0.036), (25, 0.061), (26, 0.128), (27, -0.054), (28, -0.074), (29, 0.198), (30, -0.059), (31, 0.085), (32, 0.076), (33, 0.031), (34, -0.148), (35, -0.162), (36, -0.087), (37, -0.014), (38, -0.056), (39, -0.184), (40, 0.111), (41, 0.071), (42, -0.064), (43, -0.313), (44, -0.05), (45, -0.002), (46, 0.063), (47, 0.108), (48, 0.166), (49, 0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.95084983 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling
2 0.89536494 228 cvpr-2013-Is There a Procedural Logic to Architecture?
3 0.65798831 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
4 0.58108354 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, thus are difficult to be recognized by the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. “a chair to sit on”; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate the proposed approach not only significantly widens the scope of indoor scene parsing algorithm from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.
5 0.4931415 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu
Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly-supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.
6 0.42651281 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
7 0.34014383 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
8 0.29098967 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
9 0.28149542 20 cvpr-2013-A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems
10 0.27406773 190 cvpr-2013-Graph-Based Optimization with Tubularity Markov Tree for 3D Vessel Segmentation
11 0.26818946 55 cvpr-2013-Background Modeling Based on Bidirectional Analysis
12 0.25958502 274 cvpr-2013-Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization
13 0.25630933 22 cvpr-2013-A Non-parametric Framework for Document Bleed-through Removal
14 0.2539517 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
15 0.25357431 26 cvpr-2013-A Statistical Model for Recreational Trails in Aerial Images
16 0.25137404 406 cvpr-2013-Spatial Inference Machines
17 0.24710147 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
18 0.23533121 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization
19 0.23101392 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
20 0.22924924 350 cvpr-2013-Reconstructing Loopy Curvilinear Structures Using Integer Programming
topicId topicWeight
[(10, 0.075), (16, 0.017), (24, 0.013), (26, 0.063), (28, 0.016), (33, 0.217), (39, 0.024), (45, 0.284), (59, 0.012), (67, 0.05), (69, 0.059), (87, 0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.7936883 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling
Author: Andelo Martinovic, Luc Van_Gool
Abstract: Within the fields of urban reconstruction and city modeling, shape grammars have emerged as a powerful tool for both synthesizing novel designs and reconstructing buildings. Traditionally, a human expert was required to write grammars for specific building styles, which limited the scope of method applicability. We present an approach to automatically learn two-dimensional attributed stochastic context-free grammars (2D-ASCFGs) from a set of labeled building facades. To this end, we use Bayesian Model Merging, a technique originally developed in the field of natural language processing, which we extend to the domain of two-dimensional languages. Given a set of labeled positive examples, we induce a grammar which can be sampled to create novel instances of the same building style. In addition, we demonstrate that our learned grammar can be used for parsing existing facade imagery. Experiments conducted on the dataset of Haussmannian buildings in Paris show that our parsing with learned grammars not only outperforms bottom-up classifiers but is also on par with approaches that use a manually designed style grammar.
Author: Yiliang Xu, Sangmin Oh, Anthony Hoogs
Abstract: We present a novel vanishing point detection algorithm for uncalibrated monocular images of man-made environments. We advance the state-of-the-art by a new model of measurement error in the line segment extraction and minimizing its impact on the vanishing point estimation. Our contribution is twofold: 1) Beyond existing hand-crafted models, we formally derive a novel consistency measure, which captures the stochastic nature of the correlation between line segments and vanishing points due to the measurement error, and use this new consistency measure to improve the line segment clustering. 2) We propose a novel minimum error vanishing point estimation approach by optimally weighing the contribution of each line segment pair in the cluster towards the vanishing point estimation. Unlike existing works, our algorithm provides an optimal solution that minimizes the uncertainty of the vanishing point in terms of the trace of its covariance, in a closed-form. We test our algorithm and compare it with the state-of-the-art on two public datasets: York Urban Dataset and Eurasian Cities Dataset. The experiments show that our approach outperforms the state-of-the-art.
3 0.73355877 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
Author: Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
Abstract: Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. We argue that this may not be the most effective or efficient approach. Conjunctions of attributes often correspond to very characteristic appearances. It would thus be beneficial to train classifiers that detect these conjunctions as a whole. But not all conjunctions result in such tight appearance clusters. So given a multi-attribute query, which conjunctions should we model? An exhaustive evaluation of all possible conjunctions would be time consuming. Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. It reasons about geometric quantities that capture notions similar to intra- and inter-class variances. We exploit a discriminative binary space to compute these geometric quantities efficiently. Experimental results on two challenging datasets of objects and birds show that our proposed approach can improve performance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions.
4 0.69708931 228 cvpr-2013-Is There a Procedural Logic to Architecture?
Author: Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, Luc Van_Gool
Abstract: Urban models are key to navigation, architecture and entertainment. Apart from visualizing façades, a number of tedious tasks remain largely manual (e.g. compression, generating new façade designs and structurally comparing façades for classification, retrieval and clustering). We propose a novel procedural modelling method to automatically learn a grammar from a set of façades, generate new façade instances and compare façades. To deal with the difficulty of grammatical inference, we reformulate the problem. Instead of inferring a compromising, one-size-fits-all, single grammar for all tasks, we infer a model whose successive refinements are production rules tailored for each task. We demonstrate our automatic rule inference on datasets of two different architectural styles. Our method supersedes manual expert work and cuts the time required to build a procedural model of a façade from several days to a few milliseconds.
5 0.69162488 12 cvpr-2013-A Global Approach for the Detection of Vanishing Points and Mutually Orthogonal Vanishing Directions
Author: Michel Antunes, João P. Barreto
Abstract: This article presents a new global approach for detecting vanishing points and groups of mutually orthogonal vanishing directions using lines detected in images of man-made environments. These two multi-model fitting problems are respectively cast as Uncapacitated Facility Location (UFL) and Hierarchical Facility Location (HFL) instances that are efficiently solved using a message passing inference algorithm. We also propose new functions for measuring the consistency between an edge and a putative vanishing point, and for computing the vanishing point defined by a subset of edges. Extensive experiments in both synthetic and real images show that our algorithms outperform the state-of-the-art methods while keeping computation tractable. In addition, we show for the first time results in simultaneously detecting multiple Manhattan-world configurations that can either share one vanishing direction (Atlanta world) or be completely independent.
6 0.6694932 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
7 0.65901524 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
8 0.65853798 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
9 0.65788054 311 cvpr-2013-Occlusion Patterns for Object Class Detection
10 0.65781462 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
11 0.65681624 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
12 0.65629208 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
13 0.6558013 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
14 0.65442365 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation
15 0.65359831 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
16 0.65310079 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
17 0.6528818 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
18 0.65266716 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
19 0.65247536 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
20 0.6523369 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models