cvpr cvpr2013 cvpr2013-225 knowledge-graph by maker-knowledge-mining

225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation


Source: pdf

Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu

Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we present a compositional and-or graph grammar model for human pose estimation. [sent-5, score-0.553]

2 Our approach aims to address these problems by combining four key aspects: compositional parts, articulated geometry, context-sensitive part compatibility, and background modeling. [sent-13, score-0.397]

3 Compositional Parts: The fundamental difference between grammar models [27] and conventional hierarchical models [8] is the notion that an object can be composed from its parts in multiple ways. [sent-14, score-0.536]

4 These compositions can occur hierarchically, allowing the grammar to represent a very large space of possible configurations using a small number of concise rules. [sent-15, score-0.345]

5 Articulated Geometry and Part Compatibility: Articulation is a compatibility relation restricting the position and orientation of a pair of parts such that they align with a common hinge point between them, and are in a plausible orientation relative to each other. [sent-16, score-0.579]

6 The majority of articulated models in the literature rely exclusively on this type of relation, either between part pairs [8, 25] or higher-order cliques [21]. [sent-17, score-0.285]

7 Context-sensitive compatibility relations are then applied to the relative position and orientation (articulated geometry), relative scale, and cooccurrence between part variants. [sent-22, score-0.488]

8 Furthermore, each compositional variant defines its own articulated hinge points, providing an implicit compatibility between the appearance of that variant and the locations where neighboring parts can attach to it. [sent-23, score-0.85]

9 For example, the frontal-view torso has a wide appearance with hinge points for the arms and legs near the sides, whereas the side-view torso has a narrow appearance with hinge points near the centerline. [sent-24, score-0.539]

10 Selecting the appropriate torso will depend on the torso appearance, alignment of the limbs to the joint locations for each torso variant, and the cooccurrence compatibility between the torso and limb variants. [sent-25, score-0.636]

11 Background modeling: Many parts of the body have very weak local appearance structure. [sent-26, score-0.238]

12 To complicate matters, edge features are often ... And-nodes represent distinct part appearance models, while or-nodes can be treated as local mixtures of and-nodes. [sent-28, score-0.236]

13 Pictorial structures [8] (a) has a fixed structure with no shared parts, and uses conventional articulation relations over relative position and orientation. [sent-30, score-0.26]

14 The flexible mixture-of-parts model [25] (b) emulates articulation with a large number of orientation-specific parts and mixtures, using relations only between mixture selections (types). [sent-31, score-0.297]

15 Our baseline and-graph model (c) has a similar structure and relations to PS, but shares parts between the left and right sides and uses relative scale relations. [sent-32, score-0.233]

16 Our final and-or-graph model (d) extends (c) by utilizing several part variants, and compatibility relations between variants (productions). [sent-33, score-0.388]

17 To combat this, we augment our part appearance models to compute a contrast between the part interior and the region distributions of the adjoining background. [sent-35, score-0.403]

18 The resulting part appearance model helps eliminate spurious detections in clutter, and as a result improves localization performance. [sent-36, score-0.235]

19 To justify our approach, we evaluate a simplified case of our grammar omitting the use of part variants or background information (model AG in Fig. [sent-37, score-0.549]

20 Related Work: Image grammar models have been a topic of interest for many years [10, 27]; however, there has been limited success in getting these models to perform competitively with their fixed-structure counterparts on realistic problems. [sent-42, score-0.423]

21 Previous work using grammars for human pose modeling includes template grammars to parse rich part appearances [3, 16], superpixel-based composition grammars for human region parsing [18], and boundary contour grammars for parsing human outlines [26]. [sent-43, score-1.226]

22 [11] extends the popular discriminative deformable part model into a grammar formalism used for human detection, but not part localization. [sent-44, score-0.597]

23 The hierarchical mixture model of [25] differs from our model by replacing articulation geometry with keypoint mixtures, and does not allow reusable or reconfigurable parts. [sent-45, score-0.248]

24 Fixed-structure models for human pose estimation generally fall into the family of pictorial structures models [8, 6, 1, 17], which use a kinematic tree of parts to represent the body. [sent-48, score-0.396]

25 Our model can be viewed as a generalization of these models, as each part composition can be treated as a local pictorial structure model nested in the grammar. [sent-50, score-0.219]

26 Poselets [2] are difficult to categorize as a model type, and utilize a voting scheme of pose-specific parts to interpolate a pose without an explicit model of the body kinematics. [sent-53, score-0.284]

27 The use of image-specific background models to improve human pose estimation is an idea that has been revisited many times in recent literature. [sent-55, score-0.216]

28 An iterative learning scheme for CRF appearance models was presented in [15], which incrementally refines a generic part model using image-specific appearance evidence. [sent-56, score-0.28]

29 And-Or Graph Grammar: Our articulated grammar model provides a probabilistic framework to decompose the body into modular parts while maintaining the articulated kinematics. [sent-63, score-0.875]

30 The grammar takes the form of an and-or graph, denoted as G = (S, P, s0), where S is a set of symbols (or-nodes), P is a set of productions (and-nodes), and s0 is the root symbol. [sent-64, score-0.66]
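
A minimal sketch of this structure in Python, assuming nothing beyond the definitions above; the class and field names (Production, Grammar, alpha, beta) are illustrative and not taken from the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Production:
    """An and-node (alpha -> beta, t, R): expands the proximal symbol alpha into
    distal symbols beta, with an appearance template t and relations R."""
    alpha: str                # proximal symbol (or-node) being expanded
    beta: tuple = ()          # distal symbols; empty for a terminal production (alpha -> null)
    template: object = None   # appearance template t
    relations: tuple = ()     # probabilistic relations R

@dataclass
class Grammar:
    symbols: frozenset        # S: the or-nodes
    productions: tuple        # P: the and-nodes
    root: str                 # s0

    def expansions(self, symbol):
        """All productions whose proximal symbol is `symbol`."""
        return [p for p in self.productions if p.alpha == symbol]
```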

31 Each production p ∈ P takes the form (α → β, t, R). [sent-65, score-0.406]

32 R is a set of probabilistic relations that control the spatial geometry and compatibility between α and β, and thus expresses contextual information about neighboring parts. [sent-69, score-0.34]

33 Each and-node production is drawn as a dot inside its corresponding proximal or-node symbol. [sent-72, score-0.527]

34 Unlike conventional grammars that only connect to the data through the terminal symbols, our grammar defines an appearance model for every production. [sent-74, score-0.682]

35 For this reason, we require at least one production to expand every symbol. [sent-75, score-0.429]

36 For a terminal symbol, a production of the form (α → ∅, t, R) is used to provide an appearance and geometry model for the proximal symbol without any further decomposition. [sent-76, score-0.914]

37 Each symbol can be expanded by multiple productions to provide different explanations for not just the appearance of that symbol, but also the geometry and compatibility between the symbol and its constituents. [sent-77, score-0.885]

38 Part sharing occurs whenever two or more productions use the same distal symbol. [sent-78, score-0.378]

39 The advantages of part sharing are threefold: the resulting model has fewer parame- ters to learn, shared parts inherently have more training examples, and inference computation can also be shared for these parts. [sent-79, score-0.326]

40 A parse tree pt is constructed from G by recursively selecting productions to expand symbols, starting from the root symbol s0. [sent-83, score-0.922]
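
The recursive expansion can be sketched as below; the toy symbols and the choose-the-first-production rule are placeholders (inference would instead pick the max-scoring production).

```python
# Productions keyed by proximal symbol; the two torso entries model two variants.
PRODUCTIONS = {
    "body":  [("torso",)],
    "torso": [("arm", "arm", "leg", "leg"), ("arm", "leg")],
    "arm":   [()],   # terminal production
    "leg":   [()],   # terminal production
}

def build_parse_tree(symbol="body", choose=lambda options: options[0]):
    """Recursively expand `symbol` by selecting one of its productions.
    `choose` stands in for the max-scoring selection made during inference."""
    distal_symbols = choose(PRODUCTIONS[symbol])
    return (symbol, [build_parse_tree(s, choose) for s in distal_symbols])

print(build_parse_tree())  # nested (symbol, children) tuples rooted at s0
```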

41 Similarly, (vi, vj) ∈ E(pt) enumerates the pairs of proximal to distal parts for each production used in the parse. [sent-85, score-0.835]

42 The full score (Eq. 1) sums over pairs (vi, vj) ∈ E(pt). Each production has a model weight vector corresponding to each of the potential functions: λi = (λia, λci1, λci2), ∀pi ∈ P. [sent-87, score-0.428]

43 The weight vector of the full grammar model is expressed as a concatenation of the production weights λ = {λi : i = 1 . . . |P|}. [sent-88, score-0.777]

44 Articulated Geometry: Each symbol in the grammar is assigned a canonical length and width learned from the part annotations. [sent-94, score-0.631]

45 The geometry of each part in a parse can then be computed by retrieving the canonical dimensions corresponding to the proximal symbol of production ω, centering this rectangle at location (x, y) in the image, rotating by θ, and rescaling by s. [sent-95, score-1.117]
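
As a sketch of that geometry, assuming a part state (x, y, θ, s) and canonical dimensions per symbol; the function name and layout are illustrative rather than the paper's code.

```python
import numpy as np

def part_rectangle(x, y, theta, s, canon_length, canon_width):
    """Corners of a part rectangle: the canonical box scaled by s, rotated by
    theta, and centered at (x, y). Returns a 4x2 array of image coordinates."""
    half_l, half_w = s * canon_length / 2.0, s * canon_width / 2.0
    corners = np.array([[-half_w, -half_l], [half_w, -half_l],
                        [half_w,  half_l], [-half_w,  half_l]])
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return corners @ rot.T + np.array([x, y])

print(part_rectangle(100.0, 50.0, np.pi / 4, 1.2, 40.0, 12.0))
```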

46 For each articulated pair (vi, vj) within each production ω, the hinge point around which these two parts articulate is estimated from the training annotations by least-squares as in [8]. [sent-98, score-0.958]

47 We now have two coordinate transformations, denoted Tωp(vi) and Tωd(vj), to compute the ideal hinge location from either the proximal or distal part states, respectively. [sent-99, score-0.508]

48 The weights λgω1 are the mixing weights for each orientation, specific to production ω. [sent-104, score-0.406]

49 The geometry orientation score is therefore fg1(v) = ⟨λgω1, φg1(θ)⟩. [sent-106, score-0.205]

50 Background features consist of distance measures between the mean part color and adjoining external region segments. [sent-109, score-0.242]

51 The template defines multiple background sample points around the perimeter of the part, each of which retrieves the region segment that contains the point and compares it with the interior mean color. [sent-110, score-0.231]

52 The articulation score consists of three components: hinge displacement, relative scale, and relative orientation. [sent-111, score-0.332]
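
A rough sketch of those three terms, assuming each part state is (x, y, theta, s) and that hinge_from_proximal / hinge_from_distal stand in for the transformations Tωp and Tωd above; the squared-penalty form is an illustrative choice, not the paper's exact potentials.

```python
import numpy as np

def articulation_features(vi, vj, hinge_from_proximal, hinge_from_distal):
    """vi, vj: (x, y, theta, s) states of a proximal/distal part pair.
    Returns the three articulation terms: squared hinge displacement,
    squared relative-scale deviation, squared relative orientation."""
    hp = hinge_from_proximal(vi)   # ideal hinge predicted from the proximal state
    hd = hinge_from_distal(vj)     # ideal hinge predicted from the distal state
    displacement = float(np.sum((np.asarray(hp) - np.asarray(hd)) ** 2))
    rel_scale = vj[3] / vi[3] - 1.0
    rel_orient = np.arctan2(np.sin(vj[2] - vi[2]), np.cos(vj[2] - vi[2]))
    return np.array([displacement, rel_scale ** 2, rel_orient ** 2])

# The articulation score would then be a (learned) weighted sum of these terms.
```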

53 Part Compatibility: Part compatibility is the preference for selecting one production over another to explain the same symbol in a parse. [sent-119, score-0.773]

54 The first is a unary bias on each production, analogous to the production frequencies in a stochastic context-free grammar. [sent-121, score-0.428]

55 The second is a pairwise production compatibility between neighboring parts in the parse. [sent-123, score-0.719]

56 The vector λcω2 is the matrix row corresponding to production ω, and represents the compatibility of all distal productions with ω. [sent-125, score-0.956]

57 The production compatibility potential is then fc2(vi, vj) = λcω2i [ωj]. [sent-126, score-0.578]
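
Viewed as code, this context-sensitive compatibility is a per-production bias plus a |P| x |P| table lookup; the shapes and zero-initialized values below are purely illustrative.

```python
import numpy as np

n_productions = 5
unary_bias = np.zeros(n_productions)                 # f^c1: bias per production
pairwise = np.zeros((n_productions, n_productions))  # lambda^c2: |P| x |P| compatibility table

def compatibility_score(omega_i, omega_j):
    """Bias of the proximal production plus its compatibility with the distal
    production: f^c1(omega_i) + f^c2(omega_i, omega_j) = lambda^c2[omega_i][omega_j]."""
    return unary_bias[omega_i] + pairwise[omega_i, omega_j]

print(compatibility_score(0, 3))
```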

58 Appearance Model and Segmentation: Each production defines an appearance template that specifies where to extract feature responses from the image for a given part state, as illustrated in Fig. [sent-129, score-0.71]

59 The region features measure how distinct the foreground part region is from the surrounding background. [sent-140, score-0.281]

60 We represent the image background as a collection of large disjoint regions, where the appearance within each region is well explained using a multivariate Gaussian in L∗u∗v∗ color space. [sent-141, score-0.206]

61 Furthermore, we assume that the background regions are large compared to the size of the foreground parts and treat the background process as independent of the foreground. [sent-142, score-0.3]

62 Finally, the region feature is computed as the Mahalanobis distance between the foreground mean µv and a background region (µi, Σi): d(µv, µi, Σi) = (µv − µi)⊤Σi−1(µv − µi). [sent-159, score-0.236]
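
A direct sketch of that region feature; the background statistics (µi, Σi) would come from the Gaussian region models described above, and the numeric values here are invented.

```python
import numpy as np

def region_distance(fg_mean, bg_mean, bg_cov):
    """Mahalanobis-style distance between the mean foreground color of a part
    and one adjoining background region modelled as a Gaussian (bg_mean, bg_cov)."""
    diff = np.asarray(fg_mean, float) - np.asarray(bg_mean, float)
    return float(diff @ np.linalg.inv(bg_cov) @ diff)

fg = [60.0, 10.0, 20.0]                     # mean L*u*v* color inside the part
bg = ([80.0, 0.0, 5.0], np.eye(3) * 25.0)   # one background region (mean, covariance)
print(region_distance(fg, *bg))
```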

63 leg part in the vertical orientation are shown using only HOG features in (d) and only region distance features in (e). [sent-166, score-0.229]

64 Due to local normalization, spurious foreground responses tend to appear particularly around textured regions, whereas the background feature is far more stable in these regions. [sent-167, score-0.235]

65 The full appearance response vector φa (I, t, v) can now be computed as a concatenation of responses from each rotation-shifted gradient histogram feature, and region distance feature in the template. [sent-172, score-0.205]

66 Inference: Relations in the grammar are always between proximal and distal parts within the same production, resulting in a tree factorization of the full grammar model. [sent-175, score-1.17]

67 The basic unit of computation is computing a maximal score map for the proximal part of a production. [sent-177, score-0.336]

68 Each of the distal parts is conditionally independent given the proximal part, and can be maximized individually. [sent-178, score-0.429]

69 The maximal score map for part state vi given production ωi can be expressed recursively as M(vi|ωi) = faωi(vi, I) + fg1ωi(vi) + fc1ωi + Σ(vi,vj)∈Rωi maxvj [...] (8). [sent-179, score-0.762]

70 Although the production for part vi is fixed, we must maximize over the full state of the distal parts vj, including the distal production. [sent-181, score-1.161]

71 The maximizations over scale sj, orientation θj, and production ωj each require quadratic time to compute. [sent-183, score-0.527]

72 To infer the maximal scoring parse, we recurse through the grammar starting from the root symbol s0. [sent-185, score-0.657]

73 Terminal symbols have no distal parts, and their maximal score maps consist of only the appearance and unary potentials. [sent-186, score-0.509]

74 Once the maximal score maps are computed for every production, the maximal parse score can be obtained by maxing over all productions pj ∈ P that have the root symbol as the proximal part (Eq. 9). [sent-187, score-1.091]

75 The parse tree can be recovered by replacing the max operators with arg max and backtracking through the optimal state maps. [sent-190, score-0.269]
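
A schematic of this tree dynamic program on a toy two-part model, using dense arrays in place of score maps; it mirrors the structure (max over distal states, max at the root, then backtracking) but none of the scores or dimensions come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                   # toy number of discrete part states
unary_root = rng.normal(size=n)         # appearance + unary terms for the root part
unary_child = rng.normal(size=n)        # appearance + unary terms for one distal part
pairwise = rng.normal(size=(n, n))      # relation score for each (root, child) state pair

# Upward pass: maximal score map for the root, maximizing out the child state.
augmented = pairwise + unary_child[None, :]      # shape (root_state, child_state)
score_map = unary_root + augmented.max(axis=1)   # M(v_root)
backptr = augmented.argmax(axis=1)               # best child state per root state

# Root maximization, then backtracking, recovers the maximal parse.
v_root = int(score_map.argmax())
v_child = int(backptr[v_root])
print("max parse score:", score_map[v_root], "-> states", (v_root, v_child))
```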

76 Learning: The score of a parse can always be expressed as the inner product of the full model weight vector and a response vector for the entire parse: fG(pt, I) = ⟨λ, φ(pt, I)⟩. [sent-192, score-0.556]
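
As a toy illustration of that inner-product form (block names and numbers invented), the per-production weights and feature responses are concatenated in the same order and dotted:

```python
import numpy as np

# Per-production weight blocks and the matching feature responses of one parse.
weights = {"torso": np.array([0.5, -0.2]), "arm": np.array([1.0, 0.3, -0.1])}
features = {"torso": np.array([2.0, 1.0]), "arm": np.array([0.5, 0.0, 4.0])}

order = sorted(weights)                             # any fixed concatenation order
lam = np.concatenate([weights[k] for k in order])   # lambda
phi = np.concatenate([features[k] for k in order])  # phi(pt, I)
print(float(lam @ phi))                             # f_G(pt, I) = <lambda, phi(pt, I)>
```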

77 The model weights λ are learned so that the output is the maximal scoring parse FλG(I) = arg maxpt fG(pt, I) for a given grammar. [sent-194, score-0.254]

78 The optimal weights are λ∗ = argminλ E(p̄t,I)∼D[L(FλG(I), p̄t)] (10). The loss is defined on the structured output space of parses, and must measure the quality of a predicted parse against the ground truth parse. [sent-197, score-0.266]

79 All parses from the grammars we define here, however, have the same number of parts and the same branching structure which allows us to compute loss as the sum of part-wise terms. [sent-199, score-0.354]

80 Our loss is motivated by the PCP evaluation metric [6], which computes a score based on the proximity of the part endpoints to the ground truth endpoints. [sent-200, score-0.232]

81 The AOG+BG model includes terms to favor part regions that are distinct from their adjoining background process, and can correctly localize many of these parts. [sent-210, score-0.272]

82 Because the score of the parse λ⊤φ(p̄ti, I) can never be greater than the score of the maximal parse λ⊤φ(p̂ti, I), the right-hand side of the expression can never be lower than the loss. [sent-217, score-0.632]
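
A numeric sanity check of that argument with invented numbers: since the maximal parse scores at least as high as the ground-truth parse, the loss-augmented surrogate never drops below the loss.

```python
# Invented numbers: the maximal parse always scores at least the ground-truth parse,
# so the surrogate loss + score(best) - score(gt) never falls below the loss itself.
loss, score_gt, score_best = 2.0, 5.1, 6.4
surrogate = loss + score_best - score_gt
assert surrogate >= loss
print(surrogate)
```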

83 Evaluation: We train and evaluate our method using three different grammar models to illustrate the impact on performance from the addition of reconfigurable parts as well as the background model. [sent-222, score-0.605]

84 For all cases, we discretize the state space of the parts to be 25% of the image width and height, and use 24 part orientations. [sent-223, score-0.234]

85 AG: And-graph grammar is our baseline model, shown in Fig. [sent-224, score-0.345]

86 1(c), and is the simplest possible model in our framework to represent the full articulated body. [sent-225, score-0.219]

87 Each symbol has only one production, and all limb parts are shared between the left and right sides. [sent-226, score-0.455]

88 AOG+BG: This is the same grammar as AOG, but with the addition of the background terms. [sent-241, score-0.416]

89 4 shows several examples where the top scoring pose erroneously matches to a strong edge with poor region support, but is corrected when retraining with the background feature. [sent-246, score-0.242]

90 Evaluation Protocol: Unfortunately there are several competing evaluation protocols for articulated pose estimation scattered throughout the literature that often produce significantly different results. [sent-249, score-0.245]

91 This protocol defines a part as detected if the average distance between the centerline endpoints of the proposal and ground truth is less than 50% of the ground truth part length. [sent-251, score-0.29]
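
That criterion translates directly to code; the endpoints are the two ends of a part's centerline and the 0.5 threshold is the PCP setting quoted above (the function name is ours).

```python
import numpy as np

def pcp_detected(pred_endpoints, gt_endpoints, threshold=0.5):
    """pred_endpoints, gt_endpoints: 2x2 arrays of the part's two centerline
    endpoints (x, y). Detected if the mean endpoint distance is below
    threshold times the ground-truth part length."""
    pred = np.asarray(pred_endpoints, float)
    gt = np.asarray(gt_endpoints, float)
    mean_dist = np.mean(np.linalg.norm(pred - gt, axis=1))
    return mean_dist < threshold * np.linalg.norm(gt[0] - gt[1])

print(pcp_detected([[0, 0], [0, 10]], [[1, 0], [1, 10]]))   # True
```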

92 Even perfect results on these examples will still get the limb parts counted as wrong because the left limb is being evaluated against the right side and vice versa. [sent-258, score-0.329]

93 For each part, we provide an additional annotation to indicate a production label. [sent-286, score-0.406]

94 In the same manner as PARSE, we provide an additional production label to each part. [sent-293, score-0.406]

95 Conclusions: We present a framework for human pose estimation using an articulated grammar model, as well as a simple approach to integrate a background model into the grammar that can improve localization performance in cluttered scenes. [sent-298, score-1.076]

96 Furthermore, we demonstrate consistent per-part performance improvements from adding reconfigurable parts over a baseline fixed-structure model using the same part representations and learning, and additional gains from incorporating the background model. [sent-301, score-0.35]

97 Pictorial structures revisited: People detection and articulated pose estimation. [sent-309, score-0.245]

98 Poselets: Body part detectors trained using 3d human pose annotations. [sent-315, score-0.213]

99 Combining discriminative appearance and segmentation cues for articulated human pose estimation. [sent-388, score-0.393]

100 Clustered pose and nonlinear appearance models for human pose estimation. [sent-393, score-0.291]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('production', 0.406), ('grammar', 0.345), ('aog', 0.273), ('parse', 0.22), ('symbol', 0.195), ('productions', 0.189), ('distal', 0.189), ('compatibility', 0.172), ('articulated', 0.171), ('vj', 0.155), ('pt', 0.144), ('grammars', 0.122), ('proximal', 0.121), ('parts', 0.119), ('bg', 0.119), ('vi', 0.117), ('hinge', 0.107), ('symbols', 0.099), ('articulation', 0.095), ('limb', 0.094), ('part', 0.091), ('leeds', 0.084), ('pictorial', 0.084), ('torso', 0.079), ('orientation', 0.075), ('pose', 0.074), ('appearance', 0.072), ('background', 0.071), ('score', 0.068), ('parses', 0.067), ('compositional', 0.064), ('adjoining', 0.063), ('region', 0.063), ('geometry', 0.062), ('relations', 0.061), ('template', 0.06), ('pcp', 0.058), ('terminal', 0.058), ('maximal', 0.056), ('dri', 0.055), ('spurious', 0.05), ('mixtures', 0.048), ('human', 0.048), ('fmp', 0.047), ('shared', 0.047), ('body', 0.047), ('reconfigurable', 0.047), ('loss', 0.046), ('maximization', 0.046), ('ti', 0.045), ('responses', 0.044), ('protocol', 0.044), ('parsing', 0.043), ('variant', 0.043), ('variants', 0.042), ('ag', 0.041), ('pages', 0.041), ('foreground', 0.039), ('defines', 0.037), ('mod', 0.037), ('scoring', 0.034), ('risk', 0.034), ('competitively', 0.032), ('poselets', 0.032), ('normally', 0.032), ('textured', 0.031), ('relative', 0.031), ('skeleton', 0.03), ('johnson', 0.03), ('xv', 0.03), ('segmentation', 0.028), ('angeles', 0.028), ('gain', 0.028), ('hog', 0.028), ('root', 0.027), ('limbs', 0.027), ('cooccurrence', 0.027), ('displacement', 0.027), ('endpoints', 0.027), ('los', 0.027), ('conventional', 0.026), ('fa', 0.026), ('full', 0.026), ('failures', 0.026), ('eg', 0.026), ('distinct', 0.025), ('tree', 0.025), ('consist', 0.025), ('state', 0.024), ('narrow', 0.023), ('fg', 0.023), ('contextual', 0.023), ('models', 0.023), ('lu', 0.023), ('expand', 0.023), ('perfect', 0.022), ('rotating', 0.022), ('model', 0.022), ('neighboring', 0.022), ('stochastic', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu

Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.

2 0.43184084 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling

Author: Andelo Martinovic, Luc Van_Gool

Abstract: Within the fields of urban reconstruction and city modeling, shape grammars have emerged as a powerful tool for both synthesizing novel designs and reconstructing buildings. Traditionally, a human expert was required to write grammars for specific building styles, which limited the scope of method applicability. We present an approach to automatically learn two-dimensional attributed stochastic context-free grammars (2D-ASCFGs) from a set of labeled building facades. To this end, we use Bayesian Model Merging, a technique originally developed in the field of natural language processing, which we extend to the domain of two-dimensional languages. Given a set of labeled positive examples, we induce a grammar which can be sampled to create novel instances of the same building style. In addition, we demonstrate that our learned grammar can be used for parsing existing facade imagery. Experiments conducted on the dataset of Haussmannian buildings in Paris show that our parsing with learned grammars not only outperforms bottom-up classifiers but is also on par with approaches that use a manually designed style grammar.

3 0.26227254 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu

Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weaklysupervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection , benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.

4 0.17717046 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

5 0.17679653 335 cvpr-2013-Poselet Conditioned Pictorial Structures

Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele

Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demon- strate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.

6 0.17249058 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

7 0.16434753 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models

8 0.16026787 334 cvpr-2013-Pose from Flow and Flow from Pose

9 0.15854962 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

10 0.13789763 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

11 0.13690457 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

12 0.13116635 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts

13 0.13080314 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

14 0.12978685 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

15 0.11889122 228 cvpr-2013-Is There a Procedural Logic to Architecture?

16 0.11484244 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

17 0.11432835 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation

18 0.1028254 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

19 0.10069323 325 cvpr-2013-Part Discovery from Partial Correspondence

20 0.096108235 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.213), (1, -0.031), (2, 0.029), (3, -0.101), (4, 0.061), (5, 0.023), (6, 0.099), (7, 0.119), (8, 0.012), (9, -0.075), (10, -0.067), (11, 0.164), (12, -0.107), (13, -0.048), (14, 0.004), (15, 0.041), (16, 0.08), (17, 0.008), (18, -0.025), (19, -0.072), (20, -0.045), (21, 0.119), (22, 0.045), (23, -0.095), (24, 0.051), (25, 0.062), (26, 0.149), (27, -0.066), (28, -0.054), (29, 0.183), (30, -0.025), (31, 0.115), (32, 0.106), (33, 0.001), (34, -0.094), (35, -0.166), (36, -0.072), (37, 0.031), (38, -0.034), (39, -0.138), (40, 0.128), (41, 0.042), (42, -0.037), (43, -0.245), (44, -0.066), (45, -0.064), (46, 0.084), (47, 0.096), (48, 0.133), (49, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90003043 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu

Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.

2 0.87940192 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling

Author: Andelo Martinovic, Luc Van_Gool

Abstract: Within the fields of urban reconstruction and city modeling, shape grammars have emerged as a powerful tool for both synthesizing novel designs and reconstructing buildings. Traditionally, a human expert was required to write grammars for specific building styles, which limited the scope of method applicability. We present an approach to automatically learn two-dimensional attributed stochastic context-free grammars (2D-ASCFGs) from a set of labeled building facades. To this end, we use Bayesian Model Merging, a technique originally developed in the field of natural language processing, which we extend to the domain of two-dimensional languages. Given a set of labeled positive examples, we induce a grammar which can be sampled to create novel instances of the same building style. In addition, we demonstrate that our learned grammar can be used for parsing existing facade imagery. Experiments conducted on the dataset of Haussmannian buildings in Paris show that our parsing with learned grammars not only outperforms bottom-up classifiers but is also on par with approaches that use a manually designed style grammar.

3 0.83165324 228 cvpr-2013-Is There a Procedural Logic to Architecture?

Author: Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, Luc Van_Gool

Abstract: Urban models are key to navigation, architecture and entertainment. Apart from visualizing façades, a number of tedious tasks remain largely manual (e.g. compression, generating new façade designs and structurally comparing façades for classification, retrieval and clustering). We propose a novel procedural modelling method to automatically learn a grammar from a set of façades, generate new façade instances and compare façades. To deal with the difficulty of grammatical inference, we reformulate the problem. Instead of inferring a compromising, one-size-fits-all, single grammar for all tasks, we infer a model whose successive refinements are production rules tailored for each task. We demonstrate our automatic rule inference on datasets of two different architectural styles. Our method supercedes manual expert work and cuts the time required to build a procedural model of a façade from several days to a few milliseconds.

4 0.68942583 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu

Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weaklysupervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection , benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.

5 0.64052826 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models

Author: Yibiao Zhao, Song-Chun Zhu

Abstract: Indoor functional objects exhibit large view and appearance variations, thus are difficult to be recognized by the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essentialproperty to define an indoor object, e.g. “a chair to sit on ”; ii) The geometry (3D shape) ofan object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometryappearance (FGA) hierarchy. The hierarchical structure includes a scene category, , functional groups, , functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate theproposed approach not only significantly widens the scope ofindoor sceneparsing algorithm from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.

6 0.57893711 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

7 0.52024424 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

8 0.51091301 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

9 0.48533294 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

10 0.47889969 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

11 0.46224734 335 cvpr-2013-Poselet Conditioned Pictorial Structures

12 0.45697063 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation

13 0.44745693 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

14 0.44571918 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

15 0.44100866 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

16 0.43633288 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation

17 0.42786083 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

18 0.40650624 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

19 0.40369242 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models

20 0.39394614 190 cvpr-2013-Graph-Based Optimization with Tubularity Markov Tree for 3D Vessel Segmentation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.128), (10, 0.117), (16, 0.044), (26, 0.048), (28, 0.021), (33, 0.239), (39, 0.053), (45, 0.01), (67, 0.103), (69, 0.041), (80, 0.035), (87, 0.077)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91139048 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu

Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.

2 0.90740961 289 cvpr-2013-Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity

Author: Abed Malti, Richard Hartley, Adrien Bartoli, Jae-Hak Kim

Abstract: We propose a new approach for template-based extensible surface reconstruction from a single view. We extend the method of isometric surface reconstruction and more recent work on conformal surface reconstruction. Our approach relies on the minimization of a proposed stretching energy formalized with respect to the Poisson ratio parameter of the surface. We derive a patch-based formulation of this stretching energy by assuming local linear elasticity. This formulation unifies geometrical and mechanical constraints in a single energy term. We prevent local scale ambiguities by imposing a set of fixed boundary 3D points. We experimentally prove the sufficiency of this set of boundary points and demonstrate the effectiveness of our approach on different developable and non-developable surfaces with a wide range of extensibility.

3 0.90241563 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

Author: Keyang Shi, Keze Wang, Jiangbo Lu, Liang Lin

Abstract: Driven by recent vision and graphics applications such as image segmentation and object recognition, assigning pixel-accurate saliency values to uniformly highlight foreground objects becomes increasingly critical. More often, such fine-grained saliency detection is also desired to have a fast runtime. Motivated by these, we propose a generic and fast computational framework called PISA Pixelwise Image Saliency Aggregating complementary saliency cues based on color and structure contrasts with spatial priors holistically. Overcoming the limitations of previous methods often using homogeneous superpixel-based and color contrast-only treatment, our PISA approach directly performs saliency modeling for each individual pixel and makes use of densely overlapping, feature-adaptive observations for saliency measure computation. We further impose a spatial prior term on each of the two contrast measures, which constrains pixels rendered salient to be compact and also centered in image domain. By fusing complementary contrast measures in such a pixelwise adaptive manner, the detection effectiveness is significantly boosted. Without requiring reliable region segmentation or post– relaxation, PISA exploits an efficient edge-aware image representation and filtering technique and produces spatially coherent yet detail-preserving saliency maps. Extensive experiments on three public datasets demonstrate PISA’s superior detection accuracy and competitive runtime speed over the state-of-the-arts approaches.

4 0.88873255 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu

Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weaklysupervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection , benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.

5 0.88730401 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

Author: Byung-soo Kim, Shili Xu, Silvio Savarese

Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D as well as the definition of a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-theart as methods well as a number of baseline approaches for both 3D and 2D object recognition tasks.

6 0.88706088 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models

7 0.88405782 9 cvpr-2013-A Fast Semidefinite Approach to Solving Binary Quadratic Problems

8 0.88357246 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

9 0.88221073 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation

10 0.88097668 226 cvpr-2013-Intrinsic Characterization of Dynamic Surfaces

11 0.87995541 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

12 0.87835139 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

13 0.87744188 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

14 0.87728792 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

15 0.87700802 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

16 0.87619901 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

17 0.87575662 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

18 0.8756659 414 cvpr-2013-Structure Preserving Object Tracking

19 0.87513649 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

20 0.8750515 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation