Author: Soumya Ghosh, Matthew Loper, Erik B. Sudderth, Michael J. Black

Abstract: We develop a method for discovering the parts of an articulated object from aligned meshes of the object in various three-dimensional poses. We adapt the distance dependent Chinese restaurant process (ddCRP) to allow nonparametric discovery of a potentially unbounded number of parts, while simultaneously guaranteeing a spatially connected segmentation. To allow analysis of datasets in which object instances have varying 3D shapes, we model part variability across poses via affine transformations. By placing a matrix normal-inverse-Wishart prior on these affine transformations, we develop a ddCRP Gibbs sampler which tractably marginalizes over transformation uncertainty. Analyzing a dataset of humans captured in dozens of poses, we infer parts which provide quantitatively better deformation predictions than conventional clustering methods.

1 Abstract We develop a method for discovering the parts of an articulated object from aligned meshes of the object in various three-dimensional poses. [sent-7, score-0.845]

2 Analyzing a dataset of humans captured in dozens of poses, we infer parts which provide quantitatively better deformation predictions than conventional clustering methods. [sent-11, score-0.362]

3 1 Introduction Mesh segmentation methods decompose a three-dimensional (3D) mesh, or a collection of aligned meshes, into their constituent parts. [sent-12, score-0.242]

4 This well-studied problem has numerous applications in computational graphics and vision, including texture mapping, skeleton extraction, morphing, and mesh registration and simplification. [sent-13, score-0.648]

5 We focus in particular on the problem of segmenting an articulated object, given aligned 3D meshes capturing various object poses. [sent-14, score-0.636]

6 The meshes we consider are complete surfaces described by a set of triangular faces, and we seek a segmentation into spatially coherent parts whose spatial transformations capture object articulations. [sent-15, score-1.117]

7 Applied to various poses of human bodies as in Figure 1, our approach identifies regions of the mesh that deform together, and thus provides information which could inform applications such as the design of protective clothing. [sent-16, score-0.838]

8 Mesh segmentation has been most widely studied as a static clustering problem, where a single mesh is segmented into “semantic” parts using low-level geometric cues such as distance and curvature [1, 2]. [sent-17, score-1.045]

9 While supervised training data can sometimes lead to improved results [3], there are many applications where such data is unavailable, and the proper way to partition a single mesh is inherently ambiguous. [sent-18, score-0.573]

10 By searching for parts which deform consistently across many meshes, we create a better-posed problem whose solution is directly useful for modeling objects in motion. [sent-19, score-0.259]

11 First, the number of parts comprising an articulated object is unknown a priori, and must be inferred from the observed deformations. [sent-21, score-0.428]

12 Second, mesh faces exhibit strong spatial correlations, and the inferred parts must be contiguous. [sent-22, score-0.986]

13 This spatial connectivity is needed to discover parts which correspond with physical object structure, and required by target applications such as skeleton extraction. [sent-23, score-0.319]

14 A segmentation of the human body should take into account this range of variability in the population. Figure 1: Human body segmentation. [sent-26, score-0.309]

15 Left: Reference poses for two female bodies, and those bodies captured in five other poses. [sent-27, score-0.207]

16 Right: A manual segmentation used to align these meshes [6], and the segmentation inferred by our ddCRP model from 56 poses. [sent-28, score-0.757]

17 The ddCRP segmentation discovers parts whose motion is nearly rigid, and includes small parts such as elbows and knees absent from the manual segmentation. [sent-29, score-0.691]

18 To our knowledge, no previous methods for segmenting meshes combine information about deformation from multiple bodies to address this corpus segmentation problem. [sent-31, score-0.753]

19 We adapt the distance dependent Chinese restaurant process (ddCRP) [4] to model spatial dependencies among mesh triangles, and enforce spatial contiguity of the inferred parts [5]. [sent-33, score-1.025]

20 Unlike most previous mesh segmentation methods, our Bayesian nonparametric approach allows data-driven inference of an appropriate number of parts, and uses an affine transformation-based likelihood to accommodate object instances of varying shape. [sent-34, score-0.789]

21 After developing our model in Section 2, Section 3 develops a Gibbs sampler which efficiently marginalizes the latent affine transformations defining part deformation. [sent-35, score-0.22]

22 We conclude in Section 4 with results examining meshes of humans and other articulated objects, where we introduce a metric for quantitative evaluation of deformation-based segmentations. [sent-36, score-0.509]

23 For some input mesh j, we let yjn ∈ R3 denote the 3D location of the center of triangular face n, and Yj = [yj1 , . [sent-38, score-0.693]

24 Each mesh j has an associated N -triangle reference mesh, indexed by bj . [sent-42, score-0.675]

25 We let xbn ∈ R4 denote the location of triangle n in reference mesh b, expressed in homogeneous coordinates (xbn (4) = 1). [sent-43, score-0.798]

26 In our later experiments, Yj encodes the 3D mesh for a person in pose j, and Xbj is the reference pose for the same individual. [sent-48, score-0.871]

27 We estimate aligned correspondences between the triangular faces of the input pose meshes Yj , and the reference meshes Xb , using a recently developed method [6]. [sent-49, score-1.17]

28 This approach robustly handles 3D data capturing varying shapes and poses, and outputs meshes which have equal numbers of faces in one-to-one alignment. [sent-50, score-0.499]

29 Our segmentation model does not depend on the details of this alignment method, and could be applied to data produced by other correspondence algorithms. [sent-51, score-0.235]

30 By placing prior probability mass on partitions with arbitrary numbers of parts, it allows data-driven inference of the true number of mostly-rigid parts underlying the observed data. [sent-54, score-0.276]

31 In addition, by choosing an appropriate distance function we can encourage spatially adjacent triangles to lie in the same part, and guarantee that all inferred parts are spatially contiguous [5]. [sent-55, score-0.554]

32 The Chinese restaurant process (CRP) is a distribution on all possible partitions of a set of objects (in our case, mesh triangles). [sent-56, score-0.673]

33 Figure 2: Left: A reference mesh in which links (yellow arrows) currently define three parts (connected components). [sent-60, score-0.939]

34 Although described sequentially, the CRP induces an exchangeable distribution on partitions, for which the segmentation probability is invariant to the order in which triangle allocations are sampled. [sent-63, score-0.236]

35 This is inappropriate for mesh data, in which nearby triangles are far more likely to lie in the same part. [sent-64, score-0.645]

36 We define the distance between two triangles as the minimal number of hops, between adjacent faces, required to reach one triangle from the other. [sent-70, score-0.195]
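
The hop distance described above can be illustrated with a breadth-first search over the face adjacency graph. This is a minimal sketch, not the authors' implementation: the function name `hop_distances`, the dictionary-based `adjacency` structure, and the five-triangle strip are all hypothetical choices made for illustration.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: minimal number of hops from `source` to every reachable face."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        face = queue.popleft()
        for neighbor in adjacency[face]:
            if neighbor not in dist:
                dist[neighbor] = dist[face] + 1
                queue.append(neighbor)
    return dist

# Hypothetical strip of five triangles, each sharing an edge with the next.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
distances = hop_distances(adjacency, 0)
```

Here `distances[n]` is the number of adjacent-face hops separating triangle 0 from triangle n; on a real mesh the adjacency lists would be built from shared triangle edges.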

37 However, it does guarantee that only spatially contiguous parts have non-zero probability under the prior. [sent-73, score-0.314]

38 2 Modeling Part Deformation via Affine Transformations Articulated object deformation is naturally described via the spatial transformations of its constituent parts. [sent-76, score-0.309]

39 We expect the triangular faces within a part to deform according to a coherent part-specific transformation, up to independent face-specific noise. [sent-77, score-0.285]

40 We concisely denote the transformation from a reference triangle to an observed triangle via a matrix A ∈ R3×4 . [sent-79, score-0.283]

41 The fourth column of A encodes translation of the corresponding reference triangle via homogeneous coordinates xbn , and the other entries encode rotation, scaling, and shearing. [sent-80, score-0.225]
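
To make the role of the fourth column concrete, the following sketch applies a 3×4 matrix to a reference triangle center in homogeneous coordinates. The helper `apply_affine` and the numeric values are hypothetical and chosen only for illustration.

```python
def apply_affine(A, x_h):
    """Multiply a 3x4 matrix A by a homogeneous 4-vector x_h = (x, y, z, 1)."""
    return [sum(a * x for a, x in zip(row, x_h)) for row in A]

# Hypothetical transformation: identity linear part, translation (1, 2, 3)
# stored in the fourth column.
A = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
]
x_ref = [0.5, -1.0, 2.0, 1.0]  # reference triangle center; last entry fixed at 1
y = apply_affine(A, x_ref)
```

Because the last homogeneous coordinate is always 1, the fourth column is added unchanged to the output, which is why it acts as a pure translation while the left 3×3 block carries rotation, scaling, and shearing.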

42 Our construction allows transformations to be analytically marginalized when learning our part-based segmentation, but retains the flexibility to later estimate transformations if desired. [sent-83, score-0.236]

43 Applied to mesh data, these parameters have physical interpretations and can be estimated from the data collection process. [sent-88, score-0.573]

44 Allocating a different affine transformation for the motion of each part in each pose (Figure 2), the overall generative model can be summarized as follows: 1. [sent-90, score-0.219]

45 For each triangle n, sample an associated link cn ∼ ddCRP (α, f, D). [sent-91, score-0.216]

46 For each pose j of each part k, sample an affine transformation Ajk and residual noise covariance Σjk from the matrix normal-inverse-Wishart prior of Equation (2). [sent-97, score-0.216]

47 Given these pose-specific affine transformations and assignments of mesh faces to parts, independently sample the observed location of each pose triangle relative to its corresponding reference triangle, $y_{jn} \sim \mathcal{N}(A_{j z_n} x_{b_j n}, \Sigma_{j z_n})$. [sent-99, score-1.166]

48 Note that Σjk governs the degree of non-rigid deformation of part k in pose j. [sent-100, score-0.205]

49 There is a single reference mesh Xb for each object instance b, and Yj captures a single deformed pose of Xbj . [sent-112, score-0.817]

50 3 Previous Work Previous work has also sought to segment a mesh into parts based on observed articulations [8, 12, 13, 14]. [sent-114, score-0.813]

51 Several other segmentation procedures [12, 14] lack coherent probabilistic models, and thus have difficulty quantifying uncertainty and determining appropriate segmentation resolutions. [sent-117, score-0.365]

52 [8] define a global probabilistic model, and use the EM algorithm to jointly estimate parts and their transformations. [sent-119, score-0.214]

53 They explicitly model spatial dependencies among mesh faces, but their Markov random field cannot ensure that parts are spatially connected; a separate connected components process is required. [sent-120, score-0.923]

54 Ambitious recent work has considered a model for joint mesh alignment and segmentation [9]. [sent-122, score-0.778]

55 However, this approach suffers from many of the issues noted above: the number of parts must be specified a priori, parts may not be contiguous, and their EM inference appears prone to local optima. [sent-123, score-0.428]

56 3 Inference We seek the constituent parts of an articulated model, given observed data (X, Y, and b). [sent-124, score-0.383]

57 These parts are characterized by the posterior distribution of the customer links c. [sent-125, score-0.308]

58 Here, z(c) is the clustering into parts defined by the customer links c. [sent-127, score-0.337]
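
The mapping from customer links c to the partition z(c) amounts to finding connected components of the link graph. The sketch below uses a small union-find structure; the function name `parts_from_links`, the list encoding of links (with `links[n] == n` as a self-link), and the six-triangle example are assumptions made for illustration, not details taken from the paper.

```python
def parts_from_links(links):
    """Return one part label per triangle: connected components of the link graph.

    links[n] is the index of the triangle that triangle n links to;
    links[n] == n denotes a self-link.
    """
    parent = list(range(len(links)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for n, target in enumerate(links):
        root_n, root_t = find(n), find(target)
        if root_n != root_t:
            parent[root_n] = root_t

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(n), len(labels)) for n in range(len(links))]

links = [1, 1, 1, 4, 4, 4]   # triangles 1 and 4 carry self-links
z = parts_from_links(links)

merged_links = [1, 1, 1, 4, 2, 4]   # triangle 4 now links into the first part
z_merged = parts_from_links(merged_links)
```

Redirecting a single link (triangle 4 in the example) is enough to join two components, which is the kind of local change to the partition that a link-based sampler exploits.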

59 Instead of explicitly sampling from Equation (4), a more efficient sampler [4] can be derived by observing that different realizations of the link cn only make a small change to the partition structure. [sent-130, score-0.208]

60 Sampling new realizations $c_n^{(new)}$ of $c_n$ will give rise to new partitions $z(c_{-n} \cup c_n^{(new)})$, which may either be identical to $z(c_{-n})$ or contain one less part, due to a merge of two existing parts. [sent-132, score-0.289]

61 Note that if the mesh segmentation c is the only quantity of interest, the analytically marginalized affine transformations Ajk need not be directly estimated. [sent-135, score-0.863]

62 Because “ground truth” parts are unavailable for the real body pose datasets of primary interest, we propose an alternative evaluation metric based on the prediction of held-out object poses, and show that the mesh-ddcrp performs favorably against competing approaches. [sent-139, score-0.404]

63 For quantitative tests, we employ 12 meshes of each of six different female subjects [15] (Figure 4). [sent-141, score-0.435]

64 For each subject, a mesh in a canonical pose is chosen as the reference mesh (Figure 1). [sent-142, score-1.346]

65 1 Hyperparameter Specification and MCMC Learning The hyperparameters that regularize our mesh-ddcrp prior have intuitive interpretations, and can be specified based on properties of the mesh data under consideration. [sent-145, score-0.603]

66 The mean affine transformation M is set to the identity transformation, because on average we expect mesh faces to undergo small deformations. [sent-150, score-0.773]

67 The expected part variance S0 captures the degree of non-rigidity which we expect parts to demonstrate, as well as noise from the mesh alignment process. [sent-152, score-0.855]

68 The correspondence error in our human meshes is approximately 0. [sent-153, score-0.417]

69 Our settings make this nearly identity for most components, but the translation components of A have variance which is an order of magnitude larger, so that the expected scale of the translation parameters matches that of the mesh coordinates. [sent-159, score-0.573]

70 The first is a modified agglomerative clustering technique [16] which enforces spatial contiguity of the faces within each part. [sent-164, score-0.256]

71 Adjacent parts on the mesh are then merged based on the squared error in describing their motion by affine transformations. [sent-166, score-0.82]

72 Only adjacent parts are considered in these merge steps, so that parts remain spatially connected. [sent-167, score-0.533]

73 Our second baseline is based on a publicly available implementation of spectral clustering methods [17], a popular approach which has been previously used for mesh segmentation [18]. [sent-168, score-0.799]

74 The affinity between two mesh faces u, v is defined as $C_{uv} = \exp\{-(\sigma_{uv} + \sqrt{m_{uv}})/S^2\}$, where $m_{uv} = \frac{1}{J^2} \sum_j \delta_{uvj}$, $\delta_{uvj}$ is the Euclidean distance between u and v in pose j, $\sigma_{uv} = \sqrt{\frac{1}{J} \sum_j (\delta_{uvj} - \bar{\delta}_{uv})^2}$ is the corresponding standard deviation, and $S = \frac{1}{M} \sum_{u,v} (\sigma_{uv} + \sqrt{m_{uv}})$ for all M pairs of faces u, v. [sent-170, score-1.099]
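
As a rough illustration of this affinity, the sketch below assumes it has the form C_uv = exp{-(σ_uv + √m_uv)/S²}, with m_uv the sum of per-pose distances divided by J², σ_uv the standard deviation of those distances, and S the average of σ_uv + √m_uv over pairs. The function name `affinities`, the use of unordered pairs for the M pairs, and the toy data are assumptions, not details confirmed by the source.

```python
import math
from itertools import combinations

def affinities(poses):
    """poses[j][u] is the 3D center of face u in pose j. Returns {(u, v): C_uv}."""
    J = len(poses)
    n_faces = len(poses[0])
    score = {}
    for u, v in combinations(range(n_faces), 2):
        d = [math.dist(pose[u], pose[v]) for pose in poses]  # delta_uvj for each pose j
        mean_d = sum(d) / J
        m_uv = sum(d) / J ** 2
        sigma_uv = math.sqrt(sum((x - mean_d) ** 2 for x in d) / J)
        score[(u, v)] = sigma_uv + math.sqrt(m_uv)
    S = sum(score.values()) / len(score)  # average over all unordered pairs
    return {pair: math.exp(-s / S ** 2) for pair, s in score.items()}

# Hypothetical data: faces 0 and 1 keep a fixed distance, face 2 moves away.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
C = affinities(poses)
```

Pairs of faces that stay close and keep a constant separation across poses receive the highest affinity, so a spectral method applied to C tends to group faces that move together.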

75 For the agglomerative and spectral clustering approaches, the number of parts must be externally specified; we experimented with K = 5, 10, 15, 20, 25, 30 parts. [sent-171, score-0.328]

76 We also consider a Bayesian nonparametric baseline which replaces the ddCRP prior over mesh partitions with a standard CRP prior. [sent-172, score-0.635]

77 The resulting mesh-crp model may estimate the number of parts, but doesn’t model mesh structure or enforce part contiguity. [sent-173, score-0.608]

78 The expected number of parts under the CRP prior is roughly α log N ; we set α = 2 so that the expected number of mesh-crp parts is similar to the number of parts discovered by the mesh-ddcrp. [sent-174, score-0.672]
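
The α log N approximation can be checked against the exact expectation for a CRP, which is the sum over customers i = 1, ..., N of α/(α + i - 1). The sketch below is illustrative only; the value of `N` is a hypothetical mesh size, not one reported in the source.

```python
import math

def expected_crp_parts(alpha, n):
    """Exact expected number of clusters under a CRP with concentration alpha and n items."""
    return sum(alpha / (alpha + i) for i in range(n))

N = 10000  # hypothetical number of mesh triangles
exact = expected_crp_parts(2.0, N)
approx = 2.0 * math.log(N)
```

The two quantities grow at the same logarithmic rate and differ by a bounded offset, which is why α log N serves as a convenient rule of thumb when choosing α.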

79 These meshes contain about 31,000 and 38,000 triangular faces, respectively. [sent-179, score-0.437]

80 Figure 3 displays the segmentations of the Tosca meshes inferred by mesh-ddcrp. [sent-180, score-0.578]

81 The inferred parts largely correspond to groups of mesh faces which undergo similar transformations. [sent-181, score-0.971]

82 Figure 4 displays the results produced by the ddCRP, as well as our baseline methods, on the human mesh data. [sent-182, score-0.681]

83 Note that in addition to capturing the head and limbs, the segmentation successfully segregates distinctly moving small regions such as knees, elbows, shoulders, biceps, and triceps. [sent-184, score-0.198]

84 In all, the mesh-ddcrp detects 20 distinctly moving parts for one half of the body. [sent-185, score-0.24]

85 We now introduce a quantitative measure of segmentation quality: segmentations are evaluated by their ability to explain the articulations of test meshes with novel shapes and poses. [sent-186, score-0.702]

86 Given a collection of T test meshes Yt with corresponding reference meshes Xbt , and a candidate segmentation into K parts, we compute $E = \frac{1}{T} \sum_{t=1}^{T} \sum_{k=1}^{K} \| Y_{tk} - A^{*}_{tk} X_{b_t k} \|^2$ (12). [sent-187, score-1.026]

87 Figure 3: Segmentations produced by mesh-ddcrp on synthetic Tosca meshes [20]. [sent-188, score-0.406]

88 The first mesh in each row displays the chosen reference mesh. [sent-189, score-0.712]

89 Note that Equation (12) is trivially zero for a degenerate solution wherein each mesh face is assigned to its own part. [sent-192, score-0.573]
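
A sketch of how an error of the form in Equation (12) might be computed, assuming that A* denotes the least-squares affine fit for each part in each test mesh (the source does not spell this out here). The function name `prediction_error`, the array layout, and the synthetic two-part data are hypothetical; NumPy is used for the least-squares solve.

```python
import numpy as np

def prediction_error(Y_tests, X_refs, z):
    """Mean over test meshes of summed per-part least-squares affine residuals.

    Y_tests: list of (N, 3) arrays of face centers in each test pose.
    X_refs:  list of (N, 4) arrays of homogeneous reference face centers.
    z:       length-N sequence of integer part labels.
    """
    z = np.asarray(z)
    total = 0.0
    for Y, X in zip(Y_tests, X_refs):
        for k in np.unique(z):
            Xk, Yk = X[z == k], Y[z == k]
            # Least-squares affine fit: Xk @ A_T approximates Yk, with A_T of shape (4, 3).
            A_T, *_ = np.linalg.lstsq(Xk, Yk, rcond=None)
            total += np.sum((Yk - Xk @ A_T) ** 2)
    return total / len(Y_tests)

# Hypothetical data: 20 faces in two parts, each part translated differently.
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 3))
X = np.hstack([points, np.ones((20, 1))])
z_true = np.array([0] * 10 + [1] * 10)
Y = points.copy()
Y[z_true == 0] += np.array([1.0, 0.0, 0.0])
Y[z_true == 1] += np.array([0.0, 2.0, 0.0])

err_true = prediction_error([Y], [X], z_true)
err_single = prediction_error([Y], [X], np.zeros(20, dtype=int))
```

With the two-part segmentation each part is explained exactly by its own transformation, whereas a single part cannot capture both translations and leaves a positive residual. Parts with four or fewer faces are fit exactly by an affine map, which is consistent with the degenerate zero-error solution noted above.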

90 Mesh-ddcrp is significantly better than all other methods, including for settings of K which allocate 50% more parts to competing approaches, according to a Wilcoxon’s signed rank test (5% significance level). [sent-197, score-0.214]

91 We selected an illustrative articulated pose for each of the two training subjects in addition to their respective reference poses (Figure 4). [sent-199, score-0.459]

92 The meshes were then segmented both independently for the two subjects and jointly sharing information across subjects. [sent-201, score-0.461]

93 However, sharing information among subjects results in parts which correspond well with physical human bodies. [sent-203, score-0.309]

94 Note that with only two articulated poses, we are able to generate meaningful segmentations in about an hour of computation. [sent-204, score-0.261]

95 This data-limited scenario also demonstrates the benefits of the ddCRP prior: as shown in Figure 5, the parts extracted by mesh-crp are “patchy”, spatially disconnected, and physically implausible. [sent-205, score-0.286]

96 5 Discussion Adapting the ddCRP to collections of 3D meshes, we have developed an effective approach for the discovery of an unknown number of parts underlying articulated object motion. [sent-206, score-0.391]

97 Unlike previous methods, our model guarantees that parts are spatially connected, and uses transformations to model instances with potentially varying body shapes. [sent-207, score-0.452]

98 Experiments with dozens of real human body poses provide strong quantitative evidence that our approach produces state-of-the-art segmentations with many potential applications. [sent-210, score-0.338]

99 Each row displays the reference pose, an illustrative articulated pose, mesh-crp and mesh-ddcrp segmentations produced by independently segmenting the pair of poses of each individual, and mesh-crp and mesh-ddcrp segmentations produced by jointly segmenting the chosen poses from both subjects. [sent-227, score-0.876]

100 Group-valued regularization framework for motion segmentation of dynamic non-rigid shapes. [sent-306, score-0.205]

