iccv iccv2013 iccv2013-344 knowledge-graph by maker-knowledge-mining

344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling


Source: pdf

Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang

Abstract: Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. A new framework consisting of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. [sent-15, score-0.584]

2 To overcome this limitation, a novel exemplar based approach is proposed in this work. [sent-18, score-0.382]

3 Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. [sent-19, score-0.634]

4 A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. [sent-20, score-0.517]

5 A new framework consisting of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters is formulated for robust human activity recognition. [sent-21, score-0.734]

6 Existing approaches focus on modelling the co-occurrence or spatial relationship between human and the manipulated object. [sent-26, score-0.455]

7 The co-occurrence relationship, for example, can be modelled by a mutual context model that joins object detection and human pose estimation. [sent-27, score-0.299]

8 Columns 1-4 show four images represented by the same atomic pose. [sent-30, score-0.322]

9 Column 5 shows the manipulated objects' locations overlaid on the corresponding atomic pose. [sent-31, score-0.538]

10 the relative position and overlap between a human and objects that join human detection or annotation and object detection together [13, 1, 18, 4]. [sent-39, score-0.291]

11 In particular, [11] presents a method for categorising manipulated objects and tracking 3D articulated hand pose in context of each other in order to figure out the interactions between human and interacting objects. [sent-42, score-0.471]

12 However, most of the existing HOI modelling approaches rely heavily on explicit human pose estimation [23] or directly use the locations of human and objects as the HOI representation [13, 1, 18]. [sent-46, score-0.412]

13 Specifically, for the methods that represent action using the spatial relationship between human and object, person and object detections are critical [13, 1, 18]; whilst for those based on the co-occurrence modelling, accurate human pose estimation is crucial [23]. [sent-47, score-0.525]

14 Nevertheless, the problem of detecting objects, especially small-size objects such as badminton and tennis balls, is far from being solved; the problem of estimating human pose under occlusion and large pose variations also remains unsolved. [sent-48, score-0.396]

15 In this paper, we overcome this limitation by proposing a model for learning a set of exemplars to represent human-object interaction. [sent-50, score-0.319]

16 Exploring spatial pose-object interaction exemplars is motivated by the observation that for a human activity with similar human poses, the manipulated objects, if any, would appear at similar relative positions, i.e. [sent-51, score-1.102]

17 relative to a reference point, such as torso centre of human (see examples in column 5 of Fig. [sent-53, score-0.275]

18 Therefore, the configuration of pose and object can be viewed as an exemplar for describing the action where interaction between human and object happens. [sent-55, score-0.504]

19 This type of exemplar is termed a spatial pose-object interaction exemplar. [sent-56, score-0.454]

20 A spatial pose-object interaction exemplar is mainly represented as a density function that tells how likely an object appears with respect to an (atomic) pose at a position around a person. [sent-57, score-0.71]

21 Some examples of spatial pose-object interaction exemplars can be found in the 4th column of Fig. [sent-58, score-0.48]

22 By representing HOI as a set of exemplars, the HOI in an image can be represented by measuring the responses of different exemplars within the image. [sent-61, score-0.317]

23 Due to the probabilistic modelling of the mutual spatial structure information between human and object in our exemplars, one no longer requires accurate detection of the human and object, nor accurate estimation of human pose. [sent-62, score-0.537]

24 Together with the exemplar based HOI descriptor, this provides a robust still-image based human action recognition framework. [sent-64, score-0.512]

25 Although exemplar based modelling has been applied to a variety of visual recognition problems, including scene recognition [10], object detection [14] and pose estimation [15], exemplar based HOI modelling has been largely unexplored. [sent-65, score-1.165]

26 The use of exemplar in existing work is focused on transferring useful information extracted from meta-data to a new data point. [sent-66, score-0.382]

27 This is very different from our objective, which is to develop an exemplar based representation. [sent-67, score-0.382]

28 More recently, an exemplar approach was exploited for action recognition [22]. [sent-68, score-0.46]

29 However, the purpose of exemplar in [22] is for selecting a set of representative samples for each class, which differs from our notion and design of the exemplar in this work. [sent-69, score-0.764]

30 Approach Our exemplar modelling consists of two parts: 1) a new exemplar based HOI descriptor (Sec. [sent-77, score-0.925]

31 Learning Atomic Poses Instead of explicit human pose estimation, our modelling is based on the use of a set of atomic poses [23] learned from training data. [sent-88, score-0.724]

32 We assume that each pose involved in the activities can be associated with its most similar atomic pose. [sent-90, score-0.465]

33 To derive the atomic poses from annotated training data, we first align all the annotations so that the torsos in all the images have the same position, width and height. [sent-93, score-0.459]

34 The cluster centres H = {H1, . . . , HN} form our dictionary of atomic poses, that is, each cluster represents an atomic pose. [sent-98, score-0.367]

35 Some examples of atomic poses we derive from a sports dataset are illustrated in Fig. [sent-99, score-0.541]

36 The advantage of using the AP method is that we do not need prior knowledge of the number of atomic poses N, which is determined automatically. [sent-102, score-0.411]
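
As a concrete illustration, this clustering step could be sketched in Python as below, assuming the AP method refers to affinity propagation (consistent with the number of atomic poses N being determined automatically) and that each annotated training pose has already been flattened into a vector of torso-aligned part-box coordinates; every name and value here is illustrative rather than taken from the paper.

import numpy as np
from sklearn.cluster import AffinityPropagation

def learn_atomic_poses(pose_vectors):
    # pose_vectors: (num_poses, D) array of torso-aligned part-box coordinates,
    # e.g. 6 upper-body parts x [x, y, w, h] = 24 dimensions per pose.
    ap = AffinityPropagation(damping=0.9, random_state=0)
    labels = ap.fit_predict(pose_vectors)   # atomic-pose id assigned to every training pose
    return ap.cluster_centers_, labels      # cluster centres act as the dictionary H

# Illustrative call on random stand-in data.
poses = np.random.rand(200, 24)
H, assignment = learn_atomic_poses(poses)
print(len(H), "atomic poses learned")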

37 Constructing Exemplar Dictionary Given atomic poses, we would like to build a spatial pose-object interaction exemplar dictionary that both encodes and interprets interactions between human and objects. [sent-105, score-1.134]

38 Our idea of exploring interaction exemplars is inspired by the observation that the locations of the manipulated objects are constrained by the person's location, pose and type of activity. [sent-106, score-0.906]

39 All annotated boxes in an image constitute an atomic pose, where different parts are discovered and marked with different colours. [sent-109, score-0.343]

40 Hence, we formulate a distribution function G(x) to describe the likelihood that a manipulated object would appear at location x around a person for a specific spatial pose-object interaction. [sent-115, score-0.406]

41 By utilising the distribution modelling, we are able to describe the interaction between pose and object in a probabilistic way, rather than directly using the label information or precise coordinates of object and person as features for inference. [sent-117, score-0.442]

42 We compute an exemplar for each pair of manipulated object and atomic pose that appears in the training set. [sent-118, score-1.043]

43 For the N atomic poses and K objects, we can construct a dictionary of spatial pose-object interaction exemplars Gnk for all atomic poses H and manipulated objects O = {Ok}, k = 1, 2, . . . , K. [sent-120, score-1.537]

44 1 Dictionary Estimation We assume the distribution of each elementary exemplar follows a normal distribution with parameters μ and Σ, which are the mean vector and covariance matrix, respectively. [sent-132, score-0.405]

45 It is based on the assumption that, for each exemplar, the object would appear at a similar location relative to the human in an activity, and thus multiple exemplars can be viewed as a multi-Gaussian distribution describing the location variation. [sent-133, score-0.375]

46 That is, we can formulate the density function for an elementary exemplar as G(x) ∝ exp[−(x − μ)^T Σ^{−1} (x − μ)] (1). For each training sample Q ∈ Q, we denote its corresponding atomic pose as Hn and its manipulated object as Ok. [sent-134, score-0.744]
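
A minimal sketch of evaluating this elementary exemplar density, assuming μ and Σ have already been estimated for a given pose-object pair; the unnormalised form mirrors Eq. (1) and all names are illustrative.

import numpy as np

def exemplar_density(x, mu, sigma):
    # Eq. (1): G(x) is proportional to exp[-(x - mu)^T Sigma^{-1} (x - mu)],
    # where x and mu are 2-D locations in the normalised (torso-centred) frame
    # and sigma is the 2x2 covariance of the object location for this exemplar.
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.exp(-d @ np.linalg.inv(sigma) @ d))

# Illustrative exemplar whose object tends to sit to the right of the torso.
mu = np.array([1.5, 0.0])
sigma = np.array([[0.2, 0.0], [0.0, 0.4]])
print(exemplar_density([1.4, 0.1], mu, sigma))    # high response near the mean
print(exemplar_density([-2.0, 1.0], mu, sigma))   # low response far from it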

47 We aim to learn a measure of the spatial pose-object interaction exemplar G(x) that tells how likely Ok is to be located at position x. [sent-135, score-0.666]

48 In order to derive a uniform coordinate frame, we need to normalise the human and object configurations, so that their torso centres and widths are fixed at (xt0, yt0) and wt0, respectively. [sent-139, score-0.311]

49 X(2) and Y(2) are the x- and y-coordinates of the torso centre in the corresponding training data, W(2) indicates the width of the torso, and (X̃, Ỹ, W̃, H̃) is the normalised configuration. [sent-141, score-0.243]

50 We normalise the configurations using only the torso width, because samples represented by the same atomic pose usually have a similar relative width-height ratio for each part and object. [sent-142, score-0.609]

51 Now we estimate the Gaussian parameters of the spatial pose-object interaction exemplar (Eq. [sent-145, score-0.639]
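
A sketch of what this estimation step could look like: object locations are first mapped into the torso-centred frame (scaling by torso width only, as described above), then a mean and covariance are fitted for every (atomic pose, object) pair seen in training. The reference-frame constants and all names are illustrative assumptions, not values from the paper.

import numpy as np
from collections import defaultdict

XT0, YT0, WT0 = 0.0, 0.0, 1.0   # assumed reference torso centre (x_t0, y_t0) and width w_t0

def normalise_location(obj_xy, torso_xy, torso_w):
    # Map an object location into the torso-centred frame, scaling by torso width only.
    s = WT0 / torso_w
    return np.array([(obj_xy[0] - torso_xy[0]) * s + XT0,
                     (obj_xy[1] - torso_xy[1]) * s + YT0])

def estimate_dictionary(samples):
    # samples: iterable of (atomic_pose_id n, object_id k, obj_xy, torso_xy, torso_w).
    # Returns {(n, k): (mu, Sigma)}, i.e. one Gaussian exemplar G_nk per pose-object pair.
    grouped = defaultdict(list)
    for n, k, obj_xy, torso_xy, torso_w in samples:
        grouped[(n, k)].append(normalise_location(obj_xy, torso_xy, torso_w))
    dictionary = {}
    for key, locs in grouped.items():
        locs = np.stack(locs)
        mu = locs.mean(axis=0)
        if locs.shape[0] > 1:
            sigma = np.cov(locs, rowvar=False) + 1e-6 * np.eye(2)  # ridge keeps Sigma invertible
        else:
            sigma = np.eye(2)   # fallback when a pair has a single training sample
        dictionary[key] = (mu, sigma)
    return dictionary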

52 Some examples of the learned spatial pose-object interaction exemplars are visualised in Fig. [sent-158, score-0.454]

53 This figure shows that an atomic pose can interact with two or more objects, and an object can also interact with multiple atomic poses. [sent-160, score-0.809]

54 However, for each pair of pose and manipulated object, there is only one interaction exemplar to describe the interaction between them. [sent-161, score-1.049]

55 In addition, from this figure, we can observe that the spatial pose-object interaction exemplar can capture some semantic information that tells us how the actor is manipulating the object. [sent-162, score-0.728]

56 Inferring Spatial Pose-Object Interaction Using Exemplars After constructing the exemplar dictionary, we can use the learned dictionary to compute a representation for an HOI activity in a probe image. [sent-165, score-0.649]

57 As mentioned above, the exemplar approach is exploited to avoid estimating the human pose in the probe image; instead, it nominates the most similar pose information contained in our spatial exemplar dictionary for the probe HOI. [sent-166, score-1.44]

58 Based on the nominated atomic poses, the model selects the candidate exemplars in the dictionary and computes the response of the probe HOI against each exemplar. [sent-167, score-1.006]

59 Finally, the model forms a code vector for each probe HOI consisting of the responses of all the exemplars in the dictionary. [sent-168, score-0.23]

60 1 Nominating Similar Atomic Poses For each probe HOI, we nominate the most similar atomic poses defined in the spatial exemplar dictionary. [sent-173, score-1.01]

61 For each detected person P in the probe HOI, we first score each training image with Sim(P, Pi), a function that measures the pose similarity between P and Pi, where Pi indicates the person of interest in the i-th training image. [sent-174, score-0.346]

62 Note that each person in the training images of our dataset is associated with an atomic pose. [sent-175, score-0.39]
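
A minimal sketch of this nomination step. It assumes Sim(P, Pi) can be approximated by a negative distance between pose feature vectors, that every training image carries the id of its atomic pose, and that each atomic pose is scored by its best-matching training image before the top-S poses are kept; these aggregation details, like all names below, are illustrative assumptions.

import numpy as np

def nominate_atomic_poses(probe_feat, train_feats, train_pose_ids, S=3):
    # Sim(P, P_i) approximated by the negative Euclidean distance between pose features.
    sims = -np.linalg.norm(np.asarray(train_feats) - np.asarray(probe_feat), axis=1)
    best = {}
    for sim, n in zip(sims, train_pose_ids):
        best[n] = max(best.get(n, -np.inf), sim)   # best-scoring training image per atomic pose
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:S]                              # ids of the S nominated atomic poses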

63 Here only 6 parts from the upper body are considered for learning the atomic poses, since sometimes only the upper body of the person of interest is visible. [sent-189, score-0.535]

64 Hence an object detection vector O will be formed for a probe image over all object types. [sent-198, score-0.233]

65 Third, for each object type Ok and each selected atomic pose Hn, we align the exemplar Gnk so that its torso position is (xt, yt) and its width is wt, computed by G̃nk(x, y) = Gnk(x/scale + xt0 − xt, y/scale + yt0 − yt) (4), where scale = wt/wt0. [sent-199, score-1.027]

66 G̃nk(x, y) provides a measure of the probability of object Ok appearing at (x, y) in the image given atomic pose Hn. [sent-200, score-0.466]
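
For a Gaussian exemplar, this alignment can be sketched by mapping an image point back into the exemplar's normalised frame before evaluating Eq. (1). The grouping of the shift and scale terms in the extracted Eq. (4) is hard to read, so the sketch below uses the geometrically natural reading (subtract the detected torso centre, divide by the scale, add the reference centre); treat it as an interpretation rather than the paper's exact formula.

import numpy as np

def eval_aligned_exemplar(x, y, mu, sigma, xt, yt, wt, xt0=0.0, yt0=0.0, wt0=1.0):
    # Map the image location (x, y) into the exemplar's reference frame using the detected
    # torso centre (xt, yt) and width wt, with scale = wt / wt0, then evaluate the Gaussian
    # of Eq. (1) there; the result plays the role of G~_nk(x, y).
    scale = wt / wt0
    u = np.array([(x - xt) / scale + xt0, (y - yt) / scale + yt0])
    d = u - np.asarray(mu, dtype=float)
    return float(np.exp(-d @ np.linalg.inv(sigma) @ d))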

67 After alignment, we update the detected object location (xo, yo) with respect to G̃nk and compute the corresponding semantic spatial interaction response as I(n, k) = G̃nk(xo, yo) (5). [sent-203, score-0.449]

68 This response is computed for each selected candidate atomic pose and each object type. [sent-205, score-0.482]

69 Each entry of this N × K matrix represents the response with respect to the corresponding atomic pose and object category, where entries corresponding to non-selected atomic poses are zero. [sent-207, score-0.686]
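
Putting the previous sketches together, the response computation could look as follows, assuming the learned dictionary, the nominated atomic poses and one detected location per object type are available for the probe image; entries for non-selected poses and unseen pose-object pairs stay zero, as stated above.

import numpy as np

def interaction_response(dictionary, selected_poses, detections, torso, N, K):
    # dictionary     : {(n, k): (mu, sigma)} learned exemplars (see the estimation sketch)
    # selected_poses : integer ids of the atomic poses nominated for this probe image
    # detections     : {k: (xo, yo)} detected location for each object type present
    # torso          : (xt, yt, wt) detected torso centre and width
    xt, yt, wt = torso
    I = np.zeros((N, K))
    for n in selected_poses:
        for k, (xo, yo) in detections.items():
            if (n, k) in dictionary:
                mu, sigma = dictionary[(n, k)]
                I[n, k] = eval_aligned_exemplar(xo, yo, mu, sigma, xt, yt, wt)
    return I.ravel()   # flattened into the spatial exemplar response vector used below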

70 A HOI Descriptor The spatial exemplar response vector I described in Sec. [sent-211, score-0.573]

71 For better visualisation, bars associated with different manipulated objects are marked with different colours: cricket bat (red), cricket ball (green), croquet mallet (blue), tennis racket (magenta), volleyball (yellow). [sent-223, score-0.847]

72 From the final representation, we can observe that the actor is manipulating a tennis racket or cricket ball. [sent-224, score-0.3]

73 The combination of I, P, and O is indeed necessary because they provide complementary information to each other, where I indicates the spatial interaction response and [P; O] the appearance interaction response. [sent-226, score-0.563]

74 Our final HOI descriptor can be formulated as H = [I; P; O; C] (6). Compared to existing HOI descriptors [23, 1, 5, 9], the proposed one mainly differs in the use of the spatial exemplar response I, while the remaining three terms are exploited in existing work in a similar way [1, 5, 9]. [sent-231, score-0.649]
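
A one-line sketch of this concatenation: the four components are simply stacked into a single vector. The exact definition of the appearance terms P and O and of the extra term C is not spelled out in this excerpt, so they are treated here as opaque, pre-computed 1-D feature vectors.

import numpy as np

def build_hoi_descriptor(I, P, O, C):
    # Eq. (6): H = [I; P; O; C] -- spatial exemplar responses stacked with the
    # person-related, object-detection and remaining feature vectors.
    return np.concatenate([np.ravel(v) for v in (I, P, O, C)])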

75 As in [23], only five object classes, cricket bat, bowling ball, croquet mallet, tennis racket, volleyball, were employed to model and evaluate HOI for action recognition. [sent-287, score-0.319]

76 To train detectors of human, head, torso and upper body, the ground truth bounding boxes in the sports and PPMI were used to generate positive examples, whilst the negative samples were generated from VOC2012. [sent-294, score-0.317]

77 To facilitate reliable detection of person across a variety of poses, we follow [6, 1] and combine detection windows returned by 4 detectors: head detector, torso detector, upper body detector and people detector. [sent-295, score-0.352]

78 In addition, the number of candidate exemplars for computing the exemplar response in Sec. [sent-299, score-0.726]

79 2, namely the parameter S, is set to 3 for the sports dataset and 20 for PPMI, which are almost a quarter of the number of learned atomic poses. [sent-302, score-0.477]

80 They need to use the locations of human and objects as features [1, 9] or depend on explicit human pose estimation [23, 5]. [sent-311, score-0.28]

81 6 which demonstrates that spatial pose-object interaction exemplars are able to effectively describe how a person is interacting with a manipulated object for different activities. [sent-338, score-0.788]

82 Specifically, for each training image, we annotated manipulated objects and six body parts, including head, torso, left upper arm, left lower arm, right upper arm and right lower arm. [sent-343, score-0.324]

83 Effects of the Number of Candidate Exemplars We study the effect of different numbers of exemplars S used when nominating atomic poses (Sec. [sent-378, score-0.644]

84 The performance is best when S = 3 on the sports dataset and S = 20 on the PPMI dataset, which is almost a quarter of the number of atomic poses learned for each dataset and is also set as the default value in our experiments. [sent-384, score-0.436]

85 Due to this fact, a better performance on PPMI is obtained for a larger S, as more candidate exemplars are needed to describe the spatial pose-object interaction in an image. [sent-388, score-0.481]

86 Effect of Exemplar Modelling We evaluate the effectiveness of exemplar based semantic spatial interaction response by removing the spatial exemplar response vector I from our HOI descriptor and feeding the rest into our matching model. [sent-401, score-1.382]

87 We also test the influence of our full interaction descriptor (combination of the spatial interaction response I and the appearance response [P; O] as defined in Sec. [sent-403, score-0.733]

88 These results (about 8 ∼ 15% difference in performance) demonstrate the usefulness of our exemplar modelling. [sent-406, score-0.382]

89 Conclusion and Future work We have proposed to represent human-object interactions using a set of spatial pose-object interaction exemplars and form a new HOI descriptor, where weight parameters for each component are learned by an activity specific ranking model. [sent-408, score-0.616]

90 A key characteristic of our exemplar based approach is that it models the mutual spatial structure between human and object in a probabilistic way, so as to avert explicit human pose estimation and alleviate the effects of faulty detection of object and human. [sent-409, score-0.927]

91 Our experimental results suggest that our exemplar approach outperforms existing related HOI techniques or performs comparably to them for action recognition from still images. [sent-410, score-0.434]

92 On-going work includes further improvement of the exemplar learning. [sent-411, score-0.382]

93 Specifically, our approach depends on the use of atomic poses. [sent-412, score-0.322]

94 repairing bike and phoning, it is not easy to mine a set of representative atomic poses from limited data. [sent-415, score-0.411]

95 For each class, the image in Column 1 shows the HOI activity, the image in Column 2 shows the visual response to a normalised pose-object exemplar (G̃nk in Eq. [sent-427, score-0.526]

96 (4)), the image in Column 3 shows the manipulated object (what) and the person (who), and the image in Column 4 is a histogram of the pose-object spatial interaction response (I in Eq. [sent-428, score-0.684]

97 The X-axis and Y-axis of the histogram are the pose-object spatial exemplar index and the response value, respectively. [sent-430, score-0.573]

98 Bars that represent different objects are marked with different colours: cricket bat (red), cricket ball (green), croquet mallet (blue), tennis racket (magenta), volleyball (yellow). [sent-433, score-0.652]

99 Red arrows indicate that the exemplar's manipulated object is consistent with the predicted activity type. [sent-434, score-0.351]

100 It illustrates that our exemplar response can provide some semantic information for the activity, which can tell us how the person manipulates the object. [sent-435, score-0.57]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hoi', 0.493), ('exemplar', 0.382), ('atomic', 0.322), ('ppmi', 0.254), ('exemplars', 0.197), ('manipulated', 0.195), ('interaction', 0.186), ('torso', 0.131), ('sports', 0.13), ('cricket', 0.125), ('nk', 0.124), ('response', 0.12), ('activity', 0.112), ('modelling', 0.111), ('probe', 0.11), ('pose', 0.1), ('ok', 0.094), ('gnk', 0.089), ('poses', 0.089), ('volleyball', 0.079), ('human', 0.078), ('phow', 0.072), ('spatial', 0.071), ('person', 0.068), ('racket', 0.059), ('tennis', 0.054), ('mallet', 0.054), ('grouplet', 0.053), ('action', 0.052), ('descriptor', 0.05), ('interactions', 0.05), ('prest', 0.049), ('width', 0.048), ('nnk', 0.048), ('bat', 0.048), ('dictionary', 0.045), ('croquet', 0.044), ('object', 0.044), ('ball', 0.043), ('activities', 0.043), ('yao', 0.042), ('mutual', 0.042), ('centre', 0.04), ('body', 0.04), ('actor', 0.04), ('tpami', 0.036), ('nominate', 0.036), ('nominating', 0.036), ('siaai', 0.036), ('sysu', 0.036), ('tris', 0.036), ('detection', 0.035), ('whilst', 0.034), ('china', 0.034), ('xo', 0.033), ('ai', 0.032), ('guangdong', 0.032), ('reshaped', 0.032), ('hn', 0.031), ('desai', 0.031), ('yt', 0.031), ('yo', 0.03), ('pyramid', 0.03), ('centres', 0.029), ('guangzhou', 0.029), ('faulty', 0.029), ('normalise', 0.029), ('location', 0.028), ('recognising', 0.028), ('perturbation', 0.027), ('interacting', 0.027), ('candidate', 0.027), ('configurations', 0.027), ('tells', 0.027), ('exploited', 0.026), ('xto', 0.026), ('column', 0.026), ('pi', 0.026), ('quarter', 0.025), ('yto', 0.025), ('sim', 0.025), ('explicit', 0.024), ('arm', 0.024), ('delaitre', 0.024), ('normalised', 0.024), ('contextual', 0.023), ('xt', 0.023), ('elementary', 0.023), ('confusion', 0.022), ('manipulating', 0.022), ('utilise', 0.022), ('exploring', 0.022), ('upper', 0.022), ('mai', 0.022), ('parts', 0.021), ('school', 0.021), ('science', 0.021), ('gupta', 0.021), ('detector', 0.021), ('objects', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling

Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang

Abstract: Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. A new framework consisting of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.

2 0.22742985 150 iccv-2013-Exemplar Cut

Author: Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang

Abstract: We present a hybrid parametric and nonparametric algorithm, exemplar cut, for generating class-specific object segmentation hypotheses. For the parametric part, we train a pylon model on a hierarchical region tree as the energy function for segmentation. For the nonparametric part, we match the input image with each exemplar by using regions to obtain a score which augments the energy function from the pylon model. Our method thus generates a set of highly plausible segmentation hypotheses by solving a series of exemplar augmented graph cuts. Experimental results on the Graz and PASCAL datasets show that the proposed algorithm achieves favorable segmentation performance against the state-of-the-art methods in terms of visual quality and accuracy.

3 0.19233611 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition

Author: Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu

Abstract: Recognizing the events and objects in the video sequence are two challenging tasks due to the complex temporal structures and the large appearance variations. In this paper, we propose a 4D human-object interaction model, where the two tasks jointly boost each other. Our human-object interaction is defined in 4D space: i) the cooccurrence and geometric constraints of human pose and object in 3D space; ii) the sub-events transition and objects coherence in 1D temporal dimension. We represent the structure of events, sub-events and objects in a hierarchical graph. For an input RGB-depth video, we design a dynamic programming beam search algorithm to: i) segment the video, ii) recognize the events, and iii) detect the objects simultaneously. For evaluation, we built a large-scale multiview 3D event dataset which contains 3815 video sequences and 383,036 RGBD frames captured by the Kinect cameras. The experiment results on this dataset show the effectiveness of our method.

4 0.18282272 203 iccv-2013-How Related Exemplars Help Complex Event Detection in Web Videos?

Author: Yi Yang, Zhigang Ma, Zhongwen Xu, Shuicheng Yan, Alexander G. Hauptmann

Abstract: Compared to visual concepts such as actions, scenes and objects, complex event is a higher level abstraction of longer video sequences. For example, a “marriage proposal” event is described by multiple objects (e.g., ring, faces), scenes (e.g., in a restaurant, outdoor) and actions (e.g., kneeling down). The positive exemplars which exactly convey the precise semantic of an event are hard to obtain. It would be beneficial to utilize the related exemplars for complex event detection. However, the semantic correlations between related exemplars and the target event vary substantially as relatedness assessment is subjective. Two related exemplars can be about completely different events, e.g., in the TRECVID MED dataset, both bicycle riding and equestrianism are labeled as related to “attempting a bike trick” event. To tackle the subjectiveness of human assessment, our algorithm automatically evaluates how positive the related exemplars are for the detection of an event and uses them on an exemplar-specific basis. Experiments demonstrate that our algorithm is able to utilize related exemplars adaptively, and the algorithm gains good performance for complex event detection.

5 0.18020852 118 iccv-2013-Discovering Object Functionality

Author: Bangpeng Yao, Jiayuan Ma, Li Fei-Fei

Abstract: Object functionality refers to the quality of an object that allows humans to perform some specific actions. It has been shown in psychology that functionality (affordance) is at least as essential as appearance in object recognition by humans. In computer vision, most previous work on functionality either assumes exactly one functionality for each object, or requires detailed annotation of human poses and objects. In this paper, we propose a weakly supervised approach to discover all possible object functionalities. Each object functionality is represented by a specific type of human-object interaction. Our method takes any possible human-object interaction into consideration, and evaluates image similarity in 3D rather than 2D in order to cluster human-object interactions more coherently. Experimental results on a dataset of people interacting with musical instruments show the effectiveness of our approach.

6 0.13621932 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation

7 0.12081681 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency

8 0.11626393 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

9 0.11122732 127 iccv-2013-Dynamic Pooling for Complex Event Recognition

10 0.10650587 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition

11 0.1031408 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos

12 0.092975356 267 iccv-2013-Model Recommendation with Virtual Probes for Egocentric Hand Detection

13 0.091269441 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions

14 0.087665647 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition

15 0.085144885 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach

16 0.084890053 149 iccv-2013-Exemplar-Based Graph Matching for Robust Facial Landmark Localization

17 0.084880948 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

18 0.083419442 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies

19 0.077339247 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition

20 0.075746238 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.175), (1, 0.09), (2, 0.034), (3, 0.061), (4, 0.066), (5, -0.068), (6, 0.027), (7, -0.013), (8, -0.065), (9, 0.002), (10, 0.007), (11, -0.006), (12, -0.093), (13, -0.023), (14, -0.105), (15, 0.05), (16, 0.004), (17, -0.032), (18, 0.027), (19, 0.054), (20, 0.092), (21, 0.006), (22, 0.055), (23, 0.0), (24, 0.068), (25, 0.029), (26, -0.056), (27, 0.047), (28, -0.045), (29, -0.113), (30, -0.096), (31, -0.092), (32, 0.094), (33, 0.034), (34, 0.067), (35, 0.017), (36, -0.109), (37, 0.009), (38, -0.049), (39, 0.062), (40, 0.106), (41, -0.037), (42, 0.011), (43, -0.009), (44, 0.04), (45, 0.029), (46, -0.098), (47, -0.03), (48, 0.044), (49, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90275264 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling

Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang

Abstract: Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. A new framework consisting of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.

2 0.6781354 118 iccv-2013-Discovering Object Functionality

Author: Bangpeng Yao, Jiayuan Ma, Li Fei-Fei

Abstract: Object functionality refers to the quality of an object that allows humans to perform some specific actions. It has been shown in psychology that functionality (affordance) is at least as essential as appearance in object recognition by humans. In computer vision, most previous work on functionality either assumes exactly one functionality for each object, or requires detailed annotation of human poses and objects. In this paper, we propose a weakly supervised approach to discover all possible object functionalities. Each object functionality is represented by a specific type of human-object interaction. Our method takes any possible human-object interaction into consideration, and evaluates image similarity in 3D rather than 2D in order to cluster human-object interactions more coherently. Experimental results on a dataset of people interacting with musical instruments show the effectiveness of our approach.

3 0.60451603 203 iccv-2013-How Related Exemplars Help Complex Event Detection in Web Videos?

Author: Yi Yang, Zhigang Ma, Zhongwen Xu, Shuicheng Yan, Alexander G. Hauptmann

Abstract: Compared to visual concepts such as actions, scenes and objects, complex event is a higher level abstraction of longer video sequences. For example, a “marriage proposal” event is described by multiple objects (e.g., ring, faces), scenes (e.g., in a restaurant, outdoor) and actions (e.g., kneeling down). The positive exemplars which exactly convey the precise semantic of an event are hard to obtain. It would be beneficial to utilize the related exemplars for complex event detection. However, the semantic correlations between related exemplars and the target event vary substantially as relatedness assessment is subjective. Two related exemplars can be about completely different events, e.g., in the TRECVID MED dataset, both bicycle riding and equestrianism are labeled as related to “attempting a bike trick” event. To tackle the subjectiveness of human assessment, our algorithm automatically evaluates how positive the related exemplars are for the detection of an event and uses them on an exemplar-specific basis. Experiments demonstrate that our algorithm is able to utilize related exemplars adaptively, and the algorithm gains good performance for complex event detection.

4 0.60024899 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition

Author: Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu

Abstract: Recognizing the events and objects in the video sequence are two challenging tasks due to the complex temporal structures and the large appearance variations. In this paper, we propose a 4D human-object interaction model, where the two tasks jointly boost each other. Our human-object interaction is defined in 4D space: i) the cooccurrence and geometric constraints of human pose and object in 3D space; ii) the sub-events transition and objects coherence in 1D temporal dimension. We represent the structure of events, sub-events and objects in a hierarchical graph. For an input RGB-depth video, we design a dynamic programming beam search algorithm to: i) segment the video, ii) recognize the events, and iii) detect the objects simultaneously. For evaluation, we built a large-scale multiview 3D event dataset which contains 3815 video sequences and 383,036 RGBD frames captured by the Kinect cameras. The experiment results on this dataset show the effectiveness of our method.

5 0.59336954 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation

Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele

Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structure spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.

6 0.5830273 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency

7 0.58215618 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?

8 0.57719195 46 iccv-2013-Allocentric Pose Estimation

9 0.54497463 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation

10 0.5449115 267 iccv-2013-Model Recommendation with Virtual Probes for Egocentric Hand Detection

11 0.54386073 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling

12 0.52203882 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion

13 0.5186922 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets

14 0.51011187 291 iccv-2013-No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion

15 0.51010776 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos

16 0.47978184 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies

17 0.46827635 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

18 0.45928481 143 iccv-2013-Estimating Human Pose with Flowing Puppets

19 0.45308626 150 iccv-2013-Exemplar Cut

20 0.44496754 130 iccv-2013-Dynamic Structured Model Selection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.067), (7, 0.029), (12, 0.03), (26, 0.052), (27, 0.011), (31, 0.03), (35, 0.02), (40, 0.013), (42, 0.109), (64, 0.038), (73, 0.022), (76, 0.011), (78, 0.325), (89, 0.146)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79003042 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition

Author: Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji

Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. Our model, however, can successfully capture them to more accurately characterize the AU relationships. Efficient learning and inference algorithms of the proposed model are also developed. Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches.

same-paper 2 0.77567416 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling

Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang

Abstract: Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. A new framework consisting of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.

3 0.75496352 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations

Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang

Abstract: In multi-label image annotations, because each image is associated to multiple categories, the semantic terms (label classes) are not mutually exclusive. Previous research showed that such label correlations can largely boost the annotation accuracy. However, all existing methods only directly apply the label correlation matrix to enhance the label inference and assignment without further learning the structural information among classes. In this paper, we model the label correlations using the relational graph, and propose a novel graph structured sparse learning model to incorporate the topological constraints of relation graph in multi-label classifications. As a result, our new method will capture and utilize the hidden class structures in relational graph to improve the annotation results. In proposed objective, a large number of structured sparsity-inducing norms are utilized, thus the optimization becomes difficult. To solve this problem, we derive an efficient optimization algorithm with proved convergence. We perform extensive experiments on six multi-label image annotation benchmark data sets. In all empirical results, our new method shows better annotation results than the state-of-the-art approaches.

4 0.74689931 252 iccv-2013-Line Assisted Light Field Triangulation and Stereo Matching

Author: Zhan Yu, Xinqing Guo, Haibing Lin, Andrew Lumsdaine, Jingyi Yu

Abstract: Light fields are image-based representations that use densely sampled rays as a scene description. In this paper, we explore geometric structures of 3D lines in ray space for improving light field triangulation and stereo matching. The triangulation problem aims to fill in the ray space with continuous and non-overlapping simplices anchored at sampled points (rays). Such a triangulation provides a piecewise-linear interpolant useful for light field superresolution. We show that the light field space is largely bilinear due to 3D line segments in the scene, and direct triangulation of these bilinear subspaces leads to large errors. We instead present a simple but effective algorithm to first map bilinear subspaces to line constraints and then apply Constrained Delaunay Triangulation (CDT). Based on our analysis, we further develop a novel line-assisted graphcut (LAGC) algorithm that effectively encodes 3D line constraints into light field stereo matching. Experiments on synthetic and real data show that both our triangulation and LAGC algorithms outperform state-of-the-art solutions in accuracy and visual quality.

5 0.69692886 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding

Author: Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai

Abstract: We present a multi-attributed dictionary learning algorithm for sparse coding. Considering training samples with multiple attributes, a new distance matrix is proposed by jointly incorporating data and attribute similarities. Then, an objective function is presented to learn categorydependent dictionaries that are compact (closeness of dictionary atoms based on data distance and attribute similarity), reconstructive (low reconstruction error with correct dictionary) and label-consistent (encouraging the labels of dictionary atoms to be similar). We have demonstrated our algorithm on action classification and face recognition tasks on several publicly available datasets. Experimental results with improved performance over previous dictionary learning methods are shown to validate the effectiveness of the proposed algorithm.

6 0.68829584 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

7 0.63501585 155 iccv-2013-Facial Action Unit Event Detection by Cascade of Tasks

8 0.59276986 150 iccv-2013-Exemplar Cut

9 0.57918835 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition

10 0.5782665 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model

11 0.57346749 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects

12 0.57274318 149 iccv-2013-Exemplar-Based Graph Matching for Robust Facial Landmark Localization

13 0.56842363 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

14 0.56271493 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization

15 0.56054699 277 iccv-2013-Multi-channel Correlation Filters

16 0.56047976 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

17 0.5600369 362 iccv-2013-Robust Tucker Tensor Decomposition for Effective Image Representation

18 0.55880654 94 iccv-2013-Correntropy Induced L2 Graph for Robust Subspace Clustering

19 0.5568254 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification

20 0.55607188 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation