cvpr cvpr2013 cvpr2013-123 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yezhou Yang, Cornelia Fermüller, Yiannis Aloimonos
Abstract: The problem of action recognition and human activity has been an active research area in Computer Vision and Robotics. While full-body motions can be characterized by movement and change of posture, no characterization that holds invariance has yet been proposed for the description of manipulation actions. We propose that a fundamental concept in understanding such actions is the consequences of actions. There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions. In this paper a technique is developed to recognize these action consequences. At the heart of the technique lies a novel active tracking and segmentation method that monitors the changes in appearance and topological structure of the manipulated object. These are then used in a visual semantic graph (VSG) based procedure applied to the time sequence of the monitored object to recognize the action consequence. We provide a new dataset, called Manipulation Action Consequences (MAC 1.0), which can serve as a testbed for other studies on this topic. Several experiments on this dataset demonstrate that our method can robustly track objects and detect their deformations and division during the manipulation. Quantitative tests prove the effectiveness and efficiency of the method.
Reference: text
sentIndex sentText sentNum sentScore
1 yiannis@cs.umd.edu Computer Vision Lab, University of Maryland, College Park, MD 20742, USA. Abstract: The problem of action recognition and human activity has been an active research area in Computer Vision and Robotics. [sent-7, score-0.626]
2 While full-body motions can be characterized by movement and change of posture, no characterization that holds invariance has yet been proposed for the description of manipulation actions. [sent-8, score-0.44]
3 We propose that a fundamental concept in understanding such actions is the consequences of actions. [sent-9, score-0.523]
4 There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions. [sent-10, score-1.142]
5 In this paper a technique is developed to recognize these action consequences. [sent-11, score-0.462]
6 At the heart of the technique lies a novel active tracking and segmentation method that monitors the changes in appearance and topological structure of the manipulated object. [sent-12, score-0.533]
7 These are then used in a visual semantic graph (VSG) based procedure applied to the time sequence of the monitored object to recognize the action consequence. [sent-13, score-0.575]
8 We provide a new dataset, called Manipulation Action Consequences (MAC 1.0), which can serve as a testbed for other studies on this topic. [sent-15, score-0.06]
9 Several experiments on this dataset demonstrate that our method can robustly track objects and detect their deformations and division during the manipulation. [sent-16, score-0.08]
10 Introduction. Visual recognition is the process through which intelligent agents associate a visual observation with a concept from their memory. [sent-19, score-0.158]
11 In most cases, the concept corresponds either to a term in natural language or to an explicit definition in natural language. [sent-20, score-0.049]
12 Most research in Computer Vision has focused on two concepts: objects and actions; humans, faces and scenes can be regarded as special cases of objects. [sent-21, score-0.042]
13 Object and action recognition are indeed crucial since they are the fundamental building blocks for an intelligent agent to semantically understand its observations. [sent-22, score-0.672]
14 When it comes to understanding actions of manipulation, the movement of the body (especially the hands) is not a very good characteristic feature. [sent-23, score-0.422]
15 There is great variability in the way humans carry out such actions. [sent-24, score-0.07]
16 It has been realized that such actions are better described using a number of quantities. [sent-25, score-0.254]
17 Besides the motion trajectories, the objects involved, the hand pose, and the spatial relations between the body and the objects under influence provide information about the action. [sent-26, score-0.084]
18 In this work we want to bring attention to another concept, the action consequence. [sent-27, score-0.489]
19 It describes the transformation of the object during the manipulation. [sent-28, score-0.035]
20 For example, during a CUT or a SPLIT action an object is divided into segments; during a GLUE or a MERGE action two objects are combined into one; and so on. [sent-29, score-0.869]
21 The recognition and understanding of human manipulation actions has recently attracted the attention of Computer Vision and Robotics researchers because of their critical role in human behavior analysis. [sent-30, score-0.749]
22 Moreover, they naturally relate to both the movement involved in the action and the objects. [sent-31, score-0.566]
23 However, so far researchers have not considered that the most crucial cue in describing manipulation actions is actually not the movement nor the specific object under influence, but the object-centric action consequence. [sent-32, score-1.314]
24 We can come up with examples where two actions involve the same tool and same object under influence, and the motions of the hands are similar, for example in “cutting a piece of meat” vs. [sent-33, score-0.355]
25 In such cases, the action consequence is the key to differentiating the actions. [sent-36, score-0.491]
26 Thus, to fully understand manipulation actions, the intelligent system should be able to determine the object-centric consequences. [sent-37, score-0.565]
27 Few researchers have addressed the problem of action consequences due to the difficulties involved. [sent-38, score-0.886]
28 The main challenge comes from the monitoring process, which calls for the ability to continuously check the topological and appearance changes of the object-under-manipulation. [sent-39, score-0.307]
29 In this paper, for the first time, a system is implemented to overcome these difficulties and eventually achieve robust action consequence detection. [sent-41, score-0.609]
30 Why Consequences and Fundamental Types. Recognizing human actions has been an active research area in Computer Vision [10]. [sent-43, score-0.403]
31 Several excellent surveys on the topic of visual recognition are available ([21], [29]). [sent-44, score-0.039]
32 Most work on visual action analysis has been devoted to the study of movement and change of posture, such as walking, running etc. [sent-45, score-0.565]
33 The dominant approaches to the recognition of single actions compute as descriptors statistics of spatio-temporal interest points ([16], [31]) and flow in video volumes, or represent short actions by stacks of silhouettes ([4], [34]). [sent-46, score-0.508]
34 Approaches to more complex, longer actions employ parametric approaches, such as Hidden Markov Models [13], Linear Dynamical Systems [26] or Non-linear Dynamical Systems [7], which are defined on extracted features. [sent-47, score-0.254]
35 There are a few recent studies on human manipulation actions ([30], [14], [27]), but they do not consider action consequences for the interpretation of manipulation actions. [sent-48, score-1.721]
36 Works like [33] emphasize the role of object perception in action or pose recognition, but they focus on object labels, not object-centric consequences. [sent-49, score-0.466]
37 How do humans understand, recognize, and even replicate manipulation actions? [sent-50, score-0.343]
38 Researchers have pointed out the importance of manipulation action consequences for both understanding human cognition and intelligent system research. [sent-52, score-1.245]
39 When we perform an action, we always have a goal in mind, and the goal affects the action. [sent-54, score-0.082]
40 Similarly, when we try to recognize an action, we also keep a goal in mind. [sent-55, score-0.107]
41 The close relation between the movement during the action and the goal is also reflected in language. [sent-56, score-0.567]
42 For example, the word “CUT” denotes both the action in which hands move up and down or in and out with sharp-bladed tools, and the consequence of the action, namely that the object is separated. [sent-57, score-0.592]
43 Very often, we can recognize an action purely by the goal satisfaction, and even neglect the motion or the tools used. [sent-58, score-0.569]
44 For example, we may observe a human carry out an “up and down” movement with a knife, but if the object remains one whole, we will not draw the conclusion that a “CUT” action has been performed. [sent-59, score-0.598]
45 Only when the goal of the recognition process, here “DIVIDE”, is detected is goal satisfaction reached and a “CUT” action confirmed. [sent-60, score-0.543]
46 An intelligent system should have the ability to detect the consequences of manipulation actions, in order to check the goal of actions. [sent-61, score-0.852]
47 In addition, experiments conducted in neuroscience [25] show that a monkey’s mirror neuron system fires when a hand/object interaction is observed, and it will not fire when a similar movement is observed without hand/object interaction. [sent-62, score-0.278]
48 Recent experiments [9] further showed that the mirror neuron regions responding to the sight of actions responded more during the observation of goal-directed actions than similar movements not directed at goals. [sent-63, score-0.653]
49 This evidence supports the idea of goal matching, as well as the crucial role of action consequences in the understanding of manipulation actions. [sent-64, score-0.915]
50 Taking an object-centric point of view, manipulation actions can be classified into six categories according to how the object is transformed during the manipulation, or in other words what consequence the action has on the object. [sent-65, score-1.127]
51 These categories are: DIVIDE, ASSEMBLE, CREATE, CONSUME, TRANSFER, and DEFORM. [sent-66, score-0.037]
52 To describe these action categories we need a formalism. [sent-68, score-0.037]
53 We use the visual semantic graph (VSG), inspired by the work of Aksoy et al. [sent-70, score-0.078]
54 This formalism takes as input computed object segments, their spatial relationship, and temporal relationship over consecutive frames. [sent-72, score-0.035]
55 To provide the symbols for the VSG, an active monitoring process (discussed in Sec. 4) is required for the purpose of (1) tracking the object to obtain temporal correspondence, and (2) segmenting the object to obtain its topological structure and appearance model. [sent-73, score-0.176] [sent-74, score-0.313]
57 This active monitoring (consisting of segmentation and tracking) is related to studies on active segmentation [20] and stochastic tracking ([11], etc.). [sent-75, score-0.61]
58 Visual Semantic Graph (VSG). To define object-centric action consequences, a graph representation is used. [sent-78, score-0.396]
59 The vertex set V represents the set of semantically meaningful segments; the edge set E represents the spatial relations between any of the segments. [sent-80, score-0.072]
60 Two segments are connected when they share parts of their borders, or when one of the segments is contained in the other. [sent-81, score-0.122]
61 In addition, every node v ∈ V is associated with a set of properties P(v) that describes the attributes of the segment. [sent-83, score-0.035]
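As an illustration of this representation, here is a minimal sketch of a VSG data structure in Python; the class and field names are hypothetical, since the paper does not prescribe an implementation:

    from dataclasses import dataclass, field

    @dataclass
    class Segment:
        """A semantically meaningful segment (a node v), with its properties P(v)."""
        location: tuple        # P_L: e.g. the segment centroid (x, y)
        appearance: list       # P_S: e.g. a color histogram describing the segment

    @dataclass
    class VSG:
        """Visual semantic graph: nodes are segments, edges are spatial relations."""
        nodes: dict = field(default_factory=dict)  # segment id -> Segment
        edges: set = field(default_factory=set)    # pairs of connected segment ids

        def connect(self, sid1, sid2):
            # Two segments are connected when they share border parts
            # or when one is contained in the other.
            self.edges.add(frozenset((sid1, sid2)))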
62 We need to compute the changes of the object over time. [sent-86, score-0.106]
63 At any time instance t, we consider two consecutive VSGs: the VSG at time t − 1, denoted as Ga(Va, Ea, Pa), and the VSG at time t, denoted as Gz(Vz, Ez, Pz). [sent-88, score-0.037]
64 We then define the following four consequences, where → is used to denote the temporal correspondence between two vertices. [sent-89, score-0.037]
65 While the first four consequences are defined purely on the basis of topological changes (Conditions 1-4), there are no such changes for TRANSFER and DEFORM. [sent-94, score-0.197]
66 Therefore, we have to define them through changes in property. [sent-95, score-0.071]
67 In the following definitions, PL represents properties of location, and PS represents properties of appearance (shape, color, etc.). [sent-96, score-0.04]
68 • TRANSFER (Condition 5): {∃v1 ∈ Va, v2 ∈ Vz | PaL(v1) ≠ PzL(v2)}; • DEFORM (Condition 6): {∃v1 ∈ Va, v2 ∈ Vz | PaS(v1) ≠ PzS(v2)}. Figure 1: Graphical illustration of the changes for Conditions (1)-(6). [sent-98, score-0.071]
69 A new active segmentation and tracking method is introduced to 1) find correspondences (→) between Va and Vz; 2) monitor the location property PL and the appearance property PS in the VSG. [sent-103, score-0.369]
70 The procedure for computing action consequences first decides whether there is a topological change between Ga and Gz. [sent-104, score-0.489]
71 If yes, the system checks whether Conditions (1) to (4) are fulfilled and returns the corresponding consequence. [sent-105, score-0.096]
72 If no, the system then checks whether Condition (5) or Condition (6) is fulfilled. [sent-106, score-0.096]
73 If neither of them is met, no consequence is detected. [sent-107, score-0.095]
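A rough sketch of this decision procedure follows, assuming the VSG class above and assuming the temporal correspondence → is given as a mapping from node ids in Ga to lists of corresponding node ids in Gz; the thresholds and distance functions are hypothetical choices, not the paper's:

    import math

    def euclidean(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def hist_distance(h1, h2):
        # Bhattacharyya-style distance between normalized histograms (an assumption).
        bc = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
        return math.sqrt(max(0.0, 1.0 - bc))

    def detect_consequence(Ga, Gz, corr, loc_tol=10.0, app_tol=0.3):
        """Return the consequence label for the change from Ga (time t-1) to Gz (time t)."""
        # Topological changes are checked first (Conditions 1-4).
        if any(len(ts) > 1 for ts in corr.values()):
            return "DIVIDE"                     # one segment corresponds to several
        targets = [t for ts in corr.values() for t in ts]
        if len(targets) != len(set(targets)):
            return "ASSEMBLE"                   # several segments correspond to one
        if any(len(ts) == 0 for ts in corr.values()):
            return "CONSUME"                    # a segment disappears
        if set(Gz.nodes) - set(targets):
            return "CREATE"                     # a segment appears with no predecessor
        # No topological change: check property changes (Conditions 5-6).
        for v, ts in corr.items():
            t = ts[0]
            if euclidean(Ga.nodes[v].location, Gz.nodes[t].location) > loc_tol:
                return "TRANSFER"               # location property P_L changed
            if hist_distance(Ga.nodes[v].appearance, Gz.nodes[t].appearance) > app_tol:
                return "DEFORM"                 # appearance property P_S changed
        return None                             # no consequence detected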
74 Active Segmentation and Tracking. Previously, researchers have treated segmentation and tracking as two different problems. [sent-109, score-0.242]
75 Here we propose a new method combining the two tasks to obtain the information necessary to monitor the objects under influence. [sent-110, score-0.042]
76 Our method combines stochastic tracking [11] with fixation-based active segmentation [20]. [sent-111, score-0.464]
77 The tracking module provides a number of tracked points. [sent-112, score-0.158]
78 The locations of these points are used to define an area of interest and a fixation point for the segmentation, and the colors in their immediate surroundings are used in the data term of the segmentation module. [sent-113, score-0.343]
79 The segmentation module segments the object, and based on the segmentation, updates the appearance model for the tracker. [sent-114, score-0.227]
80 Figure 2: Flow chart of the proposed active segmentation and tracking method for object monitoring. [sent-122, score-0.331]
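The loop in Figure 2 might be organized roughly as below; the callables passed in stand for the stochastic tracker of [11], the fixation-based segmentation of [20], the color-model update, and VSG construction, and are placeholders rather than the authors' implementation:

    def monitor_object(frames, init_samples, track, segment, build_model, build_vsg):
        """Alternate tracking and segmentation so that each module feeds the other."""
        samples = init_samples          # weighted point samples on the tracked object
        appearance_model = None
        vsgs = []
        for frame in frames:
            # 1. Stochastic tracking propagates the weighted samples to the new frame.
            samples = track(samples, frame, appearance_model)
            # 2. The tracked points define the area of interest and fixation point,
            #    and their surrounding colors feed the segmentation data term.
            mask = segment(frame, samples)
            # 3. The segmentation updates the appearance model used by the tracker.
            appearance_model = build_model(frame, mask)
            # 4. The segmented object yields the VSG for this frame.
            vsgs.append(build_vsg(mask, frame))
        return vsgs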
81 The proposed method meets two challenging requirements necessary to detect action consequences: 1) the system is able to track and segment objects when the shape or color (appearance) changes; 2) the system is also able to track and segment objects when they are divided into pieces. [sent-123, score-0.692]
82 Experiments show that our method can handle these requirements, while systems that implement tracking and segmentation independently cannot. [sent-126, score-0.188]
83 The Attention Field. The idea underlying our approach is that first a process of visual attention selects an area of interest. [sent-129, score-0.173]
84 Segmentation is then considered the process that separates the area selected by visual attention from the background by finding closed contours that best separate the regions. [sent-130, score-0.173]
85 The minimization uses a color model for the data term and edges in the regularization term. [sent-131, score-0.056]
86 To achieve a minimization that is very robust to the length of the boundary, edges are weighted with their distance from the fixation center. [sent-132, score-0.168]
87 Visual attention, the process of driving an agent’s attention to a certain area, is based on both bottom-up processes defined on low-level visual features and top-down processes influenced by the agent’s previous experience [28]. [sent-133, score-0.132]
88 Following [32], instead of using a single fixation point as in the active segmentation of [20], here we use a weighted sample set S = {(s(n), π(n)) | n = 1...N} to represent the attention field around the fixation point (N = 500 in practice). [sent-135, score-0.354] [sent-138, score-0.261]
90 Each sample consists of an element s and a discrete weight π. [sent-139, score-0.035]
91 An appearance model can be used to represent the local visual information. [sent-142, score-0.039]
92 We choose to use a color histogram with a dynamic sampling area defined by an ellipse. [sent-144, score-0.097]
93 To compute the color distribution, every point is represented by an ellipse, s = {x, y, ẋ, ẏ, Hx, Hy, Ḣx, Ḣy}, where x and y denote the location, ẋ and ẏ the motion, Hx and Hy the lengths of the half axes, and Ḣx, Ḣy the changes in the axes. [sent-145, score-0.162]
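A minimal sketch of this sample state and of spreading the N = 500 weighted samples around a fixation point; the spread parameter and initial half-axis lengths are illustrative assumptions:

    import random
    from dataclasses import dataclass

    @dataclass
    class EllipseSample:
        x: float                 # location
        y: float
        dx: float = 0.0          # motion
        dy: float = 0.0
        Hx: float = 20.0         # half-axis lengths of the sampling ellipse
        Hy: float = 20.0
        dHx: float = 0.0         # changes of the axes
        dHy: float = 0.0
        weight: float = 1.0      # discrete weight pi^(n)

    def init_attention_field(fx, fy, N=500, spread=15.0):
        """Spread N weighted samples around the fixation point (fx, fy)."""
        samples = [EllipseSample(x=fx + random.gauss(0.0, spread),
                                 y=fy + random.gauss(0.0, spread))
                   for _ in range(N)]
        for s in samples:
            s.weight = 1.0 / N   # uniform initial weights
        return samples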
94 Color Distribution Model. To make the color model invariant to various textures or patterns, a color distribution model is used. [sent-148, score-0.112]
95 A function h(xi) is defined to create a color histogram, which assigns one of the m bins to a given color at location xi. [sent-149, score-0.153]
96 To make the algorithm less sensitive to lighting conditions, the HSV color space is used with less sensitivity in the V channel (8 × 8 × 4 bins). [sent-150, score-0.056]
97 The color distribution for each fixation point s(n) is computed as p_u^(n) = f Σ_i k(‖y − x_i‖) δ[h(x_i) − u], where f is a normalization factor, u indexes the m bins, and the sum runs over the pixels x_i in the sampling ellipse centered at y. [sent-151, score-0.224]
98 The kernel k(‖y − x‖) captures the intuition that not all pixels in the sampling region are equally important for describing the color model. [sent-163, score-0.056]
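A sketch of such a kernel-weighted HSV histogram using OpenCV and NumPy; the quadratic kernel and the exact ellipse handling are assumptions, while the 8 × 8 × 4 binning follows the text above:

    import numpy as np
    import cv2

    def color_distribution(img_bgr, cx, cy, Hx, Hy, bins=(8, 8, 4)):
        """Kernel-weighted HSV histogram inside the ellipse centered at (cx, cy)."""
        hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
        h, w = hsv.shape[:2]
        hist = np.zeros(bins, dtype=np.float64)
        for y in range(max(0, int(cy - Hy)), min(h, int(cy + Hy) + 1)):
            for x in range(max(0, int(cx - Hx)), min(w, int(cx + Hx) + 1)):
                # Normalized squared distance from the fixation point.
                r2 = ((x - cx) / Hx) ** 2 + ((y - cy) / Hy) ** 2
                if r2 > 1.0:
                    continue
                k = 1.0 - r2                       # pixels near the center count more
                hb, sb, vb = hsv[y, x]
                idx = (int(hb) * bins[0] // 180,   # OpenCV hue range is [0, 180)
                       int(sb) * bins[1] // 256,
                       int(vb) * bins[2] // 256)
                hist[idx] += k
        total = hist.sum()
        return hist / total if total > 0 else hist  # normalize to a distribution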
wordName wordTfidf (topN-words)
[('action', 0.396), ('consequences', 0.391), ('manipulation', 0.31), ('actions', 0.254), ('vsg', 0.253), ('fixation', 0.168), ('va', 0.155), ('movement', 0.13), ('vz', 0.12), ('tracking', 0.11), ('active', 0.108), ('assemble', 0.098), ('consequence', 0.095), ('topological', 0.093), ('attention', 0.093), ('endition', 0.084), ('meat', 0.084), ('agent', 0.081), ('umd', 0.081), ('segmentation', 0.078), ('changes', 0.071), ('intelligent', 0.07), ('monitoring', 0.068), ('recognize', 0.066), ('hands', 0.066), ('centric', 0.065), ('mac', 0.065), ('consume', 0.065), ('satisfaction', 0.065), ('condition', 0.064), ('neuron', 0.062), ('hy', 0.062), ('segments', 0.061), ('posture', 0.06), ('studies', 0.06), ('color', 0.056), ('checks', 0.056), ('researchers', 0.054), ('deform', 0.053), ('cut', 0.051), ('divide', 0.05), ('concept', 0.049), ('module', 0.048), ('mirror', 0.046), ('fundamental', 0.045), ('difficulties', 0.045), ('understand', 0.045), ('ga', 0.045), ('objects', 0.042), ('goal', 0.041), ('dynamical', 0.041), ('create', 0.041), ('area', 0.041), ('system', 0.04), ('involved', 0.04), ('appearance', 0.04), ('semantic', 0.039), ('visual', 0.039), ('understanding', 0.038), ('transfer', 0.038), ('track', 0.038), ('pas', 0.037), ('aksoy', 0.037), ('amte', 0.037), ('disappears', 0.037), ('ferm', 0.037), ('manners', 0.037), ('responded', 0.037), ('sseetn', 0.037), ('weheenre', 0.037), ('categories', 0.037), ('language', 0.037), ('carry', 0.037), ('object', 0.035), ('merge', 0.035), ('vri', 0.035), ('zv', 0.035), ('pal', 0.035), ('innt', 0.035), ('nooft', 0.035), ('aloimonos', 0.035), ('bes', 0.035), ('calls', 0.035), ('cvo', 0.035), ('knife', 0.035), ('monkey', 0.035), ('mthaen', 0.035), ('oefre', 0.035), ('twt', 0.035), ('vde', 0.035), ('crucial', 0.035), ('ps', 0.034), ('humans', 0.033), ('purely', 0.033), ('tools', 0.033), ('ivn', 0.033), ('won', 0.033), ('mdo', 0.033), ('monitors', 0.033), ('conquer', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 123 cvpr-2013-Detection of Manipulation Action Consequences (MAC)
Author: Yezhou Yang, Cornelia Fermüller, Yiannis Aloimonos
Abstract: The problem of action recognition and human activity has been an active research area in Computer Vision and Robotics. While full-body motions can be characterized by movement and change of posture, no characterization that holds invariance has yet been proposed for the description of manipulation actions. We propose that a fundamental concept in understanding such actions is the consequences of actions. There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions. In this paper a technique is developed to recognize these action consequences. At the heart of the technique lies a novel active tracking and segmentation method that monitors the changes in appearance and topological structure of the manipulated object. These are then used in a visual semantic graph (VSG) based procedure applied to the time sequence of the monitored object to recognize the action consequence. We provide a new dataset, called Manipulation Action Consequences (MAC 1.0), which can serve as a testbed for other studies on this topic. Several experiments on this dataset demonstrate that our method can robustly track objects and detect their deformations and division during the manipulation. Quantitative tests prove the effectiveness and efficiency of the method.
2 0.34793872 287 cvpr-2013-Modeling Actions through State Changes
Author: Alireza Fathi, James M. Rehg
Abstract: In this paper we present a model of action based on the change in the state of the environment. Many actions involve similar dynamics and hand-object relationships, but differ in their purpose and meaning. The key to differentiating these actions is the ability to identify how they change the state of objects and materials in the environment. We propose a weakly supervised method for learning the object and material states that are necessary for recognizing daily actions. Once these state detectors are learned, we can apply them to input videos and pool their outputs to detect actions. We further demonstrate that our method can be used to segment discrete actions from a continuous video of an activity. Our results outperform state-of-the-art action recognition and activity segmentation results.
Author: Tsz-Ho Yu, Tae-Kyun Kim, Roberto Cipolla
Abstract: This work addresses the challenging problem of unconstrained 3D human pose estimation (HPE) from a novel perspective. Existing approaches struggle to operate in realistic applications, mainly due to their scene-dependent priors, such as background segmentation and multi-camera network, which restrict their use in unconstrained environments. We therefore present a framework which applies action detection and 2D pose estimation techniques to infer 3D poses in an unconstrained video. Action detection offers spatiotemporal priors to 3D human pose estimation by both recognising and localising actions in space-time. Instead of holistic features, e.g. silhouettes, we leverage the flexibility of deformable part model to detect 2D body parts as a feature to estimate 3D poses. A new unconstrained pose dataset has been collected to justify the feasibility of our method, which demonstrated promising results, significantly outperforming the relevant state-of-the-arts.
4 0.25304088 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
Author: Arpit Jain, Abhinav Gupta, Mikel Rodriguez, Larry S. Davis
Abstract: How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatiotemporal patch in the video. What defines these spatiotemporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification where they demonstrate state-of-the-art performance on UCF50 and Olympics datasets.
5 0.25272188 40 cvpr-2013-An Approach to Pose-Based Action Recognition
Author: Chunyu Wang, Yizhou Wang, Alan L. Yuille
Abstract: We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state of the art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimations output by the existing method and incorporate additional segmentation cues and temporal constraints to select the “best” one. Then we group the estimated joints into five body parts (e.g. the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements (by temporal-part-sets) which are characteristic of human actions. It is interpretable, compact, and also robust to errors on joint estimations. Experimental results first show that our approach is able to localize body joints more accurately than existing methods. Next we show that it outperforms state of the art action recognizers on the UCF sport, the Keck Gesture and the MSR-Action3D datasets.
6 0.22450329 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
7 0.21742781 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes
8 0.21475494 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition
9 0.21203615 336 cvpr-2013-Poselet Key-Framing: A Model for Human Activity Recognition
10 0.17571141 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots
11 0.17021464 302 cvpr-2013-Multi-task Sparse Learning with Beta Process Prior for Action Recognition
12 0.1641493 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
13 0.16244099 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
14 0.1547467 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
15 0.13332312 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
16 0.12197812 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities
17 0.12000628 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos
18 0.11970814 196 cvpr-2013-HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences
19 0.11033087 407 cvpr-2013-Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera
20 0.10955994 440 cvpr-2013-Tracking People and Their Objects
topicId topicWeight
[(0, 0.208), (1, -0.066), (2, 0.013), (3, -0.205), (4, -0.291), (5, -0.047), (6, -0.052), (7, 0.033), (8, -0.071), (9, -0.017), (10, 0.023), (11, -0.028), (12, -0.082), (13, 0.064), (14, -0.085), (15, -0.014), (16, 0.027), (17, -0.028), (18, 0.107), (19, 0.219), (20, 0.049), (21, -0.014), (22, -0.004), (23, 0.115), (24, 0.048), (25, -0.047), (26, 0.067), (27, -0.006), (28, 0.002), (29, -0.02), (30, -0.018), (31, -0.019), (32, -0.029), (33, 0.066), (34, -0.009), (35, -0.012), (36, 0.024), (37, 0.1), (38, -0.068), (39, -0.004), (40, -0.02), (41, 0.048), (42, -0.033), (43, 0.052), (44, 0.019), (45, 0.024), (46, -0.03), (47, -0.075), (48, -0.009), (49, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.95532918 123 cvpr-2013-Detection of Manipulation Action Consequences (MAC)
Author: Yezhou Yang, Cornelia Fermüller, Yiannis Aloimonos
Abstract: The problem of action recognition and human activity has been an active research area in Computer Vision and Robotics. While full-body motions can be characterized by movement and change of posture, no characterization that holds invariance has yet been proposed for the description of manipulation actions. We propose that a fundamental concept in understanding such actions is the consequences of actions. There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions. In this paper a technique is developed to recognize these action consequences. At the heart of the technique lies a novel active tracking and segmentation method that monitors the changes in appearance and topological structure of the manipulated object. These are then used in a visual semantic graph (VSG) based procedure applied to the time sequence of the monitored object to recognize the action consequence. We provide a new dataset, called Manipulation Action Consequences (MAC 1.0), which can serve as a testbed for other studies on this topic. Several experiments on this dataset demonstrate that our method can robustly track objects and detect their deformations and division during the manipulation. Quantitative tests prove the effectiveness and efficiency of the method.
2 0.89038402 287 cvpr-2013-Modeling Actions through State Changes
Author: Alireza Fathi, James M. Rehg
Abstract: In this paper we present a model of action based on the change in the state of the environment. Many actions involve similar dynamics and hand-object relationships, but differ in their purpose and meaning. The key to differentiating these actions is the ability to identify how they change the state of objects and materials in the environment. We propose a weakly supervised method for learning the object and material states that are necessary for recognizing daily actions. Once these state detectors are learned, we can apply them to input videos and pool their outputs to detect actions. We further demonstrate that our method can be used to segment discrete actions from a continuous video of an activity. Our results outperform state-of-the-art action recognition and activity segmentation results.
3 0.81359476 336 cvpr-2013-Poselet Key-Framing: A Model for Human Activity Recognition
Author: Michalis Raptis, Leonid Sigal
Abstract: In this paper, we develop a new model for recognizing human actions. An action is modeled as a very sparse sequence of temporally local discriminative keyframes: collections of partial key-poses of the actor(s), depicting key states in the action sequence. We cast the learning of keyframes in a max-margin discriminative framework, where we treat keyframes as latent variables. This allows us to (jointly) learn a set of most discriminative keyframes while also learning the local temporal context between them. Keyframes are encoded using a spatially-localizable poselet-like representation with HoG and BoW components learned from weak annotations; we rely on a structured SVM formulation to align our components and mine for hard negatives to boost localization performance. This results in a model that supports spatio-temporal localization and is insensitive to dropped frames or partial observations. We show classification performance that is competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an on-line streaming setting.
4 0.79187739 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
Author: Yicong Tian, Rahul Sukthankar, Mubarak Shah
Abstract: Deformable part models have achieved impressive performance for object detection, even on difficult image datasets. This paper explores the generalization of deformable part models from 2D images to 3D spatiotemporal volumes to better study their effectiveness for action detection in video. Actions are treated as spatiotemporal patterns and a deformable part model is generated for each action from a collection of examples. For each action model, the most discriminative 3D subvolumes are automatically selected as parts and the spatiotemporal relations between their locations are learned. By focusing on the most distinctive parts of each action, our models adapt to intra-class variation and show robustness to clutter. Extensive experiments on several video datasets demonstrate the strength of spatiotemporal DPMs for classifying and localizing actions.
5 0.68914074 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
Author: Zhong Zhang, Chunheng Wang, Baihua Xiao, Wen Zhou, Shuang Liu, Cunzhao Shi
Abstract: In this paper, we propose a novel method for cross-view action recognition via a continuous virtual path which connects the source view and the target view. Each point on this virtual path is a virtual view which is obtained by a linear transformation of the action descriptor. All the virtual views are concatenated into an infinite-dimensional feature to characterize continuous changes from the source to the target view. However, these infinite-dimensional features cannot be used directly. Thus, we propose a virtual view kernel to compute the value of similarity between two infinite-dimensional features, which can be readily used to construct any kernelized classifiers. In addition, there are a lot of unlabeled samples from the target view, which can be utilized to improve the performance of classifiers. Thus, we present a constraint strategy to explore the information contained in the unlabeled samples. The rationality behind the constraint is that any action video belongs to only one class. Our method is verified on the IXMAS dataset, and the experimental results demonstrate that our method achieves better performance than the state-of-the-art methods.
6 0.67727435 291 cvpr-2013-Motionlets: Mid-level 3D Parts for Human Motion Recognition
7 0.64953566 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition
8 0.64373463 40 cvpr-2013-An Approach to Pose-Based Action Recognition
9 0.6394186 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
10 0.61465412 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes
11 0.60979903 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
12 0.59888017 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition
13 0.59176481 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest
14 0.58637714 302 cvpr-2013-Multi-task Sparse Learning with Beta Process Prior for Action Recognition
15 0.54435849 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots
16 0.48255858 32 cvpr-2013-Action Recognition by Hierarchical Sequence Summarization
17 0.45700997 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
18 0.45410022 440 cvpr-2013-Tracking People and Their Objects
19 0.44458058 196 cvpr-2013-HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences
20 0.43236789 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
topicId topicWeight
[(10, 0.125), (16, 0.04), (22, 0.225), (26, 0.034), (33, 0.291), (67, 0.095), (69, 0.048), (87, 0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.88639259 123 cvpr-2013-Detection of Manipulation Action Consequences (MAC)
Author: Yezhou Yang, Cornelia Fermüller, Yiannis Aloimonos
Abstract: The problem of action recognition and human activity has been an active research area in Computer Vision and Robotics. While full-body motions can be characterized by movement and change of posture, no characterization that holds invariance has yet been proposed for the description of manipulation actions. We propose that a fundamental concept in understanding such actions is the consequences of actions. There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions. In this paper a technique is developed to recognize these action consequences. At the heart of the technique lies a novel active tracking and segmentation method that monitors the changes in appearance and topological structure of the manipulated object. These are then used in a visual semantic graph (VSG) based procedure applied to the time sequence of the monitored object to recognize the action consequence. We provide a new dataset, called Manipulation Action Consequences (MAC 1.0), which can serve as a testbed for other studies on this topic. Several experiments on this dataset demonstrate that our method can robustly track objects and detect their deformations and division during the manipulation. Quantitative tests prove the effectiveness and efficiency of the method.
2 0.87423414 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
Author: Yang Liu, Jing Liu, Zechao Li, Jinhui Tang, Hanqing Lu
Abstract: In this paper, we propose a novel Weakly-Supervised Dual Clustering (WSDC) approach for image semantic segmentation with image-level labels, i.e., collaboratively performing image segmentation and tag alignment with those regions. The proposed approach is motivated from the observation that superpixels belonging to an object class usually exist across multiple images and hence can be gathered via the idea of clustering. In WSDC, spectral clustering is adopted to cluster the superpixels obtained from a set of over-segmented images. At the same time, a linear transformation between features and labels as a kind of discriminative clustering is learned to select the discriminative features among different classes. Both clustering outputs should be consistent as much as possible. Besides, weakly-supervised constraints from image-level labels are imposed to restrict the labeling of superpixels. Finally, the non-convex and non-smooth objective function is efficiently optimized using an iterative CCCP procedure. Extensive experiments conducted on MSRC and LabelMe datasets demonstrate the encouraging performance of our method in comparison with some state-of-the-arts.
3 0.86153579 143 cvpr-2013-Efficient Large-Scale Structured Learning
Author: Steve Branson, Oscar Beijbom, Serge Belongie
Abstract: unkown-abstract
4 0.83526552 188 cvpr-2013-Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields
Author: Sven Wanner, Christoph Straehle, Bastian Goldluecke
Abstract: We present the first variational framework for multi-label segmentation on the ray space of 4D light fields. For traditional segmentation of single images, features need to be extracted from the 2D projection of a three-dimensional scene. The associated loss of geometry information can cause severe problems, for example if different objects have a very similar visual appearance. In this work, we show that using a light field instead of an image not only enables to train classifiers which can overcome many of these problems, but also provides an optimal data structure for label optimization by implicitly providing scene geometry information. It is thus possible to consistently optimize label assignment over all views simultaneously. As a further contribution, we make all light fields available online with complete depth and segmentation ground truth data where available, and thus establish the first benchmark data set for light field analysis to facilitate competitive further development of algorithms.
5 0.83137733 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
Author: Feng Lu, Yasuyuki Matsushita, Imari Sato, Takahiro Okabe, Yoichi Sato
Abstract: We propose an uncalibrated photometric stereo method that works with general and unknown isotropic reflectances. Our method uses a pixel intensity profile, which is a sequence of radiance intensities recorded at a pixel across multi-illuminance images. We show that for general isotropic materials, the geodesic distance between intensity profiles is linearly related to the angular difference of their surface normals, and that the intensity distribution of an intensity profile conveys information about the reflectance properties, when the intensity profile is obtained under uniformly distributed directional lightings. Based on these observations, we show that surface normals can be estimated up to a convex/concave ambiguity. A solution method based on matrix decomposition with missing data is developed for a reliable estimation. Quantitative and qualitative evaluations of our method are performed using both synthetic and real-world scenes.
6 0.82651919 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
7 0.82647008 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
8 0.82536358 325 cvpr-2013-Part Discovery from Partial Correspondence
10 0.8252722 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
11 0.8250891 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
12 0.82505852 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
13 0.82501739 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
14 0.82500035 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
15 0.8245672 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
16 0.82401365 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
17 0.82365209 387 cvpr-2013-Semi-supervised Domain Adaptation with Instance Constraints
18 0.82356977 414 cvpr-2013-Structure Preserving Object Tracking
19 0.82258886 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
20 0.82257414 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video