iccv iccv2013 iccv2013-118 iccv2013-118-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bangpeng Yao, Jiayuan Ma, Li Fei-Fei
Abstract: Object functionality refers to the quality of an object that allows humans to perform some specific actions. It has been shown in psychology that functionality (affordance) is at least as essential as appearance in object recognition by humans. In computer vision, most previous work on functionality either assumes exactly one functionality for each object, or requires detailed annotation of human poses and objects. In this paper, we propose a weakly supervised approach to discover all possible object functionalities. Each object functionality is represented by a specific type of human-object interaction. Our method takes any possible human-object interaction into consideration, and evaluates image similarity in 3D rather than 2D in order to cluster human-object interactions more coherently. Experimental results on a dataset of people interacting with musical instruments show the effectiveness of our approach.
[1] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In CVPR, 2009. 2
[2] L. Carlson-Radvansky, E. Covey, and K. Lattanzi. What effects on Where: Functional influence on spatial relations. Psychol. Sci., 10(6):519–521, 1999. 1
[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 1, 2, 5
[4] V. Delaitre, D. F. Fouhey, I. Laptev, J. Sivic, A. Gupta, and A. A. Efros. Scene semantics from long-term observation of people. In ECCV, 2012. 1, 2
[5] V. Delaitre, J. Sivic, and I. Laptev. Learning person-object interactions for action recognition in still images. In NIPS, 2011. 2
[6] M. Eichner and V. Ferrari. We are family: Joint pose estimation of multiple persons. In ECCV, 2010. 2
[7] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. 5
[8] A. Farhadi and A. Sadeghi. Recognition using visual phrases. In CVPR, 2011. 2
[9] C. Fellbaum. WordNet: An electronic lexical database, 1998. 1
[10] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE T. Pattern Anal. Mach. Intell., 32:1627–1645, 2010. 1, 2, 3, 5, 7
[11] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In ICCV, 2003. 1
[12] V. Ferrari, M. Marin-Jimenez, and A. Zisserman. Progressive search space reduction for human pose estimation. In CVPR, 2008. 4, 5
[13] D. F. Fouhey, V. Delaitre, A. Gupta, A. A. Efros, I. Laptev, and J. Sivic. People watching: Human actions as a cue for single view geometry. In ECCV, 2012. 1, 2
[14] E. Gibson. The concept of affordance in development: The renascence of functionalism. The Concept of Development: The Minnesota Symp. on Child Psychology, 15:55–81, 1982. 1
[15] J. Gibson. The Ecological Approach to Visual Perception. Houghton Mifflin, 1979. 1
[16] H. Grabner, J. Gall, and L. V. Gool. What makes a chair a chair? In CVPR, 2011. 1, 2
[17] A. Gupta and L. Davis. Objects in action: An approach for combining action understanding and object perception. In CVPR, 2007. 1, 2
[18] A. Gupta, A. Kembhavi, and L. Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. IEEE T. Pattern Anal. Mach. Intell., 31(10):1775–1789, 2009. 2
[19] A. Gupta, S. Satkin, A. Efros, and M. Hebert. From 3D scene geometry to human workspace. In CVPR, 2011. 2
[20] H. Kjellstrom, J. Romero, and D. Kragic. Visual object-action recognition: Inferring object affordances from human demonstration. In CVIU, 2010. 1, 2
[21] M. Meila and J. Shi. Learning segmentation by random walks. In NIPS, 2000. 4
[22] K. Murphy, A. Torralba, and W. Freeman. Using the forest to see the trees: a graphical model relating features, objects and scenes. In NIPS, 2003. 2
[23] L. Oakes and K. Madole. Function revisited: How infants construe functional features in their representation of objects. Adv. Child Dev. Behav., 36:135–185, 2008. 1, 7
[24] J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In ALMC, 1999. 4
[25] A. Prest, C. Schmid, and V. Ferrari. Weakly supervised learning of interactions between humans and objects. IEEE T. Pattern Anal. Mach. Intell., 34(3):601–614, 2012. 2
[26] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In ICCV, 2007. 2
[27] V. Ramakrishna, T. Kanade, and Y. Sheikh. Reconstructing 3D human pose from 2D image landmarks. In ECCV, 2012. 3
[28] B. Russell, A. Efros, J. Sivic, W. Freeman, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In CVPR, 2006. 2
[29] B. Scholkopf, A. Smola, and K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10(5):1299–1319, 1998. 4
[30] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures of parts. In CVPR, 2011. 2, 3, 5
[31] B. Yao and L. Fei-Fei. Grouplet: A structured image representation for recognizing human and object interactions. In CVPR, 2010. 4
[32] B. Yao and L. Fei-Fei. Action recognition with exemplar based 2.5D graph matching. In ECCV, 2012. 3, 4
[33] B. Yao and L. Fei-Fei. Recognizing human-object interactions in still images by modeling the mutual context of objects and human poses. IEEE T. Pattern Anal. Mach. Intell., 34(9):1691–1703, 2012. 1, 2, 5