
170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields

Source: pdf

Author: Taehwan Kim, Greg Shakhnarovich, Karen Livescu

Abstract: Recognition of gesture sequences is in general a very difficult problem, but in certain domains the difficulty may be mitigated by exploiting the domain's “grammar”. One such grammatically constrained gesture sequence domain is sign language. In this paper we investigate the case of fingerspelling recognition, which can be very challenging due to the quick, small motions of the fingers. Most prior work on this task has assumed a closed vocabulary of fingerspelled words; here we study the more natural open-vocabulary case, where the only domain knowledge is the possible fingerspelled letters and statistics of their sequences. We develop a semi-Markov conditional model approach, where feature functions are defined over segments of video and their corresponding letter labels. We use classifiers of letters and linguistic handshape features, along with expected motion profiles, to define segmental feature functions. This approach improves letter error rate (Levenshtein distance between hypothesized and correct letter sequences) from 16.3% using a hidden Markov model baseline to 11.6% using the proposed semi-Markov model.
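The abstract defines letter error rate through the Levenshtein distance between hypothesized and correct letter sequences. The sketch below illustrates one way such a metric could be computed. The function names `levenshtein` and `letter_error_rate` are hypothetical, and dividing the total edit count by the total number of reference letters is an assumption borrowed from common speech-recognition practice, not something the abstract specifies.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn the sequence `ref` into the sequence `hyp`."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def letter_error_rate(refs, hyps):
    """Total edit distance divided by total reference length.

    Assumption: normalization by reference length; the source only
    says the metric is based on Levenshtein distance.
    """
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    total_letters = sum(len(r) for r in refs)
    return total_edits / total_letters
```

For instance, a hypothesis `"tulp"` against the reference `"tulip"` has one deletion over five reference letters, which under this assumed normalization gives an error rate of 0.2.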

reference text

[1] http://research.microsoft.com/en-us/projects/scarf/. 6

[2] http://www.keithv.com/software/csr. 6

[3] http://htk.eng.cam.ac.uk. 6

[4] V. Athitsos, J. Alon, S. Sclaroff, and G. Kollios. BoostMap: A method for efficient approximate similarity rankings. In CVPR, 2004. 2

[5] R. Bowden, D. Windridge, T. Kadir, A. Zisserman, and M. Brady. A linguistic feature vector for the visual interpretation of sign language. In ECCV, 2004. 2

[6] D. Brentari. A Prosodic Model of Sign Language Phonology. MIT Press, 1998. 1, 3, 4

[7] D. Brentari and C. Padden. Native and foreign vocabulary in American Sign Language: A lexicon with multiple origins. In Foreign vocabulary in sign languages: A cross-linguistic investigation of word formation, pages 87–119. Lawrence Erlbaum, Mahwah, NJ, 2001. 1

[8] S.-S. Cho, H.-D. Yang, and S.-W. Lee. Sign language spotting based on semi-Markov conditional random field. In Workshop on the Applications of Computer Vision, 2009. 2

[9] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 5

[10] L. Ding and A. M. Martínez. Modelling and recognition of the linguistic components in American Sign Language. Image Vision Comput., 27(12), 2009. 2

[11] P. Dreuw, D. Rybach, T. Deselaers, M. Zahedi, and H. Ney. Speech recognition techniques for a sign language recognition system. In Interspeech, 2007. 2

[12] T. V. Duong, H. H. Bui, D. Q. Phung, and S. Venkatesh. Activity recognition and abnormality detection with the switching hidden semi-Markov model. In CVPR, 2005. 2

[13] R. S. Feris, M. Turk, R. Raskar, K. Tan, and G. Ohashi. Exploiting depth discontinuities for vision-based fingerspelling recognition. In IEEE Workshop on Real-Time Vision for Human-Computer Interaction, 2004. 2

[14] P. Goh and E.-J. Holden. Dynamic fingerspelling recognition using geometric and motion features. In ICIP, 2006. 1, 2

[15] K. Grobel and M. Assan. Isolated sign language recognition using hidden Markov models. In International Conference on System Man and Cybernetics, 1997. 2

[16] K. Grobel and H. Hienz. Video-based recognition of fingerspelling in real-time. In Workshops Bildverarbeitung für die Medizin, 1996. 2

[17] M. W. Kadous. Machine recognition of Auslan signs using powergloves: towards large lexicon integration of sign language. In Workshop on the Integration of Gesture in Language and Speech, 1996. 2

[18] J. Keane, D. Brentari, and J. Riggle. Coarticulation in ASL fingerspelling. In NELS, 2012. 4

[19] T. Kim, K. Livescu, and G. Shakhnarovich. American Sign Language fingerspelling recognition with phonological feature-based tandem models. In IEEE Workshop on Spoken Language Technology, 2012. 1, 2, 3, 4, 5, 7

[20] S. Liwicki and M. Everingham. Automatic recognition of fingerspelled words in British Sign Language. In 2nd IEEE Workshop on CVPR for Human Communicative Behavior Analysis, 2009. 1, 2, 5

[21] C. Oz and M. C. Leu. Recognition of finger spelling of American Sign Language with artificial neural network using position/orientation sensors and data glove. In 2nd international conference on Advances in Neural Networks, 2005. 2

[22] C. Padden and D. C. Gunsauls. How the alphabet came to be used in a sign language. Sign Language Studies, 4:10–33, 2004. 1

[23] V. Pitsikalis, S. Theodorakis, C. Vogler, and P. Maragos. Advances in phonetics-based sub-unit modeling for transcription alignment and sign language recognition. In IEEE CVPR Workshop on Gesture Recognition, 2011. 2

[24] N. Pugeault and R. Bowden. Spelling it out: Real-time ASL fingerspelling recognition. In ICCV, 2011. 2

[25] http://www.isic.berkeley.edu/Speech/qn.html. 5

[26] S. Ricco and C. Tomasi. Fingerspelling recognition through classification of letter-to-letter transitions. In ACCV, 2009. 1, 2

[27] A. Roussos, S. Theodorakis, V. Pitsikalis, and P. Maragos. Affine-invariant modeling of shape-appearance images applied on sign language handshape classification. In ICIP, 2010. 2

[28] S. Sarawagi and W. W. Cohen. Semi-Markov conditional random fields for information extraction. In NIPS, 2004. 1, 2, 3

[29] Q. Shi, L. Cheng, L. Wang, and A. Smola. Human action segmentation and recognition using discriminative semi-Markov models. International Journal of Computer Vision, 93(1), 2011. 2

[30] T. Starner, J. Weaver, and A. Pentland. Real-time American Sign Language recognition using desk and wearable computer based video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12), 1998. 2

[31] A. Stolcke, J. Zheng, W. Wang, and V. Abrash. SRILM at sixteen: update and outlook. In ASRU, 2011. 6

[32] A. Thangali, J. P. Nash, S. Sclaroff, and C. Neidle. Exploiting phonological constraints for handshape inference in ASL video. In CVPR, 2011. 2

[33] S. Theodorakis, V. Pitsikalis, and P. Maragos. Model-level data-driven sub-units for signs in videos of continuous sign language. In ICASSP, 2010. 2

[34] P. Viola and M. J. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001. 5

[35] C. Vogler and D. Metaxas. Parallel hidden Markov models for American Sign Language recognition. In ICCV, 1999. 2

[36] C. Vogler and D. Metaxas. Toward scalability in ASL recognition: Breaking down signs into phonemes. In Gesture Workshop, 1999. 2

[37] C. Vogler and D. Metaxas. A framework for recognizing the simultaneous aspects of American Sign Language. Computer Vision and Image Understanding, 81:358–384, 2001. 2

[38] C. Vogler and D. Metaxas. Handshapes and movements: Multiple-channel ASL recognition. In Gesture Workshop, 2003. 2

[39] R. Wang and J. Popovic. Real-time hand-tracking with a color glove. In SIGGRAPH, 2009. 2

[40] R. Yang and S. Sarkar. Detecting coarticulation in sign language using conditional random fields. In ICPR, 2006. 2

[41] G. Zweig and P. Nguyen. Segmental CRF approach to large vocabulary continuous speech recognition. In ASRU, 2009. 1, 2, 3