Sidestepping Intractable Inference with Structured Ensemble Cascades (NIPS 2010, paper 239)


Source: pdf

Author: David Weiss, Benjamin Sapp, Ben Taskar

Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high treewidth and large state spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem.
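
The filtering step described in the abstract (add the max-marginals of each sub-model, then threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes every sub-model is a chain over the same variables with its own scores, whereas the paper's sub-models may have other tractable structures. It also assumes the threshold is a convex combination of the summed best scores and the mean summed max-marginal, in the spirit of structured prediction cascades [4]. The names chain_max_marginals, ensemble_filter, and alpha are hypothetical.

```python
def chain_max_marginals(unary, pairwise):
    """Max-marginals of one chain model via max-sum forward/backward passes.

    unary[t][k]    : score of state k at position t
    pairwise[j][k] : score of moving from state j to state k (shared by all positions)
    Returns m where m[t][k] is the best total score of any sequence with y_t = k.
    """
    T, K = len(unary), len(unary[0])
    fwd = [list(unary[0])] + [[0.0] * K for _ in range(T - 1)]
    bwd = [[0.0] * K for _ in range(T)]
    for t in range(1, T):
        for k in range(K):
            # Best prefix score ending in state k at position t.
            fwd[t][k] = unary[t][k] + max(
                fwd[t - 1][j] + pairwise[j][k] for j in range(K))
    for t in range(T - 2, -1, -1):
        for j in range(K):
            # Best suffix score after state j at position t (unary[t] excluded).
            bwd[t][j] = max(
                pairwise[j][k] + unary[t + 1][k] + bwd[t + 1][k] for k in range(K))
    return [[fwd[t][k] + bwd[t][k] for k in range(K)] for t in range(T)]


def ensemble_filter(models, alpha=0.5):
    """Keep state k at position t if the summed max-marginals reach the threshold.

    models : list of (unary, pairwise) pairs over the same T positions and K states
    alpha  : in [0, 1]; larger values prune more aggressively (assumed threshold form)
    Returns a T x K table of booleans (True = state survives filtering).
    """
    per_model = [chain_max_marginals(u, p) for u, p in models]
    T, K = len(per_model[0]), len(per_model[0][0])
    # Sub-models are never forced to agree: their max-marginals are simply added.
    total = [[sum(m[t][k] for m in per_model) for k in range(K)] for t in range(T)]
    best = sum(max(max(row) for row in m) for m in per_model)  # sum of best scores
    mean = sum(sum(row) for row in total) / (T * K)            # mean summed max-marginal
    threshold = alpha * best + (1 - alpha) * mean
    return [[total[t][k] >= threshold for k in range(K)] for t in range(T)]


# Toy ensemble: two chain sub-models over 3 positions with 2 states each.
model_a = ([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [[0.5, 0.0], [0.0, 0.5]])
model_b = ([[0.0, 1.0], [1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]])
keep = ensemble_filter([model_a, model_b], alpha=0.3)
for t, row in enumerate(keep):
    print(t, row)
```

Each sub-model needs one forward and one backward pass, so the cost of filtering grows linearly with the number of sub-models, which matches the linear increase in computation claimed in the abstract.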


reference text

[1] L. Sigal, S. Bhatia, S. Roth, M.J. Black, and M. Isard. Tracking loose-limbed people. In Proc. CVPR, 2004.

[2] B. Wu and R. Nevatia. Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet based part detectors. IJCV, 75(2):247–266, 2007.

[3] J.D.J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 81(1), January 2009.

[4] D. Weiss and B. Taskar. Structured prediction cascades. In Proc. AISTATS, 2010.

[5] N. Komodakis, N. Paragios, and G. Tziritas. MRF optimization via dual decomposition: Message-passing revisited. In Proc. ICCV, 2007.

[6] B. Sapp, A. Toshev, and B. Taskar. Cascaded models for articulated pose estimation. In Proc. ECCV, 2010.

[7] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, second edition, 1999.

[8] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.

[9] V. Ferrari, M. Marin-Jimenez, and A. Zisserman. Progressive search space reduction for human pose estimation. In Proc. CVPR, 2008.

[10] M. Andriluka, S. Roth, and B. Schiele. People-tracking-by-detection and people-detection-by-tracking. In Proc. CVPR, 2008.

[11] S. Pellegrini, A. Ess, K. Schindler, and L. Van Gool. You'll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking. In Proc. ICCV, 2009.

[12] L. Kratz and K. Nishino. Tracking with Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes. In Proc. CVPR, 2010.

[13] R. Muñoz-Salinas, E. Aguirre, and M. García-Silvente. People detection and tracking using stereo vision and color. Image and Vision Computing, 25(6):995–1007, 2007.

[14] J. S. Kwon and K. M. Lee. Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling. In Proc. CVPR, 2009.

[15] B. Sapp, C. Jordan, and B. Taskar. Adaptive pose priors for pictorial structures. In Proc. CVPR, 2010.

[16] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In Proc. CVPR, 2009.