
Active Classification based on Value of Classifier (NIPS 2011, paper 19)



Author: Tianshi Gao, Daphne Koller

Abstract: Modern classification tasks usually involve many class labels and can be informed by a broad range of features. Many such tasks are tackled by constructing a set of classifiers that are applied at test time and pieced together in a fixed procedure determined in advance or at training time. We present an active classification process at test time, in which each classifier in a large ensemble is viewed as a potential observation that might inform our classification decision. Observations are selected dynamically based on previous observations, using a value-theoretic computation that balances an estimate of the expected classification gain from each observation against its computational cost. The expected classification gain is computed using a probabilistic model conditioned on the outcomes of previous observations. Because this active process is applied separately to each test instance, it yields an efficient, instance-specific decision path. We demonstrate the benefit of the active scheme on various real-world datasets, showing that it can achieve comparable or even higher classification accuracy at a fraction of the computational cost of traditional methods.
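
The abstract describes a myopic value-of-information loop: for each test instance, repeatedly run whichever classifier offers the largest expected classification gain net of its computational cost, fold its output into a belief over the labels, and stop once no remaining classifier is worth its cost. The Python sketch below illustrates one plausible reading of that loop; it is not the authors' implementation. The `model` object (with its `expected_gain` and `predict` methods), the per-classifier `costs`, the `budget`, and the gain/cost trade-off weight `lam` are hypothetical names introduced for illustration.

```python
def active_classify(x, classifiers, costs, model, budget=float("inf"), lam=1.0):
    """Myopic test-time selection of classifiers by net value of information.

    Hypothetical sketch: `model` is assumed to maintain a posterior over
    class labels given the observations made so far, to score the expected
    classification gain of observing classifier j via `expected_gain`, and
    to return the most likely label via `predict`.
    """
    observed = {}                      # classifier index -> observed output
    remaining = set(range(len(classifiers)))
    spent = 0.0
    while remaining:
        # Net value of each affordable, unobserved classifier: expected gain
        # under the current posterior minus lam times its computational cost.
        net_value = {
            j: model.expected_gain(j, observed) - lam * costs[j]
            for j in remaining
            if spent + costs[j] <= budget
        }
        if not net_value:
            break                      # budget exhausted
        best = max(net_value, key=net_value.get)
        if net_value[best] <= 0:
            break                      # no observation is worth its cost
        observed[best] = classifiers[best](x)   # actually run the classifier
        spent += costs[best]
        remaining.discard(best)
    return model.predict(observed)     # label under the final posterior
```

Since each selection depends on the outputs already observed, different test instances trace different sequences of classifier evaluations, which is where the instance-specific cost savings come from.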


References

[1] E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. J. Mach. Learn. Res., 1:113–141, 2001.

[2] A. Angelova, L. Matthies, D. Helmick, and P. Perona. Fast terrain classification using variable-length representation for autonomous navigation. CVPR, 2007.

[3] S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In NIPS, 2010.

[4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[5] X. Chai, L. Deng, and Q. Yang. Test-cost sensitive naive bayes classification. In ICDM, 2004.

[6] W. S. Cleveland and S. J. Devlin. Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83:596–610, 1988.

[7] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. CoRR, cs.AI/9603104, 1996.

[8] J. Deng, A. C. Berg, K. Li, and L. Fei-Fei. What does classifying more than 10,000 image categories tell us? In ECCV, 2010.

[9] T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.

[10] Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.

[11] J. H. Friedman. Another approach to polychotomous classification. Technical report, Department of Statistics, Stanford University, 1996.

[12] P. V. Gehler and S. Nowozin. On feature combination for multiclass object classification. In ICCV, 2009.

[13] G. Griffin and P. Perona. Learning and using taxonomies for fast visual categorization. In CVPR, 2008.

[14] V. Guruswami and A. Sahai. Multiclass learning, boosting, and error-correcting codes. In Proc. of the Twelfth Annual Conf. on Computational Learning Theory, 1999.

[15] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.

[16] R. A. Howard. Information value theory. IEEE Trans. on Systems Science and Cybernetics, 1966.

[17] R. A. Howard. Decision analysis: Practice and promise. Management Science, 1988.

[18] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[19] A. Krause and C. Guestrin. Optimal value of information in graphical models. Journal of Artificial Intelligence Research (JAIR), 35:557–591, 2009.

[20] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.

[21] L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification and semantic feature sparsification. In NIPS, 2010.

[22] D. V. Lindley. On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, 27(4):986–1005, 1956.

[23] D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.

[24] V. S. Mookerjee and M. V. Mannino. Sequential decision models for expert system optimization. IEEE Trans. on Knowledge and Data Engineering, 9(5):675–687, 1997.

[25] D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz. UCI repository of machine learning databases, 1998.

[26] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 2001.

[27] J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In NIPS, 2000.

[28] M. J. Saberian and N. Vasconcelos. Boosting classifier cascades. In NIPS, 2010.

[29] R. E. Schapire. Using output codes to boost multiclass learning problems. In ICML, 1997.

[30] A. G. Schwing, C. Zach, Y. Zheng, and M. Pollefeys. Adaptive random forest - how many “experts” to ask before making a decision? In CVPR, 2011.

[31] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009.

[32] P. Viola and M. Jones. Robust real-time object detection. IJCV, 2002.

[33] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.

[34] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In ICML, pages 412–420, 1997.