jmlr jmlr2008 jmlr2008-58 jmlr2008-58-reference knowledge-graph by maker-knowledge-mining

58 jmlr-2008-Max-margin Classification of Data with Absent Features

Source: pdf

Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller

Abstract: We consider the problem of learning classiﬁers in structured domains, where some objects have a subset of features that are inherently absent due to complex relationships between the features. Unlike the case where a feature exists but its value is not observed, here we focus on the case where a feature may not even exist (structurally absent) for some of the samples. The common approach for handling missing features in discriminative models is to ﬁrst complete their unknown values, and then use a standard classiﬁcation procedure over the completed data. This paper focuses on features that are known to be non-existing, rather than have an unknown value. We show how incomplete data can be classiﬁed directly without any completion of the missing features using a max-margin learning framework. We formulate an objective function, based on the geometric interpretation of the margin, that aims to maximize the margin of each sample in its own relevant subspace. In this formulation, the linearly separable case can be transformed into a binary search over a series of second order cone programs (SOCP), a convex problem that can be solved efﬁciently. We also describe two approaches for optimizing the general case: an approximation that can be solved as a standard quadratic program (QP) and an iterative approach for solving the exact problem. By avoiding the pre-processing phase in which the data is completed, both of these approaches could offer considerable computational savings. More importantly, we show that the elegant handling of missing values by our approach allows it to both outperform other methods when the missing values have non-trivial structure, and be competitive with other methods when the values are missing at random. We demonstrate our results on several standard benchmarks and two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. Keywords: max margin, missing features, network reconstruction, metabolic pa

reference text

A. Berg, T. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondence. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005. G. Elidan, G. Heitz, and D. Koller. Learning object shape: From drawings to images. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006. 19 C HECHIK , H EITZ , E LIDAN , A BBEEL AND KOLLER R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 264–271, 2003. J. Forster, I. Famili, P. Fu, B.. Palsson, and J. Nielsen. Genome-scale reconstruction of the saccharomyces cerevisiae metabolic network. Genome Research, 13(2):244–253, February 2003. Z. Ghahramani and M. I. Jordan. Supervised learning from incomplete data via an EM approach. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 120–127. Morgan Kaufmann Publishers, Inc., 1994. K. Grauman and T. Darrell. Pyramid match kernels: Discriminative classiﬁcation with sets of image features. In International Conference on Computer Vision, October 2005. W.K. Huh, J.V. Falvo, L.C. Gerke, A.S. Carroll, R.W. Howson, J.S. Weissman, and E.K. O’Shea. Global analysis of protein localization in budding yeast. Nature, 425:686–691, 2003. N. Hulo, A. Bairoch, V. Bulliard, L. Cerutti, E. De Castro, P.S. Langendijk-Genevaux, M. Pagni, and C.J.A. Sigrist. The prosite database. Nucleic Acids Res., 34:D227–D230, 2006. J. Ihmels, R. Levy, and N. Barkai. Principles of transcriptional control in the metabolic network of saccharomyces cerevisiae. Nature Biotechnology, 22:86–92, 2003. A. Jaimovich, G. Elidan, H. Margalit, and N. Friedman. Towards an integrated protein-protein interaction network. In RECOMB 2005, 2005. D. Jensen and J. Neville. Linkage and autocorrelation cause feature selection bias in relational learning. In 19th international confernce on Machine learning (ICML 2002), 2002. A. Kapoor. Learning Discriminative Models with Incomplete Data. PhD thesis, MIT Media Lab, Feb 2006 2006. P. Kharchenko, D. Vitkup, and GM Church. Filling gaps in a metabolic network using expression information. Bioinformatics, 2003. P. Kharchenko, GM Church, and D. Vitkup. Expression dynamics of a cellular metabolic network. Molecular Systems Biology, 2005. Y. LeCun. MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/. NEC Research, 1998. URL R.J.A. Little and D.B. Rubin. Statistical Analysis with Missing Data. NY wiley, 1987. M. Sousa Lobo, L. Vandenbergheb, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284(1-3):193–228, 1998. S. K. Pannagadatta, C. Bhattacharyya, and A.J. Smola. Second order cone programming approaches for handling missing and uncertain data. Journal of Machine Learning Research, 7:1283–1314, 2006. 20 M AX - MARGIN C LASSIFICATION OF DATA WITH A BSENT F EATURES A. Quattoni, M. Collins, and T. Darrell. Conditional random ﬁelds for object recognition. In Lawrence K. Saul, Yair Weiss, and L´ on Bottou, editors, Advances in Neural Information Proe cessing Systems 17, pages 1097–1104, Cambridge, MA, 2005. MIT Press. P. Roth. Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47 (3):537–560, 1994. J. L. Schafer. Analysis of Incomplete Multivariate Data. Chapman and Hall, 1997. B. Sch¨ lkopf and A.J. Smola. Learning with Kernels: Support Vector Machines, Regularization o Optimization and Beyond. MIT Press, Cambridge, MA, 2002. V.N. Vapnik. The Nature of Statistical Learning Theory. SpringerVerlag, 1995. J. P. Vert and Y. Yamanishi. Supervised graph inference. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1433–1440, Cambridge, MA, 2004. MIT Press. 21