Source: pdf
Authors: David Grangier, Iain Melvin
Abstract: We present a new learning strategy for classification problems in which training and/or test data suffer from missing features. In previous work, instances are represented as vectors from some feature space, and one is forced either to impute missing values or to consider an instance-specific subspace. In contrast, our method considers instances as sets of (feature, value) pairs, a representation that naturally handles missing values. Building on this framework, we propose a classification strategy for sets. Our proposal maps (feature, value) pairs into an embedding space and then nonlinearly combines the set of embedded vectors. The embedding and the combination parameters are learned jointly on the final classification objective. This simple strategy allows great flexibility in encoding prior knowledge about the features in the embedding step and yields advantageous results compared to alternative solutions on several datasets.
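The following is a minimal sketch, in PyTorch, of the feature-set-embedding idea the abstract describes: each observed (feature, value) pair is embedded, the embedded vectors are combined nonlinearly, and a classifier is trained jointly on top. All architectural details here (the additive way a pair is embedded, mean pooling as the combination, the tanh nonlinearity) are illustrative assumptions, not the paper's exact model.

import torch
import torch.nn as nn

class FeatureSetClassifier(nn.Module):
    def __init__(self, num_features, embed_dim, num_classes):
        super().__init__()
        # Each feature id gets a learned embedding; the observed value is
        # projected and added, giving one vector per (feature, value) pair.
        self.feature_embed = nn.Embedding(num_features, embed_dim)
        self.value_proj = nn.Linear(1, embed_dim)
        self.combine = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.Tanh())
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, feat_idx, feat_val, mask):
        # feat_idx: (batch, max_set_size) long tensor of observed feature ids
        # feat_val: (batch, max_set_size) float tensor of their values
        # mask:     (batch, max_set_size), 1.0 where a pair is observed, 0.0 for padding
        pair = self.feature_embed(feat_idx) + self.value_proj(feat_val.unsqueeze(-1))
        h = self.combine(pair) * mask.unsqueeze(-1)    # embed each pair, zero out padding
        pooled = h.sum(1) / mask.sum(1, keepdim=True)  # mean over observed pairs only
        return self.classifier(pooled)                 # class logits

Because an instance is just the set of its observed pairs, a missing feature simply never enters the pooled sum: no imputation or instance-specific subspace is needed, and the mask lets variable-size sets share one padded batch.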