Source: pdf
Authors: David Grangier, Iain Melvin
Abstract: We present a new learning strategy for classification problems in which training and/or test data suffer from missing features. In previous work, instances are represented as vectors from some feature space, and one is forced either to impute missing values or to consider an instance-specific subspace. In contrast, our method considers instances as sets of (feature, value) pairs, a representation that naturally handles missing values. Building on this framework, we propose a classification strategy for sets. Our proposal maps (feature, value) pairs into an embedding space and then nonlinearly combines the set of embedded vectors. The embedding and the combination parameters are learned jointly on the final classification objective. This simple strategy allows great flexibility in encoding prior knowledge about the features in the embedding step and yields advantageous results compared to alternative solutions on several datasets.
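The following is a minimal sketch, in PyTorch, of the feature-set-embedding idea the abstract describes: each observed (feature, value) pair is embedded, the embedded vectors are combined nonlinearly, and a classifier is trained jointly on top. All architectural details here (the additive way a pair is embedded, mean pooling as the combination, the tanh nonlinearity) are illustrative assumptions, not the paper's exact model.

import torch
import torch.nn as nn

class FeatureSetClassifier(nn.Module):
    def __init__(self, num_features, embed_dim, num_classes):
        super().__init__()
        # Each feature id gets a learned embedding; the observed value is
        # projected and added, giving one vector per (feature, value) pair.
        self.feature_embed = nn.Embedding(num_features, embed_dim)
        self.value_proj = nn.Linear(1, embed_dim)
        self.combine = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.Tanh())
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, feat_idx, feat_val, mask):
        # feat_idx: (batch, max_set_size) long tensor of observed feature ids
        # feat_val: (batch, max_set_size) float tensor of their values
        # mask:     (batch, max_set_size), 1.0 where a pair is observed, 0.0 for padding
        pair = self.feature_embed(feat_idx) + self.value_proj(feat_val.unsqueeze(-1))
        h = self.combine(pair) * mask.unsqueeze(-1)    # embed each pair, zero out padding
        pooled = h.sum(1) / mask.sum(1, keepdim=True)  # mean over observed pairs only
        return self.classifier(pooled)                 # class logits

Because an instance is just the set of its observed pairs, a missing feature simply never enters the pooled sum: no imputation or instance-specific subspace is needed, and the mask lets variable-size sets share one padded batch.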