jmlr jmlr2013 jmlr2013-22 jmlr2013-22-reference knowledge-graph by maker-knowledge-mining

22 jmlr-2013-Classifying With Confidence From Incomplete Information


Source: pdf

Author: Nathan Parrish, Hyrum S. Anderson, Maya R. Gupta, Dun Yu Hsiao

Abstract: We consider the problem of classifying a test sample given incomplete information. This problem arises naturally when data about a test sample is collected over time, or when costs must be incurred to compute the classification features. For example, in a distributed sensor network only a fraction of the sensors may have reported measurements at a certain time, and additional time, power, and bandwidth are needed to collect the complete data for classification. A practical goal is to assign a class label as soon as enough data is available to make a good decision. We formalize this goal through the notion of reliability: the probability that a label assigned given incomplete data would be the same as the label assigned given the complete data. We propose a method that classifies incomplete data only if some reliability threshold is met. Our approach models the complete data as a random variable whose distribution depends on the current incomplete data and the (complete) training data. The method differs from standard imputation strategies in that our focus is on determining the reliability of the classification decision, rather than just the class label. We show that the method provides useful reliability estimates of the correctness of the imputed class labels in a set of experiments on time-series data sets, where the goal is to classify each time series as early as possible while still guaranteeing that the reliability threshold is met.

Keywords: classification, sensor networks, signals, reliability
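To make the reliability notion concrete, here is a minimal sketch, not the paper's algorithm: it models the complete feature vector as jointly Gaussian (fit on training data), samples the missing coordinates from their conditional distribution given the observed ones, and uses the Monte Carlo fraction of sampled completions whose label agrees with the tentative label as the reliability estimate. All names here (GaussianCompletionSampler, classify_if_reliable, the threshold tau) are illustrative assumptions, and the Gaussian model and plug-in k-NN classifier stand in for the paper's specific estimators.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


class GaussianCompletionSampler:
    """Treats the complete feature vector as jointly Gaussian (fit on the
    training data) and draws missing coordinates from their conditional
    distribution given the observed coordinates."""

    def __init__(self, X_train):
        self.mu = X_train.mean(axis=0)
        d = X_train.shape[1]
        # Small ridge keeps the covariance invertible.
        self.cov = np.cov(X_train, rowvar=False) + 1e-6 * np.eye(d)

    def _conditional(self, x, obs):
        mis = ~obs
        S_oo = self.cov[np.ix_(obs, obs)]
        S_mo = self.cov[np.ix_(mis, obs)]
        S_mm = self.cov[np.ix_(mis, mis)]
        K = S_mo @ np.linalg.inv(S_oo)
        mean = self.mu[mis] + K @ (x[obs] - self.mu[obs])
        return mean, S_mm - K @ S_mo.T

    def mean_completion(self, x, obs):
        out = x.copy()
        mean, _ = self._conditional(x, obs)
        out[~obs] = mean
        return out

    def sample_completion(self, x, obs, rng):
        out = x.copy()
        mean, cov = self._conditional(x, obs)
        out[~obs] = rng.multivariate_normal(mean, cov)
        return out


def classify_if_reliable(clf, sampler, x, obs, tau=0.9, n_samples=500, seed=0):
    """Returns (label, reliability); the label is None when the estimated
    reliability falls below tau, meaning 'wait for more data'."""
    rng = np.random.default_rng(seed)
    # Tentative label from the single most likely completion.
    tentative = clf.predict(sampler.mean_completion(x, obs)[None, :])[0]
    # Reliability estimate: fraction of sampled completions whose label
    # agrees with the tentative label.
    agree = sum(
        clf.predict(sampler.sample_completion(x, obs, rng)[None, :])[0] == tentative
        for _ in range(n_samples)
    )
    reliability = agree / n_samples
    return (tentative if reliability >= tau else None), reliability


# Demo: a 3-NN classifier, and a test point whose last two features
# have not yet arrived.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, :2].sum(axis=1) > 0).astype(int)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
sampler = GaussianCompletionSampler(X_train)
x = np.array([1.2, 0.4, -0.3, 0.0, 0.0])
obs = np.array([True, True, True, False, False])
print(classify_if_reliable(clf, sampler, x, obs, tau=0.9))
```

Note that the paper's method derives the reliability analytically for particular classifiers rather than by brute-force resampling; the Monte Carlo loop above is a naive stand-in that would be costly in a real sensor-network setting.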


reference text

H. S. Anderson, N. Parrish, K. Tsukida, and M. R. Gupta. Reliable early classification of time series. In Proc. ICASSP, 2012.
P. Armitage. Sequential Analysis with More than Two Alternative Hypotheses, and its Relation to Discriminant Function Analysis. Journal of the Royal Statistical Society, Series B (Methodological), 12(1):137–144, January 1950.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, 2008.
C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
M. Cooke, P. Green, L. Josifovski, and A. Vizinho. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication, 34(3):267–285, 2001.
M. C. Costanza and A. A. Afifi. Comparison of Stopping Rules in Forward Stepwise Discriminant Analysis. Journal of the American Statistical Association, 74(368):777–785, January 1979.
K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003.
M. Dredze, K. Crammer, and F. Pereira. Confidence-weighted linear classification. In Proc. Intl. Conf. Machine Learning (ICML), 2008.
T. Ferguson. Optimal Stopping Rules and Applications. E-book available on the author's website, 2001.
J. Friedman, R. Kohavi, and Y. Yun. Lazy decision trees. In Proc. AAAI, 1996.
W. J. Fu, E. R. Dougherty, B. Mallick, and R. J. Carroll. How many samples are needed to build a classifier: a general sequential approach. Bioinformatics, 21(1):63–70, January 2005.
E. K. Garcia, S. Feldman, M. R. Gupta, and S. Srivastava. Completely lazy learning. IEEE Trans. Knowledge and Data Engineering, 22(9):1274–1285, Sept. 2010.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, New York, 2001.
E. Keogh, X. Xi, L. Wei, and C. A. Ratanamahatana. The UCR time series classification and clustering webpage, 2006.
J. M. Martinez. Local minimizers of quadratic functions on Euclidean balls and spheres. SIAM J. Optimization, 4(1):159–176, 1994.
S. Pang, S. Ozawa, and N. Kasabov. Incremental linear discriminant analysis for classification of data streams. IEEE Trans. Systems, Man, and Cybernetics, 35(5):905–914, October 2005.
N. Parrish and M. R. Gupta. Dimensionality reduction by local discriminative Gaussians. In Proc. Intl. Conf. Machine Learning (ICML), 2012.
B. Raj and R. M. Stern. Missing-feature approaches in speech recognition. IEEE Signal Processing Magazine, 22(5):101–116, 2005.
B. Raj, M. L. Seltzer, and R. M. Stern. Reconstruction of missing features for robust speech recognition. Speech Communication, 43(4):275–296, 2004.
J. J. Rodriguez and C. J. Alonso. Boosting interval-based literals: Variable length and early classification. In ECAI'02 Workshop on Knowledge Discovery from (Spatio-) Temporal Data, 2002.
A. Rogier T. Donders, Geert J. M. G. van der Heijden, T. Stijnen, and K. G. M. Moons. Review: A gentle introduction to imputation of missing values. J. of Clinical Epidemiology, 59(10):1087–1091, 2006.
M. Saar-Tsechansky and F. Provost. Handling missing values when applying classification models. Journal of Machine Learning Research, 2007.
J. L. Schafer and J. W. Graham. Missing data: Our view of the state of the art. Psychological Methods, 7(2):147–177, 2002.
D. Schuurmans and R. Greiner. Learning to classify incomplete examples. In Computational Learning Theory and Natural Learning Systems: Making Learning Systems Practical, 2007.
P. D. Tao and L. T. H. An. Convex analysis approach to D.C. programming: theory, algorithms, and applications. Acta Mathematica Vietnamica, 22(1):289–355, 1997.
A. Wald. Sequential Analysis. John Wiley, 1947.
Z. Xing, J. Pei, and P. S. Yu. Early prediction on time series: a nearest neighbor approach. In IJCAI, pages 1297–1302, 2009.