nips nips2012 nips2012-291 nips2012-291-reference knowledge-graph by maker-knowledge-mining

291 nips-2012-Reducing statistical time-series problems to binary classification


Source: pdf

Author: Daniil Ryabko, Jeremie Mary

Abstract: We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data. 1


reference text

[1] T. M. Adams and A. B. Nobel. Uniform convergence of Vapnik-Chervonenkis classes under ergodic sampling. The Annals of Probability, 38:1345–1367, 2010.

[2] T. M. Adams and A. B. Nobel. Uniform approximation of Vapnik-Chervonenkis classes. Bernoulli, 18(4):1310–1319, 2012.

[3] M.-F. Balcan, N. Bansal, A. Beygelzimer, D. Coppersmith, J. Langford, and G. Sorkin. Robust reductions from ranking to classification. In COLT’07, v. 4539 of LNCS, pages 604–619. 2007.

[4] M.F. Balcan, A. Blum, and S. Vempala. A discriminative framework for clustering via similarity functions. In STOC, pp. 671–680. ACM, 2008.

[5] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/˜cjlin/libsvm.

[6] C. Cortes and V. Vapnik. Support-vector networks. Mach. Learn., 20(3):273–297, 1995.

[7] R. Fortet and E. Mourier. Convergence de la r´ partition empirique vers la r´ partition e e th´ oretique. Ann. Sci. Ec. Norm. Super., III. Ser, 70(3):267–285, 1953. e

[8] R. Gray. Probability, Random Processes, and Ergodic Properties. Springer Verlag, 1988.

[9] M. Gutman. Asymptotically optimal classification for multiple tests with empirically observed statistics. IEEE Transactions on Information Theory, 35(2):402–408, 1989.

[10] Z. Harchaoui, F. Bach, E. Moulines. Kernel change-point analysis. NIPS, pp. 609–616, 2008.

[11] L. V. Kantorovich and G. S. Rubinstein. On a function space in certain extremal problems. Dokl. Akad. Nauk USSR, 115(6):1058–1061, 1957.

[12] R.L. Karandikara and M. Vidyasagar. Rates of uniform convergence of empirical means with mixing processes. Statistics and Probability Letters, 58:297–307, 2002.

[13] A. Khaleghi, D. Ryabko, J. Mary, and P. Preux. Online clustering of processes. In AISTATS, JMLR W&CP; 22, pages 601–609, 2012.

[14] D. Kifer, S. Ben-David, J. Gehrke. Detecting change in data streams. VLDB (v.30): 180–191, 2004.

[15] A.N. Kolmogorov. Sulla determinazione empirica di una legge di distribuzione. G. Inst. Ital. Attuari, pages 83–91, 1933.

[16] John Langford, Roberto Oliveira, and Bianca Zadrozny. Predicting conditional quantiles via reduction to classification. In UAI, 2006.

[17] Jos´ del R. Mill´ n. On the need for on-line learning in brain-computer interfaces. In Proc. of e a the Int. Joint Conf. on Neural Networks, 2004.

[18] D. Pollard. Convergence of Stochastic Processes. Springer, 1984.

[19] B. Ryabko. Prediction of random sequences and universal coding. Problems of Information Transmission, 24:87–96, 1988.

[20] B. Ryabko. Compression-based methods for nonparametric prediction and estimation of some characteristics of time series. IEEE Transactions on Information Theory, 55:4309–4315, 2009.

[21] D. Ryabko. Clustering processes. In Proc. ICML 2010, pp. 919–926, Haifa, Israel, 2010.

[22] D. Ryabko. Discrimination between B-processes is impossible. Journal of Theoretical Probability, 23(2):565–575, 2010.

[23] D. Ryabko and B. Ryabko. Nonparametric statistical inference for ergodic processes. IEEE Transactions on Information Theory, 56(3):1430–1435, 2010.

[24] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.

[25] P. Shields. The Ergodic Theory of Discrete Sample Paths. AMS Bookstore, 1996.

[26] V. M. Zolotarev. Metric distances in spaces of random variables and their distributions. Math. USSR-Sb, 30(3):373–401, 1976. 9