jmlr jmlr2006 jmlr2006-37 jmlr2006-37-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicolò Cesa-Bianchi, Claudio Gentile, Luca Zaniboni
Abstract: We study the problem of classifying data in a given taxonomy when classifications associated with multiple and/or partial paths are allowed. We introduce a new algorithm that incrementally learns a linear-threshold classifier for each node of the taxonomy. A hierarchical classification is obtained by evaluating the trained node classifiers in a top-down fashion. To evaluate classifiers in our multipath framework, we define a new hierarchical loss function, the H-loss, capturing the intuition that whenever a classification mistake is made at a node of the taxonomy, no loss should be charged for any additional mistake occurring in the subtree of that node. Making no assumptions on the mechanism generating the data instances, and assuming a linear noise model for the labels, we bound the H-loss of our on-line algorithm in terms of the H-loss of a reference classifier knowing the true parameters of the label-generating process. We show that, in expectation, the excess cumulative H-loss grows at most logarithmically in the length of the data sequence. Furthermore, our analysis reveals the precise dependence of the rate of convergence on the eigenstructure of the data each node observes. Our theoretical results are complemented by a number of experiments on textual corpora. In these experiments we show that, after only one epoch of training, our algorithm performs much better than Perceptron-based hierarchical classifiers and comes reasonably close to a hierarchical support vector machine.
Keywords: incremental algorithms, online learning, hierarchical classification, second order perceptron, support vector machines, regret bound, loss function
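As a concrete illustration of the H-loss described in the abstract, the following minimal sketch charges a unit loss at a mispredicted node only when all of its ancestors are predicted correctly, so mistakes inside the subtree of an already-mistaken node add nothing. This is not the authors' code: the function name, the dictionary-based taxonomy encoding, the toy node names, and the uniform per-node cost are illustrative assumptions.

```python
# Hedged sketch of the H-loss: charge a mistake at a node only if every
# ancestor of that node was classified correctly; mistakes occurring in
# the subtree of an already-mistaken node are not charged.

def h_loss(parent, y_true, y_pred):
    """Hierarchical loss over a taxonomy.

    parent: dict mapping each node to its parent (None for roots).
    y_true, y_pred: dicts mapping each node to a 0/1 label.
    """
    loss = 0
    for node in parent:
        if y_pred[node] == y_true[node]:
            continue
        # Charge this mistake only if all ancestors are predicted correctly.
        ancestor, charged = parent[node], True
        while ancestor is not None:
            if y_pred[ancestor] != y_true[ancestor]:
                charged = False
                break
            ancestor = parent[ancestor]
        if charged:
            loss += 1  # unit per-node cost, an assumption of this sketch
    return loss


# Toy taxonomy (assumed for illustration):  root -> {a, b},  a -> {a1, a2}
parent = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a"}
y_true = {"root": 1, "a": 1, "b": 0, "a1": 1, "a2": 0}
y_pred = {"root": 1, "a": 0, "b": 0, "a1": 0, "a2": 1}

# Only the mistake at "a" is charged; the wrong labels at "a1" and "a2"
# lie in the subtree of the mistaken node "a", so they add nothing.
print(h_loss(parent, y_true, y_pred))  # -> 1
```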
References:
K. S. Azoury and M. K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3):211–246, 2001.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. Learning probabilistic linear-threshold classifiers via selective sampling. In Proceedings of the 16th Annual Conference on Computational Learning Theory, pages 373–386. LNAI 2777, Springer, 2003.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. Regret bounds for hierarchical classification with linear-threshold functions. In Proceedings of the 17th Annual Conference on Computational Learning Theory, pages 93–108. LNAI 3120, Springer, 2004.
C. C. Chang and C. J. Lin. LIBSVM: a library for support vector machines, 2004. Available electronically at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
O. Dekel, J. Keshet, and Y. Singer. An efficient online algorithm for hierarchical phoneme classification. In Proceedings of the 1st International Workshop on Machine Learning for Multimodal Interaction, pages 146–158. LNAI 3361, Springer, 2005.
O. Dekel, J. Keshet, and Y. Singer. Large margin hierarchical classification. In Proceedings of the 21st International Conference on Machine Learning. Omnipress, 2004.
S. T. Dumais and H. Chen. Hierarchical classification of web content. In Proceedings of the 23rd ACM International Conference on Research and Development in Information Retrieval, pages 256–263. ACM Press, 2000.
M. Granitzer. Hierarchical Text Classification Using Methods from Machine Learning. PhD thesis, Graz University of Technology, 2003.
W. R. Hersh. The OHSUMED test collection, 1994. Available electronically at http://medir.ohsu.edu/pub/ohsumed/.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
T. Hofmann, L. Cai, and M. Ciaramita. Learning with taxonomies: classifying documents and words. In NIPS 2003 Workshop on Syntax, Semantics, and Statistics, 2003.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
D. Koller and M. Sahami. Hierarchically classifying documents using very few words. In Proceedings of the 14th International Conference on Machine Learning, pages 170–178, 1997.
T. L. Lai, H. Robbins, and C. Z. Wei. Strong consistency of least squares estimates in multiple regression. Proceedings of the National Academy of Sciences USA, 75(7):3034–3036, 1979.
T. L. Lai and C. Z. Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154–166, 1982.
A. McCallum. Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering, 2004. Available electronically at http://www-2.cs.cmu.edu/~mccallum/bow/.
A. K. McCallum, R. Rosenfeld, T. M. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 15th International Conference on Machine Learning, pages 359–367. Morgan Kaufmann Publishers, 1998.
D. Mladenic. Turning Yahoo into an automatic web-page classifier. In Proceedings of the 13th European Conference on Artificial Intelligence, pages 473–474, 1998.
A. B. J. Novikov. On convergence proofs on Perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, vol. XII, pages 615–622, 1962.
Reuters. Reuters corpus volume 1, 2000. Available electronically at http://about.reuters.com/researchandstandards/corpus/.
R. Rifkin, G. Yeo, and T. Poggio. Regularized least squares classification. In Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, 190:131–153, 2003.
F. Rosenblatt. The Perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408, 1958.
M. E. Ruiz and P. Srinivasan. Hierarchical text categorization using neural networks. Information Retrieval, 5(1):87–118, 2002.
B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12:1207–1245, 2000.
A. Sun and E. P. Lim. Hierarchical text classification and evaluation. In Proceedings of the 2001 International Conference on Data Mining, pages 521–528. IEEE Press, 2001.
V. Vapnik. Statistical Learning Theory. Wiley, 1998.
V. Vovk. Competitive on-line statistics. International Statistical Review, 69:213–248, 2001.