jmlr jmlr2006 jmlr2006-37 jmlr2006-37-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicolò Cesa-Bianchi, Claudio Gentile, Luca Zaniboni
Abstract: We study the problem of classifying data in a given taxonomy when classifications associated with multiple and/or partial paths are allowed. We introduce a new algorithm that incrementally learns a linear-threshold classifier for each node of the taxonomy. A hierarchical classification is obtained by evaluating the trained node classifiers in a top-down fashion. To evaluate classifiers in our multipath framework, we define a new hierarchical loss function, the H-loss, capturing the intuition that whenever a classification mistake is made at a node of the taxonomy, no loss should be charged for any additional mistake occurring in the subtree of that node. Making no assumptions on the mechanism generating the data instances, and assuming a linear noise model for the labels, we bound the H-loss of our on-line algorithm in terms of the H-loss of a reference classifier knowing the true parameters of the label-generating process. We show that, in expectation, the excess cumulative H-loss grows at most logarithmically in the length of the data sequence. Furthermore, our analysis reveals the precise dependence of the rate of convergence on the eigenstructure of the data each node observes. Our theoretical results are complemented by a number of experiments on textual corpora. In these experiments we show that, after only one epoch of training, our algorithm performs much better than Perceptron-based hierarchical classifiers and comes reasonably close to a hierarchical support vector machine.
Keywords: incremental algorithms, online learning, hierarchical classification, second order perceptron, support vector machines, regret bound, loss function
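As a concrete illustration of the H-loss described in the abstract, the following minimal sketch charges a unit loss at a mispredicted node only when all of its ancestors are predicted correctly, so mistakes inside the subtree of an already-mistaken node add nothing. This is not the authors' code: the function name, the dictionary-based taxonomy encoding, the toy node names, and the uniform per-node cost are illustrative assumptions.

```python
# Hedged sketch of the H-loss: charge a mistake at a node only if every
# ancestor of that node was classified correctly; mistakes occurring in
# the subtree of an already-mistaken node are not charged.

def h_loss(parent, y_true, y_pred):
    """Hierarchical loss over a taxonomy.

    parent: dict mapping each node to its parent (None for roots).
    y_true, y_pred: dicts mapping each node to a 0/1 label.
    """
    loss = 0
    for node in parent:
        if y_pred[node] == y_true[node]:
            continue
        # Charge this mistake only if all ancestors are predicted correctly.
        ancestor, charged = parent[node], True
        while ancestor is not None:
            if y_pred[ancestor] != y_true[ancestor]:
                charged = False
                break
            ancestor = parent[ancestor]
        if charged:
            loss += 1  # unit per-node cost, an assumption of this sketch
    return loss


# Toy taxonomy (assumed for illustration):  root -> {a, b},  a -> {a1, a2}
parent = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a"}
y_true = {"root": 1, "a": 1, "b": 0, "a1": 1, "a2": 0}
y_pred = {"root": 1, "a": 0, "b": 0, "a1": 0, "a2": 1}

# Only the mistake at "a" is charged; the wrong labels at "a1" and "a2"
# lie in the subtree of the mistaken node "a", so they add nothing.
print(h_loss(parent, y_true, y_pred))  # -> 1
```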
References:
K. S. Azoury and M. K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3):211–246, 2001.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. Learning probabilistic linear-threshold classifiers via selective sampling. In Proceedings of the 16th Annual Conference on Computational Learning Theory, pages 373–386. LNAI 2777, Springer, 2003.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. Regret bounds for hierarchical classification with linear-threshold functions. In Proceedings of the 17th Annual Conference on Computational Learning Theory, pages 93–108. LNAI 3120, Springer, 2004.
C. C. Chang and C. J. Lin. LIBSVM: a library for support vector machines, 2004. Available electronically at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
O. Dekel, J. Keshet, and Y. Singer. An efficient online algorithm for hierarchical phoneme classification. In Proceedings of the 1st International Workshop on Machine Learning for Multimodal Interaction, pages 146–158. LNAI 3361, Springer, 2005.
O. Dekel, J. Keshet, and Y. Singer. Large margin hierarchical classification. In Proceedings of the 21st International Conference on Machine Learning. Omnipress, 2004.
S. T. Dumais and H. Chen. Hierarchical classification of web content. In Proceedings of the 23rd ACM International Conference on Research and Development in Information Retrieval, pages 256–263. ACM Press, 2000.
M. Granitzer. Hierarchical Text Classification Using Methods from Machine Learning. PhD thesis, Graz University of Technology, 2003.
W. R. Hersh. The OHSUMED test collection, 1994. Available electronically at http://medir.ohsu.edu/pub/ohsumed/.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
T. Hofmann, L. Cai, and M. Ciaramita. Learning with taxonomies: classifying documents and words. In NIPS 2003 Workshop on Syntax, Semantics, and Statistics, 2003.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
D. Koller and M. Sahami. Hierarchically classifying documents using very few words. In Proceedings of the 14th International Conference on Machine Learning, pages 170–178, 1997.
T. L. Lai, H. Robbins, and C. Z. Wei. Strong consistency of least squares estimates in multiple regression. Proceedings of the National Academy of Sciences USA, 75(7):3034–3036, 1979.
T. L. Lai and C. Z. Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154–166, 1982.
A. McCallum. Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering, 2004. Available electronically at http://www-2.cs.cmu.edu/~mccallum/bow/.
A. K. McCallum, R. Rosenfeld, T. M. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 15th International Conference on Machine Learning, pages 359–367. Morgan Kaufmann Publishers, 1998.
D. Mladenic. Turning Yahoo into an automatic web-page classifier. In Proceedings of the 13th European Conference on Artificial Intelligence, pages 473–474, 1998.
A. B. J. Novikov. On convergence proofs on Perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, vol. XII, pages 615–622, 1962.
Reuters. Reuters corpus volume 1, 2000. Available electronically at http://about.reuters.com/researchandstandards/corpus/.
R. Rifkin, G. Yeo, and T. Poggio. Regularized least squares classification. In Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, 190:131–153, 2003.
F. Rosenblatt. The Perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408, 1958.
M. E. Ruiz and P. Srinivasan. Hierarchical text categorization using neural networks. Information Retrieval, 5(1):87–118, 2002.
B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12:1207–1245, 2000.
A. Sun and E. P. Lim. Hierarchical text classification and evaluation. In Proceedings of the 2001 International Conference on Data Mining, pages 521–528. IEEE Press, 2001.
V. Vapnik. Statistical Learning Theory. Wiley, 1998.
V. Vovk. Competitive on-line statistics. International Statistical Review, 69:213–248, 2001.