
105 nips-2005-Large-Scale Multiclass Transduction

Source: pdf

Author: Thomas Gärtner, Quoc V. Le, Simon Burton, Alex J. Smola, Vishy Vishwanathan

Abstract: We present a method for performing transductive inference on very large datasets. Our algorithm is based on multiclass Gaussian processes and is effective whenever the multiplication of the kernel matrix or its inverse with a vector can be computed sufficiently fast. This holds, for instance, for certain graph and string kernels. Transduction is achieved by variational inference over the unlabeled data subject to a balancing constraint.
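To make the "fast multiplication" condition concrete, here is a minimal sketch in Python. It assumes a regularized Laplacian graph kernel K = (I + sigma2 * L)^{-1} built from the unnormalized Laplacian L = D - A. That kernel is chosen only for illustration and is not necessarily the one used in the paper. The function name `inverse_kernel_matvec` and the parameter `sigma2` are hypothetical. For such a kernel, K^{-1} v = v + sigma2 * L v, which costs time proportional to the number of edges and never forms K.

```python
def inverse_kernel_matvec(adjacency, v, sigma2=1.0):
    """Return K^{-1} v for the assumed kernel K = (I + sigma2 * L)^{-1}.

    adjacency[i] lists the neighbours of node i in an undirected,
    unweighted graph; L = D - A is its unnormalized Laplacian.
    """
    result = []
    for i, neighbours in enumerate(adjacency):
        # (L v)_i = degree(i) * v_i - sum of v_j over neighbours j
        laplacian_term = len(neighbours) * v[i] - sum(v[j] for j in neighbours)
        result.append(v[i] + sigma2 * laplacian_term)
    return result


# Path graph 0 - 1 - 2 (illustrative data only)
adjacency = [[1], [0, 2], [1]]
out = inverse_kernel_matvec(adjacency, [1.0, 0.0, -1.0], sigma2=0.5)
constant = inverse_kernel_matvec(adjacency, [2.0, 2.0, 2.0], sigma2=0.5)
```

A constant vector is returned unchanged because it lies in the null space of the Laplacian. An iterative solver such as conjugate gradients could use this routine to apply K itself without ever storing a dense matrix.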

reference text

[1] K. Bennett. Combining support vector and mathematical programming methods for classification. In Advances in Kernel Methods - Support Vector Learning, pages 307–326. MIT Press, 1998.

[2] K. Bennett. Combining support vector and mathematical programming methods for induction. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - SV Learning, pages 307–326, Cambridge, MA, 1999. MIT Press.

[3] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.

[4] T. Joachims. Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston, May 2002. ISBN 0-7923-7679-X.

[5] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.

[6] S. L. Lauritzen. Graphical Models. Oxford University Press, 1996.

[7] A. J. Smola and I. R. Kondor. Kernels and regularization on graphs. In B. Schölkopf and M. K. Warmuth, editors, Proceedings of the Annual Conference on Computational Learning Theory, Lecture Notes in Computer Science. Springer, 2003.

[8] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.

[9] S. V. N. Vishwanathan and A. J. Smola. Fast kernels for string and tree matching. In K. Tsuda, B. Schölkopf, and J. P. Vert, editors, Kernels and Bioinformatics, Cambridge, MA, 2004. MIT Press.

[10] C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 20(12):1342–1351, 1998.

[11] D. Zhou, J. Huang, and B. Schölkopf. Learning from labeled and unlabeled data on a directed graph. In International Conference on Machine Learning, 2005.

[12] X. Zhu, J. Lafferty, and Z. Ghahramani. Semi-supervised learning using Gaussian fields and harmonic functions. In International Conference on Machine Learning (ICML'03), 2003.