Author: Novi Quadrianto, James Petterson, Tibério S. Caetano, Alex J. Smola, S. V. N. Vishwanathan
Abstract: We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces, such as integrating Yahoo! and DMOZ web directories.
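As a rough illustration of the quantity the abstract refers to, the sketch below computes the empirical mutual information between two label sequences assigned to the same items. It is not the paper's objective or optimization procedure; the function name `mutual_information` and the toy labels are hypothetical, chosen only to show how agreement between two distinct label sets can be measured without a label mapping.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Empirical mutual information (in nats) between two paired label sequences.

    Illustrative helper only (hypothetical name); the two sequences may use
    entirely different label vocabularies.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # co-occurrence counts
    count_a = Counter(labels_a)               # marginal counts, first label set
    count_b = Counter(labels_b)               # marginal counts, second label set
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy example: two directories that label the same four pages differently.
aligned = mutual_information(["arts", "arts", "sci", "sci"], ["A", "A", "S", "S"])
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy case the first pair of labelings is a one-to-one relabeling, so the mutual information equals the entropy of either labeling, while the second pair is statistically independent and yields zero.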
[1] R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.
[2] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. Mach. Learn., 73(3):243–272, 2008.
[3] Kai Yu, Volker Tresp, and Anton Schwaighofer. Learning Gaussian processes from multiple tasks. In ICML ’05: Proceedings of the 22nd international conference on Machine learning, pages 1012–1019, New York, NY, USA, 2005. ACM.
[4] Rie Kubota Ando and Tong Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, 2005.
[5] Y. Altun and A.J. Smola. Unifying divergence minimization and statistical inference via convex duality. In H.U. Simon and G. Lugosi, editors, Proc. Annual Conf. Computational Learning Theory, LNCS, pages 139–153. Springer, 2006.
[6] T. Pham Dinh and L. Hoai An. A D.C. optimization algorithm for solving the trust-region subproblem. SIAM Journal on Optimization, 8(2):476–505, 1998.
[7] G. Obozinski, B. Taskar, and M. I. Jordan. Multi-task feature selection. Technical report, U.C. Berkeley, 2007.
[8] Remi Flamary, Alain Rakotomamonjy, Gilles Gasso, and Stephane Canu. SVM multi-task learning and non convex sparsity measure. In The Learning Workshop, 2009.
[9] Theodoros Evgeniou, Charles A. Micchelli, and Massimiliano Pontil. Learning multiple tasks with kernel methods. J. Mach. Learn. Res., 6:615–637, 2005.
[10] K. Crammer, M. Kearns, and J. Wortman. Learning from multiple sources. In NIPS 19, pages 321–328. MIT Press, 2007.
[11] Shai Ben-David, Johannes Gehrke, and Reba Schuller. A theoretical framework for learning from a pool of disparate data sources. In KDD ’02: Proceedings of the 8th ACM international conference on Knowledge discovery and data mining, pages 443–449. ACM, 2002.
[12] M. Dudík and R. E. Schapire. Maximum entropy distribution estimation with generalized regularization. In Gábor Lugosi and Hans U. Simon, editors, Proc. Annual Conf. Computational Learning Theory. Springer Verlag, June 2006.
[13] Nadia Ghamrawi and Andrew McCallum. Collective multi-label classification. In CIKM ’05: Proceedings of the 14th ACM international conference on Information and knowledge management, pages 195–200, New York, NY, USA, 2005. ACM.
[14] A.L. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 15:915–936, 2003.
[15] A. J. Smola, S. V. N. Vishwanathan, and T. Hofmann. Kernel methods for missing variables. In R.G. Cowell and Z. Ghahramani, editors, Proceedings of International Workshop on Artificial Intelligence and Statistics, pages 325–332, 2005.
[16] Bharath Sriperumbudur and Gert Lanckriet. On the convergence of the concave-convex procedure. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1759–1767. MIT Press, 2009.
[17] B. Schölkopf. Support Vector Learning. R. Oldenbourg Verlag, Munich, 1997. Download: http://www.kernel-machines.org.
[18] David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research, 5:361–397, 2004.
[19] Lijuan Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In Proceedings of the Thirteenth ACM conference on Information and knowledge management, pages 78–87, New York, NY, USA, 2004. ACM Press.
[20] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25:25–29, 2000.
[21] J. M. Borwein and Q. J. Zhu. Techniques of Variational Analysis. CMS books in Mathematics. Canadian Mathematical Society, 2005.