nips nips2009 nips2009-102 nips2009-102-reference knowledge-graph by maker-knowledge-mining

102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models


Source: pdf

Author: Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han

Abstract: Ensemble classifiers such as bagging, boosting, and model averaging are known to improve accuracy and robustness over a single model. Their potential, however, is limited in applications that have no access to the raw data but only to the meta-level model output. In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic on which little work has been done. Although unsupervised models, such as clustering, do not directly generate a label prediction for each individual object, they provide useful constraints for the joint prediction of a set of related objects. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors smoothness of the prediction over the graph while penalizing deviations from the initial labeling provided by the supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes. Our method can also be interpreted as conducting a constrained embedding in a transformed space, or as a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives.
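
The abstract sketches the mechanics: objects and groups (the predicted classes of each supervised model and the clusters of each unsupervised model) form the two sides of a bipartite graph, and class-probability estimates are propagated between neighboring nodes until they stabilize. Below is a minimal sketch of that style of propagation, assuming a simple quadratic objective; the penalty weight alpha, the matrix layout, and the function name consensus_maximization are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def consensus_maximization(A, Y, supervised_mask, alpha=2.0, iters=100):
    # A: (n_objects, n_groups) 0/1 membership matrix; A[i, j] = 1 iff
    #    object i falls in group j (one predicted class of a supervised
    #    model, or one cluster of an unsupervised model).
    # Y: (n_groups, n_classes) initial labels; row j is one-hot for a
    #    supervised group and all zeros for a cluster group.
    # supervised_mask: (n_groups,) boolean, True for supervised groups.
    # Assumes every object belongs to at least one group and every group
    # is non-empty, so the denominators below never vanish.
    n_objects, n_classes = A.shape[0], Y.shape[1]
    U = np.full((n_objects, n_classes), 1.0 / n_classes)  # object estimates
    Q = Y.astype(float).copy()                            # group estimates
    s = supervised_mask.astype(float)[:, None]            # penalty switch

    for _ in range(iters):
        # Group update: average the estimates of neighboring objects,
        # pulled toward the initial labeling when the group is supervised.
        Q = (A.T @ U + alpha * s * Y) / (A.sum(axis=0)[:, None] + alpha * s)
        # Object update: average the estimates of neighboring groups.
        U = (A @ Q) / A.sum(axis=1, keepdims=True)
    return U  # (n_objects, n_classes) consolidated class probabilities

Under these assumptions, each pass averages probability estimates across the graph's edges: objects that share a cluster are pulled toward the same class, while supervised group nodes stay anchored near their initial labels. Taking the argmax of each returned row gives the consolidated prediction for that object.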


reference text

[1] 20 Newsgroups Data Set. http://people.csail.mit.edu/jrennie/20Newsgroups/.

[2] E. Bauer and R. Kohavi. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Machine Learning, 36:105–139, 1999.

[3] D. P. Bertsekas. Nonlinear Programming (2nd Edition). Athena Scientific, 1999.

[4] A. Blum and T. Mitchell. Combining Labeled and Unlabeled Data with Co-training. In Proc. of COLT ’98, pages 92–100, 1998.

[5] N. Borlin. Implementation of Hungarian Method. http://www.cs.umu.se/~niclas/matlab/assignprob/.

[6] R. Caruana. Multitask Learning. Machine Learning, 28:41–75, 1997.

[7] C.-C. Chang and C.-J. Lin. LibSVM: a Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[8] O. Chapelle, B. Schölkopf, and A. Zien (eds.). Semi-Supervised Learning. MIT Press, 2006.

[9] K. Crammer, M. Kearns, and J. Wortman. Learning from Multiple Sources. Journal of Machine Learning Research, 9:1757–1774, 2008.

[10] DBLP Bibliography. http://www.informatik.uni-trier.de/~ley/db/.

[11] T. Dietterich. Ensemble Methods in Machine Learning. In Proc. of MCS ’00, pages 1–15, 2000.

[12] X. Z. Fern and C. E. Brodley. Solving Cluster Ensemble Problems by Bipartite Graph Partitioning. In Proc. of ICML ’04, pages 281–288, 2004.

[13] K. Ganchev, J. Graca, J. Blitzer, and B. Taskar. Multi-view Learning over Structured and Non-identical Outputs. In Proc. of UAI ’08, pages 204–211, 2008.

[14] J. Gao, W. Fan, Y. Sun, and J. Han. Heterogeneous Source Consensus Learning via Decision Propagation and Negotiation. In Proc. of KDD ’09, pages 339–347, 2009.

[15] A. Genkin, D. D. Lewis, and D. Madigan. BBR: Bayesian Logistic Regression Software. http://stat.rutgers.edu/~madigan/BBR/.

[16] A. Goldberg and X. Zhu. Seeing Stars when There Aren’t Many Stars: Graph-based Semi-supervised Learning for Sentiment Categorization. In HLT-NAACL 2006 Workshop on TextGraphs, 2006.

[17] A. Gionis, H. Mannila, and P. Tsaparas. Clustering Aggregation. ACM Transactions on Knowledge Discovery from Data, 1(1), 2007.

[18] T. Haveliwala. Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search. IEEE Transactions on Knowledge and Data Engineering, 15(4):784–796, 2003.

[19] J. Hoeting, D. Madigan, A. Raftery, and C. Volinsky. Bayesian Model Averaging: a Tutorial. Statistical Science, 14:382–417, 1999.

[20] R. Jacobs, M. Jordan, S. Nowlan, and G. Hinton. Adaptive Mixtures of Local Experts. Neural Computation, 3:79–87, 1991.

[21] T. Joachims. Transductive Learning via Spectral Graph Partitioning. In Proc. of ICML ’03, pages 290–297, 2003.

[22] G. Karypis. CLUTO – Family of Data Clustering Software Tools. http://glaros.dtc.umn.edu/gkhome/views/cluto.

[23] A. McCallum, K. Nigam, J. Rennie, and K. Seymore. Automating the Construction of Internet Portals with Machine Learning. Information Retrieval Journal, 3:127–163, 2000.

[24] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford InfoLab, 1999.

[25] V. Singh, L. Mukherjee, J. Peng, and J. Xu. Ensemble Clustering using Semidefinite Programming. In Proc. of NIPS ’07, 2007.

[26] A. Strehl and J. Ghosh. Cluster Ensembles – a Knowledge Reuse Framework for Combining Multiple Partitions. Journal of Machine Learning Research, 3:583–617, 2003.

[27] D. Wolpert. Stacked Generalization. Neural Networks, 5:241–259, 1992.

[28] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf. Ranking on Data Manifolds. In Proc. of NIPS ’03, pages 169–176, 2003.

[29] X. Zhu. Semi-supervised Learning Literature Survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison, 2005.