
80 nips-2007-Ensemble Clustering using Semidefinite Programming

Source: pdf

Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu

Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases.
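
To make the problem setting concrete, here is a minimal Python sketch of the kind of pairwise, voting-style agreement that the abstract uses as a point of comparison. It is not the 2D string encoding measure or the SDP formulation proposed in the paper, neither of which is specified in this listing. The function names `co_association` and `pairwise_agreement`, and the toy label vectors, are assumptions made for illustration only.

```python
from itertools import combinations


def co_association(clusterings):
    """Return an n x n matrix whose (i, j) entry is the fraction of input
    clusterings that place items i and j in the same cluster.
    (Hypothetical helper, not taken from the paper.)"""
    n = len(clusterings[0])
    m = len(clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def pairwise_agreement(candidate, clusterings):
    """Fraction of (item pair, input clustering) combinations on which the
    candidate makes the same together/apart decision as the input.
    This is a voting-style score, assumed here as a simple baseline."""
    pairs = list(combinations(range(len(candidate)), 2))
    agree = 0
    for labels in clusterings:
        for i, j in pairs:
            together_candidate = candidate[i] == candidate[j]
            together_input = labels[i] == labels[j]
            if together_candidate == together_input:
                agree += 1
    return agree / (len(pairs) * len(clusterings))


# Toy ensemble: three clusterings of four items. The first two are the same
# partition with permuted label names; the third differs.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
co = co_association(ensemble)
score = pairwise_agreement([0, 0, 1, 1], ensemble)
```

Because the score compares only together/apart decisions, it is unaffected by how cluster labels are named in each input, which is why the first two clusterings in the toy ensemble count as identical.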

reference text

[1] V. Filkov and S. Skiena. Integrating microarray data by consensus clustering. In Proc. of International Conference on Tools with Artificial Intelligence, page 418, 2003.

[2] X. Z. Fern and C. E. Brodley. Solving cluster ensemble problems by bipartite graph partitioning. In Proc. of International Conference on Machine Learning, page 36, 2004.

[3] A. Strehl and J. Ghosh. Cluster Ensembles – A Knowledge Reuse Framework for Combining Partitionings. In Proc. of AAAI 2002, pages 93–98, 2002.

[4] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. In Proc. Symposium on Foundations of Computer Science, page 238, 2002.

[5] S. Monti, P. Tamayo, J. Mesirov, and T. Golub. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn., 52(1-2):91–118, 2003.

[6] A. Gionis, H. Mannila, and P. Tsaparas. Clustering aggregation. In Proc. of International Conference on Data Engineering, pages 341–352, 2005.

[7] N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: ranking and clustering. In Proc. of Symposium on Theory of Computing, pages 684–693, 2005.

[8] M. Charikar, V. Guruswami, and A. Wirth. Clustering with qualitative information. J. Comput. Syst. Sci., 71(3):360–383, 2005.

[9] X. Z. Fern and C. E. Brodley. Random projection for high dimensional data clustering: A cluster ensemble approach. In Proceedings of International Conference on Machine Learning, 2003.

[10] V. Singh. On Several Geometric Optimization Problems in Biomedical Computation. PhD thesis, State University of New York at Buffalo, 2007.

[11] L. Gasieniec, J. Jansson, and A. Lingas. Approximation algorithms for Hamming clustering problems. In Proc. of Symposium on Combinatorial Pattern Matching, pages 108–118, 2000.

[12] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, 2004.

[13] J. Peng and Y. Wei. Approximating k-means-type clustering via semidefinite programming. SIAM Journal on Optimization, 18(1):186–205, 2007.

[14] A. D. Gordon and J. T. Henderson. An algorithm for Euclidean sum of squares classification. Biometrics, 33:355–362, 1977.

[15] J. F. Sturm. Using SeDuMi 1.02, A Matlab Toolbox for Optimization over Symmetric Cones. Optimization Methods and Software, 11-12:625–653, 1999.

[16] J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In CCA/ISIC/CACSD, September 2004.