jmlr jmlr2007 jmlr2007-32 jmlr2007-32-reference knowledge-graph by maker-knowledge-mining

32 jmlr-2007-Euclidean Embedding of Co-occurrence Data

Source: pdf

Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby

Abstract: Embedding algorithms search for a low dimensional continuous representation of data, but most algorithms only handle objects of a single type for which pairwise distances are speciﬁed. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space, based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semideﬁnite matrices. The local structure of the embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text data sets, and show that it consistently and signiﬁcantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling, IsoMap and correspondence analysis. Keywords: embedding algorithms, manifold learning, exponential families, multidimensional scaling, matrix factorization, semideﬁnite programming

reference text

F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002. A. L. Berger, S.A. Della Pietra, and V.J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, 1996. D. P. Bertsekas. On the Goldstein-Levitin-Polyak gradient projection method. IEEE Transactions on Automatic Control, 21:174–184, 1976. E.D. Bolker and B. Roth. When is a bipartite graph a rigid framework? Paciﬁc Journal of Mathematics, 90:27–44, 1980. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge Univ. Press, 2004. M. Chalmers and P. Chitson. Bead: explorations in information visualization. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 330–337. ACM Press, New York, NY, 1992. G. Chechik and N. Tishby. Extracting relevant structures with side information. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 857–864. MIT Press, Cambridge, MA, 2003. M. Chiang. Geometric programming for communication systems. Foundations and Trends in Communications and Information Theory, 2(1):1–154, 2005. T.M. Cover and J.A Thomas. Elements of Information Theory. Wiley-Interscience, New York, 1991. T. Cox and M. Cox. Multidimensional Scaling. Chapman and Hall, London, 1984. M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, volume 6, pages 4734–4739. American Automatic Control Council, New York, 2001. R.A. Fisher. The precision of discriminant functions. Annals of Eugenics, London, 10:422–429, 1940. A. Globerson and N. Tishby. Sufﬁcient dimensionality reduction. Journal of Machine Learning Research, 3:1307–1331, 2003. A. Globerson, G. Chechik, F. Pereira, and N.Tishby. Euclidean embedding of co-occurrence data. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 497–504. MIT Press, Cambridge, MA, 2005. M.J. Greenacre. Theory and Applications of Correspondence Analysis. Academic Press, London, 1984. 2293 G LOBERSON , C HECHIK , P EREIRA AND T ISHBY J.H. Ham, D.D. Lee, and L.K. Saul. Learning high dimensional correspondences with low dimensional manifolds. In Proceedings of the 20th International Conference on Machine Learning. Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, pages 34–41, 2003. M.O. Hill. Correspondence analysis: A neglected multivariate method. Applied Statistics, 23(3): 340–354, 1974. G. Hinton and S.T. Roweis. Stochastic neighbor embedding. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 833–840. MIT Press, Cambridge, MA, 2003. T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1):177–196, 2001. H. Hotelling. The most predictable criterion. Journal of Educational Psychology, 26:139–142, 1935. T. Iwata, K. Saito, N. Ueda, S. Stromsten, T. Grifﬁths, and J. Tenenbaum. Parametric embedding for class visualization. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA, 2005. P.L. Lai and C. Fyfe. Kernel and nonlinear canonical correlation analysis. In International Joint Conference on Neural Networks, pages 365–378. IEEE Computer Society, Los Alamitos, CA, 2000. D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999. X. Lin. Map displays for information retrieval. Journal of the American Society for Information Science, 48(1):40–54, 1997. G. Mei and C. R. Shelton. Visualization of collaborative data. In R. Dechter and T. Richardson, editors, Proceedings of the Twenty-Second International Conference on Uncertainty in Artiﬁcial Intelligence, pages 341–348. AUAI Press, Arlington, VA, 2006. G. Michailidis and J. de Leeuw. The Giﬁ system of descriptive multivariate analysis. Statistical Science, 13(4):307–336, 1998. R. B. Nelsen. An Introduction to Copulas. Springer, New York, 1999. V.Y. Pan and Z.Q. Chen. The complexity of the matrix eigenproblem. In Proceedings of the ThirtyFirst Annual ACM Symposium on Theory of Computing, pages 507–516. ACM Press, New York, NY, 1999. S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000. S.T. Roweis. NIPS 0-12 data. http://www.cs.toronto.edu/∼roweis/data.html, 2000. 2294 E UCLIDEAN E MBEDDING OF C O - OCCURRENCE DATA J.B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000. L. Vandenberghe and S. Boyd. Semideﬁnite programming. SIAM Review, 38(1):49–95, 1996. K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semideﬁnite programming. International Journal of Computer Vision, 70(1):77–90, 2006. E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 505–512. MIT Press, Cambridge, MA, 2002. S. Yan D. Xu B. Zhang H.J. Zhang. Graph embedding: a general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 40–51, 2007. H. Zhong, J. Shi, and M. Visontai. Detecting unusual activity in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 819–826. IEEE Computer Society, Los-Alamitos, CA, 2004. 2295