jmlr jmlr2009 jmlr2009-86 jmlr2009-86-reference knowledge-graph by maker-knowledge-mining

86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms

Source: pdf

Author: Yihua Chen, Eric K. Garcia, Maya R. Gupta, Ali Rahimi, Luca Cazzanti

Abstract: This paper reviews and extends the ﬁeld of similarity-based classiﬁcation, presenting new analyses, algorithms, data sets, and a comprehensive set of experimental results for a rich collection of classiﬁcation problems. Speciﬁcally, the generalizability of using similarities as features is analyzed, design goals and methods for weighting nearest-neighbors for similarity-based learning are proposed, and different methods for consistently converting similarities into kernels are compared. Experiments on eight real data sets compare eight approaches and their variants to similarity-based learning. Keywords: similarity, dissimilarity, similarity-based learning, indeﬁnite kernels

reference text

S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. J. Molecular Biology, 215(3):403–410, Oct. 1990. A. Asuncion and D. J. Newman. UCI machine learning repository, 2007. URL http://www.ics. uci.edu/˜mlearn/MLRepository.html. M.-F. Balcan, A. Blum, and N. Srebro. A theory of learning with similarity functions. Machine Learning, 72(1–2):89–112, Aug. 2008a. M.-F. Balcan, A. Blum, and N. Srebro. Improved guarantees for learning via similarity functions. In Proc. Ann. Conf. Learning Theory, 2008b. P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. J. Machine Learning Research, 3:463–482, Nov. 2002. S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. and Machine Intel., 24(4):509–522, April 2002. I. Borg and P. J. F. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, 2nd edition, 2005. L. Cazzanti. Generative Models for Similarity-based Classiﬁcation. PhD thesis, Dept. of Electrical Engineerng, University of Washington, 2007. L. Cazzanti and M. R. Gupta. Local similarity discriminant analysis. In Proc. Intl. Conf. Machine Learning, 2007. L. Cazzanti, M. R. Gupta, and A. J. Koppal. Generative models for similarity-based classiﬁcation. Pattern Recognition, 41(7):2289–2297, July 2008. J. Chen and J. Ye. Training SVM with indeﬁnite kernels. In Proc. Intl. Conf. Machine Learning, 2008. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000. J. E. Driskell and T. McDonald. Identiﬁcation of incomplete networks. Technical report, Florida Maxima Corporation, 2008. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classiﬁcation. John Wiley & Sons, New York, 2nd edition, 2001. L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 2004. 773 C HEN , G ARCIA , G UPTA , R AHIMI AND C AZZANTI S. Feng, H. Krim, and I. A. Kogan. 3D face recognition using Euclidean integral invariants signature. In Proc. IEEE Workshop Statistical Signal Processing, 2007. M. P. Friedlander and M. R. Gupta. On minimizing distortion and relative entropy. IEEE Trans. Information Theory, 52(1):238–245, Jan. 2006. I. Gati and A. Tversky. Representations of qualitative and quantitative dimensions. J. Experimental Psychology: Human Perception & Performance, 8(2):325–340, April 1982. I. Gati and A. Tversky. Weighting common and distinctive features in perceptual and conceptual judgments. Cognitive Psychology, 16(3):341–370, July 1984. R. L. Goldstone and A. Kersten. Comprehensive Handbook of Psychology, volume 4, chapter 22: Concepts and Categorization, pages 599–621. Wiley, New Jersey, 2003. N. Goyal, Y. Lifshits, and H. Sch¨ tze. Disorder inequality: A combinatorial approach to nearest u neighbor search. In Proc. ACM Symposium Web Search and Data Mining, 2008. T. Graepel, R. Herbrich, P. Bollmann-Sdorra, and K. Obermayer. Classiﬁcation on pairwise proximity data. In Advances in Neural Information Processing Systems, 1998. T. Graepel, R. Herbrich, B. Sch¨ lkopf, A. Smola, P. Bartlett, K.-R. M¨ ller, K. Obermayer, and o u R. Williamson. Classiﬁcation on proximity data with LP–machines. In Proc. Intl. Conf. Artiﬁcial Neural Networks, 1999. K. Grauman and T. Darrell. The pyramid match kernel: Efﬁcient learning with sets of features. J. Machine Learning Research, 8:725–760, April 2007. M. R. Gupta, R. M. Gray, and R. A. Olshen. Nonparametric supervised learning by linear interpolation with maximum entropy. IEEE Trans. Pattern Anal. and Machine Intel., 28(5):766–781, May 2006. B. Haasdonk. Feature space interpretation of SVMs with indeﬁnite kernels. IEEE Trans. Pattern Anal. and Machine Intel., 27(4):482–492, April 2005. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2001. N. J. Higham. Computing a nearest symmetric positive semideﬁnite matrix. Linear Algebra and its Applications, 103:103–118, May 1988. S. Hochreiter and K. Obermayer. Support vector machines for dyadic data. Neural Computation, 18(6):1472–1510, June 2006. T. Hofmann and J. M. Buhmann. Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. and Machine Intel., 19(1):1–14, Jan. 1997. C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415–425, March 2002. 774 S IMILARITY- BASED C LASSIFICATION E. T. Jaynes. On the rationale for maximum entropy methods. Proc. IEEE, 70(9):939–952, Sept. 1982. T. Knebel, S. Hochreiter, and K. Obermayer. An SMO algorithm for the potential support vectormachine. Neural Computation, 20(1):271–287, Jan. 2008. J. Laub and K.-R. M¨ ller. Feature discovery in non-metric pairwise data. J. Machine Learning u Research, 5:801–808, July 2004. J. Laub, V. Roth, J. M. Buhmann, and K.-R. M¨ ller. On the information and representation of u non-Euclidean pairwise data. Pattern Recognition, 39(10):1815–1826, Oct. 2006. L. Liao and W. S. Noble. Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J. Computational Biology, 10 (6):857–868, 2003. H.-T. Lin and C.-J. Lin. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical report, National Taiwan University, March 2003. D. J. Lipman and W. R. Pearson. Rapid and sensitive protein similarity searches. Science, 227 (4693):1435–1441, March 1985. D. G. Lowe. Distinctive image features from scale-invariant keypoints. Intl. J. Computer Vision, 60 (2):91–110, 2004. R. Luss and A. d’Aspremont. Support vector machine classiﬁcation with indeﬁnite kernels. In Advances in Neural Information Processing Systems, 2007. C. S. Ong, X. Mary, S. Canu, and A. J. Smola. Learning with non-positive kernels. In Proc. Intl. Conf. Machine Learning, 2004. E. Pekalska and R. P. W. Duin. Dissimilarity representations allow for building good classiﬁers. Pattern Recognition Letters, 23(8):943–956, June 2002. E. Pekalska, P. Pacl´k, and R. P. W. Duin. A generalized kernel approach to dissimilarity-based ı classiﬁcation. J. Machine Learning Research, 2:175–211, Dec. 2001. S. Philips, J. Pitton, and L. Atlas. Perceptual feature identiﬁcation for active sonar echoes. In Proc. IEEE OCEANS Conf., 2006. J. C. Platt. Using analytic QP and sparseness to speed training of support vector machines. In Advances in Neural Information Processing Systems, 1998. R. M. Rifkin. Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning. PhD thesis, MIT, 2002. V. Roth, J. Laub, M. Kawanabe, and J. M. Buhmann. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Trans. Pattern Anal. and Machine Intel., 25(12):1540–1551, Dec. 2003. 775 C HEN , G ARCIA , G UPTA , R AHIMI AND C AZZANTI Y. Rubner, C. Tomasi, and L. J. Guibas. The earth mover’s distance as a metric for image retrieval. Intl. J. Computer Vision, 40(2):99–121, Nov. 2000. S. Santini and R. Jain. Similarity measures. IEEE Trans. Pattern Anal. and Machine Intel., 21(9): 871–883, Sept. 1999. B. Sch¨ lkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, o Optimization, and Beyond. MIT Press, Cambridge, MA, 2002. T. F. Smith and M. S. Waterman. Identiﬁcation of common molecular subsequences. J. Molecular Biology, 147(1):195–197, March 1981. C. Stanﬁll and D. Waltz. Toward memory-based reasoning. Comm. ACM, 29(12):1213–1228, Dec. 1986. R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statistical Society, Series B (Statistical Methodology), 58(1):267–288, 1996. A. Tversky. Features of similarity. Psychological Review, 84(2):327–352, July 1977. A. Tversky and I. Gati. Similarity, separability, and the triangle inequality. Psychological Review, 89(2):123–154, March 1982. L. Wang, C. Yang, and J. Feng. On learning with dissimilarity functions. In Proc. Intl. Conf. Machine Learning, 2007. G. Wu, E. Y. Chang, and Z. Zhang. An analysis of transformation on non-positive semideﬁnite similarity matrix for kernel machines. Technical report, University of California, Santa Barbara, March 2005. H. Zhang, A. C. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative nearest neighbor classiﬁcation for visual category recognition. In Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 2006. X. Zhu. Semi-supervised Learning with Graphs. PhD thesis, School of Computer Science, Carnegie Mellon University, 2005. H. Zou and T. Hastie. Regularization and variable selection via the elastic net. J. Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, April 2005. 776