jmlr jmlr2009 jmlr2009-86 jmlr2009-86-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yihua Chen, Eric K. Garcia, Maya R. Gupta, Ali Rahimi, Luca Cazzanti
Abstract: This paper reviews and extends the field of similarity-based classification, presenting new analyses, algorithms, data sets, and a comprehensive set of experimental results for a rich collection of classification problems. Specifically, the generalizability of using similarities as features is analyzed, design goals and methods for weighting nearest-neighbors for similarity-based learning are proposed, and different methods for consistently converting similarities into kernels are compared. Experiments on eight real data sets compare eight approaches and their variants to similarity-based learning. Keywords: similarity, dissimilarity, similarity-based learning, indefinite kernels
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. J. Molecular Biology, 215(3):403–410, Oct. 1990. A. Asuncion and D. J. Newman. UCI machine learning repository, 2007. URL http://www.ics. uci.edu/˜mlearn/MLRepository.html. M.-F. Balcan, A. Blum, and N. Srebro. A theory of learning with similarity functions. Machine Learning, 72(1–2):89–112, Aug. 2008a. M.-F. Balcan, A. Blum, and N. Srebro. Improved guarantees for learning via similarity functions. In Proc. Ann. Conf. Learning Theory, 2008b. P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. J. Machine Learning Research, 3:463–482, Nov. 2002. S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. and Machine Intel., 24(4):509–522, April 2002. I. Borg and P. J. F. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, 2nd edition, 2005. L. Cazzanti. Generative Models for Similarity-based Classification. PhD thesis, Dept. of Electrical Engineerng, University of Washington, 2007. L. Cazzanti and M. R. Gupta. Local similarity discriminant analysis. In Proc. Intl. Conf. Machine Learning, 2007. L. Cazzanti, M. R. Gupta, and A. J. Koppal. Generative models for similarity-based classification. Pattern Recognition, 41(7):2289–2297, July 2008. J. Chen and J. Ye. Training SVM with indefinite kernels. In Proc. Intl. Conf. Machine Learning, 2008. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000. J. E. Driskell and T. McDonald. Identification of incomplete networks. Technical report, Florida Maxima Corporation, 2008. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, New York, 2nd edition, 2001. L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 2004. 773 C HEN , G ARCIA , G UPTA , R AHIMI AND C AZZANTI S. Feng, H. Krim, and I. A. Kogan. 3D face recognition using Euclidean integral invariants signature. In Proc. IEEE Workshop Statistical Signal Processing, 2007. M. P. Friedlander and M. R. Gupta. On minimizing distortion and relative entropy. IEEE Trans. Information Theory, 52(1):238–245, Jan. 2006. I. Gati and A. Tversky. Representations of qualitative and quantitative dimensions. J. Experimental Psychology: Human Perception & Performance, 8(2):325–340, April 1982. I. Gati and A. Tversky. Weighting common and distinctive features in perceptual and conceptual judgments. Cognitive Psychology, 16(3):341–370, July 1984. R. L. Goldstone and A. Kersten. Comprehensive Handbook of Psychology, volume 4, chapter 22: Concepts and Categorization, pages 599–621. Wiley, New Jersey, 2003. N. Goyal, Y. Lifshits, and H. Sch¨ tze. Disorder inequality: A combinatorial approach to nearest u neighbor search. In Proc. ACM Symposium Web Search and Data Mining, 2008. T. Graepel, R. Herbrich, P. Bollmann-Sdorra, and K. Obermayer. Classification on pairwise proximity data. In Advances in Neural Information Processing Systems, 1998. T. Graepel, R. Herbrich, B. Sch¨ lkopf, A. Smola, P. Bartlett, K.-R. M¨ ller, K. Obermayer, and o u R. Williamson. Classification on proximity data with LP–machines. In Proc. Intl. Conf. Artificial Neural Networks, 1999. K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. J. Machine Learning Research, 8:725–760, April 2007. M. R. Gupta, R. M. Gray, and R. A. Olshen. Nonparametric supervised learning by linear interpolation with maximum entropy. IEEE Trans. Pattern Anal. and Machine Intel., 28(5):766–781, May 2006. B. Haasdonk. Feature space interpretation of SVMs with indefinite kernels. IEEE Trans. Pattern Anal. and Machine Intel., 27(4):482–492, April 2005. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2001. N. J. Higham. Computing a nearest symmetric positive semidefinite matrix. Linear Algebra and its Applications, 103:103–118, May 1988. S. Hochreiter and K. Obermayer. Support vector machines for dyadic data. Neural Computation, 18(6):1472–1510, June 2006. T. Hofmann and J. M. Buhmann. Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. and Machine Intel., 19(1):1–14, Jan. 1997. C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415–425, March 2002. 774 S IMILARITY- BASED C LASSIFICATION E. T. Jaynes. On the rationale for maximum entropy methods. Proc. IEEE, 70(9):939–952, Sept. 1982. T. Knebel, S. Hochreiter, and K. Obermayer. An SMO algorithm for the potential support vectormachine. Neural Computation, 20(1):271–287, Jan. 2008. J. Laub and K.-R. M¨ ller. Feature discovery in non-metric pairwise data. J. Machine Learning u Research, 5:801–808, July 2004. J. Laub, V. Roth, J. M. Buhmann, and K.-R. M¨ ller. On the information and representation of u non-Euclidean pairwise data. Pattern Recognition, 39(10):1815–1826, Oct. 2006. L. Liao and W. S. Noble. Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J. Computational Biology, 10 (6):857–868, 2003. H.-T. Lin and C.-J. Lin. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical report, National Taiwan University, March 2003. D. J. Lipman and W. R. Pearson. Rapid and sensitive protein similarity searches. Science, 227 (4693):1435–1441, March 1985. D. G. Lowe. Distinctive image features from scale-invariant keypoints. Intl. J. Computer Vision, 60 (2):91–110, 2004. R. Luss and A. d’Aspremont. Support vector machine classification with indefinite kernels. In Advances in Neural Information Processing Systems, 2007. C. S. Ong, X. Mary, S. Canu, and A. J. Smola. Learning with non-positive kernels. In Proc. Intl. Conf. Machine Learning, 2004. E. Pekalska and R. P. W. Duin. Dissimilarity representations allow for building good classifiers. Pattern Recognition Letters, 23(8):943–956, June 2002. E. Pekalska, P. Pacl´k, and R. P. W. Duin. A generalized kernel approach to dissimilarity-based ı classification. J. Machine Learning Research, 2:175–211, Dec. 2001. S. Philips, J. Pitton, and L. Atlas. Perceptual feature identification for active sonar echoes. In Proc. IEEE OCEANS Conf., 2006. J. C. Platt. Using analytic QP and sparseness to speed training of support vector machines. In Advances in Neural Information Processing Systems, 1998. R. M. Rifkin. Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning. PhD thesis, MIT, 2002. V. Roth, J. Laub, M. Kawanabe, and J. M. Buhmann. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Trans. Pattern Anal. and Machine Intel., 25(12):1540–1551, Dec. 2003. 775 C HEN , G ARCIA , G UPTA , R AHIMI AND C AZZANTI Y. Rubner, C. Tomasi, and L. J. Guibas. The earth mover’s distance as a metric for image retrieval. Intl. J. Computer Vision, 40(2):99–121, Nov. 2000. S. Santini and R. Jain. Similarity measures. IEEE Trans. Pattern Anal. and Machine Intel., 21(9): 871–883, Sept. 1999. B. Sch¨ lkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, o Optimization, and Beyond. MIT Press, Cambridge, MA, 2002. T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. J. Molecular Biology, 147(1):195–197, March 1981. C. Stanfill and D. Waltz. Toward memory-based reasoning. Comm. ACM, 29(12):1213–1228, Dec. 1986. R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statistical Society, Series B (Statistical Methodology), 58(1):267–288, 1996. A. Tversky. Features of similarity. Psychological Review, 84(2):327–352, July 1977. A. Tversky and I. Gati. Similarity, separability, and the triangle inequality. Psychological Review, 89(2):123–154, March 1982. L. Wang, C. Yang, and J. Feng. On learning with dissimilarity functions. In Proc. Intl. Conf. Machine Learning, 2007. G. Wu, E. Y. Chang, and Z. Zhang. An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines. Technical report, University of California, Santa Barbara, March 2005. H. Zhang, A. C. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, 2006. X. Zhu. Semi-supervised Learning with Graphs. PhD thesis, School of Computer Science, Carnegie Mellon University, 2005. H. Zou and T. Hastie. Regularization and variable selection via the elastic net. J. Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, April 2005. 776