nips nips2002 nips2002-98 knowledge-graph by maker-knowledge-mining

98 nips-2002-Going Metric: Denoising Pairwise Data


Source: pdf

Author: Volker Roth, Julian Laub, Klaus-Robert Müller, Joachim M. Buhmann

Abstract: Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. [sent-13, score-0.059]

2 First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. [sent-15, score-0.277]

3 The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. [sent-16, score-0.956]

4 Based on this new vectorial representation, denoising methods are applied in a second step. [sent-17, score-0.299]

5 Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. [sent-18, score-0.452]

6 We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. [sent-19, score-0.152]

7 1 Introduction Unsupervised grouping or clustering aims at extracting hidden structure from data (see e. [sent-20, score-0.346]

8 bioinformatics or imaging, the data is solely available as scores of pairwise comparisons. [sent-25, score-0.371]

9 Pairwise data is in no natural way related to the common viewpoint of objects lying in some "well behaved" space like a vector space. [sent-26, score-0.072]

10 Two cases should be distinguished: (i) The triangle inequality might not be satisfied as a result of noisy measurements (for instance using string alignment algorithms in DNA analysis). [sent-28, score-0.083]

11 This paper proposes an algorithm to metricize and subsequently denoise pairwise data. [sent-35, score-0.297]

12 [14]) for metrization, then constructs a positive semidefinite matrix which can subsequently be used for denoising and clustering purposes. [sent-37, score-0.464]

13 The next section introduces techniques for metrization, denoising and clustering pairwise data. [sent-39, score-0.631]

14 This is followed by a section illustrating our methods on real-world data such as bacterial GyrB amino acid sequences and sequences from the ProDom database, and a brief discussion. [sent-40, score-0.617]

15 2 Proximity-based clustering and denoising One of the most popular methods for grouping vectorial data is k-means clustering (see e. [sent-41, score-0.818]

16 It derives a set of k prototype vectors which quantize the data set with minimal quantization error. [sent-44, score-0.12]
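
As a small illustration of the quantization error that k-means minimizes, here is a minimal numpy sketch (function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def quantization_error(X, prototypes, assignment):
    # Sum of squared Euclidean distances between each data point and the
    # prototype vector of the cluster it is assigned to.
    return float(np.sum((X - prototypes[assignment]) ** 2))

# toy usage: 4 points, 2 prototypes
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
prototypes = np.array([[0.05, 0.0], [0.95, 1.05]])
print(quantization_error(X, prototypes, assignment=np.array([0, 0, 1, 1])))
```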

17 Partitioning proximity data is considered a much harder problem, since the inherent structure of n samples is hidden in n^2 pairwise relations. [sent-45, score-0.352]

18 The pairwise proximities can violate the requirements of a distance measure, i. [sent-46, score-0.406]

19 Thus, a loss-free embedding into a vector space is not possible, so that grouping problems of this kind cannot be directly transformed into a vectorial representation by means of classical embedding strategies such as multi-dimensional scaling (MDS [4]). [sent-49, score-0.821]

20 Moreover, clustering the MDS-embedded data vectors in general yields partitionings different from those obtained by directly solving the pairwise problem, since embedding constraints might conflict with the clustering goal. [sent-50, score-0.911]

21 The assignments of objects to clusters are encoded in the binary stochastic matrix M ∈ {0,1}^(n×k) with Σ_(ν=1..k) M_iν = 1 for each object i. [sent-52, score-0.296]
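
To make this encoding concrete, a small sketch (illustrative names, not from the paper) of building such a binary stochastic assignment matrix from a label vector:

```python
import numpy as np

def assignment_matrix(labels, k):
    # M[i, v] = 1 iff object i is assigned to cluster v; each row sums to 1.
    labels = np.asarray(labels)
    M = np.zeros((labels.size, k), dtype=int)
    M[np.arange(labels.size), labels] = 1
    return M

print(assignment_matrix([0, 1, 1, 0, 1], k=2))
```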

22 For such cost functions it can be shown [14] that there always exists a set of vectorial data representations, the constant shift embeddings, such that the grouping problem can be equivalently restated in terms of Euclidean distances between these vectors. [sent-53, score-0.576]

23 In order to handle non-symmetric dissimilarities, it should be noticed that H^pc is also invariant under symmetrizing transformations: Dij ← 1/2 (Dij + Dji). [sent-54, score-0.135]
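
A one-line sketch of the symmetrizing transformation, assuming the dissimilarities are stored as a numpy array (the helper name is illustrative):

```python
import numpy as np

def symmetrize(D):
    # D_ij <- (D_ij + D_ji) / 2; shift-invariant pairwise clustering costs
    # are unchanged by this transformation.
    return 0.5 * (D + D.T)
```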

24 Constant shift embedding Let D = (Dij) ∈ R^(n×n) be the matrix of pairwise squared dissimilarities between n objects. [sent-61, score-0.951]

25 This corresponds to a constant additive shift D̃ij = Dij + D0 for all i ≠ j. [sent-72, score-0.265]

26 We look for the minimal constant shift D0 such that D̃ satisfies the triangle inequality. [sent-73, score-0.301]

27 In order to make the main result clear, we first need to introduce the notion of a centralized matrix. [sent-74, score-0.106]

28 Let P be an arbitrary matrix and let Q = I - (1/n) ee^T. [sent-75, score-0.067]

29 Q is the projection matrix on the orthogonal complement of e. [sent-76, score-0.067]

30 That is, any matrix of the form (Sij + 1/2 s̃i + 1/2 s̃j) gives the same distance D as S, for arbitrary s̃i's. [sent-81, score-0.117]

31 By simple algebra it can be shown that S^c = -1/2 D^c, i. [sent-82, score-0.123]
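
A minimal sketch of the centralization step, assuming D holds the pairwise squared dissimilarities (the helper name is illustrative):

```python
import numpy as np

def centralize(D):
    # S^c = -1/2 * Q D Q, with Q = I - (1/n) e e^T the projection onto the
    # orthogonal complement of the all-ones vector e.
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * Q @ D @ Q
```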

32 Furthermore D derives from a squared Euclidean distance if and only if S^c is positive semidefinite [14]. [sent-85, score-0.132]

33 Let S̃^c = S^c - λ_n(S^c) I_n, where λ_n(·) is the minimal eigenvalue of its argument. [sent-86, score-0.078]

34 D0 = -2 λ_n(S^c) is the minimal constant such that D̃ = D + D0 (ee^T - I_n) derives from a squared Euclidean distance. [sent-92, score-0.115]
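
Putting the pieces together, a non-authoritative sketch of the constant-shift embedding, under the assumption that D is a symmetric dissimilarity matrix with zero diagonal (names are illustrative):

```python
import numpy as np

def constant_shift_embedding(D):
    # Returns the shifted, positive semidefinite Gram matrix
    # S~^c = S^c - lambda_n(S^c) * I, the minimal off-diagonal shift
    # D0 = -2 * lambda_n(S^c), and the shifted dissimilarities
    # D~ = D + D0 * (ee^T - I), which derive from squared Euclidean distances.
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    Sc = -0.5 * Q @ D @ Q                      # centralized score matrix S^c
    lam_min = np.linalg.eigvalsh(Sc).min()     # minimal eigenvalue lambda_n(S^c)
    Sc_shifted = Sc - lam_min * np.eye(n)      # PSD Gram matrix
    D0 = -2.0 * lam_min
    D_shifted = D + D0 * (np.ones((n, n)) - np.eye(n))
    return Sc_shifted, D_shifted, D0
```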

35 We have thus shown that applying large enough additive shifts to the off-diagonal elements of D results in a matrix S̃^c that is positive semidefinite, and can thus be interpreted as a Gram matrix. [sent-94, score-0.19]

36 This means that in some (n - 1)-dimensional Euclidean space there exists a vector representation of the objects, summarized in the "design" matrix X (the rows of X are the feature vectors), such that S̃^c = X X^T. [sent-95, score-0.19]

37 For the pairwise clustering cost function the optimal assignments of objects to clusters are invariant under the constant-shift embedding procedure, according to theorem 2. [sent-96, score-0.979]

38 Hence, the grouping problem can be re-formulated as optimizing the classical k-means criterion in the embedding space. [sent-98, score-0.45]

39 In many applications, however, it is advantageous not to cluster in the full space but to insert a dimension-reduction step that serves the purpose of increasing efficiency and reducing noise. [sent-99, score-0.12]

40 While it is unclear how to denoise the original pairwise object representations while respecting additivity, scale and shift invariance, and statistical robustness properties of the clustering criterion, we can easily apply kernel PCA [16] to S̃^c after the constant-shift embedding. [sent-100, score-0.762]

41 Denoising of pairwise data by Constant Shift Embedding For denoising we construct D which derives from "real" points in a vector space, i. [sent-101, score-0.471]

42 In a first step, we briefly describe how these real points can be recovered by loss-free kernel PCA [16]: (i) Calculate the centralized kernel matrix S̃^c = -1/2 Q D̃ Q. [sent-104, score-0.305]

43 (iii) Calculate the n × (n - 2) mapping matrix X_(n-2) = V_(n-2) (Λ_(n-2))^(1/2), where V_(n-2) = (v1, . [sent-119, score-0.067]

44 The rows of X_(n-2) contain the vectors {xi} (i = 1, 2, . [sent-134, score-0.288]

45 X*_t = V*_t (Λ*_t)^(1/2), where V*_t consists of the first t column vectors of V_(n-2) and Λ*_t is the top t×t submatrix of Λ_(n-2). The vectors in R^t then differ the least from the vectors in R^(n-2) in the sense of a quadratic error. [sent-143, score-0.126]
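
The embedding and truncation step can be sketched as follows; Sc_shifted stands for the positive semidefinite matrix produced by the constant-shift step (a hedged sketch, not the authors' code):

```python
import numpy as np

def denoised_embedding(Sc_shifted, t):
    # Eigendecompose the shifted Gram matrix and keep the top-t components:
    # X_t = V_t * Lambda_t^(1/2); rows of X_t are the denoised feature vectors.
    eigvals, eigvecs = np.linalg.eigh(Sc_shifted)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny numerical negatives
    return eigvecs[:, :t] * np.sqrt(eigvals[:t])
```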

46 The advantages of this method in comparison to directly applying classical scaling via MDS are: (i) t can be larger than the number p of positive eigenvalues, (ii) the embedded vectors are the best approximation, in the least-squares sense, to the optimal vectors which preserve the grouping structure. [sent-144, score-0.336]

47 1 Application on protein sequences Bacterial GyrB amino acid sequences We first illustrate our denoising technique on the gyrase subunit B. [sent-147, score-0.765]

48 The dataset consists of 84 amino acid sequences from five genera in Actinobacteria: 1: Corynebacterium, 2: Mycobacterium, 3: Gordonia, 4: Nocardia and 5: Rhodococcus. [sent-148, score-0.475]

49 The authors hinted at the possibility of computing the distance matrix by using BLAST scores [2], noting, however, that these scores could not be converted into positive semidefinite kernels. [sent-151, score-0.328]

50 In our experiment, the sequences have been aligned by the Smith-Waterman algorithm [11] which yields pairwise alignment scores. [sent-152, score-0.577]

51 Using constant shift embedding, a positive semidefinite kernel is obtained, leaving the cluster assignments unchanged for shift-invariant cost functions. [sent-153, score-1.02]

52 Several projections to lower dimensions have been tested and t = 5 turned out to be a good choice, eliminating the bulk of noise while retaining the essential cluster structure. [sent-155, score-0.12]

53 Figure 1 shows the striking improvement of the distance matrix after denoising. [sent-156, score-0.117]

54 On the left-hand side the ideal distance matrix is depicted, consisting solely of 0's (black) and 1's (white), reflecting the true cluster membership. [sent-157, score-0.283]

55 In the middle and on the right the original and the denoised distance matrix are shown, respectively. [sent-158, score-0.222]

56 Denoising visibly accentuates the cluster structure in the pairwise data. [sent-159, score-0.47]

57 Figure 1: Distance matrix: on the left, the ideal distance matrix reflects the true cluster structure; in the middle and on the right, the distance matrix before and after denoising. [sent-160, score-0.237]

58 Since we dispose of the true labels, we can quantitatively assess the improvement achieved by denoising. [sent-161, score-0.249]

59 We performed the usual k-means clustering, followed by majority voting to match the cluster labeling. [sent-162, score-0.155]
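
A sketch of this evaluation step, assuming integer class labels starting at 0 and using scikit-learn's KMeans purely for illustration (names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def clustering_error(X, true_labels, k):
    # Cluster the denoised vectors with k-means, map every cluster to its
    # majority class label, and report the resulting misclassification rate.
    true_labels = np.asarray(true_labels)
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    mapped = np.full_like(pred, -1)
    for c in range(k):
        members = true_labels[pred == c]
        if members.size:
            mapped[pred == c] = np.bincount(members).argmax()
    return float(np.mean(mapped != true_labels))
```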

60 For the denoised data we obtained 3 misclassifications (3. [sent-163, score-0.105]

61 This simple experiment corroborates the usefulness of our embedding and denoising strategy for pairwise data. [sent-166, score-0.691]

62 In order to fulfill the spirit of the theory of constant-shift embedding, the cost function of the data-mining algorithm subsequent to the embedding needs to be shift-invariant. [sent-167, score-0.459]

63 For the classification of genera 3 - 5 and 4 - 5 we obtain a substantial improvement by denoising. [sent-176, score-0.207]

64 Interestingly, this is not the case for genera 3-4, which may be due to the elimination of discriminative features by the denoising procedure. [sent-177, score-0.27]

65 The error is still significantly smaller than the error obtained by MCK2 and FK, which is in agreement with the superiority of a structure-preserving embedding of Smith-Waterman scores even when left undenoised: FK and MCK are kernels [sent-178, score-0.307]

66 Table 1: Comparison of the mean test error of supervised classification of genera by a linear SVM, with a training sample of 25% of the total sample. [sent-190, score-0.207]
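
A hedged sketch of such an evaluation protocol (a linear SVM trained on a 25% sample), using scikit-learn only for illustration; the exact classifier settings and the averaging over repeated splits used in the paper are not reproduced here:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def svm_test_error(X, y, train_fraction=0.25, seed=0):
    # Train a linear SVM on a 25% sample and report the test error;
    # the paper reports the mean over repeated random splits.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_fraction, random_state=seed, stratify=y)
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)
    return 1.0 - clf.score(X_te, y_te)
```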

67 The results for MCK2 (Marginalized Count Kernel) and FK (Fisher Kernel) are obtained by kernel Fisher discriminant analysis, which compares favorably to the SVM in several benchmarks [18]. [sent-191, score-0.066]

68 derived from a generative model, whereas the alignment scores are obtained from a matching algorithm specifically tuned for protein sequences, reflecting the underlying structure of protein data much better. [sent-192, score-0.401]

69 2 Clustering of ProDom sequences The analysis described in this section aims at finding a partition of domain sequences from the ProDom database [3] that is meaningful w. [sent-194, score-0.394]

70 In order to measure the quality of the grouping solution, we use the computed solution in a predictive way to assign group labels to SCOP sequences, which have been labeled by experts according to their structure [10]. [sent-198, score-0.215]

71 For demonstration purposes, we select the following subset of sequences from prodom2001. [sent-200, score-0.197]

72 srs: among all sequences we choose those which are highly similar to at least one sequence contained in the first four folds of the SCOP database. [sent-202, score-0.267]

73 Between these sequences, we compute pairwise (length-corrected and standardized) Smith-Waterman alignment scores, summarized in the matrix (Sij). [sent-203, score-0.447]

74 These similarities are transformed into dissimilarities by setting Dij := Sii + Sjj - 2Sij . [sent-204, score-0.088]
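
A sketch of this conversion, assuming S is a symmetric score matrix stored as a numpy array (the helper name is illustrative):

```python
import numpy as np

def scores_to_dissimilarities(S):
    # D_ij = S_ii + S_jj - 2 * S_ij turns similarity scores into squared dissimilarities.
    s = np.diag(S)
    return s[:, None] + s[None, :] - 2.0 * S
```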

75 The centralized score matrix S^c = -1/2 D^c possesses some highly negative eigenvalues, indicating that metric properties are violated. [sent-205, score-0.223]

76 Applying the constant-shift embedding method, a valid Mercer kernel is derived, with an eigenvalue spectrum that shows only a few dominating components over a broad "noise"-spectrum (see figure 2). [sent-206, score-0.341]

77 Extracting the first 16 leading principal components leads to a vector representation of the sequences as points in R^16. [sent-207, score-0.232]

78 The model order was selected by applying a re-sampling based stability analysis, which has been demonstrated to be a suitable model order selection criterion for unsupervised grouping problems in [13]. [sent-209, score-0.173]

79 In order to measure the quality of the grouping solution, all 1158 SCOP sequences from the first four folds are embedded into the 16-dimensional space. [sent-210, score-0.475]
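
A hedged sketch of how such a predictive assignment could look once the sequences are embedded; the centroids and the cluster-to-class mapping are assumed to come from the clustered training data, and all names are illustrative:

```python
import numpy as np

def predict_group_labels(X_new, centroids, cluster_to_class):
    # Assign each embedded sequence to its nearest cluster centroid and
    # translate the cluster index into a class label via the majority-vote mapping.
    d2 = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.array([cluster_to_class[c] for c in nearest])
```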

80 Figure 3 shows both the predicted group membership of these sequences and their true SCOP fold-label in the form of a bar diagram: the sequences are ordered by increasing group label (the lower horizontal bar), and compared with the true fold classification (upper bar). [sent-212, score-0.678]

81 In order to quantify the results, the inferred clusters are ... Footnote 2: "Highly similar" here means that the highest alignment score exceeds a predefined threshold. [sent-213, score-0.141]

82 We only used a subset of 800 randomly chosen proteins for estimating the 16 leading eigenvectors. [sent-216, score-0.077]

83 1158 SCOP sequences from folds 1-4 (Figure 3 panel title). [sent-222, score-0.267]

84 Figure 3: Visualization of cluster membership of the SCOP sequences contained in folds 1-4 (panel labels: Cluster 1, Errors majority voting, Cluster 2). [sent-235, score-0.472]

85 Despite this surprisingly high percentage, it is necessary to analyze the biological relevance of the inferred grouping solution more deeply. [sent-236, score-0.173]

86 In order to check to what extent the above "overall" result is influenced by artefacts due to highly related (or even almost identical) SCOP sequences, we repeated the analysis based on the subset of 128 SCOP sequences with less than 50% sequence identity (PDB50). [sent-237, score-0.243]

87 Predicting the group membership of these 128 sequences and using the same re-labeling approach, we can correctly identify 86% of the fold-labels. [sent-238, score-0.289]

88 4 Discussion and Conclusion This paper provides two main contributions that are highly useful when analyzing pairwise data. [sent-240, score-0.297]

89 First, we employ the concept of constant shift embedding to provide a metric representation of the data. [sent-241, score-0.548]

90 For a certain class of grouping principles sharing a shift-invariance property, this embedding is distortion-less in the sense that it does not influence the optimal assignments of objects to groups. [sent-242, score-0.577]

91 Given the metricized data, we can now use common signal (pre-)processing and denoising techniques that are typically only defined for vectorial data. [sent-243, score-0.299]

92 As we investigate the clustering of protein sequences from databases like GyrB and ProDom, we are given non-metric pairwise proximity information that is strongly deteriorated by the shortcomings of the available alignment procedures. [sent-244, score-0.939]

93 Thus, it is important to apply denoising techniques to the data as a second step before running the actual clustering procedure. [sent-245, score-0.334]

94 We find that the combination of these two processing steps is successful in unraveling protein structure, greatly improving over existing methods (as exemplified for GyrB and ProDom). [sent-246, score-0.099]

95 We will also explore the perspectives it opens in any field handling pairwise data. [sent-248, score-0.297]

96 Acknowledgments The gyrB amino acid sequences were offered by courtesy of the Identification and Classification of Bacteria (ICB) databank team [19]. [sent-249, score-0.337]

97 Prodom and prodom-cg: tools for protein domain analysis and whole genome comparisons. [sent-278, score-0.141]

98 Construction of the gyrb database for the identification and classification of bacteria. [sent-307, score-0.287]

99 Scop: a structural classification of proteins database for the investigation of sequences and structures. [sent-337, score-0.391]

100 Icb database: the gyrb database for identification and classification of bacteria. [sent-404, score-0.287]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('scop', 0.299), ('pairwise', 0.297), ('embedding', 0.233), ('shift', 0.226), ('dij', 0.212), ('sequences', 0.197), ('grouping', 0.173), ('clustering', 0.173), ('denoising', 0.161), ('genera', 0.138), ('vectorial', 0.138), ('noising', 0.132), ('prodom', 0.132), ('euclidian', 0.126), ('se', 0.123), ('cluster', 0.12), ('gyrb', 0.115), ('bonn', 0.115), ('centralized', 0.106), ('denoised', 0.105), ('assignments', 0.099), ('protein', 0.099), ('dissimilarities', 0.088), ('mds', 0.084), ('alignment', 0.083), ('likes', 0.079), ('metrization', 0.079), ('scores', 0.074), ('objects', 0.072), ('acid', 0.07), ('amino', 0.07), ('folds', 0.07), ('fk', 0.07), ('classification', 0.069), ('matrix', 0.067), ('roth', 0.067), ('kernel', 0.066), ('semidefinite', 0.063), ('pca', 0.063), ('marginalized', 0.059), ('violate', 0.059), ('clusters', 0.058), ('identification', 0.055), ('jd', 0.055), ('proximity', 0.055), ('germany', 0.053), ('additivity', 0.053), ('gyre', 0.053), ('hpc', 0.053), ('miv', 0.053), ('prod', 0.053), ('roemerstr', 0.053), ('symmetrizing', 0.053), ('undenoised', 0.053), ('visibly', 0.053), ('iii', 0.051), ('metric', 0.05), ('membership', 0.05), ('distance', 0.05), ('database', 0.048), ('eigenvalues', 0.048), ('invariant', 0.047), ('ii', 0.047), ('influenced', 0.046), ('bacterial', 0.046), ('laub', 0.046), ('sii', 0.046), ('sjj', 0.046), ('reflecting', 0.046), ('informatik', 0.046), ('sij', 0.046), ('triangular', 0.046), ('classical', 0.044), ('fold', 0.044), ('proteins', 0.042), ('derives', 0.042), ('vectors', 0.042), ('icb', 0.042), ('potsdam', 0.042), ('genome', 0.042), ('miiller', 0.042), ('group', 0.042), ('eigenvalue', 0.042), ('squared', 0.04), ('ee', 0.04), ('constant', 0.039), ('bar', 0.037), ('watanabe', 0.037), ('om', 0.037), ('berlin', 0.036), ('minimal', 0.036), ('structural', 0.035), ('noticed', 0.035), ('fraunhofer', 0.035), ('visualization', 0.035), ('bases', 0.035), ('voting', 0.035), ('embedded', 0.035), ('leading', 0.035), ('acids', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 98 nips-2002-Going Metric: Denoising Pairwise Data

Author: Volker Roth, Julian Laub, Klaus-Robert Müller, Joachim M. Buhmann

Abstract: Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1

2 0.20763263 145 nips-2002-Mismatch String Kernels for SVM Protein Classification

Author: Eleazar Eskin, Jason Weston, William S. Noble, Christina S. Leslie

Abstract: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of -length subsequences, counted with up to mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.

3 0.17367074 53 nips-2002-Clustering with the Fisher Score

Author: Koji Tsuda, Motoaki Kawanabe, Klaus-Robert Müller

Abstract: Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).

4 0.16685344 27 nips-2002-An Impossibility Theorem for Clustering

Author: Jon M. Kleinberg

Abstract: Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. 1

5 0.15825781 52 nips-2002-Cluster Kernels for Semi-Supervised Learning

Author: Olivier Chapelle, Jason Weston, Bernhard SchĂślkopf

Abstract: We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach. 1

6 0.13931715 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information

7 0.13569258 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

8 0.12366261 97 nips-2002-Global Versus Local Methods in Nonlinear Dimensionality Reduction

9 0.11481888 188 nips-2002-Stability-Based Model Selection

10 0.11380477 19 nips-2002-Adapting Codes and Embeddings for Polychotomies

11 0.11074137 119 nips-2002-Kernel Dependency Estimation

12 0.10992556 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs

13 0.088938788 69 nips-2002-Discriminative Learning for Label Sequences via Boosting

14 0.088588931 34 nips-2002-Artefactual Structure from Least-Squares Multidimensional Scaling

15 0.08847703 32 nips-2002-Approximate Inference and Protein-Folding

16 0.084655315 90 nips-2002-Feature Selection in Mixture-Based Clustering

17 0.083907917 156 nips-2002-On the Complexity of Learning the Kernel Matrix

18 0.083623424 83 nips-2002-Extracting Relevant Structures with Side Information

19 0.082615577 190 nips-2002-Stochastic Neighbor Embedding

20 0.081827484 87 nips-2002-Fast Transformation-Invariant Factor Analysis


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.236), (1, -0.12), (2, 0.069), (3, -0.017), (4, -0.154), (5, 0.042), (6, 0.035), (7, -0.12), (8, -0.115), (9, 0.199), (10, 0.139), (11, 0.094), (12, -0.166), (13, 0.105), (14, -0.087), (15, 0.061), (16, -0.002), (17, 0.11), (18, 0.037), (19, 0.076), (20, -0.009), (21, -0.121), (22, 0.018), (23, 0.073), (24, 0.024), (25, -0.001), (26, 0.089), (27, 0.13), (28, -0.008), (29, -0.038), (30, 0.053), (31, 0.079), (32, -0.022), (33, 0.067), (34, -0.03), (35, 0.035), (36, 0.036), (37, 0.044), (38, -0.025), (39, 0.065), (40, 0.005), (41, -0.041), (42, 0.026), (43, -0.028), (44, 0.162), (45, 0.058), (46, 0.008), (47, 0.029), (48, 0.017), (49, -0.054)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96218455 98 nips-2002-Going Metric: Denoising Pairwise Data

Author: Volker Roth, Julian Laub, Klaus-Robert Müller, Joachim M. Buhmann

Abstract: Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1

2 0.69981438 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information

Author: Eric P. Xing, Michael I. Jordan, Stuart Russell, Andrew Y. Ng

Abstract: Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider "similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points, learns a distance metric that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.

3 0.62484258 27 nips-2002-An Impossibility Theorem for Clustering

Author: Jon M. Kleinberg

Abstract: Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. 1

4 0.5869298 53 nips-2002-Clustering with the Fisher Score

Author: Koji Tsuda, Motoaki Kawanabe, Klaus-Robert Müller

Abstract: Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).

5 0.55322355 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs

Author: Gianluca Pollastri, Pierre Baldi, Alessandro Vullo, Paolo Frasconi

Abstract: We develop and test new machine learning methods for the prediction of topological representations of protein structures in the form of coarse- or fine-grained contact or distance maps that are translation and rotation invariant. The methods are based on generalized input-output hidden Markov models (GIOHMMs) and generalized recursive neural networks (GRNNs). The methods are used to predict topology directly in the fine-grained case and, in the coarse-grained case, indirectly by first learning how to score candidate graphs and then using the scoring function to search the space of possible configurations. Computer simulations show that the predictors achieve state-of-the-art performance. 1

6 0.52782309 52 nips-2002-Cluster Kernels for Semi-Supervised Learning

7 0.5233236 188 nips-2002-Stability-Based Model Selection

8 0.50997096 145 nips-2002-Mismatch String Kernels for SVM Protein Classification

9 0.50922894 190 nips-2002-Stochastic Neighbor Embedding

10 0.4804121 97 nips-2002-Global Versus Local Methods in Nonlinear Dimensionality Reduction

11 0.46476489 107 nips-2002-Identity Uncertainty and Citation Matching

12 0.45479834 34 nips-2002-Artefactual Structure from Least-Squares Multidimensional Scaling

13 0.43518215 158 nips-2002-One-Class LP Classifiers for Dissimilarity Representations

14 0.42620343 119 nips-2002-Kernel Dependency Estimation

15 0.41788584 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains

16 0.40192345 32 nips-2002-Approximate Inference and Protein-Folding

17 0.37470424 131 nips-2002-Learning to Classify Galaxy Shapes Using the EM Algorithm

18 0.36499691 124 nips-2002-Learning Graphical Models with Mercer Kernels

19 0.36300802 117 nips-2002-Intrinsic Dimension Estimation Using Packing Numbers

20 0.36286724 87 nips-2002-Fast Transformation-Invariant Factor Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(11, 0.034), (14, 0.016), (23, 0.011), (42, 0.069), (54, 0.153), (55, 0.033), (68, 0.023), (74, 0.083), (79, 0.028), (87, 0.369), (92, 0.03), (98, 0.076)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90626949 198 nips-2002-Theory-Based Causal Inference

Author: Joshua B. Tenenbaum, Thomas L. Griffiths

Abstract: People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference.

same-paper 2 0.83406913 98 nips-2002-Going Metric: Denoising Pairwise Data

Author: Volker Roth, Julian Laub, Klaus-Robert Müller, Joachim M. Buhmann

Abstract: Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1

3 0.73824793 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization

Author: Harald Steck, Tommi S. Jaakkola

Abstract: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter - often interpreted as

4 0.55154383 40 nips-2002-Bayesian Models of Inductive Generalization

Author: Neville E. Sanjana, Joshua B. Tenenbaum

Abstract: We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam’s razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.

5 0.54875857 75 nips-2002-Dynamical Causal Learning

Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes nets (though for different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1

6 0.52452421 48 nips-2002-Categorization Under Complexity: A Unified MDL Account of Human Learning of Regular and Irregular Categories

7 0.52268523 53 nips-2002-Clustering with the Fisher Score

8 0.51121682 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

9 0.51043713 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs

10 0.50846922 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

11 0.50769579 119 nips-2002-Kernel Dependency Estimation

12 0.50336629 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy

13 0.50289243 124 nips-2002-Learning Graphical Models with Mercer Kernels

14 0.50215846 163 nips-2002-Prediction and Semantic Association

15 0.50157362 188 nips-2002-Stability-Based Model Selection

16 0.49742162 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

17 0.49638784 145 nips-2002-Mismatch String Kernels for SVM Protein Classification

18 0.49515086 137 nips-2002-Location Estimation with a Differential Update Network

19 0.48947707 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks

20 0.48882321 191 nips-2002-String Kernels, Fisher Kernels and Finite State Automata