nips nips2008 nips2008-203 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. [sent-4, score-0.822]
2 Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. [sent-5, score-0.614]
3 In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. [sent-6, score-1.567]
4 We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. [sent-7, score-1.094]
5 The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. [sent-8, score-0.382]
6 Classification of string data, sequences of discrete symbols, has attracted particular interest and has led to a number of new algorithms [1, 2, 3, 4]. [sent-10, score-0.307]
7 They exhibit state-of-the-art performance on tasks such as protein superfamily and fold prediction, music genre classification and document topic elucidation. [sent-11, score-0.63]
8 Classification of data in sequential domains is made challenging by the variability in the sequence lengths, potential existence of important features on multiple scales, as well as the size of the alphabets and datasets. [sent-12, score-0.194]
9 Typical alphabet sizes vary widely, ranging from 4 nucleotides in DNA sequences up to thousands of words from a language lexicon for text documents. [sent-13, score-0.267]
10 Strings within the same class, such as the proteins in one fold or documents about politics, can exhibit wide variability in the primary sequence content. [sent-14, score-0.232]
11 As a consequence, the resulting algorithms need the ability to efficiently handle large alphabets and datasets as well as establish measures of similarity under complex sequence transformations in order to accurately classify the data. [sent-16, score-0.287]
12 A number of state-of-the-art approaches to scoring similarity between pairs of sequences in a database rely on fixed, spectral representations of sequential data and the notion of mismatch kernels, cf. [sent-17, score-0.7]
13 However, computing those representations for large alphabet sizes and "loose" similarity models can be computationally challenging. [sent-22, score-0.231]
14 In this work we propose novel algorithms for modeling sequences under complex transformations (such as multiple insertions, deletions, mutations) that exhibit state-of-the-art performance on a variety of distinct classification tasks. [sent-26, score-0.126]
15 In particular, we present new algorithms for inexact (e.g. [sent-27, score-0.137]
16 with mismatches) string comparison that improve currently known time bounds for such tasks and show orders-of-magnitude running time improvements. [sent-29, score-0.337]
17 The algorithms rely on an efficient implicit computation of mismatch neighborhoods and k-mer statistics on sets of sequences. [sent-30, score-0.692]
18 This leads to a mismatch kernel algorithm with complexity O(c_{k,m} (|X| + |Y |)), where c_{k,m} is independent of the alphabet Σ. [sent-31, score-0.963]
19 The algorithm can be easily generalized to other families of string kernels, such as the spectrum and gapped kernels [6], as well as to semi-supervised settings such as the neighborhood kernel of [7]. [sent-32, score-0.846]
20 We demonstrate the benefits of our algorithms on many challenging classification problems, such as detecting homology (evolutionary similarity) of remotely related proteins, recognizing protein fold, and performing classification of music samples. [sent-33, score-0.535]
21 2 Related Work. Over the past decade, various methods have been proposed to solve the string classification problem, including generative approaches, such as HMMs, and discriminative approaches. [sent-36, score-0.215]
22 Sequence matching is frequently based on common co-occurrence of exact sub-patterns (k-mers, features), as in spectrum kernels [9] or substring kernels [10]. [sent-38, score-0.519]
23 Inexact comparison in this framework is typically achieved using different families of mismatch [3] or profile [2] kernels. [sent-39, score-0.583]
24 Both the spectrum-k and mismatch(k,m) kernels directly extract string features based on the observed sequence, X. [sent-40, score-0.341]
25 Constructing the profile for each sequence may not be practical in some application domains, since the size of the profile is dependent on the size of the alphabet set. [sent-43, score-0.348]
26 While for bio-sequences |Σ| = 4 or 20, for music or text classification |Σ| can potentially be very large, on the order of tens of thousands of symbols. [sent-44, score-0.151]
27 In this case, a very simple semi-supervised learning method, the sequence neighborhood kernel, can be employed [7] as an alternative to long k-mers with many mismatches. [sent-45, score-0.206]
28 The most efficient available trie-based algorithms [3, 5] for mismatch kernels have a strong dependency on the size of alphabet set and the number of allowed mismatches, both of which need to be restricted in practice to control the complexity of the algorithm. [sent-46, score-1.102]
29 Under the trie-based framework, the list of k-mers extracted from given strings is traversed in a depth-first search with branches corresponding to all possible σ ∈ Σ. [sent-47, score-0.135]
30 Each leaf node at depth k corresponds to a particular k-mer feature (either exact or inexact instance of the observed exact string features) and contains a list of matching features from each string. [sent-48, score-0.427]
31 The kernel matrix is updated at leaf nodes with corresponding counts. [sent-49, score-0.126]
32 The complexity of the trie-based algorithm for mismatch kernel computation for two strings X and Y is O(k^{m+1} |Σ|^m (|X| + |Y |)) [3]. [sent-50, score-0.9]
33 The algorithm complexity depends on the size of Σ since during a trie traversal, possible substitutions are drawn from Σ explicitly; consequently, to control the complexity of the algorithm we need to restrict the number of allowed mismatches (m), as well as the alphabet size (|Σ|). [sent-51, score-0.616]
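As an illustrative sketch only (not the optimized implementation of [3, 5]; the function name and structure are assumptions), the trie-style traversal can be outlined in Python as follows: at each leaf, the numbers of surviving k-mers from X and Y, i.e. those within m mismatches of the leaf's k-mer, are multiplied and accumulated into the kernel.

```python
def trie_mismatch_kernel(X, Y, k, m, alphabet):
    """Depth-first expansion over the alphabet with mismatch pruning (illustrative sketch)."""
    xs = [X[i:i + k] for i in range(len(X) - k + 1)]
    ys = [Y[i:i + k] for i in range(len(Y) - k + 1)]

    def expand(depth, x_list, y_list):
        # x_list / y_list hold (k-mer, mismatches so far) pairs still alive at this trie node
        if not x_list or not y_list:
            return 0                                   # prune empty branches
        if depth == k:                                 # leaf: one gamma in Sigma^k
            return len(x_list) * len(y_list)
        total = 0
        for sigma in alphabet:                         # branch on every symbol, as in the trie traversal
            nx = [(w, d + (w[depth] != sigma)) for w, d in x_list
                  if d + (w[depth] != sigma) <= m]
            ny = [(w, d + (w[depth] != sigma)) for w, d in y_list
                  if d + (w[depth] != sigma) <= m]
            total += expand(depth + 1, nx, ny)
        return total

    return expand(0, [(w, 0) for w in xs], [(w, 0) for w in ys])
```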
34 While other efficient string algorithms exist, such as [6, 12] and the suffix-tree based algorithms in [10], they do not readily extend to the mismatch framework. [sent-53, score-0.864]
35 In this study, we aim to extend the works presented in [6, 10] and close the existing gap in theoretical complexity between the mismatch and other fast string kernels. [sent-54, score-0.854]
36 The mismatch kernel is then defined as K(X, Y |k, m) = Σ_{γ∈Σ^k} cm (γ|X) cm (γ|Y ) (2), where cm (γ|X) = Σ_{α∈X} Im (γ, α) is the number of times a contiguous substring of length k (k-mer) γ occurs in X with no more than m mismatches. [sent-59, score-0.822]
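Equation 2 can also be evaluated directly by brute force, enumerating all γ ∈ Σ^k. The minimal Python sketch below (hypothetical names, single-character symbols assumed) makes the definition concrete, but it is exponential in k and only feasible for tiny alphabets; it is not the fast algorithm proposed in the paper.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def c_m(gamma, X, k, m):
    """Count the k-mers of X within Hamming distance m of the k-mer gamma."""
    return sum(hamming(gamma, X[i:i + k]) <= m for i in range(len(X) - k + 1))

def mismatch_kernel_bruteforce(X, Y, k, m, alphabet):
    """K(X, Y | k, m) as in Equation 2: sum over gamma in Sigma^k of c_m(gamma|X) * c_m(gamma|Y)."""
    return sum(c_m("".join(g), X, k, m) * c_m("".join(g), Y, k, m)
               for g in product(alphabet, repeat=k))
```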
37 2 Intersection-based Algorithm Our first algorithm presents a novel way of performing local inexact string matching with the following key properties: a. [sent-61, score-0.375]
38 parameter independent: the complexity is independent of |Σ| and mismatch parameter m b. [sent-62, score-0.639]
39 The kernel can then be written as K(X, Y |k, m) = Σ_{a∈X} Σ_{b∈Y} I(a, b) (5), where I(a, b) is the size of the intersection of the mismatch neighborhoods of a and b. [sent-66, score-0.765]
40 The key observation here is that if we can compute I(a, b) efficiently then the kernel evaluation problem reduces to performing pairwise comparison based on all pairs of observed k-mers, a and b, in the two sequences. [sent-67, score-0.151]
41 As a result, the intersection values can be looked up in a table in constant time during matching. [sent-73, score-0.157]
42 In summary, given two strings X and Y , the algorithm (Algorithm 1) compares pairs of observed k-mers from X and Y and computes the mismatch kernel according to Equation 5. [sent-75, score-0.898]
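A minimal sketch of this intersection-based evaluation follows. How the intersection table is pre-computed is not specified in the extracted text; the brute-force neighborhood enumeration below is an illustrative assumption, practical only for small k, m, and |Σ|, and it assumes an alphabet with at least two symbols.

```python
from itertools import product

def neighborhood(kmer, m, alphabet):
    """Brute-force (k, m)-mismatch neighborhood: all length-k strings within m mismatches of kmer."""
    return {"".join(g) for g in product(alphabet, repeat=len(kmer))
            if sum(x != y for x, y in zip(g, kmer)) <= m}

def intersection_table(k, m, alphabet):
    """table[d] = |N(a) ∩ N(b)| for any pair of k-mers at Hamming distance d (needs |alphabet| >= 2)."""
    a = alphabet[0] * k
    table = []
    for d in range(min(2 * m, k) + 1):
        b = alphabet[1] * d + alphabet[0] * (k - d)    # some k-mer at distance d from a
        table.append(len(neighborhood(a, m, alphabet) & neighborhood(b, m, alphabet)))
    return table

def mismatch_kernel_pairs(X, Y, k, m, alphabet):
    """Equation 5: sum of intersection sizes over all pairs of observed k-mers."""
    table = intersection_table(k, m, alphabet)
    xs = [X[i:i + k] for i in range(len(X) - k + 1)]
    ys = [Y[i:i + k] for i in range(len(Y) - k + 1)]
    kernel = 0
    for a in xs:
        for b in ys:
            d = sum(x != y for x, y in zip(a, b))
            if d < len(table):                          # pairs farther than 2m apart contribute nothing
                kernel += table[d]
    return kernel
```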
43 For exactly matching k-mers (d(a, b) = 0), the intersection size corresponds to the size of the (k, m)-mismatch neighborhood. [sent-84, score-0.188]
44 For higher values of the Hamming distance d, the key observation is that for fixed Σ, k, and m, given any distance d(a, b) = d, I(a, b) is also a constant, regardless of the mismatch positions. [sent-87, score-0.621]
45 As a result, the kernel computed in Equation 5 is incremented only by min(2m, k) + 1 distinct values, corresponding to min(2m, k) + 1 possible intersection sizes. [sent-92, score-0.228]
46 In the following, we will show how to compute the mismatch statistics {Mi } in O(ck,m (nx + ny )) time, where ck,m is a constant that does not depend on the alphabet size. [sent-98, score-0.899]
47 We formulate the task of inferring matching statistics {Mi } as the following auxiliary counting problem. Mismatch Statistic Counting: given a set of n k-mers from two strings X and Y , for each Hamming distance i = 0, 1, . . . , min(2m, k), count the number Mi of k-mer pairs (one from X, one from Y ) at distance exactly i. [sent-99, score-0.352]
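As a reference point (and as a check for the faster approach described next), these statistics can be computed directly in quadratic time; the names in this sketch are illustrative.

```python
def mismatch_statistics_naive(xmers, ymers, k, m):
    """Quadratic-time reference computation of M[0..min(2m,k)] over two lists of k-mers."""
    t = min(2 * m, k)
    M = [0] * (t + 1)
    for a in xmers:
        for b in ymers:
            d = sum(x != y for x, y in zip(a, b))
            if d <= t:
                M[d] += 1
    return M
```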
48 We show next that the above problem of computing matching statistics can be solved in linear time (in the number n of k-mers) using multiple rounds of counting sort as a sub-algorithm. [sent-104, score-0.34]
49 In this case, we can apply counting sort to order all k-mers lexicographically and find the number of exact matches by scanning the sorted list. [sent-108, score-0.203]
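For illustration, M0 can be obtained as follows; a hash-based Counter stands in for the lexicographic counting sort described in the text, and the function name is an assumption of this sketch.

```python
from collections import Counter

def exact_match_statistic(X, Y, k):
    """M0: the number of exactly matching k-mer pairs between X and Y."""
    cx = Counter(X[i:i + k] for i in range(len(X) - k + 1))
    cy = Counter(Y[i:i + k] for i in range(len(Y) - k + 1))
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
```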
50 Efficient direct computation of Mi for any i > 0 is difficult (it requires quadratic time); we take another approach and first compute inexact cumulative mismatch statistics, Ci = Mi + Σ_{j=0}^{i−1} (k−j choose i−j) Mj , that overcount the number of k-mer pairs at a given distance i, as follows. [sent-110, score-0.788]
51 As a result, given n k-mers, we can compute the cumulative mismatch statistics Ci in linear time using (k choose i) rounds of counting sort on (k − i)-mers. [sent-114, score-0.905]
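The cumulative statistics admit a compact sketch (not linear-time as written): each choice of i positions to ignore plays the role of one counting-sort round, and pairs of k-mers that agree on the remaining k − i positions are counted. A Counter again stands in for the counting sort purely for clarity.

```python
from collections import Counter
from itertools import combinations

def cumulative_statistic(xmers, ymers, k, i):
    """C_i: pairs of k-mers agreeing on some set of k - i positions, summed over all such sets."""
    total = 0
    for drop in combinations(range(k), i):              # one "round" per choice of ignored positions
        keep = [p for p in range(k) if p not in drop]
        cx = Counter(tuple(a[p] for p in keep) for a in xmers)
        cy = Counter(tuple(b[p] for p in keep) for b in ymers)
        total += sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
    return total
```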
52 The exact mismatch statistics Mi can then be obtained from Ci by subtracting the exact counts to compensate for the overcounting: Mi = Ci − Σ_{j=0}^{i−1} (k−j choose i−j) Mj , for i = 0, . . . , min(min(2m, k), k − 1) (7). [sent-115, score-0.667]
53 The last mismatch statistic Mk (needed when min(2m, k) = k) can be computed by subtracting the preceding statistics M0 , . . . , Mk−1 from the total number of k-mer pairs T: Mk = T − Σ_{j=0}^{k−1} Mj (8). [sent-118, score-0.654]
54 Our algorithm for mismatch kernel computation based on sufficient statistics is summarized in Algorithm 2. [sent-122, score-0.741]
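Assuming the cumulative statistics Ci, the intersection sizes Ii, and the total number of k-mer pairs are already available (for instance from the sketches above), the remaining steps of Algorithm 2, Equations 7, 8 and 6, can be sketched as follows; the function name is hypothetical.

```python
from math import comb

def mismatch_kernel_from_statistics(C, I, k, total_pairs):
    """C[i], I[i] given for i = 0..t with t = min(2m, k); total_pairs = (#k-mers in X) * (#k-mers in Y)."""
    t = len(C) - 1
    M = []
    for i in range(min(t, k - 1) + 1):                  # Equation 7
        M.append(C[i] - sum(comb(k - j, i - j) * M[j] for j in range(i)))
    if t == k:                                          # Equation 8: last statistic from the total count
        M.append(total_pairs - sum(M))
    return sum(M[i] * I[i] for i in range(t + 1))       # Equation 6
```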
55 The overall complexity of the algorithm is O(n c_{k,m}) with the constant c_{k,m} = Σ_{l=0}^{min(2m,k)} (k choose l)(k − l), independent of the size of the alphabet set, where (k choose l) is the number of rounds of counting sort used to evaluate the cumulative mismatch statistic Cl . [sent-123, score-1.176]
56 Algorithm 2 (Mismatch-SS), the mismatch kernel algorithm based on sufficient statistics. Input: strings X, Y with |X| = nx , |Y | = ny , parameters k, m, and pre-computed intersection values I. 1. [sent-125, score-0.56]
57 Compute the min(2m, k) cumulative matching statistics Ci using counting sort. 2. [sent-126, score-0.271]
58 Recover the exact statistics Mi from the Ci (Equations 7 and 8) and evaluate the kernel using Equation 6, K(X, Y |k, m) = Σ_i Mi Ii , where Ii is the pre-computed intersection size for Hamming distance i. Output: mismatch kernel value K(X, Y |k, m). 5 Extensions. Our algorithmic approach can also be applied to a variety of existing string kernels, leading to very efficient and simple algorithms that could benefit many applications. [sent-128, score-0.5]
59 The spectrum kernel [9] in our notation is the first sufficient statistic M0 , i.e. [sent-130, score-0.239]
60 K(X, Y |k) = M0 , which can be computed in k rounds of counting sort (i.e. in O(kn) time). [sent-132, score-0.226]
61 The gapped kernels [6] measure similarity between strings X and Y based on the co-occurrence of gapped instances g, |g| = k + m > k, of k-long substrings: K(X, Y |k, g) = Σ_{γ∈Σ^k} ( Σ_{g∈X, |g|=k+m} I(γ, g) ) ( Σ_{g∈Y, |g|=k+m} I(γ, g) ) (9), where I(γ, g) = 1 when γ is a subsequence of g. [sent-136, score-0.595]
62 Similar to the algorithmic approach for extracting cumulative mismatch statistics in Algorithm-2, to compute the gapped(g,k) kernel, we perform a single round of counting sort over k-mers contained in the g-mers. [sent-137, score-0.83]
63 This gives a very simple and efficient O((g choose k) kn) time algorithm for gapped kernel computations. [sent-138, score-0.29]
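A small sketch of Equation 9 follows; a Counter replaces the counting-sort round, and each length-g window contributes each of its distinct k-long subsequences at most once, matching I(γ, g) ∈ {0, 1}. Names are illustrative.

```python
from collections import Counter
from itertools import combinations

def gapped_kernel(X, Y, k, g):
    """Gapped(g, k) kernel of Equation 9; g is the window length (k + m in the text)."""
    def profile(S):
        counts = Counter()
        for i in range(len(S) - g + 1):
            window = S[i:i + g]
            # each distinct k-long subsequence of the window counts once: I(gamma, window) is 0 or 1
            counts.update({tuple(window[p] for p in pos)
                           for pos in combinations(range(g), k)})
        return counts
    cx, cy = profile(X), profile(Y)
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
```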
64 The wildcard(k,m) kernel [6] in our notation is the sum of the cumulative statistics, K(X, Y |k, m) = Σ_{i=0}^{m} Ci , i.e. [sent-140, score-0.196]
65 it can be computed in Σ_{i=0}^{m} (k choose i) rounds of counting sort, giving a simple and efficient O( Σ_{i=0}^{m} (k choose i)(k − i) n ) algorithm. [sent-142, score-0.14]
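Correspondingly, a self-contained sketch of the wildcard(k, m) kernel as the sum C0 + . . . + Cm, with a Counter again standing in for the counting-sort rounds:

```python
from collections import Counter
from itertools import combinations

def wildcard_kernel(xmers, ymers, k, m):
    """Wildcard(k, m) kernel as K = C_0 + C_1 + ... + C_m over two lists of k-mers."""
    total = 0
    for i in range(m + 1):
        for drop in combinations(range(k), i):          # one round per choice of wildcard positions
            keep = [p for p in range(k) if p not in drop]
            cx = Counter(tuple(a[p] for p in keep) for a in xmers)
            cy = Counter(tuple(b[p] for p in keep) for b in ymers)
            total += sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
    return total
```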
66 The spatial(k,t,d) kernel [13] can be computed by sorting kt-mers iteratively for every arrangement of t k-mers spatially constrained by distance d. [sent-144, score-0.164]
67 The sequence neighborhood kernels [7] proved to be a powerful tool in many sequence analysis tasks. [sent-146, score-0.421]
68 Note that the kernel value, if computed directly using Equation 10, incurs quadratic complexity in the size of the neighborhoods. [sent-148, score-0.225]
69 As in the single-string case, to compute the neighborhood kernel (over the string sets) we can jointly sort the observed k-mers in N (X) and N (Y ) and apply the desired kernel evaluation method (spectrum, mismatch, or gapped). [sent-149, score-0.91]
70 Under this setting, the neighborhood kernel can be evaluated in time linear in the neighborhood size. [sent-150, score-0.436]
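For the spectrum case, a sketch of this joint evaluation over string sets: the k-mers of all neighbors are pooled and matched once, instead of summing |N(X)| · |N(Y)| pairwise kernel values. The function name is an assumption of this sketch.

```python
from collections import Counter

def neighborhood_spectrum_kernel(NX, NY, k):
    """Spectrum kernel between two sets of sequences (the neighborhoods N(X) and N(Y))."""
    cx = Counter(s[i:i + k] for s in NX for i in range(len(s) - k + 1))
    cy = Counter(s[i:i + k] for s in NY for i in range(len(s) - k + 1))
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
```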
71 This leads to very efficient algorithms for computing sequence neighborhood kernels even for very large datasets, as we will show in the experimental section. [sent-151, score-0.39]
72 6 Evaluation We study the performance of our algorithms, both in running time and predictive accuracy, on synthetic data and standard benchmark datasets for protein sequence analysis and music genre classification. [sent-152, score-0.668]
73 The reduced running time requirements of our algorithms open the possibility of considering "looser" mismatch measures with larger k and m. [sent-153, score-0.712]
74 The results presented here demonstrate that such mismatch kernels with larger (k, m) can lead to state-of-the-art predictive performance even when compared with more complex models such as [2]. [sent-154, score-0.734]
75 For protein sequence classification under the semi-supervised setting, we also use the Protein Data Bank (PDB, 17,232 sequences), Swiss-Prot (101,602 sequences), and the non-redundant (NR) databases as the unlabeled datasets, following the setup of [17]. [sent-156, score-0.325]
76 For synthetic data, we generate strings of length n = 10^5 over alphabets of different sizes and measure the running time of the trie-based and our sufficient-statistics-based algorithms for evaluating the mismatch string kernel. [sent-165, score-1.181]
77 Figure 1 shows relative running time Ttrie /Tss , in logarithmic scale, of the mismatch-trie and mismatch-SS as a function of the alphabet size. [sent-166, score-0.294]
78 As can be seen from the plot, our algorithm demonstrates several orders of magnitude improvements, especially for large alphabet sizes. [sent-167, score-0.198]
79 Table 1 compares the running times of our algorithm and the trie-based algorithm on different real datasets (proteins, DNA, text, music) for computing a single kernel entry (pair of strings). [sent-168, score-0.225]
80 We observe speed improvements ranging from 100 to 10^6 times, depending on the alphabet size. [sent-169, score-0.233]
81 We also measure the running time for the full 7329-by-7329 mismatch(5,2) kernel matrix computation on the SCOP dataset under the supervised setting. [sent-170, score-0.222]
82 The running time of our algorithm is 1525 seconds compared to 196052 seconds for the trie-based computations. [sent-171, score-0.146]
83 We consider the tasks of multi-class music genre classification [16], with results in Table 2, and protein remote homology (superfamily) prediction [9, 2, 18], with results in Table 3. [sent-199, score-0.679]
84 On the music classification task, we observe significant improvements in accuracy for larger numbers of mismatches. [sent-201, score-0.16]
85 Remote protein homology detection, as evident from Table 3, clearly benefits from a larger number of allowed mismatches, because remotely related proteins are likely to be separated by multiple mutations or insertions/deletions. [sent-204, score-0.756]
86 In particular, the result on the Swiss-Prot dataset for the (7, 3)-mismatch kernel is very promising and compares well with the best results of the state-of-the-art, but computationally more demanding, profile kernels [2]. [sent-208, score-0.306]
87 This is an important result that addresses a main drawback of the neighborhood kernels, the running time [7, 2]. [sent-214, score-0.238]
88 Table 2: classification performance on the music genre classification (multi-class) dataset. Table 3: classification performance (ROC, ROC50) on protein remote homology prediction for the mismatch (5,1), (5,2), and (7,3) kernels. [sent-215, score-2.428]
89 For multi-class protein fold recognition (Table 4), we similarly observe improvements in performance for larger numbers of allowed mismatches. [sent-243, score-0.422]
90 The balanced error of 25% for the (7,3)-mismatch neighborhood kernel using Swiss-Prot compares well with the best error rate of 26. [sent-244, score-0.33]
91 This improvement makes string kernels with looser, approximate matching a viable alternative for practical sequence analysis tasks. [sent-309, score-0.511]
92 Our algorithms work with large databases in supervised and semi-supervised settings and scale well in the alphabet size and the number of allowed mismatches. [sent-310, score-0.312]
93 A machine learning information retrieval approach to protein fold recognition. [sent-313, score-0.349]
94 Profile-based string kernels for remote homology detection and motif extraction. [sent-317, score-0.606]
95 Fast string kernels using inexact matching for protein sequences. [sent-330, score-0.752]
96 The spectrum kernel: A string kernel for SVM protein classification. [sent-345, score-0.641]
97 Efficient computation of gapped substring kernels on large alphabets. [sent-361, score-0.35]
98 Fast protein homology and fold detection with sparse spatial sample kernels. [sent-368, score-0.503]
99 Multi-class protein fold recognition using support vector machines and neural networks. [sent-373, score-0.349]
100 On the role of local matching for efficient semisupervised protein sequence classification. [sent-387, score-0.346]
wordName wordTfidf (topN-words)
[('mismatch', 0.583), ('protein', 0.226), ('string', 0.215), ('alphabet', 0.198), ('mismatches', 0.182), ('iy', 0.173), ('kernels', 0.151), ('neighborhood', 0.142), ('gapped', 0.138), ('strings', 0.135), ('kernel', 0.126), ('music', 0.125), ('fold', 0.123), ('genre', 0.121), ('homology', 0.121), ('nx', 0.111), ('ix', 0.11), ('christina', 0.104), ('scop', 0.104), ('inexact', 0.104), ('intersection', 0.102), ('mi', 0.091), ('counting', 0.091), ('alphabets', 0.087), ('xix', 0.087), ('yiy', 0.087), ('ny', 0.086), ('remote', 0.086), ('sort', 0.086), ('pro', 0.081), ('le', 0.079), ('spectrum', 0.074), ('hamming', 0.074), ('running', 0.07), ('stafford', 0.069), ('sequence', 0.064), ('substring', 0.061), ('sequences', 0.059), ('matching', 0.056), ('complexity', 0.056), ('classi', 0.053), ('kuksa', 0.052), ('leslie', 0.049), ('rounds', 0.049), ('im', 0.048), ('vladimir', 0.047), ('nr', 0.047), ('pavel', 0.045), ('eugene', 0.045), ('william', 0.045), ('proteins', 0.045), ('size', 0.043), ('jason', 0.042), ('ci', 0.041), ('statistic', 0.039), ('deletions', 0.039), ('distance', 0.038), ('weston', 0.038), ('allowed', 0.038), ('cumulative', 0.038), ('insertions', 0.037), ('ie', 0.037), ('neighborhoods', 0.037), ('datasets', 0.036), ('improvements', 0.035), ('mj', 0.035), ('unlabeled', 0.035), ('eleazar', 0.035), ('eskin', 0.035), ('kuang', 0.035), ('mutational', 0.035), ('superfamily', 0.035), ('ttrie', 0.035), ('transformations', 0.034), ('detection', 0.033), ('similarity', 0.033), ('balanced', 0.033), ('algorithms', 0.033), ('statistics', 0.032), ('kn', 0.031), ('remotely', 0.03), ('wildcard', 0.03), ('rui', 0.03), ('pdb', 0.03), ('dna', 0.03), ('symbols', 0.029), ('table', 0.029), ('compares', 0.029), ('mutations', 0.028), ('substrings', 0.028), ('cation', 0.027), ('min', 0.027), ('cm', 0.026), ('exact', 0.026), ('time', 0.026), ('roc', 0.026), ('text', 0.026), ('seconds', 0.025), ('pairs', 0.025), ('looser', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
2 0.11096804 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
3 0.089696087 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
Author: Alexandre Bouchard-côté, Dan Klein, Michael I. Jordan
Abstract: Accurate and efficient inference in evolutionary trees is a central problem in computational biology. While classical treatments have made unrealistic site independence assumptions, ignoring insertions and deletions, realistic approaches require tracking insertions and deletions along the phylogenetic tree—a challenging and unsolved computational problem. We propose a new ancestry resampling procedure for inference in evolutionary trees. We evaluate our method in two problem domains—multiple sequence alignment and reconstruction of ancestral sequences—and show substantial improvement over the current state of the art. 1
4 0.087408789 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
Author: Jihun Hamm, Daniel D. Lee
Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1
5 0.075628988 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
6 0.069659308 239 nips-2008-Tighter Bounds for Structured Estimation
7 0.067684449 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
8 0.066019483 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
9 0.064665586 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
10 0.063746557 113 nips-2008-Kernelized Sorting
11 0.062106173 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
12 0.058801524 44 nips-2008-Characteristic Kernels on Groups and Semigroups
13 0.055256635 225 nips-2008-Supervised Bipartite Graph Inference
14 0.054256152 143 nips-2008-Multi-label Multiple Kernel Learning
15 0.051893361 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
16 0.050171636 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
17 0.048746426 78 nips-2008-Exact Convex Confidence-Weighted Learning
18 0.047160782 138 nips-2008-Modeling human function learning with Gaussian processes
19 0.046736442 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
20 0.045692794 62 nips-2008-Differentiable Sparse Coding
topicId topicWeight
[(0, -0.149), (1, -0.048), (2, -0.013), (3, 0.024), (4, 0.021), (5, -0.03), (6, 0.063), (7, -0.029), (8, 0.05), (9, -0.03), (10, 0.141), (11, -0.021), (12, -0.034), (13, -0.011), (14, 0.087), (15, 0.012), (16, 0.047), (17, -0.044), (18, -0.085), (19, 0.016), (20, 0.04), (21, 0.065), (22, 0.006), (23, -0.033), (24, -0.023), (25, 0.079), (26, -0.074), (27, -0.015), (28, 0.057), (29, 0.019), (30, -0.055), (31, 0.056), (32, 0.04), (33, -0.053), (34, -0.009), (35, 0.113), (36, 0.017), (37, -0.011), (38, 0.052), (39, 0.117), (40, -0.056), (41, 0.166), (42, -0.043), (43, -0.102), (44, 0.001), (45, 0.126), (46, -0.021), (47, -0.069), (48, 0.083), (49, 0.097)]
simIndex simValue paperId paperTitle
same-paper 1 0.93822473 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
2 0.63721281 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
3 0.63238585 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
4 0.60305762 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
Author: Jihun Hamm, Daniel D. Lee
Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1
5 0.55875993 225 nips-2008-Supervised Bipartite Graph Inference
Author: Yoshihiro Yamanishi
Abstract: We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. 1
6 0.51814473 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
7 0.50868016 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
8 0.47608027 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
9 0.46932033 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
10 0.46899998 44 nips-2008-Characteristic Kernels on Groups and Semigroups
11 0.45034239 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
12 0.42529136 113 nips-2008-Kernelized Sorting
13 0.41569579 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
14 0.37935936 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels
15 0.36408216 67 nips-2008-Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance
16 0.35093161 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems
17 0.341921 196 nips-2008-Relative Margin Machines
18 0.34002969 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
19 0.33196339 3 nips-2008-A Massively Parallel Digital Learning Processor
20 0.33062404 228 nips-2008-Support Vector Machines with a Reject Option
topicId topicWeight
[(6, 0.043), (7, 0.085), (12, 0.021), (28, 0.156), (54, 0.32), (57, 0.049), (59, 0.015), (63, 0.02), (71, 0.047), (77, 0.032), (78, 0.026), (83, 0.08), (87, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.75143719 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
2 0.70530194 121 nips-2008-Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement
Author: Michael T. Todd, Yael Niv, Jonathan D. Cohen
Abstract: Working memory is a central topic of cognitive neuroscience because it is critical for solving real-world problems in which information from multiple temporally distant sources must be combined to generate appropriate behavior. However, an often neglected fact is that learning to use working memory effectively is itself a difficult problem. The Gating framework [14] is a collection of psychological models that show how dopamine can train the basal ganglia and prefrontal cortex to form useful working memory representations in certain types of problems. We unite Gating with machine learning theory concerning the general problem of memory-based optimal control [5-6]. We present a normative model that learns, by online temporal difference methods, to use working memory to maximize discounted future reward in partially observable settings. The model successfully solves a benchmark working memory problem, and exhibits limitations similar to those observed in humans. Our purpose is to introduce a concise, normative definition of high level cognitive concepts such as working memory and cognitive control in terms of maximizing discounted future rewards. 1 I n t ro d u c t i o n Working memory is loosely defined in cognitive neuroscience as information that is (1) internally maintained on a temporary or short term basis, and (2) required for tasks in which immediate observations cannot be mapped to correct actions. It is widely assumed that prefrontal cortex (PFC) plays a role in maintaining and updating working memory. However, relatively little is known about how PFC develops useful working memory representations for a new task. Furthermore, current work focuses on describing the structure and limitations of working memory, but does not ask why, or in what general class of tasks, is it necessary. Borrowing from the theory of optimal control in partially observable Markov decision problems (POMDPs), we frame the psychological concept of working memory as an internal state representation, developed and employed to maximize future reward in partially observable environments. We combine computational insights from POMDPs and neurobiologically plausible models from cognitive neuroscience to suggest a simple reinforcement learning (RL) model of working memory function that can be implemented through dopaminergic training of the basal ganglia and PFC. The Gating framework is a series of cognitive neuroscience models developed to explain how dopaminergic RL signals can shape useful working memory representations [1-4]. Computationally this framework models working memory as a collection of past observations, each of which can occasionally be replaced with the current observation, and addresses the problem of learning when to update each memory element versus maintaining it. In the original Gating model [1-2] the PFC contained a unitary working memory representation that was updated whenever a phasic dopamine (DA) burst occurred (e.g., due to unexpected reward or novelty). That model was the first to connect working memory and RL via the temporal difference (TD) model of DA firing [7-8], and thus to suggest how working memory might serve a normative purpose. However, that model had limited computational flexibility due to the unitary nature of the working memory (i.e., a singleobservation memory controlled by a scalar DA signal). 
More recent work [3-4] has partially repositioned the Gating framework within the Actor/Critic model of mesostriatal RL [9-10], positing memory updating as but another cortical action controlled by the dorsal striatal
3 0.65876687 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
Author: Mohak Shah
Abstract: We derive risk bounds for the randomized classifiers in Sample Compression setting where the classifier-specification utilizes two sources of information viz. the compression set and the message string. By extending the recently proposed Occam’s Hammer principle to the data-dependent settings, we derive point-wise versions of the bounds on the stochastic sample compressed classifiers and also recover the corresponding classical PAC-Bayes bound. We further show how these compare favorably to the existing results.
4 0.53947514 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
Author: Liu Yang, Rong Jin, Rahul Sukthankar
Abstract: The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low density regions of the unlabeled data will help the classification. In such case, the cluster assumption may not be valid; and consequently how to leverage this type of unlabeled data to enhance the classification accuracy becomes a challenge. We introduce “Semi-supervised Learning with Weakly-Related Unlabeled Data” (SSLW), an inductive method that builds upon the maximum-margin approach, towards a better usage of weakly-related unlabeled information. Although the SSLW could improve a wide range of classification tasks, in this paper, we focus on text categorization with a small training pool. The key assumption behind this work is that, even with different topics, the word usage patterns across different corpora tends to be consistent. To this end, SSLW estimates the optimal wordcorrelation matrix that is consistent with both the co-occurrence information derived from the weakly-related unlabeled documents and the labeled documents. For empirical evaluation, we present a direct comparison with a number of stateof-the-art methods for inductive semi-supervised learning and text categorization. We show that SSLW results in a significant improvement in categorization accuracy, equipped with a small training set and an unlabeled resource that is weakly related to the test domain.
5 0.5393461 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
6 0.53763026 194 nips-2008-Regularized Learning with Networks of Features
7 0.53739637 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
8 0.53562069 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
10 0.53312784 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
11 0.53312755 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
12 0.53285468 196 nips-2008-Relative Margin Machines
13 0.53211379 99 nips-2008-High-dimensional support union recovery in multivariate regression
14 0.53195947 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
15 0.53066516 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
16 0.53004682 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
17 0.52988416 179 nips-2008-Phase transitions for high-dimensional joint support recovery
18 0.52943897 113 nips-2008-Kernelized Sorting
19 0.52894157 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
20 0.52858287 4 nips-2008-A Scalable Hierarchical Distributed Language Model