nips nips2008 nips2008-203 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. [sent-4, score-0.822]
2 Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. [sent-5, score-0.614]
3 In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. [sent-6, score-1.567]
4 We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. [sent-7, score-1.094]
5 The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. [sent-8, score-0.382]
6 Classification of string data, sequences of discrete symbols, has attracted particular interest and has led to a number of new algorithms [1, 2, 3, 4]. [sent-10, score-0.307]
7 They exhibit state-of-the-art performance on tasks such as protein superfamily and fold prediction, music genre classification and document topic elucidation. [sent-11, score-0.63]
8 Classification of data in sequential domains is made challenging by the variability in the sequence lengths, potential existence of important features on multiple scales, as well as the size of the alphabets and datasets. [sent-12, score-0.194]
9 Typical alphabet sizes vary widely, ranging from 4 nucleotides in DNA sequences up to thousands of words from a language lexicon for text documents. [sent-13, score-0.267]
10 Strings within the same class, such as the proteins in one fold or documents about politics, can exhibit wide variability in the primary sequence content. [sent-14, score-0.232]
11 As a consequence, the resulting algorithms need the ability to efficiently handle large alphabets and datasets as well as establish measures of similarity under complex sequence transformations in order to accurately classify the data. [sent-16, score-0.287]
12 A number of state-of-the-art approaches to scoring similarity between pairs of sequences in a database rely on fixed, spectral representations of sequential data and the notion of mismatch kernels, cf. [sent-17, score-0.7]
13 However, computing those representations for large alphabet sizes and "loose" similarity models can be computationally challenging. [sent-22, score-0.231]
14 In this work we propose novel algorithms for modeling sequences under complex transformations (such as multiple insertions, deletions, mutations) that exhibit state-of-the-art performance on a variety of distinct classification tasks. [sent-26, score-0.126]
15 In particular, we present new algorithms for inexact (e.g. [sent-27, score-0.137]
16 with mismatches) string comparison that improve currently known time bounds for such tasks and show orders-of-magnitude running time improvements. [sent-29, score-0.337]
17 The algorithms rely on an efficient implicit computation of mismatch neighborhoods and k-mer statistics on sets of sequences. [sent-30, score-0.692]
18 This leads to a mismatch kernel algorithm with complexity O(c_{k,m} (|X| + |Y |)), where c_{k,m} is independent of the alphabet Σ. [sent-31, score-0.963]
19 The algorithm can be easily generalized to other families of string kernels, such as the spectrum and gapped kernels [6], as well as to semi-supervised settings such as the neighborhood kernel of [7]. [sent-32, score-0.846]
20 We demonstrate the benefits of our algorithms on many challenging classification problems, such as detecting homology (evolutionary similarity) of remotely related proteins, recognizing protein fold, and performing classification of music samples. [sent-33, score-0.535]
21 2 Related Work. Over the past decade, various methods have been proposed to solve the string classification problem, including generative approaches, such as HMMs, and discriminative approaches. [sent-36, score-0.215]
22 Sequence matching is frequently based on common co-occurrence of exact sub-patterns (k-mers, features), as in spectrum kernels [9] or substring kernels [10]. [sent-38, score-0.519]
23 Inexact comparison in this framework is typically achieved using different families of mismatch [3] or profile [2] kernels. [sent-39, score-0.583]
24 Both the spectrum-k and mismatch(k,m) kernels directly extract string features based on the observed sequence, X. [sent-40, score-0.341]
25 Constructing the profile for each sequence may not be practical in some application domains, since the size of the profile is dependent on the size of the alphabet set. [sent-43, score-0.348]
26 While for bio-sequences |Σ| = 4 or 20, for music or text classification |Σ| can potentially be very large, on the order of tens of thousands of symbols. [sent-44, score-0.151]
27 In this case, a very simple semi-supervised learning method, the sequence neighborhood kernel, can be employed [7] as an alternative to long k-mers with many mismatches. [sent-45, score-0.206]
28 The most efficient available trie-based algorithms [3, 5] for mismatch kernels have a strong dependency on the size of alphabet set and the number of allowed mismatches, both of which need to be restricted in practice to control the complexity of the algorithm. [sent-46, score-1.102]
29 Under the trie-based framework, the list of k-mers extracted from given strings is traversed in a depth-first search with branches corresponding to all possible σ ∈ Σ. [sent-47, score-0.135]
30 Each leaf node at depth k corresponds to a particular k-mer feature (either exact or inexact instance of the observed exact string features) and contains a list of matching features from each string. [sent-48, score-0.427]
31 The kernel matrix is updated at leaf nodes with corresponding counts. [sent-49, score-0.126]
32 The complexity of the trie-based algorithm for mismatch kernel computation for two strings X and Y is O(k^{m+1} |Σ|^m (|X| + |Y |)) [3]. [sent-50, score-0.9]
33 The algorithm complexity depends on the size of Σ since during a trie traversal, possible substitutions are drawn from Σ explicitly; consequently, to control the complexity of the algorithm we need to restrict the number of allowed mismatches (m), as well as the alphabet size (|Σ|). [sent-51, score-0.616]
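As an illustrative sketch only (not the optimized implementation of [3, 5]; the function name and structure are assumptions), the trie-style traversal can be outlined in Python as follows: at each leaf, the numbers of surviving k-mers from X and Y, i.e. those within m mismatches of the leaf's k-mer, are multiplied and accumulated into the kernel.

```python
def trie_mismatch_kernel(X, Y, k, m, alphabet):
    """Depth-first expansion over the alphabet with mismatch pruning (illustrative sketch)."""
    xs = [X[i:i + k] for i in range(len(X) - k + 1)]
    ys = [Y[i:i + k] for i in range(len(Y) - k + 1)]

    def expand(depth, x_list, y_list):
        # x_list / y_list hold (k-mer, mismatches so far) pairs still alive at this trie node
        if not x_list or not y_list:
            return 0                                   # prune empty branches
        if depth == k:                                 # leaf: one gamma in Sigma^k
            return len(x_list) * len(y_list)
        total = 0
        for sigma in alphabet:                         # branch on every symbol, as in the trie traversal
            nx = [(w, d + (w[depth] != sigma)) for w, d in x_list
                  if d + (w[depth] != sigma) <= m]
            ny = [(w, d + (w[depth] != sigma)) for w, d in y_list
                  if d + (w[depth] != sigma) <= m]
            total += expand(depth + 1, nx, ny)
        return total

    return expand(0, [(w, 0) for w in xs], [(w, 0) for w in ys])
```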
34 While other efficient string algorithms exist, such as [6, 12] and the suffix-tree based algorithms in [10], they do not readily extend to the mismatch framework. [sent-53, score-0.864]
35 In this study, we aim to extend the works presented in [6, 10] and close the existing gap in theoretical complexity between the mismatch and other fast string kernels. [sent-54, score-0.854]
36 The mismatch kernel is then defined as K(X, Y |k, m) = Σ_{γ∈Σ^k} cm (γ|X) cm (γ|Y ) (2), where cm (γ|X) = Σ_{α∈X} Im (γ, α) is the number of times a contiguous substring of length k (k-mer) γ occurs in X with no more than m mismatches. [sent-59, score-0.822]
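Equation 2 can also be evaluated directly by brute force, enumerating all γ ∈ Σ^k. The minimal Python sketch below (hypothetical names, single-character symbols assumed) makes the definition concrete, but it is exponential in k and only feasible for tiny alphabets; it is not the fast algorithm proposed in the paper.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def c_m(gamma, X, k, m):
    """Count the k-mers of X within Hamming distance m of the k-mer gamma."""
    return sum(hamming(gamma, X[i:i + k]) <= m for i in range(len(X) - k + 1))

def mismatch_kernel_bruteforce(X, Y, k, m, alphabet):
    """K(X, Y | k, m) as in Equation 2: sum over gamma in Sigma^k of c_m(gamma|X) * c_m(gamma|Y)."""
    return sum(c_m("".join(g), X, k, m) * c_m("".join(g), Y, k, m)
               for g in product(alphabet, repeat=k))
```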
37 2 Intersection-based Algorithm Our first algorithm presents a novel way of performing local inexact string matching with the following key properties: a. [sent-61, score-0.375]
38 parameter independent: the complexity is independent of |Σ| and mismatch parameter m b. [sent-62, score-0.639]
39 The kernel can then be written as K(X, Y |k, m) = Σ_{a∈X} Σ_{b∈Y} I(a, b) (5), where I(a, b) is the size of the intersection of the mismatch neighborhoods of a and b. [sent-66, score-0.765]
40 The key observation here is that if we can compute I(a, b) efficiently then the kernel evaluation problem reduces to performing pairwise comparison based on all pairs of observed k-mers, a and b, in the two sequences. [sent-67, score-0.151]
41 As a result, the intersection values can be looked up in a table in constant time during matching. [sent-73, score-0.157]
42 In summary, given two strings X and Y , the algorithm (Algorithm 1) compares pairs of observed k-mers from X and Y and computes the mismatch kernel according to Equation 5. [sent-75, score-0.898]
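A minimal sketch of this intersection-based evaluation follows. How the intersection table is pre-computed is not specified in the extracted text; the brute-force neighborhood enumeration below is an illustrative assumption, practical only for small k, m, and |Σ|, and it assumes an alphabet with at least two symbols.

```python
from itertools import product

def neighborhood(kmer, m, alphabet):
    """Brute-force (k, m)-mismatch neighborhood: all length-k strings within m mismatches of kmer."""
    return {"".join(g) for g in product(alphabet, repeat=len(kmer))
            if sum(x != y for x, y in zip(g, kmer)) <= m}

def intersection_table(k, m, alphabet):
    """table[d] = |N(a) ∩ N(b)| for any pair of k-mers at Hamming distance d (needs |alphabet| >= 2)."""
    a = alphabet[0] * k
    table = []
    for d in range(min(2 * m, k) + 1):
        b = alphabet[1] * d + alphabet[0] * (k - d)    # some k-mer at distance d from a
        table.append(len(neighborhood(a, m, alphabet) & neighborhood(b, m, alphabet)))
    return table

def mismatch_kernel_pairs(X, Y, k, m, alphabet):
    """Equation 5: sum of intersection sizes over all pairs of observed k-mers."""
    table = intersection_table(k, m, alphabet)
    xs = [X[i:i + k] for i in range(len(X) - k + 1)]
    ys = [Y[i:i + k] for i in range(len(Y) - k + 1)]
    kernel = 0
    for a in xs:
        for b in ys:
            d = sum(x != y for x, y in zip(a, b))
            if d < len(table):                          # pairs farther than 2m apart contribute nothing
                kernel += table[d]
    return kernel
```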
43 For exactly matching k-mers (d(a, b) = 0), the intersection size corresponds to the size of the (k, m)-mismatch neighborhood. [sent-84, score-0.188]
44 For higher values of the Hamming distance d, the key observation is that for fixed Σ, k, and m, given any distance d(a, b) = d, I(a, b) is also a constant, regardless of the mismatch positions. [sent-87, score-0.621]
45 As a result, the kernel computed in Equation 5 is incremented only by min(2m, k) + 1 distinct values, corresponding to min(2m, k) + 1 possible intersection sizes. [sent-92, score-0.228]
46 In the following, we will show how to compute the mismatch statistics {Mi } in O(ck,m (nx + ny )) time, where ck,m is a constant that does not depend on the alphabet size. [sent-98, score-0.899]
47 We formulate the task of inferring matching statistics {Mi } as the following auxiliary counting problem. Mismatch Statistic Counting: given a set of n k-mers from two strings X and Y , for each Hamming distance i = 0, 1, . . . , min(2m, k), count the number Mi of k-mer pairs (one from X, one from Y ) at distance exactly i. [sent-99, score-0.352]
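As a reference point (and as a check for the faster approach described next), these statistics can be computed directly in quadratic time; the names in this sketch are illustrative.

```python
def mismatch_statistics_naive(xmers, ymers, k, m):
    """Quadratic-time reference computation of M[0..min(2m,k)] over two lists of k-mers."""
    t = min(2 * m, k)
    M = [0] * (t + 1)
    for a in xmers:
        for b in ymers:
            d = sum(x != y for x, y in zip(a, b))
            if d <= t:
                M[d] += 1
    return M
```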
48 We show next that the above problem of computing matching statistics can be solved in linear time (in the number n of k-mers) using multiple rounds of counting sort as a sub-algorithm. [sent-104, score-0.34]
49 In this case, we can apply counting sort to order all k-mers lexicographically and find the number of exact matches by scanning the sorted list. [sent-108, score-0.203]
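For illustration, M0 can be obtained as follows; a hash-based Counter stands in for the lexicographic counting sort described in the text, and the function name is an assumption of this sketch.

```python
from collections import Counter

def exact_match_statistic(X, Y, k):
    """M0: the number of exactly matching k-mer pairs between X and Y."""
    cx = Counter(X[i:i + k] for i in range(len(X) - k + 1))
    cy = Counter(Y[i:i + k] for i in range(len(Y) - k + 1))
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
```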
50 Efficient direct computation of Mi for any i > 0 is difficult (it requires quadratic time); we take another approach and first compute inexact cumulative mismatch statistics, Ci = Mi + Σ_{j=0}^{i−1} (k−j choose i−j) Mj , that overcount the number of k-mer pairs at a given distance i, as follows. [sent-110, score-0.788]
51 As a result, given n k-mers, we can compute the cumulative mismatch statistics Ci in linear time using (k choose i) rounds of counting sort on (k − i)-mers. [sent-114, score-0.905]
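The cumulative statistics admit a compact sketch (not linear-time as written): each choice of i positions to ignore plays the role of one counting-sort round, and pairs of k-mers that agree on the remaining k − i positions are counted. A Counter again stands in for the counting sort purely for clarity.

```python
from collections import Counter
from itertools import combinations

def cumulative_statistic(xmers, ymers, k, i):
    """C_i: pairs of k-mers agreeing on some set of k - i positions, summed over all such sets."""
    total = 0
    for drop in combinations(range(k), i):              # one "round" per choice of ignored positions
        keep = [p for p in range(k) if p not in drop]
        cx = Counter(tuple(a[p] for p in keep) for a in xmers)
        cy = Counter(tuple(b[p] for p in keep) for b in ymers)
        total += sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
    return total
```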
52 The exact mismatch statistics Mi can then be obtained from Ci by subtracting the exact counts to compensate for the overcounting: Mi = Ci − Σ_{j=0}^{i−1} (k−j choose i−j) Mj , for i = 0, . . . , min(min(2m, k), k − 1) (7). [sent-115, score-0.667]
53 The last mismatch statistic Mk (needed when min(2m, k) = k) can be computed by subtracting the preceding statistics M0 , . . . , Mk−1 from the total number of k-mer pairs T: Mk = T − Σ_{j=0}^{k−1} Mj (8). [sent-118, score-0.654]
54 Our algorithm for mismatch kernel computation based on sufficient statistics is summarized in Algorithm 2. [sent-122, score-0.741]
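Assuming the cumulative statistics Ci, the intersection sizes Ii, and the total number of k-mer pairs are already available (for instance from the sketches above), the remaining steps of Algorithm 2, Equations 7, 8 and 6, can be sketched as follows; the function name is hypothetical.

```python
from math import comb

def mismatch_kernel_from_statistics(C, I, k, total_pairs):
    """C[i], I[i] given for i = 0..t with t = min(2m, k); total_pairs = (#k-mers in X) * (#k-mers in Y)."""
    t = len(C) - 1
    M = []
    for i in range(min(t, k - 1) + 1):                  # Equation 7
        M.append(C[i] - sum(comb(k - j, i - j) * M[j] for j in range(i)))
    if t == k:                                          # Equation 8: last statistic from the total count
        M.append(total_pairs - sum(M))
    return sum(M[i] * I[i] for i in range(t + 1))       # Equation 6
```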
55 The overall complexity of the algorithm is O(n c_{k,m}) with the constant c_{k,m} = Σ_{l=0}^{min(2m,k)} (k choose l)(k − l), independent of the size of the alphabet set, where (k choose l) is the number of rounds of counting sort used to evaluate the cumulative mismatch statistic Cl . [sent-123, score-1.176]
56 Algorithm 2 (Mismatch-SS), the mismatch kernel algorithm based on sufficient statistics. Input: strings X, Y with |X| = nx , |Y | = ny , parameters k, m, and pre-computed intersection values I. 1. [sent-125, score-0.56]
57 Compute the min(2m, k) cumulative matching statistics Ci using counting sort. 2. [sent-126, score-0.271]
58 Recover the exact statistics Mi from the Ci (Equations 7 and 8) and evaluate the kernel using Equation 6, K(X, Y |k, m) = Σ_i Mi Ii , where Ii is the pre-computed intersection size for Hamming distance i. Output: mismatch kernel value K(X, Y |k, m). 5 Extensions. Our algorithmic approach can also be applied to a variety of existing string kernels, leading to very efficient and simple algorithms that could benefit many applications. [sent-128, score-0.5]
59 The spectrum kernel [9] in our notation is the first sufficient statistic M0 , i.e. [sent-130, score-0.239]
60 K(X, Y |k) = M0 , which can be computed in k rounds of counting sort (i.e. in O(kn) time). [sent-132, score-0.226]
61 The gapped kernels [6] measure similarity between strings X and Y based on the co-occurrence of gapped instances g, |g| = k + m > k, of k-long substrings: K(X, Y |k, g) = Σ_{γ∈Σ^k} ( Σ_{g∈X, |g|=k+m} I(γ, g) ) ( Σ_{g∈Y, |g|=k+m} I(γ, g) ) (9), where I(γ, g) = 1 when γ is a subsequence of g. [sent-136, score-0.595]
62 Similar to the algorithmic approach for extracting cumulative mismatch statistics in Algorithm-2, to compute the gapped(g,k) kernel, we perform a single round of counting sort over k-mers contained in the g-mers. [sent-137, score-0.83]
63 This gives a very simple and efficient O((g choose k) kn) time algorithm for gapped kernel computations. [sent-138, score-0.29]
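A small sketch of Equation 9 follows; a Counter replaces the counting-sort round, and each length-g window contributes each of its distinct k-long subsequences at most once, matching I(γ, g) ∈ {0, 1}. Names are illustrative.

```python
from collections import Counter
from itertools import combinations

def gapped_kernel(X, Y, k, g):
    """Gapped(g, k) kernel of Equation 9; g is the window length (k + m in the text)."""
    def profile(S):
        counts = Counter()
        for i in range(len(S) - g + 1):
            window = S[i:i + g]
            # each distinct k-long subsequence of the window counts once: I(gamma, window) is 0 or 1
            counts.update({tuple(window[p] for p in pos)
                           for pos in combinations(range(g), k)})
        return counts
    cx, cy = profile(X), profile(Y)
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
```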
64 The wildcard(k,m) kernel [6] in our notation is the sum of the cumulative statistics, K(X, Y |k, m) = Σ_{i=0}^{m} Ci , i.e. [sent-140, score-0.196]
65 it can be computed in Σ_{i=0}^{m} (k choose i) rounds of counting sort, giving a simple and efficient O( Σ_{i=0}^{m} (k choose i)(k − i) n ) algorithm. [sent-142, score-0.14]
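Correspondingly, a self-contained sketch of the wildcard(k, m) kernel as the sum C0 + . . . + Cm, with a Counter again standing in for the counting-sort rounds:

```python
from collections import Counter
from itertools import combinations

def wildcard_kernel(xmers, ymers, k, m):
    """Wildcard(k, m) kernel as K = C_0 + C_1 + ... + C_m over two lists of k-mers."""
    total = 0
    for i in range(m + 1):
        for drop in combinations(range(k), i):          # one round per choice of wildcard positions
            keep = [p for p in range(k) if p not in drop]
            cx = Counter(tuple(a[p] for p in keep) for a in xmers)
            cy = Counter(tuple(b[p] for p in keep) for b in ymers)
            total += sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
    return total
```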
66 The spatial(k,t,d) kernel [13] can be computed by sorting kt-mers iteratively for every arrangement of t k-mers spatially constrained by distance d. [sent-144, score-0.164]
67 The sequence neighborhood kernels [7] proved to be a powerful tool in many sequence analysis tasks. [sent-146, score-0.421]
68 Note that the kernel value, if computed directly using Equation 10, incurs quadratic complexity in the size of the neighborhoods. [sent-148, score-0.225]
69 As in the single-string case, to compute the neighborhood kernel (over the string sets) we can jointly sort the observed k-mers in N (X) and N (Y ) and apply the desired kernel evaluation method (spectrum, mismatch, or gapped). [sent-149, score-0.91]
70 Under this setting, the neighborhood kernel can be evaluated in time linear in the neighborhood size. [sent-150, score-0.436]
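For the spectrum case, a sketch of this joint evaluation over string sets: the k-mers of all neighbors are pooled and matched once, instead of summing |N(X)| · |N(Y)| pairwise kernel values. The function name is an assumption of this sketch.

```python
from collections import Counter

def neighborhood_spectrum_kernel(NX, NY, k):
    """Spectrum kernel between two sets of sequences (the neighborhoods N(X) and N(Y))."""
    cx = Counter(s[i:i + k] for s in NX for i in range(len(s) - k + 1))
    cy = Counter(s[i:i + k] for s in NY for i in range(len(s) - k + 1))
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
```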
71 This leads to very efficient algorithms for computing sequence neighborhood kernels even for very large datasets, as we will show in the experimental section. [sent-151, score-0.39]
72 6 Evaluation We study the performance of our algorithms, both in running time and predictive accuracy, on synthetic data and standard benchmark datasets for protein sequence analysis and music genre classification. [sent-152, score-0.668]
73 The reduced running time requirements of our algorithms open the possibility of considering "looser" mismatch measures with larger k and m. [sent-153, score-0.712]
74 The results presented here demonstrate that such mismatch kernels with larger (k, m) can lead to state-of-the-art predictive performance even when compared with more complex models such as [2]. [sent-154, score-0.734]
75 For protein sequence classification under the semi-supervised setting, we also use the Protein Data Bank (PDB, 17,232 sequences), Swiss-Prot (101,602 sequences), and the non-redundant (NR) databases as the unlabeled datasets, following the setup of [17]. [sent-156, score-0.325]
76 For synthetic data, we generate strings of length n = 10^5 over alphabets of different sizes and measure the running time of the trie-based and our sufficient-statistics-based algorithms for evaluating the mismatch string kernel. [sent-165, score-1.181]
77 Figure 1 shows relative running time Ttrie /Tss , in logarithmic scale, of the mismatch-trie and mismatch-SS as a function of the alphabet size. [sent-166, score-0.294]
78 As can be seen from the plot, our algorithm demonstrates several orders of magnitude improvements, especially for large alphabet sizes. [sent-167, score-0.198]
79 Table 1 compares the running times of our algorithm and the trie-based algorithm on different real datasets (proteins, DNA, text, music) for computing a single kernel entry (pair of strings). [sent-168, score-0.225]
80 We observe speed improvements ranging from 100 to 10^6 times, depending on the alphabet size. [sent-169, score-0.233]
81 We also measure the running time for the full 7329-by-7329 mismatch(5,2) kernel matrix computation on the SCOP dataset under the supervised setting. [sent-170, score-0.222]
82 The running time of our algorithm is 1525 seconds compared to 196052 seconds for the trie-based computations. [sent-171, score-0.146]
83 We consider the tasks of multi-class music genre classification [16], with results in Table 2, and protein remote homology (superfamily) prediction [9, 2, 18], with results in Table 3. [sent-199, score-0.679]
84 On the music classification task, we observe significant improvements in accuracy for larger numbers of mismatches. [sent-201, score-0.16]
85 Remote protein homology detection, as evident from Table 3, clearly benefits from a larger number of allowed mismatches, because remotely related proteins are likely to be separated by multiple mutations or insertions/deletions. [sent-204, score-0.756]
86 In particular, the result on the Swiss-Prot dataset for the (7, 3)-mismatch kernel is very promising and compares well with the best results of the state-of-the-art, but computationally more demanding, profile kernels [2]. [sent-208, score-0.306]
87 This is an important result that addresses a main drawback of the neighborhood kernels, the running time [7, 2]. [sent-214, score-0.238]
88 Table 2: classification performance on the music genre classification (multi-class) dataset. Table 3: classification performance (ROC, ROC50) on protein remote homology prediction for the mismatch (5,1), (5,2), and (7,3) kernels. [sent-215, score-2.428]
89 For multi-class protein fold recognition (Table 4), we similarly observe improvements in performance for larger numbers of allowed mismatches. [sent-243, score-0.422]
90 The balanced error of 25% for the (7,3)-mismatch neighborhood kernel using Swiss-Prot compares well with the best error rate of 26. [sent-244, score-0.33]
91 This improvement makes string kernels with looser, approximate matching a viable alternative for practical sequence analysis tasks. [sent-309, score-0.511]
92 Our algorithms work with large databases in supervised and semi-supervised settings and scale well in the alphabet size and the number of allowed mismatches. [sent-310, score-0.312]
93 A machine learning information retrieval approach to protein fold recognition. [sent-313, score-0.349]
94 Profile-based string kernels for remote homology detection and motif extraction. [sent-317, score-0.606]
95 Fast string kernels using inexact matching for protein sequences. [sent-330, score-0.752]
96 The spectrum kernel: A string kernel for SVM protein classification. [sent-345, score-0.641]
97 Efficient computation of gapped substring kernels on large alphabets. [sent-361, score-0.35]
98 Fast protein homology and fold detection with sparse spatial sample kernels. [sent-368, score-0.503]
99 Multi-class protein fold recognition using support vector machines and neural networks. [sent-373, score-0.349]
100 On the role of local matching for efficient semisupervised protein sequence classification. [sent-387, score-0.346]
wordName wordTfidf (topN-words)
[('mismatch', 0.583), ('protein', 0.226), ('string', 0.215), ('alphabet', 0.198), ('mismatches', 0.182), ('iy', 0.173), ('kernels', 0.151), ('neighborhood', 0.142), ('gapped', 0.138), ('strings', 0.135), ('kernel', 0.126), ('music', 0.125), ('fold', 0.123), ('genre', 0.121), ('homology', 0.121), ('nx', 0.111), ('ix', 0.11), ('christina', 0.104), ('scop', 0.104), ('inexact', 0.104), ('intersection', 0.102), ('mi', 0.091), ('counting', 0.091), ('alphabets', 0.087), ('xix', 0.087), ('yiy', 0.087), ('ny', 0.086), ('remote', 0.086), ('sort', 0.086), ('pro', 0.081), ('le', 0.079), ('spectrum', 0.074), ('hamming', 0.074), ('running', 0.07), ('stafford', 0.069), ('sequence', 0.064), ('substring', 0.061), ('sequences', 0.059), ('matching', 0.056), ('complexity', 0.056), ('classi', 0.053), ('kuksa', 0.052), ('leslie', 0.049), ('rounds', 0.049), ('im', 0.048), ('vladimir', 0.047), ('nr', 0.047), ('pavel', 0.045), ('eugene', 0.045), ('william', 0.045), ('proteins', 0.045), ('size', 0.043), ('jason', 0.042), ('ci', 0.041), ('statistic', 0.039), ('deletions', 0.039), ('distance', 0.038), ('weston', 0.038), ('allowed', 0.038), ('cumulative', 0.038), ('insertions', 0.037), ('ie', 0.037), ('neighborhoods', 0.037), ('datasets', 0.036), ('improvements', 0.035), ('mj', 0.035), ('unlabeled', 0.035), ('eleazar', 0.035), ('eskin', 0.035), ('kuang', 0.035), ('mutational', 0.035), ('superfamily', 0.035), ('ttrie', 0.035), ('transformations', 0.034), ('detection', 0.033), ('similarity', 0.033), ('balanced', 0.033), ('algorithms', 0.033), ('statistics', 0.032), ('kn', 0.031), ('remotely', 0.03), ('wildcard', 0.03), ('rui', 0.03), ('pdb', 0.03), ('dna', 0.03), ('symbols', 0.029), ('table', 0.029), ('compares', 0.029), ('mutations', 0.028), ('substrings', 0.028), ('cation', 0.027), ('min', 0.027), ('cm', 0.026), ('exact', 0.026), ('time', 0.026), ('roc', 0.026), ('text', 0.026), ('seconds', 0.025), ('pairs', 0.025), ('looser', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
2 0.11096804 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
3 0.089696087 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
Author: Alexandre Bouchard-côté, Dan Klein, Michael I. Jordan
Abstract: Accurate and efficient inference in evolutionary trees is a central problem in computational biology. While classical treatments have made unrealistic site independence assumptions, ignoring insertions and deletions, realistic approaches require tracking insertions and deletions along the phylogenetic tree—a challenging and unsolved computational problem. We propose a new ancestry resampling procedure for inference in evolutionary trees. We evaluate our method in two problem domains—multiple sequence alignment and reconstruction of ancestral sequences—and show substantial improvement over the current state of the art. 1
4 0.087408789 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
Author: Jihun Hamm, Daniel D. Lee
Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1
5 0.075628988 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
6 0.069659308 239 nips-2008-Tighter Bounds for Structured Estimation
7 0.067684449 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
8 0.066019483 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
9 0.064665586 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
10 0.063746557 113 nips-2008-Kernelized Sorting
11 0.062106173 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
12 0.058801524 44 nips-2008-Characteristic Kernels on Groups and Semigroups
13 0.055256635 225 nips-2008-Supervised Bipartite Graph Inference
14 0.054256152 143 nips-2008-Multi-label Multiple Kernel Learning
15 0.051893361 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
16 0.050171636 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
17 0.048746426 78 nips-2008-Exact Convex Confidence-Weighted Learning
18 0.047160782 138 nips-2008-Modeling human function learning with Gaussian processes
19 0.046736442 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
20 0.045692794 62 nips-2008-Differentiable Sparse Coding
topicId topicWeight
[(0, -0.149), (1, -0.048), (2, -0.013), (3, 0.024), (4, 0.021), (5, -0.03), (6, 0.063), (7, -0.029), (8, 0.05), (9, -0.03), (10, 0.141), (11, -0.021), (12, -0.034), (13, -0.011), (14, 0.087), (15, 0.012), (16, 0.047), (17, -0.044), (18, -0.085), (19, 0.016), (20, 0.04), (21, 0.065), (22, 0.006), (23, -0.033), (24, -0.023), (25, 0.079), (26, -0.074), (27, -0.015), (28, 0.057), (29, 0.019), (30, -0.055), (31, 0.056), (32, 0.04), (33, -0.053), (34, -0.009), (35, 0.113), (36, 0.017), (37, -0.011), (38, 0.052), (39, 0.117), (40, -0.056), (41, 0.166), (42, -0.043), (43, -0.102), (44, 0.001), (45, 0.126), (46, -0.021), (47, -0.069), (48, 0.083), (49, 0.097)]
simIndex simValue paperId paperTitle
same-paper 1 0.93822473 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
2 0.63721281 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
3 0.63238585 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
4 0.60305762 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
Author: Jihun Hamm, Daniel D. Lee
Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1
5 0.55875993 225 nips-2008-Supervised Bipartite Graph Inference
Author: Yoshihiro Yamanishi
Abstract: We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. 1
6 0.51814473 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
7 0.50868016 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
8 0.47608027 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
9 0.46932033 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
10 0.46899998 44 nips-2008-Characteristic Kernels on Groups and Semigroups
11 0.45034239 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
12 0.42529136 113 nips-2008-Kernelized Sorting
13 0.41569579 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
14 0.37935936 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels
15 0.36408216 67 nips-2008-Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance
16 0.35093161 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems
17 0.341921 196 nips-2008-Relative Margin Machines
18 0.34002969 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
19 0.33196339 3 nips-2008-A Massively Parallel Digital Learning Processor
20 0.33062404 228 nips-2008-Support Vector Machines with a Reject Option
topicId topicWeight
[(6, 0.043), (7, 0.085), (12, 0.021), (28, 0.156), (54, 0.32), (57, 0.049), (59, 0.015), (63, 0.02), (71, 0.047), (77, 0.032), (78, 0.026), (83, 0.08), (87, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.75143719 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
2 0.70530194 121 nips-2008-Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement
Author: Michael T. Todd, Yael Niv, Jonathan D. Cohen
Abstract: Working memory is a central topic of cognitive neuroscience because it is critical for solving real-world problems in which information from multiple temporally distant sources must be combined to generate appropriate behavior. However, an often neglected fact is that learning to use working memory effectively is itself a difficult problem. The Gating framework [14] is a collection of psychological models that show how dopamine can train the basal ganglia and prefrontal cortex to form useful working memory representations in certain types of problems. We unite Gating with machine learning theory concerning the general problem of memory-based optimal control [5-6]. We present a normative model that learns, by online temporal difference methods, to use working memory to maximize discounted future reward in partially observable settings. The model successfully solves a benchmark working memory problem, and exhibits limitations similar to those observed in humans. Our purpose is to introduce a concise, normative definition of high level cognitive concepts such as working memory and cognitive control in terms of maximizing discounted future rewards. 1 I n t ro d u c t i o n Working memory is loosely defined in cognitive neuroscience as information that is (1) internally maintained on a temporary or short term basis, and (2) required for tasks in which immediate observations cannot be mapped to correct actions. It is widely assumed that prefrontal cortex (PFC) plays a role in maintaining and updating working memory. However, relatively little is known about how PFC develops useful working memory representations for a new task. Furthermore, current work focuses on describing the structure and limitations of working memory, but does not ask why, or in what general class of tasks, is it necessary. Borrowing from the theory of optimal control in partially observable Markov decision problems (POMDPs), we frame the psychological concept of working memory as an internal state representation, developed and employed to maximize future reward in partially observable environments. We combine computational insights from POMDPs and neurobiologically plausible models from cognitive neuroscience to suggest a simple reinforcement learning (RL) model of working memory function that can be implemented through dopaminergic training of the basal ganglia and PFC. The Gating framework is a series of cognitive neuroscience models developed to explain how dopaminergic RL signals can shape useful working memory representations [1-4]. Computationally this framework models working memory as a collection of past observations, each of which can occasionally be replaced with the current observation, and addresses the problem of learning when to update each memory element versus maintaining it. In the original Gating model [1-2] the PFC contained a unitary working memory representation that was updated whenever a phasic dopamine (DA) burst occurred (e.g., due to unexpected reward or novelty). That model was the first to connect working memory and RL via the temporal difference (TD) model of DA firing [7-8], and thus to suggest how working memory might serve a normative purpose. However, that model had limited computational flexibility due to the unitary nature of the working memory (i.e., a singleobservation memory controlled by a scalar DA signal). 
More recent work [3-4] has partially repositioned the Gating framework within the Actor/Critic model of mesostriatal RL [9-10], positing memory updating as but another cortical action controlled by the dorsal striatal
3 0.65876687 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
Author: Mohak Shah
Abstract: We derive risk bounds for the randomized classifiers in Sample Compression setting where the classifier-specification utilizes two sources of information viz. the compression set and the message string. By extending the recently proposed Occam’s Hammer principle to the data-dependent settings, we derive point-wise versions of the bounds on the stochastic sample compressed classifiers and also recover the corresponding classical PAC-Bayes bound. We further show how these compare favorably to the existing results.
4 0.53947514 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
Author: Liu Yang, Rong Jin, Rahul Sukthankar
Abstract: The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low density regions of the unlabeled data will help the classification. In such case, the cluster assumption may not be valid; and consequently how to leverage this type of unlabeled data to enhance the classification accuracy becomes a challenge. We introduce “Semi-supervised Learning with Weakly-Related Unlabeled Data” (SSLW), an inductive method that builds upon the maximum-margin approach, towards a better usage of weakly-related unlabeled information. Although the SSLW could improve a wide range of classification tasks, in this paper, we focus on text categorization with a small training pool. The key assumption behind this work is that, even with different topics, the word usage patterns across different corpora tends to be consistent. To this end, SSLW estimates the optimal wordcorrelation matrix that is consistent with both the co-occurrence information derived from the weakly-related unlabeled documents and the labeled documents. For empirical evaluation, we present a direct comparison with a number of stateof-the-art methods for inductive semi-supervised learning and text categorization. We show that SSLW results in a significant improvement in categorization accuracy, equipped with a small training set and an unlabeled resource that is weakly related to the test domain.
5 0.5393461 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
6 0.53763026 194 nips-2008-Regularized Learning with Networks of Features
7 0.53739637 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
8 0.53562069 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
10 0.53312784 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
11 0.53312755 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
12 0.53285468 196 nips-2008-Relative Margin Machines
13 0.53211379 99 nips-2008-High-dimensional support union recovery in multivariate regression
14 0.53195947 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
15 0.53066516 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
16 0.53004682 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
17 0.52988416 179 nips-2008-Phase transitions for high-dimensional joint support recovery
18 0.52943897 113 nips-2008-Kernelized Sorting
19 0.52894157 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
20 0.52858287 4 nips-2008-A Scalable Hierarchical Distributed Language Model