nips nips2003 nips2003-173 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie
Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Semi-supervised protein classification using cluster kernels Jason Weston∗ Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany weston@tuebingen. [sent-1, score-0.552]
2 Abstract A key issue in supervised protein classification is the representation of input sequences of amino acids. [sent-9, score-0.463]
3 Recent work using string kernels for protein data has achieved state-of-the-art classification performance. [sent-10, score-0.504]
4 However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. [sent-11, score-0.373]
5 In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. [sent-12, score-0.981]
6 We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. [sent-13, score-1.03]
7 We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. [sent-14, score-0.386]
8 1 Introduction A central problem in computational biology is the classification of proteins into functional and structural classes given their amino acid sequences. [sent-15, score-0.248]
9 The 3D structure that a protein assumes after folding largely determines its function in the cell. [sent-16, score-0.28]
10 However, it is far easier to determine experimentally the primary sequence of a protein than it is to solve the 3D structure. [sent-17, score-0.343]
11 The major methods for homology detection can be split into three basic groups: pairwise sequence comparison algorithms [1, 2], generative models for protein families [3, 4], and discriminative classifiers [5, 6, 7]. [sent-19, score-0.579]
12 Generative models such as profile hidden Markov models (HMMs) model positive examples of a protein family, but they can be trained iteratively using both positively labeled and unlabeled examples by pulling in close homologs and adding them to the positive set. [sent-25, score-1.243]
13 However, these classifiers still require an auxiliary method (such as PSI-BLAST) to handle unlabeled data: one generally adds predicted homologs of the positive training examples to the training set before training the classifier. [sent-28, score-0.765]
14 In practice, relatively little labeled data is available — approximately 30,000 proteins with known 3D structure, some belonging to families and superfamilies with only a handful of labeled members — whereas there are close to one million sequenced proteins, providing abundant unlabeled data. [sent-29, score-0.574]
15 New semi-supervised learning techniques should be able to make better use of this unlabeled data. [sent-30, score-0.249]
16 Recent work in semi-supervised learning has focused on changing the representation given to a classifier by taking into account the structure described by the unlabeled data [9, 10, 11]. [sent-31, score-0.267]
17 These works can be viewed as cases of cluster kernels, which produce similarity metrics based on the cluster assumption: namely, two points in the same “cluster” or region of high density should have a small distance to each other. [sent-32, score-0.375]
18 In this work, we investigate the use of cluster kernels for protein classification by developing two simple and scalable methods for modifying a base kernel. [sent-33, score-0.613]
19 The neighborhood kernel uses averaging over a neighborhood of sequences defined by a local sequence similarity measure, and the bagged kernel uses bagged clustering of the full sequence data set to modify the base kernel. [sent-34, score-1.701]
20 In both the semi-supervised and transductive settings, these techniques greatly improve classification performance when used with mismatch string kernels, and the techniques achieve equal or superior results to all previously presented cluster kernel methods that we tried. [sent-35, score-1.149]
21 Moreover, the neighborhood and bagged kernel approaches are far more computationally efficient than these competing methods. [sent-36, score-0.692]
22 2 Representations and kernels for protein sequences Proteins can be represented as variable length sequences, typically several hundred characters long, from the alphabet of 20 amino acids. [sent-37, score-0.554]
If we use kernel methods such as SVMs, which only need to compute inner products K(x, y) = ⟨Φ(x), Φ(y)⟩ for training and testing, then we can accomplish the above mapping using a kernel for sequence data. [sent-39, score-0.53]
These alignment-based scores do not define a positive definite kernel; however, one can use a feature representation based on the empirical kernel map Φ(x) = (d(x_1, x), ..., d(x_m, x)) over the m training sequences x_1, ..., x_m. [sent-43, score-0.369]
Note, however, that the method is slow, both because computing each SW score is O(|x|^2) and because computing each empirically mapped kernel value is O(m). [sent-51, score-0.253]
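As a concrete illustration, here is a minimal Python sketch of the empirical kernel map; the function names and the toy scoring function are ours, and any pairwise similarity d, such as an SW or BLAST score, can be substituted:

```python
import numpy as np

def empirical_kernel_map(train_seqs, d):
    """Return phi with phi(x) = (d(x_1, x), ..., d(x_m, x)) over the
    m training sequences, so K(x, y) = <phi(x), phi(y)> costs O(m)."""
    def phi(x):
        return np.array([d(xi, x) for xi in train_seqs])
    return phi

# Toy stand-in for an alignment score such as Smith-Waterman.
def toy_score(a, b):
    return sum(u == v for u, v in zip(a, b)) / max(len(a), len(b))

train = ["MKTAYIAK", "MKTGYIAR", "GDVEKGKK"]
phi = empirical_kernel_map(train, toy_score)
print(float(np.dot(phi("MKTAYIAR"), phi("MKTGYIAK"))))
```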
26 Another appealing idea is to derive the feature representation from a generative model for a protein family. [sent-52, score-0.396]
27 In the Fisher kernel method [5], one first builds a profile HMM for the positive training sequences, defining a log likelihood function log P (x|θ) for any protein sequence x. [sent-53, score-0.605]
This representation gives excellent classification results, but the Fisher scores must be computed by an O(|x|^2) forward-backward algorithm, making the kernel tractable but slow. [sent-55, score-0.298]
29 It is possible to construct useful kernels directly without explicitly depending on generative models by using string kernels. [sent-56, score-0.247]
30 For example, the mismatch kernel [6] is defined by a histogram-like feature map that uses mismatches to capture inexact string matching. [sent-57, score-0.874]
The mismatch kernel can be computed efficiently using a trie data structure: the complexity of calculating K(x, y) is O(c_K(|x| + |y|)), where c_K = k^{m+1} |A|^m. [sent-63, score-0.689]
32 For typical kernel parameters k = 5 and m = 1 [6], the mismatch kernel is fast, scalable and yields impressive performance. [sent-64, score-0.935]
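The following sketch makes the (k, m)-mismatch feature map concrete by brute-force neighbor enumeration; it is far slower than the trie of [6] and is meant only as an illustration:

```python
from itertools import product
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"  # 20-letter amino acid alphabet

def mismatch_neighbors(kmer, m, alphabet=AA):
    """All k-mers within Hamming distance m of kmer (naive enumeration;
    the trie of [6] avoids materializing this set)."""
    if m == 0:
        return {kmer}
    out = {kmer}
    for i in range(len(kmer)):
        for a in alphabet:
            if a != kmer[i]:
                out |= mismatch_neighbors(kmer[:i] + a + kmer[i+1:], m - 1, alphabet)
    return out

def mismatch_features(x, k=5, m=1):
    """Histogram over k-mers: each k-mer of x votes for all its
    m-mismatch neighbors."""
    feats = Counter()
    for i in range(len(x) - k + 1):
        for nb in mismatch_neighbors(x[i:i+k], m):
            feats[nb] += 1
    return feats

def mismatch_kernel(x, y, k=5, m=1):
    fx, fy = mismatch_features(x, k, m), mismatch_features(y, k, m)
    return sum(v * fy[u] for u, v in fx.items())
```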
33 Many other interesting models and examples of string kernels have recently been presented. [sent-65, score-0.246]
34 A survey of related string kernel work is given in the longer version of this paper. [sent-66, score-0.317]
35 String kernel methods with SVMs are a powerful approach to protein classification and have consistently performed better than non-discriminative techniques [5, 7, 6]. [sent-67, score-0.536]
36 However, in a real-world setting, protein classifiers have access to unlabeled data. [sent-68, score-0.5]
37 3 Cluster kernels for protein sequences In semi-supervised learning, one tries to improve a classifier trained on labeled data by exploiting (a relatively large set of) unlabeled data. [sent-70, score-0.761]
38 Although classifiers implement the cluster assumption in various ways, we focus on classifiers that re-represent the given data to reflect structure revealed by unlabeled data. [sent-74, score-0.398]
39 If one is using kernels, rather than explicit feature vectors, one can modify the kernel representation by constructing a cluster kernel. [sent-76, score-0.439]
40 In [10], a general framework is presented for producing cluster kernels by modifying the eigenspectrum of the kernel matrix. [sent-77, score-0.478]
41 Two of the main methods presented are the random walk kernel and the spectral clustering kernel. [sent-78, score-0.447]
42 The random walk kernel is a normalized and symmetrized version of a transition matrix corresponding to a t-step random walk. [sent-79, score-0.338]
The random walk representation described in [11] interprets an RBF kernel as the transition matrix of a random walk on a graph with vertices x_i, where P(x_i → x_j) = K_ij / Σ_p K_ip. [sent-80, score-0.455]
44 The spectral clustering kernel is a simple use of the representation derived from spectral clustering [13] using the first k eigenvectors. [sent-87, score-0.465]
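A minimal sketch of the random walk construction as we read it from [10, 11]; the exact normalization and symmetrization used in those papers may differ in detail:

```python
import numpy as np

def random_walk_kernel(K, t=2):
    """One plausible reading of the t-step random walk kernel: view the
    (RBF) kernel matrix K over labeled + unlabeled points as graph
    affinities, normalize symmetrically, and take the t-th power so the
    result reflects t-step transition probabilities."""
    d = K.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ K @ D_inv_sqrt   # symmetrized transition matrix
    return np.linalg.matrix_power(L, t)
```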
45 A serious problem with these methods is that one must diagonalize a matrix the size of the set of labeled and unlabeled data. [sent-95, score-0.304]
46 Other methods of implementing the cluster assumption such as transductive SVMs [14] also suffer from computational efficiency issues. [sent-96, score-0.31]
A second drawback is that these kernels are better suited to a transductive setting (where one is given both the unlabeled and test points in advance) than to a semi-supervised setting. [sent-97, score-0.488]
48 In order to estimate the kernel for a sequence not present during training, one is forced to solve a difficult regression problem [10]. [sent-98, score-0.269]
4 The neighborhood mismatch kernel We wish to transfer the notion of profiles to our mismatch representation of protein sequences. [sent-102, score-0.81]
We use a standard sequence similarity measure like BLAST or PSI-BLAST to define a neighborhood Nbd(x) for each input sequence x as the set of sequences x′ with similarity score to x below a fixed E-value threshold, together with x itself. [sent-103, score-0.563]
The neighborhood kernel is then defined by K_nbd(x, y) = (1 / (|Nbd(x)| |Nbd(y)|)) Σ_{x′ ∈ Nbd(x), y′ ∈ Nbd(y)} K_orig(x′, y′). [sent-105, score-0.437]
We will see in the experimental results that this simple neighborhood-averaging technique, used in a semi-supervised setting with the mismatch kernel, dramatically improves classification performance. [sent-106, score-0.507]
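A sketch of the neighborhood averaging, assuming the BLAST/PSI-BLAST neighbor lists have already been computed; the index representation and helper name are ours:

```python
import numpy as np

def neighborhood_kernel(K_orig, nbrs):
    """K_nbd(i, j): average of K_orig over Nbd(x_i) x Nbd(x_j).
    nbrs[i] lists the indices of Nbd(x_i), i.e. the E-value neighbors
    of x_i among the training + unlabeled sequences, including i."""
    n = K_orig.shape[0]
    K_nbd = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K_nbd[i, j] = K_orig[np.ix_(nbrs[i], nbrs[j])].mean()
    return K_nbd
```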
53 To see how the neighborhood approach fits with the cluster assumption, consider a set of points in feature space that form a “cluster” or dense region of the data set, and consider the region R formed by the union of the convex hulls of the neighborhood point sets. [sent-107, score-0.668]
If the dissimilarity measure is a true distance, the neighborhood-averaged vector Φ_nbd(x) stays inside the convex hull of the vectors in its neighborhood, and all the neighborhood-averaged vectors stay within region R. [sent-108, score-0.462]
55 5 The bagged mismatch kernel There exist a number of clustering techniques that are much more efficient than the methods mentioned in Section 3. [sent-111, score-1.044]
Bagged k-means is run n times, giving a cluster assignment c_p(x) for each run p = 1, ..., n, and the bagged kernel K_bag(x, y) = (1/n) Σ_p [c_p(x) = c_p(y)] records the fraction of runs in which x and y fall in the same cluster. Take the product between the original and bagged kernel: K(x, y) = K_bag(x, y) · K_orig(x, y). [sent-122, score-0.255]
57 The intuition behind the approach is that the original kernel is rescaled by the “probability” that two points are in the same cluster, hence encoding the cluster assumption. [sent-131, score-0.385]
To estimate the kernel on a test sequence x in a semi-supervised setting, one can assign x to the nearest cluster in each of the bagged runs to compute K_bag(x, x_i). [sent-132, score-0.718]
We apply the bagged kernel method with K_orig as the mismatch kernel and K_bag built using PSI-BLAST. [sent-133, score-1.15]
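A sketch of the bagged kernel computation, with sklearn's k-means standing in for the bagged clustering procedure (the paper's exact bagging scheme is not reproduced here, and the input X is assumed to be a vector representation of all sequences, e.g. a PSI-BLAST empirical kernel map):

```python
import numpy as np
from sklearn.cluster import KMeans

def bagged_mismatch_kernel(X, K_orig, k=100, n_runs=100, seed=0):
    """K_bag(i, j) = fraction of the n_runs clusterings that put i and
    j in the same cluster; the final kernel is the elementwise product
    K_bag * K_orig."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    K_bag = np.zeros((n, n))
    for _ in range(n_runs):
        labels = KMeans(n_clusters=k, n_init=1,
                        random_state=int(rng.randint(2**31 - 1))).fit_predict(X)
        K_bag += (labels[:, None] == labels[None, :])
    return (K_bag / n_runs) * K_orig
```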
6 Experiments We measure the recognition performance of cluster kernel methods by testing their ability to classify protein domains into superfamilies in the Structural Classification of Proteins (SCOP) [15]. [sent-134, score-0.605]
61 Data set sequences that were neither in the training nor test sets for experiments from [7] are included as unlabeled data. [sent-141, score-0.339]
Our first experiment shows that the neighborhood mismatch kernel makes better use of unlabeled data than the baseline method of “pulling in homologs” prior to training the SVM classifier, that is, simply finding close homologs of the positive training examples in the unlabeled set and adding them to the positive training set for the SVM.
Figure 1: Comparison of protein representations and classifiers using unlabeled data (per-family ROC-50 of the Neighborhood Mismatch(5,1) kernel versus Mismatch(5,1)+homologs, using PSI-BLAST for homologs and neighborhoods).
The mismatch kernel is used to represent proteins, with close homologs being pulled in from the unlabeled set with PSI-BLAST. [sent-172, score-1.341]
Building a neighborhood with the neighborhood mismatch kernel improves over the baseline of pulling in homologs. [sent-173, score-1.279]
Table: ROC-50 and ROC scores for the mismatch kernel, the mismatch kernel + homologs, and the neighborhood mismatch kernel, with BLAST-based homologs and neighborhoods.
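For reference, one way to compute the ROC-50 score reported above; this is our reading of the metric (area under the ROC curve up to the first 50 false positives, normalized to [0, 1]):

```python
import numpy as np

def roc_n(scores, labels, max_fp=50):
    """ROC-n score: rank examples by decreasing score and accumulate
    true-positive counts until max_fp false positives are seen."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(labels)[order]
    n_pos = int((y == 1).sum())
    if n_pos == 0:
        return 0.0
    tp = fp = area = 0
    for yi in y:
        if yi == 1:
            tp += 1
        else:
            fp += 1
            area += tp        # one false-positive step of height tp
            if fp == max_fp:
                break
    return area / (max_fp * n_pos)
```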
69 Homologs come from the unlabeled set (not the test set), and “neighbors” for the neighborhood kernel come from the training plus unlabeled data. [sent-188, score-0.911]
70 We compare the methods using the mismatch kernel representation with k = 5 and m = 1, as used in [6]. [sent-189, score-0.757]
71 The neighborhood mismatch kernel uses the same threshold to choose neighborhoods. [sent-192, score-0.92]
For the neighborhood kernel, we normalize before and after the averaging operation via K_ij ← K_ij / √(K_ii K_jj). [sent-193, score-0.25]
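This normalization is one line of code; a sketch:

```python
import numpy as np

def normalize_kernel(K):
    """K_ij <- K_ij / sqrt(K_ii * K_jj); applied to the base kernel
    before averaging and to the neighborhood kernel afterwards."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)
```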
A signed rank test shows that the neighborhood mismatch kernel yields significant improvement over adding homologs (p-value 3. [sent-196, score-1.375]
74 However, the comparison of methods in a truly inductive setting using BLAST shows the same improvement of the neighborhood mismatch kernel over adding homologs (p-value 8. [sent-199, score-1.414]
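The signed rank test can be reproduced with scipy's Wilcoxon routine on paired per-family ROC-50 scores; the numbers below are toy values, not the paper's results:

```python
from scipy.stats import wilcoxon

# Toy per-family ROC-50 pairs for the two methods.
roc50_homologs = [0.62, 0.55, 0.71, 0.40, 0.58, 0.66, 0.49]
roc50_neighborhood = [0.70, 0.60, 0.75, 0.52, 0.61, 0.74, 0.55]
stat, p = wilcoxon(roc50_neighborhood, roc50_homologs)
print(p)  # small p-value -> significant per-family improvement
```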
75 Adding homologs to the (much larger) negative training set in addition to pulling in the positive homologs gives poorer performance than only adding the positive homologs (results not shown). [sent-201, score-1.435]
76 In the following experiments, we consider a transductive setting, in which the test points are given to the methods in advance as unlabeled data, giving slightly improved results over the last section. [sent-203, score-0.389]
Although this setting is unrealistic for a real protein classification system, it more easily enables comparison with random walk and spectral clustering kernels, which do not easily apply outside the transductive setting. [sent-204, score-0.542]
78 In Figure 2 (left), we again show the mismatch kernel compared with pulling in homologs and the neighborhood kernel. [sent-205, score-1.447]
Figure 2: Comparison of protein representations and classifiers using unlabeled data in a transductive setting (ROC-50 comparisons; legend: PSI-BLAST + close homologs; spectral cluster, k=100; random walk, t=2).
Neighborhood and bagged mismatch kernels outperform pulling in close homologs (left) and match or outperform previous semi-supervised methods (right). [sent-215, score-1.478]
Table: ROC-50 scores for the mismatch kernel, mismatch kernel + homologs, neighborhood mismatch kernel, and bagged mismatch kernels (k = 100 and k = 400), together with the PSI-BLAST kernel, PSI-BLAST+homologs kernel, spectral clustering kernel, random walk kernel, and transductive SVM.
For the bagged kernel, we used bagged k-means with k = 100 and n = 100 runs, which gave the best results. [sent-237, score-0.255]
Neither the spectral clustering kernel nor the random walk kernel works well for the mismatch kernel (see online supplement), perhaps because the feature vectors are so orthogonal. [sent-241, score-0.737]
86 However, for a PSI-BLAST representation via empirical kernel map, the random walk outperforms pulling in homologs. [sent-242, score-0.495]
87 For pulling in close homologs, we take the empirical kernel map only for points in the training set and the chosen close homologs. [sent-249, score-0.476]
A signed rank test (threshold 0.05) finds no significant difference between the neighborhood kernel, the bagged kernel (k = 100), and the random walk kernel in this transductive setting. [sent-253, score-1.123]
7 Discussion Two of the most important issues in protein classification are representation of sequences and handling unlabeled data. [sent-255, score-0.651]
90 Two developments in recent kernel methods research, string kernels and cluster kernels, address these issues separately. [sent-256, score-0.61]
91 We have described two kernels — the neighborhood mismatch kernel and the bagged mismatch kernel — that combine both approaches and yield state-of-the-art performance in protein classification. [sent-257, score-2.257]
92 Practical use of semi-supervised protein classification techniques requires computational efficiency. [sent-258, score-0.309]
93 Many cluster kernels require diagonalization of the full labeled plus unlabeled data kernel matrix. [sent-259, score-0.761]
94 The neighborhood and bagged kernel approaches, used with an efficient string kernel, are fast and scalable cluster kernels for sequence data. [sent-260, score-1.178]
95 Future work will deal with additional challenges of protein classification: addressing the full multi-class problem, which potentially involves thousands of classes; handling very small classes with few homologs; and dealing with missing classes, for which no labeled examples exist. [sent-262, score-0.402]
96 Acknowledgments We would like to thank Eleazar Eskin for discussions that contributed to the neighborhood kernel and Olivier Chapelle and Navin Lal for their help with this work. [sent-263, score-0.437]
Hidden Markov models in computational biology: Applications to protein modeling. [sent-286, score-0.28]
98 Combining pairwise sequence similarity and support vector machines for remote protein homology detection. [sent-316, score-0.526]
99 Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. [sent-330, score-0.31]
100 Learning from labeled and unlabeled data with label propagation. [sent-335, score-0.283]
wordName wordTfidf (topN-words)
[('mismatch', 0.483), ('homologs', 0.399), ('protein', 0.28), ('bagged', 0.255), ('neighborhood', 0.231), ('unlabeled', 0.22), ('blast', 0.208), ('kernel', 0.206), ('cluster', 0.159), ('nbd', 0.144), ('pulling', 0.128), ('kernels', 0.113), ('string', 0.111), ('transductive', 0.111), ('roc', 0.104), ('walk', 0.096), ('sequences', 0.085), ('proteins', 0.082), ('families', 0.081), ('classi', 0.08), ('biology', 0.068), ('homology', 0.064), ('kbag', 0.064), ('sequence', 0.063), ('labeled', 0.063), ('spectral', 0.056), ('remote', 0.053), ('psi', 0.051), ('amino', 0.051), ('clustering', 0.05), ('molecular', 0.049), ('korig', 0.048), ('scop', 0.048), ('representation', 0.047), ('score', 0.047), ('scores', 0.045), ('sw', 0.044), ('cp', 0.044), ('scalable', 0.04), ('alignment', 0.037), ('similarity', 0.037), ('xi', 0.035), ('xj', 0.035), ('svms', 0.035), ('ers', 0.034), ('training', 0.034), ('weston', 0.034), ('kij', 0.033), ('close', 0.033), ('adding', 0.032), ('altschul', 0.032), ('gapped', 0.032), ('hubbard', 0.032), ('noble', 0.032), ('superfamilies', 0.032), ('supplemental', 0.032), ('pro', 0.031), ('cation', 0.031), ('database', 0.03), ('pairwise', 0.029), ('structural', 0.029), ('techniques', 0.029), ('kip', 0.028), ('feature', 0.027), ('fisher', 0.025), ('mismatches', 0.025), ('alphabet', 0.025), ('eskin', 0.025), ('svm', 0.025), ('signed', 0.024), ('setting', 0.024), ('generative', 0.023), ('outperform', 0.023), ('examples', 0.022), ('miller', 0.022), ('map', 0.022), ('positive', 0.022), ('methods', 0.021), ('leslie', 0.021), ('dii', 0.021), ('kd', 0.021), ('representations', 0.021), ('ak', 0.02), ('concerning', 0.02), ('le', 0.02), ('points', 0.02), ('averaging', 0.019), ('zhou', 0.019), ('assumption', 0.019), ('handling', 0.019), ('appealing', 0.019), ('classes', 0.018), ('comparison', 0.018), ('chapelle', 0.018), ('ck', 0.018), ('bingen', 0.018), ('ef', 0.018), ('random', 0.018), ('advance', 0.017), ('planck', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie
Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1
2 0.18670644 113 nips-2003-Learning with Local and Global Consistency
Author: Dengyong Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, Bernhard Schölkopf
Abstract: We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data. 1
3 0.17114566 112 nips-2003-Learning to Find Pre-Images
Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir
Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1
4 0.13228637 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen
Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1
5 0.10895456 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
Author: Pedro J. Moreno, Purdy P. Ho, Nuno Vasconcelos
Abstract: Over the last years significant efforts have been made to develop kernels that can be applied to sequence data such as DNA, text, speech, video and images. The Fisher Kernel and similar variants have been suggested as good ways to combine an underlying generative model in the feature space and discriminant classifiers such as SVM’s. In this paper we suggest an alternative procedure to the Fisher kernel for systematically finding kernel functions that naturally handle variable length sequence data in multimedia domains. In particular for domains such as speech and images we explore the use of kernel functions that take full advantage of well known probabilistic models such as Gaussian Mixtures and single full covariance Gaussian models. We derive a kernel distance based on the Kullback-Leibler (KL) divergence between generative models. In effect our approach combines the best of both generative and discriminative methods and replaces the standard SVM kernels. We perform experiments on speaker identification/verification and image classification tasks and show that these new kernels have the best performance in speaker verification and mostly outperform the Fisher kernel based SVM’s and the generative classifiers in speaker identification and image classification. 1
6 0.095779218 126 nips-2003-Measure Based Regularization
7 0.089572169 46 nips-2003-Clustering with the Connectivity Kernel
8 0.085499726 107 nips-2003-Learning Spectral Clustering
9 0.07930553 73 nips-2003-Feature Selection in Clustering Problems
10 0.074758172 63 nips-2003-Error Bounds for Transductive Learning via Compression and Clustering
11 0.074565828 150 nips-2003-Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
12 0.067368068 82 nips-2003-Geometric Clustering Using the Information Bottleneck Method
13 0.066196181 111 nips-2003-Learning the k in k-means
14 0.062786132 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
15 0.062586755 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
16 0.059085801 176 nips-2003-Sequential Bayesian Kernel Regression
17 0.058894377 172 nips-2003-Semi-Supervised Learning with Trees
18 0.056080349 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression
19 0.054379746 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
20 0.053397905 48 nips-2003-Convex Methods for Transduction
topicId topicWeight
[(0, -0.189), (1, -0.131), (2, -0.021), (3, -0.048), (4, -0.06), (5, 0.097), (6, -0.119), (7, 0.054), (8, 0.056), (9, 0.07), (10, 0.108), (11, 0.025), (12, -0.067), (13, -0.079), (14, 0.162), (15, 0.028), (16, 0.095), (17, 0.105), (18, -0.066), (19, -0.086), (20, 0.012), (21, 0.043), (22, 0.005), (23, 0.02), (24, -0.057), (25, -0.096), (26, -0.06), (27, 0.185), (28, -0.041), (29, -0.107), (30, -0.038), (31, -0.088), (32, -0.062), (33, 0.078), (34, 0.005), (35, 0.053), (36, 0.101), (37, -0.02), (38, -0.078), (39, -0.123), (40, -0.101), (41, 0.028), (42, -0.209), (43, 0.018), (44, -0.064), (45, -0.036), (46, -0.143), (47, 0.067), (48, 0.007), (49, 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.9503181 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie
Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1
2 0.71265197 112 nips-2003-Learning to Find Pre-Images
Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir
Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1
3 0.710931 113 nips-2003-Learning with Local and Global Consistency
Author: Dengyong Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, Bernhard Schölkopf
Abstract: We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data. 1
4 0.618577 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen
Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1
5 0.54536283 126 nips-2003-Measure Based Regularization
Author: Olivier Bousquet, Olivier Chapelle, Matthias Hein
Abstract: We address in this paper the question of how the knowledge of the marginal distribution P (x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations. 1
6 0.44900677 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
7 0.43194285 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
8 0.42323834 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
9 0.42042983 176 nips-2003-Sequential Bayesian Kernel Regression
10 0.4041124 46 nips-2003-Clustering with the Connectivity Kernel
11 0.40120676 63 nips-2003-Error Bounds for Transductive Learning via Compression and Clustering
12 0.39199889 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
13 0.38591179 107 nips-2003-Learning Spectral Clustering
14 0.35616702 111 nips-2003-Learning the k in k-means
15 0.34662917 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
16 0.34059095 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression
17 0.33838513 172 nips-2003-Semi-Supervised Learning with Trees
18 0.33628312 99 nips-2003-Kernels for Structured Natural Language Data
19 0.33245483 48 nips-2003-Convex Methods for Transduction
20 0.33118862 60 nips-2003-Eigenvoice Speaker Adaptation via Composite Kernel Principal Component Analysis
topicId topicWeight
[(0, 0.039), (11, 0.417), (30, 0.011), (35, 0.033), (49, 0.021), (53, 0.084), (71, 0.052), (76, 0.071), (85, 0.06), (91, 0.091)]
simIndex simValue paperId paperTitle
1 0.9869765 11 nips-2003-A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors
Author: Masakazu Yagi, Hideo Yamasaki, Tadashi Shibata
Abstract: A mixed-signal image filtering VLSI has been developed aiming at real-time generation of edge-based image vectors for robust image recognition. A four-stage asynchronous median detection architecture based on analog digital mixed-signal circuits has been introduced to determine the threshold value of edge detection, the key processing parameter in vector generation. As a result, a fully seamless pipeline processing from threshold detection to edge feature map generation has been established. A prototype chip was designed in a 0.35-µm double-polysilicon three-metal-layer CMOS technology and the concept was verified by the fabricated chip. The chip generates a 64-dimension feature vector from a 64x64-pixel gray scale image every 80µsec. This is about 10^4 times faster than the software computation, making a real-time image recognition system feasible. 1
2 0.93904984 37 nips-2003-Automatic Annotation of Everyday Movements
Author: Deva Ramanan, David A. Forsyth
Abstract: This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The 3D motion capture data is manually annotated off-line using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.
3 0.90985 88 nips-2003-Image Reconstruction by Linear Programming
Author: Koji Tsuda, Gunnar Rätsch
Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1 -norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. Especially, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP leading a method which is easy to use. Experimental results impressively demonstrate the power of our approach.
same-paper 4 0.846313 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie
Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1
5 0.68799543 12 nips-2003-A Model for Learning the Semantics of Pictures
Author: Victor Lavrenko, R. Manmatha, Jiwoon Jeon
Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allow us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval. 1
6 0.65706313 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution
7 0.57679474 168 nips-2003-Salient Boundary Detection using Ratio Contour
8 0.5145663 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors
9 0.50152004 112 nips-2003-Learning to Find Pre-Images
10 0.49100843 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression
11 0.48604295 164 nips-2003-Ranking on Data Manifolds
12 0.48578423 119 nips-2003-Local Phase Coherence and the Perception of Blur
13 0.48409832 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion
14 0.48334998 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
15 0.48314852 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence
16 0.48088646 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models
17 0.47551253 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
18 0.46997523 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
19 0.46326858 120 nips-2003-Locality Preserving Projections
20 0.46005851 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples