nips nips2001 nips2001-63 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. [sent-11, score-0.855]
2 Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. [sent-12, score-0.539]
3 The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. [sent-13, score-0.601]
4 Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). [sent-14, score-0.327]
5 1 Introduction Support Vector Machine (SVM) [1] is one of the latest and most successful statistical pattern classifiers that utilize a kernel technique [2, 3]. [sent-15, score-0.24]
6 Despite the successful applications of SVMs to pattern-recognition problems such as character recognition and text classification, SVMs have rarely been applied to speech recognition. [sent-17, score-0.825]
7 This is because the SVM assumes that each sample is a vector of fixed dimension, and hence it cannot deal with variable-length sequences directly. [sent-18, score-0.207]
8 Because of this, most of the efforts made so far to apply SVMs to speech recognition employ linear time normalization, where input feature-vector sequences with different lengths are aligned to the same length [4]. [sent-19, score-0.732]
9 A variant of this approach is a hybrid of SVM and HMM (hidden Markov model), in which HMM works as a pre-processor to feed time-aligned fixed-dimensional vectors to SVM [5]. [sent-20, score-0.035]
10 Another approach is to utilize probabilistic generative models as an SVM kernel function. [sent-21, score-0.267]
11 This includes the Fisher kernels [6, 7], and conditional symmetric independence (CSI) kernels [8], both of which employ HMMs as the generative models. [sent-22, score-0.191]
12 Since HMMs can handle sequential patterns, an SVM that employs HMM-based generative models can handle sequential patterns as well. [sent-23, score-0.204]
13 In contrast to those approaches, our approach is a direct extension of the original SVM to the case of variable-length sequences. [sent-24, score-0.135]
14 The idea is to incorporate the operation of dynamic time alignment into the kernel function itself. [sent-25, score-0.459]
15 Because of this, the proposed new SVM is called the “Dynamic Time-Alignment Kernel SVM (DTAK-SVM)”. [sent-26, score-0.045]
16 Unlike the SVM with the Fisher kernel, which requires two training stages with different training criteria (one for training the generative models and a second for training the SVM), the DTAK-SVM uses a single training criterion, just like the original SVM. [sent-27, score-0.678]
17 2 Dynamic Time-Alignment Kernel We consider a sequence of vectors X = (x_1, x_2, ..., x_L), where x_i ∈ R^n and L is the length of the sequence; the notation |X| is sometimes used to denote the length of the sequence instead. [sent-28, score-0.323]
18 For simplicity, we first assume the so-called linear SVM, which does not employ a non-linear mapping function φ. [sent-29, score-0.089]
19 In that case, the kernel operation in (1) is identical to the inner-product operation. [sent-30, score-0.531]
20 2.1 Formulation for linear kernel Assume that we have two vector sequences X and V. [sent-32, score-0.323]
21 If |X| = |V| = L, then the inner product between X and V can be obtained easily as the summation of the frame-wise inner products between x_k and v_k for k = 1, ..., L: X · V = Σ_{k=1}^{L} x_k · v_k, (2) and therefore an SVM classifier can be defined as given in (1). [sent-35, score-0.586]
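For concreteness, a minimal numeric sketch of (2) in Python (the frame vectors below are hypothetical and not taken from the paper):

import numpy as np

# Two hypothetical sequences of 2-dimensional frame vectors, both of length L = 3.
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
V = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]])

# Eq. (2): sum the frame-wise inner products x_k · v_k over k = 1..L.
frame_dots = np.einsum('kd,kd->k', X, V)   # [0.5, 0.5, 1.0]
print(float(frame_dots.sum()))             # 2.0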
22 On the other hand, when the two sequences differ in length, the inner product cannot be calculated directly. [sent-36, score-0.328]
23 Even in such a case, however, an inner-product-like operation can be defined if we align the lengths of the patterns. [sent-37, score-0.394]
24 To that end, let ψ(k) and θ(k) be the time-warping functions of the normalized time frame k for the patterns X and V, respectively, and let “◦” denote the new inner-product operator that replaces the original inner product “·”. [sent-38, score-0.622]
25 Then the new inner product between the two vector sequences X and V can be given by X ◦ V = (1/L) Σ_{k=1}^{L} x_{ψ(k)} · v_{θ(k)}, (3) where L is a normalized length that can be either |X|, |V|, or an arbitrary positive integer. [sent-39, score-0.466]
26 As can be seen from the definition given above, the linear warping function is not suitable for continuous speech recognition, i.e. [sent-43, score-0.477]
27 frame-synchronous processing, because the sequence lengths |X| and |V| would have to be known beforehand. [sent-45, score-0.043]
28 On the other hand, non-linear time warping, or in other words dynamic time warping (DTW) [9], enables frame-synchronous processing. [sent-46, score-0.404]
29 Furthermore, past research on speech recognition has shown that non-linear time normalization yields better recognition performance than linear time normalization. [sent-47, score-0.663]
30 For these reasons, we focus on non-linear time warping based on DTW. [sent-48, score-0.275]
31 In the standard DTW, the normalizing factor M_{ψθ} is given as Σ_{k=1}^{L} m(k), and the weighting coefficients m(k) are chosen so that M_{ψθ} is independent of the warping functions. [sent-50, score-0.337]
32 The above optimization problem can be solved efficiently by dynamic programming. [sent-51, score-0.129]
33 The recursive formula of the dynamic programming employed in the present study is as follows: G(i, j) = max{ G(i−1, j) + Inp(i, j), G(i−1, j−1) + 2·Inp(i, j), G(i, j−1) + Inp(i, j) }, (6) where Inp(i, j) is the standard inner product between the two vectors corresponding to points i and j. [sent-52, score-0.534]
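As a rough sketch (not the authors' implementation), recursion (6) can be computed as below; the symmetric step weights m ∈ {1, 2} follow (6), and the normalizer is assumed here to be the standard symmetric choice M_{ψθ} = |X| + |V|, which is independent of the warping path as required in the text. Inp is left as a pluggable frame-level similarity.

import numpy as np

def dtak_dp(X, V, inp):
    # X, V: sequences of frame vectors with shapes (Lx, n) and (Lv, n).
    # inp:  frame-level similarity Inp(x, v); a plain dot product in the
    #       linear case, a frame-level kernel K(x, v) in the non-linear case.
    Lx, Lv = len(X), len(V)
    G = np.full((Lx, Lv), -np.inf)
    G[0, 0] = 2.0 * inp(X[0], V[0])              # start cell, diagonal weight m = 2
    for i in range(Lx):
        for j in range(Lv):
            if i == 0 and j == 0:
                continue
            s = inp(X[i], V[j])
            cands = []
            if i > 0:
                cands.append(G[i - 1, j] + s)            # vertical step, m = 1
            if j > 0:
                cands.append(G[i, j - 1] + s)            # horizontal step, m = 1
            if i > 0 and j > 0:
                cands.append(G[i - 1, j - 1] + 2.0 * s)  # diagonal step, m = 2
            G[i, j] = max(cands)
    return G[-1, -1] / (Lx + Lv)                 # normalize by M = |X| + |V| (assumed)

# Linear case of Section 2.1: Inp is the ordinary inner product,
# e.g. dtak_dp(X, V, dot) with
dot = lambda x, v: float(np.dot(x, v))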
34 2.2 Formulation for non-linear kernel In the last subsection, a linear kernel, i.e. [sent-55, score-0.21]
35 the inner product, for two vector sequences with different lengths was formulated in the framework of dynamic time warping. [sent-57, score-0.466]
36 With a small constraint, a similar formulation is possible for the case where the SVM's non-linear mapping function Φ is applied to the vector sequences. [sent-58, score-0.127]
37 To that end, Φ is restricted to the one having the following form: Φ(X) = (φ(x_1), φ(x_2), ..., φ(x_L)), (8) where φ is a non-linear mapping function that is applied to each frame vector x_i, as given in (1). [sent-59, score-0.122]
38 It should be noted that, under the above restriction, Φ preserves the original length of the sequence at the cost of losing long-term correlations such as the one between x_1 and x_L. [sent-60, score-0.178]
39 As a result, a new class of kernel can be defined by using the extended inner product introduced in the previous section: K_s(X, V) = Φ(X) ◦ Φ(V) (9) = max_{ψ,θ} (1/M_{ψθ}) Σ_{k=1}^{L} m(k) φ(x_{ψ(k)}) · φ(v_{θ(k)}) (10) = max_{ψ,θ} (1/M_{ψθ}) Σ_{k=1}^{L} m(k) K(x_{ψ(k)}, v_{θ(k)}). (11) [sent-61, score-0.597]
40 We call this new kernel the “dynamic time-alignment kernel (DTAK)”. [sent-62, score-0.42]
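Under the same assumptions as the DP sketch above, obtaining K_s in (11) only requires swapping the frame-level inner product for a frame-level kernel K; an RBF kernel with a hypothetical width is used below purely for illustration (the paper's actual kernel settings are those of its Table 1, which is not reproduced in this extraction).

import numpy as np

# Frame-level RBF kernel K(x, v); gamma = 1.0 is a placeholder value.
gamma = 1.0
rbf = lambda x, v: float(np.exp(-gamma * np.sum((x - v) ** 2)))

def dtak_kernel(X, V):
    # K_s(X, V) of (11), reusing dtak_dp from the sketch after recursion (6).
    return dtak_dp(X, V, rbf)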
41 3 Properties of the dynamic time-alignment kernel It has not been proven that the proposed function K_s(·, ·) is an admissible SVM kernel, i.e. one that guarantees the existence of a feature space. [sent-64, score-0.623]
42 This is because the mapping function to a feature space is not fixed but depends on the given vector sequences. [sent-65, score-0.118]
43 Although a class of data-dependent asymmetric kernels for SVMs has been developed in [10], our proposed function is more complicated and difficult to analyze, because the input data are variable-length vector sequences and non-linear time normalization is embedded in the function. [sent-66, score-0.536]
44 On the other hand, K_s(X, X) = max_{ψ,θ} Σ_{k=1}^{L} φ(x_{ψ(k)}) · φ(x_{θ(k)}) = Σ_{k=1}^{L} φ(x_{ψ⁺(k)}) · φ(x_{θ⁺(k)}). (15) [sent-69, score-0.064]
45 Because ψ⁺(k) and θ⁺(k) are here the optimal warping functions that maximize (15), for any warping functions, including ψ*(k), the following inequality holds: K_s(X, X) ≥ Σ_{k=1}^{L} φ(x_{ψ*(k)}) · φ(x_{ψ*(k)}) = Σ_{k=1}^{L} ‖φ(x_{ψ*(k)})‖². [sent-70, score-0.612]
46 As can be seen from these expressions, the SVM discriminant function for time sequences has the same form as that of the original SVM, except for the difference in kernels. [sent-73, score-0.116]
47 It is straightforward to deduce the learning problem, which is given as min_{W,b,ξ_i} (1/2) W ◦ W + C Σ_{i=1}^{N} ξ_i (21) subject to y_i (W ◦ Φ(X^{(i)}) + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., N. (22) [sent-74, score-0.089]
48 Again, since the learning problem defined above is almost the same as that of the original SVM, the same training algorithms as for the original SVM can be used to solve it. [sent-75, score-0.194]
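Because (21)-(22) has the same form as the original SVM problem with K_s playing the role of the kernel, any solver that accepts a precomputed Gram matrix can be used. The paper uses SVMTorch [11]; purely as an illustrative substitute (not the authors' setup), the same idea with scikit-learn would look roughly like this, where the sequence lists, labels, and C value are hypothetical placeholders:

import numpy as np
from sklearn.svm import SVC

def gram_matrix(seqs_a, seqs_b):
    # Pairwise DTAK values between two lists of variable-length sequences,
    # using dtak_kernel from the sketch above.
    return np.array([[dtak_kernel(a, b) for b in seqs_b] for a in seqs_a])

# train_seqs, test_seqs: lists of (L_i, n) arrays; train_labels: +/-1 labels.
# K_train = gram_matrix(train_seqs, train_seqs)
# clf = SVC(C=10.0, kernel='precomputed').fit(K_train, train_labels)
# K_test = gram_matrix(test_seqs, train_seqs)   # rows: test items, columns: training items
# predictions = clf.predict(K_test)

Note that, as discussed in Section 3, K_s has not been proven admissible, so the resulting Gram matrix is not guaranteed to be positive semi-definite.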
49 4 Experiments Speech recognition experiments were carried out to evaluate the classification performance of the DTAK-SVM. [sent-76, score-0.247]
50 As our objective is to evaluate the basic performance of the proposed method, a very limited task was chosen: hand-segmented phoneme recognition, in which the positions of the target patterns in the utterance are known. [sent-77, score-0.55]
51 A continuous speech recognition task that does not require phoneme labeling would be our next step. [sent-78, score-0.592]
52 4.1 Experimental conditions The details of the experimental conditions are given in Table 1. [sent-80, score-0.043]
53 [Figure 1 residue: plots of the correct classification rate [%] and the ratio of SVs to training samples [%] for several values of C (e.g. C=10); the axis ticks and remaining curve labels are not recoverable.] [sent-83, score-0.182]
54 In the consonant-recognition task (Experiment-1), only six voiced consonants /b, d, g, m, n, N/ were used, to save time. [sent-88, score-0.036]
55 Classifying those 6 phonemes without using contextual information is considered a relatively difficult task, whereas classifying the 5 vowels /a, i, u, e, o/ (Experiment-2) is considered easier. [sent-89, score-0.148]
56 The proposed DTAK-SVM has been implemented with the publicly available toolkit, SVMTorch [11]. [sent-91, score-0.045]
57 Fig. 1 depicts the experimental results for Experiment-1, where average values over 5 speakers are shown. [sent-94, score-0.134]
58 Table 2: Recognition performance comparison of DTAK-SVM with HMM. [sent-102, score-0.039]
59 Results of Experiment-1 for 1 male and 1 female speaker are shown. [sent-103, score-0.257]
60 (Numbers represent the correct classification rate [%].) [Table 2 residue: rows for HMM (1 mix.) and DTAK-SVM, columns for 50, 100, and 200 training samples per phoneme for the male and the female speaker; the individual entries did not survive extraction.] [sent-104, score-0.037] [sent-108, score-0.24]
62 Next, the classification performance of the DTAK-SVM was compared with that of the state-of-the-art HMM. [sent-138, score-0.039]
63 In order to see the effect of training-set size and model complexity on generalization performance, experiments were carried out by varying the number of training samples (50, 100, 200) and the number of mixtures (1, 4, 8, 16) per HMM state. [sent-139, score-0.295]
64 The HMM used in this experiment was a 3-state, continuous-density, context-independent model with diagonal-covariance Gaussian mixtures. [sent-140, score-0.031]
65 The results of Experiment-1 for 1 male and 1 female speaker are given in Table 2. [sent-144, score-0.257]
66 The experimental results indicate that the DTAK-SVM achieves better classification performance when the number of training samples is 50, and comparable performance when the number of samples is 200. [sent-145, score-0.448]
67 One might argue that the number of training samples used in this experiment is far from enough for the HMM to achieve its best performance. [sent-146, score-0.182]
68 However, such a shortage of training samples often occurs in HMM-based real-world speech recognition, especially when context-dependent models are employed, and it prevents the HMM from improving its generalization performance. [sent-147, score-0.353]
69 5 Conclusions A novel approach to extending the SVM framework to the sequential-pattern classification problem has been proposed by embedding a dynamic time-alignment operation into the kernel. [sent-148, score-0.236]
70 Though long-term correlations between the feature vectors are sacrificed in order to achieve frame-synchronous processing for speech recognition, the proposed DTAK-SVMs demonstrated performance comparable to HMMs in hand-segmented phoneme recognition. [sent-149, score-0.706]
71 The DTAK-SVM is potentially applicable to continuous speech recognition with some extension of the one-pass search algorithm [9]. [sent-150, score-0.41]
72 Picone, “Hybrid SVM/HMM architectures for speech recognition,” in ICSLP2000, 2000. [sent-172, score-0.171]
73 Jaakkola and David Haussler, “Exploiting generative models in discriminative classifiers,” in Advances in Neural Information Processing Systems 11 (M. [sent-174, score-0.057]
74 Niranjan, “Data-dependent Kernels in SVM classification of speech patterns,” in ICSLP-2000, vol. [sent-185, score-0.171]
wordName wordTfidf (topN-words)
[('svm', 0.385), ('ks', 0.38), ('warping', 0.275), ('kernel', 0.21), ('recognition', 0.208), ('classi', 0.191), ('phoneme', 0.177), ('speech', 0.171), ('svs', 0.163), ('inner', 0.151), ('dtw', 0.137), ('hmm', 0.134), ('dynamic', 0.129), ('cation', 0.114), ('samples', 0.108), ('product', 0.108), ('inp', 0.101), ('length', 0.094), ('di', 0.092), ('speakers', 0.091), ('female', 0.09), ('yi', 0.089), ('vowels', 0.082), ('employed', 0.082), ('male', 0.076), ('training', 0.074), ('lengths', 0.073), ('hmms', 0.073), ('erent', 0.071), ('dtak', 0.069), ('htk', 0.069), ('sagayama', 0.069), ('sequences', 0.069), ('speaker', 0.065), ('max', 0.064), ('operation', 0.062), ('inequality', 0.062), ('toolkit', 0.06), ('advanced', 0.058), ('alignment', 0.058), ('generative', 0.057), ('school', 0.057), ('er', 0.055), ('japan', 0.055), ('males', 0.054), ('path', 0.054), ('rn', 0.052), ('rbf', 0.052), ('sequential', 0.051), ('xi', 0.049), ('svmtorch', 0.048), ('patterns', 0.045), ('mapping', 0.045), ('proposed', 0.045), ('kernels', 0.045), ('vector', 0.044), ('employ', 0.044), ('sequence', 0.043), ('technology', 0.043), ('experimental', 0.043), ('accumulated', 0.042), ('original', 0.041), ('performance', 0.039), ('formulation', 0.038), ('comparable', 0.037), ('correct', 0.037), ('normalization', 0.037), ('task', 0.036), ('nds', 0.035), ('hybrid', 0.035), ('xk', 0.034), ('frame', 0.033), ('asymmetric', 0.033), ('lkopf', 0.033), ('normalizing', 0.033), ('support', 0.032), ('discriminant', 0.032), ('sch', 0.032), ('continuous', 0.031), ('embedded', 0.03), ('fisher', 0.03), ('holds', 0.03), ('pattern', 0.03), ('phonemes', 0.03), ('ganapathiraju', 0.03), ('juang', 0.03), ('niranjan', 0.03), ('picone', 0.03), ('females', 0.03), ('sim', 0.03), ('japanese', 0.03), ('voiced', 0.03), ('atr', 0.03), ('hmmbased', 0.03), ('mfccs', 0.03), ('orts', 0.03), ('rhs', 0.03), ('feature', 0.029), ('table', 0.029), ('weighting', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
2 0.26012498 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
3 0.24527632 172 nips-2001-Speech Recognition using SVMs
Author: N. Smith, Mark Gales
Abstract: An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space. The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood. 1
4 0.23627912 46 nips-2001-Categorization by Learning and Combining Object Parts
Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
5 0.18793483 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
Author: Manfred Opper, Robert Urbanczik
Abstract: Using methods of Statistical Physics, we investigate the role of model complexity in learning with support vector machines (SVMs). We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules, which, in contrast to common theoretical beliefs, are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find a universal asymptotics of the learning curves which only depend on the target rule but not on the SVM kernel. 1
6 0.18515263 122 nips-2001-Model Based Population Tracking and Automatic Detection of Distribution Changes
7 0.18010297 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
8 0.16280083 164 nips-2001-Sampling Techniques for Kernel Methods
9 0.15766864 16 nips-2001-A Parallel Mixture of SVMs for Very Large Scale Problems
10 0.15668295 134 nips-2001-On Kernel-Target Alignment
11 0.15230688 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
12 0.14806961 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
13 0.14729035 58 nips-2001-Covariance Kernels from Bayesian Generative Models
14 0.1446726 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
15 0.14451054 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
16 0.1424897 170 nips-2001-Spectral Kernel Methods for Clustering
17 0.14144085 60 nips-2001-Discriminative Direction for Kernel Classifiers
18 0.14128305 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
19 0.13298184 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
20 0.13244584 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
topicId topicWeight
[(0, -0.328), (1, 0.235), (2, -0.122), (3, 0.053), (4, -0.169), (5, 0.333), (6, 0.132), (7, -0.125), (8, -0.01), (9, 0.132), (10, -0.009), (11, -0.065), (12, -0.003), (13, 0.144), (14, 0.007), (15, 0.03), (16, 0.01), (17, -0.024), (18, -0.026), (19, -0.044), (20, 0.005), (21, -0.002), (22, 0.023), (23, -0.054), (24, -0.013), (25, -0.001), (26, -0.109), (27, 0.018), (28, 0.127), (29, -0.062), (30, -0.024), (31, -0.106), (32, 0.012), (33, 0.041), (34, 0.024), (35, 0.034), (36, 0.062), (37, -0.041), (38, 0.016), (39, -0.029), (40, -0.086), (41, 0.054), (42, -0.041), (43, -0.022), (44, 0.089), (45, -0.049), (46, -0.032), (47, -0.056), (48, -0.006), (49, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.97296971 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
2 0.86167902 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
3 0.70469695 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
Author: Ji Zhu, Trevor Hastie
Abstract: The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
4 0.68970138 172 nips-2001-Speech Recognition using SVMs
Author: N. Smith, Mark Gales
Abstract: An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space. The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood. 1
5 0.62113762 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
Author: Olivier Chapelle, Bernhard Schölkopf
Abstract: The choice of an SVM kernel corresponds to the choice of a representation of the data in a feature space and, to improve performance , it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in nonlinear kernels. We show on a digit recognition task that the proposed approach is superior to the Virtual Support Vector method, which previously had been the method of choice. 1
6 0.5856483 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
7 0.57946199 46 nips-2001-Categorization by Learning and Combining Object Parts
8 0.54483098 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
9 0.53973675 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
10 0.5397262 60 nips-2001-Discriminative Direction for Kernel Classifiers
11 0.52363312 16 nips-2001-A Parallel Mixture of SVMs for Very Large Scale Problems
12 0.51416743 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
13 0.51271617 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
14 0.49431831 164 nips-2001-Sampling Techniques for Kernel Methods
15 0.46150765 74 nips-2001-Face Recognition Using Kernel Methods
16 0.44885254 62 nips-2001-Duality, Geometry, and Support Vector Regression
17 0.44582692 58 nips-2001-Covariance Kernels from Bayesian Generative Models
18 0.44552806 99 nips-2001-Intransitive Likelihood-Ratio Classifiers
19 0.44521406 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
20 0.43064472 105 nips-2001-Kernel Machines and Boolean Functions
topicId topicWeight
[(14, 0.07), (17, 0.027), (19, 0.022), (20, 0.02), (27, 0.115), (30, 0.131), (38, 0.013), (59, 0.059), (66, 0.212), (72, 0.104), (79, 0.052), (83, 0.023), (91, 0.078)]
simIndex simValue paperId paperTitle
same-paper 1 0.85782826 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
2 0.8177796 128 nips-2001-Multiagent Planning with Factored MDPs
Author: Carlos Guestrin, Daphne Koller, Ronald Parr
Abstract: We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whose size is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action space. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative to more complicated algorithms even in the single agent case.
3 0.7117694 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
Author: Dieter Fox
Abstract: Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.
4 0.69975913 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
Author: Mário Figueiredo
Abstract: In this paper we introduce a new sparseness inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparsenesscontrolling hyper-parameters.
5 0.69731265 149 nips-2001-Probabilistic Abstraction Hierarchies
Author: Eran Segal, Daphne Koller, Dirk Ormoneit
Abstract: Many domains are naturally organized in an abstraction hierarchy or taxonomy, where the instances in “nearby” classes in the taxonomy are similar. In this paper, we provide a general probabilistic framework for clustering data into a set of classes organized as a taxonomy, where each class is associated with a probabilistic model from which the data was generated. The clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of our approach is that it utilizes global optimization algorithms for both of the last two steps, reducing the sensitivity to noise and the propensity to local maxima that are characteristic of algorithms such as hierarchical agglomerative clustering that only take local steps. We provide a theoretical analysis for our algorithm, showing that it converges to a local maximum of the joint likelihood of model and data. We present experimental results on synthetic data, and on real data in the domains of gene expression and text.
6 0.69278699 60 nips-2001-Discriminative Direction for Kernel Classifiers
7 0.68721956 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
8 0.6866684 46 nips-2001-Categorization by Learning and Combining Object Parts
9 0.68588132 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
10 0.68061846 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
11 0.6782645 56 nips-2001-Convolution Kernels for Natural Language
12 0.67748553 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
13 0.67734194 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines
14 0.67655385 185 nips-2001-The Method of Quantum Clustering
15 0.67607898 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
16 0.67544651 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
17 0.67535985 22 nips-2001-A kernel method for multi-labelled classification
18 0.67479366 1 nips-2001-(Not) Bounding the True Error
19 0.67105484 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
20 0.66868043 172 nips-2001-Speech Recognition using SVMs