nips nips2001 nips2001-172 knowledge-graph by maker-knowledge-mining

172 nips-2001-Speech Recognition using SVMs


Source: pdf

Author: N. Smith, Mark Gales

Abstract: An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space. The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. [sent-11, score-0.267]

2 The score-space defined by this mapping avoids some limitations of the Fisher score. [sent-14, score-0.205]

3 Class-conditional generative models are directly incorporated into the definition of the score-space. [sent-15, score-0.361]

4 The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood. [sent-16, score-0.697]

5 State-of-the-art systems use Hidden Markov Models (HMMs), either trained to maximise likelihood or discriminatively, to achieve good levels of performance. [sent-18, score-0.247]

6 This paper examines the application of SVMs to speech recognition. [sent-22, score-0.187]

7 Their scheme uses generative probability models of the data to define a mapping into a fixed dimension space, the Fisher score-space. [sent-33, score-0.409]

8 This paper examines the suitability of the Fisher kernel for classification in speech recognition and proposes an alternative, more useful, kernel. [sent-36, score-0.406]

9 In addition some normalisation issues associated with using this kernel for speech recognition are addressed. [sent-37, score-0.522]

10 Consider an observation sequence O = (o_1, ..., o_T), where o_t ∈ R^D, and a set of generative probability models of the observation sequences, P = {p_k(O|θ_k)}, where θ_k is the vector of parameters for the kth member of the set. [sent-45, score-0.393]

11 The observation sequence O can be mapped into a vector of fixed dimension [4], φ(O) (Equation 1), where f(·) is the score-argument and is a function of the members of the set of generative models P. [sent-46, score-0.396]
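To make the fixed-dimension mapping concrete, here is a minimal sketch (not from the paper) that computes a Fisher-score-style feature vector for a single diagonal-covariance Gaussian generative model; the function name, shapes and the Gaussian itself are illustrative assumptions standing in for the HMMs used in the paper.

```python
import numpy as np

def fisher_score_gaussian(O, mu, var):
    """Map a variable-length sequence O (T x D) to a fixed-dimension vector:
    derivatives of log p(O|theta) w.r.t. the mean and log-variance of a single
    diagonal-covariance Gaussian (a toy stand-in for the paper's HMMs)."""
    O = np.asarray(O, dtype=float)
    d_mu = ((O - mu) / var).sum(axis=0)                          # d/d mu of sum_t ln N(o_t; mu, var)
    d_logvar = 0.5 * (((O - mu) ** 2 / var) - 1.0).sum(axis=0)   # d/d log var
    return np.concatenate([d_mu, d_logvar])                      # fixed dimension 2*D for any T

# Two sequences of different lengths map to vectors of the same dimension.
mu, var = np.zeros(3), np.ones(3)
phi_a = fisher_score_gaussian(np.random.randn(50, 3), mu, var)
phi_b = fisher_score_gaussian(np.random.randn(120, 3), mu, var)
assert phi_a.shape == phi_b.shape == (6,)
```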

12 What are the best generative models, score-arguments and score-operators to use? [sent-50, score-0.264]

13 2 Score-spaces As HMMs have proved successful in speech recognition, they are a natural choice as the generative models for this task. [sent-51, score-0.462]

14 For a two-class problem, let p_i(O|θ_i) represent a generative model, where i ∈ {g, 1, 2} (g denotes the global 2-class generative model, and 1 and 2 denote the class-conditional generative models for the two competing classes). [sent-54, score-0.849]

15 Previous schemes have used the log of a single generative model, ln p_i(O|θ_i), representing either both classes as in the original Fisher score (i = g) [4], or one of the classes (i = 1 or 2) [6]. [sent-55, score-0.497]

16 The score-space proposed in this paper uses the log of the ratio of the two class-conditional generative models, ln(p_1(O|θ_1) / p_2(O|θ_2)), where θ = [θ_1^T, θ_2^T]^T. [sent-57, score-0.33]

17 Thus, the scores φ(O) are defined by Equations 2 and 3. The likelihood-ratio score-space can be shown to avoid some of the limitations of the likelihood score-space, and may be viewed as a generalisation of the standard generative model classifier. [sent-59, score-0.38]
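As a rough illustration of a likelihood-ratio score-space (a sketch under simplifying assumptions, not the paper's HMM-based implementation), the following combines the zeroth-order log likelihood-ratio with first-order derivatives of each class-conditional model; `llr_score` and the diagonal-Gaussian class models are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def llr_score(O, mu1, var1, mu2, var2):
    """Likelihood-ratio score sketch: the zeroth-order term ln p1(O) - ln p2(O)
    plus first-order derivatives of each class-conditional log likelihood w.r.t.
    its mean.  Toy diagonal Gaussians stand in for the class-conditional HMMs."""
    O = np.asarray(O, dtype=float)
    ll1 = multivariate_normal(mu1, np.diag(var1)).logpdf(O).sum()
    ll2 = multivariate_normal(mu2, np.diag(var2)).logpdf(O).sum()
    d_mu1 = ((O - mu1) / var1).sum(axis=0)   # d ln p1 / d mu1
    d_mu2 = ((O - mu2) / var2).sum(axis=0)   # d ln p2 / d mu2
    # The ratio is ln p1 - ln p2, so derivatives w.r.t. theta2 enter with a minus sign.
    return np.concatenate([[ll1 - ll2], d_mu1, -d_mu2])
```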

18 Having proposed forms for the generative models and score-arguments, the score-operators must be selected. [sent-61, score-0.321]

19 The 1st-order derivatives of the log probability of the sequence O with respect to the model parameters are given below (the derivative operator has been defined to give column vectors), e.g. Σ_{t=1}^{T} γ_jk(t) s_{[t,jk]}; for fuller details of the derivations see [2]. [sent-72, score-0.451]

20 (Equation 4 gives ∇_{w_jk} ln p(O|θ).) Here γ_jk(t) is the posterior probability of component k of state j at time t. [sent-73, score-0.189]
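The following sketch illustrates the form of such first-order mean derivatives, a posterior-weighted sum over time of Σ_jk^{-1}(o_t − μ_jk); diagonal covariances, the array shapes and the function name are assumptions, and the component posteriors are assumed to come from a forward-backward pass that is not shown.

```python
import numpy as np

def mean_derivatives(O, gamma, mu, var):
    """First-order derivatives of an HMM log likelihood w.r.t. the Gaussian means,
    in the spirit of Equation 4: sum over t of gamma_jk(t) * Sigma_jk^{-1} (o_t - mu_jk).
    Shapes: O (T, D); mu, var (J, K, D) with diagonal covariances; gamma (T, J, K),
    where gamma[t, j, k] is the component posterior gamma_jk(t)."""
    diff = O[:, None, None, :] - mu[None, :, :, :]          # (T, J, K, D)
    weighted = gamma[..., None] * diff / var[None, :, :, :]  # posterior-weighted, variance-scaled
    return weighted.sum(axis=0)                               # (J, K, D)
```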

21 Assuming the HMM is left-to-right with no skips and assuming that a state only appears once in the HMM (i. [sent-74, score-0.151]

22 From the definitions above, the score for an utterance is a weighted sum of scores for individual observations. [sent-77, score-0.172]

23 If the scores for the same utterance spoken at different speaking rates were calculated, they would lie in different regions of score-space simply because of differing numbers of observations. [sent-78, score-0.179]

24 To ease the task of the classifier in score-space, the score-space may be normalised by the number of observations, called sequence length normalisation. [sent-79, score-0.272]

25 One method of normalisation redefines score-spaces using generative models trained to maximise a modified log likelihood function. [sent-81, score-0.887]

26 Consider that state j has entry time T_j and duration d_j (both in numbers of observations) and output probability b_j(o_t) for observation o_t [7]. [sent-82, score-0.171]

27 However, in this paper, a simpler normalisation method is employed. [sent-85, score-0.253]

28 The generative models are trained to maximise the standard likelihood function. [sent-86, score-0.568]

29 Rather than define the score-space using standard state posteriors γ_j(t) (the posterior probability of state j at time t), it is defined on state posteriors normalised by the total state occupancy over the utterance. [sent-87, score-0.774]

30 The standard component posteriors γ_jk(t) are replaced in Equations 4 to 6 and 8 by their normalised form γ̂_jk(t). [sent-88, score-0.241]

31 γ̂_jk(t) = [γ_j(t) / Σ_{τ=1}^{T} γ_j(τ)] · [w_jk N(o_t; μ_jk, Σ_jk) / Σ_{i=1}^{K} w_ji N(o_t; μ_ji, Σ_ji)] (Equation 10). In effect, each derivative is divided by the sum of state posteriors. [sent-89, score-0.201]

32 This is preferred to division by the total number of observations T which assumes that when the utterance length varies, the occupation of every state in the state sequence is scaled by the same ratio. [sent-90, score-0.437]
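A minimal sketch of this sequence length normalisation (illustrative function name and shapes; the posteriors are again assumed to come from a forward-backward pass):

```python
import numpy as np

def length_normalise(gamma):
    """Sequence length normalisation in the spirit of Equation 10: each component
    posterior gamma_jk(t) is divided by the total occupancy of its state over the
    utterance, sum_tau gamma_j(tau), rather than by the raw number of observations T.
    gamma has shape (T, J, K)."""
    state_occ = gamma.sum(axis=(0, 2), keepdims=True)   # (1, J, 1): sum over t and k for each state j
    return gamma / np.maximum(state_occ, 1e-12)          # guard against empty states
```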

33 The nature of the score-space affects the discriminative power of classifiers built in the score-space. [sent-92, score-0.33]

34 For example, the likelihood score-space defined on a two-class... (Footnote 2: due to the sum to unity constraints, one of the weight parameters in each Gaussian mixture is discarded from the definition of the super-vector, as are the forward transitions in the left-to-right HMM with no skips.) [sent-93, score-0.158]

35 generative model is susceptible to wrap-around [7] . [sent-94, score-0.264]

36 If an observation is generated at the peak of the first Gaussian, then the derivative relative to the mean of that Gaussian is zero because s_{[t,jk]} is zero (see Equation 4). [sent-97, score-0.21]

37 However, the derivative relative to the mean of the distant second Gaussian is also zero because of a zero component posterior γ_jk(t). [sent-98, score-0.264]

38 A likelihood-ratio score-space defined on these two Gaussians does not suffer wraparound since the component posteriors for each Gaussian are forced to unity. [sent-103, score-0.218]

39 For example, the zeroth-order derivative for the likelihood score-space is expected to be less useful than its counter-part in the likelihood-ratio score-space because of its greater sensitivity to acoustic conditions. [sent-107, score-0.179]

40 Consider the simple case of true class-conditional generative models p_1(O|θ_1) and p_2(O|θ_2) with respective estimates of the same functional form p_1(O|θ̂_1) and p_2(O|θ̂_2). [sent-109, score-0.321]

41 ... ln p_i(O|θ_i) (Equation 11). The output from the operator in square brackets is an infinite number of derivatives arranged as a column vector. [sent-113, score-0.162]

42 The expressions for the two true models can be incorporated into an optimal minimum Bayes error decision rule as follows, where θ̂ = [θ̂_1^T, θ̂_2^T]^T, w = [w_1^T, w_2^T]^T, and b encodes the class priors. [sent-115, score-0.222]

43 This gives a decision rule of the form w^T φ^lr(O) + b, based on ln p_1(O|θ̂_1) − ln p_2(O|θ̂_2) (Equation 12); φ^lr(O) is a score in the likelihood-ratio score-space formed by an infinite number of derivatives with respect to the parameter estimates θ̂. [sent-124, score-0.241]

44 However, most HMMs used in speech recognition are 1st-order Markov processes but speech is a high-order or infinite-order Markov process. [sent-128, score-0.329]

45 Therefore, a linear decision boundary in the likelihood-ratio score-space defined on 1st-order Markov model estimates is unlikely to be sufficient for recovering the optimal decision rule due to model incorrectness. [sent-129, score-0.166]

46 However, powerful non-linear classifiers may be trained in such a likelihood-ratio score-space to try to compensate for model incorrectness and approximate the optimal decision rule. [sent-130, score-0.27]

47 However, an example of a 2nd-order derivative is ∇_{μ_jk}(∇_{μ_jk}^T ln p(O|θ)) ≈ −Σ_{t=1}^{T} γ_jk(t) Σ_jk^{-1} (Equation 13). For simplicity the component posterior γ_jk(t) is assumed independent of μ_jk. [sent-134, score-0.32]

48 Failure to perform such score-space normalisation for a linear kernel in score-space results in a kernel similar to the Plain kernel [5]. [sent-140, score-0.496]

49 Simple scaling has been found to be a reasonable approximation to full whitening and avoids inverting large matrices in [2] (though for classification of single observations rather than sequences, on a different database). [sent-142, score-0.201]
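A sketch of this "simple scaling" score-space normalisation (illustrative; statistics are estimated on the training scores and applied to both training and test sets, as a cheap approximation to full whitening):

```python
import numpy as np

def scale_scores(train_scores, test_scores):
    """Per-dimension scaling of score vectors: subtract the training mean and divide
    by the training standard deviation, avoiding the large matrix inversion that full
    whitening of the score-space would require."""
    mean = train_scores.mean(axis=0)
    std = train_scores.std(axis=0) + 1e-12   # guard against constant dimensions
    return (train_scores - mean) / std, (test_scores - mean) / std
```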

50 This is only an acceptable normalisation for a likelihood score-space under conditions that give a zero expectation in score-space. [sent-144, score-0.403]

51 240 utterances per letter from isolet{ 1,2,3,4} were used for training and 60 utterances per letter from isolet5 for testing. [sent-149, score-0.472]

52 The baseline HMM system was well-trained to maximise likelihood. [sent-152, score-0.23]

53 Each letter was modelled by a 10-emitting state left-to-right continuous density HMM with no skips, and silence by a single emitting-state HMM with no skips. [sent-153, score-0.283]

54 Each state output distribution had the same number of Gaussian components with diagonal covariance matrices. [sent-154, score-0.147]

55 (Footnote 3) It is useful to note that a linear decision boundary, with zero bias, constructed in a single-dimensional likelihood-ratio score-space formed by the zeroth-order derivative operator would, under equal class priors, give the standard minimum Bayes error classifier. [sent-156, score-0.353]

56 The baseline HMMs were used as generative models for SVM kernels. [sent-157, score-0.415]

57 ... [9] was used to train 1v1 SVM classifiers on each possible class pairing. [sent-159, score-0.232]

58 The sequence length normalisation in Equation 10, and simple scaling for score-space normalisation, were used during training and testing. [sent-160, score-0.37]

59 Linear kernels were used in the normalised score-space, since they gave better performance than GRBFs of variable width and polynomial kernels of degree 2 (including homogeneous, inhomogeneous, and inhomogeneous with zero-mean score-space). [sent-161, score-0.304]
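A sketch of this training setup, using scikit-learn in place of the SVM package cited as [9] (the package, its settings and the data layout here are assumptions, not the paper's exact configuration):

```python
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

def train_one_v_one(scores, labels):
    """Train 1v1 SVMs with linear kernels in a normalised score-space: one binary
    classifier per class pairing.  scores is an (N, dim) NumPy array of normalised
    score vectors; labels is an (N,) NumPy array of class ids."""
    classifiers = {}
    for a, b in combinations(sorted(set(labels)), 2):
        mask = (labels == a) | (labels == b)               # keep only the two classes of this pairing
        clf = SVC(kernel="linear").fit(scores[mask], labels[mask])
        classifiers[(a, b)] = clf
    return classifiers
```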

60 The abbreviations m, v, w and t refer to the score-subspaces ∇_{μ_jk} ln p_i(O|θ_i), ∇_{vec(Σ_jk)} ln p_i(O|θ_i), ∇_{w_jk} ln p_i(O|θ_i) and ∇_{a_jj} ln p_i(O|θ_i) respectively. [sent-165, score-0.232]

61 l refers to the log likelihood ln p_i(O|θ_i) and r to the log likelihood-ratio ln[p_2(O|θ_2) / p_1(O|θ_1)]. [sent-166, score-0.208]

62 The binary SVM classification results (and, as a baseline, the binary HMM results) were combined to obtain a single classification for each utterance. [sent-167, score-0.182]

63 This was done using a simple majority voting scheme among the full set of 1v1 binary classifiers (for tied letters, the relevant 1v1 classifiers were inspected and then, if necessary, random selection performed [2]). [sent-168, score-0.615]
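A sketch of such a voting scheme (the tie-breaking details are an approximation of the description above, not the exact procedure; the data layout is an assumption):

```python
from collections import Counter
import random

def majority_vote(pair_predictions):
    """Multi-class decision from 1v1 binary classifiers by majority voting.
    pair_predictions maps each (class_a, class_b) pairing to the class that won
    that binary contest.  Ties are broken by inspecting the direct contests among
    the tied classes and, failing that, by random selection."""
    votes = Counter(pair_predictions.values())
    top = votes.most_common()
    best, best_count = top[0]
    tied = [c for c, n in top if n == best_count]
    if len(tied) == 1:
        return best
    # Re-count only the contests fought between the tied classes.
    head_to_head = Counter(w for (a, b), w in pair_predictions.items()
                           if a in tied and b in tied)
    if head_to_head:
        return head_to_head.most_common(1)[0][0]
    return random.choice(tied)
```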

64 Table 1: Error rates for HMM baselines and SVM score-spaces (E-set), for 1, 2, 4 and 6 components per class per state. [sent-169, score-0.283]

65 Table 1 compares the baseline HMM and SVM classifiers as the complexity of the generative models was varied. [sent-197, score-0.608]

66 The majority voting scheme gave the same performance as the minimum Bayes error scheme, indicating that majority voting was an acceptable multi-class scheme for the E-set experiments. [sent-200, score-0.445]

67 For the SVMs, each likelihood-ratio score-space was defined using its competing class-conditional generative models and projected into an mr score-space. [sent-201, score-0.403]

68 Each likelihood (1-class) score-space was defined using only the generative model for the first of its two classes, and projected into an ml score-space. [sent-202, score-0.475]

69 Each likelihood (2-class) score-space was defined using a generative model for both of its classes, and projected into an ml score-space (the original Fisher score, which is a projection into its m score-subspace, was also tested but was found to yield slightly higher error rates). [sent-203, score-0.519]

70 In both cases, the optimum number of components in the generative models was 2 per state, possibly reflecting the gender division within each class. [sent-207, score-0.546]

71 However, there was an exception for generative models with 1 component per class per state (in total the models had 2 components per state since they modelled both classes). [sent-209, score-1.014]

72 The 2 components per state did not generally reflect the gender division in the 2-class data, as first supposed, but the class division. [sent-210, score-0.362]

73 A possible explanation is that each Gaussian component modelled a class with bi-modal distribution caused by gender differences. [sent-211, score-0.241]

74 This task was too small to fully assess possible decorrelation in error structure between HMM and SVM classifiers [6] . [sent-214, score-0.237]

75 Without scaling for score-space normalisation, the error-rate for the likelihood-ratio score-space defined on models with 2 components per state increased from 5. [sent-215, score-0.359]

76 Some likelihood-ratio mr score-spaces were then augmented with 2nd-order derivatives ∇_{μ_jk}(∇_{μ_jk}^T ln p(O|θ)). [sent-218, score-0.307]

77 The disappointing performance was probably due to the simplicity of the task, the independence assumption between component posteriors and component means, and the effect of noise with so few training scores in such large score-spaces. [sent-220, score-0.232]

78 It is known that some dimensions of feature-space are noisy and degrade classification performance. [sent-221, score-0.185]

79 For this reason, experiments were performed which selected subsets of the likelihood-ratio score-space and then built SVM classifiers in those score-subspaces. [sent-222, score-0.242]

80 Again, the generative models were class-conditional HMMs with 2 components per state. [sent-225, score-0.443]

81 The log likelihood-ratio was shown to be a powerful discriminating feature. Increasing the number of dimensions in score-space allowed more discriminative classifiers. [sent-226, score-0.248]

82 There was more discrimination, or less noise, in the derivatives of the component means than the component variances. [sent-227, score-0.23]

83 As expected in a dynamic task, the derivatives of the transitions were also useful since they contained some duration information. [sent-228, score-0.154]

84 Table 2: Error rates (%) for subspaces of the likelihood-ratio score-space (E-set); score-spaces compared: r, v, m, mv, mvt, wmvtr. [sent-229, score-0.199]

85 (Table 2 score-space dimensionalities: 1, 1560, 1560, 3120, 3140, 3161.) Next, subsets of the mr and wmvtr score-spaces were selected according to dimensions with highest Fisher-ratios [7]. [sent-235, score-0.263]
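A sketch of pruning score-space dimensions by Fisher-ratio (two-class case; the exact ratio used in [7] may differ slightly, and the function name and inputs are illustrative):

```python
import numpy as np

def fisher_ratio_select(scores, labels, n_keep):
    """Rank score-space dimensions by a Fisher ratio, (difference of class means)^2
    divided by the sum of class variances, and keep the n_keep highest-ranked ones.
    scores is an (N, dim) array; labels is an (N,) array with values in {0, 1}."""
    s0, s1 = scores[labels == 0], scores[labels == 1]
    ratio = (s0.mean(0) - s1.mean(0)) ** 2 / (s0.var(0) + s1.var(0) + 1e-12)
    keep = np.argsort(ratio)[::-1][:n_keep]   # indices of the most discriminative dimensions
    return keep, scores[:, keep]
```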

86 The lowest error rates for the mr and wmvtr score-spaces were respectively 3. [sent-236, score-0.262]

87 7% confidence levels relative to the best HMM system with 4 components per state). [sent-240, score-0.18]

88 Generally, adding the most discriminative dimensions lowered error-rate until less discriminative dimensions were added. [sent-241, score-0.364]

89 For most binary classifiers, the most discriminative dimension was the log likelihoodratio. [sent-242, score-0.154]

90 As expected for the E-set, the most discriminative dimensions were dependent on initial HMM states. [sent-243, score-0.182]

91 The HMM and SVM classifiers were run on the full alphabet. [sent-249, score-0.227]

92 The best HMM classifier, with 4 components per state, gave 3. [sent-250, score-0.164]

93 However, generative models with 2 components per state and a wmvtr score-space pruned to 500 dimensions by Fisher-ratios gave a lower error rate of 2. [sent-253, score-0.827]

94 Preliminary experiments evaluating sequence length normalisation on the full alphabet and E-set are detailed in [7]. [sent-256, score-0.441]

95 4 Conclusions In this work, SVMs have been successfully applied to the classification of speech data. [sent-257, score-0.232]

96 The paper has concentrated on the nature of the score-space when handling variable length speech sequences. [sent-258, score-0.22]

97 The standard likelihood score-space of the Fisher kernel has been extended to the likelihood-ratio score-space, and normalisation schemes introduced. [sent-259, score-0.41]

98 The new score-space avoids some of the limitations of the Fisher score-space, and incorporates the class-conditional generative models directly into the SVM classifier. [sent-260, score-0.402]

99 The different score-spaces have been compared on a speakerindependent isolated letter task. [sent-261, score-0.154]

100 The likelihood-ratio score-space out-performed the likelihood score-spaces and HMMs trained to maximise likelihood. [sent-262, score-0.247]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('inpi', 0.264), ('generative', 0.264), ('normalisation', 0.253), ('hmm', 0.24), ('oioi', 0.238), ('classifiers', 0.193), ('svm', 0.185), ('oi', 0.157), ('hmms', 0.154), ('ot', 0.147), ('speech', 0.141), ('fisher', 0.14), ('maximise', 0.136), ('vec', 0.132), ('jk', 0.126), ('derivatives', 0.118), ('oio', 0.115), ('svms', 0.112), ('letter', 0.108), ('ajj', 0.106), ('oiod', 0.106), ('wmvtr', 0.106), ('normalised', 0.105), ('derivative', 0.103), ('state', 0.098), ('dimensions', 0.094), ('baseline', 0.094), ('classification', 0.091), ('discriminative', 0.088), ('defined', 0.082), ('kernel', 0.081), ('posteriors', 0.08), ('length', 0.079), ('modelled', 0.077), ('score', 0.077), ('likelihood', 0.076), ('per', 0.073), ('gender', 0.069), ('log', 0.066), ('voting', 0.064), ('mr', 0.063), ('smith', 0.062), ('inp', 0.058), ('confidence', 0.058), ('models', 0.057), ('component', 0.056), ('utterance', 0.055), ('utterances', 0.055), ('grbfs', 0.053), ('inhomogeneous', 0.053), ('iplr', 0.053), ('isolet', 0.053), ('lik', 0.053), ('rnl', 0.053), ('skips', 0.053), ('kernels', 0.052), ('majority', 0.05), ('classifier', 0.05), ('built', 0.049), ('components', 0.049), ('rates', 0.049), ('recognition', 0.047), ('isolated', 0.046), ('examines', 0.046), ('mfccs', 0.046), ('wjk', 0.046), ('dept', 0.046), ('formed', 0.046), ('scheme', 0.046), ('classes', 0.045), ('operator', 0.044), ('error', 0.044), ('decision', 0.042), ('gave', 0.042), ('acceleration', 0.042), ('msec', 0.042), ('mapping', 0.042), ('avoids', 0.041), ('incorporated', 0.04), ('limitations', 0.04), ('scores', 0.04), ('class', 0.039), ('acceptable', 0.039), ('gales', 0.039), ('sequence', 0.038), ('letters', 0.038), ('alphabet', 0.037), ('scholkopf', 0.037), ('observation', 0.037), ('duration', 0.036), ('observations', 0.035), ('trained', 0.035), ('sequences', 0.035), ('spoken', 0.035), ('poorly', 0.035), ('tied', 0.035), ('posterior', 0.035), ('zero', 0.035), ('division', 0.034), ('full', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 172 nips-2001-Speech Recognition using SVMs

Author: N. Smith, Mark Gales

Abstract: An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space. The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood. 1

2 0.2472095 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's

Author: Andrew D. Brown, Geoffrey E. Hinton

Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1

3 0.24527632 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine

Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama

Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1

4 0.18833555 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models

Author: John R. Hershey, Michael Casey

Abstract: It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audio-visual information for the task of speech enhancement. We propose a method to exploit audio-visual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factori ally combined, to incorporate visual lip information and employ novel signal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audio-visual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information. 1

5 0.18286876 15 nips-2001-A New Discriminative Kernel From Probabilistic Models

Author: Koji Tsuda, Motoaki Kawanabe, Gunnar Rätsch, Sören Sonnenburg, Klaus-Robert Müller

Abstract: Recently, Jaakkola and Haussler proposed a method for constructing kernel functions from probabilistic models. Their so called

6 0.14153001 16 nips-2001-A Parallel Mixture of SVMs for Very Large Scale Problems

7 0.13106875 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines

8 0.13012515 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition

9 0.12574734 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes

10 0.12252197 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition

11 0.12038255 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

12 0.11952552 58 nips-2001-Covariance Kernels from Bayesian Generative Models

13 0.11511179 183 nips-2001-The Infinite Hidden Markov Model

14 0.11230033 129 nips-2001-Multiplicative Updates for Classification by Mixture Models

15 0.10954519 115 nips-2001-Linear-time inference in Hierarchical HMMs

16 0.10418443 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

17 0.10325201 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines

18 0.09966737 46 nips-2001-Categorization by Learning and Combining Object Parts

19 0.092655629 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions

20 0.087601684 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.275), (1, 0.107), (2, -0.043), (3, -0.063), (4, -0.235), (5, 0.205), (6, 0.227), (7, -0.117), (8, -0.097), (9, 0.023), (10, 0.023), (11, 0.066), (12, 0.015), (13, 0.124), (14, 0.122), (15, -0.025), (16, -0.017), (17, 0.018), (18, 0.035), (19, -0.012), (20, 0.085), (21, 0.124), (22, -0.042), (23, -0.084), (24, 0.007), (25, 0.03), (26, -0.118), (27, 0.039), (28, -0.133), (29, 0.033), (30, -0.079), (31, -0.024), (32, 0.095), (33, -0.106), (34, 0.095), (35, -0.038), (36, 0.111), (37, -0.013), (38, -0.04), (39, 0.0), (40, -0.021), (41, 0.091), (42, -0.012), (43, 0.03), (44, -0.048), (45, 0.026), (46, 0.013), (47, -0.019), (48, 0.004), (49, 0.0)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9641645 172 nips-2001-Speech Recognition using SVMs

Author: N. Smith, Mark Gales

Abstract: An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space. The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood. 1

2 0.78370631 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's

Author: Andrew D. Brown, Geoffrey E. Hinton

Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1

3 0.62732494 15 nips-2001-A New Discriminative Kernel From Probabilistic Models

Author: Koji Tsuda, Motoaki Kawanabe, Gunnar Rätsch, Sören Sonnenburg, Klaus-Robert Müller

Abstract: Recently, Jaakkola and Haussler proposed a method for constructing kernel functions from probabilistic models. Their so called

4 0.62705743 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine

Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama

Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1

5 0.5538941 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

Author: Jens Kohlmorgen, Steven Lemm

Abstract: We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an on-line variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream. The usefulness of the algorithm is demonstrated by an application to a switching dynamical system. 1

6 0.55073363 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes

7 0.54672033 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models

8 0.48845801 115 nips-2001-Linear-time inference in Hierarchical HMMs

9 0.4831709 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition

10 0.48312613 183 nips-2001-The Infinite Hidden Markov Model

11 0.46031535 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

12 0.45978901 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

13 0.43560299 16 nips-2001-A Parallel Mixture of SVMs for Very Large Scale Problems

14 0.41820723 3 nips-2001-ACh, Uncertainty, and Cortical Inference

15 0.41518 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition

16 0.40570316 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine

17 0.39826584 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines

18 0.39182979 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions

19 0.38165236 58 nips-2001-Covariance Kernels from Bayesian Generative Models

20 0.37447691 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.031), (17, 0.032), (19, 0.033), (20, 0.285), (27, 0.146), (30, 0.076), (38, 0.015), (59, 0.039), (72, 0.075), (79, 0.087), (83, 0.014), (91, 0.09)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83661985 172 nips-2001-Speech Recognition using SVMs

Author: N. Smith, Mark Gales

Abstract: An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space. The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood. 1

2 0.8193354 116 nips-2001-Linking Motor Learning to Function Approximation: Learning in an Unlearnable Force Field

Author: O. Donchin, Reza Shadmehr

Abstract: Reaching movements require the brain to generate motor commands that rely on an internal model of the task’s dynamics. Here we consider the errors that subjects make early in their reaching trajectories to various targets as they learn an internal model. Using a framework from function approximation, we argue that the sequence of errors should reflect the process of gradient descent. If so, then the sequence of errors should obey hidden state transitions of a simple dynamical system. Fitting the system to human data, we find a surprisingly good fit accounting for 98% of the variance. This allows us to draw tentative conclusions about the basis elements used by the brain in transforming sensory space to motor commands. To test the robustness of the results, we estimate the shape of the basis elements under two conditions: in a traditional learning paradigm with a consistent force field, and in a random sequence of force fields where learning is not possible. Remarkably, we find that the basis remains invariant. 1

3 0.71977717 44 nips-2001-Blind Source Separation via Multinode Sparse Representation

Author: Michael Zibulevsky, Pavel Kisilev, Yehoshua Y. Zeevi, Barak A. Pearlmutter

Abstract: We consider a problem of blind source separation from a set of instantaneous linear mixtures, where the mixing matrix is unknown. It was discovered recently, that exploiting the sparsity of sources in an appropriate representation according to some signal dictionary, dramatically improves the quality of separation. In this work we use the property of multi scale transforms, such as wavelet or wavelet packets, to decompose signals into sets of local features with various degrees of sparsity. We use this intrinsic property for selecting the best (most sparse) subsets of features for further separation. The performance of the algorithm is verified on noise-free and noisy data. Experiments with simulated signals, musical sounds and images demonstrate significant improvement of separation quality over previously reported results. 1

4 0.7080245 1 nips-2001-(Not) Bounding the True Error

Author: John Langford, Rich Caruana

Abstract: We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs a distribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound. In this paper we demonstrate the method on artificial neural networks with results of a order of magnitude improvement vs. the best deterministic neural net bounds. £ ¡ ¤¢

5 0.62923855 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources

Author: Roland Vollgraf, Klaus Obermayer

Abstract: We present a new method for the blind separation of sources, which do not fulfill the independence assumption. In contrast to standard methods we consider groups of neighboring samples (

6 0.60341734 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing

7 0.60003525 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine

8 0.59598857 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines

9 0.59198481 13 nips-2001-A Natural Policy Gradient

10 0.59074676 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior

11 0.58995968 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex

12 0.58986676 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

13 0.58940321 71 nips-2001-Estimating the Reliability of ICA Projections

14 0.58554184 166 nips-2001-Self-regulation Mechanism of Temporally Asymmetric Hebbian Plasticity

15 0.58420503 27 nips-2001-Activity Driven Adaptive Stochastic Resonance

16 0.58348513 188 nips-2001-The Unified Propagation and Scaling Algorithm

17 0.58218002 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

18 0.58201522 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

19 0.58136296 190 nips-2001-Thin Junction Trees

20 0.58037126 8 nips-2001-A General Greedy Approximation Algorithm with Applications