nips nips2001 nips2001-20 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. [sent-5, score-0.225]
2 The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. [sent-6, score-0.399]
3 The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. [sent-7, score-0.212]
4 The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. [sent-8, score-0.125]
5 Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task. [sent-9, score-0.836]
6 1 Introduction Comparison of sequences of observations is a natural and necessary operation in speech applications. [sent-10, score-0.235]
7 Several recent approaches using support vector machines (SVM’s) have been proposed in the literature. [sent-11, score-0.108]
8 First, large training sets result in long training times for support vector methods. [sent-14, score-0.229]
9 Second, the emission probabilities must be approximated [3], since the output of the support vector machine is not a probability. [sent-15, score-0.141]
10 A more recent method for comparing sequences is based on the Fisher kernel proposed by Jaakkola and Haussler [4]. [sent-16, score-0.201]
11 This approach has been explored for speech recognition in [5]. [sent-17, score-0.235]
12 The application to speaker recognition is detailed in [6]. [sent-18, score-0.641]
13 We propose an alternative kernel based upon polynomial classifiers and the associated mean-squared error (MSE) training criterion [7]. [sent-19, score-0.407]
14 The advantage of this kernel is that it preserves the structure of the classifier in [7] which is both computationally and memory efficient. [sent-20, score-0.159]
15 We consider the application of text-independent speaker recognition; i.e., [sent-21, score-0.547]
16 determining or verifying the identity of an individual through voice characteristics. [sent-23, score-0.117]
17 Text-independent recognition implies that knowledge of the text of the speech data is not used. [sent-24, score-0.235]
18 Traditional methods for text-independent speaker recognition are vector quantization [8], Gaussian mixture models [9], and artificial neural networks [8]. [sent-25, score-0.646]
19 A state-of-the-art approach based on polynomial classifiers was presented in [7]. [sent-26, score-0.124]
20 The polynomial approach has several advantages over traditional methods: 1) it is extremely computationally efficient for identification, 2) the classifier is discriminative, which eliminates the need for a background or cohort model [10], and 3) the method generates small classifier models. [sent-27, score-0.153]
21 In Section 2, we describe polynomial classifiers and the associated scoring process. [sent-28, score-0.297]
22 Section 5 compares the new kernel approach to the standard mean-squared error training approach. [sent-31, score-0.254]
23 2 Polynomial classifiers for sequence data We start by considering the problem of speaker verification–a two-class problem. [sent-32, score-0.587]
24 In this case, the goal is to determine the correctness of an identity claim (e.g., [sent-33, score-0.117]
25 a user id was entered in the system) from a voice input. [sent-35, score-0.132]
26 The decision to be made is whether the claim is valid (the input comes from the claimed speaker) or an impostor is trying to break into the system. [sent-36, score-0.204]
27 For the verification application, a decision is made from a sequence of observations extracted from the speech input. [sent-38, score-0.261]
28 We decide based on the output of a discriminant function using a polynomial classifier. [sent-39, score-0.209]
29 A polynomial classifier of the form $f(\mathbf{x}) = \mathbf{w}^{t} p(\mathbf{x})$ is used, where $\mathbf{w}$ is the vector of classifier parameters (model) and $p(\mathbf{x})$ is an expansion of the input space into the vector of monomials of degree $K$ or less. [sent-40, score-0.239]
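For concreteness, a minimal sketch of such a monomial expansion; the function name and the use of numpy/itertools are illustrative and are not taken from the paper.

```python
import itertools
import numpy as np

def monomial_expansion(x, degree):
    """Expand input vector x into all monomials of degree <= `degree`.

    For x = (x1, x2) and degree 2 this yields
    [1, x1, x2, x1*x1, x1*x2, x2*x2].
    """
    terms = [1.0]
    for d in range(1, degree + 1):
        # combinations_with_replacement enumerates each monomial exactly once
        for idx in itertools.combinations_with_replacement(range(len(x)), d):
            terms.append(np.prod([x[i] for i in idx]))
    return np.array(terms)

# Example: a 2-dimensional feature vector, 3rd-degree expansion
p = monomial_expansion(np.array([0.5, -1.2]), degree=3)
print(p.shape)  # (10,) = 1 + 2 + 3 + 4 monomial terms
```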
30 If the polynomial classifier is trained with a mean-squared error training criterion and target values of 1 for the speaker's vectors and 0 for the impostors' vectors, then $f(\mathbf{x})$ will approximate the a posteriori probability of the speaker class [11]. [sent-44, score-0.359]
31 For the purposes of classification, we can discard terms that do not depend on the class to get the discriminant function (2), where we have used a shorthand to denote the sequence of observations. [sent-53, score-0.125]
32 We use two terms of the Taylor series, $\log(z) \approx z - 1$, to approximate the discriminant function and also normalize by the number of frames to obtain the final discriminant function (4); note that we have discarded the constant term in this discriminant function since it will not affect the classification decision. [sent-54, score-0.2]
33 Substituting in the polynomial function gives (5), $d(X) = \mathbf{w}^{t}\,\mathbf{b}_X$, where we have defined the mapping (6) as $\mathbf{b}_X = \frac{1}{N}\sum_{i=1}^{N} p(\mathbf{x}_i)$. We summarize the scoring method. [sent-58, score-0.323]
34 For a sequence of input vectors $X$ and a speaker model $\mathbf{w}$, we construct $\mathbf{b}_X$ using (6). [sent-59, score-0.587]
35 Since we are performing verification, if the score $\mathbf{w}^{t}\mathbf{b}_X$ is above a threshold then we declare the identity claim valid; otherwise, the claim is rejected as an impostor attempt. [sent-61, score-0.361]
36 More details on this probabilistic scoring method can be found in [13]. [sent-62, score-0.202]
37 Extending the sequence scoring framework to the case of identification (i.e., [sent-63, score-0.24]
38 identifying the speaker from a list of speakers by voice) is straightforward. [sent-65, score-0.619]
39 In this case, we construct speaker models for each speaker and then choose the speaker whose model maximizes the score (assuming equal prior probability of each speaker). [sent-66, score-1.56]
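A sketch of the verification and identification rules just described, reusing the hypothetical `monomial_expansion` helper from the earlier sketch; the speaker models `w` are assumed to come from the MSE training reviewed in Section 3, so this is an illustration rather than the paper's exact code.

```python
import numpy as np

def map_sequence(frames, degree):
    """Map a sequence of feature vectors to the average of their
    polynomial expansions, i.e. the b-vector of equation (6)."""
    expansions = [monomial_expansion(x, degree) for x in frames]
    return np.mean(expansions, axis=0)

def verify(frames, w, threshold, degree=3):
    """Accept the identity claim if the score w^T b exceeds the threshold."""
    b = map_sequence(frames, degree)
    return float(w @ b) > threshold

def identify(frames, speaker_models, degree=3):
    """Closed-set identification: pick the speaker whose model maximizes
    the score (equal priors assumed)."""
    b = map_sequence(frames, degree)
    scores = {name: float(w @ b) for name, w in speaker_models.items()}
    return max(scores, key=scores.get)
```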
40 3 Mean-squared error training We next review how to train the polynomial classifier to approximate the a posteriori probability; this process will help us set notation for the following sections. [sent-68, score-0.247]
41 The desired speaker model minimizes the mean-squared error between the classifier output and the ideal target values; the resulting problem is (7), $\mathbf{w}^{*} = \operatorname{argmin}_{\mathbf{w}} E\{(\mathbf{w}^{t} p(\mathbf{x}) - y(\mathbf{x}))^{2}\}$, where $y(\mathbf{x})$ is the ideal output (1 for the speaker, 0 for impostors).
42 This criterion can be approximated using the training set as (8), [sent-74, score-0.106]
43 $\mathbf{w}^{*} \approx \operatorname{argmin}_{\mathbf{w}} \left[ \sum_{i} (\mathbf{w}^{t} p(\mathbf{x}_i) - 1)^{2} + \sum_{j} (\mathbf{w}^{t} p(\bar{\mathbf{x}}_j))^{2} \right]$, where the speaker's training data is $\{\mathbf{x}_i\}$ and the anti-speaker data is $\{\bar{\mathbf{x}}_j\}$. [sent-75, score-0.124]
44 (Anti-speakers are designed to have the same statistical characteristics as the impostor set. [sent-76, score-0.125]
45 The training method can be written in matrix form. [sent-78, score-0.106]
46 First, define a matrix whose rows are the polynomial expansions of the speaker’s data. [sent-79, score-0.175]
47 Define a similar matrix for the impostor data, and define the ideal-output vector, which contains ones for the speaker’s rows and zeros for the impostor rows. [sent-87, score-0.125]
48 If we define $\mathbf{R} = \mathbf{M}^{t}\mathbf{M}$, the problem becomes (12); we rearrange (12) and solve for the model to obtain (13) and (14), $\mathbf{w} = \mathbf{R}^{-1}\mathbf{M}^{t}\mathbf{o}$, where $\mathbf{M}$ stacks the expanded training data and $\mathbf{o}$ is the ideal-output vector. 4 The naive a posteriori sequence kernel We can now combine the methods from Sections 2 and 3 to obtain a novel sequence comparison kernel in a straightforward manner. [sent-91, score-0.472]
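As a concrete illustration of the matrix-form MSE training reviewed in Section 3, a minimal numpy version might look like the following. It assumes the reconstructed normal equations $\mathbf{R}\mathbf{w} = \mathbf{M}^{t}\mathbf{o}$ and reuses the hypothetical `monomial_expansion` helper from the earlier sketch; names and details are illustrative rather than the paper's exact algorithm.

```python
import numpy as np

def train_mse_model(speaker_frames, impostor_frames, degree=3):
    """Solve the least-squares problem for one speaker model.

    Rows of M are polynomial expansions of speaker and anti-speaker data;
    o is the ideal output: 1 for speaker rows, 0 for impostor rows.
    """
    M_spk = np.vstack([monomial_expansion(x, degree) for x in speaker_frames])
    M_imp = np.vstack([monomial_expansion(x, degree) for x in impostor_frames])
    M = np.vstack([M_spk, M_imp])
    o = np.concatenate([np.ones(len(M_spk)), np.zeros(len(M_imp))])

    R = M.T @ M                      # correlation matrix
    # Normal equations R w = M^T o; in practice R may need regularization
    # or a pseudo-inverse if it is poorly conditioned.
    w = np.linalg.solve(R, M.T @ o)
    return w, R
```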
49 Combine the speaker model from (14) with the scoring equation from (5) to obtain the classifier score (15). [sent-92, score-0.693]
50 Here the model term is exactly the same as mapping the speaker's training data using (6), and the correlation matrix is estimated from the whole training population, so that (15) becomes (16). [sent-94, score-0.103]
51 The scoring method in (16) is the basis of our sequence kernel. [sent-95, score-0.269]
52 Given two sequences of speech feature vectors, $X$ and $Y$, we compare them by mapping each to $\mathbf{b}_X$ and $\mathbf{b}_Y$ and then computing (17), $K_{\mathrm{NAPS}}(X, Y) = \mathbf{b}_X^{t}\,\mathbf{R}^{-1}\,\mathbf{b}_Y$. We call this the naive a posteriori sequence (NAPS) kernel, since scoring assumes independence of observations and training approximates the a posteriori probabilities. [sent-96, score-0.86]
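Under the reconstruction above, in which the kernel is the bilinear form $\mathbf{b}_X^{t}\mathbf{R}^{-1}\mathbf{b}_Y$, a comparison of two utterances could be sketched as follows; the exact normalization of $\mathbf{R}$ used in the paper may differ, and `map_sequence` is the hypothetical helper defined earlier.

```python
import numpy as np

def naps_kernel(frames_x, frames_y, R, degree=3):
    """Naive a posteriori sequence kernel: map each sequence with (6),
    then score through the inverse correlation matrix."""
    b_x = map_sequence(frames_x, degree)
    b_y = map_sequence(frames_y, degree)
    return float(b_x @ np.linalg.solve(R, b_y))
```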
53 The value $K_{\mathrm{NAPS}}(X, Y)$ can be interpreted as scoring with a polynomial classifier on the sequence $X$, see (5), with the MSE model trained from the feature vectors of $Y$ (or vice-versa because of symmetry). [sent-97, score-0.397]
54 First, scoring complexity can be reduced dramatically in training by using the following trick. [sent-99, score-0.304]
55 That is, if we transform all the mapped sequence data by the Cholesky factor of $\mathbf{R}^{-1}$ before training, the sequence kernel is a simple inner product. [sent-104, score-0.265]
56 For our application in Section 5, this reduces training time from hours per speaker down to seconds on a Sun Ultra workstation. [sent-105, score-0.654]
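One way the speed-up could be realized, assuming $\mathbf{R}$ is factored once for the whole training population: transform every mapped sequence with the inverse Cholesky factor, after which the NAPS kernel reduces to a plain dot product and any linear-kernel SVM trainer applies. This is an illustration of the trick, not the paper's exact implementation.

```python
import numpy as np

def transform_for_linear_kernel(b_vectors, R):
    """Whiten the mapped sequences so that the NAPS kernel becomes an
    inner product: with R = L L^T, use z = L^{-1} b, since
    z_x . z_y = b_x^T R^{-1} b_y."""
    L = np.linalg.cholesky(R)
    return np.array([np.linalg.solve(L, b) for b in b_vectors])
```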
57 Second, since the NAPS kernel explicitly performs the expansion to “feature space”, we can simplify the output of the support vector machine. [sent-106, score-0.284]
58 That is, once we train the support vector machine, we can collapse all the support vectors down into a single model vector, which is the quantity in parentheses in (19). [sent-109, score-0.118]
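Because the feature-space expansion is explicit, the trained SVM can be folded into one model vector; a minimal sketch, assuming the support vectors are available as mapped (or transformed) sequence vectors together with their Lagrange multipliers and labels. The function and variable names are hypothetical.

```python
import numpy as np

def collapse_svm(support_vectors, alphas, labels, bias):
    """Fold all support vectors into a single model vector so that
    scoring a new sequence is one dot product plus the bias."""
    sv = np.asarray(support_vectors)              # shape: (n_sv, dim)
    coeffs = np.asarray(alphas) * np.asarray(labels)
    w = coeffs @ sv                               # weighted sum of support vectors
    return w, bias

def svm_score(b_new, w, bias):
    """Score a mapped sequence with the collapsed model."""
    return float(w @ b_new) + bias
```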
59 Third, although the NAPS kernel is reminiscent of the Mahalanobis distance, it is distinct. [sent-110, score-0.131]
60 No assumption of equal covariance matrices for different classes (speakers) is made for the new kernel–the kernel covariance matrix is a mixture of the individual class covariances. [sent-111, score-0.131]
61 Also, the kernel is not a distance measure–no subtraction of means occurs as in the Mahalanobis distance. [sent-112, score-0.131]
62 5.1 Setup The NAPS kernel was tested on the standard speaker recognition database YOHO [14], collected from 138 speakers. [sent-114, score-0.745]
63 Enrollment and verification sessions were recorded at distinct times. [sent-118, score-0.092]
64 (Enrollment is the process of collecting data for training and generating a speaker model. [sent-119, score-0.597]
65 In verification, the user makes an identity claim and then this hypothesis is verified.) [sent-122, score-0.144]
66 For each speaker, enrollment consisted of four sessions each containing twenty-four utterances. [sent-123, score-0.254]
67 Verification consisted of ten separate sessions with four utterances per session (again per speaker). [sent-124, score-0.24]
68 Thus, there are 40 tests of the speaker’s identity and 40*137=5480 possible impostor attempts on a speaker. [sent-125, score-0.163]
69 For clarity, we emphasize that enrollment and verification session data is completely separate. [sent-126, score-0.267]
70 To extract features for each of the utterances, we used standard speech processing. [sent-127, score-0.141]
71 Each utterance was broken up into frames of ms each with a frame rate of frames/sec. [sent-128, score-0.086]
72 For verification, we measure performance in terms of the pooled and average equal error rates (EER). [sent-134, score-0.235]
73 The individual EER is the error rate at the threshold where the false accept rate (FAR) equals the false reject rate (FRR). [sent-136, score-0.138]
74 The pooled EER is found by setting a constant threshold across the entire population. [sent-137, score-0.22]
75 When the FAR equals the FRR for the entire population this is termed the pooled EER. [sent-138, score-0.215]
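A sketch of how the two EER measures could be computed from score lists; the threshold sweep is a generic illustration, not the paper's evaluation code.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Sweep a threshold over all observed scores and return the error rate
    at the point where FAR and FRR are closest."""
    target_scores = np.asarray(target_scores)
    impostor_scores = np.asarray(impostor_scores)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best = None
    for t in thresholds:
        far = np.mean(impostor_scores >= t)   # impostors accepted
        frr = np.mean(target_scores < t)      # true speakers rejected
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2.0)
    return best[1]

# Pooled EER: concatenate scores from all speakers and call once.
# Average EER: call once per speaker and average the results.
```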
76 To eliminate bias in verification, we trained the first set of speakers against the first set and the second set against the second set (as in [7]). [sent-140, score-0.158]
77 We then performed verification using the second set as impostors against the first set's speaker models, and vice versa. [sent-141, score-0.149]
78 5.2 Experiments We trained support vector machines for each speaker using the software tool SVMTorch [15] and the NAPS kernel (17). [sent-145, score-0.792]
79 The cepstral features were mapped to a high-dimensional vector using a 3rd-degree polynomial classifier. [sent-146, score-0.195]
80 We cross-validated, using the first three enrollment sessions as training and the fourth enrollment session as a test, to determine the best tradeoff between margin and error; the best-performing value of the tradeoff parameter was used for the final SVMTorch training. [sent-150, score-0.598]
81 Using the identical set of features and the same methodology, classifier models were also trained with the mean-squared error criterion using the method in [7]. [sent-151, score-0.137]
82 For final testing, all enrollment sessions were used for training, and all verification sessions were used for testing. [sent-152, score-0.346]
83 The new kernel method reduces error rates considerably–the average EER is reduced by , the pooled EER is reduced by , and the identification error rate is reduced by . [sent-154, score-0.576]
84 The average number of support vectors was which resulted in a model size of about bytes (in single precision floating point); using the model size reduction method in Section 4 resulted in a model size of bytes–over a hundred times reduction in size. [sent-155, score-0.273]
85 We also plotted scores for all speakers versus a threshold; see Figure 1. [sent-162, score-0.124]
86 One can easily see the reduction in pooled EER from the graph. [sent-164, score-0.189]
87 Note also the dramatic shifting of the FRR curve to the right for the SVM training, resulting in substantially better error rates than the MSE training. [sent-165, score-0.191]
88 For instance, when FAR is , the MSE training method gives an FRR of ; whereas, the SVM training method gives an FRR of , a reduction in error by a factor of . [sent-166, score-0.145]
89 This data-dependent kernel was motivated by using a probabilistic scoring method and mean-squared error training. [sent-168, score-0.379]
90 Experiments showed that incorporating this kernel in an SVM training architecture yielded performance superior to that of the MSE training criterion. [sent-169, score-0.285]
91 The new kernel method is also applicable to more general situations. [sent-171, score-0.16]
92 Potential applications include using the approach with radial basis functions, application to automatic speech recognition, and extending to an SVM/HMM architecture. [sent-172, score-0.192]
93 [2] Aravind Ganapathiraju and Joseph Picone, “Hybrid SVM/HMM architectures for speech recognition,” in Speech Transcription Workshop, 2000. [sent-176, score-0.141]
94 [5] Nathan Smith, Mark Gales, and Mahesan Niranjan, “Data-dependent kernels in SVM classification of speech patterns,” Tech. [sent-194, score-0.141]
95 Gopinath, “A hybrid GMM/SVM approach to speaker recognition,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 2001. [sent-199, score-0.546]
96 Assaleh, “Polynomial classifier techniques for speaker verification,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1999, pp. [sent-202, score-0.52]
97 Assaleh, “Speaker recognition using neural networks and conventional classifiers,” IEEE Trans. [sent-207, score-0.094]
98 Reynolds, “Automatic speaker recognition using Gaussian mixture speaker models,” The Lincoln Laboratory Journal, vol. [sent-214, score-1.134]
99 Bridle, “A speaker verification system using alpha-nets,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 1991, pp. [sent-221, score-0.52]
100 Broun, “A computationally scalable speaker recognition system,” in Proceedings of EUSIPCO, 2000, pp. [sent-229, score-0.614]
wordName wordTfidf (topN-words)
[('speaker', 0.52), ('veri', 0.257), ('eer', 0.249), ('mse', 0.208), ('classi', 0.176), ('enrollment', 0.175), ('frr', 0.175), ('naps', 0.175), ('scoring', 0.173), ('cation', 0.155), ('pooled', 0.15), ('speech', 0.141), ('kernel', 0.131), ('impostor', 0.125), ('polynomial', 0.124), ('er', 0.112), ('qv', 0.108), ('gv', 0.099), ('speakers', 0.099), ('svm', 0.096), ('recognition', 0.094), ('identi', 0.092), ('campbell', 0.092), ('session', 0.092), ('voice', 0.079), ('sessions', 0.079), ('claim', 0.079), ('training', 0.077), ('ers', 0.072), ('utterances', 0.069), ('sequence', 0.067), ('william', 0.063), ('acoustics', 0.06), ('discriminant', 0.058), ('observations', 0.053), ('expansion', 0.051), ('posteriori', 0.05), ('assaleh', 0.05), ('edr', 0.05), ('impostors', 0.05), ('joseph', 0.05), ('khaled', 0.05), ('mahalanobis', 0.05), ('yoho', 0.05), ('fed', 0.047), ('error', 0.046), ('bytes', 0.043), ('cholesky', 0.043), ('ua', 0.043), ('support', 0.043), ('sequences', 0.041), ('threshold', 0.04), ('resulted', 0.04), ('cepstral', 0.039), ('emission', 0.039), ('rates', 0.039), ('reduction', 0.039), ('identity', 0.038), ('frame', 0.036), ('signal', 0.035), ('population', 0.035), ('svmtorch', 0.035), ('trained', 0.033), ('machines', 0.033), ('vector', 0.032), ('bb', 0.031), ('st', 0.03), ('haussler', 0.03), ('reduces', 0.03), ('entire', 0.03), ('method', 0.029), ('criterion', 0.029), ('taylor', 0.028), ('preserves', 0.028), ('reduced', 0.027), ('output', 0.027), ('proceedings', 0.027), ('dramatically', 0.027), ('user', 0.027), ('coef', 0.027), ('application', 0.027), ('testing', 0.027), ('naive', 0.026), ('frames', 0.026), ('id', 0.026), ('mapping', 0.026), ('far', 0.026), ('hybrid', 0.026), ('eliminate', 0.026), ('false', 0.025), ('scores', 0.025), ('independence', 0.025), ('nal', 0.024), ('ideal', 0.024), ('extending', 0.024), ('jaakkola', 0.024), ('rate', 0.024), ('hg', 0.024), ('methodology', 0.024), ('sch', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
2 0.26012498 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
3 0.22320317 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
Author: John R. Hershey, Michael Casey
Abstract: It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audio-visual information for the task of speech enhancement. We propose a method to exploit audio-visual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factorially combined, to incorporate visual lip information and employ novel signal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audio-visual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information. 1
4 0.17197716 46 nips-2001-Categorization by Learning and Combining Object Parts
Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
5 0.14021027 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
Author: Paul Viola, Michael Jones
Abstract: This paper develops a new approach for extremely fast detection in domains where the distribution of positive and negative examples is highly skewed (e.g. face detection or database retrieval). In such domains a cascade of simple classifiers each trained to achieve high detection rates and modest false positive rates can yield a final detector with many desirable features: including high detection rates, very low false positive rates, and fast performance. Achieving extremely high detection rates, rather than low error, is not a task typically addressed by machine learning algorithms. We propose a new variant of AdaBoost as a mechanism for training the simple classifiers used in the cascade. Experimental results in the domain of face detection show the training algorithm yields significant improvements in performance over conventional AdaBoost. The final face detection system can process 15 frames per second, achieves over 90% detection, and a false positive rate of 1 in a 1,000,000.
6 0.13622423 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
7 0.12467556 60 nips-2001-Discriminative Direction for Kernel Classifiers
8 0.12252197 172 nips-2001-Speech Recognition using SVMs
9 0.115605 105 nips-2001-Kernel Machines and Boolean Functions
10 0.11542501 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
11 0.11313887 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
12 0.11189935 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
13 0.11080127 164 nips-2001-Sampling Techniques for Kernel Methods
14 0.10456865 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
15 0.10442074 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
16 0.10107777 139 nips-2001-Online Learning with Kernels
17 0.096967138 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
18 0.096179463 152 nips-2001-Prodding the ROC Curve: Constrained Optimization of Classifier Performance
19 0.089900315 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
20 0.088736273 144 nips-2001-Partially labeled classification with Markov random walks
topicId topicWeight
[(0, -0.248), (1, 0.172), (2, -0.104), (3, 0.132), (4, -0.171), (5, 0.211), (6, 0.101), (7, -0.111), (8, 0.019), (9, 0.04), (10, -0.024), (11, -0.091), (12, -0.093), (13, 0.148), (14, -0.09), (15, 0.025), (16, 0.022), (17, -0.003), (18, -0.101), (19, -0.087), (20, 0.009), (21, -0.078), (22, 0.081), (23, 0.008), (24, -0.028), (25, 0.032), (26, -0.036), (27, -0.029), (28, 0.069), (29, -0.056), (30, -0.007), (31, -0.004), (32, -0.026), (33, 0.013), (34, 0.041), (35, 0.089), (36, -0.022), (37, 0.029), (38, 0.033), (39, -0.036), (40, -0.038), (41, 0.021), (42, -0.0), (43, -0.04), (44, -0.013), (45, -0.047), (46, -0.021), (47, -0.005), (48, 0.025), (49, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.96020353 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
2 0.85171586 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
3 0.67534292 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
Author: S. Parveen, P. Green
Abstract: In the ‘missing data’ approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectraltemporal regions which are dominated by the speech source. The remaining regions are considered to be ‘missing’. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks. In contrast to methods based on Hidden Markov Models, RNNs allow us to make use of long-term time constraints and to make the problems of classification with incomplete data and imputing missing values interact. We report encouraging results on an isolated digit recognition task.
4 0.662467 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
Author: Ji Zhu, Trevor Hastie
Abstract: The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
5 0.62091607 46 nips-2001-Categorization by Learning and Combining Object Parts
Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
6 0.59484369 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
7 0.58986998 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
8 0.57464474 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
9 0.57065636 60 nips-2001-Discriminative Direction for Kernel Classifiers
10 0.55317092 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
11 0.54568434 152 nips-2001-Prodding the ROC Curve: Constrained Optimization of Classifier Performance
12 0.53433603 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
13 0.52314812 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
14 0.51174664 172 nips-2001-Speech Recognition using SVMs
15 0.49462089 99 nips-2001-Intransitive Likelihood-Ratio Classifiers
16 0.48246506 105 nips-2001-Kernel Machines and Boolean Functions
17 0.4631404 168 nips-2001-Sequential Noise Compensation by Sequential Monte Carlo Method
18 0.41674918 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
19 0.40462163 164 nips-2001-Sampling Techniques for Kernel Methods
20 0.40436307 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
topicId topicWeight
[(14, 0.059), (17, 0.012), (19, 0.038), (20, 0.02), (27, 0.098), (30, 0.139), (38, 0.015), (57, 0.29), (59, 0.02), (66, 0.011), (72, 0.081), (79, 0.03), (83, 0.017), (91, 0.098)]
simIndex simValue paperId paperTitle
same-paper 1 0.81660247 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
2 0.60360312 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
3 0.59820896 149 nips-2001-Probabilistic Abstraction Hierarchies
Author: Eran Segal, Daphne Koller, Dirk Ormoneit
Abstract: Many domains are naturally organized in an abstraction hierarchy or taxonomy, where the instances in “nearby” classes in the taxonomy are similar. In this paper, we provide a general probabilistic framework for clustering data into a set of classes organized as a taxonomy, where each class is associated with a probabilistic model from which the data was generated. The clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of our approach is that it utilizes global optimization algorithms for both of the last two steps, reducing the sensitivity to noise and the propensity to local maxima that are characteristic of algorithms such as hierarchical agglomerative clustering that only take local steps. We provide a theoretical analysis for our algorithm, showing that it converges to a local maximum of the joint likelihood of model and data. We present experimental results on synthetic data, and on real data in the domains of gene expression and text.
4 0.59415078 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
Author: Dieter Fox
Abstract: Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.
5 0.58317077 60 nips-2001-Discriminative Direction for Kernel Classifiers
Author: Polina Golland
Abstract: In many scientific and engineering applications, detecting and understanding differences between two groups of examples can be reduced to a classical problem of training a classifier for labeling new examples while making as few mistakes as possible. In the traditional classification setting, the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, such analysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpretation of the statistical model in the original feature space that allows us to argue about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. We derive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples and briefly discuss its use in the statistical shape analysis, an application that originally motivated this work.
6 0.58004498 163 nips-2001-Risk Sensitive Particle Filters
7 0.57949436 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
8 0.57852048 46 nips-2001-Categorization by Learning and Combining Object Parts
9 0.57088494 1 nips-2001-(Not) Bounding the True Error
10 0.56918228 22 nips-2001-A kernel method for multi-labelled classification
11 0.56914103 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model
12 0.5688386 56 nips-2001-Convolution Kernels for Natural Language
13 0.56833315 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
14 0.56729406 185 nips-2001-The Method of Quantum Clustering
15 0.56698298 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
16 0.56679577 116 nips-2001-Linking Motor Learning to Function Approximation: Learning in an Unlearnable Force Field
17 0.56490129 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
18 0.56248432 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
19 0.56221408 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
20 0.56187958 27 nips-2001-Activity Driven Adaptive Stochastic Resonance