nips nips2001 nips2001-20 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. [sent-5, score-0.225]
2 The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. [sent-6, score-0.399]
3 The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. [sent-7, score-0.212]
4 The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. [sent-8, score-0.125]
5 Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task. [sent-9, score-0.836]
6 1 Introduction Comparison of sequences of observations is a natural and necessary operation in speech applications. [sent-10, score-0.235]
7 Several recent approaches using support vector machines (SVM’s) have been proposed in the literature. [sent-11, score-0.108]
8 First, large training sets result in long training times for support vector methods. [sent-14, score-0.229]
9 Second, the emission probabilities must be approximated [3], since the output of the support vector machine is not a probability. [sent-15, score-0.141]
10 A more recent method for comparing sequences is based on the Fisher kernel proposed by Jaakkola and Haussler [4]. [sent-16, score-0.201]
11 This approach has been explored for speech recognition in [5]. [sent-17, score-0.235]
12 The application to speaker recognition is detailed in [6]. [sent-18, score-0.641]
13 We propose an alternative kernel based upon polynomial classifiers and the associated mean-squared error (MSE) training criterion [7]. [sent-19, score-0.407]
14 The advantage of this kernel is that it preserves the structure of the classifier in [7] which is both computationally and memory efficient. [sent-20, score-0.159]
15 We consider the application of text-independent speaker recognition; i.e., [sent-21, score-0.547]
16 determining or verifying the identity of an individual through voice characteristics. [sent-23, score-0.117]
17 Text-independent recognition implies that knowledge of the text of the speech data is not used. [sent-24, score-0.235]
18 Traditional methods for text-independent speaker recognition are vector quantization [8], Gaussian mixture models [9], and artificial neural networks [8]. [sent-25, score-0.646]
19 A state-of-the-art approach based on polynomial classifiers was presented in [7]. [sent-26, score-0.124]
20 The polynomial approach has several advantages over traditional methods: 1) it is extremely computationally efficient for identification, 2) the classifier is discriminative, which eliminates the need for a background or cohort model [10], and 3) the method generates small classifier models. [sent-27, score-0.153]
21 In Section 2, we describe polynomial classifiers and the associated scoring process. [sent-28, score-0.297]
22 Section 5 compares the new kernel approach to the standard mean-squared error training approach. [sent-31, score-0.254]
23 2 Polynomial classifiers for sequence data We start by considering the problem of speaker verification–a two-class problem. [sent-32, score-0.587]
24 In this case, the goal is to determine the correctness of an identity claim (e.g., [sent-33, score-0.117]
25 a user id was entered in the system) from a voice input. [sent-35, score-0.132]
26 The decision to be made is whether the claim is valid (the input comes from the claimed speaker) or an impostor is trying to break into the system. [sent-36, score-0.204]
27 For the verification application, a decision is made from a sequence of observations extracted from the speech input. [sent-38, score-0.261]
28 We decide based on the output of a discriminant function using a polynomial classifier. [sent-39, score-0.209]
29 A polynomial classifier of the form $f(\mathbf{x}) = \mathbf{w}^{t} p(\mathbf{x})$ is used, where $\mathbf{w}$ is the vector of classifier parameters (model) and $p(\mathbf{x})$ is an expansion of the input space into the vector of monomials of degree $K$ or less. [sent-40, score-0.239]
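For concreteness, a minimal sketch of such a monomial expansion; the function name and the use of numpy/itertools are illustrative and are not taken from the paper.

```python
import itertools
import numpy as np

def monomial_expansion(x, degree):
    """Expand input vector x into all monomials of degree <= `degree`.

    For x = (x1, x2) and degree 2 this yields
    [1, x1, x2, x1*x1, x1*x2, x2*x2].
    """
    terms = [1.0]
    for d in range(1, degree + 1):
        # combinations_with_replacement enumerates each monomial exactly once
        for idx in itertools.combinations_with_replacement(range(len(x)), d):
            terms.append(np.prod([x[i] for i in idx]))
    return np.array(terms)

# Example: a 2-dimensional feature vector, 3rd-degree expansion
p = monomial_expansion(np.array([0.5, -1.2]), degree=3)
print(p.shape)  # (10,) = 1 + 2 + 3 + 4 monomial terms
```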
30 If the polynomial classifier is trained with a mean-squared error training criterion and target values of 1 for the speaker's vectors and 0 for the impostors' vectors, then $f(\mathbf{x})$ will approximate the a posteriori probability of the speaker class [11]. [sent-44, score-0.359]
31 For the purposes of classification, we can discard terms that do not depend on the class to get the discriminant function (2), where we have used a shorthand to denote the sequence of observations. [sent-53, score-0.125]
32 We use two terms of the Taylor series, $\log(z) \approx z - 1$, to approximate the discriminant function and also normalize by the number of frames to obtain the final discriminant function (4); note that we have discarded the constant term in this discriminant function since it will not affect the classification decision. [sent-54, score-0.2]
33 Substituting in the polynomial function gives (5), $d(X) = \mathbf{w}^{t}\,\mathbf{b}_X$, where we have defined the mapping (6) as $\mathbf{b}_X = \frac{1}{N}\sum_{i=1}^{N} p(\mathbf{x}_i)$. We summarize the scoring method. [sent-58, score-0.323]
34 For a sequence of input vectors $X$ and a speaker model $\mathbf{w}$, we construct $\mathbf{b}_X$ using (6). [sent-59, score-0.587]
35 Since we are performing verification, if the score $\mathbf{w}^{t}\mathbf{b}_X$ is above a threshold then we declare the identity claim valid; otherwise, the claim is rejected as an impostor attempt. [sent-61, score-0.361]
36 More details on this probabilistic scoring method can be found in [13]. [sent-62, score-0.202]
37 Extending the sequence scoring framework to the case of identification (i.e., [sent-63, score-0.24]
38 identifying the speaker from a list of speakers by voice) is straightforward. [sent-65, score-0.619]
39 In this case, we construct speaker models for each speaker and then choose the speaker whose model maximizes the score (assuming equal prior probability of each speaker). [sent-66, score-1.56]
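A sketch of the verification and identification rules just described, reusing the hypothetical `monomial_expansion` helper from the earlier sketch; the speaker models `w` are assumed to come from the MSE training reviewed in Section 3, so this is an illustration rather than the paper's exact code.

```python
import numpy as np

def map_sequence(frames, degree):
    """Map a sequence of feature vectors to the average of their
    polynomial expansions, i.e. the b-vector of equation (6)."""
    expansions = [monomial_expansion(x, degree) for x in frames]
    return np.mean(expansions, axis=0)

def verify(frames, w, threshold, degree=3):
    """Accept the identity claim if the score w^T b exceeds the threshold."""
    b = map_sequence(frames, degree)
    return float(w @ b) > threshold

def identify(frames, speaker_models, degree=3):
    """Closed-set identification: pick the speaker whose model maximizes
    the score (equal priors assumed)."""
    b = map_sequence(frames, degree)
    scores = {name: float(w @ b) for name, w in speaker_models.items()}
    return max(scores, key=scores.get)
```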
40 3 Mean-squared error training We next review how to train the polynomial classifier to approximate the a posteriori probability; this process will help us set notation for the following sections. [sent-68, score-0.247]
41 The desired speaker model minimizes the mean-squared error between the classifier output and the ideal target values; the resulting problem is (7), $\mathbf{w}^{*} = \operatorname{argmin}_{\mathbf{w}} E\{(\mathbf{w}^{t} p(\mathbf{x}) - y(\mathbf{x}))^{2}\}$, where $y(\mathbf{x})$ is the ideal output (1 for the speaker, 0 for impostors).
42 This criterion can be approximated using the training set as (8), [sent-74, score-0.106]
43 $\mathbf{w}^{*} \approx \operatorname{argmin}_{\mathbf{w}} \left[ \sum_{i} (\mathbf{w}^{t} p(\mathbf{x}_i) - 1)^{2} + \sum_{j} (\mathbf{w}^{t} p(\bar{\mathbf{x}}_j))^{2} \right]$, where the speaker's training data is $\{\mathbf{x}_i\}$ and the anti-speaker data is $\{\bar{\mathbf{x}}_j\}$. [sent-75, score-0.124]
44 (Anti-speakers are designed to have the same statistical characteristics as the impostor set. [sent-76, score-0.125]
45 The training method can be written in matrix form. [sent-78, score-0.106]
46 First, define a matrix whose rows are the polynomial expansions of the speaker’s data. [sent-79, score-0.175]
47 Define a similar matrix for the impostor data, and define the ideal-output vector, which contains ones for the speaker’s rows and zeros for the impostor rows. [sent-87, score-0.125]
48 If we define $\mathbf{R} = \mathbf{M}^{t}\mathbf{M}$, the problem becomes (12); we rearrange (12) and solve for the model to obtain (13) and (14), $\mathbf{w} = \mathbf{R}^{-1}\mathbf{M}^{t}\mathbf{o}$, where $\mathbf{M}$ stacks the expanded training data and $\mathbf{o}$ is the ideal-output vector. 4 The naive a posteriori sequence kernel We can now combine the methods from Sections 2 and 3 to obtain a novel sequence comparison kernel in a straightforward manner. [sent-91, score-0.472]
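As a concrete illustration of the matrix-form MSE training reviewed in Section 3, a minimal numpy version might look like the following. It assumes the reconstructed normal equations $\mathbf{R}\mathbf{w} = \mathbf{M}^{t}\mathbf{o}$ and reuses the hypothetical `monomial_expansion` helper from the earlier sketch; names and details are illustrative rather than the paper's exact algorithm.

```python
import numpy as np

def train_mse_model(speaker_frames, impostor_frames, degree=3):
    """Solve the least-squares problem for one speaker model.

    Rows of M are polynomial expansions of speaker and anti-speaker data;
    o is the ideal output: 1 for speaker rows, 0 for impostor rows.
    """
    M_spk = np.vstack([monomial_expansion(x, degree) for x in speaker_frames])
    M_imp = np.vstack([monomial_expansion(x, degree) for x in impostor_frames])
    M = np.vstack([M_spk, M_imp])
    o = np.concatenate([np.ones(len(M_spk)), np.zeros(len(M_imp))])

    R = M.T @ M                      # correlation matrix
    # Normal equations R w = M^T o; in practice R may need regularization
    # or a pseudo-inverse if it is poorly conditioned.
    w = np.linalg.solve(R, M.T @ o)
    return w, R
```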
49 Combine the speaker model from (14) with the scoring equation from (5) to obtain the classifier score (15). [sent-92, score-0.693]
50 Here the model term is exactly the same as mapping the speaker's training data using (6), and the correlation matrix is estimated from the whole training population, so that (15) becomes (16). [sent-94, score-0.103]
51 The scoring method in (16) is the basis of our sequence kernel. [sent-95, score-0.269]
52 Given two sequences of speech feature vectors, $X$ and $Y$, we compare them by mapping each to $\mathbf{b}_X$ and $\mathbf{b}_Y$ and then computing (17), $K_{\mathrm{NAPS}}(X, Y) = \mathbf{b}_X^{t}\,\mathbf{R}^{-1}\,\mathbf{b}_Y$. We call this the naive a posteriori sequence (NAPS) kernel, since scoring assumes independence of observations and training approximates the a posteriori probabilities. [sent-96, score-0.86]
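Under the reconstruction above, in which the kernel is the bilinear form $\mathbf{b}_X^{t}\mathbf{R}^{-1}\mathbf{b}_Y$, a comparison of two utterances could be sketched as follows; the exact normalization of $\mathbf{R}$ used in the paper may differ, and `map_sequence` is the hypothetical helper defined earlier.

```python
import numpy as np

def naps_kernel(frames_x, frames_y, R, degree=3):
    """Naive a posteriori sequence kernel: map each sequence with (6),
    then score through the inverse correlation matrix."""
    b_x = map_sequence(frames_x, degree)
    b_y = map_sequence(frames_y, degree)
    return float(b_x @ np.linalg.solve(R, b_y))
```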
53 The value $K_{\mathrm{NAPS}}(X, Y)$ can be interpreted as scoring with a polynomial classifier on the sequence $X$, see (5), with the MSE model trained from the feature vectors of $Y$ (or vice-versa because of symmetry). [sent-97, score-0.397]
54 First, scoring complexity can be reduced dramatically in training by using the following trick. [sent-99, score-0.304]
55 That is, if we transform all the mapped sequence data by the Cholesky factor of $\mathbf{R}^{-1}$ before training, the sequence kernel is a simple inner product. [sent-104, score-0.265]
56 For our application in Section 5, this reduces training time from hours per speaker down to seconds on a Sun Ultra workstation. [sent-105, score-0.654]
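One way the speed-up could be realized, assuming $\mathbf{R}$ is factored once for the whole training population: transform every mapped sequence with the inverse Cholesky factor, after which the NAPS kernel reduces to a plain dot product and any linear-kernel SVM trainer applies. This is an illustration of the trick, not the paper's exact implementation.

```python
import numpy as np

def transform_for_linear_kernel(b_vectors, R):
    """Whiten the mapped sequences so that the NAPS kernel becomes an
    inner product: with R = L L^T, use z = L^{-1} b, since
    z_x . z_y = b_x^T R^{-1} b_y."""
    L = np.linalg.cholesky(R)
    return np.array([np.linalg.solve(L, b) for b in b_vectors])
```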
57 Second, since the NAPS kernel explicitly performs the expansion to “feature space”, we can simplify the output of the support vector machine. [sent-106, score-0.284]
58 That is, once we train the support vector machine, we can collapse all the support vectors down into a single model vector, which is the quantity in parentheses in (19). [sent-109, score-0.118]
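Because the feature-space expansion is explicit, the trained SVM can be folded into one model vector; a minimal sketch, assuming the support vectors are available as mapped (or transformed) sequence vectors together with their Lagrange multipliers and labels. The function and variable names are hypothetical.

```python
import numpy as np

def collapse_svm(support_vectors, alphas, labels, bias):
    """Fold all support vectors into a single model vector so that
    scoring a new sequence is one dot product plus the bias."""
    sv = np.asarray(support_vectors)              # shape: (n_sv, dim)
    coeffs = np.asarray(alphas) * np.asarray(labels)
    w = coeffs @ sv                               # weighted sum of support vectors
    return w, bias

def svm_score(b_new, w, bias):
    """Score a mapped sequence with the collapsed model."""
    return float(w @ b_new) + bias
```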
59 Third, although the NAPS kernel is reminiscent of the Mahalanobis distance, it is distinct. [sent-110, score-0.131]
60 No assumption of equal covariance matrices for different classes (speakers) is made for the new kernel–the kernel covariance matrix is a mixture of the individual class covariances. [sent-111, score-0.131]
61 Also, the kernel is not a distance measure–no subtraction of means occurs as in the Mahalanobis distance. [sent-112, score-0.131]
62 5.1 Setup The NAPS kernel was tested on the standard speaker recognition database YOHO [14], collected from 138 speakers. [sent-114, score-0.745]
63 Enrollment and verification sessions were recorded at distinct times. [sent-118, score-0.092]
64 (Enrollment is the process of collecting data for training and generating a speaker model. [sent-119, score-0.597]
65 In verification, the user makes an identity claim and then this hypothesis is verified.) [sent-122, score-0.144]
66 For each speaker, enrollment consisted of four sessions each containing twenty-four utterances. [sent-123, score-0.254]
67 Verification consisted of ten separate sessions with four utterances per session (again per speaker). [sent-124, score-0.24]
68 Thus, there are 40 tests of the speaker’s identity and 40*137=5480 possible impostor attempts on a speaker. [sent-125, score-0.163]
69 For clarity, we emphasize that enrollment and verification session data is completely separate. [sent-126, score-0.267]
70 To extract features for each of the utterances, we used standard speech processing. [sent-127, score-0.141]
71 Each utterance was broken up into frames of ms each with a frame rate of frames/sec. [sent-128, score-0.086]
72 For verification, we measure performance in terms of the pooled and average equal error rates (EER). [sent-134, score-0.235]
73 The individual EER is the error rate at the threshold where the false accept rate (FAR) equals the false reject rate (FRR). [sent-136, score-0.138]
74 The pooled EER is found by setting a constant threshold across the entire population. [sent-137, score-0.22]
75 When the FAR equals the FRR for the entire population this is termed the pooled EER. [sent-138, score-0.215]
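A sketch of how the two EER measures could be computed from score lists; the threshold sweep is a generic illustration, not the paper's evaluation code.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Sweep a threshold over all observed scores and return the error rate
    at the point where FAR and FRR are closest."""
    target_scores = np.asarray(target_scores)
    impostor_scores = np.asarray(impostor_scores)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best = None
    for t in thresholds:
        far = np.mean(impostor_scores >= t)   # impostors accepted
        frr = np.mean(target_scores < t)      # true speakers rejected
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2.0)
    return best[1]

# Pooled EER: concatenate scores from all speakers and call once.
# Average EER: call once per speaker and average the results.
```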
76 To eliminate bias in verification, we trained the first set of speakers against the first set and the second set against the second set (as in [7]). [sent-140, score-0.158]
77 We then performed verification using the second set as impostors against the first set's speaker models, and vice versa. [sent-141, score-0.149]
78 5.2 Experiments We trained support vector machines for each speaker using the software tool SVMTorch [15] and the NAPS kernel (17). [sent-145, score-0.792]
79 The cepstral features were mapped to a high-dimensional vector using a 3rd-degree polynomial classifier. [sent-146, score-0.195]
80 We cross-validated, using the first three enrollment sessions as training and the fourth enrollment session as a test, to determine the best tradeoff between margin and error; the best-performing value of the tradeoff parameter was used for the final SVMTorch training. [sent-150, score-0.598]
81 Using the identical set of features and the same methodology, classifier models were also trained with the mean-squared error criterion using the method in [7]. [sent-151, score-0.137]
82 For final testing, all enrollment sessions were used for training, and all verification sessions were used for testing. [sent-152, score-0.346]
83 The new kernel method reduces error rates considerably–the average EER is reduced by , the pooled EER is reduced by , and the identification error rate is reduced by . [sent-154, score-0.576]
84 The average number of support vectors was which resulted in a model size of about bytes (in single precision floating point); using the model size reduction method in Section 4 resulted in a model size of bytes–over a hundred times reduction in size. [sent-155, score-0.273]
85 We also plotted scores for all speakers versus a threshold; see Figure 1. [sent-162, score-0.124]
86 One can easily see the reduction in pooled EER from the graph. [sent-164, score-0.189]
87 Note also the dramatic shifting of the FRR curve to the right for the SVM training, resulting in substantially better error rates than the MSE training. [sent-165, score-0.191]
88 For instance, when FAR is , the MSE training method gives an FRR of ; whereas, the SVM training method gives an FRR of , a reduction in error by a factor of . [sent-166, score-0.145]
89 This data-dependent kernel was motivated by using a probabilistic scoring method and mean-squared error training. [sent-168, score-0.379]
90 Experiments showed that incorporating this kernel in an SVM training architecture yielded performance superior to that of the MSE training criterion. [sent-169, score-0.285]
91 The new kernel method is also applicable to more general situations. [sent-171, score-0.16]
92 Potential applications include using the approach with radial basis functions, application to automatic speech recognition, and extending to an SVM/HMM architecture. [sent-172, score-0.192]
93 [2] Aravind Ganapathiraju and Joseph Picone, “Hybrid SVM/HMM architectures for speech recognition,” in Speech Transcription Workshop, 2000. [sent-176, score-0.141]
94 [5] Nathan Smith, Mark Gales, and Mahesan Niranjan, “Data-dependent kernels in SVM classification of speech patterns,” Tech. [sent-194, score-0.141]
95 Gopinath, “A hybrid GMM/SVM approach to speaker recognition,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 2001. [sent-199, score-0.546]
96 Assaleh, “Polynomial classifier techniques for speaker verification,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1999, pp. [sent-202, score-0.52]
97 Assaleh, “Speaker recognition using neural networks and conventional classifiers,” IEEE Trans. [sent-207, score-0.094]
98 Reynolds, “Automatic speaker recognition using Gaussian mixture speaker models,” The Lincoln Laboratory Journal, vol. [sent-214, score-1.134]
99 Bridle, “A speaker verification system using alpha-nets,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 1991, pp. [sent-221, score-0.52]
100 Broun, “A computationally scalable speaker recognition system,” in Proceedings of EUSIPCO, 2000, pp. [sent-229, score-0.614]
wordName wordTfidf (topN-words)
[('speaker', 0.52), ('veri', 0.257), ('eer', 0.249), ('mse', 0.208), ('classi', 0.176), ('enrollment', 0.175), ('frr', 0.175), ('naps', 0.175), ('scoring', 0.173), ('cation', 0.155), ('pooled', 0.15), ('speech', 0.141), ('kernel', 0.131), ('impostor', 0.125), ('polynomial', 0.124), ('er', 0.112), ('qv', 0.108), ('gv', 0.099), ('speakers', 0.099), ('svm', 0.096), ('recognition', 0.094), ('identi', 0.092), ('campbell', 0.092), ('session', 0.092), ('voice', 0.079), ('sessions', 0.079), ('claim', 0.079), ('training', 0.077), ('ers', 0.072), ('utterances', 0.069), ('sequence', 0.067), ('william', 0.063), ('acoustics', 0.06), ('discriminant', 0.058), ('observations', 0.053), ('expansion', 0.051), ('posteriori', 0.05), ('assaleh', 0.05), ('edr', 0.05), ('impostors', 0.05), ('joseph', 0.05), ('khaled', 0.05), ('mahalanobis', 0.05), ('yoho', 0.05), ('fed', 0.047), ('error', 0.046), ('bytes', 0.043), ('cholesky', 0.043), ('ua', 0.043), ('support', 0.043), ('sequences', 0.041), ('threshold', 0.04), ('resulted', 0.04), ('cepstral', 0.039), ('emission', 0.039), ('rates', 0.039), ('reduction', 0.039), ('identity', 0.038), ('frame', 0.036), ('signal', 0.035), ('population', 0.035), ('svmtorch', 0.035), ('trained', 0.033), ('machines', 0.033), ('vector', 0.032), ('bb', 0.031), ('st', 0.03), ('haussler', 0.03), ('reduces', 0.03), ('entire', 0.03), ('method', 0.029), ('criterion', 0.029), ('taylor', 0.028), ('preserves', 0.028), ('reduced', 0.027), ('output', 0.027), ('proceedings', 0.027), ('dramatically', 0.027), ('user', 0.027), ('coef', 0.027), ('application', 0.027), ('testing', 0.027), ('naive', 0.026), ('frames', 0.026), ('id', 0.026), ('mapping', 0.026), ('far', 0.026), ('hybrid', 0.026), ('eliminate', 0.026), ('false', 0.025), ('scores', 0.025), ('independence', 0.025), ('nal', 0.024), ('ideal', 0.024), ('extending', 0.024), ('jaakkola', 0.024), ('rate', 0.024), ('hg', 0.024), ('methodology', 0.024), ('sch', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
2 0.26012498 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
3 0.22320317 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
Author: John R. Hershey, Michael Casey
Abstract: It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audio-visual information for the task of speech enhancement. We propose a method to exploit audio-visual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factorially combined, to incorporate visual lip information and employ novel signal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audio-visual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information. 1
4 0.17197716 46 nips-2001-Categorization by Learning and Combining Object Parts
Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
5 0.14021027 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
Author: Paul Viola, Michael Jones
Abstract: This paper develops a new approach for extremely fast detection in domains where the distribution of positive and negative examples is highly skewed (e.g. face detection or database retrieval). In such domains a cascade of simple classifiers each trained to achieve high detection rates and modest false positive rates can yield a final detector with many desirable features: including high detection rates, very low false positive rates, and fast performance. Achieving extremely high detection rates, rather than low error, is not a task typically addressed by machine learning algorithms. We propose a new variant of AdaBoost as a mechanism for training the simple classifiers used in the cascade. Experimental results in the domain of face detection show the training algorithm yields significant improvements in performance over conventional AdaBoost. The final face detection system can process 15 frames per second, achieves over 90% detection, and a false positive rate of 1 in a 1,000,000.
6 0.13622423 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
7 0.12467556 60 nips-2001-Discriminative Direction for Kernel Classifiers
8 0.12252197 172 nips-2001-Speech Recognition using SVMs
9 0.115605 105 nips-2001-Kernel Machines and Boolean Functions
10 0.11542501 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
11 0.11313887 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
12 0.11189935 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
13 0.11080127 164 nips-2001-Sampling Techniques for Kernel Methods
14 0.10456865 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
15 0.10442074 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
16 0.10107777 139 nips-2001-Online Learning with Kernels
17 0.096967138 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
18 0.096179463 152 nips-2001-Prodding the ROC Curve: Constrained Optimization of Classifier Performance
19 0.089900315 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
20 0.088736273 144 nips-2001-Partially labeled classification with Markov random walks
topicId topicWeight
[(0, -0.248), (1, 0.172), (2, -0.104), (3, 0.132), (4, -0.171), (5, 0.211), (6, 0.101), (7, -0.111), (8, 0.019), (9, 0.04), (10, -0.024), (11, -0.091), (12, -0.093), (13, 0.148), (14, -0.09), (15, 0.025), (16, 0.022), (17, -0.003), (18, -0.101), (19, -0.087), (20, 0.009), (21, -0.078), (22, 0.081), (23, 0.008), (24, -0.028), (25, 0.032), (26, -0.036), (27, -0.029), (28, 0.069), (29, -0.056), (30, -0.007), (31, -0.004), (32, -0.026), (33, 0.013), (34, 0.041), (35, 0.089), (36, -0.022), (37, 0.029), (38, 0.033), (39, -0.036), (40, -0.038), (41, 0.021), (42, -0.0), (43, -0.04), (44, -0.013), (45, -0.047), (46, -0.021), (47, -0.005), (48, 0.025), (49, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.96020353 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
2 0.85171586 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
3 0.67534292 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
Author: S. Parveen, P. Green
Abstract: In the ‘missing data’ approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectraltemporal regions which are dominated by the speech source. The remaining regions are considered to be ‘missing’. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks. In contrast to methods based on Hidden Markov Models, RNNs allow us to make use of long-term time constraints and to make the problems of classification with incomplete data and imputing missing values interact. We report encouraging results on an isolated digit recognition task.
4 0.662467 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine
Author: Ji Zhu, Trevor Hastie
Abstract: The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
5 0.62091607 46 nips-2001-Categorization by Learning and Combining Object Parts
Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
6 0.59484369 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
7 0.58986998 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
8 0.57464474 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
9 0.57065636 60 nips-2001-Discriminative Direction for Kernel Classifiers
10 0.55317092 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
11 0.54568434 152 nips-2001-Prodding the ROC Curve: Constrained Optimization of Classifier Performance
12 0.53433603 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
13 0.52314812 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
14 0.51174664 172 nips-2001-Speech Recognition using SVMs
15 0.49462089 99 nips-2001-Intransitive Likelihood-Ratio Classifiers
16 0.48246506 105 nips-2001-Kernel Machines and Boolean Functions
17 0.4631404 168 nips-2001-Sequential Noise Compensation by Sequential Monte Carlo Method
18 0.41674918 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
19 0.40462163 164 nips-2001-Sampling Techniques for Kernel Methods
20 0.40436307 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
topicId topicWeight
[(14, 0.059), (17, 0.012), (19, 0.038), (20, 0.02), (27, 0.098), (30, 0.139), (38, 0.015), (57, 0.29), (59, 0.02), (66, 0.011), (72, 0.081), (79, 0.03), (83, 0.017), (91, 0.098)]
simIndex simValue paperId paperTitle
same-paper 1 0.81660247 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
2 0.60360312 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
3 0.59820896 149 nips-2001-Probabilistic Abstraction Hierarchies
Author: Eran Segal, Daphne Koller, Dirk Ormoneit
Abstract: Many domains are naturally organized in an abstraction hierarchy or taxonomy, where the instances in “nearby” classes in the taxonomy are similar. In this paper, we provide a general probabilistic framework for clustering data into a set of classes organized as a taxonomy, where each class is associated with a probabilistic model from which the data was generated. The clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of our approach is that it utilizes global optimization algorithms for both of the last two steps, reducing the sensitivity to noise and the propensity to local maxima that are characteristic of algorithms such as hierarchical agglomerative clustering that only take local steps. We provide a theoretical analysis for our algorithm, showing that it converges to a local maximum of the joint likelihood of model and data. We present experimental results on synthetic data, and on real data in the domains of gene expression and text.
4 0.59415078 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
Author: Dieter Fox
Abstract: Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.
5 0.58317077 60 nips-2001-Discriminative Direction for Kernel Classifiers
Author: Polina Golland
Abstract: In many scientific and engineering applications, detecting and understanding differences between two groups of examples can be reduced to a classical problem of training a classifier for labeling new examples while making as few mistakes as possible. In the traditional classification setting, the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, such analysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpretation of the statistical model in the original feature space that allows us to argue about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. We derive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples and briefly discuss its use in the statistical shape analysis, an application that originally motivated this work.
6 0.58004498 163 nips-2001-Risk Sensitive Particle Filters
7 0.57949436 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
8 0.57852048 46 nips-2001-Categorization by Learning and Combining Object Parts
9 0.57088494 1 nips-2001-(Not) Bounding the True Error
10 0.56918228 22 nips-2001-A kernel method for multi-labelled classification
11 0.56914103 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model
12 0.5688386 56 nips-2001-Convolution Kernels for Natural Language
13 0.56833315 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
14 0.56729406 185 nips-2001-The Method of Quantum Clustering
15 0.56698298 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
16 0.56679577 116 nips-2001-Linking Motor Learning to Function Approximation: Learning in an Unlearnable Force Field
17 0.56490129 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
18 0.56248432 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
19 0.56221408 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
20 0.56187958 27 nips-2001-Activity Driven Adaptive Stochastic Resonance