nips nips2000 nips2000-84 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: George Saon, Mukund Padmanabhan
Abstract: We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p so as to achieve minimum Bayes error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p so as to achieve minimum Bayes error (or probability of misclassification). [sent-7, score-0.531]
2 Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. [sent-8, score-0.408]
3 While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task. [sent-9, score-1.151]
4 1 Introduction Modern speech recognition systems use cepstral features characterizing the short-term spectrum of the speech signal for classifying frames into phonetic classes. [sent-10, score-1.244]
5 These features are augmented with dynamic information from the adjacent frames to capture transient spectral events in the signal. [sent-11, score-0.569]
6 What is commonly referred to as MFCC+Δ+ΔΔ features consists of "static" mel-frequency cepstral coefficients (usually 13) plus their first and second order derivatives computed over a sliding window of typically 9 consecutive frames, yielding 39-dimensional feature vectors every 10 ms. [sent-12, score-1.094]
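A minimal sketch of how such dynamic features might be assembled (the regression-style derivative estimator, window half-width, and array shapes are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def add_deltas(cepstra, half_window=2):
    """Append first- and second-order time derivatives to static cepstra.

    cepstra: array of shape (num_frames, 13) holding static MFCCs.
    Returns an array of shape (num_frames, 39): statics, deltas, delta-deltas.
    """
    def deltas(feat):
        denom = 2 * sum(i * i for i in range(1, half_window + 1))
        padded = np.pad(feat, ((half_window, half_window), (0, 0)), mode="edge")
        return np.array([
            sum(i * (padded[t + half_window + i] - padded[t + half_window - i])
                for i in range(1, half_window + 1)) / denom
            for t in range(len(feat))
        ])

    d1 = deltas(cepstra)   # delta coefficients
    d2 = deltas(d1)        # delta-delta coefficients
    return np.hstack([cepstra, d1, d2])
```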
7 One major drawback of this front-end scheme is that the same computation is performed regardless of the application, channel conditions, speaker variability, etc. [sent-13, score-0.191]
8 In recent years, an alternative feature extraction procedure based on discriminant techniques has emerged: the consecutive cepstral frames are spliced together forming a supervector which is then projected down to a manageable dimension. [sent-14, score-1.211]
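A rough sketch of this splice-and-project front end (the context size, edge padding, and the name theta for the learned p x n projection are assumptions for illustration):

```python
import numpy as np

def splice_frames(cepstra, context=4):
    """Stack each frame with its +/- `context` neighbours into a supervector.

    cepstra: (num_frames, dim) array; returns (num_frames, dim * (2*context + 1)).
    With 9 consecutive 13-dimensional frames this gives 117-dimensional supervectors.
    """
    padded = np.pad(cepstra, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(cepstra)] for i in range(2 * context + 1)])

def project(supervectors, theta):
    """Project each supervector x down to y = theta @ x, theta being p x n."""
    return supervectors @ theta.T
```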
9 One of the most popular objective functions for designing the feature space projection is linear discriminant analysis. [sent-15, score-0.646]
10 Its application to speech recognition has shown consistent gains for small vocabulary tasks and mixed results for large vocabulary applications [4,6]. [sent-17, score-0.801]
11 Recently, there has been an interest in extending LDA to heteroscedastic discriminant analysis (HDA) by incorporating the individual class covariances in the objective function [6, 8]. [sent-18, score-0.595]
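For concreteness, a small sketch of the standard LDA projection that serves as the baseline here (textbook scatter-matrix formulation; it implements neither HDA nor the paper's divergence and Bhattacharyya objectives):

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, labels, p):
    """Return a (p, n) projection whose rows are the leading generalized
    eigenvectors of the between-class vs. within-class scatter matrices."""
    n = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((n, n))  # within-class scatter
    Sb = np.zeros((n, n))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    # Solve Sb v = w Sw v and keep the eigenvectors with the p largest eigenvalues.
    w, V = eigh(Sb, Sw)
    return V[:, np.argsort(w)[::-1][:p]].T
```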
12 Indeed, the equal class covariance assumption made by LDA does not always hold true in practice, making the LDA solution highly suboptimal for specific cases [8]. [sent-19, score-0.241]
13 However, since both LDA and HDA are heuristics, they do not guarantee an optimal projection in the sense of a minimum Bayes classification error. [sent-20, score-0.233]
14 The aim of this paper is to study feature space projections according to objective functions which are more intimately linked to the probability of misclassification. [sent-21, score-0.358]
15 An alternative approach is to define an upper bound on ε_θ, the Bayes error in the projected space, and to directly minimize this bound. [sent-23, score-0.209]
16 The paper is organized as follows: in section 2 we recall the definition of the Bayes error rate and its link to the divergence and the Bhattacharyya bound, section 3 deals with the experiments and results and section 4 provides a final discussion. [sent-24, score-0.383]
17 2 Bayes error, divergence and Bhattacharyya bound 2.1 Bayes error Consider the general problem of classifying an n-dimensional vector x into one of C distinct classes. [sent-26, score-0.397]
18 Let each class i be characterized by its own prior λ_i and probability density function p_i, i = 1, ..., C. [sent-27, score-0.087]
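For reference, the textbook quantities this sentence is setting up, written in standard form (the paper's exact notation may differ): the Bayes error for C classes, and the two-class Bhattacharyya bound on it.

```latex
% Bayes error with priors \lambda_i and class-conditional densities p_i:
\varepsilon^{*} \;=\; 1 \;-\; \int_{\mathbb{R}^{n}} \max_{1 \le i \le C} \lambda_i\, p_i(x)\, dx
% For two classes, the Bhattacharyya bound gives
\varepsilon^{*} \;\le\; \sqrt{\lambda_1 \lambda_2} \int \sqrt{p_1(x)\, p_2(x)}\, dx .
```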
wordName wordTfidf (topN-words)
[('lda', 0.471), ('cepstral', 0.285), ('bhattacharyya', 0.236), ('bayes', 0.228), ('vocabulary', 0.198), ('hda', 0.182), ('speech', 0.17), ('frames', 0.158), ('discriminant', 0.148), ('designing', 0.142), ('projected', 0.133), ('divergence', 0.122), ('features', 0.117), ('misclassification', 0.111), ('projection', 0.107), ('classifying', 0.102), ('objective', 0.097), ('bound', 0.096), ('consecutive', 0.095), ('class', 0.087), ('recognition', 0.085), ('minimum', 0.084), ('heteroscedastic', 0.079), ('manageable', 0.079), ('spliced', 0.079), ('george', 0.079), ('error', 0.077), ('feature', 0.075), ('sliding', 0.071), ('modern', 0.071), ('lost', 0.071), ('emerged', 0.071), ('extending', 0.066), ('heights', 0.066), ('yorktown', 0.066), ('heuristics', 0.066), ('lrn', 0.066), ('phone', 0.066), ('phonetic', 0.066), ('projects', 0.062), ('minimize', 0.061), ('mixed', 0.059), ('outperform', 0.059), ('augmented', 0.059), ('ibm', 0.059), ('watson', 0.059), ('covariances', 0.059), ('incorporating', 0.059), ('deals', 0.056), ('adjacent', 0.056), ('variability', 0.056), ('extraction', 0.056), ('linked', 0.056), ('transient', 0.056), ('practice', 0.055), ('classified', 0.053), ('consist', 0.053), ('suboptimal', 0.053), ('alternative', 0.052), ('equality', 0.051), ('gains', 0.051), ('speaker', 0.051), ('drawback', 0.051), ('forming', 0.051), ('discrimination', 0.051), ('rank', 0.049), ('static', 0.049), ('aim', 0.049), ('ox', 0.049), ('characterizing', 0.049), ('usually', 0.048), ('belonging', 0.047), ('link', 0.047), ('assignment', 0.047), ('eo', 0.047), ('always', 0.046), ('explored', 0.046), ('ny', 0.046), ('regardless', 0.046), ('plus', 0.046), ('events', 0.044), ('organized', 0.044), ('word', 0.043), ('channel', 0.043), ('spectral', 0.042), ('spectrum', 0.042), ('densities', 0.042), ('guarantee', 0.042), ('projections', 0.042), ('commonly', 0.041), ('application', 0.04), ('derivatives', 0.04), ('space', 0.039), ('referred', 0.038), ('amounts', 0.038), ('window', 0.038), ('popular', 0.038), ('yielding', 0.037), ('capture', 0.037), ('recall', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 84 nips-2000-Minimum Bayes Error Feature Selection for Continuous Speech Recognition
Author: George Saon, Mukund Padmanabhan
Abstract: We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p so as to achieve minimum Bayes error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.
2 0.14136778 51 nips-2000-Factored Semi-Tied Covariance Matrices
Author: Mark J. F. Gales
Abstract: A new form of covariance modelling for Gaussian mixture models and hidden Markov models is presented. This is an extension to an efficient form of covariance modelling used in speech recognition, semi-tied covariance matrices. In the standard form of semi-tied covariance matrices the covariance matrix is decomposed into a highly shared decorrelating transform and a component-specific diagonal covariance matrix. The use of a factored decorrelating transform is presented in this paper. This factoring effectively increases the number of possible transforms without increasing the number of free parameters. Maximum likelihood estimation schemes for all the model parameters are presented including the component/transform assignment, transform and component parameters. This new model form is evaluated on a large vocabulary speech recognition task. It is shown that using this factored form of covariance modelling reduces the word error rate.
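In the standard (un-factored) semi-tied form mentioned above, each Gaussian component m keeps its own diagonal covariance while sharing a decorrelating transform A with many other components; a textbook statement of that decomposition (the factored extension of the paper generalizes the transform, and is not shown here):

```latex
\Sigma_m \;=\; A^{-1}\, \Sigma_m^{\mathrm{diag}}\, A^{-\mathsf{T}}
```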
3 0.12753493 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
Author: Hagai Attias, John C. Platt, Alex Acero, Li Deng
Abstract: This paper presents a unified probabilistic framework for denoising and dereverberation of speech signals. The framework transforms the denoising and dereverberation problems into Bayes-optimal signal estimation. The key idea is to use a strong speech model that is pre-trained on a large data set of clean speech. Computational efficiency is achieved by using variational EM, working in the frequency domain, and employing conjugate priors. The framework covers both single and multiple microphones. We apply this approach to noisy reverberant speech signals and get results substantially better than standard methods.
4 0.10952861 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition
Author: Jürgen Tchorz, Michael Kleinschmidt, Birger Kollmeier
Abstract: A novel noise suppression scheme for speech signals is proposed which is based on a neurophysiologically-motivated estimation of the local signal-to-noise ratio (SNR) in different frequency channels. For SNR estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network is used to analyse AMS patterns generated from noisy speech and estimates the local SNR. Noise suppression is achieved by attenuating frequency channels according to their SNR. The noise suppression algorithm is evaluated in speaker-independent digit recognition experiments and compared to noise suppression by Spectral Subtraction.
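A minimal sketch of the attenuation step described above (the Wiener-style gain and the spectral floor are illustrative choices; the abstract only states that channels are attenuated according to their estimated SNR):

```python
import numpy as np

def suppress(spectrum, snr_db, floor=0.1):
    """Attenuate each frequency channel according to its estimated local SNR.

    spectrum: magnitude spectrum of one frame, shape (num_channels,).
    snr_db:   per-channel SNR estimate in dB (in the paper this comes from a
              neural network operating on AMS patterns).
    """
    snr = 10.0 ** (snr_db / 10.0)
    gain = np.maximum(snr / (1.0 + snr), floor)  # never fully zero a channel
    return gain * spectrum
```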
5 0.084492967 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition
Author: Hervé Bourlard, Samy Bengio, Katrin Weber
Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR) which somewhat deviate from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent)
6 0.080303073 75 nips-2000-Large Scale Bayes Point Machines
7 0.078662336 37 nips-2000-Convergence of Large Margin Separable Linear Classification
8 0.076955289 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
9 0.071810193 6 nips-2000-A Neural Probabilistic Language Model
10 0.069059715 54 nips-2000-Feature Selection for SVMs
11 0.067337759 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
12 0.064062141 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
13 0.064019442 13 nips-2000-A Tighter Bound for Graphical Models
14 0.063646778 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
15 0.059233725 96 nips-2000-One Microphone Source Separation
16 0.058693927 58 nips-2000-From Margin to Sparsity
17 0.05798617 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications
18 0.057471849 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
19 0.05411287 33 nips-2000-Combining ICA and Top-Down Attention for Robust Speech Recognition
20 0.05298619 121 nips-2000-Sparse Kernel Principal Component Analysis
topicId topicWeight
[(0, 0.182), (1, 0.034), (2, 0.041), (3, 0.123), (4, -0.057), (5, -0.132), (6, -0.171), (7, -0.038), (8, 0.021), (9, 0.079), (10, 0.03), (11, -0.01), (12, 0.014), (13, 0.044), (14, -0.035), (15, 0.09), (16, 0.073), (17, -0.005), (18, -0.036), (19, 0.117), (20, 0.118), (21, -0.052), (22, 0.019), (23, 0.153), (24, -0.172), (25, 0.04), (26, -0.038), (27, 0.1), (28, 0.015), (29, -0.044), (30, -0.019), (31, -0.012), (32, -0.068), (33, -0.056), (34, 0.186), (35, -0.095), (36, 0.248), (37, 0.076), (38, -0.165), (39, 0.176), (40, 0.007), (41, 0.078), (42, -0.063), (43, 0.004), (44, -0.143), (45, -0.105), (46, -0.064), (47, -0.136), (48, 0.064), (49, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.97345656 84 nips-2000-Minimum Bayes Error Feature Selection for Continuous Speech Recognition
Author: George Saon, Mukund Padmanabhan
Abstract: We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p so as to achieve minimum Bayes error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.
2 0.59849203 51 nips-2000-Factored Semi-Tied Covariance Matrices
Author: Mark J. F. Gales
Abstract: A new form of covariance modelling for Gaussian mixture models and hidden Markov models is presented. This is an extension to an efficient form of covariance modelling used in speech recognition, semi-tied covariance matrices. In the standard form of semi-tied covariance matrices the covariance matrix is decomposed into a highly shared decorrelating transform and a component-specific diagonal covariance matrix. The use of a factored decorrelating transform is presented in this paper. This factoring effectively increases the number of possible transforms without increasing the number of free parameters. Maximum likelihood estimation schemes for all the model parameters are presented including the component/transform assignment, transform and component parameters. This new model form is evaluated on a large vocabulary speech recognition task. It is shown that using this factored form of covariance modelling reduces the word error rate.
3 0.47261113 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
Author: Hagai Attias, John C. Platt, Alex Acero, Li Deng
Abstract: This paper presents a unified probabilistic framework for denoising and dereverberation of speech signals. The framework transforms the denoising and dereverberation problems into Bayes-optimal signal estimation. The key idea is to use a strong speech model that is pre-trained on a large data set of clean speech. Computational efficiency is achieved by using variational EM, working in the frequency domain, and employing conjugate priors. The framework covers both single and multiple microphones. We apply this approach to noisy reverberant speech signals and get results substantially better than standard methods.
4 0.45751244 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition
Author: Jürgen Tchorz, Michael Kleinschmidt, Birger Kollmeier
Abstract: A novel noise suppression scheme for speech signals is proposed which is based on a neurophysiologically-motivated estimation of the local signal-to-noise ratio (SNR) in different frequency channels. For SNR estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network is used to analyse AMS patterns generated from noisy speech and estimates the local SNR. Noise suppression is achieved by attenuating frequency channels according to their SNR. The noise suppression algorithm is evaluated in speaker-independent digit recognition experiments and compared to noise suppression by Spectral Subtraction.
5 0.43823466 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition
Author: Hervé Bourlard, Samy Bengio, Katrin Weber
Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR) which somewhat deviate from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent)
6 0.35982284 54 nips-2000-Feature Selection for SVMs
7 0.33423218 5 nips-2000-A Mathematical Programming Approach to the Kernel Fisher Algorithm
8 0.32519096 29 nips-2000-Bayes Networks on Ice: Robotic Search for Antarctic Meteorites
9 0.29087725 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
10 0.26051456 30 nips-2000-Bayesian Video Shot Segmentation
11 0.25516894 37 nips-2000-Convergence of Large Margin Separable Linear Classification
12 0.25345069 6 nips-2000-A Neural Probabilistic Language Model
13 0.24821049 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
14 0.23863435 13 nips-2000-A Tighter Bound for Graphical Models
15 0.23764329 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
16 0.22863306 23 nips-2000-An Adaptive Metric Machine for Pattern Classification
17 0.22369958 120 nips-2000-Sparse Greedy Gaussian Process Regression
18 0.21644174 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization
19 0.21280731 70 nips-2000-Incremental and Decremental Support Vector Machine Learning
20 0.21003599 79 nips-2000-Learning Segmentation by Random Walks
topicId topicWeight
[(10, 0.032), (12, 0.011), (17, 0.14), (32, 0.028), (33, 0.055), (49, 0.388), (54, 0.016), (55, 0.022), (62, 0.028), (67, 0.043), (75, 0.012), (76, 0.019), (79, 0.013), (81, 0.041), (90, 0.046), (97, 0.022)]
simIndex simValue paperId paperTitle
1 0.84430516 113 nips-2000-Robust Reinforcement Learning
Author: Jun Morimoto, Kenji Doya
Abstract: This paper proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both off-line learning by simulations and for on-line action planning. However, the difference between the model and the real environment can lead to unpredictable, often unwanted results. Based on the theory of H∞ control, we consider a differential game in which a 'disturbing' agent (disturber) tries to make the worst possible disturbance while a 'control' agent (actor) tries to make the best control input. The problem is formulated as finding a minmax solution of a value function that takes into account the norm of the output deviation and the norm of the disturbance. We derive on-line learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call
same-paper 2 0.78792679 84 nips-2000-Minimum Bayes Error Feature Selection for Continuous Speech Recognition
Author: George Saon, Mukund Padmanabhan
Abstract: We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p so as to achieve minimum Bayes error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.
3 0.39257506 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition
Author: Yee Whye Teh, Geoffrey E. Hinton
Abstract: We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by finding the highest relative probability pair among all pairs that consist of a test image and an image whose identity is known. Our method compares favorably with other methods in the literature. The generative model consists of a single layer of rate-coded, non-linear feature detectors and it has the property that, given a data vector, the true posterior probability distribution over the feature detector activities can be inferred rapidly without iteration or approximation. The weights of the feature detectors are learned by comparing the correlations of pixel intensities and feature activations in two phases: When the network is observing real data and when it is observing reconstructions of real data generated from the feature activations.
4 0.38887259 130 nips-2000-Text Classification using String Kernels
Author: Huma Lodhi, John Shawe-Taylor, Nello Cristianini, Christopher J. C. H. Watkins
Abstract: We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the performance of the kernel compared with a standard word feature space kernel [6] is made showing encouraging results. 1
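A naive sketch of the feature space described above: brute-force enumeration of length-k subsequences with the exponential span-length decay (fine for toy strings, exponential in general; the paper's dynamic-programming recursion computes the same inner product efficiently):

```python
from itertools import combinations
from collections import Counter

def subsequence_features(s, k, lam):
    """Map a string to its gap-weighted length-k subsequence features.

    Each occurrence of a subsequence at positions i_1 < ... < i_k is weighted
    by lam ** (i_k - i_1 + 1), i.e. by the length it spans in the text.
    """
    phi = Counter()
    for idx in combinations(range(len(s)), k):
        u = "".join(s[i] for i in idx)
        phi[u] += lam ** (idx[-1] - idx[0] + 1)
    return phi

def string_kernel(s, t, k, lam=0.5):
    """Inner product in the subsequence feature space."""
    ps, pt = subsequence_features(s, k, lam), subsequence_features(t, k, lam)
    return sum(v * pt[u] for u, v in ps.items())
```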
5 0.38883105 74 nips-2000-Kernel Expansions with Unlabeled Examples
Author: Martin Szummer, Tommi Jaakkola
Abstract: Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for exploiting unlabeled examples in discriminative classification. This is achieved essentially by expanding the input vectors into longer feature vectors via both labeled and unlabeled examples. The resulting classification method can be interpreted as a discriminative kernel density estimate and is readily trained via the EM algorithm, which in this case is both discriminative and achieves the optimal solution. We provide, in addition, a purely discriminative formulation of the estimation problem by appealing to the maximum entropy framework. We demonstrate that the proposed approach requires very few labeled examples for high classification accuracy.
6 0.38749805 4 nips-2000-A Linear Programming Approach to Novelty Detection
7 0.3858048 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications
8 0.38443482 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
9 0.3838771 133 nips-2000-The Kernel Gibbs Sampler
10 0.38364509 79 nips-2000-Learning Segmentation by Random Walks
11 0.38347995 122 nips-2000-Sparse Representation for Gaussian Process Models
12 0.38225976 51 nips-2000-Factored Semi-Tied Covariance Matrices
13 0.37882173 111 nips-2000-Regularized Winnow Methods
14 0.3782717 60 nips-2000-Gaussianization
15 0.37673241 37 nips-2000-Convergence of Large Margin Separable Linear Classification
16 0.37626401 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
17 0.37464935 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
18 0.3725931 36 nips-2000-Constrained Independent Component Analysis
19 0.37151328 21 nips-2000-Algorithmic Stability and Generalization Performance
20 0.37014347 75 nips-2000-Large Scale Bayes Point Machines