nips nips2001 nips2001-168 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: K. Yao, S. Nakamura
Abstract: We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. The method generates a set of samples according to the prior distribution given by clean speech models and a noise prior evolved from previous estimation. An explicit model representing noise effects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating the updated continuous state estimate as the estimation of the noise parameter, and a prediction likelihood for weighting each sample. Minimum mean square error (MMSE) inference of the time-varying noise parameter is carried out over these samples by fusing the estimates of the samples according to their weights. A residual resampling selection step and a Metropolis-Hastings smoothing step are used to improve calculation efficiency. Experiments were conducted on speech recognition in simulated non-stationary noises, where noise power changed artificially, and in highly non-stationary Machinegun noise. In all the experiments carried out, we observed that the method achieves significant recognition performance improvement over that achieved by noise compensation under a stationary noise assumption. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Sequential noise compensation by sequential Monte Carlo method Kaisheng Yao and Satoshi Nakamura ATR Spoken Language Translation Research Laboratories 2-2-2, Hikaridai Seika-cho, Souraku-gun, Kyoto, 619-0288, Japan E-mail: {kaisheng. [sent-1, score-0.648]
2 jp Abstract We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. [sent-6, score-0.946]
3 The method generates a set of samples according to the prior distribution given by clean speech models and noise prior evolved from previous estimation. [sent-7, score-0.658]
4 An explicit model representing noise effects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating the updated continuous state estimate as the estimation of the noise parameter, and prediction likelihood for weighting each sample. [sent-8, score-0.773]
5 Minimum mean square error (MMSE) inference of the time-varying noise parameter is carried out over these samples by fusing the estimates of the samples according to their weights. [sent-9, score-0.512]
6 A residual resampling selection step and a Metropolis-Hastings smoothing step are used to improve calculation efficiency. [sent-10, score-0.16]
7 Experiments were conducted on speech recognition in simulated non-stationary noises, where noise power changed artificially, and highly non-stationary Machinegun noise. [sent-11, score-0.527]
8 In all the experiments carried out, we observed that the method achieves significant recognition performance improvement over that achieved by noise compensation under a stationary noise assumption. [sent-12, score-0.923]
9 1 Introduction Speech recognition in noise has been considered essential for real applications. [sent-13, score-0.282]
10 Among many approaches, the model-based approach assumes explicit models representing noise effects on speech features. [sent-15, score-0.437]
11 In this approach, most research has focused on stationary or slowly varying noise conditions. [sent-16, score-0.301]
12 In this situation, environment noise parameters are often estimated before speech recognition from a small set of environment adaptation data. [sent-17, score-0.707]
13 The estimated environment noise parameters are then used to compensate noise effects in the feature or model space for recognition of noisy speech. [sent-18, score-0.643]
14 However, it is well-known that noise statistics may vary during recognition. [sent-19, score-0.221]
15 In this situation, the noise parameters estimated prior to speech recognition of the utterances are possibly not relevant to the subsequent frames of input speech if the environment changes. [sent-20, score-0.874]
16 A number of techniques have been proposed to compensate time-varying noise effects. [sent-21, score-0.238]
17 In the first approach, time-varying environment sources are modeled by Hidden Markov Models (HMMs) or Gaussian mixtures that were trained by prior measurement of environments, so that noise compensation is a task of identifying the underlying state sequences of the noise HMMs, e. [sent-23, score-0.933]
18 ), so that statistics at some states or mixtures obtained before speech recognition are close to the real testing environments. [sent-27, score-0.282]
19 In the second approach, environment model parameters are assumed to be time-varying, so it is not only an inference problem but is also related to environment statistics estimation during speech recognition. [sent-28, score-0.454]
20 In the Bayesian methods, all relevant information on the set of environment parameters and speech parameters, which are denoted as Θ(t) at frame t, is included in the posterior distribution given observation sequence Y (0 : t), i. [sent-33, score-0.431]
21 For example, in [5], a Laplace transform is used to approximate the joint distribution of speech and noise parameters by vector Taylor series. [sent-38, score-0.443]
22 The approximated joint distribution gives an analytical formula for posterior distribution updating. [sent-39, score-0.091]
23 We report an alternative approach for Bayesian estimation and compensation of noise effects on speech features. [sent-40, score-0.743]
24 The method is based on the sequential Monte Carlo method [6]. [sent-41, score-0.18]
25 In the method, a set of samples is generated hierarchically from the prior distribution given by speech models. [sent-42, score-0.353]
26 A state space model representing noise effects on speech features is used explicitly, and an extended Kalman filter (EKF) is constructed in each sample. [sent-43, score-0.464]
27 The prediction likelihood of the EKF in each sample gives its weight for selection, smoothing, and inference of the time-varying noise parameter, so that noise compensation is carried out afterwards. [sent-44, score-0.824]
28 Since noise parameter estimation, noise compensation and speech recognition are carried out frame-by-frame, we denote this approach as sequential noise compensation. [sent-45, score-1.386]
29 2 Speech and noise model Our work is on speech features derived from Mel Frequency Cepstral Coefficients (MFCC). [sent-46, score-0.423]
30 In our work, speech and noise are respectively modeled by HMMs and a Gaussian mixture. [sent-50, score-0.44]
31 For speech recognition in stationary additive noise, the following formula [4] has been shown to be effective in compensating noise effects. [sent-51, score-0.603]
32 For Gaussian mixture $k_t$ at state $s_t$, the Log-Add method transforms the mean vector $\mu^l_{s_t k_t}$ of the Gaussian mixture by $\hat{\mu}^l_{s_t k_t} = \mu^l_{s_t k_t} + \log(1 + \exp(\mu^l_n - \mu^l_{s_t k_t}))$ (1), where $\mu^l_n$ is the mean vector in the noise model. [sent-52, score-4.57]
33 S and M each denote the number of states in the speech models and the number of mixtures at each state. [sent-54, score-0.236]
34 After the transformation, the mean vector $\hat{\mu}^l_{s_t k_t}$ is further transformed by DCT and then plugged into the speech models for recognition of noisy speech. [sent-56, score-1.122]
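To make equation (1) concrete, the following is a minimal sketch of the Log-Add compensation of a single log filter-bank mean, assuming numpy arrays; the function name, vector dimensionality and example values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def log_add_compensate(mu_speech_log, mu_noise_log):
    """Log-Add compensation of a clean-speech log filter-bank mean, eq. (1)."""
    return mu_speech_log + np.log1p(np.exp(mu_noise_log - mu_speech_log))

# Hypothetical usage: compensate one Gaussian mean; a DCT would then map the
# compensated log filter-bank mean back to the cepstral (MFCC) domain.
mu_s = np.random.randn(24)        # illustrative 24-channel clean-speech mean
mu_n = np.random.randn(24) - 2.0  # illustrative noise mean
mu_hat = log_add_compensate(mu_s, mu_n)
```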
35 Accordingly, the compensated mean is $\hat{\mu}^l_{s_t k_t}(t)$. [sent-60, score-0.827]
36 $s_t$ and $k_t$ each denote the state and Gaussian mixture at frame t in the speech models. [sent-62, score-1.265]
37 $\mu^l_{s_t k_t}(t)$ and $\mu^l_n(t)$ each denote the speech and noise parameter. [sent-63, score-1.249]
38 In Gaussian mixture $k_t$ at state $s_t$ of the speech model, the speech parameter $\mu^l_{s_t k_t}(t)$ is assumed to be Gaussian distributed with mean $\mu^l_{s_t k_t}$ and variance $\Sigma^l_{s_t k_t}$. [sent-66, score-3.932]
39 On the other hand, since the environment parameter is assumed to be time varying, the evolution of the environment mean vector can be modeled by a random walk function, i. [sent-67, score-0.276]
40 e., $\mu^l_n(t) = \mu^l_n(t-1) + v(t)$ (2), where $v(t)$ is the environment driving noise, Gaussian distributed with zero mean and variance $V$. [sent-69, score-0.476]
41 The above formula gives the prior distribution of the set of speech and noise model parameters $\Theta(t) = \{s_t, k_t, \mu^l_{s_t k_t}(t), \mu^l_n(t)\}$. [sent-71, score-2.137]
42 , $Y^l(t) = \mu^l_{s_t k_t}(t) + \log(1 + \exp(\mu^l_n(t) - \mu^l_{s_t k_t}(t))) + w_{s_t k_t}(t)$ (4), where $w_{s_t k_t}(t)$ is Gaussian with zero mean and variance $\Sigma^l_{s_t k_t}$, i. [sent-74, score-4.159]
43 Another difficulty is that the speech state and mixture sequence is hidden in (7). [sent-79, score-0.252]
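Reading (2) and (4) together as a state-space model, a sketch of the two equations for one sample could look as follows; this is an interpretation of the formulas above, not the authors' code, and the helper names are assumptions.

```python
import numpy as np

def g(mu_n, mu_s):
    """Nonlinear observation function of eq. (4): expected noisy log-spectral mean."""
    return mu_s + np.log1p(np.exp(mu_n - mu_s))

def g_jacobian(mu_n, mu_s):
    """Derivative of g with respect to the noise mean mu_n (used for EKF linearization)."""
    e = np.exp(mu_n - mu_s)
    return e / (1.0 + e)

def evolve_noise(mu_n_prev, V, rng):
    """Random-walk state equation (2): mu_n(t) = mu_n(t-1) + v(t), v ~ N(0, V)."""
    return mu_n_prev + rng.normal(0.0, np.sqrt(V), size=mu_n_prev.shape)
```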
44 3 Time-varying noise parameter estimation by sequential Monte Carlo method We apply the sequential Monte Carlo method [6] for posterior distribution updating. [sent-81, score-0.631]
45 At each frame t, a proposal importance distribution is sampled whose target is the posterior distribution in (7), and it is implemented by sampling from lower distributions in hierarchy. [sent-82, score-0.177]
46 MMSE inference of the time-varying noise parameter is a by-product of the steps, carried out after the smoothing step. [sent-84, score-0.325]
47 In the sampling step, the prior distribution given by speech models is set to the proposal importance distribution, i. [sent-85, score-0.317]
48 e., $q(\Theta(t)|\Theta(t-1)) = a_{s_{t-1} s_t}\, p_{s_t k_t}\, N(\mu^l_{s_t k_t}(t);\ \mu^l_{s_t k_t}, \Sigma^l_{s_t k_t})$. [sent-87, score-3.445]
49 The samples are then generated by sampling hierarchically from the prior distribution, described as follows: set i = 1 and perform the following steps: [sent-88, score-0.181]
50 1. sample $s_t^{(i)} \sim a_{s_{t-1}^{(i)}\, s_t}$; 2. sample $k_t^{(i)} \sim p_{s_t^{(i)} k_t}$; 3. sample $\mu^{l(i)}_{s_t^{(i)} k_t^{(i)}}(t) \sim N(\,\cdot\,;\ \mu^l_{s_t^{(i)} k_t^{(i)}}, \Sigma^l_{s_t^{(i)} k_t^{(i)}})$, and set $i = i + 1$; 4. [sent-91, score-3.734]
51 repeat step 1 to 3 until i = N where superscript (i) denotes the index of samples and N denotes the number of samples. [sent-92, score-0.123]
52 Each sample represents a certain speech and noise parameter, denoted as $\Theta^{(i)}(t) = (s_t^{(i)}, k_t^{(i)}, \mu^{l(i)}_{s_t^{(i)} k_t^{(i)}}(t), \mu^{l(i)}_n(t))$. [sent-93, score-1.306]
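A minimal sketch of this hierarchical sampling step is given below, assuming the transition probabilities A, mixture weights P and Gaussian parameters MU, SIGMA are available as numpy arrays; the variable names are illustrative, not from the paper.

```python
import numpy as np

def sample_prior(prev_states, A, P, MU, SIGMA, rng):
    """Draw N samples (s_t, k_t, mu_speech) from the hierarchical prior.

    prev_states : previous states s_{t-1}^{(i)}, one per sample
    A           : state transition probabilities a_{s s'}
    P           : mixture weights p_{s k}
    MU, SIGMA   : Gaussian mean / diagonal variance per (state, mixture)
    """
    samples = []
    for s_prev in prev_states:
        s = rng.choice(A.shape[1], p=A[s_prev])                  # step 1: state
        k = rng.choice(P.shape[1], p=P[s])                       # step 2: mixture
        mu_speech = rng.normal(MU[s, k], np.sqrt(SIGMA[s, k]))   # step 3: speech parameter
        samples.append((s, k, mu_speech))
    return samples
```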
53 The weight of each sample is given by $\prod_{\tau=1}^{t} \frac{p(\Theta^{(i)}(\tau)\,|\,Y^l(\tau))}{q(\Theta^{(i)}(\tau)\,|\,\Theta^{(i)}(\tau-1))}$. [sent-94, score-0.993]
54 The remaining part on the right-hand side of the above equation in fact represents the prediction likelihood of the state-space model given by (2) and (4) for each sample (i). [sent-96, score-0.09]
55 This likelihood can be obtained analytically: after linearization of (4) with respect to $\mu^l_n(t)$ at $\mu^{l(i)}_n(t-1)$, an extended Kalman filter (EKF) is obtained for each sample, where the prediction likelihood of the EKF gives the weight and the updated continuous state of the EKF gives $\mu^{l(i)}_n(t)$. [sent-97, score-0.098]
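The per-sample EKF that yields both the weight and the updated noise mean can be sketched as below for diagonal covariances; this applies the standard EKF equations to (2) and (4) and is not the paper's implementation.

```python
import numpy as np
from scipy.stats import norm

def ekf_step(mu_n_prev, P_prev, V, y, mu_speech, sigma_speech):
    """One EKF prediction/update for the noise mean; returns state, variance, weight."""
    mu_pred = mu_n_prev                # random-walk prediction, eq. (2)
    P_pred = P_prev + V
    e = np.exp(mu_pred - mu_speech)    # linearize eq. (4) around the prediction
    H = e / (1.0 + e)
    y_pred = mu_speech + np.log1p(e)
    S = H * P_pred * H + sigma_speech  # innovation variance
    K = P_pred * H / S                 # Kalman gain
    mu_new = mu_pred + K * (y - y_pred)
    P_new = (1.0 - K * H) * P_pred
    weight = np.prod(norm.pdf(y, loc=y_pred, scale=np.sqrt(S)))  # prediction likelihood
    return mu_new, P_new, weight
```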
56 In practice, after the above sampling step, the weights of all but a few samples may become insignificant. [sent-98, score-0.132]
57 Given the fixed number of samples, this will result in degeneracy of the estimation, where not only are some computational resources wasted, but the estimate may also be biased because detailed information on parts of the space important to the parameter estimation is lost. [sent-99, score-0.089]
58 A selection step by residual resampling [6] is adopted after the sampling step. [sent-100, score-0.112]
59 The method avoids the degeneracy by discarding those samples with insignificant weights and, in order to keep the number of samples constant, duplicating samples with significant weights. [sent-101, score-0.328]
60 Denote the set of samples after the selection step as $\tilde{\Theta}(t) = \{\tilde{\Theta}^{(i)}(t);\ i = 1 \cdots N\}$ with weights $\tilde{\beta}(t) = \{\tilde{\beta}^{(i)}(t);\ i = 1 \cdots N\}$. [sent-103, score-0.144]
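A sketch of residual resampling over the normalized weights, following the standard scheme cited from [6]; names are illustrative.

```python
import numpy as np

def residual_resample(weights, rng):
    """Return indices of the samples kept after residual resampling."""
    N = len(weights)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    counts = np.floor(N * w).astype(int)      # deterministic copies of heavy samples
    n_rest = N - counts.sum()
    if n_rest > 0:
        residual = N * w - counts
        residual = residual / residual.sum()
        counts += rng.multinomial(n_rest, residual)
    return np.repeat(np.arange(N), counts)
```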
61 After the selection step at frame t, these N samples are distributed approximately according to the posterior distribution in (7). [sent-104, score-0.212]
62 However, the discrete nature of the approximation can lead to a skewed importance weight distribution, where in the extreme case all the samples have the same estimate $\tilde{\Theta}(t)$. [sent-105, score-0.126]
63 A Metropolis-Hastings smoothing [7] step is therefore introduced for each sample, where the step involves sampling a candidate $\Theta'^{(i)}(t)$ given the current $\tilde{\Theta}^{(i)}(t)$ according to the proposal importance distribution $q(\Theta'(t)\,|\,\tilde{\Theta}^{(i)}(t))$. [sent-106, score-0.195]
64 Denote the obtained samples as $\check{\Theta}(t) = \{\check{\Theta}^{(i)}(t);\ i = 1 \cdots N\}$ with weights $\check{\beta}(t) = \{\check{\beta}^{(i)}(t);\ i = 1 \cdots N\}$. [sent-109, score-0.102]
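One way to sketch the Metropolis-Hastings smoothing pass is shown below, assuming propose() draws a candidate from the proposal importance distribution and weight_fn() returns the EKF prediction likelihood of a sample; the helper names are hypothetical, and the generic weight-ratio acceptance rule shown here may differ in detail from the acceptance probability used in the paper.

```python
import numpy as np

def mh_smooth(samples, weights, propose, weight_fn, rng):
    """One Metropolis-Hastings smoothing pass over the resampled sample set."""
    out_samples, out_weights = [], []
    for theta, w in zip(samples, weights):
        cand = propose(theta)                          # candidate from q(.|theta)
        w_cand = weight_fn(cand)
        if rng.random() < min(1.0, w_cand / max(w, 1e-300)):
            out_samples.append(cand); out_weights.append(w_cand)
        else:
            out_samples.append(theta); out_weights.append(w)
    return out_samples, out_weights
```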
65 The MMSE estimate of the noise parameter is then $\hat{\mu}^l_n(t) = \sum_{i=1}^{N} \frac{\check{\beta}^{(i)}(t)}{\sum_{j=1}^{N} \check{\beta}^{(j)}(t)}\, \check{\mu}^{l(i)}_n(t)$, where $\check{\mu}^{l(i)}_n(t)$ is the updated continuous state of the EKF in sample (i) after the smoothing step. [sent-112, score-0.107]
66 Once the estimate $\hat{\mu}^l_n(t)$ has been obtained, it is plugged into (1) to perform the non-linear transformation of the clean speech models. [sent-113, score-0.303]
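The MMSE fusion and the subsequent compensation can then be sketched as follows, reusing the hypothetical log_add_compensate helper from above.

```python
import numpy as np

def mmse_noise_estimate(noise_states, weights):
    """Weighted fusion of the per-sample EKF noise means (the formula above)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.sum(w[:, None] * np.asarray(noise_states), axis=0)

# The estimate mu_n_hat would then be plugged into eq. (1) for every Gaussian:
#   mu_hat[s, k] = MU[s, k] + np.log1p(np.exp(mu_n_hat - MU[s, k]))
# followed by the DCT back to the cepstral domain.
```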
67 Five hundred clean speech utterances from 15 speakers and 111 utterances unseen in the training set were used for training and testing, respectively. [sent-116, score-0.307]
68 Digits and silence were respectively modeled by 10-state and 3-state whole word HMMs with 4 diagonal Gaussian mixtures in each state. [sent-117, score-0.078]
69 The first was the baseline trained on clean speech without noise compensation, and the second was the system with noise compensation by (1) assuming stationary noise [4]. [sent-126, score-1.342]
70 The sequential method was run without training transcript, and it was denoted according to the number of samples and the variance of the environment driving noise V. [sent-130, score-0.569]
71 Four seconds of contaminating noise were used in each experiment to obtain the noise mean vector $\mu^l_n$ in (1) for Stationary Compensation. [sent-131, score-0.527]
72 It was also used for initialization of $\mu^l_n(0)$ in the sequential method. [sent-132, score-0.11]
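For illustration only, initializing the noise mean from a noise-only segment could look like this sketch; the array layout is an assumption, not an API from the paper.

```python
import numpy as np

def init_noise_mean(noise_log_fbank):
    """Average log filter-bank frames of a noise-only segment to obtain mu_n(0)."""
    return np.mean(noise_log_fbank, axis=0)

# noise_log_fbank would be, e.g., a (num_frames, 24) array of log filter-bank
# energies computed from roughly four seconds of contaminating noise.
```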
73 2 Speech recognition in simulated non-stationary noise White noise signal was multiplied by a Chirp signal and a rectangular signal, so that the noise power of the contaminating White noise changed continuously, denoted as experiment A, and dramatically, denoted as experiment B. [sent-139, score-1.205]
74 As a result, the signal-to-noise ratio (SNR) of the contaminating noise ranged from 0dB to 20. [sent-140, score-0.262]
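A sketch of how such amplitude-modulated noise could be generated, under the stated assumption that white noise is multiplied by a chirp or a rectangular envelope; the sampling rate and modulation parameters are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.signal import chirp, square

fs, dur = 8000, 5.0                           # illustrative sampling rate and duration
t = np.arange(int(fs * dur)) / fs
white = np.random.randn(t.size)

# Experiment A style: noise power changes continuously (chirp-modulated white noise).
noise_a = white * (0.5 + 0.5 * np.abs(chirp(t, f0=0.1, f1=1.0, t1=dur)))
# Experiment B style: noise power changes abruptly (rectangular-modulated white noise).
noise_b = white * (0.5 + 0.5 * (square(2 * np.pi * 0.2 * t) > 0))
```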
75 We plotted the noise power in the 12th filter bank versus frames in Figure 2, together with the noise power estimated by the sequential method with the number of samples set to 120 and the environment driving noise variance set to 0. [sent-142, score-1.222]
76 As a comparison, we also plotted the noise power and its estimate by the method with the same number of samples but a larger driving noise variance of 0. [sent-144, score-0.705]
77 First, the method can track the evolution of the noise power. [sent-147, score-0.269]
78 Second, a larger driving noise variance V gives faster convergence but a larger estimation error. [sent-148, score-0.378]
79 In terms of recognition performance, Table 1 shows that the method can effectively improve system robustness to the time-varying noise. [sent-149, score-0.118]
80 For example, with 60 samples, and the environment driving noise variance V set to 0. [sent-150, score-0.44]
81 For example, given environment driving noise variance V set to 0. [sent-155, score-0.44]
82 0001, increasing number of samples from 60 to 120, can improve word accuracy from 77. [sent-156, score-0.173]
83 Table 1: Word Accuracy (in %) in simulated non-stationary noises, achieved by the sequential Monte Carlo method in comparison with baseline without noise compensation, denoted as Baseline, and noise compensation assuming stationary noise, denoted as Stationary Compensation. [sent-159, score-1.122]
84 84 Speech recognition in real noise In this experiment, speech signals were contaminated by highly non-stationary Machinegun noise in different SNRs. [sent-177, score-0.705]
85 The number of samples was set to 120, and the environment driving noise variance V was set to 0. [sent-178, score-0.525]
86 Figure 2: Estimation of the time-varying parameter $\mu^l_n(t)$ by the sequential Monte Carlo method at the 12th filter bank in experiment A. [sent-181, score-0.228]
87 86dB SNR, the method can improve word accuracy from 75. [sent-189, score-0.123]
88 Table 2: Word Accuracy (in %) in Machinegun noise, achieved by the sequential Monte Carlo method in comparison with baseline without noise compensation, denoted as Baseline, and noise compensation assuming stationary noise, denoted as Stationary Compensation. [sent-194, score-1.103]
89 Summary We have presented a sequential Monte Carlo method for Bayesian estimation of the time-varying noise parameter, used for sequential noise compensation in robust speech recognition. [sent-212, score-1.234]
90 The method uses samples to approximate the posterior distribution of the additive noise and speech parameters given the observation sequence. [sent-213, score-0.615]
91 Figure 3: Estimation of the time-varying parameter $\mu^l_n(t)$ by the sequential Monte Carlo method at the 12th filter bank in experiment A. [sent-214, score-0.228]
92 Once the noise parameter has been inferred, it is plugged into a non-linear transformation of clean speech models. [sent-220, score-0.554]
93 Experiments conducted on digit recognition in simulated non-stationary noises and real noises have shown that the method is very effective in improving system robustness to time-varying additive noise. [sent-221, score-0.251]
94 Moore, “Hidden Markov model decomposition of speech and noise,” in ICASSP, 1990, pp. [sent-225, score-0.202]
95 Kim, “Nonstationary environment compensation based on sequential estimation,” IEEE Signal Processing Letters, vol. [sent-229, score-0.492]
96 Nakamura, “Sequential noise compensation by a sequential Kullback proximal algorithm,” in EUROSPEECH, 2001, pp. [sent-236, score-0.613]
97 Cao, “Residual noise compensation by a sequential EM algorithm for robust speech recognition in nonstationary noise,” in ICSLP, 2000, vol. [sent-243, score-0.911]
98 Kristjansson, “Algonquin: Iterating Laplace’s method to remove multiple types of acoustic distortion for robust speech recognition,” in EUROSPEECH, 2001, pp. [sent-250, score-0.252]
99 Chen, “Sequential Monte Carlo methods for dynamic systems,” J. [sent-255, score-0.146]
100 Hastings, “Monte Carlo sampling methods using Markov chains and their applications,” Biometrika, vol. [sent-263, score-0.103]
wordName wordTfidf (topN-words)
[('kt', 0.811), ('compensation', 0.282), ('noise', 0.221), ('speech', 0.202), ('st', 0.154), ('sequential', 0.11), ('environment', 0.1), ('driving', 0.085), ('samples', 0.085), ('ekf', 0.081), ('stationary', 0.08), ('carlo', 0.073), ('monte', 0.073), ('baseline', 0.066), ('recognition', 0.061), ('mmse', 0.054), ('lter', 0.053), ('clean', 0.049), ('snr', 0.047), ('noises', 0.047), ('pst', 0.047), ('denoted', 0.044), ('ast', 0.043), ('word', 0.042), ('contaminating', 0.041), ('machinegun', 0.041), ('nakamura', 0.041), ('yao', 0.041), ('estimation', 0.038), ('smoothing', 0.037), ('kalman', 0.036), ('method', 0.035), ('variance', 0.034), ('frame', 0.033), ('plugged', 0.032), ('posterior', 0.032), ('sampling', 0.03), ('parameter', 0.03), ('sample', 0.028), ('utterances', 0.028), ('experiment', 0.028), ('state', 0.027), ('dct', 0.027), ('mfcc', 0.027), ('refereeing', 0.027), ('wst', 0.027), ('di', 0.026), ('ects', 0.026), ('bank', 0.025), ('power', 0.024), ('accuracy', 0.024), ('importance', 0.024), ('hierarchically', 0.023), ('timevarying', 0.023), ('insigni', 0.023), ('estimated', 0.023), ('carried', 0.023), ('selection', 0.023), ('mixture', 0.023), ('prior', 0.023), ('improve', 0.022), ('hmms', 0.021), ('cepstral', 0.021), ('degeneracy', 0.021), ('residual', 0.021), ('likelihood', 0.021), ('transformation', 0.02), ('distribution', 0.02), ('additive', 0.02), ('eurospeech', 0.02), ('nonstationary', 0.02), ('formula', 0.019), ('gaussian', 0.019), ('resampling', 0.019), ('superscript', 0.019), ('acceptance', 0.019), ('simulated', 0.019), ('mixtures', 0.019), ('step', 0.019), ('proposal', 0.018), ('modeled', 0.017), ('compensate', 0.017), ('weights', 0.017), ('laplace', 0.016), ('ective', 0.016), ('signal', 0.016), ('mean', 0.016), ('curve', 0.016), ('denote', 0.015), ('robust', 0.015), ('updated', 0.015), ('bayesian', 0.015), ('inference', 0.014), ('table', 0.014), ('frames', 0.014), ('prediction', 0.014), ('representing', 0.014), ('erent', 0.014), ('environments', 0.014), ('evolution', 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 168 nips-2001-Sequential Noise Compensation by Sequential Monte Carlo Method
Author: K. Yao, S. Nakamura
Abstract: We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. The method generates a set of samples according to the prior distribution given by clean speech models and noise prior evolved from previous estimation. An explicit model representing noise effects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating the updated continuous state estimate as the estimation of the noise parameter, and prediction likelihood for weighting each sample. Minimum mean square error (MMSE) inference of the time-varying noise parameter is carried out over these samples by fusion the estimation of samples according to their weights. A residual resampling selection step and a Metropolis-Hastings smoothing step are used to improve calculation efficiency. Experiments were conducted on speech recognition in simulated non-stationary noises, where noise power changed artificially, and highly non-stationary Machinegun noise. In all the experiments carried out, we observed that the method can have significant recognition performance improvement, over that achieved by noise compensation with stationary noise assumption. 1
2 0.21430072 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
Author: Brendan J. Frey, Trausti T. Kristjansson, Li Deng, Alex Acero
Abstract: A challenging, unsolved problem in the speech recognition community is recognizing speech signals that are corrupted by loud, highly nonstationary noise. One approach to noisy speech recognition is to automatically remove the noise from the cepstrum sequence before feeding it in to a clean speech recognizer. In previous work published in Eurospeech, we showed how a probability model trained on clean speech and a separate probability model trained on noise could be combined for the purpose of estimating the noisefree speech from the noisy speech. We showed how an iterative 2nd order vector Taylor series approximation could be used for probabilistic inference in this model. In many circumstances, it is not possible to obtain examples of noise without speech. Noise statistics may change significantly during an utterance, so that speechfree frames are not sufficient for estimating the noise model. In this paper, we show how the noise model can be learned even when the data contains speech. In particular, the noise model can be learned from the test utterance and then used to de noise the test utterance. The approximate inference technique is used as an approximate E step in a generalized EM algorithm that learns the parameters of the noise model from a test utterance. For both Wall Street J ournal data with added noise samples and the Aurora benchmark, we show that the new noise adaptive technique performs as well as or significantly better than the non-adaptive algorithm, without the need for a separate training set of noise examples. 1
3 0.16882305 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
Author: John R. Hershey, Michael Casey
Abstract: It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audio-visual information for the task of speech enhancement. We propose a method to exploit audio-visual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factori ally combined, to incorporate visual lip information and employ novel signal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audio-visual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information. 1
4 0.081843257 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
Author: Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama
Abstract: A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs). 1
5 0.077553049 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
Author: S. Parveen, P. Green
Abstract: In the ‘missing data’ approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectraltemporal regions which are dominated by the speech source. The remaining regions are considered to be ‘missing’. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks. In contrast to methods based on Hidden Markov Models, RNNs allow us to make use of long-term time constraints and to make the problems of classification with incomplete data and imputing missing values interact. We report encouraging results on an isolated digit recognition task.
6 0.063914165 172 nips-2001-Speech Recognition using SVMs
7 0.05896309 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
8 0.058110204 156 nips-2001-Rao-Blackwellised Particle Filtering via Data Augmentation
9 0.05692165 43 nips-2001-Bayesian time series classification
10 0.055632684 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
11 0.054756004 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
12 0.054107077 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
13 0.052507285 195 nips-2001-Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
14 0.050966326 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions
15 0.050300807 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
16 0.046755757 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
17 0.045934841 163 nips-2001-Risk Sensitive Particle Filters
18 0.045398746 35 nips-2001-Analysis of Sparse Bayesian Learning
19 0.042649694 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
20 0.042280946 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo
topicId topicWeight
[(0, -0.118), (1, -0.008), (2, -0.018), (3, -0.056), (4, -0.2), (5, 0.043), (6, 0.176), (7, -0.017), (8, 0.031), (9, -0.064), (10, 0.032), (11, 0.036), (12, 0.009), (13, 0.136), (14, -0.062), (15, 0.034), (16, -0.048), (17, -0.039), (18, -0.229), (19, 0.027), (20, -0.015), (21, -0.074), (22, 0.07), (23, 0.126), (24, 0.07), (25, -0.098), (26, 0.075), (27, -0.068), (28, 0.087), (29, 0.064), (30, 0.032), (31, 0.102), (32, -0.003), (33, 0.041), (34, -0.038), (35, -0.023), (36, 0.045), (37, 0.087), (38, 0.084), (39, 0.023), (40, -0.02), (41, -0.05), (42, -0.083), (43, 0.041), (44, 0.081), (45, 0.048), (46, 0.046), (47, 0.068), (48, 0.013), (49, -0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.96856207 168 nips-2001-Sequential Noise Compensation by Sequential Monte Carlo Method
Author: K. Yao, S. Nakamura
Abstract: We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. The method generates a set of samples according to the prior distribution given by clean speech models and noise prior evolved from previous estimation. An explicit model representing noise effects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating the updated continuous state estimate as the estimation of the noise parameter, and prediction likelihood for weighting each sample. Minimum mean square error (MMSE) inference of the time-varying noise parameter is carried out over these samples by fusion the estimation of samples according to their weights. A residual resampling selection step and a Metropolis-Hastings smoothing step are used to improve calculation efficiency. Experiments were conducted on speech recognition in simulated non-stationary noises, where noise power changed artificially, and highly non-stationary Machinegun noise. In all the experiments carried out, we observed that the method can have significant recognition performance improvement, over that achieved by noise compensation with stationary noise assumption. 1
2 0.86467695 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
Author: Brendan J. Frey, Trausti T. Kristjansson, Li Deng, Alex Acero
Abstract: A challenging, unsolved problem in the speech recognition community is recognizing speech signals that are corrupted by loud, highly nonstationary noise. One approach to noisy speech recognition is to automatically remove the noise from the cepstrum sequence before feeding it in to a clean speech recognizer. In previous work published in Eurospeech, we showed how a probability model trained on clean speech and a separate probability model trained on noise could be combined for the purpose of estimating the noisefree speech from the noisy speech. We showed how an iterative 2nd order vector Taylor series approximation could be used for probabilistic inference in this model. In many circumstances, it is not possible to obtain examples of noise without speech. Noise statistics may change significantly during an utterance, so that speechfree frames are not sufficient for estimating the noise model. In this paper, we show how the noise model can be learned even when the data contains speech. In particular, the noise model can be learned from the test utterance and then used to de noise the test utterance. The approximate inference technique is used as an approximate E step in a generalized EM algorithm that learns the parameters of the noise model from a test utterance. For both Wall Street J ournal data with added noise samples and the Aurora benchmark, we show that the new noise adaptive technique performs as well as or significantly better than the non-adaptive algorithm, without the need for a separate training set of noise examples. 1
3 0.77426273 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
Author: John R. Hershey, Michael Casey
Abstract: It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audio-visual information for the task of speech enhancement. We propose a method to exploit audio-visual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factori ally combined, to incorporate visual lip information and employ novel signal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audio-visual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information. 1
4 0.6315878 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
Author: S. Parveen, P. Green
Abstract: In the ‘missing data’ approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectraltemporal regions which are dominated by the speech source. The remaining regions are considered to be ‘missing’. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks. In contrast to methods based on Hidden Markov Models, RNNs allow us to make use of long-term time constraints and to make the problems of classification with incomplete data and imputing missing values interact. We report encouraging results on an isolated digit recognition task.
5 0.33804673 20 nips-2001-A Sequence Kernel and its Application to Speaker Recognition
Author: William M. Campbell
Abstract: A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
6 0.31945047 99 nips-2001-Intransitive Likelihood-Ratio Classifiers
7 0.30312157 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
8 0.28943777 195 nips-2001-Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
9 0.26230958 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
10 0.25973168 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions
11 0.24898596 156 nips-2001-Rao-Blackwellised Particle Filtering via Data Augmentation
12 0.24881074 172 nips-2001-Speech Recognition using SVMs
13 0.24380809 61 nips-2001-Distribution of Mutual Information
14 0.24197195 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
15 0.23672551 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
16 0.23543394 21 nips-2001-A Variational Approach to Learning Curves
17 0.23511796 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
18 0.23498949 14 nips-2001-A Neural Oscillator Model of Auditory Selective Attention
19 0.22124012 43 nips-2001-Bayesian time series classification
20 0.21433735 35 nips-2001-Analysis of Sparse Bayesian Learning
topicId topicWeight
[(13, 0.022), (14, 0.011), (17, 0.019), (19, 0.016), (20, 0.016), (27, 0.057), (30, 0.121), (38, 0.017), (59, 0.4), (72, 0.045), (79, 0.055), (83, 0.022), (91, 0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.9050023 168 nips-2001-Sequential Noise Compensation by Sequential Monte Carlo Method
Author: K. Yao, S. Nakamura
Abstract: We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. The method generates a set of samples according to the prior distribution given by clean speech models and noise prior evolved from previous estimation. An explicit model representing noise effects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating the updated continuous state estimate as the estimation of the noise parameter, and prediction likelihood for weighting each sample. Minimum mean square error (MMSE) inference of the time-varying noise parameter is carried out over these samples by fusion the estimation of samples according to their weights. A residual resampling selection step and a Metropolis-Hastings smoothing step are used to improve calculation efficiency. Experiments were conducted on speech recognition in simulated non-stationary noises, where noise power changed artificially, and highly non-stationary Machinegun noise. In all the experiments carried out, we observed that the method can have significant recognition performance improvement, over that achieved by noise compensation with stationary noise assumption. 1
2 0.824628 108 nips-2001-Learning Body Pose via Specialized Maps
Author: Rómer Rosales, Stan Sclaroff
Abstract: A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion. 1
3 0.80116481 164 nips-2001-Sampling Techniques for Kernel Methods
Author: Dimitris Achlioptas, Frank Mcsherry, Bernhard Schölkopf
Abstract: We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function by a “randomized kernel” which behaves like in expectation.
4 0.76366591 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
Author: Lehel Csató, Manfred Opper, Ole Winther
Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
5 0.55629689 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
Author: Dieter Fox
Abstract: Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.
6 0.52567619 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
7 0.51878864 74 nips-2001-Face Recognition Using Kernel Methods
8 0.49121192 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
9 0.48960793 154 nips-2001-Products of Gaussians
10 0.48906118 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
11 0.4849664 71 nips-2001-Estimating the Reliability of ICA Projections
12 0.4811838 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo
13 0.47972521 46 nips-2001-Categorization by Learning and Combining Object Parts
14 0.47925633 163 nips-2001-Risk Sensitive Particle Filters
15 0.46682227 155 nips-2001-Quantizing Density Estimators
16 0.46255153 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
17 0.46235621 156 nips-2001-Rao-Blackwellised Particle Filtering via Data Augmentation
18 0.45937863 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
19 0.45555761 149 nips-2001-Probabilistic Abstraction Hierarchies
20 0.45529649 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation