nips nips2000 nips2000-90 knowledge-graph by maker-knowledge-mining

90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition


Source: pdf

Author: Hervé Bourlard, Samy Bengio, Katrin Weber

Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR) which somewhat deviate from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent)

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR) which somewhat deviate from the usual approaches. [sent-4, score-0.363]

2 More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. [sent-5, score-0.034]

3 As a further extension to multi-stream ASR, we will finally introduce a new approach, referred to as HMM2, where the HMM emission probabilities are estimated via state-specific, feature-based HMMs responsible for merging the stream information and modeling their possible correlation. [sent-7, score-0.53]

4 Furthermore, time correlation, and consequently the dynamics of the signal, inside each HMM state is also usually disregarded (although the use of temporal delta and delta-delta features can capture some of this correlation). [sent-9, score-0.291]

5 Consequently, only medium-term dependencies are captured via the topology of the HMM model, while short-term and long-term dependencies are usually very poorly modeled. [sent-10, score-0.104]

6 Ideally, we want to design a particular HMM able to accommodate multiple time-scale characteristics so that we can capture phonetic properties, as well as syllable structures and (long-term) invariants that are more robust to noise. [sent-11, score-0.222]

7 It is, however, clear that those different time-scale features will also exhibit different levels of stationarity and will require different HMM topologies to capture their dynamics. [sent-12, score-0.087]

8 There are many potential advantages to such a multi-stream approach, including: [sent-13, score-0.094]

9 The definition of a principled way to merge different temporal knowledge sources such as acoustic and visual inputs, even if the temporal sequences are not synchronous and do not have the same data rate - see [13] for further discussion about this. [sent-14, score-0.206]

10 Possibility to incorporate multiple time resolutions (as part of a structure with multiple unit lengths, such as phone and syllable). [sent-16, score-0.028]

11 As a particular case of multi-stream processing, multi-band ASR [2, 5], involving the independent processing and combination of partial frequency bands, has many potential advantages, briefly discussed below. [sent-18, score-0.456]

12 In the following, we will not discuss the underlying algorithms (more or less "complex" variants of Viterbi decoding), nor detailed experimental results (see, e. [sent-19, score-0.058]

13 Instead, we will mainly focus on the combination strategy and discuss different variants around the same formalism. [sent-22, score-0.146]

14 (Section 1, Multiband-based ASR: General Formalism) As a particular case of the multi-stream paradigm, we have been investigating an ASR approach based on independent processing and combination of frequency subbands. [sent-24, score-0.324]

15 1, is to split the whole frequency band (represented in terms of critical bands) into a few subbands on which different recognizers are independently applied. [sent-26, score-0.434]

16 The resulting probabilities are then combined for recognition later in the process at some segmental level. [sent-27, score-0.219]

17 Starting from critical bands, acoustic processing is now performed independently for each frequency band, yielding K input streams, each being associated with a particular frequency band. [sent-28, score-0.447]

18 In multi-band speech recognition, the frequency range is split into several bands, and information in the bands is used for phonetic probability estimation by independent modules. [sent-31, score-0.677]

19 These probabilities are then combined for recognition later in the process at some segmental level. [sent-32, score-0.219]

20 In this case, each of the K sub-recognizers (channels) now uses the information $x_n^k$ contained in a specific frequency band $X^k = \{x_1^k, \ldots$ [sent-33, score-0.299]

21 $\ldots, x_N^k\}$, where each $x_n^k$ represents the acoustic (spectral) vector at time $n$ in the $k$-th stream. [sent-39, score-0.078]
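A minimal sketch of how the K input streams might be formed from frame-level features, assuming each frame is a vector of critical-band values and that the subbands are contiguous groups of roughly equal width. The function name `split_into_subbands` and the band sizes are hypothetical, not taken from the paper.

```python
import numpy as np


def split_into_subbands(features, num_bands):
    """Split a (T, D) matrix of critical-band features into K subband streams.

    Row n of `features` is the spectral vector at time n. The result is a list
    of K arrays X^k with shape (T, D_k); the subbands are contiguous in
    frequency and as equal in width as possible (an assumption).
    """
    features = np.asarray(features, dtype=float)
    if not 1 <= num_bands <= features.shape[1]:
        raise ValueError("num_bands must be between 1 and the feature dimension")
    return np.array_split(features, num_bands, axis=1)


# Hypothetical example: 100 frames of 15 critical-band log energies, K = 4.
X = np.random.default_rng(0).normal(size=(100, 15))
streams = split_into_subbands(X, 4)
print([s.shape for s in streams])  # [(100, 4), (100, 4), (100, 4), (100, 3)]
```

Each stream would then feed its own band-specific recognizer; concatenating the streams along the feature axis recovers the original frames.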

22 In the case of hybrid HMM/ANN systems, HMM local emission (posterior) probabilities are estimated by an artificial neural network (ANN), estimating $P(q_j \mid X_n)$, where $q_j$ is an HMM state and $X_n = (x_n^1, \ldots$ [sent-40, score-0.3]

23 In the case of multi-stream (or subband-based) HMM/ANN systems, different ANNs will compute state-specific stream posteriors $P(q_j \mid x_n^k)$. Combination of these local posteriors can then be performed at different temporal levels, and in many ways, including [2]: untrained linear or trained linear (e. [sent-47, score-0.465]

24 , as a function of automatically estimated local SNR) functions, as well as trained nonlinear functions (e. [sent-49, score-0.066]

25 In the simplest case, this subband posterior recombination is performed at the HMM state level, which then amounts to performing a standard Viterbi decoding in which local (log) probabilities are obtained from a linear or nonlinear combination of the local subband probabilities. [sent-52, score-0.984]
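State-level recombination can be sketched as follows: subband posteriors are merged frame by frame into one local log score per state, and an ordinary Viterbi pass runs on those scores. This is a sketch under assumptions: a linear combination inside the log with fixed band weights, and no division of posteriors by state priors (which hybrid HMM/ANN systems usually apply to obtain scaled likelihoods). The function names and the toy model are hypothetical.

```python
import numpy as np


def combined_log_scores(band_posteriors, weights):
    """Linear recombination: log sum_k w_k P(q_j | x_n^k) for every frame n and state j.

    band_posteriors: array (K, T, Q); weights: array (K,) summing to one.
    """
    w = np.asarray(weights, dtype=float)[:, None, None]
    return np.log(np.sum(w * band_posteriors, axis=0))


def viterbi(log_scores, log_trans, log_init):
    """Standard Viterbi decoding over T frames and Q states in the log domain."""
    T, Q = log_scores.shape
    delta = log_init + log_scores[0]
    back = np.zeros((T, Q), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans          # cand[i, j]: reach state j from state i
        back[t] = np.argmax(cand, axis=0)
        delta = cand[back[t], np.arange(Q)] + log_scores[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):                  # follow the back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Hypothetical 3-state left-to-right model and two subbands that agree.
Q = 3
log_trans = np.full((Q, Q), -np.inf)
for i in range(Q - 1):
    log_trans[i, i] = log_trans[i, i + 1] = np.log(0.5)
log_trans[Q - 1, Q - 1] = 0.0
log_init = np.array([0.0, -np.inf, -np.inf])
targets = [0, 0, 1, 1, 2]
post = np.full((2, len(targets), Q), 0.1)
for n, j in enumerate(targets):
    post[:, n, j] = 0.8
path = viterbi(combined_log_scores(post, [0.5, 0.5]), log_trans, log_init)
print(path)  # [0, 0, 1, 1, 2]
```

Replacing the log of a weighted sum with a weighted sum of log posteriors would give a log-linear variant; the decoder itself does not change.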

26 For example, in the initial subband-based ASR, local posteriors $P(q_j \mid X_n)$ were estimated according to: $P(q_j \mid X_n) = \sum_{k=1}^{K} w_k \, P(q_j \mid x_n^k, \Theta_k)$ [sent-53, score-0.161]

27 where $P(q_j \mid x_n^k, \Theta_k)$ is computed with a band-specific ANN of parameters $\Theta_k$ and with $x_n^k$ (possibly with temporal context) at its input. [sent-55, score-0.078]

28 The weighting factors can be assigned a uniform distribution (already performing very well [2]) or be proportional to the estimated SNR. [sent-56, score-0.036]
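The weighted sum of band posteriors, $P(q_j \mid X_n) = \sum_k w_k P(q_j \mid x_n^k, \Theta_k)$, can be sketched for one frame with the two weighting choices just mentioned. The sketch assumes the band-specific ANN outputs are already available as a (K, Q) array and reads "proportional to the estimated SNR" as proportional to the linear-scale SNR; both the function name and that reading are assumptions.

```python
import numpy as np


def combine_subband_posteriors(band_posteriors, snr_db=None):
    """P(q_j | X_n) = sum_k w_k P(q_j | x_n^k, Theta_k) for a single frame.

    band_posteriors: (K, Q) array; row k holds the posteriors of the band-k ANN.
    snr_db: optional (K,) estimated SNRs in dB; None gives uniform weights.
    Returns the combined (Q,) posteriors and the (K,) weights used.
    """
    band_posteriors = np.asarray(band_posteriors, dtype=float)
    K = band_posteriors.shape[0]
    if snr_db is None:
        w = np.full(K, 1.0 / K)                      # uniform weighting
    else:
        snr_lin = 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)
        w = snr_lin / snr_lin.sum()                  # proportional to linear SNR
    return w @ band_posteriors, w


post = [[0.7, 0.3],    # band 1 (hypothetical values)
        [0.2, 0.8]]    # band 2
print(combine_subband_posteriors(post)[0])                   # approx. [0.45, 0.55]
print(combine_subband_posteriors(post, snr_db=[10, 0])[0])   # approx. [0.6545, 0.3455]
```

Because the weights sum to one, the combined vector remains a valid posterior distribution over states.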

29 Over the last few years, several results were reported showing that such a simple approach was usually more robust to band-limited noise. [sent-57, score-0.282]

30 (Section 2, Motivations and Drawbacks) The multi-band approach briefly discussed above has several potential advantages, summarized here. [sent-59, score-0.191]

31 Better robustness to band-limited noise: the signal may be impaired (e. [sent-60, score-0.143]

32 When recognition is based on several independent decisions from different frequency subbands, the decoding of a linguistic message need not be severely impaired, as long as the remaining clean subbands supply sufficiently reliable information. [sent-66, score-0.73]

33 Surprisingly, even when the combination is simply performed at the HMM state level, it is observed that the multi-band approach yields better performance and noise robustness than a regular full-band system. [sent-70, score-0.341]

34 Similar conclusions were also observed in the framework of the missing feature theory [7, 9]. [sent-71, score-0.14]

35 Better modeling: subband modeling will usually be more robust. [sent-73, score-0.171]

36 Indeed, since the dimension of each (subband) feature space is smaller, it is easier to estimate reliable statistics (resulting in a more robust parametrization). [sent-74, score-0.202]
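A back-of-the-envelope parameter count illustrates the point, assuming one full-covariance Gaussian per model and a hypothetical 20-dimensional feature vector split into four 5-dimensional subbands (neither number comes from the paper). The saving comes precisely from dropping the cross-band covariance terms, i.e. the cross-band correlation raised among the objections to subband modeling.

```python
def gaussian_param_count(dim, full_covariance=True):
    """Free parameters of one Gaussian: the mean plus its covariance entries."""
    cov = dim * (dim + 1) // 2 if full_covariance else dim
    return dim + cov


full_band = gaussian_param_count(20)        # 20 + 210 = 230
subbands = 4 * gaussian_param_count(5)      # 4 * (5 + 15) = 80
print(full_band, subbands)                  # 230 80
```

With a fixed amount of training data, fewer parameters per model means more samples per parameter, which is one way to read the "more reliable statistics" argument.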

37 Moreover, the all-pole modeling usually used in ASR will be more robust if performed on subbands, i. [sent-75, score-0.25]

38 , in lower dimensional spaces, than on the full-band signal [12]. [sent-77, score-0.057]

39 Channel asynchrony: transitions between more stationary segments of speech do not necessarily occur at the same time across the different frequency bands [8], which makes the piecewise stationary assumption more fragile. [sent-78, score-0.708]

40 The subband approach may have the potential of relaxing the synchrony constraint inherent in current HMM systems. [sent-79, score-0.395]

41 Channel-specific processing and modeling: different recognition strategies might ultimately be applied in different subbands. [sent-80, score-0.214]

42 , time resolution and width of analysis window depending on the frequency subband). [sent-83, score-0.193]

43 Finally, some subbands may be inherently better for certain classes of speech sounds than others. [sent-84, score-0.381]

44 Major objections and drawbacks: one of the common objections [8] to this separate modeling of each frequency band has been that important information in the form of correlation between bands may be lost. [sent-85, score-0.7]

45 Although this may be true, several studies [8], as well as the good recognition rates achieved on small frequency bands [3, 6], tend to show that most of the phonetic information is preserved in each frequency band (possibly provided that we have enough temporal information). [sent-86, score-0.885]

46 (Section 3, Full Combination Subband ASR) If we know where the noise is, then, based on the results obtained with missing data [7, 9], impressive noise robustness can be achieved by using the marginal distribution, estimating the HMM emission probability from the clean frequency bands only. [sent-88, score-0.814]
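For a diagonal-covariance Gaussian emission model, the marginal over the noisy components reduces to evaluating the density on the clean components only. The sketch below assumes such a model and a known boolean clean/noisy mask (the "if we know where the noise is" setting); it is not the hybrid HMM/ANN estimator discussed in the paper, and the function name is hypothetical.

```python
import numpy as np


def marginal_log_likelihood(x, mean, var, clean_mask):
    """log N(x_c; mean_c, diag(var_c)) over the clean components c only.

    Integrating a diagonal Gaussian over the unreliable (noisy) components
    gives exactly 1 for each of them, so they simply drop out of the sum.
    """
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    c = np.asarray(clean_mask, dtype=bool)
    diff = x[c] - mean[c]
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var[c]) + diff ** 2 / var[c]))


# The third component is corrupted by band-limited noise and is ignored.
ll = marginal_log_likelihood([0.0, 0.0, 100.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                             [True, True, False])
print(ll)  # equals -log(2*pi), about -1.8379
```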

47 In our subband approach, we do not assume that we know, or detect explicitly, where the noise is. [sent-89, score-0.369]

48 Following the above developments and discussions, it thus seems reasonable to integrate over all possible positions of the noisy bands, and thus to simultaneously deal with all the $L = 2^K$ possible subband combinations $S_n^\ell$ (with $\ell = 1, \ldots$ [sent-90, score-0.374]

49 $P(q_j \mid X_n) = \sum_{\ell=1}^{L} P(q_j \mid S_n^\ell, \Theta) \, P(E_\ell \mid X_n)$ (2), where $P(E_\ell \mid X_n)$ represents the relative reliability of a specific feature set. [sent-95, score-0.115]

50 $\Theta$ denotes the set of (ANN) parameters used to compute the subband posteriors. [sent-97, score-0.316]
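The full-combination sum over all $L = 2^K$ band subsets can be sketched as follows. The per-subset expert is passed in as a function and the reliabilities as a dictionary keyed by tuples of band indices; these interfaces and all numbers are hypothetical. For the empty subset, a natural "expert" output is the state prior, which is what the example assumes.

```python
import itertools

import numpy as np


def full_combination(subset_posterior, reliability, num_bands):
    """Sum over all band subsets S of P(q | S, Theta) * P(E_S | x).

    subset_posterior: callable mapping a tuple of band indices to a (Q,) array.
    reliability: dict mapping every such tuple (2**num_bands of them,
    including the empty tuple) to its weight; the weights should sum to one.
    """
    total = None
    for r in range(num_bands + 1):
        for subset in itertools.combinations(range(num_bands), r):
            term = reliability[subset] * np.asarray(subset_posterior(subset), dtype=float)
            total = term if total is None else total + term
    return total


# Hypothetical K = 2 example with Q = 2 states and uniform reliabilities.
experts = {(): [0.5, 0.5],         # empty subset: state priors
           (0,): [0.9, 0.1],
           (1,): [0.6, 0.4],
           (0, 1): [0.8, 0.2]}
weights = {s: 0.25 for s in experts}
print(full_combination(experts.__getitem__, weights, num_bands=2))  # approx. [0.7, 0.3]
```

Online adaptation, as described next, would only change the `weights` dictionary; the trained subset experts stay fixed.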

51 Typically, training of the L neural nets would be done once and for all on clean data, and the recognizer would then be adapted online simply by adjusting the weights $P(E_\ell \mid X_n)$ (still representing a limited set of L parameters) to increase the global posteriors. [sent-98, score-0.122]

52 This adaptation can be performed by online estimation of the signal-to-noise ratio or by online, unsupervised, EM adaptation. [sent-99, score-0.027]

53 However, it has the advantage of not requiring the subband independence assumption [3]. [sent-102, score-0.316]

54 Combination approach in different noisy conditions are reported in [3, 4], where the performance of this approximation was also compared to the "optimal" estimators (2). [sent-105, score-0.089]

55 Interestingly, it was shown that this independence assumption did not hurt much and that the resulting recognition performance was similar to the performance obtained by training and recombining all possible L nets (and significantly better than the original subband approach). [sent-106, score-0.494]

56 In both cases, the recognition rate and the robustness to noise were greatly improved compared to the initial subband approach. [sent-107, score-0.514]

57 This further confirms that we do not seem to lose "critically" important information when neglecting the correlation between bands. [sent-108, score-0.051]

58 In the next section, we briefly introduce a further extension of this approach where the segmentation into subbands is no longer done explicitly but is achieved dynamically over time, and where the integration over all possible frequency segmentations is part of the same formalism. [sent-109, score-0.445]

59 (Section 4, HMM2: Mixture of HMMs) HMM emission probabilities are typically modeled through Gaussian mixtures or neural networks. [sent-110, score-0.174]

60 We propose here an alternative approach, referred to as HMM2, integrating standard HMMs (referred to as "temporal HMMs") with state-dependent feature-based HMMs (referred to as "feature HMMs") responsible for the estimation of the emission probabilities. [sent-111, score-0.231]

61 In this case, each feature vector $X_n$ at time n is considered as a fixed-length sequence, assumed to have been generated by an HMM specific to the temporal HMM state, in which each state emits individual feature components that are modeled by, e. [sent-112, score-0.417]

62 The feature HMM thus looks at all possible subband segmentations and automatically performs the combination of the likelihoods to yield a single emission probability. [sent-115, score-0.634]

63 In this example, the HMM2 is composed of an HMM that handles sequences of features through time. [sent-117, score-0.057]

64 This HMM is composed of 3 left-to-right connected states (q1, q2 and q3) and each state emits a vector of features at each time step. [sent-118, score-0.145]

65 The particularity of an HMM2 is that each state uses an HMM to emit the feature vector, as if it were an ordered sequence (instead of a vector). [sent-119, score-0.136]

66 In Figure 2, state q2 contains a feature HMM with 4 states connected top-down. [sent-120, score-0.136]

67 Of course, while the temporal HMM usually has a left-to-right structure, the topology of the feature HMM can take many forms, which will then reflect the correlation being captured by the model. [sent-121, score-0.309]

68 The feature HMM could even have more states than feature components, in which case "high-order" correlation information could be extracted. [sent-122, score-0.203]
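How a feature HMM turns one feature vector into a single emission likelihood can be sketched with the forward algorithm run along the feature (frequency) axis. Assumptions, none taken from the paper: a top-down topology in which each state either repeats or passes to the next state, a start in the first state and an end in the last, and one scalar Gaussian per state. Summing over all state paths is the same as summing over every segmentation of the vector into contiguous subbands, which is how the feature HMM combines all possible subband segmentations; a temporal HMM state would then use the returned value as its emission log-likelihood. The function names are hypothetical.

```python
import numpy as np


def log_gauss(x, mean, var):
    """Elementwise log density of a scalar Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def feature_hmm_log_likelihood(x, means, variances, stay_prob):
    """log p(x) for one feature vector x under a top-down feature HMM.

    x: (D,) feature components, read in order as a sequence.
    means, variances: (S,) per-state scalar Gaussian parameters.
    stay_prob: (S,) probability of re-using the same state for the next component;
    1 - stay_prob[s] is the probability of moving from state s to state s + 1.
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    stay = np.asarray(stay_prob, dtype=float)
    log_stay = np.log(stay)
    log_move = np.log1p(-stay[:-1])              # the last state has no successor
    alpha = np.full(len(means), -np.inf)
    alpha[0] = log_gauss(x[0], means[0], variances[0])   # must start in state 0
    for d in range(1, len(x)):
        new = alpha + log_stay
        new[1:] = np.logaddexp(new[1:], alpha[:-1] + log_move)
        alpha = new + log_gauss(x[d], means, variances)
    return float(alpha[-1])                      # must end in the last state


# Hypothetical 4-state feature HMM scoring one 10-component frame.
frame = np.array([0.0, 0.1, -0.1, 5.0, 4.9, -5.0, -5.1, -4.8, 10.0, 9.9])
print(feature_hmm_log_likelihood(frame, [0.0, 5.0, -5.0, 10.0], [1.0] * 4, [0.5, 0.5, 0.5, 1.0]))
```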

69 We believe that HMM2 (which includes the classical mixture of Gaussian HMMs as a particular case) has several potential advantages, including: [sent-125, score-0.076]

70 Better feature correlation modeling through the feature-based (frequency) HMM topology. [sent-126, score-0.165]

71 Also, the complexity of this topology and the probability density function associated with each state easily control the number of parameters. [sent-127, score-0.126]

72 In the same way the conventional HMM does time warping and time integration, the feature-based HMM performs frequency warping and frequency integration. [sent-130, score-0.47]

73 As further discussed below, the HMM2 structure has the potential to extract some relevant formant structure information, which is often considered as important for robust speech recognition. [sent-133, score-0.446]

74 All the parameters of HMM2 models were trained according to the above EM algorithm on delta-frequency features (differences of two consecutive log Rasta PLP coefficients). [sent-135, score-0.057]
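Delta-frequency features, as described here, are differences between consecutive coefficients of the same frame, taken along the frequency axis rather than along time. A minimal sketch, assuming the log Rasta-PLP coefficients are already computed (their extraction is not shown) and assuming the "next minus current" sign convention; the function name is hypothetical.

```python
import numpy as np


def delta_frequency(log_coeffs):
    """Differences of consecutive coefficients along the frequency axis.

    log_coeffs: (T, D) array of per-frame log coefficients.
    Returns a (T, D - 1) array whose entry [n, i] is c[n, i + 1] - c[n, i].
    """
    return np.diff(np.asarray(log_coeffs, dtype=float), axis=1)


frame = [[1.0, 3.0, 2.0, 2.5]]
print(delta_frequency(frame).tolist())  # [[2.0, -1.0, 0.5]]
```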

75 The feature HMM had a simple top-down topology with 4 states. [sent-136, score-0.142]

76 After training, Figure 3 shows (on unseen test data) the value of the features for the phoneme iy as well as the segmentation found by a Viterbi decoding along the delta-frequency axis (the thick black lines). [sent-137, score-0.218]

77 At each time step, we kept the 3 positions where the delta-frequency HMM changed its state during decoding (for instance, at the first time frame, the HMM goes from state 1 to state 2 after the third feature). [sent-138, score-0.304]
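The segmentation shown in Figure 3 can be sketched as a Viterbi pass along the delta-frequency axis of each frame, keeping the positions where the best state sequence moves to the next state; with 4 states this yields 3 positions per frame. The topology (stay or move to the next state), the scalar Gaussian per state, and all parameter values below are assumptions for illustration, not the trained model from the paper. A returned position d means that component d is the first one emitted by the new state, so d = 3 corresponds to a change "after the third feature".

```python
import numpy as np


def log_gauss(x, mean, var):
    """Elementwise log density of a scalar Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def state_change_positions(x, means, variances, stay_prob):
    """Viterbi along the components of one frame for a top-down feature HMM.

    Returns the S - 1 component indices d at which the best path enters a new
    state (component d is the first one emitted by that state).
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    stay = np.asarray(stay_prob, dtype=float)
    D, S = len(x), len(means)
    if D < S:
        raise ValueError("need at least one component per state")
    log_stay, log_move = np.log(stay), np.log1p(-stay[:-1])
    delta = np.full(S, -np.inf)
    delta[0] = log_gauss(x[0], means[0], variances[0])
    moved = np.zeros((D, S), dtype=bool)         # True: best predecessor is s - 1
    for d in range(1, D):
        stay_score = delta + log_stay
        move_score = np.full(S, -np.inf)
        move_score[1:] = delta[:-1] + log_move
        moved[d] = move_score > stay_score
        delta = np.maximum(stay_score, move_score) + log_gauss(x[d], means, variances)
    state, positions = S - 1, []
    for d in range(D - 1, 0, -1):                # backtrack from the last state
        if moved[d, state]:
            positions.append(d)
            state -= 1
    return positions[::-1]


# Hypothetical frame whose components fall into four clear regions.
frame = [0.0, 0.0, 0.0, 5.0, 5.0, -5.0, -5.0, -5.0, 10.0, 10.0]
print(state_change_positions(frame, [0.0, 5.0, -5.0, 10.0], [1.0] * 4, [0.5] * 4))
# [3, 5, 8]
```

Applying the function to every frame of a (T, D) feature matrix gives the kind of per-frame boundary track drawn as thick lines in Figure 3.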

78 In [14], it has been shown that the use of that information could significantly enhance standard speech recognition systems. [sent-140, score-0.297]

79 Figure 2: An HMM2: the emission distributions of the HMM are estimated by another HMM. [sent-141, score-0.161]

80 Figure 3: Frequency deltas of log Rasta PLP and segmentation for an example of phoneme iy. [sent-142, score-0.093]

81 Acknowledgments The content and themes discussed in this paper largely benefited from the collaboration with our colleagues Andrew Morris, Astrid Hagen and Herve Glotin. [sent-143, score-0.035]

82 , "A new ASR approach based on independent processing and combination of partial frequency bands," Proc. [sent-152, score-0.324]

83 , "Subband-based speech recognition in noisy conditions: The full combination approach," IDIAP Research Report no. [sent-160, score-0.443]

84 , "Different weighting schemes in the full combination subbands approach for noise robust ASR," Proceedings of the Workshop on Robust Methods for Speech Recognition in Adverse Conditions (Tampere, Finland), May 25-26, 1999. [sent-165, score-0.409]

85 , "Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering and noise," Proc. [sent-185, score-0.584]

86 , "Some solutions to the missing features problem in data classification, with application to noise robust ASR," Proc. [sent-200, score-0.264]

87 , "The full combination subbands approach to noise robust HMM/ANN-based ASR," Proc. [sent-208, score-0.409]

88 , "Analysis of linear prediction, coding, and spectral estimation from subbands," IEEE Trans. [sent-221, score-0.034]

89 , "Modelling asynchrony in speech using elementary single-signal decomposition," Proc. [sent-235, score-0.249]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('asr', 0.421), ('hmm', 0.406), ('subband', 0.316), ('bands', 0.204), ('speech', 0.2), ('bourlard', 0.168), ('frequency', 0.165), ('subbands', 0.147), ('hmms', 0.128), ('emission', 0.125), ('ann', 0.122), ('hagen', 0.098), ('idiap', 0.098), ('ixn', 0.098), ('recognition', 0.097), ('posteriors', 0.095), ('morris', 0.095), ('band', 0.095), ('robust', 0.09), ('acoustics', 0.088), ('weber', 0.088), ('combination', 0.088), ('temporal', 0.078), ('feature', 0.076), ('clean', 0.075), ('formant', 0.073), ('martigny', 0.073), ('qjjxn', 0.073), ('decoding', 0.068), ('bengio', 0.066), ('topology', 0.066), ('missing', 0.064), ('state', 0.06), ('noisy', 0.058), ('features', 0.057), ('signal', 0.057), ('sub', 0.057), ('noise', 0.053), ('viterbi', 0.053), ('phonetic', 0.053), ('switzerland', 0.053), ('correlation', 0.051), ('segmentation', 0.051), ('qj', 0.05), ('acoustic', 0.05), ('probabilities', 0.049), ('asynchrony', 0.049), ('drawbacks', 0.049), ('eurospeech', 0.049), ('herve', 0.049), ('katrin', 0.049), ('objections', 0.049), ('plp', 0.049), ('qjjx', 0.049), ('segmental', 0.049), ('syllable', 0.049), ('potential', 0.048), ('robustness', 0.048), ('nets', 0.047), ('advantages', 0.046), ('channel', 0.046), ('warping', 0.042), ('hermansky', 0.042), ('october', 0.042), ('phoneme', 0.042), ('rasta', 0.042), ('referred', 0.041), ('stream', 0.041), ('processing', 0.04), ('integrating', 0.04), ('specific', 0.039), ('modeling', 0.038), ('usually', 0.038), ('impaired', 0.038), ('estimated', 0.036), ('reliable', 0.036), ('spoken', 0.035), ('swiss', 0.035), ('discussed', 0.035), ('better', 0.034), ('spectral', 0.034), ('automatic', 0.034), ('briefly', 0.034), ('em', 0.033), ('discuss', 0.032), ('approach', 0.031), ('stationary', 0.031), ('capture', 0.03), ('philadelphia', 0.03), ('local', 0.03), ('likelihoods', 0.029), ('time', 0.028), ('several', 0.028), ('split', 0.027), ('performed', 0.027), ('marginal', 0.027), ('variants', 0.026), ('responsible', 0.025), ('ieee', 0.024), 
('combined', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition

Author: Hervé Bourlard, Samy Bengio, Katrin Weber

Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR) which somewhat deviate from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent)

2 0.22244111 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

Author: Jürgen Tchorz, Michael Kleinschmidt, Birger Kollmeier

Abstract: A novel noise suppression scheme for speech signals is proposed which is based on a neurophysiologically-motivated estimation of the local signal-to-noise ratio (SNR) in different frequency channels. For SNR-estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network is used to analyse AMS patterns generated from noisy speech and estimates the local SNR. Noise suppression is achieved by attenuating frequency channels according to their SNR. The noise suppression algorithm is evaluated in speaker-independent digit recognition experiments and compared to noise suppression by Spectral Subtraction.

3 0.20449844 96 nips-2000-One Microphone Source Separation

Author: Sam T. Roweis

Abstract: Source separation, or computational auditory scene analysis, attempts to extract individual acoustic objects from input which contains a mixture of sounds from different sources, altered by the acoustic environment. Unmixing algorithms such as ICA and its extensions recover sources by reweighting multiple observation sequences, and thus cannot operate when only a single observation signal is available. I present a technique called refiltering which recovers sources by a nonstationary reweighting (

4 0.18839677 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models

Author: Hagai Attias, John C. Platt, Alex Acero, Li Deng

Abstract: This paper presents a unified probabilistic framework for denoising and dereverberation of speech signals. The framework transforms the denoising and dereverberation problems into Bayes-optimal signal estimation. The key idea is to use a strong speech model that is pre-trained on a large data set of clean speech. Computational efficiency is achieved by using variational EM, working in the frequency domain, and employing conjugate priors. The framework covers both single and multiple microphones. We apply this approach to noisy reverberant speech signals and get results substantially better than standard methods.

5 0.1117093 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals

Author: Lucas C. Parra, Clay Spence, Paul Sajda

Abstract: We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies scale of an otherwise stationary Gaussian process. We discuss two interesting consequences. The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech.

6 0.10936049 51 nips-2000-Factored Semi-Tied Covariance Matrices

7 0.105045 138 nips-2000-The Use of Classifiers in Sequential Inference

8 0.10375071 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

9 0.10161912 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

10 0.085270092 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

11 0.08481025 80 nips-2000-Learning Switching Linear Models of Human Motion

12 0.084492967 84 nips-2000-Minimum Bayes Error Feature Selection for Continuous Speech Recognition

13 0.068437688 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications

14 0.065207765 6 nips-2000-A Neural Probabilistic Language Model

15 0.057770088 82 nips-2000-Learning and Tracking Cyclic Human Motion

16 0.05622226 89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System

17 0.054470327 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

18 0.052888982 33 nips-2000-Combining ICA and Top-Down Attention for Robust Speech Recognition

19 0.052574586 122 nips-2000-Sparse Representation for Gaussian Process Models

20 0.047918823 131 nips-2000-The Early Word Catches the Weights


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.202), (1, -0.115), (2, 0.136), (3, 0.163), (4, -0.106), (5, -0.155), (6, -0.307), (7, -0.069), (8, 0.022), (9, 0.129), (10, 0.194), (11, 0.068), (12, 0.078), (13, -0.031), (14, 0.022), (15, 0.026), (16, 0.104), (17, 0.043), (18, 0.02), (19, -0.048), (20, 0.056), (21, -0.031), (22, -0.012), (23, -0.018), (24, -0.117), (25, 0.034), (26, 0.05), (27, -0.096), (28, 0.003), (29, -0.047), (30, 0.015), (31, 0.091), (32, 0.009), (33, -0.034), (34, -0.029), (35, 0.006), (36, 0.039), (37, 0.009), (38, 0.042), (39, -0.101), (40, -0.059), (41, -0.043), (42, -0.032), (43, 0.061), (44, 0.005), (45, -0.058), (46, -0.03), (47, -0.057), (48, -0.043), (49, -0.008)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9654938 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition

Author: Hervé Bourlard, Samy Bengio, Katrin Weber

Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR) which somewhat deviate from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent)

2 0.82043791 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

Author: Jürgen Tchorz, Michael Kleinschmidt, Birger Kollmeier

Abstract: A novel noise suppression scheme for speech signals is proposed which is based on a neurophysiologically-motivated estimation of the local signal-to-noise ratio (SNR) in different frequency channels. For SNR-estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network is used to analyse AMS patterns generated from noisy speech and estimates the local SNR. Noise suppression is achieved by attenuating frequency channels according to their SNR. The noise suppression algorithm is evaluated in speaker-independent digit recognition experiments and compared to noise suppression by Spectral Subtraction.

3 0.76897126 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models

Author: Hagai Attias, John C. Platt, Alex Acero, Li Deng

Abstract: This paper presents a unified probabilistic framework for denoising and dereverberation of speech signals. The framework transforms the denoising and dereverberation problems into Bayes-optimal signal estimation. The key idea is to use a strong speech model that is pre-trained on a large data set of clean speech. Computational efficiency is achieved by using variational EM, working in the frequency domain, and employing conjugate priors. The framework covers both single and multiple microphones. We apply this approach to noisy reverberant speech signals and get results substantially better than standard methods.

4 0.65797466 96 nips-2000-One Microphone Source Separation

Author: Sam T. Roweis

Abstract: Source separation, or computational auditory scene analysis, attempts to extract individual acoustic objects from input which contains a mixture of sounds from different sources, altered by the acoustic environment. Unmixing algorithms such as ICA and its extensions recover sources by reweighting multiple observation sequences, and thus cannot operate when only a single observation signal is available. I present a technique called refiltering which recovers sources by a nonstationary reweighting (

5 0.58581287 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

Author: Lawrence K. Saul, Jont B. Allen

Abstract: An eigenvalue method is developed for analyzing periodic structure in speech. Signals are analyzed by a matrix diagonalization reminiscent of methods for principal component analysis (PCA) and independent component analysis (ICA). Our method-called periodic component analysis (1l

6 0.51297796 138 nips-2000-The Use of Classifiers in Sequential Inference

7 0.44381669 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals

8 0.43074441 84 nips-2000-Minimum Bayes Error Feature Selection for Continuous Speech Recognition

9 0.42894712 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

10 0.4103744 51 nips-2000-Factored Semi-Tied Covariance Matrices

11 0.37828127 80 nips-2000-Learning Switching Linear Models of Human Motion

12 0.37376598 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

13 0.30775696 6 nips-2000-A Neural Probabilistic Language Model

14 0.29449075 125 nips-2000-Stability and Noise in Biochemical Switches

15 0.29412448 131 nips-2000-The Early Word Catches the Weights

16 0.26699373 124 nips-2000-Spike-Timing-Dependent Learning for Oscillatory Networks

17 0.25161582 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications

18 0.23788184 82 nips-2000-Learning and Tracking Cyclic Human Motion

19 0.23785993 103 nips-2000-Probabilistic Semantic Video Indexing

20 0.21858904 48 nips-2000-Exact Solutions to Time-Dependent MDPs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.011), (4, 0.396), (10, 0.02), (17, 0.099), (26, 0.011), (32, 0.016), (33, 0.053), (55, 0.046), (62, 0.046), (65, 0.025), (67, 0.03), (75, 0.022), (76, 0.031), (79, 0.018), (81, 0.044), (90, 0.027), (97, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84821039 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition

Author: Hervé Bourlard, Samy Bengio, Katrin Weber

Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR) which somewhat deviate from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent)

2 0.69694656 45 nips-2000-Emergence of Movement Sensitive Neurons' Properties by Learning a Sparse Code for Natural Moving Images

Author: Rafal Bogacz, Malcolm W. Brown, Christophe G. Giraud-Carrier

Abstract: Olshausen & Field demonstrated that a learning algorithm that attempts to generate a sparse code for natural scenes develops a complete family of localised, oriented, bandpass receptive fields, similar to those of 'simple cells' in V1. This paper describes an algorithm which finds a sparse code for sequences of images that preserves information about the input. This algorithm when trained on natural video sequences develops bases representing the movement in particular directions with particular speeds, similar to the receptive fields of the movement-sensitive cells observed in cortical visual areas. Furthermore, in contrast to previous approaches to learning direction selectivity, the timing of neuronal activity encodes the phase of the movement, so the precise timing of spikes is crucially important to the information encoding.

3 0.49122256 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

Author: Christopher K. I. Williams

Abstract: In this paper we show that the kernel PCA algorithm of Schölkopf et al. (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x, y) is isotropic, i.e. it depends only on ||x - y||. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed.

4 0.37362593 96 nips-2000-One Microphone Source Separation

Author: Sam T. Roweis

Abstract: Source separation, or computational auditory scene analysis, attempts to extract individual acoustic objects from input which contains a mixture of sounds from different sources, altered by the acoustic environment. Unmixing algorithms such as ICA and its extensions recover sources by reweighting multiple observation sequences, and thus cannot operate when only a single observation signal is available. I present a technique called refiltering which recovers sources by a nonstationary reweighting (

5 0.35288784 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

Author: Predrag Neskovic, Philip C. Davis, Leon N. Cooper

Abstract: In this work, we introduce an Interactive Parts (IP) model as an alternative to Hidden Markov Models (HMMs). We tested both models on a database of on-line cursive script. We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width, give comparable results. However, in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity.

6 0.35275063 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

7 0.34987837 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

8 0.3498781 138 nips-2000-The Use of Classifiers in Sequential Inference

9 0.34457487 8 nips-2000-A New Model of Spatial Representation in Multimodal Brain Areas

10 0.33967903 51 nips-2000-Factored Semi-Tied Covariance Matrices

11 0.33965835 109 nips-2000-Redundancy and Dimensionality Reduction in Sparse-Distributed Representations of Natural Objects in Terms of Their Local Features

12 0.33325106 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

13 0.32639393 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

14 0.3257654 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

15 0.32276207 101 nips-2000-Place Cells and Spatial Navigation Based on 2D Visual Feature Extraction, Path Integration, and Reinforcement Learning

16 0.32221675 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

17 0.32035851 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

18 0.31904632 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

19 0.31740835 124 nips-2000-Spike-Timing-Dependent Learning for Oscillatory Networks

20 0.31585556 74 nips-2000-Kernel Expansions with Unlabeled Examples