acl acl2012 acl2012-74 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hao Tang ; Joseph Keshet ; Karen Livescu
Abstract: We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. We propose a discriminative, feature-rich approach using large-margin learning. This approach allows us to optimize an objective closely related to a discriminative task, to incorporate a large number of complex features, and still do inference efficiently. We test the approach on the task of lexical access; that is, the prediction of a word given a phonetic transcription. In experiments on a subset of the Switchboard conversational speech corpus, our models thus far improve classification error rates from a previously published result of 29.1% to about 15%. We find that large-margin approaches outperform conditional random field learning, and that the Passive-Aggressive algorithm for largemargin learning is faster to converge than the Pegasos algorithm.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. [sent-2, score-0.21]
2 In experiments on a subset of the Switchboard conversational speech corpus, our models thus far improve classification error rates from a previously published result of 29.1% to about 15%. [sent-7, score-0.178]
3 1 Introduction One of the problems faced by automatic speech recognition, especially of conversational speech, is that of modeling the mapping between words and their possible pronunciations in terms of sub-word units such as phones. [sent-10, score-0.425]
4 While pronouncing dictionaries provide each word’s canonical pronunciation(s) in terms of phoneme strings, running speech often includes pronunciations that differ greatly from the dictionary. [sent-11, score-0.384]
5 For example, some pronunciations of “probably” in the Switchboard conversational speech database are [p r aa b iy], [p r aa liy], [p r ay], and [p ow ih] (Greenberg et al. [sent-12, score-0.64]
6 In addition, pronunciation variants sometimes include sounds not present in the dictionary at all, such as nasalized vowels (“can’t” → [k ae n n t]) or fricatives introduced due to incomplete consonant closures (“legal” → [l iy g fr ix l]). [sent-17, score-0.615]
7 This variation makes pronunciation modeling one of the major challenges facing speech recognition (McAllaster et al. [sent-18, score-0.6]
8 However, they also tend to cause additional confusability due to the introduction of additional homonyms (Fosler-...). 1We use the ARPAbet phonetic alphabet with additional diacritics, such as [ n] for nasalization and [ fr] for frication. [sent-26, score-0.218]
9 2This problem is separate from the grapheme-to-phoneme problem, in which pronunciations are predicted from a word’s spelling; here, we assume the availability of a dictionary of canonical pronunciations as is usual in speech recognition. [sent-27, score-0.703]
10 Some other alternatives are articulatory pronunciation models, in which words are represented as multiple parallel sequences of articulatory features rather than single sequences of phones, and which outperform phone-based models on some tasks (Livescu and Glass, 2004; Jyothi et al. [sent-32, score-1.175]
11 , 2011); and models for learning edit distances between dictionary and actual pronunciations (Ristad and Yianilos, 1998; Filali and Bilmes, 2005). [sent-33, score-0.354]
12 , they provide distributions over possible pronunciations given the canonical one(s)—and they are typically trained by maximizing the likelihood over training data. [sent-36, score-0.267]
13 , 2009; Korkmazskiy and Juang, 1997) optimize a minimum classification error (MCE) criterion to learn the weights (equivalently, probabilities) of alternative pronunciations for each word; (Schramm and Beyerlein, 2001) use a similar approach with discriminative model combination. [sent-39, score-0.275]
14 We propose a general, flexible discriminative approach to pronunciation modeling, rather than discriminatively optimizing a generative model. [sent-42, score-0.507]
15 The approach is related to the recently proposed segmental conditional random field (SCRF) approach to speech recognition (Zweig et al. [sent-44, score-0.281]
16 The main differences are that we optimize large-margin objective functions, which lead to sparser, faster, and better-performing models than conditional random field optimization in our experiments; and we use a large set of different feature functions tailored to pronunciation modeling. [sent-46, score-0.697]
17 In order to focus attention on the pronunciation model alone, our experiments focus on a task that measures only the mapping between words and subword units. [sent-47, score-0.442]
18 For generative models, phonetic error rate of generated pronunciations (Venkataramani and Byrne, 2001) and phone- or frame-level perplexity (Riley et al. [sent-49, score-0.358]
19 For our discriminative models, we consider the task of lexical access; that is, prediction of a single word given its pronunciation in terms of sub-word units (Fissore et al. [sent-52, score-0.544]
20 As we show below, our approach outperforms both traditional phonetic rule-based models and the best previously published results on our data set obtained with generative articulatory approaches. [sent-56, score-0.498]
21 2 Problem setting We define a pronunciation of a word as a representation of the way it is produced by a speaker in terms of some set of linguistically meaningful sub-word units. [sent-57, score-0.442]
22 A pronunciation can be, for example, a sequence of phones or multiple sequences of articulatory features such as nasality, voicing, and tongue and lip positions. [sent-58, score-0.822]
23 For purposes of this paper, we will assume that a pronunciation is a single sequence of units, but the approach applies to other representations. [sent-59, score-0.478]
24 We distinguish between two types of pronunciations of a word: (i) canonical pronunciations, the ones typically found in the dictionary, and (ii) surface pronunciations, the ways a speaker may actually produce the word. [sent-60, score-0.337]
25 In the task of lexical access we are given a surface pronunciation of a word, and our goal is to predict the word. [sent-61, score-0.571]
26 Formally, we define a pronunciation as a sequence of sub-word units p = (p1, p2, ...). [sent-62, score-0.515]
27 The function f takes as input a surface pronunciation and returns the word from the vocabulary V that was spoken. [sent-72, score-0.442]
28 Let wˆ = f(p) be the predicted word given the pronunciation p. [sent-75, score-0.442]
29 Each feature function φ takes a surface pronunciation p and a proposed word w and returns a scalar which, intuitively, should be correlated with whether the pronunciation p corresponds to the word w. [sent-82, score-0.92]
30 The feature functions map pronunciations of different lengths along with a proposed word to a vector of fixed dimension in R^N. [sent-83, score-0.388]
31 For example, one feature function might measure the Levenshtein dis- tance between the pronunciation p and the canonical pronunciation of the word w. [sent-84, score-1.05]
32 This feature function counts the minimum number of edit operations (insertions, deletions, and substitutions) that are needed to convert the surface pronunciation to the canonical pronunciation; it is low if the surface pronunciation is close to the canonical one and high otherwise. [sent-85, score-1.282]
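For concreteness, a minimal edit-distance sketch of such a feature is given below (Python; the function names and the convention of passing the word's baseforms directly are illustrative, not the authors' code):

def levenshtein(a, b):
    # Standard dynamic-programming edit distance between two phone sequences.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1,                           # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return d[len(a)][len(b)]

def phi_edit(p, baseforms):
    # Edit distance from the surface pronunciation p to the closest baseform.
    return min(levenshtein(p, bf) for bf in baseforms)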
33 The function f maximizes a score relating the word w to the pronunciation p. [sent-86, score-0.478]
34 We restrict ourselves to scores that are linear in the feature functions, where each φj is scaled by a weight θj: Σ_{j=1}^{N} θj φj(p, w) = θ · φ(p, w), where we have used vector notation for the feature functions φ = (φ1, ..., φN). [sent-87, score-0.251]
35 Linearity is not a very strong restriction, since the feature functions can be arbitrarily non-linear. [sent-94, score-0.178]
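The scoring and prediction rule can be sketched as follows (a minimal illustration of f(p) = argmax_w θ · φ(p, w); the helper names are hypothetical):

def phi(p, w, feature_functions):
    # Evaluate every feature function on the (pronunciation, word) pair.
    return [f(p, w) for f in feature_functions]

def score(p, w, theta, feature_functions):
    # Linear score theta . phi(p, w).
    return sum(t * x for t, x in zip(theta, phi(p, w, feature_functions)))

def predict(p, vocabulary, theta, feature_functions):
    # f(p) = argmax over words w of the linear score.
    return max(vocabulary, key=lambda w: score(p, w, theta, feature_functions))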
36 4 Feature functions Before defining the feature functions, we define some notation. [sent-133, score-0.178]
37 We assume we have a pronunciation dictionary, which is a set of words and their baseforms. [sent-145, score-0.442]
38 We access the dictionary through the function pron, which takes a word w ∈ V and returns a set of baseforms. [sent-146, score-0.204]
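For the sketches in this section, the dictionary lookup can be modeled as a plain mapping; the following stand-in for pron uses two illustrative entries taken from the example further below (assumed data, not the actual dictionary):

DICTIONARY = {
    # Illustrative entries only; the real dictionary has 3328 words.
    "problem":  [["pcl", "p", "r", "aa", "bcl", "b", "lax", "m"]],
    "probably": [["pcl", "p", "r", "aa", "bcl", "b", "liy"]],
}

def pron(w):
    # Return the list of baseforms (phone sequences) for word w.
    return DICTIONARY[w]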
39 1 TF-IDF feature functions Term frequency (TF) and inverse document frequency (IDF) are measures that have been heavily used in information retrieval to search for documents using word queries (Salton et al. [sent-148, score-0.178]
40 In this analogy, we use sub-sequences in surface pronunciations to “search” for baseforms in the dictionary. [sent-152, score-0.385]
41 These features measure the frequency of each n-gram in observed pronunciations of a given word in the training set, along with the discriminative power of the ngram. [sent-153, score-0.308]
42 We therefore have as many TF-IDF feature functions as we have n-grams. [sent-166, score-0.178]
43 The dictionary maps “problem” to /pcl p r aa bcl b lax m/ and “probably” to /pcl p r aa bcl b liy/. [sent-172, score-0.235]
44 Our input is (p, w) = ([p r aa b liy], problem). [sent-173, score-0.343]
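A rough sketch of such n-gram TF-IDF features, treating each word's observed training pronunciations as its "document", is given below; the exact weighting and normalization in the paper may differ, and the training-pronunciation table is an assumed input:

import math
from collections import Counter

def ngrams(phones, n):
    # All phone n-grams of a pronunciation (a list of phone symbols).
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def idf(g, training_prons):
    # training_prons: word -> list of pronunciations observed in training.
    df = sum(any(g in ngrams(pr, len(g)) for pr in prons)
             for prons in training_prons.values())
    return math.log(len(training_prons) / (1.0 + df))

def phi_tfidf(p, w, n, training_prons):
    # Sum, over the surface n-grams of p, of their frequency in w's observed
    # pronunciations weighted by inverse document frequency across words.
    tf = Counter(g for pr in training_prons.get(w, []) for g in ngrams(pr, n))
    return float(sum(tf[g] * idf(g, training_prons) for g in set(ngrams(p, n))))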
45 2 Length feature function The length feature functions measure how the length of a word’s surface form tends to deviate from the baseform. [sent-180, score-0.427]
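One plausible reading of these features, consistent with the (a, b) ranges mentioned in Section 5, is a set of indicators on the surface-versus-baseform length difference; a sketch (assumed form, not the paper's exact definition):

def phi_length(p, w, a, b):
    # Indicator that the surface length deviates from the closest baseform
    # length by an amount in [a, b]; uses the pron stand-in defined above.
    diff = min((len(p) - len(bf) for bf in pron(w)), key=abs)
    return 1.0 if a <= diff <= b else 0.0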
46 Phonetic alignment feature functions Beyond the length, we also measure specific phonetic deviations from the dictionary. [sent-192, score-0.401]
47 We define phonetic alignment features that count the (normalized) frequencies of phonetic insertions, phonetic deletions, and substitutions of one surface phone for another baseform phone. [sent-193, score-0.884]
48 Given (p, w), we use dynamic programming to align the surface form p with all of the baseforms of w. [sent-194, score-0.175]
49 Table 1: Possible alignments of [p r aa pcl p er liy] with two baseforms of “probably” in the dictionary.
50 Define the length of the alignment with the k-th baseform as Lk, for 1 ≤ k ≤ Kw. [sent-204, score-0.25]
51 As an example, consider the input pair (p, w) = ([p r aa pcl p er liy] ,probably) and suppose there are two baseforms of the word “probably” in the dictionary. [sent-211, score-0.436]
52 Unlike the TF-IDF feature functions and the length feature functions, the alignment feature functions can assign a non-zero score to words that are not seen at training time (but are in the dictionary), as long as there is a good alignment with their base- forms. [sent-214, score-0.614]
53 The weights given to the alignment features are the analogue of substitution, insertion, and deletion rule probabilities in traditional phone-based pronunciation models such as (Riley et al. [sent-215, score-0.55]
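The sketch below illustrates one way to turn a dynamic-programming alignment into normalized edit-operation counts; the backtrace and the normalization by alignment length are assumptions for illustration, not the authors' exact procedure:

def align_ops(surface, baseform):
    # Edit-distance DP plus a backtrace, returning (surface, baseform) pairs;
    # None marks the empty side of an insertion or deletion.
    a, b = surface, baseform
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    ops, i, j = [], len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ops.append((a[i - 1], b[j - 1]))   # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append((a[i - 1], None))       # surface phone with no baseform phone
            i -= 1
        else:
            ops.append((None, b[j - 1]))       # baseform phone deleted on the surface
            j -= 1
    return list(reversed(ops))

def phi_align(p, w, op):
    # Frequency of one (surface, baseform) pair, normalized by alignment length
    # and averaged over the baseforms of w (pron stand-in from above).
    freqs = []
    for bf in pron(w):
        ops = align_ops(p, bf)
        freqs.append(ops.count(op) / len(ops))
    return sum(freqs) / len(freqs)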
54 4 Dictionary feature function The dictionary feature is an indicator of whether a pronunciation is an exact match to a baseform, which also generalizes to words unseen in training. [sent-219, score-0.733]
55 We define the dictionary feature as φdict(p, w) = 1[p ∈ pron(w)]. [sent-220, score-0.182]
56 For example, assume there is a baseform /pcl p r aa bcl b liy/ for the word “probably” in the dictionary, and p = /pcl p r aa bcl b liy/. [sent-221, score-0.574]
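In code, this indicator is a one-liner (sketch, reusing the pron stand-in above):

def phi_dict(p, w):
    # Exact-match indicator: 1 if the surface pronunciation equals some
    # baseform of w; it also fires for words unseen in training.
    return 1.0 if tuple(p) in {tuple(bf) for bf in pron(w)} else 0.0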
57 5 Articulatory feature functions Articulatory models represented as dynamic Bayesian networks (DBNs) have been successful in the past on the lexical access task (Livescu and Glass, 2004; Jyothi et al. [sent-224, score-0.237]
58 In such models, pronunciation variation is seen as the result of asynchrony between the articulators (lips, tongue, etc. [sent-226, score-0.669]
59 Given a sequence p and a word w, we use the DBN to produce an alignment at the articulatory level, which is a sequence of 7-tuples, representing the articulatory variables3 lip opening, tongue tip location and opening, tongue body location and opening, velum opening, and glottis opening. [sent-228, score-1.456]
60 The substitution features are similar to the phonetic alignment features in Section 4. [sent-230, score-0.289]
61 3, except that the alignment is not a sequence of pairs but a sequence of 14-tuples (7 for the baseform and 7 for the surface form). [sent-231, score-0.357]
62 The DBN model is based on articulatory phonology (Browman and Goldstein, 1992), in which there are no insertions and deletions, only substitutions (apparent insertions and deletions are accounted for by articulatory asynchrony). [sent-232, score-0.892]
63 Formally, consider the seven sets of articulatory variable values F1, . . . , F7. [sent-233, score-0.35]
64 Consider an articulatory variable F ∈ F = {F1, . . . , F7}. [sent-242, score-0.35]
65 where L is the length of the alignment and ai, bi ∈ F, for 1 ≤ i ≤ L. [sent-248, score-0.164]
66 Here the ai are the intended articulatory variable values, for 1 ≤ i ≤ L. [sent-249, score-0.381]
67 The asynchrony features are also extracted from the DBN alignments. [sent-252, score-0.208]
68 Articulators are not always synchronized, which is one cause of pronunciation variation. [sent-253, score-0.442]
69 Formally, we consider two articulatory variables Fh, Fk ∈ F. [sent-255, score-0.35]
70 The average degree of asynchrony is then defined as async(Fh, Fk) = (1/L) Σ_{i=1}^{L} (t_{h,i} − t_{k,i}). [sent-262, score-0.175]
71 More generally, we compute the average asynchrony between any two sets of variables F1, F2 ⊂ F as async(F1, F2) = (1/L) Σ_{i=1}^{L} ( (1/|F1|) Σ_{Fh∈F1} t_{h,i} − (1/|F2|) Σ_{Fk∈F2} t_{k,i} ). [sent-263, score-0.175]
72 We then define the asynchrony features as φ_{a ≤ async(F1,F2) ≤ b} = 1[a ≤ async(F1, F2) ≤ b]. [sent-264, score-0.208]
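A sketch of the asynchrony computation, assuming t[f][i] holds the position index of articulator f at alignment step i (this data layout is an assumption for illustration):

def async_degree(F1, F2, t, L):
    # Average over alignment steps of the difference between the mean position
    # of the articulators in F1 and the mean position of those in F2.
    total = 0.0
    for i in range(L):
        m1 = sum(t[f][i] for f in F1) / len(F1)
        m2 = sum(t[f][i] for f in F2) / len(F2)
        total += m1 - m2
    return total / L

def phi_async(F1, F2, t, L, a, b):
    # Indicator feature: 1 if a <= async(F1, F2) <= b, else 0.
    return 1.0 if a <= async_degree(F1, F2, t, L) <= b else 0.0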
73 Finally, the log-likelihood feature is the DBN alignment score, shifted and scaled so that the value lies between zero and one: φdbn-LL(p, w) = (L(p, w) − h)/c, where L is the log-likelihood function of the DBN, h is the shift, and c is the scale. [sent-265, score-0.184]
74 5 Experiments All experiments are conducted on a subset of the Switchboard conversational speech corpus that has been labeled at a fine phonetic level (Greenberg et al. [sent-267, score-0.326]
75 , 1996); these phonetic transcriptions are the input to our lexical access models. [sent-268, score-0.207]
76 The data subset, phone set P, and dictionary are the same as ones previously used in (Livescu and Glass, 2004; Jyothi et al. [sent-269, score-0.181]
77 The dictionary contains 3328 words, consisting of the 5000 most frequent words in Switchboard, excluding ones with fewer than four phones in their baseforms. [sent-271, score-0.158]
78 The baseforms use a similar, slightly smaller phone set (lacking, e. [sent-272, score-0.177]
79 For all of the articulatory DBN features, we use the DBN from (Livescu, 2005) (the one in (Jyothi et al. [sent-277, score-0.35]
80 For the asynchrony features, the articulatory pairs are (F1, F2) ∈ {({tongue tip}, {tongue body}), ({lip opening}, {tongue tip, tongue body}), and ({lip opening, tongue tip, tongue body}, {glottis, velum})}, as in (Livescu, 2005). [sent-279, score-0.35]
81 The parameters (a, b) of the length and asynchrony features are drawn from (a, b) ∈ {(−3, −2), (−2, −1), . [sent-280, score-0.243]
82 For a second baseline, we calculate the Levenshtein (0-1 edit) distance between the input pronunciation and each dictionary baseform, and predict the word corresponding to the baseform closest to the input. [sent-294, score-0.691]
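This baseline can be implemented directly with the edit-distance routine sketched in Section 3; the helper below takes that distance function as an argument (hypothetical names):

def levenshtein_baseline(p, vocabulary, distance):
    # Predict the word whose closest baseform has the smallest edit distance
    # to the input pronunciation (pron stand-in from Section 4).
    return min(vocabulary, key=lambda w: min(distance(p, bf) for bf in pron(w)))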
83 The set ALL contains DP+ and the articulatory DBN features. [sent-301, score-0.35]
84 Table 2 shows the best previous result on this data set from the articulatory model of Jyothi et al. [sent-306, score-0.35]
85 The remaining rows of Table 2 give results with our feature functions and various learning algorithms. [sent-309, score-0.178]
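As background on the learning algorithms, the following is a generic structured Passive-Aggressive (PA-I) update of the kind named in the abstract, written against a feature map phi(p, w); the 0-1 cost and other details here are illustrative and may differ from the paper's exact formulation:

def pa_update(theta, p, w_true, vocabulary, phi, C=1.0):
    # One structured Passive-Aggressive (PA-I) step: find the most violating
    # word under a 0-1 cost, then apply the smallest weight update that would
    # repair the margin violation, with the step size capped at C.
    def sc(w):
        return sum(t * x for t, x in zip(theta, phi(p, w)))
    w_hat = max(vocabulary, key=lambda w: sc(w) + (0.0 if w == w_true else 1.0))
    loss = sc(w_hat) + (0.0 if w_hat == w_true else 1.0) - sc(w_true)
    if loss <= 0.0:
        return theta
    delta = [a - b for a, b in zip(phi(p, w_true), phi(p, w_hat))]
    norm_sq = sum(d * d for d in delta) or 1.0
    tau = min(C, loss / norm_sq)
    return [t + tau * d for t, d in zip(theta, delta)]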
86 The feature selection experiments in Figure 2 show that the TF-IDF features alone are quite weak, while the dynamic programming alignment features alone are quite good. [sent-335, score-0.214]
87 In the figure, phone bigram TF-IDF is labeled p2; phonetic alignment with dynamic programming is labeled DP. [sent-343, score-0.295]
88 (1) extension of the model and learning algorithm to word sequences and (2) feature functions that relate acoustic measurements to sub-word units. [sent-345, score-0.211]
89 To incorporate acoustics, we can use feature functions based on classifiers of sub-word units, similarly to previous work on CRF-based speech recognition (Gunawardana et al. [sent-348, score-0.336]
90 Additional extensions include new feature functions, such as context-sensitive alignment features, and joint inference and learning of the alignment models embedded in the feature functions. [sent-359, score-0.221]
91 Special issue on modeling pronunciation variation for automatic speech recognition. [sent-369, score-0.559]
92 A dynamic Bayesian framework to model context and memory in edit distance learning: An application to pronunciation classification. [sent-391, score-0.477]
93 Lexical access to large vocabularies for speech recognition. [sent-400, score-0.176]
94 Insights into spoken language gleaned from phonetic transcription of the Switchboard corpus. [sent-426, score-0.18]
95 What kind of pronunciation variation is hard for triphones to model? [sent-483, score-0.442]
96 Fabricating conversational speech data with acoustic models : A program to examine model-data mismatch. [sent-546, score-0.211]
97 A factored conditional random field model for articulatory feature forced transcription. [sent-560, score-0.5]
98 Pronunciation change in conversational speech and its implications for automatic speech recognition. [sent-598, score-0.295]
99 Discriminative pronunciation learning using phonetic decoder and minimum-classification-error criterion. [sent-643, score-0.59]
100 Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 summer workshop. [sent-675, score-0.164]
wordName wordTfidf (topN-words)
[('pronunciation', 0.442), ('articulatory', 0.35), ('pronunciations', 0.21), ('jyothi', 0.21), ('asynchrony', 0.175), ('tongue', 0.175), ('livescu', 0.167), ('pegasos', 0.153), ('dbn', 0.152), ('phonetic', 0.148), ('baseform', 0.14), ('pcl', 0.14), ('aa', 0.126), ('speech', 0.117), ('dictionary', 0.109), ('acoustics', 0.107), ('baseforms', 0.105), ('functions', 0.105), ('switchboard', 0.091), ('bcl', 0.091), ('lip', 0.087), ('glass', 0.083), ('zweig', 0.076), ('opening', 0.075), ('alignment', 0.075), ('keshet', 0.073), ('feature', 0.073), ('phone', 0.072), ('surface', 0.07), ('async', 0.07), ('tip', 0.07), ('loss', 0.069), ('discriminative', 0.065), ('epoch', 0.065), ('iy', 0.064), ('crf', 0.061), ('liy', 0.061), ('conversational', 0.061), ('access', 0.059), ('riley', 0.058), ('icassp', 0.058), ('canonical', 0.057), ('pi', 0.054), ('bi', 0.054), ('articulators', 0.052), ('filali', 0.052), ('levenshtein', 0.052), ('deletions', 0.05), ('substitutions', 0.05), ('phones', 0.049), ('probably', 0.047), ('signal', 0.047), ('insertions', 0.046), ('segmental', 0.046), ('ristad', 0.046), ('idf', 0.043), ('conditional', 0.042), ('dict', 0.042), ('greenberg', 0.042), ('recognition', 0.041), ('pron', 0.039), ('asru', 0.039), ('units', 0.037), ('tsochantaridis', 0.037), ('sequence', 0.036), ('function', 0.036), ('random', 0.035), ('edit', 0.035), ('bourlard', 0.035), ('browman', 0.035), ('confusability', 0.035), ('fissore', 0.035), ('glottis', 0.035), ('gunawardana', 0.035), ('hazen', 0.035), ('holter', 0.035), ('idfu', 0.035), ('korkmazskiy', 0.035), ('mcallaster', 0.035), ('nasalization', 0.035), ('prabhavalkar', 0.035), ('schramm', 0.035), ('scrfs', 0.035), ('velum', 0.035), ('venkataramani', 0.035), ('vinyals', 0.035), ('pa', 0.035), ('length', 0.035), ('lookup', 0.033), ('acoustic', 0.033), ('features', 0.033), ('dp', 0.033), ('er', 0.033), ('body', 0.032), ('spoken', 0.032), ('suppose', 0.032), ('icslp', 0.032), ('ai', 0.031), ('wi', 0.031), ('fold', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999869 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
Author: Hao Tang ; Joseph Keshet ; Karen Livescu
Abstract: We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. We propose a discriminative, feature-rich approach using large-margin learning. This approach allows us to optimize an objective closely related to a discriminative task, to incorporate a large number of complex features, and still do inference efficiently. We test the approach on the task of lexical access; that is, the prediction of a word given a phonetic transcription. In experiments on a subset of the Switchboard conversational speech corpus, our models thus far improve classification error rates from a previously published result of 29.1% to about 15%. We find that large-margin approaches outperform conditional random field learning, and that the Passive-Aggressive algorithm for largemargin learning is faster to converge than the Pegasos algorithm.
2 0.24717638 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
Author: Micha Elsner ; Sharon Goldwater ; Jacob Eisenstein
Abstract: During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.
3 0.12066844 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery
Author: Chia-ying Lee ; James Glass
Abstract: We investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in which each mixture is an HMM that represents a sub-word unit. We apply our model to the TIMIT corpus, and the results demonstrate that our model discovers sub-word units that are highly correlated with English phones and also produces better segmentation than the state-of-the-art unsupervised baseline. We test the quality of the learned acoustic models on a spoken term detection task. Compared to the baselines, our model improves the relative precision of top hits by at least 22.1% and outper- forms a language-mismatched acoustic model.
4 0.087479562 7 acl-2012-A Computational Approach to the Automation of Creative Naming
Author: Gozde Ozbal ; Carlo Strapparava
Abstract: In this paper, we propose a computational approach to generate neologisms consisting of homophonic puns and metaphors based on the category of the service to be named and the properties to be underlined. We describe all the linguistic resources and natural language processing techniques that we have exploited for this task. Then, we analyze the performance of the system that we have developed. The empirical results show that our approach is generally effective and it constitutes a solid starting point for the automation ofthe naming process.
5 0.081000686 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer
Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ‘1/‘2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
6 0.071490891 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
7 0.068384796 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time
8 0.065781631 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
9 0.065081246 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
10 0.059670419 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
11 0.059131116 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
12 0.058104374 94 acl-2012-Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
13 0.055375788 140 acl-2012-Machine Translation without Words through Substring Alignment
14 0.054749176 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
15 0.054072756 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
16 0.053354863 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
17 0.052687831 195 acl-2012-The Creation of a Corpus of English Metalanguage
18 0.051500618 103 acl-2012-Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation
19 0.04991379 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
20 0.049658131 56 acl-2012-Computational Approaches to Sentence Completion
topicId topicWeight
[(0, -0.162), (1, 0.002), (2, -0.01), (3, -0.004), (4, -0.025), (5, 0.13), (6, 0.048), (7, -0.005), (8, -0.012), (9, -0.008), (10, -0.094), (11, -0.088), (12, -0.082), (13, 0.036), (14, -0.046), (15, 0.022), (16, 0.098), (17, 0.103), (18, 0.062), (19, 0.095), (20, -0.005), (21, -0.152), (22, -0.097), (23, -0.064), (24, -0.188), (25, -0.052), (26, 0.099), (27, -0.064), (28, 0.05), (29, 0.05), (30, 0.119), (31, 0.135), (32, 0.153), (33, 0.004), (34, 0.051), (35, -0.047), (36, 0.123), (37, -0.044), (38, 0.158), (39, 0.043), (40, 0.132), (41, -0.039), (42, 0.035), (43, -0.015), (44, -0.159), (45, 0.005), (46, -0.077), (47, -0.165), (48, 0.111), (49, -0.127)]
simIndex simValue paperId paperTitle
same-paper 1 0.92846823 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
Author: Hao Tang ; Joseph Keshet ; Karen Livescu
Abstract: We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. We propose a discriminative, feature-rich approach using large-margin learning. This approach allows us to optimize an objective closely related to a discriminative task, to incorporate a large number of complex features, and still do inference efficiently. We test the approach on the task of lexical access; that is, the prediction of a word given a phonetic transcription. In experiments on a subset of the Switchboard conversational speech corpus, our models thus far improve classification error rates from a previously published result of 29.1% to about 15%. We find that large-margin approaches outperform conditional random field learning, and that the Passive-Aggressive algorithm for largemargin learning is faster to converge than the Pegasos algorithm.
2 0.83505428 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
Author: Micha Elsner ; Sharon Goldwater ; Jacob Eisenstein
Abstract: During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.
3 0.53939319 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery
Author: Chia-ying Lee ; James Glass
Abstract: We investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in which each mixture is an HMM that represents a sub-word unit. We apply our model to the TIMIT corpus, and the results demonstrate that our model discovers sub-word units that are highly correlated with English phones and also produces better segmentation than the state-of-the-art unsupervised baseline. We test the quality of the learned acoustic models on a spoken term detection task. Compared to the baselines, our model improves the relative precision of top hits by at least 22.1% and outper- forms a language-mismatched acoustic model.
4 0.50670862 7 acl-2012-A Computational Approach to the Automation of Creative Naming
Author: Gozde Ozbal ; Carlo Strapparava
Abstract: In this paper, we propose a computational approach to generate neologisms consisting of homophonic puns and metaphors based on the category of the service to be named and the properties to be underlined. We describe all the linguistic resources and natural language processing techniques that we have exploited for this task. Then, we analyze the performance of the system that we have developed. The empirical results show that our approach is generally effective and it constitutes a solid starting point for the automation ofthe naming process.
5 0.40136313 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater
Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.
7 0.37795982 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
8 0.37369359 42 acl-2012-Bootstrapping via Graph Propagation
9 0.37292117 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition
10 0.36525929 94 acl-2012-Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
11 0.36445937 112 acl-2012-Humor as Circuits in Semantic Networks
12 0.36160886 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
13 0.35227424 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time
14 0.34919241 211 acl-2012-Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation
15 0.34263152 120 acl-2012-Information-theoretic Multi-view Domain Adaptation
16 0.33839193 39 acl-2012-Beefmoves: Dissemination, Diversity, and Dynamics of English Borrowings in a German Hip Hop Forum
17 0.33687416 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language
18 0.33586136 32 acl-2012-Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech
19 0.329936 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries
20 0.31639671 113 acl-2012-INPROwidth.3emiSS: A Component for Just-In-Time Incremental Speech Synthesis
topicId topicWeight
[(25, 0.018), (26, 0.041), (28, 0.036), (30, 0.011), (31, 0.35), (37, 0.032), (39, 0.045), (59, 0.015), (74, 0.036), (82, 0.022), (84, 0.029), (85, 0.033), (90, 0.144), (92, 0.047), (94, 0.026), (99, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.72711831 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
Author: Hao Tang ; Joseph Keshet ; Karen Livescu
Abstract: We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. We propose a discriminative, feature-rich approach using large-margin learning. This approach allows us to optimize an objective closely related to a discriminative task, to incorporate a large number of complex features, and still do inference efficiently. We test the approach on the task of lexical access; that is, the prediction of a word given a phonetic transcription. In experiments on a subset of the Switchboard conversational speech corpus, our models thus far improve classification error rates from a previously published result of 29.1% to about 15%. We find that large-margin approaches outperform conditional random field learning, and that the Passive-Aggressive algorithm for largemargin learning is faster to converge than the Pegasos algorithm.
2 0.69333613 58 acl-2012-Coreference Semantics from Web Features
Author: Mohit Bansal ; Dan Klein
Abstract: To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence. When added to a state-of-the-art coreference baseline, our Web features give significant gains on multiple datasets (ACE 2004 and ACE 2005) and metrics (MUC and B3), resulting in the best results reported to date for the end-to-end task of coreference resolution.
Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer
Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ‘1/‘2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
4 0.45822495 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Author: Weiwei Sun ; Hans Uszkoreit
Abstract: From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by constituent parsing and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated approaches yield a relative error reduction of 18% in total over a stateof-the-art baseline.
5 0.45333844 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
Author: Spence Green ; John DeNero
Abstract: When automatically translating from a weakly inflected source language like English to a target language with richer grammatical features such as gender and dual number, the output commonly contains morpho-syntactic agreement errors. To address this issue, we present a target-side, class-based agreement model. Agreement is promoted by scoring a sequence of fine-grained morpho-syntactic classes that are predicted during decoding for each translation hypothesis. For English-to-Arabic translation, our model yields a +1.04 BLEU average improvement over a state-of-the-art baseline. The model does not require bitext or phrase table annotations and can be easily implemented as a feature in many phrase-based decoders. 1
6 0.45262334 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
7 0.45237228 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
8 0.45106158 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
9 0.45103082 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
10 0.45073518 140 acl-2012-Machine Translation without Words through Substring Alignment
11 0.45053822 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
12 0.44967031 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
13 0.44927621 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
14 0.44914144 167 acl-2012-QuickView: NLP-based Tweet Search
15 0.44859797 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
16 0.44784692 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
17 0.44770947 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
18 0.44748753 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
19 0.44727048 136 acl-2012-Learning to Translate with Multiple Objectives
20 0.44725585 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT