acl acl2012 acl2012-41 knowledge-graph by maker-knowledge-mining

41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition


Source: pdf

Author: Micha Elsner ; Sharon Goldwater ; Jacob Eisenstein

Abstract: [Affiliations: ILCC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK; School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, 30308, USA. Figure 1 example: (a) intended: /ju want w2n/ /want e kUki/; (b) surface: [j@ w a?P w2n] [wan @ kUki].] During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. [sent-7, score-0.473]

2 Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. [sent-8, score-0.573]

3 We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. [sent-9, score-1.029]

4 The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. [sent-10, score-0.282]

5 We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. [sent-11, score-0.657]

6 In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation. [sent-12, score-0.328]

7 1 Introduction Infants acquiring their first language confront two difficult cognitive problems: building a lexicon of word forms, and learning basic phonetics and phonology. [sent-13, score-0.32]

8 [The corpus fragment in Figure 1 is] represented (a) using a canonical phonemic encoding for each word and (b) as they might be pronounced phonetically. [sent-17, score-0.203]

9 For instance, if an infant who already knows the word [ju] “you” encounters a new word [j@], they must decide whether it is a new lexical item or a variant of the word they already know. [sent-21, score-0.262]

10 Evidence for the correct conclusion comes from the pronunciation (many English vowels are reduced to [@] in unstressed positions) and the context—if the next word is “want”, “you” is a plausible choice. [sent-22, score-0.195]

11 To date, most models of infant language learning have focused on either lexicon-building or phonetic learning in isolation. [sent-23, score-0.393]

12 Based on this evidence, a more realistic model of early language acquisition should propose a method of inferring the intended forms (Figure 1a) from the unsegmented surface forms (1c) while also learning a model of phonetic variation relating the intended and surface forms (a) and (b). [sent-31, score-1.784]

13 , 2009; Räsänen, 2011) or have modeled variability only in vowels (Feldman et al. [sent-33, score-0.221]

14 , 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model). [sent-34, score-0.275]

15 Our main contribution is a joint lexical-phonetic model that infers intended forms from segmented surface forms; we test the system using input with either gold standard word boundaries or boundaries induced by an existing unsupervised segmentation model (Goldwater et al. [sent-35, score-1.126]

16 We show that in both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each intended form has a unique surface form. [sent-37, score-0.743]

17 Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability. [sent-38, score-0.859]
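
To make the noisy-channel factorization above concrete, here is a minimal Python sketch (not the authors' code) of how a candidate analysis could be scored: a bigram language-model term over intended words plus a channel term for each surface form. The function names and toy probabilities are hypothetical.

import math

def joint_log_prob(intended, surface, bigram_lp, channel_lp):
    # log p(x, s) = sum_i [ log p(x_i | x_{i-1}) + log p(s_i | x_i) ]
    total = 0.0
    prev = "<s>"
    for x, s in zip(intended, surface):
        total += bigram_lp(prev, x)   # bigram language model over intended words
        total += channel_lp(x, s)     # log-linear channel: surface form given intended form
        prev = x
    return total

# Toy log-probabilities (hypothetical).
bigram = lambda prev, x: math.log(0.1)
channel = lambda x, s: math.log(0.8 if x == s else 0.05)
print(joint_log_prob(["ju", "want", "w2n"], ["j@", "wan", "w2n"], bigram, channel))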

18 But unlike speech recognition, we have no ⟨intended-form, surface-form⟩ training pairs to train the phonetic model, nor even a dictionary of intended-form strings to train the language model. [sent-39, score-0.368]

19 , a surface phone is likely to share articulatory features with the intended phone) and use a bootstrapping process to iteratively infer the intended forms and retrain the language model and noise model. [sent-42, score-1.054]

20 While we do not claim that the particular inference mechanism we use is cognitively plausible, our positive results further support the claim that infants can and do acquire phonetics and the lexicon in concert. [sent-43, score-0.466]

21 They extend a model for clustering acoustic tokens into phonetic categories (Vallabha et al. [sent-46, score-0.612]

22 , 2007) by adding a lexical level that simultaneously clusters word tokens (which contain the acoustic tokens) into lexical entries. [sent-47, score-0.336]

23 Including the lexical level improves the model’s phonetic categorization, and a follow-up study on artificial language learning (Feldman, 2011) supports the claim that human learners use lexical knowledge to distinguish meaningful from unimportant phonetic contrasts. [sent-48, score-0.793]

24 (2009) use a real-valued representation for vowels (formant values), but assume no variability in consonants, and treat each word token independently. [sent-50, score-0.346]

25 To our knowledge, the only other lexicon-building systems that also learn about phonetic variability are those of Driesen et al. [sent-52, score-0.509]

26 These systems learn to represent lexical items and their variability from a discretized representation of the speech stream, but they are tested on an artificial corpus with only 80 vocabulary items that was constructed so as to “avoid strong word-to-word dependencies” (Räsänen, 2011). [sent-54, score-0.373]

27 In addition to the models mentioned in Section 1, which use phonemic input, a few models of word segmentation have been tested using phonetic input (Fleck, 2008; Rytting, 2007; Daland and Pierrehumbert, 2010). [sent-64, score-0.6]

28 (Figure 2: Our generative model of the surface tokens s from intended tokens x, which occur with left and right contexts l and r.) However, they do not cluster segmented [sent-65, score-0.759]

29 word tokens into lexical items (none of these models even maintains an explicit lexicon), nor do they model or learn from phonetic variation in the input. [sent-66, score-0.656]

30 Our observed data consists of a (segmented) sequence of surface words s1 . . . sn. [sent-72, score-0.205]

31 We wish to recover the corresponding sequence of intended words x1 . . . xn. [sent-76, score-0.256]

32 As shown in Figure 2, si is produced from xi by a transducer T: si ∼ T(xi), which models phonetic changes. [sent-80, score-0.814]

33 Each xi is sampled from a distribution θ which represents word frequencies, and its left and right context words, li and ri, are drawn from distributions conditioned on xi, in order to capture information about the environments in which xi appears: li ∼ PL(xi), ri ∼ PR(xi). [sent-81, score-0.442]
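
A toy Python sketch of this generative story, with invented placeholder distributions standing in for the learned θ, PL, PR and T:

import random

def sample_token(theta, p_left, p_right, channel):
    # x ~ theta; l ~ P_L(x); r ~ P_R(x); s ~ T(x)
    x = random.choices(list(theta), weights=list(theta.values()))[0]
    l = random.choices(list(p_left[x]), weights=list(p_left[x].values()))[0]
    r = random.choices(list(p_right[x]), weights=list(p_right[x].values()))[0]
    s = random.choices(list(channel[x]), weights=list(channel[x].values()))[0]
    return x, (l, s, r)

# Invented toy parameters standing in for the learned distributions.
theta   = {"ju": 0.6, "want": 0.4}
p_left  = {"ju": {"<s>": 1.0}, "want": {"ju": 1.0}}
p_right = {"ju": {"want": 1.0}, "want": {"w2n": 1.0}}
channel = {"ju": {"ju": 0.7, "j@": 0.3}, "want": {"want": 0.6, "wan": 0.4}}
print(sample_token(theta, p_left, p_right, channel))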

34 Our generative model of xi is unusual for two reasons. [sent-83, score-0.215]

35 First, we treat each xi independently rather than linking them via a Markov chain. [sent-84, score-0.176]

36 During inference, however, we will never compute the joint probability of all the data at once, only the probabilities of subsets of the variables with particular intended word forms u and v. [sent-86, score-0.444]

37 We make this independence assumption for computational reasons—when deciding whether to merge u and v into a single lexical entry, we compute the change in estimated probability for their contexts, but not the effect on other words for which u and v themselves appear as context words. [sent-88, score-0.234]

38 A weighted finite-state transducer (WFST) is a variant of a finite-state automaton (Pereira et al. [sent-93, score-0.311]

39 It contains a state for each triplet of (previous, current, next) phones; conditioned on this state, it emits a character output which can be thought of as a possible surface realization of current in its particular environment. [sent-97, score-0.242]

40 The transducer is parameterized by the probabilities of the arcs. [sent-102, score-0.311]
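
As an illustration (not the paper's implementation), the arc parameterization can be mimicked with a lookup table keyed by the (prev, current, next) state; the phones and probabilities below are invented.

import random

# Hypothetical arc probabilities P(surface symbol | prev, current, next); "" marks deletion.
arc_probs = {
    ("#", "D", "i"): {"D": 0.7, "d": 0.2, "": 0.1},
    ("D", "i", "#"): {"i": 0.6, "@": 0.4},
}

def emit_surface(intended):
    # Walk the (prev, current, next) states and emit one surface symbol per state.
    padded = ["#"] + list(intended) + ["#"]
    out = []
    for i in range(1, len(padded) - 1):
        state = (padded[i - 1], padded[i], padded[i + 1])
        dist = arc_probs.get(state, {padded[i]: 1.0})   # default: copy the phone unchanged
        out.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return "".join(out)

print(emit_surface("Di"))   # e.g. "Di", "d@", or "@" if the first phone is deleted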

41 Following ... (Figure 3: The fragment of the transducer responsible for input string [Di] “the”.) [sent-104, score-0.365]

42 The model features are based on articulatory phonetics and distinguish three dimensions of sound production: voicing, place of articulation and manner of articulation. [sent-112, score-0.351]

43 Each template is instantiated once per articulatory dimension, with prev, curr, next and out replaced by their values for that dimension: for instance, there are two voicing values, voiced and unvoiced, and the (curr)→out template for [D] producing [d] would be instantiated as (voiced)→voiced. [sent-114, score-0.325]

44 To capture trends specific to particular sounds, each template is instantiated again using the actual symbol for curr and articulatory values for everything else (e. [sent-115, score-0.434]

45 There are also faithfulness features, same-sound, same-voice, same-place and same-manner which check if curr is exactly identical to out or shares the exact value of a particular feature. [sent-119, score-0.241]
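
A hedged sketch of how such templates might be instantiated for a single arc; the articulatory value table and exact feature-name spellings are stand-ins for the paper's actual feature set, and the conditioning on the previous and next phones is omitted here.

# Hypothetical articulatory values per phone, one entry per dimension.
ARTIC = {
    "D": {"voice": "voiced", "place": "dental",   "manner": "fricative"},
    "d": {"voice": "voiced", "place": "alveolar", "manner": "stop"},
}

def arc_features(curr, out):
    # (curr)->out style templates per articulatory dimension, plus symbol-specific
    # variants and faithfulness features.
    feats = []
    for dim in ("voice", "place", "manner"):
        feats.append("(%s)->%s" % (ARTIC[curr][dim], ARTIC[out][dim]))   # generic template
        feats.append("(%s)->%s" % (curr, ARTIC[out][dim]))               # symbol-specific template
    if curr == out:
        feats.append("same-sound")
    for dim in ("voice", "place", "manner"):
        if ARTIC[curr][dim] == ARTIC[out][dim]:
            feats.append("same-" + dim)
    return feats

print(arc_features("D", "d"))   # (voiced)->voiced and same-voice fire; same-sound does not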

46 We begin with a simple initial transducer and alternate between two phases: clustering together surface forms, and reestimating the transducer parameters. [sent-130, score-0.879]

47 In our clustering phase, we improve the model posterior as much as possible by greedily making type merges, where, for a pair of intended word forms u and v, we replace all instances of xi = u with xi = v. [sent-132, score-0.899]

48 We maintain the invariant that each intended word form’s most common surface form must be itself; this biases the model toward solutions with low distortion in the transducer. [sent-133, score-0.55]

49 Scoring merges: We write the change in the log posterior probability of the model resulting from a type merge of u to v as ∆(u, v), which factors into two terms, one depending on the surface string and the transducer, and the other depending on the string of intended words. [sent-135, score-0.687]

50 In order to ensure that each intended word form’s most common surface form is itself, we define ∆(u, v) = −∞ if u is more common than v. [sent-136, score-0.511]

51 If we merge u into v, we no longer need to produce any surface forms from u, but instead we must derive them from v. [sent-138, score-0.418]

52 If #(·) counts the occurrences of some event in the current state of the model, the transducer component of ∆ is: ∆T = Σs #(xi = u, si = s)(T(s|v) − T(s|u)) (1). This term is typically negative, voting against a merge, since u is more similar to itself than to v. [sent-139, score-0.348]

53 We deal first with the p(xi) unigram term, considering all tokens where xi ∈ {u, v} and computing the probability pu = p(xi = u|xi ∈ {u, v}). [sent-142, score-0.334]

54 The change in log-probability resulting from the merge is closely related to the entropy of the distribution: ∆U = −#(xi=u) log(pu) − #(xi=v) log(pv) (3). This change must be positive and favors merging. [sent-145, score-0.19]

55 Next, we consider the change in probability from the left contexts (the derivations for right contexts are equivalent). [sent-146, score-0.17]
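
A worked Python sketch of the transducer term ∆T (Eq. 1) and the unigram term ∆U (Eq. 3), reading T(s|·) as a log-probability (an assumption consistent with ∆ being a change in log posterior); the counts and channel values are toy numbers and the context terms are left out.

import math

def delta_T(counts_u, logT_u, logT_v):
    # Eq. 1: sum over surface forms s of #(x=u, s) * (log T(s|v) - log T(s|u)).
    return sum(n * (logT_v[s] - logT_u[s]) for s, n in counts_u.items())

def delta_U(n_u, n_v):
    # Eq. 3: -#(x=u) log p_u - #(x=v) log p_v, where p_u = n_u / (n_u + n_v).
    p_u = n_u / (n_u + n_v)
    p_v = n_v / (n_u + n_v)
    return -n_u * math.log(p_u) - n_v * math.log(p_v)

# Toy example: merging u = "j@" (3 tokens) into v = "ju" (10 tokens).
counts_u = {"j@": 3}
logT_u = {"j@": math.log(0.9)}   # surface [j@] from intended /j@/
logT_v = {"j@": math.log(0.3)}   # surface [j@] from intended /ju/
print(delta_T(counts_u, logT_u, logT_v))   # negative: votes against the merge
print(delta_U(3, 10))                      # positive: favors the merge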

56 Because neither the transducer nor the language model are perfect models of the true distribution, they can have incompatible dynamic ranges. [sent-153, score-0.35]

57 Often, the transducer distribution is too peaked; to remedy this, we downweight the transducer probability by λ, a parameter of our model, which we set to . [sent-154, score-0.703]

58 Downweighting of the acoustic model versus the LM is typical in speech recognition (Bahl et al. [sent-156, score-0.192]

59 The transducer regularization r = 1 and unigram prior α = 2, which we set ad-hoc, have little impact on performance. [sent-159, score-0.311]

60 The Kneser-Ney discount d = 2 and transducer downweight λ = . [sent-160, score-0.404]

61 Clustering algorithm: In the clustering phase, we start with an initial solution in which each surface form is its own intended pronunciation and iteratively improve this solution by merging together word types, picking (approximately) the best merger at each point. [sent-163, score-0.77]

62 We begin by computing a set of candidate mergers for each surface word type u. [sent-164, score-0.299]

63 This step saves time by quickly rejecting mergers which are certain to get very low transducer scores. [sent-165, score-0.355]

64 At each step of the algorithm, we pop the u with the current best ∆∗ (u), recompute its scores, and then merge it with v∗ (u) if doing so would improve the model posterior. [sent-171, score-0.227]

65 (If the best merge would not improve the probability, we reject it, but since its score might increase if we merge v∗ (u), we leave u in the queue, setting its ∆ score to −∞; this score will be updated if we merge v∗ (u). [sent-175, score-0.336]

66 ) Since we recompute the exact scores ∆(u, v) immediately before merging u, the algorithm is guaranteed... (Footnote 3: The transducer scores can be cached since they depend only on surface forms, but the language model scores cannot.) [sent-176, score-0.694]
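
A simplified sketch of the greedy merging loop with a priority queue; score_merge stands in for the full ∆(u, v) computation, candidate generation is reduced to "all other types", and the rejection bookkeeping described above is only approximated.

import heapq

def greedy_merge(types, score_merge):
    # Pop the type with the best cached merge score, rescore it exactly, and apply
    # the merge only if it still improves the objective.
    intended = {u: u for u in types}

    def best_merge(u):
        cands = [(score_merge(u, v), v) for v in set(intended.values()) if v != u]
        return max(cands) if cands else (float("-inf"), None)

    heap = []
    for u in types:
        d, _ = best_merge(u)
        heapq.heappush(heap, (-d, u))          # max-heap via negation
    while heap:
        _, u = heapq.heappop(heap)
        if intended[u] != u:
            continue                            # u was already merged away
        d, v = best_merge(u)                    # exact rescoring just before committing
        if v is None or d <= 0:
            continue                            # reject: would not improve the posterior
        for w, t in intended.items():           # apply the type merge u -> v
            if t == u:
                intended[w] = v
    return intended

# Toy scorer (hypothetical): favor merging equal-length forms differing in at most one character.
def toy_score(u, v):
    return 1.0 if len(u) == len(v) and sum(a != b for a, b in zip(u, v)) <= 1 else -1.0

print(greedy_merge(["j@", "ju", "want", "wan"], toy_score))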

67 Training the transducer: To train the transducer on a set of mappings between surface and intended forms, we find the maximum-probability state sequence for each mapping (another application of Viterbi EM) and extract features for each state and its output. [sent-181, score-0.311]

68 To construct our initial transducer, we first learn weights for the marginal distribution on surface sounds by training the max-ent system with only the bias features active. [sent-183, score-0.261]
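
A rough sketch of one re-estimation round in this spirit; the alignment is simplified to position-by-position (equal-length) pairs rather than a true Viterbi alignment through the WFST, and the max-ent trainer is a stub with invented feature names.

from collections import Counter

def em_round(pairs, arc_features, fit_maxent):
    # One simplified round: take the current best alignment of each (intended, surface)
    # pair, extract features per aligned position, and refit the weights.
    events = []
    for intended, surface in pairs:
        padded = ["#"] + list(intended) + ["#"]
        for i, out in enumerate(surface, start=1):
            prev, curr, nxt = padded[i - 1], padded[i], padded[i + 1]
            events.append((arc_features(prev, curr, nxt, out), out))
    return fit_maxent(events)

# Stubs standing in for the real feature templates and max-ent trainer.
feats = lambda prev, curr, nxt, out: ["same-sound" if curr == out else "substitution", "curr=" + curr]
fit = lambda events: Counter(f for fs, _ in events for f in fs)   # pretend weights = feature counts
print(em_round([("Di", "D@"), ("ju", "j@")], feats, fit))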

69 Brent and Cartwright (1996) created a phonemic version of this corpus by extracting all infant-directed utterances and converting them to a phonemic transcription using a dictionary. [sent-188, score-0.399]

70 This version, which contains 9790 utterances (33399 tokens, 1321 types), is now standard for word segmentation, but contains no phonetic variability. [sent-189, score-0.426]

71 Since producing a close phonetic transcription of this data would be impractical, we instead construct an approximate phonetic version using information from the Buckeye corpus (Pitt et al. [sent-190, score-0.698]

72 (Table 1: Initial transducer weights: same-sound 5; same-{place, voice, manner} 2; insertion −3.)

73 To create our phonetic corpus, we replace each phonemic word in the Bernstein-Ratner-Brent corpus with a phonetic pronunciation of that word sampled from the empirical distribution of pronunciations in Buckeye (Table 2). [sent-197, score-1.086]

74 If the word never occurs in Buckeye, we use the original phonemic version. [sent-198, score-0.203]

75 Since each pronunciation is sampled independently, it lacks coarticulation and prosodic effects, and the distribution of pronunciations is derived from adult-directed rather than child-directed speech. [sent-200, score-0.179]
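
The corpus construction described here could be approximated as follows; the pronunciation table is a tiny invented stand-in for the Buckeye empirical distributions.

import random

# Invented stand-in for the Buckeye empirical pronunciation distributions.
pron_dist = {
    "ju": {"ju": 0.5, "j@": 0.5},
    "want": {"want": 0.4, "wan": 0.4, "wan?": 0.2},
}

def phoneticize(phonemic_utterance):
    # Replace each phonemic word with a pronunciation sampled independently from its
    # empirical distribution; fall back to the phonemic form if the word is unseen.
    out = []
    for word in phonemic_utterance:
        dist = pron_dist.get(word)
        if dist is None:
            out.append(word)
        else:
            out.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return out

print(phoneticize(["ju", "want", "w2n"]))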

76 Nonetheless, it represents phonetic variability more realistically than the Bernstein-Ratner-Brent corpus, while still maintaining the lexical characteristics of infant-directed speech (as compared to the Buckeye corpus, with its much larger vocabulary and more complex language model). [sent-201, score-0.55]

77 (2009): word boundary F-score, word token F-score, and lexicon (word type) F-score. [sent-206, score-0.275]

78 In our first set of experiments we evaluate how well our system clusters together surface forms derived from the same intended form, assuming gold standard word boundaries. [sent-208, score-0.652]

79 We do not evaluate the induced intended forms directly against the gold standard intended forms—we want to evaluate cluster memberships and not labels. [sent-209, score-0.701]

80 Instead we compute a one-to-one mapping between our induced lexical items and the gold standard, maximizing the agreement between the two (Haghighi and Klein, 2006). [sent-210, score-0.186]

81 Using this mapping, we compute mapped token F-score and lexicon F-score. [sent-211, score-0.175]

82 We report the standard word boundary F and unlabeled word token F as well as mapped F. [sent-213, score-0.175]
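
An illustrative (greedy, not optimal) version of the mapped evaluation: build a one-to-one mapping between induced and gold lexical items by overlap count and score tokens under that mapping. The helper name and example data are invented.

from collections import Counter

def mapped_token_accuracy(induced, gold):
    # Greedy one-to-one mapping from induced labels to gold labels by overlap count,
    # then the fraction of tokens whose mapped label matches the gold label.
    # (With gold word boundaries, precision = recall = this accuracy.)
    overlap = Counter(zip(induced, gold))
    mapping, used = {}, set()
    for (i_lab, g_lab), _ in overlap.most_common():
        if i_lab not in mapping and g_lab not in used:
            mapping[i_lab] = g_lab
            used.add(g_lab)
    correct = sum(mapping.get(i) == g for i, g in zip(induced, gold))
    return correct / len(gold)

gold    = ["you", "want", "one", "want", "a", "cookie"]
induced = ["ju",  "want", "w2n", "want", "a", "cookie"]
print(mapped_token_accuracy(induced, gold))   # 1.0 here: each induced type maps to one gold type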

83 The unlabeled token score counts correctly segmented tokens, whether assigned a correct intended form or not. [sent-214, score-0.379]

84 Known word boundaries: We first run our system with known word boundaries (Table 3). [sent-216, score-0.33]

85 As a baseline, we treat every surface token as its own intended form (none). [sent-217, score-0.536]

86 This baseline has fairly high accuracy; 65% of word tokens receive the most common pronunciation for their intended form. As an upper bound, we find the best intended form for each surface type (type ubound). [sent-218, score-0.955]

87 This correctly resolves 91% of tokens; the remaining error is due to homophones (surface types corresponding to more than one intended form). [sent-219, score-0.256]

88 (Footnote 5) When using the gold word boundaries, the precision and recall are equal and this is the same as the accuracy; in segmentation experiments the two differ, because with fewer segmentation boundaries, the system proposes fewer tokens. [sent-221, score-0.23]

89 Unknown word boundaries: As a simple extension of our model to the case of unknown word boundaries, we interleave it with an existing model of word segmentation, dpseg (Goldwater et al. [sent-253, score-0.387]

90 We then concatenate the intended word sequence proposed by our model to produce the next iteration’s segmenter input. [sent-256, score-0.381]
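
The interleaving with the segmenter could be sketched as the loop below; segment and induce_intended_forms are hypothetical placeholders for dpseg and the lexical-phonetic model.

def interleave(unsegmented, segment, induce_intended_forms, n_rounds=3):
    # Alternate: segment the phone stream, infer intended forms for the resulting
    # tokens, then concatenate the intended forms as the next round's segmenter input.
    stream = unsegmented
    tokens, intended = [], []
    for _ in range(n_rounds):
        tokens = segment(stream)                   # e.g. dpseg proposing word boundaries
        intended = induce_intended_forms(tokens)   # the lexical-phonetic model
        stream = "".join(intended)
    return tokens, intended

# Trivial stand-ins so the sketch runs end to end.
toy_segment = lambda s: [s[i:i + 3] for i in range(0, len(s), 3)]
toy_induce = lambda toks: [t.replace("@", "u") for t in toks]
print(interleave("j@wantw2n", toy_segment, toy_induce))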

91 Using induced word boundaries also makes it harder to recover the lexicon (Table 5), lowering the baseline F-score from 67% to 43%. [sent-259, score-0.313]

92 Nevertheless, our system improves the lexicon F-score to 46%, with token F rising from 44% to 49%, demonstrating the system’s ability to work without gold word boundaries. [sent-260, score-0.265]

93 Conclusion: We have presented a noisy-channel model that simultaneously learns a lexicon, a bigram language model, and a model of phonetic variation, while using only the noisy surface forms as training data. [sent-262, score-0.747]

94 It is the first model of lexical-phonetic acquisition to include word-level context and to be tested on an infant-directed corpus with realistic phonetic variability. [sent-263, score-0.46]

95 Whether trained using gold standard or automatically induced word boundaries, the model recovers lexical items more effectively than a system that assumes no phonetic variability; moreover, the use of word-level context is key to the model’s success. [sent-264, score-0.602]

96 Ultimately, we hope to extend the model to jointly infer word boundaries along with lexical-phonetic knowledge, and to work directly from acoustic input. [sent-265, score-0.316]

97 However, we have already shown that lexical-phonetic learning from a broad-coverage corpus is possible, supporting the claim that infants acquire lexical and phonetic knowledge simultaneously. [sent-266, score-0.578]

98 OpenFst: A general and efficient weighted finite-state transducer library. [sent-284, score-0.311]

99 Testing the robustness of online word segmentation: effects of linguistic diversity and phonetic variation. [sent-313, score-0.377]

100 A computational model of word segmentation from continuous speech using transitional probabilities of atomic acoustic events. [sent-436, score-0.312]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('phonetic', 0.327), ('transducer', 0.311), ('intended', 0.256), ('surface', 0.205), ('curr', 0.197), ('variability', 0.182), ('xi', 0.176), ('articulatory', 0.153), ('feldman', 0.153), ('phonemic', 0.153), ('buckeye', 0.131), ('infants', 0.122), ('boundaries', 0.115), ('goldwater', 0.115), ('phonetics', 0.114), ('acoustic', 0.112), ('merge', 0.112), ('pronunciation', 0.106), ('forms', 0.101), ('lexicon', 0.1), ('tokens', 0.082), ('asa', 0.076), ('recompute', 0.076), ('token', 0.075), ('pronunciations', 0.073), ('segmentation', 0.07), ('driesen', 0.066), ('dupoux', 0.066), ('infant', 0.066), ('phonotactic', 0.066), ('merging', 0.063), ('variation', 0.06), ('sharon', 0.057), ('hayes', 0.057), ('naturalistic', 0.057), ('nen', 0.057), ('cognitive', 0.056), ('sounds', 0.056), ('string', 0.054), ('optimality', 0.052), ('prev', 0.052), ('log', 0.052), ('clustering', 0.052), ('items', 0.052), ('realistic', 0.052), ('word', 0.05), ('posterior', 0.049), ('phonology', 0.049), ('discount', 0.049), ('utterances', 0.049), ('wan', 0.049), ('induced', 0.048), ('segmented', 0.048), ('claim', 0.047), ('contexts', 0.047), ('merges', 0.046), ('brent', 0.046), ('lexical', 0.046), ('template', 0.045), ('sound', 0.045), ('transcription', 0.044), ('allahverdyan', 0.044), ('boruta', 0.044), ('daland', 0.044), ('downweight', 0.044), ('dpseg', 0.044), ('faithfulness', 0.044), ('fleck', 0.044), ('ilcc', 0.044), ('kuki', 0.044), ('mcinnes', 0.044), ('mergers', 0.044), ('vallabha', 0.044), ('voiced', 0.044), ('bootstrapping', 0.044), ('acquisition', 0.042), ('speech', 0.041), ('viterbi', 0.041), ('gold', 0.04), ('ri', 0.04), ('change', 0.039), ('vowels', 0.039), ('instantiated', 0.039), ('pu', 0.039), ('model', 0.039), ('childes', 0.038), ('bahl', 0.038), ('dreyer', 0.038), ('merger', 0.038), ('voicing', 0.038), ('pl', 0.038), ('transcribed', 0.038), ('probability', 0.037), ('state', 0.037), ('segmenter', 0.036), ('acquire', 0.036), ('bigram', 0.036), ('bayesian', 0.036), ('pitt', 0.035), ('varadarajan', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Author: Micha Elsner ; Sharon Goldwater ; Jacob Eisenstein

Abstract: [Affiliations: ILCC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK; School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, 30308, USA. Figure 1 example: (a) intended: /ju want w2n/ /want e kUki/; (b) surface: [j@ w a?P w2n] [wan @ kUki].] During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.

2 0.24717638 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach

Author: Hao Tang ; Joseph Keshet ; Karen Livescu

Abstract: We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. We propose a discriminative, feature-rich approach using large-margin learning. This approach allows us to optimize an objective closely related to a discriminative task, to incorporate a large number of complex features, and still do inference efficiently. We test the approach on the task of lexical access; that is, the prediction of a word given a phonetic transcription. In experiments on a subset of the Switchboard conversational speech corpus, our models thus far improve classification error rates from a previously published result of 29.1% to about 15%. We find that large-margin approaches outperform conditional random field learning, and that the Passive-Aggressive algorithm for largemargin learning is faster to converge than the Pegasos algorithm.

3 0.19633916 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery

Author: Chia-ying Lee ; James Glass

Abstract: We investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in which each mixture is an HMM that represents a sub-word unit. We apply our model to the TIMIT corpus, and the results demonstrate that our model discovers sub-word units that are highly correlated with English phones and also produces better segmentation than the state-of-the-art unsupervised baseline. We test the quality of the learned acoustic models on a spoken term detection task. Compared to the baselines, our model improves the relative precision of top hits by at least 22.1% and outper- forms a language-mismatched acoustic model.

4 0.17299108 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater

Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.

5 0.090184063 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling

Author: Wei Lu ; Dan Roth

Abstract: This paper presents a novel sequence labeling model based on the latent-variable semiMarkov conditional random fields for jointly extracting argument roles of events from texts. The model takes in coarse mention and type information and predicts argument roles for a given event template. This paper addresses the event extraction problem in a primarily unsupervised setting, where no labeled training instances are available. Our key contribution is a novel learning framework called structured preference modeling (PM), that allows arbitrary preference to be assigned to certain structures during the learning procedure. We establish and discuss connections between this framework and other existing works. We show empirically that the structured preferences are crucial to the success of our task. Our model, trained without annotated data and with a small number of structured preferences, yields performance competitive to some baseline supervised approaches.

6 0.087674186 7 acl-2012-A Computational Approach to the Automation of Creative Naming

7 0.087328777 211 acl-2012-Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation

8 0.082166478 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

9 0.080673866 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors

10 0.076201193 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction

11 0.076013319 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

12 0.072154246 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

13 0.069011807 210 acl-2012-Unsupervized Word Segmentation: the Case for Mandarin Chinese

14 0.068468124 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries

15 0.068464711 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language

16 0.068327479 140 acl-2012-Machine Translation without Words through Substring Alignment

17 0.068133742 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions

18 0.066581033 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models

19 0.066038549 194 acl-2012-Text Segmentation by Language Using Minimum Description Length

20 0.065313607 207 acl-2012-Unsupervised Morphology Rivals Supervised Morphology for Arabic MT


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.217), (1, 0.025), (2, -0.035), (3, 0.007), (4, -0.048), (5, 0.173), (6, 0.056), (7, -0.013), (8, 0.027), (9, 0.01), (10, -0.133), (11, -0.121), (12, -0.115), (13, -0.003), (14, -0.072), (15, -0.012), (16, 0.043), (17, 0.088), (18, 0.04), (19, 0.165), (20, 0.004), (21, -0.146), (22, -0.121), (23, -0.051), (24, -0.233), (25, -0.077), (26, 0.171), (27, -0.035), (28, 0.097), (29, 0.06), (30, 0.089), (31, 0.156), (32, 0.123), (33, 0.018), (34, 0.008), (35, -0.104), (36, 0.098), (37, -0.028), (38, 0.165), (39, 0.017), (40, 0.053), (41, -0.035), (42, 0.067), (43, 0.029), (44, -0.09), (45, -0.063), (46, -0.001), (47, -0.074), (48, 0.058), (49, -0.087)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93174314 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Author: Micha Elsner ; Sharon Goldwater ; Jacob Eisenstein

Abstract: ILCC, School of Informatics School of Interactive Computing University of Edinburgh Georgia Institute of Technology Edinburgh, EH8 9AB, UK Atlanta, GA, 30308, USA (a) intended: /ju want w2n/ /want e kUki/ (b) surface: [j@ w a?P w2n] [wan @ kUki] During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.

2 0.86138254 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach

Author: Hao Tang ; Joseph Keshet ; Karen Livescu

Abstract: We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. We propose a discriminative, feature-rich approach using large-margin learning. This approach allows us to optimize an objective closely related to a discriminative task, to incorporate a large number of complex features, and still do inference efficiently. We test the approach on the task of lexical access; that is, the prediction of a word given a phonetic transcription. In experiments on a subset of the Switchboard conversational speech corpus, our models thus far improve classification error rates from a previously published result of 29.1% to about 15%. We find that large-margin approaches outperform conditional random field learning, and that the Passive-Aggressive algorithm for largemargin learning is faster to converge than the Pegasos algorithm.

3 0.70045894 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery

Author: Chia-ying Lee ; James Glass

Abstract: We investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in which each mixture is an HMM that represents a sub-word unit. We apply our model to the TIMIT corpus, and the results demonstrate that our model discovers sub-word units that are highly correlated with English phones and also produces better segmentation than the state-of-the-art unsupervised baseline. We test the quality of the learned acoustic models on a spoken term detection task. Compared to the baselines, our model improves the relative precision of top hits by at least 22.1% and outper- forms a language-mismatched acoustic model.

4 0.53944248 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater

Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.

5 0.52654088 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition

Author: Khe Chai Sim

Abstract: This paper presents a probabilistic framework that combines multiple knowledge sources for Haptic Voice Recognition (HVR), a multimodal input method designed to provide efficient text entry on modern mobile devices. HVR extends the conventional voice input by allowing users to provide complementary partial lexical information via touch input to improve the efficiency and accuracy of voice recognition. This paper investigates the use of the initial letter of the words in the utterance as the partial lexical information. In addition to the acoustic and language models used in automatic speech recognition systems, HVR uses the haptic and partial lexical models as additional knowledge sources to reduce the recognition search space and suppress confusions. Experimental results show that both the word error rate and runtime factor can be re- duced by a factor of two using HVR.

6 0.52172613 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries

7 0.49437398 7 acl-2012-A Computational Approach to the Automation of Creative Naming

8 0.45629311 211 acl-2012-Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation

9 0.43163779 32 acl-2012-Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech

10 0.39474356 42 acl-2012-Bootstrapping via Graph Propagation

11 0.38362569 94 acl-2012-Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection

12 0.37581331 113 acl-2012-INPROwidth.3emiSS: A Component for Just-In-Time Incremental Speech Synthesis

13 0.36790758 210 acl-2012-Unsupervized Word Segmentation: the Case for Mandarin Chinese

14 0.35974193 112 acl-2012-Humor as Circuits in Semantic Networks

15 0.35911545 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language

16 0.3512238 39 acl-2012-Beefmoves: Dissemination, Diversity, and Dynamics of English Borrowings in a German Hip Hop Forum

17 0.34660167 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

18 0.34107491 195 acl-2012-The Creation of a Corpus of English Metalanguage

19 0.34015319 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

20 0.32962903 163 acl-2012-Prediction of Learning Curves in Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.013), (26, 0.306), (28, 0.042), (30, 0.025), (31, 0.014), (37, 0.031), (39, 0.095), (59, 0.016), (74, 0.035), (82, 0.029), (84, 0.029), (85, 0.029), (90, 0.123), (92, 0.038), (94, 0.019), (99, 0.046)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97325134 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

Author: Jennifer Williams ; Graham Katz

Abstract: We seek to automatically estimate typical durations for events and habits described in Twitter tweets. A corpus of more than 14 million tweets containing temporal duration information was collected. These tweets were classified as to their habituality status using a bootstrapped, decision tree. For each verb lemma, associated duration information was collected for episodic and habitual uses of the verb. Summary statistics for 483 verb lemmas and their typical habit and episode durations has been compiled and made available. This automatically generated duration information is broadly comparable to hand-annotation. 1

2 0.94146991 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench

Author: Rafal Rak ; BalaKrishna Kolluru ; Sophia Ananiadou

Abstract: Argo is a web-based NLP and text mining workbench with a convenient graphical user interface for designing and executing processing workflows of various complexity. The workbench is intended for specialists and nontechnical audiences alike, and provides the ever expanding library of analytics compliant with the Unstructured Information Management Architecture, a widely adopted interoperability framework. We explore the flexibility of this framework by demonstrating workflows involving three processing components capable of performing self-contained machine learning-based tagging. The three components are responsible for the three distinct tasks of 1) generating observations or features, 2) training a statistical model based on the generated features, and 3) tagging unlabelled data with the model. The learning and tagging components are based on an implementation of conditional random fields (CRF); whereas the feature generation component is an analytic capable of extending basic token information to a comprehensive set of features. Users define the features of their choice directly from Argo’s graphical interface, without resorting to programming (a commonly used approach to feature engineering). The experimental results performed on two tagging tasks, chunking and named entity recognition, showed that a tagger with a generic set of features built in Argo is capable of competing with taskspecific solutions. 121

3 0.93587524 209 acl-2012-Unsupervised Semantic Role Induction with Global Role Ordering

Author: Nikhil Garg ; James Henserdon

Abstract: We propose a probabilistic generative model for unsupervised semantic role induction, which integrates local role assignment decisions and a global role ordering decision in a unified model. The role sequence is divided into intervals based on the notion of primary roles, and each interval generates a sequence of secondary roles and syntactic constituents using local features. The global role ordering consists of the sequence of primary roles only, thus making it a partial ordering.

4 0.90317398 134 acl-2012-Learning to Find Translations and Transliterations on the Web

Author: Joseph Z. Chang ; Jason S. Chang ; Roger Jyh-Shing Jang

Abstract: Jason S. Chang Department of Computer Science, National Tsing Hua University 101, Kuangfu Road, Hsinchu, 300, Taiwan j s chang@ c s .nthu . edu .tw Jyh-Shing Roger Jang Department of Computer Science, National Tsing Hua University 101, Kuangfu Road, Hsinchu, 300, Taiwan j ang@ c s .nthu .edu .tw identifying such translation counterparts Web, we can cope with the OOV problem. In this paper, we present a new method on the for learning to finding translations and transliterations on the Web for a given term. The approach involves using a small set of terms and translations to obtain mixed-code snippets from a search engine, and automatically annotating the snippets with tags and features for training a conditional random field model. At runtime, the model is used to extracting translation candidates for a given term. Preliminary experiments and evaluation show our method cleanly combining various features, resulting in a system that outperforms previous work. 1

same-paper 5 0.8457405 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Author: Micha Elsner ; Sharon Goldwater ; Jacob Eisenstein

Abstract: ILCC, School of Informatics School of Interactive Computing University of Edinburgh Georgia Institute of Technology Edinburgh, EH8 9AB, UK Atlanta, GA, 30308, USA (a) intended: /ju want w2n/ /want e kUki/ (b) surface: [j@ w a?P w2n] [wan @ kUki] During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.

6 0.71819675 187 acl-2012-Subgroup Detection in Ideological Discussions

7 0.68295211 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

8 0.6591152 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

9 0.65746284 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction

10 0.6414662 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

11 0.63636559 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

12 0.63481486 83 acl-2012-Error Mining on Dependency Trees

13 0.63233346 47 acl-2012-Chinese Comma Disambiguation for Discourse Analysis

14 0.63230777 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

15 0.63153529 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

16 0.63088894 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

17 0.63050902 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

18 0.63025308 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

19 0.62712067 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

20 0.62666059 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model