emnlp emnlp2013 emnlp2013-122 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dekai Wu ; Karteek Addanki ; Markus Saers ; Meriem Beloucif
Abstract: We present a novel model, Freestyle, that learns to improvise rhyming and fluent responses upon being challenged with a line of hip hop lyrics, by combining both bottomup token based rule induction and top-down rule segmentation strategies to learn a stochastic transduction grammar that simultaneously learns bothphrasing and rhyming associations. In this attack on the woefully under-explored natural language genre of music lyrics, we exploit a strictly unsupervised transduction grammar induction approach. Our task is particularly ambitious in that no use of any a priori linguistic or phonetic information is allowed, even though the domain of hip hop lyrics is particularly noisy and unstructured. We evaluate the performance of the learned model against a model learned only using the more conventional bottom-up token based rule induction, and demonstrate the superiority of our combined token based and rule segmentation induction method toward generating higher quality improvised responses, measured on fluency and rhyming criteria as judged by human evaluators. To highlight some of the inherent challenges in adapting other algorithms to this novel task, we also compare the quality ofthe responses generated by our model to those generated by an out-ofthe-box phrase based SMT system. We tackle the challenge of selecting appropriate training data for our task via a dedicated rhyme scheme detection module, which is also acquired via unsupervised learning and report improved quality of the generated responses. Finally, we report results with Maghrebi French hip hop lyrics indicating that our model performs surprisingly well with no special adaptation to other languages. 102
Reference: text
sentIndex sentText sentNum sentScore
1 In this attack on the woefully under-explored natural language genre of music lyrics, we exploit a strictly unsupervised transduction grammar induction approach. [sent-4, score-0.469]
2 Our task is particularly ambitious in that no use of any a priori linguistic or phonetic information is allowed, even though the domain of hip hop lyrics is particularly noisy and unstructured. [sent-5, score-1.058]
3 We tackle the challenge of selecting appropriate training data for our task via a dedicated rhyme scheme detection module, which is also acquired via unsupervised learning and report improved quality of the generated responses. [sent-8, score-0.563]
4 Finally, we report results with Maghrebi French hip hop lyrics indicating that our model performs surprisingly well with no special adaptation to other languages. [sent-9, score-1.058]
5 With the motivation of spurring further research in this genre, we apply stochastic transduction grammar induction algorithms to address some of the modeling issues in song lyrics. [sent-11, score-0.468]
6 An ideal starting point for this investigation is hip hop, a genre that emphasizes rapping, spoken or chanted rhyming lyrics against strong beats or simple melodies. [sent-12, score-0.984]
7 Hip hop lyrics, in contrast to poetry and other genres of music, present a significant number ofchallenges for learning as it lacks well-defined structure in terms of rhyme scheme, meter, or overall meaning making it an interesting genre to bring to light some of the less studied modeling issues. [sent-13, score-0.816]
8 The domain of hip hop lyrics is particularly unstructured when compared to classical poetry, a domain on which statistical methods have been applied in the past. [sent-14, score-1.079]
9 Hip hop lyrics are unstructured in the sense that a very high degree of variation is permitted in the meter of the lyrics, and large amounts of colloquial vocabulary and slang from the subculture are employed. [sent-15, score-0.732]
10 The variance in the permitted meter makes it hard to make any assumptions about the stress patterns of verses in order to identify the rhyming words used when generating output. [sent-16, score-0.44]
11 The broad range of unorthodox vocabulary used in hip hop make it difficult to use off-the-shelf NLP tools for doing phonological and/or morphological analysis. [sent-17, score-0.806]
12 Hence, our Freestyle system models the problem of improvising a rhyming response given any hip hop lyric challenge as transducing a challenge line into a rhyming response. [sent-22, score-1.547]
13 We use a stochastic transduction grammar induced in a completely unsupervised fashion using a combination of token based rule induction and segmenting (Saers et al. [sent-23, score-0.634]
14 , 2013) as the underlying model to fully-automatically learn a challenge-response system and compare its performance against a simpler token based transduction grammar model. [sent-24, score-0.439]
15 We believe that the challenge-response system based on an interpolated combination of token based rule induction and rule segmenting transduction grammars will generate more fluent and rhyming responses compared to one based on token based transduction grammars models. [sent-26, score-1.631]
16 Therefore, as a principal part of our investigation we compare the quality of responses generated using a combination of token based rule induction and top-down rule segmenting transduction grammars to those generated by pure token based transduction grammars. [sent-28, score-1.21]
17 We also hypothesize that in order to generate fluent and rhyming responses, it is not sufficient to train the transduction grammars on all adjacent lines of a hip hop verse. [sent-29, score-1.62]
18 Therefore, we propose a data selec- tion scheme using a rhyme scheme detector acquired through unsupervised learning to generate the training data for the challenge-response systems. [sent-30, score-0.623]
19 The rhyme scheme detector segments each verse of a hip hop song into stanzas and identifies the lines in each stanza that rhyme with each other which are then added as training instances. [sent-31, score-2.079]
20 Unlike conventional spoken and written language, disfluencies and backing vocals2 occur very frequently in the domain of hip hop lyrics which affect the performance of NLP models designed for processing well-formed sentences. [sent-34, score-1.127]
21 stanza a segment within a verse which has a meter and rhyme scheme. [sent-43, score-0.632]
22 Particularly in hip hop, a single verse often contains many stanzas with different rhyme schemes and meters. [sent-45, score-1.01]
23 We compare the performance of token and segment based transduction grammar models in Section 4. [sent-50, score-0.439]
24 Finally, in 2Particularly the repetitive chants, exclamations, and interjections in hip hop “hype man” style backing vocals. [sent-52, score-0.832]
25 Section 7 we describe some preliminary results obtained using our approach on improvising hip hop responses in French and conclude in Section 8. [sent-53, score-1.016]
26 2 Related work Although a few attempts have been made to apply statistical NLP learning methods to unconventional domains, Freestyle is among the first to tackle the genre of hip hop lyrics (Addanki and Wu, 2013; Wu et al. [sent-54, score-1.081]
27 As a step towards this direction, we contrast the performance of interpolated bottom-up token based rule induction and top-down segmenting transduction grammar models and token based transduction grammar models. [sent-57, score-1.061]
28 However, in hip hop lyrics it is hard to make any linguistic or structural assumptions. [sent-60, score-1.058]
29 For example, words such as sho, flo, holla which frequently appear in the lyrics are not part of any standard lexicon and hip hop does not require a set number ofsyllables in a line, unlike poems. [sent-61, score-1.058]
30 Also, surprising and unlikely rhymes in hip hop are frequently achieved via intonation and assonance, making it hard to apply prior phonological constraints. [sent-62, score-0.846]
31 (2012) use controlled Markov processes to semi-automatically generate lyrics that satisfy the structural constraints of rhyme and meter. [sent-67, score-0.61]
32 A language-independent generative model for stanzas in poetry was proposed by Reddy and Knight (201 1) via which they could discover rhyme schemes in French and English poetry. [sent-79, score-0.502]
33 1 Training data We used freely available user generated hip hop lyrics on the Internet to provide training data for our experiments. [sent-84, score-1.095]
34 We collected approximately 52,000 English hip hop song lyrics amounting to approximately 800Mb of raw HTML content. [sent-85, score-1.094]
35 As human evaluation using expert hip hop listeners is expensive, a small subset of 85 lines was chosen as the test set to provide challenges for comparing the quality ofresponses generatedby different systems. [sent-89, score-0.934]
36 2 Evaluation scheme The performance of various Freestyle versions was evaluated on the task ofgenerating a improvised fluent and rhyming response given a single line of a hip hop verse as a challenge. [sent-91, score-1.502]
37 The output of all the systems on the test set was given to three independent frequent hip hop listeners for manual evaluation. [sent-92, score-0.832]
38 They were free to choose the tune to make the lyrics rhyme as the beats of the song were not used in the training data. [sent-94, score-0.66]
39 Each evaluator was asked to score the response of each system on the criterion of fluency and rhyming as being good, acceptable or bad. [sent-95, score-0.505]
40 As no automatic quality evaluation metrics exist for hip hop responses analogous to BLEU for SMT, the model weights cannot be tuned in conventional ways such as MERT (Och, 2003). [sent-100, score-1.007]
41 token based model We compare the performance of transduction grammars induced via interpolated token based and rule segmenting (ISTG) versus token based transduction grammars (TG) on the task of generating a rhyming and fluent response to hip hop challenges. [sent-103, score-2.27]
42 We use the framework of stochastic transduction grammars, specifically bracketing ITGs (inversion transduction 105 grammars) (Wu, 1997), as our translation model for “transducing” any given challenge into a rhyming and fluent response. [sent-104, score-0.987]
43 Instead, we use a completely unsupervised learning algorithm for segmental ITGs that stays strictly within the transduction grammar optimization framework for both training and testing as proposed in Saers et al. [sent-120, score-0.434]
44 (2013) induce a phrasal inversion transduction grammar via interpolating the bottomup rule chunking approach proposed in Saers et al. [sent-123, score-0.469]
45 2 Decoding heuristics We use our in-house ITG decoder implemented according to the algorithm mentioned in Wu (1996) for the generating responses to challenges by decoding with the trained transduction grammars. [sent-137, score-0.491]
46 3 Results: Rule segmentation improves responses Results in Table 1indicate that the ISTG outperforms the TG model towards the task of generating fluent and rhyming responses. [sent-149, score-0.594]
47 , either good or acceptable) responses on fluency and rhyming criteria. [sent-152, score-0.604]
48 PBSMT+RS, TG+RS, ISTG+RS are models trained on rhyme scheme based corpus selection strategy. [sent-154, score-0.483]
49 Upon inspecting the learned rules, we noticed that the ISTG models capture rhyming correspondences both at the token and segmental levels. [sent-183, score-0.473]
50 Table 2 shows some examples of the transduction rules learned by ISTG grammar trained using rhyme scheme detection. [sent-184, score-0.827]
51 adjacent lines We now compare two data selection approaches for generating the training data for transduction grammar induction via a rhyme scheme detection module and choosing all adjacent lines in a verse. [sent-186, score-1.184]
52 We also briefly describe the training of the rhyme scheme detection module and determine the efficacy of our data selection scheme by training the ISTG model, TG model and the PBSMT baseline on training data generated with and without employing the 107 rhyme scheme detection module. [sent-187, score-1.192]
53 As the rule segmenting approach was intended to improve the fluency as opposed to the rhyming nature of the responses, we only train the rule segmenting model on the randomly chosen subset of all adjacent lines in the verse. [sent-188, score-0.711]
54 The segmental transduction grammar model was combined with the token based transduction grammar model trained on data selected with and without using rhyme scheme detection model. [sent-190, score-1.33]
55 For example, adding successive lines of a stanza which follows ABAB rhyme scheme as training instances to the transduction grammar causes incorrect rhyme correspondences to be learned. [sent-193, score-1.36]
56 The fact that a verse (which is usually represented as a separate paragraph) may contain multiple stanzas of varying length and rhyme schemes worsens this problem. [sent-194, score-0.607]
57 We employ a rhyme scheme detection model (Addanki and Wu, 2013) in order to select training instances that are likely to rhyme. [sent-196, score-0.503]
58 Lines belonging to the same stanza and marked as rhyming according to the rhyme scheme detection model are added to the training corpus. [sent-197, score-0.89]
59 We believe that this data selec- tion scheme will improve the rhyming associations learned during the transduction grammar induction thereby biasing the model towards producing fluent and rhyming output. [sent-198, score-1.212]
60 The rhyme scheme detection model proposes a HMM based generative model for a verse of hip hop lyrics similar to Reddy and Knight (2011). [sent-199, score-1.684]
61 However, owing to the lack of well-defined verse structure in hip hop, a number of hidden states corresponding to stanzas of varying length are used to automatically obtain a soft-segmentation of the verse. [sent-200, score-0.629]
62 Each state in the HMM corresponds to a stanza with a particular rhyme scheme such as AA, ABAB, AAAA while the emissions correspond to the final words in the stanza. [sent-201, score-0.538]
63 We restrict the maximum length of a stanza to be four to maintain a tractable number of states and further only use states to represent stanzas whose rhyme schemes could not be partitioned into smaller schemes without losing a rhyme correspondence. [sent-202, score-0.932]
64 The parameters of the HMM are estimated using the EM algorithm (Devijer, 1985) using the corpus generated by taking the final word of each line in the hip hop lyrics. [sent-203, score-0.848]
65 The lines from each stanza that rhyme with each other according to the Viterbi parse using the trained model are added as training instances for transduction grammar induction. [sent-204, score-0.87]
66 The training data for the rhyme scheme detector was obtained by extracting the end-of-line tokens from each verse. [sent-206, score-0.506]
67 However, upon data inspection we noticed that shorter lines in hip hop stanzas are typically joined with a comma and represented as a single line of text and hence all the tokens before the commas were also added to the training corpus. [sent-207, score-1.025]
68 We evaluated the performance of our rhyme scheme detector on the task of correctly labeling a given verse with rhyme schemes. [sent-210, score-0.987]
69 Two native English speakers who were frequent hip 108 hop listeners were asked to partition the verse into stanzas and assign them with a gold standard rhyme scheme. [sent-212, score-1.416]
70 The rhyme scheme detection module employed in our data selection obtained a precision of 35. [sent-214, score-0.539]
71 2 Training data selection via rhyme scheme detection We obtained around 600,000 training instances upon extracting a training corpus using rhyme scheme detection module as described in Section 5. [sent-219, score-1.073]
72 We added those lines that were adjacent and labeled as rhyming by the rhyme scheme detector as training instances resulting in a training corpus ofsize 200,000. [sent-221, score-0.927]
73 In order to ensure fair comparison of models trained on data selected using rhyme scheme detection, we randomly chose 200,000 training instances from the generated corpus. [sent-224, score-0.494]
74 4 Results: Rhyme scheme detection helps Results in Table 1 indicate that using the rhyme scheme detector for training data selection helps produce significantly more fluent responses compared to using adjacent lines. [sent-227, score-0.971]
75 Given the significantly higher rate of response fluency when using rhyme scheme detection, we argue that using rhyme scheme detector for data selection is beneficial. [sent-230, score-1.119]
76 It is also interesting to note from Table 1 that ISTG+RS performs better than TG+RS indicating that transduction grammar induced via interpolating token based grammar and rule segmenting produces betterresponses than token based transduction grammar on both data selection schemes. [sent-232, score-1.098]
77 Although the fluency of the responses generated by PBSMT+RS drastically improves compared to PBSMT it still lags behind the TG+RS and ISTG+RS models on both fluency and rhyming. [sent-241, score-0.433]
78 It is interesting to note that TG+RS produces responses comparable to PBSMT+RS despite being a token based transduction grammar. [sent-246, score-0.544]
79 Moreover, TG models produce 109 responses that indeed rhyme better (shown in bold- face). [sent-248, score-0.544]
80 In fact, TG tries to rhyme words not only at the end but also in middle of the lines, as our transduction grammar model captures structural associations more effectively than the phrase-based model. [sent-249, score-0.709]
81 6 Disfluency handling via disfluency correction and filtering In this section, we compare the effect of two disfluency mitigating strategies on the quality of the responses generated by the PBSMT baseline and token based transduction grammar model with and without using rhyme scheme detection. [sent-250, score-1.422]
82 Disfluency correction strategy produces higher fraction of responses with ≥acceptable fluency compared to the filtering strategy for both TG and TG+RS models. [sent-267, score-0.435]
83 This result is not surprising, as harshly pruning Table 4: Effect of the disfluency correction strategies on fluency of the responses generated for the TG induction models vs PBSMT baselines using both rhyme scheme detection and adjacent lines as the corpus selection method. [sent-268, score-1.189]
84 As disfluency correction yields more fluent responses for TG and TG+RS models, the results for ISTG and ISTG+RS models in Table 1were obtained using disfluency correction strategy. [sent-271, score-0.605]
85 7 Maghrebi French hip hop We have begun to apply Freestyle to rap in languages other than English, taking advantage of the language independence and linguistics-light approach of our unsupervised transduction grammar induction methods. [sent-272, score-1.266]
86 We briefly describe our initial experiments on Maghrebi French hip hop lyrics below. [sent-275, score-1.058]
87 1 Dataset We collected freely available French hip hop lyrics of approximately 1300 songs. [sent-277, score-1.058]
88 We extracted the end-of-line words and obtained a corpus containing 120,000 tokens corresponding to potential rhyming candidates with around 29,000 unique token types which was used as the training data for the rhyme scheme detector module. [sent-286, score-0.9]
89 For the transduction grammar induction, the training data contained about 47,000 sentence pairs selected using rhyme scheme detection. [sent-287, score-0.822]
90 2 Results After human evaluation by native French speakers and frequent hip hop listeners, our transduction grammar based model generates about 9. [sent-289, score-1.157]
91 5% of the responses that are rated good by the human evaluators on the criterion of fluency and rhyming respectively. [sent-291, score-0.654]
92 From Table 5, we can see that responses generated by the system rhyme with the challenges. [sent-296, score-0.567]
93 In the second example, the model realizes a less common AABA rhyme scheme through the response. [sent-298, score-0.457]
94 These examples illustrate that our transduction grammar formalism coupled with our rhyme scheme detection module does capture the necessary correspondences between lines ofhip hop lyrics without assuming any language specific resources. [sent-307, score-1.599]
95 We compared the performance of our Freestyle model against a widely used off-theshelf phrase-based SMT model, showing that PBSMT falls short in tackling the noisy and highly unstructured domain of hip hop lyrics. [sent-309, score-0.827]
96 We showed that the quality of responses improves when the training data for the transduction grammar induction is selected using a rhyme scheme detector. [sent-310, score-1.071]
97 We 111 also reported results on Maghrebi French hip hop lyrics which indicate that our model works surprisingly well with no special adaptation for languages other than English. [sent-312, score-1.058]
98 In the future, we plan to investigate alternative training data selection techniques, disfluency handling strategies, search heuristics, and novel transduction grammar induction models. [sent-313, score-0.566]
99 “Unsupervised rhyme scheme identification in hip hop lyrics using hidden Markov models. [sent-325, score-1.515]
100 “FREESTYLE: A challenge-response system for hip hop lyrics via unsupervised induction of stochastic transduction grammars. [sent-430, score-1.427]
wordName wordTfidf (topN-words)
[('hip', 0.403), ('hop', 0.403), ('rhyme', 0.358), ('rhyming', 0.306), ('transduction', 0.27), ('lyrics', 0.252), ('responses', 0.186), ('tg', 0.169), ('pbsmt', 0.145), ('istg', 0.137), ('verse', 0.137), ('fluency', 0.112), ('disfluency', 0.112), ('scheme', 0.099), ('saers', 0.097), ('stanzas', 0.089), ('token', 0.088), ('fluent', 0.087), ('grammar', 0.081), ('stanza', 0.081), ('freestyle', 0.073), ('itgs', 0.073), ('rs', 0.071), ('lines', 0.066), ('induction', 0.063), ('dekai', 0.057), ('addanki', 0.056), ('maghrebi', 0.056), ('meter', 0.056), ('correction', 0.054), ('rule', 0.053), ('segmental', 0.051), ('grammars', 0.05), ('verses', 0.048), ('smt', 0.047), ('segmenting', 0.043), ('french', 0.042), ('karteek', 0.04), ('acceptable', 0.04), ('song', 0.036), ('itg', 0.036), ('rated', 0.035), ('detector', 0.035), ('adjacent', 0.035), ('inversion', 0.034), ('wu', 0.033), ('detection', 0.032), ('songs', 0.032), ('poetry', 0.032), ('response', 0.032), ('markus', 0.028), ('disfluencies', 0.028), ('rap', 0.028), ('selection', 0.026), ('listeners', 0.026), ('backing', 0.026), ('interpolated', 0.024), ('module', 0.024), ('improvising', 0.024), ('rhymes', 0.024), ('schemes', 0.023), ('genre', 0.023), ('generated', 0.023), ('strategy', 0.023), ('strategies', 0.023), ('fraction', 0.021), ('unstructured', 0.021), ('challenges', 0.02), ('line', 0.019), ('rules', 0.019), ('challenge', 0.019), ('successive', 0.019), ('stochastic', 0.018), ('june', 0.018), ('unsupervised', 0.018), ('upon', 0.017), ('interpolating', 0.017), ('translation', 0.017), ('abab', 0.016), ('aceptbl', 0.016), ('barbieri', 0.016), ('couplets', 0.016), ('improvised', 0.016), ('intonation', 0.016), ('ofresponses', 0.016), ('rgc', 0.016), ('tamil', 0.016), ('transducing', 0.016), ('reddy', 0.016), ('filtering', 0.016), ('generating', 0.015), ('criterion', 0.015), ('conventional', 0.015), ('stress', 0.015), ('training', 0.014), ('music', 0.014), ('noticed', 0.014), ('correspondences', 0.014), ('artists', 0.014), ('bottomup', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
Author: Dekai Wu ; Karteek Addanki ; Markus Saers ; Meriem Beloucif
Abstract: We present a novel model, Freestyle, that learns to improvise rhyming and fluent responses upon being challenged with a line of hip hop lyrics, by combining both bottomup token based rule induction and top-down rule segmentation strategies to learn a stochastic transduction grammar that simultaneously learns bothphrasing and rhyming associations. In this attack on the woefully under-explored natural language genre of music lyrics, we exploit a strictly unsupervised transduction grammar induction approach. Our task is particularly ambitious in that no use of any a priori linguistic or phonetic information is allowed, even though the domain of hip hop lyrics is particularly noisy and unstructured. We evaluate the performance of the learned model against a model learned only using the more conventional bottom-up token based rule induction, and demonstrate the superiority of our combined token based and rule segmentation induction method toward generating higher quality improvised responses, measured on fluency and rhyming criteria as judged by human evaluators. To highlight some of the inherent challenges in adapting other algorithms to this novel task, we also compare the quality ofthe responses generated by our model to those generated by an out-ofthe-box phrase based SMT system. We tackle the challenge of selecting appropriate training data for our task via a dedicated rhyme scheme detection module, which is also acquired via unsupervised learning and report improved quality of the generated responses. Finally, we report results with Maghrebi French hip hop lyrics indicating that our model performs surprisingly well with no special adaptation to other languages. 102
2 0.16287218 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries
Author: Xiao Ding ; Zhicheng Dou ; Bing Qin ; Ting Liu ; Ji-rong Wen
Abstract: Web users are increasingly looking for structured data, such as lyrics, job, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of the web queries. In this paper, we propose a method to improve web search ranking by detecting Structured Annotation of queries based on top search results. In a structured annotation, the original query is split into different units that are associated with semantic attributes in the corresponding domain. We evaluate our techniques using real world queries and achieve significant improvement. . 1
3 0.0896383 4 emnlp-2013-A Dataset for Research on Short-Text Conversations
Author: Hao Wang ; Zhengdong Lu ; Hang Li ; Enhong Chen
Abstract: Natural language conversation is widely regarded as a highly difficult problem, which is usually attacked with either rule-based or learning-based models. In this paper we propose a retrieval-based automatic response model for short-text conversation, to exploit the vast amount of short conversation instances available on social media. For this purpose we introduce a dataset of short-text conversation based on the real-world instances from Sina Weibo (a popular Chinese microblog service), which will be soon released to public. This dataset provides rich collection of instances for the research on finding natural and relevant short responses to a given short text, and useful for both training and testing of conversation models. This dataset consists of both naturally formed conversations, manually labeled data, and a large repository of candidate responses. Our preliminary experiments demonstrate that the simple retrieval-based conversation model performs reasonably well when combined with the rich instances in our dataset.
4 0.071181595 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
Author: Xinyan Xiao ; Deyi Xiong
Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.
5 0.061242446 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-ofdomain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
6 0.061044682 150 emnlp-2013-Pair Language Models for Deriving Alternative Pronunciations and Spellings from Pronunciation Dictionaries
7 0.055370517 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
8 0.044984449 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
9 0.043745324 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
10 0.040926881 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
11 0.03880395 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology
12 0.035452131 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
13 0.034570705 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
14 0.034426283 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
15 0.033871643 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
16 0.033046868 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering
17 0.033011798 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
18 0.032356743 201 emnlp-2013-What is Hidden among Translation Rules
19 0.031049358 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
20 0.030256644 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
topicId topicWeight
[(0, -0.106), (1, -0.058), (2, 0.007), (3, 0.005), (4, -0.011), (5, -0.011), (6, 0.008), (7, 0.063), (8, 0.035), (9, -0.015), (10, -0.052), (11, 0.051), (12, -0.03), (13, 0.032), (14, 0.006), (15, 0.029), (16, -0.038), (17, -0.155), (18, 0.053), (19, 0.064), (20, 0.141), (21, -0.05), (22, 0.054), (23, -0.057), (24, 0.016), (25, 0.013), (26, -0.051), (27, 0.0), (28, 0.036), (29, -0.091), (30, -0.07), (31, -0.006), (32, -0.011), (33, 0.016), (34, 0.004), (35, 0.003), (36, 0.013), (37, -0.108), (38, 0.133), (39, -0.069), (40, -0.181), (41, -0.096), (42, 0.072), (43, 0.002), (44, -0.108), (45, -0.202), (46, 0.047), (47, -0.089), (48, 0.178), (49, -0.153)]
simIndex simValue paperId paperTitle
same-paper 1 0.93065012 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
Author: Dekai Wu ; Karteek Addanki ; Markus Saers ; Meriem Beloucif
Abstract: We present a novel model, Freestyle, that learns to improvise rhyming and fluent responses upon being challenged with a line of hip hop lyrics, by combining both bottomup token based rule induction and top-down rule segmentation strategies to learn a stochastic transduction grammar that simultaneously learns bothphrasing and rhyming associations. In this attack on the woefully under-explored natural language genre of music lyrics, we exploit a strictly unsupervised transduction grammar induction approach. Our task is particularly ambitious in that no use of any a priori linguistic or phonetic information is allowed, even though the domain of hip hop lyrics is particularly noisy and unstructured. We evaluate the performance of the learned model against a model learned only using the more conventional bottom-up token based rule induction, and demonstrate the superiority of our combined token based and rule segmentation induction method toward generating higher quality improvised responses, measured on fluency and rhyming criteria as judged by human evaluators. To highlight some of the inherent challenges in adapting other algorithms to this novel task, we also compare the quality ofthe responses generated by our model to those generated by an out-ofthe-box phrase based SMT system. We tackle the challenge of selecting appropriate training data for our task via a dedicated rhyme scheme detection module, which is also acquired via unsupervised learning and report improved quality of the generated responses. Finally, we report results with Maghrebi French hip hop lyrics indicating that our model performs surprisingly well with no special adaptation to other languages. 102
2 0.4569152 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries
Author: Xiao Ding ; Zhicheng Dou ; Bing Qin ; Ting Liu ; Ji-rong Wen
Abstract: Web users are increasingly looking for structured data, such as lyrics, job, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of the web queries. In this paper, we propose a method to improve web search ranking by detecting Structured Annotation of queries based on top search results. In a structured annotation, the original query is split into different units that are associated with semantic attributes in the corresponding domain. We evaluate our techniques using real world queries and achieve significant improvement. . 1
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky
Abstract: Many statistical learning problems in NLP call for local model search methods. But accuracy tends to suffer with current techniques, which often explore either too narrowly or too broadly: hill-climbers can get stuck in local optima, whereas samplers may be inefficient. We propose to arrange individual local optimizers into organized networks. Our building blocks are operators of two types: (i) transform, which suggests new places to search, via non-random restarts from already-found local optima; and (ii) join, which merges candidate solutions to find better optima. Experiments on grammar induction show that pursuing different transforms (e.g., discarding parts of a learned model or ignoring portions of training data) results in improvements. Groups of locally-optimal solutions can be further perturbed jointly, by constructing mixtures. Using these tools, we designed several modular dependency grammar induction networks of increasing complexity. Our complete sys- tem achieves 48.6% accuracy (directed dependency macro-average over all 19 languages in the 2006/7 CoNLL data) more than 5% higher than the previous state-of-the-art. —
4 0.39311558 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-ofdomain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
5 0.35352498 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
Author: Bowei Zou ; Guodong Zhou ; Qiaoming Zhu
Abstract: Scope detection is a key task in information extraction. This paper proposes a new approach for tree kernel-based scope detection by using the structured syntactic parse information. In addition, we have explored the way of selecting compatible features for different part-of-speech cues. Experiments on the BioScope corpus show that both constituent and dependency structured syntactic parse features have the advantage in capturing the potential relationships between cues and their scopes. Compared with the state of the art scope detection systems, our system achieves substantial improvement.
6 0.34062904 4 emnlp-2013-A Dataset for Research on Short-Text Conversations
7 0.32314345 97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching
8 0.30659217 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization
10 0.2776792 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
11 0.26983595 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
12 0.26676148 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
13 0.24798402 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
14 0.24591848 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
15 0.24581495 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology
16 0.24529728 162 emnlp-2013-Russian Stress Prediction using Maximum Entropy Ranking
17 0.2353588 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings
18 0.23429815 58 emnlp-2013-Dependency Language Models for Sentence Completion
19 0.21995683 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
20 0.21881118 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries
topicId topicWeight
[(3, 0.026), (18, 0.034), (22, 0.039), (26, 0.402), (30, 0.105), (45, 0.02), (50, 0.018), (51, 0.108), (66, 0.029), (71, 0.017), (75, 0.025), (77, 0.04), (96, 0.017)]
simIndex simValue paperId paperTitle
1 0.95997167 141 emnlp-2013-Online Learning for Inexact Hypergraph Search
Author: Hao Zhang ; Liang Huang ; Kai Zhao ; Ryan McDonald
Abstract: Online learning algorithms like the perceptron are widely used for structured prediction tasks. For sequential search problems, like left-to-right tagging and parsing, beam search has been successfully combined with perceptron variants that accommodate search errors (Collins and Roark, 2004; Huang et al., 2012). However, perceptron training with inexact search is less studied for bottom-up parsing and, more generally, inference over hypergraphs. In this paper, we generalize the violation-fixing perceptron of Huang et al. (2012) to hypergraphs and apply it to the cube-pruning parser of Zhang and McDonald (2012). This results in the highest reported scores on WSJ evaluation set (UAS 93.50% and LAS 92.41% respectively) without the aid of additional resources.
2 0.84756798 145 emnlp-2013-Optimal Beam Search for Machine Translation
Author: Alexander Rush ; Yin-Wen Chang ; Michael Collins
Abstract: Beam search is a fast and empirically effective method for translation decoding, but it lacks formal guarantees about search error. We develop a new decoding algorithm that combines the speed of beam search with the optimal certificate property of Lagrangian relaxation, and apply it to phrase- and syntax-based translation decoding. The new method is efficient, utilizes standard MT algorithms, and returns an exact solution on the majority of translation examples in our test data. The algorithm is 3.5 times faster than an optimized incremental constraint-based decoder for phrase-based translation and 4 times faster for syntax-based translation.
same-paper 3 0.72677797 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
Author: Dekai Wu ; Karteek Addanki ; Markus Saers ; Meriem Beloucif
Abstract: We present a novel model, Freestyle, that learns to improvise rhyming and fluent responses upon being challenged with a line of hip hop lyrics, by combining both bottomup token based rule induction and top-down rule segmentation strategies to learn a stochastic transduction grammar that simultaneously learns bothphrasing and rhyming associations. In this attack on the woefully under-explored natural language genre of music lyrics, we exploit a strictly unsupervised transduction grammar induction approach. Our task is particularly ambitious in that no use of any a priori linguistic or phonetic information is allowed, even though the domain of hip hop lyrics is particularly noisy and unstructured. We evaluate the performance of the learned model against a model learned only using the more conventional bottom-up token based rule induction, and demonstrate the superiority of our combined token based and rule segmentation induction method toward generating higher quality improvised responses, measured on fluency and rhyming criteria as judged by human evaluators. To highlight some of the inherent challenges in adapting other algorithms to this novel task, we also compare the quality ofthe responses generated by our model to those generated by an out-ofthe-box phrase based SMT system. We tackle the challenge of selecting appropriate training data for our task via a dedicated rhyme scheme detection module, which is also acquired via unsupervised learning and report improved quality of the generated responses. Finally, we report results with Maghrebi French hip hop lyrics indicating that our model performs surprisingly well with no special adaptation to other languages. 102
4 0.50156409 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training
Author: Heng Yu ; Liang Huang ; Haitao Mi ; Kai Zhao
Abstract: While large-scale discriminative training has triumphed in many NLP problems, its definite success on machine translation has been largely elusive. Most recent efforts along this line are not scalable (training on the small dev set with features from top ∼100 most frequent wt woridths) f eaantdu overly complicated. oWste f iren-stead present a very simple yet theoretically motivated approach by extending the recent framework of “violation-fixing perceptron”, using forced decoding to compute the target derivations. Extensive phrase-based translation experiments on both Chinese-to-English and Spanish-to-English tasks show substantial gains in BLEU by up to +2.3/+2.0 on dev/test over MERT, thanks to 20M+ sparse features. This is the first successful effort of large-scale online discriminative training for MT. 1Introduction Large-scale discriminative training has witnessed great success in many NLP problems such as parsing (McDonald et al., 2005) and tagging (Collins, 2002), but not yet for machine translation (MT) despite numerous recent efforts. Due to scalability issues, most of these recent methods can only train on a small dev set of about a thousand sentences rather than on the full training set, and only with 2,000–10,000 rather “dense-like” features (either unlexicalized or only considering highest-frequency words), as in MIRA (Watanabe et al., 2007; Chiang et al., 2008; Chiang, 2012), PRO (Hopkins and May, 2011), and RAMP (Gimpel and Smith, 2012). However, it is well-known that the most important features for NLP are lexicalized, most of which can not ∗ Work done while visiting City University of New York. Corresponding author. † 1112 be seen on a small dataset. Furthermore, these methods often involve complicated loss functions and intricate choices of the “target” derivations to update towards or against (e.g. k-best/forest oracles, or hope/fear derivations), and are thus hard to replicate. As a result, the classical method of MERT (Och, 2003) remains the default training algorithm for MT even though it can only tune a handful of dense features. See also Section 6 for other related work. As a notable exception, Liang et al. (2006) do train a structured perceptron model on the training data with sparse features, but fail to outperform MERT. We argue this is because structured perceptron, like many structured learning algorithms such as CRF and MIRA, assumes exact search, and search errors inevitably break theoretical properties such as convergence (Huang et al., 2012). Empirically, it is now well accepted that standard perceptron performs poorly when search error is severe (Collins and Roark, 2004; Zhang et al., 2013). To address the search error problem we propose a very simple approach based on the recent framework of “violation-fixing perceptron” (Huang et al., 2012) which is designed specifically for inexact search, with a theoretical convergence guarantee and excellent empirical performance on beam search parsing and tagging. The basic idea is to update when search error happens, rather than at the end of the search. To adapt it to MT, we extend this framework to handle latent variables corresponding to the hidden derivations. We update towards “gold-standard” derivations computed by forced decoding so that each derivation leads to the exact reference translation. Forced decoding is also used as a way of data selection, since those reachable sentence pairs are generally more literal and of higher quality, which the training should focus on. When the reachable subset is small for some language pairs, we augment Proce Sdeiantgtlse o,f W thaesh 2i0n1gt3o nC,o UnSfeAre,n 1c8e- o2n1 E Omctpoibriecra 2l0 M13et.h ?oc d2s0 i1n3 N Aastusorcaila Ltiaon g fuoarg Ceo Pmrpoucetastsi on ga,l p Laignegsu 1is1t1ic2s–1 23, it by including reachable prefix-pairs when the full sentence pair is not. We make the following contributions: 1. Our work is the first successful effort to scale online structured learning to a large portion of the training data (as opposed to the dev set). 2. Our work is the first to use a principled learning method customized for inexact search which updates on partial derivations rather than full ones in order to fix search errors. We adapt it to MT using latent variables for derivations. 3. Contrary to the common wisdom, we show that simply updating towards the exact reference translation is helpful, which is much simpler than k-best/forest oracles or loss-augmented (e.g. hope/fear) derivations, avoiding sentencelevel BLEU scores or other loss functions. 4. We present a convincing analysis that it is the search errors and standard perceptron’s inability to deal with them that prevent previous work, esp. Liang et al. (2006), from succeeding. 5. Scaling to the training data enables us to engineer a very rich feature set of sparse, lexicalized, and non-local features, and we propose various ways to alleviate overfitting. For simplicity and efficiency reasons, in this paper we use phrase-based translation, but our method has the potential to be applicable to other translation paradigms. Extensive experiments on both Chineseto-English and Spanish-to-English tasks show statistically significant gains in BLEU by up to +2.3/+2.0 on dev/test over MERT, and up to +1.5/+1.5 over PRO, thanks to 20M+ sparse features. 2 Phrase-Based MT and Forced Decoding We first review the basic phrase-based decoding algorithm (Koehn, 2004), which will be adapted for forced decoding. 2.1 Background: Phrase-based Decoding We will use the following running example from Chinese to English from Mi et al. (2008): 0123456 Figure 1: Standard beam-search phrase-based decoding. B `ush´ ı y uˇ Sh¯ al´ ong j ˇux ´ıng le hu` ıt´ an Bush with Sharon hold -ed meeting ‘Bush held a meeting with Sharon’ Phrase-based decoders generate partial targetlanguage outputs in left-to-right order in the form of hypotheses (or states) (Koehn, 2004). Each hypothesis has a coverage vector capturing the sourcelanguage words translated so far, and can be extended into a longer hypothesis by a phrase-pair translating an uncovered segment. For example, the following is one possible derivation: (• 3(• •() • :1( •s063),:“(Bs)u2s:,h)“(hBs:e1ul(d,s0“ht,aB“hleuk”ls) hdw”t)ailhkrsS1”h)aro2n”)r3 where a • in the coverage vector indicates the source wwoherdre a at •th i ns position aisg e“ vcoecvteorred in”d iacnadte ws thheer seo euarcche si is the score of each state, each adding the rule score and the distortion cost (dc) to the score of the previous state. To compute the distortion cost we also need to maintain the ending position of the last phrase (e.g., the 3 and 6 in the coverage vectors). In phrase-based translation there is also a distortionlimit which prohibits long-distance reorderings. The above states are called −LM states since they do Tnhoet ainbovovleve st language mlleodd −el LcMos tsst.a eTso iandcde a beiygram model, we split each −LM state into a series ogrfa +mL mMo states; ee sapchli t+ eaLcMh −staLtMe h satsa ttehe in ftoor ma (v,a) where a is the last word of the hypothesis. Thus a +LM version of the above derivation might be: (• 3(• ,(•Sh1a•(r6o0,nta)l:ks,()Bsu:03sh,(s“<)s02
5 0.45068115 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming
Author: Kai Zhao ; James Cross ; Liang Huang
Abstract: We present the first provably optimal polynomial time dynamic programming (DP) algorithm for best-first shift-reduce parsing, which applies the DP idea of Huang and Sagae (2010) to the best-first parser of Sagae and Lavie (2006) in a non-trivial way, reducing the complexity of the latter from exponential to polynomial. We prove the correctness of our algorithm rigorously. Experiments confirm that DP leads to a significant speedup on a probablistic best-first shift-reduce parser, and makes exact search under such a model tractable for the first time.
6 0.41404504 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
7 0.41245365 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
8 0.40393153 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
9 0.39484188 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
10 0.39383438 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
11 0.39259204 2 emnlp-2013-A Convex Alternative to IBM Model 2
12 0.3869096 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
13 0.3852444 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
15 0.37654114 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
16 0.37524584 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
17 0.37467977 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
18 0.37460741 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
19 0.37454447 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
20 0.37386066 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation