acl acl2011 acl2011-321 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation. [sent-3, score-1.64]
2 1 Introduction Rhyming stanzas of poetry are characterized by rhyme schemes, patterns that specify how the lines in the stanza rhyme with one another. [sent-4, score-2.184]
3 The question we raise in this paper is: can we infer the rhyme scheme of a stanza given no information about pronunciations or rhyming relations among words? [sent-5, score-1.512]
4 Background A rhyme scheme is represented as a string corresponding to the sequence of lines that comprise the stanza, in which rhyming lines are denoted by the same letter. [sent-6, score-1.287]
5 For example, the limerick’s rhyme scheme is aabba, indicating that the 1st, 2nd, and 5th lines rhyme, as do the 3rd and 4th. [sent-7, score-0.873]
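As an illustrative aside (not from the paper), the mapping from a grouping of rhyming lines to such a scheme string can be sketched in a few lines of Python; the function name and class-id representation are our own:

```python
def scheme_string(rhyme_classes):
    """Convert per-line rhyme-class ids into a scheme string.

    Lines sharing a class id rhyme with each other; classes are
    lettered 'a', 'b', ... in order of first appearance.
    """
    letters = {}
    out = []
    for c in rhyme_classes:
        if c not in letters:
            letters[c] = chr(ord("a") + len(letters))
        out.append(letters[c])
    return "".join(out)

# A limerick: lines 1, 2, and 5 rhyme; lines 3 and 4 rhyme.
print(scheme_string([0, 0, 1, 1, 0]))  # prints "aabba"
```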
6 ‘Culturomics’: The field of digital humanities is growing, with a focus on statistics to track cultural and literary trends (partially spurred by projects like the Google Books Ngrams1). [sent-9, score-0.128]
7 Kevin Knight, Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, knight@isi.edu. [sent-12, score-0.041]
8 Rhyming corpora could be extremely useful for large-scale statistical analyses of poetic texts. [sent-13, score-0.055]
9 • Historical Linguistics/Study of Dialects: Rhymes of a word in poetry of a given time period or dialect region provide clues about its pronunciation in that time or dialect, a fact that is often taken advantage of by linguists (Wyld, 1923). [sent-14, score-0.474]
10 One could automate this task given enough annotated data. [sent-15, score-0.023]
11 An obvious approach to finding rhyme schemes is to use word pronunciations and a definition of rhyme, in which case the problem is fairly easy. [sent-16, score-0.972]
12 However, we favor an unsupervised solution that utilizes no external knowledge for several reasons. [sent-17, score-0.055]
13 The definition of rhyme varies across poetic traditions and languages, and may include slant rhymes like gate/mat, ‘sight rhymes’ like word/sword, and assonance/consonance like shore/alone or leaves/lance, etc. [sent-20, score-1.158]
14 • Pronunciations and spelling conventions change over time. [sent-21, score-0.022]
15 Words that rhymed historically may not anymore, like prove and love or proued and beloued. [sent-22, score-0.048]
16 2 Related Work There have been a number of recent papers on the automated annotation, analysis, or translation of poetry. [sent-23, score-0.036]
17 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: shortpapers, pages 77–82. [sent-25, score-0.02]
18 (2010) use a finite state transducer to infer the syllable-stress assignments in lines of poetry under metrical constraints. [sent-27, score-0.426]
19 (2010) incorporate constraints on meter and rhyme (where the stress and rhyming information is derived from a pronunciation dictionary) into a machine translation system. [sent-29, score-1.276]
20 Jiang and Zhou (2008) develop a system to generate the second line of a Chinese couplet given the first. [sent-30, score-0.048]
21 A few researchers have also explored the problem of poetry generation under some constraints (Manurung et al. [sent-31, score-0.276]
22 There has also been some work on computational approaches to characterizing rhymes (Byrd and Chodorow, 1985) and global properties of the rhyme network (Sonderegger, 2011) in English. [sent-35, score-1.067]
23 To the best of our knowledge, there has been no language-independent computational work on finding rhyme schemes. [sent-36, score-0.804]
24 3 Finding Stanza Rhyme Schemes A collection of rhyming poetry inevitably contains repetition of rhyming pairs. [sent-37, score-0.976]
25 For example, the word trees will often rhyme with breeze across different stanzas, even those with different rhyme schemes and written by different authors. [sent-38, score-1.65]
26 This is partly due to the sparsity of rhymes: many words have no rhymes at all, and many others have only a handful, forcing poets to reuse rhyming pairs. [sent-39, score-0.96]
27 In this section, we describe an unsupervised algorithm to infer rhyme schemes that harnesses this repetition, based on a model of stanza generation. [sent-40, score-1.2]
28 Pick a rhyme scheme r of length n with probability P(r). [sent-43, score-0.804]
29 For each i ∈ [1, n], pick a word sequence, choosing the last word xi as follows: (a) If, according to r, the line does not rhyme with any previous line in the stanza, pick a word xi from a vocabulary of line-end words with probability P(xi). [sent-45, score-1.074]
30 (b) If the ith line rhymes with some previous line(s) j according to r, choose a word xi that rhymes with those lines’ last words. (Footnote 2: A rhyme may span more than one word in a line – for example, laureate. [sent-46, score-1.248]
31 An extension of our model could include a latent variable that selects the entire rhyming portion of a line. [sent-53, score-0.324]
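The generative story in steps (a)/(b) above can be sketched as a toy sampler. All distributions below (the scheme prior, the line-end vocabulary, and the pairwise rhyme-strength table) are invented placeholders standing in for the model's learned parameters, not values from the paper:

```python
import random

# Toy stand-ins for the model's parameters: P(r) over schemes,
# P(x) over line-end words, and a pairwise rhyme-strength table.
P_scheme = {"aabb": 0.5, "abab": 0.5}
vocab = {"trees": 0.25, "breeze": 0.25, "day": 0.25, "way": 0.25}
rhyme_strength = {
    frozenset(["trees", "breeze"]): 0.9,
    frozenset(["day", "way"]): 0.9,
}

def sample(dist):
    """Draw one item from a dict mapping item -> probability."""
    r, total = random.random(), 0.0
    for item, p in dist.items():
        total += p
        if r <= total:
            return item
    return item  # guard against floating-point underflow

def generate_stanza():
    scheme = sample(P_scheme)           # 1. pick a rhyme scheme r
    last_words = []
    for i, label in enumerate(scheme):  # 2. pick each line-end word
        prev = [last_words[j] for j in range(i) if scheme[j] == label]
        if not prev:                    # (a) first line of its rhyme class
            last_words.append(sample(vocab))
        else:                           # (b) reweight by rhyme strength
            weights = {
                w: vocab[w] * max(rhyme_strength.get(frozenset([w, p]), 1e-3)
                                  for p in prev)
                for w in vocab
            }
            z = sum(weights.values())
            last_words.append(sample({w: v / z for w, v in weights.items()}))
    return scheme, last_words

random.seed(0)
print(generate_stanza())
```

Inference in the paper then runs in the opposite direction, recovering the latent scheme r from observed stanzas; the sketch only illustrates the forward generative process.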
wordName wordTfidf (topN-words)
[('rhyme', 0.76), ('rhyming', 0.324), ('rhymes', 0.288), ('poetry', 0.253), ('stanza', 0.252), ('schemes', 0.098), ('pronunciations', 0.075), ('meter', 0.072), ('stanzas', 0.072), ('lines', 0.069), ('pronunciation', 0.06), ('poetic', 0.055), ('dialect', 0.055), ('repetition', 0.05), ('line', 0.048), ('pick', 0.047), ('xi', 0.046), ('scheme', 0.044), ('chicago', 0.038), ('infer', 0.035), ('uren', 0.032), ('olf', 0.032), ('cago', 0.032), ('harnesses', 0.032), ('afo', 0.032), ('breeze', 0.032), ('isc', 0.032), ('metrical', 0.032), ('mtoa', 0.032), ('rwe', 0.032), ('growing', 0.031), ('ramakrishnan', 0.029), ('byrd', 0.029), ('nin', 0.029), ('byron', 0.029), ('ith', 0.029), ('anymore', 0.027), ('greene', 0.027), ('sight', 0.027), ('reddy', 0.027), ('historically', 0.026), ('qj', 0.026), ('tthh', 0.026), ('tory', 0.026), ('inevitably', 0.025), ('dialects', 0.024), ('knight', 0.024), ('constraints', 0.023), ('humanities', 0.023), ('automate', 0.023), ('forcing', 0.023), ('unsupervised', 0.023), ('sin', 0.022), ('raise', 0.022), ('ofour', 0.022), ('finding', 0.022), ('conventions', 0.022), ('chodorow', 0.022), ('handful', 0.022), ('reuse', 0.022), ('love', 0.022), ('historical', 0.021), ('transducer', 0.021), ('southern', 0.021), ('aors', 0.021), ('lhionrgtpuaisptiecrs', 0.021), ('cultural', 0.021), ('comprise', 0.021), ('stress', 0.021), ('marina', 0.021), ('region', 0.02), ('linguists', 0.02), ('ye', 0.02), ('anrneguoanl', 0.02), ('caolm', 0.02), ('cocoimatpiounta', 0.02), ('ftoiorn', 0.02), ('jueentein', 0.02), ('lipnugtuaitsiotincasl', 0.02), ('tohretl', 0.02), ('literary', 0.019), ('tnh', 0.019), ('rey', 0.019), ('characterizing', 0.019), ('period', 0.018), ('books', 0.018), ('characterized', 0.018), ('sw', 0.017), ('definition', 0.017), ('del', 0.017), ('digital', 0.017), ('utilizes', 0.017), ('il', 0.016), ('varies', 0.016), ('trends', 0.016), ('translation', 0.016), ('clues', 0.016), ('assignments', 0.016), ('favor', 0.015), ('partly', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 321 acl-2011-Unsupervised Discovery of Rhyme Schemes
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
2 0.032510854 299 acl-2011-The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content
Author: Omar F. Zaidan ; Chris Callison-Burch
Abstract: The written form of Arabic, Modern Standard Arabic (MSA), differs quite a bit from the spoken dialects of Arabic, which are the true “native” languages of Arabic speakers used in daily life. However, due to MSA’s prevalence in written form, almost all Arabic datasets have predominantly MSA content. We present the Arabic Online Commentary Dataset, a 52M-word monolingual dataset rich in dialectal content, and we describe our long-term annotation effort to identify the dialect level (and dialect itself) in each sentence of the dataset. So far, we have labeled 108K sentences, 41% of which as having dialectal content. We also present experimental results on the task of automatic dialect identification, using the collected labels for training and evaluation.
3 0.022851035 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
Author: Siwei Wang ; Gina-Anne Levow
Abstract: Verbal feedback is an important information source in establishing interactional rapport. However, predicting verbal feedback across languages is challenging due to languagespecific differences, inter-speaker variation, and the relative sparseness and optionality of verbal feedback. In this paper, we employ an approach combining classifier weighting and SMOTE algorithm oversampling to improve verbal feedback prediction in Arabic, English, and Spanish dyadic conversations. This approach improves the prediction of verbal feedback, up to 6-fold, while maintaining a high overall accuracy. Analyzing highly weighted features highlights widespread use of pitch, with more varied use of intensity and duration.
4 0.022283088 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned ChineseEnglish corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.
5 0.02217856 68 acl-2011-Classifying arguments by scheme
Author: Vanessa Wei Feng ; Graeme Hirst
Abstract: Argumentation schemes are structures or templates for various kinds of arguments. Given the text of an argument with premises and conclusion identified, we classify it as an instance of one of five common schemes, using features specific to each scheme. We achieve accuracies of 63–91% in one-against-others classification and 80–94% in pairwise classification (baseline = 50% in both cases).
6 0.018190654 336 acl-2011-Why Press Backspace? Understanding User Input Behaviors in Chinese Pinyin Input Method
7 0.016745185 197 acl-2011-Latent Class Transliteration based on Source Language Origin
8 0.016726743 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
9 0.016481174 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
10 0.015930071 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
11 0.015916059 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
12 0.015635222 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
13 0.015046688 204 acl-2011-Learning Word Vectors for Sentiment Analysis
14 0.014805673 153 acl-2011-How do you pronounce your name? Improving G2P with transliterations
15 0.014036186 132 acl-2011-Extracting Paraphrases from Definition Sentences on the Web
16 0.013845243 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
17 0.013603355 46 acl-2011-Automated Whole Sentence Grammar Correction Using a Noisy Channel Model
18 0.013512611 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives
19 0.012799337 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
20 0.012698709 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
topicId topicWeight
[(0, 0.039), (1, -0.003), (2, 0.002), (3, 0.01), (4, -0.006), (5, 0.0), (6, 0.014), (7, -0.011), (8, -0.01), (9, 0.011), (10, -0.002), (11, 0.006), (12, 0.001), (13, 0.03), (14, 0.004), (15, -0.006), (16, -0.023), (17, 0.004), (18, -0.002), (19, 0.021), (20, 0.001), (21, 0.018), (22, 0.005), (23, 0.029), (24, -0.01), (25, -0.003), (26, 0.011), (27, -0.019), (28, 0.001), (29, -0.017), (30, -0.011), (31, 0.003), (32, 0.034), (33, -0.004), (34, 0.029), (35, -0.018), (36, -0.022), (37, -0.017), (38, 0.008), (39, -0.027), (40, 0.012), (41, 0.003), (42, -0.009), (43, 0.029), (44, -0.013), (45, -0.013), (46, -0.022), (47, 0.014), (48, -0.011), (49, 0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.80123967 321 acl-2011-Unsupervised Discovery of Rhyme Schemes
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
Author: Nina Dethlefs ; Heriberto Cuayahuitl
Abstract: Surface realisation decisions in language generation can be sensitive to a language model, but also to decisions of content selection. We therefore propose the joint optimisation of content selection and surface realisation using Hierarchical Reinforcement Learning (HRL). To this end, we suggest a novel reward function that is induced from human data and is especially suited for surface realisation. It is based on a generation space in the form of a Hidden Markov Model (HMM). Results in terms of task success and human-likeness suggest that our unified approach performs better than greedy or random baselines.
3 0.44687182 317 acl-2011-Underspecifying and Predicting Voice for Surface Realisation Ranking
Author: Sina Zarriess ; Aoife Cahill ; Jonas Kuhn
Abstract: This paper addresses a data-driven surface realisation model based on a large-scale reversible grammar of German. We investigate the relationship between the surface realisation performance and the character of the input to generation, i.e. its degree of underspecification. We extend a syntactic surface realisation system, which can be trained to choose among word order variants, such that the candidate set includes active and passive variants. This allows us to study the interaction of voice and word order alternations in realistic German corpus data. We show that with an appropriately underspecified input, a linguistically informed realisation model trained to regenerate strings from the underlying semantic representation achieves 91.5% accuracy (over a baseline of 82.5%) in the prediction of the original voice. 1
4 0.44317922 51 acl-2011-Automatic Headline Generation using Character Cross-Correlation
Author: Fahad Alotaiby
Abstract: Arabic language is a morphologically complex language. Affixes and clitics are regularly attached to stems which make direct comparison between words not practical. In this paper we propose a new automatic headline generation technique that utilizes character cross-correlation to extract best headlines and to overcome the Arabic language complex morphology. The system that uses character cross-correlation achieves ROUGE-L score of 0. 19384 while the exact word matching scores only 0. 17252 for the same set of documents. 1
Author: Kenneth Hild ; Umut Orhan ; Deniz Erdogmus ; Brian Roark ; Barry Oken ; Shalini Purwar ; Hooman Nezamfar ; Melanie Fried-Oken
Abstract: Event related potentials (ERP) corresponding to stimuli in electroencephalography (EEG) can be used to detect the intent of a person for brain computer interfaces (BCI). This paradigm is widely used to build letter-byletter text input systems using BCI. Nevertheless using a BCI-typewriter depending only on EEG responses will not be sufficiently accurate for single-trial operation in general, and existing systems utilize many-trial schemes to achieve accuracy at the cost of speed. Hence incorporation of a language model based prior or additional evidence is vital to improve accuracy and speed. In this demonstration we will present a BCI system for typing that integrates a stochastic language model with ERP classification to achieve speedups, via the rapid serial visual presentation (RSVP) paradigm.
6 0.43257695 301 acl-2011-The impact of language models and loss functions on repair disfluency detection
7 0.42904869 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation
8 0.41411722 74 acl-2011-Combining Indicators of Allophony
9 0.40682384 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
10 0.38721642 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
12 0.3829214 38 acl-2011-An Empirical Investigation of Discounting in Cross-Domain Language Models
13 0.37762469 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
14 0.37076581 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
15 0.36819565 142 acl-2011-Generalized Interpolation in Decision Tree LM
16 0.36735833 68 acl-2011-Classifying arguments by scheme
17 0.36469468 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
18 0.35907322 299 acl-2011-The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content
19 0.35372382 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
20 0.34898803 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
topicId topicWeight
[(0, 0.48), (5, 0.017), (17, 0.017), (26, 0.013), (37, 0.025), (39, 0.032), (41, 0.036), (55, 0.025), (59, 0.03), (72, 0.053), (75, 0.015), (91, 0.033), (96, 0.084)]
simIndex simValue paperId paperTitle
same-paper 1 0.74415189 321 acl-2011-Unsupervised Discovery of Rhyme Schemes
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
2 0.72578806 134 acl-2011-Extracting and Classifying Urdu Multiword Expressions
Author: Annette Hautli ; Sebastian Sulger
Abstract: This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurrence of MWEs with distinct postpositions into account. The resulting classes are evaluated against a hand-annotated gold standard and achieve an f-score of 0.5 and 0.746 for locations and persons, respectively. A target application is the Urdu ParGram grammar, where MWEs are needed to generate a more precise syntactic and semantic analysis.
3 0.63642794 214 acl-2011-Lost in Translation: Authorship Attribution using Frame Semantics
Author: Steffen Hedegaard ; Jakob Grue Simonsen
Abstract: We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that framebased classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or superior to the baseline classifiers on translated texts; (iv) that—contrary to current belief—naïve clas- sifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.
4 0.53726172 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: There are several theories regarding what influences prominence assignment in English noun-noun compounds. We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicate that though both of these theories are relevant, they account for different types of variability in prominence assignment.
5 0.51472133 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
Author: Michael Auli ; Adam Lopez
Abstract: We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser’s search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which to our knowledge has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.
8 0.25808236 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
9 0.25793493 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution
10 0.24559724 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components
11 0.24528858 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
12 0.24355757 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style
13 0.243478 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
14 0.2424165 252 acl-2011-Prototyping virtual instructors from human-human corpora
15 0.24017502 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
16 0.23943257 261 acl-2011-Recognizing Named Entities in Tweets
17 0.23927064 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
18 0.23731753 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
19 0.23696813 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
20 0.23635876 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning