acl acl2011 acl2011-321 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation. [sent-3, score-1.64]
2 1 Introduction Rhyming stanzas of poetry are characterized by rhyme schemes, patterns that specify how the lines in the stanza rhyme with one another. [sent-4, score-2.184]
3 The question we raise in this paper is: can we infer the rhyme scheme of a stanza given no information about pronunciations or rhyming relations among words? [sent-5, score-1.512]
4 Background A rhyme scheme is represented as a string corresponding to the sequence of lines that comprise the stanza, in which rhyming lines are denoted by the same letter. [sent-6, score-1.287]
5 For example, the limerick’s rhyme scheme is aabba, indicating that the 1st, 2nd, and 5th lines rhyme, as do the 3rd and 4th. [sent-7, score-0.873]
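As an illustrative aside (not from the paper), the mapping from a grouping of rhyming lines to such a scheme string can be sketched in a few lines of Python; the function name and class-id representation are our own:

```python
def scheme_string(rhyme_classes):
    """Convert per-line rhyme-class ids into a scheme string.

    Lines sharing a class id rhyme with each other; classes are
    lettered 'a', 'b', ... in order of first appearance.
    """
    letters = {}
    out = []
    for c in rhyme_classes:
        if c not in letters:
            letters[c] = chr(ord("a") + len(letters))
        out.append(letters[c])
    return "".join(out)

# A limerick: lines 1, 2, and 5 rhyme; lines 3 and 4 rhyme.
print(scheme_string([0, 0, 1, 1, 0]))  # prints "aabba"
```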
6 ‘Culturomics’: The field of digital humanities is growing, with a focus on statistics to track cultural and literary trends (partially spurred by projects like the Google Books Ngrams1). [sent-9, score-0.128]
7 Kevin Knight, Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, knight@isi.edu. [sent-12, score-0.041]
8 Rhyming corpora could be extremely useful for large-scale statistical analyses of poetic texts. [sent-13, score-0.055]
9 • Historical Linguistics/Study of Dialects: Rhymes of a word in poetry of a given time period or dialect region provide clues about its pronunciation in that time or dialect, a fact that is often taken advantage of by linguists (Wyld, 1923). [sent-14, score-0.474]
10 One could automate this task given enough annotated data. [sent-15, score-0.023]
11 An obvious approach to finding rhyme schemes is to use word pronunciations and a definition of rhyme, in which case the problem is fairly easy. [sent-16, score-0.972]
12 However, we favor an unsupervised solution that utilizes no external knowledge for several reasons. [sent-17, score-0.055]
13 The definition of rhyme varies across poetic traditions and languages, and may include slant rhymes like gate/mat, ‘sight rhymes’ like word/sword, and assonance/consonance like shore/alone or leaves/lance, etc. [sent-20, score-1.158]
14 • Pronunciations and spelling conventions change over time. [sent-21, score-0.022]
15 Words that rhymed historically may not anymore, like prove and love or proued and beloued. [sent-22, score-0.048]
16 2 Related Work There have been a number of recent papers on the automated annotation, analysis, or translation of poetry. [sent-23, score-0.036]
17 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: shortpapers, pages 77–82. [sent-25, score-0.02]
18 (2010) use a finite state transducer to infer the syllable-stress assignments in lines of poetry under metrical constraints. [sent-27, score-0.426]
19 (2010) incorporate constraints on meter and rhyme (where the stress and rhyming information is derived from a pronunciation dictionary) into a machine translation system. [sent-29, score-1.276]
20 Jiang and Zhou (2008) develop a system to generate the second line of a Chinese couplet given the first. [sent-30, score-0.048]
21 A few researchers have also explored the problem of poetry generation under some constraints (Manurung et al. [sent-31, score-0.276]
22 There has also been some work on computational approaches to characterizing rhymes (Byrd and Chodorow, 1985) and global properties of the rhyme network (Sonderegger, 2011) in English. [sent-35, score-1.067]
23 To the best of our knowledge, there has been no language-independent computational work on finding rhyme schemes. [sent-36, score-0.804]
24 3 Finding Stanza Rhyme Schemes A collection of rhyming poetry inevitably contains repetition of rhyming pairs. [sent-37, score-0.976]
25 For example, the word trees will often rhyme with breeze across different stanzas, even those with different rhyme schemes and written by different authors. [sent-38, score-1.65]
26 This is partly due to the sparsity of rhymes: many words have no rhymes at all, and many others have only a handful, forcing poets to reuse rhyming pairs. [sent-39, score-0.96]
27 In this section, we describe an unsupervised algorithm to infer rhyme schemes that harnesses this repetition, based on a model of stanza generation. [sent-40, score-1.2]
28 Pick a rhyme scheme r of length n with probability P(r). [sent-43, score-0.804]
29 For each i ∈ [1, n], pick a word sequence, choosing the last word xi as follows: (a) If, according to r, the line does not rhyme with any previous line in the stanza, pick a word xi from a vocabulary of line-end words with probability P(xi). [sent-45, score-1.074]
30 (b) If the ith line rhymes with some previous line(s) j according to r, choose a word xi that rhymes with those lines’ last words. (Footnote 2: A rhyme may span more than one word in a line – for example, laureate. [sent-46, score-1.248]
31 An extension of our model could include a latent variable that selects the entire rhyming portion of a line. [sent-53, score-0.324]
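The generative story in steps (a)/(b) above can be sketched as a toy sampler. All distributions below (the scheme prior, the line-end vocabulary, and the pairwise rhyme-strength table) are invented placeholders standing in for the model's learned parameters, not values from the paper:

```python
import random

# Toy stand-ins for the model's parameters: P(r) over schemes,
# P(x) over line-end words, and a pairwise rhyme-strength table.
P_scheme = {"aabb": 0.5, "abab": 0.5}
vocab = {"trees": 0.25, "breeze": 0.25, "day": 0.25, "way": 0.25}
rhyme_strength = {
    frozenset(["trees", "breeze"]): 0.9,
    frozenset(["day", "way"]): 0.9,
}

def sample(dist):
    """Draw one item from a dict mapping item -> probability."""
    r, total = random.random(), 0.0
    for item, p in dist.items():
        total += p
        if r <= total:
            return item
    return item  # guard against floating-point underflow

def generate_stanza():
    scheme = sample(P_scheme)           # 1. pick a rhyme scheme r
    last_words = []
    for i, label in enumerate(scheme):  # 2. pick each line-end word
        prev = [last_words[j] for j in range(i) if scheme[j] == label]
        if not prev:                    # (a) first line of its rhyme class
            last_words.append(sample(vocab))
        else:                           # (b) reweight by rhyme strength
            weights = {
                w: vocab[w] * max(rhyme_strength.get(frozenset([w, p]), 1e-3)
                                  for p in prev)
                for w in vocab
            }
            z = sum(weights.values())
            last_words.append(sample({w: v / z for w, v in weights.items()}))
    return scheme, last_words

random.seed(0)
print(generate_stanza())
```

Inference in the paper then runs in the opposite direction, recovering the latent scheme r from observed stanzas; the sketch only illustrates the forward generative process.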
wordName wordTfidf (topN-words)
[('rhyme', 0.76), ('rhyming', 0.324), ('rhymes', 0.288), ('poetry', 0.253), ('stanza', 0.252), ('schemes', 0.098), ('pronunciations', 0.075), ('meter', 0.072), ('stanzas', 0.072), ('lines', 0.069), ('pronunciation', 0.06), ('poetic', 0.055), ('dialect', 0.055), ('repetition', 0.05), ('line', 0.048), ('pick', 0.047), ('xi', 0.046), ('scheme', 0.044), ('chicago', 0.038), ('infer', 0.035), ('uren', 0.032), ('olf', 0.032), ('cago', 0.032), ('harnesses', 0.032), ('afo', 0.032), ('breeze', 0.032), ('isc', 0.032), ('metrical', 0.032), ('mtoa', 0.032), ('rwe', 0.032), ('growing', 0.031), ('ramakrishnan', 0.029), ('byrd', 0.029), ('nin', 0.029), ('byron', 0.029), ('ith', 0.029), ('anymore', 0.027), ('greene', 0.027), ('sight', 0.027), ('reddy', 0.027), ('historically', 0.026), ('qj', 0.026), ('tthh', 0.026), ('tory', 0.026), ('inevitably', 0.025), ('dialects', 0.024), ('knight', 0.024), ('constraints', 0.023), ('humanities', 0.023), ('automate', 0.023), ('forcing', 0.023), ('unsupervised', 0.023), ('sin', 0.022), ('raise', 0.022), ('ofour', 0.022), ('finding', 0.022), ('conventions', 0.022), ('chodorow', 0.022), ('handful', 0.022), ('reuse', 0.022), ('love', 0.022), ('historical', 0.021), ('transducer', 0.021), ('southern', 0.021), ('aors', 0.021), ('lhionrgtpuaisptiecrs', 0.021), ('cultural', 0.021), ('comprise', 0.021), ('stress', 0.021), ('marina', 0.021), ('region', 0.02), ('linguists', 0.02), ('ye', 0.02), ('anrneguoanl', 0.02), ('caolm', 0.02), ('cocoimatpiounta', 0.02), ('ftoiorn', 0.02), ('jueentein', 0.02), ('lipnugtuaitsiotincasl', 0.02), ('tohretl', 0.02), ('literary', 0.019), ('tnh', 0.019), ('rey', 0.019), ('characterizing', 0.019), ('period', 0.018), ('books', 0.018), ('characterized', 0.018), ('sw', 0.017), ('definition', 0.017), ('del', 0.017), ('digital', 0.017), ('utilizes', 0.017), ('il', 0.016), ('varies', 0.016), ('trends', 0.016), ('translation', 0.016), ('clues', 0.016), ('assignments', 0.016), ('favor', 0.015), ('partly', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 321 acl-2011-Unsupervised Discovery of Rhyme Schemes
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
2 0.032510854 299 acl-2011-The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content
Author: Omar F. Zaidan ; Chris Callison-Burch
Abstract: The written form of Arabic, Modern Standard Arabic (MSA), differs quite a bit from the spoken dialects of Arabic, which are the true “native” languages of Arabic speakers used in daily life. However, due to MSA’s prevalence in written form, almost all Arabic datasets have predominantly MSA content. We present the Arabic Online Commentary Dataset, a 52M-word monolingual dataset rich in dialectal content, and we describe our long-term annotation effort to identify the dialect level (and dialect itself) in each sentence of the dataset. So far, we have labeled 108K sentences, 41% of which as having dialectal content. We also present experimental results on the task of automatic dialect identification, using the collected labels for training and evaluation.
3 0.022851035 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
Author: Siwei Wang ; Gina-Anne Levow
Abstract: Verbal feedback is an important information source in establishing interactional rapport. However, predicting verbal feedback across languages is challenging due to languagespecific differences, inter-speaker variation, and the relative sparseness and optionality of verbal feedback. In this paper, we employ an approach combining classifier weighting and SMOTE algorithm oversampling to improve verbal feedback prediction in Arabic, English, and Spanish dyadic conversations. This approach improves the prediction of verbal feedback, up to 6-fold, while maintaining a high overall accuracy. Analyzing highly weighted features highlights widespread use of pitch, with more varied use of intensity and duration.
4 0.022283088 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned ChineseEnglish corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.
5 0.02217856 68 acl-2011-Classifying arguments by scheme
Author: Vanessa Wei Feng ; Graeme Hirst
Abstract: Argumentation schemes are structures or templates for various kinds of arguments. Given the text of an argument with premises and conclusion identified, we classify it as an instance of one of five common schemes, using features specific to each scheme. We achieve accuracies of 63–91% in one-against-others classification and 80–94% in pairwise classification (baseline = 50% in both cases).
6 0.018190654 336 acl-2011-Why Press Backspace? Understanding User Input Behaviors in Chinese Pinyin Input Method
7 0.016745185 197 acl-2011-Latent Class Transliteration based on Source Language Origin
8 0.016726743 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
9 0.016481174 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
10 0.015930071 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
11 0.015916059 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
12 0.015635222 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
13 0.015046688 204 acl-2011-Learning Word Vectors for Sentiment Analysis
14 0.014805673 153 acl-2011-How do you pronounce your name? Improving G2P with transliterations
15 0.014036186 132 acl-2011-Extracting Paraphrases from Definition Sentences on the Web
16 0.013845243 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
17 0.013603355 46 acl-2011-Automated Whole Sentence Grammar Correction Using a Noisy Channel Model
18 0.013512611 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives
19 0.012799337 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
20 0.012698709 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
topicId topicWeight
[(0, 0.039), (1, -0.003), (2, 0.002), (3, 0.01), (4, -0.006), (5, 0.0), (6, 0.014), (7, -0.011), (8, -0.01), (9, 0.011), (10, -0.002), (11, 0.006), (12, 0.001), (13, 0.03), (14, 0.004), (15, -0.006), (16, -0.023), (17, 0.004), (18, -0.002), (19, 0.021), (20, 0.001), (21, 0.018), (22, 0.005), (23, 0.029), (24, -0.01), (25, -0.003), (26, 0.011), (27, -0.019), (28, 0.001), (29, -0.017), (30, -0.011), (31, 0.003), (32, 0.034), (33, -0.004), (34, 0.029), (35, -0.018), (36, -0.022), (37, -0.017), (38, 0.008), (39, -0.027), (40, 0.012), (41, 0.003), (42, -0.009), (43, 0.029), (44, -0.013), (45, -0.013), (46, -0.022), (47, 0.014), (48, -0.011), (49, 0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.80123967 321 acl-2011-Unsupervised Discovery of Rhyme Schemes
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
Author: Nina Dethlefs ; Heriberto Cuayahuitl
Abstract: Surface realisation decisions in language generation can be sensitive to a language model, but also to decisions of content selection. We therefore propose the joint optimisation of content selection and surface realisation using Hierarchical Reinforcement Learning (HRL). To this end, we suggest a novel reward function that is induced from human data and is especially suited for surface realisation. It is based on a generation space in the form of a Hidden Markov Model (HMM). Results in terms of task success and human-likeness suggest that our unified approach performs better than greedy or random baselines.
3 0.44687182 317 acl-2011-Underspecifying and Predicting Voice for Surface Realisation Ranking
Author: Sina Zarriess ; Aoife Cahill ; Jonas Kuhn
Abstract: This paper addresses a data-driven surface realisation model based on a large-scale reversible grammar of German. We investigate the relationship between the surface realisation performance and the character of the input to generation, i.e. its degree of underspecification. We extend a syntactic surface realisation system, which can be trained to choose among word order variants, such that the candidate set includes active and passive variants. This allows us to study the interaction of voice and word order alternations in realistic German corpus data. We show that with an appropriately underspecified input, a linguistically informed realisation model trained to regenerate strings from the underlying semantic representation achieves 91.5% accuracy (over a baseline of 82.5%) in the prediction of the original voice. 1
4 0.44317922 51 acl-2011-Automatic Headline Generation using Character Cross-Correlation
Author: Fahad Alotaiby
Abstract: Arabic language is a morphologically complex language. Affixes and clitics are regularly attached to stems which make direct comparison between words not practical. In this paper we propose a new automatic headline generation technique that utilizes character cross-correlation to extract best headlines and to overcome the Arabic language complex morphology. The system that uses character cross-correlation achieves ROUGE-L score of 0. 19384 while the exact word matching scores only 0. 17252 for the same set of documents. 1
Author: Kenneth Hild ; Umut Orhan ; Deniz Erdogmus ; Brian Roark ; Barry Oken ; Shalini Purwar ; Hooman Nezamfar ; Melanie Fried-Oken
Abstract: Event related potentials (ERP) corresponding to stimuli in electroencephalography (EEG) can be used to detect the intent of a person for brain computer interfaces (BCI). This paradigm is widely used to build letter-byletter text input systems using BCI. Nevertheless using a BCI-typewriter depending only on EEG responses will not be sufficiently accurate for single-trial operation in general, and existing systems utilize many-trial schemes to achieve accuracy at the cost of speed. Hence incorporation of a language model based prior or additional evidence is vital to improve accuracy and speed. In this demonstration we will present a BCI system for typing that integrates a stochastic language model with ERP classification to achieve speedups, via the rapid serial visual presentation (RSVP) paradigm.
6 0.43257695 301 acl-2011-The impact of language models and loss functions on repair disfluency detection
7 0.42904869 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation
8 0.41411722 74 acl-2011-Combining Indicators of Allophony
9 0.40682384 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
10 0.38721642 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
12 0.3829214 38 acl-2011-An Empirical Investigation of Discounting in Cross-Domain Language Models
13 0.37762469 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
14 0.37076581 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
15 0.36819565 142 acl-2011-Generalized Interpolation in Decision Tree LM
16 0.36735833 68 acl-2011-Classifying arguments by scheme
17 0.36469468 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
18 0.35907322 299 acl-2011-The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content
19 0.35372382 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
20 0.34898803 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
topicId topicWeight
[(0, 0.48), (5, 0.017), (17, 0.017), (26, 0.013), (37, 0.025), (39, 0.032), (41, 0.036), (55, 0.025), (59, 0.03), (72, 0.053), (75, 0.015), (91, 0.033), (96, 0.084)]
simIndex simValue paperId paperTitle
same-paper 1 0.74415189 321 acl-2011-Unsupervised Discovery of Rhyme Schemes
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
2 0.72578806 134 acl-2011-Extracting and Classifying Urdu Multiword Expressions
Author: Annette Hautli ; Sebastian Sulger
Abstract: This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurrence of MWEs with distinct postpositions into account. The resulting classes are evaluated against a hand-annotated gold standard and achieve an f-score of 0.5 and 0.746 for locations and persons, respectively. A target application is the Urdu ParGram grammar, where MWEs are needed to generate a more precise syntactic and semantic analysis.
3 0.63642794 214 acl-2011-Lost in Translation: Authorship Attribution using Frame Semantics
Author: Steffen Hedegaard ; Jakob Grue Simonsen
Abstract: We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that framebased classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or superior to the baseline classifiers on translated texts; (iv) that—contrary to current belief—naïve clas- sifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.
4 0.53726172 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: There are several theories regarding what influences prominence assignment in English noun-noun compounds. We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicate that though both of these theories are relevant, they account for different types of variability in prominence assignment.
5 0.51472133 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
Author: Michael Auli ; Adam Lopez
Abstract: We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser’s search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which to our knowledge has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.
8 0.25808236 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
9 0.25793493 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution
10 0.24559724 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components
11 0.24528858 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
12 0.24355757 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style
13 0.243478 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
14 0.2424165 252 acl-2011-Prototyping virtual instructors from human-human corpora
15 0.24017502 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
16 0.23943257 261 acl-2011-Recognizing Named Entities in Tweets
17 0.23927064 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
18 0.23731753 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
19 0.23696813 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
20 0.23635876 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning