acl acl2011 acl2011-249 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: There are several theories regarding what influences prominence assignment in English noun-noun compounds. We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicate that though both of these theories are relevant, they account for different types of variability in prominence assignment.
Reference: text
sentIndex sentText sentNum sentScore
1 Predicting Relative Prominence in Noun-Noun Compounds Taniya Mishra AT&T Labs-Research 180 Park Ave Florham Park, NJ 07932 taniya@research.att.com. [sent-1, score-0.044]
2 Abstract There are several theories regarding what influences prominence assignment in English noun-noun compounds. [sent-3, score-0.799]
3 We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. [sent-4, score-1.651]
4 The evaluation of the prediction models indicates that though both of these theories are relevant, they account for different types of variability in prominence assignment. [sent-5, score-0.856]
5 1 Introduction Text-to-speech synthesis (TTS) systems stand to gain in improved intelligibility and naturalness if we have good control of the prosody. [sent-6, score-0.022]
6 Typically, prosodic labels are predicted through text analysis and are used to control the acoustic parameters for a TTS system. [sent-7, score-0.028]
7 An important aspect of prosody prediction is predicting which words should be prosodically prominent, i. [sent-8, score-0.12]
8 , produced with greater energy, higher pitch, and/or longer duration than the neighboring words, in order to indicate the former’s greater communicative salience. [sent-10, score-0.05]
9 Appropriate prominence assignment is crucial for listeners’ understanding of the intended message. [sent-11, score-0.654]
10 However, the immense prosodic variability found in spoken language makes prominence prediction a challenging problem. [sent-12, score-0.739]
11 A particular sub-problem of prominence prediction that still defies a complete solution is prediction of relative prominence in noun-noun compounds. [sent-13, score-1.395]
12 Noun-noun compounds such as White House, cherry pie, parking lot, Madison Avenue, Wall Street, nail polish, french fries, computer programmer, dog catcher, silk tie, and self reliance, occur quite frequently in the English language. [sent-14, score-0.398]
13 In a discourse neutral context, such constructions usually have leftmost prominence, i. [sent-15, score-0.046]
14 , speakers produce the left-hand noun with greater prominence than the right-hand noun. [sent-17, score-0.726]
15 However, a significant portion of them, about 25% (Liberman and Sproat, 1992), are assigned rightmost prominence (such as cherry pie, Madison Avenue, silk tie, computer programmer, and self reliance from the list above). [sent-20, score-0.785]
16 What factors influence speakers’ decision to assign left or right prominence is still an open question. [sent-21, score-0.635]
17 However, in most studies, the different theories are examined and applied in isolation, thus making it difficult to compare them directly. [sent-23, score-0.145]
18 It would be informative and illuminating to apply these theories to the same task and the same dataset. [sent-24, score-0.172]
19 For this paper, we focus on two particular theories, the informativeness theory and the semantic composition theory. [sent-25, score-0.614]
20 The informativeness theory posits that the relatively more informative and unexpected noun is given greater prominence in the NN compound than the less informative and more predictable noun. [sent-26, score-1.458]
21 The semantic composition theory posits that relative prominence assignment in NN compounds is decided according to the semantic relationship between the two nouns. [sent-27, score-1.314]
22 We apply these two theories to the task of predicting relative prominence in NN compounds via statistical corpus-driven methods, within the larger context of building a system that can predict appropriate prominence patterns for text-to-speech synthesis. [sent-28, score-1.7]
23 Here we are only focusing on predicting relative prominence of NN compounds in a neutral context, where there are no pragmatic reasons (such as contrastiveness or given/new distinction) for shifting prominence. [sent-29, score-0.972]
24 1 In-depth reviews of the different theories are given in Plag (2006) and Bell and Plag (2010). [sent-30, score-0.145]
25 UP = log(Freq(w_i) / Σ_i Freq(w_i)) (1) This is a very simple measure of word informativeness that has been shown to be effective in a similar task (Pan and McKeown, 1999). [sent-35, score-0.442]
26 • Bigram Predictability (BP): Defined as the predictability of a word given a previous word, it is measured as the log probability of noun N2 given noun N1. [sent-36, score-0.225]
27 We define PKL as Prob(N2 | N1) log(Prob(N2 | N1) / Prob(N2)) (5) Another way to consider PKL is as PMI normalized by the predictability of N2 given N1. [sent-42, score-0.124]
28 All except the first of the aforementioned five informativeness measures are relative measures. [sent-43, score-0.566]
29 Of these, PMI and Dice Coefficient are symmetric measures while Bigram Predictability and PKL are nonsymmetric (unidirectional) measures. [sent-44, score-0.058]
30 3 Semantic Relationship Modeling We modeled the semantic relationship between the two nouns in the NN compound as follows. [sent-45, score-0.318]
31 For each of the two nouns in each NN compound, we maintain a semantic category vector of 26 elements. [sent-46, score-0.18]
32 The 26 elements are associated with 26 semantic categories (such as food, event, act, location, artifact, etc. [sent-47, score-0.083]
33 For each noun, each element of the semantic category vector is assigned a value of 1, if the lemmatized noun (i. [sent-49, score-0.238]
34 , the associated uninflected dictionary entry) is assigned the associated semantic category by WordNet, otherwise, the element is assigned a value of 0. [sent-51, score-0.204]
35 (If a semantic category vector is entirely populated by zeros, then that noun has not been assigned any semantic category information by WordNet. [sent-52, score-0.322]
36 ) We expected the cross-product of the semantic category vectors of the two nouns in the NN compound to roughly encode the possible semantic relationships between the two nouns, which, following the semantic composition theory, correlates with prominence assignment to some extent. [sent-53, score-1.159]
37 4 Semantic Informativeness Features For each noun in each NN compound, we also maintain three semantic informativeness features: (1) Number of possible synsets associated with the noun. [sent-54, score-0.599]
38 A synset is a set of words that have the same sense or meaning. [sent-55, score-0.044]
39 (2) Left positional family size and (3) Right positional family size. [sent-56, score-0.278]
40 Positional family size is the number of unique NN compounds that include the particular noun, either on the left or on the right (Bell and Plag, 2010). [sent-57, score-0.406]
41 The intuition behind extracting synset counts and positional family size was, once again, to measure the relative informativeness of the nouns in NN compounds. [sent-59, score-0.754]
42 Smaller synset counts indicate more specific meaning of the noun, and thus perhaps more information content. [sent-60, score-0.07]
43 Larger right (or left) positional family size indicates that the noun is present in the right (left) position of many possible NN compounds, and is thus less likely to receive higher prominence in such compounds. [sent-61, score-0.876]
44 These features capture type-based informativeness, in contrast to the measures described in Section 2, which capture token-based informativeness. [sent-62, score-0.036]
45 5 Experimental evaluation For our evaluation, we used a hand-labeled corpus of 7831 NN compounds randomly selected from the 1990 Associated Press newswire, and hand-tagged for leftmost or rightmost prominence (Sproat, 1994). [sent-63, score-0.964]
46 This corpus contains 64 pairs of NN compounds that differ in terms of capitalization but not in terms of relative prominence assignment. [sent-64, score-0.991]
47 It only contains four pairs of NN compounds that differ in terms of capitalization and in terms of relative prominence assignment. [sent-65, score-0.991]
48 Since there is not enough data in this corpus to consider capitalization as a feature, we removed the case information (by lowercasing the entire corpus), and removed any duplicates. [sent-66, score-0.055]
49 For each of the NN compounds in this corpus, we computed the three aforementioned feature sets. [sent-69, score-0.352]
50 To compute the informativeness features, we used the LDC English Gigaword corpus. [sent-70, score-0.442]
51 The semantic category vectors and the semantic informativeness features were obtained from WordNet. [sent-71, score-0.615]
52 Using each of the three feature sets individually as well as combined together, we built automatic relative prominence prediction models using Boostexter, a discriminative classification model based on the boosting family of algorithms, which was first proposed in Freund and Schapire (1996). [sent-72, score-0.881]
53 For each test case, the output of the prediction models was either a 0 (indicating that the leftmost noun receives higher prominence) or a 1 (indicating that the rightmost noun receives higher prominence). [sent-74, score-0.393]
54 We estimated the model error of the different prediction models by computing the relative error reduction from the baseline error. [sent-75, score-0.26]
55 The baseline error was obtained by assigning the majority class to all test cases. [sent-76, score-0.097]
56 We also present the results of building prediction models by combining different feature sets. [sent-84, score-0.084]
57 These results show that each of the prediction models reduces the baseline error, thus indicating that the different types of feature sets are each correlated with prominence assignment in NN compounds to some extent. [sent-85, score-1.11]
58 However, it appears that some feature sets are more predictive than others. [sent-86, score-0.055]
59 Of the individual feature sets, SRF and INF features appear to be more predictive than the SIF features. [sent-87, score-0.084]
60 Combined together, the three feature sets are most predictive, reducing model error over the baseline error by almost 33% (compared to 16-22% for individual feature sets), though combining INF with SRF features almost achieves the same reduction in baseline error. [sent-88, score-0.228]
61 Note that none of the three types of feature sets that we have defined contain any direct lexical information such as the nouns themselves or their lemmata. [sent-89, score-0.123]
62 However, considering that the lexical content of the words is a rich source of information that could have substantial predictive power, we included the lemmata associated with the nouns in the NN compounds as additional features to each feature set and rebuilt the prediction models. [sent-90, score-0.594]
63 Indeed, addition of the lemmatized form of the NN compounds substantially increases the predictive power of all the models. [sent-92, score-0.371]
64 The baseline error is reduced by almost 50% in each of the models, with the error reduction being the greatest (53%) for the model built by combining all three feature sets. [sent-93, score-0.148]
65 6 Discussion and Conclusion Several other studies have examined the main idea of relative prominence assignment using one or more of the theories that we have focused on in this paper (though the particular tasks and terminology used were different) and found similar results. [sent-94, score-0.883]
66 For example, Pan and Hirschberg (2000) have used some of the same informativeness measures (denoted by INF above) to predict pitch accent placement in word bigrams. [sent-95, score-0.742]
67 Since pitch accents and perception of prominence are strongly correlated, their conclusion that informativeness measures are a good predictor of pitch accent placement agrees with our conclusion that informativeness measures are useful predictors of relative prominence assignment. [sent-96, score-2.543]
68 However, we cannot compare their results to ours directly, since their corpus and baseline error measurement2 were different from ours. [sent-97, score-0.07]
69 His feature set was developed to model the semantic relationship between the two nouns in the NN compound, and included the lemmata related to the nouns. [sent-100, score-0.242]
70 The model was trained and tested on the same hand-labeled corpus that we used for this study and the baseline error was measured in the same way. [sent-101, score-0.095]
71 2 Pan and Hirschberg present the error obtained by using a unigram-based predictability model as their baseline error. [sent-103, score-0.194]
72 It is unclear what error would be obtained by assigning left prominence to all words in their database, which is how our baseline error was measured. [sent-104, score-0.71]
73 In this paper, we have focused on two theories of prominence assignment in NN compounds: the informativeness theory and the semantic composition theory. [sent-105, score-0.614]
74 Our evaluation indicates that each of these theories is relevant, though perhaps to different degrees. [sent-107, score-0.171]
75 This is supported by the observation that the combined model (in Table 1) is substantially more predictive than any of the individual models. [sent-108, score-0.051]
76 This indicates that the different feature sets capture different correlations, and that perhaps each of the theories (on which the feature sets are based) accounts for different types of variability in prominence assignment. [sent-109, score-0.864]
77 Our results also highlight the difference between being able to use lexical information in prominence prediction of NN compounds and not being able to do so. [sent-110, score-0.667]
78 Using lexical features, we can improve prediction over the default case (i. [sent-111, score-0.084]
79 , assigning prominence to the left noun in all cases) by over 50%. [sent-113, score-0.714]
80 But if the given input is an out-of-vocabulary NN compound, our non-lexically enhanced best model can still improve prediction over the default by about 33%. [sent-114, score-0.084]
81 Informativeness is a determinant of compound stress in English. [sent-121, score-0.238]
82 Compounding and stress in English: A closer look at the boundary between morphology and syntax. [sent-178, score-0.085]
83 The variability of compound stress in English: structural, semantic and analogical factors. [sent-198, score-0.379]
wordName wordTfidf (topN-words)
[('prominence', 0.583), ('informativeness', 0.442), ('compounds', 0.292), ('nn', 0.198), ('compound', 0.153), ('pkl', 0.149), ('theories', 0.145), ('sproat', 0.135), ('accent', 0.132), ('predictability', 0.124), ('pitch', 0.096), ('stress', 0.085), ('prediction', 0.084), ('positional', 0.077), ('plag', 0.074), ('noun', 0.074), ('assignment', 0.071), ('nouns', 0.068), ('srf', 0.065), ('inf', 0.064), ('family', 0.062), ('semantic', 0.061), ('relative', 0.061), ('dice', 0.06), ('theory', 0.056), ('composition', 0.055), ('capitalization', 0.055), ('pan', 0.055), ('pmi', 0.052), ('schapire', 0.051), ('category', 0.051), ('predictive', 0.051), ('park', 0.05), ('florham', 0.05), ('ladd', 0.05), ('programmer', 0.05), ('sif', 0.05), ('silk', 0.05), ('liberman', 0.05), ('leftmost', 0.046), ('error', 0.045), ('variability', 0.044), ('synset', 0.044), ('earch', 0.044), ('lemmata', 0.044), ('rightmost', 0.043), ('pie', 0.04), ('bell', 0.039), ('madison', 0.038), ('posits', 0.038), ('prob', 0.038), ('measures', 0.036), ('predicting', 0.036), ('relationship', 0.036), ('receive', 0.036), ('boosting', 0.036), ('placement', 0.036), ('analogical', 0.036), ('tts', 0.034), ('feature', 0.033), ('ave', 0.033), ('freund', 0.033), ('predictable', 0.033), ('wordnet', 0.031), ('left', 0.03), ('bp', 0.03), ('self', 0.03), ('tie', 0.03), ('hirschberg', 0.029), ('reliance', 0.029), ('prosodic', 0.028), ('lemmatized', 0.028), ('assigning', 0.027), ('cat', 0.027), ('avenue', 0.027), ('aforementioned', 0.027), ('informative', 0.027), ('chicago', 0.026), ('cherry', 0.026), ('perhaps', 0.026), ('baseline', 0.025), ('measured', 0.025), ('greater', 0.025), ('dc', 0.025), ('log', 0.025), ('assigned', 0.024), ('studies', 0.023), ('pointwise', 0.023), ('fellbaum', 0.023), ('english', 0.022), ('right', 0.022), ('associated', 0.022), ('sets', 0.022), ('vergence', 0.022), ('failing', 0.022), ('nonsymmetric', 0.022), ('qw', 0.022), ('vlc', 0.022), ('intelligibility', 0.022), ('bolinger', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: There are several theories regarding what influences prominence assignment in English noun-noun compounds. We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicate that though both of these theories are relevant, they account for different types of variability in prominence assignment.
2 0.20037305 193 acl-2011-Language-independent compound splitting with morphological operations
Author: Klaus Macherey ; Andrew Dai ; David Talbot ; Ashok Popat ; Franz Och
Abstract: Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound into its parts. Furthermore, monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate our method within a machine translation task and show significant improvements for various languages to show the versatility of the approach.
3 0.10102486 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
Author: Sara Stymne
Abstract: In this thesis proposal I present my thesis work, about pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition I also focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed.
4 0.06289535 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns
Author: Je Hun Jeon ; Wen Wang ; Yang Liu
Abstract: In this paper, we adopt an n-best rescoring scheme using pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, thus allowing us to develop it independently. N-best hypotheses from recognizers are rescored by additional scores that measure the correlation of the pitch-accent patterns between the acoustic signal and lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups: the first one is English radio news data that has pitch accent labels, but the recognizer is trained from a small amount ofdata and has high error rate; the second one is English broadcast news data using a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach is able to reduce word error rate relatively by about 3%. This gain is consistent across the two different tests, showing promising future directions of incorporating prosodic information to improve speech recognition.
5 0.056271721 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
Author: Zhongguo Li
Abstract: Lots of Chinese characters are very productive in that they can form many structured words either as prefixes or as suffixes. Previous research in Chinese word segmentation mainly focused on identifying only the word boundaries without considering the rich internal structures of many words. In this paper we argue that this is unsatisfying in many ways, both practically and theoretically. Instead, we propose that word structures should be recovered in morphological analysis. An elegant approach for doing this is given and the result is shown to be promising enough for encouraging further effort in this direction. Our probability model is trained with the Penn Chinese Treebank and actually is able to parse both word and phrase structures in a unified way. 1 Why Parse Word Structures? Research in Chinese word segmentation has progressed tremendously in recent years, with state of the art performing at around 97% in precision and recall (Xue, 2003; Gao et al., 2005; Zhang and Clark, 2007; Li and Sun, 2009). However, virtually all these systems focus exclusively on recognizing the word boundaries, giving no consideration to the internal structures of many words. Though it has been the standard practice for many years, we argue that this paradigm is inadequate both in theory and in practice, for at least the following four reasons. The first reason is that if we confine our definition of word segmentation to the identification of word boundaries, then people tend to have divergent 1405 opinions as to whether a linguistic unit is a word or not (Sproat et al., 1996). This has led to many different annotation standards for Chinese word segmentation. Even worse, this could cause inconsistency in the same corpus. For instance, 䉂 擌 奒 ‘vice president’ is considered to be one word in the Penn Chinese Treebank (Xue et al., 2005), but is split into two words by the Peking University corpus in the SIGHAN Bakeoffs (Sproat and Emerson, 2003). Meanwhile, 䉂 䀓 惼 ‘vice director’ and 䉂 䚲䡮 ‘deputy are both segmented into two words in the same Penn Chinese Treebank. In fact, all these words are composed of the prefix 䉂 ‘vice’ and a root word. Thus the structure of 䉂擌奒 ‘vice president’ can be represented with the tree in Figure 1. Without a doubt, there is complete agree- manager’ NN ,,ll JJf NNf 䉂 擌奒 Figure 1: Example of a word with internal structure. ment on the correctness of this structure among native Chinese speakers. So if instead of annotating only word boundaries, we annotate the structures of every word, then the annotation tends to be more 1 1Here it is necessary to add a note on terminology used in this paper. Since there is no universally accepted definition of the “word” concept in linguistics and especially in Chinese, whenever we use the term “word” we might mean a linguistic unit such as 䉂 擌奒 ‘vice president’ whose structure is shown as the tree in Figure 1, or we might mean a smaller unit such as 擌奒 ‘president’ which is a substructure of that tree. Hopefully, ProceedingPso orftla thned 4,9 Otrhe Agonnn,u Jauln Mee 1e9t-i2ng4, o 2f0 t1h1e. A ?c s 2o0ci1a1ti Aonss foocria Ctioomnp fourta Ctioomnaplu Ltaintigouniaslti Lcisn,g puaigsetsic 1s405–1414, consistent and there could be less duplication of efforts in developing the expensive annotated corpus. The second reason is applications have different requirements for granularity of words. Take the personal name 撱 嗤吼 ‘Zhou Shuren’ as an example. 
It’s considered to be one word in the Penn Chinese Treebank, but is segmented into a surname and a given name in the Peking University corpus. For some applications such as information extraction, the former segmentation is adequate, while for others like machine translation, the later finer-grained output is more preferable. If the analyzer can produce a structure as shown in Figure 4(a), then every application can extract what it needs from this tree. A solution with tree output like this is more elegant than approaches which try to meet the needs of different applications in post-processing (Gao et al., 2004). The third reason is that traditional word segmentation has problems in handling many phenomena in Chinese. For example, the telescopic compound 㦌 撥 怂惆 ‘universities, middle schools and primary schools’ is in fact composed ofthree coordinating elements 㦌惆 ‘university’, 撥 惆 ‘middle school’ and 怂惆 ‘primary school’ . Regarding it as one flat word loses this important information. Another example is separable words like 扩 扙 ‘swim’ . With a linear segmentation, the meaning of ‘swimming’ as in 扩 堑 扙 ‘after swimming’ cannot be properly represented, since 扩扙 ‘swim’ will be segmented into discontinuous units. These language usages lie at the boundary between syntax and morphology, and are not uncommon in Chinese. They can be adequately represented with trees (Figure 2). (a) NN (b) ???HHH JJ NNf ???HHH JJf JJf JJf 㦌 撥 怂 惆 VV ???HHH VV NNf ZZ VVf VVf 扩 扙 堑 Figure 2: Example of telescopic compound (a) and separable word (b). The last reason why we should care about word the context will always make it clear what is being referred to with the term “word”. 1406 structures is related to head driven statistical parsers (Collins, 2003). To illustrate this, note that in the Penn Chinese Treebank, the word 戽 䊂䠽 吼 ‘English People’ does not occur at all. Hence constituents headed by such words could cause some difficulty for head driven models in which out-ofvocabulary words need to be treated specially both when they are generated and when they are conditioned upon. But this word is in turn headed by its suffix 吼 ‘people’, and there are 2,233 such words in Penn Chinese Treebank. If we annotate the structure of every compound containing this suffix (e.g. Figure 3), such data sparsity simply goes away.
6 0.051943149 167 acl-2011-Improving Dependency Parsing with Semantic Classes
7 0.04601502 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
8 0.043464907 333 acl-2011-Web-Scale Features for Full-Scale Parsing
9 0.042500768 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
10 0.039401781 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
11 0.037546616 44 acl-2011-An exponential translation model for target language morphology
12 0.037181556 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
13 0.035503265 315 acl-2011-Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment
14 0.034973491 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus
15 0.034239996 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
16 0.033353172 163 acl-2011-Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes
17 0.033324882 159 acl-2011-Identifying Noun Product Features that Imply Opinions
18 0.03315381 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
19 0.03280279 257 acl-2011-Question Detection in Spoken Conversations Using Textual Conversations
20 0.032124192 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
topicId topicWeight
[(0, 0.103), (1, 0.007), (2, -0.017), (3, -0.019), (4, -0.024), (5, 0.018), (6, 0.046), (7, -0.009), (8, 0.026), (9, 0.023), (10, -0.017), (11, -0.027), (12, -0.012), (13, -0.002), (14, 0.028), (15, -0.005), (16, -0.049), (17, -0.005), (18, 0.012), (19, -0.014), (20, 0.043), (21, 0.02), (22, -0.056), (23, -0.008), (24, 0.009), (25, 0.009), (26, 0.068), (27, -0.009), (28, -0.053), (29, 0.027), (30, 0.006), (31, 0.025), (32, 0.096), (33, -0.07), (34, 0.034), (35, 0.031), (36, 0.026), (37, -0.029), (38, -0.086), (39, 0.127), (40, -0.053), (41, -0.113), (42, 0.011), (43, -0.074), (44, 0.044), (45, 0.069), (46, 0.063), (47, -0.072), (48, 0.003), (49, -0.003)]
simIndex simValue paperId paperTitle
same-paper 1 0.87018287 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: There are several theories regarding what influences prominence assignment in English noun-noun compounds. We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicate that though both of these theories are relevant, they account for different types of variability in prominence assignment.
2 0.7197755 193 acl-2011-Language-independent compound splitting with morphological operations
Author: Klaus Macherey ; Andrew Dai ; David Talbot ; Ashok Popat ; Franz Och
Abstract: Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound into its parts. Furthermore, monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate our method within a machine translation task and show significant improvements for various languages to show the versatility of the approach.
3 0.51068163 223 acl-2011-Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts
Author: Derya Ozkan ; Louis-Philippe Morency
Abstract: In many computational linguistic scenarios, training labels are subjectives making it necessary to acquire the opinions of multiple annotators/experts, which is referred to as ”wisdom of crowds”. In this paper, we propose a new approach for modeling wisdom of crowds based on the Latent Mixture of Discriminative Experts (LMDE) model that can automatically learn the prototypical patterns and hidden dynamic among different experts. Experiments show improvement over state-of-the-art approaches on the task of listener backchannel prediction in dyadic conversations.
4 0.47736853 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
Author: Sara Stymne
Abstract: In this thesis proposal I present my thesis work, about pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition I also focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed.
5 0.477364 10 acl-2011-A Discriminative Model for Joint Morphological Disambiguation and Dependency Parsing
Author: John Lee ; Jason Naradowsky ; David A. Smith
Abstract: Most previous studies of morphological disambiguation and dependency parsing have been pursued independently. Morphological taggers operate on n-grams and do not take into account syntactic relations; parsers use the “pipeline” approach, assuming that morphological information has been separately obtained. However, in morphologically-rich languages, there is often considerable interaction between morphology and syntax, such that neither can be disambiguated without the other. In this paper, we propose a discriminative model that jointly infers morphological properties and syntactic structures. In evaluations on various highly-inflected languages, this joint model outperforms both a baseline tagger in morphological disambiguation, and a pipeline parser in head selection.
6 0.47698978 124 acl-2011-Exploiting Morphology in Turkish Named Entity Recognition System
7 0.45653975 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns
8 0.45220837 125 acl-2011-Exploiting Readymades in Linguistic Creativity: A System Demonstration of the Jigsaw Bard
10 0.44289127 303 acl-2011-Tier-based Strictly Local Constraints for Phonology
11 0.43132794 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
12 0.42259234 75 acl-2011-Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction
13 0.40075621 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach
14 0.39873609 118 acl-2011-Entrainment in Speech Preceding Backchannels.
15 0.39840677 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus
16 0.39399087 229 acl-2011-NULEX: An Open-License Broad Coverage Lexicon
17 0.39374173 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
18 0.38402981 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
19 0.38182479 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity
20 0.38028216 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
topicId topicWeight
[(0, 0.322), (5, 0.026), (17, 0.046), (26, 0.023), (31, 0.012), (37, 0.089), (39, 0.039), (41, 0.051), (55, 0.02), (59, 0.053), (72, 0.045), (91, 0.036), (96, 0.115), (97, 0.024), (98, 0.015)]
simIndex simValue paperId paperTitle
1 0.87616265 134 acl-2011-Extracting and Classifying Urdu Multiword Expressions
Author: Annette Hautli ; Sebastian Sulger
Abstract: This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurrence of MWEs with distinct postpositions into account. The resulting classes are evaluated against a hand-annotated gold standard and achieve an f-score of 0.5 and 0.746 for locations and persons, respectively. A target application is the Urdu ParGram grammar, where MWEs are needed to generate a more precise syntactic and semantic analysis.
2 0.87134773 321 acl-2011-Unsupervised Discovery of Rhyme Schemes
Author: Sravana Reddy ; Kevin Knight
Abstract: This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
3 0.81121689 214 acl-2011-Lost in Translation: Authorship Attribution using Frame Semantics
Author: Steffen Hedegaard ; Jakob Grue Simonsen
Abstract: We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that framebased classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or superior to the baseline classifiers on translated texts; (iv) that—contrary to current belief—naïve clas- sifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.
same-paper 4 0.74660933 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: There are several theories regarding what influences prominence assignment in English noun-noun compounds. We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicate that though both of these theories are relevant, they account for different types of variability in prominence assignment.
5 0.72803879 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
Author: Michael Auli ; Adam Lopez
Abstract: We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser’s search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which to our knowledge has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.
8 0.49521863 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
9 0.48543328 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
10 0.48543003 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
11 0.48517895 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components
12 0.48240882 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
13 0.48195308 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
14 0.48182592 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
15 0.48149982 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution
16 0.48070186 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
17 0.47987407 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
18 0.47950521 311 acl-2011-Translationese and Its Dialects
19 0.47919193 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
20 0.47824332 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment