acl acl2013 acl2013-205 knowledge-graph by maker-knowledge-mining

205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints


Source: pdf

Author: Will Radford ; James R. Curran

Abstract: Appositions are adjacent NPs used to add information to a discourse. We propose systems exploiting syntactic and semantic constraints to extract appositions from OntoNotes. Our joint log-linear model outperforms the state-of-the-art Favre and Hakkani-Tür (2009) model by ∼10% on Broadcast News, and achieves 54.3% F-score on multiple genres.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Appositions are adjacent NPs used to add information to a discourse. [sent-5, score-0.064]

2 We propose systems exploiting syntactic and semantic constraints to extract appositions from OntoNotes. [sent-6, score-0.534]

3 Our joint log-linear model outperforms the state-of-the-art Favre and Hakkani-Tür (2009) model by ∼10% on Broadcast News, and achieves 54. [sent-7, score-0.038]

4 1 Introduction Appositions are typically adjacent coreferent noun phrases (NP) that often add information about named entities (NEs). [sent-9, score-0.064]

5 The apposition in Figure 1 consists of three comma-separated NPs: the first NP (HEAD) names an entity and the others (ATTRs) supply age and profession attributes. [sent-10, score-0.519]

6 Attributes can be difficult to identify despite characteristic punctuation cues, as punctuation plays many roles and attributes may have rich substructure. [sent-11, score-0.12]

7 While linguists have studied apposition in detail (Quirk et al. [sent-12, score-0.519]

8 , 1985; Meyer, 1992), most apposition extraction has been within other tasks, such as coreference resolution (Luo and Zitouni, 2005; Culotta et al. [sent-13, score-0.754]

9 We analyze apposition distribution in OntoNotes 4 (Pradhan et al. [sent-16, score-0.519]

10 , 2007) and compare rule-based, classification and parsing extraction systems. [sent-17, score-0.042]

11 Our best system uses a joint model to classify pairs of NPs with features that faithfully encode syntactic and semantic restrictions on appositions, using parse trees and WordNet synsets. [sent-18, score-0.179]

12 Figure 1: Example apposition from OntoNotes 4. Our approach substantially outperforms Favre and Hakkani-Tür on Broadcast News (BN) at 54. [sent-23, score-0.519]

13 Our results will immediately help the many systems that already use apposition extraction components, such as coreference resolution and IE. [sent-26, score-0.754]

14 They are usually composed of two or more adjacent NPs, hierarchically structured, so one is the head NP (HEAD) and the rest attributes (ATTRs). [sent-29, score-0.183]

15 They are often flagged using punctuation in text and pauses in speech. [sent-30, score-0.094]

16 propose three tests for apposition: i) each phrase can be omitted without affecting sentence acceptability, ii) each fulfils the same syntactic function in the resultant sentences, iii) extralinguistic reference is unchanged. [sent-33, score-0.037]

17 We adopt the OntoNotes guidelines’ relatively strict interpretation: “a noun phrase that modifies an immediately-adjacent noun phrase (these may be separated by only a comma, colon, or parenthesis).” [sent-39, score-0.104]

18 [Table 1: Sentence and apposition distribution across the TRAIN/DEV/TEST and TRAINF/DEVF/TESTF splits] Apposition extraction is a common component in many NLP tasks: coreference resolution (Luo and Zitouni, 2005; Culotta et al. [sent-44, score-0.754]

19 , 2007; Bengtson and Roth, 2008; Poon and Domingos, 2008), textual entailment (Roth and Sammons, 2007; Cabrio and Magnini, 2010), sentence simplification (Miwa et al. [sent-45, score-0.074]

20 Despite this, few papers to our knowledge explicitly evaluate apposition extraction. [sent-51, score-0.519]

21 Moreover, apposition extraction is rarely the main research goal and descriptions of the methods used are often accordingly terse or do not match our guidelines. [sent-52, score-0.561]

22 (2011) use rules to extract appositions for coreference resolution, selecting only those that are explicitly flagged using commas or parentheses. [sent-54, score-0.595]

23 While such differences capture useful information for coreference resolution, these methods would be unfairly disadvantaged in a direct evaluation. [sent-56, score-0.12]

24 Favre and Hakkani-Tür (2009, FHT) directly evaluate three extraction systems on OntoNotes 2. [sent-57, score-0.042]

25 The first retrains the Berkeley parser (Petrov and Klein, 2007) on trees labelled with appositions by appending the HEAD and ATTR suffix to NPs; we refer to this as a Labelled Berkeley Parser (LBP). [sent-59, score-0.516]

26 The second is a CRF labelling words using an IOB apposition scheme. [sent-60, score-0.556]

27 Their focus on BN automated speech recognition (ASR) output, which precludes punctuation cues, means their results do not indicate how well the methods perform on textual genres. [sent-68, score-0.06]

28 Moreover, all systems use parsers or parse-label features and do not completely evaluate non-parser methods for extraction, despite including baselines. [sent-69, score-0.042]

29 We manually adjust appositions that do not have exactly one HEAD and one or more ATTRs.1 [sent-87, score-0.441]

30 Some appositions are nested, and we keep only “leaf” appositions, removing the higher-level appositions. [sent-88, score-0.486]

31 OntoNotes 4 is made up of a wide variety of sources: broadcast conversation and news, magazine, newswire and web text. [sent-91, score-0.037]

32 Appositions are most frequent in newswire (one per 192 words) and least common in broadcast conversation (one per 645 words), with the others in between (around one per 315 words). [sent-92, score-0.184]

33 Table 1 shows the distribution of sentences and appositions (HEAD-ATTR pairs). [sent-96, score-0.441]

34 1 Analysis Most appositions in TRAIN have one ATTR (97. [sent-98, score-0.441]

35 Comma-separated apposition is the most common (63%) and 93% are separated by zero or one token. [sent-104, score-0.519]

36 4 Extracting Appositions We investigate different extraction systems using a range of syntactic information. [sent-138, score-0.092]

37 Our systems that use syntactic parses generate candidates (pairs of NPs: p1 and p2) that are then classified as apposition or not. [sent-139, score-0.601]

38 This paper contributes three complementary techniques for more faithfully modelling apposition. [sent-140, score-0.06]

39 Any adjacent NPs, disregarding intervening punctuation, could be considered candidates; however, stronger syntactic constraints that only allow sibling NP children provide higher-precision candidate sets. [sent-141, score-0.157]
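
As a concrete illustration of this candidate-generation step, here is a minimal sketch over nltk parse trees; the tree layout, the pairing logic and the example sentence are our assumptions, not the authors' implementation:

```python
from nltk.tree import Tree

def sibling_np_candidates(tree):
    """Collect (p1, p2) pairs of adjacent NP children under the same
    parent node, i.e. the stricter, sibling-constrained candidate set."""
    candidates = []
    for subtree in tree.subtrees():
        nps = [child for child in subtree
               if isinstance(child, Tree) and child.label().startswith('NP')]
        # pair consecutive NP siblings; intervening commas are not NPs,
        # so comma-separated appositive NPs still end up paired
        candidates.extend(zip(nps, nps[1:]))
    return candidates

sent = Tree.fromstring(
    "(S (NP (NP (NNP John) (NNP Smith)) (, ,) (NP (DT the) (NN director)))"
    " (VP (VBD resigned)) (. .))")
for p1, p2 in sibling_np_candidates(sent):
    print(' '.join(p1.leaves()), '||', ' '.join(p2.leaves()))
# John Smith || the director
```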

40 Semantic compatibility features encode that an ATTR provides consistent information for its HEAD. [sent-142, score-0.045]

41 A joint classifier models the complete apposition rather than combining separate phrase-wise decisions. [sent-143, score-0.557]

42 Pattern POS, NE and lexical patterns are used to extract appositions avoiding parsing’s computational overhead. [sent-146, score-0.441]
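
The paper does not reproduce its patterns, so the following is only a hedged sketch of what a parse-free extractor over (token, NE-tag) sequences can look like; the specific pattern shape and tag names are invented for illustration:

```python
def pattern_candidates(tokens, ne_tags, role_words):
    """Scan for '<NE span> , <attribute span> ,|.' shapes, proposing
    HEAD/ATTR pairs without any parse."""
    pairs = []
    i, n = 0, len(tokens)
    while i < n:
        if ne_tags[i] != 'O':
            # take the maximal NE span as a HEAD candidate
            j = i
            while j < n and ne_tags[j] == ne_tags[i]:
                j += 1
            if j < n and tokens[j] == ',':
                # collect the attribute up to the next comma or period
                k = j + 1
                while k < n and tokens[k] not in {',', '.'}:
                    k += 1
                attr = tokens[j + 1:k]
                # lexical filter: the last word must be a licensed role noun
                if attr and attr[-1].lower() in role_words:
                    pairs.append((tokens[i:j], attr))
            i = j
        else:
            i += 1
    return pairs

tokens = 'John Smith , the company director , resigned .'.split()
ne_tags = ['PER', 'PER', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
print(pattern_candidates(tokens, ne_tags, {'director'}))
# [(['John', 'Smith'], ['the', 'company', 'director'])]
```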

43 The “role” gazetteer is the transitive closure of hyponyms of the WordNet (Miller, 1995) synset person. [sent-149, score-0.116]
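
NLTK's WordNet interface makes this gazetteer construction a few lines, since Synset.closure walks a relation to a fixed point, matching the transitive-closure description above; the choice of the sense person.n.01 is our assumption:

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

person = wn.synset('person.n.01')      # assumed sense of 'person'
role_gazetteer = set()
# transitive closure of the hyponym relation under 'person'
for synset in person.closure(lambda s: s.hyponyms()):
    for lemma in synset.lemmas():
        role_gazetteer.add(lemma.name().replace('_', ' ').lower())

print('director' in role_gazetteer)    # True: a hyponym of person
```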

44 Tuples are post-processed to remove spurious appositions.2 (Footnote 2: There is some overlap between TRAIN and DEVF/TESTF, with appositions from the latter used in rule generation.) [sent-154, score-0.441]

45 Rule We only consider HEADs whose syntactic head is a PER, ORG, LOC or GPE NE. [sent-157, score-0.139]

46 We formalise semantic compatibility by requiring the ATTR head to match a gazetteer dependent on the HEAD’s NE type. [sent-158, score-0.25]
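
Reduced to a sketch, the compatibility test is a lookup from the HEAD's NE type to a gazetteer, then a membership check on the ATTR's head; the gazetteer entries below are placeholders, not the ones derived in the paper:

```python
# NE-type-keyed gazetteers; the real ones were built from WordNet
# hypernyms of common ATTR heads in TRAIN; these entries are toy data
GAZETTEERS = {
    'PER': {'director', 'actor', 'attorney', 'chairman'},
    'ORG': {'bank', 'company', 'committee'},
    'LOC': {'city', 'region', 'island'},
}
GAZETTEERS['GPE'] = GAZETTEERS['LOC']  # assumption: GPE shares LOC roles

def semantically_compatible(head_ne_type, attr_head_word):
    """True iff the ATTR head word is licensed for the HEAD's NE type."""
    gazetteer = GAZETTEERS.get(head_ne_type)
    return gazetteer is not None and attr_head_word.lower() in gazetteer

print(semantically_compatible('PER', 'actor'))  # True
print(semantically_compatible('PER', 'bank'))   # False
```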

47 To create PER, ORG and LOC gazetteers, we identified common ATTR heads in TRAIN and looked for matching WordNet synsets, selecting the most general hypernym that was still semantically compatible with the HEAD’s NE type. [sent-159, score-0.06]

48 We use partitive and NML-aware rules (Collins, 1999; Vadas and Curran, 2007) to extract syntactic heads from ATTRs. [sent-162, score-0.11]
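
A full implementation of those head rules is long; the sketch below keeps only the rightmost-nominal core with recursion through NML nodes and omits the partitive special cases, so treat it as a simplification rather than the cited rule set:

```python
from nltk.tree import Tree

NOMINAL_TAGS = ('NN', 'NNS', 'NNP', 'NNPS')

def syntactic_head(np):
    """Rightmost-nominal head finder, recursing through NML/NP children;
    a simplification of the Collins and Vadas-and-Curran rules."""
    for child in reversed(list(np)):
        if isinstance(child, Tree):
            if child.label() in NOMINAL_TAGS:
                return child[0]                  # the word under the POS tag
            if child.label() in ('NML', 'NP'):   # descend into nested nominals
                return syntactic_head(child)
    leaves = np.leaves()
    return leaves[-1] if leaves else None        # fallback: last word

attr = Tree.fromstring('(NP (JJ former) (NML (NN company) (NN chairman)))')
print(syntactic_head(attr))  # chairman
```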

49 Extracted tuples are post-processed as for Pattern and reranked by the OntoNotes specificity scale (i. [sent-166, score-0.066]

50 Labelled Berkeley Parser We train an LBP on TRAIN and recover appositions from parsed sentences. [sent-172, score-0.475]

51 Without syntactic constraints this is equivalent to FHT’s LBP system (LBPF) and indicated by † in Tables. [sent-173, score-0.093]

52 We use a log-linear model with an SGD optimizer from scikit-learn (Pedregosa et al., 2011).3 (Footnote 3: Full description: http://schwa. …) [sent-175, score-0.049]
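
A minimal sketch of such a classifier with current scikit-learn: loss='log_loss' makes SGDClassifier fit a logistic-regression (log-linear) model, and DictVectorizer handles binary feature dictionaries. The feature names and hyperparameters here are illustrative, not the paper's:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# toy stand-ins for the binary feature dictionaries described here
train_X = [
    {'p1_ne=PER': 1, 'p2_gaz=PER': 1, 'sep=comma': 1},
    {'p1_ne=ORG': 1, 'p2_gaz=none': 1, 'sep=none': 1},
]
train_y = [1, 0]  # apposition vs. not

# older scikit-learn versions spell the loss 'log' instead of 'log_loss'
model = make_pipeline(
    DictVectorizer(),
    SGDClassifier(loss='log_loss', alpha=1e-4, max_iter=1000),
)
model.fit(train_X, train_y)
print(model.predict([{'p1_ne=PER': 1, 'p2_gaz=PER': 1, 'sep=comma': 1}]))
```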

53 [Table: Model / Full system / -syn / -sem / -both / +gold; shows the impact of removing constraints/features, and +gold shows the impact of parse and tagging errors.] [sent-191, score-0.137]

54 The binary features are calculated from a generated candidate phrase (p) and are the same as FHT’s phrase system (PhraseF), denoted ‡ in Tables. [sent-194, score-0.074]

55 To decode classifications, adjacent apposition-classified NPs are re-ordered by specificity. [sent-196, score-0.064]
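
A sketch of that decoding step; the three-level specificity ordering below (proper name, then common nominal, then pronoun) is our reading of the OntoNotes scale, not a quotation from the guidelines:

```python
# assumed specificity scale: lower rank means more specific
SPECIFICITY_RANK = {'proper': 0, 'common': 1, 'pronoun': 2}

def decode_apposition(phrases):
    """Given adjacent NPs classified as appositive, make the most
    specific phrase the HEAD and the remainder ATTRs."""
    ranked = sorted(phrases, key=lambda p: SPECIFICITY_RANK[p['kind']])
    return {'HEAD': ranked[0], 'ATTRS': ranked[1:]}

phrases = [{'text': 'the director', 'kind': 'common'},
           {'text': 'John Smith', 'kind': 'proper'}]
print(decode_apposition(phrases)['HEAD']['text'])  # John Smith
```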

56 , “{the director}a” or “{her husband}a”); p’s syntactic head matches an NE-specific semantic gazetteer (e. [sent-199, score-0.166]

57 , “{the famous actor}a” → PER, “{investment bank}a” → ORG); p’s syntactic head has the POS CD (e. [sent-201, score-0.05]

58 The system uses the phrase model features as above, as well as pairwise features: the cross-product of selected features for p1 and p2: gazetteer matches, NE type, specificity rank (a sketch follows the example below). [sent-210, score-0.219]

59 For example, if the HEAD has the NE type PER and the ATTR has its syntactic head in the PER gazetteer: “{Tom Cruise}h, {famous actor}a” → (p1: PER, p2: PER-gaz); whether semantic features are found in both p1 and p2; p1/p2 specificity (e. [sent-212, score-0.205]
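
The cross-product features above can be sketched as string-concatenated indicators over the two phrases' individual features; the attribute names are invented for illustration:

```python
def cross_product_features(p1_feats, p2_feats, keys=('ne', 'gaz', 'spec')):
    """One fired indicator per (p1 value, p2 value) pairing of each
    selected attribute, plus a joint semantic-match signal."""
    feats = {}
    for key in keys:
        feats[f'pair_{key}={p1_feats.get(key)}|{p2_feats.get(key)}'] = 1
    # joint signal: semantic evidence found in both phrases
    if p1_feats.get('gaz') and p2_feats.get('gaz'):
        feats['both_semantic'] = 1
    return feats

head = {'ne': 'PER', 'gaz': None, 'spec': 'name'}
attr = {'ne': None, 'gaz': 'PER', 'spec': 'common'}
print(cross_product_features(head, attr))
# {'pair_ne=PER|None': 1, 'pair_gaz=None|PER': 1, 'pair_spec=name|common': 1}
```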

60 Table 4 shows our systems’ performance on the multi-genre DEV dataset and the impact of removing syntactic constraints, semantic features and parse/tag error. [sent-217, score-0.141]

61 All other results use parses and, although it has a low F-score, the Adjacent NPs’ 65. [sent-220, score-0.032]

62 1% recall, without syntactic constraints, is the upper bound for the parse-based systems. [sent-221, score-0.05]

63 Statistical models improve performance, with the joint models better than the higher-precision phrase model as the latter must make two independently correct classification decisions. [sent-222, score-0.075]

64 9% using a joint model over the de-labelled trees produced by the LBP. [sent-224, score-0.069]

65 This indicates that although our model does not use the apposition labels from the tree, the tree is a more suitable structure for extraction. [sent-225, score-0.519]

66 Removing syntactic constraints mostly reduces performance in parse-based systems as the system must consider lower-quality candidates. [sent-229, score-0.093]

67 Removing semantic features has less impact and removing both is most detrimental to performance. [sent-231, score-0.091]

68 These features have less impact on joint models; indeed, joint performance using BP trees increases without the features, perhaps as joint models already model the syntactic context. [sent-232, score-0.241]

69 We evaluate the impact of parser and tagger error by using gold-standard resources. [sent-233, score-0.09]

70 Goldstandard tags and trees improve recall in all cases leading to F-score improvements (+gold). [sent-234, score-0.031]

71 The pattern system is reasonably robust to automatic tagging errors, but parse-based models suffer considerably from automatic parses. [sent-235, score-0.031]

72 To compare the impact of tagging and parsing error, we configure the joint system to use gold parses and automatic NE tags and vice versa. [sent-236, score-0.116]

73 Using automatic tags does not greatly impact performance (-1. [sent-237, score-0.046]

74 3%), whereas (Footnote 4: We do not implement the IOB or use LBP features for TRAIN, as these would require n-fold parser training.) [sent-238, score-0.044]

75 [Table: Error / BP / LBP / δ] using automatic parses causes a drop of around 20% to 57. [sent-243, score-0.032]

76 7%, demonstrating that syntactic information is crucial for apposition extraction. [sent-244, score-0.569]

77 Our joint LBP system is substantially better, scoring 54. [sent-249, score-0.038]

78 Finally, we test whether labelling appositions can help parsing. [sent-255, score-0.478]

79 We parse DEV trees with LBP and BP, remove apposition labels and analyse the impact of labelling using the Berkeley Parser Analyser (Kummerfeld et al. [sent-256, score-0.633]

80 Table 6 shows the LBP makes fewer errors, particularly in the NP-internal structure, PP and clause attachment classes, at the cost of modifier attachment and coordination errors. [sent-258, score-0.094]

81 Rather than increasing parsing difficulty, apposition labels seem complementary, improving performance. [sent-259, score-0.519]

82 6 Conclusion We present three apposition extraction techniques. [sent-260, score-0.561]

83 Linguistic tests for apposition motivate strict syntactic constraints on candidates, and semantic features encode the addition of compatible information. [sent-261, score-0.642]

84 Joint models more faithfully capture apposition structure and our best system achieves state-of-the-art performance of 54. [sent-262, score-0.579]

85 Our results will immediately benefit the large number of systems with apposition extraction components for coreference resolution and IE. [sent-264, score-0.754]

86 Supporting the adaptation of texts for poor literacy readers: a text simplification editor for Brazilian Portuguese. [sent-285, score-0.037]

87 Phrase and word level strategies for detecting appositions in speech. [sent-309, score-0.441]

88 Parser showdown at the Wall Street corral: An empirical investigation of error types in parser output. [sent-316, score-0.044]

89 Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. [sent-321, score-0.193]

90 CoNLL-2011 shared task: Modeling unrestricted coreference in OntoNotes. [sent-382, score-0.12]

91 Resolving attachment and clause boundary ambiguities for simplifying relative clause constructs. [sent-408, score-0.092]

92 Extraction of entailed semantic relations through syntaxbased comma resolution. [sent-413, score-0.042]

93 A more precise analysis of punctuation for broad-coverage surface realization with CCG. [sent-429, score-0.06]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('apposition', 0.519), ('appositions', 0.441), ('lbp', 0.245), ('ontonotes', 0.17), ('attr', 0.169), ('nps', 0.157), ('fht', 0.123), ('coreference', 0.12), ('gazetteer', 0.116), ('favre', 0.112), ('np', 0.11), ('ne', 0.11), ('head', 0.089), ('attrs', 0.087), ('iob', 0.08), ('pradhan', 0.077), ('lbpf', 0.074), ('phrasef', 0.074), ('resolution', 0.073), ('specificity', 0.066), ('curran', 0.066), ('adjacent', 0.064), ('punctuation', 0.06), ('heads', 0.06), ('faithfully', 0.06), ('org', 0.057), ('bn', 0.057), ('sammons', 0.054), ('berkeley', 0.053), ('weischedel', 0.051), ('syntactic', 0.05), ('attorney', 0.049), ('bengtson', 0.049), ('candido', 0.049), ('chwa', 0.049), ('gore', 0.049), ('hakkanit', 0.049), ('tribe', 0.049), ('per', 0.049), ('culotta', 0.048), ('impact', 0.046), ('lance', 0.045), ('compatibility', 0.045), ('dev', 0.045), ('removing', 0.045), ('parser', 0.044), ('loc', 0.044), ('maker', 0.043), ('smedt', 0.043), ('constraints', 0.043), ('luo', 0.043), ('sameer', 0.043), ('ramshaw', 0.043), ('extraction', 0.042), ('pos', 0.042), ('comma', 0.042), ('quirk', 0.041), ('organizing', 0.04), ('cabrio', 0.04), ('committee', 0.039), ('roth', 0.039), ('joint', 0.038), ('zitouni', 0.038), ('kummerfeld', 0.038), ('vadas', 0.038), ('miwa', 0.038), ('bbn', 0.038), ('daelemans', 0.038), ('siddharthan', 0.038), ('simplification', 0.037), ('broadcast', 0.037), ('entailment', 0.037), ('phrase', 0.037), ('labelling', 0.037), ('vancouver', 0.036), ('laurence', 0.036), ('pedregosa', 0.036), ('srikumar', 0.036), ('advaith', 0.036), ('flagged', 0.034), ('gpe', 0.034), ('train', 0.034), ('benoit', 0.033), ('ah', 0.033), ('ha', 0.033), ('parses', 0.032), ('ralph', 0.032), ('attachment', 0.032), ('trees', 0.031), ('meyer', 0.031), ('poon', 0.031), ('pattern', 0.031), ('strict', 0.03), ('walter', 0.03), ('hierarchically', 0.03), ('bp', 0.03), ('clause', 0.03), ('schapire', 0.029), ('os', 0.029), ('martha', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999905 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints

Author: Will Radford ; James R. Curran

Abstract: Appositions are adjacent NPs used to add information to a discourse. We propose systems exploiting syntactic and semantic constraints to extract appositions from OntoNotes. Our joint log-linear model outperforms the state-of-the-art Favre and Hakkani-Tür (2009) model by ∼10% on Broadcast News, and achieves 54.3% F-score on multiple genres.

2 0.14040601 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features

Author: Nathan Gilbert ; Ellen Riloff

Abstract: Most coreference resolvers rely heavily on string matching, syntactic properties, and semantic attributes of words, but they lack the ability to make decisions based on individual words. In this paper, we explore the benefits of lexicalized features in the setting of domain-specific coreference resolution. We show that adding lexicalized features to off-the-shelf coreference resolvers yields significant performance gains on four domain-specific data sets and with two types of coreference resolution architectures.

3 0.12449536 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

Author: Sebastian Martschat

Abstract: We present an unsupervised model for coreference resolution that casts the problem as a clustering task in a directed labeled weighted multigraph. The model outperforms most systems participating in the English track of the CoNLL'12 shared task.

4 0.10163282 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing

Author: Jonathan K. Kummerfeld ; Daniel Tse ; James R. Curran ; Dan Klein

Abstract: Aspects of Chinese syntax result in a distinctive mix of parsing challenges. However, the contribution of individual sources of error to overall difficulty is not well understood. We conduct a comprehensive automatic analysis of error types made by Chinese parsers, covering a broad range of error types for large sets of sentences, enabling the first empirical ranking of Chinese error types by their performance impact. We also investigate which error types are resolved by using gold part-of-speech tags, showing that improving Chinese tagging only addresses certain error types, leaving substantial outstanding challenges.

5 0.082410552 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning

Author: Emmanuel Lassalle ; Pascal Denis

Abstract: This paper proposes a new method for significantly improving the performance of pairwise coreference models. Given a set of indicators, our method learns how to best separate types of mention pairs into equivalence classes for which we construct distinct classification models. In effect, our approach finds an optimal feature space (derived from a base feature set and indicator set) for discriminating coreferential mention pairs. Although our approach explores a very large space of possible feature spaces, it remains tractable by exploiting the structure of the hierarchies built from the indicators. Our experiments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features. Our best system obtains a competitive 67.2 of average F1 over MUC, B3, and CEAF, which, despite its simplicity, places it above the mean score of other systems on these datasets.

6 0.081303947 80 acl-2013-Chinese Parsing Exploiting Characters

7 0.070927605 288 acl-2013-Punctuation Prediction with Transition-based Parsing

8 0.064977951 314 acl-2013-Semantic Roles for String to Tree Machine Translation

9 0.064435154 106 acl-2013-Decentralized Entity-Level Modeling for Coreference Resolution

10 0.062857494 267 acl-2013-PARMA: A Predicate Argument Aligner

11 0.058291264 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

12 0.057494126 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

13 0.055826031 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing

14 0.054269433 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

15 0.052421786 75 acl-2013-Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations

16 0.052222852 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

17 0.05178345 227 acl-2013-Learning to lemmatise Polish noun phrases

18 0.050778151 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

19 0.050538369 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition

20 0.049399607 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.153), (1, -0.022), (2, -0.07), (3, -0.046), (4, -0.035), (5, 0.077), (6, -0.002), (7, 0.026), (8, 0.028), (9, 0.023), (10, -0.01), (11, -0.007), (12, -0.035), (13, 0.039), (14, -0.059), (15, 0.055), (16, -0.026), (17, 0.107), (18, -0.066), (19, 0.074), (20, -0.076), (21, 0.048), (22, -0.005), (23, -0.098), (24, 0.026), (25, 0.054), (26, -0.046), (27, 0.012), (28, 0.079), (29, -0.023), (30, -0.003), (31, 0.04), (32, 0.05), (33, -0.022), (34, 0.009), (35, 0.025), (36, 0.057), (37, -0.062), (38, 0.008), (39, -0.036), (40, -0.058), (41, 0.022), (42, 0.015), (43, -0.042), (44, 0.0), (45, 0.074), (46, 0.011), (47, 0.009), (48, -0.014), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89634794 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints

Author: Will Radford ; James R. Curran

Abstract: Appositions are adjacent NPs used to add information to a discourse. We propose systems exploiting syntactic and semantic constraints to extract appositions from OntoNotes. Our joint log-linear model outperforms the state-of-the-art Favre and Hakkani-Tür (2009) model by ∼10% on Broadcast News, and achieves 54.3% F-score on multiple genres.

2 0.86179602 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features

Author: Nathan Gilbert ; Ellen Riloff

Abstract: Most coreference resolvers rely heavily on string matching, syntactic properties, and semantic attributes of words, but they lack the ability to make decisions based on individual words. In this paper, we explore the benefits of lexicalized features in the setting of domain-specific coreference resolution. We show that adding lexicalized features to off-the-shelf coreference resolvers yields significant performance gains on four domain-specific data sets and with two types of coreference resolution architectures.

3 0.81829846 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

Author: Sebastian Martschat

Abstract: We present an unsupervised model for coreference resolution that casts the problem as a clustering task in a directed labeled weighted multigraph. The model outperforms most systems participating in the English track of the CoNLL'12 shared task.

4 0.80560434 106 acl-2013-Decentralized Entity-Level Modeling for Coreference Resolution

Author: Greg Durrett ; David Hall ; Dan Klein

Abstract: Efficiently incorporating entity-level information is a challenge for coreference resolution systems due to the difficulty of exact inference over partitions. We describe an end-to-end discriminative probabilistic model for coreference that, along with standard pairwise features, enforces structural agreement constraints between specified properties of coreferent mentions. This model can be represented as a factor graph for each document that admits efficient inference via belief propagation. We show that our method can use entity-level information to outperform a basic pairwise system.

5 0.80258036 177 acl-2013-GuiTAR-based Pronominal Anaphora Resolution in Bengali

Author: Apurbalal Senapati ; Utpal Garain

Abstract: This paper attempts to use an off-the-shelf anaphora resolution (AR) system for Bengali. The language-specific preprocessing modules of GuiTAR (v3.0.3) are identified and suitably designed for Bengali. The anaphora resolution module is also modified or replaced in order to realize different configurations of GuiTAR. Performance of each configuration is evaluated, and the experiment shows that the off-the-shelf AR system can be effectively used for Indic languages.

6 0.70712012 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning

7 0.52563244 227 acl-2013-Learning to lemmatise Polish noun phrases

8 0.51356691 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation

9 0.50391698 225 acl-2013-Learning to Order Natural Language Texts

10 0.49866202 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information

11 0.49250248 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision

12 0.48854676 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits

13 0.47947535 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing

14 0.47586149 267 acl-2013-PARMA: A Predicate Argument Aligner

15 0.4717328 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)

16 0.46340108 371 acl-2013-Unsupervised joke generation from big data

17 0.45753294 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing

18 0.45375013 172 acl-2013-Graph-based Local Coherence Modeling

19 0.45227629 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies

20 0.45080835 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.045), (6, 0.052), (11, 0.071), (14, 0.025), (24, 0.034), (26, 0.049), (28, 0.011), (35, 0.065), (42, 0.051), (48, 0.04), (52, 0.012), (59, 0.244), (70, 0.05), (71, 0.033), (88, 0.044), (90, 0.024), (95, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.77955019 33 acl-2013-A user-centric model of voting intention from Social Media

Author: Vasileios Lampos ; Daniel Preoţiuc-Pietro ; Trevor Cohn

Abstract: Social Media contain a multitude of user opinions which can be used to predict realworld phenomena in many domains including politics, finance and health. Most existing methods treat these problems as linear regression, learning to relate word frequencies and other simple features to a known response variable (e.g., voting intention polls or financial indicators). These techniques require very careful filtering of the input texts, as most Social Media posts are irrelevant to the task. In this paper, we present a novel approach which performs high quality filtering automatically, through modelling not just words but also users, framed as a bilinear model with a sparse regulariser. We also consider the problem of modelling groups of related output variables, using a structured multi-task regularisation method. Our experiments on voting intention prediction demonstrate strong performance over large-scale input from Twitter on two distinct case studies, outperforming competitive baselines.

same-paper 2 0.75228882 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints

Author: Will Radford ; James R. Curran

Abstract: Appositions are adjacent NPs used to add information to a discourse. We propose systems exploiting syntactic and semantic constraints to extract appositions from OntoNotes. Our joint log-linear model outperforms the state-of-the-art Favre and Hakkani-Tür (2009) model by ∼10% on Broadcast News, and achieves 54.3% F-score on multiple genres.

3 0.73889196 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations

Author: Seid Muhie Yimam ; Iryna Gurevych ; Richard Eckart de Castilho ; Chris Biemann

Abstract: We present WebAnno, a general purpose web-based annotation tool for a wide range of linguistic annotations. WebAnno offers annotation project management, freely configurable tagsets and the management of users in different roles. WebAnno uses modern web technology for visualizing and editing annotations in a web browser. It supports arbitrarily large documents, pluggable import/export filters, the curation of annotations across various users, and an interface to farming out annotations to a crowdsourcing platform. Currently WebAnno allows part-ofspeech, named entity, dependency parsing and co-reference chain annotations. The architecture design allows adding additional modes of visualization and editing, when new kinds of annotations are to be supported.

4 0.55157095 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays

Author: Beata Beigman Klebanov ; Michael Flor

Abstract: We describe a new representation of the content vocabulary of a text, which we call a word association profile; it captures the proportions of highly associated, mildly associated, unassociated, and dis-associated pairs of words that co-exist in the given text. We illustrate the shape of the distribution and observe variation with genre and target audience. We present a study of the relationship between quality of writing and word association profiles. For a set of essays written by college graduates on a number of general topics, we show that the higher scoring essays tend to have higher percentages of both highly associated and dis-associated pairs, and lower percentages of mildly associated pairs of words. Finally, we use word association profiles to improve a system for automated scoring of essays.

5 0.54584008 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu

Abstract: Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.

6 0.54445022 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing

7 0.5429793 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

8 0.54296786 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing

9 0.54209965 333 acl-2013-Summarization Through Submodularity and Dispersion

10 0.53839731 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction

11 0.53805357 318 acl-2013-Sentiment Relevance

12 0.53740442 275 acl-2013-Parsing with Compositional Vector Grammars

13 0.53696358 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

14 0.53605425 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

15 0.53555548 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search

16 0.53522402 225 acl-2013-Learning to Order Natural Language Texts

17 0.53484082 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

18 0.53378975 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions

19 0.53338748 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning

20 0.53264517 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching