acl acl2013 acl2013-177 knowledge-graph by maker-knowledge-mining

177 acl-2013-GuiTAR-based Pronominal Anaphora Resolution in Bengali

Source: pdf

Author: Apurbalal Senapati ; Utpal Garain

Abstract: This paper attempts to use an off-the-shelf anaphora resolution (AR) system for Bengali. The language specific preprocessing modules of GuiTAR (v3.0.3) are identified and suitably designed for Bengali. Anaphora resolution module is also modified or replaced in order to realize different configurations of GuiTAR. Performance of each configuration is evaluated and experiment shows that the off-the-shelf AR system can be effectively used for Indic languages.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This paper attempts to use an off-the-shelf anaphora resolution (AR) system for Bengali. [sent-5, score-0.461]

2 Anaphora resolution module is also modified or replaced in order to realize different configurations of GuiTAR. [sent-9, score-0.294]

3 Performance of each configuration is evaluated and experiment shows that the off-the-shelf AR system can be effectively used for Indic languages. [sent-10, score-0.022]

4 1 Introduction Little computational linguistics research has been done for anaphora resolution (AR) in Indic languages. [sent-11, score-0.461]

5 Progress of the research through these works was difficult to quantify, as most of the authors used their self-generated datasets and in some cases the algorithms lack the details required to make them reproducible. [sent-21, score-0.02]

6 Bengali has been taken as the reference [sent-26, score-0.017]

7 language and GuiTAR (Poesio, 2004) has been considered as the reference off-the-shelf system. [sent-30, score-0.017]

8 Therefore, the central contribution of this paper is to develop the required resources for Bengali and provide them to GuiTAR for anaphora resolution. [sent-32, score-0.303]

9 Finally, GuiTAR anaphora resolution module is replaced by a previously developed approach (which is primarily rule-based, Senapati, 2011; Senapati, 2012a) and performances of different configurations are compared. [sent-34, score-0.577]

10 2 Language specific issues in GuiTAR GuiTAR has two major modules namely, preprocessing and anaphora resolution (Kabadjov, 2007). [sent-35, score-0.537]

11 In both of these modules modifications are required to fit it to Bengali. [sent-36, score-0.104]

12 Let's first identify the components in both of these two modules where replacement/modifications are needed. [sent-37, score-0.04]

13 Pre-processing: The purpose of this module is to make GuiTAR independent from input format specifications and variations. [sent-38, score-0.107]

14 In the case of text input, an XML file is generated by the LT-XML tool. [sent-40, score-0.05]

15 The XML file contains information such as word boundaries (tokens), grammatical classes (part-of-speech), and chunking information. [sent-41, score-0.091]

16 From the XML format MAS-XML (Minimum Anaphoric Syntax - XML) is produced to include minimal information namely, noun phrase boundaries, utterance boundaries, categories of pronoun, number information, gender information, etc. [sent-42, score-0.115]

17 All these aspects are to be addressed for Bengali so that for a given input discourse in Bengali, MAS-XML file can be generated correctly. [sent-43, score-0.074]

18 The pronouns (personal and possessive) are resolved by using (Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 126–130, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics) [sent-46, score-0.217]

19 an implementation of MARS (Mitkov, 2002), whereas different algorithms are used for resolving definite descriptions and proper nouns. [sent-48, score-0.022]

20 In Mitkov’s algorithm, whenever a pronoun is to be resolved, it finds a list of potential antecedents within a given ‘window’ and checks three types of syntactic agreement (i.e., [sent-49, score-0.19]

21 person, number and gender) between an antecedent and the pronoun. [sent-51, score-0.142]

22 We introduce suitable modifications in this module so that the same implementation of MARS can work for Bengali. [sent-55, score-0.134]

23 Table-1 categorizes all pronouns (522 in number) available in Bengali as observed in a corpus (Bengali corpus, undated) of 35 million words. [sent-59, score-0.218]

24 1 Number Acquisition for Nouns In Bengali, a set of nominal suffixes (Bhattacharya, 1993) (inflections and classifiers) is used to recognize the number (singular/plural) of a noun. [sent-61, score-0.031]

25 To identify the number of a noun, we check whether any of the nominal suffixes indicating plurality is attached to the noun. [sent-62, score-0.031]

26 If found, the number of the noun is tagged as plural. [sent-63, score-0.065]

27 From the corpus, we identified 17 such suffixes (e. [sent-64, score-0.031]
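
The suffix check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `noun_number` and the four suffixes are assumptions chosen for the example, and the paper's own list of 17 suffixes is not reproduced here.

```python
import unicodedata

# Illustrative plural markers only (assumption); the paper's 17 suffixes are not listed here.
PLURAL_SUFFIXES = ("গুলো", "গুলি", "দের", "রা")


def noun_number(noun, plural_suffixes=PLURAL_SUFFIXES):
    """Tag a noun as "plural" if it ends in a known plural suffix, else "singular"."""
    # Normalise so that composed and decomposed vowel signs compare equal.
    word = unicodedata.normalize("NFC", noun)
    for suffix in plural_suffixes:
        suffix = unicodedata.normalize("NFC", suffix)
        # Require a non-empty stem so a bare suffix is not tagged as a plural noun.
        if len(word) > len(suffix) and word.endswith(suffix):
            return "plural"
    return "singular"
```

A pure suffix match of this kind will mis-tag singular words that merely happen to end in the same letters, which is one plausible source of the number-acquisition errors reported in the error analysis.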

28 2 Honorificity of Nouns The honorific agreement exists in Bengali. [sent-70, score-0.096]

29 Honorificity of a noun is indicated by a word or expression with connotations conveying esteem or respect when used in addressing or referring to a person. [sent-71, score-0.067]

30 In Bengali, three degrees of honorificity are observed for the second person and two for the third person (Majumdar, 2000; Sengupta, 2000). [sent-72, score-0.319]

31 The second and third person pronouns have distinct forms for different degrees of honorificity. [sent-73, score-0.252]

32 Honorificity information is applicable for proper nouns (person) and nouns indicating relations like father, mother, teacher, etc. [sent-74, score-0.064]

33 The honorificity information is identified by maintaining a list of terms which can be considered as honorific addressing terms (e. [sent-75, score-0.321]

34 About 20 such terms are there in the list and we get these terms from analysis of the Bengali corpus. [sent-83, score-0.017]

35 When these terms are used to mark the honorificity of a noun, they appear either before or after the noun. [sent-84, score-0.226]

36 An additional way of identifying the honorificity information is to look at the inflection of the main verb, which is inflected with ন/n (i. [sent-85, score-0.192]

37 Honorificity is extracted during the preprocessing phase and added with the attribute hon = . [sent-89, score-0.036]

38 lowest degree of honor) based on their degree of honorificity. [sent-96, score-0.05]

39 For pronouns, this information is available from the pronoun list (honorific singular and honorific plural) as shown in Table-1. [sent-97, score-0.219]
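
The list-based honorificity check for nouns can be sketched as below. Everything named here is hypothetical: the romanized terms in `HONORIFIC_TERMS` stand in for the roughly 20 terms the authors collected (which are not reproduced in this summary), and the boolean result is only a placeholder for the `hon` attribute values, which are also not given here.

```python
# Illustrative, romanized honorific addressing terms (assumption, not the paper's list).
HONORIFIC_TERMS = {"sri", "srimati", "babu", "mahashay"}


def is_honorific(tokens, noun_index, terms=HONORIFIC_TERMS):
    """Return True if an honorific term appears directly before or after the noun."""
    neighbours = []
    if noun_index > 0:
        neighbours.append(tokens[noun_index - 1])
    if noun_index + 1 < len(tokens):
        neighbours.append(tokens[noun_index + 1])
    return any(token.lower() in terms for token in neighbours)
```

The verb-inflection cue mentioned above is not covered by this sketch; it would need morphological analysis of the main verb.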

40 4 GuiTAR for Bengali The following sections explain the modifications needed to configure GuiTAR for Bengali. [sent-98, score-0.044]

41 The tagger is trained on about 10,000 tagged sentences and achieves about 92% accuracy when tested on 2,000 sentences. [sent-101, score-0.031]

42 A rule based Bengali chunker (De, 2011) is used to get chunking information. [sent-102, score-0.049]

43 NEIs and their classes (person, location, and organization) are tagged manually (we did not find any Bengali NEI tool). [sent-103, score-0.031]

44 After adding all this information, the input text is formatted into the GuiTAR-specified input XML file and is converted into MAS-XML. [sent-104, score-0.066]

45 This file contains other syntactic information: person, types of pronouns, number and honorificity. [sent-105, score-0.05]

46 Information on person and types of pronouns comes from Table-1. [sent-106, score-0.252]

47 Number and honorificity are identified as explained before. [sent-107, score-0.192]

48 Gender information has little role in Bengali anaphora resolution and hence is not considered. [sent-108, score-0.461]

49 2 GuiTAR-based Pronoun Resolution for Bengali GuiTAR resolves pronouns using MARS approach (Mitkov, 2002) that makes use of several agreements (based on person, number and gender). [sent-111, score-0.262]

50 Certain changes are required here as gender agreement has no role. [sent-112, score-0.062]

51 This agreement has been replaced by the honorific agreement. [sent-113, score-0.128]

52 Moreover, the way pronouns are divided in the MARS implementation is not always relevant for Bengali pronouns. [sent-114, score-0.223]

53 For example, we do not differentiate between personal and possessive pronouns but they are separately treated in MARS. [sent-115, score-0.287]

54 In our case, we have only considered the personal and reflexive pronouns while applying the MARS-based implementation for anaphora resolution. [sent-116, score-0.593]

55 When more than one candidate antecedent is found, GuiTAR resolves the pronoun by using five antecedent indicators, namely aggregate score, immediate reference, collocational pattern, indicating verbs and referential distance. [sent-117, score-0.475]

56 For Bengali, the indicating-verb indicator has no role in filtering the antecedents and hence is removed. [sent-118, score-0.07]
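
The agreement filter with honorificity in place of gender can be sketched as follows. This is a simplified illustration under stated assumptions, not GuiTAR's MARS code: the `Mention` fields, the token-based `window`, and the `hon` labels are hypothetical, and choosing the nearest compatible candidate is only a stand-in for the antecedent indicators, whose scores are not given in this summary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    position: int   # token offset in the discourse (assumption)
    person: int     # 1, 2 or 3
    number: str     # "sg" or "pl"
    hon: str        # hypothetical honorificity label


def agrees(pronoun: Mention, candidate: Mention) -> bool:
    """Person, number and honorific agreement (honorificity replaces gender)."""
    return (pronoun.person == candidate.person
            and pronoun.number == candidate.number
            and pronoun.hon == candidate.hon)


def resolve(pronoun: Mention, candidates: List[Mention], window: int = 50) -> Optional[Mention]:
    """Keep preceding candidates inside the window that agree, then pick the nearest."""
    compatible = [c for c in candidates
                  if 0 < pronoun.position - c.position <= window and agrees(pronoun, c)]
    if not compatible:
        return None
    # Stand-in for indicator scoring: prefer the closest compatible candidate.
    return max(compatible, key=lambda c: c.position)
```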

57 5 Data and data format To evaluate the configured GuiTAR system, the dataset provided by ICON 2011 (ICON 2011) has been used. [sent-119, score-0.095]

58 They provided annotated data (POS tagged, chunked and named-entity tagged) for three Indian languages including Bengali. [sent-120, score-0.016]

59 We have changed this format into the GuiTAR-specified XML format and finally checked and corrected it manually. [sent-125, score-0.078]

60 Table 2: Description of ICON 2011 data format The ICON 2011 data contains nine texts from different domains (Tourism, Story, News article, Sports). [sent-129, score-0.039]

61 Table 3 shows the distribution of pronouns in the whole test data set for Bengali. [sent-132, score-0.201]

62 The dataset contains 1647 pronouns, of which 706 are personal pronouns (including reflexive pronouns). [sent-134, score-0.489]

63 As the MARS in GuiTAR resolves only personal pronouns, we have used only these personal pronouns for evaluation. [sent-135, score-0.361]

64 Three different systems are configured as described below: System-1 (Baseline): A baseline system is configured by considering the most recent noun phrase as the referent of a pronoun (the first noun phrase in the backward direction is the antecedent of a pronoun). [sent-136, score-0.428]

65 System-2 (GuiTAR with MARS): In this configuration, GuiTAR is used with the modifications (as described in Sec. [sent-137, score-0.044]

66 1) in its preprocessing module and the modified MARS (as described in Sec. [sent-139, score-0.104]

67 System-3 (GuiTAR with a new PAR module): Under this configuration, GuiTAR is used with the modifications (as described in Sec. [sent-142, score-0.044]

68 1) in its pre-processing module but MARS is replaced by a previously developed system (Senapati, 2011; Senapati, 2012a) for pronominal anaphora resolution in Bengali. [sent-144, score-0.61]

69 a possible antecedent) the method first maintains a list of possible pronouns which the antecedent could attach with (note that not every noun phrase can be referred to by every pronoun). [sent-148, score-0.394]

70 On encountering a pronoun, the method searches for the antecedents for which the pronoun is in the respective pronoun lists. [sent-149, score-0.15]
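
The contrast between the System-1 baseline and the pronoun-list lookup of System-3 can be sketched as below. The function names, the `(position, noun_phrase, allowed_pronouns)` layout and the romanized tokens are assumptions made for this example, and preferring the nearest licensed candidate is a guess, since the summary does not say how System-3 chooses among several matches.

```python
def baseline_antecedent(np_positions, pronoun_position):
    """System-1 style baseline: position of the closest noun phrase before the pronoun."""
    preceding = [p for p in np_positions if p < pronoun_position]
    return max(preceding) if preceding else None


def pronoun_list_antecedent(candidates, pronoun, pronoun_position):
    """Pick a preceding noun phrase whose pronoun list contains the pronoun.

    candidates: list of (position, noun_phrase, allowed_pronouns) tuples.
    """
    licensed = [(pos, np) for pos, np, allowed in candidates
                if pos < pronoun_position and pronoun in allowed]
    if not licensed:
        return None
    # Assumption: among licensed candidates, prefer the nearest one.
    return max(licensed, key=lambda item: item[0])[1]
```

With candidates `[(1, "Ramesh", {"se", "tar"}), (4, "boi", {"eta"}), (6, "Kolkata", {"sekhane"})]` and the pronoun `"se"` at position 9, the baseline picks the noun phrase at position 6, while the pronoun-list lookup skips the unlicensed candidates and returns `"Ramesh"`.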

71 The evaluation has used five metrics namely, MUC, B3, CEAFM, CEAFE and BLANC. [sent-152, score-0.016]
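
As an illustration of one of these metrics, B3 (B-cubed) can be computed per mention and then averaged. The sketch below assumes that key and response clusters cover the same set of mentions, and the function name `b_cubed` is chosen for this example; it is not the scorer used in the paper.

```python
def b_cubed(key_clusters, response_clusters):
    """Return (precision, recall, F1) of B-cubed over mentions present in both partitions."""
    key_of = {m: frozenset(c) for c in key_clusters for m in c}
    resp_of = {m: frozenset(c) for c in response_clusters for m in c}
    mentions = key_of.keys() & resp_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0
    # Per-mention overlap between its key cluster and its response cluster.
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For key clusters `[{"a", "b", "c"}, {"d"}]` and response clusters `[{"a", "b"}, {"c", "d"}]`, the per-mention precisions are 1, 1, 1/2 and 1/2, and the recalls are 2/3, 2/3, 1/3 and 1, so precision is 0.75, recall is 2/3 and F1 is 12/17.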

72 Results show that GuiTAR with MARS gives better results than the baseline in which the most recent antecedent is picked (i. [sent-154, score-0.142]

73 When MARS is replaced by system-3, further improvement is achieved which is also statistically significant (p<0. [sent-159, score-0.032]

74 1 Error analysis Analysis of errors shows that errors in number acquisition and identification of the honorificity are two major errors during preprocessing phase. [sent-162, score-0.3]

75 These errors propagate and result in further errors during resolution. [sent-163, score-0.036]

76 For example, some Bengali personal pronouns are ambiguous (sometimes they are anaphoric whereas in other cases they may appear as non-anaphoric too). [sent-165, score-0.303]

77 স/se are two examples of such pronouns in Bengali (Senapati, 2012b) and the present resolution system is not able to resolve such cases. [sent-167, score-0.379]

78 7 Conclusion The present experiment shows that GuiTAR which is one of the off-the-shelf anaphora resolution systems can be effectively configured for Bengali. [sent-168, score-0.517]

79 Basic NLP information required by the GuiTAR pre-processing module has been supplied mostly through automatic tools. [sent-169, score-0.088]

80 It is also revealed that the MARS-based implementation in GuiTAR is not very suitable for Bengali because the antecedent indicators used by MARS are probably not very effective for Bengali. [sent-172, score-0.188]

81 Addition of other resolution algorithms is definitely a future extension of this study. [sent-174, score-0.178]

82 Resolution of non-personal pronouns (which were not considered here) would be addressed next. [sent-175, score-0.201]

83 CogNIAC: high precision coreference with limited knowledge and linguistic resources, In ACL/EACL workshop on Operational factors in practical, robust anaphora resolution, pages 38-45, Madrid, Spain. [sent-186, score-0.283]

84 A method for pronominal anaphora resolution in Bengali, In Proc. [sent-201, score-0.51]

85 Discourse salience and pronoun resolution in Hindi, in Penn Working Papers in Linguistics, pp. [sent-249, score-0.284]

86 Lexical anaphors and pronouns in Bangla, Lexical Anaphors and Pronouns in Selected South Asian Languages: A Principled Typology (Eds. [sent-253, score-0.235]

87 Anaphora Resolution in Bengali using global discourse knowledge, In Int. [sent-273, score-0.024]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bengali', 0.505), ('guitar', 0.492), ('anaphora', 0.283), ('mars', 0.25), ('pronouns', 0.201), ('honorificity', 0.192), ('senapati', 0.192), ('resolution', 0.178), ('icon', 0.147), ('antecedent', 0.142), ('garain', 0.115), ('pronoun', 0.106), ('honorific', 0.096), ('indic', 0.085), ('xml', 0.082), ('hindi', 0.069), ('module', 0.068), ('indian', 0.063), ('personal', 0.061), ('majumdar', 0.058), ('sengupta', 0.058), ('ar', 0.057), ('configured', 0.056), ('person', 0.051), ('kabadjov', 0.051), ('file', 0.05), ('pronominal', 0.049), ('tamil', 0.047), ('mitkov', 0.046), ('antecedents', 0.044), ('dhar', 0.044), ('honor', 0.044), ('modifications', 0.044), ('gender', 0.042), ('anaphoric', 0.041), ('modules', 0.04), ('format', 0.039), ('apurbalal', 0.038), ('ceafm', 0.038), ('chennai', 0.038), ('uppalapu', 0.038), ('resolves', 0.038), ('preprocessing', 0.036), ('bhattacharya', 0.034), ('anaphors', 0.034), ('ceafe', 0.034), ('noun', 0.034), ('replaced', 0.032), ('suffixes', 0.031), ('nei', 0.031), ('tagged', 0.031), ('suitably', 0.029), ('jain', 0.029), ('publisher', 0.029), ('contest', 0.029), ('chunker', 0.029), ('india', 0.028), ('collocational', 0.028), ('teams', 0.027), ('muc', 0.027), ('indicating', 0.026), ('reflexive', 0.026), ('degree', 0.025), ('agarwal', 0.025), ('possessive', 0.025), ('referential', 0.024), ('indicators', 0.024), ('discourse', 0.024), ('prasad', 0.024), ('agreements', 0.023), ('implementation', 0.022), ('configuration', 0.022), ('boundaries', 0.021), ('par', 0.021), ('chunking', 0.02), ('required', 0.02), ('poesio', 0.02), ('nouns', 0.019), ('south', 0.019), ('namely', 0.019), ('immediate', 0.019), ('baldwin', 0.018), ('acquisition', 0.018), ('asian', 0.018), ('errors', 0.018), ('categorizes', 0.017), ('blanc', 0.017), ('daarc', 0.017), ('esteem', 0.017), ('preprocessor', 0.017), ('reference', 0.017), ('list', 0.017), ('addressing', 0.016), ('configurations', 0.016), ('resolved', 0.016), ('five', 0.016), ('aggregate', 0.016), ('chunked', 0.016), ('sharma', 0.016), ('formatted', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 177 acl-2013-GuiTAR-based Pronominal Anaphora Resolution in Bengali

Author: Apurbalal Senapati ; Utpal Garain

2 0.13399044 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

Author: Sebastian Martschat

Abstract: We present an unsupervised model for coreference resolution that casts the problem as a clustering task in a directed labeled weighted multigraph. The model outperforms most systems participating in the English track of the CoNLL’12 shared task.

3 0.11932104 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing

Author: Xipeng Qiu ; Qi Zhang ; Xuanjing Huang

Abstract: The need for Chinese natural language processing (NLP) is growing across a range of research and commercial applications. However, most current Chinese NLP tools or components still have a wide range of issues that need further improvement and development. FudanNLP is an open source toolkit for Chinese natural language processing (NLP), which uses statistics-based and rule-based methods to deal with Chinese NLP tasks, such as word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, time phrase recognition, anaphora resolution and so on.

4 0.06880296 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning

Author: Emmanuel Lassalle ; Pascal Denis

Abstract: This paper proposes a new method for significantly improving the performance of pairwise coreference models. Given a set of indicators, our method learns how to best separate types of mention pairs into equivalence classes for which we construct distinct classification models. In effect, our approach finds an optimal feature space (derived from a base feature set and indicator set) for discriminating coreferential mention pairs. Although our approach explores a very large space of possible feature spaces, it remains tractable by exploiting the structure of the hierarchies built from the indicators. Our experiments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features. Our best system obtains a competitive 67.2 of average F1 over MUC, B3, and CEAF which, despite its simplicity, places it above the mean score of other systems on these datasets.

5 0.057162229 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features

Author: Nathan Gilbert ; Ellen Riloff

Abstract: Most coreference resolvers rely heavily on string matching, syntactic properties, and semantic attributes of words, but they lack the ability to make decisions based on individual words. In this paper, we explore the benefits of lexicalized features in the setting of domain-specific coreference resolution. We show that adding lexicalized features to off-the-shelf coreference resolvers yields significant performance gains on four domain-specific data sets and with two types of coreference resolution architectures.

6 0.057014622 106 acl-2013-Decentralized Entity-Level Modeling for Coreference Resolution

7 0.04776625 189 acl-2013-ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling

8 0.045767881 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities

9 0.033816103 62 acl-2013-Automatic Term Ambiguity Detection

10 0.031877853 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

11 0.031256288 184 acl-2013-Identification of Speakers in Novels

12 0.030615486 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages

13 0.03024333 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections

14 0.0302286 290 acl-2013-Question Analysis for Polish Question Answering

15 0.02802019 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

16 0.026929097 172 acl-2013-Graph-based Local Coherence Modeling

17 0.02637792 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation

18 0.026208382 61 acl-2013-Automatic Interpretation of the English Possessive

19 0.026198393 372 acl-2013-Using CCG categories to improve Hindi dependency parsing

20 0.025945487 289 acl-2013-QuEst - A translation quality estimation framework

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.069), (1, 0.016), (2, -0.02), (3, -0.034), (4, 0.009), (5, 0.036), (6, -0.022), (7, 0.028), (8, 0.029), (9, 0.018), (10, -0.019), (11, -0.015), (12, -0.03), (13, 0.009), (14, -0.052), (15, 0.027), (16, -0.06), (17, 0.055), (18, -0.044), (19, 0.028), (20, -0.09), (21, 0.057), (22, 0.012), (23, -0.092), (24, -0.001), (25, 0.012), (26, -0.0), (27, 0.011), (28, 0.054), (29, -0.016), (30, 0.034), (31, -0.033), (32, 0.05), (33, -0.039), (34, 0.018), (35, 0.031), (36, 0.034), (37, -0.034), (38, -0.015), (39, -0.011), (40, -0.122), (41, -0.047), (42, -0.031), (43, 0.016), (44, 0.012), (45, 0.023), (46, -0.049), (47, 0.035), (48, -0.0), (49, -0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92531484 177 acl-2013-GuiTAR-based Pronominal Anaphora Resolution in Bengali

Author: Apurbalal Senapati ; Utpal Garain

2 0.7348516 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

Author: Sebastian Martschat

3 0.72438169 106 acl-2013-Decentralized Entity-Level Modeling for Coreference Resolution

Author: Greg Durrett ; David Hall ; Dan Klein

Abstract: Efficiently incorporating entity-level information is a challenge for coreference resolution systems due to the difficulty of exact inference over partitions. We describe an end-to-end discriminative probabilistic model for coreference that, along with standard pairwise features, enforces structural agreement constraints between specified properties of coreferent mentions. This model can be represented as a factor graph for each document that admits efficient inference via belief propagation. We show that our method can use entity-level information to outperform a basic pairwise system.

4 0.69580066 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features

Author: Nathan Gilbert ; Ellen Riloff

5 0.5896647 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints

Author: Will Radford ; James R. Curran

Abstract: Appositions are adjacent NPs used to add information to a discourse. We propose systems exploiting syntactic and semantic constraints to extract appositions from OntoNotes. Our joint log-linear model outperforms the state-of-the-art Favre and Hakkani-Tür (2009) model by ∼10% on Broadcast News, and achieves 54.3% F-score on multiple genres.

6 0.58012313 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning

7 0.41527018 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision

8 0.39266491 172 acl-2013-Graph-based Local Coherence Modeling

9 0.35368094 184 acl-2013-Identification of Speakers in Novels

10 0.34462854 225 acl-2013-Learning to Order Natural Language Texts

11 0.33805946 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing

12 0.33434558 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information

13 0.32975286 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian

14 0.31972679 34 acl-2013-Accurate Word Segmentation using Transliteration and Language Model Projection

15 0.31730327 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text

16 0.31398782 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits

17 0.3118616 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation

18 0.30484605 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach

19 0.30373299 227 acl-2013-Learning to lemmatise Polish noun phrases

20 0.30036321 250 acl-2013-Models of Translation Competitions

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.031), (6, 0.013), (11, 0.058), (15, 0.011), (24, 0.05), (26, 0.044), (35, 0.059), (42, 0.029), (48, 0.028), (70, 0.026), (71, 0.39), (80, 0.016), (88, 0.074), (90, 0.014), (95, 0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82027489 177 acl-2013-GuiTAR-based Pronominal Anaphora Resolution in Bengali

Author: Apurbalal Senapati ; Utpal Garain

2 0.67517763 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text

Author: Mohamed Amir Yosef ; Sandro Bauer ; Johannes Hoffart ; Marc Spaniol ; Gerhard Weikum

Abstract: Recent research has shown progress in achieving high-quality, very fine-grained type classification in hierarchical taxonomies. Within such a multi-level type hierarchy with several hundreds of types at different levels, many entities naturally belong to multiple types. In order to achieve high precision in type classification, current approaches are either limited to certain domains or require time-consuming multistage computations. As a consequence, existing systems are incapable of performing ad-hoc type classification on arbitrary input texts. In this demo, we present a novel Web-based tool that is able to perform domain independent entity type classification under real time conditions. Thanks to its efficient implementation and compacted feature representation, the system is able to process text inputs on-the-fly while still achieving equally high precision as leading state-of-the-art implementations. Our system offers an online interface where natural-language text can be inserted, which returns semantic type labels for entity mentions. Furthermore, the user interface allows users to explore the assigned types by visualizing and navigating along the type-hierarchy.

3 0.55727553 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays

Author: Beata Beigman Klebanov ; Michael Flor

Abstract: We describe a new representation of the content vocabulary of a text we call word association profile that captures the proportions of highly associated, mildly associated, unassociated, and dis-associated pairs of words that co-exist in the given text. We illustrate the shape of the distribution and observe variation with genre and target audience. We present a study of the relationship between quality of writing and word association profiles. For a set of essays written by college graduates on a number of general topics, we show that the higher scoring essays tend to have higher percentages of both highly associated and dis-associated pairs, and lower percentages of mildly associated pairs of words. Finally, we use word association profiles to improve a system for automated scoring of essays.

4 0.54748297 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing

Author: Jonathan K. Kummerfeld ; Daniel Tse ; James R. Curran ; Dan Klein

Abstract: Aspects of Chinese syntax result in a distinctive mix of parsing challenges. However, the contribution of individual sources of error to overall difficulty is not well understood. We conduct a comprehensive automatic analysis of error types made by Chinese parsers, covering a broad range of error types for large sets of sentences, enabling the first empirical ranking of Chinese error types by their performance impact. We also investigate which error types are resolved by using gold part-of-speech tags, showing that improving Chinese tagging only addresses certain error types, leaving substantial outstanding challenges.

5 0.52610195 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing

Author: Xiang Li ; Wenbin Jiang ; Yajuan Lu ; Qun Liu

Abstract: This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only.

6 0.36319795 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints

7 0.35778472 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing

8 0.35145223 41 acl-2013-Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation

9 0.34268922 299 acl-2013-Reconstructing an Indo-European Family Tree from Non-native English Texts

10 0.33995515 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

11 0.33743733 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

12 0.3369723 136 acl-2013-Enhanced and Portable Dependency Projection Algorithms Using Interlinear Glossed Text

13 0.33657622 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning

14 0.33452255 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD

15 0.33274308 318 acl-2013-Sentiment Relevance

16 0.33214802 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

17 0.33187008 327 acl-2013-Sorani Kurdish versus Kurmanji Kurdish: An Empirical Comparison

18 0.32310528 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting

19 0.32259673 258 acl-2013-Neighbors Help: Bilingual Unsupervised WSD Using Context

20 0.3217063 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts