138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus


Source: pdf

Author: Andre Bittar ; Pascal Amsili ; Pascal Denis ; Laurence Danlos

Abstract: This article presents the main points in the creation of the French TimeBank (Bittar, 2010), a reference corpus annotated according to the ISO-TimeML standard for temporal annotation. A number of improvements were made to the markup language to deal with linguistic phenomena not yet covered by ISO-TimeML, including cross-language modifications and others specific to French. An automatic preannotation system was used to speed up the annotation process. A preliminary evaluation of the methodology adopted for this project yields positive results in terms of data quality and annotation time.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This article presents the main points in the creation of the French TimeBank (Bittar, 2010), a reference corpus annotated according to the ISO-TimeML standard for temporal annotation. [sent-8, score-0.64]

2 A number of improvements were made to the markup language to deal with linguistic phenomena not yet covered by ISO-TimeML, including cross-language modifications and others specific to French. [sent-9, score-0.45]

3 An automatic preannotation system was used to speed up the annotation process. [sent-10, score-0.123]

4 A preliminary evaluation of the methodology adopted for this project yields positive results in terms of data quality and annotation time. [sent-11, score-0.466]

5 1 Introduction The processing of temporal information (events, time expressions and relations between these entities) is essential for overall comprehension of natural language discourse. [sent-12, score-0.471]

6 Determining the temporal structure of a text can bring added value to numerous NLP applications (information extraction, Q&A systems, summarization. [sent-13, score-0.421]

7 Progress has been made in recent years in the processing of temporal data, notably through the ISO-TimeML standard (ISO, 2008) and the creation of the TimeBank 1. [sent-17, score-0.551]

8 Here we present the French TimeBank (FTiB), a corpus for French annotated in ISO-TimeML. [sent-19, score-0.101]

9 We also present the methodology adopted for the creation of this resource, which may be generalized to other annotation tasks. [sent-20, score-0.494]

10 We evaluate the effects of our methodology on the quality of the corpus and the time taken in the task. [sent-21, score-0.308]

11 2 ISO-TimeML ISO-TimeML (ISO, 2008) is a surface-based language for the marking of events ( classes by sub-genre. [sent-27, score-0.196]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('timebank', 0.463), ('alpage', 0.32), ('diderot', 0.301), ('amsili', 0.263), ('bittar', 0.263), ('paris', 0.232), ('danlos', 0.232), ('french', 0.228), ('temporal', 0.216), ('iso', 0.19), ('pascal', 0.155), ('creation', 0.136), ('methodology', 0.134), ('inria', 0.107), ('pustejovsky', 0.1), ('laurence', 0.1), ('events', 0.099), ('adopted', 0.089), ('denis', 0.082), ('comprehension', 0.077), ('markup', 0.077), ('andr', 0.069), ('bring', 0.068), ('modifications', 0.068), ('annotation', 0.065), ('marking', 0.065), ('notably', 0.065), ('reference', 0.058), ('covered', 0.055), ('numerous', 0.055), ('annotated', 0.051), ('progress', 0.051), ('essential', 0.047), ('generalized', 0.046), ('phenomena', 0.044), ('article', 0.042), ('speed', 0.041), ('effects', 0.04), ('made', 0.04), ('expressions', 0.04), ('determining', 0.039), ('resource', 0.038), ('years', 0.038), ('deal', 0.038), ('al', 0.038), ('preliminary', 0.038), ('entities', 0.037), ('quality', 0.035), ('corpus', 0.033), ('yet', 0.032), ('others', 0.03), ('project', 0.03), ('yields', 0.029), ('taken', 0.029), ('classes', 0.029), ('points', 0.028), ('added', 0.028), ('relations', 0.026), ('presents', 0.025), ('time', 0.023), ('standard', 0.022), ('improvements', 0.021), ('recent', 0.021), ('nlp', 0.02), ('positive', 0.02), ('applications', 0.02), ('extraction', 0.018), ('overall', 0.017), ('present', 0.017), ('significantly', 0.016), ('linguistic', 0.015), ('according', 0.015), ('specific', 0.015), ('main', 0.014), ('value', 0.014), ('evaluate', 0.014), ('higher', 0.014), ('terms', 0.014), ('processing', 0.013), ('structure', 0.013), ('automatic', 0.011), ('including', 0.01), ('systems', 0.007), ('text', 0.007), ('may', 0.006), ('evaluation', 0.006), ('system', 0.006), ('information', 0.005), ('natural', 0.004), ('data', 0.004), ('language', 0.003), ('number', 0.002), ('results', 0.002), ('introduction', 0.001), ('also', 0.001), ('used', 0.0), ('et', 0.0), ('abstract', 0.0)]
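
The page does not say how the similarity values in the lists that follow are computed. A common choice for comparing tf-idf vectors is cosine similarity, so the sketch below assumes that measure; it is an illustration, not the method confirmed to be behind these numbers. The function name `cosine_similarity` and the weights in `other_paper` are hypothetical; only the weights in `this_paper` are taken from the list above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# A few of the (word, weight) pairs listed above for this paper.
this_paper = dict([
    ('timebank', 0.463),
    ('french', 0.228),
    ('temporal', 0.216),
    ('annotation', 0.065),
])

# Hypothetical weights for some other paper, invented for illustration only.
other_paper = {'temporal': 0.5, 'evaluation': 0.4, 'annotation': 0.1}

sim_self = cosine_similarity(this_paper, this_paper)
sim_other = cosine_similarity(this_paper, other_paper)
```

Under this assumption, a paper compared with itself scores 1 up to floating-point rounding, and papers that share few highly weighted terms score close to 0.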

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus

2 0.22941074 294 acl-2011-Temporal Evaluation

Author: Naushad UzZaman ; James Allen

Abstract: In this paper we propose a new method for evaluating systems that extract temporal information from text. It uses temporal closure to reward relations that are equivalent but distinct. Our metric measures the overall performance of systems with a single score, making comparison between different systems straightforward. Our approach is easy to implement, intuitive, accurate, scalable and computationally inexpensive.

3 0.071911819 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation

Author: Lonneke van der Plas ; Paola Merlo ; James Henderson

Abstract: Broad-coverage semantic annotations for training statistical learners are only available for a handful of languages. Previous approaches to cross-lingual transfer of semantic annotations have addressed this problem with encouraging results on a small scale. In this paper, we scale up previous efforts by using an automatic approach to semantic annotation that does not rely on a semantic ontology for the target language. Moreover, we improve the quality of the transferred semantic annotations by using a joint syntactic-semantic parser that learns the correlations between syntax and semantics of the target language and smooths out the errors from automatic transfer. We reach a labelled F-measure for predicates and arguments of only 4% and 9% points, respectively, lower than the upper bound from manual annotations.

4 0.052309044 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation

Author: Thomas Meyer

Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.

5 0.042797014 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction

Author: Bo Li ; Eric Gaussier ; Akiko Aizawa

Abstract: We study in this paper the problem of enhancing the comparability of bilingual corpora in order to improve the quality of bilingual lexicons extracted from comparable corpora. We introduce a clustering-based approach for enhancing corpus comparability which exploits the homogeneity feature of the corpus, and finally preserves most of the vocabulary of the original corpus. Our experiments illustrate the well-foundedness of this method and show that the bilingual lexicons obtained from the homogeneous corpus are of better quality than the lexicons obtained with previous approaches.

6 0.039771054 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs

7 0.036035102 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation

8 0.03241365 122 acl-2011-Event Extraction as Dependency Parsing

9 0.032089733 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis

10 0.031270042 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks

11 0.030251054 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus

12 0.029065846 175 acl-2011-Integrating history-length interpolation and classes in language modeling

13 0.027165513 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction

14 0.027124325 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

15 0.026320728 141 acl-2011-Gappy Phrasal Alignment By Agreement

16 0.026164796 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing

17 0.025150642 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

18 0.024468366 74 acl-2011-Combining Indicators of Allophony

19 0.023855368 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA

20 0.023603264 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.058), (1, 0.012), (2, -0.034), (3, 0.007), (4, 0.005), (5, 0.019), (6, 0.028), (7, -0.011), (8, 0.008), (9, -0.029), (10, -0.007), (11, -0.039), (12, 0.011), (13, 0.0), (14, -0.027), (15, -0.031), (16, 0.035), (17, -0.01), (18, 0.005), (19, -0.006), (20, 0.022), (21, 0.036), (22, -0.02), (23, -0.038), (24, 0.018), (25, 0.005), (26, -0.009), (27, -0.025), (28, -0.004), (29, -0.035), (30, -0.017), (31, 0.007), (32, 0.049), (33, 0.007), (34, -0.015), (35, 0.027), (36, -0.114), (37, 0.019), (38, 0.027), (39, -0.093), (40, -0.019), (41, 0.055), (42, 0.017), (43, -0.128), (44, -0.109), (45, -0.008), (46, -0.028), (47, -0.02), (48, 0.051), (49, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94761842 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus

2 0.81915849 294 acl-2011-Temporal Evaluation

Author: Naushad UzZaman ; James Allen

Abstract: In this paper we propose a new method for evaluating systems that extract temporal information from text. It uses temporal closure to reward relations that are equivalent but distinct. Our metric measures the overall performance of systems with a single score, making comparison between different systems straightforward. Our approach is easy to implement, intuitive, accurate, scalable and computationally inexpensive.

3 0.53048539 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation

Author: Thomas Meyer

Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.

4 0.50979698 322 acl-2011-Unsupervised Learning of Semantic Relation Composition

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents an unsupervised method for deriving inference axioms by composing semantic relations. The method is independent of any particular relation inventory. It relies on describing semantic relations using primitives and manipulating these primitives according to an algebra. The method was tested using a set of eight semantic relations yielding 78 inference axioms which were evaluated over PropBank.

5 0.41779774 215 acl-2011-MACAON An NLP Tool Suite for Processing Word Lattices

Author: Alexis Nasr ; Frederic Bechet ; Jean-Francois Rey ; Benoit Favre ; Joseph Le Roux

Abstract: MACAON is a tool suite for standard NLP tasks developed for French. MACAON has been designed to process both human-produced text and highly ambiguous word-lattices produced by NLP tools. MACAON is made of several native modules for common tasks such as tokenization, part-of-speech tagging or syntactic parsing, all communicating with each other through XML files. In addition, exchange protocols with external tools are easily definable. MACAON is a fast, modular and open tool, distributed under GNU Public License.

6 0.40365013 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA

7 0.38855547 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

8 0.38771766 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

9 0.38661188 273 acl-2011-Semantic Representation of Negation Using Focus Detection

10 0.3829906 194 acl-2011-Language Use: What can it tell us?

11 0.37320855 311 acl-2011-Translationese and Its Dialects

12 0.36563191 8 acl-2011-A Corpus of Scope-disambiguated English Text

13 0.36400855 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations

14 0.35005078 317 acl-2011-Underspecifying and Predicting Voice for Surface Realisation Ranking

15 0.34989533 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

16 0.34839344 121 acl-2011-Event Discovery in Social Media Feeds

17 0.34701192 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework

18 0.34612408 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature

19 0.342877 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories

20 0.34270966 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.049), (15, 0.464), (17, 0.017), (37, 0.029), (39, 0.075), (41, 0.035), (55, 0.031), (59, 0.064), (72, 0.014), (91, 0.014), (96, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78305864 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus

2 0.43563992 287 acl-2011-Structural Topic Model for Latent Topical Structure Analysis

Author: Hongning Wang ; Duo Zhang ; ChengXiang Zhai

Abstract: Topic models have been successfully applied to many document analysis tasks to discover topics embedded in text. However, existing topic models generally cannot capture the latent topical structures in documents. Since languages are intrinsically cohesive and coherent, modeling and discovering latent topical transition structures within documents would be beneficial for many text analysis tasks. In this work, we propose a new topic model, Structural Topic Model, which simultaneously discovers topics and reveals the latent topical structures in text through explicitly modeling topical transitions with a latent first-order Markov chain. Experiment results show that the proposed Structural Topic Model can effectively discover topical structures in text, and the identified structures significantly improve the performance of tasks such as sentence annotation and sentence ordering.

3 0.43288136 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation

Author: Lonneke van der Plas ; Paola Merlo ; James Henderson

Abstract: Broad-coverage semantic annotations for training statistical learners are only available for a handful of languages. Previous approaches to cross-lingual transfer of semantic annotations have addressed this problem with encouraging results on a small scale. In this paper, we scale up previous efforts by using an automatic approach to semantic annotation that does not rely on a semantic ontology for the target language. Moreover, we improve the quality of the transferred semantic annotations by using a joint syntactic-semantic parser that learns the correlations between syntax and semantics of the target language and smooths out the errors from automatic transfer. We reach a labelled F-measure for predicates and arguments of only 4% and 9% points, respectively, lower than the upper bound from manual annotations.

4 0.34359029 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

Author: Dong Wang ; Yang Liu

Abstract: This paper presents a pilot study of opinion summarization on conversations. We create a corpus containing extractive and abstractive summaries of speaker’s opinion towards a given topic using 88 telephone conversations. We adopt two methods to perform extractive summarization. The first one is a sentence-ranking method that linearly combines scores measured from different aspects including topic relevance, subjectivity, and sentence importance. The second one is a graph-based method, which incorporates topic and sentiment information, as well as additional information about sentence-to-sentence relations extracted based on dialogue structure. Our evaluation results show that both methods significantly outperform the baseline approach that extracts the longest utterances. In particular, we find that incorporating dialogue structure in the graph-based method contributes to the improved system performance.

5 0.27045572 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity

Author: Jacob Eisenstein ; Noah A. Smith ; Eric P. Xing

Abstract: We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.

6 0.2664147 192 acl-2011-Language-Independent Parsing with Empty Elements

7 0.26513547 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

8 0.26278877 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

9 0.259067 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

10 0.2584486 52 acl-2011-Automatic Labelling of Topic Models

11 0.25804329 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing

12 0.2578052 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features

13 0.25721371 178 acl-2011-Interactive Topic Modeling

14 0.25637928 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser

15 0.25460148 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing

16 0.25444672 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

17 0.25438079 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis

18 0.25371936 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition

19 0.253539 182 acl-2011-Joint Annotation of Search Queries

20 0.25306275 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing