acl acl2011 acl2011-138 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andre Bittar ; Pascal Amsili ; Pascal Denis ; Laurence Danlos
Abstract: This article presents the main points in the creation of the French TimeBank (Bittar, 2010), a reference corpus annotated according to the ISO-TimeML standard for temporal annotation. A number of improvements were made to the markup language to deal with linguistic phenomena not yet covered by ISO-TimeML, including cross-language modifications and others specific to French. An automatic preannotation system was used to speed up the annotation process. A preliminary evaluation of the methodology adopted for this project yields positive results in terms of data quality and annotation time.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This article presents the main points in the creation of the French TimeBank (Bittar, 2010), a reference corpus annotated according to the ISO-TimeML standard for temporal annotation. [sent-8, score-0.64]
2 A number of improvements were made to the markup language to deal with linguistic phenomena not yet covered by ISO-TimeML, including cross-language modifications and others specific to French. [sent-9, score-0.45]
3 An automatic preannotation system was used to speed up the annotation process. [sent-10, score-0.123]
4 A preliminary evaluation of the methodology adopted for this project yields positive results in terms of data quality and annotation time. [sent-11, score-0.466]
5 1 Introduction The processing of temporal information (events, time expressions and relations between these entities) is essential for overall comprehension of natural language discourse. [sent-12, score-0.471]
6 Determining the temporal structure of a text can bring added value to numerous NLP applications (information extraction, Q&A systems, summarization. [sent-13, score-0.421]
7 Progress has been made in recent years in the processing of temporal data, notably through the ISO-TimeML standard (ISO, 2008) and the creation of the TimeBank 1. [sent-17, score-0.551]
8 Here we present the French TimeBank (FTiB), a corpus for French annotated in ISO-TimeML. [sent-19, score-0.101]
9 We also present the methodology adopted for the creation of this resource, which may be generalized to other annotation tasks. [sent-20, score-0.494]
10 We evaluate the effects of our methodology on the quality of the corpus and the time taken in the task. [sent-21, score-0.308]
11 2 ISO-TimeML ISO-TimeML (ISO, 2008) is a surface-based language for the marking of events ( classes by sub-genre. [sent-27, score-0.196]
wordName wordTfidf (topN-words)
[('timebank', 0.463), ('alpage', 0.32), ('diderot', 0.301), ('amsili', 0.263), ('bittar', 0.263), ('paris', 0.232), ('danlos', 0.232), ('french', 0.228), ('temporal', 0.216), ('iso', 0.19), ('pascal', 0.155), ('creation', 0.136), ('methodology', 0.134), ('inria', 0.107), ('pustejovsky', 0.1), ('laurence', 0.1), ('events', 0.099), ('adopted', 0.089), ('denis', 0.082), ('comprehension', 0.077), ('markup', 0.077), ('andr', 0.069), ('bring', 0.068), ('modifications', 0.068), ('annotation', 0.065), ('marking', 0.065), ('notably', 0.065), ('reference', 0.058), ('covered', 0.055), ('numerous', 0.055), ('annotated', 0.051), ('progress', 0.051), ('essential', 0.047), ('generalized', 0.046), ('phenomena', 0.044), ('article', 0.042), ('speed', 0.041), ('effects', 0.04), ('made', 0.04), ('expressions', 0.04), ('determining', 0.039), ('resource', 0.038), ('years', 0.038), ('deal', 0.038), ('al', 0.038), ('preliminary', 0.038), ('entities', 0.037), ('quality', 0.035), ('corpus', 0.033), ('yet', 0.032), ('others', 0.03), ('project', 0.03), ('yields', 0.029), ('taken', 0.029), ('classes', 0.029), ('points', 0.028), ('added', 0.028), ('relations', 0.026), ('presents', 0.025), ('time', 0.023), ('standard', 0.022), ('improvements', 0.021), ('recent', 0.021), ('nlp', 0.02), ('positive', 0.02), ('applications', 0.02), ('extraction', 0.018), ('overall', 0.017), ('present', 0.017), ('significantly', 0.016), ('linguistic', 0.015), ('according', 0.015), ('specific', 0.015), ('main', 0.014), ('value', 0.014), ('evaluate', 0.014), ('higher', 0.014), ('terms', 0.014), ('processing', 0.013), ('structure', 0.013), ('automatic', 0.011), ('including', 0.01), ('systems', 0.007), ('text', 0.007), ('may', 0.006), ('evaluation', 0.006), ('system', 0.006), ('information', 0.005), ('natural', 0.004), ('data', 0.004), ('language', 0.003), ('number', 0.002), ('results', 0.002), ('introduction', 0.001), ('also', 0.001), ('used', 0.0), ('et', 0.0), ('abstract', 0.0)]
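The page does not say how the simValue scores in the tables below are computed. As a minimal sketch, assuming cosine similarity over sparse weight vectors (a common choice, not confirmed by this page), a list in the format above could be parsed and compared as follows. The helper names `parse_weights` and `cosine` are hypothetical, and the second vector is invented for illustration only.

```python
import ast
import math


def parse_weights(line):
    """Parse a "[(key, weight), ...]" line, as printed above, into a dict."""
    return dict(ast.literal_eval(line))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Three weights copied from the list above (truncated for brevity).
paper_a = parse_weights(
    "[('timebank', 0.463), ('temporal', 0.216), ('annotation', 0.065)]"
)
# Invented weights for a second, hypothetical paper.
paper_b = parse_weights("[('temporal', 0.5), ('evaluation', 0.3)]")

score = cosine(paper_a, paper_b)
print(f"assumed cosine similarity: {score:.4f}")
```

The same two functions apply unchanged to the (topicId, topicWeight) lists further down, since those are also printed as lists of (key, weight) pairs.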
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus
2 0.22941074 294 acl-2011-Temporal Evaluation
Author: Naushad UzZaman ; James Allen
Abstract: In this paper we propose a new method for evaluating systems that extract temporal information from text. It uses temporal closure to reward relations that are equivalent but distinct. Our metric measures the overall performance of systems with a single score, making comparison between different systems straightforward. Our approach is easy to implement, intuitive, accurate, scalable and computationally inexpensive.
3 0.071911819 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
Author: Lonneke van der Plas ; Paola Merlo ; James Henderson
Abstract: Broad-coverage semantic annotations for training statistical learners are only available for a handful of languages. Previous approaches to cross-lingual transfer of semantic annotations have addressed this problem with encouraging results on a small scale. In this paper, we scale up previous efforts by using an automatic approach to semantic annotation that does not rely on a semantic ontology for the target language. Moreover, we improve the quality of the transferred semantic annotations by using a joint syntactic-semantic parser that learns the correlations between syntax and semantics of the target language and smooths out the errors from automatic transfer. We reach a labelled F-measure for predicates and arguments of only 4% and 9% points, respectively, lower than the upper bound from manual annotations.
4 0.052309044 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
5 0.042797014 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction
Author: Bo Li ; Eric Gaussier ; Akiko Aizawa
Abstract: We study in this paper the problem of enhancing the comparability of bilingual corpora in order to improve the quality of bilingual lexicons extracted from comparable corpora. We introduce a clustering-based approach for enhancing corpus comparability which exploits the homogeneity feature of the corpus, and finally preserves most of the vocabulary of the original corpus. Our experiments illustrate the well-foundedness of this method and show that the bilingual lexicons obtained from the homogeneous corpus are of better quality than the lexicons obtained with previous approaches.
6 0.039771054 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs
7 0.036035102 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
8 0.03241365 122 acl-2011-Event Extraction as Dependency Parsing
9 0.032089733 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis
10 0.031270042 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks
11 0.030251054 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
12 0.029065846 175 acl-2011-Integrating history-length interpolation and classes in language modeling
13 0.027165513 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction
14 0.027124325 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life
15 0.026320728 141 acl-2011-Gappy Phrasal Alignment By Agreement
16 0.026164796 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing
17 0.025150642 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal
18 0.024468366 74 acl-2011-Combining Indicators of Allophony
19 0.023855368 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA
20 0.023603264 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
topicId topicWeight
[(0, 0.058), (1, 0.012), (2, -0.034), (3, 0.007), (4, 0.005), (5, 0.019), (6, 0.028), (7, -0.011), (8, 0.008), (9, -0.029), (10, -0.007), (11, -0.039), (12, 0.011), (13, 0.0), (14, -0.027), (15, -0.031), (16, 0.035), (17, -0.01), (18, 0.005), (19, -0.006), (20, 0.022), (21, 0.036), (22, -0.02), (23, -0.038), (24, 0.018), (25, 0.005), (26, -0.009), (27, -0.025), (28, -0.004), (29, -0.035), (30, -0.017), (31, 0.007), (32, 0.049), (33, 0.007), (34, -0.015), (35, 0.027), (36, -0.114), (37, 0.019), (38, 0.027), (39, -0.093), (40, -0.019), (41, 0.055), (42, 0.017), (43, -0.128), (44, -0.109), (45, -0.008), (46, -0.028), (47, -0.02), (48, 0.051), (49, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.94761842 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus
2 0.81915849 294 acl-2011-Temporal Evaluation
Author: Naushad UzZaman ; James Allen
Abstract: In this paper we propose a new method for evaluating systems that extract temporal information from text. It uses temporal closure to reward relations that are equivalent but distinct. Our metric measures the overall performance of systems with a single score, making comparison between different systems straightforward. Our approach is easy to implement, intuitive, accurate, scalable and computationally inexpensive.
3 0.53048539 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
4 0.50979698 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
Author: Eduardo Blanco ; Dan Moldovan
Abstract: This paper presents an unsupervised method for deriving inference axioms by composing semantic relations. The method is independent of any particular relation inventory. It relies on describing semantic relations using primitives and manipulating these primitives according to an algebra. The method was tested using a set of eight semantic relations yielding 78 inference axioms which were evaluated over PropBank.
5 0.41779774 215 acl-2011-MACAON An NLP Tool Suite for Processing Word Lattices
Author: Alexis Nasr ; Frederic Bechet ; Jean-Francois Rey ; Benoit Favre ; Joseph Le Roux
Abstract: MACAON is a tool suite for standard NLP tasks developed for French. MACAON has been designed to process both human-produced text and highly ambiguous word-lattices produced by NLP tools. MACAON is made of several native modules for common tasks such as tokenization, part-of-speech tagging and syntactic parsing, all communicating with each other through XML files. In addition, exchange protocols with external tools are easily definable. MACAON is a fast, modular and open tool, distributed under the GNU Public License.
6 0.40365013 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA
7 0.38855547 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications
8 0.38771766 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life
9 0.38661188 273 acl-2011-Semantic Representation of Negation Using Focus Detection
10 0.3829906 194 acl-2011-Language Use: What can it tell us?
11 0.37320855 311 acl-2011-Translationese and Its Dialects
12 0.36563191 8 acl-2011-A Corpus of Scope-disambiguated English Text
13 0.36400855 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
14 0.35005078 317 acl-2011-Underspecifying and Predicting Voice for Surface Realisation Ranking
15 0.34989533 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal
16 0.34839344 121 acl-2011-Event Discovery in Social Media Feeds
17 0.34701192 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework
18 0.34612408 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
19 0.342877 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
20 0.34270966 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis
topicId topicWeight
[(5, 0.049), (15, 0.464), (17, 0.017), (37, 0.029), (39, 0.075), (41, 0.035), (55, 0.031), (59, 0.064), (72, 0.014), (91, 0.014), (96, 0.08)]
simIndex simValue paperId paperTitle
same-paper 1 0.78305864 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus
2 0.43563992 287 acl-2011-Structural Topic Model for Latent Topical Structure Analysis
Author: Hongning Wang ; Duo Zhang ; ChengXiang Zhai
Abstract: Topic models have been successfully applied to many document analysis tasks to discover topics embedded in text. However, existing topic models generally cannot capture the latent topical structures in documents. Since languages are intrinsically cohesive and coherent, modeling and discovering latent topical transition structures within documents would be beneficial for many text analysis tasks. In this work, we propose a new topic model, Structural Topic Model, which simultaneously discovers topics and reveals the latent topical structures in text through explicitly modeling topical transitions with a latent first-order Markov chain. Experiment results show that the proposed Structural Topic Model can effectively discover topical structures in text, and the identified structures significantly improve the performance of tasks such as sentence annotation and sentence ordering.
3 0.43288136 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
Author: Lonneke van der Plas ; Paola Merlo ; James Henderson
Abstract: Broad-coverage semantic annotations for training statistical learners are only available for a handful of languages. Previous approaches to cross-lingual transfer of semantic annotations have addressed this problem with encouraging results on a small scale. In this paper, we scale up previous efforts by using an automatic approach to semantic annotation that does not rely on a semantic ontology for the target language. Moreover, we improve the quality of the transferred semantic annotations by using a joint syntactic-semantic parser that learns the correlations between syntax and semantics of the target language and smooths out the errors from automatic transfer. We reach a labelled F-measure for predicates and arguments of only 4% and 9% points, respectively, lower than the upper bound from manual annotations.
4 0.34359029 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
Author: Dong Wang ; Yang Liu
Abstract: This paper presents a pilot study of opinion summarization on conversations. We create a corpus containing extractive and abstractive summaries of speaker’s opinion towards a given topic using 88 telephone conversations. We adopt two methods to perform extractive summarization. The first one is a sentence-ranking method that linearly combines scores measured from different aspects including topic relevance, subjectivity, and sentence importance. The second one is a graph-based method, which incorporates topic and sentiment information, as well as additional information about sentence-to-sentence relations extracted based on dialogue structure. Our evaluation results show that both methods significantly outperform the baseline approach that extracts the longest utterances. In particular, we find that incorporating dialogue structure in the graph-based method contributes to the improved system performance.
5 0.27045572 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity
Author: Jacob Eisenstein ; Noah A. Smith ; Eric P. Xing
Abstract: We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
6 0.2664147 192 acl-2011-Language-Independent Parsing with Empty Elements
7 0.26513547 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
8 0.26278877 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
9 0.259067 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
10 0.2584486 52 acl-2011-Automatic Labelling of Topic Models
11 0.25804329 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
12 0.2578052 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
13 0.25721371 178 acl-2011-Interactive Topic Modeling
14 0.25637928 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
15 0.25460148 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
16 0.25444672 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
17 0.25438079 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
18 0.25371936 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
19 0.253539 182 acl-2011-Joint Annotation of Search Queries
20 0.25306275 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing