
100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web


Source: pdf

Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein

Abstract: unknown-abstract

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('weimar', 0.773), ('potthast', 0.297), ('logs', 0.244), ('matthias', 0.237), ('hagen', 0.22), ('reuse', 0.192), ('crowdsourcing', 0.183), ('germany', 0.173), ('interaction', 0.129), ('martin', 0.116), ('understand', 0.115), ('web', 0.054), ('michael', 0.05), ('text', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web

Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein

Abstract: unknown-abstract

2 0.10352691 355 acl-2013-TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain

Author: Anoop Kunchukuttan ; Rajen Chatterjee ; Shourya Roy ; Abhijit Mishra ; Pushpak Bhattacharyya

Abstract: Large amount of parallel corpora is required for building Statistical Machine Translation (SMT) systems. We describe the TransDoop system for gathering translations to create parallel corpora from online crowd workforce who have familiarity with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing where sentence translation is decomposed into the following smaller tasks: (a) translation of constituent phrases of the sentence; (b) validation of quality of the phrase translations; and (c) composition of complete sentence translations from phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker user interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd’s output using the METEOR metric. For a complex domain like judicial proceedings, the higher scores obtained by the map-reduce based approach compared to complete sentence translation establishes the efficacy of our work.

3 0.055846833 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

Author: Rohan Ramanath ; Monojit Choudhury ; Kalika Bali ; Rishiraj Saha Roy

Abstract: Query segmentation, like text chunking, is the first step towards query understanding. In this study, we explore the effectiveness of crowdsourcing for this task. Through carefully designed control experiments and Inter Annotator Agreement metrics for analysis of experimental data, we show that crowdsourcing may not be a suitable approach for query segmentation because the crowd seems to have a very strong bias towards dividing the query into roughly equal (often only two) parts. Similarly, in the case of hierarchical or nested segmentation, turkers have a strong preference towards balanced binary trees.

4 0.026736367 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

Author: Joanne Boisson ; Ting-Hui Kao ; Jian-Cheng Wu ; Tzu-Hsi Yen ; Jason S. Chang

Abstract: In this paper, we introduce a Web-scale linguistics search engine, Linggle, that retrieves lexical bundles in response to a given query. The query might contain keywords, wildcards, wild parts of speech (PoS), synonyms, and additional regular expression (RE) operators. In our approach, we incorporate inverted file indexing, PoS information from BNC, and semantic indexing based on Latent Dirichlet Allocation with Google Web 1T. The method involves parsing the query to transforming it into several keyword retrieval commands. Word chunks are retrieved with counts, further filtering the chunks with the query as a RE, and finally displaying the results according to the counts, similarities, and topics. Clusters of synonyms or conceptually related words are also provided. In addition, Linggle provides example sentences from The New York Times on demand. The current implementation of Linggle is the most functionally comprehensive, and is in principle language and dataset independent. We plan to extend Linggle to provide fast and convenient access to a wealth of linguistic information embodied in Web scale datasets including Google Web 1T and Google Books Ngram for many major languages in the world.

5 0.026731905 285 acl-2013-Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees

Author: Alan Akbik ; Oresti Konomi ; Michail Melnikov

Abstract: The use of deep syntactic information such as typed dependencies has been shown to be very effective in Information Extraction. Despite this potential, the process of manually creating rule-based information extractors that operate on dependency trees is not intuitive for persons without an extensive NLP background. In this system demonstration, we present a tool and a workflow designed to enable initiate users to interactively explore the effect and expressivity of creating Information Extraction rules over dependency trees. We introduce the proposed five step workflow for creating information extractors, the graph query based rule language, as well as the core features of the PROPMINER tool.

6 0.025343237 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations

7 0.022039253 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems

8 0.021196349 265 acl-2013-Outsourcing FrameNet to the Crowd

9 0.013776715 121 acl-2013-Discovering User Interactions in Ideological Discussions

10 0.013483213 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

11 0.012277381 29 acl-2013-A Visual Analytics System for Cluster Exploration

12 0.012184862 86 acl-2013-Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures

13 0.011452003 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text

14 0.010498324 146 acl-2013-Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications

15 0.010422627 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

16 0.010261946 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE

17 0.010105754 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms

18 0.0097481282 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity

19 0.0094902525 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints

20 0.0092241876 242 acl-2013-Mining Equivalent Relations from Linked Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.015), (1, 0.004), (2, 0.005), (3, -0.008), (4, 0.015), (5, -0.007), (6, -0.0), (7, -0.013), (8, 0.02), (9, 0.007), (10, -0.019), (11, 0.017), (12, 0.002), (13, 0.018), (14, -0.014), (15, -0.029), (16, -0.007), (17, -0.004), (18, 0.033), (19, 0.004), (20, -0.021), (21, 0.0), (22, -0.021), (23, 0.011), (24, -0.007), (25, -0.036), (26, -0.024), (27, 0.021), (28, -0.006), (29, 0.04), (30, -0.034), (31, 0.033), (32, 0.042), (33, -0.03), (34, -0.026), (35, -0.012), (36, -0.024), (37, -0.02), (38, -0.015), (39, -0.043), (40, 0.023), (41, -0.032), (42, -0.035), (43, 0.002), (44, -0.006), (45, 0.017), (46, -0.062), (47, 0.038), (48, -0.062), (49, 0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96702802 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web

Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein

Abstract: unknown-abstract

2 0.46891105 265 acl-2013-Outsourcing FrameNet to the Crowd

Author: Marco Fossati ; Claudio Giuliano ; Sara Tonelli

Abstract: We present the first attempt to perform full FrameNet annotation with crowdsourcing techniques. We compare two approaches: the first one is the standard annotation methodology of lexical units and frame elements in two steps, while the second is a novel approach aimed at acquiring frames in a bottom-up fashion, starting from frame element annotation. We show that our methodology, relying on a single annotation step and on simplified role definitions, outperforms the standard one both in terms of accuracy and time.

3 0.40054697 355 acl-2013-TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain

Author: Anoop Kunchukuttan ; Rajen Chatterjee ; Shourya Roy ; Abhijit Mishra ; Pushpak Bhattacharyya

Abstract: Large amount of parallel corpora is required for building Statistical Machine Translation (SMT) systems. We describe the TransDoop system for gathering translations to create parallel corpora from online crowd workforce who have familiarity with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing where sentence translation is decomposed into the following smaller tasks: (a) translation of constituent phrases of the sentence; (b) validation of quality of the phrase translations; and (c) composition of complete sentence translations from phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker user interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd’s output using the METEOR metric. For a complex domain like judicial proceedings, the higher scores obtained by the map-reduce based approach compared to complete sentence translation establishes the efficacy of our work.

4 0.32399067 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

Author: Rohan Ramanath ; Monojit Choudhury ; Kalika Bali ; Rishiraj Saha Roy

Abstract: Query segmentation, like text chunking, is the first step towards query understanding. In this study, we explore the effectiveness of crowdsourcing for this task. Through carefully designed control experiments and Inter Annotator Agreement metrics for analysis of experimental data, we show that crowdsourcing may not be a suitable approach for query segmentation because the crowd seems to have a very strong bias towards dividing the query into roughly equal (often only two) parts. Similarly, in the case of hierarchical or nested segmentation, turkers have a strong preference towards balanced binary trees.

5 0.28936571 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation

Author: Shachar Mirkin ; Sriram Venkatapathy ; Marc Dymetman ; Ioan Calapodescu

Abstract: The quality of automatic translation is affected by many factors. One is the divergence between the specific source and target languages. Another lies in the source text itself, as some texts are more complex than others. One way to handle such texts is to modify them prior to translation. Yet, an important factor that is often overlooked is the source translatability with respect to the specific translation system and the specific model that are being used. In this paper we present an interactive system where source modifications are induced by confidence estimates that are derived from the translation model in use. Modifications are automatically generated and proposed for the user’s approval. Such a system can reduce post-editing effort, replacing it by cost-effective pre-editing that can be done by monolinguals.

6 0.27302456 263 acl-2013-On the Predictability of Human Assessment: when Matrix Completion Meets NLP Evaluation

7 0.2624968 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems

8 0.26091048 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

9 0.25195581 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations

10 0.24960668 250 acl-2013-Models of Translation Competitions

11 0.24667372 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance

12 0.2399133 224 acl-2013-Learning to Extract International Relations from Political Context

13 0.23760392 128 acl-2013-Does Korean defeat phonotactic word segmentation?

14 0.23073484 64 acl-2013-Automatically Predicting Sentence Translation Difficulty

15 0.23033163 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews

16 0.22438984 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis

17 0.22025293 310 acl-2013-Semantic Frames to Predict Stock Price Movement

18 0.21999089 124 acl-2013-Discriminative state tracking for spoken dialog systems

19 0.21310514 170 acl-2013-GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web

20 0.2118548 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(67, 0.74)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95147634 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web

Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein

Abstract: unknown-abstract

2 0.35147607 138 acl-2013-Enriching Entity Translation Discovery using Selective Temporality

Author: Gae-won You ; Young-rok Cha ; Jinhan Kim ; Seung-won Hwang

Abstract: This paper studies named entity translation and proposes “selective temporality” as a new feature, as using temporal features may be harmful for translating “atemporal” entities. Our key contribution is building an automatic classifier to distinguish temporal and atemporal entities then align them in separate procedures to boost translation accuracy by 6.1%.

3 0.3021498 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis

Author: Rudolf Rosa ; David Marecek ; Ales Tamchyna

Abstract: Deepfix is a statistical post-editing system for improving the quality of statistical machine translation outputs. It attempts to correct errors in verb-noun valency using deep syntactic analysis and a simple probabilistic model of valency. On the English-to-Czech translation pair, we show that statistical post-editing of statistical machine translation leads to an improvement of the translation quality when helped by deep linguistic knowledge.

4 0.29557514 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network

Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu

Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.

5 0.20772509 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

Author: Sebastian Martschat

Abstract: We present an unsupervised model for coreference resolution that casts the problem as a clustering task in a directed labeled weighted multigraph. The model outperforms most systems participating in the English track of the CoNLL’12 shared task.

6 0.20454164 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals

7 0.11394136 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language

8 0.097029559 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

9 0.082633771 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation

10 0.073321164 294 acl-2013-Re-embedding words

11 0.064736202 219 acl-2013-Learning Entity Representation for Entity Disambiguation

12 0.064220652 84 acl-2013-Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling

13 0.054257691 275 acl-2013-Parsing with Compositional Vector Grammars

14 0.050495617 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

15 0.037409943 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation

16 0.0 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints

17 0.0 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

18 0.0 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.

19 0.0 4 acl-2013-A Context Free TAG Variant

20 0.0 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art