acl acl2013 acl2013-100 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein
Abstract: unknown-abstract
Reference: text
sentIndex sentText sentNum sentScore
wordName wordTfidf (topN-words)
[('weimar', 0.773), ('potthast', 0.297), ('logs', 0.244), ('matthias', 0.237), ('hagen', 0.22), ('reuse', 0.192), ('crowdsourcing', 0.183), ('germany', 0.173), ('interaction', 0.129), ('martin', 0.116), ('understand', 0.115), ('web', 0.054), ('michael', 0.05), ('text', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web
Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein
Abstract: unknown-abstract
2 0.10352691 355 acl-2013-TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain
Author: Anoop Kunchukuttan ; Rajen Chatterjee ; Shourya Roy ; Abhijit Mishra ; Pushpak Bhattacharyya
Abstract: Large amount of parallel corpora is required for building Statistical Machine Translation (SMT) systems. We describe the TransDoop system for gathering translations to create parallel corpora from online crowd workforce who have familiarity with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing where sentence translation is decomposed into the following smaller tasks: (a) translation of constituent phrases of the sentence; (b) validation of quality of the phrase translations; and (c) composition of complete sentence translations from phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker user interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd’s output using the METEOR metric. For a complex domain like judicial proceedings, the higher scores obtained by the map-reduce based approach compared to complete sentence translation establishes the efficacy of our work.
3 0.055846833 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation
Author: Rohan Ramanath ; Monojit Choudhury ; Kalika Bali ; Rishiraj Saha Roy
Abstract: Query segmentation, like text chunking, is the first step towards query understanding. In this study, we explore the effectiveness of crowdsourcing for this task. Through carefully designed control experiments and Inter Annotator Agreement metrics for analysis of experimental data, we show that crowdsourcing may not be a suitable approach for query segmentation because the crowd seems to have a very strong bias towards dividing the query into roughly equal (often only two) parts. Similarly, in the case of hierarchical or nested segmentation, turkers have a strong preference towards balanced binary trees.
4 0.026736367 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
Author: Joanne Boisson ; Ting-Hui Kao ; Jian-Cheng Wu ; Tzu-Hsi Yen ; Jason S. Chang
Abstract: In this paper, we introduce a Web-scale linguistics search engine, Linggle, that retrieves lexical bundles in response to a given query. The query might contain keywords, wildcards, wild parts of speech (PoS), synonyms, and additional regular expression (RE) operators. In our approach, we incorporate inverted file indexing, PoS information from BNC, and semantic indexing based on Latent Dirichlet Allocation with Google Web 1T. The method involves parsing the query to transform it into several keyword retrieval commands. Word chunks are retrieved with counts, further filtering the chunks with the query as a RE, and finally displaying the results according to the counts, similarities, and topics. Clusters of synonyms or conceptually related words are also provided. In addition, Linggle provides example sentences from The New York Times on demand. The current implementation of Linggle is the most functionally comprehensive, and is in principle language and dataset independent. We plan to extend Linggle to provide fast and convenient access to a wealth of linguistic information embodied in Web scale datasets including Google Web 1T and Google Books Ngram for many major languages in the world.
5 0.026731905 285 acl-2013-Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees
Author: Alan Akbik ; Oresti Konomi ; Michail Melnikov
Abstract: The use of deep syntactic information such as typed dependencies has been shown to be very effective in Information Extraction. Despite this potential, the process of manually creating rule-based information extractors that operate on dependency trees is not intuitive for persons without an extensive NLP background. In this system demonstration, we present a tool and a workflow designed to enable initiate users to interactively explore the effect and expressivity of creating Information Extraction rules over dependency trees. We introduce the proposed five step workflow for creating information extractors, the graph query based rule language, as well as the core features of the PROPMINER tool.
6 0.025343237 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
7 0.022039253 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems
8 0.021196349 265 acl-2013-Outsourcing FrameNet to the Crowd
9 0.013776715 121 acl-2013-Discovering User Interactions in Ideological Discussions
10 0.013483213 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
11 0.012277381 29 acl-2013-A Visual Analytics System for Cluster Exploration
12 0.012184862 86 acl-2013-Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
13 0.011452003 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text
15 0.010422627 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking
16 0.010261946 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE
17 0.010105754 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
18 0.0097481282 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity
19 0.0094902525 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints
20 0.0092241876 242 acl-2013-Mining Equivalent Relations from Linked Data
topicId topicWeight
[(0, 0.015), (1, 0.004), (2, 0.005), (3, -0.008), (4, 0.015), (5, -0.007), (6, -0.0), (7, -0.013), (8, 0.02), (9, 0.007), (10, -0.019), (11, 0.017), (12, 0.002), (13, 0.018), (14, -0.014), (15, -0.029), (16, -0.007), (17, -0.004), (18, 0.033), (19, 0.004), (20, -0.021), (21, 0.0), (22, -0.021), (23, 0.011), (24, -0.007), (25, -0.036), (26, -0.024), (27, 0.021), (28, -0.006), (29, 0.04), (30, -0.034), (31, 0.033), (32, 0.042), (33, -0.03), (34, -0.026), (35, -0.012), (36, -0.024), (37, -0.02), (38, -0.015), (39, -0.043), (40, 0.023), (41, -0.032), (42, -0.035), (43, 0.002), (44, -0.006), (45, 0.017), (46, -0.062), (47, 0.038), (48, -0.062), (49, 0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.96702802 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web
Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein
Abstract: unknown-abstract
2 0.46891105 265 acl-2013-Outsourcing FrameNet to the Crowd
Author: Marco Fossati ; Claudio Giuliano ; Sara Tonelli
Abstract: We present the first attempt to perform full FrameNet annotation with crowdsourcing techniques. We compare two approaches: the first one is the standard annotation methodology of lexical units and frame elements in two steps, while the second is a novel approach aimed at acquiring frames in a bottom-up fashion, starting from frame element annotation. We show that our methodology, relying on a single annotation step and on simplified role definitions, outperforms the standard one both in terms of accuracy and time.
3 0.40054697 355 acl-2013-TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain
Author: Anoop Kunchukuttan ; Rajen Chatterjee ; Shourya Roy ; Abhijit Mishra ; Pushpak Bhattacharyya
Abstract: Large amount of parallel corpora is required for building Statistical Machine Translation (SMT) systems. We describe the TransDoop system for gathering translations to create parallel corpora from online crowd workforce who have familiarity with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing where sentence translation is decomposed into the following smaller tasks: (a) translation of constituent phrases of the sentence; (b) validation of quality of the phrase translations; and (c) composition of complete sentence translations from phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker user interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd’s output using the METEOR metric. For a complex domain like judicial proceedings, the higher scores obtained by the map-reduce based approach compared to complete sentence translation establishes the efficacy of our work.
Author: Rohan Ramanath ; Monojit Choudhury ; Kalika Bali ; Rishiraj Saha Roy
Abstract: Query segmentation, like text chunking, is the first step towards query understanding. In this study, we explore the effectiveness of crowdsourcing for this task. Through carefully designed control experiments and Inter Annotator Agreement metrics for analysis of experimental data, we show that crowdsourcing may not be a suitable approach for query segmentation because the crowd seems to have a very strong bias towards dividing the query into roughly equal (often only two) parts. Similarly, in the case of hierarchical or nested segmentation, turkers have a strong preference towards balanced binary trees.
5 0.28936571 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation
Author: Shachar Mirkin ; Sriram Venkatapathy ; Marc Dymetman ; Ioan Calapodescu
Abstract: The quality of automatic translation is affected by many factors. One is the divergence between the specific source and target languages. Another lies in the source text itself, as some texts are more complex than others. One way to handle such texts is to modify them prior to translation. Yet, an important factor that is often overlooked is the source translatability with respect to the specific translation system and the specific model that are being used. In this paper we present an interactive system where source modifications are induced by confidence estimates that are derived from the translation model in use. Modifications are automatically generated and proposed for the user’s approval. Such a system can reduce postediting effort, replacing it by cost-effective pre-editing that can be done by monolinguals.
6 0.27302456 263 acl-2013-On the Predictability of Human Assessment: when Matrix Completion Meets NLP Evaluation
7 0.2624968 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems
8 0.26091048 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
9 0.25195581 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
10 0.24960668 250 acl-2013-Models of Translation Competitions
11 0.24667372 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance
12 0.2399133 224 acl-2013-Learning to Extract International Relations from Political Context
13 0.23760392 128 acl-2013-Does Korean defeat phonotactic word segmentation?
14 0.23073484 64 acl-2013-Automatically Predicting Sentence Translation Difficulty
15 0.23033163 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews
16 0.22438984 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis
17 0.22025293 310 acl-2013-Semantic Frames to Predict Stock Price Movement
18 0.21999089 124 acl-2013-Discriminative state tracking for spoken dialog systems
19 0.21310514 170 acl-2013-GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web
20 0.2118548 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
topicId topicWeight
[(67, 0.74)]
simIndex simValue paperId paperTitle
same-paper 1 0.95147634 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web
Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein
Abstract: unknown-abstract
2 0.35147607 138 acl-2013-Enriching Entity Translation Discovery using Selective Temporality
Author: Gae-won You ; Young-rok Cha ; Jinhan Kim ; Seung-won Hwang
Abstract: This paper studies named entity translation and proposes “selective temporality” as a new feature, as using temporal features may be harmful for translating “atemporal” entities. Our key contribution is building an automatic classifier to distinguish temporal and atemporal entities then align them in separate procedures to boost translation accuracy by 6.1%.
Author: Rudolf Rosa ; David Marecek ; Ales Tamchyna
Abstract: Deepfix is a statistical post-editing system for improving the quality of statistical machine translation outputs. It attempts to correct errors in verb-noun valency using deep syntactic analysis and a simple probabilistic model of valency. On the English-to-Czech translation pair, we show that statistical post-editing of statistical machine translation leads to an improvement of the translation quality when helped by deep linguistic knowledge.
4 0.29557514 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
5 0.20772509 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
Author: Sebastian Martschat
Abstract: We present an unsupervised model for coreference resolution that casts the problem as a clustering task in a directed labeled weighted multigraph. The model outperforms most systems participating in the English track of the CoNLL’12 shared task.
6 0.20454164 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals
7 0.11394136 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
8 0.097029559 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
9 0.082633771 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
10 0.073321164 294 acl-2013-Re-embedding words
11 0.064736202 219 acl-2013-Learning Entity Representation for Entity Disambiguation
12 0.064220652 84 acl-2013-Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling
13 0.054257691 275 acl-2013-Parsing with Compositional Vector Grammars
14 0.050495617 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
15 0.037409943 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
18 0.0 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.
19 0.0 4 acl-2013-A Context Free TAG Variant