acl acl2013 acl2013-239 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pedro Fialho ; Luisa Coheur ; Sergio Curto ; Pedro Claudio ; Angela Costa ; Alberto Abad ; Hugo Meinedo ; Isabel Trancoso
Abstract: In this paper we describe a platform for embodied conversational agents with tutoring goals, which takes as input written and spoken questions and outputs answers in both forms. The platform is developed within a game environment, and currently allows speech recognition and synthesis in Portuguese, English and Spanish. In this paper we focus on its understanding component that supports in-domain interactions, and also small talk. Most in-domain interactions are answered using different similarity metrics, which compare the perceived utterances with questions/sentences in the agent’s knowledge base; small-talk capabilities are mainly due to AIML, a language largely used by the chatbots’ community. In this paper we also introduce EDGAR, the butler of MONSERRATE, which was developed in the aforementioned platform, and that answers tourists’ questions about MONSERRATE.
Reference: text
sentIndex sentText sentNum sentScore
1 Meet EDGAR, a tutoring agent at MONSERRATE Pedro Fialho, Luísa Coheur, Sérgio Curto, Pedro Cláudio, Ângela Costa, Alberto Abad, Hugo Meinedo and Isabel Trancoso Spoken Language Systems Lab (L2F), INESC-ID Rua Alves Redol 9 1000-029 Lisbon, Portugal name . [sent-1, score-0.195]
2 pt 2 Abstract In this paper we describe a platform for embodied conversational agents with tutoring goals, which takes as input written and spoken questions and outputs answers in both forms. [sent-4, score-0.726]
3 The platform is developed within a game environment, and currently allows speech recognition and synthesis in Portuguese, English and Spanish. [sent-5, score-0.367]
4 Most in-domain interactions are answered using different similarity metrics, which compare the perceived utterances with questions/sentences in the agent’s knowledge base; small-talk capabilities are mainly due to AIML, a language largely used by the chatbots’ community. [sent-7, score-0.169]
5 In this paper we also introduce EDGAR, the butler of MONSERRATE, which was developed in the aforementioned platform, and that answers tourists’ questions about MONSERRATE. [sent-8, score-0.239]
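The similarity-based answering described in the abstract can be illustrated with a minimal sketch. The source does not say which similarity metrics are used, so the Jaccard token overlap below, the function names, the threshold value and the placeholder answers are all assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between the token sets of two sentences (assumed metric)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_answer(utterance, knowledge_base, threshold=0.3):
    """Return the answer paired with the most similar stored question.

    knowledge_base maps stored questions to answers (hypothetical layout).
    Returns None when nothing is similar enough.
    """
    best_q = max(knowledge_base, key=lambda q: jaccard(utterance, q), default=None)
    if best_q is None or jaccard(utterance, best_q) < threshold:
        return None
    return knowledge_base[best_q]


# Placeholder knowledge base; the answers are dummies, not facts about MONSERRATE.
kb = {
    "Who built the palace?": "placeholder answer A",
    "When does the park open?": "placeholder answer B",
}
```

With this toy knowledge base, `best_answer("when does the park open today", kb)` selects the second entry, while an unrelated input such as `"hello"` falls below the threshold and yields `None`, which a real system might hand over to its small-talk component.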
6 1 Introduction Several initiatives have been taking place in the last years, targeting the concept of Edutainment, that is, education through entertainment. [sent-9, score-0.031]
7 , 2011), and Sergeant Blackwell, installed in the Cooper-Hewitt National Design Museum in New York, is used by the U. [sent-11, score-0.03]
8 , 2009) and EDGAR are also examples of virtual characters for the Portuguese language with the same edutainment goal: DuARTE Digital answers questions about Custódia de Belém, a famous piece of Portuguese jewelry; EDGAR is a virtual butler that answers questions about MONSERRATE (Figure 1). [sent-17, score-0.753]
9 However, as expected, people also tend to make small talk when interacting with these agents. [sent-20, score-0.117]
10 In this paper, we describe the platform behind EDGAR, which we developed aiming at the fast insertion of in-domain knowledge, and to deal with small talk. [sent-23, score-0.244]
11 This platform is currently in the process of being industrially applied by a company known for its expertise in building and deploying kiosks. [sent-24, score-0.238]
12 We will provide the hardware and software required to demonstrate EDGAR, both on a computer and on a tablet. [sent-25, score-0.035]
13 This paper is organized as follows: in Section 2 we present EDGAR’s development platform [p. 61; Proceedings of the 51st Annual Meeting of ..., Sofia, Bulgaria, August 4-9, 2013]. [sent-26, score-0.208]
14 2 The Embodied Conversational Agent platform 2. [sent-30, score-0.208]
15 1 Architecture overview The architecture of the platform, generally designed for the development of Embodied Conversational Agents (ECAs) (such as EDGAR), is shown in Figure 2. [sent-31, score-0.04]
16 In this platform, several modules intercommunicate by means of well defined protocols, thus leveraging the capabilities of independent modules focused on specific tasks, such as speech recognition or 3D rendering/animation. [sent-32, score-0.216]
17 This independence allows us to use subsets of the platform’s modules in scenarios with different requirements (for instance, we can record characters uttering a text). [sent-33, score-0.35]
18 Design and deployment of the front end of EDGAR is performed in a game engine, which has enabled the use of computer graphics technologies and high quality assets, as seen in the video game industry. [sent-34, score-0.239]
19 2 Multimodal components The game environment, where all the interaction with EDGAR takes place, is developed in the Unity platform (footnote 1: http://unity3d.com/), being composed of one highly [sent-36, score-0.123]
20 detailed character, made and animated by Rocketbox Studios (footnote 2), a virtual keyboard and a push-while-talking button. [sent-37, score-0.335]
21 Language models were interpolated with all the domain questions defined in the Natural Language Understanding (NLU) framework (see below), while ASR includes features such as speech/nonspeech (SNS) detection and automatic gain control (AGC). [sent-41, score-0.084]
22 Speech captured in a public space raises several ASR robustness issues, such as loudness variability of spoken utterances, which is particularly likely in a museum environment (such as MONSERRATE) where silence is usually encouraged. [sent-42, score-0.208]
23 Thus, we have added a bounded amplification to the captured signal, in addition to the AGC mechanism, ensuring that sounds that are too quiet are not discarded by the SNS mechanism. [sent-43, score-0.029]
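One possible reading of "bounded amplification" is a gain that raises quiet input toward a target level but never exceeds a fixed maximum. The sketch below follows that reading; the source gives no parameters, so `target_peak`, `max_gain` and the function name are hypothetical, and the actual implementation may work differently.

```python
def bounded_gain(samples, target_peak=0.5, max_gain=8.0):
    """Amplify samples (floats in [-1, 1]) toward target_peak with a capped gain.

    Assumed behaviour, not taken from the paper:
    - the gain never exceeds max_gain, so background noise is not boosted without limit;
    - the gain never drops below 1.0, so loud input is left untouched.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to amplify
    gain = min(max_gain, target_peak / peak)
    gain = max(gain, 1.0)
    return [s * gain for s in samples]
```

The cap is the design point: a very quiet utterance is lifted enough to pass a speech/non-speech detector, while the amplification factor stays bounded.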
24 Upon a spoken input, AUDIMUS translates it into a sentence, with a confidence value. [sent-44, score-0.05]
25 An empty recognition result, or one with low confidence, triggers a control tag (“ REPEAT ”) to the NLU module, which results in a request for the user to repeat what was said. [sent-45, score-0.04]
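The fallback just described (an empty or low-confidence recognition result is replaced by a control tag) can be sketched as follows. The threshold value, the function name and the exact spelling of the tag are assumptions; the source only states that a "REPEAT" control tag is sent to the NLU module.

```python
REPEAT_TAG = "REPEAT"  # exact tag spelling in the real system is an assumption


def to_nlu_input(transcript, confidence, min_confidence=0.5):
    """Return what is forwarded to the NLU module for one recognition result.

    An empty transcript, or one whose confidence is below min_confidence
    (hypothetical threshold), is replaced by the control tag so that the
    agent asks the user to repeat.
    """
    if not transcript.strip() or confidence < min_confidence:
        return REPEAT_TAG
    return transcript
```

For example, `to_nlu_input("", 0.9)` and `to_nlu_input("hello", 0.2)` both yield the control tag, whereas `to_nlu_input("hello", 0.8)` passes the transcript through unchanged.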
26 The answer returned by the NLU module is synthesized by a language-dependent Text-To-Speech (TTS) system, with DIXI (Paulo et al. [sent-46, score-0.166]
27 synthesized audio is played while the corresponding phonemes are mapped into visemes, represented as skeletal animations, being synchronized according to phoneme durations, available in all the employed TTS engines. [sent-51, score-0.216]
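The synchronization of visemes with phoneme durations can be pictured as building a timeline from the (phoneme, duration) pairs a TTS engine reports. The sketch below is illustrative only: the phoneme-to-viseme table, the viseme names and the function name are hypothetical and do not come from the source.

```python
# Hypothetical phoneme-to-viseme table; a real system would cover the full phone set.
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "a": "open",
    "o": "round", "u": "round",
}


def viseme_timeline(phonemes):
    """Turn (phoneme, duration_in_seconds) pairs into (start, end, viseme) tuples.

    Each viseme starts where the previous phoneme ended, so the animation
    stays aligned with the synthesized audio.
    """
    timeline = []
    t = 0.0
    for phoneme, duration in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        timeline.append((t, t + duration, viseme))
        t += duration
    return timeline
```

An animation layer could then trigger each viseme at its start time while the audio plays; unknown phonemes fall back to a neutral mouth shape in this sketch.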
28 Emotions are declared in the knowledge sources of the agent. [sent-52, score-0.078]
29 Figure 3: The EDGAR character in a joyful state. [sent-54, score-0.056]
30 3 Interacting with EDGAR In a typical interaction, the user enters a question with a virtual keyboard or says it to the microphone while pressing a button (Figure 4), in the language chosen in the interface (as previously said, Portuguese, English or Spanish). [sent-56, score-0.287]
31 Figure 4: A question written in the EDGAR interface. [sent-57, score-0.028]
32 Then, the ASR will transcribe it and the NLU module will process it. [sent-58, score-0.102]
33 Afterwards, the answer, chosen by the NLU module, is heard through the speakers, due to the TTS, and sequentially written in a talk bubble, according to the produced speech. [sent-59, score-0.118]
34 The answer is accompanied with visemes, represented by movements of the character’s mouth/lips, and by facial emotions as marked in the answers of the NLU knowledge base. [sent-60, score-0.185]
35 A demo of EDGAR, only for English interactions, can be tested in https : / / edgar . [sent-61, score-0.711]
36 1 In-domain knowledge sources The in-domain knowledge sources of the agent are XML files, hand-crafted by domain experts. [sent-67, score-0.184]
37 These XML files contain multilingual pairs consisting of different paraphrases of the same question and its possible answers. [sent-68, score-0.074]
38 The main reason for following this approach (in contrast to other works, where grammars are used) is to ease the process of creating and enriching the knowledge sources of the agent being developed, which is typically done by non-experts in linguistics or computer science. [sent-69, score-0.137]
39 , 2006), where the agent’s knowledge sources are easy to create and maintain. [sent-71, score-0.148]
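To make the idea of hand-crafted XML knowledge sources concrete, the sketch below parses a small file holding question paraphrases and answers in several languages. The source does not show the real schema, so every element and attribute name here (`knowledge`, `pair`, `question`, `answer`, `lang`, `emotion`) is an assumption, as is storing the emotion as an attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical knowledge source: one pair with paraphrases and answers per language.
SAMPLE = """<knowledge>
  <pair emotion="joy">
    <question lang="en">Who are you?</question>
    <question lang="en">What is your name?</question>
    <question lang="pt">Como te chamas?</question>
    <answer lang="en">I am EDGAR, the butler.</answer>
    <answer lang="pt">Eu sou o EDGAR, o mordomo.</answer>
  </pair>
</knowledge>"""


def load_pairs(xml_text, lang):
    """Return the question/answer pairs available in the requested language."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.findall("pair"):
        questions = [q.text for q in pair.findall("question") if q.get("lang") == lang]
        answers = [a.text for a in pair.findall("answer") if a.get("lang") == lang]
        if questions and answers:
            pairs.append({
                "questions": questions,
                "answers": answers,
                "emotion": pair.get("emotion"),
            })
    return pairs
```

Calling `load_pairs(SAMPLE, "en")` returns one pair with two paraphrases and one answer; a language with no entries, such as `"es"` in this sample, yields an empty list. A flat layout like this is easy for non-experts to extend by adding another `question` line.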
wordName wordTfidf (topN-words)
[('edgar', 0.711), ('nlu', 0.225), ('platform', 0.208), ('monserrate', 0.169), ('virtual', 0.148), ('meinedo', 0.127), ('animated', 0.112), ('portuguese', 0.107), ('asr', 0.107), ('agents', 0.101), ('embodied', 0.093), ('agent', 0.09), ('tts', 0.089), ('game', 0.087), ('interactions', 0.086), ('agc', 0.085), ('audimus', 0.085), ('edutainment', 0.085), ('visemes', 0.085), ('butler', 0.075), ('keyboard', 0.075), ('answers', 0.073), ('museums', 0.069), ('duarte', 0.069), ('module', 0.067), ('modules', 0.066), ('sns', 0.065), ('synthesized', 0.065), ('interacting', 0.062), ('tutoring', 0.059), ('conversational', 0.059), ('character', 0.056), ('talk', 0.055), ('questions', 0.055), ('environment', 0.055), ('pedro', 0.052), ('spoken', 0.05), ('ine', 0.048), ('capabilities', 0.048), ('emotions', 0.047), ('sources', 0.047), ('audio', 0.046), ('multimodal', 0.046), ('xml', 0.045), ('characters', 0.041), ('repeat', 0.04), ('architecture', 0.04), ('digital', 0.038), ('skeletal', 0.037), ('constituted', 0.037), ('tales', 0.037), ('bubble', 0.037), ('synchronized', 0.037), ('animations', 0.037), ('leuski', 0.037), ('silence', 0.037), ('alves', 0.037), ('assets', 0.037), ('ergio', 0.037), ('loudness', 0.037), ('recruiting', 0.037), ('sergeant', 0.037), ('files', 0.037), ('developed', 0.036), ('speech', 0.036), ('utterances', 0.035), ('hardware', 0.035), ('paulo', 0.035), ('graphics', 0.035), ('heard', 0.035), ('heinz', 0.035), ('tourists', 0.035), ('mendes', 0.035), ('uttering', 0.035), ('pressing', 0.035), ('jewelry', 0.035), ('transcribe', 0.035), ('answer', 0.034), ('robinson', 0.033), ('durations', 0.033), ('trancoso', 0.033), ('zen', 0.033), ('costa', 0.033), ('establishing', 0.031), ('declared', 0.031), ('phonemes', 0.031), ('initiatives', 0.031), ('andersen', 0.031), ('isabel', 0.031), ('protocols', 0.031), ('facial', 0.031), ('deployment', 0.03), ('museum', 0.03), ('deploying', 0.03), ('installed', 0.03), ('captured', 0.029), ('surname', 0.029), ('interpolated', 0.029), 
('button', 0.029), ('written', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE
Author: Pedro Fialho ; Luisa Coheur ; Sergio Curto ; Pedro Claudio ; Angela Costa ; Alberto Abad ; Hugo Meinedo ; Isabel Trancoso
Abstract: In this paper we describe a platform for embodied conversational agents with tutoring goals, which takes as input written and spoken questions and outputs answers in both forms. The platform is developed within a game environment, and currently allows speech recognition and synthesis in Portuguese, English and Spanish. In this paper we focus on its understanding component that supports in-domain interactions, and also small talk. Most in-domain interactions are answered using different similarity metrics, which compare the perceived utterances with questions/sentences in the agent’s knowledge base; small-talk capabilities are mainly due to AIML, a language largely used by the chatbots’ community. In this paper we also introduce EDGAR, the butler of MONSERRATE, which was developed in the aforementioned platform, and that answers tourists’ questions about MONSERRATE.
2 0.077382654 51 acl-2013-AnnoMarket: An Open Cloud Platform for NLP
Author: Valentin Tablan ; Kalina Bontcheva ; Ian Roberts ; Hamish Cunningham ; Marin Dimitrov
Abstract: This paper presents AnnoMarket, an open cloud-based platform which enables researchers to deploy, share, and use language processing components and resources, following the data-as-a-service and software-as-a-service paradigms. The focus is on multilingual text analysis resources and services, based on an open-source infrastructure and compliant with relevant NLP standards. We demonstrate how the AnnoMarket platform can be used to develop NLP applications with little or no programming, to index the results for enhanced browsing and search, and to evaluate performance. Utilising AnnoMarket is straightforward, since cloud infrastructural issues are dealt with by the platform, completely transparently to the user: load balancing, efficient data upload and storage, deployment on the virtual machines, security, and fault tolerance.
3 0.061797388 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
Author: Veronica Perez-Rosas ; Rada Mihalcea ; Louis-Philippe Morency
Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.
4 0.055795006 150 acl-2013-Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications
Author: Georgios Kontonatsios ; Paul Thompson ; Riza Theresa Batista-Navarro ; Claudiu Mihaila ; Ioannis Korkontzelos ; Sophia Ananiadou
Abstract: U-Compare is a UIMA-based workflow construction platform for building natural language processing (NLP) applications from heterogeneous language resources (LRs), without the need for programming skills. U-Compare has been adopted within the context of the METANET Network of Excellence, and over 40 LRs that process 15 European languages have been added to the U-Compare component library. In line with METANET’s aims of increasing communication between citizens of different European countries, U-Compare has been extended to facilitate the development of a wider range of applications, including both multilingual and multimodal workflows. The enhancements exploit the UIMA Subject of Analysis (Sofa) mechanism, that allows different facets of the input data to be represented. We demonstrate how our customised extensions to U-Compare allow the construction and testing of NLP applications that transform the input data in different ways, e.g., machine translation, automatic summarisation and text-to-speech.
5 0.05451737 203 acl-2013-Is word-to-phone mapping better than phone-phone mapping for handling English words?
Author: Naresh Kumar Elluru ; Anandaswarup Vadapalli ; Raghavendra Elluru ; Hema Murthy ; Kishore Prahallad
Abstract: In this paper, we relook at the problem of pronunciation of English words using native phone set. Specifically, we investigate methods of pronouncing English words using Telugu phoneset in the context of Telugu Text-to-Speech. We compare phone-phone substitution and word-phone mapping for pronunciation of English words using Telugu phones. We are not considering other than native language phoneset in all our experiments. This differentiates our approach from other works in polyglot speech synthesis.
6 0.050872732 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions
7 0.047159422 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs
8 0.043136027 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
9 0.04073678 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews
10 0.040130872 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
11 0.037017092 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
12 0.03687026 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
13 0.036584996 62 acl-2013-Automatic Term Ambiguity Detection
14 0.035402611 184 acl-2013-Identification of Speakers in Novels
15 0.035080738 220 acl-2013-Learning Latent Personas of Film Characters
16 0.03380983 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
17 0.033098582 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
18 0.03294757 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
19 0.032678224 292 acl-2013-Question Classification Transfer
20 0.032556627 141 acl-2013-Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation
topicId topicWeight
[(0, 0.071), (1, 0.029), (2, -0.01), (3, -0.023), (4, 0.029), (5, -0.009), (6, 0.027), (7, -0.07), (8, 0.051), (9, 0.036), (10, -0.041), (11, -0.023), (12, -0.018), (13, -0.009), (14, -0.01), (15, -0.053), (16, -0.02), (17, 0.034), (18, -0.004), (19, -0.041), (20, -0.068), (21, -0.083), (22, -0.001), (23, -0.007), (24, -0.025), (25, 0.002), (26, 0.061), (27, -0.004), (28, 0.059), (29, -0.026), (30, 0.037), (31, 0.003), (32, -0.052), (33, 0.023), (34, 0.025), (35, 0.047), (36, -0.024), (37, 0.001), (38, -0.005), (39, 0.051), (40, -0.014), (41, -0.006), (42, 0.033), (43, 0.003), (44, -0.059), (45, 0.057), (46, 0.021), (47, -0.001), (48, 0.013), (49, -0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.92651254 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE
Author: Pedro Fialho ; Luisa Coheur ; Sergio Curto ; Pedro Claudio ; Angela Costa ; Alberto Abad ; Hugo Meinedo ; Isabel Trancoso
Abstract: In this paper we describe a platform for embodied conversational agents with tutoring goals, which takes as input written and spoken questions and outputs answers in both forms. The platform is developed within a game environment, and currently allows speech recognition and synthesis in Portuguese, English and Spanish. In this paper we focus on its understanding component that supports in-domain interactions, and also small talk. Most in-domain interactions are answered using different similarity metrics, which compare the perceived utterances with questions/sentences in the agent’s knowledge base; small-talk capabilities are mainly due to AIML, a language largely used by the chatbots’ community. In this paper we also introduce EDGAR, the butler of MONSERRATE, which was developed in the aforementioned platform, and that answers tourists’ questions about MONSERRATE.
Author: Srinivasan Janarthanam ; Oliver Lemon ; Phil Bartie ; Tiphaine Dalmas ; Anna Dickinson ; Xingkun Liu ; William Mackaness ; Bonnie Webber
Abstract: We present a city navigation and tourist information mobile dialogue app with integrated question-answering (QA) and geographic information system (GIS) modules that helps pedestrian users to navigate in and learn about urban environments. In contrast to existing mobile apps which treat these problems independently, our Android app addresses the problem of navigation and touristic question-answering in an integrated fashion using a shared dialogue context. We evaluated our system in comparison with Samsung S-Voice (which interfaces to Google navigation and Google search) with 17 users and found that users judged our system to be significantly more interesting to interact with and learn from. They also rated our system above Google search (with the Samsung S-Voice interface) for tourist information tasks.
Author: Georgios Kontonatsios ; Paul Thompson ; Riza Theresa Batista-Navarro ; Claudiu Mihaila ; Ioannis Korkontzelos ; Sophia Ananiadou
Abstract: U-Compare is a UIMA-based workflow construction platform for building natural language processing (NLP) applications from heterogeneous language resources (LRs), without the need for programming skills. U-Compare has been adopted within the context of the METANET Network of Excellence, and over 40 LRs that process 15 European languages have been added to the U-Compare component library. In line with METANET’s aims of increasing communication between citizens of different European countries, U-Compare has been extended to facilitate the development of a wider range of applications, including both multilingual and multimodal workflows. The enhancements exploit the UIMA Subject of Analysis (Sofa) mechanism, that allows different facets of the input data to be represented. We demonstrate how our customised extensions to U-Compare allow the construction and testing of NLP applications that transform the input data in different ways, e.g., machine translation, automatic summarisation and text-to-speech.
4 0.58827651 118 acl-2013-Development and Analysis of NLP Pipelines in Argo
Author: Rafal Rak ; Andrew Rowley ; Jacob Carter ; Sophia Ananiadou
Abstract: Developing sophisticated NLP pipelines composed of multiple processing tools and components available through different providers may pose a challenge in terms of their interoperability. The Unstructured Information Management Architecture (UIMA) is an industry standard whose aim is to ensure such interoperability by defining common data structures and interfaces. The architecture has been gaining attention from industry and academia alike, resulting in a large volume of UIMA-compliant processing components. In this paper, we demonstrate Argo, a Web-based workbench for the development and processing of NLP pipelines/workflows. The workbench is based upon UIMA, and thus has the potential of using many of the existing UIMA resources. We present features, and show examples, of facilitating the distributed development of components and the analysis of processing results. The latter includes annotation visualisers and editors, as well as serialisation to RDF format, which enables flexible querying in addition to data manipulation thanks to the semantic query language SPARQL. The distributed development feature allows users to seamlessly connect their tools to workflows running in Argo, and thus take advantage of both the available library of components (without the need of installing them locally) and the analytical tools.
5 0.56316465 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs
Author: Adam Vogel ; Christopher Potts ; Dan Jurafsky
Abstract: Conversational implicatures involve reasoning about multiply nested belief structures. This complexity poses significant challenges for computational models of conversation and cognition. We show that agents in the multi-agent Decentralized-POMDP reach implicature-rich interpretations simply as a by-product of the way they reason about each other to maximize joint utility. Our simulations involve a reference game of the sort studied in psychology and linguistics as well as a dynamic, interactional scenario involving implemented artificial agents.
6 0.55427408 51 acl-2013-AnnoMarket: An Open Cloud Platform for NLP
7 0.54879487 184 acl-2013-Identification of Speakers in Novels
8 0.53178799 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
9 0.5029881 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions
10 0.50180012 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue
11 0.49179435 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic
12 0.48934001 86 acl-2013-Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
13 0.48003182 203 acl-2013-Is word-to-phone mapping better than phone-phone mapping for handling English words?
14 0.46098974 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
15 0.42938685 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
16 0.41401526 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections
17 0.40406668 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study
18 0.39853576 163 acl-2013-From Natural Language Specifications to Program Input Parsers
19 0.38000792 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics
20 0.37750447 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
topicId topicWeight
[(0, 0.093), (6, 0.035), (11, 0.032), (15, 0.018), (24, 0.045), (26, 0.046), (27, 0.357), (28, 0.02), (35, 0.055), (42, 0.036), (48, 0.023), (70, 0.044), (88, 0.022), (90, 0.041), (95, 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.77234983 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE
Author: Pedro Fialho ; Luisa Coheur ; Sergio Curto ; Pedro Claudio ; Angela Costa ; Alberto Abad ; Hugo Meinedo ; Isabel Trancoso
Abstract: In this paper we describe a platform for embodied conversational agents with tutoring goals, which takes as input written and spoken questions and outputs answers in both forms. The platform is developed within a game environment, and currently allows speech recognition and synthesis in Portuguese, English and Spanish. In this paper we focus on its understanding component that supports in-domain interactions, and also small talk. Most in-domain interactions are answered using different similarity metrics, which compare the perceived utterances with questions/sentences in the agent’s knowledge base; small-talk capabilities are mainly due to AIML, a language largely used by the chatbots’ community. In this paper we also introduce EDGAR, the butler of MONSERRATE, which was developed in the aforementioned platform, and that answers tourists’ questions about MONSERRATE.
2 0.44059238 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
Author: Chen Li ; Xian Qian ; Yang Liu
Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.
3 0.37697658 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
Author: Young-Bum Kim ; Benjamin Snyder
Abstract: In this paper, we present a solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet. Adopting a classical Bayesian perspective, we perform posterior inference over hundreds of languages, leveraging knowledge of known languages and alphabets to uncover general linguistic patterns of typologically coherent language clusters. We achieve average accuracy in the unsupervised consonant/vowel prediction task of 99% across 503 languages. We further show that our methodology can be used to predict more fine-grained phonetic distinctions. On a three-way classification task between vowels, nasals, and nonnasal consonants, our model yields unsupervised accuracy of 89% across the same set of languages.
4 0.37613928 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
Author: Vasile Rus ; Mihai Lintean ; Rajendra Banjade ; Nobal Niraula ; Dan Stefanescu
Abstract: We present in this paper SEMILAR, the SEMantic simILARity toolkit. SEMILAR implements a number of algorithms for assessing the semantic similarity between two texts. It is available as a Java library and as a Java standalone application offering GUI-based access to the implemented semantic similarity methods. Furthermore, it offers facilities for manual semantic similarity annotation by experts through its component SEMILAT (a SEMantic simILarity Annotation Tool).
5 0.37572831 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
Author: Ulle Endriss ; Raquel Fernandez
Abstract: Crowdsourcing, which offers new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online, has revolutionised the collection of labelled data. Yet, to create annotated linguistic resources from this data, we face the challenge of having to combine the judgements of a potentially large group of annotators. In this paper we investigate how to aggregate individual annotations into a single collective annotation, taking inspiration from the field of social choice theory. We formulate a general formal model for collective annotation and propose several aggregation methods that go beyond the commonly used majority rule. We test some of our methods on data from a crowdsourcing experiment on textual entailment annotation.
6 0.37229705 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
7 0.3713277 118 acl-2013-Development and Analysis of NLP Pipelines in Argo
8 0.36892197 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
9 0.36867142 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
10 0.36862057 250 acl-2013-Models of Translation Competitions
11 0.36855906 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
12 0.36790159 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
13 0.36762637 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
14 0.36725157 297 acl-2013-Recognizing Partial Textual Entailment
15 0.36668947 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations
16 0.36660057 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions
17 0.36588308 191 acl-2013-Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
18 0.36467609 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
19 0.36432934 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
20 0.36414444 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation