acl acl2010 acl2010-259 knowledge-graph by maker-knowledge-mining

259 acl-2010-WebLicht: Web-Based LRT Services for German

Source: pdf

Author: Erhard Hinrichs ; Marie Hinrichs ; Thomas Zastrow

Abstract: This software demonstration presents WebLicht (short for: Web-Based Linguistic Chaining Tool), a web-based service environment for the integration and use of language resources and tools (LRT). WebLicht is being developed as part of the D-SPIN project. WebLicht is implemented as a web application, so users do not need to install any software on their own computers or concern themselves with the technical details involved in building tool chains. The integrated web services are part of a prototypical infrastructure that was developed to facilitate chaining of LRT services. WebLicht allows the integration and use of distributed web services with standardized APIs. Because these APIs are open and standardized, the web services can be accessed from nearly any programming language, shell script or workflow engine (UIMA, GATE, etc.). Additionally, an application for integrating new services is available, allowing anyone to contribute their own web service.

Reference: text

Summary: the most important sentences, as selected by the tfidf model

sentIndex sentText sentNum sentScore

1 WebLicht: Web-based LRT services for German. Erhard Hinrichs, Marie Hinrichs, Thomas Zastrow; Seminar für Sprachwissenschaft, University of Tübingen; firstname. [sent-1, score-0.256]

2 Abstract: This software demonstration presents WebLicht (short for: Web-Based Linguistic Chaining Tool), a web-based service environment for the integration and use of language resources and tools (LRT). [sent-3, score-0.398]

3 WebLicht is being developed as part of the D-SPIN project. [sent-4, score-0.021]

4 WebLicht is implemented as a web application so that there is no need for users to install any software on their own computers or to concern themselves with the technical details involved in building tool chains. [sent-5, score-0.27]

5 The integrated web services are part of a prototypical infrastructure that was developed to facilitate chaining of LRT services. [sent-6, score-0.57]

6 WebLicht allows the integration and use of distributed web services with standardized APIs. [sent-7, score-0.479]

7 Because these APIs are open and standardized, the web services can be accessed from nearly any programming language, shell script or workflow engine (UIMA, GATE, etc.). [sent-8, score-0.56]

8 Additionally, an application for integrating new services is available, allowing anyone to contribute their own web service. [sent-9, score-0.402]

9 … and at the Seminar für Sprachwissenschaft/Computerlinguistik at the University of Tübingen (conversion of plain text to D-SPIN format, GermaNet, Open Thesaurus synonym service, and Treebank browser). [sent-15, score-0.059]

10 For some of these tasks, more than one web service is available. [sent-17, score-0.292]

11 As the first external partner, the University of Helsinki in Finland contributed a set of web services for creating morphologically annotated text corpora in Finnish. [sent-18, score-0.372]

12 With the help of the web-based user interface, these individual web services can be combined into a chain of linguistic applications. [sent-19, score-0.532]

13 …, 2008), which means that distributed and independent services (Tanenbaum et al., 2002) are combined into a chain of LRT tools. [sent-22, score-0.344]

14 A centralized database, the repository, stores technical and content-related metadata about each service. [sent-23, score-0.154]

15 [Footer: System Demonstrations, pages 25–29; ©2010 Association for Computational Linguistics] … this repository, the chaining mechanism described in Section 3 is implemented. [sent-26, score-0.153]

16 The WebLicht user interface encapsulates this chaining mechanism in an AJAX-driven web application. [sent-27, score-0.369]

17 Since web applications can be invoked from any browser, downloading and installing individual tools on the user's local computer is avoided. [sent-28, score-0.192]

18 However, WebLicht web services are not restricted to use through the integrated user interface. [sent-29, score-0.458]

19 It is also possible to access the web services from nearly any programming language, shell script or workflow engine (UIMA, GATE, etc.). [sent-30, score-0.507]

20 An important part of Service Oriented Architectures is ensuring interoperability between the underlying services. [sent-33, score-0.042]

21 Interoperability of web services, as they are implemented in WebLicht, refers to the seamless flow of data between them. [sent-34, score-0.187]

22 To be interoperable, these web services must first agree on protocols defining the interaction between the services (WSDL/SOAP, REST, XML-RPC). [sent-35, score-0.628]

23 They must also use a shared and standardized data exchange format, which is preferably based on widely accepted formats already in use (UTF-8, XML). [sent-36, score-0.207]

24 WebLicht uses a REST-style API and its own XML-based data exchange format (Text Corpus Format, TCF). [sent-37, score-0.207]
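As a rough illustration of the REST-plus-TCF contract, the following standard-library sketch posts a TCF document to a service, returns the service's reply, and runs a chain by feeding each reply into the next service. The endpoint URL and the `text/tcf+xml` media type are assumptions made for illustration, not values documented in the paper.

```python
import urllib.request
from typing import Iterable

# Hypothetical endpoint; real service addresses would come from the repository.
SERVICE_URL = "http://example.org/weblicht/tokenizer"

# Assumed media type for TCF payloads (illustrative only).
TCF_MEDIA_TYPE = "text/tcf+xml"


def build_tcf_request(url: str, tcf_document: str) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is a UTF-8 TCF document."""
    return urllib.request.Request(
        url,
        data=tcf_document.encode("utf-8"),
        headers={"Content-Type": TCF_MEDIA_TYPE, "Accept": TCF_MEDIA_TYPE},
        method="POST",
    )


def call_service(url: str, tcf_document: str, timeout: float = 30.0) -> str:
    """Send a TCF document to one service and return the TCF document it sends back."""
    request = build_tcf_request(url, tcf_document)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def run_chain(urls: Iterable[str], tcf_document: str) -> str:
    """Pipe the document through each service in order; each output feeds the next call."""
    for url in urls:
        tcf_document = call_service(url, tcf_document)
    return tcf_document
```

The same pattern is what makes shell scripts or workflow engines viable clients: any HTTP client that can send and receive XML can take part in a chain.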

25 [Section 3: The Service Repository] Every tool included in WebLicht is registered in a central repository located in Leipzig. [sent-38, score-0.132]

26 Also realized as a web service, it offers metadata and processing information about each registered tool. [sent-39, score-0.268]

27 For example, the metadata includes the creator, name and address of the service. [sent-40, score-0.084]

28 The input and output specifications of each web service are required in order to determine which processing chains are possible. [sent-41, score-0.344]

29 Combining the metadata and the processing information, the repository is able to offer functions for the chain building process. [sent-42, score-0.251]
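The repository's role can be pictured as a small registry of records. The sketch below models each entry with the metadata the text mentions (creator, name, address) plus input and output specifications, expressed here as sets of annotation-layer names. The field names, layer names and the in-memory `Repository` class are assumptions for illustration, not the Leipzig repository's actual schema (which is a relational database).

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ServiceRecord:
    """One repository entry; the field names are illustrative assumptions."""
    name: str
    creator: str
    address: str                            # URL of the service endpoint
    requires: FrozenSet[str] = frozenset()  # layers the input must already contain
    produces: FrozenSet[str] = frozenset()  # layers the service adds to its output


@dataclass
class Repository:
    """In-memory stand-in for the central repository."""
    records: Dict[str, ServiceRecord] = field(default_factory=dict)

    def register(self, record: ServiceRecord) -> None:
        # Once stored, the record is visible to every client that queries the repository.
        self.records[record.name] = record

    def producers_of(self, layer: str) -> List[str]:
        """Names of the registered services whose output specification includes `layer`."""
        return sorted(r.name for r in self.records.values() if layer in r.produces)
```

Storing input and output specifications next to the descriptive metadata is what lets a single query answer both "who made this tool?" and "which tools can produce this layer?".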

30 A specialized tool for registering new web services in the repository is available. [sent-45, score-0.565]

31 [Section 4: The WebLicht User Interface] Figure 2 shows a screenshot of the WebLicht web interface, which is developed and hosted in Tübingen. [sent-46, score-0.137]

32 Area 1 shows a list of all WebLicht web services along with a subset of their metadata (author, URL, description, etc.). [sent-47, score-0.456]

33 This list is extracted on the fly from a centralized repository located in Leipzig. [sent-49, score-0.145]

34 This means that after registration in the repository, a web service is immediately available for inclusion in a processing chain. [sent-50, score-0.292]

35 The Language Filter selection box allows the selection of any language for which tools are available in WebLicht (currently German, English, Italian, French, Romanian, Spanish or Finnish). [sent-51, score-0.076]

36 The majority of the presently integrated web services operate on German input. [sent-52, score-0.396]

37 The platform, however, is language-independent and supports LRT resources for any language. [sent-53, score-0.019]
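The Language Filter reduces to a selection over repository metadata. A minimal sketch, assuming each entry records exactly one language code; how languages are actually stored in the repository is not described in the paper, and the `Service` tuple below is hypothetical.

```python
from typing import List, NamedTuple


class Service(NamedTuple):
    """Hypothetical minimal view of a repository entry."""
    name: str
    language: str  # assumed ISO 639-1 code, e.g. "de" or "fi"


def available_languages(services: List[Service]) -> List[str]:
    """Languages for which at least one tool exists, i.e. the choices in the selection box."""
    return sorted({s.language for s in services})


def filter_by_language(services: List[Service], language: str) -> List[Service]:
    """Tools listed once a language has been selected, in repository order."""
    return [s for s in services if s.language == language]
```

Nothing in the filter is specific to German: supporting a new language only requires registering tools whose metadata names it, which matches the claim that the platform is language-independent.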

38 Plain text input to the service chain can be specified in one of three ways: a) entered by the user in the Input tab, b) uploaded as a file from the user's local hard drive, or c) chosen from the sample texts offered by WebLicht (Area 2). [sent-54, score-0.449]

39 Various format converters can be used to convert uploaded files into the data exchange format (TCF) used by WebLicht. [sent-55, score-0.387]

40 Input file formats accepted by WebLicht currently include plain text, Microsoft Word, RTF and PDF. [sent-56, score-0.217]
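For plain text, a converter's job can be sketched as wrapping the input in a corpus container with a single text layer, which later services then enrich. The sketch uses `xml.etree.ElementTree`; the element names and the namespace URI are a simplified approximation chosen for illustration, not the normative TCF schema (Heid et al., 2010). Converters for Word, RTF or PDF would additionally need a text-extraction step that is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the text-corpus scope (illustrative, not taken from the spec).
TC_NS = "http://www.dspin.de/data/textcorpus"


def plain_text_to_tcf(text: str, lang: str = "de") -> str:
    """Wrap plain text in a minimal TCF-like document that holds only a text layer."""
    ET.register_namespace("tc", TC_NS)
    root = ET.Element("D-Spin")
    corpus = ET.SubElement(root, f"{{{TC_NS}}}TextCorpus", {"lang": lang})
    text_layer = ET.SubElement(corpus, f"{{{TC_NS}}}text")
    text_layer.text = text  # ElementTree escapes <, > and & on serialization
    return ET.tostring(root, encoding="unicode")
```

Starting every chain from the same container shape means that the first real service, typically a tokenizer, never has to care whether the text came from the Input tab, an upload or a sample text.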

41 [Figure 3: A Choice of Alternative Services] In Area 3, the user can assemble the service tool chain and execute it on the input text. [sent-57, score-0.33]

42 The Selected Tools list displays all web services that have already been entered into the web service chain. [sent-58, score-0.715]

43 The list under Next Tool Choices then offers the set of tools that can be added next to the chain. [sent-59, score-0.153]

44 This list is generated by inspecting the metadata of the tools which are already in the chain. [sent-60, score-0.16]

45 The chaining mechanism ensures that this list contains only tools that are a valid next step in the chain. [sent-61, score-0.153]

46 For example, a Part-of-Speech Tagger can only be added to a chain after a tokenizer has been added. [sent-62, score-0.128]

47 The metadata of each tool contains information about the annotations which are required in the input data and which annotations are added by that tool. [sent-63, score-0.226]
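The chaining rule just described can be expressed as a set computation: collect the layers present after the current chain, then offer every tool whose required layers are all present. A sketch, assuming each tool declares its required and added layers as sets of names; the tool and layer names are hypothetical, and excluding tools already in the chain is a simplification of ours rather than something the paper states.

```python
from typing import AbstractSet, Dict, List, Set, Tuple

# Hypothetical tool metadata: name -> (required layers, added layers).
TOOLS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "tokenizer": ({"text"}, {"tokens"}),
    "pos_tagger": ({"tokens"}, {"POStags"}),
    "lemmatizer": ({"tokens", "POStags"}, {"lemmas"}),
    "parser": ({"tokens", "POStags"}, {"parsing"}),
}


def layers_after(chain: List[str], initial: AbstractSet[str]) -> Set[str]:
    """Annotation layers present once every tool in `chain` has run, in order."""
    layers = set(initial)
    for tool in chain:
        required, added = TOOLS[tool]
        if not required <= layers:
            raise ValueError(f"{tool} needs {sorted(required - layers)}")
        layers |= added
    return layers


def next_tool_choices(chain: List[str],
                      initial: AbstractSet[str] = frozenset({"text"})) -> List[str]:
    """Tools whose input requirements are met and which are not already in the chain."""
    layers = layers_after(chain, initial)
    return sorted(
        name for name, (required, _) in TOOLS.items()
        if required <= layers and name not in chain
    )
```

With an empty chain only the tokenizer is offered; after it, the POS tagger becomes available, mirroring the example of a tagger that can only follow a tokenizer.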

48 As Figure 3 shows, the user sometimes has a choice of alternative tools; in the example at hand, a wide variety of services is offered as candidates. [sent-64, score-0.418]

49 Figure 3 shows a subset of web service workflows currently available in WebLicht. [sent-65, score-0.362]

50 Notice that these workflows can combine tools from various institutions and are not restricted to predefined combinations of tools. [sent-66, score-0.113]

51 This allows users to compare the results of several tool chains and find the best solution for their individual use case. [sent-67, score-0.142]
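Because chains are not restricted to predefined combinations, the space of possible workflows can be enumerated mechanically, which is one way to set up the comparison described above. The sketch below runs a depth-first search over hypothetical requires/adds metadata that includes two alternative POS taggers and returns every valid chain that ends as soon as a requested layer exists. It is an illustration under those assumptions, not how WebLicht itself builds its lists.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical metadata: two alternative taggers produce the same layer.
TOOLS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "tokenizer": (frozenset({"text"}), frozenset({"tokens"})),
    "tagger_A": (frozenset({"tokens"}), frozenset({"POStags"})),
    "tagger_B": (frozenset({"tokens"}), frozenset({"POStags"})),
    "lemmatizer": (frozenset({"tokens", "POStags"}), frozenset({"lemmas"})),
}


def enumerate_chains(goal: str,
                     layers: FrozenSet[str] = frozenset({"text"}),
                     chain: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """All chains, without reusing a tool, that stop as soon as `goal` is produced."""
    if goal in layers:
        return [chain]
    results: List[Tuple[str, ...]] = []
    for name, (required, added) in sorted(TOOLS.items()):
        # Extend only with tools whose inputs are satisfied and that add a new layer,
        # which guarantees progress and therefore termination.
        if name not in chain and required <= layers and not added <= layers:
            results.extend(enumerate_chains(goal, layers | added, chain + (name,)))
    return results
```

For the goal `"lemmas"` this yields one chain per tagger, so the two results could be run side by side and compared.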

52 The final result of running the tool chain, as well as each individual step, can be visualized in a Table View (implemented as a separate frame, Area 4) or downloaded to the user's local hard drive in WebLicht's own data exchange format, TCF. [sent-68, score-0.458]

53 [Section 5: The TCF Format] The D-SPIN Text Corpus Format TCF (Heid et al., 2010) is used by WebLicht as an internal data exchange format. [Figure 4: A Short Example of a TCF Document, Containing the Plain Text, Tokens, POS Tags and Lemmas] [sent-69, score-0.059]

54 The TCF format allows the combination of the different linguistic annotations produced by the tool chain. [sent-70, score-0.264]

55 It supports incremental enrichment of linguistic annotations at different levels of analysis in a common XML-based format (see Figure 4). [sent-71, score-0.193]

56 The Text Corpus Format was designed to efficiently enable the seamless flow of data between the individual services of a Service Oriented Architecture. [sent-72, score-0.307]

57 Lexical tokens are identified via token IDs, which serve as unique identifiers across the different annotation layers. [sent-74, score-0.041]

58 From an organizational point-of-view, tokens can be seen as the central, atomic elements in TCF to which other annotation layers refer. [sent-75, score-0.088]

59 For example, the POS annotations refer to the token IDs in the token annotation layer via the attribute tokID. [sent-76, score-0.086]
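The stand-off design means that a consumer resolves annotations through token IDs rather than through XML nesting. A minimal sketch with `xml.etree.ElementTree`, using a simplified, namespace-free fragment: the `tokID` attribute is the one named above, while the `tokens`, `token`, `ID`, `POStags` and `tag` names are assumptions that only loosely follow TCF.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment in the spirit of TCF (illustrative only).
SAMPLE = """<TextCorpus>
  <text>Peter schläft.</text>
  <tokens>
    <token ID="t1">Peter</token>
    <token ID="t2">schläft</token>
    <token ID="t3">.</token>
  </tokens>
  <POStags tagset="STTS">
    <tag tokID="t1">NE</tag>
    <tag tokID="t2">VVFIN</tag>
    <tag tokID="t3">$.</tag>
  </POStags>
</TextCorpus>"""


def pos_by_token(xml_text: str) -> list:
    """Pair each POS tag with the token string it points to via tokID."""
    root = ET.fromstring(xml_text)
    # Index the token layer once: token ID -> surface form.
    tokens = {tok.get("ID"): tok.text for tok in root.iter("token")}
    return [(tokens[tag.get("tokID")], tag.text) for tag in root.iter("tag")]
```

Because every layer points at the same token IDs, a lemma or dependency layer could be joined in exactly the same way, without any layer needing to know about the others.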

60 The annotation layers are rendered in a stand-off annotation format. [sent-77, score-0.11]

61 TCF stores all linguistic annotation layers in one single file. [sent-78, score-0.116]

62 That means that during the chaining process, the file grows (see Figure 5). [sent-79, score-0.157]

63 Each tool is permitted to add an arbitrary number of layers, but it is not allowed to change or delete any existing layer. [sent-80, score-0.09]
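Because tools may add layers but never change or delete existing ones, a chain runner, or a service developer's test suite, could enforce the rule by comparing the document before and after a step. A sketch, assuming layers are the direct children of the root element and at most one layer per element name; this check is our illustration, not a documented WebLicht component.

```python
import xml.etree.ElementTree as ET


def layer_map(xml_text: str) -> dict:
    """Map each layer (a direct child of the root) to its serialized XML."""
    root = ET.fromstring(xml_text)
    layers = {}
    for child in root:
        child.tail = None  # ignore whitespace that follows the element
        layers[child.tag] = ET.tostring(child, encoding="unicode")
    return layers


def respects_add_only(before: str, after: str) -> bool:
    """True if every layer present before a tool ran is still present and unchanged."""
    old, new = layer_map(before), layer_map(after)
    return all(new.get(tag) == xml for tag, xml in old.items())
```

A step that only appends a layer passes; one that rewrites or drops an earlier layer fails, which is exactly the property that lets later tools trust the IDs produced by earlier ones.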

64 Within the D-SPIN project, several other XML-based data formats were developed besides TCF (for example, an encoding for lexicon-based data). [sent-81, score-0.238]

65 In order to avoid any confusion of element names between these different formats, namespaces for the different contextual scopes within each format have been introduced. [sent-82, score-0.148]
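The effect of the namespaces is that two formats may reuse the same local element name without clashing. A small sketch with `xml.etree.ElementTree`; both namespace URIs are invented placeholders, not the URIs defined by D-SPIN, and the `lemma` element is chosen only as an example of a shared name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (hypothetical), one per contextual scope.
NS = {
    "tc": "http://example.org/dspin/textcorpus",
    "lx": "http://example.org/dspin/lexicon",
}

DOC = (
    '<D-Spin xmlns:tc="http://example.org/dspin/textcorpus" '
    'xmlns:lx="http://example.org/dspin/lexicon">'
    "<tc:TextCorpus><tc:lemmas><tc:lemma>gehen</tc:lemma></tc:lemmas></tc:TextCorpus>"
    "<lx:Lexicon><lx:lemma>laufen</lx:lemma></lx:Lexicon>"
    "</D-Spin>"
)


def lemmas_in_scope(xml_text: str, prefix: str) -> list:
    """Collect `lemma` elements from one namespace only, using a prefix from NS."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(f".//{prefix}:lemma", NS)]
```

A query for corpus lemmas therefore never picks up lexicon entries, even when both live in the same file.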

66 At the end of the chaining process, converter services will convert the text corpora from the TCF format into other common and standardized data formats, for example MAF/SynAF or TEI. [Figure 5: Annotation Layers are Added to the TCF Document by Each Service] [sent-83, score-0.621]

67 [Section 6: Implementation Details] The web services are available as REST-style services and use the TCF data format for input and output. [sent-84, score-0.52]

68 The concrete implementation can use any combination of programming language and server environment. [sent-85, score-0.019]

69 The repository is a relational database that also offers its content as REST-style web services. [sent-86, score-0.248]

70 The user interface is a Rich Internet Application (RIA) built with an AJAX-driven toolkit. [sent-87, score-0.1]

71 To participate in WebLicht by contributing additional tools, one must implement the tool as a RESTful web service using the TCF data format. [sent-90, score-0.407]
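To illustrate the contract a contributed tool has to meet (accept TCF over REST, return TCF with an added layer), here is a standard-library sketch of a toy service that reads the text layer, appends a whitespace-tokenized `tokens` layer and returns the document. The namespace-free element names, the port and the naive tokenizer are all assumptions; a real contribution would follow the TCF schema and register its metadata in the repository.

```python
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer


def add_token_layer(tcf: str) -> str:
    """Append a `tokens` layer built from the `text` layer; existing layers stay as they are."""
    root = ET.fromstring(tcf)
    if root.find("tokens") is not None:
        # Existing layers must not be changed, so refuse rather than overwrite.
        raise ValueError("document already has a tokens layer")
    text = root.findtext("text", default="")
    tokens = ET.SubElement(root, "tokens")
    # Naive whitespace tokenization: a placeholder for a real tokenizer.
    for i, form in enumerate(text.split(), start=1):
        ET.SubElement(tokens, "token", {"ID": f"t{i}"}).text = form
    return ET.tostring(root, encoding="unicode")


class TokenizerHandler(BaseHTTPRequestHandler):
    """Accepts a TCF-like document via POST and returns it with a tokens layer added."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = self.rfile.read(length).decode("utf-8")
            payload = add_token_layer(body).encode("utf-8")
            status, ctype = 200, "application/xml; charset=utf-8"
        except (ET.ParseError, ValueError) as err:  # also covers bad UTF-8
            payload = str(err).encode("utf-8")
            status, ctype = 400, "text/plain; charset=utf-8"
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the toy service on localhost until interrupted; the port is arbitrary."""
    HTTPServer(("127.0.0.1", port), TokenizerHandler).serve_forever()
```

Calling `serve()` and posting a document such as `<TextCorpus><text>Peter schläft</text></TextCorpus>` to `http://127.0.0.1:8080/` should return the same document with a `tokens` layer appended. Keeping the linguistic work in a plain function separates it from the HTTP plumbing, which also makes it easy to test without a running server.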

72 [Section 8: Further Work] The WebLicht platform in its current form moves the functionality of LRT tools from the user's desktop computer into the net (Gray et al., 2005). [sent-92, score-0.107]

73 At present, users must download the results of the chaining process and handle them on their local machine again. [sent-93, score-0.189]

74 In the future, an online workspace will be implemented so that annotated text corpora created with WebLicht can also be stored in and retrieved from the net. [sent-94, score-0.057]

75 For that purpose, an integration of the eSciDoc research environment into WebLicht is planned. [sent-95, score-0.03]

76 The eSciDoc infrastructure enables sustainable and reliable long-term preservation of primary research and analysis data. [sent-96, score-0.026]

77 These will consist of the most commonly used processing chains and will relieve the user of having to define the chains manually. [sent-98, score-0.166]

78 In the last year, WebLicht has proven to be a realizable and useful service environment for the humanities. [sent-99, score-0.176]

79 [Section 9: Scope of the Software Demonstration] This demonstration will present the core functionalities of WebLicht as well as related modules and applications. [sent-101, score-0.038]

80 The process of building language-specific processing tool chains will be shown. [sent-102, score-0.142]

81 WebLicht's capability of offering only appropriate tools at each step in the chain-building process will be demonstrated. [sent-103, score-0.105]

82 The selected tool chain can be applied to any uploaded text. [sent-110, score-0.186]

83 The resulting annotated text corpus can be downloaded or visualized using an integrated software module. [sent-111, score-0.123]

84 All these functions will be shown live during the software demonstration, using nothing but a web browser. [sent-112, score-0.044]

85 [Demo Preview and Hardware Requirements] The call for papers asks submitters of software demonstrations to provide pointers to demo previews and technical details about hardware requirements for the actual demo at the conference. [sent-113, score-0.146]

86 If the software demonstration is accepted, internet access is necessary at the conference, but no special hardware is required. [sent-117, score-0.166]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('weblicht', 0.741), ('tcf', 0.275), ('services', 0.256), ('service', 0.176), ('format', 0.148), ('lrt', 0.13), ('chaining', 0.127), ('web', 0.116), ('repository', 0.103), ('oriented', 0.095), ('tool', 0.09), ('metadata', 0.084), ('tools', 0.076), ('formats', 0.069), ('sprachverarbeitung', 0.068), ('layers', 0.066), ('chain', 0.064), ('tokenizer', 0.064), ('escidoc', 0.063), ('reststyle', 0.063), ('user', 0.062), ('bingen', 0.06), ('hardware', 0.06), ('exchange', 0.059), ('plain', 0.059), ('standardized', 0.053), ('chains', 0.052), ('entered', 0.051), ('seminar', 0.048), ('institut', 0.048), ('hinrichs', 0.048), ('software', 0.044), ('abteilung', 0.042), ('ajax', 0.042), ('akademie', 0.042), ('automatische', 0.042), ('binildas', 0.042), ('brandenburgische', 0.042), ('centralized', 0.042), ('harddrive', 0.042), ('heid', 0.042), ('interoperability', 0.042), ('registered', 0.042), ('tanenbaum', 0.042), ('interface', 0.038), ('demonstration', 0.038), ('finnish', 0.037), ('textcorpora', 0.037), ('leipzig', 0.037), ('workflows', 0.037), ('shell', 0.037), ('german', 0.037), ('webbased', 0.034), ('visualized', 0.034), ('maschinelle', 0.034), ('uima', 0.034), ('universit', 0.033), ('currently', 0.033), ('uploaded', 0.032), ('platform', 0.031), ('workflow', 0.03), ('seamless', 0.03), ('browser', 0.03), ('file', 0.03), ('integration', 0.03), ('offering', 0.029), ('architectures', 0.029), ('romanian', 0.029), ('stores', 0.028), ('stuttgart', 0.028), ('gate', 0.027), ('offers', 0.026), ('mechanism', 0.026), ('annotations', 0.026), ('accepted', 0.026), ('gray', 0.026), ('infrastructure', 0.026), ('xml', 0.026), ('area', 0.025), ('participate', 0.025), ('script', 0.025), ('integrated', 0.024), ('distributed', 0.024), ('offered', 0.024), ('access', 0.024), ('italian', 0.023), ('annotation', 0.022), ('architecture', 0.022), ('developed', 0.021), ('ids', 0.021), ('flow', 0.021), ('demo', 0.021), ('downloaded', 0.021), ('implemented', 0.02), ('conversion', 0.02), ('java', 0.02), 
('berlin', 0.02), ('programming', 0.019), ('token', 0.019), ('supports', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 259 acl-2010-WebLicht: Web-Based LRT Services for German

Author: Erhard Hinrichs ; Marie Hinrichs ; Thomas Zastrow

2 0.091879413 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

Author: Steven Abney ; Steven Bird

Abstract: We present a grand challenge to build a corpus that will include all of the world’s languages, in a consistent structure that permits large-scale cross-linguistic processing, enabling the study of universal linguistics. The focal data types, bilingual texts and lexicons, relate each language to one of a set of reference languages. We propose that the ability to train systems to translate into and out of a given language be the yardstick for determining when we have successfully captured a language. We call on the computational linguistics community to begin work on this Universal Corpus, pursuing the many strands of activity described here, as their contribution to the global effort to document the world’s linguistic heritage before more languages fall silent.

3 0.063940376 31 acl-2010-Annotation

Author: Eduard Hovy

Abstract: unknown-abstract

4 0.051388498 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

Author: Nancy Ide ; Collin Baker ; Christiane Fellbaum ; Rebecca Passonneau

Abstract: The Manually Annotated Sub-Corpus (MASC) project provides data and annotations to serve as the base for a community-wide annotation effort of a subset of the American National Corpus. The MASC infrastructure enables the incorporation of contributed annotations into a single, usable format that can then be analyzed as it is or ported to any of a variety of other formats. MASC includes data from a much wider variety of genres than existing multiply-annotated corpora of English, and the project is committed to a fully open model of distribution, without restriction, for all data and annotations produced or contributed. As such, MASC is the first large-scale, open, community-based effort to create much needed language resources for NLP. This paper describes the MASC project, its corpus and annotations, and serves as a call for contributions of data and annotations from the language processing community.

5 0.048650742 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images

Author: Yorick Wilks ; Roberta Catizone ; Alexiei Dingli ; Weiwei Cheng

Abstract: This paper describes an initial prototype demonstrator of a Companion, designed as a platform for novel approaches to the following: 1) The use of Information Extraction (IE) techniques to extract the content of incoming dialogue utterances after an Automatic Speech Recognition (ASR) phase, 2) The conversion of the input to Resource Descriptor Format (RDF) to allow the generation of new facts from existing ones, under the control of a Dialogue Manager (DM), that also has access to stored knowledge and to open knowledge accessed in real time from the web, all in RDF form, 3) A DM implemented as a stack and network virtual machine that models mixed initiative in dialogue control, and 4) A tuned dialogue act detector based on corpus evidence. The prototype platform was evaluated, and we describe this briefly; it is also designed to support more extensive forms of emotion detection carried by both speech and lexical content, as well as extended forms of machine learning.

6 0.04446185 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

7 0.043386247 126 acl-2010-GernEdiT - The GermaNet Editing Tool

8 0.038052537 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

9 0.033778522 224 acl-2010-Talking NPCs in a Virtual Game World

10 0.033267938 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach

11 0.032240976 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems

12 0.031885296 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web

13 0.031835157 204 acl-2010-Recommendation in Internet Forums and Blogs

14 0.031207699 30 acl-2010-An Open-Source Package for Recognizing Textual Entailment

15 0.029535435 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems

16 0.028885365 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data

17 0.028197728 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

18 0.027988289 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing

19 0.026326317 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models

20 0.025719656 130 acl-2010-Hard Constraints for Grammatical Function Labelling

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.068), (1, 0.035), (2, -0.025), (3, -0.029), (4, 0.009), (5, -0.046), (6, -0.012), (7, 0.025), (8, -0.009), (9, 0.009), (10, 0.003), (11, 0.021), (12, -0.023), (13, -0.014), (14, -0.012), (15, 0.035), (16, 0.021), (17, -0.001), (18, 0.041), (19, 0.015), (20, -0.037), (21, -0.075), (22, 0.028), (23, -0.034), (24, 0.009), (25, -0.017), (26, 0.047), (27, 0.096), (28, 0.055), (29, -0.066), (30, -0.103), (31, -0.057), (32, 0.002), (33, -0.034), (34, -0.03), (35, -0.143), (36, -0.043), (37, -0.07), (38, 0.062), (39, 0.037), (40, 0.011), (41, 0.074), (42, 0.165), (43, 0.143), (44, 0.02), (45, 0.001), (46, 0.086), (47, -0.015), (48, -0.08), (49, 0.072)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96047825 259 acl-2010-WebLicht: Web-Based LRT Services for German

Author: Erhard Hinrichs ; Marie Hinrichs ; Thomas Zastrow

2 0.62834203 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

Author: Nancy Ide ; Collin Baker ; Christiane Fellbaum ; Rebecca Passonneau

3 0.59705597 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

Author: Steven Abney ; Steven Bird

4 0.52914935 31 acl-2010-Annotation

Author: Eduard Hovy

Abstract: unknown-abstract

5 0.51504928 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images

Author: Yorick Wilks ; Roberta Catizone ; Alexiei Dingli ; Weiwei Cheng

6 0.4402616 224 acl-2010-Talking NPCs in a Virtual Game World

7 0.43772715 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web

8 0.4275046 64 acl-2010-Complexity Assumptions in Ontology Verbalisation

9 0.36400118 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data

10 0.34989604 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction

11 0.33512259 138 acl-2010-Hunting for the Black Swan: Risk Mining from Text

12 0.3188602 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

13 0.31325069 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

14 0.29789367 126 acl-2010-GernEdiT - The GermaNet Editing Tool

15 0.28662026 204 acl-2010-Recommendation in Internet Forums and Blogs

16 0.27800372 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources

17 0.27277449 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

18 0.26875871 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

19 0.26246735 137 acl-2010-How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies

20 0.26134488 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.018), (8, 0.012), (14, 0.013), (16, 0.012), (23, 0.081), (25, 0.047), (39, 0.015), (42, 0.011), (44, 0.017), (59, 0.06), (72, 0.014), (73, 0.381), (78, 0.027), (83, 0.038), (84, 0.038), (97, 0.034), (98, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91565955 259 acl-2010-WebLicht: Web-Based LRT Services for German

Author: Erhard Hinrichs ; Marie Hinrichs ; Thomas Zastrow

2 0.85100877 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars

Author: Sindhu Raghavan ; Adriana Kovashka ; Raymond Mooney

Abstract: In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.

3 0.83841097 68 acl-2010-Conditional Random Fields for Word Hyphenation

Author: Nikolaos Trogkanis ; Charles Elkan

Abstract: Finding allowable places in words to insert hyphens is an important practical problem. The algorithm that is used most often nowadays has remained essentially unchanged for 25 years. This method is the TEX hyphenation algorithm of Knuth and Liang. We present here a hyphenation method that is clearly more accurate. The new method is an application of conditional random fields. We create new training sets for English and Dutch from the CELEX European lexical resource, and achieve error rates for English of less than 0.1% for correctly allowed hyphens, and less than 0.01% for Dutch. Experiments show that both the Knuth/Liang method and a leading current commercial alternative have error rates several times higher for both languages.

4 0.8333627 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures

Author: Jesus Gonzalez Rubio ; Daniel Ortiz Martinez ; Francisco Casacuberta

Abstract: This work deals with the application of confidence measures within an interactive-predictive machine translation system in order to reduce human effort. If a small loss in translation quality can be tolerated for the sake of efficiency, user effort can be saved by interactively translating only those initial translations which the confidence measure classifies as incorrect. We apply confidence estimation as a way to achieve a balance between user effort savings and final translation error. Empirical results show that our proposal allows us to obtain almost perfect translations while significantly reducing user effort.

5 0.82039899 141 acl-2010-Identifying Text Polarity Using Random Walks

Author: Ahmed Hassan ; Dragomir Radev

Abstract: Automatically identifying the polarity of words is a very important task in Natural Language Processing. It has applications in text classification, text filtering, analysis of product reviews, analysis of responses to surveys, and mining online discussions. We propose a method for identifying the polarity of words. We apply a Markov random walk model to a large word relatedness graph, producing a polarity estimate for any given word. A key advantage of the model is its ability to accurately and quickly assign a polarity sign and magnitude to any word. The method could be used both in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where a handful of seeds is used to define the two polarity classes. The method is experimentally tested using a manually labeled set of positive and negative words. It outperforms the state of the art methods in the semi-supervised setting. The results in the unsupervised setting are comparable to the best reported values. However, the proposed method is faster and does not need a large corpus.

6 0.78507465 238 acl-2010-Towards Open-Domain Semantic Role Labeling

7 0.76861459 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction

8 0.56668222 121 acl-2010-Generating Entailment Rules from FrameNet

9 0.55570948 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

10 0.53042644 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images

11 0.52231979 154 acl-2010-Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm

12 0.52137929 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

13 0.5199765 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

14 0.51952463 175 acl-2010-Models of Metaphor in NLP

15 0.5173651 204 acl-2010-Recommendation in Internet Forums and Blogs

16 0.51538599 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

17 0.50458926 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

18 0.49721128 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

19 0.49712062 158 acl-2010-Latent Variable Models of Selectional Preference

20 0.49707389 85 acl-2010-Detecting Experiences from Weblogs