acl acl2013 acl2013-183 knowledge-graph by maker-knowledge-mining

183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks


Source: pdf

Author: Markus Gartner ; Gregor Thiele ; Wolfgang Seeker ; Anders Bjorkelund ; Jonas Kuhn

Abstract: We present ICARUS, a versatile graphical search tool to query dependency treebanks. Search results can be inspected both quantitatively and qualitatively by means of frequency lists, tables, or dependency graphs. ICARUS also ships with plugins that enable it to interface with tool chains running either locally or remotely.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 (Abstract) We present ICARUS, a versatile graphical search tool to query dependency treebanks. [sent-4, score-0.678]

2 Search results can be inspected both quantitatively and qualitatively by means of frequency lists, tables, or dependency graphs. [sent-5, score-0.223]

3 ICARUS also ships with plugins that enable it to interface with tool chains running either locally or remotely. [sent-6, score-0.226]

4 (Section 1, Introduction) In this paper we present ICARUS,¹ a search and visualization tool that primarily targets dependency syntax. [sent-7, score-0.47]

5 The tool has been designed such that it requires minimal effort to get started with searching a treebank or system output of an automatic dependency parser, while still allowing for flexible queries. [sent-8, score-0.313]

6 It enables the user to search dependency treebanks given a variety of constraints, including searching for particular subtrees. [sent-9, score-0.576]

7 Emphasis has been placed on functionality that lets the user switch back and forth between a high-level, aggregated view of the search results and browsing particular corpus instances, with an intuitive visualization of how each instance matches the query. [sent-10, score-0.589]

8 Search queries in ICARUS can be constructed either in a graphical or a text-based manner. [sent-12, score-0.136]

9 Building queries graphically removes the overhead of learning a specialized query language and thus makes the tool more accessible for a wider audience. [sent-13, score-0.627]

10 ICARUS provides a very intuitive way of breaking down the search results in terms of frequency statistics (such as the distribution of part-of-speech on one child of a particular verb against the lemma of another child). [sent-14, score-0.47]

11 The dimensions for the frequency break-down are simply specified by using grouping operators in the query. (Footnote 1: Interactive platform for Corpus Analysis and Research tools, University of Stuttgart.) [sent-15, score-0.243]

12 The frequency tables are filled and updated in real time as the search proceeds through the corpus, allowing quick detection of mistaken assumptions in the query. [sent-16, score-0.271]

13 ICARUS uses a plugin-based architecture that permits the user to write his own plugins and integrate them into the system. [sent-17, score-0.352]

14 For example, it comes with a plugin that interfaces with an external parser that can be used to parse a sentence from within the user interface. [sent-18, score-0.51]

15 The constraints for the query can then be copy-pasted from the resulting parse visualization. [sent-19, score-0.318]

16 ICARUS is written entirely in Java and runs out of the box without requiring any installation of the tool itself or additional libraries. [sent-21, score-0.16]

17 ICARUS interfaces readily with NLP tools provided as web services by CLARIN-D, the German incarnation of the European Infrastructure initiative CLARIN. [sent-25, score-0.229]

18 (Proceedings footer: ©2013 Association for Computational Linguistics, pages 55–60.) The remainder of this paper is structured as follows: in Section 2 we elaborate on the motivation for the tool and discuss related work. [sent-36, score-0.119]

19 Of course, the querying problem is the same no matter whether some target annotation was added manually, as in a treebank, or automatically. [sent-43, score-0.115]

20 Yet, the strategy changes, as the user will try to make sure he catches systematic parsing errors and develops an understanding of how the results he is dealing with come about. [sent-44, score-0.162]

21 Syntactic annotations are quite difficult to query if one is interested in specific constructions that are not directly encoded in the annotation labels (which is the case for most interesting phenomena). [sent-46, score-0.409]

22 However, many of these tools are designed for constituent trees only. [sent-48, score-0.128]

23 A simple tool for visualization of dependency trees is What’s wrong with my NLP? [sent-52, score-0.311]

24 However, its querying functionality is limited to simple string search on surface forms. [sent-54, score-0.146]

25 A somewhat more advanced tool is MaltEval (Nilsson and Nivre, 2008), which offers a number of predefined search patterns ranging from part-of-speech tag to branching degree. [sent-55, score-0.278]

26 On the other hand, powerful tools such as PMLTQ (Pajas and Štěpánek, 2009) or INESS (Meurer, 2012) offer expressive query languages and can facilitate cross-layer queries (e. [sent-56, score-0.439]

27 In terms of complexity of use and expressivity, we believe ICARUS constitutes a middle way between highly expressive and very simple visualization tools. [sent-60, score-0.122]

28 It is easy to use and requires no installation, while still offering rich query and visualization capabilities. [sent-61, score-0.377]

29 ICARUS is similar to PMLTQ in that it also allows the user to create queries graphically. [sent-62, score-0.259]

30–31 It is also similar to the search tool GrETEL (Augustinus et al., 2012), as it interfaces with a parser, allowing the user to create queries starting from an automatic parse. [sent-63–64, score-0.278/0.358]

32 Thus, queries can be created without any prior knowledge of the treebank annotation scheme. [sent-65, score-0.156]

33 As for searching constituent treebanks, there is a plethora of existing search tools, such as TGrep2 (Rohde, 2001), TigerSearch (Lezius, 2002), MonaSearch (Maryns, 2009), and Fangorn (Ghodke and Bird, 2012), among others. [sent-66, score-0.233]

34 They implement different query languages with varying efficiency and expressiveness. [sent-67, score-0.255]

35 Assume that a user is interested in passive constructions in English, but does not know exactly how this is annotated in a treebank. [sent-69, score-0.391]

36 As a first step, he can use a provided plugin that interfaces with a tool chain⁵ to parse a sentence that contains a passive construction (thus adopting the example-based querying approach laid out in the introduction). (Footnote 5: using mate-tools by Bohnet (2010); available at http://code.) [sent-70, score-0.624]

37 In the lower field, the user entered the sentence. [sent-74, score-0.162]

38 In the second step, the user can then mark parts of the output graph by selecting some nodes and edges, and have ICARUS construct a query structure from it, following the drag-and-drop scheme users are familiar with from typical office software. [sent-78, score-0.524]

39 The automatically built query can be manually adjusted by the user (relaxing constraints) and then be used to search for similar structures in a treebank. [sent-79, score-0.576]
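
The example-based workflow (select part of a parse, derive a query, relax it) can be sketched as follows. This is a minimal illustration under assumed names: the token fields (`id`, `form`, `lemma`, `pos`, `head`, `rel`) and the helpers `build_query` and `relax` are hypothetical and do not reflect ICARUS's internal data model.

```python
def build_query(sentence, selected_ids):
    """Turn the user's selection into a query: each selected token becomes a
    node constrained on all its properties; dependency edges between two
    selected tokens become query edges (head, dependent)."""
    nodes, edges = {}, []
    for tok in sentence:
        if tok["id"] in selected_ids:
            nodes[tok["id"]] = {k: tok[k] for k in ("form", "lemma", "pos", "rel")}
            if tok["head"] in selected_ids:
                edges.append((tok["head"], tok["id"]))
    return {"nodes": nodes, "edges": edges}


def relax(query, drop):
    """Return a copy of the query without the properties listed per node in
    drop, e.g. {4: ("form", "lemma")} to match any participle, not just one."""
    nodes = {nid: {k: v for k, v in c.items() if k not in drop.get(nid, ())}
             for nid, c in query["nodes"].items()}
    return {"nodes": nodes, "edges": list(query["edges"])}


sentence = [
    {"id": 1, "form": "The", "lemma": "the", "pos": "DT", "head": 2, "rel": "NMOD"},
    {"id": 2, "form": "book", "lemma": "book", "pos": "NN", "head": 3, "rel": "SBJ"},
    {"id": 3, "form": "was", "lemma": "be", "pos": "VBD", "head": 0, "rel": "ROOT"},
    {"id": 4, "form": "written", "lemma": "write", "pos": "VBN", "head": 3, "rel": "VC"},
]
query = build_query(sentence, {3, 4})
print(relax(query, {3: ("form", "pos", "rel"), 4: ("form", "lemma")}))
# {'nodes': {3: {'lemma': 'be'}, 4: {'pos': 'VBN', 'rel': 'VC'}}, 'edges': [(3, 4)]}
```

Keeping every property at first and letting the user remove constraints mirrors the "relaxing constraints" step described above; which properties to drop is left entirely to the caller.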

40 The parsing step can of course be skipped altogether, and a query can be constructed by hand right away. [sent-80, score-0.282]

41 Figure 2 shows the query builder, where the user can define or edit search graphs graphically in the main window, or enter them as a query string in the lower window. [sent-81, score-0.959]

42 For the example, Figure 3 shows the query as it is automatically constructed by ICARUS from the partial parse tree (3a), and what it might look like after the user has changed it (3b). [sent-83, score-0.445]

43 The modified query matches passive constructions in English, as annotated in the CoNLL 2008 Shared Task data set (Surdeanu et al., 2008). [sent-84, score-0.481]

44 Figure 3: Search graphs for finding passive constructions: (a) automatically extracted, (b) manually edited. [sent-86, score-0.137]

45 Note that the query (Figure 3b) contains a <*>-expression. [sent-89, score-0.255]

46 This grouping operator groups the results according to the specified dimension, in this case by the lemma of the passivized verb. [sent-90, score-0.395]
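
As a rough, non-authoritative approximation of the passive query grouped by verb lemma, the sketch below assumes the CoNLL 2008 style of analysis in which a form of "be" heads the participle (POS `VBN`) via a `VC` relation. The exact constraints in Figure 3b may differ, and the helper `tok`, the function `passive_lemmas`, and the toy treebank are invented for illustration.

```python
from collections import Counter


def tok(i, form, lemma, pos, head, rel):
    """Build a token record (hypothetical field names)."""
    return {"id": i, "form": form, "lemma": lemma, "pos": pos,
            "head": head, "rel": rel}


def passive_lemmas(treebank):
    """Count lemmas of VBN tokens attached via VC to a form of 'be';
    the counted lemma plays the role of the <*> dimension."""
    counts = Counter()
    for sentence in treebank:
        by_id = {t["id"]: t for t in sentence}
        for t in sentence:
            head = by_id.get(t["head"])
            if (head is not None and head["lemma"] == "be"
                    and t["pos"] == "VBN" and t["rel"] == "VC"):
                counts[t["lemma"]] += 1
    return counts


treebank = [
    [tok(1, "It", "it", "PRP", 2, "SBJ"), tok(2, "was", "be", "VBD", 0, "ROOT"),
     tok(3, "written", "write", "VBN", 2, "VC")],
    [tok(1, "They", "they", "PRP", 2, "SBJ"), tok(2, "were", "be", "VBD", 0, "ROOT"),
     tok(3, "written", "write", "VBN", 2, "VC")],
    [tok(1, "It", "it", "PRP", 2, "SBJ"), tok(2, "was", "be", "VBD", 0, "ROOT"),
     tok(3, "sold", "sell", "VBN", 2, "VC")],
]
print(passive_lemmas(treebank).most_common())  # [('write', 2), ('sell', 1)]
```

Requiring the head lemma to be "be" is what separates passives from perfect constructions such as "has written", where the participle hangs off "have".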

47 Clicking on the lemma displays the list of matches containing that particular lemma on the right side. [sent-93, score-0.465]

48 Note that the instantiation of the query constraints is highlighted in the tree display. [sent-95, score-0.326]

49 Figure 4: Passive constructions in the treebank grouped by lemma and sorted by frequency. [sent-96, score-0.418]

50 The query could be further refined to restrict it to passives with an overt logical subject, using a more complex search graph for the by-phrase and a second instance of the grouping operator. [sent-97, score-0.774]

51 The results will then also be grouped by the lemma of the logical subject, and are therefore presented as a two-dimensional table. [sent-98, score-0.325]

52 Figure 5 shows the new query and the resulting view. [sent-99, score-0.255]

53 The user is presented with a frequency table, where each cell contains the number of hits for this particular combination of verb lemma and logical subject. [sent-100, score-0.538]
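
The two-dimensional view amounts to cross-tabulating the two grouped values. A minimal sketch, assuming the search has already produced one (verb lemma, logical-subject lemma) pair per hit; `frequency_table` is a hypothetical helper and the sample hits are made up:

```python
from collections import Counter


def frequency_table(pairs):
    """Cross-tabulate (row, column) hits: table[row][col] is the hit count,
    with zeros for combinations that never occurred."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    table = {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}
    return rows, cols, table


hits = [("write", "Ann"), ("write", "Ann"), ("sell", "Bob"), ("write", "Bob")]
rows, cols, table = frequency_table(hits)
for r in rows:
    print(r, [table[r][c] for c in cols])
# sell [0, 1]
# write [2, 1]
```

In an interactive tool each cell would additionally keep references to the matching sentences so that clicking it can list them; that bookkeeping is omitted here.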

54 Clicking on the cell opens up a view similar to the right part of Figure 4 where the user can then again browse the actual trees. [sent-101, score-0.323]

55 Figure 5: Search graph and result view for passive constructions with overt logical subjects, grouped by lemma of the verb and the lemma of the logical subject. [sent-102, score-0.963]

56 Figure 6 shows a further refined query for passives with an overt logical subject and an object. [sent-104, score-0.485]

57 In the results, the user is presented with a list of values for the first grouping operator on the left. [sent-105, score-0.353]

58 Clicking on one item in that list opens up a table on the right presenting the other two dimensions of the query. [sent-106, score-0.115]
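
With three grouping operators, the first dimension indexes a list and each entry holds a table for the other two. One way to hold that structure, assuming hits arrive as (verb, logical subject, object) triples; `list_of_tables` and the data are illustrative only:

```python
from collections import Counter, defaultdict


def list_of_tables(triples):
    """Outer keys form the list shown on the left (first grouping dimension);
    each value counts (second, third) pairs, i.e. the table opened on the right."""
    nested = defaultdict(Counter)
    for first, second, third in triples:
        nested[first][(second, third)] += 1
    return nested


hits = [("give", "Ann", "book"), ("give", "Ann", "book"),
        ("give", "Bob", "pen"), ("send", "Ann", "letter")]
view = list_of_tables(hits)
print(sorted(view))                   # ['give', 'send']
print(view["give"][("Ann", "book")])  # 2
```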

59 Figure 6: Search graph and result view for passive constructions with an overt logical subject and an object, grouped by lemma of the verb, the logical subject, and the object. [sent-107, score-0.795]

60 This example demonstrates a typical use case for a user that is interested in certain linguistic constructions in his corpus. [sent-108, score-0.284]

61 Creating the search graph and interpreting the results does not require any specialized knowledge other than familiarity with the annotation of the corpus being searched. [sent-109, score-0.275]

62 It especially does not require any programming skills, and the possibility to graphically build a query obviates the need to learn a specialized query language. [sent-110, score-0.666]

63 A main component is the search engine, which enables the user to quickly search treebanks for whatever he is interested in. [sent-112, score-0.674]

64 Currently, ICARUS can read the commonly used CoNLL dependency formats, and it is easy to write extensions in order to add additional formats. [sent-114, score-0.167]
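
Reading a CoNLL file into token records is the first step any format extension would have to provide. The sketch below assumes the 10-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and keeps only a few columns; other CoNLL variants, such as the CoNLL 2008 format, order their columns differently and would need their own mapping. `read_conll` is a hypothetical name.

```python
def read_conll(lines):
    """Yield sentences from CoNLL-X style lines: one token per line with
    tab-separated columns, a blank line between sentences."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "pos": cols[4],            # POSTAG (fine-grained tag)
            "head": int(cols[6]),      # 0 marks the root
            "rel": cols[7],            # DEPREL
        })
    if sentence:                       # file may lack a trailing blank line
        yield sentence


sample = ("1\tThe\tthe\tDT\tDT\t_\t2\tNMOD\t_\t_\n"
          "2\tbook\tbook\tNN\tNN\t_\t3\tSBJ\t_\t_\n"
          "3\twas\tbe\tVBD\tVBD\t_\t0\tROOT\t_\t_\n\n")
sentences = list(read_conll(sample.splitlines()))
print(len(sentences), sentences[0][2]["lemma"])  # 1 be
```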

65 (Subsection: Search Engine and Query Builder) ICARUS has a tree-based search engine for treebanks and includes a graphical query builder. [sent-116, score-0.582]

66 Structure and appearance of search graphs are similar to the design used for displaying dependency trees (cf. [sent-117, score-0.259]

67 Defining a query graphically basically amounts to drawing a partial graph structure that defines the type of structure that the user is interested in. [sent-120, score-0.615]

68 In practice, this is done by creating nodes in the query builder and connecting them by edges. [sent-121, score-0.411]

69 The nodes correspond to words in the dependency trees of the treebank. [sent-122, score-0.119]
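
Matching a query graph against a dependency tree can be pictured as a small backtracking search that assigns each query node to a distinct word so that node constraints and query edges (parent to child) are respected. This is a naive sketch for illustration, not ICARUS's search algorithm; the dict-based query format and the name `match` are assumptions.

```python
def match(query_nodes, query_edges, sentence):
    """Yield every assignment {query_node: token_id} in which each node's
    constraints hold, distinct query nodes map to distinct tokens, and every
    query edge (parent, child) is a head -> dependent arc in the sentence."""
    tokens = {tok["id"]: tok for tok in sentence}
    order = list(query_nodes)

    def fits(qid, tok):
        return all(tok.get(k) == v for k, v in query_nodes[qid].items())

    def edges_ok(assignment):
        # Only edges whose two endpoints are already assigned can be checked.
        return all(tokens[assignment[c]]["head"] == assignment[p]
                   for p, c in query_edges
                   if p in assignment and c in assignment)

    def extend(i, assignment):
        if i == len(order):
            yield dict(assignment)
            return
        qid = order[i]
        for tid, tok in tokens.items():
            if tid in assignment.values() or not fits(qid, tok):
                continue
            assignment[qid] = tid
            if edges_ok(assignment):
                yield from extend(i + 1, assignment)
            del assignment[qid]

    yield from extend(0, {})


sentence = [
    {"id": 1, "lemma": "the", "pos": "DT", "head": 2},
    {"id": 2, "lemma": "book", "pos": "NN", "head": 3},
    {"id": 3, "lemma": "be", "pos": "VBD", "head": 0},
    {"id": 4, "lemma": "write", "pos": "VBN", "head": 3},
]
nodes = {"aux": {"lemma": "be"}, "part": {"pos": "VBN"}}
print(list(match(nodes, [("aux", "part")], sentence)))  # [{'aux': 3, 'part': 4}]
```

Pruning as soon as an edge between two assigned nodes fails keeps the search from enumerating hopeless combinations, though a real engine would likely use indexing and smarter node ordering.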

70 can be specified for each node in the search graph in order to restrict the query. [sent-124, score-0.253]

71 The search engine supports regular expressions for all string-properties (form, lemma, part of speech, relation). [sent-128, score-0.288]
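
Regular-expression constraints on string properties could be checked as below. Whether ICARUS anchors patterns to the whole value is not stated here; the sketch assumes full-match semantics via `re.fullmatch`, and `satisfies` is a hypothetical name.

```python
import re


def satisfies(token, constraints):
    """True if every constrained string property of the token fully matches
    its regular expression (full-match semantics is an assumption)."""
    return all(re.fullmatch(pattern, str(token.get(key, ""))) is not None
               for key, pattern in constraints.items())


token = {"form": "written", "lemma": "write", "pos": "VBN", "rel": "VC"}
print(satisfies(token, {"pos": "VB[DN]"}))  # True
print(satisfies(token, {"lemma": "writ"}))  # False: no partial matches
```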

72 As an alternative to the search graph, the user can also specify the query in a text-based format: the constraints for a single node are written as a comma-separated list of key=value pairs enclosed in square brackets. [sent-130, score-0.647]
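
A parser for the single-node text format described above might look like this. It assumes the simplest reading of the description, a flat `[key=value,key=value]` list, and ignores any further syntax the real query language may have (for instance, values that themselves contain commas or brackets). `parse_node` is a hypothetical name.

```python
def parse_node(text):
    """Parse '[key=value,key=value]' into a dict of node constraints."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("node must be enclosed in square brackets")
    constraints = {}
    body = text[1:-1].strip()
    if not body:
        return constraints
    for pair in body.split(","):
        key, sep, value = pair.partition("=")  # split at the first '=' only
        if not sep or not key.strip():
            raise ValueError(f"malformed constraint: {pair!r}")
        constraints[key.strip()] = value.strip()
    return constraints


print(parse_node("[lemma=be, pos=VBD]"))  # {'lemma': 'be', 'pos': 'VBD'}
```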

73 A central feature of the query language is the grouping operator (<*>), which will match any value and cause the search engine to group result entries by the actual instance of the property declared to be grouped. [sent-136, score-0.704]

74 The results of the search will then be visualized as a list of instances together with their respective frequencies. [sent-137, score-0.189]
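
The effect of the grouping operator can be mimicked by treating `<*>` as a wildcard whose bound values form the grouping key. In this single-node sketch, the limit of three operators mirrors the text, while the function name `grouped_counts` and the token format are assumptions:

```python
from collections import Counter

GROUP = "<*>"
MAX_GROUPS = 3  # the text allows up to three grouping operators


def grouped_counts(tokens, constraints):
    """Treat GROUP as 'match anything' and count matching tokens by the tuple
    of values bound to the grouped properties."""
    group_keys = [k for k, v in constraints.items() if v == GROUP]
    if len(group_keys) > MAX_GROUPS:
        raise ValueError("at most three grouping operators are supported")
    fixed = {k: v for k, v in constraints.items() if v != GROUP}
    counts = Counter()
    for tok in tokens:
        if all(tok.get(k) == v for k, v in fixed.items()):
            counts[tuple(tok.get(k) for k in group_keys)] += 1
    return group_keys, counts


tokens = [{"lemma": "write", "pos": "VBN"}, {"lemma": "write", "pos": "VBN"},
          {"lemma": "sell", "pos": "VBN"}, {"lemma": "book", "pos": "NN"}]
keys, counts = grouped_counts(tokens, {"pos": "VBN", "lemma": GROUP})
print(keys, counts.most_common())
# ['lemma'] [(('write',), 2), (('sell',), 1)]
```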

75 Depending on the number of grouping operators used (up to a maximum of three), the result is structured as a list of frequencies (cf. [sent-139, score-0.161]

76 Figure 5), or a list where each item then opens up a table of frequency results (cf. [sent-141, score-0.129]

77 In the search graph and the result view, different colors are used to distinguish between different grouping operators. [sent-143, score-0.319]

78 The ICARUS search engine offers three different search modes: Sentence-based. [sent-144, score-0.447]

79 Sentence-based search stops at the first successful hit in a sentence, so each sentence appears at most once in the list of results. [sent-145, score-0.229]

80 The exhaustive sentence-based search mode extends sentence-based search by also processing multiple hits within a single sentence. [sent-147, score-0.367]

81 In the result view, the user can then browse the different hits found in one sentence. [sent-149, score-0.282]
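
The difference between the sentence-based and exhaustive modes comes down to whether the scan of a sentence stops at the first hit. The sketch simplifies a hit to a single token satisfying a predicate; `search` and its `exhaustive` flag are hypothetical:

```python
def search(treebank, is_hit, exhaustive=False):
    """Return (sentence_index, hit_ids) pairs. Sentence-based mode stops at the
    first hit, so each sentence is listed once with one hit; exhaustive mode
    keeps scanning and records every hit in the sentence."""
    results = []
    for index, sentence in enumerate(treebank):
        hits = []
        for tok in sentence:
            if is_hit(tok):
                hits.append(tok["id"])
                if not exhaustive:
                    break  # stop at the first successful hit
        if hits:
            results.append((index, hits))
    return results


def is_participle(tok):
    return tok["pos"] == "VBN"


treebank = [
    [{"id": 1, "pos": "VBN"}, {"id": 2, "pos": "NN"}, {"id": 3, "pos": "VBN"}],
    [{"id": 1, "pos": "NN"}],
    [{"id": 1, "pos": "VBN"}],
]
print(search(treebank, is_participle))                   # [(0, [1]), (2, [1])]
print(search(treebank, is_participle, exhaustive=True))  # [(0, [1, 3]), (2, [1])]
```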

82 When a query is issued, the search results are displayed on the fly as the search engine is processing the treebank. [sent-152, score-0.702]
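
Displaying results while the engine is still running can be modelled as a generator that yields periodic snapshots of the running counts. A minimal sketch; `live_counts`, its `report_every` parameter, and the snapshot format are assumptions:

```python
from collections import Counter


def live_counts(treebank, key, report_every=1):
    """Yield (sentences_processed, snapshot) while scanning, so a view can be
    refreshed before the whole treebank has been searched. key(token) returns
    the grouping value for a hit, or None for a non-hit."""
    counts = Counter()
    done = 0
    for done, sentence in enumerate(treebank, start=1):
        for tok in sentence:
            value = key(tok)
            if value is not None:
                counts[value] += 1
        if done % report_every == 0:
            yield done, Counter(counts)  # copy, so later updates don't leak in
    if done % report_every != 0:
        yield done, Counter(counts)      # final snapshot


def participle_lemma(tok):
    return tok["lemma"] if tok["pos"] == "VBN" else None


treebank = [[{"pos": "VBN", "lemma": "write"}],
            [{"pos": "NN", "lemma": "book"}],
            [{"pos": "VBN", "lemma": "write"}, {"pos": "VBN", "lemma": "sell"}]]
for done, snapshot in live_counts(treebank, participle_lemma, report_every=2):
    print(done, dict(snapshot))
# 2 {'write': 1}
# 3 {'write': 2, 'sell': 1}
```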

83 The sentences can be rendered in one of two ways: either as a tree, where nodes are arranged vertically by depth in the tree, or horizontally with all the nodes arranged side-by-side. [sent-153, score-0.164]

84 net / work for defining plugins similarly to the engine used by the popular Eclipse IDE project. [sent-158, score-0.236]

85 The plugin-based architecture makes it possible for anybody to write extensions to ICARUS that are specialized for a particular task. [sent-159, score-0.197]

86 The plugin system facilitates custom extensions that make it possible to intercept certain stages of an ongoing search process and interact with it. [sent-161, score-0.498]

87 This makes it possible for external tools to preprocess search data and apply additional annotations and/or filtering, or even make use of existing indices by using search constraints to limit the amount of data passed to the search engine. [sent-162, score-0.687]
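
Intercepting a search to preprocess or filter data could be organized around registered hooks, as in the hypothetical `SearchPipeline` below. It illustrates the idea only and is not ICARUS's plugin API; the hook names and the sentence representation are assumptions.

```python
from typing import Callable, Iterable, Iterator, List

Sentence = List[dict]


class SearchPipeline:
    """Hypothetical hook points in front of a search engine: preprocessors may
    add annotations, filters may drop sentences before the engine sees them."""

    def __init__(self) -> None:
        self.preprocessors: List[Callable[[Sentence], Sentence]] = []
        self.filters: List[Callable[[Sentence], bool]] = []

    def register_preprocessor(self, fn: Callable[[Sentence], Sentence]) -> None:
        self.preprocessors.append(fn)

    def register_filter(self, fn: Callable[[Sentence], bool]) -> None:
        self.filters.append(fn)

    def feed(self, treebank: Iterable[Sentence]) -> Iterator[Sentence]:
        """Apply preprocessors in order, then yield sentences passing all filters."""
        for sentence in treebank:
            for fn in self.preprocessors:
                sentence = fn(sentence)
            if all(keep(sentence) for keep in self.filters):
                yield sentence


pipeline = SearchPipeline()
# Example plugin: add an upper-cased form as an extra annotation.
pipeline.register_preprocessor(
    lambda s: [dict(t, upper=t["form"].upper()) for t in s])
# Example plugin: only pass on sentences containing a past participle.
pipeline.register_filter(lambda s: any(t["pos"] == "VBN" for t in s))

kept = list(pipeline.feed([[{"form": "written", "pos": "VBN"}],
                           [{"form": "book", "pos": "NN"}]]))
print(kept)  # [[{'form': 'written', 'pos': 'VBN', 'upper': 'WRITTEN'}]]
```

Filtering before the engine runs is where an external index could cut down the data passed on, as the paragraph above suggests.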

88 ICARUS comes with a dedicated plugin that enables access to web services provided by CLARIN-D. [sent-164, score-0.302]

89 The project aims to provide tools and services for language-centered research in the humanities and social sciences. [sent-165, score-0.209]

90 , mate-tools, where the tool chain is executed locally, the user can define a tool chain by chaining several web services (e. [sent-168, score-0.475]

91 As new NLP tools are added as CLARIN-D web services, they can be immediately employed by ICARUS. [sent-175, score-0.162]
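
Chaining services means feeding each stage's output into the next. The sketch composes local stand-in functions; a real setup would issue web-service requests instead, and `chain`, `tokenize`, and `lowercase_lemmas` are hypothetical placeholders, not CLARIN-D services.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[dict], dict]


def chain(stages: List[Stage]) -> Stage:
    """Compose stages left to right: each stage receives the document
    produced by the previous one."""
    return lambda doc: reduce(lambda acc, stage: stage(acc), stages, doc)


# Hypothetical local stand-ins; a real chain would call remote services here.
def tokenize(doc: dict) -> dict:
    return {**doc, "tokens": doc["text"].split()}


def lowercase_lemmas(doc: dict) -> dict:
    # Crude placeholder for a lemmatizer: just lower-cases each token.
    return {**doc, "lemmas": [t.lower() for t in doc["tokens"]]}


pipeline = chain([tokenize, lowercase_lemmas])
print(pipeline({"text": "The Book was Written"})["lemmas"])
# ['the', 'book', 'was', 'written']
```

Because each stage only reads and extends a shared document, adding a newly published service amounts to appending one more function to the list.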

92 (Section 5, Upcoming Extensions) An upcoming release includes the following extensions: • Currently, treebanks are assumed to fit into the executing computer's main memory. [sent-176, score-0.154]

93 The new implementation will support asynchronous loading of data, with notifications passed to the query engine or a plugin when required data is available. [sent-177, score-0.601]
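
Asynchronous loading with readiness notifications could follow the familiar future-plus-callback pattern. A sketch using Python's `concurrent.futures`; `load_async`, the chunking scheme, and the callback signature are assumptions, since the planned implementation is not described in detail:

```python
from concurrent.futures import ThreadPoolExecutor


def load_async(executor, load_chunk, chunk_ids, on_ready):
    """Submit one load job per chunk and call on_ready(chunk_id, data) as soon
    as that chunk is available, rather than loading everything up front."""
    futures = []
    for chunk_id in chunk_ids:
        future = executor.submit(load_chunk, chunk_id)
        # Bind chunk_id now; a plain closure would see only the last value.
        future.add_done_callback(
            lambda f, cid=chunk_id: on_ready(cid, f.result()))
        futures.append(future)
    return futures


received = []
with ThreadPoolExecutor(max_workers=2) as executor:
    load_async(executor,
               lambda cid: [f"sentence {cid}"],  # stand-in loader
               [0, 1, 2],
               lambda cid, data: received.append((cid, data)))
# Leaving the with-block waits for all jobs; callbacks may arrive in any order.
print(sorted(received))
```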

94 • The search engine is being extended with an operator that allows disjunctions of queries. [sent-179, score-0.347]

95 This will enable the user to aggregate frequency output over multiple queries. [sent-180, score-0.207]
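
Aggregating frequencies over a disjunction of queries can be sketched as counting each item once if it satisfies at least one of several predicates. The predicate-based query representation and the name `disjunction_counts` are assumptions for illustration:

```python
from collections import Counter


def disjunction_counts(treebank, queries, key):
    """Count key(token) over tokens matching at least one query; a token that
    matches several queries is still counted once."""
    counts = Counter()
    for sentence in treebank:
        for tok in sentence:
            if any(query(tok) for query in queries):
                counts[key(tok)] += 1
    return counts


treebank = [[{"lemma": "write", "pos": "VBN"}, {"lemma": "book", "pos": "NN"}],
            [{"lemma": "write", "pos": "VBD"}, {"lemma": "sell", "pos": "VBN"}]]
queries = [lambda t: t["pos"] == "VBN", lambda t: t["pos"] == "VBD"]
print(disjunction_counts(treebank, queries, key=lambda t: t["lemma"]))
```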

96 (Section 6, Conclusion) We have presented ICARUS, a versatile and user-friendly search and visualization tool for dependency trees. [sent-181, score-0.506]

97 It lets the user create queries graphically and returns results (1) quantitatively by means of frequency lists and tables as well as (2) qualitatively by connecting the statistics to the matching sentences and allowing the user to browse them graphically. [sent-185, score-0.825]

98 Its plugin-based architecture enables it to interface, for example, with external processing pipelines, which lets the user apply processing tools directly from the user interface. [sent-186, score-0.566]

99 In the future, specialized plugins are planned to work with different linguistic annotations, e. [sent-187, score-0.165]

100 Additionally, a plugin is intended that interfaces the search engine with a database. [sent-190, score-0.543]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('icarus', 0.591), ('query', 0.255), ('lemma', 0.198), ('plugin', 0.188), ('user', 0.162), ('search', 0.159), ('engine', 0.129), ('visualization', 0.122), ('tool', 0.119), ('querying', 0.115), ('treebanks', 0.113), ('builder', 0.107), ('plugins', 0.107), ('passive', 0.107), ('grouping', 0.102), ('rel', 0.102), ('graphically', 0.098), ('queries', 0.097), ('tools', 0.087), ('logical', 0.084), ('constructions', 0.08), ('services', 0.075), ('overt', 0.075), ('browse', 0.071), ('dependency', 0.07), ('interfaces', 0.067), ('vbn', 0.062), ('treebank', 0.059), ('operator', 0.059), ('specialized', 0.058), ('graph', 0.058), ('extensions', 0.056), ('vc', 0.054), ('opens', 0.054), ('augustinus', 0.054), ('fangorn', 0.054), ('ghodke', 0.054), ('lgs', 0.054), ('malteval', 0.054), ('monasearch', 0.054), ('pmltq', 0.054), ('clicking', 0.052), ('hits', 0.049), ('nodes', 0.049), ('pmod', 0.047), ('lets', 0.047), ('humanities', 0.047), ('frequency', 0.045), ('haji', 0.044), ('pajas', 0.044), ('grouped', 0.043), ('architecture', 0.042), ('interested', 0.042), ('constituent', 0.041), ('write', 0.041), ('passives', 0.041), ('installation', 0.041), ('upcoming', 0.041), ('intuitive', 0.04), ('hit', 0.04), ('graphical', 0.039), ('boguslavsky', 0.039), ('heid', 0.039), ('enables', 0.039), ('matches', 0.039), ('java', 0.038), ('parser', 0.038), ('sorted', 0.038), ('quantitatively', 0.038), ('qualitatively', 0.038), ('conll', 0.037), ('view', 0.036), ('instantiation', 0.036), ('versatile', 0.036), ('custom', 0.036), ('format', 0.036), ('specified', 0.036), ('constraints', 0.035), ('tables', 0.035), ('stuttgart', 0.033), ('arranged', 0.033), ('searching', 0.033), ('wolfgang', 0.032), ('inspected', 0.032), ('annotations', 0.032), ('allowing', 0.032), ('dimensions', 0.031), ('functionality', 0.031), ('graphs', 0.03), ('subject', 0.03), ('nilsson', 0.03), ('ongoing', 0.03), ('list', 0.03), ('operators', 0.029), ('facilitates', 0.029), ('passed', 0.029), ('parse', 0.028), ('breaking', 0.028), 
('external', 0.027), ('course', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks

2 0.11530823 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

Author: Rohan Ramanath ; Monojit Choudhury ; Kalika Bali ; Rishiraj Saha Roy

Abstract: Query segmentation, like text chunking, is the first step towards query understanding. In this study, we explore the effectiveness of crowdsourcing for this task. Through carefully designed control experiments and Inter Annotator Agreement metrics for analysis of experimental data, we show that crowdsourcing may not be a suitable approach for query segmentation because the crowd seems to have a very strong bias towards dividing the query into roughly equal (often only two) parts. Similarly, in the case of hierarchical or nested segmentation, turkers have a strong preference towards balanced binary trees.

3 0.11307842 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

Author: Chenguang Wang ; Nan Duan ; Ming Zhou ; Ming Zhang

Abstract: Mismatch between queries and documents is a key issue for the web search task. In order to narrow down such mismatch, in this paper, we present an in-depth investigation on adapting a paraphrasing technique to web search from three aspects: a search-oriented paraphrasing model; an NDCG-based parameter optimization algorithm; an enhanced ranking model leveraging augmented features computed on paraphrases of original queries. Experiments performed on the large scale query-document data set show that, the search performance can be significantly improved, with +3.28% and +1.14% NDCG gains on dev and test sets respectively.

4 0.11074948 285 acl-2013-Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees

Author: Alan Akbik ; Oresti Konomi ; Michail Melnikov

Abstract: The use of deep syntactic information such as typed dependencies has been shown to be very effective in Information Extraction. Despite this potential, the process of manually creating rule-based information extractors that operate on dependency trees is not intuitive for persons without an extensive NLP background. In this system demonstration, we present a tool and a workflow designed to enable initiate users to interactively explore the effect and expressivity of creating Information Extraction rules over dependency trees. We introduce the proposed five-step workflow for creating information extractors, the graph query based rule language, as well as the core features of the PROPMINER tool.

5 0.10576376 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

Author: Joanne Boisson ; Ting-Hui Kao ; Jian-Cheng Wu ; Tzu-Hsi Yen ; Jason S. Chang

Abstract: In this paper, we introduce a Web-scale linguistics search engine, Linggle, that retrieves lexical bundles in response to a given query. The query might contain keywords, wildcards, wild parts of speech (PoS), synonyms, and additional regular expression (RE) operators. In our approach, we incorporate inverted file indexing, PoS information from BNC, and semantic indexing based on Latent Dirichlet Allocation with Google Web 1T. The method involves parsing the query and transforming it into several keyword retrieval commands. Word chunks are retrieved with counts, further filtering the chunks with the query as a RE, and finally displaying the results according to the counts, similarities, and topics. Clusters of synonyms or conceptually related words are also provided. In addition, Linggle provides example sentences from The New York Times on demand. The current implementation of Linggle is the most functionally comprehensive, and is in principle language and dataset independent. We plan to extend Linggle to provide fast and convenient access to a wealth of linguistic information embodied in Web scale datasets including Google Web 1T and Google Books Ngram for many major languages in the world.

6 0.10351668 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

7 0.10232674 29 acl-2013-A Visual Analytics System for Cluster Exploration

8 0.095012859 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems

9 0.089788035 290 acl-2013-Question Analysis for Polish Question Answering

10 0.089258827 94 acl-2013-Coordination Structures in Dependency Treebanks

11 0.084824659 270 acl-2013-ParGramBank: The ParGram Parallel Treebank

12 0.08283063 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing

13 0.078591846 300 acl-2013-Reducing Annotation Effort for Quality Estimation via Active Learning

14 0.076538362 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?

15 0.07480409 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations

16 0.071312882 51 acl-2013-AnnoMarket: An Open Cloud Platform for NLP

17 0.070197277 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

18 0.068646625 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing

19 0.065355346 118 acl-2013-Development and Analysis of NLP Pipelines in Argo

20 0.062370446 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.165), (1, -0.006), (2, -0.081), (3, -0.062), (4, -0.018), (5, -0.004), (6, 0.064), (7, -0.091), (8, 0.087), (9, -0.072), (10, -0.069), (11, 0.05), (12, -0.035), (13, 0.071), (14, -0.069), (15, -0.046), (16, -0.007), (17, 0.017), (18, 0.005), (19, -0.022), (20, -0.1), (21, 0.047), (22, -0.09), (23, -0.005), (24, 0.039), (25, -0.072), (26, -0.062), (27, 0.133), (28, -0.049), (29, -0.006), (30, -0.029), (31, 0.029), (32, -0.206), (33, -0.099), (34, 0.08), (35, -0.022), (36, -0.018), (37, -0.021), (38, 0.032), (39, -0.012), (40, -0.025), (41, -0.083), (42, -0.037), (43, -0.046), (44, 0.001), (45, -0.157), (46, 0.034), (47, -0.034), (48, -0.044), (49, -0.008)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96230292 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks

2 0.59247732 270 acl-2013-ParGramBank: The ParGram Parallel Treebank

Author: Sebastian Sulger ; Miriam Butt ; Tracy Holloway King ; Paul Meurer ; Tibor Laczko ; Gyorgy Rakosi ; Cheikh Bamba Dione ; Helge Dyvik ; Victoria Rosen ; Koenraad De Smedt ; Agnieszka Patejuk ; Ozlem Cetinoglu ; I Wayan Arka ; Meladel Mistica

Abstract: This paper discusses the construction of a parallel treebank currently involving ten languages from six language families. The treebank is based on deep LFG (Lexical-Functional Grammar) grammars that were developed within the framework of the ParGram (Parallel Grammar) effort. The grammars produce output that is maximally parallelized across languages and language families. This output forms the basis of a parallel treebank covering a diverse set of phenomena. The treebank is publicly available via the INESS treebanking environment, which also allows for the alignment of language pairs. We thus present a unique, multilayered parallel treebank that represents more and different types of languages than are available in other treebanks, that represents deep linguistic knowledge and that allows for the alignment of sentences at several levels: dependency structures, constituency structures and POS information.

3 0.59080935 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

Author: Chenguang Wang ; Nan Duan ; Ming Zhou ; Ming Zhang

Abstract: Mismatch between queries and documents is a key issue for the web search task. In order to narrow down such mismatch, in this paper, we present an in-depth investigation on adapting a paraphrasing technique to web search from three aspects: a search-oriented paraphrasing model; an NDCG-based parameter optimization algorithm; an enhanced ranking model leveraging augmented features computed on paraphrases of original queries. Experiments performed on the large scale query-document data set show that, the search performance can be significantly improved, with +3.28% and +1.14% NDCG gains on dev and test sets respectively.

4 0.58799386 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations

Author: Seid Muhie Yimam ; Iryna Gurevych ; Richard Eckart de Castilho ; Chris Biemann

Abstract: We present WebAnno, a general purpose web-based annotation tool for a wide range of linguistic annotations. WebAnno offers annotation project management, freely configurable tagsets and the management of users in different roles. WebAnno uses modern web technology for visualizing and editing annotations in a web browser. It supports arbitrarily large documents, pluggable import/export filters, the curation of annotations across various users, and an interface to farming out annotations to a crowdsourcing platform. Currently WebAnno allows part-of-speech, named entity, dependency parsing and co-reference chain annotations. The architecture design allows adding additional modes of visualization and editing, when new kinds of annotations are to be supported.

5 0.58167225 271 acl-2013-ParaQuery: Making Sense of Paraphrase Collections

Author: Lili Kotlerman ; Nitin Madnani ; Aoife Cahill

Abstract: Pivoting on bilingual parallel corpora is a popular approach for paraphrase acquisition. Although such pivoted paraphrase collections have been successfully used to improve the performance of several different NLP applications, it is still difficult to get an intrinsic estimate of the quality and coverage of the paraphrases contained in these collections. We present ParaQuery, a tool that helps a user interactively explore and characterize a given pivoted paraphrase collection, analyze its utility for a particular domain, and compare it to other popular lexical similarity resources all within a single interface.

6 0.58096206 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval

7 0.57952803 51 acl-2013-AnnoMarket: An Open Cloud Platform for NLP

8 0.57832056 94 acl-2013-Coordination Structures in Dependency Treebanks

9 0.56160569 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics

10 0.55102044 285 acl-2013-Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees

11 0.54597718 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

12 0.54316705 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study

13 0.53729588 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

14 0.53649342 118 acl-2013-Development and Analysis of NLP Pipelines in Argo

15 0.53523391 29 acl-2013-A Visual Analytics System for Cluster Exploration

16 0.51375324 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections

17 0.49805075 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

18 0.49487981 279 acl-2013-PhonMatrix: Visualizing co-occurrence constraints of sounds

19 0.47235182 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)

20 0.44436672 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.083), (6, 0.029), (11, 0.051), (15, 0.012), (24, 0.081), (26, 0.079), (35, 0.09), (36, 0.192), (42, 0.081), (48, 0.025), (61, 0.018), (64, 0.017), (70, 0.042), (88, 0.03), (90, 0.022), (95, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94063741 381 acl-2013-Variable Bit Quantisation for LSH

Author: Sean Moran ; Victor Lavrenko ; Miles Osborne

Abstract: We introduce a scheme for optimally allocating a variable number of bits per LSH hyperplane. Previous approaches assign a constant number of bits per hyperplane. This neglects the fact that a subset of hyperplanes may be more informative than others. Our method, dubbed Variable Bit Quantisation (VBQ), provides a data-driven non-uniform bit allocation across hyperplanes. Despite only using a fraction of the available hyperplanes, VBQ outperforms uniform quantisation by up to 168% for retrieval across standard text and image datasets.

same-paper 2 0.84820312 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks

Author: Markus Gartner ; Gregor Thiele ; Wolfgang Seeker ; Anders Bjorkelund ; Jonas Kuhn

Abstract: We present ICARUS, a versatile graphical search tool to query dependency treebanks. Search results can be inspected both quantitatively and qualitatively by means of frequency lists, tables, or dependency graphs. ICARUS also ships with plugins that enable it to interface with tool chains running either locally or remotely.

3 0.84231859 224 acl-2013-Learning to Extract International Relations from Political Context

Author: Brendan O'Connor ; Brandon M. Stewart ; Noah A. Smith

Abstract: We describe a new probabilistic model for extracting events between major political actors from news corpora. Our unsupervised model brings together familiar components in natural language processing (like parsers and topic models) with contextual political information (temporal and dyad dependence) to infer latent event classes. We quantitatively evaluate the model's performance on political science benchmarks: recovering expert-assigned event class valences, and detecting real-world conflict. We also conduct a small case study based on our model's inferences. A supplementary appendix, and replication software/data are available online, at: http://brenocon.com/irevents

4 0.70099556 225 acl-2013-Learning to Order Natural Language Texts

Author: Jiwei Tan ; Xiaojun Wan ; Jianguo Xiao

Abstract: Ordering texts is an important task for many NLP applications. Most previous work on summary sentence ordering relies on the contextual information (e.g. adjacent sentences) of each sentence in the source document. In this paper, we investigate a more challenging task of ordering a set of unordered sentences without any contextual information. We introduce a set of features to characterize the order and coherence of natural language texts, and use the learning to rank technique to determine the order of any two sentences. We also propose to use the genetic algorithm to determine the total order of all sentences. Evaluation results on a news corpus show the effectiveness of our proposed method.

5 0.69689971 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

Author: Angeliki Lazaridou ; Ivan Titov ; Caroline Sporleder

Abstract: We propose a joint model for unsupervised induction of sentiment, aspect and discourse information and show that by incorporating a notion of latent discourse relations in the model, we improve the prediction accuracy for aspect and sentiment polarity on the sub-sentential level. We deviate from the traditional view of discourse, as we induce types of discourse relations and associated discourse cues relevant to the considered opinion analysis task; consequently, the induced discourse relations play the role of opinion and aspect shifters. The quantitative analysis that we conducted indicated that the integration of a discourse model increased the prediction accuracy results with respect to the discourse-agnostic approach and the qualitative analysis suggests that the induced representations encode a meaningful discourse structure.

6 0.68765777 318 acl-2013-Sentiment Relevance

7 0.68689519 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages

8 0.68410665 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

9 0.6838873 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data

10 0.68224961 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

11 0.6800496 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

12 0.67999071 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

13 0.67996007 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach

14 0.67957425 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

15 0.67926663 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

16 0.67846334 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis

17 0.67732322 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations

18 0.67666525 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

19 0.67625463 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

20 0.67542851 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation