acl acl2011 acl2011-26 knowledge-graph by maker-knowledge-mining

26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search


Source: pdf

Author: Andrei Popescu-Belis ; Majid Yazdani ; Alexandre Nanchen ; Philip N. Garner

Abstract: The Automatic Content Linking Device is a just-in-time document retrieval system which monitors an ongoing conversation or a monologue and enriches it with potentially related documents, including multimedia ones, from local repositories or from the Internet. The documents are found using keyword-based search or using a semantic similarity measure between documents and the words obtained from automatic speech recognition. Results are displayed in real time to meeting participants, or to users watching a recorded lecture or conversation.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: The Automatic Content Linking Device is a just-in-time document retrieval system which monitors an ongoing conversation or a monologue and enriches it with potentially related documents, including multimedia ones, from local repositories or from the Internet. [sent-3, score-0.436]

2 The documents are found using keyword-based search or using a semantic similarity measure between documents and the words obtained from automatic speech recognition. [sent-4, score-0.365]

3 Results are displayed in real time to meeting participants, or to users watching a recorded lecture or conversation. [sent-5, score-0.061]

4 1 Introduction: Enriching a monologue or a conversation with related content, such as textual or audio-visual documents on the same topic, is a task with multiple applications in the field of computer-mediated human-human communication. [sent-6, score-0.183]

5 In this paper, we describe the Automatic Content Linking Device (ACLD), a system that analyzes spoken input from one or more speakers using automatic speech recognition (ASR), in order to retrieve related content, in real-time, from a variety of repositories. [sent-7, score-0.035]

6 These include local document databases or archives of multimedia recordings, as well as websites. [sent-8, score-0.151]

7 Local repositories are queried using a keyword-based search engine, or using a semantic similarity measure, while websites are queried using commercial search engines. [sent-9, score-0.443]

8 We will first describe the scenarios of use of the ACLD in Section 2, and review previous systems for just-in-time retrieval in Section 3. [sent-10, score-0.104]

9 Finding useful documents without the need for a user to initiate a direct search for them is one of the ways in which the large quantity of knowledge available in networked environments can be efficiently put to use. [sent-19, score-0.282]

10 To perform this task, a system must consider explicit and implicit input from users, mainly speech or typed input, and attempt to model their context, in order to provide recommendations, which users are free to consult if they feel the need for additional information. [sent-20, score-0.135]

11 One of the main scenarios of use for the ACLD involves people taking part in meetings, who often mention documents containing facts under discussion, but do not have the time to search for them without interrupting the discussion flow. [sent-21, score-0.23]

12 Moreover, as the ACLD was developed on meetings from the AMI Corpus, it can also perform the same operations on a replayed meeting, as a complement to a meeting browser, for development or demonstration purposes. [sent-23, score-0.139]

13 In a second scenario, content linking is performed over live or recorded lectures, for instance in a computer-assisted learning environment for individual students. [sent-24, score-0.174]

14 The ACLD enriches the lectures with related material drawn from various repositories, through a search process that can be guided in real time. [sent-25, score-0.221]

15 The advantage of real-time content linking over a more static enrichment, such as the Feynman lectures at Microsoft Research, is that users can tune search parameters at will while viewing the lecture. [sent-28, score-0.418]

16 3 Just-in-Time Retrieval Systems: The first precursors to the ACLD were the Fixit query-free search system (Hart and Graham, 1997), the Remembrance Agent for just-in-time retrieval (Rhodes and Maes, 2000), and the Implicit Queries (IQ) system (Dumais et al.). [sent-29, score-0.189]

17 Fixit monitored the state of a user’s interaction with a diagnostic system, and retrieved excerpts from maintenance manuals depending on the interaction state. [sent-31, score-0.106]

18 A version of the Remembrance Agent called Jimminy was conceived as a wearable assistant for taking notes, but ASR was only simulated for evaluation (Rhodes, 1997). [sent-34, score-0.131]

19 The Watson system (Budzik and Hammond, 2000) monitored the user’s operations in a text editor, but proposed a more complex mechanism than the Remembrance Agent for selecting terms for queries, which were directed to a web search engine. [sent-35, score-0.236]

20 Another assistant for an authoring environment was developed in the A-Propos project (Puerta Melguizo et al.). [sent-36, score-0.128]

21 A query-free system was designed for enriching television news with articles from the Web (Henzinger et al.). [sent-38, score-0.054]

22 The FAME interactive space (Metze et al., 2006), which provides multi-modal access to recordings of lectures via a table-top interface, bears many similarities to the ACLD. [sent-41, score-0.14]

23 However, it requires the use of specific voice commands by one user only, and does not spontaneously follow a conversation. [sent-42, score-0.129]

24 More recently, several speech-based search engines have become available, including as smartphone applications. [sent-43, score-0.121]

25 Conversely, many systems allow searching of spoken document archives. [sent-44, score-0.127]

26 Related systems include Speech Spotter (Goto et al., 2004) and a personal assistant using dual-purpose speech (Lyons et al., 2004). [sent-53, score-0.09]

27 These systems enable users to search for information using commands that are identified in the speech flow. [sent-54, score-0.258]

28 The ACLD improves over numerous past systems by giving access to indexed multimedia recordings as well as websites, with fully operational ASR and semantic search, as we now explain. [sent-55, score-0.259]

29 4 Description of the ACLD: The architecture of the ACLD comprises the following functions: document preparation, text extraction and indexing; input sensing and query preparation; search and integration of results; and a user interface to display the results. [sent-56, score-0.392]

30 4.1 Document Preparation and Indexing: The preparation of the local database of documents for content linking mainly involves the extraction of text, followed by the indexing of the documents, which is done using the Apache Lucene software. [sent-58, score-0.321]
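As a rough illustration of this prepare-then-index step, here is a minimal Python sketch. It is a toy stand-in for the Apache Lucene index, not the authors' implementation; the class, scoring, and document names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

class MiniIndex:
    """Toy inverted index standing in for the Apache Lucene index."""
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it

    def add(self, doc_id, text):
        # Users can add files at will: indexing is per-document.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Rank documents by the number of matching query terms."""
        scores = defaultdict(int)
        for term in tokenize(query):
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1
        return sorted(scores, key=scores.get, reverse=True)

index = MiniIndex()
index.add("meeting_ES2008a.txt", "discussion of the remote control design")
print(index.search("remote control"))  # ['meeting_ES2008a.txt']
```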

31 The document repository is generally prepared before using the ACLD, but users can also add files at will. [sent-60, score-0.142]

32 Because past discussions are relevant to subsequent ones, they are passed through offline ASR and then chunked into smaller units. [sent-61, score-0.049]

33 The ACLD uses external search engines to search in external repositories, for instance the Google Web search API or the Google Desktop application to search the user’s local drives. [sent-65, score-0.484]

34 4.2 Sensing the User’s Information Needs: We believe that the most useful cues about the information needs of participants in a conversation, or of people viewing a lecture, are the words that are spoken during the conversation or the lecture. [sent-67, score-0.098]

35 One of its main features is the use of a pre-compiled grammar, which allows it to retain accuracy even when running in real time on a low-resource machine. [sent-70, score-0.041]

36 Of course, when content linking is done over past meetings, or for text extraction from past recordings, the ASR system runs slower than real-time to maximize accuracy of recognition. [sent-71, score-0.238]

37 These values indicate that enough correct words are sensed by the real-time ASR to make it applicable to the ACLD, and that a robust search mechanism could help avoid retrieval errors due to spurious words. [sent-75, score-0.221]

38 The words obtained from the ASR are filtered for stopwords, so that only content words are used for search; our list has about 80 words. [sent-76, score-0.046]
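A minimal sketch of this filtering step; the stopword list shown is a hypothetical excerpt, as the paper's actual 80-word list is not given.

```python
# Hypothetical excerpt of a stopword list; the real system uses about 80 words.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "it", "we", "that"}

def content_words(asr_words):
    """Keep only content words from the ASR hypothesis."""
    return [w for w in asr_words if w.lower() not in STOPWORDS]

print(content_words(["we", "discussed", "the", "remote", "control"]))
# ['discussed', 'remote', 'control']
```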

39 A list of pre-specified keywords can be defined based on such knowledge and can be modified while running the ACLD. [sent-78, score-0.048]

40 4.3 Querying the Document Database: The Query Aggregator (QA) uses the ASR words to retrieve the most relevant documents from one or more databases. [sent-82, score-0.073]

41 The current version of the ACLD makes use of semantic search (see next subsection), while previous versions used word-based search from Apache Lucene for local documents, or from the Google Web or Google Desktop APIs. [sent-83, score-0.305]

42 ASR words from the latest time frame are put together (minus the stopwords) to form queries, and recognized keywords are boosted in the Lucene query. [sent-84, score-0.048]
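Lucene's query syntax does support boosting a term with the caret operator (term^weight), so the boosted query could be assembled as a string like the one below. This is a sketch only: the boost value and function names are assumptions, as the paper does not give them.

```python
def build_query(frame_words, keywords, boost=2.0):
    """Assemble a Lucene query string from one time frame of ASR words
    (stopwords already removed), boosting recognized pre-specified
    keywords with Lucene's ^ operator."""
    parts = [f"{w}^{boost}" if w in keywords else w for w in frame_words]
    return " OR ".join(parts)

print(build_query(["remote", "control", "battery"], keywords={"battery"}))
# remote OR control OR battery^2.0
```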

43 This duration is a compromise between the need to gather enough words for search, and the need to refresh the search results reasonably often. [sent-86, score-0.121]

44 The results are integrated with those from the previous time frame, using a persistence model to smooth variations over time. [sent-87, score-0.051]

45 The model keeps track of the salience of each result, initialized from its ranking among the search results and then decreasing over time unless the document is retrieved again. [sent-88, score-0.202]

46 The rate of decrease (or its inverse, persistence) can be tuned by the user, but in any case, all past results are saved by the user interface and can be consulted at any time. [sent-89, score-0.233]
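One plausible reading of this persistence model is sketched below. The exact update rule and default decay are assumptions; the decay factor plays the role of the user-tunable rate of decrease.

```python
def update_salience(salience, new_results, decay=0.8):
    """Integrate the latest search results with earlier ones.
    salience: dict mapping document id -> current salience score.
    new_results: documents from the latest time frame, best first."""
    for doc in salience:          # salience decreases over time...
        salience[doc] *= decay
    for rank, doc in enumerate(new_results):
        fresh = 1.0 / (rank + 1)  # ...and is re-initialized from the ranking
        salience[doc] = max(salience.get(doc, 0.0), fresh)
    return sorted(salience, key=salience.get, reverse=True)

salience = {}
print(update_salience(salience, ["doc_a", "doc_b"]))  # ['doc_a', 'doc_b']
print(update_salience(salience, ["doc_b"]))           # doc_b now outranks doc_a
```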

47 4.4 Semantic Search over Wikipedia: The goal of our method for semantic search is to improve the relevance of the retrieved documents, and to make the mechanism more robust to noise from the ASR. [sent-91, score-0.274]

48 We have applied the graph-based model of semantic relatedness that we recently developed (Yazdani and Popescu-Belis, 2010) to document retrieval; it is also related to other proposals (Strube and Ponzetto, 2006; Gabrilovich and Markovitch, 2007; Yeh et al.). [sent-92, score-0.297]

49 The model is grounded in a measure of semantic relatedness between text fragments, computed using a random walk over the network of Wikipedia articles. [sent-94, score-0.246]

50 The network covers about 1.2 million articles from the WEX data set (Metaweb Technologies, 2010). [sent-95, score-0.054]

51 The articles are linked through hyperlinks, and also through lexical similarity links that are constructed upon initialization. [sent-96, score-0.054]

52 The random walk model allows the computation of a visiting probability (VP) from one article to another, and then a VP between sets of articles, which has been shown to function as a measure of semantic relatedness, and has been applied to various NLP problems. [sent-97, score-0.107]

53 To compute relatedness between two text fragments, these are first projected into the network, each fragment being represented by the ten closest articles in terms of lexical similarity. [sent-98, score-0.139]

54 For the ACLD, the use of semantic relatedness for document retrieval amounts to searching, in a very large collection, for the documents that are most closely related to the words from the ASR in a given time frame. [sent-99, score-0.37]

55 Here, the document collection is (again) the set of Wikipedia articles from WEX, and the goal is to return the eight most related articles. [sent-100, score-0.135]

56 Such a search is hard to perform in real time; hence, the solution that was found makes use of several approximations to compute the average VP between the ASR fragment and all articles in the Wikipedia network. [sent-101, score-0.175]
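A crude sketch of this pipeline, under stated assumptions: the fragment is projected onto its lexically closest articles, a short truncated walk approximates visiting probabilities, and articles are ranked by averaged VP. The transition matrix, walk length, and the max-over-steps approximation of VP are all simplifications of the actual model (Yazdani and Popescu-Belis, 2010).

```python
import numpy as np

def visiting_probability(T, start, steps=3):
    """Rough visiting-probability estimate from one article: run a short
    random walk on the article graph (T is row-stochastic, built from
    hyperlinks and lexical-similarity links) and keep, for every node,
    the highest occupancy probability seen over the walk."""
    p = np.zeros(T.shape[0])
    p[start] = 1.0
    vp = np.zeros_like(p)
    for _ in range(steps):
        p = p @ T
        vp = np.maximum(vp, p)
    return vp

def semantic_search(fragment_articles, T, k=8):
    """Rank all articles by average VP from the articles representing
    the ASR fragment (e.g. its ten lexically closest articles)."""
    avg_vp = np.mean([visiting_probability(T, a) for a in fragment_articles],
                     axis=0)
    return np.argsort(-avg_vp)[:k]

# Tiny 4-article graph for illustration only.
T = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.3, 0.3, 0.0, 0.4],
              [0.0, 0.0, 1.0, 0.0]])
print(semantic_search([0, 1], T, k=2))
```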

57 Hovering the mouse over a result (here, the most relevant one) displays a pop-up window with more information about it. [sent-103, score-0.036]

58 4.5 The User Interface (UI): The main goal of the UI is to make available all information produced by the system, in a configurable way, allowing users to see a larger or smaller amount of information according to their needs. [sent-105, score-0.061]

59 A modular architecture with a flexible layout has been implemented, maximizing both the accessibility and the understandability of the results, and also displaying intermediate data such as ASR words and found keywords. [sent-106, score-0.068]

60 The UI displays up to five widgets, which can be arranged at will. [sent-107, score-0.036]

61 Names of documents and past meeting snippets found by the QA. [sent-112, score-0.238]

62 Two main arrangements are intended, though many others are possible: an informative full-screen UI, shown in Figure 2 with widgets 1–4; and an unobtrusive widget UI, with superposed tabs, shown in Figure 1 with widget 3. [sent-117, score-0.304]

63 The document names displayed in widgets 3–5 function as hyperlinks to the documents, launching appropriate external viewers when the user clicks on them. [sent-118, score-0.352]

64 Moreover, when hovering over a document name, a pop-up window displays metadata and document excerpts that match words from the query, as an explanation of why the document was retrieved. [sent-119, score-0.379]

65 5 Evaluation Experiments: Four types of evidence for the relevance and utility of the ACLD are summarized in this section. [sent-120, score-0.058]

66 5.1 Feedback from Potential Users: The ACLD was demonstrated to about 50 potential users (industrial partners, focus groups, etc.). [sent-122, score-0.061]

67 Feedback highlighted the importance of matching context, linking on demand, and the UI’s unobtrusive mode. [sent-127, score-0.195]

68 5.2 Pilot Task-based Experiments: A pilot experiment was conducted by a team at the University of Edinburgh with an earlier version of the unobtrusive UI. [sent-129, score-0.139]

69 Four subjects had to complete a task that was started in previous meetings (ES2008a-b-c from the AMI Corpus). [sent-130, score-0.192]

70 Two pilot runs have shown that the ACLD was being consulted about five times per meeting. [sent-133, score-0.077]

71 5.3 Usability Evaluation of the UI: The UI was submitted to a usability evaluation experiment with nine non-technical subjects. [sent-136, score-0.104]

72 The subjects used the ACLD over a replayed meeting recording, and were asked to perform several tasks with it, such as adding a keyword to monitor, searching for a word, or changing the layout. [sent-137, score-0.201]

73 The subjects then rated usability-related statements, leading to an assessment on the System Usability Scale (Brooke, 1996). [sent-138, score-0.104]

74 The overall usability score was 68% (SD: 10), which is considered ‘acceptable usability’ on the SUS. [sent-139, score-0.104]

75 In free-form feedback, subjects found the system helpful to review meetings but also lectures, appreciated the availability of documents, but also noted weaknesses of the keyword-based search results. [sent-141, score-0.394]
(Figure 2 caption: Full screen UI with four widgets: ASR, keywords, document and website results.)

76 5.4 Semantic Search: We compared the output of semantic search with that of keyword-based search. [sent-146, score-0.184]

77 The ASR transcript of one AMI meeting (ES2008d) was passed to both search methods, and ‘evaluation snippets’ were produced, each containing the manual transcript of a one-minute excerpt accompanied by the 8-best Wikipedia articles found by each method. [sent-147, score-0.305]

78 The manual transcript shown to subjects was enriched with punctuation and speakers’ names, and the names of the Wikipedia pages were placed on each side of the transcript frame. [sent-149, score-0.281]

79 Subjects were then asked to read each snippet, and decide which of the two document sets was the most relevant to the discussion taking place. [sent-150, score-0.081]

80 They could also answer ‘none’, and could consult the result if necessary. [sent-153, score-0.039]

81 Results were obtained from 8 subjects, each seeing 9 snippets out of 36. [sent-154, score-0.116]

82 The subjects agreed on 23 (64%) snippets and disagreed on 13 (36%). [sent-156, score-0.22]

83 Over the 23 snippets on which subjects agreed, the result of semantic search was judged more relevant than that of keyword search for 19 snippets (53% of the total), and the reverse for 4 snippets only (11%). [sent-158, score-0.757]

84 Alternatively, if one counts the votes cast by subjects in favor of each system, regardless of agreement, then semantic search received 72% of the votes and keyword-based search only 28%. [sent-159, score-0.288]

85 These numbers show that semantic search quite clearly improves relevance in comparison to keyword-based search, but there is still room for improvement. [sent-160, score-0.242]
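The reported percentages are consistent with the snippet counts (36 snippets in total), as a quick check confirms:

```python
total = 36
print(round(23 / total * 100))  # 64 -> subjects agreed on 23 snippets
print(round(19 / total * 100))  # 53 -> semantic search judged more relevant
print(round(4 / total * 100))   # 11 -> keyword-based search judged more relevant
```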

86 6 Conclusion: The ACLD is, to the best of our knowledge, the first just-in-time retrieval system to use spontaneous speech and to support access to multimedia documents and web pages, using a robust semantic search method. [sent-161, score-0.462]

87 Future work will aim at improving the relevance of semantic search, at modeling context to improve timing of results, and at inferring relevance feedback from users. [sent-162, score-0.215]

88 The ACLD should also be applied to specific use cases, and an experiment with group work in a learning environment is under way. [sent-163, score-0.034]

89 Speech Spotter: On-demand speech recognition in human-human conversation on the telephone or in face-to-face situations. [sent-198, score-0.1]

90 A random walk framework to compute textual semantic similarity: A unified model for three benchmark tasks. [sent-247, score-0.107]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('acld', 0.632), ('asr', 0.321), ('ui', 0.14), ('remembrance', 0.126), ('ami', 0.123), ('search', 0.121), ('snippets', 0.116), ('usability', 0.104), ('subjects', 0.104), ('unobtrusive', 0.101), ('widgets', 0.101), ('linking', 0.094), ('user', 0.088), ('meetings', 0.088), ('relatedness', 0.085), ('document', 0.081), ('recordings', 0.077), ('garner', 0.076), ('iq', 0.076), ('rhodes', 0.076), ('wearable', 0.076), ('yazdani', 0.076), ('documents', 0.073), ('multimedia', 0.07), ('preparation', 0.07), ('repositories', 0.07), ('retrieval', 0.068), ('agent', 0.068), ('desktop', 0.067), ('transcript', 0.065), ('conversation', 0.065), ('lectures', 0.063), ('semantic', 0.063), ('wex', 0.062), ('users', 0.061), ('relevance', 0.058), ('lucene', 0.058), ('wikipedia', 0.058), ('interface', 0.057), ('excerpts', 0.055), ('assistant', 0.055), ('articles', 0.054), ('budzik', 0.051), ('fame', 0.051), ('fixit', 0.051), ('henziker', 0.051), ('melguizo', 0.051), ('metze', 0.051), ('monitored', 0.051), ('persistence', 0.051), ('puerta', 0.051), ('replayed', 0.051), ('spotter', 0.051), ('widget', 0.051), ('past', 0.049), ('keywords', 0.048), ('names', 0.047), ('content', 0.046), ('searching', 0.046), ('bradley', 0.045), ('goto', 0.045), ('hovering', 0.045), ('majid', 0.045), ('metaweb', 0.045), ('monologue', 0.045), ('sensing', 0.045), ('queries', 0.044), ('walk', 0.044), ('stopwords', 0.042), ('commands', 0.041), ('lyons', 0.041), ('realtime', 0.041), ('google', 0.04), ('consult', 0.039), ('andrei', 0.039), ('authoring', 0.039), ('consulted', 0.039), ('indexing', 0.038), ('pilot', 0.038), ('apache', 0.037), ('enriches', 0.037), ('gabrilovich', 0.037), ('yeh', 0.037), ('feedback', 0.036), ('scenarios', 0.036), ('displays', 0.036), ('speech', 0.035), ('hyperlinks', 0.035), ('device', 0.035), ('hart', 0.035), ('layout', 0.035), ('minus', 0.035), ('strube', 0.035), ('environment', 0.034), ('queried', 0.034), ('displaying', 0.033), ('snippet', 0.033), ('viewing', 0.033), ('web', 0.032), ('mechanism', 0.032)]
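The exact weighting scheme used by the mining pipeline is not documented here; a minimal sketch of how such per-word tf-idf weights (and, by summation, per-sentence scores like those above) can be produced with scikit-learn (version 1.0 or later), on hypothetical example sentences:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "the automatic content linking device is a just-in-time retrieval system",
    "results are displayed in real time to meeting participants",
]
vec = TfidfVectorizer()
X = vec.fit_transform(sentences)  # sentence-by-term tf-idf matrix

# Per-word weight: highest tf-idf value of the term in any sentence.
weights = dict(zip(vec.get_feature_names_out(), X.toarray().max(axis=0)))
# Per-sentence score: sum of the tf-idf weights of its terms.
scores = X.sum(axis=1).A.ravel()
print(sorted(weights.items(), key=lambda kv: -kv[1])[:3], scores)
```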

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

Author: Andrei Popescu-Belis ; Majid Yazdani ; Alexandre Nanchen ; Philip N. Garner

Abstract: The Automatic Content Linking Device is a just-in-time document retrieval system which monitors an ongoing conversation or a monologue and enriches it with potentially related documents, including multimedia ones, from local repositories or from the Internet. The documents are found using keyword-based search or using a semantic similarity measure between documents and the words obtained from automatic speech recognition. Results are displayed in real time to meeting participants, or to users watching a recorded lecture or conversation.

2 0.14039846 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System

Author: Wei-Bin Liang ; Chung-Hsien Wu ; Chia-Ping Chen

Abstract: In this study, a novel approach to robust dialogue act detection for error-prone speech recognition in a spoken dialogue system is proposed. First, partial sentence trees are proposed to represent a speech recognition output sentence. Semantic information and the derivation rules of the partial sentence trees are extracted and used to model the relationship between the dialogue acts and the derivation rules. The constructed model is then used to generate a semantic score for dialogue act detection given an input speech utterance. The proposed approach is implemented and evaluated in a Mandarin spoken dialogue system for tour-guiding service. Combined with scores derived from the ASR recognition probability and the dialogue history, the proposed approach achieves 84.3% detection accuracy, an absolute improvement of 34.7% over the baseline of the semantic slot-based method with 49.6% detection accuracy.

3 0.12184155 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns

Author: Je Hun Jeon ; Wen Wang ; Yang Liu

Abstract: In this paper, we adopt an n-best rescoring scheme using pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, thus allowing us to develop it independently. N-best hypotheses from recognizers are rescored by additional scores that measure the correlation of the pitch-accent patterns between the acoustic signal and lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups: the first one is English radio news data that has pitch accent labels, but the recognizer is trained from a small amount ofdata and has high error rate; the second one is English broadcast news data using a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach is able to reduce word error rate relatively by about 3%. This gain is consistent across the two different tests, showing promising future directions of incorporating prosodic information to improve speech recognition.

4 0.11986626 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

Author: Gunter Neumann ; Sven Schmeier

Abstract: We present a mobile touchable application for online topic graph extraction and exploration of web content. The system has been implemented for operation on an iPad. The topic graph is constructed from N web snippets which are determined by a standard search engine. We consider the extraction of a topic graph as a specific empirical collocation extraction task where collocations are extracted between chunks. Our measure of association strength is based on the pointwise mutual information between chunk pairs which explicitly takes their distance into account. An initial user evaluation shows that this system is especially helpful for finding new interesting information on topics about which the user has only a vague idea or even no idea at all.

5 0.093331628 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

Author: Bo Pang ; Ravi Kumar

Abstract: Web search is an information-seeking activity. Often times, this amounts to a user seeking answers to a question. However, queries, which encode user’s information need, are typically not expressed as full-length natural language sentences in particular, as questions. Rather, they consist of one or more text fragments. As humans become more searchengine-savvy, do natural-language questions still have a role to play in web search? Through a systematic, large-scale study, we find to our surprise that as time goes by, web users are more likely to use questions to express their search intent. —

6 0.085170515 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech

7 0.083648309 82 acl-2011-Content Models with Attitude

8 0.07521449 177 acl-2011-Interactive Group Suggesting for Twitter

9 0.072865643 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

10 0.069563709 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

11 0.063255034 167 acl-2011-Improving Dependency Parsing with Semantic Classes

12 0.061703883 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

13 0.057537924 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories

14 0.057093069 115 acl-2011-Engkoo: Mining the Web for Language Learning

15 0.056453262 285 acl-2011-Simple supervised document geolocation with geodesic grids

16 0.0563615 182 acl-2011-Joint Annotation of Search Queries

17 0.055656519 213 acl-2011-Local and Global Algorithms for Disambiguation to Wikipedia

18 0.054565717 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation

19 0.053247053 258 acl-2011-Ranking Class Labels Using Query Sessions

20 0.05215488 139 acl-2011-From Bilingual Dictionaries to Interlingual Document Representations


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.141), (1, 0.072), (2, -0.07), (3, 0.067), (4, -0.109), (5, -0.013), (6, -0.045), (7, -0.056), (8, -0.01), (9, -0.012), (10, 0.006), (11, 0.007), (12, 0.006), (13, -0.038), (14, 0.018), (15, -0.009), (16, 0.065), (17, -0.04), (18, -0.001), (19, -0.047), (20, 0.074), (21, 0.012), (22, -0.032), (23, 0.041), (24, 0.042), (25, -0.071), (26, 0.039), (27, -0.022), (28, -0.038), (29, -0.037), (30, -0.029), (31, 0.049), (32, 0.04), (33, -0.036), (34, 0.042), (35, -0.012), (36, 0.025), (37, -0.008), (38, -0.018), (39, 0.028), (40, -0.025), (41, -0.042), (42, 0.007), (43, 0.149), (44, 0.158), (45, -0.011), (46, -0.035), (47, -0.014), (48, 0.063), (49, 0.082)]
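How the simValue figures below are computed is not stated; a standard choice for sparse (topicId, topicWeight) vectors like the one above is cosine similarity, sketched here with the vector shown above and a hypothetical second paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between sparse (topic_id, weight) vectors."""
    du, dv = dict(u), dict(v)
    dot = sum(w * dv.get(t, 0.0) for t, w in du.items())
    norm_u = math.sqrt(sum(w * w for w in du.values()))
    norm_v = math.sqrt(sum(w * w for w in dv.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

paper_a = [(0, 0.141), (1, 0.072), (2, -0.07)]  # vector shown above (truncated)
paper_b = [(0, 0.120), (1, 0.050), (3, 0.200)]  # hypothetical other paper
print(round(cosine(paper_a, paper_b), 3))
```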

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92714363 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

Author: Andrei Popescu-Belis ; Majid Yazdani ; Alexandre Nanchen ; Philip N. Garner

Abstract: The Automatic Content Linking Device is a just-in-time document retrieval system which monitors an ongoing conversation or a monologue and enriches it with potentially related documents, including multimedia ones, from local repositories or from the Internet. The documents are found using keyword-based search or using a semantic similarity measure between documents and the words obtained from automatic speech recognition. Results are displayed in real time to meeting participants, or to users watching a recorded lecture or conversation.

2 0.7368722 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis

Author: Daniel Bar ; Nicolai Erbs ; Torsten Zesch ; Iryna Gurevych

Abstract: We present Wikulu1, a system focusing on supporting wiki users with their everyday tasks by means of an intelligent interface. Wikulu is implemented as an extensible architecture which transparently integrates natural language processing (NLP) techniques with wikis. It is designed to be deployed with any wiki platform, and the current prototype integrates a wide range of NLP algorithms such as keyphrase extraction, link discovery, text segmentation, summarization, or text similarity. Additionally, we show how Wikulu can be applied for visually analyzing the results of NLP algorithms, educational purposes, and enabling semantic wikis.

3 0.6374588 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

Author: Gunter Neumann ; Sven Schmeier

Abstract: We present a mobile touchable application for online topic graph extraction and exploration of web content. The system has been implemented for operation on an iPad. The topic graph is constructed from N web snippets which are determined by a standard search engine. We consider the extraction of a topic graph as a specific empirical collocation extraction task where collocations are extracted between chunks. Our measure of association strength is based on the pointwise mutual information between chunk pairs which explicitly takes their distance into account. An initial user evaluation shows that this system is especially helpful for finding new interesting information on topics about which the user has only a vague idea or even no idea at all.

4 0.553056 248 acl-2011-Predicting Clicks in a Vocabulary Learning System

Author: Aaron Michelony

Abstract: We consider the problem of predicting which words a student will click in a vocabulary learning system. Often a language learner will find value in the ability to look up the meaning of an unknown word while reading an electronic document by clicking the word. Highlighting words likely to be unknown to a readeris attractive due to drawing his orher attention to it and indicating that information is available. However, this option is usually done manually in vocabulary systems and online encyclopedias such as Wikipedia. Furthurmore, it is never on a per-user basis. This paper presents an automated way of highlighting words likely to be unknown to the specific user. We present related work in search engine ranking, a description of the study used to collect click data, the experiment we performed using the random forest machine learning algorithm and finish with a discussion of future work.

5 0.55107409 285 acl-2011-Simple supervised document geolocation with geodesic grids

Author: Benjamin Wing ; Jason Baldridge

Abstract: We investigate automatic geolocation (i.e. identification of the location, expressed as latitude/longitude coordinates) of documents. Geolocation can be an effective means of summarizing large document collections and it is an important component of geographic information retrieval. We describe several simple supervised methods for document geolocation using only the document’s raw text as evidence. All of our methods predict locations in the context of geodesic grids of varying degrees of resolution. We evaluate the methods on geotagged Wikipedia articles and Twitter feeds. For Wikipedia, our best method obtains a median prediction error of just 11.8 kilometers. Twitter geolocation is more challenging: we obtain a median error of 479 km, an improvement on previous results for the dataset.

6 0.51131451 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis

7 0.50689304 337 acl-2011-Wikipedia Revision Toolkit: Efficiently Accessing Wikipedias Edit History

8 0.49794137 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity

9 0.49252963 80 acl-2011-ConsentCanvas: Automatic Texturing for Improved Readability in End-User License Agreements

10 0.48776898 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus

11 0.48265445 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns

12 0.47771293 291 acl-2011-SystemT: A Declarative Information Extraction System

13 0.47351354 125 acl-2011-Exploiting Readymades in Linguistic Creativity: A System Demonstration of the Jigsaw Bard

14 0.46849406 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

15 0.4667863 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style

16 0.46074009 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis

17 0.4525834 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search

18 0.4477942 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

19 0.44202846 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation

20 0.44131792 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.017), (5, 0.047), (9, 0.01), (17, 0.052), (26, 0.048), (31, 0.016), (37, 0.042), (39, 0.037), (41, 0.066), (55, 0.023), (57, 0.012), (59, 0.034), (66, 0.072), (72, 0.037), (88, 0.011), (89, 0.132), (91, 0.026), (96, 0.198), (97, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92573082 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes

Author: Emilia Apostolova ; Noriko Tomuro ; Dina Demner-Fushman

Abstract: Detecting the linguistic scope of negated and speculated information in text is an important Information Extraction task. This paper presents ScopeFinder, a linguistically motivated rule-based system for the detection of negation and speculation scopes. The system rule set consists of lexico-syntactic patterns automatically extracted from a corpus annotated with negation/speculation cues and their scopes (the BioScope corpus). The system performs on par with state-of-the-art machine learning systems. Additionally, the intuitive and linguistically motivated rules will allow for manual adaptation of the rule set to new domains and corpora.

same-paper 2 0.89034539 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

Author: Andrei Popescu-Belis ; Majid Yazdani ; Alexandre Nanchen ; Philip N. Garner

Abstract: The Automatic Content Linking Device is a just-in-time document retrieval system which monitors an ongoing conversation or a monologue and enriches it with potentially related documents, including multimedia ones, from local repositories or from the Internet. The documents are found using keyword-based search or using a semantic similarity measure between documents and the words obtained from automatic speech recognition. Results are displayed in real time to meeting participants, or to users watching a recorded lecture or conversation.

3 0.88973755 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars

Author: Hiroyuki Shindo ; Akinori Fujino ; Masaaki Nagata

Abstract: We propose a model that incorporates an insertion operator in Bayesian tree substitution grammars (BTSG). Tree insertion is helpful for modeling syntax patterns accurately with fewer grammar rules than BTSG. The experimental parsing results show that our model outperforms a standard PCFG and BTSG for a small dataset. For a large dataset, our model obtains comparable results to BTSG, making the number of grammar rules much smaller than with BTSG.

4 0.86311698 30 acl-2011-Adjoining Tree-to-String Translation

Author: Yang Liu ; Qun Liu ; Yajuan Lu

Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.

5 0.85580969 215 acl-2011-MACAON An NLP Tool Suite for Processing Word Lattices

Author: Alexis Nasr ; Frederic Bechet ; Jean-Francois Rey ; Benoit Favre ; Joseph Le Roux

Abstract: MACAON is a tool suite for standard NLP tasks developed for French. MACAON has been designed to process both human-produced text and highly ambiguous word-lattices produced by NLP tools. MACAON is made of several native modules for common tasks such as a tokenization, a part-of-speech tagging or syntactic parsing, all communicating with each other through XML files . In addition, exchange protocols with external tools are easily definable. MACAON is a fast, modular and open tool, distributed under GNU Public License.

6 0.83793581 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters

7 0.8147071 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

8 0.81339109 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents

9 0.81284422 177 acl-2011-Interactive Group Suggesting for Twitter

10 0.81234968 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output

11 0.81170315 11 acl-2011-A Fast and Accurate Method for Approximate String Search

12 0.81132728 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques

13 0.80999148 72 acl-2011-Collecting Highly Parallel Data for Paraphrase Evaluation

14 0.80956882 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis

15 0.8080979 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

16 0.80756366 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

17 0.8075493 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

18 0.80697501 46 acl-2011-Automated Whole Sentence Grammar Correction Using a Noisy Channel Model

19 0.80669975 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

20 0.80668294 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining