acl acl2013 acl2013-104 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel Bär ; Torsten Zesch ; Iryna Gurevych
Abstract: We present DKPro Similarity, an open source framework for text similarity. Our goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures. In order to promote the reproducibility of experimental results and to provide reliable, permanent experimental conditions for future studies, DKPro Similarity additionally comes with a set of full-featured experimental setups which can be run out-of-the-box and be used for future systems to build upon.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present DKPro Similarity, an open source framework for text similarity. [sent-4, score-0.101]
2 Our goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. [sent-5, score-0.732]
3 DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures. [sent-6, score-0.318]
4 1 Introduction Computing text similarity is key to several natural language processing applications such as automatic essay grading, paraphrase recognition, or plagiarism detection. [sent-8, score-0.477]
5 However, only a few text similarity measures proposed in the literature have been released publicly, and those that have typically do not comply with any standardization. [sent-9, score-0.653]
6 We are currently not aware of any designated text similarity framework which goes beyond simple lexical similarity or contains more than a small number of measures, even though related frameworks exist, which we discuss in Section 6. [sent-10, score-0.959]
7 This fact was also realized by the organizers of the pilot Semantic Textual Similarity Task at SemEval-2012 (see Section 5), as they argue for the creation of an open source framework for text similarity (Agirre et al. [sent-11, score-0.455]
8 In order to fill this gap, we present DKPro Similarity, an open source framework for text similarity. [sent-13, score-0.101]
9 DKPro Similarity is designed to complement DKPro Core,1 a collection of software components for natural language processing based on the Apache UIMA framework (Ferrucci and Lally, 2004). [sent-14, score-0.123]
10 Our goal is to provide a comprehensive repository of text similarity measures which are implemented in a common framework using standardized interfaces. [sent-15, score-0.764]
11 2 Architecture DKPro Similarity is designed to operate in either of two modes: The stand-alone mode allows text similarity measures to be used as independent components in any experimental setup, but does not offer means for further language processing, e.g. [sent-18, score-0.927]
12 The UIMA-coupled mode tightly integrates similarity computation with full-fledged Apache UIMA-based language processing pipelines. [sent-21, score-0.437]
13 coreference or named-entity resolution, along with the text similarity computation. [sent-24, score-0.423]
14 Stand-alone Mode In this mode, text similarity measures can be used independently of any language processing pipeline just by passing them a pair of texts as (i) two strings, or (ii) two lists of strings (e. [sent-25, score-0.807]
15 We therefore provide an API module, which contains Java interfaces and abstract base classes for the measures. [sent-28, score-0.094]
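A minimal sketch of what such an API module might look like is shown below. The interface name and the single-string method mirror the usage described in this paper; the list-based overload and the WordOverlapMeasure class are illustrative assumptions and may differ from the actual DKPro Similarity code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the stand-alone API: a common interface that
// every measure implements. The list-based overload reflects the paper's
// mention of passing "two lists of strings"; exact signatures may differ.
interface TextSimilarityMeasure {
    double getSimilarity(String text1, String text2);
    double getSimilarity(Collection<String> tokens1, Collection<String> tokens2);
}

// A hypothetical measure implementing the interface: word-set overlap.
class WordOverlapMeasure implements TextSimilarityMeasure {
    public double getSimilarity(String text1, String text2) {
        return getSimilarity(Arrays.asList(text1.split("\\s+")),
                             Arrays.asList(text2.split("\\s+")));
    }

    public double getSimilarity(Collection<String> t1, Collection<String> t2) {
        Set<String> s1 = new HashSet<>(t1);
        Set<String> s2 = new HashSet<>(t2);
        Set<String> union = new HashSet<>(s1);
        union.addAll(s2);
        s1.retainAll(s2); // s1 now holds the intersection
        return union.isEmpty() ? 1.0 : (double) s1.size() / union.size();
    }
}
```

Because every measure implements the same interface, experimental code can hold a `TextSimilarityMeasure` reference and swap implementations freely.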
16 That way, DKPro Similarity allows for maximum flexibility in experimental design, as the text similarity measures can easily be integrated with any existing experimental setup.1 [sent-29, score-0.794]
17 1 code.google.com/p/dkpro-similarity-asl [sent-33, score-0.144]
18 Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 121–126, Sofia, Bulgaria, August 4-9 2013. TextSimilarityMeasure m = new GreedyStringTiling(); double similarity = m.getSimilarity(text1, text2); [sent-35, score-0.389]
19 The above code snippet instantiates the Greedy String Tiling measure (Wise, 1996) and then computes the text similarity between the given pair of texts. [sent-36, score-0.742]
20 The resulting similarity score is normalized into [0, 1], where 0 means not similar at all and 1 corresponds to perfectly similar.3 [sent-37, score-0.354]
21 By using the common TextSimilarityMeasure interface, it is easy to replace Greedy String Tiling with any measure of choice, such as Latent Semantic Analysis (Landauer et al. [sent-38, score-0.144]
22 We give an overview of measures available in DKPro Similarity in Section 3. [sent-40, score-0.23]
23 UIMA-coupled Mode In this mode, DKPro Similarity allows text similarity computation to be directly integrated with any UIMA-based language processing pipeline. [sent-41, score-0.503]
24 That way, it is easy to use text similarity components in addition to other UIMA-based components in the same pipeline. [sent-42, score-0.547]
25 For example, an experimental setup may require first computing text similarity scores and then running a classification algorithm on the resulting scores. [sent-43, score-0.517]
26 In Figure 1, we show a graphical overview of the integration of text similarity measures (right) with a UIMA-based pipeline (left). [sent-44, score-0.723]
27 As all text similarity measures in DKPro Similarity conform to standardized interfaces, they can be easily exchanged in the text similarity computation step. [sent-46, score-1.236]
28 With DKPro Similarity, we offer various subclasses of the generic UIMA components which are specifically tailored towards text similarity experiments, e. [sent-47, score-0.516]
29 corpus readers for standard evaluation datasets as well as evaluation components for running typical evaluation metrics. [sent-49, score-0.139]
30 By leveraging UIMA’s architecture, we also define an ... 3Some string distance measures such as the Levenshtein distance (Levenshtein, 1966) return a raw distance score where less distance corresponds to higher similarity. [sent-50, score-0.416]
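The footnote above describes converting a raw distance into a similarity score. One common scheme, shown in the sketch below, divides the Levenshtein distance by the longer string's length; this normalization is an illustrative choice, not necessarily the one DKPro Similarity uses.

```java
// Sketch: raw Levenshtein edit distance, plus one common way to map the
// distance into a [0, 1] similarity score (illustrative normalization).
class LevenshteinSketch {
    // Classic two-row dynamic-programming edit distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Map distance into [0, 1]: identical strings score 1.0.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) distance(a, b) / maxLen;
    }
}
```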
31 Figure 1: DKPro Similarity allows any text similarity measure (right) which conforms to the standardized interfaces to be integrated into a UIMA-based language processing pipeline (left) by means of a dedicated Similarity Scorer component (middle). [sent-55, score-0.731]
32 That way, it is possible to create text similarity measures which can use any piece of information that has been annotated in the processed documents, such as dependency trees or morphological information. [sent-57, score-0.653]
33 We detail the new set of components offered by DKPro Similarity in Section 4. [sent-58, score-0.062]
34 3 Text Similarity Measures In this section, we give an overview of the text similarity measures which are already available in DKPro Similarity. [sent-59, score-0.653]
35 While we provide new implementations for a multitude of measures, we rely on specialized libraries such as the S-Space Package (see Section 6) if available. [sent-60, score-0.127]
36 It also contains Greedy String Tiling (Wise, 1996), a measure which can compare strings whose parts have been reordered. [sent-65, score-0.106]
37 The framework also offers measures which compute sets of character and word n-grams and compare them using different overlap coefficients, e. [sent-66, score-0.262]
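Such an n-gram comparison can be sketched as follows. The Jaccard coefficient used here is a standard choice for this kind of overlap; whether it matches the exact set of coefficients in DKPro Similarity is not spelled out in this excerpt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: extract word n-grams from a text and compare two texts via the
// Jaccard coefficient (|intersection| / |union|). Containment
// (|intersection| / |set1|) would be an alternative overlap coefficient.
class NGramOverlapSketch {
    static Set<String> wordNGrams(String text, int n) {
        String[] tokens = text.toLowerCase().split("\\s+");
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return grams;
    }

    static double jaccard(String text1, String text2, int n) {
        Set<String> a = wordNGrams(text1, n);
        Set<String> b = wordNGrams(text2, n);
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        a.retainAll(b); // a now holds the intersection
        return (double) a.size() / union.size();
    }
}
```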
38 It further includes popular string distance metrics such as the Jaro-Winkler (Winkler, 1990), Monge and Elkan (1997) and Levenshtein (1966) distance measures. [sent-69, score-0.183]
39 2 Semantic Similarity Measures DKPro Similarity also contains several measures which go beyond simple character sequences and compute text similarity on a semantic level. [sent-71, score-0.75]
40 Pairwise Word Similarity These measures are based on pairwise word similarity computations which are then aggregated for the complete texts. [sent-72, score-0.612]
41 The measures typically operate on a graph-based representation of words and the semantic relations among them within a lexical-semantic resource. [sent-73, score-0.334]
42 DKPro Similarity therefore contains adapters for WordNet, Wiktionary5, and Wikipedia, while the framework can easily be extended to other data sources that conform to a common interface (Garoufi et al. [sent-74, score-0.192]
43 Pairwise similarity measures in DKPro Similarity include Jiang and Conrath (1997) or Resnik (1995). [sent-76, score-0.584]
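One way to sketch the aggregation idea (the exact scheme DKPro Similarity uses is not spelled out in this excerpt): match each word in one text with its most similar word in the other text, average the best-match scores, and symmetrize over both directions. The `wordSim` function below stands in for a resource-based word measure such as Resnik's.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Sketch: aggregate pairwise word similarities into a text similarity.
// The best-match-average aggregation is one common, illustrative choice.
class PairwiseAggregationSketch {
    static double aggregate(List<String> t1, List<String> t2,
                            ToDoubleBiFunction<String, String> wordSim) {
        return (directed(t1, t2, wordSim) + directed(t2, t1, wordSim)) / 2.0;
    }

    private static double directed(List<String> from, List<String> to,
                                   ToDoubleBiFunction<String, String> wordSim) {
        if (from.isEmpty()) return 0.0;
        double sum = 0.0;
        for (String w1 : from) {
            double best = 0.0;
            for (String w2 : to) {
                best = Math.max(best, wordSim.applyAsDouble(w1, w2));
            }
            sum += best;          // best match for this word
        }
        return sum / from.size(); // average over the text
    }
}
```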
44 Vector Space Models These text similarity measures project texts onto high-dimensional vectors which are then compared. [sent-79, score-0.71]
45 Cosine similarity, a basic measure often used in information retrieval, weights words according to their term frequencies or tf-idf scores, and computes the cosine between two text vectors. [sent-80, score-0.069]
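A minimal term-frequency variant of this measure can be sketched as follows; tf-idf weighting would simply replace the raw counts with weighted values.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: cosine similarity between two texts represented as raw
// term-frequency vectors, as described above.
class CosineSketch {
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    static double cosine(String text1, String text2) {
        Map<String, Integer> a = termFrequencies(text1);
        Map<String, Integer> b = termFrequencies(text2);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0)
                ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```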
46 DKPro Similarity goes beyond a single implementation of these measures and comes with highly customizable code which allows setting var... 5http://www. [sent-85, score-0.338]
47 , 2012b) has shown promising results for the inclusion of measures which go beyond textual content and compute similarity along other text characteristics. [sent-92, score-0.731]
48 Thus, DKPro Similarity also includes measures for structural, stylistic, and phonetic similarity. [sent-93, score-0.311]
49 Structural Similarity Structural similarity between texts can be computed, for example, by comparing sets of stopword n-grams (Stamatatos, 2011). [sent-94, score-0.451]
50 The idea here is that similar texts may preserve syntactic similarity while exchanging only content words. [sent-95, score-0.411]
51 Other measures in DKPro Similarity allow comparing texts by part-of-speech n-grams, and by order and distance features for pairs of words (Hatzivassiloglou et al. [sent-96, score-0.355]
52 The framework also includes a set of measures which capture statistical properties of texts such as the type-token ratio (TTR) and the sequential TTR (McCarthy and Jarvis, 2010). [sent-99, score-0.162]
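The type-token ratio mentioned here is simply the number of distinct word types divided by the total number of tokens; a minimal sketch is below. The sequential TTR of McCarthy and Jarvis (2010) is more involved and is not reproduced here.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: type-token ratio (TTR) as a simple stylistic text statistic.
class TypeTokenRatioSketch {
    static double ttr(String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        if (tokens.length == 0) return 0.0;
        Set<String> types = new HashSet<>();
        for (String t : tokens) types.add(t);
        return (double) types.size() / tokens.length;
    }
}
```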
53 Phonetic Similarity DKPro Similarity also allows computing text similarity based on pairwise phonetic comparisons of words. [sent-100, score-0.503]
54 It therefore contains implementations of well-known phonetic algorithms such as Double Metaphone (Philips, 2000) and Soundex (Knuth, 1973), which also conform to the common text similarity interface. [sent-101, score-0.612]
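To illustrate the phonetic idea, here is a simplified Soundex sketch. It treats 'h', 'w', and 'y' like vowels, whereas the full American Soundex handles 'h' and 'w' specially; it is not DKPro's implementation, and Double Metaphone is considerably more complex.

```java
// Sketch: a simplified American Soundex code for a single word.
class SoundexSketch {
    // Map a letter to its Soundex digit; 0 means vowel/separator.
    private static int code(char c) {
        switch (c) {
            case 'b': case 'f': case 'p': case 'v': return 1;
            case 'c': case 'g': case 'j': case 'k':
            case 'q': case 's': case 'x': case 'z': return 2;
            case 'd': case 't': return 3;
            case 'l': return 4;
            case 'm': case 'n': return 5;
            case 'r': return 6;
            default: return 0;
        }
    }

    static String soundex(String word) {
        String w = word.toLowerCase();
        if (w.isEmpty()) return "";
        StringBuilder out = new StringBuilder();
        out.append(Character.toUpperCase(w.charAt(0)));
        int prev = code(w.charAt(0));
        for (int i = 1; i < w.length() && out.length() < 4; i++) {
            int c = code(w.charAt(i));
            if (c != 0 && c != prev) out.append(c); // skip repeats of same code
            prev = c;
        }
        while (out.length() < 4) out.append('0');   // pad to 4 characters
        return out.toString();
    }
}
```

Two words are then phonetically similar when their codes match, e.g. "Robert" and "Rupert" both map to R163.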
55 4 UIMA Components In addition to a rich set of text similarity measures as partly described above, DKPro Similarity includes components which allow integrating text similarity measures with any UIMA-based pipeline, as outlined in Figure 1. [sent-102, score-1.433]
56 In the following, we introduce these components along with their resources. [sent-103, score-0.062]
57 Readers & Datasets DKPro Similarity includes corpus readers specifically tailored towards combining the input texts in a number of ways, e. [sent-104, score-0.167]
58 all possible combinations, or each text paired with n others at random. [sent-106, score-0.069]
59 Standard datasets for which readers come pre-packaged include, among others, the SemEval-2012 STS data (Agirre et al. [sent-107, score-0.077]
60 As far as license terms allow redistribution, the datasets themselves are integrated into the framework. [sent-111, score-0.063]
61 Similarity Scorer The Similarity Scorer allows any text similarity measure (which is decoupled from UIMA by default) to be integrated into a UIMA-based pipeline. [sent-112, score-0.474]
62 It builds upon the standardized text similarity interfaces and thus makes it easy to exchange the text similarity measure as well as to specify the data types the measure should operate on, e.g. [sent-113, score-1.104]
63 , 2012) has shown that different text similarity measures can be combined using machine learning classifiers. [sent-117, score-0.653]
64 Such a combination shows improvements over single measures due to the fact that different measures capture different text characteristics. [sent-118, score-0.529]
65 DKPro Similarity thus provides adapters for the Weka framework (Hall et al. [sent-119, score-0.076]
66 , 2009) and allows first pre-computing sets of text similarity scores which can then be used as features for various machine learning classifiers. [sent-120, score-0.474]
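The pre-computation step can be sketched generically: run several measures over a text pair and collect the scores as a feature vector for a classifier. The Weka-specific adapter code is not shown in this excerpt; the plain-function representation of measures below is an illustrative assumption.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Sketch: turn a list of text similarity measures into a feature vector
// for a text pair, suitable as input to a machine learning classifier.
class SimilarityFeatureSketch {
    static double[] features(String text1, String text2,
                             List<ToDoubleBiFunction<String, String>> measures) {
        double[] vector = new double[measures.size()];
        for (int i = 0; i < measures.size(); i++) {
            vector[i] = measures.get(i).applyAsDouble(text1, text2);
        }
        return vector;
    }
}
```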
67 DKPro Similarity ships with a set of components which for example compute Pearson or Spearman correlation with human judgments, or apply task-specific metrics such as average precision as used in the RTE challenges. [sent-122, score-0.094]
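Pearson correlation against human judgments, for instance, can be sketched as follows (a standard product-moment formula, not DKPro's evaluation component itself):

```java
// Sketch: Pearson product-moment correlation between system similarity
// scores and human judgments, as used in intrinsic evaluation.
class PearsonSketch {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n; meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            cov  += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
            varY += (y[i] - meanY) * (y[i] - meanY);
        }
        return cov / Math.sqrt(varX * varY);
    }
}
```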
68 That way, we promote the reproducibility of experimental results, and provide reliable, permanent experimental conditions which can benefit future studies and help to stimulate the reuse of particular experimental steps and software modules. [sent-124, score-0.281]
69 The experimental setups are instantiations of the generic UIMA-based language processing pipeline depicted in Figure 1 and are designed to precisely match the particular task at hand. [sent-125, score-0.248]
70 They thus come pre-configured with corpus readers for the relevant input data, with a set of pre- and postprocessing as well as evaluation components, and with a set of text similarity measures which are well-suited for the particular task. [sent-126, score-0.703]
71 The experimental setups are self-contained systems and can be run out-of-the-box without further configuration. [sent-127, score-0.178]
72 Intrinsic Evaluation DKPro Similarity contains the setup (Bär et al. [sent-131, score-0.077]
73 7 The system combines a multitude of text similarity measures of varying complexity using a simple log-linear regression model. [sent-134, score-0.343]
74 The provided setup allows evaluating how well the system output resembles human similarity judgments on short texts which are taken from five different sources, e.g. [sent-135, score-0.157]
75 Extrinsic Evaluation Our framework includes two setups for an extrinsic evaluation: detecting text reuse, and recognizing textual entailment. [sent-138, score-0.371]
76 , 2012b) combines a multitude of text similarity measures along different text characteristics. [sent-141, score-0.766]
77 Thereby, it not only combines simple string-based and semantic similarity measures (see Sections 3. [sent-142, score-0.626]
78 2), but makes extensive use of measures along structural and stylistic text characteristics (see Section 3. [sent-144, score-0.387]
79 For recognizing textual entailment, we provide a setup which is similar in configuration to the one described above, but contains corpus readers and evaluation components precisely tailored towards the RTE challenge series (Dagan et al. [sent-147, score-0.271]
80 We believe that our setup can be used for filtering those text pairs which need further analysis by a dedicated textual entailment system. [sent-149, score-0.211]
81 124 6 Related Frameworks To the best of our knowledge, only a few generalized similarity frameworks exist at all. [sent-154, score-0.387]
82 That way, DKPro Similarity brings together the scattered efforts by offering access to all measures through common interfaces. [sent-156, score-0.23]
83 S-Space Package Even though it is not a designated text similarity library, the S-Space Package (Jurgens and Stevens, 2010)8 contains some text similarity measures such as Latent Semantic Analysis (LSA) and Explicit Semantic Analysis (see Section 3. [sent-158, score-1.136]
84 Semantic Vectors The Semantic Vectors package is a package for distributional semantics (Widdows and Cohen, 2010)9 that contains measures such as LSA and allows for comparing documents within a given vector space. [sent-165, score-0.471]
85 WordNet::Similarity The open source package by Pedersen et al. [sent-167, score-0.081]
86 (2004)10 is a popular Perl library for the similarity computation on WordNet. [sent-168, score-0.422]
87 It comprises six word similarity measures that operate on WordNet, e. [sent-169, score-0.682]
88 Unfortunately, no strategies have been added to the package yet which aggregate the word similarity scores for complete texts in a similar manner as described in Section 3. [sent-172, score-0.492]
89 net/projects/wn-similarity In DKPro Similarity, we offer native Java implementations of all measures contained in WordNet::Similarity, and allow going beyond WordNet to use the measures with any lexical-semantic resource of choice, e.g. [sent-179, score-0.781]
90 (2005)11 exclusively comprises text similarity measures which compute lexical similarity on string sequences and compare texts without any semantic processing. [sent-183, score-1.2]
91 It contains measures such as the Levenshtein (1966) or Monge and Elkan (1997) distance metrics. [sent-184, score-0.29]
92 In DKPro Similarity, some string-based measures (see Section 3. [sent-185, score-0.23]
93 It also contains several well-known text similarity measures on string sequences, and includes many of the measures which are also part of the SimMetrics Library. [sent-189, score-0.998]
94 Some string-based measures in DKPro Similarity are based on the SecondString Toolkit. [sent-190, score-0.23]
95 7 Conclusions We presented DKPro Similarity, an open source framework designed to streamline the development of text similarity measures. [sent-191, score-0.455]
96 All measures conform to standardized interfaces and can either be used as stand-alone components in any experimental setup (e. [sent-192, score-0.583]
97 an already existing system which is not based on Apache UIMA), or can be tightly coupled with a full-featured UIMA-based language processing pipeline in order to allow for advanced processing capabilities. [sent-194, score-0.106]
98 We would like to encourage other researchers to participate in our efforts and invite them to explore our existing experimental setups as outlined in Section 5, run modified versions of our setups, and contribute their own text similarity measures to the framework. [sent-195, score-0.831]
99 Detecting text similarity over short passages: Exploring linguistic feature combinations via machine learning. [sent-270, score-0.423]
100 Semantic similarity based on corpus statistics and lexical taxonomy. [sent-276, score-0.354]
wordName wordTfidf (topN-words)
[('dkpro', 0.691), ('similarity', 0.354), ('measures', 0.23), ('uima', 0.154), ('imi', 0.144), ('setups', 0.133), ('package', 0.081), ('standardized', 0.079), ('pipeline', 0.07), ('text', 0.069), ('jcas', 0.066), ('laritymeasure', 0.066), ('simmetrics', 0.066), ('interfaces', 0.066), ('apache', 0.064), ('components', 0.062), ('operate', 0.062), ('ext', 0.059), ('string', 0.058), ('texts', 0.057), ('implementations', 0.057), ('stylistic', 0.057), ('arity', 0.057), ('agirre', 0.057), ('mode', 0.054), ('plagiarism', 0.054), ('sts', 0.054), ('clough', 0.054), ('conform', 0.052), ('levenshtein', 0.052), ('torsten', 0.052), ('phonetic', 0.052), ('java', 0.052), ('textual', 0.051), ('allows', 0.051), ('tiling', 0.051), ('readers', 0.05), ('setup', 0.049), ('experimental', 0.045), ('multitude', 0.044), ('adapters', 0.044), ('garoufi', 0.044), ('jca', 0.044), ('metaphone', 0.044), ('monge', 0.044), ('ofmeasures', 0.044), ('secondstring', 0.044), ('widdows', 0.044), ('double', 0.044), ('iryna', 0.044), ('reuse', 0.044), ('gabrilovich', 0.043), ('scorer', 0.043), ('zesch', 0.043), ('semantic', 0.042), ('ferrucci', 0.042), ('dedicated', 0.042), ('stopword', 0.04), ('meter', 0.039), ('permanent', 0.039), ('library', 0.039), ('rte', 0.038), ('wordnet', 0.036), ('interface', 0.036), ('comprises', 0.036), ('elkan', 0.036), ('allow', 0.036), ('reproducibility', 0.034), ('frameworks', 0.033), ('int', 0.032), ('conrath', 0.032), ('ttr', 0.032), ('authorship', 0.032), ('designated', 0.032), ('distance', 0.032), ('metrics', 0.032), ('landauer', 0.032), ('framework', 0.032), ('lsa', 0.031), ('structural', 0.031), ('dagan', 0.031), ('tailored', 0.031), ('markovitch', 0.031), ('jurgens', 0.031), ('ukp', 0.031), ('goes', 0.03), ('extrinsic', 0.03), ('includes', 0.029), ('software', 0.029), ('computation', 0.029), ('lally', 0.029), ('contains', 0.028), ('pairwise', 0.028), ('detecting', 0.027), ('datasets', 0.027), ('dinu', 0.027), ('beyond', 0.027), ('strings', 0.027), ('libraries', 0.026), ('wise', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity
Author: Daniel Bär ; Torsten Zesch ; Iryna Gurevych
Abstract: We present DKPro Similarity, an open source framework for text similarity. Our goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures. In order to promote the reproducibility of experimental results and to provide reliable, permanent experimental conditions for future studies, DKPro Similarity additionally comes with a set of full-featured experimental setups which can be run out-of-the-box and be used for future systems to build upon.
2 0.35290334 105 acl-2013-DKPro WSD: A Generalized UIMA-based Framework for Word Sense Disambiguation
Author: Tristan Miller ; Nicolai Erbs ; Hans-Peter Zorn ; Torsten Zesch ; Iryna Gurevych
Abstract: Implementations of word sense disambiguation (WSD) algorithms tend to be tied to a particular test corpus format and sense inventory. This makes it difficult to test their performance on new data sets, or to compare them against past algorithms implemented for different data sets. In this paper we present DKPro WSD, a freely licensed, general-purpose framework for WSD which is both modular and extensible. DKPro WSD abstracts the WSD process in such a way that test corpora, sense inventories, and algorithms can be freely swapped. Its UIMA-based architecture makes it easy to add support for new resources and algorithms. Related tasks such as word sense induction and entity linking are also supported.
3 0.19523457 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
Author: Mohammad Taher Pilehvar ; David Jurgens ; Roberto Navigli
Abstract: Semantic similarity is an essential component of many Natural Language Processing applications. However, prior methods for computing semantic similarity often operate at different levels, e.g., single words or entire documents, which requires adapting the method for each data type. We present a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents. Our method leverages a common probabilistic representation over word senses in order to compare different types of linguistic data. This unified representation shows state-ofthe-art performance on three tasks: seman- tic textual similarity, word similarity, and word sense coarsening.
4 0.19096114 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
Author: Vasile Rus ; Mihai Lintean ; Rajendra Banjade ; Nobal Niraula ; Dan Stefanescu
Abstract: We present in this paper SEMILAR, the SEMantic simILARity toolkit. SEMILAR implements a number of algorithms for assessing the semantic similarity between two texts. It is available as a Java library and as a Java standalone application offering GUI-based access to the implemented semantic similarity methods. Furthermore, it offers facilities for manual semantic similarity annotation by experts through its component SEMILAT (a SEMantic simILarity Annotation Tool).
5 0.13977794 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
Author: Sean Szumlanski ; Fernando Gomez ; Valerie K. Sims
Abstract: We have elicited human quantitative judgments of semantic relatedness for 122 pairs of nouns and compiled them into a new set of relatedness norms that we call Rel-122. Judgments from individual subjects in our study exhibit high average correlation to the resulting relatedness means (r = 0.77, σ = 0.09, N = 73), although not as high as Resnik’s (1995) upper bound for expected average human correlation to similarity means (r = 0.90). This suggests that human perceptions of relatedness are less strictly constrained than perceptions of similarity and establishes a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We compare the results of several WordNet-based similarity and relatedness measures to our Rel-122 norms and demonstrate the limitations of WordNet for discovering general indications of semantic relatedness. We also offer a critique of the field’s reliance upon similarity norms to evaluate relatedness measures.
6 0.11672298 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
7 0.11533061 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations
8 0.11182485 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
9 0.098507993 118 acl-2013-Development and Analysis of NLP Pipelines in Argo
10 0.095639326 150 acl-2013-Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications
11 0.094636723 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
12 0.087942749 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
13 0.087408513 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
14 0.087190703 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
15 0.084807187 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
16 0.08146508 139 acl-2013-Entity Linking for Tweets
17 0.081414409 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit
18 0.077078663 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
19 0.073422909 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
20 0.064753652 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
topicId topicWeight
[(0, 0.158), (1, 0.066), (2, 0.042), (3, -0.178), (4, -0.013), (5, -0.127), (6, -0.101), (7, 0.029), (8, 0.01), (9, -0.013), (10, -0.044), (11, 0.124), (12, -0.041), (13, -0.062), (14, 0.133), (15, 0.052), (16, 0.023), (17, 0.061), (18, -0.059), (19, 0.007), (20, -0.016), (21, 0.015), (22, -0.064), (23, 0.008), (24, -0.228), (25, 0.034), (26, 0.121), (27, 0.052), (28, -0.098), (29, -0.188), (30, -0.071), (31, 0.026), (32, -0.089), (33, -0.091), (34, 0.022), (35, 0.129), (36, -0.058), (37, 0.063), (38, -0.018), (39, 0.032), (40, -0.05), (41, 0.041), (42, -0.055), (43, 0.012), (44, -0.057), (45, -0.016), (46, -0.044), (47, -0.072), (48, 0.033), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.94804096 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity
Author: Daniel Bär ; Torsten Zesch ; Iryna Gurevych
Abstract: We present DKPro Similarity, an open source framework for text similarity. Our goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures. In order to promote the reproducibility of experimental results and to provide reliable, permanent experimental conditions for future studies, DKPro Similarity additionally comes with a set of full-featured experimental setups which can be run out-of-the-box and be used for future systems to build upon.
2 0.79920149 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
Author: Vasile Rus ; Mihai Lintean ; Rajendra Banjade ; Nobal Niraula ; Dan Stefanescu
Abstract: We present in this paper SEMILAR, the SEMantic simILARity toolkit. SEMILAR implements a number of algorithms for assessing the semantic similarity between two texts. It is available as a Java library and as a Java standalone application offering GUI-based access to the implemented semantic similarity methods. Furthermore, it offers facilities for manual semantic similarity annotation by experts through its component SEMILAT (a SEMantic simILarity Annotation Tool).
3 0.76663107 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
Author: Sean Szumlanski ; Fernando Gomez ; Valerie K. Sims
Abstract: We have elicited human quantitative judgments of semantic relatedness for 122 pairs of nouns and compiled them into a new set of relatedness norms that we call Rel-122. Judgments from individual subjects in our study exhibit high average correlation to the resulting relatedness means (r = 0.77, σ = 0.09, N = 73), although not as high as Resnik’s (1995) upper bound for expected average human correlation to similarity means (r = 0.90). This suggests that human perceptions of relatedness are less strictly constrained than perceptions of similarity and establishes a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We compare the results of several WordNet-based similarity and relatedness measures to our Rel-122 norms and demonstrate the limitations of WordNet for discovering general indications of semantic relatedness. We also offer a critique of the field’s reliance upon similarity norms to evaluate relatedness measures.
4 0.67431933 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
Author: Mohammad Taher Pilehvar ; David Jurgens ; Roberto Navigli
Abstract: Semantic similarity is an essential component of many Natural Language Processing applications. However, prior methods for computing semantic similarity often operate at different levels, e.g., single words or entire documents, which requires adapting the method for each data type. We present a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents. Our method leverages a common probabilistic representation over word senses in order to compare different types of linguistic data. This unified representation shows state-ofthe-art performance on three tasks: seman- tic textual similarity, word similarity, and word sense coarsening.
5 0.65986514 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
Author: Tony Veale ; Guofu Li
Abstract: Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet. 1 Seeing is Believing (and Creating) Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity Guofu Li School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D2, Ireland. l .guo fu . l gmai l i @ .com (e.g. see Pederson et al. 2004, Budanitsky & Hirst, 2006; Seco et al. 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities. 
This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison may spur us to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we perceive the world by re-shaping the way we make similarity judgments.
So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. 
Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a WordNet-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-and-vertical approach is evaluated in section 5, on the Miller & Charles (1991) data-set. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6. 2 Related Work and Ideas WordNet’s taxonomic organization of noun-senses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instance-level ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pedersen et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them.
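The Wu & Palmer formulation above can be sketched on a toy IS-A taxonomy (the hierarchy and concept names below are invented for illustration; they are not real WordNet data):

```python
# Sketch of Wu & Palmer (1994) similarity: twice the depth of the LCS
# divided by the sum of the two concepts' depths. Toy taxonomy only.
parents = {
    "entity": None,
    "animal": "entity",
    "canine": "animal",
    "feline": "animal",
    "dog": "canine",
    "wolf": "canine",
    "cat": "feline",
}

def ancestors(c):
    """Path from concept c up to the root, inclusive."""
    path = []
    while c is not None:
        path.append(c)
        c = parents[c]
    return path

def depth(c):
    """The root has depth 1."""
    return len(ancestors(c))

def lcs(a, b):
    """Least common subsumer: the deepest shared ancestor."""
    shared = set(ancestors(a)) & set(ancestors(b))
    return max(shared, key=depth)

def wu_palmer(a, b):
    """sim(a, b) = 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

print(wu_palmer("dog", "wolf"))  # LCS "canine": 2*3 / (4+4) = 0.75
print(wu_palmer("dog", "cat"))   # LCS "animal": 2*2 / (4+4) = 0.5
```

Siblings under a deep LCS (dog/wolf) score higher than cousins whose LCS sits nearer the root (dog/cat), which is exactly the vertical, convergent behaviour the authors describe.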
To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach in his Poetics). Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied in Leacock and Chodorow’s use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization. Depth is a convenient proxy for information content, but more nuanced proxies can yield more rounded similarity measures. Resnik (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of information shared by those concepts, and thus a good estimate of their similarity.
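Both proxies just described fit in a few lines; the path length, depth bound, and frequency counts below are invented for illustration:

```python
import math

def leacock_chodorow(path_len_nodes, max_depth):
    """Leacock & Chodorow (1998): negate the log of the normalized path,
    sim = -log(path / (2 * D)), where D bounds any path in the hierarchy."""
    return -math.log(path_len_nodes / (2.0 * max_depth))

# A 3-node path (e.g. dog -> canine -> wolf) in a hierarchy of depth 4:
print(round(leacock_chodorow(3, 4), 3))  # -log(3/8) ≈ 0.981

def resnik_ic(freq, total):
    """Resnik (1995): IC(c) = -log p(c), where p(c) counts occurrences of c
    and of all its sub-categories and instances."""
    return -math.log(freq / total)

# A frequent, general category is less informative than a rare one:
print(resnik_ic(90_000, 100_000) < resnik_ic(500, 100_000))  # True
```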
Lin (1998) combines the intuitions behind Resnik’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) present a minor innovation, showing how Resnik’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnik’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate information content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. Not only is this intrinsic view of information content convenient to use, without recourse to an external corpus; Seco et al. also show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. A similarity measure can draw on other sources of information besides WordNet’s category structures.
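The IC-based measures above can be sketched together. Note that Seco et al.'s intrinsic formula is usually written IC(c) = 1 - log(hypo(c) + 1) / log(N), a normalized variant of the descendant-count ratio described in the text; the counts below are invented:

```python
import math

N = 100  # total number of concepts in the (toy) hierarchy

def intrinsic_ic(num_descendants):
    """Seco et al. (2006): IC(c) = 1 - log(hypo(c) + 1) / log(N)."""
    return 1.0 - math.log(num_descendants + 1) / math.log(N)

def lin(ic_a, ic_b, ic_lcs):
    """Lin (1998): the information ratio 2 * IC(LCS) / (IC(a) + IC(b))."""
    return 2.0 * ic_lcs / (ic_a + ic_b)

def jiang_conrath_distance(ic_a, ic_b, ic_lcs):
    """Jiang & Conrath (1997): unshared information as dissimilarity."""
    return ic_a + ic_b - 2.0 * ic_lcs

ic_leaf = intrinsic_ic(0)  # leaves carry maximal intrinsic IC of 1.0
ic_lcs = intrinsic_ic(9)   # 1 - log(10)/log(100) ≈ 0.5
print(lin(ic_leaf, ic_leaf, ic_lcs))                     # ≈ 0.5
print(jiang_conrath_distance(ic_leaf, ic_leaf, ic_lcs))  # ≈ 1.0
```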
One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a high-dimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition.
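Veale (2004)'s concept-pair strategy can be sketched with a pluggable word-similarity function; the table entries and the mean as combination function are illustrative assumptions, not the paper's exact choices:

```python
# Score A:B::C:D as a function of the combined similarity of A to C and
# of B to D. toy_sim stands in for any WordNet-based measure.
toy_sim = {
    ("hack", "mercenary"): 0.3,
    ("writer", "soldier"): 0.35,
}

def sim(a, b):
    """Symmetric lookup into the toy similarity table."""
    return toy_sim.get((a, b), toy_sim.get((b, a), 0.0))

def analogy_score(a, b, c, d):
    """Combine sim(A, C) and sim(B, D); here, simply their mean."""
    return (sim(a, c) + sim(b, d)) / 2.0

print(analogy_score("hack", "writer", "mercenary", "soldier"))  # ≈ 0.325
```

Because only the explicit concepts are compared, nothing in this score captures the shared paid-allegiance relationship itself, which is the gap Turney's relational, web-based representation addresses.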
In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims to match implicit relations rather than explicit concepts, and in part because it uses a divergent process to gather from the web as rich a perspective as it can on these latent relationships. 2.1 Clever Comparisons Create Similarity Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects […] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (e.g., as a familiar substance). Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, … crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise.
Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon & butcher, flesh & meat, and scalpel & cleaver. Comparisons can thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises when it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to e.g. assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts, of the same depth, can be combined with their LCS to yield a new fine-grained parent category, so e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1st” + letter = 1st-letter (for Alpha, Aleph, etc.)
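Veale (2003)'s re-categorization step can be sketched as a gloss intersection; the glosses below are shortened stand-ins for WordNet's, and the stop-word filter is an assumption:

```python
# Combine a property shared by two same-depth glosses with the concepts'
# LCS to mint a new fine-grained parent category.
glosses = {
    "Zeus": "supreme deity of the ancient Greeks",
    "Odin": "supreme deity of the Norse pantheon",
}

def new_category(concept_a, concept_b, lcs_name):
    """Return a 'Property-LCS' category name, or None if no property is shared."""
    shared = set(glosses[concept_a].split()) & set(glosses[concept_b].split())
    shared -= {"of", "the", "a"}  # crude stop-word filter
    for prop in sorted(shared):
        if prop != lcs_name:
            return f"{prop.capitalize()}-{lcs_name}"
    return None

print(new_category("Zeus", "Odin", "deity"))  # Supreme-deity
```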
Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form. 3 Divergent (Re)Categorization To tap into a richer source of concept properties than WordNet’s glosses, we can use web n-grams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts.
a lonesome cowboy 432
a mounted cowboy 122
a grizzled cowboy 74
a swaggering cowboy 68
To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient of all cowboys that one can use them to anchor a simile, such as
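The harvesting step being introduced here can be sketched as a frequency filter over pre-modifier n-grams; the counts are the ones quoted above, while the threshold is an illustrative assumption:

```python
# Keep modifiers frequent enough to be presupposed salient of the concept.
ngram_counts = {
    ("lonesome", "cowboy"): 432,
    ("mounted", "cowboy"): 122,
    ("grizzled", "cowboy"): 74,
    ("swaggering", "cowboy"): 68,
}

def salient_properties(concept, counts, min_freq=50):
    """Modifiers of `concept` with at least min_freq hits, most frequent first."""
    props = [(mod, n) for (mod, noun), n in counts.items()
             if noun == concept and n >= min_freq]
    return sorted(props, key=lambda p: -p[1])

print(salient_properties("cowboy", ngram_counts))
```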
6 0.63674682 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
7 0.613662 105 acl-2013-DKPro WSD: A Generalized UIMA-based Framework for Word Sense Disambiguation
8 0.57682705 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
9 0.55347186 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations
10 0.54745388 150 acl-2013-Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications
11 0.53816909 118 acl-2013-Development and Analysis of NLP Pipelines in Argo
12 0.52929485 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
13 0.49270883 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections
14 0.47271264 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
15 0.47223413 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD
16 0.46897596 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
17 0.45589122 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions
18 0.45566577 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
19 0.45389396 89 acl-2013-Computerized Analysis of a Verbal Fluency Test
20 0.44264361 51 acl-2013-AnnoMarket: An Open Cloud Platform for NLP
topicId topicWeight
[(0, 0.504), (6, 0.016), (11, 0.044), (15, 0.015), (24, 0.041), (26, 0.041), (28, 0.011), (35, 0.052), (42, 0.04), (48, 0.021), (64, 0.012), (70, 0.035), (88, 0.021), (90, 0.017), (95, 0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.98450226 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity
Author: Daniel Bar ; Torsten Zesch ; Iryna Gurevych
Abstract: We present DKPro Similarity, an open source framework for text similarity. Our goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures. In order to promote the reproducibility of experimental results and to provide reliable, permanent experimental conditions for future studies, DKPro Similarity additionally comes with a set of full-featured experimental setups which can be run out-of-the-box and be used for future systems to build upon.
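As a flavour of the n-gram-based family of measures the abstract mentions, here is a generic word-bigram Jaccard overlap; this is an illustrative sketch, not DKPro Similarity's actual API:

```python
def word_ngrams(text, n):
    """Set of word n-grams (as tuples) of the lower-cased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_jaccard(a, b, n=2):
    """Jaccard overlap of the two texts' word n-gram sets."""
    sa, sb = word_ngrams(a, n), word_ngrams(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

print(ngram_jaccard("the cat sat on the mat", "the cat sat on a mat"))  # 3/7
```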
2 0.97156143 269 acl-2013-PLIS: a Probabilistic Lexical Inference System
Author: Eyal Shnarch ; Erel Segal-haLevi ; Jacob Goldberger ; Ido Dagan
Abstract: This paper presents PLIS, an open source Probabilistic Lexical Inference System which combines two functionalities: (i) a tool for integrating lexical inference knowledge from diverse resources, and (ii) a framework for scoring textual inferences based on the integrated knowledge. We provide PLIS with two probabilistic implementations of this framework. PLIS is available for download and developers of text processing applications can use it as an off-the-shelf component for injecting lexical knowledge into their applications. PLIS is easily configurable; components can be extended or replaced with user generated ones to enable system customization and further research. PLIS includes an online interactive viewer, which is a powerful tool for investigating lexical inference processes. 1 Introduction and background Semantic Inference is the process by which machines perform reasoning over natural language texts. A semantic inference system is expected to be able to infer the meaning of one text from the meaning of another, identify parts of texts which convey a target meaning, and manipulate text units in order to deduce new meanings. Semantic inference is needed for many Natural Language Processing (NLP) applications. For instance, a Question Answering (QA) system may encounter the following question and candidate answer (Example 1): Q: which explorer discovered the New World? A: Christopher Columbus revealed America. As there are no overlapping words between the two sentences, to identify that A holds an answer for Q, background world knowledge is needed to link Christopher Columbus with explorer and America with New World. Linguistic knowledge is also needed to identify that reveal and discover refer to the same concept. Knowledge is needed in order to bridge the gap between text fragments, which may be dissimilar on their surface form but share a common meaning.
For the purpose of semantic inference, such knowledge can be derived from various resources (e.g. WordNet (Fellbaum, 1998) and others, detailed in Section 2.1) in a form which we denote as inference links (often called inference/entailment rules), each of which is an ordered pair of elements in which the first implies the meaning of the second. For instance, the link ship→vessel can be derived from the hypernym relation of WordNet. Other applications can benefit from utilizing inference links to identify similarity between language expressions. In Information Retrieval, the user’s information need may be expressed in relevant documents differently than it is expressed in the query. Summarization systems should identify text snippets which convey the same meaning. Our work addresses a generic, application-independent, setting of lexical inference. We therefore adopt the terminology of Textual Entailment (Dagan et al., 2006), a generic paradigm for applied semantic inference which captures inference needs of many NLP applications in a common underlying task: given two textual fragments, termed hypothesis (H) and text (T), the task is to recognize whether T implies the meaning of H, denoted T→H. For instance, in a QA application, H represents the question, and T a candidate answer. In this setting, T is likely to hold an answer for the question if it entails the question. It is challenging to properly extract the needed inference knowledge from available resources, and to effectively utilize it within the inference process. The integration of resources, each with its own format, is technically complex and the quality
of the resulting inference links is often unknown in advance and varies considerably.
Figure 1: PLIS schema - a text-hypothesis pair is processed by the Lexical Integrator which uses a set of lexical resources to extract inference chains which connect the two. The Lexical Inference component provides probability estimations for the validity of each level of the process.
For coping with this challenge we developed PLIS, a Probabilistic Lexical Inference System (the complete software package is available at http://www.cs.biu.ac.il/nlp/downloads/PLIS.html and an online interactive viewer is available for examination at http://irsrv2.cs.biu.ac.il/nlp-net/PLIS.html). PLIS, illustrated in Fig 1, has two main modules: the Lexical Integrator (Section 2) accepts a set of lexical resources and a text-hypothesis pair, and finds all the lexical inference relations between any pair of text term ti and hypothesis term hj, based on the available lexical relations found in the resources (and their combination). The Lexical Inference module (Section 3) provides validity scores for these relations. These term-level scores are used to estimate the sentence-level likelihood that the meaning of the hypothesis can be inferred from the text, thus making PLIS a complete lexical inference system. Lexical inference systems do not look into the structure of texts but rather consider them as bags of terms (words or multi-word expressions). These systems are easy to implement, fast to run, practical across different genres and languages, while maintaining a competitive level of performance. PLIS can be used as a stand-alone efficient inference system or as the lexical component of any NLP application. PLIS is a flexible system, allowing users to choose the set of knowledge resources as well as the model by which inference is done.
PLIS can be easily extended with new knowledge resources and new inference models. It comes with a set of ready-to-use plug-ins for many common lexical resources (Section 2.1) as well as two implementations of the scoring framework. These implementations, described in (Shnarch et al., 2011; Shnarch et al., 2012), provide probability estimations for inference. PLIS has an interactive online viewer (Section 4) which provides a visualization of the entire inference process, and is very helpful for analysing lexical inference models and lexical resources usability. 2 Lexical integrator The input for the lexical integrator is a set of lexical resources and a pair of text T and hypothesis H. The lexical integrator extracts lexical inference links from the various lexical resources to connect each text term ti ∈ T with each hypothesis term hj ∈ H. A lexical inference link indicates a semantic relation between two terms. It could be a directional relation (Columbus→navigator) or a bidirectional one (car ↔ automobile). Since knowledge resources vary in their representation methods, the lexical integrator wraps each lexical resource in a common plug-in interface which encapsulates the resource’s inner representation method and exposes its knowledge as a list of inference links. The implemented plug-ins that come with PLIS are described in Section 2.1. Adding a new lexical resource and integrating it with the others only demands the implementation of the plug-in interface. As the knowledge needed to connect a pair of terms, ti and hj, may be scattered across a few resources, the lexical integrator combines inference links into lexical inference chains to deduce new pieces of knowledge, such as Columbus →(resource 2) navigator →(resource 1) explorer.
Therefore, the only assumption the lexical integrator makes, regarding its input lexical resources, is that the inferential lexical relations they provide are transitive. The lexical integrator generates lexical inference chains by expanding the text and hypothesis terms with inference links. These links lead to new terms (e.g. navigator in the above chain example and t0 in Fig 1) which can be further expanded, as all inference links are transitive. (Here, i and j run from 1 to the length of the text and hypothesis, respectively.) A transitivity limit is set by the user to determine the maximal length for inference chains. The lexical integrator uses a graph-based representation for the inference chains, as illustrated in Fig 1. A node holds the lemma, part-of-speech and sense of a single term. The sense is the ordinal number of the WordNet sense. Whenever we do not know the sense of a term we implement the most frequent sense heuristic. An edge represents an inference link and is labeled with the semantic relation of this link (e.g. cytokine→protein is labeled with the WordNet relation hypernym). 2.1 Available plug-ins for lexical resources We have implemented plug-ins for the following resources: the English lexicon WordNet (Fellbaum, 1998) (based on either the JWI, JWNL or extJWNL Java APIs), CatVar (Habash and Dorr, 2003), a categorial variations database, a Wikipedia-based resource (Shnarch et al., 2009), which applies several extraction methods to derive inference links from the text and structure of Wikipedia, VerbOcean (Chklovski and Pantel, 2004), a knowledge base of fine-grained semantic relations between verbs, Lin’s distributional similarity thesaurus (Lin, 1998), and DIRECT (Kotlerman et al., 2010), a directional distributional similarity thesaurus geared for lexical inference.
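The chaining behaviour described above can be sketched as a bounded breadth-first search over per-resource links; the resource names and link table are invented stand-ins for the plug-ins listed:

```python
from collections import deque

# Each link carries the label of the resource (or relation) that supplied it.
links = {
    "Columbus": [("navigator", "resource2")],
    "navigator": [("explorer", "resource1")],
    "ship": [("vessel", "WordNet:hypernym")],
}

def inference_chains(term, target, limit):
    """All label chains from term to target of length <= limit (the transitivity limit)."""
    chains, queue = [], deque([(term, [])])
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            chains.append(path)
            continue
        if len(path) < limit:
            for nxt, label in links.get(node, []):
                queue.append((nxt, path + [label]))
    return chains

print(inference_chains("Columbus", "explorer", limit=2))
# [['resource2', 'resource1']]
```

Raising the transitivity limit admits longer chains at the cost of more, and on average less reliable, evidence, which is exactly the trade-off the inference models below must weigh.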
To summarize, the lexical integrator finds all possible inference chains (of a predefined length), resulting from any combination of inference links extracted from lexical resources, which link any t, h pair of a given text-hypothesis. Developers can use this tool to save the hassle of interfacing with the different lexical knowledge resources, and spare the labor of combining their knowledge via inference chains. The lexical inference model, described next, provides a means to decide whether a given hypothesis is inferred from a given text, based on weighing the lexical inference chains extracted by the lexical integrator. 3 Lexical inference There are many ways to implement an inference model which identifies inference relations between texts. A simple model may consider the number of hypothesis terms for which inference chains, originated from text terms, were found. (The most-frequent-sense disambiguation policy mentioned earlier was better than considering all senses of an ambiguous term in preliminary experiments; however, it is a matter of changing a variable in the configuration of PLIS to switch between these two policies. The WordNet Java APIs are listed at http://wordnet.princeton.edu/wordnet/related-projects/.) In PLIS, the inference model is a plug-in, similar to the lexical knowledge resources, and can be easily replaced to change the inference logic. We provide PLIS with two implemented baseline lexical inference models which are mathematically based. These are two Probabilistic Lexical Models (PLMs), HN-PLM and M-PLM, which are described in (Shnarch et al., 2011) and (Shnarch et al., 2012), respectively. A PLM provides probability estimations for the three parts of the inference process (as shown in Fig 1): the validity probability of each inference chain (i.e.
the probability for a valid inference relation between its endpoint terms) P(ti → hj), the probability of each hypothesis term to be inferred by the entire text P(T → hj) (term-level probability), and the probability of the entire hypothesis to be inferred by the text P(T → H) (sentence-level probability). HN-PLM describes a generative process by which the hypothesis is generated from the text. Its parameters are the reliability level of each of the resources it utilizes (that is, the prior probability that applying an arbitrary inference link derived from each resource corresponds to a valid inference). For learning these parameters HN-PLM applies a schema of the EM algorithm (Dempster et al., 1977). Its performance on the recognizing textual entailment task, RTE (Bentivogli et al., 2009; Bentivogli et al., 2010), is in line with the state of the art inference systems, including complex systems which perform syntactic analysis. This model is improved by M-PLM, which deduces sentence-level probability from term-level probabilities by a Markovian process. PLIS with this model was used for passage retrieval for a question answering task (Wang et al., 2007), and outperformed state of the art inference systems. Both PLMs model the following prominent aspects of the lexical inference phenomenon: (i) considering the different reliability levels of the input knowledge resources, (ii) reducing inference chain probability as its length increases, and (iii) increasing term-level probability as we have more inference chains which suggest that the hypothesis term is inferred by the text. Both PLMs only need sentence-level annotations from which they derive term-level inference probabilities. To summarize, the lexical inference module
provides the setting for interfacing with the lexical integrator. Additionally, the module provides the framework for probabilistic inference models which estimate term-level probabilities and integrate them into a sentence-level inference decision, while implementing prominent aspects of lexical inference. The user can choose to apply another inference logic, not necessarily probabilistic, by plugging a different lexical inference model into the provided inference infrastructure. 4 The PLIS interactive system PLIS comes with an online interactive viewer (http://irsrv2.cs.biu.ac.il/nlp-net/PLIS.html) in which the user sets the parameters of PLIS, inserts a text-hypothesis pair and gets a visualization of the entire inference process. This is a powerful tool for investigating knowledge integration and lexical inference models.
Figure 2: PLIS interactive viewer with Example 1 demonstrates knowledge integration of multiple inference chains and resource combination (additional explanations, which are not part of the demo, are provided in orange).
Fig 2 presents a screenshot of the processing of Example 1. On the right side, the user configures the system by selecting knowledge resources, adjusting their configuration, setting the transitivity limit, and choosing the lexical inference model to be applied by PLIS. After inserting a text and a hypothesis to the appropriate text boxes, the user clicks on the infer button and PLIS generates all lexical inference chains, of length up to the transitivity limit, that connect text terms with hypothesis terms, as available from the combination of the selected input resources. Each inference chain is presented in a line between the text and hypothesis. PLIS also displays the probability estimations for all inference levels; the probability of each chain is presented at the end of its line. For each hypothesis term, term-level probability, which weighs all inference chains found for it, is given below the dashed line.
The overall sentence-level probability integrates the probabilities of all hypothesis terms and is displayed in the box at the bottom right corner. Next, we detail the inference process of Example 1, as presented in Fig 2. In this QA example, the probability of the candidate answer (set as the text) to be relevant for the given question (the hypothesis) is estimated. When utilizing only two knowledge resources (WordNet and Wikipedia), PLIS is able to recognize that explorer is inferred by Christopher Columbus and that New World is inferred by America. Each of these pairs has two independent inference chains, numbered 1–4, as evidence for its inference relation. Inference chains 1 and 3 each include a single inference link, derived from a different relation of the Wikipedia-based resource. The inference model assigns a higher probability to chain 1 since the BeComp relation is much more reliable than the Link relation. This comparison illustrates the ability of the inference model to learn to differentiate knowledge resources by their reliability. Comparing the probability assigned by the inference model to inference chain 2 with the probabilities assigned to chains 1 and 3 reveals the sophisticated way in which the inference model integrates lexical knowledge. Inference chain 2 is longer than chain 1, therefore its probability is lower. However, the inference model assigns chain 2 a higher probability than chain 3, even though the latter is shorter, since the model is sensitive enough to consider the difference in reliability between the two highly reliable hypernym relations (from WordNet) in chain 2 and the less reliable Link relation (from Wikipedia) in chain 3. Another aspect of knowledge integration is exemplified in Fig 2 by the three circled probabilities. The inference model takes into consideration the multiple pieces of evidence for the inference of New World (inference chains 3 and 4, whose probabilities are circled).
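The chain comparisons above can be mimicked with a simplified sketch of the probabilistic model: a chain's probability is the product of the prior reliabilities of its links, and a term's probability combines its chains by noisy-OR. This is an illustrative reading, not the exact HN-PLM/M-PLM formulation, and the reliability values below are invented for the example, not learned by EM.

```python
def chain_prob(link_types, reliability):
    """A chain is valid only if every link is valid, so the product
    makes longer chains less probable (aspect ii)."""
    p = 1.0
    for t in link_types:
        p *= reliability[t]
    return p

def term_prob(chains, reliability):
    """Noisy-OR over all chains supporting a hypothesis term: each
    extra chain raises the term-level probability (aspect iii)."""
    p_no_valid_chain = 1.0
    for chain in chains:
        p_no_valid_chain *= 1.0 - chain_prob(chain, reliability)
    return 1.0 - p_no_valid_chain

# Invented reliability priors (aspect i): BeComp and WordNet hypernym
# links are reliable, Wikipedia Link links much less so.
reliability = {"BeComp": 0.9, "hypernym": 0.8, "Link": 0.5}
p1 = chain_prob(["BeComp"], reliability)                # chain 1
p2 = chain_prob(["hypernym", "hypernym"], reliability)  # chain 2
p3 = chain_prob(["Link"], reliability)                  # chain 3
```

Under these toy priors p1 = 0.9, p2 = 0.64 and p3 = 0.5, reproducing the ordering discussed above: the shorter chain 1 beats chain 2, yet chain 2's two reliable links still beat chain 3's single unreliable link.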
Considering both chains results in a term-level probability estimation for New World (the third circled probability) which is higher than the probability of each chain separately. The third term of the hypothesis, discover, remains uncovered by the text, as no inference chain was found for it. Therefore, the sentence-level inference probability is very low, 37%. In order to identify that the hypothesis is indeed inferred from the text, the inference model should be provided with indications for the inference of discover. To that end, the user may increase the transitivity limit in the hope that longer inference chains provide the needed information. In addition, the user can examine other knowledge resources in search of the missing inference link. In this example, it is enough to add VerbOcean to the input of PLIS to expose two inference chains which connect reveal with discover by combining an inference link from WordNet with another from VerbOcean. With this additional information, the sentence-level probability increases to 76%. This is a typical scenario of utilizing PLIS, either via the interactive system or via the software, for analyzing the usability of the different knowledge resources and their combinations. A feature of the interactive system which is useful for lexical resource analysis is that each term in a chain is clickable and links to another screen which presents all the terms that are inferred from it and those from which it is inferred. Additionally, the interactive system communicates with a server which runs PLIS over a full-duplex WebSocket connection (we used the socket.io implementation). This mode of operation is publicly available and provides a method for utilizing PLIS without having to install it or the lexical resources it uses. Finally, since PLIS is a lexical system, it can easily be adjusted to other languages. One only needs to replace the basic lexical text processing tools and plug in knowledge resources in the target language.
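The VerbOcean scenario above can be reproduced numerically with a product-of-terms sentence score. The sketch below is an illustration, not the actual model: the chain probabilities and the smoothing floor assigned to an uncovered term are invented, so the numbers only mimic the low-before/high-after pattern, not the actual 37% and 76% estimates.

```python
def term_prob(chain_probs, uncovered_floor=0.1):
    """Noisy-OR over a term's supporting chains; a term with no chain
    at all falls back to a small smoothing floor instead of zero."""
    if not chain_probs:
        return uncovered_floor
    p_none = 1.0
    for p in chain_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

def sentence_prob(term_chains):
    """Sentence level: every hypothesis term must be inferred."""
    p = 1.0
    for chains in term_chains.values():
        p *= term_prob(chains)
    return p

# Without VerbOcean, "discover" has no supporting chain and drags the
# sentence-level probability down; adding VerbOcean contributes two
# chains for it (all chain probabilities here are invented).
before = sentence_prob({"explorer": [0.9, 0.6], "New World": [0.8, 0.7], "discover": []})
after = sentence_prob({"explorer": [0.9, 0.6], "New World": [0.8, 0.7], "discover": [0.5, 0.4]})
```

A single uncovered term dominates the product, which is why finding even a moderately reliable chain for the missing term raises the sentence-level estimate so sharply.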
If PLIS is provided with bilingual resources (a bilingual resource holds inference links which connect terms in different languages; e.g., an English-Spanish dictionary can provide the inference link explorer → explorador), it can also operate as a cross-lingual inference system (Negri et al., 2012). For instance, the text in Fig 3 is given in English, while the hypothesis is written in Spanish (given as a list of lemma:part-of-speech). The left side of the figure depicts a cross-lingual inference process in which the only lexical knowledge resource used is a manually built English-Spanish dictionary. As can be seen, two Spanish terms, jugador and casa, remain uncovered, since the dictionary alone cannot connect them to any of the English terms in the text. As illustrated on the right side of Fig 3, PLIS enables the combination of the bilingual dictionary with monolingual resources to produce cross-lingual inference chains, such as footballer →(hypernym) player →(manual) jugador. Such inference chains have the capability to overcome monolingual language variability (the first link in this chain) as well as to provide cross-lingual translation (the second link).

Figure 3: PLIS as a cross-lingual inference system. Left: the process with a single manual bilingual resource. Right: PLIS composes cross-lingual inference chains to increase hypothesis coverage and increase sentence-level inference probability.

5 Conclusions

To utilize PLIS, one should gather lexical resources, obtain sentence-level annotations and train the inference model. Annotations are available in common data sets for tasks such as QA, Information Retrieval (queries are hypotheses and snippets are texts) and Student Response Analysis (reference answers are the hypotheses that should be inferred by the student answers). For developers of NLP applications, PLIS offers a ready-to-use lexical knowledge integrator which can interface with many common lexical knowledge resources and construct lexical inference chains which combine the knowledge in them. A developer who wants to overcome lexical language variability, or to incorporate background knowledge, can utilize PLIS to inject lexical knowledge into any text understanding application. PLIS can be used as a lightweight inference system or as the lexical component of larger, more complex inference systems. Additionally, PLIS provides scores for inference chains and determines the way to combine them in order to recognize sentence-level inference. PLIS comes with two probabilistic lexical inference models which achieved competitive performance levels in the tasks of recognizing textual entailment and passage retrieval for QA. All aspects of PLIS are configurable. The user can easily switch between the built-in lexical resources, inference models and even languages, or extend the system with additional lexical resources and new inference models.

Acknowledgments

The authors thank Eden Erez for his help with the interactive viewer and Miquel Esplà Gomis for the bilingual dictionaries. This work was partially supported by the European Community's 7th Framework Programme (FP7/2007-2013) under grant agreement no. 287923 (EXCITEMENT) and the Israel Science Foundation grant 880/12.

References

Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge. In Proc. of TAC.
Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa Trang Dang, and Danilo Giampiccolo. 2010. The sixth PASCAL recognizing textual entailment challenge. In Proc. of TAC.
Timothy Chklovski and Patrick Pantel. 2004. VerbOcean: Mining the web for fine-grained semantic verb relations. In Proc. of EMNLP.
Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge.
In Lecture Notes in Computer Science, volume 3944, pages 177–190.
A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, Massachusetts.
Nizar Habash and Bonnie Dorr. 2003. A categorial variation database for English. In Proc. of NAACL.
Lili Kotlerman, Ido Dagan, Idan Szpektor, and Maayan Zhitomirsky-Geffet. 2010. Directional distributional similarity for lexical inference. Natural Language Engineering, 16(4):359–389.
Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proc. of COLING-ACL.
Matteo Negri, Alessandro Marchetti, Yashar Mehdad, Luisa Bentivogli, and Danilo Giampiccolo. 2012. SemEval-2012 task 8: Cross-lingual textual entailment for content synchronization. In Proc. of SemEval.
Eyal Shnarch, Libby Barak, and Ido Dagan. 2009. Extracting lexical reference rules from Wikipedia. In Proc. of ACL.
Eyal Shnarch, Jacob Goldberger, and Ido Dagan. 2011. Towards a probabilistic model for lexical entailment. In Proc. of the TextInfer Workshop.
Eyal Shnarch, Ido Dagan, and Jacob Goldberger. 2012. A probabilistic lexical model for ranking textual inferences. In Proc. of *SEM.
Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy model? A quasi-synchronous grammar for QA. In Proc. of EMNLP.