emnlp emnlp2013 emnlp2013-12 knowledge-graph by maker-knowledge-mining

12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity


Source: pdf

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents a novel approach to determine textual similarity. A layered methodology to transform text into logic forms is proposed, and semantic features are derived from a logic prover. Experimental results show that incorporating the semantic structure of sentences is beneficial. When training data is unavailable, scores obtained from the logic prover in an unsupervised manner outperform supervised methods.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A layered methodology to transform text into logic forms is proposed, and semantic features are derived from a logic prover. [sent-3, score-0.968]

2 When training data is unavailable, scores obtained from the logic prover in an unsupervised manner outperform supervised methods. [sent-5, score-0.706]

3 , 2007), textual similarity is symmetric, and unlike both textual entailment and paraphrasing (Dolan and Brockett, 2005), textual similarity is modeled using a graded score rather than a binary decision. [sent-9, score-0.806]

4 State-of-the-art systems to determine textual similarity (Bär et al. [sent-19, score-0.333]

5 The process is repeated to obtain the similarity score from sent2 to sent1, and both scores are then averaged to determine the overall textual similarity. [sent-28, score-0.374]

6 Consider now the semantic representations for sentences 1(a) and 1(b) in Figure 1. [sent-36, score-0.265]

7 While in both sentences the word ‘man’ encodes the same concept, their semantic functions with respect to other concepts are different. [sent-38, score-0.299]

8 The main novelties of our approach are: it (1) derives semantic features from a logic prover to be used in a machine learning framework; (2) uses three logic form transformations capturing different levels of knowledge; and (3) incorporates semantic representations extracted automatically. [sent-48, score-1.435]

9 1 Matching Semantic Representations and Determining Textual Similarity Throughout this paper, the semantic representation of a sentence comprises the concepts in it, semantic relations linking those concepts and named entities qualifying them. [sent-50, score-0.71]

10 First, we note that existing tools to extract semantic relations and named entities are not perfect, thus any system relying on them will suffer from incomplete and incorrect representations. [sent-51, score-0.358]

11 Second, even if flawless representations were readily available, the problem of determining textual similarity cannot be reduced to matching semantic representations: partial matches may correspond to completely similar sentences. [sent-52, score-0.527]

12 Our approach (Section 3) copes with the inherent errors made by tools used to obtain semantic representations and learns which parts of a representation are important to determine textual similarity. [sent-54, score-0.474]

13 Consider sentences 2(a) The man used a sword to slice a plastic bottle and 2(b) A man sliced a plastic bottle with a sword. [sent-55, score-0.57]

14 Both sentences have high similarity [5 out of 5], and yet their semantic representations only match partially. [sent-56, score-0.265]

15 Sentences 2(c) A woman is applying cosmetics to her face and 2(d) A woman is putting on makeup are highly similar even though the latter specifies neither the LOCATION where the ‘makeup’ is applied nor the fact that a PART of the ‘woman’ is her ‘face’. [sent-61, score-0.585]

16 Similarly, sentences 2(e) A woman is dancing in the rain and 2(f) A woman dances in the rain outside are semantically equivalent since ‘rain’ always has LOCATION ‘outside’: missing this information does not carry loss of meaning. [sent-62, score-0.889]

17 We believe this is because they use semantic relations to calculate some ad-hoc similarity score. [sent-88, score-0.373]

18 Moreover, we use three logic form transformations capturing different levels of knowledge, from only content words to semantic structure. [sent-90, score-0.566]

19 They use a standard theorem prover and extract 8 features that are later combined using machine learning. [sent-98, score-0.302]

20 (2005) use a logic form transformation derived from dependency parses and named entities. [sent-100, score-0.506]

21 Unlike them, we define three logic form transformations, use a modified resolution step and extract hundreds of features from the proofs. [sent-102, score-0.334]

22 Tatu and Moldovan (2005) use a modified logic prover that drops predicates when a proof cannot be found. [sent-103, score-1.079]

23 Unlike us, they do not drop unbound predicates and use a single logic form transformation. [sent-104, score-0.881]

24 Another key difference is that they assign fixed weights to predicates a priori instead of using machine learning to determine them. [sent-105, score-0.326]

25 3 Approach Our approach to determine textual similarity (Figure 3) is grounded on using semantic features derived from a logic prover that are later combined in a standard supervised machine learning framework. [sent-106, score-1.201]

26 First, sentences are transformed into logic forms (lft1, lft2). [sent-107, score-0.398]

27 Then, a modified logic prover is used to find a proof in both directions (lft1 to lft2 and lft2 to lft1). [sent-108, score-0.835]

28 The prover yields similarity scores based on the number of predicates dropped and features characterizing the proofs. [sent-109, score-0.877]

29 Additional similarity scores are obtained using standard pairwise word similarity measures. [sent-110, score-0.402]

30 Finally, all scores and features are combined using machine learning to yield the final textual similarity score. [sent-111, score-0.335]
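
As a rough sketch of this pipeline, the following Python function wires the components together; to_lft, prove, word_scores and regressor are hypothetical stand-ins supplied by the caller, not the authors' actual modules:

    def textual_similarity(sent1, sent2, to_lft, prove, word_scores, regressor):
        # to_lft(sentence, mode) -> logic form; prove(l1, l2) -> (score, features)
        # stands in for the modified prover; word_scores returns the pairwise
        # word similarity scores; regressor is a trained supervised model
        feats = []
        for mode in ("Basic", "SemRels", "Full"):
            lft1, lft2 = to_lft(sent1, mode), to_lft(sent2, mode)
            s12, f12 = prove(lft1, lft2)   # proof from lft1 to lft2
            s21, f21 = prove(lft2, lft1)   # and the reverse direction
            feats += [(s12 + s21) / 2.0] + f12 + f21
        feats += word_scores(sent1, sent2)
        return regressor.predict([feats])[0]  # final similarity score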

31 The rest of this section details each component and exemplifies it with 2(e) A woman is dancing in the rain and 2(f) A woman dances in the rain outside. [sent-113, score-0.874]

32 1 Logic Form Transformation The logic form transformation (LFT) of a sentence is derived from the concepts in it, the semantic relations linking them and named entities. [sent-118, score-0.841]

33 Unlike other LFT proposals (Zettlemoyer and Collins, 2005; Poon and Domingos, 2009), transforming sentences into logic forms is a straightforward step; the quality of the logic forms is determined by the output of standard NLP tools. [sent-119, score-0.764]

34 , e.g., A woman dances: woman N(x1) & dance V(x2) & AGENT SR(x2, x1).
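
As an illustration only, here is a minimal sketch (not the authors' code) of assembling such a logic form from already-extracted concepts and relations; the tuple layouts are hypothetical stand-ins for the output of real NLP tools:

    def to_logic_form(tokens, relations):
        # tokens: (word, POS, variable); relations: (name, head_var, arg_var)
        preds = ["%s %s(%s)" % (w, pos, v) for (w, pos, v) in tokens]
        preds += ["%s SR(%s, %s)" % (n, h, a) for (n, h, a) in relations]
        return " & ".join(preds)

    # A woman dances -> woman N(x1) & dance V(x2) & AGENT SR(x2, x1)
    print(to_logic_form([("woman", "N", "x1"), ("dance", "V", "x2")],
                        [("AGENT", "x2", "x1")]))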

35 In order to overcome semantic relation extraction errors, we have experimented with three logic form transformation modes. [sent-136, score-0.558]

36 Each mode captures different levels of knowledge: Basic generates predicates for all nouns, verbs, modifiers and named entities. [sent-137, score-0.394]

37 This logic form is parallel to accounting for content words, their POS tags and named entity types. [sent-138, score-0.382]

38 SemRels generates predicates for all semantic relations, concepts that are arguments of relations and named entities qualifying those concepts. [sent-139, score-0.73]

39 This mode ignores concepts not linked to other concepts through a relation and might miss key concepts if some relations are missing. [sent-140, score-0.43]

40 If no semantic relations are found, this mode backs off to Basic to avoid empty logic forms. [sent-141, score-0.627]

41 Full generates predicates for all concepts, all semantic relations and all named entities. [sent-142, score-0.569]

42 It is equivalent to SemRels after adding predicates for concepts that are not arguments of a semantic relation. [sent-143, score-0.554]
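
The three modes can be pictured as filters over the predicates of the full logic form. A hedged sketch, with a hypothetical predicate record layout:

    def lft(preds, mode):
        # preds: dicts with 'kind' in {'concept', 'relation', 'ne'} and,
        # for concepts/NEs, 'in_relation' (True if argument of a relation)
        if mode == "Basic":
            return [p for p in preds if p["kind"] in ("concept", "ne")]
        if mode == "SemRels":
            kept = [p for p in preds
                    if p["kind"] == "relation" or p.get("in_relation")]
            # back off to Basic when no relations were extracted,
            # so the logic form is never empty
            if not any(p["kind"] == "relation" for p in kept):
                return lft(preds, "Basic")
            return kept
        return list(preds)  # Full: all concepts, relations and named entities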

43 However, this is often not the case and combining the three logic forms yields better performance (Section 4). [sent-146, score-0.366]

44 The logic prover uses a modified resolution procedure to calculate a similarity score and features derived from the proof. [sent-150, score-0.841]

45 The logic prover is a modification of OTTER (McCune and Wos, 1997), an automated theorem prover for first-order logic. [sent-152, score-0.938]

46 For the textual similarity task, we load lft1 and ¬lft2 to the set of support and lexical chain axioms to the usable list. [sent-153, score-0.294]

47 Then, the logic prover begins its search for a proof. [sent-154, score-0.636]

48 Table 2: Example of predicate dropping step by step. [sent-163, score-0.265]

49 Predicates AGENT SR(x2, x1) and THEME SR(x2, x4) would not be dropped if unbound predicates were not dropped, yielding a score of 0. [sent-164, score-0.626]

50 In this case, predicates from lft1 are dropped until a proof is found. [sent-168, score-0.551]

51 The worst case occurs when all predicates in lft1 are dropped. [sent-169, score-0.287]

52 The goal of the dropping mechanism is to force the prover to always find a proof, and penalize partial proofs accordingly. [sent-170, score-0.53]

53 For example, axioms derived from woman include woman → female, woman → mistress, woman → widow and woman → madam. [sent-173, score-1.359]

54 1 Predicate Dropping Criteria When a proof cannot be found, individual predicates from lft1 not present in lft2 are dropped. [sent-178, score-0.443]

55 A greedy algorithm was implemented for this step: out of all predicates from lft1 not present in lft2, drop whichever occurs first. [sent-179, score-0.316]

56 After dropping a predicate, all predicates that become unbound are dropped as well. [sent-181, score-0.812]

57 With our current logic form transformation, dropping a noun, verb or modifier may make a semantic relation (SR) or named entity (NE) predicate unbound. [sent-182, score-0.813]

58 To avoid determining high similarity between sentences with a common semantic structure but unrelated concepts instantiating this structure, predicates encoding semantic relations and named entities are automatically dropped when they become unbound. [sent-183, score-1.144]
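
A sketch of this greedy loop with the unbound-predicate cascade, under assumed data structures (proves stands in for the modified prover):

    def drop_until_proof(lft1, lft2, proves):
        # predicates are dicts with 'kind' (N, V, M, O, SR or NE),
        # 'vars' (argument variables) and a hashable 'form' for matching
        current = list(lft1)
        forms2 = {p["form"] for p in lft2}
        while current and not proves(current, lft2):
            # greedy criterion: drop whichever predicate absent from lft2
            # occurs first (fall back to the first predicate otherwise)
            target = next((p for p in current if p["form"] not in forms2),
                          current[0])
            current.remove(target)
            # cascade: SR/NE predicates whose variables no longer occur in
            # any remaining content predicate are unbound and dropped too
            bound = {v for p in current
                     if p["kind"] in ("N", "V", "M", "O") for v in p["vars"]}
            current = [p for p in current
                       if p["kind"] not in ("SR", "NE")
                       or all(v in bound for v in p["vars"])]
        return current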

59 2 Proof Scoring Criterion The score assigned to the proof from lft1 to lft2 is calculated as the ratio of the number of predicates in lft1 not dropped to find the proof to the original number of predicates in lft1. [sent-186, score-0.994]
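
That is, score(lft1 → lft2) = |predicates kept| / |predicates in lft1|, or, as a one-line companion to the sketch above:

    def proof_score(kept, lft1):
        # fraction of lft1's predicates that survived the dropping mechanism
        return len(kept) / float(len(lft1)) if lft1 else 0.0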

60 Note that the dropping mechanism, and in particular whether predicates that become unbound are automatically dropped, greatly impacts the proof obtained and its score (Table 2). [sent-187, score-0.889]

61 If predicates that become unbound were not automatically dropped in each step, instrument NE(x4) and VALUE SR(x4, x3) would be dropped in steps 5 and 6, AGENT SR(x2, x1) and THEME SR(x2, x4) would not be dropped, and the final score would be 0. [sent-188, score-0.447]

62 In plain English, dropping unbound predicates avoids matching semantic structures instantiated by unrelated concepts. [sent-191, score-0.87]

63 3 Feature Selection While the proof score can be used directly as an estimator of the similarity between lft1 and lft2, additional features are extracted from the proof itself. [sent-194, score-0.451]

64 Namely, for each predicate type (N, V, M, O, SR, NE), we count the number of predicates present in lft1, the number of predicates dropped to find a proof for lft2 and the ratio of the two counts. [sent-195, score-0.917]

65 These three counts are also calculated for each specific semantic relation predicate (AGENT SR, LOCATION SR, etc.). [sent-196, score-0.245]
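
A sketch of these per-type counts, assuming each predicate record carries its type; passing specific relation labels (AGENT SR, LOCATION SR, ...) as extra types yields the per-relation counts as well. The d/t direction of the ratio is an assumption:

    def proof_features(lft1, dropped, types=("N", "V", "M", "O", "SR", "NE")):
        feats = {}
        for ptype in types:
            t = sum(1 for p in lft1 if p["kind"] == ptype)     # total
            d = sum(1 for p in dropped if p["kind"] == ptype)  # dropped
            feats[ptype + ":t"], feats[ptype + ":d"] = t, d
            feats[ptype + ":r"] = d / float(t) if t else 0.0   # ratio
        return feats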

66 Table 3: Two logic forms and output of the logic prover in both directions. [sent-202, score-1.002]

67 Features indicate the total number of predicates, the number of predicates dropped until a proof is found and the ratio of the two counts (t, d and r respectively). [sent-204, score-0.551]

68 We omit the features for predicate O and individual semantic relations because of space constraints. [sent-205, score-0.313]

69 The process is repeated from sent2 to sent1 to obtain the similarity between sent2 and sent1, and both overall similarities are averaged to determine the final similarity score. [sent-212, score-0.351]
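
In code form, the symmetric score is simply the mean of the two directional scores:

    def overall_similarity(score_1to2, score_2to1):
        return (score_1to2 + score_2to1) / 2.0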

70 Semantic relations are extracted with Polaris (Moldovan and Blanco, 2012), a semantic parser that extracts semantic relations from text. [sent-229, score-0.4]

71 These corpora consist of pairs of sentences labeled with their semantic similarity score, ranging from 0 to 5. [sent-241, score-0.337]

72 Table 5 shows results obtained on the test split, both with and without dropping unbound predicates. [sent-254, score-0.632]

73 For comparison purposes, results of the top-3 performers and participants using the semantic structure of sentences are also shown. [sent-255, score-0.287]

74 LFT scores are the score (average of both directions) obtained with the corresponding logic form transformation (Basic, SemRels or Full) and are unsupervised: training data with textual similarity scores is not used. [sent-260, score-0.756]

75 LFT-scores + features combines the 9 LFT-scores and 468 features derived from the logic proof. [sent-262, score-0.4]

76 WN-scores uses as features the 7 scores derived using pairwise word similarity measures. [sent-263, score-0.3]
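
The seven measures are not listed here, but a plausible WordNet-based pairwise score (a sketch, not the authors' exact measures) can be built with NLTK, e.g. using path similarity:

    from nltk.corpus import wordnet as wn

    def wn_pair_score(w1, w2):
        # best path similarity over all synset pairs of the two words
        scores = [s1.path_similarity(s2) or 0.0
                  for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
        return max(scores, default=0.0)

    def sentence_score(words1, words2):
        # average best match per word, computed in both directions
        def one_way(a, b):
            return sum(max(wn_pair_score(w, v) for v in b) for w in a) / len(a)
        return (one_way(words1, words2) + one_way(words2, words1)) / 2.0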

77 We indicate that the performance of one of our systems with respect to LFT score Basic not dropping unbound predicates is significant with ∗ (confidence 99%) and † (confidence 95%). [sent-265, score-0.704]

78 Overall, systems that drop unbound predicates perform better than systems that do not drop them. [sent-266, score-0.576]

79 However, best results for SMTeuroparl are obtained dropping unbound predicates and using All features. [sent-268, score-0.733]

80 Henceforth, we comment on results dropping unbound predicates as they are higher. [sent-269, score-0.704]

81 Regarding logic form transformations, one can see a trend depending on the source of sentences. [sent-270, score-0.334]

82 Polaris, the semantic parser, and the syntactic parser Polaris relies on are mostly trained in the news domain, and thus semantic representations have higher quality in that domain. [sent-271, score-0.399]

83 predicates), and results obtained by the top-3 performers and teams that included in their models features derived from the semantic structure of sentences. [sent-301, score-0.35]

84 Statistically significant differences in performance between our systems and LFT score Basic not dropping unbound predicates are indicated with ∗ (confidence 99%) and † (confidence 95%). [sent-302, score-0.704]

85 This leads to the conclusion that several semantic relations are often missing, and thus considering concepts even if they are not linked to other concepts via a semantic relation (Full) is more sound than ignoring them (SemRels). [sent-310, score-0.602]

86 When training data is available (MSRpar, MSRvid, SMTeuroparl), LFT-scores + features always outperforms the scores obtained with a single logic form transformation in an unsupervised manner. [sent-311, score-0.462]

87 In other words, combining the scores obtained with the three logic form transformations and incorporating the additional features derived from the proofs improves performance. [sent-312, score-0.578]

88 These results demonstrate that while a shallow logic form transformation (Basic) offers a strong baseline, it can be successfully complemented with logic form transformations that consider the semantic structure of sentences (SemRels, Full) and additional features characterizing the proofs. [sent-313, score-0.989]

89 WN scores, which only uses as features the scores derived from pairwise word similarity measures, performs astonishingly well for some corpora. [sent-321, score-0.3]

90 Finally, dropping unbound predicates and using All features outperforms any other system. [sent-331, score-0.704]

91 5 Conclusions This paper presents a novel approach to determine textual similarity that employs a logic prover to extract semantic features. [sent-369, score-1.135]

92 A layered methodology to transform text into logic forms using three logic form transformation modes is presented. [sent-370, score-0.855]

93 Each mode captures different levels of knowledge, from only content words to semantic representations automatically extracted. [sent-371, score-0.292]

94 Best results are obtained when features derived from the logic prover are complemented with simpler pairwise word similarity measures. [sent-372, score-0.955]

95 Features that account for the semantic structure of sentences are incorporated when needed, as the results obtained with systems All, LFT scores and WN scores show. [sent-373, score-0.309]

96 State-of-the-art NLP tools to extract semantic representations from text, which are far from perfect, yield promising results. [sent-375, score-0.28]

97 Semeval-2012 task 6: A pilot on semantic textual similarity. [sent-381, score-0.321]

98 Ukp: Computing semantic textual similarity by combining multiple content similarity measures. [sent-396, score-0.599]

99 Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. [sent-512, score-0.336]

100 Using information content to evaluate semantic similarity in a taxonomy. [sent-562, score-0.305]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('logic', 0.334), ('prover', 0.302), ('predicates', 0.287), ('woman', 0.248), ('unbound', 0.231), ('dropping', 0.186), ('semantic', 0.166), ('lft', 0.16), ('semrels', 0.16), ('smteuroparl', 0.16), ('proof', 0.156), ('textual', 0.155), ('msrvid', 0.142), ('similarity', 0.139), ('msrpar', 0.124), ('rain', 0.113), ('dropped', 0.108), ('concepts', 0.101), ('moldovan', 0.099), ('man', 0.095), ('semeval', 0.092), ('performers', 0.089), ('polaris', 0.089), ('predicate', 0.079), ('plastic', 0.071), ('relations', 0.068), ('representations', 0.067), ('transformations', 0.066), ('derived', 0.066), ('entailment', 0.063), ('bottle', 0.062), ('tsr', 0.062), ('mode', 0.059), ('montr', 0.059), ('transformation', 0.058), ('pairwise', 0.054), ('eal', 0.054), ('sr', 0.054), ('axioms', 0.053), ('dances', 0.053), ('exemplifies', 0.053), ('lymba', 0.053), ('makeup', 0.053), ('mohler', 0.053), ('smtnews', 0.053), ('bos', 0.053), ('modes', 0.053), ('ari', 0.05), ('banea', 0.05), ('named', 0.048), ('tools', 0.047), ('canada', 0.047), ('dancing', 0.046), ('girju', 0.046), ('sword', 0.046), ('sixth', 0.044), ('directions', 0.043), ('proofs', 0.042), ('eduardo', 0.042), ('agirre', 0.042), ('scores', 0.041), ('ne', 0.04), ('determine', 0.039), ('palmer', 0.038), ('wordnet', 0.037), ('pages', 0.036), ('outside', 0.036), ('abductive', 0.036), ('agen', 0.036), ('blanco', 0.036), ('chopping', 0.036), ('cosmetics', 0.036), ('fighting', 0.036), ('giampiccolo', 0.036), ('layered', 0.036), ('leacock', 0.036), ('mccune', 0.036), ('monkey', 0.036), ('raina', 0.036), ('rios', 0.036), ('sliced', 0.036), ('tatu', 0.036), ('confidence', 0.035), ('location', 0.035), ('similarities', 0.034), ('sentences', 0.032), ('basic', 0.032), ('forms', 0.032), ('measures', 0.031), ('wn', 0.031), ('complemented', 0.031), ('lesk', 0.031), ('qualifying', 0.031), ('rna', 0.031), ('obtained', 0.029), ('full', 0.029), ('entities', 0.029), ('drop', 0.029), ('agent', 0.028), ('martha', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents a novel approach to determine textual similarity. A layered methodology to transform text into logic forms is proposed, and semantic features are derived from a logic prover. Experimental results show that incorporating the semantic structure of sentences is beneficial. When training data is unavailable, scores obtained from the logic prover in an unsupervised manner outperform supervised methods.

2 0.16067423 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

Author: Mike Lewis ; Mark Steedman

Abstract: Creating a language-independent meaning representation would benefit many crosslingual NLP tasks. We introduce the first unsupervised approach to this problem, learning clusters of semantically equivalent English and French relations between referring expressions, based on their named-entity arguments in large monolingual corpora. The clusters can be used as language-independent semantic relations, by mapping clustered expressions in different languages onto the same relation. Our approach needs no parallel text for training, but outperforms a baseline that uses machine translation on a cross-lingual question answering task. We also show how to use the semantics to improve the accuracy of machine translation, by using it in a simple reranker.

3 0.15486422 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?

Author: Wiltrud Kessler ; Jonas Kuhn

Abstract: This short paper presents a pilot study investigating the training of a standard Semantic Role Labeling (SRL) system on product reviews for the new task of detecting comparisons. An (opinionated) comparison consists of a comparative “predicate” and up to three “arguments”: the entity evaluated positively, the entity evaluated negatively, and the aspect under which the comparison is made. In user-generated product reviews, the “predicate” and “arguments” are expressed in highly heterogeneous ways; but since the elements are textually annotated in existing datasets, SRL is technically applicable. We address the interesting question how well training an outof-the-box SRL model works for English data. We observe that even without any feature engineering or other major adaptions to our task, the system outperforms a reasonable heuristic baseline in all steps (predicate identification, argument identification and argument classification) and in three different datasets.

4 0.15188947 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs

Author: Jonathan Berant ; Andrew Chou ; Roy Frostig ; Percy Liang

Abstract: In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset ofCai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. Additionally, we collected a more realistic and challenging dataset of question-answer pairs and improves over a natural baseline.

5 0.12453179 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

Author: Oier Lopez de Lacalle ; Mirella Lapata

Abstract: In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as First Order Logic rules and automatically combine with a topic model developed specifically for the relation extraction task. Evaluation results on the ACE 2007 English Relation Detection and Categorization (RDC) task show that our model outperforms competitive unsupervised approaches by a wide margin and is able to produce clusters shaped by both the data and the rules.

6 0.12117261 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

7 0.10220379 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

8 0.076075621 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

9 0.071819589 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

10 0.069875166 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

11 0.064637102 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation

12 0.063845545 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations

13 0.062728882 160 emnlp-2013-Relational Inference for Wikification

14 0.059120093 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations

15 0.058552746 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

16 0.05806312 24 emnlp-2013-Application of Localized Similarity for Web Documents

17 0.057099573 123 emnlp-2013-Learning to Rank Lexical Substitutions

18 0.056241296 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

19 0.0542344 33 emnlp-2013-Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese

20 0.054172769 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.206), (1, 0.089), (2, -0.031), (3, 0.037), (4, 0.013), (5, 0.195), (6, -0.094), (7, -0.006), (8, 0.082), (9, 0.001), (10, 0.107), (11, -0.028), (12, 0.116), (13, -0.014), (14, 0.065), (15, 0.104), (16, -0.03), (17, -0.026), (18, -0.016), (19, 0.08), (20, -0.068), (21, 0.107), (22, -0.092), (23, -0.126), (24, -0.033), (25, 0.071), (26, -0.161), (27, 0.136), (28, 0.136), (29, 0.049), (30, -0.029), (31, 0.081), (32, -0.001), (33, -0.093), (34, -0.096), (35, 0.007), (36, 0.099), (37, -0.001), (38, -0.122), (39, -0.008), (40, 0.052), (41, -0.072), (42, 0.014), (43, -0.081), (44, -0.064), (45, -0.102), (46, -0.056), (47, 0.11), (48, 0.041), (49, 0.066)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93258554 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents a novel approach to determine textual similarity. A layered methodology to transform text into logic forms is proposed, and semantic features are derived from a logic prover. Experimental results show that incorporating the semantic structure of sentences is beneficial. When training data is unavailable, scores obtained from the logic prover in an unsupervised manner outperform supervised methods.

2 0.72891784 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?

Author: Wiltrud Kessler ; Jonas Kuhn

Abstract: This short paper presents a pilot study investigating the training of a standard Semantic Role Labeling (SRL) system on product reviews for the new task of detecting comparisons. An (opinionated) comparison consists of a comparative “predicate” and up to three “arguments”: the entity evaluated positively, the entity evaluated negatively, and the aspect under which the comparison is made. In user-generated product reviews, the “predicate” and “arguments” are expressed in highly heterogeneous ways; but since the elements are textually annotated in existing datasets, SRL is technically applicable. We address the interesting question how well training an outof-the-box SRL model works for English data. We observe that even without any feature engineering or other major adaptions to our task, the system outperforms a reasonable heuristic baseline in all steps (predicate identification, argument identification and argument classification) and in three different datasets.

3 0.71187454 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

Author: Mike Lewis ; Mark Steedman

Abstract: Creating a language-independent meaning representation would benefit many crosslingual NLP tasks. We introduce the first unsupervised approach to this problem, learning clusters of semantically equivalent English and French relations between referring expressions, based on their named-entity arguments in large monolingual corpora. The clusters can be used as language-independent semantic relations, by mapping clustered expressions in different languages onto the same relation. Our approach needs no parallel text for training, but outperforms a baseline that uses machine translation on a cross-lingual question answering task. We also show how to use the semantics to improve the accuracy of machine translation, by using it in a simple reranker.

4 0.6456849 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

Author: Joshua K. Hartshorne ; Claire Bonial ; Martha Palmer

Abstract: This research describes efforts to use crowdsourcing to improve the validity of the semantic predicates in VerbNet, a lexicon of about 6300 English verbs. The current semantic predicates can be thought of semantic primitives, into which the concepts denoted by a verb can be decomposed. For example, the verb spray (of the Spray class), involves the predicates MOTION, NOT, and LOCATION, where the event can be decomposed into an AGENT causing a THEME that was originally not in a particular location to now be in that location. Although VerbNet’s predicates are theoretically well-motivated, systematic empirical data is scarce. This paper describes a recently-launched attempt to address this issue with a series of human judgment tasks, posed to subjects in the form of games.

5 0.50103408 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

Author: Qi Zhang ; Jin Qian ; Huan Chen ; Jihua Kang ; Xuanjing Huang

Abstract: Explanatory sentences are employed to clarify reasons, details, facts, and so on. High quality online product reviews usually include not only positive or negative opinions, but also a variety of explanations of why these opinions were given. These explanations can help readers get easily comprehensible information of the discussed products and aspects. Moreover, explanatory relations can also benefit sentiment analysis applications. In this work, we focus on the task of identifying subjective text segments and extracting their corresponding explanations from product reviews in discourse level. We propose a novel joint extraction method using firstorder logic to model rich linguistic features and long distance constraints. Experimental results demonstrate the effectiveness of the proposed method.

6 0.4735947 165 emnlp-2013-Scaling to Large3 Data: An Efficient and Effective Method to Compute Distributional Thesauri

7 0.4656947 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs

8 0.43897259 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

9 0.42815498 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction

10 0.42093313 33 emnlp-2013-Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese

11 0.38270301 152 emnlp-2013-Predicting the Presence of Discourse Connectives

12 0.37071061 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

13 0.36540288 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI

14 0.35502166 37 emnlp-2013-Automatically Identifying Pseudepigraphic Texts

15 0.35217461 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems

16 0.34966841 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

17 0.34373462 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

18 0.3359651 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

19 0.33122447 24 emnlp-2013-Application of Localized Similarity for Web Documents

20 0.31290495 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.469), (18, 0.024), (22, 0.038), (30, 0.041), (45, 0.01), (50, 0.013), (51, 0.164), (66, 0.034), (71, 0.035), (75, 0.045), (96, 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94369453 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication

Author: Byron C. Wallace ; Thomas A Trikalinos ; M. Barton Laws ; Ira B. Wilson ; Eugene Charniak

Abstract: We develop a novel generative model of conversation that jointly captures both the topical content and the speech act type associated with each utterance. Our model expresses both token emission and state transition probabilities as log-linear functions of separate components corresponding to topics and speech acts (and their interactions). We apply this model to a dataset comprising annotated patient-physician visits and show that the proposed joint approach outperforms a baseline univariate model.

2 0.86579925 5 emnlp-2013-A Discourse-Driven Content Model for Summarising Scientific Articles Evaluated in a Complex Question Answering Task

Author: Maria Liakata ; Simon Dobnik ; Shyamasree Saha ; Colin Batchelor ; Dietrich Rebholz-Schuhmann

Abstract: We present a method which exploits automatically generated scientific discourse annotations to create a content model for the summarisation of scientific articles. Full papers are first automatically annotated using the CoreSC scheme, which captures 11 contentbased concepts such as Hypothesis, Result, Conclusion etc at the sentence level. A content model which follows the sequence of CoreSC categories observed in abstracts is used to provide the skeleton of the summary, making a distinction between dependent and independent categories. Summary creation is also guided by the distribution of CoreSC categories found in the full articles, in order to adequately represent the article content. Fi- nally, we demonstrate the usefulness of the summaries by evaluating them in a complex question answering task. Results are very encouraging as summaries of papers from automatically obtained CoreSCs enable experts to answer 66% of complex content-related questions designed on the basis of paper abstracts. The questions were answered with a precision of 75%, where the upper bound for human summaries (abstracts) was 95%.

3 0.84845859 61 emnlp-2013-Detecting Promotional Content in Wikipedia

Author: Shruti Bhosale ; Heath Vinicombe ; Raymond Mooney

Abstract: This paper presents an approach for detecting promotional content in Wikipedia. By incorporating stylometric features, including features based on n-gram and PCFG language models, we demonstrate improved accuracy at identifying promotional articles, compared to using only lexical information and metafeatures.

same-paper 4 0.82447189 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents a novel approach to determine textual similarity. A layered methodology to transform text into logic forms is proposed, and semantic features are derived from a logic prover. Experimental results show that incorporating the semantic structure of sentences is beneficial. When training data is unavailable, scores obtained from the logic prover in an unsupervised manner outperform supervised methods.

5 0.51632261 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach

Author: Tengfei Ma ; Hiroshi Nakagawa

Abstract: Document summarization is an important task in the area of natural language processing, which aims to extract the most important information from a single document or a cluster of documents. In various summarization tasks, the summary length is manually defined. However, how to find the proper summary length is quite a problem; and keeping all summaries restricted to the same length is not always a good choice. It is obviously improper to generate summaries with the same length for two clusters of documents which contain quite different quantity of information. In this paper, we propose a Bayesian nonparametric model for multidocument summarization in order to automatically determine the proper lengths of summaries. Assuming that an original document can be reconstructed from its summary, we describe the ”reconstruction” by a Bayesian framework which selects sentences to form a good summary. Experimental results on DUC2004 data sets and some expanded data demonstrate the good quality of our summaries and the rationality of the length determination.

6 0.51131588 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression

7 0.50538105 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts

8 0.50207263 34 emnlp-2013-Automatically Classifying Edit Categories in Wikipedia Revisions

9 0.49459738 174 emnlp-2013-Single-Document Summarization as a Tree Knapsack Problem

10 0.48881021 72 emnlp-2013-Elephant: Sequence Labeling for Word and Sentence Segmentation

11 0.48809445 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior

12 0.47918406 204 emnlp-2013-Word Level Language Identification in Online Multilingual Communication

13 0.47524738 86 emnlp-2013-Feature Noising for Log-Linear Structured Prediction

14 0.47109112 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation

15 0.47009021 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

16 0.46646938 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

17 0.46593347 199 emnlp-2013-Using Topic Modeling to Improve Prediction of Neuroticism and Depression in College Students

18 0.4650977 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions

19 0.46482587 65 emnlp-2013-Document Summarization via Guided Sentence Compression

20 0.46472037 152 emnlp-2013-Predicting the Presence of Discourse Connectives