emnlp emnlp2010 emnlp2010-72 knowledge-graph by maker-knowledge-mining

72 emnlp-2010-Learning First-Order Horn Clauses from Web Text


Source: pdf

Author: Stefan Schoenmackers ; Jesse Davis ; Oren Etzioni ; Daniel Weld

Abstract: Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. [sent-7, score-0.342]

2-3 Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text. [sent-11, score-0.808] [sent-12, score-0.349]

4 Thus, while HOLMES’s inference run time is highly scalable, it requires substantial labor and expertise to hand-craft the appropriate set of Horn clauses for each new domain. [sent-22, score-0.173]

5 We refer to the set of ground facts derived from Web text as open-domain theories. [sent-24, score-0.434]

6 Second, the ground facts in the theories are noisy and incomplete. [sent-28, score-0.436]

7 Finally, the names used to denote both entities and relations are rife with synonyms and polysemes, making their referents ambiguous and resulting in a particularly noisy and ambiguous set of ground facts. [sent-30, score-0.209]

8 Table 1 shows some example rules that were learned by SHERLOCK. [sent-32, score-0.289]

9 For Web text, the scoring function yields more accurate rules than several functions from the ILP literature. [sent-39, score-0.302]

10 Inference using SHERLOCK’s learned rules identifies three times as many high quality facts (e.g., …). [sent-42, score-0.67]

11 …, 2010) performs coupled semi-supervised learning to extract a large knowledge base of instances, relations, and inference rules, bootstrapping from a few seed examples of each class and relation of interest and a few constraints among them. [sent-66, score-0.185]

12 Two other notable systems that learn inference rules from text are DIRT (Lin and Pantel, 2001) and RESOLVER (Yates and Etzioni, 2007). [sent-70, score-0.342]

13 However, both DIRT and RESOLVER learn only a limited set of rules capturing synonyms, paraphrases, and simple entailments, not more expressive multipart Horn clauses. [sent-71, score-0.246]

14 Applications of these rules often depend on context (e.g., …). [sent-73, score-0.246]

15 The selectional preferences act as type restrictions; SHERLOCK learns inference rules offline and provides them to the HOLMES inference engine, which uses the rules to answer queries. [sent-78, score-0.748]

16 While these approaches are useful, they are strictly more limited than the rules learned by SHERLOCK. [sent-80, score-0.289]

17 Approaches to RTE include those of Tatu and Moldovan (2007), which generates inference rules from WordNet lexical chains and a set of axiom templates, and Pennacchiotti and Zanzotto (2007), which learns inference rules based on similarity across entailment pairs. [sent-83, score-0.735]

18 SHERLOCK operates over simpler Web extractions, but is not given guidance about which facts may interact. [sent-85, score-0.349]

19 …(e.g., HOLMES) use the rules to answer questions, infer additional facts, etc. [sent-89, score-0.32]

20 Learn inference rules using the discovered relations and determine the confidence in each rule. The first two steps help deal with the synonyms, homonyms, and noise present in open-domain theories by identifying a smaller, cleaner, and more cohesive set of facts to learn rules over. [sent-94, score-1.189]

21 SHERLOCK learns inference rules from a collection of open-domain extractions produced by TEXTRUNNER (Banko et al.). [sent-95, score-0.418]

22 The rules learned by SHERLOCK are input to an inference engine and used to find answers to a user’s query. [sent-97, score-0.409]

23 In this paper, SHERLOCK utilizes HOLMES as its inference engine when answering queries, and uses extracted facts of the form R(arg1, arg2) provided by the authors of TEXTRUNNER, but the techniques presented are more broadly applicable. [sent-98, score-0.499]
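
As a concrete picture of the input representation, the sketch below stores such R(arg1, arg2) extractions as Python tuples and indexes them by relation name; the relation and argument strings are made-up placeholders, not actual TEXTRUNNER output.

```python
from collections import defaultdict

# Open-domain extractions as ground facts R(arg1, arg2).  The tuples below
# are illustrative placeholders, not actual TEXTRUNNER output.
extractions = [
    ("IsLocatedIn", "Deer Park", "Suffolk County"),
    ("IsLocatedIn", "Suffolk County", "New York"),
    ("IsHeadquarteredIn", "Acme Corp", "Springfield"),
]

# Index the facts by relation so rule learning can look up every argument
# pair observed for a given relation.
facts_by_relation = defaultdict(set)
for rel, arg1, arg2 in extractions:
    facts_by_relation[rel].add((arg1, arg2))

print(sorted(facts_by_relation["IsLocatedIn"]))
```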

24 However, the goal of this work is to learn rules on top of the discovered typed relations. [sent-137, score-0.296]

25 For every pair of classes (C1, C2), we find a set of typed, candidate relations from the 100 most frequent relations in the corpus where the first argument is an instance of C1 and the second argument is an instance of C2. [sent-139, score-0.174]
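
A rough sketch of this typed-relation filtering step is given below; the class names, instance sets, and extraction tuples are hypothetical stand-ins, and the real system works over millions of extractions rather than a handful of tuples.

```python
from collections import Counter

# Hypothetical class memberships and extractions; real inputs come from the
# class/instance discovery steps described above.
class_instances = {
    "City":    {"Springfield", "Shelbyville"},
    "Company": {"Acme Corp", "Globex"},
}
extractions = [
    ("IsHeadquarteredIn", "Acme Corp", "Springfield"),
    ("IsHeadquarteredIn", "Globex", "Shelbyville"),
    ("Acquired", "Acme Corp", "Globex"),
]

# Restrict attention to the most frequent relations overall (top 100 in the paper).
relation_counts = Counter(rel for rel, _, _ in extractions)
top_relations = {rel for rel, _ in relation_counts.most_common(100)}

def typed_candidate_relations(c1, c2):
    """Relations whose observed arguments are instances of c1 and c2."""
    candidates = set()
    for rel, a1, a2 in extractions:
        if (rel in top_relations
                and a1 in class_instances[c1]
                and a2 in class_instances[c2]):
            candidates.add((rel, c1, c2))
    return candidates

print(typed_candidate_relations("Company", "City"))
```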

26 Learning Inference Rules: SHERLOCK attempts to learn inference rules for each typed relation in turn. [sent-153, score-0.431]

27 SHERLOCK generates all first-order, definite clauses up to length k, where R appears as the head of the clause. [sent-155, score-0.179]
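
The sketch below enumerates candidate definite clauses with a given head over a tiny, hypothetical typed-relation signature; it only covers chain-shaped bodies of length one or two, a simplification of the clause space SHERLOCK actually searches.

```python
from itertools import product

# Hypothetical typed relation signature: name -> (type of arg1, type of arg2).
relations = {
    "IsBasedIn":         ("Company", "City"),
    "IsLocatedIn":       ("City", "State"),
    "IsHeadquarteredIn": ("Company", "State"),
}

def candidate_rules(head, k=2):
    """Enumerate definite clauses Body => head(x, y) with at most k body literals.

    Variables are named x, y (head arguments) and z (one chaining variable),
    so only chain-shaped bodies are generated here.
    """
    hx, hy = relations[head]
    rules = []
    # Length-1 bodies: R1(x, y) => head(x, y); argument types must match the head's.
    for r, (t1, t2) in relations.items():
        if r != head and (t1, t2) == (hx, hy):
            rules.append(([f"{r}(x, y)"], f"{head}(x, y)"))
    if k < 2:
        return rules
    # Length-2 chain bodies: R1(x, z) AND R2(z, y) => head(x, y), with type agreement on z.
    for (r1, (a1, a2)), (r2, (b1, b2)) in product(relations.items(), repeat=2):
        if a1 == hx and b2 == hy and a2 == b1 and head not in (r1, r2):
            rules.append(([f"{r1}(x, z)", f"{r2}(z, y)"], f"{head}(x, y)"))
    return rules

for body, head in candidate_rules("IsHeadquarteredIn"):
    print(" AND ".join(body), "=>", head)
```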

28 Evaluating Rules by Statistical Relevance: The problem of evaluating candidate rules has been studied by many researchers, but typically in either a supervised or propositional context, whereas we are learning first-order Horn clauses from a noisy set of positive examples. [sent-163, score-0.332]

29 Moreover, due to the incomplete nature of the input corpus and the imperfect yield of extraction, many true facts are not stated explicitly in the set of ground assertions used by the learner to evaluate rules. [sent-164, score-0.419]

30 Thus to evaluate rules over extractions, we need to consider relative probability estimates. [sent-169, score-0.246]

31 …, vm) is a conjunction of function-free, non-negated, first-order relations, and each vi ∈ V, the set of typed variables used in the rule), we say the body helps explain the head if: 1. … [sent-210, score-0.27]

32 In practice it is difficult to determine the probabilities exactly, so when checking for statistical relevance we ensure that the probability of the rule is at least a factor t greater than the probability of any subsuming rule, that is, p(Head(·) | Body(·)) ≥ t · p(Head(·) | Body′(·)) for every subsuming body Body′(·). [sent-256, score-0.188]
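
Read this way, the statistical-relevance test reduces to a simple comparison of conditional-probability estimates, as in the sketch below; how those probabilities are estimated from noisy, incomplete extractions is the substance of the paper and is not reproduced here, and the numbers shown are toy values.

```python
def statistically_relevant(p_head_given_body, p_head_given_subsuming, t=2.0):
    """Statistical-relevance test: the full body must raise the probability of
    the head by at least a factor t over every subsuming (more general) body.

    p_head_given_body:      estimate of P(Head | full body)
    p_head_given_subsuming: list of estimates P(Head | B') for every clause B'
                            that subsumes the body, including the empty body
                            (i.e. the marginal P(Head)).
    """
    return all(p_head_given_body >= t * p for p in p_head_given_subsuming)

# Toy numbers: the full rule explains the head far better than either of its
# generalisations, so it passes with t = 2; in the second case it does not.
print(statistically_relevant(0.40, [0.05, 0.12], t=2.0))   # True
print(statistically_relevant(0.40, [0.05, 0.30], t=2.0))   # False
```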

33 In lieu of positive and negative examples, we use whether or not the inferred head value was observed, and compare against the distribution of a subsuming clause B0(·). [sent-360, score-0.229]

34 This method of evaluating rules has two important differences from ILP under a closed world assumption. [sent-364, score-0.246]

35 Second, statistical relevance finds rules with large increases in relative probability, not necessarily a large absolute probability. [sent-366, score-0.334]

36 This is crucial in an open domain setting where most facts are false, which means the trivial rule that everything is false will have high accuracy. [sent-367, score-0.449]

37 …, 1992), first finding facts using logical inference, then estimating the confidence of each using a Markov Logic Network (Richardson and Domingos, 2006). [sent-384, score-0.349]

38 Prior to running inference, it is necessary to assign a weight to each rule learned by SHERLOCK. [sent-385, score-0.177]

39 Since the rules and inferences are based on a set of noisy and incomplete extractions, the algorithms used for both weight learning and inference should capture the following characteristics of our problem: C1. … [sent-386, score-0.542]

40 However, facts in E should have a confidence bounded by a threshold pmax < 1. [sent-390, score-0.349]

41 An inference that combines uncertain facts should be less likely than each fact it uses. [sent-393, score-0.445]

42 Next, we describe the needed modifications to the weight learning and inference algorithm to achieve the desired behavior. [sent-394, score-0.162]

43-44 Consequently, we compute n_i(E), the number of true groundings of rule i, as n_i(E) = Σ_j max_k Π_{B(·) ∈ Body_ijk} p(B(·)) (Equation 1), where E is the evidence, j ranges over the heads of the rule, Body_ijk is the body of the kth grounding for the jth head of rule i, and p(B(·)) is the probability that body fact B(·) holds. [sent-400, score-0.184] [sent-406, score-0.324]
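
A direct implementation of Equation 1 is straightforward once groundings and body-fact probabilities are available; the sketch below assumes those as inputs and uses illustrative numbers, not values from the paper.

```python
from math import prod

def soft_grounding_count(groundings_by_head):
    """Compute n_i(E) as in Equation 1: for each inferred head keep only the
    most probable grounding (max over k), score it by the product of its
    body-fact probabilities, and sum over the distinct heads j.

    groundings_by_head: {head fact: [[p(body fact), ...] for each grounding]}
    """
    total = 0.0
    for head, groundings in groundings_by_head.items():
        total += max(prod(body_probs) for body_probs in groundings)
    return total

example = {
    "IsHeadquarteredIn(Acme, Springfield)": [[0.9, 0.8], [0.6, 0.7]],  # two groundings
    "IsHeadquarteredIn(Globex, Shelbyville)": [[0.5, 0.9]],            # one grounding
}
print(soft_grounding_count(example))  # 0.72 + 0.45 = 1.17
```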

45 In practice, this also helps address C3 by giving longer rules smaller values of ni, which reflects that inferences arrived at through a combination of multiple, noisy facts should have lower confidence. [sent-415, score-0.736]

46 Longer rules are also more likely to have multiple groundings that infer a particular head, so keeping only the most likely grounding prevents a head from receiving undue weight from any single rule. [sent-416, score-0.517]

47 (Footnote 1) We note that this approximation is equivalent to an MLN which uses only the two rules defined in 3. [sent-417, score-0.246]

48 Longer rules have a higher prior to capture the notion that they are more likely to make incorrect inferences. [sent-422, score-0.271]

49 Without a high prior, each rule would receive an unduly high weight as we have no negative examples. [sent-423, score-0.169]

50 Probabilistic Inference: After learning the weights, we add the following two rules to our rule set. [sent-426, score-0.346]

51 …, sentence_i) with weight 1. The first rule models C1, by saying that most facts are false. [sent-437, score-0.483]
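
The sketch below illustrates, under a logistic (MLN-style) combination that is an assumption for illustration rather than the paper's exact formulation, how a negative prior, learned rule weights, and a weight-1 extraction rule keep both inferred and extracted facts below certainty; the weights and probabilities shown are made up.

```python
from math import exp

def fact_probability(rule_contributions, prior_weight=2.0, extraction_weight=1.0,
                     extracted=False):
    """Illustrative log-linear combination of evidence for a single query fact.

    rule_contributions: list of (rule weight, n_i) pairs, where n_i is the soft
                        grounding count of rule i for this fact (Equation 1).
    prior_weight:       strength of the "most facts are false" prior rule.
    extracted:          whether the fact was observed in the corpus, firing the
                        weight-1 extraction rule (so confidence stays below 1).
    """
    score = -prior_weight
    score += sum(w * n for w, n in rule_contributions)
    if extracted:
        score += extraction_weight
    return 1.0 / (1.0 + exp(-score))

print(fact_probability([(1.5, 0.8)], extracted=False))  # inferred only
print(fact_probability([], extracted=True))             # extracted only
```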

52 We do not include these rules during weight learning as doing so swamps the effects of the other inference rules (i.e., …). [sent-440, score-0.622]

53 Experiments: One can attempt to evaluate a rule learner by estimating the quality of learned rules, or by measuring their impact on a system that uses the learned rules. [sent-448, score-0.186]

54 Does inference utilizing learned Horn rules improve the precision/recall of question answering and by how much? [sent-451, score-0.385]

55 Score all candidate rules according to the rule scoring metric M, accept all rules with a score at least tM (tuned on a small development set of rules), and learn weights for all accepted rules. [sent-464, score-0.693]

56 Find all facts inferred by the rules and use the rule weights to estimate the fact probabilities. [sent-466, score-0.745]

57 Place all results into bins based on their probabilities, and estimate the precision and the number of correct facts in the bin using a random sample. [sent-477, score-0.416]
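
A minimal sketch of this evaluation protocol: bin inferred facts by probability, sample a few per bin, and extrapolate precision and correct-fact counts from the sample. The labeler below is a stand-in for the manual judgement used in the paper, and the bin edges and facts are arbitrary toy values.

```python
import random

def estimate_bins(inferred_facts, labeler, bin_edges=(0.5, 0.8, 0.95, 1.01),
                  sample_size=5, seed=0):
    """Bin inferred facts by probability and estimate each bin's precision and
    number of correct facts from a random sample of labelled facts.

    inferred_facts: list of (fact, probability) pairs.
    labeler:        callable fact -> bool standing in for a manual judgement.
    """
    rng = random.Random(seed)
    rows, lo = [], 0.0
    for hi in bin_edges:
        in_bin = [f for f, p in inferred_facts if lo <= p < hi]
        if in_bin:
            sample = rng.sample(in_bin, min(sample_size, len(in_bin)))
            precision = sum(labeler(f) for f in sample) / len(sample)
            rows.append((lo, hi, len(in_bin), precision, precision * len(in_bin)))
        lo = hi
    return rows  # (low, high, bin size, est. precision, est. correct facts)

# Toy usage: fabricate scored facts and a stand-in "gold" labeler.
rng = random.Random(1)
facts = [(f"fact{i}", rng.random()) for i in range(20)]
gold = {f for f, p in facts if p > 0.6}
for row in estimate_bins(facts, labeler=lambda f: f in gold):
    print(row)
```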

58 In these experiments we consider rules with up to k = 2 relations in the body. [sent-478, score-0.308]

59 SHERLOCK found 5 million candidate rules that infer at least two of the observed facts. [sent-480, score-0.289]

60 Unless otherwise noted, we use SHERLOCK’s rule scoring function to evaluate the rules (Section 3. [sent-481, score-0.402]

61 We observe between a dozen and 2,375 distinct, ground facts for each relation. [sent-484, score-0.394]

62 Learning all rules, rule … (footnote 2: the learned rules are available at: http://www.) [sent-486, score-0.389]

63 Horn-clauses with multiple relations in the body infer 30% more correct facts than are identified by simpler entailment rules, inferring many facts not present in the corpus in any form. [sent-491, score-0.969]

64 However, we note that for half of the relations SHERLOCK accepts no inference rules, and remind the reader that the performance on any particular relation may be substantially different, and depends on the facts observed in the corpus and on the rules learned. [sent-493, score-0.792]

65 Benefits of Inference: We first evaluate the utility of the learned Horn rules by contrasting the precision and number of correct and incorrect facts identified with and without inference over learned rules. [sent-495, score-0.869]

66 The first is a no-inference baseline that uses no rules, returning only facts that are explicitly extracted. [sent-497, score-0.349]

67 The second baseline only accepts rules of length k = 1, allowing it to make simple entailments but not more complicated inferences using multiple facts. [sent-498, score-0.372]

68 Figure 2 compares the precision and estimated number of correct facts with and without inference. [sent-499, score-0.442]

69 As is apparent, the learned inference rules substantially increase the number of known facts, quadrupling the number of correct facts beyond what are explicitly extracted. [sent-500, score-0.762]

70 Horn rules with a body length of two identify 30% more facts than the simpler length-one rules. [sent-501, score-0.595]

71 Furthermore, we find the Horn rules yield slightly increased precision at comparable levels of recall, although the increase is not statistically significant. [sent-502, score-0.285]

72 …(e.g., confusing Vancouver, British Columbia with Vancouver, Washington), one third are due to inferences based on incorrectly-extracted facts (e.g., …). [sent-506, score-0.444]

73 …(e.g., inferences based on the incorrect fact IsLocatedIn(New York, Suffolk County), which was extracted from sentences like ‘Deer Park, New York is located in Suffolk County’), and the rest are due to unsound or incorrect inference rules (e.g., …). [sent-508, score-0.584]

74 Without negative examples it is difficult to distinguish correct rules from these unsound rules, since the unsound rules are correct more often than expected by chance. [sent-511, score-0.741]

75 Finally, we note that although simple length-one rules capture many of the results, in some respects they are just rephrasing facts that are extracted in another form. [sent-512, score-0.625]

76 However, the more complex, length-two rules synthesize facts extracted from multiple pages, and infer results that are not stated anywhere in the corpus. [sent-513, score-0.668]

77 Effect of Scoring Function: We now examine how SHERLOCK’s rule scoring function affects its results by comparing it with three rule scoring functions used in prior work: LIME, M-Estimate, and L1-regularized weight learning. [sent-515, score-0.312]

78 Figure 3 (Comparison of Rule Scoring Functions, Estimated Number of Correct Facts): SHERLOCK identifies rules that lead to more accurate inferences over a large set of open-domain extracted facts, deducing 2x as many facts at precision 0.8. [sent-520, score-0.791]

79 As proposed in (Huynh and Mooney, 2008), this learns weights for all candidate rules using L1-regularization (encouraging sparsity) instead of L2-regularization, and retains only those with non-zero weight. [sent-523, score-0.273]

80 Figure 3 compares the precision and estimated number of correct facts inferred by the rules of each scoring function. [sent-524, score-0.794]

81 SHERLOCK has consistently higher precision, and finds twice as many correct facts at precision 0.8. [sent-525, score-0.416]

82 M-Estimate accepted eight times as many rules as SHERLOCK, increasing the number of inferred facts at the cost of precision and longer inference times. [sent-527, score-0.825]

83 Scoring Function Design Decisions: SHERLOCK requires a rule to have statistical relevance and statistical significance. [sent-530, score-0.188]

84 Figure 4 compares the precision and estimated number of correct facts obtained when requiring rules to be only statistically relevant, only statistically significant, or both. [sent-532, score-0.688]

85 Statistical significance finds twice as many correct facts as SHERLOCK, but the extra facts it discovers have precision < 0. [sent-536, score-0.8]

86 Figure 4 (Design Decisions of SHERLOCK’s Scoring Function, Estimated Number of Correct Facts): By requiring rules to have both statistical relevance and statistical significance, SHERLOCK rejects many error-prone rules that are accepted by the metrics individually. [sent-538, score-0.625]

87 The better rule set yields more accurate inferences, but identifies fewer correct facts. [sent-539, score-0.16]

88 Comparing the rules accepted in each case, we found that statistical relevance and statistical significance each accepted about 180,000 rules, compared to about 31,000 for SHERLOCK. [sent-540, score-0.424]

89 The smaller set of rules accepted by SHERLOCK not only leads to higher precision inferences, but also speeds up inference time by a factor of seven. [sent-541, score-0.426]

90 The statistical significance metric handles sparse rules better, but is still overconfident in the case of many unsound rules. [sent-543, score-0.313]

91 The learned-rule weights only affect the probabilities of the inferred facts, not the inferred facts themselves, so to measure the influence of the weight learning algorithm we examine the recall at precision 0.8. [sent-548, score-0.522]

92 We build a test set by holding SHERLOCK’s inference rules constant and randomly sampling 700 inferred facts. [sent-550, score-0.392]

93 Variable Penalty - Do we use the same L2 penalty on the weights for all rules or a stronger L2 penalty for longer rules? [sent-552, score-0.292]
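
The exact penalty schedule is not given in this summary; the sketch below shows one plausible instantiation, in which the L2 coefficient grows with body length so that longer rules need more evidence to earn large weights. The scaling scheme and constants are illustrative assumptions, not the paper's formulation.

```python
def l2_penalty(weights, body_lengths, base_lambda=0.1, length_factor=2.0):
    """L2 regulariser whose strength grows with rule length.

    The scaling base_lambda * length_factor ** (length - 1) is an illustrative
    assumption: a length-2 rule pays twice the penalty of a length-1 rule for
    the same weight.
    """
    return sum(base_lambda * length_factor ** (l - 1) * w * w
               for w, l in zip(weights, body_lengths))

print(l2_penalty([1.0, 1.0], [1, 2]))  # 0.1 + 0.2 = 0.3
```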

94-95 Most of the gains come from penalizing longer rules more, but using weighted grounding counts further improves recall by 0.07, which corresponds to almost 100,000 additional facts at precision 0.8. [sent-560, score-0.281] [sent-561, score-0.388]

96 While this may not seem like much, the scale is such that it leads to almost 100,000 additional correct facts at precision 0.8. [sent-571, score-0.416]

97 Conclusion: This paper addressed the problem of learning first-order Horn clauses from the noisy and heterogeneous corpus of open-domain facts extracted from Web text. [sent-573, score-0.502]

98 Furthermore, the learned rules are valuable, because they infer a substantial number of facts which were not extracted from the corpus. [sent-575, score-0.711]

99 Third, it utilizes a novel rule-scoring function, which is tolerant of the noise, ambiguity, and missing data issues prevalent in facts extracted from Web text. [sent-579, score-0.379]

100 Directions for future work include inducing longer inference rules, investigating better methods for combining the rules, allowing deeper inferences across multiple rules, evaluating our system on other corpora and devising better techniques for handling word sense ambiguity. [sent-581, score-0.191]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sherlock', 0.666), ('facts', 0.349), ('rules', 0.246), ('horn', 0.213), ('ilp', 0.167), ('holmes', 0.16), ('head', 0.102), ('rule', 0.1), ('inference', 0.096), ('inferences', 0.095), ('relevance', 0.088), ('body', 0.087), ('dzeroski', 0.08), ('clauses', 0.077), ('extractions', 0.076), ('logic', 0.068), ('unsound', 0.067), ('relations', 0.062), ('web', 0.059), ('groundings', 0.057), ('scoring', 0.056), ('dirt', 0.053), ('lime', 0.053), ('salmon', 0.053), ('entailment', 0.051), ('typed', 0.05), ('classes', 0.05), ('inferred', 0.05), ('noise', 0.048), ('noisy', 0.046), ('huynh', 0.046), ('textrunner', 0.046), ('schoenmackers', 0.046), ('ground', 0.045), ('accepted', 0.045), ('lp', 0.043), ('learned', 0.043), ('infer', 0.043), ('clause', 0.042), ('theories', 0.042), ('inductive', 0.041), ('auc', 0.04), ('basedin', 0.04), ('bratko', 0.04), ('mccreath', 0.04), ('muggleton', 0.04), ('opendomain', 0.04), ('propositional', 0.04), ('storm', 0.04), ('vitamin', 0.04), ('hearst', 0.04), ('precision', 0.039), ('relation', 0.039), ('grounding', 0.035), ('discovers', 0.035), ('negative', 0.035), ('textual', 0.035), ('weight', 0.034), ('selectional', 0.033), ('identifies', 0.032), ('modifications', 0.032), ('vi', 0.031), ('answer', 0.031), ('entailments', 0.031), ('pantel', 0.03), ('extracted', 0.03), ('washington', 0.029), ('domingos', 0.028), ('correct', 0.028), ('ambiguous', 0.028), ('extraction', 0.028), ('city', 0.027), ('instances', 0.027), ('ni', 0.027), ('acquires', 0.027), ('atmospheric', 0.027), ('bodyijk', 0.027), ('classi', 0.027), ('craven', 0.027), ('domainindependent', 0.027), ('foods', 0.027), ('lavrac', 0.027), ('pressure', 0.027), ('quinlan', 0.027), ('retains', 0.027), ('shinyama', 0.027), ('storms', 0.027), ('suffolk', 0.027), ('wellman', 0.027), ('pmi', 0.027), ('estimated', 0.026), ('class', 0.026), ('incorrect', 0.025), ('richardson', 0.025), ('incomplete', 0.025), ('engine', 0.024), ('rte', 0.024), ('examples', 0.024), ('york', 0.023), ('penalty', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

Author: Stefan Schoenmackers ; Jesse Davis ; Oren Etzioni ; Daniel Weld

Abstract: Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text.

2 0.12442375 28 emnlp-2010-Collective Cross-Document Relation Extraction Without Labelled Data

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an indomain (Wikipedia) and a more realistic outof-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% higher precision than an isolated local approach, but has no advantage over a pipeline. For the out-of-domain data, we benefit strongly from joint modelling, and observe improvements in precision of 13% over the pipeline, and 15% over the isolated baseline.

3 0.10914794 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

Author: Adria de Gispert ; Juan Pino ; William Byrne

Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignmentmodel. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteri- ors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-totarget and target-to-source alignment models is to build two separate systems and combine their output translation lattices.

4 0.09004157 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

Author: Tahira Naseem ; Harr Chen ; Regina Barzilay ; Mark Johnson

Abstract: We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms state-of-theart unsupervised methods by a significant margin.1

5 0.074913979 51 emnlp-2010-Function-Based Question Classification for General QA

Author: Fan Bu ; Xingwei Zhu ; Yu Hao ; Xiaoyan Zhu

Abstract: In contrast with the booming increase of internet data, state-of-art QA (question answering) systems, otherwise, concerned data from specific domains or resources such as search engine snippets, online forums and Wikipedia in a somewhat isolated way. Users may welcome a more general QA system for its capability to answer questions of various sources, integrated from existed specialized sub-QA engines. In this framework, question classification is the primary task. However, the current paradigms of question classification were focused on some specified type of questions, i.e. factoid questions, which are inappropriate for the general QA. In this paper, we propose a new question classification paradigm, which includes a question taxonomy suitable to the general QA and a question classifier based on MLN (Markov logic network), where rule-based methods and statistical methods are unified into a single framework in a fuzzy discriminative learning approach. Experiments show that our method outperforms traditional question classification approaches.

6 0.074831799 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

7 0.066617772 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

8 0.061672106 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

9 0.058867339 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

10 0.058132846 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

11 0.057216603 59 emnlp-2010-Identifying Functional Relations in Web Text

12 0.055784091 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

13 0.054422691 12 emnlp-2010-A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web

14 0.053354938 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

15 0.05206608 112 emnlp-2010-Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

16 0.048944596 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

17 0.046780724 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

18 0.046301313 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

19 0.04525036 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation

20 0.044891991 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.169), (1, 0.044), (2, 0.05), (3, 0.111), (4, 0.127), (5, -0.105), (6, -0.038), (7, 0.07), (8, -0.003), (9, -0.047), (10, -0.056), (11, -0.1), (12, -0.024), (13, -0.154), (14, -0.018), (15, 0.082), (16, -0.113), (17, -0.146), (18, -0.054), (19, 0.102), (20, 0.015), (21, -0.221), (22, 0.009), (23, 0.054), (24, -0.001), (25, 0.048), (26, 0.073), (27, 0.031), (28, 0.03), (29, 0.085), (30, 0.199), (31, 0.032), (32, 0.141), (33, -0.064), (34, 0.026), (35, -0.099), (36, -0.058), (37, 0.184), (38, 0.168), (39, 0.023), (40, -0.158), (41, -0.0), (42, 0.095), (43, 0.04), (44, 0.023), (45, -0.32), (46, 0.134), (47, 0.001), (48, -0.142), (49, 0.075)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96006125 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

Author: Stefan Schoenmackers ; Jesse Davis ; Oren Etzioni ; Daniel Weld

Abstract: input. Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text.

2 0.53574967 28 emnlp-2010-Collective Cross-Document Relation Extraction Without Labelled Data

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an indomain (Wikipedia) and a more realistic outof-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% higher precision than an isolated local approach, but has no advantage over a pipeline. For the out-of-domain data, we benefit strongly from joint modelling, and observe improvements in precision of 13% over the pipeline, and 15% over the isolated baseline.

3 0.42529133 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

Author: Kristian Woodsend ; Yansong Feng ; Mirella Lapata

Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.

4 0.40894666 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

Author: Tahira Naseem ; Harr Chen ; Regina Barzilay ; Mark Johnson

Abstract: We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms state-of-theart unsupervised methods by a significant margin.1

5 0.33721647 59 emnlp-2010-Identifying Functional Relations in Web Text

Author: Thomas Lin ; Mausam ; Oren Etzioni

Abstract: Determining whether a textual phrase denotes a functional relation (i.e., a relation that maps each domain element to a unique range element) is useful for numerous NLP tasks such as synonym resolution and contradiction detection. Previous work on this problem has relied on either counting methods or lexico-syntactic patterns. However, determining whether a relation is functional, by analyzing mentions of the relation in a corpus, is challenging due to ambiguity, synonymy, anaphora, and other linguistic phenomena. We present the LEIBNIZ system that overcomes these challenges by exploiting the synergy between the Web corpus and freelyavailable knowledge resources such as Freebase. It first computes multiple typedfunctionality scores, representing functionality of the relation phrase when its arguments are constrained to specific types. It then aggregates these scores to predict the global functionality for the phrase. LEIBNIZ outperforms previous work, increasing area under the precisionrecall curve from 0.61 to 0.88. We utilize LEIBNIZ to generate the first public repository of automatically-identified functional relations.

6 0.31319207 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

7 0.28849846 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

8 0.27313548 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

9 0.26995218 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

10 0.24139018 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

11 0.23952645 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

12 0.23804063 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

13 0.23243497 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

14 0.2064105 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

15 0.2024492 55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

16 0.20013967 112 emnlp-2010-Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

17 0.19946656 123 emnlp-2010-Word-Based Dialect Identification with Georeferenced Rules

18 0.18852371 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

19 0.18277436 51 emnlp-2010-Function-Based Question Classification for General QA

20 0.1778121 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.026), (10, 0.045), (12, 0.035), (29, 0.086), (30, 0.037), (52, 0.025), (56, 0.053), (62, 0.368), (66, 0.084), (72, 0.045), (76, 0.021), (77, 0.016), (82, 0.018), (88, 0.014), (89, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91985035 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

Author: Alexander M Rush ; David Sontag ; Michael Collins ; Tommi Jaakkola

Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.

2 0.81790513 90 emnlp-2010-Positional Language Models for Clinical Information Retrieval

Author: Florian Boudin ; Jian-Yun Nie ; Martin Dawes

Abstract: The PECO framework is a knowledge representation for formulating clinical questions. Queries are decomposed into four aspects, which are Patient-Problem (P), Exposure (E), Comparison (C) and Outcome (O). However, no test collection is available to evaluate such framework in information retrieval. In this work, we first present the construction of a large test collection extracted from systematic literature reviews. We then describe an analysis of the distribution of PECO elements throughout the relevant documents and propose a language modeling approach that uses these distributions as a weighting strategy. In our experiments carried out on a collection of 1.5 million documents and 423 queries, our method was found to lead to an improvement of 28% in MAP and 50% in P@5, as com- pared to the state-of-the-art method.

same-paper 3 0.79951262 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

Author: Stefan Schoenmackers ; Jesse Davis ; Oren Etzioni ; Daniel Weld

Abstract: input. Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text.

4 0.71566558 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

Author: Terry Koo ; Alexander M. Rush ; Michael Collins ; Tommi Jaakkola ; David Sontag

Abstract: This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.

5 0.53406352 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

Author: Andre Martins ; Noah Smith ; Eric Xing ; Pedro Aguiar ; Mario Figueiredo

Abstract: We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. (2009). By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. Experiments show state-of-the-art performance for 14 languages.

6 0.46610376 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

7 0.45886442 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

8 0.45716393 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

9 0.45366472 55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

10 0.44819772 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

11 0.44518876 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

12 0.44115403 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

13 0.43195406 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

14 0.42077097 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

15 0.4184778 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

16 0.41665268 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

17 0.41398057 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

18 0.41032282 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

19 0.41030231 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

20 0.40945461 51 emnlp-2010-Function-Based Question Classification for General QA