acl acl2012 acl2012-78 knowledge-graph by maker-knowledge-mining

78 acl-2012-Efficient Search for Transformation-based Inference


Source: pdf

Author: Asher Stern ; Roni Stern ; Ido Dagan ; Ariel Felner

Abstract: This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. A prominent approach to this task attempts to transform one text into the other through a sequence of inference-preserving transformations, a.k.a. a proof, while estimating the proof’s validity. This raises a search challenge of finding the best possible proof. We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a local-lookahead node expansion method. Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. [sent-9, score-0.371]

2 This raises a search challenge of finding the best possible proof. [sent-14, score-0.338]

3 We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a local-lookahead node expansion method. [sent-15, score-0.79]

4 Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality. [sent-16, score-0.609]

5 An appealing approach to such textual inferences is to explicitly transform T into H, using a sequence of transformations (Bar-Haim et al. [sent-23, score-0.584]

6 Examples of such possible transformations are lexical substitutions (e.g., …). [sent-25, score-0.429]

7 The rationale behind this approach is that each transformation step should preserve inference validity, such that each text generated along this process is indeed inferred from the preceding one. [sent-34, score-0.263]

8 This creates a very large search space, where systems have to find the “best” transformation sequence, i.e., the one of lowest cost or of highest probability. [sent-38, score-0.448]

9 To the best of our knowledge, this search challenge has not been investigated yet in a substantial manner. [sent-39, score-0.295]

10 The knowledge required for such transformations is often obtained from available knowledge resources and NLP tools. [sent-46, score-0.393]

11 Each of the above-cited works described the search method they used, but none of them tried alternative methods while evaluating search performance. [sent-47, score-0.564]

12 Furthermore, while experimenting with our own open-source inference system, BIUTEE, we observed that search efficiency is a major issue, often yielding practically unsatisfactory run-times. [sent-48, score-0.478]

13 This paper investigates the search problem in transformation-based textual inference, naturally falling within the framework of heuristic AI (Artificial Intelligence) search. [sent-49, score-0.439]

14 To facilitate such investigation, we formulate a generic search scheme which incorporates many search variants as special cases and enables a meaningful comparison between the algorithms. [sent-50, score-0.559]

15 Under this framework, we identify special characteristics of the textual inference search space that lead to the development of two novel algorithmic components: a special lookahead method for node expansion, named local lookahead, and a gradient-based evaluation function. [sent-51, score-0.754]

16 Together, they yield a new search algorithm, which achieved substantially superior search performance in our evaluations. [sent-52, score-0.518]

17 Section 2 provides an overview of transformation-based inference systems, AI search algorithms, and search methods realized in prior inference systems. [sent-54, score-0.736]

18 Section 3 formulates the generic search scheme that we have investigated, which covers a broad range of known algorithms, and presents our own algorithmic contributions. [sent-55, score-0.374]

19 In Section 4 we evaluate them empirically, and show that they improve search efficiency as well as solution quality. [sent-57, score-0.335]

20 Background: Applying sequences of transformations to recognize textual inference was suggested by several works. [sent-63, score-0.673]

21 We focus on methods that perform transformations over parse trees, and highlight the search challenge with which they are faced. [sent-67, score-0.688]

22 Transformation-based textual inference: Several researchers suggested using various types of transformations in order to derive H from T. [sent-69, score-0.673]

23 Since the above-mentioned transformations are limited in capturing certain interesting and prevalent semantic phenomena, an extended set of tree edit operations (e.g., …). [sent-72, score-0.484]

24 Their transformations are defined by knowledge resources that contain a large number of entailment rules, or rewrite rules, which are pairs of parse-tree fragments that entail one another. [sent-80, score-0.499]

25 This limitation is dealt with by our open-source integrated framework, BIUTEE (Stern and Dagan, 2011), which incorporates knowledge-based transformations (entailment rules) with a set of predefined tree-edits. [sent-85, score-0.428]

26 The semantic validity of transformation-based inference is usually modeled by defining a cost or a probability estimate for each transformation. [sent-87, score-0.315]

27 A global cost (or probability estimation) for a complete sequence of transformations is typically defined as the sum of the costs of the involved transformations. [sent-89, score-0.672]
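
As a concrete illustration of this additive cost model, here is a minimal Python sketch; the Transformation class and its fields are hypothetical names introduced for the example, not taken from BIUTEE.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transformation:
    name: str    # e.g., an entailment rule or a tree edit
    cost: float  # model-estimated cost of this single step

def proof_cost(proof: List[Transformation]) -> float:
    """Global cost of a proof: the sum of the costs of its transformations."""
    return sum(t.cost for t in proof)

# A lower total cost indicates a more plausible proof.
example = [Transformation("entailment rule", 0.4),
           Transformation("tree edit", 0.7)]
assert abs(proof_cost(example) - 1.1) < 1e-9
```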

28 Finding the lowest-cost proof, as needed for determining inference validity, is the focus of our research. [sent-90, score-0.266]

29 Nevertheless, for the extended set of transformations it is unlikely that efficient exact algorithms for finding lowest-cost sequences are available (Heilman and Smith, 2010). [sent-92, score-0.516]

30 Each state in the search space is a parse-tree, where the initial state is the text parse-tree, the goal state is the hypothesis parse-tree, and we search for the shortest (in terms of costs) path of transformations from the initial state to the goal state. [sent-94, score-1.661]

31 Search Algorithms: Search algorithms find a path from an initial state to a goal state by expanding and generating states in a search space. [sent-97, score-0.867]

32 The term generating a state refers to creating a data structure that represents it, while expanding a state means generating all its immediate derivations. [sent-98, score-0.317]

33 A best-first search algorithm iteratively removes the best state (according to f(s)) from OPEN, and inserts the new states generated by expanding it. [sent-103, score-0.597]

34 The evaluation function is usually a linear combination of the cost of the shortest path found from the start state to state s, denoted by g(s), and a heuristic function, denoted by h(s), which estimates the cost of reaching a goal state from s. [sent-104, score-0.861]

35 Many search algorithms can be viewed as special cases or variations of best-first search. [sent-105, score-0.339]

36 The A* algorithm is a best-first search that uses the evaluation function f(s) = g(s) + h(s). [sent-108, score-0.353]

37 Beam search (Furcy and Koenig, 2005; Zhou and Hansen, 2005) limits the number of states stored in OPEN, while Greedy search limits OPEN to contain only the single best state generated in the current iteration. [sent-112, score-0.778]
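
To make the relationship between these variants concrete, here is a hedged Python sketch of a best-first loop over an OPEN list ordered by f(s) = g(s) + h(s); the State interface is an assumption for illustration. Keeping OPEN unbounded gives A*-style behavior, truncating it to k entries gives beam search, and k = 1 gives greedy search.

```python
import heapq
from typing import Callable, Iterable, List, Optional, Tuple

class State:
    """Hypothetical state interface: a parse tree with a g-cost."""
    def __init__(self, g: float, data=None):
        self.g = g            # cost of the best-known path from the start
        self.data = data
    def successors(self) -> Iterable["State"]:
        return []             # stub; a real system derives new parse trees

def best_first(start: State,
               h: Callable[[State], float],
               is_goal: Callable[[State], bool],
               beam_width: Optional[int] = None) -> Optional[State]:
    """Best-first search with f(s) = g(s) + h(s).
    beam_width=None -> A*-style; beam_width=k -> beam search; k=1 -> greedy."""
    counter = 0                                    # tie-breaker for the heap
    open_list: List[Tuple[float, int, State]] = [(start.g + h(start), 0, start)]
    while open_list:
        _, _, s = heapq.heappop(open_list)         # best state leaves OPEN
        if is_goal(s):
            return s
        for succ in s.successors():                # expand: immediate derivations
            counter += 1
            heapq.heappush(open_list, (succ.g + h(succ), counter, succ))
        if beam_width is not None:                 # beam/greedy: cap OPEN's size
            open_list = heapq.nsmallest(beam_width, open_list)
            heapq.heapify(open_list)
    return None
```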

38 The search algorithm has a crucial impact on the quality of the proof found by a textual inference system, as well as on its efficiency. [sent-113, score-0.793]

39 Next, we describe search strategies used in prior works for textual inference. [sent-114, score-0.371]

40 Search in prior inference models: In spite of being a fundamental problem, prior solutions to the search challenge in textual inference were mostly ad hoc. [sent-116, score-0.625]

41 Furthermore, there was no investigation of alternative search methods, and no evaluation of search efficiency and quality was reported. [sent-117, score-0.689]

42 For example, in (Harmeling, 2009) the order in which the transformations are performed is predetermined, and in addition many possible derivations are discarded, to prevent exponential explosion. [sent-118, score-0.393]

43 The search problem in (Heilman and Smith, 2010) was handled by a variant of greedy search, driven by a similarity measure between the current parse-tree and the hypothesis, while ignoring the cost already paid. [sent-119, score-0.517]

44 In the earlier version of BIUTEE (Stern and Dagan, 2011), a version of beam search was incorporated, named hereafter BIUTEE-orig. [sent-121, score-0.306]

45 In this paper we consider several prominent search algorithms and evaluate their quality. [sent-124, score-0.383]

46 In addition to evaluating standard search algorithms, we propose two novel components specifically designed for proof-based textual inference and evaluate their contribution. [sent-126, score-0.415]

47 Search for Textual Inference: In this section we formalize our search problem and specify a unifying search scheme by which we test several search algorithms in a systematic manner. [sent-127, score-0.898]

48 We conclude by presenting our new search algorithm which combines these two ideas. [sent-129, score-0.298]

49 The proof is valid if all the transformations involved are valid, and invalid otherwise. [sent-149, score-0.667]

50 Denoting by tT and tH the text parse tree and the hypothesis parse tree, a proof system has to find a sequence O with minimal cost such that tT ⊢O tH. [sent-157, score-0.527]

51 This forms a search problem of finding the lowest-cost proof among all possible proofs. [sent-158, score-0.576]
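
Combining sentences 27 and 50-51, the search problem can be stated compactly as follows (a LaTeX reconstruction from this summary, not a verbatim quote of the paper):

```latex
O^{*} \;=\; \operatorname*{arg\,min}_{O \,:\, t_T \,\vdash_O\, t_H} \mathrm{cost}(O),
\qquad
\mathrm{cost}(O) \;=\; \sum_{i=1}^{m} \mathrm{cost}(o_i)
```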

52 Search Scheme: Our empirical investigation compares a range of prominent search algorithms, described in Section 2. [sent-177, score-0.352]

53 In classical search algorithms, expand(s) means generating a set of states by applying all the possible state transition operators to s. [sent-184, score-0.569]

54 Table 2 specifies how known search algorithms, described in Section 2, fit into the unified search scheme. [sent-188, score-0.518]

55 To improve the quality of the proof found by greedy search, we introduce new algorithmic components for the expansion and evaluation functions, as described in the next two subsections, while maintaining efficiency by keeping kmaintain = kexpand = 1. [sent-190, score-0.621]
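
In the same spirit as the best-first sketch above, the unified scheme can be read as a two-knob loop: kexpand states are expanded per iteration and only the kmaintain best states are kept in OPEN. The sketch below is an assumed rendering of that scheme (the exact instantiations of known algorithms are given in the paper's Table 2); expand, f, and is_goal are caller-supplied.

```python
def unified_search(init, expand, f, is_goal, k_expand=1, k_maintain=1):
    """Generic search scheme: per iteration, expand the k_expand best states
    in OPEN (ordered by f) and keep only the k_maintain best states.
    k_expand = k_maintain = 1 recovers greedy search."""
    open_list = [init]
    while open_list:
        open_list.sort(key=f)                 # order OPEN by the evaluation f
        to_expand = open_list[:k_expand]      # the k_expand best states
        for s in to_expand:
            if is_goal(s):
                return s
        generated = [s2 for s in to_expand for s2 in expand(s)]
        # keep only the k_maintain best states for the next iteration
        open_list = sorted(open_list[k_expand:] + generated, key=f)[:k_maintain]
    return None
```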

56 Evaluation function: In most domains, the heuristic function h(s) estimates the cost of the minimal-cost path from a current state, s, to a goal state. [sent-191, score-0.425]

57 Having such a function, the value g(s) + h(s) estimates the expected total cost of a search path containing s. [sent-192, score-0.463]

58 However, this does not give an estimate of the expected cost of the path (the sequence of transformations) from s to the goal state. [sent-196, score-0.282]

59 This is because the number of nodes and edges that can be changed by a single transformation can vary from a single node to several nodes (e.g., …). [sent-197, score-0.294]

60 Moreover, even if two transformations change the same number of nodes and edges, their costs might be significantly different. [sent-200, score-0.55]

61 Nevertheless, this heuristic function, serving as h(s), is still unrelated to the transformation costs (g(s)). [sent-203, score-0.309]

62 Our function is designed for a greedy search in which OPEN always contains a single state, s. [sent-205, score-0.415]

63 Let sj be a state generated from s; the cost of deriving sj from s is ∆g(sj) ≡ g(sj) − g(s). [sent-206, score-0.498]
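
The summary truncates the full definition of f∆, so the sketch below fills the gap with one natural reading of a "gradient-style" function: score each candidate by the cost it adds per unit of heuristic progress, ∆g(sj) / (h(s) − h(sj)). Treat this ratio as an illustrative assumption, not the paper's exact formula.

```python
def gradient_select(s, candidates, g, h, eps=1e-9):
    """Gradient-style greedy selection with a single-state OPEN.
    Assumed form (the summary only defines dg): cost added per unit of
    heuristic progress, dg(sj) / dh(sj), with dh(sj) = h(s) - h(sj).
    Lower is better: cheap steps that make large progress win."""
    scored = []
    for sj in candidates:
        dg = g(sj) - g(s)        # cost of deriving sj from s (as in the text)
        dh = h(s) - h(sj)        # heuristic progress toward the goal
        score = float("inf") if dh <= 0 else dg / max(dh, eps)
        scored.append((score, sj))
    return min(scored, key=lambda p: p[0])[1]  # greedy: pick the best child
```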

64 Node expansion method: When examining the proofs produced by the above-mentioned algorithms, we observed that in many cases a human could construct proofs that exhibit some internal structure but were not found by the algorithms. [sent-216, score-0.264]

65 It can be seen that transformations 2, 3 and 4 strongly depend on each other. [sent-218, score-0.393]

66 Applying transformation 3 requires first applying transformation 2, and similarly 4 cannot be applied unless 2 and 3 are applied first. [sent-219, score-0.358]

67 Moreover, there is no gain in applying transformations 2 and 3, unless transformation 4 is applied as well. [sent-220, score-0.597]

68 It may be performed at any point along the proof, and moreover, changing all other transformations would not affect it. [sent-222, score-0.393]

69 Often, a sequence of transformations can be decomposed into a set of coherent subsequences of transformations, where in each subsequence the transformations strongly depend on each other, while different subsequences are independent. [sent-224, score-1.325]

70 One technique for finding such subsequences is to perform, for each state being expanded, a brute-force depth-limited search, also known as lookahead (Russell and Norvig, 2010; Bulitko and Lustrek, 2006; Korf, 1990; Stern et al. [sent-228, score-0.543]

71 Fortunately, in our domain, coherent subsequences have the following characteristic which can be leveraged: typically, a transformation depends on a previous one only if it is performed over some nodes which were affected by the previous transformation. [sent-231, score-0.531]

72 Accordingly, our proposed algorithm searches for coherent subsequences, in which each subsequent transformation must be applied to nodes that were affected by the previous transformation. [sent-232, score-0.369]

73 Next, for a transformation o, applied on a parse tree t, we define σrequired(t, o) as the subset of t’s nodes required for applying o (i.e., …). [sent-235, score-0.323]

74 enabled_ops(t, σ) is a function that returns the set of transformations that can be applied to t and that require at least one of the nodes in σ. [sent-239, score-0.571]

75 In our algorithm, σ is the set of nodes that were affected by the preceding transformation of the constructed subsequence. [sent-241, score-0.286]
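
A hedged Python sketch of this expansion method follows; all_ops, apply, affected_nodes, and required_nodes are hypothetical interfaces standing in for σrequired and enabled_ops, and the real BIUTEE implementation surely differs in detail.

```python
def local_lookahead(t, all_ops, apply, affected_nodes, required_nodes,
                    max_depth=3):
    """Generate coherent transformation subsequences starting from tree t.
    A subsequence is extended only by operations that require at least one
    node affected by the preceding transformation (cf. enabled_ops(t, sigma))."""
    results = []

    def extend(tree, sigma, seq, depth):
        if seq:
            results.append((tree, list(seq)))   # each prefix is a candidate
        if depth == max_depth:                  # depth-limited, as in lookahead
            return
        # enabled_ops(tree, sigma): ops requiring at least one node in sigma;
        # for the first step (sigma is None), every operation is enabled.
        enabled = [o for o in all_ops(tree)
                   if sigma is None or required_nodes(tree, o) & sigma]
        for o in enabled:
            extend(apply(tree, o), affected_nodes(tree, o), seq + [o], depth + 1)

    extend(t, None, [], 0)
    return results
```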

76 Local-lookahead gradient search: We are now ready to define our new algorithm, LOCAL-LOOKAHEAD GRADIENT SEARCH (LLGS). [sent-251, score-0.298]

77 expand(s) is defined to return all states generated by subsequences found by the local-lookahead procedure, while the evaluation function is defined as f = f∆ (see last row of Table 2). [sent-253, score-0.377]
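
Under the same assumed interfaces as the sketches above, LLGS is then just the composition of the two components inside the unified scheme with kexpand = kmaintain = 1 (a sketch, not the paper's code):

```python
# LLGS = greedy unified search + local-lookahead expansion + gradient-style f.
# expand_ll(s) should return the states reached by the coherent subsequences
# that local_lookahead generates from s; f_grad scores each generated state.
def llgs(init, expand_ll, f_grad, is_goal):
    return unified_search(init, expand_ll, f_grad, is_goal,
                          k_expand=1, k_maintain=1)
```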

78 Evaluation: In this section we first evaluate search performance in terms of efficiency (run time), the quality of the found proofs (as measured by proof cost), and the overall inference performance achieved by various search algorithms. [sent-254, score-1.079]

79 The latter is a strong baseline, which outperforms known search algorithms in generating low-cost proofs. [sent-269, score-0.496]

80 For beam search we used k = 150, since higher values of k did not improve the proof cost on the training dataset. [sent-274, score-0.306]

82 d = 4 yielded the same proof costs, but was about 3 times slower. [sent-282, score-0.274]

83 The idea is to guide the search for lower-cost solutions. [sent-286, score-0.416]

84 Search performance: This experiment evaluates the search algorithms in both efficiency (run-time) and proof quality. [sent-289, score-0.689]

85 (…5 GHz) run-time (in seconds) for finding a complete proof for a text-hypothesis instance, and by the average number of generated states along the search. [sent-291, score-0.438]

86 Thus, in the training phase we used the original search of BIUTEE, and then ran the test phase with each algorithm separately. [sent-294, score-0.298]

87 The results, presented in Table 3, show that our novel algorithm, LLGS, outperforms all other algorithms in finding lower-cost proofs. [sent-295, score-0.32]

88 While inherently fast algorithms, particularly greedy and pure heuristic, run faster, they yield lower proof quality as well as lower overall inference performance (see next subsection). [sent-297, score-0.52]

89 Overall inference performance: In this experiment we test whether, and by how much, finding better proofs by a better search algorithm improves the overall success rate of the RTE system. [sent-299, score-0.411]

90 Another alternative is to use the standard lookahead, in which a brute-force depth-limited search is performed in each iteration, termed here “exhaustive lookahead”. [sent-327, score-0.305]

91 The results, presented in Table 6, show that by avoiding any type of lookahead one can achieve fast runtime, while compromising proof quality. [sent-328, score-0.434]

92 On the other hand, both exhaustive and local lookahead yield better proofs and accuracy, while local lookahead is more than 4 times faster than exhaustive lookahead. [sent-329, score-0.496]

93 Conclusion: In this paper we investigated the efficiency and proof quality obtained by various search algorithms. [sent-337, score-0.609]

94 Consequently, we observed special phenomena of the search space in textual inference and proposed two novel components, yielding a new search algorithm targeted at our domain. [sent-338, score-0.849]

95 We have shown empirically that (1) this algorithm improves run time by factors of 3-8 relative to BIUTEE-orig, and by similar factors relative to standard AI-search algorithms that achieve similar proof quality; and (2) it outperforms all other algorithms in finding low-cost proofs. [sent-339, score-0.673]

96 … e.g., Monte-Carlo style approaches (Kocsis and Szepesvári, 2006), which do not fall under the AI search scheme covered in this paper. [sent-342, score-0.3]

97 In addition, while our novel components were motivated by the search space of textual inference, we foresee their potential utility in other application areas for search, such as automated planning and scheduling. [sent-343, score-0.484]

98 Addressing discourse and document structure in the RTE search task. [sent-446, score-0.323]

99 Heuristic search viewed as path finding in a graph. [sent-450, score-0.349]

100 Simultaneously searching with multiple settings: An alternative to parameter tuning for suboptimal single-agent search algorithms. [sent-471, score-0.305]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('transformations', 0.393), ('proof', 0.274), ('search', 0.259), ('subsequences', 0.201), ('llgs', 0.187), ('biutee', 0.181), ('stern', 0.166), ('lookahead', 0.16), ('cost', 0.157), ('transformation', 0.154), ('state', 0.139), ('dagan', 0.139), ('states', 0.121), ('textual', 0.112), ('inference', 0.109), ('heilman', 0.108), ('entailment', 0.106), ('kexpand', 0.104), ('proofs', 0.102), ('greedy', 0.101), ('sj', 0.101), ('costs', 0.087), ('felner', 0.083), ('kmaintain', 0.083), ('ops', 0.083), ('ido', 0.081), ('algorithms', 0.08), ('efficiency', 0.076), ('algorithmic', 0.074), ('harmeling', 0.073), ('expanded', 0.071), ('nodes', 0.07), ('heuristic', 0.068), ('open', 0.066), ('rte', 0.064), ('hregular', 0.062), ('affected', 0.062), ('hypothesis', 0.061), ('expansion', 0.06), ('suggested', 0.059), ('subsequence', 0.058), ('function', 0.055), ('asher', 0.054), ('enabled', 0.053), ('tt', 0.052), ('expand', 0.051), ('runtime', 0.051), ('ariel', 0.05), ('roni', 0.05), ('applying', 0.05), ('investigation', 0.049), ('validity', 0.049), ('tree', 0.049), ('beam', 0.047), ('path', 0.047), ('dirt', 0.046), ('idan', 0.046), ('alternative', 0.046), ('ai', 0.045), ('transform', 0.044), ('coherent', 0.044), ('branching', 0.044), ('szpektor', 0.044), ('prominent', 0.044), ('goal', 0.043), ('finding', 0.043), ('smith', 0.042), ('edit', 0.042), ('th', 0.042), ('mehdad', 0.042), ('botea', 0.042), ('bulitko', 0.042), ('dovetailing', 0.042), ('furcy', 0.042), ('hregulark', 0.042), ('kocsis', 0.042), ('korf', 0.042), ('kotlerman', 0.042), ('sinit', 0.042), ('szepesv', 0.042), ('valenzano', 0.042), ('scheme', 0.041), ('coreference', 0.04), ('novel', 0.04), ('expanding', 0.039), ('algorithm', 0.039), ('denoted', 0.037), ('planning', 0.037), ('exhaustive', 0.037), ('hart', 0.036), ('mirkin', 0.036), ('challenge', 0.036), ('components', 0.036), ('pure', 0.036), ('substitutions', 0.036), ('predefined', 0.035), ('sequence', 0.035), ('yielding', 0.034), ('braz', 0.033), ('recognising', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 78 acl-2012-Efficient Search for Transformation-based Inference

Author: Asher Stern ; Roni Stern ; Ido Dagan ; Ariel Felner

Abstract: This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. A prominent approach to this task attempts to transform one text into the other through a sequence of inference-preserving transformations, a.k.a. a proof, while estimating the proof’s validity. This raises a search challenge of finding the best possible proof. We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a local-lookahead node expansion method. Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality.

2 0.53826064 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

Author: Asher Stern ; Ido Dagan

Abstract: This paper introduces BIUTEE, an open-source system for recognizing textual entailment. Its main advantages are its ability to utilize various types of knowledge resources, and its extensibility by which new knowledge resources and inference components can be easily integrated. These abilities make BIUTEE an appealing RTE system for two research communities: (1) researchers of end applications, who can benefit from generic textual inference, and (2) RTE researchers, who can integrate their novel algorithms and knowledge resources into our system, saving the time and effort of developing a complete RTE system from scratch. Notable assistance for these researchers is provided by a visual tracing tool, by which researchers can refine and “debug” their knowledge resources and inference components.

3 0.13749561 65 acl-2012-Crowdsourcing Inference-Rule Evaluation

Author: Naomi Zeichner ; Jonathan Berant ; Ido Dagan

Abstract: The importance of inference rules to semantic applications has long been recognized and extensive work has been carried out to automatically acquire inference-rule resources. However, evaluating such resources has turned out to be a non-trivial task, slowing progress in the field. In this paper, we suggest a framework for evaluating inference-rule resources. Our framework simplifies a previously proposed “instance-based evaluation” method that involved substantial annotator training, making it suitable for crowdsourcing. We show that our method produces a large amount of annotations with high inter-annotator agreement for a low cost at a short period of time, without requiring training expert annotators.

4 0.12382383 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger

Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.

5 0.10854807 181 acl-2012-Spectral Learning of Latent-Variable PCFGs

Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar

Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.

6 0.10147615 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench

7 0.099260218 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents

8 0.098909408 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining

9 0.087825567 82 acl-2012-Entailment-based Text Exploration with Application to the Health-care Domain

10 0.083603427 137 acl-2012-Lemmatisation as a Tagging Task

11 0.082103014 184 acl-2012-String Re-writing Kernel

12 0.0776527 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

13 0.076131187 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction

14 0.073438518 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale

15 0.071463019 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

16 0.066459753 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation

17 0.062833384 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization

18 0.061413225 53 acl-2012-Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions

19 0.060955498 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

20 0.058246057 121 acl-2012-Iterative Viterbi A* Algorithm for K-Best Sequential Decoding


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.205), (1, 0.039), (2, -0.082), (3, 0.033), (4, -0.031), (5, 0.159), (6, -0.002), (7, 0.314), (8, 0.062), (9, 0.037), (10, -0.232), (11, 0.264), (12, 0.005), (13, -0.27), (14, 0.175), (15, -0.097), (16, 0.07), (17, 0.066), (18, 0.066), (19, -0.06), (20, -0.095), (21, -0.064), (22, 0.024), (23, 0.017), (24, 0.019), (25, -0.061), (26, 0.006), (27, 0.082), (28, 0.013), (29, 0.08), (30, 0.07), (31, -0.019), (32, -0.235), (33, -0.093), (34, 0.105), (35, 0.201), (36, 0.057), (37, 0.027), (38, 0.022), (39, -0.051), (40, -0.158), (41, 0.045), (42, -0.164), (43, 0.03), (44, 0.003), (45, 0.103), (46, -0.107), (47, -0.058), (48, -0.067), (49, -0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97120309 78 acl-2012-Efficient Search for Transformation-based Inference

Author: Asher Stern ; Roni Stern ; Ido Dagan ; Ariel Felner

Abstract: This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. A prominent approach to this task attempts to transform one text into the other through a sequence of inference-preserving transformations, a.k.a. a proof, while estimating the proof’s validity. This raises a search challenge of finding the best possible proof. We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a local-lookahead node expansion method. Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality.

2 0.94694245 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

Author: Asher Stern ; Ido Dagan

Abstract: This paper introduces BIUTEE, an open-source system for recognizing textual entailment. Its main advantages are its ability to utilize various types of knowledge resources, and its extensibility by which new knowledge resources and inference components can be easily integrated. These abilities make BIUTEE an appealing RTE system for two research communities: (1) researchers of end applications, who can benefit from generic textual inference, and (2) RTE researchers, who can integrate their novel algorithms and knowledge resources into our system, saving the time and effort of developing a complete RTE system from scratch. Notable assistance for these researchers is provided by a visual tracing tool, by which researchers can refine and “debug” their knowledge resources and inference components.

3 0.4413155 65 acl-2012-Crowdsourcing Inference-Rule Evaluation

Author: Naomi Zeichner ; Jonathan Berant ; Ido Dagan

Abstract: The importance of inference rules to semantic applications has long been recognized and extensive work has been carried out to automatically acquire inference-rule resources. However, evaluating such resources has turned out to be a non-trivial task, slowing progress in the field. In this paper, we suggest a framework for evaluating inference-rule resources. Our framework simplifies a previously proposed “instance-based evaluation” method that involved substantial annotator training, making it suitable for crowdsourcing. We show that our method produces a large amount of annotations with high inter-annotator agreement for a low cost at a short period of time, without requiring training expert annotators.

4 0.4243491 181 acl-2012-Spectral Learning of Latent-Variable PCFGs

Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar

Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.

5 0.41550559 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench

Author: Rafal Rak ; BalaKrishna Kolluru ; Sophia Ananiadou

Abstract: Argo is a web-based NLP and text mining workbench with a convenient graphical user interface for designing and executing processing workflows of various complexity. The workbench is intended for specialists and nontechnical audiences alike, and provides the ever expanding library of analytics compliant with the Unstructured Information Management Architecture, a widely adopted interoperability framework. We explore the flexibility of this framework by demonstrating workflows involving three processing components capable of performing self-contained machine learning-based tagging. The three components are responsible for the three distinct tasks of 1) generating observations or features, 2) training a statistical model based on the generated features, and 3) tagging unlabelled data with the model. The learning and tagging components are based on an implementation of conditional random fields (CRF); whereas the feature generation component is an analytic capable of extending basic token information to a comprehensive set of features. Users define the features of their choice directly from Argo’s graphical interface, without resorting to programming (a commonly used approach to feature engineering). The experimental results performed on two tagging tasks, chunking and named entity recognition, showed that a tagger with a generic set of features built in Argo is capable of competing with taskspecific solutions. 121

6 0.35677713 137 acl-2012-Lemmatisation as a Tagging Task

7 0.33861175 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents

8 0.32274982 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

9 0.31555891 82 acl-2012-Entailment-based Text Exploration with Application to the Health-care Domain

10 0.29241976 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining

11 0.28487384 129 acl-2012-Learning High-Level Planning from Text

12 0.27927515 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

13 0.27745202 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords

14 0.27418584 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs

15 0.26412359 184 acl-2012-String Re-writing Kernel

16 0.25882512 113 acl-2012-INPROwidth.3emiSS: A Component for Just-In-Time Incremental Speech Synthesis

17 0.25638878 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation

18 0.23971884 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition

19 0.22627603 112 acl-2012-Humor as Circuits in Semantic Networks

20 0.22350924 121 acl-2012-Iterative Viterbi A* Algorithm for K-Best Sequential Decoding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.019), (26, 0.026), (28, 0.025), (30, 0.035), (37, 0.02), (39, 0.036), (49, 0.044), (59, 0.017), (74, 0.029), (82, 0.011), (84, 0.022), (85, 0.028), (90, 0.086), (92, 0.488), (94, 0.016), (99, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95214319 78 acl-2012-Efficient Search for Transformation-based Inference

Author: Asher Stern ; Roni Stern ; Ido Dagan ; Ariel Felner

Abstract: This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. A prominent approach to this task attempts to transform one text into the other through a sequence of inference-preserving transformations, a.k.a. a proof, while estimating the proof’s validity. This raises a search challenge of finding the best possible proof. We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a local-lookahead node expansion method. Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality.

2 0.92927635 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks

Author: Tsung-Ting Kuo ; San-Chuan Hung ; Wei-Shih Lin ; Nanyun Peng ; Shou-De Lin ; Wei-Fen Lin

Abstract: This paper brings a marriage of two seemly unrelated topics, natural language processing (NLP) and social network analysis (SNA). We propose a new task in SNA which is to predict the diffusion of a new topic, and design a learning-based framework to solve this problem. We exploit the latent semantic information among users, topics, and social connections as features for prediction. Our framework is evaluated on real data collected from public domain. The experiments show 16% AUC improvement over baseline methods. The source code and dataset are available at http://www.csie.ntu.edu.tw/~d97944007/dif fusion/ 1 Background The diffusion of information on social networks has been studied for decades. Generally, the proposed strategies can be categorized into two categories, model-driven and data-driven. The model-driven strategies, such as independent cascade model (Kempe et al., 2003), rely on certain manually crafted, usually intuitive, models to fit the diffusion data without using diffusion history. The data-driven strategies usually utilize learning-based approaches to predict the future propagation given historical records of prediction (Fei et al., 2011; Galuba et al., 2010; Petrovic et al., 2011). Data-driven strategies usually perform better than model-driven approaches because the past diffusion behavior is used during learning (Galuba et al., 2010). Recently, researchers started to exploit content information in data-driven diffusion models (Fei et al., 2011; Petrovic et al., 2011; Zhu et al., 2011). 344 However, most of the data-driven approaches assume that in order to train a model and predict the future diffusion of a topic, it is required to obtain historical records about how this topic has propagated in a social network (Petrovic et al., 2011; Zhu et al., 2011). We argue that such assumption does not always hold in the real-world scenario, and being able to forecast the propagation of novel or unseen topics is more valuable in practice. For example, a company would like to know which users are more likely to be the source of ‘viva voce’ of a newly released product for advertising purpose. A political party might want to estimate the potential degree of responses of a half-baked policy before deciding to bring it up to public. To achieve such goal, it is required to predict the future propagation behavior of a topic even before any actual diffusion happens on this topic (i.e., no historical propagation data of this topic are available). Lin et al. also propose an idea aiming at predicting the inference of implicit diffusions for novel topics (Lin et al., 2011). The main difference between their work and ours is that they focus on implicit diffusions, whose data are usually not available. Consequently, they need to rely on a model-driven approach instead of a datadriven approach. On the other hand, our work focuses on the prediction of explicit diffusion behaviors. Despite the fact that no diffusion data of novel topics is available, we can still design a data- driven approach taking advantage of some explicit diffusion data of known topics. Our experiments show that being able to utilize such information is critical for diffusion prediction. 2 The Novel-Topic Diffusion Model We start by assuming an existing social network G = (V, E), where V is the set of nodes (or user) v, and E is the set of link e. 

3 0.9177509 154 acl-2012-Native Language Detection with Tree Substitution Grammars

Author: Benjamin Swanson ; Eugene Charniak

Abstract: We investigate the potential of Tree Substitution Grammars as a source of features for native language detection, the task of inferring an author’s native language from text in a different language. We compare two state of the art methods for Tree Substitution Grammar induction and show that features from both methods outperform previous state of the art results at native language detection. Furthermore, we contrast these two induction algorithms and show that the Bayesian approach produces superior classification results with a smaller feature set.

4 0.90441579 205 acl-2012-Tweet Recommendation with Graph Co-Ranking

Author: Rui Yan ; Mirella Lapata ; Xiaoming Li

Abstract: Twitter enables users to send and read text-based posts of up to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources, however the proliferation of user-generated content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model outperforms competitive approaches by a large margin.

5 0.89279979 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

6 0.71018451 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

7 0.67878908 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

8 0.67273736 31 acl-2012-Authorship Attribution with Author-aware Topic Models

9 0.66126573 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

10 0.64987749 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

11 0.6091311 98 acl-2012-Finding Bursty Topics from Microblogs

12 0.60301101 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

13 0.5746417 167 acl-2012-QuickView: NLP-based Tweet Search

14 0.57164985 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

15 0.56685948 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

16 0.56392479 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

17 0.56216961 171 acl-2012-SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations

18 0.55612856 79 acl-2012-Efficient Tree-Based Topic Modeling

19 0.54521888 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars

20 0.54414207 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale