acl acl2010 acl2010-71 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree and is thus able to explore a much larger object space and many more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness than the convolution tree kernel. The paper presents the formal definition of the convolution forest kernel and an algorithm to compute it efficiently. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the convolution tree kernel baseline.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. [sent-4, score-1.755]
2 As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree and is thus able to explore a much larger object space and many more structured features embedded in a forest. [sent-5, score-1.836]
3 This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. [sent-6, score-1.094]
4 The paper presents the formal definition of the convolution forest kernel and an algorithm to compute it efficiently. [sent-7, score-2.299]
5 Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel. [sent-8, score-1.654]
6 1 Introduction The parse tree and the packed forest of parse trees are two widely used data structures for representing the syntactic structure of sentences in natural language processing (NLP). [sent-9, score-1.08]
7 A forest (Tomita, 1987) compactly encodes an exponential number of parse trees. [sent-12, score-0.814]
8 In this paper, we study how to effectively explore structured features embedded in a forest using convolution kernel (Haussler, 1999). [sent-13, score-1.486]
9 For example, it is computationally infeasible to enumerate all subtree features (taking each subtree as a feature) of a parse tree into a linear feature vector. [sent-16, score-0.761]
10 Many learning algorithms, such as SVM (Vapnik, 1998), the Perceptron learning algorithm (Rosenblatt, 1962) and Voted Perceptron (Freund and Schapire, 1999), can work directly with kernels by replacing the dot product with a particular kernel function. [sent-19, score-0.633]
11 This property of kernel methods, implicitly calculating the dot product in a high-dimensional space over the original representations of objects, has made kernel methods an effective solution for modeling structured objects in NLP. [sent-20, score-1.141]
12 In the context of parse trees, the convolution tree kernel (Collins and Duffy, 2002) defines a feature space consisting of all subtree types of parse trees and counts the number of common subtrees as the syntactic similarity between two parse trees. [sent-21, score-1.877]
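As a concrete illustration of this feature space, the sketch below enumerates all Collins-Duffy subtree fragments of a small tree by brute force and takes the dot product of the resulting count vectors. The nested-tuple tree encoding and all function names are our own illustrative assumptions, not the paper's notation:

```python
import itertools
from collections import Counter

def fragments(node):
    """All Collins-Duffy subtree fragments rooted at `node`.

    A fragment must keep, for every node it contains, either all of that
    node's children or none of them; fragments span more than one node.
    Trees are nested tuples, e.g. ('NP', ('DT', 'the'), ('NN', 'dog')).
    """
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        yield node            # pre-terminal: (POS, word) is the only fragment
        return
    # For each child, either cut below it (keep just its label as a leaf)
    # or attach one of the fragments rooted at it.
    options = [[(c[0],)] + list(fragments(c)) for c in children]
    for combo in itertools.product(*options):
        yield (label, *combo)

def all_fragments(tree):
    """Fragments rooted anywhere in the tree."""
    yield from fragments(tree)
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from all_fragments(child)

def tree_kernel_bruteforce(t1, t2):
    """Dot product of the (implicit) subtree-count vectors of t1 and t2."""
    c1, c2 = Counter(all_fragments(t1)), Counter(all_fragments(t2))
    return sum(n * c2[frag] for frag, n in c1.items())
```

This explicit enumeration is exponential in tree size and is only meant to make the feature space tangible; the recursion of Collins and Duffy computes the same dot product implicitly.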
13 The tree kernel has shown much success in many NLP applications like parsing (Collins and Duffy, 2002), semantic role labeling (Moschitti, 2004; Zhang et al. [sent-22, score-0.793]
14 , 2006), question classification (Zhang and Lee, 2003) and machine translation (Zhang and Li, 2009), where the tree kernel is used to compute the similarity between two NLP application instances that are usually represented by parse trees. [sent-25, score-0.84]
15 However, in those studies, the tree kernel only covers the features derived from the single best parse tree. [sent-26, score-0.699]
16 Fig. 1: A parse tree and its 11 subtree features covered by the convolution tree kernel. [sent-29, score-1.701]
17 This may largely compromise the performance of tree kernel due to parsing errors and data sparseness. [sent-30, score-0.75]
18 To address the above issues, this paper constructs a forest-based convolution kernel to mine structured features directly from packed forest. [sent-31, score-1.009]
19 A packed forest compactly encodes an exponential number of parse trees and thus contains far richer structured features than a single parse tree. [sent-32, score-1.008]
20 This advantage enables the forest kernel not only to be more robust against parsing errors, but also to learn more reliable feature values, helping to solve the data sparseness issue that exists in the traditional tree kernel. [sent-33, score-1.372]
21 Experimental results on the benchmark data show that the forest kernel significantly outperforms the tree kernel. [sent-35, score-1.279]
22 Section 2 reviews the convolution tree kernel, while Section 3 discusses the proposed forest kernel in detail. [sent-37, score-2.107]
23 2 Convolution Kernel over Parse Tree The convolution kernel was proposed as a concept of kernels for discrete structures by Haussler (1999); related but independently conceived ideas on string kernels were first presented by Watkins (1999). [sent-40, score-0.699]
24 The framework defines the kernel function between input objects as the convolution of "sub-kernels", i.e., kernels over the parts of the objects. [sent-41, score-0.857]
25 The parse tree kernel (Collins and Duffy, 2002) is an instantiation of convolution kernel over syntactic parse trees. [sent-44, score-1.809]
26 Given a parse tree, its features defined by a tree kernel are all of its subtree types and the value of a given feature is the number of the occurrences of the subtree in the parse tree. [sent-45, score-1.397]
27 Fig. 1 illustrates a parse tree with all of its 11 subtree features covered by the convolution tree kernel. [sent-47, score-1.043]
28 In the tree kernel, a parse tree T is represented by a vector of integer counts of each subtree type (i.e., the number of occurrences of each subtree type in T). [sent-48, score-0.761]
29 The tree kernel counts the number of common subtrees as the syntactic similarity between two parse trees. [sent-51, score-1.012]
30 To solve this computational issue, Collins and Duffy (2002) proposed the following tree kernel, which implicitly calculates the dot product between the above high-dimensional vectors. [sent-53, score-0.724]
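The recursion behind this implicit dot product can be sketched as follows: delta(n1, n2) counts the common fragments rooted at a pair of nodes, and the full kernel sums delta over all node pairs in quadratic time. Nested-tuple trees and all names here are illustrative assumptions; the decay factor `lam` corresponds to the λ of Collins and Duffy:

```python
def iter_nodes(tree):
    """Yield every non-terminal node of a nested-tuple tree."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from iter_nodes(child)

def production(node):
    """CFG production at a node: (label, child-label, ...)."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0]
                              for c in node[1:])

def delta(n1, n2, lam, memo):
    """Number of common fragments rooted at the node pair (n1, n2)."""
    key = (n1, n2)
    if key not in memo:
        if production(n1) != production(n2):
            memo[key] = 0.0
        elif all(isinstance(c, str) for c in n1[1:]):   # pre-terminal
            memo[key] = lam
        else:
            val = lam
            for c1, c2 in zip(n1[1:], n2[1:]):
                val *= 1.0 + delta(c1, c2, lam, memo)
            memo[key] = val
    return memo[key]

def tree_kernel(t1, t2, lam=1.0):
    """K(t1, t2): sum of delta over all node pairs (quadratic time)."""
    memo = {}
    return sum(delta(a, b, lam, memo)
               for a in iter_nodes(t1) for b in iter_nodes(t2))
```

With `lam=1.0` this agrees with brute-force fragment counting; setting `lam < 1` discounts larger fragments, as discussed later for the forest kernel as well.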
31 As discussed in the previous section, when the convolution tree kernel is applied to NLP applications, its performance is vulnerable to errors in the single parse tree and to data sparseness. [sent-60, score-1.043]
32 In this paper, we present a convolution kernel over packed forest to address the above issues by exploring structured features embedded in a forest. [sent-61, score-1.614]
33 3 Convolution Kernel over Forest In this section, we first illustrate the concept of a packed forest and then give a detailed discussion of the covered feature space, the fractional count, the feature value, and the forest kernel function itself. [sent-62, score-2.073]
34 1 Packed forest of parse trees Informally, a packed parse forest, or (packed) forest for short, is a compact representation of all the derivations (i.e., parse trees). [sent-64, score-1.619]
35 In parsing, a sentence corresponds to an exponential number of parse trees with different probabilities, and a forest can compactly represent all the parse trees by sharing their common subtrees in a bottom-up manner. [sent-70, score-1.323]
36 A non-terminal node in a forest is represented as a "label [start, end]", where the "label" is its syntactic category and "[start, end]" is the span of words it covers. [sent-82, score-0.657]
37 The two parse trees (Fig. 2) can be represented as a single forest by sharing their common subtrees (such as NP[3,4] and PP[5,7]) and merging common non-terminal nodes covering the same span (such as VP[2,7], which has two hyper-edges attached to it). [sent-86, score-0.792]
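A minimal sketch of such a packed-forest representation is shown below. The paper does not prescribe a data structure; the `Node`/`HyperEdge`/`Forest` names and the concrete spans are our own illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """A packed-forest node: syntactic category plus the word span it covers."""
    label: str   # e.g. "VP"
    start: int   # first word index of the span
    end: int     # one past the last word index

@dataclass(frozen=True)
class HyperEdge:
    """One way of expanding `head` into a sequence of child nodes."""
    head: Node
    tails: tuple   # child Nodes, left to right
    prob: float    # rule probability p(e)

class Forest:
    """Maps each node to its hyper-edges; an ambiguous node has several."""
    def __init__(self):
        self.edges = {}
    def add(self, edge):
        self.edges.setdefault(edge.head, []).append(edge)
```

An ambiguous node such as VP[2,7] then simply carries two hyper-edges, one per attachment choice, while unambiguous subtrees are stored once and shared.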
38 2 Convolution forest kernel In this subsection, we first define the feature space covered by forest kernel, and then define the forest kernel function. [sent-219, score-2.838]
39 1 Feature space, object space and feature value The forest kernel counts the number of common subtrees as the syntactic similarity between two forests. [sent-222, score-1.299]
40 A forest encodes an exponential number of parse trees and thus contains exponentially more subtrees than a single parse tree. [sent-231, score-1.093]
41 This enables the forest kernel to learn more reliable feature values and helps it address data sparseness better than the tree kernel does. [sent-232, score-1.859]
42 Forest kernel is also expected to yield more non-zero feature values than tree kernel. [sent-233, score-0.729]
43 Furthermore, different parse trees in a forest represent different derivations and interpretations of a given sentence. [sent-234, score-0.903]
44 Therefore, the forest kernel should be more robust to parsing errors than the tree kernel. [sent-235, score-1.33]
45 So one subtree extracted from different parse trees should have different fractional counts with regard to the probabilities of those parse trees. [sent-239, score-0.78]
46 Following the previous work (Charniak and Johnson, 2005; Huang, 2008), we define the fractional count of the occurrence of a subtree in a parse tree ? [sent-240, score-0.734]
47 Then we define the fractional count of the occurrence of a subtree in a forest f as ? [sent-317, score-0.991]
48 However, because a forest compactly represents all the parse trees, the posterior probability of a subtree in a forest, ? [sent-388, score-0.957]
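The equations referenced in the sentences above were lost in extraction (the "?" placeholders). A standard way to write the fractional count of a subtree fragment sub in a forest f, following Charniak and Johnson (2005) and Huang (2008), uses outside (α) and inside (β) probabilities; this is a reconstruction under those conventions, not the paper's exact notation, with frontier(sub) the non-terminal leaves of the fragment and TOP the forest root:

```latex
c(\mathit{sub}, f)
  \;=\; \sum_{t \in f:\; \mathit{sub} \subseteq t} p(t \mid s)
  \;=\; \frac{\alpha\bigl(\mathrm{root}(\mathit{sub})\bigr)
              \cdot \prod_{e \in \mathit{sub}} p(e)
              \cdot \prod_{v \in \mathrm{frontier}(\mathit{sub})} \beta(v)}
             {\beta(\mathrm{TOP})}
```

Intuitively, the outside probability covers everything above the fragment, the inside probabilities cover everything below its frontier, and dividing by β(TOP) turns the product into a posterior over all parses in the forest.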
49 In the next subsection, we present the forest kernel that implicitly calculates the dot-product between two ? [sent-522, score-1.097]
50 2 Convolution forest kernel The forest kernel counts the fractional numbers of common subtrees as the syntactic similarity between two forests. [sent-527, score-2.366]
51 We define the forest kernel in the following way. [sent-532, score-1.271]
52 Eq. (3), defined by inside-outside probabilities, exactly computes the sum of the probabilities of those parse trees that contain the subtree being considered, as defined in eq. [sent-705, score-0.592]
53 Indeed, Algorithm 1 can be viewed as a natural extension of the convolution kernel from trees to forests. [sent-711, score-1.01]
54 Eq. (6) is very similar to Rule 3 of the tree kernel (see Section 2), except that its inputs are hyper-edges and its further expansion is based on forest nodes. [sent-723, score-1.279]
55 Similar to tree kernel (Collins and Duffy, 2002), eq. [sent-724, score-0.699]
56 2 calculated by Algorithm 1 is a proper convolution kernel since it simply counts the number of common subtrees under the root ? [sent-756, score-1.033]
57 Given a forest and the best parse tree, the number of hyper-edges is only several times (normally <=3 after pruning) that of the tree nodes in the parse tree3. [sent-783, score-1.071]
58 2 A tree can be viewed as a special case of a forest with only one hyper-edge attached to each tree node. [sent-784, score-0.79]
59 3 Suppose there are K forest nodes in a forest, each node has M associated hyper-edges fanning out, and each hyper-edge has ? [sent-785, score-0.655]
60 end for (6) (7) Same as the tree kernel, the forest kernel runs more efficiently in practice, since only node pairs with the same label need to be further processed (line 2 of Algorithm 1). [sent-874, score-1.306]
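Algorithm 1 itself is not reproduced in this extract; the following sketch shows the shape of the hyper-edge-based delta recursion it describes, omitting the fractional-count weighting and the normalization discussed in the surrounding sentences. The node encoding (a `(label, start, end)` tuple), the forest-as-dict representation, and all names are our own assumptions; a tree is recovered as the special case of one hyper-edge per node:

```python
def forest_delta(n1, n2, f1, f2, memo, lam=1.0):
    """Delta over a pair of forest nodes: sum over pairs of hyper-edges
    whose rules match, recursing into the paired tail nodes.

    A node is a (label, start, end) tuple; a forest maps each node to a
    list of hyper-edges, each hyper-edge being a tuple of its child nodes.
    """
    if n1[0] != n2[0]:                       # labels must match (line 2)
        return 0.0
    key = (n1, n2)
    if key not in memo:
        memo[key] = 0.0                      # placeholder; forests are DAGs
        total = 0.0
        for tails1 in f1.get(n1, ()):        # each hyper-edge of n1
            for tails2 in f2.get(n2, ()):    # each hyper-edge of n2
                if [t[0] for t in tails1] != [t[0] for t in tails2]:
                    continue                 # hyper-edge rules differ
                prod = lam                   # decay factor per fragment
                for c1, c2 in zip(tails1, tails2):
                    prod *= 1.0 + forest_delta(c1, c2, f1, f2, memo, lam)
                total += prod
        memo[key] = total
    return memo[key]

def forest_kernel(f1, f2, lam=1.0):
    """Sum delta over all node pairs of the two forests (quadratic time)."""
    memo = {}
    return sum(forest_delta(a, b, f1, f2, memo, lam) for a in f1 for b in f2)
```

On a degenerate forest with a single hyper-edge per node this reduces exactly to the tree-kernel recursion; in a genuinely packed forest the double loop over hyper-edge pairs is what lets every encoded parse tree contribute.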
61 Now let us see how to integrate fractional counts into forest kernel. [sent-875, score-0.792]
62 A decay factor λ (0 < λ ≤ 1) is also introduced in order to make the kernel value less variable with respect to the subtree sizes (Collins and Duffy, 2002). [sent-912, score-0.71]
63 are not the roots of the subtrees being explored (only the outside probability of the root of a subtree should be counted in its fractional count), and Δ′ ? [sent-1011, score-0.623]
64 2 (10) Finally, since the size of input forests is not constant, the forest kernel value is normalized using the following equation. [sent-1084, score-1.133]
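Eq. (11) referenced here was not preserved in the extraction; the standard size normalization it describes has the following form (a reconstruction, possibly differing from the paper's exact notation):

```latex
K'(f_1, f_2) \;=\; \frac{K(f_1, f_2)}{\sqrt{K(f_1, f_1) \cdot K(f_2, f_2)}}
```

This cosine-style normalization maps the self-kernel to 1 and keeps forests of different sizes comparable without affecting the validity of the kernel.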
65 2 (11) From the above discussion, we can see that the proposed forest kernel is defined together by eqs. [sent-1101, score-1.097]
66 Thanks to the compact representation of the trees in a forest and the recursive nature of the kernel function, the introduction of fractional counts and normalization does not change the convolution property or the time complexity of the forest kernel. [sent-1103, score-2.249]
67 2 is still a proper convolution kernel with quadratic time complexity. [sent-1108, score-0.828]
68 3 Comparison with previous work To the best of our knowledge, this is the first work to address convolution kernel over packed parse forest. [sent-1110, score-1.097]
69 Convolution tree kernel is a special case of the proposed forest kernel. [sent-1111, score-1.279]
70 So the number of subtree instances extracted from a forest is exponentially greater than that from its corresponding parse tree. [sent-1114, score-0.961]
71 To some degree, the forest kernel can be viewed as a tree kernel with a very powerful back-off mechanism. [sent-1116, score-1.796]
72 In addition, forest kernel is much more robust against parsing errors than tree kernel. [sent-1117, score-1.33]
73 There are a few other previous works that generalize convolution tree kernels (Kashima and Koyanagi, 2003; Moschitti, 2006; Zhang et al. [sent-1121, score-0.584]
74 From a broad viewpoint, as suggested by one reviewer of the paper, we can consider the forest kernel as an alternative solution proposed for the general problem of noisy inference pipelines (eg. [sent-1124, score-1.097]
75 In this section, we verify the effectiveness of the forest kernel on two NLP applications, semantic role labeling (SRL) (Gildea, 2002) and relation extraction (RE) (ACE, 2002-2006). [sent-1132, score-1.186]
76 In our implementation, we use the binary SVMLight (Joachims, 1998) and borrow the framework of the Tree Kernel Tools (Moschitti, 2004) to integrate our forest kernel into the SVMLight. [sent-1135, score-1.097]
77 , the Viterbi-style inside-outside probabilities and set the pruning threshold as 8) for forest pruning. [sent-1139, score-0.642]
78 We use the CoNLL-2005 shared task on Semantic Role Labeling (Carreras and Màrquez, 2005) for the evaluation of our forest kernel method. [sent-1144, score-1.097]
79 In our method, we break the limitation of 1-best parse tree and regard each span rooted by a single forest node (i. [sent-1152, score-1.006]
80 76 Table 1: Performance comparison of SRL (%) Table 1 shows that the forest kernel significantly outperforms (? [sent-1248, score-1.097]
81 01) the tree kernel with an absolute improvement of 2. [sent-1250, score-0.699]
82 This convincingly demonstrates the advantage of the forest kernel over the tree kernel. [sent-1257, score-1.279]
83 The performance improvement is mainly due to the fact that the forest encodes many more such structured features and that the forest kernel captures such structured features more effectively than the tree kernel. [sent-1259, score-1.99]
84 We apply the same pruning strategies to the forest, plus heuristic rules to prune out arguments whose spans overlap with each other and arguments with very small inside probabilities, depending on the number of candidates in the span. [sent-1268, score-0.733]
85 Kpath and Kcs are two standard convolution tree kernels to describe predicate-argument path substructures and argument syntactic substructures, respectively. [sent-1269, score-0.609]
86 (2006) as our baseline method as it reports the state-of-the-art performance using tree kernel-based composite kernel method for RE. [sent-1275, score-0.699]
87 We replace their tree kernels with our forest kernels and use the same experimental settings as theirs. [sent-1276, score-0.944]
88 By simulating PT, we use the minimal fragment of a forest covering the two entities and their internal words to represent a relation instance by only parsing the span covering the two entities and their internal words. [sent-1283, score-0.706]
89 7 Table 2: Performance Comparison of RE (%) over 23 subtypes on the ACE 2004 data Table 2 compares the performance of the forest kernel and the tree kernel on relation extraction. [sent-1291, score-1.821]
90 We can see that the forest kernel significantly outperforms test with p=0. [sent-1292, score-1.097]
91 This further verifies the effectiveness of the forest kernel method for (? [sent-1295, score-1.097]
92 5 Conclusions and Future Work Many NLP applications have benefited from the success of the convolution kernel over parse trees. [sent-1304, score-0.969]
93 Since a packed parse forest contains much richer structured features than a parse tree, we are motivated to develop a technique to measure the syntactic similarity between two forests. [sent-1305, score-1.043]
94 To achieve this goal, in this paper, we design a convolution kernel over a packed forest by generalizing the tree kernel. [sent-1306, score-1.718]
95 We analyze the object space of the forest kernel and the fractional count for feature-value computation, and design a dynamic programming algorithm to realize the forest kernel in quadratic time. [sent-1307, score-1.925]
96 Compared with the tree kernel, the forest kernel is more robust against parsing errors and data sparseness issues. [sent-1308, score-1.363]
97 Experimental results demonstrate the effectiveness of the proposed kernel in modeling structured NLP data and its advantages over the tree kernel. [sent-1310, score-0.752]
98 In the future, we would like to verify the forest kernel in more NLP applications. [sent-1311, score-1.097]
99 However, the challenge is that we compute the fractional counts together with the forest kernel recursively by using the Inside-Outside probabilities. [sent-1315, score-1.309]
100 A hybrid convolution tree kernel for semantic role labeling. [sent-1356, score-1.049]
wordName wordTfidf (topN-words)
[('forest', 0.58), ('kernel', 0.517), ('convolution', 0.311), ('subtree', 0.193), ('tree', 0.182), ('fractional', 0.174), ('parse', 0.141), ('packed', 0.128), ('subtrees', 0.112), ('kernels', 0.091), ('zhang', 0.091), ('srl', 0.073), ('subtreetypei', 0.072), ('duffy', 0.069), ('structured', 0.053), ('trees', 0.049), ('node', 0.048), ('exponential', 0.047), ('outside', 0.046), ('che', 0.045), ('collins', 0.045), ('count', 0.044), ('alessandro', 0.042), ('chew', 0.041), ('min', 0.04), ('inside', 0.039), ('leaf', 0.039), ('counts', 0.038), ('probabilities', 0.038), ('forests', 0.036), ('lari', 0.036), ('aiolli', 0.036), ('mct', 0.036), ('subtreetypen', 0.036), ('unmatched', 0.036), ('covered', 0.034), ('cfg', 0.034), ('sparseness', 0.033), ('haizhou', 0.033), ('lim', 0.033), ('root', 0.033), ('moschitti', 0.033), ('vapnik', 0.03), ('charniak', 0.03), ('feature', 0.03), ('parsing', 0.03), ('objects', 0.029), ('span', 0.029), ('nlp', 0.029), ('attached', 0.028), ('ace', 0.028), ('roots', 0.027), ('spaces', 0.027), ('hui', 0.027), ('nodes', 0.027), ('rooted', 0.026), ('viewpoint', 0.026), ('dot', 0.025), ('aw', 0.025), ('argument', 0.025), ('labeling', 0.025), ('encodes', 0.025), ('integer', 0.025), ('relation', 0.025), ('embedded', 0.025), ('aiti', 0.025), ('gates', 0.024), ('isubtreei', 0.024), ('kashima', 0.024), ('martino', 0.024), ('sperduti', 0.024), ('wanxiang', 0.024), ('pruning', 0.024), ('constituents', 0.023), ('tan', 0.023), ('enumerating', 0.023), ('common', 0.022), ('probability', 0.022), ('sheng', 0.022), ('infeasible', 0.022), ('bank', 0.021), ('entities', 0.021), ('compactly', 0.021), ('iif', 0.021), ('overlapped', 0.021), ('tomita', 0.021), ('errors', 0.021), ('role', 0.02), ('arguments', 0.02), ('jth', 0.02), ('re', 0.02), ('pt', 0.02), ('ting', 0.019), ('fabio', 0.019), ('billot', 0.019), ('downstream', 0.019), ('haussler', 0.019), ('rquez', 0.019), ('semantic', 0.019), ('xue', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 71 acl-2010-Convolution Kernel over Packed Parse Forest
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of convolution forest kernel and also illustrates the computing algorithm to fast compute the proposed convolution forest kernel. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel. 1
2 0.33651033 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
Author: WenTing Wang ; Jian Su ; Chew Lim Tan
Abstract: Syntactic knowledge is important for discourse relation recognition. Yet only heuristically selected flat paths and 2-level production rules have been used to incorporate such information so far. In this paper we propose using tree kernel based approach to automatically mine the syntactic information from the parse trees for discourse analysis, applying kernel function to the tree structures directly. These structural syntactic features, together with other normal flat features are incorporated into our composite kernel to capture diverse knowledge for simultaneous discourse identification and classification for both explicit and implicit relations. The experiment shows tree kernel approach is able to give statistical significant improvements over flat syntactic path feature. We also illustrate that tree kernel approach covers more structure information than the production rules, which allows tree kernel to further incorporate information from a higher dimension space for possible better discrimination. Besides, we further propose to leverage on temporal ordering information to constrain the interpretation of discourse relation, which also demonstrate statistical significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit as well. University of Singapore Singapore 117417 sg tacl @ comp .nus .edu . sg 1
3 0.29767036 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
Author: Jun Sun ; Min Zhang ; Chew Lim Tan
Abstract: We propose Bilingual Tree Kernels (BTKs) to capture the structural similarities across a pair of syntactic translational equivalences and apply BTKs to sub-tree alignment along with some plain features. Our study reveals that the structural features embedded in a bilingual parse tree pair are very effective for sub-tree alignment and the bilingual tree kernels can well capture such features. The experimental results show that our approach achieves a significant improvement on both gold standard tree bank and automatically parsed tree pairs against a heuristic similarity based method. We further apply the sub-tree alignment in machine translation with two methods. It is suggested that the subtree alignment benefits both phrase and syntax based systems by relaxing the constraint of the word alignment. 1
4 0.27824113 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forestbased extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via targetside syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a treeto-tree model can surpass tree-to-string counterparts.
5 0.21988545 243 acl-2010-Tree-Based and Forest-Based Translation
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
6 0.18360817 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
7 0.18018249 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
8 0.17233315 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
9 0.13209155 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
10 0.12934195 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
11 0.11439568 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
12 0.10881376 133 acl-2010-Hierarchical Search for Word Alignment
13 0.10124066 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
14 0.093380488 169 acl-2010-Learning to Translate with Source and Target Syntax
15 0.08713717 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
16 0.085960805 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
17 0.083621882 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
18 0.073576108 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures
19 0.073077783 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
20 0.072082952 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
topicId topicWeight
[(0, -0.2), (1, -0.129), (2, 0.148), (3, 0.038), (4, -0.143), (5, -0.012), (6, 0.088), (7, 0.037), (8, -0.265), (9, -0.041), (10, -0.049), (11, -0.23), (12, 0.086), (13, -0.005), (14, -0.005), (15, 0.159), (16, 0.056), (17, -0.021), (18, 0.128), (19, -0.068), (20, 0.213), (21, -0.083), (22, -0.194), (23, 0.026), (24, -0.074), (25, 0.196), (26, -0.003), (27, -0.007), (28, 0.012), (29, 0.123), (30, -0.069), (31, 0.011), (32, 0.159), (33, 0.095), (34, -0.031), (35, -0.12), (36, -0.013), (37, -0.07), (38, 0.0), (39, -0.088), (40, 0.174), (41, -0.015), (42, 0.041), (43, 0.06), (44, -0.088), (45, 0.08), (46, 0.029), (47, 0.109), (48, 0.129), (49, -0.062)]
simIndex simValue paperId paperTitle
same-paper 1 0.97636044 71 acl-2010-Convolution Kernel over Packed Parse Forest
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of convolution forest kernel and also illustrates the computing algorithm to fast compute the proposed convolution forest kernel. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel. 1
2 0.66294104 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
Author: Jun Sun ; Min Zhang ; Chew Lim Tan
Abstract: We propose Bilingual Tree Kernels (BTKs) to capture the structural similarities across a pair of syntactic translational equivalences and apply BTKs to sub-tree alignment along with some plain features. Our study reveals that the structural features embedded in a bilingual parse tree pair are very effective for sub-tree alignment and the bilingual tree kernels can well capture such features. The experimental results show that our approach achieves a significant improvement on both gold standard tree bank and automatically parsed tree pairs against a heuristic similarity based method. We further apply the sub-tree alignment in machine translation with two methods. It is suggested that the subtree alignment benefits both phrase and syntax based systems by relaxing the constraint of the word alignment. 1
3 0.66045374 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
Author: WenTing Wang ; Jian Su ; Chew Lim Tan
Abstract: Syntactic knowledge is important for discourse relation recognition. Yet only heuristically selected flat paths and 2-level production rules have been used to incorporate such information so far. In this paper we propose using tree kernel based approach to automatically mine the syntactic information from the parse trees for discourse analysis, applying kernel function to the tree structures directly. These structural syntactic features, together with other normal flat features are incorporated into our composite kernel to capture diverse knowledge for simultaneous discourse identification and classification for both explicit and implicit relations. The experiment shows tree kernel approach is able to give statistical significant improvements over flat syntactic path feature. We also illustrate that tree kernel approach covers more structure information than the production rules, which allows tree kernel to further incorporate information from a higher dimension space for possible better discrimination. Besides, we further propose to leverage on temporal ordering information to constrain the interpretation of discourse relation, which also demonstrate statistical significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit as well. University of Singapore Singapore 117417 sg tacl @ comp .nus .edu . sg 1
4 0.48435104 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: Tree-to-string translation rules are widely used in linguistically syntax-based statistical machine translation systems. In this paper, we propose to use deep syntactic information for obtaining fine-grained translation rules. A head-driven phrase structure grammar (HPSG) parser is used to obtain the deep syntactic information, which includes a fine-grained description of the syntactic property and a semantic representation of a sentence. We extract fine-grained rules from aligned HPSG tree/forest-string pairs and use them in our tree-to-string and string-to-tree systems. Extensive experiments on largescale bidirectional Japanese-English trans- lations testified the effectiveness of our approach.
5 0.48342669 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forestbased extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via targetside syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a treeto-tree model can surpass tree-to-string counterparts.
6 0.48322028 243 acl-2010-Tree-Based and Forest-Based Translation
7 0.44649225 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
8 0.44008735 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
9 0.43818438 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
10 0.38612056 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
11 0.37832329 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
12 0.37817001 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
13 0.3629345 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
14 0.31406561 67 acl-2010-Computing Weakest Readings
15 0.29785639 169 acl-2010-Learning to Translate with Source and Target Syntax
16 0.2925719 95 acl-2010-Efficient Inference through Cascades of Weighted Tree Transducers
17 0.28948188 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
18 0.27990201 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
19 0.277316 133 acl-2010-Hierarchical Search for Word Alignment
20 0.25868779 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
topicId topicWeight
[(7, 0.025), (12, 0.11), (14, 0.029), (16, 0.027), (25, 0.153), (33, 0.026), (39, 0.02), (42, 0.013), (44, 0.032), (59, 0.076), (73, 0.042), (78, 0.067), (83, 0.112), (84, 0.019), (98, 0.137)]
simIndex simValue paperId paperTitle
same-paper 1 0.91429067 71 acl-2010-Convolution Kernel over Packed Parse Forest
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree and is thus able to explore a much larger object space and many more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness than the convolution tree kernel. The paper presents the formal definition of the convolution forest kernel and an algorithm for computing it efficiently. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the convolution tree kernel baseline.
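The convolution tree kernel that this forest kernel generalizes counts the subtree fragments two parse trees share, via a recursive node-pair function. Below is a minimal sketch in the Collins-and-Duffy style; the Tree class and the decay constant LAMBDA are illustrative assumptions, not the paper's implementation, and the forest extension (summing over all trees packed in a forest) is not shown.

```python
# Minimal convolution tree kernel sketch: K(T1, T2) counts shared
# subtree fragments, down-weighted by a decay factor LAMBDA.

LAMBDA = 0.5  # decay factor penalizing large fragments (assumed value)

class Tree:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def nodes(self):
        """Yield this node and all descendants (leaves included)."""
        yield self
        for c in self.children:
            yield from c.nodes()

    def production(self):
        """The CFG rule at this node, e.g. ('NP', ('DT', 'NN'))."""
        return (self.label, tuple(c.label for c in self.children))

def delta(n1, n2):
    """Decayed count of common fragments rooted at n1 and n2."""
    if not n1.children or n1.production() != n2.production():
        return 0.0  # leaves anchor no fragments; different rules share none
    result = LAMBDA
    for c1, c2 in zip(n1.children, n2.children):
        result *= 1.0 + delta(c1, c2)
    return result

def tree_kernel(t1, t2):
    """K(T1, T2) = sum of delta over all node pairs."""
    return sum(delta(a, b) for a in t1.nodes() for b in t2.nodes())
```

A forest kernel along the lines the abstract describes would replace the single-tree node enumeration with an enumeration over the hypernodes of a packed forest, so that all parses are credited at once.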
2 0.88684821 6 acl-2010-A Game-Theoretic Model of Metaphorical Bargaining
Author: Beata Beigman Klebanov ; Eyal Beigman
Abstract: We present a game-theoretic model of bargaining over a metaphor in the context of political communication, find its equilibrium, and use it to rationalize observed linguistic behavior. We argue that game theory is well suited for modeling discourse as a dynamic resulting from a number of conflicting pressures, and suggest applications of interest to computational linguists.
3 0.88178301 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
Author: Timothy A. D. Fowler ; Gerald Penn
Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.
4 0.86472154 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system, even with fewer rules. This is also the first time that a tree-to-tree model has surpassed its tree-to-string counterparts.
5 0.86327475 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder
Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.
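The decomposition described in the abstract above marginalizes over a latent topic variable z: P(sense | context) = sum over z of P(sense | z) * P(z | context). A tiny sketch of that computation follows; the toy distributions and names are invented for illustration and are not from the paper.

```python
# Sketch of the latent-variable decomposition:
#   P(sense | context) = sum_z P(sense | z) * P(z | context)
# Both conditionals would come from a trained topic model; here they
# are illustrative toy tables.

def sense_prob(p_sense_given_topic, p_topic_given_context, sense):
    """Marginalize the latent topic out of the two conditionals."""
    return sum(p_sense_given_topic[z].get(sense, 0.0) * p_z
               for z, p_z in p_topic_given_context.items())
```

The disambiguator then simply picks the sense paraphrase maximizing this marginal for the observed context.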
6 0.85582137 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion
7 0.84544313 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.84384108 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
9 0.83810771 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
10 0.8376109 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
11 0.83664238 28 acl-2010-An Entity-Level Approach to Information Extraction
12 0.83450097 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
13 0.83414114 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
14 0.83407825 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation
15 0.83200181 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
16 0.83071566 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
17 0.8259654 248 acl-2010-Unsupervised Ontology Induction from Text
18 0.82516187 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
19 0.82317078 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
20 0.82231903 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification