acl acl2012 acl2012-146 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alessandro Moschitti ; Qi Ju ; Richard Johansson
Abstract: In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. Additionally, to better investigate the role of category relationships, we consider two interesting cases: (i) traditional schemes in which node-fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers. The extensive experimentation on Reuters Corpus Volume 1 shows that our rerankers inject effective structural semantic dependencies in multi-classifiers and significantly outperform the state-of-the-art.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. [sent-3, score-0.236]
2 We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. [sent-4, score-0.765]
3 The extensive experimentation on Reuters Corpus Volume 1 shows that our rerankers inject effective structural semantic dependencies in multi-classifiers and significantly outperform the state-of-the-art. [sent-6, score-0.541]
4 1 Introduction Automated Text Categorization (TC) algorithms for hierarchical taxonomies are typically based on flat schemes, e.g. [sent-7, score-0.129]
5 We speculate that the failure of hierarchical approaches is caused by the inherent complexity of modeling all possible topic dependencies, rather than by the uselessness of such relationships. [sent-12, score-0.221]
6 More precisely, although hierarchical multi-label classifiers can exploit machine learning algorithms for structural output, e.g. [sent-13, score-0.244]
7 Typically, the probability of a document d belonging to a subcategory Ci of a category C is assumed to depend only on d and C, but not on other subcategories of C or any other categories in the hierarchy. [sent-20, score-0.149]
8 From this perspective, kernel methods are a viable approach for implicitly and easily exploring feature spaces that encode dependencies. [sent-25, score-0.242]
9 In this paper, we propose to use the combination of reranking with kernel methods as a way to handle the computational and feature design issues. [sent-30, score-0.381]
10 We first use a basic hierarchical classifier to generate a hypothesis set of limited size, and then apply reranking models. [sent-31, score-0.373]
11 Since our rerankers are simple binary classifiers of hypothesis pairs, they can encode complex dependencies thanks to kernel methods. [sent-32, score-0.919]
12 In particular, we used tree, sequence and linear kernels applied to structural and feature-vector representations describing hierarchical dependencies. [sent-33, score-0.609]
13 Additionally, to better investigate the role of topical relationships, we consider two interesting cases: (i) traditional categorization schemes in which node- [sent-34, score-0.128]
14 fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers. [sent-36, score-0.149]
15 The intuition behind the above setting is that documents shared between categories create semantic links between them. [sent-37, score-0.132]
16 Thus, if we remove the documents shared between a father and its children, we reduce the dependencies that can be captured with a traditional bag-of-words representation. [sent-38, score-0.156]
17 We carried out experiments on two entire hierarchies, TOPICS (103 nodes organized in 5 levels) and INDUSTRIAL (365 nodes organized in 6 levels), of the well-known Reuters Corpus Volume 1 (RCV1). [sent-39, score-0.212]
18 We first evaluate the accuracy as well as the efficiency of several reranking models. [sent-40, score-0.17]
19 The results show that all our rerankers consistently and significantly improve on traditional TC approaches by up to 10 absolute percentage points. [sent-41, score-0.401]
20 Very interestingly, the combination of structural kernels with the linear kernel applied to vectors of category probabilities further improves on reranking: such a vector provides more effective information than the joint global probability of the reranking hypothesis. [sent-42, score-0.967]
21 In the rest of the paper, Section 2 describes the hypothesis generation algorithm, Section 3 illustrates our reranking approach based on tree kernels, Section 4 reports on our experiments, Section 5 discusses related work and Section 6 draws conclusions. [sent-43, score-0.418]
22 2 Hierarchy classification hypotheses from binary decisions The idea of the paper is to build efficient models for hierarchical classification using global dependencies. [sent-44, score-0.328]
23 For this purpose, we use reranking models, which encode global information. [sent-45, score-0.212]
24 In the following sections, we describe a simple framework for hypothesis generation. [sent-49, score-0.127]
25 2.1 Top-k hypothesis generation. Given n categories, C1, . . . , Cn, [sent-51, score-0.127]
26 Figure 2: A tree representing a category assignment hypothesis for the subhierarchy in Fig. 1. [sent-56, score-0.38]
27 Typically, a large margin corresponds to a high probability for d to be in the category, whereas a small margin indicates a low probability.1 [sent-61, score-0.153]
28 If we assume independence between the SVM scores, the most probable hypothesis on d is $\tilde{h} = \arg\max_{h \in \{0,1\}^n} \prod_{i=1}^{n} p_i^{h_i}(d)$. [sent-67, score-0.157]
29 Given h̃, the second-best hypothesis can be obtained by changing the label on the least probable classification, i.e. [sent-70, score-0.157]
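A minimal sketch of this generation scheme may help. Assuming independent per-category probabilities obtained from the binary SVMs (converted from margins as in footnote 1), the top-k assignments can be enumerated best-first by flipping the least reliable decisions; all names below are illustrative, not the authors' implementation.

```python
import heapq
import math

def top_k_hypotheses(probs, k):
    """Enumerate the k most probable label assignments, assuming the
    per-category probabilities probs[i] = p(C_i | d) come from independent
    binary classifiers (an illustrative sketch, not the authors' code)."""
    n = len(probs)
    # Most probable hypothesis: take each binary decision independently.
    best = [1 if p >= 0.5 else 0 for p in probs]
    # Cost, in negative log-probability, of flipping decision i.
    cost = [abs(math.log(max(p, 1e-12) / max(1.0 - p, 1e-12))) for p in probs]
    order = sorted(range(n), key=cost.__getitem__)  # cheapest flips first
    # Best-first search over sets of flipped decisions; an entry records
    # (total flip cost, position of last flip in `order`, flipped indices).
    heap = [(0.0, -1, ())]
    out = []
    while heap and len(out) < k:
        c, last, flipped = heapq.heappop(heap)
        hyp = list(best)
        for i in flipped:
            hyp[i] = 1 - hyp[i]
        out.append((hyp, c))
        for j in range(last + 1, n):
            heapq.heappush(heap, (c + cost[order[j]], j, flipped + (order[j],)))
    return out
```

The second-best hypothesis is exactly the first assignment popped after h̃: h̃ with its single cheapest, i.e. least reliable, decision flipped.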
30 3 Structural Kernels for Reranking Hierarchical Classification In this section we describe our hypothesis reranker. [sent-76, score-0.127]
31 The main idea is to represent the hypotheses as a tree structure, naturally derived from the hierarchy, and then to use tree kernels to encode such a structural description in a learning algorithm. [sent-77, score-0.986]
32 For this purpose, we describe our hypothesis representation, kernel methods and the kernel-based approach to preference reranking. [sent-78, score-0.396]
33 3.1 Encoding hypotheses in a tree. Once hypotheses are generated, we need a representation from which the dependencies between the different . . . (footnote 1: we used the conversion of margin into probability provided by LIBSVM). [sent-80, score-0.491]
34 Figure 3: A compact representation of the hypothesis in Fig. 2. [sent-81, score-0.243]
35 Since we do not know in advance which dependencies are important, nor the scope of the interaction between the different structure subparts, we rely on automatic feature engineering via structural kernels. [sent-84, score-0.225]
36 For this paper, we consider tree-shaped hierarchies so that tree kernels, e.g. [sent-85, score-0.163]
37 For example, Figure 1 shows a subhierarchy of the Markets (MCAT) category and its subcategories: Equity Markets (M11), Bond Markets (M12), Money Markets (M13) and Commodity Markets (M14). [sent-89, score-0.132]
38 As the input of our reranker, we can simply use a tree representing the hierarchy above, marking the negative assignments of the current hypothesis in the node labels with "-", e.g. [sent-91, score-0.382]
39 For example, Figure 2 shows the representation of a classification hypothesis that assigns the target document to the categories MCAT, M11, M13, M14 and M143. [sent-94, score-0.295]
40 Another more compact representation is the hierarchy tree from which all the nodes associated with a negative classification decision are removed. [sent-95, score-0.453]
41 As only a small subset of nodes of the full hierarchy will be positively classified, the tree will be much smaller. [sent-96, score-0.214]
42 Figure 3 shows the compact representation of the hypothesis in Fig. 2. [sent-97, score-0.243]
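Both encodings can be sketched as follows; the (label, subtrees) tuple representation and the helper name are ours, and the compact variant assumes the positive nodes form a connected subtree, as in the example.

```python
def encode_hypothesis(children, root, assigned, compact=False):
    """Encode a category-assignment hypothesis as a (label, subtrees) tree.
    `children` maps a category to its child categories; `assigned` is the
    set of categories the hypothesis marks positive. Negative nodes get a
    '-' prefix (Fig. 2 style); with compact=True they are pruned (Fig. 3)."""
    label = root if root in assigned else "-" + root
    subtrees = [encode_hypothesis(children, c, assigned, compact)
                for c in children.get(root, [])
                if not (compact and c not in assigned)]
    return (label, subtrees)

# The Markets subhierarchy of Figure 1 and the hypothesis of Figure 2.
children = {"MCAT": ["M11", "M12", "M13", "M14"],
            "M13": ["M131", "M132"], "M14": ["M141", "M142", "M143"]}
assigned = {"MCAT", "M11", "M13", "M14", "M143"}
full_tree = encode_hypothesis(children, "MCAT", assigned)           # Fig. 2
compact_tree = encode_hypothesis(children, "MCAT", assigned, True)  # Fig. 3
```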
43 In several cases, this can be efficiently and implicitly computed by kernel functions by exploiting the following dual formulation: [sent-102, score-0.242]
44 $\sum_{i=1}^{l} y_i \alpha_i \phi(o_i) \cdot \phi(o) + b = 0$, where $o_i$ and $o$ are two objects, $\phi$ is a mapping from the objects to feature vectors $\vec{x}_i$, and $\phi(o_i) \cdot \phi(o) = K(o_i, o)$ is a kernel function implicitly defining such a mapping. [sent-104, score-0.324]
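In code, the dual classifier never builds φ explicitly; it only needs K. A sketch, where `support` holds the (y_i, α_i, o_i) triples learned by the SVM:

```python
def decision(o, support, K, b):
    """Kernelized decision function: sum_i y_i * alpha_i * K(o_i, o) + b,
    where K may be any valid (e.g., structural) kernel over the objects."""
    return sum(y * alpha * K(o_i, o) for (y, alpha, o_i) in support) + b
```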
45 In the case of structural kernels, K determines the shape of the substructures describing the objects above. [sent-105, score-0.138]
46 The most general kind of kernels used in NLP are string kernels, e.g. [sent-106, score-0.426]
47 In our case, given a hypothesis represented as a tree like in Figure 2, we can visit it and derive a linearization of the tree. [sent-125, score-0.248]
48 SK applied to such a node sequence can derive useful dependencies between category nodes. [sent-126, score-0.212]
49 Figure 4: The tree fragments of the hypothesis in Fig. 2 generated by STK. [sent-128, score-0.309]
50 Figure 5: Some tree fragments of the hypothesis in Fig. [sent-129, score-0.309]
51 Let {f1, f2, . . . , f|F|} be a tree fragment space and χi(n) be an indicator function, equal to 1 if the target fi is rooted at node n and equal to 0 otherwise. [sent-136, score-0.162]
52 The ∆ function determines the richness of the kernel space and thus defines different tree kernels. [sent-139, score-0.332]
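As an illustration, a minimal ∆ for STK over the (label, subtrees) trees built above might look as follows; λ is the usual decay factor, and 0.4 is only an illustrative default, not a value taken from the paper.

```python
def stk_delta(n1, n2, lam=0.4):
    """Delta(n1, n2) for STK (Collins and Duffy, 2002): the decayed count of
    common tree fragments rooted at n1 and n2. Nodes are (label, subtrees)
    pairs as built above; bare leaves carry no production, hence 0."""
    (l1, c1), (l2, c2) = n1, n2
    # Productions must match: same label and same ordered child labels.
    if l1 != l2 or [c[0] for c in c1] != [c[0] for c in c2]:
        return 0.0
    if not c1:
        return 0.0
    prod = lam
    for a, b in zip(c1, c2):
        prod *= 1.0 + stk_delta(a, b, lam)
    return prod

def stk(t1, t2, lam=0.4):
    """STK(t1, t2): sum of Delta over all pairs of nodes of the two trees."""
    def nodes(t):
        yield t
        for c in t[1]:
            yield from nodes(c)
    return sum(stk_delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))
```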
53 Figure 4 shows the five fragments of the hypothesis in Figure 2. [sent-149, score-0.188]
54 Fig. 5 shows the tree fragments from the hypothesis of Fig. [sent-161, score-0.309]
55 3.3 Preference reranker. When training a reranker model, the task of the machine learning algorithm is to learn to select the best candidate from a given set of hypotheses. [sent-165, score-0.194]
56 In the Preference Kernel approach, the reranking problem, i.e., learning to pick the correct candidate h1 from a candidate set {h1, . . . , hk}, [sent-172, score-0.17]
57 The kernels are then engineered to implicitly represent the differences between the objects in the pairs. [sent-187, score-0.488]
58 If we have a valid kernel K over the candidate space T, we can construct a preference kernel PK over the space of pairs T × T as follows: $P_K(x, y) = P_K(\langle x_1, x_2 \rangle, \langle y_1, y_2 \rangle) = K(x_1, y_1) + K(x_2, y_2) - K(x_1, y_2) - K(x_2, y_1)$, (1) where $x, y \in T \times T$. [sent-188, score-0.269]
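Eq. 1 is straightforward to implement on top of any base kernel; a sketch, where K could be the stk above or a linear kernel over feature vectors:

```python
def preference_kernel(K):
    """Lift a valid kernel K over hypotheses to the preference kernel of
    Eq. 1 over hypothesis pairs:
    PK(<x1, x2>, <y1, y2>) = K(x1, y1) + K(x2, y2) - K(x1, y2) - K(x2, y1)."""
    def PK(x, y):
        (x1, x2), (y1, y2) = x, y
        return K(x1, y1) + K(x2, y2) - K(x1, y2) - K(x2, y1)
    return PK
```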
59 . . . makes it possible to use kernel methods to train the reranker. [sent-193, score-0.211]
60 We explore innovative kernels K to be used in Eq. [sent-194, score-0.426]
61 1: $K_J = p(x_1) \cdot p(y_1) + S$, where p(·) is the global joint probability of a target hypothesis and S is a structural kernel, i.e. [sent-195, score-0.234]
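Read literally, K_J multiplies the global joint probabilities of the first members of the two pairs and adds a structural preference kernel S; the following hedged sketch reflects our reading of the text (joint_prob and the pairing convention are assumptions):

```python
def kj_kernel(S, joint_prob):
    """K_J(x, y) = p(x1) * p(y1) + S(x, y) for hypothesis pairs x = (x1, x2)
    and y = (y1, y2), where p is the global joint probability of a hypothesis
    and S is a structural preference kernel, e.g. preference_kernel(stk).
    `joint_prob` and the pairing convention are our assumptions."""
    def KJ(x, y):
        return joint_prob(x[0]) * joint_prob(y[0]) + S(x, y)
    return KJ
```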
62 Table 2: Comparison of rerankers using different kernels, child-full setting (KJ model). [sent-203, score-0.358]
63 Table 3: Comparison of rerankers using different kernels, child-free setting (KJ model). [sent-209, score-0.358]
64 tree t ∈ T, and S is again a structural kernel, i.e. [sent-210, score-0.228]
65 For comparative purposes, we also use for S a linear kernel over the bag-of-labels (BOL). [sent-213, score-0.211]
66 This is supposed to capture non-structural dependencies between the category labels. [sent-214, score-0.171]
67 4 Experiments The aim of the experiments is to demonstrate that our reranking approach can introduce semantic dependencies in the hierarchical classification model, which can improve accuracy. [sent-215, score-0.43]
68 For this purpose, we show that several reranking models based on tree kernels improve the classification based on the flat one-vs.-all. [sent-216, score-0.836]
69 We used the above datasets with two different settings: the child-free setting, where we removed all the documents belonging to the child nodes from the parent nodes, and the normal setting, which we refer to as child-full. [sent-232, score-0.128]
70 The binary classifiers are trained on Train1 and tested on Train2 (and vice versa) to generate the hypotheses on Train2 (Train1). [sent-242, score-0.181]
71 Figure 6: Learning curves of the reranking models using STK in terms of MicroAverage-F1, according to increasing training set size (child-free setting). [sent-252, score-0.198]
72 Figure 7: Learning curves of the reranking models using STK in terms of MacroAverage-F1, according to increasing training set size (child-free setting). [sent-253, score-0.198]
73 The rerankers are based on SVMs and the Preference Kernel (PK) described in Sec. [sent-259, score-0.316]
74 We trained the rerankers using SVM-light-TK6, which enables the use of structural kernels in SVM-light (Joachims, 1999). [sent-264, score-0.849]
75 This allows for applying kernels to pairs of trees and combining them with vector-based kernels. [sent-265, score-0.456]
76 Table 4: F1 of some binary classifiers along with the Micro- and Macro-Average F1 over all 103 categories of RCV1, with 8 hypotheses and 32k training instances for rerankers using STK. [sent-271, score-0.549]
77 4.2 Classification Accuracy. In the first experiments, we compared the different kernels using the KJ combination (which exploits the joint hypothesis probability, see Sec. [sent-275, score-0.553]
78 BOL cannot capture the same dependencies as the structural kernels. [sent-280, score-0.225]
79 In contrast, when we remove the dependencies generated by shared documents between a node and its descendants (child-free setting), BOL improves on BL. [sent-281, score-0.197]
80 To study how much data is needed for the reranker, Figures 6 and 7 report the Micro- and Macro-Average F1 of our rerankers over 103 categories, according to different amounts of training data. [sent-283, score-0.316]
81 Figure 8: Training and test time of the rerankers trained on data of increasing size. [sent-289, score-0.344]
82 This means that the use of probability vectors and their combination with structural kernels is a very promising direction for reranker design. [sent-302, score-0.63]
83 To definitively assess the benefit of our rerankers, we tested them on the Lewis split of two different datasets of RCV1, i.e. [sent-303, score-0.316]
84 It can be noted that the models using the compact hypothesis representation are much faster than those using the full hierarchy representation. [sent-317, score-0.243]
85 examples are used for training the rerankers with the child-full setting. [sent-328, score-0.316]
86 This is not surprising as, in the latter case, each kernel evaluation requires performing a tree kernel evaluation on trees of 103 nodes. [sent-335, score-0.573]
87 When using the compact representation, the number of nodes is upper-bounded by the maximum number of labels per document, i.e. [sent-336, score-0.173]
88 5 Related Work Tree and sequence kernels have been successfully used in many NLP applications, e.g. [sent-344, score-0.426]
89 : parse reranking and adaptation (Collins and Duffy, 2002; Shen et al. [sent-346, score-0.17]
90 To our knowledge, ours is the first work exploring structural kernels for reranking hierarchical text categorization hypotheses. [sent-354, score-0.881]
91 Additionally, there is a substantial lack of work exploring reranking for hierarchical text categorization. [sent-355, score-0.276]
92 , 2005), which has been applied to model dependencies expressed as category label subsets of flat categorization schemes but no solution has been attempted for hierarchical settings. [sent-362, score-0.428]
93 , 2010) can surely be applied to model dependencies in a tree; however, they require that feature templates be specified in advance, so the meaningful dependencies must already be known. [sent-364, score-0.236]
94 In contrast, kernel methods allow for automatically generating all possible dependencies, and reranking can efficiently encode them. [sent-365, score-0.541]
95 6 Conclusions In this paper, we have described several models for reranking the output of an MCC based on SVMs and structural kernels, i.e. [sent-366, score-0.277]
96 The latter are exploited by SVMs using preference kernels to automatically derive features from the hypotheses. [sent-370, score-0.484]
97 When using tree kernels, such features are tree fragments, which can encode complex semantic dependencies between categories. [sent-371, score-0.828]
98 We tested our rerankers on the entire well-known RCV1. [sent-372, score-0.316]
99 Improving text classification by shrinkage in a hierarchy of classes. [sent-458, score-0.159]
100 Efficient convolution kernels for dependency and constituent syntactic trees. [sent-462, score-0.464]
wordName wordTfidf (topN-words)
[('kernels', 0.426), ('rerankers', 0.316), ('mcat', 0.296), ('stk', 0.283), ('kernel', 0.211), ('ptk', 0.207), ('reranking', 0.17), ('lewis', 0.158), ('kj', 0.138), ('hypothesis', 0.127), ('tree', 0.121), ('dependencies', 0.118), ('sk', 0.113), ('markets', 0.111), ('structural', 0.107), ('reranker', 0.097), ('hierarchy', 0.093), ('moschitti', 0.082), ('subhierarchy', 0.079), ('reuters', 0.079), ('bol', 0.079), ('hypotheses', 0.076), ('hierarchical', 0.076), ('subsequences', 0.073), ('categorization', 0.072), ('ux', 0.069), ('compact', 0.066), ('classification', 0.066), ('kp', 0.063), ('fragments', 0.061), ('classifiers', 0.061), ('industry', 0.059), ('preference', 0.058), ('nodes', 0.057), ('tc', 0.057), ('percent', 0.057), ('schemes', 0.056), ('tsochantaridis', 0.055), ('category', 0.053), ('flat', 0.053), ('svms', 0.053), ('kudo', 0.053), ('categories', 0.052), ('rifkin', 0.052), ('oi', 0.051), ('productions', 0.051), ('representation', 0.05), ('margin', 0.05), ('mcc', 0.047), ('pk', 0.045), ('children', 0.044), ('binary', 0.044), ('tk', 0.044), ('subcategories', 0.044), ('lavergne', 0.044), ('setting', 0.042), ('encode', 0.042), ('hierarchies', 0.042), ('node', 0.041), ('duffy', 0.04), ('ailon', 0.04), ('balcan', 0.04), ('finley', 0.04), ('frr', 0.04), ('gothenburg', 0.04), ('mmaic', 0.04), ('rousu', 0.04), ('subparts', 0.04), ('ii', 0.04), ('convolution', 0.038), ('documents', 0.038), ('thousands', 0.037), ('riezler', 0.036), ('gaps', 0.036), ('metals', 0.034), ('vasserman', 0.034), ('cumby', 0.034), ('klautau', 0.034), ('rr', 0.034), ('additionally', 0.034), ('alessandro', 0.033), ('zelenko', 0.031), ('impressive', 0.031), ('chitt', 0.031), ('implicitly', 0.031), ('objects', 0.031), ('trees', 0.03), ('probable', 0.03), ('exploring', 0.03), ('gliozzo', 0.029), ('cristianini', 0.029), ('belonging', 0.029), ('joachims', 0.028), ('increasing', 0.028), ('absolute', 0.028), ('purpose', 0.028), ('organized', 0.028), ('taku', 0.028), ('cancedda', 0.028), ('complexity', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
Author: Alessandro Moschitti ; Qi Ju ; Richard Johansson
Abstract: In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. Additionally, to better investigate the role ofcategory relationships, we consider two interesting cases: (i) traditional schemes in which node-fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers. The extensive experimentation on Reuters Corpus Volume 1 shows that our rerankers inject effective structural semantic dependencies in multi-classifiers and significantly outperform the state-of-the-art.
2 0.30065462 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing
Author: Alessandro Moschitti
Abstract: unknown-abstract
Author: Zhaopeng Tu ; Yifan He ; Jennifer Foster ; Josef van Genabith ; Qun Liu ; Shouxun Lin
Abstract: Convolution kernels support the modeling of complex syntactic information in machinelearning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of substructures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees guided by a polarity lexicon show 1.45 pointabsoluteimprovementinaccuracy overa bag-of-words classifier on a widely used sentiment corpus. 1
4 0.29193372 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer
Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture mean- ingful syntactic/semantic structures, which allows for improving the state-of-the-art.
5 0.18118362 184 acl-2012-String Re-writing Kernel
Author: Fan Bu ; Hang Li ; Xiaoyan Zhu
Abstract: Learning for sentence re-writing is a fundamental task in natural language processing and information retrieval. In this paper, we propose a new class of kernel functions, referred to as string re-writing kernel, to address the problem. A string re-writing kernel measures the similarity between two pairs of strings, each pair representing re-writing of a string. It can capture the lexical and structural similarity between two pairs of sentences without the need of constructing syntactic trees. We further propose an instance of string rewriting kernel which can be computed efficiently. Experimental results on benchmark datasets show that our method can achieve better results than state-of-the-art methods on two sentence re-writing learning tasks: paraphrase identification and recognizing textual entailment.
6 0.13650265 154 acl-2012-Native Language Detection with Tree Substitution Grammars
7 0.081831761 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
8 0.077216908 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
9 0.076796107 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
10 0.072543383 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
11 0.066226438 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
12 0.066130057 71 acl-2012-Dependency Hashing for n-best CCG Parsing
13 0.06552732 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
14 0.064863637 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
15 0.062833384 78 acl-2012-Efficient Search for Transformation-based Inference
16 0.061770234 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
17 0.061435387 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies
18 0.06136547 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
19 0.05876448 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
20 0.057007581 83 acl-2012-Error Mining on Dependency Trees
topicId topicWeight
[(0, -0.198), (1, 0.061), (2, -0.085), (3, -0.152), (4, -0.093), (5, -0.037), (6, -0.039), (7, 0.275), (8, -0.183), (9, -0.108), (10, -0.07), (11, -0.21), (12, 0.204), (13, 0.127), (14, 0.117), (15, 0.17), (16, -0.206), (17, 0.149), (18, 0.085), (19, -0.07), (20, -0.031), (21, 0.006), (22, 0.174), (23, -0.004), (24, -0.098), (25, -0.069), (26, -0.068), (27, 0.041), (28, -0.071), (29, -0.111), (30, -0.006), (31, 0.081), (32, 0.002), (33, 0.039), (34, 0.03), (35, -0.041), (36, -0.019), (37, -0.016), (38, 0.01), (39, -0.017), (40, 0.011), (41, 0.057), (42, -0.01), (43, -0.001), (44, -0.011), (45, 0.014), (46, -0.049), (47, -0.016), (48, 0.019), (49, 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.94876915 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
Author: Alessandro Moschitti ; Qi Ju ; Richard Johansson
Abstract: In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. Additionally, to better investigate the role ofcategory relationships, we consider two interesting cases: (i) traditional schemes in which node-fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers. The extensive experimentation on Reuters Corpus Volume 1 shows that our rerankers inject effective structural semantic dependencies in multi-classifiers and significantly outperform the state-of-the-art.
2 0.88201958 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing
Author: Alessandro Moschitti
Abstract: unknown-abstract
Author: Zhaopeng Tu ; Yifan He ; Jennifer Foster ; Josef van Genabith ; Qun Liu ; Shouxun Lin
Abstract: Convolution kernels support the modeling of complex syntactic information in machinelearning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of substructures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees guided by a polarity lexicon show 1.45 pointabsoluteimprovementinaccuracy overa bag-of-words classifier on a widely used sentiment corpus. 1
4 0.71331453 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer
Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture mean- ingful syntactic/semantic structures, which allows for improving the state-of-the-art.
5 0.65809739 184 acl-2012-String Re-writing Kernel
Author: Fan Bu ; Hang Li ; Xiaoyan Zhu
Abstract: Learning for sentence re-writing is a fundamental task in natural language processing and information retrieval. In this paper, we propose a new class of kernel functions, referred to as string re-writing kernel, to address the problem. A string re-writing kernel measures the similarity between two pairs of strings, each pair representing re-writing of a string. It can capture the lexical and structural similarity between two pairs of sentences without the need of constructing syntactic trees. We further propose an instance of string rewriting kernel which can be computed efficiently. Experimental results on benchmark datasets show that our method can achieve better results than state-of-the-art methods on two sentence re-writing learning tasks: paraphrase identification and recognizing textual entailment.
6 0.38102761 154 acl-2012-Native Language Detection with Tree Substitution Grammars
7 0.30904162 83 acl-2012-Error Mining on Dependency Trees
8 0.27906996 200 acl-2012-Toward Automatically Assembling Hittite-Language Cuneiform Tablet Fragments into Larger Texts
9 0.25560284 50 acl-2012-Collective Classification for Fine-grained Information Status
10 0.25027511 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
11 0.24132961 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
12 0.23707043 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
13 0.23132104 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars
14 0.22728516 139 acl-2012-MIX Is Not a Tree-Adjoining Language
15 0.22141649 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
16 0.21225049 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
17 0.21109867 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
18 0.21010306 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
19 0.20824455 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
20 0.20763601 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
topicId topicWeight
[(22, 0.229), (25, 0.021), (26, 0.031), (28, 0.042), (30, 0.03), (37, 0.107), (39, 0.044), (48, 0.013), (74, 0.033), (82, 0.036), (84, 0.026), (85, 0.023), (90, 0.105), (92, 0.086), (94, 0.037), (99, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.77087158 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
Author: Alessandro Moschitti ; Qi Ju ; Richard Johansson
Abstract: In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. Additionally, to better investigate the role ofcategory relationships, we consider two interesting cases: (i) traditional schemes in which node-fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers. The extensive experimentation on Reuters Corpus Volume 1 shows that our rerankers inject effective structural semantic dependencies in multi-classifiers and significantly outperform the state-of-the-art.
2 0.71828616 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation
Author: Ziheng Lin ; Chang Liu ; Hwee Tou Ng ; Min-Yen Kan
Abstract: An ideal summarization system should produce summaries that have high content coverage and linguistic quality. Many state-ofthe-art summarization systems focus on content coverage by extracting content-dense sentences from source articles. A current research focus is to process these sentences so that they read fluently as a whole. The current AESOP task encourages research on evaluating summaries on content, readability, and overall responsiveness. In this work, we adapt a machine translation metric to measure content coverage, apply an enhanced discourse coherence model to evaluate summary readability, and combine both in a trained regression model to evaluate overall responsiveness. The results show significantly improved performance over AESOP 2011 submitted metrics.
3 0.62630731 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer
Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture mean- ingful syntactic/semantic structures, which allows for improving the state-of-the-art.
4 0.62221026 64 acl-2012-Crosslingual Induction of Semantic Roles
Author: Ivan Titov ; Alexandre Klementiev
Abstract: We argue that multilingual parallel data provides a valuable source of indirect supervision for induction of shallow semantic representations. Specifically, we consider unsupervised induction of semantic roles from sentences annotated with automatically-predicted syntactic dependency representations and use a stateof-the-art generative Bayesian non-parametric model. At inference time, instead of only seeking the model which explains the monolingual data available for each language, we regularize the objective by introducing a soft constraint penalizing for disagreement in argument labeling on aligned sentences. We propose a simple approximate learning algorithm for our set-up which results in efficient inference. When applied to German-English parallel data, our method obtains a substantial improvement over a model trained without using the agreement signal, when both are tested on non-parallel sentences.
5 0.62124598 71 acl-2012-Dependency Hashing for n-best CCG Parsing
Author: Dominick Ng ; James R. Curran
Abstract: Optimising for one grammatical representation, but evaluating over a different one is a particular challenge for parsers and n-best CCG parsing. We find that this mismatch causes many n-best CCG parses to be semantically equivalent, and describe a hashing technique that eliminates this problem, improving oracle n-best F-score by 0.7% and reranking accuracy by 0.4%. We also present a comprehensive analysis of errors made by the C&C; CCG parser, providing the first breakdown of the impact of implementation decisions, such as supertagging, on parsing accuracy.
7 0.6002627 114 acl-2012-IRIS: a Chat-oriented Dialogue System based on the Vector Space Model
8 0.59943455 42 acl-2012-Bootstrapping via Graph Propagation
9 0.59939665 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
10 0.59615773 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
11 0.58361536 31 acl-2012-Authorship Attribution with Author-aware Topic Models
12 0.58149236 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
13 0.57393187 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
14 0.57234168 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
15 0.56959063 191 acl-2012-Temporally Anchored Relation Extraction
16 0.56696296 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
17 0.56539738 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
18 0.5643428 167 acl-2012-QuickView: NLP-based Tweet Search
19 0.56430525 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models
20 0.56325144 184 acl-2012-String Re-writing Kernel