acl acl2013 acl2013-144 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matt Post ; Shane Bergsma
Abstract: Syntactic features are useful for many text classification tasks. Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy and always in orders of magnitude less time, and with smaller models. Since explicit features are easy to generate and use (with publicly available tools), we suggest they should always be included as baseline comparisons in tree kernel method evaluations.
Reference: text
sentIndex sentText sentNum sentScore
1 Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. [sent-2, score-1.108]
2 We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy and always in orders of magnitude less time, and with smaller models. [sent-3, score-2.103]
3 Since explicit features are easy to generate and use (with publicly available tools), we suggest they should always be included as baseline comparisons in tree kernel method evaluations. [sent-4, score-0.636]
4 1 Introduction Features computed over parse trees are useful for a range of discriminative tasks, including authorship attribution (Baayen et al. [sent-5, score-0.217]
5 , 1996), parse reranking (Collins and Duffy, 2002), language modeling (Cherry and Quirk, 2008), and native-language detection (Wong and Dras, 2011). [sent-6, score-0.104]
6 A major distinction among these uses of syntax is how the features are represented. [sent-7, score-0.055]
7 The implicit approach uses tree kernels (Collins and Duffy, 2001), which make predictions with inner products between tree pairs. [sent-8, score-0.944]
8 These products can be computed efficiently with a dynamic program that produces weighted counts of all the shared tree fragments between a pair of trees, essentially incorporating all fragments without representing any of them explicitly. [sent-9, score-0.531]
9 Tree kernel approaches have been applied successfully in many areas of NLP (Collins and Duffy, 2002; Moschitti, 2004; Pighin and Moschitti, 2009). [sent-10, score-0.152]
10 Tree kernels were inspired in part by ideas from Data-Oriented Parsing (Scha, 1990; Bod, 1993), which was in turn motivated by uncertainty about which fragments to include in a grammar. [sent-11, score-0.529]
11 However, manual and automatic approaches to inducing tree fragments have recently been found to be useful in an explicit approach to text classification, which employs specific tree fragments as features in standard classifiers (Post, 2011; Wong and Dras, 2011; Swanson and Charniak, 2012). [sent-12, score-0.978]
12 These feature sets necessarily represent only a small subset of all possible tree patterns, leaving open the question of what further gains might be had from the unused fragments. [sent-13, score-0.329]
13 Somewhat surprisingly, explicit and implicit syntactic features have been explored largely independently. [sent-14, score-0.287]
14 Here, we compare them on a range of classification tasks: (1,2) grammatical classification (is a sentence written by a human? [sent-15, score-0.204]
15 ) , (3) question classification (what type of answer is sought by this question? [sent-16, score-0.195]
16 ) , and (4,5) native language prediction (what is the native language of a text’s author? [sent-17, score-0.334]
17 Our main contribution is to show that an explicit syntactic feature set performs as well or better than tree kernels on each tested task, and in orders of magnitude less time. [sent-19, score-0.952]
18 Since explicit features are simple to generate (with publicly available tools) and flexible to use, we recommend they be included as baseline comparisons in tree kernel method evaluations. [sent-20, score-0.636]
19 © 2013 Association for Computational Linguistics, pages 866–872. CFG rules Counts of depth-one context-free grammar (CFG) productions obtained from the Berkeley parser (Petrov et al. [sent-24, score-0.043]
20 C&J features The parse-tree reranking feature set of Charniak and Johnson (2005), extracted from the Berkeley parse trees. [sent-26, score-0.159]
21 TSG features We also parsed with a Bayesian tree substitution grammar (Post and Gildea, 2009, TSG) and extracted fragment counts from Viterbi derivations. [sent-27, score-0.365]
22 We divided each dataset into training, dev, and test sets. [sent-30, score-0.033]
23 We then trained an L2-regularized L1-loss support vector machine (-s 3) with a bias parameter of 1 (-B 1), optimizing the regularization parameter (-c) on the dev set over the range {0. [sent-31, score-0.063]
24 For tree kernels, we used SVM-light-TK (Moschitti, 2004; Moschitti, 2006) with the default settings (-t 5 -D 1 -L 0 . [sent-39, score-0.253]
25 We tuned the regularization parameter (-c) on the dev set in the same manner as described above, providing 4 GB of memory to the kernel cache (-m 4000). [sent-41, score-0.215]
26 We used subset tree kernels, which compute the similarity between two trees by implicitly enumerating all possible fragments of the trees (in contrast with subtree kernels, where all fragments fully extend to the leaves). [sent-42, score-0.619]
27 1 Coarse grammatical classification Our first comparison is coarse grammatical classification, where the goal is to distinguish between human-written sentences and “pseudo-negative” sentences sampled from a trigram language model constructed from inprovement. [sent-45, score-0.313]
28 Optimizing SVM-TK’s decay parameter (-L) did not improve test-set accuracy, but did increase training time (squaring the number of hyperparameter combinations to evaluate), so we stuck with the default. [sent-54, score-0.033]
29 We repeat Post’s experiments on the BLLIP dataset, using his exact data splits (Table 2). [sent-64, score-0.095]
30 To our knowledge, tree kernels have not been applied to this task. [sent-65, score-0.643]
31 2 Fine grammatical classification Real-world grammaticality judgments require much finer-grained distinctions than the coarse ones of the previous section (for example, marking dropped determiners or wrong verb inflections). [sent-67, score-0.49]
32 LDC2000T43 867 system accuracy CPU time Wong & Dras60. [sent-71, score-0.051]
33 Table 3: Fine-grained classification accuracy (the Wong and Dras (2010) score is the highest score from the last column of their Table 3). [sent-78, score-0.134]
34 in into the parse trees from the positive data using GenERRate (Foster and Andersen, 2009) . [sent-82, score-0.101]
35 Wong and Dras (2010) reported good results with parsers trained separately on the positive and negative sides of the training data and classifiers built from comparisons between the CFG productions of those parsers. [sent-84, score-0.113]
36 We obtained their data splits (described as NoisyWSJ in their paper) and repeat their experiments here (Table 3). [sent-85, score-0.095]
37 3 Question Classification We look next at question classification (QC). [sent-87, score-0.159]
38 Li and Roth (2002) introduced the TREC-10 dataset, a set of questions paired with labels that categorize the question by the type of answer it seeks. [sent-88, score-0.112]
39 The labels are organized hierarchically into six (coarse) top-level labels and fifty (fine) refinements. [sent-89, score-0.052]
40 An example question from the ENTY/animal category is What was the first domesticated bird? [sent-90, score-0.076]
41 Table 4 contains results predicting just the coarse labels. [sent-92, score-0.154]
42 We compare to Pighin and Moschitti (2009) , and also repeat their experiments, finding a slightly better result for them. [sent-93, score-0.048]
43 We also experimented with the refined version of the task, where we directly predict one of the fifty refined categories, and found nearly identical relative results, with the best explicit feature set (CFG) returning an accuracy of 83. [sent-111, score-0.276]
44 2% accuracy when training on the full training set (5,500 examples) with an SVM and bag-of-words features. [sent-115, score-0.117]
45 4 Native language identification Native language identification (NLI) is the task of determining a text’s author’s native language. [sent-117, score-0.231]
46 This is usually cast as a document-level task, since there are often not enough cues to identify native languages at smaller granularities. [sent-118, score-0.167]
47 As such, this task presents a challenge to tree kernels, which are defined at the level of a single parse tree and have no obvious document-level extension. [sent-119, score-0.563]
48 Table 5 therefore presents three evaluations: (a) sentence-level accuracy, and document-level accuracy from (b) sentence-level voting and (c) direct, whole-document classification. [sent-120, score-0.083]
49 In order to mitigate topic bias10 and other problems that have been reported with 9Pighin and Moschitti (2009) did not report results on this version of the task. [sent-122, score-0.034]
50 , 2012), we preprocessed each dataset into two signature-stylized versions by replacing all words not in a stopword list. [sent-127, score-0.075]
51 The first version replaces non-stopwords with word classes computed from surface-form signatures, and the second with POS tags. [sent-128, score-0.034]
52 N-gram features are then taken from both stylized versions of the corpus. [sent-129, score-0.055]
53 Restricting the feature representation to be topic-independent is standard practice in stylometric tasks like authorship attribution, gender identification, and native-language identification (Mosteller and Wallace, 1984; Koppel et al. [sent-130, score-0.196]
54 2 The first dataset is a seven-language subset of the International Corpus of Learner English, Version 2 (ICLE) (Granger et al. [sent-135, score-0.033]
55 7 million words of English documents written by people with sixteen different native languages. [sent-137, score-0.167]
56 Table 1 contains scores, including one reported by Wong and Dras (2011), who used the CFG and C&J features, and whose data splits we mirror. [sent-138, score-0.047]
57 2 ACL Anthology Network We also experimented with native language classification on scientific documents using a version of the ACL Anthology Network (Radev et al. [sent-141, score-0.284]
58 , 2009, AAN) annotated for experiments in stylometric tasks, including a native/non-native author judgment (Bergsma et al. [sent-142, score-0.035]
59 For NLI, we further annotated this dataset in a semi-automatic fashion for the five most-common native languages of ACL authors in our training era: English, Japanese, German, Chinese, and French. [sent-144, score-0.233]
60 The stopword list contains the set of 524 SMART-system stopwords used by Tomokiyo and Jones (2001), plus punctuation and Latin abbreviations. [sent-147, score-0.042]
61 test accuracy for coarse grammaticality, plotting test scores from models trained on 100, 300, 1k, 3k, 10k, 30k, and 100k instances. [sent-158, score-0.205]
62 4 Discussion Syntactic features improve upon the n-gram baseline for all tasks except whole-document classification for ICLE. [sent-159, score-0.173]
63 Tree kernels are often among the best, but always trail (by orders of magnitude) when runtime is considered. [sent-160, score-0.482]
64 Constructing the multi-class SVM-TK models for the NLI tasks in particular was computationally burdensome, requiring cpu-months of time. [sent-161, score-0.035]
65 The C&J features are similarly often the best, but incur a runtime cost due to the large models. [sent-162, score-0.086]
66 CFG and TSG features balance performance, model size, and runtime. [sent-163, score-0.055]
67 1 Training time versus accuracy Tree kernel training is quadratic in the size of the training data, and its empirical slowness is known. [sent-166, score-0.269]
68 We compared models trained on the first 100, 300, 1k, 3k, 10k, 30k, and 100k data points of the coarse grammaticality dataset, split evenly between positive and negative examples (Figure 1). [sent-168, score-0.369]
69 SVM-TK improves over the TSG and CFG models in the limit, but at an extraordinary cost in training time: 100k training examples is already pushing the bounds of practicality for tree kernel learning, and generating the curve’s next point would require several months of time. [sent-169, score-0.471]
70 Approximate kernel methods designed to scale to large datasets address this (Severyn and Moschitti, 2010). [sent-172, score-0.152]
71 We investigated the uSVM-TK toolkit, which enables tuning the tradeoff between training time and accuracy. [sent-173, score-0.033]
72 While faster than SVM-TK, its performance was never better than explicit methods along both dimensions (time and accuracy) . [sent-174, score-0.139]
73 2 Overfitting Overfitting is also a problem for kernel methods. [sent-176, score-0.152]
74 The best models often had a huge number of support vectors, achieving near-perfect accuracy on the training set but making many errors on the dev. [sent-177, score-0.084]
75 On the ICLE task, close to 75% of all the training examples were used as support vectors. [sent-179, score-0.033]
76 We found only half as many support vectors used for the explicit representations, implying less error (Vapnik, 1998), and saw much lower variance between training and testing performance. [sent-180, score-0.172]
77 Our findings support the observations of Cumby and Roth (2003), who point out that kernels introduce a large number of irrelevant features that may be especially harmful in small-data settings, and that, when possible, it is often better to have a set of explicit, relevant features. [sent-183, score-0.445]
78 In other words, it is better to have the right features than all of them. [sent-184, score-0.055]
79 Tree kernels provide a robust, efficiently-computable measure of comparison, but they also skirt the difficult question, Which fragments? [sent-185, score-0.39]
80 Table 6) presents an intuitive list from the coarse grammaticality task: phenomena such as balanced parenthetical phrases and quotations are associated with grammaticality, while small, flat, abstract rules indicate samples from the n-gram model. [sent-188, score-0.369]
81 The immediate interpretability of the explicit formalisms is another advantage, although recent work has shown that weights on the implicit features can also be obtained after a kind of linearization of the tree kernel (Pighin and Moschitti, 2009). [sent-190, score-0.647]
82 Ultimately, which features matter is task-dependent, and skirting the question is advantageous in many settings. [sent-191, score-0.175]
83 But it is also encouraging that methods for selecting fragments and other tree features work so well, [sent-192, score-0.447]
84 Table 6: The highest- and lowest-weighted TSG features (coarse grammaticality). [sent-199, score-0.055]
85 yielding quick, light-weight models that contrast with the heavy machinery of tree kernels. [sent-200, score-0.253]
86 5 Conclusion Tree kernels provide a robust measure of comparison between trees, effectively making use of all fragments. [sent-201, score-0.39]
87 In addition to their flexibility and interpretability, explicit syntactic features often outperformed tree kernels in accuracy, and even where they did not, the cost was multiple orders of magnitude increase in both training and testing time. [sent-203, score-1.04]
88 These results were consistent across a range of task types, dataset sizes, and classification arities (binary and multiclass) . [sent-204, score-0.116]
89 We explored a range of data settings, but there are many others where tree kernels have been proven useful, such as parse tree reranking (Collins and Duffy, 2002; Shen and Joshi, 2003), sentence subjectivity (Suzuki et al. [sent-206, score-1.0]
90 There are also tree kernel variations such as dependency tree kernels (Culotta and Sorensen, 2004) and shallow semantic tree kernels (Moschitti et al. [sent-209, score-1.725]
91 Outside the cave of shadows: Using syntactic annotation to enhance authorship attribution. [sent-214, score-0.119]
92 New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. [sent-244, score-0.39]
93 Exploiting syntactic and shallow semantic kernels for question answer classification. [sent-286, score-0.581]
94 A study on convolution kernels for shallow semantic parsing. [sent-291, score-0.502]
95 An SVM-based voting algorithm with application to parse reranking. [sent-358, score-0.089]
96 Convolution kernels with feature selection for natural language processing tasks. [sent-363, score-0.39]
97 Native tongues, lost and found: Resources and empirical evaluations in native language identification. [sent-373, score-0.167]
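The contrast drawn in sentences 7, 8, 19, and 26 above (an implicit kernel that counts all shared fragments versus explicit counts of depth-one CFG productions) can be illustrated with a small sketch. This is not the authors' code or the SVM-light-TK implementation: the tuple tree encoding, the function names, and the toy trees are assumptions made for illustration, and the recursion follows the standard Collins and Duffy (2001) formulation with a decay factor that plays a role analogous to the -L option mentioned above.

```python
from collections import Counter
from functools import lru_cache

# Assumed tree encoding (illustrative only): (label, child, ...) tuples,
# with words represented as plain strings at the leaves.

def production(node):
    """Depth-one rule at a node; nonterminal children are wrapped in a
    1-tuple so that they never compare equal to a word."""
    return (node[0],) + tuple(c if isinstance(c, str) else (c[0],) for c in node[1:])

def internal_nodes(tree):
    """Yield every non-leaf node of the tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def cfg_rule_counts(tree):
    """Explicit features: counts of depth-one CFG productions."""
    return Counter(production(n) for n in internal_nodes(tree))

def explicit_similarity(t1, t2):
    """Inner product of the two explicit rule-count vectors."""
    c1, c2 = cfg_rule_counts(t1), cfg_rule_counts(t2)
    return sum(count * c2[rule] for rule, count in c1.items())

def subset_tree_kernel(t1, t2, decay=1.0):
    """Implicit features: weighted count of all shared tree fragments,
    computed by dynamic programming over node pairs."""
    @lru_cache(maxsize=None)
    def common(n1, n2):
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):  # equal productions, so c2 is a node too
                score *= 1.0 + common(c1, c2)
        return score
    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

t1 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
t2 = ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "barks")))
print(explicit_similarity(t1, t2))
print(subset_tree_kernel(t1, t2))
```

On these two toy trees the explicit inner product is 5 (five shared productions), while the kernel value with decay 1.0 is 15, because the kernel also credits every larger fragment the trees share. The explicit vectors can be passed to any linear classifier, whereas the kernel value must be recomputed for each pair of trees, which is consistent with the quadratic training cost noted in sentence 67.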
wordName wordTfidf (topN-words)
[('kernels', 0.39), ('tree', 0.253), ('grammaticality', 0.215), ('moschitti', 0.204), ('native', 0.167), ('cfg', 0.159), ('tsg', 0.157), ('coarse', 0.154), ('kernel', 0.152), ('pighin', 0.143), ('fragments', 0.139), ('dras', 0.139), ('icle', 0.139), ('explicit', 0.139), ('wong', 0.131), ('duffy', 0.119), ('cl', 0.117), ('np', 0.092), ('tomokiyo', 0.09), ('nli', 0.09), ('anthology', 0.083), ('crftagger', 0.083), ('classification', 0.083), ('convolution', 0.078), ('collins', 0.077), ('post', 0.076), ('question', 0.076), ('authorship', 0.074), ('alessandro', 0.071), ('generrate', 0.068), ('jojo', 0.068), ('mosteller', 0.068), ('okanohara', 0.068), ('systemaccuracycpu', 0.068), ('culotta', 0.066), ('magnitude', 0.064), ('dev', 0.063), ('matt', 0.062), ('orders', 0.061), ('swanson', 0.06), ('prn', 0.06), ('severyn', 0.06), ('parse', 0.057), ('substitution', 0.057), ('cherry', 0.056), ('stylometric', 0.055), ('features', 0.055), ('bllip', 0.052), ('fifty', 0.052), ('granger', 0.052), ('koppel', 0.052), ('accuracy', 0.051), ('sorensen', 0.05), ('baayen', 0.05), ('pp', 0.049), ('fine', 0.048), ('repeat', 0.048), ('implicit', 0.048), ('vp', 0.048), ('tetreault', 0.048), ('reranking', 0.047), ('splits', 0.047), ('nigel', 0.046), ('syntactic', 0.045), ('charniak', 0.045), ('advantageous', 0.044), ('trees', 0.044), ('hour', 0.043), ('productions', 0.043), ('japanese', 0.042), ('attribution', 0.042), ('stopword', 0.042), ('dt', 0.041), ('shane', 0.041), ('literary', 0.041), ('bergsma', 0.039), ('minutes', 0.039), ('usa', 0.038), ('grammatical', 0.038), ('comparisons', 0.037), ('acl', 0.036), ('answer', 0.036), ('tasks', 0.035), ('wsj', 0.035), ('suzuki', 0.035), ('author', 0.035), ('shallow', 0.034), ('version', 0.034), ('training', 0.033), ('radev', 0.033), ('dataset', 0.033), ('identification', 0.032), ('fan', 0.032), ('voting', 0.032), ('overfitting', 0.032), ('runtime', 0.031), ('foster', 0.031), ('pronoun', 0.031), ('jones', 0.031), ('network', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
Author: Matt Post ; Shane Bergsma
Abstract: Syntactic features are useful for many text classification tasks. Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy and always in orders of magnitude less time, and with smaller models. Since explicit features are easy to generate and use (with publicly available tools), we suggest they should always be included as baseline comparisons in tree kernel method evaluations.
2 0.29511607 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
Author: Barbara Plank ; Alessandro Moschitti
Abstract: Relation Extraction (RE) is the task of extracting semantic relationships between entities in text. Recent studies on relation extraction are mostly supervised. The clear drawback of supervised methods is the need of training data: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to. This is the problem of domain adaptation. In this paper, we propose to combine (i) term generalization approaches such as word clustering and latent semantic analysis (LSA) and (ii) structured kernels to improve the adaptability of relation extractors to new text genres/domains. The empirical evaluation on ACE 2005 domains shows that a suitable combination of syntax and lexical generalization is very promising for domain adaptation.
3 0.24716631 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations
Author: Aliaksei Severyn ; Massimo Nicosia ; Alessandro Moschitti
Abstract: Measuring semantic textual similarity (STS) is at the cornerstone of many NLP applications. Different from the majority of approaches, where a large number of pairwise similarity features are used to represent a text pair, our model features the following: (i) it directly encodes input texts into relational syntactic structures; (ii) relies on tree kernels to handle feature engineering automatically; (iii) combines both structural and feature vector representations in a single scoring model, i.e., in Support Vector Regression (SVR); and (iv) delivers significant improvement over the best STS systems.
4 0.16750666 357 acl-2013-Transfer Learning for Constituency-Based Grammars
Author: Yuan Zhang ; Regina Barzilay ; Amir Globerson
Abstract: In this paper, we consider the problem of cross-formalism transfer in parsing. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific for the target formalism, and a large quantity of coarse CFG annotations from the Penn Treebank. While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features. To handle this apparent discrepancy, we design a probabilistic model that jointly generates CFG and target formalism parses. The model includes features of both parses, allowing transfer between the formalisms, while preserving parsing efficiency. We evaluate our approach on three constituency-based grammars (CCG, HPSG, and LFG), augmented with the Penn Treebank-1. Our experiments show that across all three formalisms, the target parsers significantly benefit from the coarse annotations.
5 0.15072316 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: In the line of research extending statistical parsing to more expressive grammar formalisms, we demonstrate for the first time the use of tree-adjoining grammars (TAG). We present a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing. Our work shows performance improvements on the Penn Treebank and finds more compact yet linguistically rich representations of the data, but more importantly provides techniques in grammar transformation and statistical inference that make practical the use of these more expressive systems, thereby enabling further experimentation along these lines.
6 0.14210929 4 acl-2013-A Context Free TAG Variant
8 0.12293123 292 acl-2013-Question Classification Transfer
9 0.12201045 296 acl-2013-Recognizing Identical Events with Graph Kernels
10 0.11817048 299 acl-2013-Reconstructing an Indo-European Family Tree from Non-native English Texts
11 0.11410574 57 acl-2013-Arguments and Modifiers from the Learner's Perspective
12 0.099913739 310 acl-2013-Semantic Frames to Predict Stock Price Movement
13 0.096935377 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
14 0.087122872 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
15 0.08585307 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
16 0.085342489 275 acl-2013-Parsing with Compositional Vector Grammars
17 0.084688872 314 acl-2013-Semantic Roles for String to Tree Machine Translation
18 0.078721114 80 acl-2013-Chinese Parsing Exploiting Characters
19 0.078666002 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
20 0.077758186 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
topicId topicWeight
[(0, 0.231), (1, -0.024), (2, -0.086), (3, -0.039), (4, -0.096), (5, 0.053), (6, 0.071), (7, -0.068), (8, 0.05), (9, 0.013), (10, 0.074), (11, 0.046), (12, 0.025), (13, -0.011), (14, -0.108), (15, -0.026), (16, 0.033), (17, 0.151), (18, -0.114), (19, 0.082), (20, 0.215), (21, 0.068), (22, 0.16), (23, -0.03), (24, -0.104), (25, -0.036), (26, 0.016), (27, -0.121), (28, -0.008), (29, 0.01), (30, -0.167), (31, 0.158), (32, 0.039), (33, -0.008), (34, -0.165), (35, 0.078), (36, 0.035), (37, -0.016), (38, 0.097), (39, 0.088), (40, -0.081), (41, 0.169), (42, 0.014), (43, -0.008), (44, 0.032), (45, -0.079), (46, 0.02), (47, -0.074), (48, 0.007), (49, 0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.95022428 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
Author: Matt Post ; Shane Bergsma
Abstract: Syntactic features are useful for many text classification tasks. Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy and always in orders of magnitude less time, and with smaller models. Since explicit features are easy to generate and use (with publicly available tools), we suggest they should always be included as baseline comparisons in tree kernel method evaluations.
2 0.77499598 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations
Author: Aliaksei Severyn ; Massimo Nicosia ; Alessandro Moschitti
Abstract: Measuring semantic textual similarity (STS) is at the cornerstone of many NLP applications. Different from the majority of approaches, where a large number of pairwise similarity features are used to represent a text pair, our model features the following: (i) it directly encodes input texts into relational syntactic structures; (ii) relies on tree kernels to handle feature engineering automatically; (iii) combines both structural and feature vector representations in a single scoring model, i.e., in Support Vector Regression (SVR); and (iv) delivers significant improvement over the best STS systems.
3 0.74708074 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
Author: Barbara Plank ; Alessandro Moschitti
Abstract: Relation Extraction (RE) is the task of extracting semantic relationships between entities in text. Recent studies on relation extraction are mostly supervised. The clear drawback of supervised methods is the need of training data: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to. This is the problem of domain adaptation. In this paper, we propose to combine (i) term generalization approaches such as word clustering and latent semantic analysis (LSA) and (ii) structured kernels to improve the adaptability of relation extractors to new text genres/domains. The empirical evaluation on ACE 2005 domains shows that a suitable combination of syntax and lexical generalization is very promising for domain adaptation.
4 0.55689287 4 acl-2013-A Context Free TAG Variant
Author: Ben Swanson ; Elif Yamangil ; Eugene Charniak ; Stuart Shieber
Abstract: We propose a new variant of Tree-Adjoining Grammar that allows adjunction of full wrapping trees but still bears only context-free expressivity. We provide a transformation to context-free form, and a further reduction in probabilistic model size through factorization and pooling of parameters. This collapsed context-free form is used to implement efficient grammar estimation and parsing algorithms. We perform parsing experiments on the Penn Treebank and draw comparisons to Tree-Substitution Grammars and between different variations in probabilistic model design. Examination of the most probable derivations reveals examples of the linguistically relevant structure that our variant makes possible.
5 0.54453665 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: In the line of research extending statistical parsing to more expressive grammar formalisms, we demonstrate for the first time the use of tree-adjoining grammars (TAG). We present a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing. Our work shows performance improvements on the Penn Treebank and finds more compact yet linguistically rich representations of the data, but more importantly provides techniques in grammar transformation and statistical inference that make practical the use of these more expressive systems, thereby enabling further experimentation along these lines.
6 0.53568453 299 acl-2013-Reconstructing an Indo-European Family Tree from Non-native English Texts
7 0.51325911 57 acl-2013-Arguments and Modifiers from the Learner's Perspective
8 0.50034106 357 acl-2013-Transfer Learning for Constituency-Based Grammars
9 0.49853516 310 acl-2013-Semantic Frames to Predict Stock Price Movement
10 0.49525693 165 acl-2013-General binarization for parsing and translation
12 0.47881675 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
13 0.45463043 163 acl-2013-From Natural Language Specifications to Program Input Parsers
14 0.4536294 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
15 0.44294161 14 acl-2013-A Novel Classifier Based on Quantum Computation
16 0.44100678 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints
17 0.4397392 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
18 0.41216755 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
19 0.40322584 275 acl-2013-Parsing with Compositional Vector Grammars
20 0.40254551 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation
topicId topicWeight
[(0, 0.047), (2, 0.012), (6, 0.05), (11, 0.053), (14, 0.033), (15, 0.017), (24, 0.065), (25, 0.184), (26, 0.078), (28, 0.017), (35, 0.077), (42, 0.054), (48, 0.045), (70, 0.071), (88, 0.053), (90, 0.019), (95, 0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.84030282 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
Author: Matt Post ; Shane Bergsma
Abstract: Syntactic features are useful for many text classification tasks. Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy and always in orders of magnitude less time, and with smaller models. Since explicit features are easy to generate and use (with publicly available tools), we suggest they should always be included as baseline comparisons in tree kernel method evaluations.
2 0.81334656 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study
Author: Adrien Barbaresi
Abstract: We present a way to extract links from messages published on microblogging platforms and we classify them according to the language and possible relevance of their target in order to build a text corpus. Three platforms are taken into consideration: FriendFeed, identi.ca and Reddit, as they account for a relative diversity of user profiles and more importantly user languages. In order to explore them, we introduce a traversal algorithm based on user pages. As we target lesser-known languages, we try to focus on non-English posts by filtering out English text. Using mature open-source software from the NLP research field, a spell checker (aspell) and a language identification system (langid.py), our case study and our benchmarks give an insight into the linguistic structure of the considered services.
3 0.765553 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations
Author: Jong-Hoon Oh ; Kentaro Torisawa ; Chikara Hashimoto ; Motoki Sano ; Stijn De Saeger ; Kiyonori Ohtake
Abstract: In this paper, we explore the utility of intra- and inter-sentential causal relations between terms or clauses as evidence for answering why-questions. To the best of our knowledge, this is the first work that uses both intra- and inter-sentential causal relations for why-QA. We also propose a method for assessing the appropriateness of causal relations as answers to a given question using the semantic orientation of excitation proposed by Hashimoto et al. (2012). By applying these ideas to Japanese why-QA, we improved precision by 4.4% against all the questions in our test set over the current state-of-the-art system for Japanese why-QA. In addition, unlike the state-of-the-art system, our system could achieve very high precision (83.2%) for 25% of all the questions in the test set by restricting its output to the confident answers only.
4 0.73223907 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?
Author: Nadir Durrani ; Alexander Fraser ; Helmut Schmid ; Hieu Hoang ; Philipp Koehn
Abstract: The phrase-based and N-gram-based SMT frameworks complement each other. While the former is better able to memorize, the latter provides a more principled model that captures dependencies across phrasal boundaries. Some work has been done to combine insights from these two frameworks. A recent successful attempt showed the advantage of using phrase-based search on top of an N-gram-based model. We probe this question in the reverse direction by investigating whether integrating N-gram-based translation and reordering models into a phrase-based decoder helps overcome the problematic phrasal independence assumption. A large scale evaluation over 8 language pairs shows that performance does significantly improve.
5 0.70041859 318 acl-2013-Sentiment Relevance
Author: Christian Scheible ; Hinrich Schütze
Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.
7 0.6839928 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
8 0.68191159 275 acl-2013-Parsing with Compositional Vector Grammars
9 0.67869002 80 acl-2013-Chinese Parsing Exploiting Characters
10 0.67781901 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
11 0.67737389 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
12 0.67722684 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
13 0.67661631 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
14 0.67476732 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
15 0.67381632 225 acl-2013-Learning to Order Natural Language Texts
16 0.67357594 224 acl-2013-Learning to Extract International Relations from Political Context
17 0.67277545 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
18 0.67262566 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
19 0.66982442 333 acl-2013-Summarization Through Submodularity and Dispersion
20 0.66973335 4 acl-2013-A Context Free TAG Variant