emnlp emnlp2013 emnlp2013-168 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wenliang Chen ; Min Zhang ; Yue Zhang
Abstract: In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In current dependency parsing models, conventional features (i.e. [sent-6, score-0.272]
2 base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. [sent-8, score-0.318]
3 In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. [sent-9, score-0.276]
4 meta features) with the help of a large amount of automatically parsed data. [sent-11, score-0.815]
5 The meta features are used together with base features in our final parser. [sent-12, score-1.057]
6 Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data. [sent-14, score-0.273]
7 The supervised models take annotated data as training data, utilize features defined over surface words, part-of-speech tags, and dependency trees, and learn the preference of features via adjusting feature weights. [sent-17, score-0.34]
8 If input sentences contain unknown features that are not included in the training data, the parsers usually give lower accuracy. [sent-26, score-0.249]
9 In this paper, we propose an alternative approach to semi-supervised dependency parsing via feature transformation (Ando and Zhang, 2005). [sent-31, score-0.307]
10 The base features defined over surface words, part-of-speech tags, and dependency trees are high dimensional and have been explored in the above previous studies. [sent-35, score-0.424]
11 The higher-level features, which we call meta features, are low dimensional and are newly defined in this paper. [sent-36, score-0.781]
12 The key idea behind our approach is that we build connections between known and unknown base features via the meta features. [sent-37, score-1.057]
13 From another viewpoint, we can also interpret the meta features as a way of doing feature smoothing. [sent-38, score-0.883]
14 In our approach, the base features are grouped and each group relates to a meta feature. [sent-40, score-1.0]
15 In the first step, we use a baseline parser to parse a large amount of unannotated sentences. [sent-41, score-0.246]
16 Based on the transformed values, we define a set of meta features. [sent-44, score-0.805]
17 Finally, the meta features are incorporated directly into parsing models. [sent-45, score-0.911]
18 The meta features build connections between known and unknown base features, and relieve the data sparseness problem. [sent-53, score-1.093]
19 • Compared to the base features, the number of meta features is remarkably small. [sent-54, score-0.943]
20 • We build semi-supervised dependency parsers that achieve the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data. [sent-55, score-0.277]
21 Section 3 describes the meta features and meta parser. [sent-58, score-0.273]
22 1 Graph-based parsing model Given an input sentence, dependency parsing aims to build a dependency tree. [sent-65, score-0.43]
23 In the graph-based model, we define an ordered pair (wi, wj) ∈ y as a dependency relation in tree y from word wi to word wj (wi is the head and wj is the dependent), and Gx as a graph that consists of a set of nodes Vx = {w0, w1, . [sent-80, score-0.264]
24 We define the score of a dependency tree y ∈ Y(Gx) to be the sum of the subgraph scores, score(x, y) = ∑g∈y score(x, g) (1), where g is a spanning subgraph of y, which can be a single arc or adjacent arcs. [sent-92, score-0.244]
25 The scoring function score(x, g) is score(x, g) = f(x, g) · w (2), where f(x, g) is a high-dimensional feature vector based on features defined over g and x, and w refers to the weights for the features. [sent-95, score-0.243]
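To make the factorization concrete, here is a minimal Python sketch of Equations (1) and (2); it assumes binary features represented as strings and weights stored in a dictionary, and the names (score_subgraph, score_tree, extract_features) are ours rather than the paper's.

def score_subgraph(x, g, weights, extract_features):
    # score(x, g) = f(x, g) . w, with f(x, g) given as the set of active features over g and x
    return sum(weights.get(feat, 0.0) for feat in extract_features(x, g))

def score_tree(x, subgraphs_of_y, weights, extract_features):
    # score(x, y) = sum of score(x, g) over the spanning subgraphs g of y
    # (single arcs or adjacent arcs)
    return sum(score_subgraph(x, g, weights, extract_features) for g in subgraphs_of_y)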
26 2 Base features Previous studies have defined different sets of features for the graph-based parsing models, such as the first-order features defined in McDonald et al. [sent-100, score-0.244]
27 We further extend the features defined by Bohnet (2010) by introducing more lexical features as the base features. [sent-103, score-0.276]
28 3 Baseline parser We train a parser with the base features as the Baseline parser. [sent-107, score-0.495]
29 We define fb(x, g) as the base features and wb as the corresponding weights. [sent-108, score-0.271]
30 The features in FM are referred to as meta features. [sent-111, score-0.838]
31 Based on the mapped values, we define feature templates for generating the meta features. [sent-113, score-0.959]
32 Finally, we build a new parser with the base and meta features. [sent-114, score-1.081]
33 1 Template-based mapping function We define a template-based function for mapping the base features to predefined discrete values. [sent-116, score-0.267]
34 For each template Ti ∈ TB, we can generate a set of base features Fi from dependency trees in the parsed data, which is automatically parsed by the Baseline parser. [sent-119, score-0.487]
35 [Table 1: Base feature templates; (b) First-order Linear, (d) Second-order Linear] In total, we have 4 × N(TB) possible values for all the base features, where N(TB) refers to the number of the base feature templates, which is usually small. [sent-125, score-0.499]
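A hedged Python sketch of this mapping is given below. The text quoted here does not state the frequency cut-offs, so the rank-based thresholds (top 10%, top 30%) are illustrative assumptions; only the overall scheme of four discrete values per template, estimated from counts over the auto-parsed data, follows the description above.

from collections import Counter

def build_mapping(base_feature_counts):
    # Sketch of the template-based mapping for one template Ti: base features
    # generated by Ti are bucketed by their frequency in the auto-parsed data.
    # The 10%/30% thresholds are assumptions for illustration only.
    ranked = [feat for feat, _ in Counter(base_feature_counts).most_common()]
    n = len(ranked)
    mapping = {}
    for rank, feat in enumerate(ranked):
        if rank < 0.10 * n:
            mapping[feat] = "HIGH"
        elif rank < 0.30 * n:
            mapping[feat] = "MID"
        else:
            mapping[feat] = "LOW"
    return mapping

def map_base_feature(feat, mapping):
    # Base features never observed in the auto-parsed data receive a fourth
    # value, giving 4 × N(TB) possible values in total.
    return mapping.get(feat, "UNSEEN")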
36 2 Meta feature templates Based on the mapped values, we define meta feature templates in FM for dependency parsing. [sent-128, score-1.252]
37 The meta feature templates are listed in Table 2, where fb is a base feature of FB, hp refers to the part-of-speech tag of the head, and hw refers to the surface word of the head. [sent-129, score-1.67]
38 The number of the meta features is relatively small. [sent-131, score-0.838]
39 2 shows that the size of meta features is only 1. [sent-138, score-0.838]
40 3 Generating meta features We use an example to demonstrate how to generate the meta features based on the meta feature templates in practice. [sent-141, score-2.608]
41 ” and want to generate the meta features for the relation among “ate”, “meat”, and “with”, where “ate” is the head, “meat” is the dependent, and “with” is the closest left sibling of “meat”. [sent-143, score-0.869]
42 We can have a base feature “ate, meat, with, RIGHTSIB”, where “RIGHTSIB” refers to the parent-siblings structure with the right direction. [sent-146, score-0.348]
43 [Figure 1: An example of generating meta features. “with a fork”; Tk: hw, dw, cw, d(h,d,c); Fb: ate, meat, with, RIGHTSIB; mapped value of fb = Mk; meta features [Mk]; [Mk], VV; [Mk], ate] According to the mapping function, we obtain the mapped value Mk. [sent-154, score-0.955]
44 Finally, we have the three meta features “[Mk]”, “[Mk], VV”, and “[Mk], ate”, where VV is the part-of-speech tag of word “ate”. [sent-155, score-0.838]
45 In this way, we can generate all the meta features for the graph-based model. [sent-156, score-0.861]
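The example can be condensed into a short sketch, shown below; it assumes the mapping function sketched earlier and the three meta feature templates of Table 2 (the mapped value alone, plus the head's POS tag, plus the head's surface word), and the string formats are ours.

def generate_meta_features(base_feature, head_pos, head_word, mapping):
    # For one base feature (e.g. "ate,meat,with,RIGHTSIB"), look up its mapped
    # value (written Mk in the paper) and expand the three meta feature templates.
    mk = map_base_feature(base_feature, mapping)
    return [
        "[%s]" % mk,                  # the mapped value alone
        "[%s],%s" % (mk, head_pos),   # plus the head's POS tag (e.g. VV)
        "[%s],%s" % (mk, head_word),  # plus the head's surface word (e.g. ate)
    ]

# e.g. generate_meta_features("ate,meat,with,RIGHTSIB", "VV", "ate", mapping)
# returns ["[HIGH]", "[HIGH],VV", "[HIGH],ate"] if the base feature falls in the top bucket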
46 4 Meta parser We combine the base features with the meta features by a new scoring function, score(x, g) = fb(x, g) · wb + fm(x, g) · wm (5), where fb(x, g) refers to the base features, fm(x, g) refers to the meta features, and wb and wm are their corresponding weights. [sent-158, score-2.628]
47 The new parser is referred to as the meta parser. [sent-163, score-0.919]
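Continuing the earlier scoring sketch, the combined score of Equation (5) simply adds the base and meta contributions; the extractor and weight names below mirror fb/wb and fm/wm and are, again, illustrative.

def meta_parser_score(x, g, wb, wm, extract_base, extract_meta):
    # score(x, g) = fb(x, g) . wb + fm(x, g) . wm  (Equation 5)
    base_score = sum(wb.get(feat, 0.0) for feat in extract_base(x, g))
    meta_score = sum(wm.get(feat, 0.0) for feat in extract_meta(x, g))
    return base_score + meta_score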
48 4 Experiments We evaluated the effect of the meta features for the graph-based parsers on English and Chinese data. [sent-164, score-0.973]
49 , the percentage of to... (footnote 2: We ensured that the text used for building the meta features did not include the sentences of the Penn Treebank.) [sent-195, score-0.838]
50 2 Feature selection on development sets We evaluated the parsers with different settings on the development sets to select the meta features. [sent-200, score-0.916]
51 1 Different models vs meta features In this section, we investigated the effect of different types of meta features for the models trained on different sizes of training data on English. [sent-203, score-1.676]
52 There are too many base feature templates to test one by one. [sent-204, score-0.313]
53 Based on different categories of base templates, we have different sets of meta features. [sent-209, score-0.943]
54 Then, we generated the meta features based on the newly auto-parsed data. [sent-212, score-0.838]
55 The meta parsers were trained on the different subsets of the training data with different sets of meta features. [sent-216, score-1.697]
56 Finally, we have three meta parsers: MP1, MP10, and MPFULL, which were trained on 1%, 10%, and 100% of the training data, respectively. [sent-217, score-0.781]
57 From the table, we found that the meta features that are only related to part-of-speech tags did not always help, while the ones related to the surface words were very helpful. [sent-219, score-0.913]
58 These results suggested that the sparser the base features were, the more effective the corresponding meta features were. [sent-221, score-1.057]
59 Thus, we built the final parsers by adding the meta features of N1WM, N2WM, N3WM, and N4WM. [sent-222, score-0.973]
60 The results showed that OURS achieved better performance than the systems with individual sets of meta features. [sent-223, score-0.781]
61 2 Different meta feature types In Table 2, there are three types of meta feature templates. [sent-226, score-1.652]
62 Here, the results of the parsers with different settings are shown in Table 6, where CORE refers to the first type, WithPOS refers to the second one, and WithWORD refers to the third one. [sent-227, score-0.558]
63 We also counted the numbers of the meta features. [sent-229, score-0.818]
64 Thus, we used all three types of meta features in our final meta parsers. [sent-232, score-0.781]
65 3 Main results on test sets We then evaluated the meta parsers on the English and Chinese test sets. [sent-236, score-0.916]
66 1 English The results are shown in Table 7, where MetaParser refers to the meta parser. [sent-239, score-0.922]
67 We found that the meta parser outperformed the baseline with an absolute improvement of 1. [sent-240, score-0.942]
68 As in the experiment on English, the meta parser outperformed the baseline. [sent-252, score-0.919]
69 4 Different sizes of unannotated data Here, we considered the improvement relative to the sizes of the unannotated data used to generate the meta features. [sent-258, score-0.951]
70 We also tried generating the meta features from the training data only, shown as TrainData in Table 9. [sent-265, score-0.838]
71 (2008), Suzuki09 refers to the parser of Suzuki et al. [sent-273, score-0.279]
72 (2009), Chen09 refers to the parser of Chen et al. [sent-274, score-0.279]
73 (2009), Zhou11 refers to the parser of Zhou et al. [sent-275, score-0.279]
74 (2011), and Chen12 refers to the parser of Chen et al. [sent-277, score-0.279]
75 The results showed that our meta parser outperformed most of the previous systems and obtained accuracy comparable with the best result of Suzuki11 (Suzuki et al. [sent-279, score-0.919]
76 However, our approach is much simpler than theirs and we believe that our meta parser can be further improved by combining their methods. [sent-282, score-0.919]
77 (2011), Hatori11 refers to the parser of Hatori et al. [sent-287, score-0.279]
78 (2011), and Li12 refers to the unlabeled parser of Li et al. [sent-288, score-0.279]
79 We found that the score of our meta parser for this data was the best reported so far and significantly higher than the previous scores. [sent-291, score-0.919]
80 6 Analysis Here, we analyzed the effect of the meta features on the data sparseness problem. [sent-296, score-0.874]
81 [Figure 2: Accuracies relative to numbers of unknown features (average per word) by Baseline parsers; x-axis: BIN] Then, we investigated the effect of the meta features. [sent-304, score-1.01]
82 We calculated the average number of active meta features per word that were transformed from the unknown features for each sentence. [sent-305, score-1.008]
83 We sorted the sentences in increasing order of the average numbers of active meta features and divided them into five bins. [sent-306, score-0.935]
84 Figures 3 and 4 show the results, where “Better” is for the sentences where the meta parsers provided better results than the baselines and “Worse” is for those where the meta parsers provided worse results. [sent-308, score-1.832]
85 We found that the gap between “Better” and “Worse” became larger as the sentences contained more active meta features for the unknown features. [sent-309, score-0.927]
86 This indicates that the meta features are very effective in processing the unknown features. [sent-311, score-0.895]
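A minimal sketch of the binning step described above is given below; it assumes each sentence already carries its average number of active meta features per word, and the equal-sized five-way split is our reading of the procedure.

def split_into_bins(sentences, avg_active_meta, n_bins=5):
    # Sort sentences by the average number of active meta features per word
    # (transformed from unknown base features) and cut them into n_bins groups.
    ordered = sorted(sentences, key=lambda s: avg_active_meta[s])
    size = (len(ordered) + n_bins - 1) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]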
87 5 Related work Our approach is to use unannotated data to generate the meta features to improve dependency parsing. [sent-312, score-1.065]
88 [Figure 3: Improvement relative to numbers of active meta features on English (average per word); Figure 4: Improvement relative to numbers of active meta features on Chinese (average per word); x-axis: BIN] Several previous studies relevant to our approach have been conducted. [sent-313, score-1.814]
89 (2008) used word clusters trained on a large amount of unannotated data and designed a set of new features based on the clusters for dependency parsing models. [sent-315, score-0.357]
90 (2009) extended a Semi-supervised Structured Conditional Model (SSSCM) of Suzuki and Isozaki (2008) to the dependency parsing problem and combined their method with the word clustering feature representation of Koo et al. [sent-319, score-0.26]
91 (2012) proposed an approach to representing high-order features for graph-based dependency parsing models using a dependency language model and beam search. [sent-322, score-0.437]
92 In comparison with their approach, our method is simpler in the sense that we do not require any intermediate step of splitting the prediction problem, and obtain meta features directly from self-annotated data. [sent-333, score-0.781]
93 The training of our meta feature values is highly efficient, requiring the collection of simple statistics over base features from a huge amount of data. [sent-334, score-1.045]
94 6 Conclusion In this paper, we have presented a simple but effective semi-supervised approach to learning the meta features from the auto-parsed data for dependency parsing. [sent-336, score-0.98]
95 We build a meta parser by combining the meta features with the base features in a graph-based model. [sent-337, score-1.976]
96 Our meta parser achieves comparable accuracy with the best known parsers on the English data (Penn English Treebank) and the best accuracy on the Chinese data (Chinese Treebank Version 5. [sent-339, score-1.054]
97 Further analysis indicates that the meta features are very effective in processing the unknown features. [sent-341, score-0.895]
98 Utilizing dependency language models for graph-based dependency parsing models. [sent-381, score-0.357]
99 Incremental joint POS tagging and dependency parsing in Chinese. [sent-402, score-0.246]
100 Joint models for Chinese POS tagging and dependency parsing. [sent-429, score-0.274]
wordName wordTfidf (topN-words)
[('meta', 0.781), ('base', 0.162), ('fb', 0.149), ('dependency', 0.142), ('refers', 0.141), ('parser', 0.138), ('parsers', 0.135), ('templates', 0.106), ('chinese', 0.101), ('meat', 0.099), ('mcdonald', 0.091), ('koo', 0.088), ('unannotated', 0.085), ('mk', 0.085), ('wenliang', 0.082), ('suzuki', 0.08), ('parsing', 0.073), ('ate', 0.066), ('gx', 0.065), ('bin', 0.06), ('tb', 0.058), ('nivre', 0.057), ('features', 0.057), ('unknown', 0.057), ('jun', 0.056), ('fm', 0.056), ('wb', 0.052), ('wm', 0.052), ('zhang', 0.051), ('ayr', 0.049), ('gaxx', 0.049), ('gym', 0.049), ('hatori', 0.049), ('rightsib', 0.049), ('ando', 0.049), ('carreras', 0.048), ('transformation', 0.047), ('feature', 0.045), ('chen', 0.044), ('secondorder', 0.043), ('mcclosky', 0.043), ('wj', 0.043), ('isozaki', 0.039), ('penn', 0.039), ('surface', 0.039), ('numbers', 0.037), ('singapore', 0.037), ('vx', 0.036), ('tags', 0.036), ('wi', 0.036), ('sparseness', 0.036), ('english', 0.036), ('subgraph', 0.034), ('treebank', 0.034), ('spanning', 0.034), ('crammer', 0.034), ('parsed', 0.034), ('template', 0.034), ('auxiliary', 0.033), ('mxpost', 0.033), ('bohnet', 0.033), ('hideki', 0.033), ('active', 0.032), ('pos', 0.031), ('sibling', 0.031), ('hw', 0.031), ('sagae', 0.031), ('uas', 0.031), ('head', 0.03), ('yue', 0.029), ('conll', 0.029), ('condensed', 0.029), ('kiyotaka', 0.029), ('kruengkrai', 0.029), ('thirdorder', 0.029), ('sorted', 0.028), ('min', 0.027), ('mapped', 0.027), ('segmentation', 0.026), ('yamada', 0.026), ('bllip', 0.026), ('uchimoto', 0.026), ('zhenghua', 0.026), ('pages', 0.026), ('session', 0.025), ('ichi', 0.025), ('wanxiang', 0.024), ('transformed', 0.024), ('xue', 0.024), ('pereira', 0.024), ('mapping', 0.024), ('trees', 0.024), ('tagger', 0.024), ('charniak', 0.024), ('baseline', 0.023), ('kazama', 0.023), ('adn', 0.023), ('graphbased', 0.023), ('buchholz', 0.023), ('projective', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
Author: Wenliang Chen ; Min Zhang ; Yue Zhang
Abstract: In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.
2 0.13331309 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
Author: He He ; Hal Daume III ; Jason Eisner
Abstract: Feature computation and exhaustive search have significantly restricted the speed of graph-based dependency parsing. We propose a faster framework of dynamic feature selection, where features are added sequentially as needed, edges are pruned early, and decisions are made online for each sentence. We model this as a sequential decision-making problem and solve it by imitation learning techniques. We test our method on 7 languages. Our dynamic parser can achieve accuracies comparable or even superior to parsers using a full set of features, while computing fewer than 30% of the feature templates.
3 0.090283483 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
Author: Joseph Le Roux ; Antoine Rozenknop ; Jennifer Foster
Abstract: It has recently been shown that different NLP models can be effectively combined using dual decomposition. In this paper we demonstrate that PCFG-LA parsing models are suitable for combination in this way. We experiment with the different models which result from alternative methods of extracting a grammar from a treebank (retaining or discarding function labels, left binarization versus right binarization) and achieve a labeled Parseval F-score of 92.4 on Wall Street Journal Section 23; this represents an absolute improvement of 0.7 and an error reduction rate of 7% over a strong PCFG-LA product-model baseline. Although we experiment only with binarization and function labels in this study, there is much scope for applying this approach to other grammar extraction strategies.
4 0.089492038 58 emnlp-2013-Dependency Language Models for Sentence Completion
Author: Joseph Gubbins ; Andreas Vlachos
Abstract: Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, achieving the highest accuracy reported to date apart from neural language models that are more complex and expensive to train.
5 0.085399419 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu
Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-the-art performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.
6 0.080463313 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
7 0.073939495 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
8 0.070563838 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
9 0.069200411 34 emnlp-2013-Automatically Classifying Edit Categories in Wikipedia Revisions
10 0.067219734 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
11 0.066558309 61 emnlp-2013-Detecting Promotional Content in Wikipedia
12 0.065407977 111 emnlp-2013-Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning
13 0.0649243 27 emnlp-2013-Authorship Attribution of Micro-Messages
14 0.064886324 141 emnlp-2013-Online Learning for Inexact Hypergraph Search
15 0.063430637 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
16 0.060399368 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming
17 0.05599764 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
18 0.05462208 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
19 0.05337375 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
20 0.053280406 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
topicId topicWeight
[(0, -0.167), (1, -0.023), (2, 0.014), (3, -0.028), (4, -0.093), (5, 0.011), (6, 0.047), (7, 0.055), (8, -0.018), (9, 0.148), (10, 0.009), (11, -0.039), (12, -0.059), (13, -0.004), (14, -0.028), (15, 0.001), (16, -0.216), (17, 0.077), (18, -0.066), (19, -0.039), (20, -0.037), (21, 0.017), (22, 0.118), (23, 0.139), (24, 0.045), (25, 0.053), (26, 0.004), (27, 0.156), (28, 0.037), (29, -0.012), (30, 0.053), (31, -0.066), (32, 0.181), (33, -0.007), (34, 0.072), (35, -0.024), (36, -0.112), (37, -0.03), (38, 0.06), (39, 0.09), (40, 0.075), (41, 0.033), (42, 0.064), (43, 0.023), (44, -0.037), (45, -0.127), (46, -0.07), (47, -0.206), (48, 0.049), (49, 0.102)]
simIndex simValue paperId paperTitle
same-paper 1 0.94780242 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
Author: Wenliang Chen ; Min Zhang ; Yue Zhang
Abstract: In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.
2 0.64501119 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
Author: Joseph Le Roux ; Antoine Rozenknop ; Jennifer Foster
Abstract: It has recently been shown that different NLP models can be effectively combined using dual decomposition. In this paper we demonstrate that PCFG-LA parsing models are suitable for combination in this way. We experiment with the different models which result from alternative methods of extracting a grammar from a treebank (retaining or discarding function labels, left binarization versus right binarization) and achieve a labeled Parseval F-score of 92.4 on Wall Street Journal Section 23; this represents an absolute improvement of 0.7 and an error reduction rate of 7% over a strong PCFG-LA product-model baseline. Although we experiment only with binarization and function labels in this study, there is much scope for applying this approach to other grammar extraction strategies.
3 0.64460874 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-ofdomain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
4 0.64256561 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
Author: He He ; Hal Daume III ; Jason Eisner
Abstract: Feature computation and exhaustive search have significantly restricted the speed of graph-based dependency parsing. We propose a faster framework of dynamic feature selection, where features are added sequentially as needed, edges are pruned early, and decisions are made online for each sentence. We model this as a sequential decision-making problem and solve it by imitation learning techniques. We test our method on 7 languages. Our dynamic parser can achieve accuracies comparable or even superior to parsers using a full set of features, while computing fewer than 30% of the feature templates.
5 0.56762832 58 emnlp-2013-Dependency Language Models for Sentence Completion
Author: Joseph Gubbins ; Andreas Vlachos
Abstract: Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, achieving the highest accuracy reported to date apart from neural language models that are more complex and expensive to train.
6 0.54467481 34 emnlp-2013-Automatically Classifying Edit Categories in Wikipedia Revisions
7 0.51452106 61 emnlp-2013-Detecting Promotional Content in Wikipedia
8 0.44780809 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
9 0.43096614 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English
11 0.38913226 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment
12 0.38155487 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming
13 0.35705173 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
14 0.34763759 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
15 0.33576331 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
16 0.33450815 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
17 0.33359805 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
18 0.33099127 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
19 0.32626691 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training
20 0.32188177 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
topicId topicWeight
[(3, 0.026), (9, 0.012), (18, 0.039), (22, 0.075), (26, 0.018), (30, 0.08), (43, 0.017), (45, 0.057), (50, 0.013), (51, 0.173), (66, 0.054), (67, 0.014), (71, 0.024), (74, 0.202), (75, 0.015), (77, 0.034), (90, 0.018), (95, 0.019), (96, 0.015)]
simIndex simValue paperId paperTitle
1 0.84430569 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
Author: Marco Guerini ; Lorenzo Gatti ; Marco Turchi
Abstract: Assigning a positive or negative score to a word out of context (i.e. a word’s prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-of-the-art approach in computing words’ prior polarity for sentiment analysis. We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered.
same-paper 2 0.81226778 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
Author: Wenliang Chen ; Min Zhang ; Yue Zhang
Abstract: In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.
3 0.78346395 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh
Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.
4 0.7123118 58 emnlp-2013-Dependency Language Models for Sentence Completion
Author: Joseph Gubbins ; Andreas Vlachos
Abstract: Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, achieving the highest accuracy reported to date apart from neural language models that are more complex and expensive to train.
5 0.71158451 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
Author: Alla Rozovskaya ; Dan Roth
Abstract: State-of-the-art systems for grammatical error correction are based on a collection of independently-trained models for specific errors. Such models ignore linguistic interactions at the sentence level and thus do poorly on mistakes that involve grammatical dependencies among several words. In this paper, we identify linguistic structures with interacting grammatical properties and propose to address such dependencies via joint inference and joint learning. We show that it is possible to identify interactions well enough to facilitate a joint approach and, consequently, that joint methods correct incoherent predictions that independently-trained classifiers tend to produce. Furthermore, because the joint learning model considers interacting phenomena during training, it is able to identify mistakes that require making multiple changes simultaneously and that standard approaches miss. Overall, our model significantly outperforms the Illinois system that placed first in the CoNLL-2013 shared task on grammatical error correction.
6 0.71101427 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
7 0.70352453 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
8 0.70302296 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
9 0.70200974 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
10 0.70024651 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
11 0.69884586 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
12 0.69857723 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training
13 0.69843286 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
14 0.69834888 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
15 0.69674063 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
16 0.69559407 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
17 0.69509745 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
18 0.69464463 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
19 0.69394279 143 emnlp-2013-Open Domain Targeted Sentiment
20 0.69369847 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing