emnlp emnlp2013 emnlp2013-58 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joseph Gubbins ; Andreas Vlachos
Abstract: Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, achieving the highest accuracy reported to date apart from neural language models that are more complex and expensive to train.
Reference: text
sentIndex sentText sentNum sentScore
1 Dependency language models for sentence completion Joseph Gubbins Computer Laboratory University of Cambridge jsg52@cam.ac.uk. [sent-1, score-0.541]
2 Abstract Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. [sent-3, score-0.402]
3 Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. [sent-4, score-0.048]
4 In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. [sent-5, score-0.656]
5 We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, [sent-6, score-0.048]
6 achieving the highest accuracy reported to date apart from neural language models that are more complex and expensive to train. [sent-7, score-0.152]
7 Systematic approaches for solving such problems require models that can judge the global coherence of sentences. [sent-10, score-0.084]
8 Such measures of global coherence may prove to be useful in various applications, including machine translation and natural language generation (Zweig and Burges, 2012). [sent-11, score-0.036]
9 Andreas Vlachos Computer Laboratory University of Cambridge av308@cam.ac.uk. [sent-12, score-0.066]
10 Most approaches to sentence completion employ language models which use a window of immediate context around the missing word and choose the word that results in the completed sentence with the highest probability (Zweig and Burges, 2012; Mnih and Teh, 2012). [sent-14, score-0.643]
11 However, such language models may fail to identify sentences that are locally coherent but are improbable due to long-range syntactic/semantic dependencies. [sent-15, score-0.048]
12 Consider, for example, completing the sentence I saw a tiger which was really very ___. [sent-16, score-0.107]
13 A language model relying on up to five words of immediate context would ignore the crucial dependency between the missing word and the noun tiger. [sent-20, score-0.395]
14 In this paper we tackle sentence completion using language models based on dependency grammar. [sent-21, score-0.777]
15 These models are similar to standard n-gram language models, but instead of using the linear ordering of the words in the sentence, they generate words along paths in the dependency tree of the sentence. [sent-22, score-0.505]
16 Unlike other approaches incorporating syntax into language models (e.g. [sent-23, score-0.086]
17 Chelba et al., 1997), our models are relatively easy to train and estimate, and can exploit standard smoothing methods. [sent-26, score-0.123]
18 Our models improve by 8.7 points in accuracy over n-gram models, giving the best results to date for any method apart from the more computationally demanding neural language models. [sent-28, score-0.141]
19 Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1405–1410, Association for Computational Linguistics. Figure 1: Dependency tree example. 2 Unlabelled Dependency Language Models In dependency grammar, each word in a sentence is associated with a node in a dependency tree (Figure 1). [sent-31, score-0.956]
20 We define a dependency tree as a rooted, connected, acyclic directed graph together with a mapping from the nodes of the tree to a set of grammatical relation labels R. [sent-32, score-0.684]
21 We define a lexicalised dependency tree as a dependency tree along with a mapping from the vertices of the tree to a vocabulary V. [sent-33, score-0.81]
22 We seek to model the probability distribution of the lexicalisation of a given dependency tree. [sent-34, score-0.481]
23 We will use this as a language model; we neglect the fact that a given lexicalised dependency tree can correspond to more than one sentence due to variations in word order. [sent-35, score-0.617]
24 Let ST be a lexicalised dependency tree, where T is the unlexicalised tree, and let w1w2...wm [sent-36, score-0.508]
25 be an ordering of the words corresponding to a breadth-first enumeration of the tree. [sent-39, score-0.113]
26 In order for this representation to be unique, when we parse a sentence, we will use the unique breadth-first ordering where the children of any node appear in the same order as they did in the sentence. [sent-40, score-0.207]
27 We define w0 to be a special symbol denoting the root of the tree. [sent-41, score-0.04]
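The breadth-first enumeration described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code; the parent-index encoding of the tree and the root sentinel 0 are our own assumptions:

```python
from collections import deque

def breadth_first_words(words, heads):
    """Enumerate the words of a dependency tree breadth-first.

    words: the tokens of the sentence, at 1-indexed positions 1..m.
    heads: heads[i-1] is the parent position of word i (0 is the root w0).
    Children of each node are visited in sentence order, which makes the
    enumeration unique, as the text requires.
    """
    children = {i: [] for i in range(len(words) + 1)}
    for i, h in enumerate(heads, start=1):
        children[h].append(i)  # appended in sentence order
    order, queue = [], deque([0])
    while queue:
        node = queue.popleft()
        if node != 0:
            order.append(words[node - 1])
        queue.extend(children[node])  # siblings keep sentence order
    return order

# "I saw a tiger": saw is the root; I and tiger attach to saw; a to tiger.
print(breadth_first_words(["I", "saw", "a", "tiger"], [2, 0, 4, 2]))
# ['saw', 'I', 'tiger', 'a']
```

With heads chosen so that saw is the root, the enumeration visits saw, then its children I and tiger in sentence order, then a, exactly the uniqueness condition stated above.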
28 We denote the grammatical relation between wk and its parent by gk ∈ R. [sent-42, score-0.333]
29 We make the following two assumptions: first, that each word wi is conditionally independent of the words outside of its ancestor sequence, i.e. of (wk)k=0..i−1 ∩ A(wi)^c, given the ancestor sequence A(wi); and second, that the words are independent of the labels (gk)k=1..m. [sent-46, score-0.922]
30 To deal with this data sparsity issue, we take inspiration from n-gram models and assume a Markov property of order (N − 1): P[w|A(w)] = P[w|A^(N−1)(w)] (4), where A^(N−1)(w) denotes the sequence of up to (N − 1) closest ancestors of w. [sent-51, score-0.144]
31 The maximum likelihood estimator for this probability is: P̂[wi|A^(N−1)(wi)] = C(A^(N−1)(wi), wi) / Σ_w C(A^(N−1)(wi), w). We have arrived at a model which is quite similar to n-gram language models. [sent-53, score-0.054]
32 The main difference is that each word in the tree can have several children, while in the n-gram models it can only be followed by one word. [sent-54, score-0.162]
33 Thus the sum in the denominator above does not simplify to the count of the ancestor sequence in the way that it does for n-gram language models. [sent-55, score-0.241]
34 However, we can calculate and store the denominators easily during training, so that we do not need to sum over the vocabulary each time we evaluate the estimator. [sent-56, score-0.036]
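A minimal sketch of this training scheme, accumulating the path counts C(A^(N−1)(w), w) together with the per-context denominators so that no sum over the vocabulary is needed at query time. The class layout and the "<root>" symbol are our own illustrative choices:

```python
from collections import defaultdict

class UnlabelledDepLM:
    """Order-N unlabelled dependency LM (maximum likelihood sketch).

    Stores C(context, w) and, as the text suggests, the per-context
    denominators computed during training.
    """

    def __init__(self, order):
        self.n = order
        self.counts = defaultdict(int)   # C(A^(N-1)(w), w)
        self.denoms = defaultdict(int)   # sum over w of C(A^(N-1)(w), w)

    def train(self, trees):
        # Each tree is a list of (word, ancestors) pairs, with ancestors
        # ordered from the parent up towards the root symbol "<root>".
        for tree in trees:
            for word, ancestors in tree:
                context = tuple(ancestors[: self.n - 1])
                self.counts[(context, word)] += 1
                self.denoms[context] += 1

    def prob(self, word, ancestors):
        context = tuple(ancestors[: self.n - 1])
        denom = self.denoms[context]
        return self.counts[(context, word)] / denom if denom else 0.0

# "I saw a tiger": saw is the root; I and tiger attach to saw; a to tiger.
lm = UnlabelledDepLM(order=3)
lm.train([[("saw", ["<root>"]),
           ("I", ["saw", "<root>"]),
           ("tiger", ["saw", "<root>"]),
           ("a", ["tiger", "saw"])]])
print(lm.prob("tiger", ["saw", "<root>"]))  # 0.5
```

Here the context (saw, <root>) was seen twice in training (generating I and tiger), so the estimate for tiger is 1/2, illustrating that a node's several children all share the same ancestor context.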
35 We refer to this model as the order N unlabelled dependency language model. [sent-57, score-0.515]
36 As is the case for n-gram language models, even for low values of N, we will often encounter sequences (A^(N−1)(w), w) which were not observed in training. [sent-58, score-0.036]
37 In order to avoid assigning zero probability to the entire sentence, we need to use a smoothing method. [sent-59, score-0.168]
38 We can use any of the smoothing methods used for n-gram language models. [sent-60, score-0.075]
39 For simplicity, we use stupid backoff smoothing (Brants et al., 2007). [sent-61, score-0.378]
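Stupid backoff over ancestor paths can be sketched as follows: when a path is unseen, the context is shortened by dropping the most distant ancestor and the score is discounted by a fixed factor α = 0.4, the value used by Brants et al. The function name and count layout are assumptions for illustration, and note that the result is a score rather than a normalised probability:

```python
ALPHA = 0.4  # fixed backoff penalty, as in Brants et al. (2007)

def stupid_backoff_score(word, ancestors, counts, denoms, unigram_total):
    """Stupid-backoff score S(word | ancestors) over a dependency path.

    counts[(context, w)] and denoms[context] are path counts gathered at
    training time; unigram_total is the total word count for the final
    backoff to relative frequency.
    """
    context = tuple(ancestors)
    penalty = 1.0
    while context:
        if counts.get((context, word), 0) > 0:
            return penalty * counts[(context, word)] / denoms[context]
        context = context[:-1]  # drop the most distant ancestor
        penalty *= ALPHA
    # back off all the way to the unigram relative frequency
    return penalty * counts.get(((), word), 0) / unigram_total
```

For example, with C((saw,), tiger) = 1 and a context total of 2, the score of tiger under parent saw is 0.5; under an unseen parent it backs off to 0.4 times the unigram relative frequency.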
40 3 Labelled Dependency Language Models We assumed above that the words are generated independently from the grammatical relations. [sent-63, score-0.112]
41 However, we are likely to ignore valuable information in doing so. [sent-64, score-0.052]
42 To illustrate this point, consider the following pair of sentences: You ate an apple and An apple ate you. The dependency trees of the two sentences are very similar, with only the grammatical relations (nsubj and dobj) between ate and its arguments differing. [sent-65, score-0.873]
43 The unla- belled dependency language model will assign the same probability to both of the sentences as it ignores the labels of grammatical relations. [sent-66, score-0.51]
44 In order to be able to distinguish between them, the nature of the grammatical relations between the words in the dependency tree needs to be incorporated in the language model. [sent-67, score-0.605]
45 We relax the assumption that the words are independent of the labels of the parse tree, assuming instead that each word is conditionally independent of the words and labels outside its ancestor path given the words and labels in its ancestor path. [sent-68, score-0.659]
46 We define G(wi) to be the sequence of grammatical relations between the successive elements of (A(wi) , wi). [sent-69, score-0.256]
47 G(wi) is the sequence of grammatical relations found on the path from the root node to wi. [sent-70, score-0.34]
48 Let G^(N−1)(w) be the sequence of grammatical relations between successive elements of (A^(N−1)(w), w). [sent-73, score-0.144]
49 We have: P[ST|T] = ∏_{i=1..m} P[wi|A^(N−1)(wi), G^(N−1)(wi)]. The maximum likelihood estimator for the probability is once again given by the ratio of the counts of labelled paths. [sent-75, score-0.279]
50 We refer to this model as the order N labelled dependency language model. [sent-76, score-0.513]
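The only change relative to the unlabelled model is that the conditioning context pairs the ancestor words with the relation labels on the path. A sketch (the data structures are our own assumption) that distinguishes the You ate an apple / An apple ate you example above:

```python
from collections import defaultdict

class LabelledDepLM:
    """Order-N labelled dependency LM sketch: the context combines up to
    (N-1) closest ancestor words with the relation labels on the edges
    linking the word to them (G^(N-1)(w) in the text)."""

    def __init__(self, order):
        self.n = order
        self.counts = defaultdict(int)
        self.denoms = defaultdict(int)

    def _context(self, ancestors, relations):
        return (tuple(ancestors[: self.n - 1]),
                tuple(relations[: self.n - 1]))

    def observe(self, word, ancestors, relations):
        ctx = self._context(ancestors, relations)
        self.counts[(ctx, word)] += 1
        self.denoms[ctx] += 1

    def prob(self, word, ancestors, relations):
        ctx = self._context(ancestors, relations)
        denom = self.denoms[ctx]
        return self.counts[(ctx, word)] / denom if denom else 0.0

lm = LabelledDepLM(order=2)
# "You ate an apple": You --nsubj--> ate, apple --dobj--> ate
lm.observe("You", ["ate"], ["nsubj"])
lm.observe("apple", ["ate"], ["dobj"])
# The unlabelled model would merge these two contexts; the labelled one
# keeps P[You | ate, nsubj] and P[You | ate, dobj] distinct.
print(lm.prob("You", ["ate"], ["nsubj"]))  # 1.0
print(lm.prob("You", ["ate"], ["dobj"]))   # 0.0
```

Because the relation label is part of the context, the model assigns different probabilities to the two word orders, unlike its unlabelled counterpart.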
51 The Microsoft Research Sentence Completion Challenge consists of a set of 1,040 sentence completion problems taken from five of the Sherlock Holmes novels by Arthur Conan Doyle. [sent-78, score-0.493]
52 Each problem consists of a sentence in which one word has been removed and replaced with a blank and a set of 5 candidate words to complete the sentence. [sent-79, score-0.187]
53 The task is to choose the candidate word which, when inserted into the blank, gives the most probable complete sentence. [sent-80, score-0.048]
54 The set of candidates consists of the original word and 4 imposter words with similar distributional statistics. [sent-81, score-0.083]
55 Human judges were tasked with choosing imposter words which would lead to grammatically correct sentences and such that, with some thought, the correct answer should be unambiguous. [sent-82, score-0.119]
56 The training data set consists of 522 19th century novels from Project Gutenberg. [sent-83, score-0.066]
57 We parsed the training data using the Nivre arc-eager deterministic dependency parsing algorithm (Nivre and Scholz, 2004) as implemented in MaltParser (Nivre et al. [sent-84, score-0.445]
58 We trained order N labelled and unlabelled dependency language models. Figure 2: Procedure for evaluating sentence completion problems. Table 1: sentence completion accuracy of the dependency language models. [sent-86, score-0.94]
59 In order to have a baseline to compare against, we also trained n-gram language models with Kneser-Ney smoothing and stupid backoff using the Berkeley Language Modeling Toolkit (Pauls and Klein, 2011). [sent-93, score-0.465]
60 To test a given language model, we calculated the scores it assigned to each candidate sentence and chose the completion with the highest score. [sent-94, score-0.475]
61 For the dependency language models we parsed the sentence with each of the 5 possible completions and calculated the probability in each case. [sent-95, score-0.528]
62 Figure 2 illustrates an example of this process for the order 3 unlabelled model. [sent-96, score-0.213]
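The evaluation procedure of Figure 2 amounts to the following loop. Here parse and score are hypothetical stand-ins for MaltParser and a trained dependency language model, and the "___" placeholder convention is our own; this mirrors the described procedure rather than reproducing the authors' code:

```python
def choose_completion(sentence_with_blank, candidates, parse, score):
    """Pick the candidate whose completed, parsed sentence scores highest.

    sentence_with_blank: string containing one "___" placeholder.
    parse: maps a sentence string to its dependency parse.
    score: maps a parse to a score (e.g. log-probability under a dep. LM).
    """
    best_word, best_score = None, float("-inf")
    for word in candidates:
        completed = sentence_with_blank.replace("___", word, 1)
        tree = parse(completed)   # parse each of the 5 completions
        s = score(tree)           # score the resulting lexicalised tree
        if s > best_score:
            best_word, best_score = word, s
    return best_word

# Toy stand-ins: "parse" is the identity; "score" prefers "tiger".
print(choose_completion("I saw a ___", ["dog", "tiger"],
                        parse=lambda s: s,
                        score=lambda t: 1.0 if "tiger" in t else 0.0))
# tiger
```

The same loop serves every model in the comparison; only the score function changes between the n-gram baselines and the dependency models.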
63 Unlab-SB is the order N unlabelled dependency language model with Stupid Backoff, Lab-SB is the order N labelled dependency language model with Stupid Backoff, Ngm-SB is the n-gram language model with Stupid Backoff and Ngm-KN is the interpolated Kneser-Ney smoothed n-gram language model. [sent-98, score-1.028]
64 Both of the dependency language models outperformed the n-gram language models by a substantial margin. [sent-99, score-0.434]
65 The best result was achieved by the order 4 labelled dependency model, which is 8.7 [sent-102, score-0.513]
66 points in accuracy better than the best n-gram model. [sent-103, score-0.037]
67 Furthermore, the labelled dependency models outperformed their unlabelled counterparts for every order except 2. [sent-104, score-0.735]
68 The performance of the labelled dependency language model is superior to the results reported for any single model method, apart from those relying on neural language models (Mnih and Teh, 2012; Mikolov et al. [sent-107, score-0.626]
69 However, the superior performance of neural networks comes at the cost of long training times. [sent-109, score-0.052]
70 6 Related Work and Discussion The best-known language model based on dependency parsing is that of Chelba et al. [sent-114, score-0.343]
71 This model writes the probability in the familiar left-to-right chain rule decomposition in the linear order of the sentence, conditioning the probability of the next word on the linear trigram context, as well as some part of the dependency graph information relating to the words on its left. [sent-116, score-0.521]
72 The language models we propose are far simpler to train and compute. [sent-117, score-0.048]
73 A somewhat similar model to our unlabelled dependency language model was proposed in Graham and van Genabith (2010). [sent-118, score-0.476]
74 However, they seem to have used different probability estimators which ignore the fact that each node in the dependency tree can have multiple children. [sent-119, score-0.609]
75 Other research on syntactic language modelling has focused on using phrase structure grammars (Pauls and Klein, 2012; Charniak, 2001; Roark, 2001; Hall and Johnson, 2003). [sent-120, score-0.038]
76 The linear complexity of deterministic dependency parsing makes dependency language models such as ours more scalable than these approaches. [sent-121, score-0.744]
77 The most similar task to sentence completion is lexical substitution (McCarthy and Navigli, 2007). [sent-122, score-0.427]
78 The main difference between them is that in the latter the word to be substituted provides a very important clue in choosing the right candidate, while in sentence completion this is not available. [sent-123, score-0.427]
79 Another related task is selectional preference modeling (Ó Séaghdha, 2010; Ritter et al. [sent-124, score-0.073]
80 The dependency language models described in this paper assign probabilities to full sentences. [sent-126, score-0.35]
81 Language models which require full sentences can be used in automatic speech recognition (ASR) and machine translation (MT). [sent-127, score-0.048]
82 The approach is to use a conventional ASR or MT decoder to produce an N-best list of the most likely candidate sentences and then re-score these with the language model. [sent-128, score-0.048]
83 This was done by Chelba et al. (1997) for ASR using a dependency language model and by Pauls and Klein (2011) for MT using a PSG-based syntactic language model. [sent-130, score-0.302]
84 7 Conclusion We have proposed a pair of language models which are probabilistic models for the lexicalisation of a given dependency tree. [sent-131, score-0.523]
85 These models are simple to train and evaluate and are scalable to large data sets. [sent-132, score-0.048]
86 They performed substantially better than n-gram language models, achieving the best result reported for any single method except for the more expensive and complex to train neural language models. [sent-134, score-0.052]
wordName wordTfidf (topN-words)
[('completion', 0.354), ('dependency', 0.302), ('wi', 0.275), ('zweig', 0.232), ('ancestor', 0.184), ('unlabelled', 0.174), ('labelled', 0.172), ('stupid', 0.165), ('chelba', 0.145), ('backoff', 0.138), ('lexicalisation', 0.125), ('ym', 0.123), ('burges', 0.122), ('pauls', 0.122), ('tree', 0.114), ('grammatical', 0.112), ('iy', 0.11), ('mnih', 0.11), ('wk', 0.105), ('yp', 0.105), ('lexicalised', 0.092), ('nivre', 0.091), ('imposter', 0.083), ('asr', 0.083), ('ik', 0.076), ('smoothing', 0.075), ('sentence', 0.073), ('selectional', 0.073), ('gk', 0.072), ('vlachos', 0.072), ('enumeration', 0.072), ('mikolov', 0.071), ('blank', 0.066), ('cam', 0.066), ('novels', 0.066), ('ate', 0.063), ('nsubj', 0.061), ('microsoft', 0.061), ('ritter', 0.058), ('jc', 0.058), ('maltparser', 0.058), ('sequence', 0.057), ('apple', 0.055), ('andreas', 0.054), ('probability', 0.054), ('arxiv', 0.053), ('estimator', 0.053), ('apart', 0.052), ('ignore', 0.052), ('neural', 0.052), ('node', 0.051), ('deterministic', 0.051), ('parsed', 0.051), ('graham', 0.051), ('conditionally', 0.049), ('successive', 0.049), ('models', 0.048), ('candidate', 0.048), ('parent', 0.044), ('teh', 0.043), ('hall', 0.043), ('labels', 0.042), ('path', 0.042), ('st', 0.042), ('ordering', 0.041), ('parsing', 0.041), ('pc', 0.041), ('immediate', 0.041), ('brants', 0.041), ('root', 0.04), ('joakim', 0.04), ('challenge', 0.039), ('order', 0.039), ('geoffrey', 0.039), ('association', 0.038), ('pages', 0.038), ('relations', 0.038), ('modelling', 0.038), ('syntax', 0.038), ('children', 0.037), ('ngram', 0.037), ('independent', 0.037), ('mt', 0.036), ('coherence', 0.036), ('writes', 0.036), ('corrado', 0.036), ('ainur', 0.036), ('ciprian', 0.036), ('denominators', 0.036), ('estimators', 0.036), ('ionn', 0.036), ('mario', 0.036), ('neglect', 0.036), ('quences', 0.036), ('spt', 0.036), ('tasked', 0.036), ('chain', 0.036), ('klein', 0.035), ('assumptions', 0.034), ('really', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 58 emnlp-2013-Dependency Language Models for Sentence Completion
Author: Joseph Gubbins ; Andreas Vlachos
2 0.098086223 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
Author: Fandong Meng ; Jun Xie ; Linfeng Song ; Yajuan Lu ; Qun Liu
Abstract: We present a novel translation model, which simultaneously exploits the constituency and dependency trees on the source side, to combine the advantages of two types of trees. We take head-dependents relations of dependency trees as backbone and incorporate phrasal nodes of constituency trees as the source side of our translation rules, and the target side as strings. Our rules hold the property of long distance reorderings and the compatibility with phrases. Large-scale experimental results show that our model achieves significant improvements over the constituency-to-string (+2.45 BLEU on average) and dependency-to-string (+0.91 BLEU on average) models, which only employ a single type of trees, and significantly outperforms the state-of-the-art hierarchical phrase-based model (+1.12 BLEU on average), on three Chinese-English NIST test sets.
3 0.093975969 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
Author: Wei Wang ; Hua Xu ; Xiaoqiu Huang
Abstract: Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better.
4 0.089492038 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
Author: Wenliang Chen ; Min Zhang ; Yue Zhang
Abstract: In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.
5 0.085880682 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
6 0.075302593 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
8 0.072107643 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
9 0.069811083 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
10 0.068765759 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
11 0.067684948 20 emnlp-2013-An Efficient Language Model Using Double-Array Structures
12 0.064268775 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
13 0.063669935 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
14 0.06200783 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
15 0.061971001 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
16 0.058753233 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction
17 0.058064725 98 emnlp-2013-Image Description using Visual Dependency Representations
18 0.058058802 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
19 0.057503194 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
20 0.057254378 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
topicId topicWeight
[(0, -0.2), (1, -0.05), (2, -0.009), (3, 0.019), (4, -0.071), (5, 0.044), (6, 0.016), (7, -0.029), (8, -0.056), (9, 0.058), (10, 0.009), (11, 0.009), (12, -0.119), (13, 0.038), (14, -0.054), (15, 0.054), (16, -0.059), (17, 0.098), (18, 0.063), (19, 0.021), (20, 0.053), (21, 0.013), (22, 0.058), (23, 0.131), (24, -0.129), (25, 0.035), (26, 0.04), (27, 0.121), (28, 0.052), (29, -0.057), (30, 0.06), (31, -0.203), (32, 0.181), (33, -0.128), (34, -0.134), (35, -0.04), (36, -0.039), (37, -0.103), (38, 0.009), (39, 0.054), (40, -0.055), (41, -0.071), (42, 0.054), (43, -0.035), (44, -0.059), (45, -0.041), (46, -0.097), (47, 0.005), (48, 0.032), (49, -0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.95477045 58 emnlp-2013-Dependency Language Models for Sentence Completion
Author: Joseph Gubbins ; Andreas Vlachos
2 0.59333324 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
Author: Wenliang Chen ; Min Zhang ; Yue Zhang
3 0.59300393 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-ofdomain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
4 0.55627215 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
Author: Joseph Le Roux ; Antoine Rozenknop ; Jennifer Foster
Abstract: It has recently been shown that different NLP models can be effectively combined using dual decomposition. In this paper we demonstrate that PCFG-LA parsing models are suitable for combination in this way. We experiment with the different models which result from alternative methods of extracting a grammar from a treebank (retaining or discarding function labels, left binarization versus right binarization) and achieve a labeled Parseval F-score of 92.4 on Wall Street Journal Section 23 this represents an absolute improvement of 0.7 and an error reduction rate of 7% over a strong PCFG-LA product-model baseline. Although we experiment only with binarization and function labels in this study, there is much scope for applying this approach to – other grammar extraction strategies.
5 0.55360293 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
Author: He He ; Hal Daume III ; Jason Eisner
Abstract: Feature computation and exhaustive search have significantly restricted the speed of graph-based dependency parsing. We propose a faster framework of dynamic feature selection, where features are added sequentially as needed, edges are pruned early, and decisions are made online for each sentence. We model this as a sequential decision-making problem and solve it by imitation learning techniques. We test our method on 7 languages. Our dynamic parser can achieve accuracies comparable or even superior to parsers using a full set of features, while computing fewer than 30% of the feature templates.
6 0.50208271 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
8 0.47502464 176 emnlp-2013-Structured Penalties for Log-Linear Language Models
9 0.46460736 20 emnlp-2013-An Efficient Language Model Using Double-Array Structures
10 0.45406914 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
11 0.446998 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
12 0.43873888 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
13 0.43396902 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
14 0.41140863 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
15 0.40256029 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction
16 0.39644131 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers
17 0.38772765 182 emnlp-2013-The Topology of Semantic Knowledge
18 0.36881635 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
19 0.36072278 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues
20 0.3593148 156 emnlp-2013-Recurrent Continuous Translation Models
topicId topicWeight
[(3, 0.022), (9, 0.012), (18, 0.015), (22, 0.038), (30, 0.09), (45, 0.431), (50, 0.013), (51, 0.148), (66, 0.054), (71, 0.029), (75, 0.026), (77, 0.04), (96, 0.016)]
simIndex simValue paperId paperTitle
1 0.81784576 34 emnlp-2013-Automatically Classifying Edit Categories in Wikipedia Revisions
Author: Johannes Daxenberger ; Iryna Gurevych
Abstract: In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machine learning experiment, we achieve a micro-averaged F1 score of .62 on a corpus of edits from the English Wikipedia. In this corpus, each edit has been multi-labeled according to a 21-category taxonomy. A model trained on the same data achieves state-of-the-art performance on the related task of fluency edit classification. We apply pattern mining to automatically labeled edits in the revision histories of different Wikipedia articles. Our results suggest that high-quality articles show a higher degree of homogeneity with respect to their collaboration patterns as compared to random articles.
2 0.77132547 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
Author: Katsuhiko Hayashi ; Katsuhito Sudoh ; Hajime Tsukada ; Jun Suzuki ; Masaaki Nagata
Abstract: This paper presents a novel word reordering model that employs a shift-reduce parser for inversion transduction grammars. Our model uses rich syntax parsing features for word reordering and runs in linear time. We apply it to postordering of phrase-based machine translation (PBMT) for Japanese-to-English patent tasks. Our experimental results show that our method achieves a significant improvement of +3.1 BLEU scores against 30.15 BLEU scores of the baseline PBMT system.
same-paper 3 0.76183569 58 emnlp-2013-Dependency Language Models for Sentence Completion
Author: Joseph Gubbins ; Andreas Vlachos
Abstract: Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, achieving the highest accuracy reported to date apart from neural language models that are more complex and expensive to train.
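The idea in the abstract above can be sketched in a few lines: score a sentence as the product of each word's probability given its position in the dependency tree, then pick the candidate word that maximises that score. This is a minimal illustrative sketch, not the paper's exact model — it conditions each word only on its head with add-one smoothing, whereas the paper conditions on the ancestor path with more careful backoff; the tree representation and all names here are assumptions.

```python
from collections import defaultdict

class DependencyLM:
    """Toy dependency language model: P(sentence) = prod_w P(w | head(w))."""

    def __init__(self):
        self.pair_counts = defaultdict(int)   # (head, word) -> count
        self.head_counts = defaultdict(int)   # head -> count
        self.vocab = set()

    def train(self, trees):
        # Each tree is a list of (word, head) pairs; the root's head is "<ROOT>".
        for tree in trees:
            for word, head in tree:
                self.pair_counts[(head, word)] += 1
                self.head_counts[head] += 1
                self.vocab.add(word)

    def word_prob(self, word, head):
        # Add-one smoothing so unseen (head, word) pairs still get mass.
        return (self.pair_counts[(head, word)] + 1) / (
            self.head_counts[head] + len(self.vocab))

    def sentence_prob(self, tree):
        p = 1.0
        for word, head in tree:
            p *= self.word_prob(word, head)
        return p

def complete(lm, tree_template, slot, candidates):
    """Sentence completion: choose the candidate that maximises the
    probability of the resulting lexicalised dependency tree."""
    best, best_p = None, -1.0
    for cand in candidates:
        tree = [(cand if i == slot else w, h)
                for i, (w, h) in enumerate(tree_template)]
        p = lm.sentence_prob(tree)
        if p > best_p:
            best, best_p = cand, p
    return best
```

With counts gathered from parsed training sentences, `complete` reranks the candidate words of a Sentence Completion Challenge item by the probability of the tree each candidate induces, which is the reranking setup the abstract describes.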
4 0.46544254 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
Author: Wenliang Chen ; Min Zhang ; Yue Zhang
Abstract: In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.
5 0.46187171 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-ofdomain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
6 0.46105993 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming
8 0.44354251 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
9 0.4382135 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
10 0.43266717 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English
11 0.43193069 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization
12 0.43070012 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
13 0.43069649 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
14 0.42925477 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
15 0.42455888 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
16 0.42377672 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
17 0.42190921 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
18 0.42185742 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
19 0.42176938 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing