emnlp emnlp2013 emnlp2013-116 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-of-domain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
Joint Parsing and Disfluency Detection in Linear Time
Mohammad Sadegh Rasooli
Department of Computer Science, Columbia University, New York, NY

Abstract
We introduce a novel method to jointly parse and detect disfluencies in spoken utterances.
Our model can use arbitrary features for parsing sentences and adapt itself with out-of-domain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks.

1 Introduction
Detecting disfluencies in spontaneous speech has been widely studied by researchers in different communities, including natural language processing. The percentage of spoken words which are disfluent is typically not more than ten percent (Bortfeld et al., 2001). Disfluencies include fillers (e.g. "you know", "I mean") and edited words which are repeated or corrected by the speaker.1
The canonical example of a disfluent utterance is:

  I want a flight [to Boston]_Reparandum [uh]_FP [I mean]_DM [to Denver]_Repair

where "to Boston" is the reparandum (the edited phrase), "uh" is a filled pause (FP) and "I mean" is a discourse marker (DM), which together form the interregnum, and "to Denver" is the repair. Filled pauses and discourse markers are, to some extent, a fixed and closed set.
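To make the span structure concrete, here is a minimal sketch; the representation and names are ours, not the paper's:

```python
# Annotated disfluent utterance; span values are token offsets [start, end).
# This representation is illustrative, not taken from the paper.
tokens = ["I", "want", "a", "flight", "to", "Boston", "uh", "I", "mean", "to", "Denver"]
spans = {
    "reparandum": (4, 6),    # "to Boston" -- the edited phrase
    "interregnum": (6, 9),   # "uh" (FP) + "I mean" (DM)
    "repair": (9, 11),       # "to Denver"
}

def cleaned(tokens, spans):
    """Drop the reparandum and interregnum, keeping the repair."""
    drop = set(range(*spans["reparandum"])) | set(range(*spans["interregnum"]))
    return [t for i, t in enumerate(tokens) if i not in drop]
```

Running `cleaned(tokens, spans)` yields the fluent version of the utterance, "I want a flight to Denver".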
The main challenge in finding disfluencies is the case where the edited phrase is neither a rough copy of its repair nor has any repair phrase at all. Hence, in previous work, researchers report their method's performance on detecting edited phrases (the reparandum) (Johnson and Charniak, 2004). In contrast to most previous work, which focuses solely on either detection or on parsing, we introduce a novel framework for jointly parsing sentences with disfluencies. To our knowledge, our work is the first model that is based on joint dependency parsing and disfluency detection. We show that our model is robust enough to detect disfluencies with high accuracy, while still maintaining a high level of dependency parsing accuracy that approaches the upper bound. Additionally, our model outperforms prior work on joint parsing and disfluency detection on the disfluency detection task, and improves upon this prior work by running in linear time. In §2, we overview some of the previous work on disfluency detection.
1 In the literature, edited words are also known as the "reparandum", and the fillers are known as the "interregnum".
2 Related Work
Disfluency detection approaches can be divided into two groups: text-first and speech-first (Nakatani and Hirschberg, 1993). In the first approach, all prosodic and acoustic cues are ignored, while in the second approach both grammatical and acoustic features are considered.
Among text-first approaches, the work is split between systems which focus specifically on disfluency detection and those which couple disfluency detection with parsing. For the former, Charniak and Johnson (2001) employ a linear classifier to predict the edited phrases in the Switchboard corpus (Godfrey et al., 1992). Johnson and Charniak (2004) use a TAG-based noisy channel model to detect disfluencies while parsing, obtaining the n-best parses for each sentence and re-ranking them with a language model. The original TAG parser is not used for parsing itself; it is used only to find rough copies in the sentence. Their method achieves promising results on detecting edited words, but at the expense of speed (the parser has a complexity of O(N^5)). Kahn et al. (2005) use the same TAG model and add semi-automatically extracted prosodic features. Qian and Liu (2013) use three steps for detecting disfluencies with a weighted Max-Margin Markov (M3) network: detecting fillers, detecting edited words, and refining errors from the previous steps.
Some text-first approaches treat parsing and disfluency detection jointly, though the models differ in the type of parse formalism employed. Lease and Johnson (2006) use a PCFG-based parser to parse sentences while also finding edited phrases. To date, none of the prior joint approaches have used a dependency formalism.
3 Joint Parsing Model
We model the problem using a deterministic transition-based parser (Nivre, 2008). These parsers have the advantage of being very accurate while being able to parse a sentence in linear time.

Arc-Eager Algorithm We use the arc-eager algorithm (Nivre, 2004), a bottom-up parsing strategy used in greedy and k-beam transition-based parsers. This is particularly beneficial for our task, since we know that reparanda are similar to their repairs.
Hence, a reparandum may get its head, but whenever the parser faces a repair, it removes the reparandum from the sentence and continues its actions. The actions in an arc-eager parsing algorithm are:
• Left-arc (LA): The first word in the buffer becomes the head of the top word in the stack.
• Right-arc (RA): The top word in the stack becomes the head of the first word in the buffer.
• Reduce (R): The top word in the stack is popped.
• Shift (SH): The first word in the buffer goes to the top of the stack.
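The four arc-eager actions can be sketched as operations on a stack, a buffer, and an arc set. This is a minimal illustrative implementation, not the authors' parser:

```python
# Minimal arc-eager transitions; arcs are (head, dependent) pairs.
def left_arc(stack, buffer, arcs):
    # The first word in the buffer becomes the head of the top word in the stack.
    arcs.append((buffer[0], stack.pop()))

def right_arc(stack, buffer, arcs):
    # The top word in the stack becomes the head of the first word in the buffer;
    # the dependent is then pushed onto the stack.
    dep = buffer.pop(0)
    arcs.append((stack[-1], dep))
    stack.append(dep)

def reduce_(stack, buffer, arcs):
    # The top word in the stack is popped (it already has a head).
    stack.pop()

def shift(stack, buffer, arcs):
    # The first word in the buffer goes to the top of the stack.
    stack.append(buffer.pop(0))
```

For a fluent fragment such as "flight to Denver" with a dummy ROOT on the stack, the sequence RA, RA, RA attaches each word to the previous one.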
Joint Parsing and Disfluency Detection We first extend the arc-eager algorithm by augmenting the action space with three new actions:
• Reparandum (Rp[i:j]): treat a phrase (words i to j) outside the look-ahead buffer as a reparandum.
• Discourse Marker (Prn[i]): treat a phrase in the look-ahead buffer (first i words) as a discourse marker and remove it from the sentence.
Figure 1: A sample transition sequence for the sentence "flight to Boston uh I mean to Denver". The sentence is progressively reduced to "flight to Denver" by the action sequence RA, RA, Intj[1], Prn[1], Rp[2:3], RA, RA, R. In the third column, only the underlined parse actions are learned by the parser (the second classifier). The first classifier uses all instances for training (learning fluent words with the "regular" label).
• Interjection (Intj[i]): treat a phrase in the look-ahead buffer (first i words) as a filled pause and remove it from the sentence.

Our model has two classifiers. The first classifier decides between four possible actions and the possible candidates in the current configuration of the sentence. These actions are the three new ones from above plus a new action, Regular (Reg), which means: perform one of the original arc-eager parser actions. At each configuration, there might be several candidates for being a prn, intj or reparandum, and one regular candidate. The candidates for being a reparandum are a set of words outside the look-ahead buffer, and the candidates for being an intj or prn are sets of words beginning from the head of the sentence. If the parser decides that Regular is the correct action, the second classifier predicts the best parsing transition, based on arc-eager parsing (Nivre, 2004).
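The candidate space for the first classifier can be sketched as follows. The function name, action labels, and the `max_len` cut-off are our assumptions, not the paper's settings:

```python
# Enumerate first-classifier candidates for one configuration:
# - reparandum candidates: spans of words just outside the look-ahead buffer,
# - intj/prn candidates: prefixes of the look-ahead buffer,
# - plus the single "Regular" candidate.
def candidates(outside, buffer, max_len=4):
    cands = [("Reg", None)]
    for i in range(1, min(max_len, len(outside)) + 1):
        cands.append(("Rp", tuple(outside[-i:])))   # last i words outside the buffer
    for i in range(1, min(max_len, len(buffer)) + 1):
        cands.append(("Intj", tuple(buffer[:i])))   # first i buffer words
        cands.append(("Prn", tuple(buffer[:i])))
    return cands
```

For the configuration where "flight to Boston" has left the buffer and "to Denver" remains, the reparandum candidate ("Rp", ("to", "Boston")) is among the enumerated options, alongside the single Regular candidate.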
2 In the bracketed version of the Switchboard corpus, the reparandum is tagged EDITED, and discourse markers and filled pauses are tagged PRN and INTJ respectively.
Training A transition-based parser action (our second-level classifier) is sensitive to the words in the buffer and stack. The problem is that we do not have gold dependencies for edited words in our data. Therefore, we need a parser to remove reparandum words from the buffer and push them into the stack. Since our parser cannot be trained on disfluent sentences from scratch, the first step is to train it on clean treebank data. In the second step, we adapt the parser weights by training it on disfluent sentences. Our assumption is that we do not know the correct dependencies between disfluent words and the other words in the sentence.
The parser updates itself with new instances by traversing all configurations in the sentences. If there is an intj or prn tag at the head of the buffer, the parser allows those words to be removed from the buffer. If a reparandum word is not completely outside the buffer (the first two states in Figure 1), the parser decides between the four regular arc-eager actions (i.e., LA, RA, R and SH). If the last word pushed into the stack is a reparandum and the first word in the buffer is a regular word, the parser removes all reparanda at the same level (in the case of nested edited words), removes their dependencies to other words, and pushes their dependents into the stack. Otherwise, the parser performs the oracle action and adds that action as a new training instance.3
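The reparandum clean-up step described above can be sketched like this. It is a simplification: nesting levels are ignored and words are assumed to be unique tokens:

```python
# When a repair is detected, remove reparandum words from the stack, delete
# their arcs, and push their (fluent) dependents back onto the stack.
def remove_reparanda(stack, arcs, reparanda):
    kept, orphans = [], []
    for head, dep in arcs:
        if head in reparanda or dep in reparanda:
            if dep not in reparanda:
                orphans.append(dep)  # fluent dependent of a removed word
        else:
            kept.append((head, dep))
    new_stack = [w for w in stack if w not in reparanda] + orphans
    return new_stack, kept
```

For example, removing the reparandum "to Boston" deletes the arc between its words and leaves only ROOT on the stack.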
With an adapted parser, which is our second-level classifier, we can train our first-level classifier. The same procedure repeats, except that the instances for disfluency detection are used for updating the parameter weights of the first classifier, which decides among the actions. In Figure 1, only the oracle actions (underlined) are added to the instances for updating the parser weights, but all first-level actions are learned by the first-level classifier.

3 The reason that we use a parser, instead of expanding all possible transitions for an edited word, is that the number of regular actions would increase and the other actions would become sparser than natural.
4 Experiments
The main difference with previous work is that we use the Switchboard mrg files for training and testing our model (since they contain parse trees), instead of the more commonly used Switchboard dps text files. The mrg files are a subset of the dps files, comprising more than half of their size. Unfortunately, the disfluencies marked in the dps files are not exactly the same as those marked in the corresponding mrg files. We use Tsurgeon (Levy and Andrew, 2006) to extract sentences from the mrg files and the Penn2Malt tool5 to convert them to dependencies. Afterwards, we provide dependency trees in which disfluent words are the dependent of nothing.
Since the first classifier's data is heavily biased towards the "regular" label, we modify the weight update in the original algorithm from 1 to 2 for the cases where a "reparandum" is wrongly recognized as another label.6 For the second classifier (the parser), we use the original averaged structured perceptron algorithm.
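The modified update can be sketched as a cost-sensitive perceptron step. Feature and label names here are illustrative, not the paper's:

```python
from collections import defaultdict

# Perceptron update with a doubled step when a true reparandum ("Rp")
# is misclassified, to counter the bias towards the "regular" label.
def update(weights, feats, gold, pred):
    if gold == pred:
        return
    step = 2.0 if gold == "Rp" else 1.0
    for f, v in feats.items():
        weights[(gold, f)] += step * v
        weights[(pred, f)] -= step * v
```

A missed reparandum thus moves the weights twice as far as any other error, while correct predictions leave them untouched.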
Features Since, for every state in the parser configuration, there are many candidates for being disfluent, we use local features as well as global features for the first classifier. Global features are mostly useful for discriminating between the four actions, and local features are mostly useful for choosing a phrase as a candidate for being a disfluent phrase.
Figure 2: Features used for learning the first classifier.
Global Features: first n words inside/outside (i/o) the buffer (n=1:4); first n POS tags i/o the buffer (n=1:6); whether the n words i/o the buffer are equal (n=1:4); number of common words among the i/o buffer words (n=1:6).
Local Features: first n words of the candidate phrase (n=1:4); first n POS tags of the candidate phrase (n=1:6); distance between the candidate and the first word in the buffer.

Fine-grained (FG) transitions are enriched with parse actions (e.g. ...).
We train our parser in a similar manner to the MaltParser (Nivre et al., 2007).

Parser Evaluation We evaluate our parser with both the unlabeled attachment accuracy of correct words and the precision and recall of finding the dependencies of correct words.7 The second classifier is trained with 3 iterations in the first step and 3 iterations in the second step. We use the attachment accuracy of the parse trees of the correct sentences (without disfluencies) as the upper-bound attachment score, and the parsed trees of the disfluent sentences (without disfluency detection) as our lower-bound attachment score.
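The fluent-word attachment score can be sketched as follows (the head-map representation is our assumption):

```python
# Unlabeled attachment accuracy restricted to correct (fluent) words.
def uas_fluent(gold_heads, pred_heads, disfluent):
    correct = total = 0
    for word, gold_head in gold_heads.items():
        if word in disfluent:
            continue  # disfluent words are excluded from the score
        total += 1
        if pred_heads.get(word) == gold_head:
            correct += 1
    return correct / total if total else 0.0
```

Disfluent words, which by construction have no gold head, do not count towards or against the parser.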
As we can see in Table 1, WAP does a slightly better job of parsing sentences. The upper-bound parsing accuracy shows that we do not lose too much information while jointly detecting disfluencies. Our parser is not directly comparable to Johnson and Charniak (2004) and Miller and Schuler (2008), since we use dependency relations for evaluation instead of constituencies.
Disfluency Detection Evaluation We evaluate our model on detecting edited words in the sentences. In Table 1, UB = upper bound (parsing clean sentences) and LB = lower bound (parsing disfluent sentences without disfluency correction).

7 The parser is actually trained to do labeled attachment; labeled accuracy is about 1–1.5% lower.
For the sake of comparison to the state of the art, the best result for this task (Qian and Liu, 2013) is replicated from their available software8 on the portion of the dps files that have corresponding mrg files. For a fairer comparison, we also optimized the number of training iterations of Qian and Liu (2013) for the mrg set based on the dev data (10 iterations instead of 30). As shown in the results, our model's accuracy is slightly lower than the state of the art (which focuses solely on the disfluency detection task and does no parsing), but we believe that the performance can be improved through better features and by changing the model.
5 Conclusion
In this paper, we have developed a fast, yet accurate, joint dependency parsing and disfluency detection model. Such a parser is useful for spoken dialogue systems, which typically encounter disfluent speech and require accurate syntactic structures. The model is completely flexible with respect to adding other features (either text or speech features). There are still many ways to improve this framework, such as using k-beam training and decoding, using prosodic and acoustic features, using out-of-domain data to improve the language and parsing models, and merging the two classifiers into one through better feature engineering.
Ballesteros and Nivre (2013) show that parser accuracy can improve by changing that position for English. One of the main challenges in this problem is that most of the training instances are not disfluent, and thus the sample space is very sparse. Another challenge is related to the parser's speed, since the number of candidates and features is much greater than in classical dependency parsers.
References
Eugene Charniak and Mark Johnson. 2001. Edit detection and parsing for transcribed speech. In NAACL.
Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP.
Tim Miller and William Schuler. 2008. A unified syntactic model for parsing fluent and disfluent speech. In ACL.
Simon Zwarts and Mark Johnson. 2011. The impact of language models and loss functions on repair disfluency detection. In ACL.
wordName wordTfidf (topN-words)
[('disfluency', 0.39), ('buffer', 0.373), ('reparandum', 0.265), ('disfluent', 0.245), ('edited', 0.217), ('flight', 0.194), ('intj', 0.177), ('prn', 0.177), ('disfluencies', 0.16), ('mrg', 0.143), ('parser', 0.143), ('denver', 0.136), ('kahn', 0.122), ('repair', 0.122), ('uh', 0.122), ('johnson', 0.12), ('actions', 0.114), ('boston', 0.112), ('qian', 0.104), ('parsing', 0.101), ('switchboard', 0.097), ('lease', 0.097), ('dps', 0.082), ('wap', 0.082), ('charniak', 0.078), ('schuler', 0.075), ('rp', 0.075), ('files', 0.074), ('nivre', 0.071), ('regular', 0.071), ('fillers', 0.071), ('detection', 0.07), ('miller', 0.069), ('attachment', 0.069), ('nuance', 0.065), ('godfrey', 0.065), ('pauses', 0.061), ('sunnyvale', 0.061), ('transitions', 0.06), ('action', 0.056), ('classifier', 0.055), ('joakim', 0.049), ('filled', 0.048), ('detecting', 0.047), ('stack', 0.043), ('candidates', 0.043), ('dependency', 0.041), ('bortfeld', 0.041), ('imean', 0.041), ('interregnum', 0.041), ('leftarc', 0.041), ('nakatani', 0.041), ('pause', 0.041), ('reparanda', 0.041), ('tsurgeon', 0.041), ('zwarts', 0.041), ('fg', 0.041), ('ra', 0.04), ('prosodic', 0.039), ('marker', 0.039), ('perceptron', 0.039), ('decides', 0.037), ('acoustic', 0.037), ('parse', 0.036), ('georgila', 0.035), ('kallirroi', 0.035), ('ballesteros', 0.035), ('finlayson', 0.035), ('markers', 0.035), ('removes', 0.035), ('liu', 0.034), ('speech', 0.034), ('configuration', 0.033), ('discourse', 0.033), ('xian', 0.032), ('pos', 0.031), ('bracketed', 0.03), ('phrase', 0.03), ('remove', 0.03), ('mean', 0.03), ('completely', 0.029), ('eugene', 0.029), ('rough', 0.028), ('maltparser', 0.028), ('tag', 0.028), ('jeremy', 0.027), ('naaclhlt', 0.027), ('underlined', 0.027), ('mark', 0.027), ('fluent', 0.026), ('wen', 0.026), ('iterations', 0.026), ('deterministic', 0.025), ('expense', 0.025), ('ap', 0.025), ('spoken', 0.025), ('treat', 0.024), ('dependencies', 0.024), ('updates', 0.023), ('refined', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-ofdomain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
2 0.087228581 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
Author: Katsuhiko Hayashi ; Katsuhito Sudoh ; Hajime Tsukada ; Jun Suzuki ; Masaaki Nagata
Abstract: This paper presents a novel word reordering model that employs a shift-reduce parser for inversion transduction grammars. Our model uses rich syntax parsing features for word reordering and runs in linear time. We apply it to postordering of phrase-based machine translation (PBMT) for Japanese-to-English patent tasks. Our experimental results show that our method achieves a significant improvement of +3.1 BLEU scores against 30.15 BLEU scores of the baseline PBMT system.
3 0.066550657 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming
Author: Kai Zhao ; James Cross ; Liang Huang
Abstract: We present the first provably optimal polynomial time dynamic programming (DP) algorithm for best-first shift-reduce parsing, which applies the DP idea of Huang and Sagae (2010) to the best-first parser of Sagae and Lavie (2006) in a non-trivial way, reducing the complexity of the latter from exponential to polynomial. We prove the correctness of our algorithm rigorously. Experiments confirm that DP leads to a significant speedup on a probablistic best-first shift-reduce parser, and makes exact search under such a model tractable for the first time.
4 0.061242446 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
Author: Dekai Wu ; Karteek Addanki ; Markus Saers ; Meriem Beloucif
Abstract: We present a novel model, Freestyle, that learns to improvise rhyming and fluent responses upon being challenged with a line of hip hop lyrics, by combining both bottomup token based rule induction and top-down rule segmentation strategies to learn a stochastic transduction grammar that simultaneously learns bothphrasing and rhyming associations. In this attack on the woefully under-explored natural language genre of music lyrics, we exploit a strictly unsupervised transduction grammar induction approach. Our task is particularly ambitious in that no use of any a priori linguistic or phonetic information is allowed, even though the domain of hip hop lyrics is particularly noisy and unstructured. We evaluate the performance of the learned model against a model learned only using the more conventional bottom-up token based rule induction, and demonstrate the superiority of our combined token based and rule segmentation induction method toward generating higher quality improvised responses, measured on fluency and rhyming criteria as judged by human evaluators. To highlight some of the inherent challenges in adapting other algorithms to this novel task, we also compare the quality ofthe responses generated by our model to those generated by an out-ofthe-box phrase based SMT system. We tackle the challenge of selecting appropriate training data for our task via a dedicated rhyme scheme detection module, which is also acquired via unsupervised learning and report improved quality of the generated responses. Finally, we report results with Maghrebi French hip hop lyrics indicating that our model performs surprisingly well with no special adaptation to other languages. 102
5 0.059727866 78 emnlp-2013-Exploiting Language Models for Visual Recognition
Author: Dieu-Thu Le ; Jasper Uijlings ; Raffaella Bernardi
Abstract: The problem of learning language models from large text corpora has been widely studied within the computational linguistic community. However, little is known about the performance of these language models when applied to the computer vision domain. In this work, we compare representative models: a window-based model, a topic model, a distributional memory and a commonsense knowledge database, ConceptNet, in two visual recognition scenarios: human action recognition and object prediction. We examine whether the knowledge extracted from texts through these models are compatible to the knowledge represented in images. We determine the usefulness of different language models in aiding the two visual recognition tasks. The study shows that the language models built from general text corpora can be used instead of expensive annotated images and even outperform the image model when testing on a big general dataset.
6 0.057707258 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
7 0.05599764 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
8 0.051326547 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
9 0.050886612 58 emnlp-2013-Dependency Language Models for Sentence Completion
10 0.045534097 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English
11 0.04535323 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
12 0.044198081 70 emnlp-2013-Efficient Higher-Order CRFs for Morphological Tagging
13 0.042346932 141 emnlp-2013-Online Learning for Inexact Hypergraph Search
14 0.04181112 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
15 0.036556855 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
16 0.031021442 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
17 0.030937204 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
18 0.030778691 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
19 0.030345533 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication
20 0.03018087 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
topicId topicWeight
[(0, -0.117), (1, -0.017), (2, 0.009), (3, 0.009), (4, -0.077), (5, 0.023), (6, 0.015), (7, -0.006), (8, 0.001), (9, 0.081), (10, -0.072), (11, 0.035), (12, -0.054), (13, 0.056), (14, 0.007), (15, 0.029), (16, -0.099), (17, 0.005), (18, -0.036), (19, 0.009), (20, 0.06), (21, -0.0), (22, 0.099), (23, 0.079), (24, 0.074), (25, 0.039), (26, 0.01), (27, 0.119), (28, 0.032), (29, 0.013), (30, 0.024), (31, 0.022), (32, 0.099), (33, 0.029), (34, -0.045), (35, -0.007), (36, -0.074), (37, -0.076), (38, 0.004), (39, 0.018), (40, -0.092), (41, 0.016), (42, 0.016), (43, 0.046), (44, -0.162), (45, -0.089), (46, 0.049), (47, 0.01), (48, 0.061), (49, -0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.92514265 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-ofdomain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
2 0.6125142 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
Author: Wenliang Chen ; Min Zhang ; Yue Zhang
Abstract: In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.
3 0.59501559 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
Author: Joseph Le Roux ; Antoine Rozenknop ; Jennifer Foster
Abstract: It has recently been shown that different NLP models can be effectively combined using dual decomposition. In this paper we demonstrate that PCFG-LA parsing models are suitable for combination in this way. We experiment with the different models which result from alternative methods of extracting a grammar from a treebank (retaining or discarding function labels, left binarization versus right binarization) and achieve a labeled Parseval F-score of 92.4 on Wall Street Journal Section 23 this represents an absolute improvement of 0.7 and an error reduction rate of 7% over a strong PCFG-LA product-model baseline. Although we experiment only with binarization and function labels in this study, there is much scope for applying this approach to – other grammar extraction strategies.
4 0.55928516 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
Author: Katsuhiko Hayashi ; Katsuhito Sudoh ; Hajime Tsukada ; Jun Suzuki ; Masaaki Nagata
Abstract: This paper presents a novel word reordering model that employs a shift-reduce parser for inversion transduction grammars. Our model uses rich syntax parsing features for word reordering and runs in linear time. We apply it to postordering of phrase-based machine translation (PBMT) for Japanese-to-English patent tasks. Our experimental results show that our method achieves a significant improvement of +3.1 BLEU scores against 30.15 BLEU scores of the baseline PBMT system.
5 0.53584141 58 emnlp-2013-Dependency Language Models for Sentence Completion
Author: Joseph Gubbins ; Andreas Vlachos
Abstract: Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, achieving the highest accuracy reported to date apart from neural language models that are more complex and ex- pensive to train.
6 0.52760625 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
7 0.51229143 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English
8 0.50874335 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming
9 0.50796783 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
10 0.45571068 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers
11 0.42353275 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
12 0.41696316 141 emnlp-2013-Online Learning for Inexact Hypergraph Search
13 0.40212119 33 emnlp-2013-Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese
14 0.37310126 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
15 0.34723008 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
16 0.33687383 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
17 0.32546937 70 emnlp-2013-Efficient Higher-Order CRFs for Morphological Tagging
18 0.32370952 45 emnlp-2013-Chinese Zero Pronoun Resolution: Some Recent Advances
19 0.31578711 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
20 0.30128577 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
topicId topicWeight
[(3, 0.035), (18, 0.038), (22, 0.035), (26, 0.011), (30, 0.048), (39, 0.015), (44, 0.396), (45, 0.043), (50, 0.034), (51, 0.126), (66, 0.03), (71, 0.02), (75, 0.019), (77, 0.025), (95, 0.011), (96, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.69561535 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
Author: Mohammad Sadegh Rasooli ; Joel Tetreault
Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-of-domain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.
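The joint approach summarized in this abstract can be sketched as a transition system with an extra action that discards disfluent words as they are parsed. The sketch below is illustrative only: the action names (SHIFT, LEFT, RIGHT, PRUNE) and the scripted oracle sequence are hypothetical, and a real parser would pick actions with a classifier over arbitrary features rather than follow a fixed script.

```python
# Toy transition system that jointly builds dependency arcs and marks
# disfluent words. Action names are hypothetical, not the paper's inventory.

def run(words, actions):
    """Apply a scripted action sequence over a stack and buffer.

    Each word enters and leaves the stack/buffer a bounded number of
    times, so a full run is linear in sentence length.
    """
    stack, buffer = [], list(range(len(words)))
    arcs, disfluent = [], set()
    for act in actions:
        if act == "SHIFT":                   # move buffer front onto the stack
            stack.append(buffer.pop(0))
        elif act == "PRUNE":                 # mark buffer front as disfluent
            disfluent.add(buffer.pop(0))
        elif act == "LEFT":                  # buffer front governs stack top
            arcs.append((buffer[0], stack.pop()))
        elif act == "RIGHT":                 # stack top governs buffer front
            arcs.append((stack[-1], buffer[0]))
            stack.append(buffer.pop(0))
    return arcs, disfluent

# "I want coffee uh tea": "coffee uh" is disfluent, "tea" is the repair.
arcs, dis = run(["I", "want", "coffee", "uh", "tea"],
                ["SHIFT", "LEFT", "SHIFT", "PRUNE", "PRUNE", "RIGHT"])
```

Because every action consumes or moves exactly one word, the number of actions is bounded by a constant multiple of the sentence length, which is where the linear-time claim comes from.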
2 0.50372386 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
Author: Peng Li ; Yang Liu ; Maosong Sun
Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, making the choice between its two reordering rules (i.e., straight and inverted) dependent on the actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks instead. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable order prediction to exploit syntactic and semantic information from a neural language modeling perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.
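The core mechanism in this abstract — compose two block vectors into one phrase vector, then classify the merge as straight or inverted — can be sketched in a few lines. All parameters below are made-up toy values (real models learn them jointly, typically with a reconstruction objective that is omitted here), and the dimensionality is deliberately tiny.

```python
from math import tanh, exp

D = 2
# Hypothetical D x 2D composition matrix and classifier weights;
# a trained recursive autoencoder would learn both from data.
W = [[0.5, -0.3, 0.2, 0.1],
     [0.1, 0.4, -0.2, 0.3]]
U = [0.7, -0.5]

def compose(c1, c2):
    """Merge two child phrase vectors into a parent vector of the same size."""
    x = c1 + c2                                   # concatenation of children
    return [tanh(sum(w * v for w, v in zip(row, x))) for row in W]

def p_inverted(c1, c2):
    """Sigmoid score: probability the two blocks merge in inverted order."""
    s = sum(w * h for w, h in zip(U, compose(c1, c2)))
    return 1.0 / (1.0 + exp(-s))
```

Because `compose` outputs a vector of the same dimensionality as its inputs, it can be applied recursively up a binary merge tree, which is what lets variable-sized phrases share one representation space.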
3 0.36294386 58 emnlp-2013-Dependency Language Models for Sentence Completion
Author: Joseph Gubbins ; Andreas Vlachos
Abstract: Sentence completion is a challenging semantic modeling task in which models must choose the most appropriate word from a given set to complete a sentence. Although a variety of language models have been applied to this task in previous work, none of the existing approaches incorporate syntactic information. In this paper we propose to tackle this task using a pair of simple language models in which the probability of a sentence is estimated as the probability of the lexicalisation of a given syntactic dependency tree. We apply our approach to the Microsoft Research Sentence Completion Challenge and show that it improves on n-gram language models by 8.7 percentage points, achieving the highest accuracy reported to date apart from neural language models that are more complex and expensive to train.
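The dependency language model idea in this abstract can be illustrated with a toy scorer: sentence probability factorizes over dependency arcs, and the candidate word whose arc to its head scores highest wins. The counts, vocabulary size, and head–child-only conditioning below are invented for illustration; the paper's model conditions on richer tree context than a bare head word.

```python
from math import log

# Hypothetical (head, child) arc counts and head totals for illustration.
ARC_COUNTS = {
    ("drank", "coffee"): 8,
    ("drank", "sky"): 1,
    ("drank", "she"): 9,
}
HEAD_TOTALS = {"drank": 20}

def arc_logprob(head, child, alpha=1.0, vocab=1000):
    """log P(child | head) with add-alpha smoothing over a toy vocabulary."""
    c = ARC_COUNTS.get((head, child), 0)
    return log((c + alpha) / (HEAD_TOTALS.get(head, 0) + alpha * vocab))

def score(arcs):
    """Sentence log-probability as the sum of its arc log-probabilities."""
    return sum(arc_logprob(h, c) for h, c in arcs)

def complete(fixed_arcs, head, candidates):
    """Pick the completion whose arc to `head` maximizes the total score."""
    return max(candidates, key=lambda w: score(fixed_arcs + [(head, w)]))
```

An n-gram model sees only the linear left context, whereas this factorization lets the completion be conditioned on its syntactic governor even when the governor is far away in the word sequence.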
4 0.36206847 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
Author: Alla Rozovskaya ; Dan Roth
Abstract: State-of-the-art systems for grammatical error correction are based on a collection of independently-trained models for specific errors. Such models ignore linguistic interactions at the sentence level and thus do poorly on mistakes that involve grammatical dependencies among several words. In this paper, we identify linguistic structures with interacting grammatical properties and propose to address such dependencies via joint inference and joint learning. We show that it is possible to identify interactions well enough to facilitate a joint approach and, consequently, that joint methods correct incoherent predictions that independently-trained classifiers tend to produce. Furthermore, because the joint learning model considers interacting phenomena during training, it is able to identify mistakes that require making multiple changes simultaneously and that standard approaches miss. Overall, our model significantly outperforms the Illinois system that placed first in the CoNLL-2013 shared task on grammatical error correction.
5 0.36177376 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
Author: Kuzman Ganchev ; Dipanjan Das
Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resourceimpoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and namedentity segmentation.
6 0.36150467 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
7 0.35969362 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
9 0.35829934 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training
10 0.3578651 34 emnlp-2013-Automatically Classifying Edit Categories in Wikipedia Revisions
11 0.35760957 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
12 0.35583988 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
13 0.35534453 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
14 0.35509098 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
15 0.35500696 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
16 0.35446849 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
17 0.35394979 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
18 0.3538368 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
19 0.35370144 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching
20 0.35302907 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks