emnlp emnlp2011 emnlp2011-8 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Amit Dubey ; Frank Keller ; Patrick Sturt
Abstract: This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. This allows us to compute the information-theoretic measure of surprisal, which is known to correlate with human processing effort. We evaluate our surprisal predictions on the Dundee corpus of eye-movement data and show that our model achieves a better fit with human reading times than a syntax-only model which does not have access to co-reference information.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. [sent-6, score-0.242]
2 Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. [sent-7, score-0.108]
3 We evaluate our surprisal predictions on the Dundee corpus of eye-movement data and show that our model achieves a better fit with human reading times than a syntax-only model which does not have access to co-reference information. [sent-9, score-0.686]
4 1 Introduction Recent research in psycholinguistics has seen a growing interest in the role of prediction in sentence processing. [sent-10, score-0.1]
5 Prediction refers to the fact that the hu- man sentence processor is able to anticipate upcoming material, and that processing is facilitated when predictions turn out to be correct (evidenced, e. [sent-11, score-0.157]
6 , by shorter reading times on the predicted word or phrase). [sent-13, score-0.123]
7 This allows the processor to save time and makes it easier to cope with the constant stream of new input. [sent-18, score-0.054]
8 Evidence for prediction has been found in a range of psycholinguistic processing domains. [sent-19, score-0.17]
9 Semantic prediction has been demonstrated by studies that show anticipation based on selectional restrictions: listeners are able to launch eye-movements to the predicted argument of a verb before having encountered it, e. [sent-20, score-0.069]
10 Semantic prediction has also been shown in the context of semantic priming: a word that is preceded by a semantically related prime or by a semantically congruous sentence fragment is processed faster (Stanovich and West, 1981 ; Clifton et al. [sent-23, score-0.069]
11 An example for syntactic prediction can be found in coordinate structures: readers predict that the second conjunct in a coordination will have the same syntactic structure as the first conjunct (Frazier et al. [sent-25, score-0.283]
12 In a similar vein, having encountered the word either, readers predict that or and a conjunct will follow it (Staub and Clifton, 2006). [sent-27, score-0.079]
13 Again, priming studies corroborate this: Comprehenders are faster at naming words that are syntactically compatible with prior context, even when they bear no semantic relationship to it (Wright and Garrett, 1984). [sent-28, score-0.06]
14 Of course, it was situated behind a big[com] but unobtrusive bookcase[com]. [sent-39, score-0.074]
15 Previous results on discourse effects in sentence processing can also be interpreted in terms of prediction. [sent-41, score-0.248]
16 This result (and a large body of related findings) is compatible with an interpretation in which the processor predicts upcoming syntactic attachment based on the presence of referents in the preceding discourse. [sent-43, score-0.219]
17 Most attempts to model prediction in human language processing have focused on syntactic prediction. [sent-44, score-0.145]
18 Examples include Hale’s (2001) surprisal model, which relates processing effort to the conditional probability of the current word given the previous words in the sentence. [sent-45, score-0.44]
19 Recent work has attempted to integrate semantic and discourse prediction with models of syntactic processing. [sent-47, score-0.316]
20 ’s (2010) approach, which combines an incremental parser with a vector-space model of semantics. [sent-49, score-0.08]
21 At the discourse level, Dubey (2010) has proposed a model that combines an incremental parser with a probabilistic logic-based model of co-reference resolution. [sent-51, score-0.315]
22 However, this model does not explicitly model discourse effects in terms of prediction, and again only proposes a loose integration of co-reference and syntax. [sent-52, score-0.343]
23 Furthermore, Dubey’s (2010) model has only been tested on two experimental data sets (pertaining to the interaction of ambiguity resolution with context), and no broad-coverage evaluation is available. [sent-53, score-0.091]
24 We propose a computational model that captures discourse effects on syntax in terms of prediction. [sent-55, score-0.28]
25 The model comprises a co-reference component which explicitly stores discourse mentions of NPs, and a syntactic component which adjusts the probabilities of NPs in the syntactic structure based on the mentions tracked by the discourse component. [sent-56, score-0.632]
26 Our model is HMM-based, which makes it possible to efficiently process large amounts of data, allowing an evaluation on eye-tracking corpora, which has recently become the gold-standard in computational psycholinguistics (e. [sent-57, score-0.063]
27 2 Model This model utilises an NP chunker based upon a hidden Markov model (HMM) as an approximation to syntax. [sent-65, score-0.205]
28 Using a simple model such as an HMM facilitates the integration of a co-reference component, and the fact that the model is generative is a prerequisite to using surprisal as our metric of interest (as surprisal requires the computation of prefix probabilities). [sent-66, score-0.975]
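The prefix probabilities mentioned here fall out of the standard HMM forward algorithm: summing the forward scores after each word gives P(w1..wt). A minimal sketch (the toy state names and probabilities in the test are hypothetical, not from the paper's chunker):

```python
def forward_prefix_prob(words, states, start_p, trans_p, emit_p):
    """Return P(w_1..w_t) for every prefix of `words` via the forward algorithm.

    start_p[s]      : P(first state = s)
    trans_p[r][s]   : P(next state = s | current state = r)
    emit_p[s][w]    : P(word = w | state = s); missing words get probability 0
    """
    # Initialise forward scores for the first word.
    alpha = {s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}
    prefix_probs = [sum(alpha.values())]
    for w in words[1:]:
        # One step of the forward recursion: sum over predecessor states.
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s].get(w, 0.0)
            for s in states
        }
        prefix_probs.append(sum(alpha.values()))
    return prefix_probs
```

Because each prefix probability marginalises over all state sequences, this stays linear in sentence length, which is what makes the HMM fast enough for corpus-scale surprisal estimation.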
29 The key insight in our model is that human sentence processing is, on average, facilitated when a previously-mentioned discourse entity is repeated. [sent-67, score-0.269]
30 This facilitation depends upon keeping track of a list of previously-mentioned entities, which requires (at the least) shallow syntactic information, yet the facilitation itself is modeled primarily as a lexical phenomenon. [sent-68, score-0.196]
31 This allows a straightforward separation of concerns: shallow syntax is captured using the HMM’s hidden states, whereas the co-reference facilitation is modeled using the HMM’s emissions. [sent-69, score-0.06]
32 1 Syntactic Model A key feature of the co-reference component of our model (described below) is that syntactic analysis and co-reference resolution happen simultaneously. [sent-73, score-0.135]
33 This could potentially slow down the syntactic analysis, which tends to already be quite slow for exhaustive surprisal-based incremental parsers. [sent-74, score-0.092]
34 Therefore, rather than using full parsing, we use an HMMbased NP chunker which allows for a fast analysis. [sent-75, score-0.109]
35 NP chunking is sufficient to extract NP discourse mentions and, as we show below, surprisal values computed using HMM chunks provide a useful fit on the Dundee eye-movement data. [sent-76, score-0.779]
36 Here, a small degree of recursion allows for the NP ((new york city’s) general obligation fund) to be encoded, with the outer NP’s left bracket being ‘announced’ at the token ’s, which is the rightmost lexical token of the inner NP. [sent-81, score-0.072]
37 The resulting distribution has the form P(tag|word) and is therefore unsuitable for computing surprisal values. [sent-87, score-0.44]
38 Training The chunker is trained on sections 2–22 of the Wall Street Journal section of the Penn Treebank. [sent-94, score-0.109]
39 Our chunker is not comparable to the systems in the shared task for several reasons: we use more training data, we tag simultaneously (the CoNLL systems used gold standard tags) and our notion of a chunk is somewhat more complex than that used in CoNLL. [sent-96, score-0.152]
40 The best performing chunker from CoNLL 2000 achieved an F-score of 93. [sent-97, score-0.109]
41 2 Co-Reference Model In a standard HMM, the emission probabilities are computed as P(wi|si) where wi is the ith word and si is the ith state. [sent-103, score-0.096]
42 However, the contents of the cache are not individual words but entire NPs. [Figure 1: The chunk notation of a tree from the training data.] [sent-108, score-0.063]
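The paper's exact cache-augmented emission formula (Equation 2) is not reproduced in this summary, so the following is only an illustrative sketch of the idea: interpolating the base HMM emission P(word|state) with a distribution over words in a mention cache. The function name, the uniform cache distribution, and the weight `lam` are all assumptions for illustration:

```python
def cached_emission(word, state, emit_p, cache, lam=0.2):
    """Boost the emission probability of previously mentioned words.

    emit_p[state][word] : base HMM emission probability
    cache               : set of words from earlier discourse mentions
    lam                 : hypothetical interpolation weight (not from the paper)
    """
    base = emit_p[state].get(word, 0.0)
    if not cache:
        return base
    # Uniform distribution over cached words; 0 for words outside the cache.
    cache_prob = 1.0 / len(cache) if word in cache else 0.0
    return (1.0 - lam) * base + lam * cache_prob
```

The effect is the one the model needs: repeated discourse entities receive higher lexical probability, hence lower surprisal, while the state-transition (syntactic) part of the HMM is untouched.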
43 At the end of each sentence, the NPs of the Viterbi parse are added to the mention trie after having their leading articles stripped. [sent-116, score-0.069]
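The mention store described here (NPs inserted after stripping leading articles) can be sketched as a plain trie over token sequences. Class and method names are illustrative, not the authors' implementation:

```python
ARTICLES = {"a", "an", "the"}  # leading articles stripped before insertion

class MentionTrie:
    def __init__(self):
        self.root = {}

    @staticmethod
    def _normalise(np_tokens):
        tokens = [t.lower() for t in np_tokens]
        if tokens and tokens[0] in ARTICLES:
            tokens = tokens[1:]
        return tokens

    def add(self, np_tokens):
        """Insert an NP from the Viterbi parse into the trie."""
        node = self.root
        for tok in self._normalise(np_tokens):
            node = node.setdefault(tok, {})
        node["$"] = True  # sentinel marking the end of a stored mention

    def contains(self, np_tokens):
        """True iff this NP (modulo its leading article) was seen before."""
        node = self.root
        for tok in self._normalise(np_tokens):
            if tok not in node:
                return False
            node = node[tok]
        return "$" in node
```

Stripping articles means "the big bookcase" and "a big bookcase" resolve to the same stored mention, which is the behaviour the text describes.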
44 A consequence of Equation 2 is that co-reference resolution is handled at the same time as HMM decoding. [sent-119, score-0.059]
45 The estimate is computed by counting how often a repeated NP actually is discourse new. [sent-131, score-0.203]
46 , 2003), which contains the eye-movement record of 10 participants each reading 2,368 sentences of newspaper text. [sent-136, score-0.201]
47 2 Evaluation Eye tracking data is noisy for a number of reasons, including the fact that experimental participants can look at any word which is currently displayed. [sent-139, score-0.078]
48 Deviations from a strict left-to-right progression of fixations motivate the need for several different measures of eye movement. [sent-143, score-0.169]
49 The model presented here predicts the Total Time that participants spent looking at a region, which includes any re-fixations after looking away. [sent-144, score-0.172]
50 We found that the model performed similarly across all these reading time metrics; we therefore report results only for Total Time. [sent-148, score-0.155]
51 As mentioned above, reading measures are hypothesised to correlate with surprisal, which is defined as: S(wt) = −log P(wt | w1 . . . wt−1) (3) [sent-149, score-0.123]
52 We compute the surprisal scores for the syntax-only HMM, which does not have access to co-reference information (henceforth referred to as ‘HMM’) and the full model, which combines the syntax-only HMM with the co-reference model (henceforth ‘HMM+Ref’). [sent-152, score-0.472]
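Given prefix probabilities, the surprisal of Equation (3) is simply the difference of consecutive log prefix probabilities, since P(wt | w1..wt−1) = P(w1..wt) / P(w1..wt−1). A short sketch:

```python
import math

def surprisals(prefix_probs):
    """S(w_t) = -log P(w_t | w_1..w_{t-1})
              = log P(w_1..w_{t-1}) - log P(w_1..w_t).

    prefix_probs[t] is P(w_1..w_{t+1}), e.g. from a forward pass.
    """
    scores = [-math.log(prefix_probs[0])]  # first word: -log P(w_1)
    for prev, cur in zip(prefix_probs, prefix_probs[1:]):
        scores.append(math.log(prev) - math.log(cur))
    return scores
```

A word that halves the prefix probability therefore contributes log 2 nats of surprisal, and a zero prefix probability makes surprisal infinite, which is why such words are removed from the analysis below.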
53 To determine if our Dundee corpus simulations provide a reasonable model of human sentence processing, we perform a regression analysis with the Dundee corpus reading time measure as the dependent variable and the surprisal scores as the independent variable. [sent-153, score-0.646]
54 To account for noise in the corpus, we also use a number of additional explanatory variables which are known to strongly influence reading times. [sent-154, score-0.191]
55 Two additional explanatory variables were available in the Dundee corpus, which we also included in the regression model. [sent-156, score-0.085]
56 As participants could only view one line at a time (i. [sent-158, score-0.078]
57 , one line per screen), these covariates are known as line position and screen position, respectively. [sent-160, score-0.083]
58 All the covariates, including the surprisal estimates, were centered before including them in the regression model. [sent-161, score-0.491]
59 Because the HMM and HMM+Ref surprisal values are highly collinear, the HMM+Ref surprisal values were added as residuals of the HMM surprisal values. [sent-162, score-1.32]
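Centering a covariate and residualizing one predictor against a collinear one are both small least-squares operations; a minimal sketch in plain Python (function names are illustrative, and a real analysis would use a statistics package):

```python
def center(xs):
    """Subtract the mean, as done for all covariates before regression."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def residualize(y, x):
    """Residuals of y after regressing it on x (simple OLS with intercept).

    This is how the HMM+Ref surprisal values can be entered as residuals
    of the HMM surprisal values: only the variance in y that x does not
    explain is kept.
    """
    xc, yc = center(x), center(y)
    beta = sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)
    return [b - beta * a for a, b in zip(xc, yc)]
```

By construction the residuals are orthogonal to the original predictor, so both can enter the same regression without collinearity inflating the standard errors.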
60 However, in the present analysis we utilise a mixed effects model, which allows both items and participants to be treated as random factors. [sent-164, score-0.166]
61 There are a number of criteria which can be used to test the efficacy of one regression model over another. [sent-165, score-0.083]
62 These include the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), which trade off model fit against the number of model parameters (lower scores are better). [sent-166, score-0.123]
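Under the standard definitions, both criteria are simple functions of the maximised log-likelihood and the parameter count (function names here are illustrative):

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2*log-likelihood (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*log-likelihood.

    Penalises parameters more heavily than AIC once n > e^2 (~7.4)
    observations, so for corpus-sized n a lower BIC is a stricter test.
    """
    return k * math.log(n) - 2 * log_lik
```

This is why the three criteria reported below can "agree": a model whose extra predictor raises the log-likelihood by more than its parameter penalty lowers both AIC and BIC at once.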
63 If, due to data sparsity, the surprisal of a word goes to infinity for one of the models, we entirely remove that word from the analysis. [sent-171, score-0.44]
64 However, we did not trim any items due to abnormal reading times. We assume that each participant and item bias the reading time of the experiment. [sent-175, score-0.238]
65 Such an analysis is known as having random intercepts of participant and item. [sent-176, score-0.072]
66 It is also possible to assume a more involved analysis, known as random slopes, where the participants and items bias the slope of the predictor. [sent-177, score-0.121]
67 The model did not converge when using random intercepts and slopes on both participant and item. [sent-178, score-0.213]
68 If random slopes on items were left out, the HMM regression model did converge, but not the HMM+Ref model. [sent-179, score-0.206]
69 As the HMM+Ref is the model of interest, random slopes were left out entirely to allow a like-with-like comparison between the HMM and HMM+Ref regression models. [sent-180, score-0.163]
70 3 Results The result of the model comparison on Total Time reading data is summarised in Table 1. [sent-183, score-0.155]
71 We found that both the HMM and HMM+Ref provide a significantly better fit with the reading time data than the Baseline model; all three criteria agree: AIC and BIC are lower than for the baseline, and log-likelihood is higher. [sent-185, score-0.182]
72 Moreover, the HMM+Ref model provides a significantly better fit than the HMM model, which demonstrates the benefit of co-reference information for modeling reading times. [sent-186, score-0.214]
73 It lists the mixed-model coefficients for the HMM+Ref model and shows that all factors are significant predictors, including both HMM surprisal and residualized HMM+Ref surprisal. [sent-189, score-0.507]
74 4 Related Work There have been few computational models of human sentence processing that have incorporated a referential or discourse-level component. [sent-190, score-0.078]
75 Niv (1994) proposed a parsing model based on Combinatory Categorial Grammar (Steedman, 2001), in which referential information was used to resolve syntactic ambiguities. [sent-191, score-0.154]
76 The model was able to capture effects of referential information on syntactic garden paths (Altmann and Steedman, 1988). [sent-192, score-0.199]
77 This model differs from that proposed in the present paper, as it is intended to capture psycholinguistic preferences in a qualitative manner, whereas the aim of the present model is to provide a quantitative fit to measures of processing difficulty. [sent-193, score-0.224]
78 Spivey and Tanenhaus (1998) proposed a sentence processing model that examined the effects of referential information, as well as other constraints, on the resolution of ambiguous sentences. [sent-195, score-0.214]
79 Spivey and Tanenhaus’s (1998) model was specifically designed to provide a quantitative fit to reading times. [sent-225, score-0.214]
80 In contrast to both of these earlier models, the model proposed here aims to be general enough to provide estimated reading times for unrestricted text. [sent-227, score-0.155]
81 5 Discussion The primary finding of this work is that incorporating discourse information such as co-reference into an incremental probabilistic model of sentence processing has a beneficial effect on the ability of the model to predict broad-coverage human parsing behaviour. [sent-229, score-0.315]
82 In particular, the model of Dubey (2010), which also simulates the effect of discourse on syntax, is aimed at examining interactivity in the human sentence processor. [sent-231, score-0.304]
83 Under the weakly interactive hypothesis, discourse factors may prune or re-weight parses, but only when assuming the strongly interactive hypothesis would we argue that the sentence processor predicts upcoming material due to discourse factors. [sent-233, score-0.76]
84 Dubey found that a weakly interactive model simulated a pattern of results in an experiment (Grodner et al. [sent-234, score-0.132]
85 , 2005) which was previously believed to provide evidence for the strongly interactive hypothesis. [sent-235, score-0.139]
86 The model presented here, on the other hand, is not only broad-coverage but could also be described as a strongly interactive model. [sent-237, score-0.128]
87 The strong interactivity arises because co-reference resolution is strongly tied to lexical generation probabilities, which are part of the syntactic portion of our model. [sent-238, score-0.206]
88 This cannot be achieved in a weakly interactive model, which is limited to pruning or re-weighting of parses based on discourse information. [sent-239, score-0.303]
89 As our analysis on the Dundee corpus showed, the lexical probabilities (in the form of HMM+Ref surprisal) are key to improving the fit on eye-tracking data. [sent-240, score-0.087]
90 We therefore argue that our results provide evidence against a weakly interactive approach, which may be sufficient to model individual phenomena (as shown by Dubey 2010), but is unlikely to be able to match the broad-coverage result we have presented here. [sent-241, score-0.175]
91 We also note that psycholinguistic evidence for discourse prediction (such as the context based lexical prediction shown by van Berkum et al. [sent-242, score-0.485]
92 2005, see Section 1) is also evidence for strong interactivity; prediction goes beyond mere pruning or reweighting and requires strong interactivity. [sent-243, score-0.112]
93 Data from eyetracking corpora as evidence for theories of syntactic processing complexity. [sent-261, score-0.087]
94 A computational model of prediction in human parsing: Unifying locality and surprisal effects. [sent-264, score-0.541]
95 The influence of discourse on syntax: A psycholinguistic model of sentence processing. [sent-267, score-0.336]
96 In Proceedings of the 12th European conference on eye movement, 2003. [sent-290, score-0.109]
97 A machine learning approach to coreference resolution of noun phrases. [sent-306, score-0.088]
98 Syntactic ambiguity resolution in discourse: Modeling the effects of referential context and lexical frequency. [sent-313, score-0.137]
99 Syntactic prediction in language comprehension: Evidence from either . [sent-321, score-0.069]
100 Anticipating upcoming words in discourse: Evidence from erps and reading times. [sent-338, score-0.192]
wordName wordTfidf (topN-words)
[('surprisal', 0.44), ('hmm', 0.413), ('ref', 0.312), ('discourse', 0.203), ('np', 0.202), ('dundee', 0.18), ('dubey', 0.16), ('nps', 0.128), ('reading', 0.123), ('bic', 0.12), ('demberg', 0.12), ('chunker', 0.109), ('eye', 0.109), ('psycholinguistic', 0.101), ('altmann', 0.1), ('aic', 0.087), ('cognition', 0.083), ('slopes', 0.08), ('referential', 0.078), ('participants', 0.078), ('participant', 0.072), ('interactivity', 0.069), ('trie', 0.069), ('upcoming', 0.069), ('prediction', 0.069), ('interactive', 0.062), ('berkum', 0.06), ('clifton', 0.06), ('facilitation', 0.06), ('fixations', 0.06), ('fwt', 0.06), ('priming', 0.06), ('spivey', 0.06), ('staub', 0.06), ('resolution', 0.059), ('fit', 0.059), ('frank', 0.056), ('processor', 0.054), ('referents', 0.052), ('vera', 0.052), ('keller', 0.051), ('regression', 0.051), ('incremental', 0.048), ('conjunct', 0.047), ('psychology', 0.045), ('effects', 0.045), ('syntactic', 0.044), ('evidence', 0.043), ('screen', 0.043), ('items', 0.043), ('tag', 0.043), ('adrian', 0.04), ('covariates', 0.04), ('fixation', 0.04), ('frazier', 0.04), ('freqof', 0.04), ('gerry', 0.04), ('grodner', 0.04), ('loglik', 0.04), ('niv', 0.04), ('pseen', 0.04), ('stanovich', 0.04), ('tanenhaus', 0.04), ('unobtrusive', 0.04), ('mentions', 0.039), ('amit', 0.038), ('chunks', 0.038), ('nnp', 0.038), ('rightmost', 0.038), ('weakly', 0.038), ('emission', 0.037), ('factors', 0.035), ('erp', 0.034), ('explanatory', 0.034), ('facilitated', 0.034), ('hale', 0.034), ('obligation', 0.034), ('situated', 0.034), ('strongly', 0.034), ('readers', 0.032), ('upon', 0.032), ('model', 0.032), ('steedman', 0.031), ('integration', 0.031), ('brain', 0.031), ('kennedy', 0.031), ('psycholinguistics', 0.031), ('wright', 0.031), ('looking', 0.031), ('ft', 0.031), ('si', 0.031), ('soon', 0.03), ('coreference', 0.029), ('freq', 0.029), ('cache', 0.029), ('hill', 0.029), ('intercept', 0.029), ('movement', 0.029), ('ongoing', 0.029), ('probabilities', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
2 0.12486804 142 emnlp-2011-Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities
Author: Lanjun Zhou ; Binyang Li ; Wei Gao ; Zhongyu Wei ; Kam-Fai Wong
Abstract: Polarity classification of opinionated sentences with both positive and negative sentiments1 is a key challenge in sentiment analysis. This paper presents a novel unsupervised method for discovering intra-sentence level discourse relations for eliminating polarity ambiguities. Firstly, a discourse scheme with discourse constraints on polarity was defined empirically based on Rhetorical Structure Theory (RST). Then, a small set of cuephrase-based patterns were utilized to collect a large number of discourse instances which were later converted to semantic sequential representations (SSRs). Finally, an unsupervised method was adopted to generate, weigh and filter new SSRs without cue phrases for recognizing discourse relations. Experimental results showed that the proposed methods not only effectively recognized the defined discourse relations but also achieved significant improvement by integrating discourse information in sentence-level polarity classification.
3 0.10173098 84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues
Author: Altaf Rahman ; Vincent Ng
Abstract: An entity in a dialogue may be old, new, or mediated/inferrable with respect to the hearer’s beliefs. Knowing the information status of the entities participating in a dialogue can therefore facilitate its interpretation. We address the under-investigated problem of automatically determining the information status of discourse entities. Specifically, we extend Nissim’s (2006) machine learning approach to information-status determination with lexical and structured features, and exploit learned knowledge of the information status of each discourse entity for coreference resolution. Experimental results on a set of Switchboard dialogues reveal that (1) incorporating our proposed features into Nissim’s feature set enables our system to achieve stateof-the-art performance on information-status classification, and (2) the resulting information can be used to improve the performance of learning-based coreference resolvers.
4 0.095207058 94 emnlp-2011-Modelling Discourse Relations for Arabic
Author: Amal Al-Saif ; Katja Markert
Abstract: We present the first algorithms to automatically identify explicit discourse connectives and the relations they signal for Arabic text. First we show that, for Arabic news, most adjacent sentences are connected via explicit connectives in contrast to English, making the treatment of explicit discourse connectives for Arabic highly important. We also show that explicit Arabic discourse connectives are far more ambiguous than English ones, making their treatment challenging. In the second part of the paper, we present supervised algorithms to address automatic discourse connective identification and discourse relation recognition. Our connective identifier based on gold standard syntactic features achieves almost human performance. In addition, an identifier based solely on simple lexical and automatically derived morphological and POS features performs with high reliability, essential for languages that do not have high-quality parsers yet. Our algorithm for recognizing discourse relations performs significantly better than a baseline based on the connective surface string alone and therefore reduces the ambiguity in explicit connective interpretation.
5 0.08407709 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
Author: Jason Riesa ; Ann Irvine ; Daniel Marcu
Abstract: unkown-abstract
6 0.077527359 92 emnlp-2011-Minimally Supervised Event Causality Identification
7 0.07098037 34 emnlp-2011-Corpus-Guided Sentence Generation of Natural Images
8 0.066678889 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
9 0.055966031 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
10 0.054001242 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
11 0.052742403 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
12 0.052119806 23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction
13 0.051258799 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
14 0.050841171 62 emnlp-2011-Generating Subsequent Reference in Shared Visual Scenes: Computation vs Re-Use
15 0.05034766 125 emnlp-2011-Statistical Machine Translation with Local Language Models
16 0.050298858 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
17 0.050011389 3 emnlp-2011-A Correction Model for Word Alignments
18 0.04941706 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
19 0.047102679 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums
20 0.046634119 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
topicId topicWeight
[(0, 0.163), (1, -0.031), (2, -0.048), (3, 0.016), (4, 0.009), (5, -0.063), (6, -0.039), (7, 0.075), (8, -0.043), (9, -0.273), (10, -0.142), (11, -0.048), (12, -0.027), (13, 0.077), (14, -0.033), (15, -0.134), (16, 0.113), (17, 0.077), (18, -0.011), (19, -0.033), (20, -0.088), (21, 0.04), (22, 0.154), (23, -0.118), (24, -0.058), (25, 0.135), (26, -0.009), (27, 0.047), (28, -0.103), (29, 0.148), (30, -0.038), (31, 0.134), (32, -0.067), (33, -0.077), (34, 0.038), (35, 0.06), (36, -0.133), (37, 0.029), (38, -0.008), (39, 0.031), (40, 0.045), (41, -0.002), (42, -0.042), (43, 0.165), (44, 0.057), (45, 0.006), (46, 0.05), (47, 0.045), (48, 0.06), (49, -0.0)]
simIndex simValue paperId paperTitle
same-paper 1 0.93721992 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
2 0.6141836 84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues
Author: Altaf Rahman ; Vincent Ng
Abstract: An entity in a dialogue may be old, new, or mediated/inferrable with respect to the hearer’s beliefs. Knowing the information status of the entities participating in a dialogue can therefore facilitate its interpretation. We address the under-investigated problem of automatically determining the information status of discourse entities. Specifically, we extend Nissim’s (2006) machine learning approach to information-status determination with lexical and structured features, and exploit learned knowledge of the information status of each discourse entity for coreference resolution. Experimental results on a set of Switchboard dialogues reveal that (1) incorporating our proposed features into Nissim’s feature set enables our system to achieve stateof-the-art performance on information-status classification, and (2) the resulting information can be used to improve the performance of learning-based coreference resolvers.
3 0.55300969 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
Author: Jason Riesa ; Ann Irvine ; Daniel Marcu
Abstract: unkown-abstract
4 0.49077547 142 emnlp-2011-Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities
Author: Lanjun Zhou ; Binyang Li ; Wei Gao ; Zhongyu Wei ; Kam-Fai Wong
Abstract: Polarity classification of opinionated sentences with both positive and negative sentiments1 is a key challenge in sentiment analysis. This paper presents a novel unsupervised method for discovering intra-sentence level discourse relations for eliminating polarity ambiguities. Firstly, a discourse scheme with discourse constraints on polarity was defined empirically based on Rhetorical Structure Theory (RST). Then, a small set of cuephrase-based patterns were utilized to collect a large number of discourse instances which were later converted to semantic sequential representations (SSRs). Finally, an unsupervised method was adopted to generate, weigh and filter new SSRs without cue phrases for recognizing discourse relations. Experimental results showed that the proposed methods not only effectively recognized the defined discourse relations but also achieved significant improvement by integrating discourse information in sentence-level polarity classification.
5 0.45318294 94 emnlp-2011-Modelling Discourse Relations for Arabic
Author: Amal Al-Saif ; Katja Markert
Abstract: We present the first algorithms to automatically identify explicit discourse connectives and the relations they signal for Arabic text. First we show that, for Arabic news, most adjacent sentences are connected via explicit connectives in contrast to English, making the treatment of explicit discourse connectives for Arabic highly important. We also show that explicit Arabic discourse connectives are far more ambiguous than English ones, making their treatment challenging. In the second part of the paper, we present supervised algorithms to address automatic discourse connective identification and discourse relation recognition. Our connective identifier based on gold standard syntactic features achieves almost human performance. In addition, an identifier based solely on simple lexical and automatically derived morphological and POS features performs with high reliability, essential for languages that do not have high-quality parsers yet. Our algorithm for recognizing discourse relations performs significantly better than a baseline based on the connective surface string alone and therefore reduces the ambiguity in explicit connective interpretation.
6 0.41948852 34 emnlp-2011-Corpus-Guided Sentence Generation of Natural Images
7 0.36096978 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
8 0.34626323 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
9 0.33779898 92 emnlp-2011-Minimally Supervised Event Causality Identification
10 0.32890102 62 emnlp-2011-Generating Subsequent Reference in Shared Visual Scenes: Computation vs Re-Use
11 0.29035634 96 emnlp-2011-Multilayer Sequence Labeling
12 0.26296365 38 emnlp-2011-Data-Driven Response Generation in Social Media
13 0.25340417 23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction
14 0.24952692 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
15 0.24815361 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
16 0.24505505 116 emnlp-2011-Robust Disambiguation of Named Entities in Text
17 0.24261445 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
18 0.23626001 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
19 0.22744498 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
20 0.21311519 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
topicId topicWeight
[(15, 0.156), (23, 0.069), (36, 0.037), (37, 0.023), (45, 0.07), (48, 0.179), (53, 0.031), (54, 0.019), (57, 0.023), (62, 0.017), (64, 0.032), (66, 0.056), (69, 0.017), (75, 0.021), (79, 0.073), (82, 0.023), (90, 0.012), (96, 0.049), (98, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.76331741 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
Author: Amit Dubey ; Frank Keller ; Patrick Sturt
Abstract: This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. This allows us to compute the information-theoretic measure of surprisal, which is known to correlate with human processing effort. We evaluate our surprisal predictions on the Dundee corpus of eye-movement data and show that our model achieves a better fit with human reading times than a syntax-only model which does not have access to co-reference information.
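The surprisal measure this abstract relies on is the standard one: the surprisal of a word is the negative log-probability of that word given its preceding context. The sketch below computes it from a toy bigram table; the probabilities are illustrative assumptions, not the paper's HMM-chunker model.

```python
import math

# Toy conditional probability table P(word | previous word).
# These numbers are illustrative assumptions, not estimated from any corpus.
BIGRAM_P = {
    ("the", "dog"): 0.4,
    ("the", "cat"): 0.3,
    ("dog", "barked"): 0.5,
    ("dog", "slept"): 0.2,
}

def surprisal(prev: str, word: str) -> float:
    """Surprisal in bits: -log2 P(word | prev)."""
    p = BIGRAM_P.get((prev, word))
    if p is None:
        raise KeyError(f"no probability for {prev!r} -> {word!r}")
    return -math.log2(p)

# A less expected continuation carries higher surprisal, which is the
# quantity shown to correlate with longer reading times.
print(surprisal("dog", "barked"))  # -log2(0.5) = 1.0 bit
print(surprisal("dog", "slept") > surprisal("dog", "barked"))
```

The paper's contribution is conditioning these probabilities on co-reference structure as well as syntax; the conditioning model changes, but the surprisal computation itself stays this simple.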
2 0.73318982 142 emnlp-2011-Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities
Author: Lanjun Zhou ; Binyang Li ; Wei Gao ; Zhongyu Wei ; Kam-Fai Wong
Abstract: Polarity classification of opinionated sentences with both positive and negative sentiments is a key challenge in sentiment analysis. This paper presents a novel unsupervised method for discovering intra-sentence level discourse relations for eliminating polarity ambiguities. Firstly, a discourse scheme with discourse constraints on polarity was defined empirically based on Rhetorical Structure Theory (RST). Then, a small set of cue-phrase-based patterns were utilized to collect a large number of discourse instances which were later converted to semantic sequential representations (SSRs). Finally, an unsupervised method was adopted to generate, weigh and filter new SSRs without cue phrases for recognizing discourse relations. Experimental results showed that the proposed methods not only effectively recognized the defined discourse relations but also achieved significant improvement by integrating discourse information in sentence-level polarity classification.
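The cue-phrase-based pattern collection step described above can be sketched as simple pattern matching over sentences. The cue phrases and relation labels below are assumptions for illustration only; the paper's actual scheme is defined from RST and is richer than this.

```python
import re

# Illustrative cue-phrase patterns mapped to coarse RST-style relations.
# Both the cues and the relation names are assumptions for this sketch.
CUE_PATTERNS = [
    (re.compile(r"^(?P<arg1>.+?),? but (?P<arg2>.+)$", re.I), "Contrast"),
    (re.compile(r"^although (?P<arg1>.+?), (?P<arg2>.+)$", re.I), "Concession"),
    (re.compile(r"^(?P<arg1>.+?) because (?P<arg2>.+)$", re.I), "Cause"),
]

def extract_instance(sentence: str):
    """Return (relation, arg1, arg2) if a cue-phrase pattern matches, else None."""
    for pattern, relation in CUE_PATTERNS:
        m = pattern.match(sentence.strip())
        if m:
            return relation, m.group("arg1").strip(), m.group("arg2").strip()
    return None

print(extract_instance("The screen is sharp, but the battery drains fast"))
print(extract_instance("Although it rained, we went hiking"))
```

Instances collected this way supply the training material from which the paper then generalizes to sentences *without* any cue phrase, via the SSR representation.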
3 0.67728233 99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases
Author: Yugo Murawaki ; Sadao Kurohashi
Abstract: A key factor of high quality word segmentation for Japanese is a high-coverage dictionary, but it is costly to manually build such a lexical resource. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
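The core idea of segmenting a string "according to the statistical behavior of its supposed constituents" can be sketched with a plain unigram Viterbi segmenter. The lexicon, probabilities, and English example below are toy assumptions; the paper itself uses non-parametric Bayesian language models with a block-sampling inference procedure, not this fixed table.

```python
import math

# Toy unigram probabilities; the paper estimates these non-parametrically.
UNIGRAM_P = {"data": 0.04, "base": 0.03, "database": 0.02, "system": 0.05}
MAX_WORD_LEN = 12

def segment(s: str):
    """Best segmentation of s by unigram log-probability (Viterbi over prefixes)."""
    n = len(s)
    # best[i] = (best log-prob of s[:i], corresponding segmentation)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            w = s[j:i]
            if w in UNIGRAM_P and best[j][0] > -math.inf:
                score = best[j][0] + math.log(UNIGRAM_P[w])
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [w])
    return best[n][1]

print(segment("databasesystem"))
```

Note that the product of probabilities naturally penalizes over-segmentation: the two-word analysis beats the three-word one here even though each short word is individually more probable.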
4 0.67715991 120 emnlp-2011-Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
Author: Richard Socher ; Jeffrey Pennington ; Eric H. Huang ; Andrew Y. Ng ; Christopher D. Manning
Abstract: We introduce a novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions. Our method learns vector space representations for multi-word phrases. In sentiment prediction tasks these representations outperform other state-of-the-art approaches on commonly used datasets, such as movie reviews, without using any pre-defined sentiment lexica or polarity shifting rules. We also evaluate the model’s ability to predict sentiment distributions on a new dataset based on confessions from the experience project. The dataset consists of personal user stories annotated with multiple labels which, when aggregated, form a multinomial distribution that captures emotional reactions. Our algorithm can more accurately predict distributions over such labels compared to several competitive baselines.
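A single composition step of a recursive autoencoder, as described in this abstract, combines two child vectors into a parent and scores how well the parent reconstructs its children. The dimensions, weights, and inputs below are toy assumptions; the real model learns these by backpropagation and applies the step recursively over a tree.

```python
import math

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def compose(c1, c2, W, b):
    """Parent vector p = tanh(W [c1; c2] + b) for two child embeddings."""
    x = c1 + c2  # concatenation of the children
    return tanh_vec([s + bi for s, bi in zip(matvec(W, x), b)])

def reconstruction_error(c1, c2, p, Wd, bd):
    """Squared error between the children and their reconstruction from p."""
    recon = [s + bi for s, bi in zip(matvec(Wd, p), bd)]
    target = c1 + c2
    return sum((r - t) ** 2 for r, t in zip(recon, target))

# Toy 2-dimensional embeddings and weights (illustrative values only).
c1, c2 = [0.5, -0.1], [0.2, 0.3]
W = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
b = [0.0, 0.0]
Wd = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
bd = [0.0, 0.0, 0.0, 0.0]

p = compose(c1, c2, W, b)
err = reconstruction_error(c1, c2, p, Wd, bd)
print(p, err)
```

In training, the reconstruction error is minimized jointly with a sentiment-prediction loss on each parent vector, which is what lets the phrase representations carry label-distribution information.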
5 0.63818711 63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification
Author: Balamurali AR ; Aditya Joshi ; Pushpak Bhattacharyya
Abstract: Traditional approaches to sentiment classification rely on lexical features, syntax-based features or a combination of the two. We propose semantic features using word senses for a supervised document-level sentiment classifier. To highlight the benefit of sense-based features, we compare word-based representation of documents with a sense-based representation where WordNet senses of the words are used as features. In addition, we highlight the benefit of senses by presenting a part-of-speech-wise effect on sentiment classification. Finally, we show that even if a WSD engine disambiguates between a limited set of words in a document, a sentiment classifier still performs better than what it does in absence of sense annotation. Since word senses used as features show promise, we also examine the possibility of using similarity metrics defined on WordNet to address the problem of not finding a sense in the training corpus. We perform experiments using three popular similarity metrics to mitigate the effect of unknown synsets in a test corpus by replacing them with similar synsets from the training corpus. The results show promising improvement with respect to the baseline.
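One of the similarity metrics commonly defined on WordNet is path similarity: 1 / (shortest hypernym-path length between two synsets + 1). The sketch below implements that formula over a toy hand-coded hypernym hierarchy standing in for WordNet's noun taxonomy; in practice one would use WordNet itself (e.g. via NLTK's `Synset.path_similarity`), and which of the three metrics the paper uses beyond this is not specified here.

```python
# Toy hypernym hierarchy (child -> parent); an assumption standing in
# for WordNet's noun taxonomy.
HYPERNYM = {
    "dog": "canine", "canine": "carnivore", "carnivore": "animal",
    "cat": "feline", "feline": "carnivore",
}

def ancestors(word):
    """Path from word up to the root, including word itself."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def path_similarity(a, b):
    """WordNet-style path similarity: 1 / (shortest path length + 1)."""
    pa, pb = ancestors(a), ancestors(b)
    best = None
    for i, node in enumerate(pa):
        if node in pb:
            dist = i + pb.index(node)  # path via this common ancestor
            best = dist if best is None else min(best, dist)
    return 0.0 if best is None else 1.0 / (best + 1)

print(path_similarity("dog", "cat"))   # shortest path of length 4 via carnivore
print(path_similarity("dog", "dog"))
```

An unknown synset in the test corpus can then be replaced by the training-corpus synset that maximizes this score, which is the backoff strategy the abstract describes.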
6 0.51643693 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
7 0.51095217 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
9 0.49599683 107 emnlp-2011-Probabilistic models of similarity in syntactic context
10 0.49592447 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
11 0.4900994 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
12 0.48549905 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
13 0.4842743 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
14 0.48317453 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
15 0.48304725 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases
16 0.48258701 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
17 0.48109528 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases
18 0.48094901 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics
19 0.48078158 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
20 0.47989011 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs