emnlp emnlp2013 emnlp2013-203 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Franz Matthies ; Anders Søgaard
Abstract: Nilsson and Nivre (2009) introduced a tree-based model of persons’ eye movements in reading. The individual variation between readers reportedly made application across readers impossible. While a tree-based model seems plausible for eye movements, we show that competitive results can be obtained with a linear CRF model. Increasing the inductive bias also makes learning across readers possible. In fact we observe next-to-no performance drop when evaluating models trained on gaze records of multiple readers on new readers.
Reference: text
sentIndex sentText sentNum sentScore
1 With blinkers on: robust prediction of eye movements across readers Franz Matthies and Anders Søgaard University of Copenhagen Njalsgade 142 DK-2300 Copenhagen S Email: soegaard@hum.ku.dk [sent-1, score-1.337]
2 Abstract Nilsson and Nivre (2009) introduced a tree-based model of persons’ eye movements in reading. [sent-3, score-1.075]
3 The individual variation between readers reportedly made application across readers impossible. [sent-4, score-0.51]
4 While a tree-based model seems plausible for eye movements, we show that competitive results can be obtained with a linear CRF model. [sent-5, score-0.673]
5 Increasing the inductive bias also makes learning across readers possible. [sent-6, score-0.307]
6 In fact we observe next-to-no performance drop when evaluating models trained on gaze records of multiple readers on new readers. [sent-7, score-0.542]
7 1 Introduction When we read a text, our gaze does not move smoothly and continuously along its lines. [sent-8, score-0.251]
8 Rather, our eyes fixate at a word, then skip a few words and jump to a new fixation point. [sent-9, score-0.386]
9 Gaze can be recorded using eye tracking devices (Starr and Rayner, 2001). [sent-13, score-0.745]
10 Since eye movements in reading give us important information about what readers find complicated in a text and what they find completely predictable, predicting eye movements on new texts has many practical applications, for example in text-to-text generation and human-computer interaction. [sent-14, score-2.694]
11-13 The problem of predicting eye movements in reading is, for a reader ri and a given sequence of word tokens w1 . . . wn, to predict a set of fixation points F ⊆ {w1, . . . , wn}; that is, the reader ri may skip wj or fixate at wj. [sent-15, sent-18, sent-25]
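To make the prediction problem concrete, here is a minimal sketch of the task as per-token binary labeling; the FIX/SKIP label names and the helper function are ours, not the paper's.

from typing import List, Set

FIX, SKIP = "FIX", "SKIP"

def labels_from_fixations(tokens: List[str], fixated: Set[int]) -> List[str]:
    # A fixation set F over token indices induces one label per token.
    return [FIX if i in fixated else SKIP for i in range(len(tokens))]

tokens = ["Our", "eyes", "fixate", "at", "a", "word"]
print(labels_from_fixations(tokens, fixated={0, 2, 5}))
# ['FIX', 'SKIP', 'FIX', 'SKIP', 'SKIP', 'FIX']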
14 Models are evaluated on recordings of human reading obtained using eye tracking devices. [sent-26, score-0.794]
15 The supervised prediction problem that we consider in this paper also uses eye tracking data for learning models of eye movement. [sent-27, score-1.379]
16 Nilsson and Nivre (2009) first introduced this supervised learning task and used the Dundee corpus to train and evaluate a tree-based model, essentially treating the problem of predicting eye movements in reading as transition-based dependency parsing. [sent-28, score-1.173]
17 We follow Hara et al. (2012) in modeling only forward saccades and not regressions and refixations. [sent-30, score-0.125]
18-19 While Nilsson and Nivre (2009) try to model a subset of regressions and refixations, they do not evaluate this part of their model, focusing only on fixation accuracy and distribution accuracy, i.e., they evaluate how well they predict a set of fixation points rather than a sequence of points in order. [sent-31, sent-33]
20 This enables us to model eye movements in reading as a sequential problem of determining the length of forward saccades, increasing the inductive bias of our learning algorithm in a motivated way. [sent-34, score-1.23]
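One way to realize this sequential view is to label each fixation with the length of the forward saccade leaving it; the paper does not spell out its encoding, so the sketch below is an assumption.

def forward_saccade_lengths(fixated_indices):
    # For each fixation, the jump (in tokens) to the next fixation;
    # a length of 1 means the next word is fixated, larger values skip words.
    return [nxt - cur for cur, nxt in zip(fixated_indices, fixated_indices[1:])]

print(forward_saccade_lengths([0, 3, 4, 7]))  # [3, 1, 3]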
21 Note that because we work with visual input, we do not tokenize our input in our experiments, i.e. [sent-35, score-0.018]
22 Example Figure 1 presents an example sentence and gaze records from the Dundee corpus. [sent-38, score-0.31]
23 The Dundee corpus contains gaze records of 10 readers in total. [sent-39, score-0.542]
24 Note that there is little consensus on what words are skipped. [sent-40, score-0.015]
25 Generally, closed class items (prepositions, copulae, quantifiers) seem to be skipped more often, but we do see a lot of individual variation. [sent-42, score-0.033]
26-28 While others for this reason have refrained from evaluation across readers (Nilsson and Nivre, 2009; Hara et al., 2012), we show that our model predicts gaze better across readers than a previously proposed model (Nilsson and Nivre, 2009) does when training and evaluating on the same readers. [sent-43, sent-47]
Figure 1: The gaze records of the three first readers for the first sentence in the Dundee corpus. [sent-46]
29 A final observation is that fixations are very frequent at the word level: in fact, even skilled readers make 94 fixations per 100 words (Starr and Rayner, 2001), which motivates using the F1-score of skips as our metric. [sent-48, score-0.615]
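For clarity, a minimal sketch of the F1-score over skips, treating SKIP as the positive class; the exact computation in the paper is not shown, so this is our reading of the metric.

def skip_f1(gold, pred, positive="SKIP"):
    # Precision/recall of predicted skips against gold skips, combined as F1.
    tp = sum(g == p == positive for g, p in zip(gold, pred))
    pred_pos = sum(p == positive for p in pred)
    gold_pos = sum(g == positive for g in gold)
    if tp == 0 or pred_pos == 0 or gold_pos == 0:
        return 0.0
    precision, recall = tp / pred_pos, tp / gold_pos
    return 2 * precision * recall / (precision + recall)

print(skip_f1(["FIX", "SKIP", "SKIP", "FIX"], ["FIX", "SKIP", "FIX", "FIX"]))
# 0.666...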
30 Related work Below we use a sequential model rather than a tree-based model to bias our model toward predicting forward saccades. [sent-50, score-0.13]
31 Nilsson and Nivre (2009), in contrast, present a more expressive tree-based model for modeling eye movements, with some constraints on the search space. [sent-51, score-0.673]
32 The transition-based model uses consecutive classification rather than structured prediction. [sent-52, score-0.029]
33 In particular, they use word lengths and frequencies, like us, as well as distances between tokens (important in a transition-based model), and, finally, the history of previous decisions. [sent-54, score-0.048]
34 Hara et al. (2012) use a linear CRF model, like us, but they consider a slightly different problem, namely that of predicting eye movements when reading text on a specific screen. [sent-56, score-0.814]
35 In addition, they use word forms, POS, various measures of surprise of word length, as well as the perplexity of bi- and trigrams. [sent-58, score-0.017]
36 The features relating to screen position were the most predictive ones. [sent-59, score-0.054]
37 2 Our approach We use linear CRFs to model eye movements in reading. [sent-60, score-1.057]
38 That is, we use only word length and the log probability of words, both known to correlate well with likelihood of fixation as well as with fixation times (McDonald and Shillcock, 2012; Kliegl et al., 2004). [sent-64, score-0.276]
39 The model thus reflects a hypothesis that eye movements are largely unaffected by semantic content, that eye movements depend on the physical properties and frequency of words, and that there is a sequential dependence between fixation times. [sent-67, score-2.411]
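A sketch of the per-token feature map this hypothesis implies; the exact binning and context windows are not specified in the paper, so the details below are assumptions.

import math

def token_features(tokens, i, unigram_prob):
    # Only the word's length and its log probability, per the model above;
    # unigram_prob is an assumed dict of corpus-estimated word probabilities.
    w = tokens[i]
    return {
        "length": len(w),
        "logprob": round(math.log(unigram_prob.get(w, 1e-8)), 1),
    }

print(token_features(["the", "oculometer"], 1, {"the": 0.05}))
# {'length': 10, 'logprob': -18.4}

In a linear-chain CRF these per-token features combine with label-transition features, which is where the sequential dependence between fixations enters.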
40 We also evaluated using word forms and POS on held-out data, but this did not lead to improvements. [sent-69, score-0.014]
41 There is evidence for the impact of morphology on eye movements (Liversedge and Blythe, 2007; Bertram, 2011), but we did not incorporate this into our model. [sent-70, score-1.07]
42 Finally, we did not incorporate predictability of tokens, although this is also known to correlate with fixation times (Kliegl et al., 2004). [sent-71, score-0.322]
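The implementation footnote below points to the CRF++ toolkit. A hedged sketch of how such features could be written out in CRF++'s token-per-line column format; the column layout and template used by the authors are not given, so this is an assumption.

import math

def write_crfpp_data(path, sentences, unigram_prob):
    # Each line: LENGTH, LOGPROB, LABEL columns; blank lines separate
    # sentences, which is the standard CRF++ data format.
    with open(path, "w") as f:
        for tokens, labels in sentences:
            for w, y in zip(tokens, labels):
                lp = round(math.log(unigram_prob.get(w, 1e-8)), 1)
                f.write(f"{len(w)}\t{lp}\t{y}\n")
            f.write("\n")

# Training and prediction then use the CRF++ command line:
#   crf_learn template.txt train.txt model
#   crf_test -m model test.txt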
43 com/p/crfpp/ 3 Predicting a reader’s eye movements In this experiment we consider exactly the same setup as Nilsson and Nivre (2009). [sent-78, score-1.057]
44 In the Dundee corpus, we have gaze data for 10 persons. [sent-79, score-0.236]
45 The corpus consists of 2,379 sentences, 56,212 tokens and 9,776 types. [sent-80, score-0.019]
46 The data was recorded with a Bouis Oculometer Eyetracker, sampling the position of the right eye every millisecond. [sent-82, score-0.688]
47 Our error reduction over their model in terms of F1 over skips is 9. [sent-85, score-0.091]
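We read "error reduction" as the relative reduction of the residual error 1 - F1; a small sketch with purely illustrative numbers (the paper's exact figure is truncated in this extraction):

def error_reduction(f1_baseline, f1_ours):
    # Relative reduction of the residual error, where error = 1 - F1.
    return (f1_ours - f1_baseline) / (1.0 - f1_baseline)

# Illustrative values only, not the paper's results:
print(round(error_reduction(0.70, 0.73), 3))  # 0.1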
48 Hara et al. (2012) consider the problem of learning from the concatenation of the gaze data from the 10 persons in the Dundee corpus, but they also evaluate on data from these persons. [sent-91, score-0.264]
49 In our second experiment, we consider the more difficult problem of learning from one person’s gaze data, but evaluating on gaze data from another test person. [sent-92, score-0.472]
50 This is a more realistic scenario if we want to use our model to predict eye movements in reading for anyone but our test persons. [sent-93, score-1.182]
51 In fact, only in 4/10 cases do we get the best results training on gaze data from the reader we evaluate on. [sent-98, score-0.294]
52 Note also that the readers seem to form two groups (a, b, h, i, j) and (c, d, e, f, g) that provide good training material for each other. [sent-99, score-0.247]
53 Training on concatenated data from all members in each group may be beneficial. [sent-100, score-0.014]
54 5 Learning from multiple readers In our final experiment, we learn from the gaze records of nine readers and evaluate on the tenth. [sent-101, score-0.795]
55 This is a realistic evaluation of our ability to predict fixations for new, previously unobserved readers. [sent-102, score-0.226]
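The protocol amounts to leave-one-reader-out cross-validation; a minimal sketch, where train_crf and evaluate stand in for the CRF training and skip-F1 evaluation above (both names are ours):

def leave_one_reader_out(reader_data, train_crf, evaluate):
    # reader_data: dict of reader id -> labeled sentences; train on the
    # records of all other readers, test on the held-out reader.
    scores = {}
    for held_out, test_sents in reader_data.items():
        train = [s for r, sents in reader_data.items()
                 if r != held_out for s in sents]
        model = train_crf(train)
        scores[held_out] = evaluate(model, test_sents)
    return scores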
56 Interestingly, we can predict the fixations of new readers better than Nilsson and Nivre (2009) predict fixations when the training and test data are produced by the same reader. [sent-103, score-0.586]
57 In fact, our skip F1 score is actually better than in our first experiments. [sent-105, score-0.045]
58 As already mentioned, this result can probably be improved by using a subset of readers or by weighting training examples, e.g. [sent-106, score-0.255]
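One standard way to weight training examples here is importance weighting for covariate shift (Shimodaira, 2000, in the references below): each example is weighted by the ratio of its density under the target readers to its density under the source readers. A sketch with assumed density estimators:

def importance_weights(examples, p_target, p_source, floor=1e-12):
    # w(x) = p_target(x) / p_source(x); both densities are assumed to be
    # estimated separately, e.g. over the length/logprob features.
    return [p_target(x) / max(p_source(x), floor) for x in examples]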
59 6 Discussion Our contributions in this paper are: (i) a model for predicting a reader’s eye movements that is competitive with the state of the art, but simpler, with a smaller search space than Nilsson and Nivre (2009) and a smaller feature model than Hara et al. [sent-110, score-1.1]
60 (2012), (ii) showing that the simpler model is robust enough to model eye movements across readers, and finally, (iii) showing that even better models can be obtained by training on records from multiple readers. [sent-111, score-1.179]
61 It is interesting that a model without lexical information is more robust across readers. [sent-112, score-0.048]
62 This suggests that deep processing has little impact on eye movements. [sent-113, score-0.688]
63 The features used in this study are well-motivated and account for the phenomena as well as previously proposed models do. [sent-115, score-0.013]
64 It would be interesting to incorporate morphological features and perplexity-based features, but we leave this for future work. [sent-116, score-0.029]
65 7 Conclusion This study is, to the best of our knowledge, the first to consider the problem of learning to predict eye movements in reading across readers. [sent-117, score-1.189]
66 We present a very simple model of eye movements in reading that performs a little better than Nilsson and Nivre (2009) in terms of fixation accuracy, evaluated on one reader at a time, but predicts skips significantly better. [sent-118, score-1.498]
67 The true merit of the approach, however, is its ability to predict eye movements across readers. [sent-119, score-1.133]
68 In fact, it predicts the eye movements of new readers better than Nilsson and Nivre (2009) do when the training and test data are produced by the same reader. [sent-120, score-1.311]
69 References: Hara et al. (2012). Predicting word fixation in text with a CRF model for capturing general reading strategies among readers. [sent-127, score-0.328]
70 Kliegl et al. (2004). Length, frequency, and predictability effects of words on eye movements in reading. [sent-131, score-1.09]
71 Liversedge and Blythe (2007). Lexical and sublexical influences on eye movements during reading. [sent-135, score-1.057]
72 McDonald and Shillcock. Eye movements reveal the on-line computation of lexical probabilities during reading. [sent-139, score-0.384]
73 Nilsson and Nivre (2009). Learning where to look: Modeling eye movements in reading. [sent-143, score-1.057]
74 Reichle et al. (1998). Toward a model of eye movement control in reading. [sent-147, score-0.713]
75 Shimodaira (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. [sent-155, score-0.057]
wordName wordTfidf (topN-words)
[('eye', 0.673), ('movements', 0.384), ('fixation', 0.255), ('gaze', 0.236), ('readers', 0.232), ('nilsson', 0.204), ('hara', 0.149), ('dundee', 0.146), ('fixations', 0.146), ('nivre', 0.137), ('skips', 0.091), ('starr', 0.084), ('records', 0.074), ('reading', 0.073), ('kliegl', 0.063), ('rayner', 0.063), ('reichle', 0.063), ('saccades', 0.063), ('reader', 0.058), ('skip', 0.045), ('predicting', 0.043), ('fixate', 0.042), ('liversedge', 0.042), ('reingold', 0.042), ('tracking', 0.033), ('regressions', 0.033), ('predictability', 0.033), ('predict', 0.031), ('forward', 0.029), ('jump', 0.029), ('copenhagen', 0.029), ('transitionbased', 0.029), ('across', 0.028), ('persons', 0.028), ('erik', 0.028), ('inductive', 0.027), ('movement', 0.025), ('recorded', 0.025), ('sequential', 0.024), ('ri', 0.024), ('weighting', 0.023), ('crf', 0.023), ('keith', 0.022), ('screen', 0.022), ('predicts', 0.022), ('psychological', 0.022), ('wj', 0.022), ('cognitive', 0.021), ('correlate', 0.021), ('nine', 0.021), ('realistic', 0.021), ('psychology', 0.02), ('bias', 0.02), ('points', 0.02), ('robust', 0.02), ('tokens', 0.019), ('wn', 0.018), ('kano', 0.018), ('reportedly', 0.018), ('tokenize', 0.018), ('reinhold', 0.018), ('yoshinobu', 0.018), ('unaffected', 0.018), ('heather', 0.018), ('copulae', 0.018), ('skipped', 0.018), ('treebased', 0.018), ('surprise', 0.017), ('merit', 0.017), ('ralf', 0.017), ('fisher', 0.017), ('gaard', 0.017), ('featuredescription', 0.017), ('covariate', 0.017), ('predictive', 0.017), ('mcdonald', 0.017), ('accuracy', 0.016), ('interestingly', 0.016), ('morphological', 0.016), ('daichi', 0.015), ('mochihashi', 0.015), ('eyes', 0.015), ('smoothly', 0.015), ('unobserved', 0.015), ('matthias', 0.015), ('position', 0.015), ('little', 0.015), ('quantifiers', 0.015), ('recordings', 0.015), ('compass', 0.015), ('predictable', 0.015), ('control', 0.015), ('seem', 0.015), ('toward', 0.014), ('forms', 0.014), ('concatenated', 0.014), ('devices', 0.014), ('rounded', 0.014), ('previously', 0.013), ('incorporate', 0.013)]