acl acl2013 acl2013-45 knowledge-graph by maker-knowledge-mining

45 acl-2013-An Empirical Study on Uncertainty Identification in Social Media Context


Source: pdf

Author: Zhongyu Wei ; Junwen Chen ; Wei Gao ; Binyang Li ; Lanjun Zhou ; Yulan He ; Kam-Fai Wong

Abstract: Uncertainty text detection is important to many social-media-based applications since more and more users utilize social media platforms (e.g., Twitter, Facebook, etc.) as information source to produce or derive interpretations based on them. However, existing uncertainty cues are ineffective in social media context because of its specific characteristics. In this paper, we propose a variant of annotation scheme for uncertainty identification and construct the first uncertainty corpus based on tweets. We then conduct experiments on the generated tweets corpus to study the effectiveness of different types of features for uncertainty text identification.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Uncertainty text detection is important to many social-media-based applications since more and more users utilize social media platforms (e. [sent-8, score-0.395]

2 ) as information source to produce or derive interpretations based on them. [sent-11, score-0.025]

3 However, existing uncertainty cues are ineffective in social media context because of its specific characteristics. [sent-12, score-0.953]

4 In this paper, we propose a variant of annotation scheme for uncertainty identification and construct the first uncertainty corpus based on tweets. [sent-13, score-1.33]

5 We then conduct experiments on the generated tweets corpus to study the effectiveness of different types of features for uncertainty text identification. [sent-14, score-0.943]

6 1 Introduction: Social media is not only a social network tool for people to communicate but also plays an important role as an information source, with more and more users searching and browsing news on it. [sent-15, score-0.405]

7 People also utilize information from social media for developing various applications, such as earthquake warning systems (Sakaki et al. [sent-16, score-0.364]

8 However, due to its casual and word-of-mouth peculiarities, the quality of information in social media in terms of factuality becomes a premier concern. [sent-19, score-0.373]

9 In such a free-form context, there is ample room for uncertain information or even rumors to flood in. [sent-20, score-0.225]

10 We analyzed a tweet dataset which includes 326,747 posts (details are given in Section 3), collected during the 2011 London riots, and the result reveals that at least 18. [sent-21, score-0.181]

11 Therefore, distinguishing uncertain statements from factual ones is crucial for users to synthesize social media information to produce or derive reliable interpretations. (A footnote adds that the preliminary study was done based on a manually defined uncertainty cue-phrase list.) [sent-23, score-1.23]

12 In that study, tweets containing at least one hedge cue were treated as uncertain. [sent-24, score-0.175]

13 This is expected to be helpful for applications like credibility analysis (Castillo et al. [sent-27, score-0.041]

14 Although uncertainty has been studied theoretically for a long time as a grammatical phenomenon (Seifert and Welte, 1987), the computational treatment of uncertainty is a newly emerging area of research. [sent-30, score-1.18]

15 In recent years, the identification of uncertainty in formal text, e. [sent-35, score-0.668]

, biomedical text, reviews or newswire, has attracted considerable attention (Kilicoglu and Bergler, 2008; Medlock and Briscoe, 2007; Szarvas, 2008; Light et al. [sent-37, score-0.079]

17 However, uncertainty identification in social media context is rarely explored. [sent-39, score-1.0]

18 Previous research shows that uncertainty identification is domain-dependent, as the usage of hedge cues varies widely across domains (Morante and Sporleder, 2012). [sent-40, score-0.756]

19 Therefore, applying an existing out-of-domain corpus to the social media context is ineffective. [sent-41, score-0.332]

20 Furthermore, compared to the existing uncertainty corpora, the expression of uncertainty in social media is fairly different from that in formal text, in the sense that people usually raise questions or refer to external information when making uncertain statements. [sent-42, score-1.846]

21 However, neither of these uncertainty expressions can be represented by the existing types of uncertainty defined in the literature. [sent-43, score-1.205]

22 Therefore, a different uncertainty classification scheme is needed in the social media context. [sent-44, score-1.008]

23 In this paper, we propose a novel uncertainty classification scheme and construct the first uncertainty corpus based on social media data (specifically, tweets). [sent-45, score-1.864]

24 We then conduct experiments on uncertain post identification and study the effectiveness of different categories of features based on the generated corpus. [sent-46, score-0.755]

25 2 Related work: We introduce some popular uncertainty corpora and methods for uncertainty identification. [sent-49, score-1.203]

26 2.1 Uncertainty corpus: Several text corpora from various domains have been annotated over the past few years at different levels (e. [sent-51, score-0.063]

27 Sauri and Pustejovsky (2009) presented a corpus annotated with information about the factuality of events, namely FactBank, which is constructed based on TimeBank and contains 3,123 annotated sentences from 208 news documents, with 8 different levels of uncertainty defined. [sent-54, score-0.711]

28 (2008) constructed the BioScope corpus, which consists of medical and biological texts annotated for negation, uncertainty and their linguistic scope. [sent-56, score-0.676]

29 (2009) generated the Wikipedia Weasels Corpus, where weasel tags in Wikipedia articles are readily adopted as labels for uncertainty annotation. [sent-59, score-0.59]

30 It contains 168,923 unique sentences with 437 weasel tags in total. [sent-60, score-0.044]

31 Although several uncertainty corpora exist, there is no uniform standard for uncertainty annotation. [sent-61, score-1.203]

32 (2012) normalized the annotation of the three aforementioned corpora. [sent-63, score-0.052]

33 However, the context of these corpora is different from that of social media. [sent-64, score-0.181]

34 Typically, the annotated documents are grammatically correct, carefully punctuated, formally structured and logically expressed. [sent-65, score-0.04]

35 2.2 Uncertainty identification: Previous work on uncertainty identification focused on classifying sentences into uncertain or definite categories. [sent-67, score-0.971]

36 , 2004; Medlock and Briscoe, 2007; Medlock, 2008; Szarvas, 2008) using the annotated corpus with different types of features including Part-Of-Speech (POS) tags, stems, n-grams, etc. [sent-69, score-0.065]

37 Classification of uncertain sentences was consolidated as a task in the 2010 edition of the CoNLL shared task on learning to detect hedge cues and their scope in natural language text (Farkas et al. [sent-71, score-0.442]

38 The best system for Wikipedia data (Georgescul, 2010) employed Support Vector Machine (SVM), and the best system for biological data (Tang et al. [sent-73, score-0.046]

39 In our work, we conduct an empirical study of uncertainty identification on a tweet dataset and explore the effectiveness of different types of features (i. [sent-78, score-1.021]

40 3.1 Types of uncertainty in microblogs: Traditionally, uncertainty can be divided into two categories, namely Epistemic and Hypothetical (Kiefer, 2005). [sent-82, score-1.211]

41 The details of the classification are described below (Kiefer, 2005). Epistemic: on the basis of our world knowledge, we cannot decide at the moment whether the statement is true or false. [sent-85, score-0.067]

42 Hypothetical: This type of uncertainty includes four sub-classes: Doxastic (expresses the speaker's beliefs and hypotheses), Investigation, Condition, and Dynamic. [sent-86, score-0.59]

43 Compared to the existing uncertainty corpora, social media authors enjoy a free form of writing. [sent-90, score-0.922]

44 In order to study the difference, we annotated a small set of 827 randomly sampled tweets according to the scheme of uncertainty types above, in which we found 65 uncertain tweets. [sent-91, score-1.189]

45 We then manually identified all the possibly uncertain tweets and found 246 truly uncertain ones out of these 827 tweets, which means that 181 uncertain tweets are missed by this scheme. [sent-92, score-0.941]

46 We have the following three salient observations. Firstly, no tweet of the Investigation type is found. [sent-93, score-0.181]

47 We find that people seldom use words like “examine” or “test” (indicative words of the Investigation category) when posting tweets. [sent-94, score-0.037]

48 Secondly, people frequently raise questions about specific topics for confirmation, which expresses uncertainty. [sent-97, score-0.066]

49 For example, @ITVCentral Can you confirm that Birmingham children's hospital has/hasn't been attacked by rioters? [sent-98, score-0.092]

50 Thirdly, people tend to post messages with external information (e. [sent-99, score-0.08]

51 For example, Friend who works at the children's hospital in Birmingham says the riot police are protecting it. [sent-102, score-0.142]

52 Based on these observations, we propose a variant of the uncertainty types for the social media context by eliminating the category of Investigation and adding the categories of Question and External under Hypothetical, as shown in Table 3 and sketched below. [sent-103, score-1.013]

53 Note that our proposed scheme is based on Kiefer's (2005) work, which was previously extended by Szarvas et al. to normalize uncertainty corpora in different genres. [sent-105, score-0.656]

54 However, we did not adopt these extended schemas for specific genres, since even the most general one (Kiefer, 2005) proved unsuitable for the social media context. [sent-107, score-0.332]
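A minimal sketch of the proposed scheme, based on the descriptions above; the Condition and Dynamic sub-class names are assumptions carried over from Kiefer's (2005) taxonomy as normalized by Szarvas et al. (2012), and the glosses paraphrase the surrounding sentences rather than the paper's Table 3:

```python
# Hedged sketch of the proposed uncertainty scheme; Condition/Dynamic
# names are assumed from Kiefer's (2005) taxonomy, not from this page.
UNCERTAINTY_SCHEME = {
    "Epistemic": "truth cannot be decided on current world knowledge",
    "Hypothetical": {
        "Doxastic": "expresses the speaker's beliefs and hypotheses",
        "Condition": "sub-class retained from Kiefer's taxonomy (assumed)",
        "Dynamic": "sub-class retained from Kiefer's taxonomy (assumed)",
        "Question": "question raised for confirmation (newly added)",
        "External": "reference to external information (newly added)",
        # Investigation is eliminated in the proposed variant.
    },
}
```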

55 3.2 Annotation result: The dataset we annotated was collected from Twitter using the Streaming API during the summer riots in London, August 6-13, 2011, and includes 326,747 tweets in total. [sent-109, score-0.381]

56 From this set of tweets, we further extracted the tweets relating to seven significant events during the riots, as identified by the UK newspaper The Guardian. [sent-111, score-0.316]

57 We annotated all the 4,743 extracted tweets for the seven events. [sent-112, score-0.306]

58 Given a set of tweets T = {t1, t2, ..., tn}, the annotation task is to label each tweet ti as either uncertain or certain. [sent-117, score-0.21]

59 Uncertainty assertions are to be identified in terms of judgements about the author's intended meaning rather than the presence of an uncertain cue-phrase. [sent-118, score-0.225]

60 For those tweets annotated as uncertain, sub-class labels are also required according to the classification indicated in Table 3. [sent-119, score-0.349]

61 2, where 926 out of 4,743 tweets are labeled as uncertain, accounting for 19. [sent-128, score-0.491]

62 Question is the uncertainty category with the most tweets, followed by External. [sent-130, score-0.623]

63 During the preliminary annotation, we found that uncertainty cue-phrases are a good indicator of uncertain tweets, since tweets labeled as uncertain always contain at least one cue-phrase. [sent-136, score-1.961]

64 Therefore, annotators are also required to identify the cue-phrases which trigger the sense of uncertainty in the tweet. [sent-137, score-0.59]

65 All cue-phrases appearing more than twice are collected to form an uncertainty cue-phrase list. [sent-138, score-0.59]
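A minimal sketch of how such a cue-phrase list could be assembled; the (text, label, cue_phrases) record format is a hypothetical placeholder, since the paper does not publish its exact procedure:

```python
from collections import Counter

def build_cue_phrase_list(annotated_tweets):
    """Keep annotator-marked cue-phrases that appear more than twice."""
    counts = Counter()
    for _text, label, cues in annotated_tweets:
        if label == "uncertain":
            counts.update(cue.lower() for cue in cues)
    return sorted(cue for cue, n in counts.items() if n > 2)
```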

66 4 Experiment and evaluation: We aim to identify uncertain tweets from the tweet collection automatically using machine learning approaches. [sent-139, score-1.037]

67 In addition to n-gram features, we also explore the effectiveness of three categories of social-media-specific features: content-based, user-based and Twitter-specific ones. [sent-140, score-0.394]

68 The description of the three categories of features is shown in Table 4. [sent-141, score-0.025]

69 Since tweets are relatively short, we did not carry out stopword removal or stemming. [sent-142, score-0.181]

70 Our preliminary experiments showed that combining unigrams with bigrams and trigrams gave better performance than using any one or two of these three features. [sent-143, score-0.024]
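A minimal sketch of this feature setup, assuming scikit-learn and hypothetical train_texts/train_labels; the paper's actual toolkit and classifier settings are not given in this summary, and LinearSVC merely stands in for the SVM classifiers reported as best in the CoNLL-2010 shared task:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Unigrams + bigrams + trigrams, with no stopword removal or stemming,
# mirroring the setup described above.
clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),
    LinearSVC(),
)
clf.fit(train_texts, train_labels)   # labels: "uncertain" / "certain"
predictions = clf.predict(test_texts)
```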

71 Precision, recall and F-1 score of the uncertainty category are used as the metrics. [sent-146, score-0.025]

72 We used an uncertainty cue-phrase matching approach as the baseline, denoted by CP. [sent-150, score-0.59]

73 For CP, we labeled tweets containing at least one entry in the uncertainty cue-phrase list (described in Section 3) as uncertain. [sent-151, score-0.266]
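A minimal sketch of the CP baseline and its evaluation on the uncertainty class, assuming scikit-learn and the hypothetical cue_phrase_list and test data from the sketches above:

```python
from sklearn.metrics import precision_recall_fscore_support

def cp_baseline(texts, cue_phrases):
    """Label a tweet uncertain iff it contains at least one cue-phrase."""
    return ["uncertain" if any(cue in text.lower() for cue in cue_phrases)
            else "certain" for text in texts]

pred = cp_baseline(test_texts, cue_phrase_list)
p, r, f1, _ = precision_recall_fscore_support(
    test_labels, pred, labels=["uncertain"])   # uncertainty class only
print(f"CP baseline  P={p[0]:.3f}  R={r[0]:.3f}  F1={f1[0]:.3f}")
```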

74 (Examples: He believes that the earth is flat. / someone fake pictures of the London eye.) [sent-166, score-0.027]

75 Table 4: Results of uncertainty tweet identification. T denotes the Twitter-specific feature set, and ALL is the combination of C, U and T (the content-based, user-based and Twitter-specific feature sets). [sent-200, score-0.934]

76 We then study the effectiveness of the three content-based features, and the results show that the presence of an uncertain cue-phrase is the most indicative feature for uncertain tweet identification. [sent-207, score-1.033]

77 Our method performs worst on the Possible type and on the combination of Dynamic and Doxastic, because these two types have the fewest samples in the corpus and the classifier tends to be undertrained without enough samples. [sent-225, score-0.025]

78 5 Conclusion and future work: In this paper, we propose a variant of the classification scheme for uncertainty identification in social media and construct the first uncertainty corpus based on tweets. [sent-226, score-1.676]

79 We perform uncertainty identification experiments on the generated dataset to explore the effectiveness of different types of features. [sent-227, score-0.73]

80 Results show that the three categories of social-media-specific features can improve uncertainty identification. [sent-228, score-0.947]

81 Furthermore, content-based features bring the highest improvement among the three, and the presence of an uncertain cue-phrase contributes the most among the content-based features. [sent-229, score-0.225]

82 In the future, we will explore using uncertainty identification for social media applications. [sent-230, score-1.0]

83 6 Acknowledgement: This work is partially supported by the General Research Fund of Hong Kong (No. [sent-231, score-0.615]

84 Time is of the essence: improving recency ranking using twitter data. [sent-243, score-0.044]

85 Richárd Farkas, Veronika Vincze, György Móra, János Csirik, and György Szarvas. [sent-246, score-0.192]

86 The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text. [sent-248, score-0.106]

87 Finding hedges by chasing weasels: Hedge detection using wikipedia tags and shallow linguistic features. [sent-253, score-0.108]

88 A hedgehop over a max-margin framework using hedge cues. [sent-258, score-0.135]

89 Recognizing speculative language in biomedical research articles: a linguistically motivated perspective. [sent-269, score-0.079]

90 The language of bioscience: Facts, speculations, and statements in between. [sent-273, score-0.023]

91 In Proceedings of BioLink 2004 workshop on linking biological literature, ontologies and databases: tools for users, pages 17–24. [sent-274, score-0.046]

92 Weakly supervised learning for hedge classification in scientific literature. [sent-279, score-0.178]

93 Earthquake shakes twitter users: real-time event detection by social sensors. [sent-295, score-0.229]

94 A basic bibliography on negation in natural language, volume 313. [sent-306, score-0.048]

95 György Szarvas, Veronika Vincze, Richárd Farkas, György Móra, and Iryna Gurevych. [sent-308, score-0.192]

96 Cross-genre and cross-domain detection of semantic uncertainty. [sent-310, score-0.027]

97 Hedge classification in biomedical texts with a weakly supervised selection of keywords. [sent-314, score-0.122]

98 A cascade method for detecting hedges and their scope in natural language text. [sent-318, score-0.082]

99 The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. [sent-328, score-0.167]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('uncertainty', 0.59), ('svmn', 0.276), ('tweets', 0.266), ('uncertain', 0.225), ('gram', 0.21), ('tweet', 0.181), ('media', 0.174), ('social', 0.158), ('szarvas', 0.143), ('hedge', 0.135), ('hypothetical', 0.105), ('kiefer', 0.1), ('medlock', 0.1), ('orgy', 0.096), ('gy', 0.084), ('vincze', 0.082), ('biomedical', 0.079), ('identification', 0.078), ('farkas', 0.077), ('doxastic', 0.075), ('epistemic', 0.075), ('riots', 0.075), ('birmingham', 0.073), ('hospital', 0.066), ('hedges', 0.055), ('buletic', 0.05), ('factbank', 0.05), ('ganter', 0.05), ('kilicoglu', 0.05), ('riot', 0.05), ('seifert', 0.05), ('weasels', 0.05), ('negation', 0.048), ('biological', 0.046), ('rumor', 0.044), ('bmc', 0.044), ('hashtag', 0.044), ('imebank', 0.044), ('tws', 0.044), ('weasel', 0.044), ('twitter', 0.044), ('scheme', 0.043), ('classification', 0.043), ('external', 0.043), ('credibility', 0.041), ('factuality', 0.041), ('cue', 0.04), ('annotated', 0.04), ('morante', 0.038), ('owns', 0.038), ('cp', 0.038), ('effectiveness', 0.037), ('people', 0.037), ('sakaki', 0.037), ('follower', 0.037), ('investigation', 0.036), ('users', 0.036), ('london', 0.035), ('suppl', 0.035), ('castillo', 0.035), ('qazvinian', 0.034), ('seriously', 0.034), ('category', 0.033), ('kong', 0.033), ('earthquake', 0.032), ('qatar', 0.032), ('bioinformatics', 0.032), ('reply', 0.032), ('hong', 0.032), ('briscoe', 0.031), ('microblogs', 0.031), ('cues', 0.031), ('veronika', 0.03), ('ra', 0.029), ('tang', 0.029), ('light', 0.029), ('annotation', 0.029), ('raise', 0.029), ('scope', 0.027), ('someone', 0.027), ('detection', 0.027), ('children', 0.026), ('condition', 0.026), ('verified', 0.026), ('count', 0.026), ('url', 0.026), ('wikipedia', 0.026), ('proposition', 0.025), ('categories', 0.025), ('conduct', 0.025), ('types', 0.025), ('interpretations', 0.025), ('cc', 0.025), ('dong', 0.025), ('preliminary', 0.024), ('shared', 0.024), ('world', 0.024), ('statements', 0.023), ('corpora', 0.023), ('uk', 0.023)]
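The word weights above feed the similar-paper rankings that follow. A minimal sketch of how such a tfidf-plus-cosine-similarity ranking could be computed, assuming scikit-learn and a hypothetical papers mapping; the mining tool's actual pipeline is not published here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# papers: hypothetical {paper_id: full_text} mapping.
ids = list(papers)
matrix = TfidfVectorizer(stop_words="english").fit_transform(papers.values())

query = ids.index("45 acl-2013")               # hypothetical id format
sims = cosine_similarity(matrix[query], matrix).ravel()
ranked = sorted(zip(sims, ids), reverse=True)  # (simValue, paperId) pairs
```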

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 45 acl-2013-An Empirical Study on Uncertainty Identification in Social Media Context

Author: Zhongyu Wei ; Junwen Chen ; Wei Gao ; Binyang Li ; Lanjun Zhou ; Yulan He ; Kam-Fai Wong

Abstract: Uncertainty text detection is important to many social-media-based applications since more and more users utilize social media platforms (e.g., Twitter, Facebook, etc.) as information source to produce or derive interpretations based on them. However, existing uncertainty cues are ineffective in social media context because of its specific characteristics. In this paper, we propose a variant of annotation scheme for uncertainty identification and construct the first uncertainty corpus based on tweets. We then conduct experiments on the generated tweets corpus to study the effectiveness of different types of features for uncertainty text identification.

2 0.24595147 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

Author: Alexandra Balahur ; Hristo Tanev

Abstract: Nowadays, the importance of Social Media is constantly growing, as people often use such platforms to share mainstream media news and comment on the events that they relate to. As such, people no loger remain mere spectators to the events that happen in the world, but become part of them, commenting on their developments and the entities involved, sharing their opinions and distributing related content. This paper describes a system that links the main events detected from clusters of newspaper articles to tweets related to them, detects complementary information sources from the links they contain and subsequently applies sentiment analysis to classify them into positive, negative and neutral. In this manner, readers can follow the main events happening in the world, both from the perspective of mainstream as well as social media and the public’s perception on them. This system will be part of the EMM media monitoring framework working live and it will be demonstrated using Google Earth.

3 0.20387915 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media

Author: Weiwei Guo ; Hao Li ; Heng Ji ; Mona Diab

Abstract: Many current Natural Language Processing [NLP] techniques work well assuming a large context of text as input data. However they become ineffective when applied to short texts such as Twitter feeds. To overcome the issue, we want to find a related newswire document to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts. The contribution of the paper is two-fold: 1. we introduce the Linking-Tweets-toNews task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; 2. in contrast to previ- ous research which focuses on lexical features within the short texts (text-to-word information), we propose a graph based latent variable model that models the inter short text correlations (text-to-text information). This is motivated by the observation that a tweet usually only covers one aspect of an event. We show that using tweet specific feature (hashtag) and news specific feature (named entities) as well as temporal constraints, we are able to extract text-to-text correlations, and thus completes the semantic picture of a short text. Our experiments show significant improvement of our new model over baselines with three evaluation metrics in the new task.

4 0.15820131 146 acl-2013-Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications

Author: Simone Paolo Ponzetto ; Andrea Zielinski

Abstract: unknown-abstract

5 0.14544217 240 acl-2013-Microblogs as Parallel Corpora

Author: Wang Ling ; Guang Xiang ; Chris Dyer ; Alan Black ; Isabel Trancoso

Abstract: In the ever-expanding sea of microblog data, there is a surprising amount of naturally occurring parallel text: some users create post multilingual messages targeting international audiences while others “retweet” translations. We present an efficient method for detecting these messages and extracting parallel segments from them. We have been able to extract over 1M Chinese-English parallel segments from Sina Weibo (the Chinese counterpart of Twitter) using only their public APIs. As a supplement to existing parallel training data, our automatically extracted parallel data yields substantial translation quality improvements in translating microblog text and modest improvements in translating edited news commentary. The resources in described in this paper are available at http://www.cs.cmu.edu/∼lingwang/utopia.

6 0.14432321 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics

7 0.12922311 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

8 0.12705255 20 acl-2013-A Stacking-based Approach to Twitter User Geolocation Prediction

9 0.11355986 114 acl-2013-Detecting Chronic Critics Based on Sentiment Polarity and Userâ•Žs Behavior in Social Media

10 0.11227005 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

11 0.111687 139 acl-2013-Entity Linking for Tweets

12 0.098329432 301 acl-2013-Resolving Entity Morphs in Censored Data

13 0.088822037 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

14 0.088301621 33 acl-2013-A user-centric model of voting intention from Social Media

15 0.078242719 42 acl-2013-Aid is Out There: Looking for Help from Tweets during a Large Scale Disaster

16 0.069090694 256 acl-2013-Named Entity Recognition using Cross-lingual Resources: Arabic as an Example

17 0.068232998 326 acl-2013-Social Text Normalization using Contextual Graph Random Walks

18 0.067365423 62 acl-2013-Automatic Term Ambiguity Detection

19 0.065387867 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation

20 0.06087739 52 acl-2013-Annotating named entities in clinical text by combining pre-annotation and active learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.133), (1, 0.146), (2, 0.006), (3, 0.052), (4, 0.14), (5, 0.072), (6, 0.142), (7, 0.114), (8, 0.199), (9, -0.157), (10, -0.126), (11, 0.032), (12, 0.045), (13, -0.048), (14, 0.009), (15, 0.002), (16, 0.07), (17, -0.019), (18, 0.004), (19, -0.068), (20, 0.054), (21, 0.047), (22, -0.025), (23, -0.068), (24, -0.025), (25, 0.01), (26, 0.007), (27, -0.005), (28, 0.007), (29, 0.005), (30, -0.028), (31, 0.012), (32, 0.027), (33, 0.027), (34, -0.031), (35, 0.027), (36, 0.033), (37, 0.002), (38, 0.021), (39, -0.037), (40, -0.001), (41, -0.012), (42, -0.012), (43, 0.036), (44, 0.058), (45, 0.009), (46, 0.014), (47, 0.03), (48, -0.022), (49, -0.005)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94710523 45 acl-2013-An Empirical Study on Uncertainty Identification in Social Media Context

Author: Zhongyu Wei ; Junwen Chen ; Wei Gao ; Binyang Li ; Lanjun Zhou ; Yulan He ; Kam-Fai Wong

Abstract: Uncertainty text detection is important to many social-media-based applications since more and more users utilize social media platforms (e.g., Twitter, Facebook, etc.) as information source to produce or derive interpretations based on them. However, existing uncertainty cues are ineffective in social media context because of its specific characteristics. In this paper, we propose a variant of annotation scheme for uncertainty identification and construct the first uncertainty corpus based on tweets. We then conduct experiments on the generated tweets corpus to study the effectiveness of different types of features for uncertainty text identification.

2 0.87727469 146 acl-2013-Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications

Author: Simone Paolo Ponzetto ; Andrea Zielinski

Abstract: unknown-abstract

3 0.86556512 20 acl-2013-A Stacking-based Approach to Twitter User Geolocation Prediction

Author: Bo Han ; Paul Cook ; Timothy Baldwin

Abstract: We implement a city-level geolocation prediction system for Twitter users. The system infers a user’s location based on both tweet text and user-declared metadata using a stacking approach. We demonstrate that the stacking method substantially outperforms benchmark methods, achieving 49% accuracy on a benchmark dataset. We further evaluate our method on a recent crawl of Twitter data to investigate the impact of temporal factors on model generalisation. Our results suggest that user-declared location metadata is more sensitive to temporal change than the text of Twitter messages. We also describe two ways of accessing/demoing our system.

4 0.80117673 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

Author: Alexandra Balahur ; Hristo Tanev

Abstract: Nowadays, the importance of Social Media is constantly growing, as people often use such platforms to share mainstream media news and comment on the events that they relate to. As such, people no loger remain mere spectators to the events that happen in the world, but become part of them, commenting on their developments and the entities involved, sharing their opinions and distributing related content. This paper describes a system that links the main events detected from clusters of newspaper articles to tweets related to them, detects complementary information sources from the links they contain and subsequently applies sentiment analysis to classify them into positive, negative and neutral. In this manner, readers can follow the main events happening in the world, both from the perspective of mainstream as well as social media and the public’s perception on them. This system will be part of the EMM media monitoring framework working live and it will be demonstrated using Google Earth.

5 0.75635338 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media

Author: Weiwei Guo ; Hao Li ; Heng Ji ; Mona Diab

Abstract: Many current Natural Language Processing [NLP] techniques work well assuming a large context of text as input data. However they become ineffective when applied to short texts such as Twitter feeds. To overcome the issue, we want to find a related newswire document to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts. The contribution of the paper is two-fold: 1. we introduce the Linking-Tweets-toNews task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; 2. in contrast to previ- ous research which focuses on lexical features within the short texts (text-to-word information), we propose a graph based latent variable model that models the inter short text correlations (text-to-text information). This is motivated by the observation that a tweet usually only covers one aspect of an event. We show that using tweet specific feature (hashtag) and news specific feature (named entities) as well as temporal constraints, we are able to extract text-to-text correlations, and thus completes the semantic picture of a short text. Our experiments show significant improvement of our new model over baselines with three evaluation metrics in the new task.

6 0.73871285 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics

7 0.69441408 114 acl-2013-Detecting Chronic Critics Based on Sentiment Polarity and Userâ•Žs Behavior in Social Media

8 0.6913293 33 acl-2013-A user-centric model of voting intention from Social Media

9 0.67902219 42 acl-2013-Aid is Out There: Looking for Help from Tweets during a Large Scale Disaster

10 0.63916951 301 acl-2013-Resolving Entity Morphs in Censored Data

11 0.5824061 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study

12 0.57401013 240 acl-2013-Microblogs as Parallel Corpora

13 0.56585485 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

14 0.48972964 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

15 0.47163919 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

16 0.42408532 30 acl-2013-A computational approach to politeness with application to social factors

17 0.42079106 139 acl-2013-Entity Linking for Tweets

18 0.39475024 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision

19 0.36936948 62 acl-2013-Automatic Term Ambiguity Detection

20 0.36570016 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.023), (6, 0.049), (11, 0.045), (13, 0.017), (15, 0.024), (24, 0.058), (26, 0.101), (33, 0.312), (35, 0.058), (38, 0.017), (42, 0.047), (48, 0.026), (70, 0.042), (88, 0.032), (90, 0.015), (95, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80603814 45 acl-2013-An Empirical Study on Uncertainty Identification in Social Media Context

Author: Zhongyu Wei ; Junwen Chen ; Wei Gao ; Binyang Li ; Lanjun Zhou ; Yulan He ; Kam-Fai Wong

Abstract: Uncertainty text detection is important to many social-media-based applications since more and more users utilize social media platforms (e.g., Twitter, Facebook, etc.) as information source to produce or derive interpretations based on them. However, existing uncertainty cues are ineffective in social media context because of its specific characteristics. In this paper, we propose a variant of annotation scheme for uncertainty identification and construct the first uncertainty corpus based on tweets. We then conduct experiments on the generated tweets corpus to study the effectiveness of different types of features for uncertainty text identification.

2 0.62053472 24 acl-2013-A Tale about PRO and Monsters

Author: Preslav Nakov ; Francisco Guzman ; Stephan Vogel

Abstract: While experimenting with tuning on long sentences, we made an unexpected discovery: that PRO falls victim to monsters overly long negative examples with very low BLEU+1 scores, which are unsuitable for learning and can cause testing BLEU to drop by several points absolute. We propose several effective ways to address the problem, using length- and BLEU+1based cut-offs, outlier filters, stochastic sampling, and random acceptance. The best of these fixes not only slay and protect against monsters, but also yield higher stability for PRO as well as improved testtime BLEU scores. Thus, we recommend them to anybody using PRO, monsterbeliever or not. – 1 Once Upon a Time... For years, the standard way to do statistical machine translation parameter tuning has been to use minimum error-rate training, or MERT (Och, 2003). However, as researchers started using models with thousands of parameters, new scalable optimization algorithms such as MIRA (Watanabe et al., 2007; Chiang et al., 2008) and PRO (Hopkins and May, 2011) have emerged. As these algorithms are relatively new, they are still not quite well understood, and studying their properties is an active area of research. For example, Nakov et al. (2012) have pointed out that PRO tends to generate translations that are consistently shorter than desired. They have blamed this on inadequate smoothing in PRO’s optimization objective, namely sentencelevel BLEU+1, and they have addressed the problem using more sensible smoothing. We wondered whether the issue could be partially relieved simply by tuning on longer sentences, for which the effect of smoothing would naturally be smaller. To our surprise, tuning on the longer 50% of the tuning sentences had a disastrous effect on PRO, causing an absolute drop of three BLEU points on testing; at the same time, MERT and MIRA did not have such a problem. While investigating the reasons, we discovered hundreds of monsters creeping under PRO’s surface... Our tale continues as follows. We first explain what monsters are in Section 2, then we present a theory about how they can be slayed in Section 3, we put this theory to test in practice in Section 4, and we discuss some related efforts in Section 5. Finally, we present the moral of our tale, and we hint at some planned future battles in Section 6. 2 Monsters, Inc. PRO uses pairwise ranking optimization, where the learning task is to classify pairs of hypotheses into correctly or incorrectly ordered (Hopkins and May, 2011). It searches for a vector of weights w such that higher evaluation metric scores correspond to higher model scores and vice versa. More formally, PRO looks for weights w such that g(i, j) > g(i, j0) ⇔ hw (i, j) > hw (i, j0), where g is a local scoring fu hnction (typically, sentencelevel BLEU+1) and hw are the model scores for a given input sentence i and two candidate hypotheses j and j0 that were obtained using w. If g(i, j) > g(i, j0), we will refer to j and j0 as the positive and the negative example in the pair. Learning good parameter values requires negative examples that are comparable to the positive ones. Instead, tuning on long sentences quickly introduces monsters, i.e., corrupted negative examples that are unsuitable for learning: they are (i) much longer than the respective positive examples and the references, and (ii) have very low BLEU+1 scores compared to the positive examples and in absolute terms. The low BLEU+1 means that PRO effectively has to learn from positive examples only. 
12 Proce dinSgosfi oa,f tB huel 5g1arsita, An Anu gauls Mt 4e-e9ti n2g01 o3f. th ?c e2 A0s1s3oc Aiastsio cnia fotiron C fo mrp Cuotmatpiounta tlio Lninaglu Li sntgicusi,s ptaicgses 12–17, Avg. Lengths Avg. BLEU+1 iter. pos neg ref. pos neg 1 45.2 44.6 46.5 52.5 37.6 2 3 4 5 ... 25 46.4 46.4 46.4 46.3 ... 47.9 70.5 261.0 250.0 248.0 ... 229.0 53.2 53.4 53.0 53.0 ... 52.5 52.8 52.4 52.0 52.1 ... 52.2 14.5 2.19 2.30 2.34 ... 2.81 Table 1: PRO iterations, tuning on long sentences. Table 1shows an optimization run of PRO when tuning on long sentences. We can see monsters after iterations in which positive examples are on average longer than negative ones (e.g., iter. 1). As a result, PRO learns to generate longer sentences, but it overshoots too much (iter. 2), which gives rise to monsters. Ideally, the learning algorithm should be able to recover from overshooting. However, once monsters are encountered, they quickly start dominating, with no chance for PRO to recover since it accumulates n-best lists, and thus also monsters, over iterations. As a result, PRO keeps jumping up and down and converges to random values, as Figure 1 shows. By default, PRO’s parameters are averaged over iterations, and thus the final result is quite mediocre, but selecting the highest tuning score does not solve the problem either: for example, on Figure 1, PRO never achieves a BLEU better than that for the default initialization parameters. iteration Figure 1: PRO tuning results on long sentences across iterations. The dark-gray line shows the tuning BLEU (left axis), the light-gray one is the hypothesis/reference length ratio (right axis). Figure 2 shows the translations after iterations 1, 3 and 4; the last two are monsters. The monster at iteration 3 is potentially useful, but that at iteration 4 is clearly unsuitable as a negative example. Optimizer Objective BLEU PROsent-BLEU+144.57 MERT corpus-BLEU 47.53 MIRA pseudo-doc-BLEU 47.80 PRO (6= objective)pseudo-doc-BLEU21.35 PMRIORA (6= =(6= o bojbejcetcivteiv)e) sent-BLEU+1 47.59 PMRIRO,A PC (6=-sm obojoectthiv,e g)roundfixed sent-BLEU+145.71 Table 2: PRO vs. MERT vs. MIRA. We also checked whether other popular optimizers yield very low BLEU scores at test time when tuned on long sentences. Lines 2-3 in Table 2 show that this is not the case for MERT and MIRA. Since they optimize objectives that are different from PRO’s,1 we further experimented with plugging MIRA’s objective into PRO and PRO’s objective into MIRA. The resulting MIRA scores were not much different from before, while PRO’s score dropped even further; we also found mon- sters. Next, we applied the length fix for PRO proposed in (Nakov et al., 2012); this helped a bit, but still left PRO two BLEU points behind and MIRA, and the monsters did not go away. We can conclude that the monster problem is PRO-specific, cannot be blamed on the objective function, and is different from the length bias. Note also that monsters are not specific to a dataset or language pair. We found them when tuning on the top-50% of WMT10 and testing on WMT1 1 for Spanish-English; this yielded a drop in BLEU from 29.63 (1M2/emEs/RprTes)la to 27.12 n(inPg/RtmOp.)1.1 MERT2 **REF** : but we have to close ranks with each other and realize that in unity there is strength while in division there is weakness . 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - **IT1** : but we are that we add our ranks to some of us and that we know that in the strength and weakness in **IT3** : , we are the but of the that that the , and , of ranks the the on the the our the our the some of we can include , and , of to the of we know the the our in of the of some people , force of the that that the in of the that that the the weakness Union the the , and **IT4** : namely Dr Heba Handossah and Dr Mona been pushed aside because a larger story EU Ambassador to Egypt Ian Burg highlighted 've dragged us backwards and dragged our speaking , never balme your defaulting a December 7th 1941 in Pearl Harbor ) we can include ranks will be joined by all 've dragged us backwards and dragged our $ 3 .8 billion in tourism income proceeds Chamber are divided among themselves : some ' ve dragged us backwards and dragged our were exaggerated . Al @-@ Hakim namely Dr Heba Handossah and Dr Mona December 7th 1941 in Pearl Harbor ) cases might be known to us December 7th 1941 in Pearl Harbor ) platform depends on combating all liberal policies Track and Field Federation shortened strength as well face several challenges , namely Dr Heba Handossah and Dr Mona platform depends on combating all liberal policies the report forecast that the weak structure Ftroai ngtkhsu et rahefef 2he : Ea t h xte ha,e motfo pstohlmeee r leafst eorfe wne c, et etr laonngs olfa t hei opnar a ofn sdo hhee oy fpwhaoitst hh a]r ee usisn i ostu tofra tnhes ilna tbiakoern s, haef ctoeokr it hee roant ainod nthse 1 t,h 3we aknonw d, 4. T@-@h eAl l tahes ft trwce o, tho ypotheses are monsters. 1See (Cherry and Foster, 2012) for details on objectives. 2Also, using PRO to initialize MERT, as implemented in Moses, yields 46.52 BLEU and monsters, but using MERT to initialize PRO yields 47.55 and no monsters. 13 3 Slaying Monsters: Theory Below we explain what monsters are and where they come from. Then, we propose various monster slaying techniques to be applied during PRO’s selection and acceptance steps. 3.1 What is PRO? PRO is a batch optimizer that iterates between (i) translation: using the current parameter values, generate k-best translations, and (ii) optimization: using the translations from all previous iterations, find new parameter values. The optimization step has four substeps: 1. Sampling: For each sentence, sample uniformly at random Γ = 5000 pairs from the set of all candidate translations for that sentence from all previous iterations. 2. Selection: From these sampled pairs, select those for which the absolute difference between their BLEU+1 scores is higher than α = 0.05 (note: this is 5 BLEU+1 points). 3. Acceptance: For each sentence, accept the Ξ = 50 selected pairs with the highest absolute difference in their BLEU+1 scores. 4. Learning: Assemble the accepted pairs for all sentences into a single set and use it to train a ranker to prefer the higher-scoring sentence in each pair. We believe that monsters are nurtured by PRO’s selection and acceptance policies. PRO’s selection step filters pairs involving hypotheses that differ by less than five BLEU+1 points, but it does not cut-off ones that differ too much based on BLEU+1 or length. PRO’s acceptance step selects Ξ = 50 pairs with the highest BLEU+1 differentials, which creates breeding ground for monsters since these pairs are very likely to include one monster and one good hypothesis. 
Below we discuss monster slaying geared towards the selection and acceptance steps of PRO. 3.2 Slaying at Selection In the selection step, PRO filters pairs for which the difference in BLEU+1 is less than five points, but it has no cut-off on the maximum BLEU+1 differentials nor cut-offs based on absolute length or difference in length. Here, we propose several selection filters, both deterministic and probabilistic. Cut-offs. A cut-off is a deterministic rule that filters out pairs that do not comply with some criteria. We experiment with a maximal cut-off on (a) the difference in BLEU+1 scores and (b) the difference in lengths. These are relative cut-offs because they refer to the pair, but absolute cut-offs that apply to each of the elements in the pair are also possible (not explored here). Cut-offs (a) and (b) slay monsters by not allowing the negative examples to get much worse in BLEU+1 or in length than the positive example in the pair. Filtering outliers. Outliers are rare or extreme observations in a sample. We assume normal distribution of the BLEU+1 scores (or of the lengths) of the translation hypotheses for the same source sentence, and we define as outliers hypotheses whose BLEU+1 (or length) is more than λ standard deviations away from the sample average. We apply the outlier filter to both the positive and the negative example in a pair, but it is more important for the latter. We experiment with values of λ like 2 and 3. This filtering slays monsters because they are likely outliers. However, it will not work if the population gets riddled with monsters, in which case they would become the norm. Stochastic sampling. Instead of filtering extreme examples, we can randomly sample pairs according to their probability of being typical. Let us assume that the values of the local scoring functions, i.e., the BLEU+1 scores, are distributed nor- mally: g(i, j) ∼ N(µ, σ2). Given a sample of hypothesis (tira,nj)sl ∼atio Nn(sµ {j} of the same source sentpeontchee i, we can ensstim {ja}te o σ empirically. Then, the difference ∆ = g(i, j) − g(i, j0) would be tdhisetr diibfufteerde normally w gi(thi, mean zero and variance 2σ2. Now, given a pair of examples, we can calculate their ∆, and we can choose to select the pair with some probability, according to N(0, 2σ2). 3.3 Slaying at Acceptance Another problem is caused by the acceptance mechanism of PRO: among all selected pairs, it accepts the top-Ξ with the highest BLEU+1 differentials. It is easy to see that these differentials are highest for nonmonster–monster pairs if such pairs exist. One way to avoid focusing primarily on such pairs is to accept a random set of pairs, among the ones that survived the selection step. One possible caveat is that we can lose some of the discriminative power of PRO by focusing on examples that are not different enough. Ξ 14 TESTING TUNING (run 1, it. 25, avg.) TEST(tune:full) PRO fix Avg. for 3 reruns BLEU StdDev Pos Lengths Neg Ref BLEU+1 Avg. for 3 reruns Pos Neg BLEU StdDev PRO (baseline)44.700.26647.9229.052.552.22.847.800.052 Max diff. 
cut-offBLEU+1 max=10†47.940.16547.949.649.449.439.947.770.035 BLEU+1 max=20 † 47.73 0.136 47.7 55.5 51.1 49.8 32.7 47.85 0.049 LEN max=5 † 48.09 0.021 46.8 47.0 47.9 52.9 37.8 47.73 0.051 LEN max=10 † 47.99 0.025 47.3 48.5 48.7 52.5 35.6 47.80 0.056 OutliersBLEU+1 λ=2.0†48.050.11946.847.247.752.239.547.470.090 BLEU+1 λ=3.0 LEN λ=2.0 LEN λ=3.0 47.12 46.68 47.02 1.348 2.005 0.727 47.6 49.3 48.2 168.0 82.7 163.0 53.0 53.1 51.4 51.7 52.3 51.4 3.9 5.3 4.2 47.53 47.49 47.65 0.038 0.085 0.096 Stoch. sampl.∆ BLEU+146.331.00046.8216.053.353.12.447.740.035 ∆ LEN 46.36 1.281 47.4 201.0 52.9 53.4 2.9 47.78 0.081 Table 3: Some fixes to PRO (select pairs with highest BLEU+1 differential, also require at least 5 BLEU+1 points difference). A dagger (†) indicates selection fixes that successfully get rid of monsters. 4 Attacking Monsters: Practice Below, we first present our general experimental setup. Then, we present the results for the various selection alternatives, both with the original acceptance strategy and with random acceptance. 4.1 Experimental Setup We used a phrase-based SMT model (Koehn et al., 2003) as implemented in the Moses toolkit (Koehn et al., 2007). We trained on all Arabic-English data for NIST 2012 except for UN, we tuned on (the longest-50% of) the MT06 sentences, and we tested on MT09. We used the MADA ATB segmentation for Arabic (Roth et al., 2008) and truecasing for English, phrases of maximal length 7, Kneser-Ney smoothing, and lexicalized reorder- ing (Koehn et al., 2005), and a 5-gram language model, trained on GigaWord v.5 using KenLM (Heafield, 2011). We dropped unknown words both at tuning and testing, and we used minimum Bayes risk decoding at testing (Kumar and Byrne, 2004). We evaluated the output with NIST’s scoring tool v.13a, cased. We used the Moses implementations of MERT, PRO and batch MIRA, with the –return-best-dev parameter for the latter. We ran these optimizers for up to 25 iterations and we used 1000-best lists. For stability (Foster and Kuhn, 2009), we performed three reruns of each experiment (tuning + evaluation), and we report averaged scores. 4.2 Selection Alternatives Table 3 presents the results for different selection alternatives. The first two columns show the testing results: average BLEU and standard deviation over three reruns. The following five columns show statistics about the last iteration (it. 25) of PRO’s tuning for the worst rerun: average lengths of the positive and the negative examples and average effective reference length, followed by average BLEU+1 scores for the positive and the negative examples in the pairs. The last two columns present the results when tuning on the full tuning set. These are included to verify the behavior of PRO in a nonmonster prone environment. We can see in Table 3 that all selection mechanisms considerably improve BLEU compared to the baseline PRO, by 2-3 BLEU points. However, not every selection alternative gets rid of monsters, which can be seen by the large lengths and low BLEU+1 for the negative examples (in bold). The max cut-offs for BLEU+1 and for lengths both slay the monsters, but the latter yields much lower standard deviation (thirteen times lower than for the baseline PRO!), thus considerably increasing PRO’s stability. On the full dataset, BLEU scores are about the same as for the original PRO (with small improvement for BLEU+1 max=20), but the standard deviations are slightly better. 
Rejecting outliers using BLEU+1 and λ = 3 is not strong enough to filter out monsters, but making this criterion more strict by setting λ = 2, yields competitive BLEU and kills the monsters. Rejecting outliers based on length does not work as effectively though. We can think of two possible reasons: (i) lengths are not normally distributed, they are more Poisson-like, and (ii) the acceptance criterion is based on the top-Ξ differentials based on BLEU+1, not based on length. On the full dataset, rejecting outliers, BLEU+1 and length, yields lower BLEU and less stability. 15 TESTING TUNING (run 1, it. 25, avg.) TEST(tune:full) Avg. for 3 reruns Lengths BLEU+1 Avg. for 3 reruns PRO fix BLEU StdDev Pos Neg Ref Pos Neg BLEU StdDev PRO (baseline)44.700.26647.9229.052.552.22.847.800.052 Rand. acceptPRO, rand††47.870.14747.748.548.7047.742.947.590.114 OutliersBLEU+1 λ=2.0, rand∗47.850.07848.248.448.947.543.647.620.091 BLEU+1 λ=3.0, rand 47.97 0.168 47.6 47.6 48.4 47.8 43.6 47.44 0.070 LEN λ=2.0, rand∗ 47.69 0.114 47.8 47.8 48.6 47.9 43.6 47.48 0.046 LEN λ=3.0, rand 47.89 0.235 47.8 48.0 48.7 47.7 43. 1 47.64 0.090 Stoch. sampl.∆ BLEU+1, rand∗47.990.08747.948.048.747.843.547.670.096 ∆ LEN, rand∗ 47.94 0.060 47.8 47.9 48.6 47.8 43.6 47.65 0.097 Table 4: More fixes to PRO (with random acceptance, no minimum BLEU+1). The (††) indicates that random acceptance kills monsters. The asterisk (∗) indicates improved stability over random acceptance. Reasons (i) and (ii) arguably also apply to stochastic sampling of differentials (for BLEU+1 or for length), which fails to kill the monsters, maybe because it gives them some probability of being selected by design. To alleviate this, we test the above settings with random acceptance. 4.3 Random Acceptance Table 4 shows the results for accepting training pairs for PRO uniformly at random. To eliminate possible biases, we also removed the min=0.05 BLEU+1 selection criterion. Surprisingly, this setup effectively eliminated the monster problem. Further coupling this with the distributional criteria can also yield increased stability, and even small further increase in test BLEU. For instance, rejecting BLEU outliers with λ = 2 yields comparable average test BLEU, but with only half the standard deviation. On the other hand, using the stochastic sampling of differentials based on either BLEU+1 or lengths improves the test BLEU score while increasing the stability across runs. The random acceptance has a caveat though: it generally decreases the discriminative power of PRO, yielding worse results when tuning on the full, nonmonster prone tuning dataset. Stochastic selection does help to alleviate this problem. Yet, the results are not as good as when using a max cut-off for the length. Therefore, we recommend using the latter as a default setting. 5 Related Work We are not aware of previous work that discusses the issue of monsters, but there has been work on a different, length problem with PRO (Nakov et al., 2012). We have seen that its solution, fix the smoothing in BLEU+1, did not work for us. The stability of MERT has been improved using regularization (Cer et al., 2008), random restarts (Moore and Quirk, 2008), multiple replications (Clark et al., 2011), and parameter aggregation (Cettolo et al., 2011). 
With the emergence of new optimization techniques, there have been studies that compare stability between MIRA–MERT (Chiang et al., 2008; Chiang et al., 2009; Cherry and Foster, 2012), PRO–MERT (Hopkins and May, 2011), MIRA– PRO–MERT (Cherry and Foster, 2012; Gimpel and Smith, 2012; Nakov et al., 2012). Pathological verbosity can be an issue when tuning MERT on recall-oriented metrics such as METEOR (Lavie and Denkowski, 2009; Denkowski and Lavie, 2011). Large variance between the results obtained with MIRA has also been reported (Simianer et al., 2012). However, none of this work has focused on monsters. 6 Tale’s Moral and Future Battles We have studied a problem with PRO, namely that it can fall victim to monsters, overly long negative examples with very low BLEU+1 scores, which are unsuitable for learning. We have proposed several effective ways to address this problem, based on length- and BLEU+1-based cut-offs, outlier filters and stochastic sampling. The best of these fixes have not only slayed the monsters, but have also brought much higher stability to PRO as well as improved test-time BLEU scores. These benefits are less visible on the full dataset, but we still recommend them to everybody who uses PRO as protection against monsters. Monsters are inherent in PRO; they just do not always take over. In future work, we plan a deeper look at the mechanism of monster creation in PRO and its possible connection to PRO’s length bias. 16 References Daniel Cer, Daniel Jurafsky, and Christopher Manning. 2008. Regularization and search for minimum error rate training. In Proc. of Workshop on Statistical Machine Translation, WMT ’08, pages 26–34. Mauro Cettolo, Nicola Bertoldi, and Marcello Federico. 2011. Methods for smoothing the optimizer instability in SMT. MT Summit XIII: the Machine Translation Summit, pages 32–39. Colin Cherry and George Foster. 2012. Batch tuning strategies for statistical machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, NAACL-HLT ’ 12, pages 427–436. David Chiang, Yuval Marton, and Philip Resnik. 2008. Online large-margin training of syntactic and structural translation features. In Proceedings ofthe Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 224–233. David Chiang, Kevin Knight, and Wei Wang. 2009. 11,001 new features for statistical machine transla- tion. In Proc. of the Conference of the North American Chapter of the Association for Computational Linguistics, NAACL-HLT ’09, pages 218–226. Jonathan Clark, Chris Dyer, Alon Lavie, and Noah Smith. 2011. Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Proceedings of the Meeting of the Association for Computational Linguistics, ACL ’ 11, pages 176–181 . Michael Denkowski and Alon Lavie. 2011. Meteortuned phrase-based SMT: CMU French-English and Haitian-English systems for WMT 2011. Technical report, CMU-LTI-1 1-01 1, Language Technologies Institute, Carnegie Mellon University. George Foster and Roland Kuhn. 2009. Stabilizing minimum error rate training. In Proceedings of the Workshop on Statistical Machine Translation, StatMT ’09, pages 242–249. Kevin Gimpel and Noah Smith. 2012. Structured ramp loss minimization for machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, NAACL-HLT ’ 12, pages 221–231. Kenneth Heafield. 2011. 
KenLM: Faster and smaller language model queries. In Workshop on Statistical Machine Translation, WMT ’ 11, pages 187–197. Mark Hopkins and Jonathan May. 2011. Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP ’ 11, pages 1352–1362. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, HLTNAACL ’03, pages 48–54. Philipp Koehn, Amittai Axelrod, Alexandra Birch Mayne, Chris Callison-Burch, Miles Osborne, and David Talbot. 2005. Edinburgh system description for the 2005 IWSLT speech translation evaluation. In Proceedings of the International Workshop on Spoken Language Translation, IWSLT ’05. Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proc. of the Meeting of the Association for Computational Linguistics, ACL ’07, pages 177–180. Shankar Kumar and William Byrne. 2004. Minimum Bayes-risk decoding for statistical machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, HLT-NAACL ’04, pages 169–176. Alon Lavie and Michael Denkowski. 2009. The METEOR metric for automatic evaluation of machine translation. Machine Translation, 23: 105–1 15. Robert Moore and Chris Quirk. 2008. Random restarts in minimum error rate training for statistical machine translation. In Proceedings of the International Conference on Computational Linguistics, COLING ’08, pages 585–592. Preslav Nakov, Francisco Guzm a´n, and Stephan Vogel. 2012. Optimizing for sentence-level BLEU+1 yields short translations. In Proceedings ofthe International Conference on Computational Linguistics, COLING ’ 12, pages 1979–1994. Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the Meeting of the Association for Computational Linguistics, ACL ’03, pages 160–167. Ryan Roth, Owen Rambow, Nizar Habash, Mona Diab, and Cynthia Rudin. 2008. Arabic morphological tagging, diacritization, and lemmatization using lexeme models and feature ranking. In Proceedings of the Meeting of the Association for Computational Linguistics, ACL ’08, pages 117–120. Patrick Simianer, Stefan Riezler, and Chris Dyer. 2012. Joint feature selection in distributed stochastic learning for large-scale discriminative training in smt. In Proceedings of the Meeting of the Association for Computational Linguistics, ACL ’ 12, pages 11–21. Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL ’07, pages 764–773. 17

3 0.45313588 318 acl-2013-Sentiment Relevance

Author: Christian Scheible ; Hinrich Schutze

Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.
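As a rough illustration of the distant-supervision idea, the sketch below weakly labels review sentences using structured domain information. The rule itself, the entity list, and the cue list are hypothetical assumptions for illustration, not the authors' actual procedure; the intuition is merely that sentences mentioning database-derived entities (e.g., character names) tend to describe plot rather than convey sentiment.

```python
# Hypothetical structured domain information, e.g. from a movie database.
DOMAIN_ENTITIES = {"neo", "morpheus", "trinity", "keanu reeves"}
# A tiny illustrative list of opinion cues.
OPINION_CUES = {"great", "awful", "boring", "brilliant"}

ENTITY_WORDS = {w for name in DOMAIN_ENTITIES for w in name.split()}

def weak_label(sentence):
    """Return a noisy label: 'nonrelevant', 'relevant', or None (abstain)."""
    tokens = set(sentence.lower().replace(".", "").split())
    if tokens & ENTITY_WORDS:
        return "nonrelevant"   # plot/description-like sentence
    if tokens & OPINION_CUES:
        return "relevant"      # opinion-bearing sentence
    return None                # abstain; leave to the trained classifier

for s in ["Morpheus offers Neo a choice of two pills.",
          "The pacing is awful and the plot is boring."]:
    print(weak_label(s), "-", s)
# nonrelevant - Morpheus offers Neo a choice of two pills.
# relevant - The pacing is awful and the plot is boring.
```

Such noisy seed labels could then bootstrap a supervised classifier, which is the semi-supervised step the abstract describes.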

4 0.45030791 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

Author: Matt Post ; Shane Bergsma

Abstract: Syntactic features are useful for many text classification tasks. Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy, always run in orders of magnitude less time, and yield smaller models. Since explicit features are easy to generate and use (with publicly available tools), we suggest they should always be included as baseline comparisons in tree kernel method evaluations.
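As a concrete illustration of one kind of explicit tree feature, the sketch below extracts the context-free production rules of a constituency parse as binary features. This is a hedged example: the use of NLTK, the function name, and the choice of productions as the feature set are assumptions for illustration, not necessarily the exact feature sets compared in the paper.

```python
from nltk import Tree

def tree_features(parse_str):
    """Extract explicit syntactic features: the set of CFG production
    rules (e.g. "S -> NP VP") occurring in a constituency parse."""
    tree = Tree.fromstring(parse_str)
    return {str(prod) for prod in tree.productions()}

feats = tree_features("(S (NP (PRP I)) (VP (VBP like) (NP (NN tea))))")
print(sorted(feats))
# e.g. ["NN -> 'tea'", "NP -> NN", "NP -> PRP", "PRP -> 'I'",
#       "S -> NP VP", "VBP -> 'like'", "VP -> VBP NP"]
```

Unlike a tree kernel, which implicitly compares all subtrees of two parses, a feature set like this can be fed to any linear classifier, which is where the reported speed and model-size advantages come from.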

5 0.44686595 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

Author: Angeliki Lazaridou ; Ivan Titov ; Caroline Sporleder

Abstract: We propose a joint model for unsupervised induction of sentiment, aspect and discourse information and show that by incorporating a notion of latent discourse relations in the model, we improve the prediction accuracy for aspect and sentiment polarity on the sub-sentential level. We deviate from the traditional view of discourse, as we induce types of discourse relations and associated discourse cues relevant to the considered opinion analysis task; consequently, the induced discourse relations play the role of opinion and aspect shifters. Our quantitative analysis indicates that integrating a discourse model increases prediction accuracy relative to the discourse-agnostic approach, and our qualitative analysis suggests that the induced representations encode a meaningful discourse structure.

6 0.44001058 295 acl-2013-Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages

7 0.43976814 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation

8 0.43895742 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study

9 0.43863425 257 acl-2013-Natural Language Models for Predicting Programming Comments

10 0.43863356 236 acl-2013-Mapping Source to Target Strings without Alignment by Analogical Learning: A Case Study with Transliteration

11 0.43716696 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages

12 0.43619427 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media

13 0.43546033 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

14 0.43291602 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

15 0.43153208 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing

16 0.43030098 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data

17 0.42980546 310 acl-2013-Semantic Frames to Predict Stock Price Movement

18 0.42962751 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization

19 0.42940205 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

20 0.42908284 80 acl-2013-Chinese Parsing Exploiting Characters