acl acl2011 acl2011-288 knowledge-graph by maker-knowledge-mining

288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications


Source: pdf

Author: Cecilia Ovesdotter Alm

Abstract: This opinion paper discusses subjective natural language problems in terms of their motivations, applications, characterizations, and implications. It argues that such problems deserve increased attention because of their potential to challenge the status of theoretical understanding, problem-solving methods, and evaluation techniques in computational linguistics. The author supports a more holistic approach to such problems; a view that extends beyond opinion mining or sentiment analysis.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This opinion paper discusses subjective natural language problems in terms of their motivations, applications, characterizations, and implications. [sent-2, score-0.588]

2 It argues that such problems deserve increased attention because of their potential to challenge the status of theoretical understanding, problem-solving methods, and evaluation techniques in computational linguistics. [sent-3, score-0.266]

3 The author supports a more holistic approach to such problems; a view that extends beyond opinion mining or sentiment analysis. [sent-4, score-0.162]

4 1 Introduction Interest in subjective meaning and individual, interpersonal or social, poetic/creative, and affective dimensions of language is not new to linguistics or computational approaches to language. [sent-5, score-0.799]

5 Language analysts, including computational linguists, have long acknowledged the importance of such topics (Bühler, 1934; Lyons, 1977; Jakobson, 1996; Halliday, 1996; Wiebe et al, 2004; Wilson et al, 2005). [sent-6, score-0.093]

6 The terms subjectivity or subjectivity analysis are also established in the NLP literature to cover these topics of growing inquiry. [sent-8, score-0.207]

7 The purpose of this opinion paper is not to provide a survey of subjective natural language problems. [sent-9, score-0.444]

8 Rather, it intends to launch discussions about how subjective natural language problems have a vital role to play in computational linguistics and in shaping fundamental questions in the field for the future. [sent-10, score-0.574]

9 An additional point of departure is that a continuing focus on primarily the fundamental distinction of facts vs. [sent-11, score-0.102]

10 An expanded scope of problem types will benefit our understanding of subjective language and approaches to tackling this family of problems. [sent-14, score-0.517]

11 It is definitely reasonable to assume that problems involving subjective perception, meaning, and language behaviors will diversify and earn increased attention from computational approaches to language. [sent-15, score-0.581]

12 Banea et al already noted: “We have seen a surge in interest towards the application of automatic tools and techniques for the extraction of opinions, emotions, and sentiments in text (subjectivity)” (p. [sent-16, score-0.128]

13 Therefore, it is timely and useful to examine subjective natural language problems from different angles. [sent-18, score-0.524]

14 The first angle that the paper comments upon is what motivates investigatory efforts into such problems. [sent-20, score-0.123]

15 Next, the paper clarifies what subjective natural language processing problems are by providing a few illustrative examples of some relevant problem-solving and application areas. [sent-21, score-0.524]

16 This is followed by discussing yet another angle of this family of problems, namely what some of their characteristics are. [sent-22, score-0.09]

17 Finally, potential implications for the field of computational linguistics at large are addressed, with the hope that this short piece will spawn continued discussion. [sent-23, score-0.075]

18 Subjective natural language processing problems represent exciting frontier areas that directly relate to advances in artificial natural language behavior, improved intelligent access to information, and more agreeable and comfortable language-based human-computer interaction. [sent-27, score-0.144]

19 As just one example, interactional systems continue to suffer from a bias toward ‘neutral’, unexpressive (and thus communicatively cumbersome) language. [sent-28, score-0.096]

20 From a conceptual and methodological perspective, automatic subjective text analysis approaches have potential to challenge the state of theoretical understanding, problem-solving methods, and evaluation techniques. [sent-30, score-0.38]

21 3 Applications Subjective natural language problems extend well beyond sentiment and opinion analysis. [sent-32, score-0.266]

22 They involve a myriad of topics–from linguistic creativity via inference-based forecasting to generation of social and affective language use. [sent-33, score-0.41]

23 1 Case 1: Modeling affect in language A range of affective computing applications apply to language (Picard, 1997). [sent-36, score-0.391]

24 One such area is automatically inferring affect in text. [sent-37, score-0.113]

25 Work on automatic affect inference from language data has generally involved recognition or generation models that contrast a range of affective states either along affect categories (e. [sent-38, score-0.504]

26 As one example, Alm developed an affect dataset and explored automatic prediction of affect in text at the sentence level that accounted for different levels of affective granularity (Alm, 2008; Alm, 2009; Alm, 2010). [sent-44, score-0.504]

27 There are other examples of the strong interest in affective NLP or affective interfacing (Liu et al, 2003; Holzman and Pottenger, 2003; Francisco and Gervás, 2006; Kalra and Karahalios, 2005; Génereux and Evans, 2006; Mihalcea and Liu, 2006). [sent-45, score-0.556]

28 Affective semantics is difficult for many automatic techniques to capture because rather than simple text-derived ‘surface’ features, it requires sophisticated, ‘deep’ natural language understanding that draws on subjective human knowledge, interpretation, and experience. [sent-46, score-0.423]

29 2 Case 2: Image sense discrimination Image sense discrimination refers to the problem of determining which images belong together (or not) (Loeff et al, 2006; Forsyth et al, 2009). [sent-49, score-0.168]

30 What counts as the sense of an image adds subjective complexity. [sent-50, score-0.477]

31 For instance, images capture “both word and iconographic sense distinctions . [sent-51, score-0.235]

32 a MACHINE or a BIRD; iconographic distinctions could additionally include birds standing, vs. [sent-56, score-0.109]

33 sense distinctions encoded by further descriptive modification in text. [sent-59, score-0.092]

34 In other words, images can evoke a range of subtle, subjective meaning phenomena. [sent-62, score-0.519]

35 Challenges for annotating images according to lexical meaning (and the use of verification as one way to assess annotation quality) have been discussed in depth, cf. [sent-63, score-0.139]

36 3 Case 3: Multilingual communication The world is multilingual and so are many human language technology users. [sent-66, score-0.047]

37 Arguably, future generations of users will increasingly demand tools capable of effective multilingual tasking, communication and inference-making (besides expecting adjustments to non-native and cross-linguistic behaviors). [sent-68, score-0.092]

38 The challenges of code-mixing include dynamically adapting sociolinguistic forms and functions, and they involve both flexible, subjective sense-making and perspective-taking. [sent-69, score-0.448]

39 State-of-the-art intelligent computer-assisted language learning (iCALL) approaches generally bundle language learners into a homogeneous group. [sent-72, score-0.04]

40 However, learners are individuals exhibiting a vast range of differences. [sent-73, score-0.04]

41 Here, the subjective aspects lie at a level other than meaning. [sent-74, score-0.38]

42 Language learners apply personalized strategies to acquisition, and they have a myriad of individual communicative needs, motivations, backgrounds, and learning goals. [sent-75, score-0.129]

43 A framework that recognizes subjectivity in iCALL might exploit such differences to create tailored acquisition flows that address learning curves and proficiency enhancement in an individualized manner. [sent-76, score-0.14]

44 4 Characterizations It must be acknowledged that a problem such as inferring affective meaning from text is a substantially different kind of ‘beast’ compared to predicting, for example, part-of-speech tags. [sent-78, score-0.381]

45 1 Identifying such problems and tackling their solutions is also becoming increasingly desirable with the boom of personalized, user-generated contents. [sent-79, score-0.144]

46 It is a useful intellectual exercise to consider what the general characteristics of this family of problems are. [sent-80, score-0.306]

47 This initial discussion is likely not complete; that is also not the scope of this piece. [sent-81, score-0.044]

48 The following list is rather intended as a set of departure points to spark discussion. [sent-82, score-0.052]

49 • Non-traditional intersubjectivity: Subjective natural language processing problems are generally problems of meaning or communication where so-called intersubjective agreement does not apply in the same way as in traditional tasks. [sent-83, score-0.402]

50 • Theory gaps: A particular challenge is that subjective language phenomena are often less understood by current theory. [sent-84, score-0.38]

51 As an example, in the affective sciences there is a vibrant debate (indeed, a controversy) on how to model or even define a concept such as emotion. [sent-85, score-0.33]

52 dataset annotators) may be sensitive to sensory demands, cognitive fatigue, and external factors that affect judgements made at a particular place and point in time. [sent-91, score-0.113]

53 Arguably, this behavioral variation is part of the given subjective language problem. [sent-92, score-0.38]

54 For such problems, acceptability may be a more useful concept than ‘right’ and ‘wrong’. [sent-94, score-0.072]

55 (Recognizing this does not exclude that acceptability may in clear, prototypical cases converge on just one solution, but this scenario may not apply to a majority of instances.) [sent-96, score-0.072]

56 If we recognize that (ground) truth is, under some circumstances, a less useful concept–a problem reduction and simplification that is undesirable because it does not reflect the behavior of language users–how should evaluation then be approached with rigor? [sent-98, score-0.067]

57 • Social/interpersonal focus: Many problems in this family concern inference (or generation) of complex, subtle dimensions of meaning and information, informed by experience or socioculturally influenced language use in real-situation contexts (including human-computer interaction). [sent-99, score-0.341]

58 They tend to tie into sociolinguistic and interactional insights on language (Mesthrie et al, 2009). [sent-100, score-0.229]

59 • Multimodality and interdisciplinarity: Many of these problems have an interactive and humanistic basis. [sent-101, score-0.203]

60 For example, written web texts are accompanied by visual matter (‘texts’), such as images, videos, and text aesthetics (font choices, etc. [sent-103, score-0.059]

61 5 Implications The cases discussed above in section 3 are just selections from the broad range of topics involving aspects of subjectivity, but at least they provide glimpses at what can be done in this area. [sent-107, score-0.045]

62 The list could be expanded to problems intersecting with the digital humanities, healthcare, economics or finance, and political science, but such discussions go beyond the scope of this paper. [sent-108, score-0.188]

63 Instead the last item on this agenda concerns the broader, disciplinary implications that subjective natural language problems raise. [sent-109, score-0.658]

64 • Evaluation: If the concept of “ground truth” needs to be reassessed for subjective natural language processing tasks, different and alternative evaluation techniques deserve careful thought. [sent-110, score-0.459]

65 For example, evaluating user interaction and satisfaction, as Liu et al (2003) did for an affective email client, may be relevant. [sent-113, score-0.489]

66 Combining standard metrics of system performance with alternative assessment methods may provide especially valuable holistic evaluation information. [sent-125, score-0.04]

67 • Dataset annotation: Studies of human annotation generally report on inter-annotator agreement, and many annotation schemes and efforts seek to reduce variability. [sent-126, score-0.064]

68 That may not be appropriate (Zaenen, 2006), considering these kinds of problems (Alm, 2010). [sent-127, score-0.144]

69 Rather, it makes sense to take advantage of corpus annotation as a resource, beyond computational work, for investigation into actual language behaviors associated with the set of problems dealt with in this paper (e. [sent-128, score-0.243]

70 For example, label-internal divergence and intra-annotator variation may provide useful understanding of the language phenomenon at stake; surveys, video recordings, think-alouds, or interviews may give additional insights on human (annotator) behavior. [sent-133, score-0.108]

71 The genetic computation community has theorized concepts such as user fatigue and devised robust algorithms that integrate interactional, human input in effective ways (Llorà et al, 2005; Llorà et al, 2005). [sent-134, score-0.12]

72 Reporting sociolinguistic information in datasets can be useful for many problems, assuming that it is feasible and ethical for a given context. [sent-136, score-0.157]

73 • Analysis of ethical risks and gains: Overall, how language and technology coalesce in society is rarely covered; but see Sproat (2010) for an important exception. [sent-137, score-0.089]

74 More specifically, whereas ethics has been discussed within the field of affective computing (Picard, 1997), how ethics applies to language technologies remains an unexplored area. [sent-138, score-0.396]

75 Potential problematic implications of language technologies (or how disciplinary contributions affect the linguistic world) have rarely been a point of discussion. [sent-140, score-0.247]

76 For example, there are convincing arguments for gains that will result from an increased engagement with topics related to endangered languages and language documentation in computational linguistics (Bird, 2009); see also Abney and Bird (2010). [sent-142, score-0.045]

77 By implication, such efforts may contribute to linguistic and cultural sustainability. [sent-143, score-0.064]

78 • Interdisciplinary mixing: Given that many subjective natural language problems have a humanistic and interpersonal basis, investigatory ‘mixing’ efforts that reach outside the computational linguistics community in multidisciplinary networks seem particularly pivotal. [sent-144, score-0.543]

79 As an example, to improve assessment of subjective natural language processing tasks, lessons can be learned from the human-computer interaction and social computing communities, as well as from the digital humanities. [sent-145, score-0.463]

80 In addition, attention to multimodality will benefit from increased interaction, as it demands the involvement of vision or tactile specialists, etc. [sent-146, score-0.13]

81 wrong answers, or even tractable solutions, present opportunities for intellectual growth. [sent-148, score-0.112]

82 These problems can constitute an opportunity for training new generations to face challenges. [sent-149, score-0.189]

83 Conclusion To conclude: there is a strong potential (or, as this paper argues, a necessity) to expand the scope of computational linguistic research into subjectivity. [sent-150, score-0.044]

84 It is important to recognize that there is a broad family of relevant subjective natural language problems with theoretical and practical, real-world anchoring. [sent-151, score-0.144]

85 The paper has also pointed out that there are certain aspects that deserve special attention. [sent-152, score-0.079]

86 For instance, there are evaluation concepts in computational linguistics that, at least to some degree, detract attention away from how subjective perception and production phenomena actually manifest themselves in natural language. [sent-153, score-0.062]

87 (Footnote 2) When thinking along multimodal lines, we might stand a chance at getting better at creating core models that apply successfully also to signed languages. [sent-154, score-0.38]

88 In encouraging a focus on efforts to achieve ‘high-performing’ systems (as measured along traditional lines), there is a risk involved: the sacrifice of opportunities for fundamental insights that may lead to a more thorough understanding of language uses and users. [sent-155, score-0.262]

89 Such insights may in fact decisively advance language science and artificial natural language intelligence. [sent-156, score-0.065]

90 Exploring the compositionality of emotions in text: Word emotions, sentence emotions and automated tagging. [sent-204, score-0.16]

91 Classification of emotions in Internet chat: An application of machine learning using speech phonemes. [sent-220, score-0.08]

92 A model of textual affect sensing using real-world knowledge. International Conference on Intelligent User Interfaces, 125-132. [sent-242, score-0.113]

93 Combating user fatigue in iGAs: Partial ordering, Support Vector Machines, and synthetic fitness. Proceedings of the Genetic and Evolutionary Computation Conference. [sent-246, score-0.12]

94 A literature survey of methods for analysis of subjective language. [sent-276, score-0.38]
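The page does not document how its tfidf summarizer scores sentences. As an illustration only, the Python sketch below assumes one plausible scheme: each sentence is treated as its own document, terms are weighted by length-normalized term frequency times a smoothed inverse document frequency, and a sentence's score is the sum of those weights (equivalently, the average idf of its tokens). The function names `tfidf_sentence_scores` and `top_sentences` are hypothetical, and the scores this produces will not reproduce the sentScore values listed above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_sentence_scores(sentences: list[str]) -> list[float]:
    """Score each sentence by the sum of its length-normalized tf-idf weights.

    Assumption: every sentence counts as one "document" for idf purposes.
    With tf = count / sentence_length, the sum equals the average idf of
    the sentence's tokens, so sentences built from rarer terms score higher.
    """
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: the number of sentences each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)  # empty sentence: nothing to score
            continue
        tf = Counter(doc)
        # Smoothed idf (+1 inside and outside the log) keeps every weight positive.
        scores.append(sum(
            (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        ))
    return scores


def top_sentences(sentences: list[str], k: int) -> list[tuple[int, str, float]]:
    """Return the k best sentences as (index, text, score), highest score first."""
    scores = tfidf_sentence_scores(sentences)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)[:k]
    return [(i, sentences[i], score) for i, score in ranked]
```

Because Python's sort is stable (also with `reverse=True`), sentences with equal scores keep their original document order in the ranking.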


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('subjective', 0.38), ('alm', 0.288), ('affective', 0.278), ('cecilia', 0.209), ('llor', 0.149), ('loeff', 0.149), ('problems', 0.144), ('ovesdotter', 0.131), ('al', 0.128), ('jakobson', 0.119), ('affect', 0.113), ('interactional', 0.096), ('ethical', 0.089), ('picard', 0.089), ('images', 0.084), ('subjectivity', 0.081), ('emotions', 0.08), ('fatigue', 0.079), ('deserve', 0.079), ('icall', 0.079), ('implications', 0.075), ('motivations', 0.074), ('acceptability', 0.072), ('intellectual', 0.072), ('bird', 0.068), ('sociolinguistic', 0.068), ('characterizations', 0.068), ('banea', 0.068), ('truth', 0.067), ('insights', 0.065), ('efforts', 0.064), ('opinion', 0.064), ('multimodal', 0.062), ('nicolas', 0.062), ('aesthetics', 0.059), ('ahnuymanistic', 0.059), ('disciplinary', 0.059), ('ereux', 0.059), ('ethics', 0.059), ('forsyth', 0.059), ('holzman', 0.059), ('iconographic', 0.059), ('igas', 0.059), ('individualized', 0.059), ('intersubjectivity', 0.059), ('investigatory', 0.059), ('kumara', 0.059), ('mesthrie', 0.059), ('sastry', 0.059), ('stylistics', 0.059), ('uhler', 0.059), ('weber', 0.059), ('ground', 0.058), ('sentiment', 0.058), ('behaviors', 0.057), ('meaning', 0.055), ('image', 0.055), ('gerv', 0.052), ('vibrant', 0.052), ('departure', 0.052), ('subtle', 0.051), ('distinctions', 0.05), ('family', 0.05), ('fundamental', 0.05), ('wiebe', 0.05), ('xavier', 0.049), ('lyons', 0.048), ('acknowledged', 0.048), ('multimodality', 0.048), ('jacques', 0.048), ('myriad', 0.048), ('ackstr', 0.048), ('zaenen', 0.048), ('multilingual', 0.047), ('arguably', 0.047), ('hugo', 0.045), ('generations', 0.045), ('interpersonal', 0.045), ('topics', 0.045), ('scope', 0.044), ('janyce', 0.043), ('argues', 0.043), ('creativity', 0.043), ('halliday', 0.043), ('arnold', 0.043), ('understanding', 0.043), ('interaction', 0.042), ('sense', 0.042), ('user', 0.041), ('social', 0.041), ('mihalcea', 0.041), ('personalized', 0.041), ('dimensions', 0.041), ('liu', 0.04), ('characteristics', 0.04), ('learners', 0.04), ('wilson', 0.04), ('demands', 0.04), ('holistic', 0.04), ('opportunities', 0.04)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

2 0.11040072 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

Author: Cheng-Te Li ; Chien-Yuan Wang ; Chien-Lin Tseng ; Shou-De Lin

Abstract: Micro-blogging services provide platforms for users to share their feelings and ideas on the move. In this paper, we present a search-based demonstration system, called MemeTube, to summarize the sentiments of microblog messages in an audiovisual manner. MemeTube provides three main functions: (1) recognizing the sentiments of messages, (2) generating music melody automatically based on detected sentiments, and (3) producing an animation of real-time piano playing for audiovisual display. Our MemeTube system can be accessed via: http://mslab.csie.ntu.edu.tw/memetube/ .

3 0.10476293 159 acl-2011-Identifying Noun Product Features that Imply Opinions

Author: Lei Zhang ; Bing Liu

Abstract: Identifying domain-dependent opinion words is a key problem in opinion mining and has been studied by several researchers. However, existing work has been focused on adjectives and to some extent verbs. Limited work has been done on nouns and noun phrases. In our work, we used the feature-based opinion mining model, and we found that in some domains nouns and noun phrases that indicate product features may also imply opinions. In many such cases, these nouns are not subjective but objective. Their involved sentences are also objective sentences and imply positive or negative opinions. Identifying such nouns and noun phrases and their polarities is very challenging but critical for effective opinion mining in these domains. To the best of our knowledge, this problem has not been studied in the literature. This paper proposes a method to deal with the problem. Experimental results based on real-life datasets show promising results.

4 0.090247244 204 acl-2011-Learning Word Vectors for Sentiment Analysis

Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts

Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term–document information as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it out-performs several previously introduced methods for sentiment classification. We also introduce a large dataset of movie reviews to serve as a more robust benchmark for work in this area.

5 0.087825641 194 acl-2011-Language Use: What can it tell us?

Author: Marjorie Freedman ; Alex Baron ; Vasin Punyakanok ; Ralph Weischedel

Abstract: For 20 years, information extraction has focused on facts expressed in text. In contrast, this paper is a snapshot of research in progress on inferring properties and relationships among participants in dialogs, even though these properties/relationships need not be expressed as facts. For instance, can a machine detect that someone is attempting to persuade another to action or to change beliefs or is asserting their credibility? We report results on both English and Arabic discussion forums.

6 0.085619792 253 acl-2011-PsychoSentiWordNet

7 0.083424345 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

8 0.082091667 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

9 0.081627756 133 acl-2011-Extracting Social Power Relationships from Natural Language

10 0.079788968 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic

11 0.076980226 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

12 0.073711835 292 acl-2011-Target-dependent Twitter Sentiment Classification

13 0.071655579 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

14 0.066440895 105 acl-2011-Dr Sentiment Knows Everything!

15 0.065665014 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents

16 0.065331087 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

17 0.065082259 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling

18 0.063122466 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue

19 0.062885679 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models

20 0.062443957 252 acl-2011-Prototyping virtual instructors from human-human corpora
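The page does not say how the simValue column in the tfidf list above is computed. A common way to produce such a ranking, sketched here under that assumption, is to build a sparse tf-idf vector for every paper and rank all papers by cosine similarity to the query paper. The helpers `tfidf_vectors`, `cosine`, and `most_similar` are hypothetical names; this is not the site's actual pipeline, and it will not reproduce the listed values.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build a sparse tf-idf vector (term -> weight) for each document.

    Uses raw counts times log(N / df); terms that occur in every
    document get weight 0 and so do not affect similarity.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs: list[str], query_index: int, k: int) -> list[tuple[int, float]]:
    """Rank all documents (including the query itself) by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    sims = [(i, cosine(query, vec)) for i, vec in enumerate(vectors)]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy "papers" for illustration only.
    docs = [
        "subjective language affect emotion",
        "affect emotion in text",
        "image sense discrimination",
        "subjective affect",
    ]
    print(most_similar(docs, 0, 3))
```

Since the query paper is compared with itself, it ranks first with a similarity of approximately 1.0, much like the "same-paper" row at the top of each list.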


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.153), (1, 0.118), (2, 0.052), (3, -0.012), (4, -0.041), (5, 0.056), (6, 0.027), (7, -0.0), (8, -0.008), (9, -0.056), (10, -0.037), (11, -0.072), (12, -0.032), (13, 0.041), (14, -0.032), (15, -0.029), (16, 0.035), (17, 0.027), (18, -0.006), (19, -0.007), (20, 0.054), (21, 0.054), (22, -0.058), (23, 0.055), (24, -0.01), (25, 0.003), (26, 0.088), (27, -0.001), (28, -0.016), (29, -0.032), (30, 0.032), (31, 0.041), (32, -0.07), (33, 0.078), (34, 0.002), (35, 0.025), (36, -0.049), (37, 0.004), (38, 0.07), (39, 0.037), (40, 0.034), (41, -0.0), (42, 0.042), (43, -0.062), (44, -0.048), (45, -0.025), (46, -0.039), (47, -0.017), (48, -0.003), (49, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94533575 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

2 0.78860283 194 acl-2011-Language Use: What can it tell us?

Author: Marjorie Freedman ; Alex Baron ; Vasin Punyakanok ; Ralph Weischedel

Abstract: For 20 years, information extraction has focused on facts expressed in text. In contrast, this paper is a snapshot of research in progress on inferring properties and relationships among participants in dialogs, even though these properties/relationships need not be expressed as facts. For instance, can a machine detect that someone is attempting to persuade another to action or to change beliefs or is asserting their credibility? We report results on both English and Arabic discussion forums. 1

3 0.68962371 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination

Author: Myle Ott ; Yejin Choi ; Claire Cardie ; Jeffrey T. Hancock

Abstract: Consumers increasingly rate, review and research products online (Jansen, 2010; Litvin et al., 2008). Consequently, websites containing consumer reviews are becoming targets of opinion spam. While recent work has focused primarily on manually identifiable instances of opinion spam, in this work we study deceptive opinion spam—fictitious opinions that have been deliberately written to sound authentic. Integrating work from psychology and computational linguistics, we develop and compare three approaches to detecting deceptive opinion spam, and ultimately develop a classifier that is nearly 90% accurate on our gold-standard opinion spam dataset. Based on feature analysis of our learned models, we additionally make several theoretical contributions, including revealing a relationship between deceptive opinions and imaginative writing.

4 0.68751276 133 acl-2011-Extracting Social Power Relationships from Natural Language

Author: Philip Bramsen ; Martha Escobar-Molano ; Ami Patel ; Rafael Alonso

Abstract: Sociolinguists have long argued that social context influences language use in all manner of ways, resulting in lects. This paper explores a text classification problem we will call lect modeling, an example of what has been termed computational sociolinguistics. In particular, we use machine learning techniques to identify social power relationships between members of a social network, based purely on the content of their interpersonal communication. We rely on statistical methods, as opposed to language-specific engineering, to extract features which represent vocabulary and grammar usage indicative of social power lect. We then apply support vector machines to model the social power lects representing superior-subordinate communication in the Enron email corpus. Our results validate the treatment of lect modeling as a text classification problem – albeit a hard one – and constitute a case for future research in computational sociolinguistics.

5 0.6503818 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style

Author: Amjad Abu-Jbara ; Barbara Rosario ; Kent Lyons

Abstract: In this paper, we address the problem of optimizing the style of textual content to make it more suitable to being listened to by a user as opposed to being read. We study the differences between the written style and the audio style by consulting the linguistics and journalism literatures. Guided by this study, we suggest a number of linguistic features to distinguish between the two styles. We show the correctness of our features and the impact of style transformation on the user experience through statistical analysis, a style classification task, and a user study.

6 0.64371008 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations

7 0.64322913 31 acl-2011-Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations

8 0.64200276 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

9 0.60946614 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

10 0.6029954 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

11 0.56593513 159 acl-2011-Identifying Noun Product Features that Imply Opinions

12 0.5582642 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models

13 0.55190581 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity

14 0.5444082 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech

15 0.54099423 55 acl-2011-Automatically Predicting Peer-Review Helpfulness

16 0.53602219 35 acl-2011-An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling

17 0.53272212 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

18 0.52660507 223 acl-2011-Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts

19 0.52296251 252 acl-2011-Prototyping virtual instructors from human-human corpora

20 0.51698607 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.514), (17, 0.025), (26, 0.024), (31, 0.015), (37, 0.052), (39, 0.024), (41, 0.05), (55, 0.014), (59, 0.034), (72, 0.031), (91, 0.031), (96, 0.099), (97, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90378737 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

Author: Cecilia Ovesdotter Alm

Abstract: This opinion paper discusses subjective natural language problems in terms of their motivations, applications, characterizations, and implications. It argues that such problems deserve increased attention because of their potential to challenge the status of theoretical understanding, problem-solving methods, and evaluation techniques in computational linguistics. The author supports a more holistic approach to such problems; a view that extends beyond opinion mining or sentiment analysis.

2 0.85328513 298 acl-2011-The ACL Anthology Searchbench

Author: Ulrich Schäfer ; Bernd Kiefer ; Christian Spurk ; Jörg Steffen ; Rui Wang

Abstract: We describe a novel application for structured search in scientific digital libraries. The ACL Anthology Searchbench is meant to become a publicly available research tool to query the content of the ACL Anthology. The application provides search in both its bibliographic metadata and its semantically analyzed full textual content. By combining these two features, very efficient and focused queries are possible. At the same time, the application serves as a showcase for the recent progress in natural language processing (NLP) research and language technology. The system currently indexes the textual content of 7,500 anthology papers from 2002–2009 with predicate-argument-like semantic structures. It also provides useful search filters based on bibliographic metadata. It will be extended to provide the full anthology content and enhanced functionality based on further NLP techniques.

1 Introduction and Motivation

Scientists in all disciplines nowadays are faced with a flood of new publications every day. In addition, more and more publications from the past are becoming digitally available, which increases the amount even further. Finding relevant information and avoiding duplication of work have become urgent issues to be addressed by the scientific community. The organization and preservation of scientific knowledge in scientific publications, vulgo text documents, thwarts these efforts. From a computer scientist's viewpoint, scientific papers are just ‘unstructured information’. At least in our own scientific community, Computational Linguistics, it is generally assumed that NLP could help to support search in such document collections. The ACL Anthology¹ is a comprehensive electronic collection of scientific papers in our own field (Bird et al., 2008). It is updated regularly with new publications, and older papers have also been scanned and made available electronically.
We have implemented the ACL Anthology Searchbench² for two reasons. Our first aim is to provide a more targeted search facility in this collection than standard web search on the anthology website. In this sense, the Searchbench is meant to become a service to our own community. Our second motivation is to use the developed system as a showcase for the progress that has been made over the last years in precision-oriented deep linguistic parsing in terms of both efficiency and coverage, specifically in the context of the DELPH-IN community³. Our system also uses further NLP techniques such as unsupervised term extraction, named entity recognition and part-of-speech (PoS) tagging. By automatically precomputing normalized semantic representations (predicate-argument structure) of each sentence in the anthology, the search space is structured and allows finding equivalent or related predicates even if they are expressed differently, e.g. in passive constructions, using synonyms, etc.

¹ http://www.aclweb.org/anthology
² http://aclasb.dfki.de
³ http://www.delph-in.net (DELPH-IN stands for DEep Linguistic Processing with HPSG INitiative.)

Proceedings of the ACL-HLT 2011 System Demonstrations, pages 7–13, Portland, Oregon, USA, 21 June 2011. ©2011 Association for Computational Linguistics

By storing the semantic sentence structure along with the original text in a structured full-text search engine, it can be guaranteed that recall cannot fall behind the baseline of a full-text search. In addition, the Searchbench provides detailed bibliographic metadata for filtering, as well as autosuggest texts for input fields computed from the corpus: two further key features one can expect from such systems today, but nevertheless very important for efficient search in digital libraries. We describe the offline preprocessing and deep parsing approach in Section 2. Section 3 concentrates on the generation of the semantic search index.
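The recall guarantee above follows from a simple set property: if semantic matches are only ever added to the full-text matches and never used to filter them, the combined result always contains the full-text result. The toy in-memory index below is a minimal sketch of that property. The `TwoFieldIndex` class and the simplified (predicate, role, argument) tuples are hypothetical; this is not the Searchbench's search engine, and the tuples are not its actual quintuple format.

```python
from collections import defaultdict


class TwoFieldIndex:
    """Toy inverted index with a surface-text field and a semantic-tuple field.

    Hypothetical illustration only; tuples are simplified (predicate, role, argument).
    """

    def __init__(self):
        self.text_postings = defaultdict(set)   # lowercased token -> sentence ids
        self.tuple_postings = defaultdict(set)  # semantic tuple -> sentence ids

    def add(self, sent_id, text, tuples):
        for token in text.lower().split():
            self.text_postings[token].add(sent_id)
        for tup in tuples:
            self.tuple_postings[tup].add(sent_id)

    def search(self, keywords, tuple_query=None):
        # Full-text baseline: sentences containing every keyword.
        postings = [self.text_postings.get(k.lower(), set()) for k in keywords]
        hits = set.intersection(*postings) if postings else set()
        # Semantic matches are only unioned in, never used to filter,
        # so the result is always a superset of the full-text baseline.
        if tuple_query is not None:
            hits = hits | self.tuple_postings.get(tuple_query, set())
        return hits


idx = TwoFieldIndex()
# A passive and an active sentence sharing one (hypothetical) normalized tuple.
idx.add(1, "The parser was evaluated by the authors", [("evaluate", "ARG2", "parser")])
idx.add(2, "We evaluate the parser", [("evaluate", "ARG2", "parser")])
print(idx.search(["evaluate", "parser"]))                                  # {2}
print(idx.search(["evaluate", "parser"], ("evaluate", "ARG2", "parser")))  # {1, 2}
```

The keyword query misses the passive sentence because its surface form is "evaluated"; the semantic tuple recovers it without removing any full-text hit.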
In Section 4, we describe the search interface. We conclude in Section 5 and present an outlook to future extensions.

2 Parsing the ACL Anthology

The basis of the search index for the ACL Anthology is its original PDF documents, currently 8,200 from the years 2002 through 2009. To overcome quality problems in text extraction from PDF, we use a commercial PDF extractor based on OCR techniques. This approach guarantees uniform and high-quality textual representations even from older papers in the anthology (before 2000), which mostly were scanned from printed paper versions. The general idea of the semantics-oriented access to scholarly paper content is to parse each sentence they contain with the open-source HPSG (Pollard and Sag, 1994) grammar for English (ERG; Flickinger (2002)) and then distill and index semantically structured representations for search. To make the deep parser robust, it is embedded in an NLP workflow. The coverage (percentage of fully deeply parsed sentences) on the anthology corpus could be increased from 65% to more than 85% through a careful combination of several robustness techniques, for example: (1) chart pruning, directed search during parsing to increase performance, and also coverage for longer sentences (Cramer and Zhang, 2010); (2) chart mapping, a novel method for integrating preprocessing information in exactly the way the deep grammar expects it (Adolphs et al., 2008); (3) a new version of the ERG with better handling of open word classes; (4) more fine-grained named entity recognition, including recognition of citation patterns; (5) a new, better suited parse ranking model (WeScience; Flickinger et al. (2010)). Because of limited space, we will focus on (1) and (2) below. A more detailed description and further results are available in (Schäfer and Kiefer, 2011).
Except for a small part of the named entity recognition components (citations, some terminology) and the parse ranking model, there are no further adaptations to the genre or domain of the text corpus. This implies that the NLP workflow could be easily and modularly adapted to other (scientific or non-scientific) domains, mainly thanks to the generic and comprehensive language modelling in the ERG. The NLP preprocessing component workflow is implemented using the Heart of Gold NLP middleware architecture (Schäfer, 2006). It starts with sentence boundary detection (SBR) and regular expression-based tokenization using its built-in component JTok, followed by the trigram-based PoS tagger TnT (Brants, 2000) trained on the Penn Treebank (Marcus et al., 1993) and the named entity recognizer SProUT (Drożdżyński et al., 2004).

2.1 Precise Preprocessing Integration with Chart Mapping

Tagger output is combined with information from the named entity recognizer, e.g. delivering hypothetical information on citation expressions. The combined result is delivered as input to the deep parser PET (Callmeier, 2000) running the ERG. Here, citations, for example, can be treated as either persons, locations or appositions. Concerning punctuation, the ERG can make use of information on opening and closing quotation marks. Such information is often not explicit in the input text, e.g. when, as in our setup, the text is gained through OCR, which does not distinguish between ‘ and ’ or “ and ”. However, a tokenizer can often guess (reconstruct) leftness and rightness correctly. This information, passed to the deep parser via chart mapping, helps it to disambiguate.

2.2 Increased Processing Speed and Coverage through Chart Pruning

In addition to a well-established discriminative maximum entropy model for post-analysis parse selection, we use an additional generative model as described in Cramer and Zhang (2010) to restrict the search space during parsing.
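The paragraph says a tokenizer can often reconstruct whether an OCR-flattened quotation mark opens or closes a quote. One simple way to guess this is from the character to its left. The function below is a minimal sketch of such a heuristic; `orient_quotes` and its rules are hypothetical and are not JTok's actual logic. It handles double quotes only, because single quotes collide with apostrophes.

```python
# Straight and curly double quotes that OCR may have confused.
AMBIGUOUS_QUOTES = {'"', "\u201c", "\u201d"}
OPENERS = "([{"


def orient_quotes(text: str) -> str:
    """Re-assign opening/closing double quotes from their left context."""
    out = []
    for i, ch in enumerate(text):
        if ch not in AMBIGUOUS_QUOTES:
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        # At the start, after whitespace, or after an opening bracket: opening quote.
        if prev.isspace() or prev in OPENERS:
            out.append("\u201c")
        else:
            out.append("\u201d")
    return "".join(out)


print(orient_quotes('He called it \u201ddeep parsing\u201d.'))
```

The left-context rule fails on cases such as a quote directly after another quote. In a chart-mapping setup, a guess like this could be passed to the parser as a soft hint rather than a hard decision.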
This restriction increases efficiency, but also coverage: parse time was restricted to at most 60 CPU seconds on a standard PC, and more sentences could now be parsed within these bounds. A 4 GB limit for main memory consumption was far beyond what was ever needed. We saw a small, negligible decrease in parsing accuracy: in 5.4% of cases, the best parse was not found because important chart edges had been pruned. Ninomiya et al. (2006) carried out a very thorough comparison of different performance optimization strategies, among them a local pruning strategy similar to the one used here. There is an important difference between the systems: theirs works on a reduced context-free backbone first and reconstructs the results with the full grammar, while PET uses the HPSG grammar directly, with subsumption packing and partial unpacking to achieve an effect similar to the packed chart of a context-free parser.

Figure 1: Distribution of sentence length and mean parse times for mild pruning (x-axis: sentence length)

In total, we parsed 1,537,801 sentences, of which 57,832 (3.8%) could not be parsed because of lexicon errors. Most of them were caused by OCR artifacts resulting in unexpected punctuation character combinations. These can be identified and will be deleted in the future. Figure 1 displays the average parse time of processing with a mild chart pruning setting, together with the mean quadratic error. In addition, it contains the distribution of input sentences over sentence length. Obviously, the vast majority of sentences have a length of at most 60 words⁴. The parse times only grow mildly due to the many optimization techniques in the original system, and also the new chart pruning method. The sentence length distribution has been integrated into Figure 1 to show that the predominant part of our real-world corpus can be processed using this information-rich method with very low parse times (overall average parse time < 2 s per sentence).
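Local chart pruning caps how many analyses survive in each chart cell. This explains the trade-off reported above: search gets faster and more sentences finish within the time limit, but if an edge needed for the best parse falls outside the beam, that parse can no longer be built. The sketch below shows only the per-cell selection step on a flat list of scored edges. The `Edge` type, the scores, and `beam_size` are hypothetical. It is not the generative model of Cramer and Zhang (2010) or PET's implementation, both of which prune while the chart is being filled.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Edge(NamedTuple):
    start: int    # first token covered
    end: int      # one past the last token covered
    label: str
    score: float  # e.g. log-probability from a generative model (hypothetical)


def prune_cells(edges: List[Edge], beam_size: int) -> List[Edge]:
    """Keep only the beam_size best-scoring edges in each chart cell (start, end)."""
    cells = defaultdict(list)
    for edge in edges:
        cells[(edge.start, edge.end)].append(edge)
    kept = []
    for span in sorted(cells):
        ranked = sorted(cells[span], key=lambda e: e.score, reverse=True)
        kept.extend(ranked[:beam_size])
    return kept


chart = [
    Edge(0, 2, "NP", -1.0),
    Edge(0, 2, "VP", -3.0),
    Edge(0, 2, "S", -0.5),
    Edge(2, 3, "N", -0.2),
]
print([e.label for e in prune_cells(chart, beam_size=2)])  # ['S', 'NP', 'N']
```

Here the `VP` edge is discarded. If the correct analysis needed it, that analysis is lost, which is the kind of accuracy cost the text describes.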
The large number of short inputs is surprising at first, even more so because most of these inputs cannot be parsed. Most of them are non-sentences such as headings, enumerations, footnotes, and table cell content. There are several alternatives for dealing with such input: one is to identify and handle them in a preprocessing step; another is to use a special root condition in the deep analysis component that is able to combine phrases with well-defined properties for inputs where no spanning result could be found. We employed the second method, which has the advantage that it handles a larger range of phenomena in a homogeneous way. Figure 2 shows the change in the percentage of unparsed and timed-out inputs for the mild pruning method with and without the root condition combining fragments.

Figure 2: Unparsed and timed out sentences with and without fragment combination (x-axis: sentence length)

As Figure 2 shows, fragment combination shifts the curve for unparsed sentences towards more expected characteristics and removes the uncommonly high percentage of short sentences for which no parse can be computed. Together with the parses for fragmented input, we get a recall (sentences with at least one parse) over the whole corpus of 85.9% (1,321,336 sentences), without a significant change for any of the other measures, and with potential for further improvement.

⁴ It has to be pointed out that extremely long sentences may also be non-sentences resulting from PDF extraction errors, missing punctuation, etc. No manual correction took place.

Figure 3: Multiple semantic tuples may be generated for a sentence
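When no spanning analysis exists, fragment combination stitches together analyses of smaller pieces instead. As a rough illustration only, the dynamic program below picks the fewest fragments that together cover an input. The `(start, end, label)` fragment format and the fewest-fragments criterion are assumptions for this sketch. The Searchbench does this inside the grammar, through a special root condition in PET and the ERG, not with a post-hoc search like this one.

```python
from typing import List, Optional, Tuple

Fragment = Tuple[int, int, str]  # (start, end, label), end exclusive


def cover_with_fragments(n: int, fragments: List[Fragment]) -> Optional[List[Fragment]]:
    """Return the fewest fragments covering tokens [0, n), or None if no cover exists."""
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: fewest fragments covering [0, i)
    back: List[Optional[Fragment]] = [None] * (n + 1)
    best[0] = 0
    for i in range(1, n + 1):
        for frag in fragments:
            start, end, _ = frag
            # Only non-empty fragments ending exactly at position i can extend a cover.
            if start < end == i and best[start] + 1 < best[i]:
                best[i] = best[start] + 1
                back[i] = frag
    if best[n] == inf:
        return None
    cover, i = [], n
    while i > 0:
        frag = back[i]
        cover.append(frag)
        i = frag[0]
    return cover[::-1]


pieces = [(0, 2, "NP"), (2, 5, "VP"), (0, 1, "DT"), (1, 5, "X")]
print(cover_with_fragments(5, pieces))           # [(0, 2, 'NP'), (2, 5, 'VP')]
print(cover_with_fragments(3, [(0, 1, "DT")]))  # None
```

A real system would rank competing covers by linguistic plausibility, not just by fragment count. The count is only a simple stand-in here.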
3 Semantic Tuple Extraction with DMRS

In contrast to shallow parsers, the ERG not only handles detailed syntactic analyses of phrases, compounds, coordination, negation and other linguistic phenomena that are important for extracting semantic relations, but also generates a formal semantic representation of the meaning of the input sentence in the Minimal Recursion Semantics (MRS) representation format (Copestake et al., 2005). It consists of elementary predications for each word and larger constituents, connected via argument positions and variables, from which predicate-argument structure can be extracted. MRS representations resulting from deep parsing are still relatively close to linguistic structures and contain more detailed information than a user would like to query and search for. Therefore, an additional extraction and abstraction step is performed before storing semantic structures in the search index. Firstly, MRS is converted to DMRS (Copestake, 2009), a dependency-style version of MRS that eases extraction of predicate-argument structure, using the implementation in the LKB (Copestake, 2002). The representation format we devised for the search index we call semantic tuples, in fact quintuples
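The excerpt breaks off before the quintuple format is defined, so the sketch below does not try to reproduce it. It only shows why a dependency-style representation such as DMRS makes predicate-argument extraction easier. Argument links can be read off directly as (predicate, role, argument) triples, and a passive and an active sentence then yield the same ARG2 triple. The encoding is a simplified, hypothetical stand-in for real ERG/DMRS output: `nodes` is a dict from node id to predicate name, and each link is a `(from, to, label)` tuple with DMRS-style labels such as `ARG2/NEQ`. The predicate names are also hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[int, int, str]  # (from node id, to node id, label such as "ARG2/NEQ")


def argument_triples(nodes: Dict[int, str], links: List[Link]) -> List[Tuple[str, str, str]]:
    """Read (predicate, role, argument) triples off DMRS-style argument links."""
    triples = []
    for src, dst, label in links:
        role = label.split("/")[0]  # "ARG2/NEQ" -> "ARG2"
        if role.startswith("ARG"):  # skip non-argument links such as "RSTR/H"
            triples.append((nodes[src], role, nodes[dst]))
    return triples


# "The parser was evaluated." (passive, no agent); hypothetical simplified graph
passive_nodes = {1: "_the_q", 2: "_parser_n_1", 3: "_evaluate_v_1"}
passive_links = [(1, 2, "RSTR/H"), (3, 2, "ARG2/NEQ")]

# "Authors evaluated the parser." (active); hypothetical simplified graph
active_nodes = {1: "_author_n_1", 2: "_evaluate_v_1", 3: "_the_q", 4: "_parser_n_1"}
active_links = [(2, 1, "ARG1/NEQ"), (2, 4, "ARG2/NEQ"), (3, 4, "RSTR/H")]

print(argument_triples(passive_nodes, passive_links))
# [('_evaluate_v_1', 'ARG2', '_parser_n_1')]
print(argument_triples(active_nodes, active_links))
```

Because the passive sentence's triple also appears among the active sentence's triples, a query on that triple would match both sentences. This mirrors the point about passive constructions made in the introduction.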

3 0.79474252 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model

Author: Elijah Mayfield ; Carolyn Penstein Rose

Abstract: We present a novel computational formulation of speaker authority in discourse. This notion, which focuses on how speakers position themselves relative to each other in discourse, is first developed into a reliable coding scheme (0.71 agreement between human annotators). We also provide a computational model for automatically annotating text using this coding scheme, using supervised learning enhanced by constraints implemented with Integer Linear Programming. We show that this constrained model’s analyses of speaker authority correlate very strongly with expert human judgments (r² coefficient of 0.947).

4 0.78613311 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis

Author: Manoj Harpalani ; Michael Hart ; Sandesh Signh ; Rob Johnson ; Yejin Choi

Abstract: Community-based knowledge forums, such as Wikipedia, are susceptible to vandalism, i.e., ill-intentioned contributions that are detrimental to the quality of collective intelligence. Most previous work to date relies on shallow lexico-syntactic patterns and metadata to automatically detect vandalism in Wikipedia. In this paper, we explore more linguistically motivated approaches to vandalism detection. In particular, we hypothesize that textual vandalism constitutes a unique genre where a group of people share a similar linguistic behavior. Experimental results suggest that (1) statistical models give evidence to unique language styles in vandalism, and that (2) deep syntactic patterns based on probabilistic context free grammars (PCFG) discriminate vandalism more effectively than shallow lexico-syntactic patterns based on n-grams.

5 0.66663241 60 acl-2011-Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

Author: Jonathan H. Clark ; Chris Dyer ; Alon Lavie ; Noah A. Smith

Abstract: In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, he runs an experiment to evaluate the behavior of the two systems on held-out data. In this paper, we consider how to make such experiments more statistically reliable. We provide a systematic analysis of the effects of optimizer instability—an extraneous variable that is seldom controlled for—on experimental outcomes, and make recommendations for reporting results more accurately.

6 0.40736687 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue

7 0.39469966 307 acl-2011-Towards Tracking Semantic Change by Visual Analytics

8 0.38651842 8 acl-2011-A Corpus of Scope-disambiguated English Text

9 0.38612309 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing

10 0.38381422 133 acl-2011-Extracting Social Power Relationships from Natural Language

11 0.38349527 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding

12 0.38305151 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling

13 0.37880754 176 acl-2011-Integrating surprisal and uncertain-input models in online sentence comprehension: formal techniques and empirical results

14 0.3752206 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning

15 0.37517577 31 acl-2011-Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations

16 0.37410757 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing

17 0.37295195 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

18 0.3702991 215 acl-2011-MACAON An NLP Tool Suite for Processing Word Lattices

19 0.36896133 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus

20 0.36779702 213 acl-2011-Local and Global Algorithms for Disambiguation to Wikipedia