
286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal


Source: pdf

Author: Apoorv Agarwal

Abstract: In my thesis, I propose to build a system that would enable extraction of social interactions from texts. To date I have defined a comprehensive set of social events and built a preliminary system that extracts social events from news articles. I plan to improve the performance of my current system by incorporating semantic information. Using domain adaptation techniques, I propose to apply my system to a wide range of genres. By extracting linguistic constructs relevant to social interactions, I will be able to empirically analyze different kinds of linguistic constructs that people use to express social interactions. Lastly, I will attempt to make convolution kernels more scalable and interpretable.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In my thesis, I propose to build a system that would enable extraction of social interactions from texts. [sent-3, score-0.747]

2 To date I have defined a comprehensive set of social events and built a preliminary system that extracts social events from news articles. [sent-4, score-1.614]

3 By extracting linguistic constructs relevant to social interactions, I will be able to empirically analyze different kinds of linguistic constructs that people use to express social interactions. [sent-7, score-1.48]

4 Lastly, I will attempt to make convolution kernels more scalable and interpretable. [sent-8, score-0.508]

5 1 Introduction Language is the primary tool that people use for establishing, maintaining and expressing social relations. [sent-9, score-0.711]

6 This makes language the real carrier of social networks. [sent-10, score-0.604]

7 The overall goal of my thesis is to build a system that automatically extracts a social network from raw texts such as literary texts, emails, blog comments and news articles. [sent-11, score-1.092]

8 I take a “social network” to be a network consisting of individual human beings and groups of human beings who are connected to each other through various relationships by virtue of participating in social events. [sent-12, score-0.899]

9 I define social events to be events that occur between people where at least one person is aware of the other and of the event taking place. [sent-13, score-1.184]

10 For example, in the sentence John talks to Mary, entities John and Mary are aware of each other and of the talking event. [sent-14, score-0.209]

11 In the sentence John thinks Mary is great, only John is aware of Mary and the event is the thinking event. [sent-15, score-0.155]

12 My thesis will introduce a novel way of constructing networks by analyzing text to capture such interactions or events. [sent-16, score-0.292]

13 Motivation: Typically researchers construct a social network from various forms of electronic interaction records like self-declared friendship links, sender-receiver email links and phone logs etc. [sent-17, score-1.014]

14 They ignore a vastly rich network present in the content of such sources. [sent-18, score-0.235]

15 Secondly, many rich sources of social networks remain untouched simply because there is no meta-data associated with them (literary texts, news stories, historical texts). [sent-19, score-0.779]

16 By providing a methodology for analyzing language to extract interaction links between people, my work will overcome both these limitations. [sent-20, score-0.128]

17 Moreover, by empirically analyzing large corpora of text from different genres, my work will aid in formulating a comprehensive linguistic theory about the types of linguistic constructs people often use to interact and express their social interactions with others. [sent-21, score-1.021]

18 Impact on current SNA applications: Some of the current social network analysis (SNA) applications that utilize interaction meta-data to construct the underlying social network are discussed by Domingos and Richardson (2003), Kempe et al. [sent-23, score-1.712]

19 But meta-data captures only part of all the interactions in which people participate. [sent-28, score-0.25]

20 There is a vastly rich network present in text such as the content of emails, comment threads on online social networks, and transcribed phone calls. [sent-29, score-0.839]

21 (c) 2011 Association for Computational Linguistics. Student Session, pages 111–116. social network that the SNA community currently uses by complementing it with the finer interaction linkages present in text. [sent-32, score-1.096]

22 (2007) use the sender-receiver email links to connect people in the Enron email corpus. [sent-34, score-0.295]

23 Their social network analysis for calculating centrality measure of people does not take into account interactions that people talk about in the content of emails. [sent-36, score-1.2]

24 Such linkages are relevant to the task for two reasons. [sent-37, score-0.157]

25 First, people talk about their interactions with other people in the content of emails. [sent-38, score-0.391]

26 By ignoring these interaction linkages, the underlying communication network used by Rowe et al. [sent-39, score-0.299]

27 Second, sender-receiver email links only represent “who talks to whom”. [sent-41, score-0.192]

28 This latter information seems to be crucial to the task, presumably because people lower in the organizational hierarchy are more likely to talk about people higher in the hierarchy. [sent-43, score-0.316]

29 My work will enable extraction of these missing linkages and hence offers the potential to improve the performance of currently used SNA algorithms. [sent-44, score-0.193]

30 By capturing alternate forms of communications, my system will also overcome a known limitation of the Enron email corpus that a significant number of emails were lost at the time of data creation (Carenini et al. [sent-45, score-0.149]

31 Impact on study of literary and journalistic texts: Sources of social networks that are primarily textual in nature such as literary texts, historical texts, or news articles are currently under-utilized for social network analysis. [sent-47, score-1.928]

32 In fact, to the best of my knowledge, there is no formal comprehensive categorization of social interactions. [sent-48, score-0.65]

33 An early effort to illustrate the importance of such linkages is by Moretti (2005). [sent-49, score-0.157]

34 They only extract mutual interactions that are signaled by quoted speech. [sent-55, score-0.249]

35 My thesis will go beyond quoted speech and will extract interactions signaled by any linguistic means, in particular verbs of social interaction. [sent-56, score-0.899]

36 Moreover, my research will not only enable extraction of mutual linkages (“who talks to whom” ) but also of one-directional linkages (“who talks about whom”). [sent-57, score-0.518]

37 This will give rise to new applications such as characterization of literary texts based on the type of social network that underlies the narrative. [sent-58, score-1.01]

38 Moreover, analyses of large amounts of related text such as decades of news articles or historical texts will become possible. [sent-59, score-0.178]

39 By looking at the overall social structure, the analyst or scientist will get a summary of the key players and their interactions with each other and the rest of the network. [sent-60, score-0.747]

40 Impact on Linguistics: To the best of my knowledge, there is no cognitive or linguistic theory that explains how people use language to express social interactions. [sent-61, score-0.759]

41 A system that detects lexical items and syntactic constructions that realize interactions and then classifies them into one of the categories I define in Section 2 has the potential to provide linguists with empirical data to formulate such a theory. [sent-62, score-0.208]

42 For example, the notion of social interactions could be added to the FrameNet resource (Baker and Fillmore, 1998) which is based on frame semantics. [sent-63, score-0.814]

43 Frames describe lexical meaning by specifying a set of frame elements, which are participants in a typical event or state of affairs expressed by the frame. [sent-65, score-0.144]

44 It provides lexicographic example annotations that illustrate how frames and frame elements can be realized by syntactic constructions. [sent-66, score-0.113]

45 My categorization of social events can be incorporated into FrameNet by adding new frames for social events to the frame hierarchy. [sent-67, score-1.611]

46 Linguists can use this data to make generalizations about linguistic constructions that realize social interactions frames. [sent-69, score-0.784]

47 For example, a possible generalization could be that transitive verbs in which both subject and object are people, frequently express a social event. [sent-70, score-0.7]

48 In addition, it would be interesting to see what kinds of social interactions occur in different text genres and whether they are realized differently. [sent-71, score-0.78]

49 For example, in a news corpus we hardly found expressions of non-verbal mutual interactions (like eye-contact) while these are frequent in fiction texts like Alice in Wonderland. [sent-72, score-0.297]

50 2 Work to date So far, I have defined a comprehensive set of social events and have acquired reliable annotations on a well-known news corpus. [sent-73, score-0.851]

51 I have built a preliminary system that extracts social events from news articles. [sent-74, score-0.805]

52 Meaning of social events: A text can describe a social network in two ways: explicitly, by stating the type of relationship between two individuals (e. [sent-76, score-1.413]

53 Mary is John's wife), or implicitly, by describing an event which initiates or perpetuates a social relationship (e. [sent-78, score-0.709]

54 I call the latter type of events “social events” (Agarwal et al. [sent-81, score-0.159]

55 I defined two broad types of social events: interaction, in which both parties are aware of each other and of the social event, e. [sent-83, score-1.258]

56 For example, sentence 1 contains two distinct social events: interaction: Toujan was informed by the committee, and observation: Toujan is talking about the committee. [sent-88, score-0.644]

57 As a pilot test to see if creating a social network based on social events can give insight into the social structures of a story, I manually annotated a short version of Alice in Wonderland. [sent-92, score-2.216]

58 On the manually extracted network, I ran social network analysis algorithms to answer questions like: who are the most influential characters in the story, and which characters have the same social roles and positions. [sent-93, score-1.489]

59 Another finding was that characters appearing in the same scene like Dodo, Lory, Eaglet, Mouse and Duck were assigned the same social roles and positions. [sent-95, score-0.642]

60 Motivated by this pilot test, I decided to annotate social events on the Automatic Content Extraction (ACE) dataset (Doddington et al. [sent-97, score-0.763]

61 My annotations extend previous annotations for entities, relations and events that are present in the 2005 version of the corpus. [sent-99, score-0.159]

62 My annotations revealed that about 80% of the time, entities mentioned together in the same sentence were not linked by any social event. [sent-100, score-0.642]

63 Extraction of social events: To perform such an analysis, I built models for two tasks: social event detection and social event classification (Agarwal and Rambow, 2010). [sent-103, score-2.022]

64 Both were formulated as binary tasks: the first one being about detecting existence of a social event between a pair of entities in a sentence and the second one being about differentiating between the interaction and observation type events (given there is an event between the entities). [sent-104, score-1.153]

65 I used tree kernels on structures derived from phrase structure trees and dependency trees in conjunction with Support Vector Machines (SVMs) to solve the tasks. [sent-105, score-0.362]

66 I tried all the kernels and their combinations proposed by Nguyen et al. [sent-108, score-0.236]

67 I used syntactic and semantic insights to devise a new structure derived from dependency trees and showed that this plays a role in achieving the best performance for both social event detection and classification tasks. [sent-110, score-0.78]

68 To my surprise, the system performed extremely well on a seemingly hard task of differentiating between interaction and observation type social events. [sent-118, score-0.746]

69 This result showed that there are significant clues in the lexical and syntactic structures that help in differentiating mutual and onedirectional interactions. [sent-119, score-0.158]

70 I will work on making convolution kernels scalable and interpretable. [sent-121, score-0.508]

71 These two steps will meet my goal of building a system that will extract social networks from news articles. [sent-122, score-0.755]

72 My next step will be to survey and incorporate domain adaptation techniques that will allow me to port my system to other genres like literary and historical texts, blog comments, emails etc. [sent-123, score-0.379]

73 These steps will allow me to extract social networks from a wide range of textual data. [sent-124, score-0.713]

74 At the same time I will be able to empirically analyze the types of linguistic patterns, both lexical and syntactic, that perpetuate social interactions. [sent-125, score-0.633]

75 I am interested in modeling classes of events which are characterized by the cognitive states of participants: who is aware of whom. [sent-130, score-0.209]

76 The predicate-argument structure of verbs can encode much of this information very efficiently, and classes of verbs express their predicate-argument structure in similar ways. [sent-131, score-0.144]

77 Levin’s verb classes, and Palmer’s VerbNet (Levin, 1993; Schuler, 2005), are based on syntactic similarity between verbs: two verbs are in the same class if and only if they can realize their arguments in the same syntactic patterns. [sent-132, score-0.141]

78 But from a social event perspective, I am not interested in exact synonymy, and in fact it is quite possible that what I am interested in (awareness of the interaction by the event participants) is the same among verbs of the same VerbNet class. [sent-136, score-0.956]

79 Scaling convolution kernels: Convolution kernels, first proposed by Haussler (1999), are a convenient way of “naturally” combining a variety of features without having to do fine-grained feature engineering. [sent-139, score-0.235]

80 Convolution kernels calculate the similarity between two objects, like trees or strings, by a recursive calculation over the “parts” (substrings, subtrees) of objects. [sent-146, score-0.309]

81 One direction I will explore to make convolution kernels more scalable is the following: The decision function for the classifier (SVM in dual form) is given in equation 1 (Burges, 1998, Eq 61). [sent-151, score-0.539]

82 The kernel definition proposed by Collins and Duffy (2002) is given in equation 2, where h_s(T) is the number of times the s-th subtree appears in tree T. [sent-153, score-0.145]

83 The kernel function K(T1, T2) therefore calculates the similarity between trees T1 and T2 by counting the common subtrees in them. [sent-154, score-0.149]

84 However, to the best of my knowledge, there is no work that addresses approximation of kernel evaluation for convolution kernels. [sent-162, score-0.298]

85 Interpretability of convolution kernels: As mentioned in the previous paragraph, another disadvantage of using convolution kernels is that interpretability of a model is difficult. [sent-163, score-0.773]

86 Recently, Pighin and Moschitti (2009) proposed an algorithm to linearize convolution kernels. [sent-164, score-0.235]

87 By doing so I will be able to empirically see what types of linguistic constructs are used by people to express different types of social interactions thus aiding in formulating a theory of how people express social interactions. [sent-168, score-1.734]

88 Domain adaptation: To be able to extract social networks from literary and historical texts, I will explore domain adaptation techniques. [sent-169, score-0.987]

89 Domain adaptation will conclude my overall goal of creating a system that can extract social networks from a wide variety of texts. [sent-173, score-0.762]

90 I will then attempt to extract social networks from the increasing amount of text that is becoming machine readable. [sent-174, score-0.713]

91 Sentiment Analysis: A natural step to try once I have linkages associated with snippets of text is sentiment analysis. [sent-175, score-0.193]

92 , 2009) on contextual phrase-level sentiment analysis to analyze snippets of text and add polarity to social event linkages. [sent-177, score-0.745]

93 Sentiment analysis will make the social network representation even richer by indicating if people are connected with positive, negative or neutral sentiments. [sent-178, score-0.916]

94 The actor-topic model for extracting social networks in literary narrative. [sent-218, score-0.844]

95 Inferring privacy information from social networks. Intelligence and Security Informatics, pages [sent-268, score-0.635]

96 Maximizing the spread of influence through a social network. [sent-279, score-0.604]

97 Exploiting syntactic and shallow semantic kernels for question answer classification. [sent-302, score-0.264]

98 Convolution kernels on constituent, dependency and sequential structures for relation extraction. [sent-314, score-0.276]

99 Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 109–117. [sent-330, score-0.839]

100 To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. [sent-347, score-0.807]
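Sentences 81 to 83 above describe the dual-form SVM decision function and the subtree-counting kernel of Collins and Duffy (2002). The following is a minimal Python sketch of both, not the author's implementation. It assumes trees are nested tuples, uses no decay factor, and relies on made-up support vectors and coefficients. The names `nodes`, `production`, `common_subtrees`, `tree_kernel` and `decision` are illustrative only.

```python
# Trees are nested tuples: (label, child, ...). A word is a plain string.
# This representation and all function names are illustrative assumptions.

def nodes(tree):
    """Yield every internal node of a tree, root first."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def production(node):
    """Return the node label plus its children's labels (or words)."""
    kids = tuple(
        ("node", c[0]) if isinstance(c, tuple) else ("word", c)
        for c in node[1:]
    )
    return (node[0], kids)


def common_subtrees(n1, n2):
    """Count the subtrees rooted at n1 that also occur rooted at n2."""
    if production(n1) != production(n2):
        return 0
    count = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):  # equal productions, so c2 is a node too
            count *= 1 + common_subtrees(c1, c2)
    return count


def tree_kernel(t1, t2):
    """K(T1, T2): number of common subtrees, summed over all node pairs."""
    return sum(common_subtrees(a, b) for a in nodes(t1) for b in nodes(t2))


def decision(x, support_vectors, bias=0.0):
    """Dual-form SVM score: sum_i alpha_i * y_i * K(s_i, x) + b."""
    return sum(a * y * tree_kernel(s, x) for a, y, s in support_vectors) + bias


the_dog = ("NP", ("D", "the"), ("N", "dog"))
the_cat = ("NP", ("D", "the"), ("N", "cat"))

print(tree_kernel(the_dog, the_dog))  # 6
print(tree_kernel(the_dog, the_cat))  # 3

# Hypothetical support vectors as (alpha, label, tree); values are made up.
support = [(0.5, +1, the_dog), (0.5, -1, the_cat)]
print(decision(the_dog, support))  # 1.5
```

Each call to `decision` evaluates the kernel once per support vector, and each kernel evaluation visits every pair of nodes. That cost is the scalability concern the proposal raises.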


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('social', 0.604), ('kernels', 0.236), ('convolution', 0.235), ('network', 0.205), ('events', 0.159), ('linkages', 0.157), ('interactions', 0.143), ('agarwal', 0.136), ('iwill', 0.136), ('literary', 0.131), ('apoorv', 0.111), ('networks', 0.109), ('verbnet', 0.108), ('people', 0.107), ('event', 0.105), ('interaction', 0.094), ('moschitti', 0.089), ('talks', 0.081), ('rowe', 0.079), ('sna', 0.079), ('levin', 0.077), ('email', 0.077), ('emails', 0.072), ('texts', 0.07), ('interpretability', 0.067), ('pighin', 0.067), ('toujan', 0.067), ('historical', 0.066), ('iam', 0.065), ('framenet', 0.064), ('kernel', 0.063), ('hs', 0.051), ('enron', 0.051), ('mary', 0.051), ('aware', 0.05), ('adaptation', 0.049), ('differentiating', 0.048), ('ihave', 0.048), ('express', 0.048), ('verbs', 0.048), ('nguyen', 0.047), ('frames', 0.046), ('comprehensive', 0.046), ('beings', 0.045), ('idefine', 0.045), ('lindamood', 0.045), ('moretti', 0.045), ('pelossof', 0.045), ('xns', 0.045), ('zheleva', 0.045), ('constructs', 0.044), ('si', 0.044), ('subtrees', 0.043), ('trees', 0.043), ('mutual', 0.042), ('news', 0.042), ('owen', 0.04), ('rambow', 0.04), ('talking', 0.04), ('thesis', 0.04), ('structures', 0.04), ('frame', 0.039), ('clarkson', 0.039), ('duffy', 0.039), ('organizational', 0.039), ('elson', 0.039), ('ipropose', 0.039), ('entities', 0.038), ('characters', 0.038), ('realize', 0.037), ('scalable', 0.037), ('sentiment', 0.036), ('xs', 0.036), ('currently', 0.036), ('talk', 0.034), ('links', 0.034), ('kempe', 0.034), ('alice', 0.034), ('genres', 0.033), ('carenini', 0.032), ('private', 0.032), ('schuler', 0.032), ('signaled', 0.032), ('quoted', 0.032), ('zelenko', 0.032), ('privacy', 0.031), ('join', 0.031), ('equation', 0.031), ('calculation', 0.03), ('mining', 0.03), ('celikyilmaz', 0.03), ('vastly', 0.03), ('culotta', 0.03), ('alternations', 0.03), ('empirically', 0.029), ('hierarchy', 0.029), ('domain', 0.028), ('notion', 0.028), ('syntactic', 0.028), ('doddington', 0.028)]
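For several summary sentences (for example 4 and 6), the listed sentScore equals the sum of the listed tfidf weights of the sentence's whitespace-separated, lowercased tokens. The sketch below reproduces those two scores under that assumption. This is an inference from the numbers on this page, not a documented description of how the scores were computed, and words outside the top-N list may also contribute to other sentences. The names `WEIGHTS` and `sentence_score` are hypothetical.

```python
# A handful of (word, weight) pairs copied from the tfidf list above.
WEIGHTS = {
    "social": 0.604, "kernels": 0.236, "convolution": 0.235,
    "networks": 0.109, "people": 0.107, "scalable": 0.037,
}


def sentence_score(sentence, weights=WEIGHTS):
    """Sum the weights of lowercased, whitespace-separated tokens.

    Tokens keep attached punctuation, so "networks." does not match "networks".
    This tokenization is an assumption that happens to fit the listed scores.
    """
    return sum(weights.get(tok, 0.0) for tok in sentence.lower().split())


s4 = "Lastly, I will attempt to make convolution kernels more scalable and interpretable."
s6 = "This makes language the real carrier of social networks."

print(round(sentence_score(s4), 3))  # 0.508
print(round(sentence_score(s6), 3))  # 0.604
```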

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999911 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

Author: Apoorv Agarwal

Abstract: In my thesis, I propose to build a system that would enable extraction of social interactions from texts. To date I have defined a comprehensive set of social events and built a preliminary system that extracts social events from news articles. I plan to improve the performance of my current system by incorporating semantic information. Using domain adaptation techniques, I propose to apply my system to a wide range of genres. By extracting linguistic constructs relevant to social interactions, I will be able to empirically analyze different kinds of linguistic constructs that people use to express social interactions. Lastly, I will attempt to make convolution kernels more scalable and interpretable.

2 0.24896775 133 acl-2011-Extracting Social Power Relationships from Natural Language

Author: Philip Bramsen ; Martha Escobar-Molano ; Ami Patel ; Rafael Alonso

Abstract: Sociolinguists have long argued that social context influences language use in all manner of ways, resulting in lects. This paper explores a text classification problem we will call lect modeling, an example of what has been termed computational sociolinguistics. In particular, we use machine learning techniques to identify social power relationships between members of a social network, based purely on the content of their interpersonal communication. We rely on statistical methods, as opposed to language-specific engineering, to extract features which represent vocabulary and grammar usage indicative of social power lect. We then apply support vector machines to model the social power lects representing superior-subordinate communication in the Enron email corpus. Our results validate the treatment of lect modeling as a text classification problem – albeit a hard one – and constitute a case for future research in computational sociolinguistics.

3 0.15558675 174 acl-2011-Insights from Network Structure for Text Mining

Author: Zornitsa Kozareva ; Eduard Hovy

Abstract: Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.

4 0.13672803 122 acl-2011-Event Extraction as Dependency Parsing

Author: David McClosky ; Mihai Surdeanu ; Christopher Manning

Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause an “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with an F1 score of 53.5% in development and 48.6% in testing.

5 0.13567518 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction

Author: Yu Hong ; Jianfeng Zhang ; Bin Ma ; Jianmin Yao ; Guodong Zhou ; Qiaoming Zhu

Abstract: Event extraction is the task of detecting certain specified types of events that are mentioned in the source language data. The state-of-the-art research on the task is transductive inference (e.g. cross-event inference). In this paper, we propose a new method of event extraction that makes effective use of cross-entity inference. In contrast to previous inference methods, we regard entity-type consistency as a key feature to predict event mentions. We adopt this inference method to improve the traditional sentence-level event extraction system. Experiments show that we can get 8.6% gain in trigger (event) identification, and more than 11.8% gain for argument (role) classification in ACE event extraction.

6 0.13469885 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories

7 0.12884571 194 acl-2011-Language Use: What can it tell us?

8 0.12274256 177 acl-2011-Interactive Group Suggesting for Twitter

9 0.11729358 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

10 0.11281525 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature

11 0.10909867 31 acl-2011-Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations

12 0.10648029 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

13 0.076410793 293 acl-2011-Template-Based Information Extraction without the Templates

14 0.075217731 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates

15 0.073647007 121 acl-2011-Event Discovery in Social Media Feeds

16 0.073287249 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

17 0.072477095 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models

18 0.072043851 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

19 0.071292035 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

20 0.069789536 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.173), (1, 0.108), (2, -0.065), (3, -0.017), (4, 0.092), (5, 0.104), (6, -0.015), (7, -0.065), (8, 0.07), (9, 0.003), (10, -0.069), (11, -0.002), (12, 0.012), (13, 0.042), (14, -0.054), (15, -0.079), (16, -0.006), (17, -0.067), (18, -0.009), (19, -0.116), (20, 0.093), (21, 0.018), (22, -0.038), (23, 0.066), (24, 0.017), (25, -0.076), (26, 0.023), (27, 0.035), (28, -0.004), (29, -0.075), (30, 0.015), (31, 0.009), (32, -0.063), (33, 0.033), (34, 0.13), (35, -0.093), (36, -0.176), (37, -0.018), (38, -0.139), (39, 0.113), (40, 0.05), (41, 0.051), (42, 0.042), (43, -0.079), (44, -0.143), (45, 0.087), (46, -0.105), (47, -0.042), (48, -0.071), (49, -0.099)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97166139 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

Author: Apoorv Agarwal

Abstract: In my thesis, I propose to build a system that would enable extraction of social interactions from texts. To date I have defined a comprehensive set of social events and built a preliminary system that extracts social events from news articles. I plan to improve the performance of my current system by incorporating semantic information. Using domain adaptation techniques, I propose to apply my system to a wide range of genres. By extracting linguistic constructs relevant to social interactions, I will be able to empirically analyze different kinds of linguistic constructs that people use to express social interactions. Lastly, I will attempt to make convolution kernels more scalable and interpretable.

2 0.7182219 133 acl-2011-Extracting Social Power Relationships from Natural Language

Author: Philip Bramsen ; Martha Escobar-Molano ; Ami Patel ; Rafael Alonso

Abstract: Sociolinguists have long argued that social context influences language use in all manner of ways, resulting in lects. This paper explores a text classification problem we will call lect modeling, an example of what has been termed computational sociolinguistics. In particular, we use machine learning techniques to identify social power relationships between members of a social network, based purely on the content of their interpersonal communication. We rely on statistical methods, as opposed to language-specific engineering, to extract features which represent vocabulary and grammar usage indicative of social power lect. We then apply support vector machines to model the social power lects representing superior-subordinate communication in the Enron email corpus. Our results validate the treatment of lect modeling as a text classification problem – albeit a hard one – and constitute a case for future research in computational sociolinguistics.

3 0.70611125 31 acl-2011-Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations

Author: Sara Rosenthal ; Kathleen McKeown

Abstract: We investigate whether wording, stylistic choices, and online behavior can be used to predict the age category of blog authors. Our hypothesis is that significant changes in writing style distinguish pre-social media bloggers from post-social media bloggers. Through experimentation with a range of years, we found that the birth dates of students in college at the time when social media such as AIM, SMS text messaging, MySpace and Facebook first became popular, enable accurate age prediction. We also show that internet writing characteristics are important features for age prediction, but that lexical content is also needed to produce significantly more accurate results. Our best results allow for 81.57% accuracy.

4 0.65709901 194 acl-2011-Language Use: What can it tell us?

Author: Marjorie Freedman ; Alex Baron ; Vasin Punyakanok ; Ralph Weischedel

Abstract: For 20 years, information extraction has focused on facts expressed in text. In contrast, this paper is a snapshot of research in progress on inferring properties and relationships among participants in dialogs, even though these properties/relationships need not be expressed as facts. For instance, can a machine detect that someone is attempting to persuade another to action or to change beliefs or is asserting their credibility? We report results on both English and Arabic discussion forums.

5 0.61246586 214 acl-2011-Lost in Translation: Authorship Attribution using Frame Semantics

Author: Steffen Hedegaard ; Jakob Grue Simonsen

Abstract: We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that frame-based classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or superior to the baseline classifiers on translated texts; (iv) that, contrary to current belief, naïve classifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.

6 0.49532002 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution

7 0.49009225 174 acl-2011-Insights from Network Structure for Text Mining

8 0.48599812 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style

9 0.48458537 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis

10 0.47960377 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

11 0.4715336 121 acl-2011-Event Discovery in Social Media Feeds

12 0.45059851 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

13 0.43059313 84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues

14 0.42889097 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity

15 0.42671731 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

16 0.42024013 122 acl-2011-Event Extraction as Dependency Parsing

17 0.3992148 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature

18 0.38912705 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus

19 0.38807696 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

20 0.38094717 35 acl-2011-An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.037), (17, 0.041), (20, 0.214), (26, 0.026), (31, 0.025), (37, 0.086), (39, 0.034), (41, 0.079), (55, 0.033), (59, 0.066), (72, 0.031), (91, 0.047), (96, 0.182)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84884238 153 acl-2011-How do you pronounce your name? Improving G2P with transliterations

Author: Aditya Bhargava ; Grzegorz Kondrak

Abstract: Grapheme-to-phoneme conversion (G2P) of names is an important and challenging problem. The correct pronunciation of a name is often reflected in its transliterations, which are expressed within a different phonological inventory. We investigate the problem of using transliterations to correct errors produced by state-of-the-art G2P systems. We present a novel re-ranking approach that incorporates a variety of score and n-gram features, in order to leverage transliterations from multiple languages. Our experiments demonstrate significant accuracy improvements when re-ranking is applied to n-best lists generated by three different G2P programs.

same-paper 2 0.84247315 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

Author: Apoorv Agarwal

Abstract: In my thesis, I propose to build a system that would enable extraction of social interactions from texts. To date I have defined a comprehensive set of social events and built a preliminary system that extracts social events from news articles. I plan to improve the performance of my current system by incorporating semantic information. Using domain adaptation techniques, I propose to apply my system to a wide range of genres. By extracting linguistic constructs relevant to social interactions, I will be able to empirically analyze different kinds of linguistic constructs that people use to express social interactions. Lastly, I will attempt to make convolution kernels more scalable and interpretable.

3 0.80172443 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction

Author: Bo Li ; Eric Gaussier ; Akiko Aizawa

Abstract: We study in this paper the problem of enhancing the comparability of bilingual corpora in order to improve the quality of bilingual lexicons extracted from comparable corpora. We introduce a clustering-based approach for enhancing corpus comparability which exploits the homogeneity feature of the corpus, and finally preserves most of the vocabulary of the original corpus. Our experiments illustrate the well-foundedness of this method and show that the bilingual lexicons obtained from the homogeneous corpus are of better quality than the lexicons obtained with previous approaches.

4 0.74087048 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

Author: Joel Lang ; Mirella Lapata

Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows to integrate linguistic knowledge transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.

5 0.739434 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

Author: Ivan Titov ; Alexandre Klementiev

Abstract: We propose a non-parametric Bayesian model for unsupervised semantic parsing. Following Poon and Domingos (2009), we consider a semantic parsing setting where the goal is to (1) decompose the syntactic dependency tree of a sentence into fragments, (2) assign each of these fragments to a cluster of semantically equivalent syntactic structures, and (3) predict predicate-argument relations between the fragments. We use hierarchical Pitman-Yor processes to model statistical dependencies between meaning representations of predicates and those of their arguments, as well as the clusters of their syntactic realizations. We develop a modification of the Metropolis-Hastings split-merge sampler, resulting in an efficient inference algorithm for the model. The method is experimentally evaluated by using the induced semantic representation for the question answering task in the biomedical domain.

6 0.73831695 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment

7 0.73499918 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

8 0.73393679 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization

9 0.73392642 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

10 0.72929645 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling

11 0.72901142 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

12 0.72832894 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters

13 0.72788125 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

14 0.72652692 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

15 0.72643685 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation

16 0.72600806 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates

17 0.72504973 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents

18 0.7249161 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques

19 0.72468066 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations

20 0.7246747 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment