acl acl2010 acl2010-112 knowledge-graph by maker-knowledge-mining

112 acl-2010-Extracting Social Networks from Literary Fiction

Source: pdf

Author: David Elson ; Nicholas Dames ; Kathleen McKeown

Abstract: We present a method for extracting social networks from literature, namely, nineteenth-century British novels and serials. We derive the networks from dialogue interactions, and thus our method depends on the ability to determine when two characters are in conversation. Our approach involves character name chunking, quoted speech attribution and conversation detection given the set of quotes. We extract features from the social networks and examine their correlation with one another, as well as with metadata such as the novel’s setting. Our results provide evidence that the majority of novels in this time period do not fit two characterizations provided by literary scholars. Instead, our results suggest an alternative explanation for differences in social networks.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 [Abstract] We present a method for extracting social networks from literature, namely, nineteenth-century British novels and serials. [sent-10, score-0.862]

2 We derive the networks from dialogue interactions, and thus our method depends on the ability to determine when two characters are in conversation. [sent-11, score-0.625]

3 Our approach involves character name chunking, quoted speech attribution and conversation detection given the set of quotes. [sent-12, score-0.604]

4 We extract features from the social networks and examine their correlation with one another, as well as with metadata such as the novel’s setting. [sent-13, score-0.618]

5 Our results provide evidence that the majority of novels in this time period do not fit two characterizations provided by literary scholars. [sent-14, score-0.347]

6 Instead, our results suggest an alternative explanation for differences in social networks. [sent-15, score-0.344]

7 Some theorists have suggested a relationship between the size of a community and the amount of dialogue that occurs, positing that “face to face time” diminishes as the number of characters in the novel grows. [sent-17, score-0.599]

8 Others suggest that as the social setting becomes more urbanized, the quality of dialogue also changes, with more interactions occurring in rural communities than urban communities. [sent-18, score-0.993]

9 Such claims have typically been made, however, on the basis of a few novels that are studied in depth. [sent-19, score-0.353]

10 In this paper, we aim to determine whether an automated study of a much larger sample of nineteenth century novels supports these claims. [sent-20, score-0.315]

11 The research presented here is concerned with the extraction of social networks from literature. [sent-21, score-0.522]

12 We present a method to automatically construct a network based on dialogue interactions between characters in a novel. [sent-22, score-0.689]

13 Our approach includes components for finding instances of quoted speech, attributing each quote to a character, and identifying when certain characters are in conversation. [sent-23, score-0.634]

14 We then construct a network where characters are vertices and edges signify an amount of bilateral conversation between those characters, with edge weights corresponding to the frequency and length of their exchanges. [sent-24, score-0.806]

15 In contrast to previous approaches to social network construction, ours relies on a novel combination of pattern-based detection, statistical methods, and adaptation of standard natural language tools for the literary genre. [sent-25, score-0.851]

16 We carried out this work on a corpus of 60 nineteenth-century novels and serials written by 31 authors, including Dickens, Austen and Conan Doyle. [sent-26, score-0.315]

17 In order to evaluate the literary claims in question, we compute various characteristics of the dialogue-based social network and stratify these results by categories such as the novel’s setting. [sent-27, score-0.804]

18 For example, the density of the network provides evidence about the cohesion of a large or small community, and cliques may indicate a social fragmentation. [sent-28, score-0.595]

19 Our results surprisingly provide evidence that the majority of novels in this time period do not fit the suggestions provided by literary scholars, and we suggest an alternative explanation for our observations of differences across novels. [sent-29, score-0.627]

20 In the following sections, we survey related work on social networks as well as computational studies of literature. [sent-30, score-0.522]

21 For example, Moretti (2005) has graphically mapped out texts according to geography, social connections and other variables. [sent-43, score-0.437]

22 While researchers have not attempted the automatic construction of social networks representing connections between characters in a corpus of novels, the ACE program has involved entity and relation extraction in unstructured text (Doddington et al. [sent-44, score-0.937]

23 Other recent work in social network construction has explored the use of structured data such as email headers (McCallum et al. [sent-46, score-0.511]

24 In this paper, we also explore how to build a network based on conversational interaction, but we analyze the reported dialogue found in novels to determine the links. [sent-51, score-0.737]

25 [3 Hypotheses] It is commonly held that the novel is a literary form which tries to produce an accurate representation of the social world. [sent-55, score-0.684]

26 Theories about the relation between novelistic form (the workings of plot, characters, and dialogue, to take the most basic categories) and changes to real-world social milieux abound. [sent-57, score-0.407]

27 Many of these theories center on nineteenth-century European fiction; innovations in novelistic form during this period, as well as the rapid social changes brought about by revolution, industrialization, and transport development, have traditionally been linked. [sent-58, score-0.455]

28 These theories, however, have used only a select few representative novels as proof. [sent-59, score-0.315]

29 We believe these methods are essential to testing the validity of some core theories about social interaction and their representation in literary genres like the novel. [sent-61, score-0.684]

30 Major versions of the theories about the social worlds of nineteenth-century fiction tend to center on characters, in two specific ways: how many characters novels tend to have, and how those characters interact with one another. [sent-62, score-1.499]

31 From the influential work of the Russian critic Mikhail Bakhtin to the present, a consensus emerged that as novels are increasingly set in urban areas, the number of characters and the quality of their interaction change to suit the setting. [sent-64, score-0.963]

32 In Bakhtin’s analysis, different spaces have different social and emotional potentialities, which in turn affect the most basic aspects of a novel’s aesthetic technique. [sent-66, score-0.344]

33 After Bakhtin’s invention of the chronotope, much literary criticism and theory devoted itself to filling in, or describing, the qualities of specific chronotopes, particularly those of the village or rural environment and the city or urban environment. [sent-67, score-0.788]

34 Raymond Williams used the term “knowable communities” to describe this world, in which face-to-face relations of a restricted set of characters are the primary mode of social interaction (Williams, 1975, 166). [sent-69, score-0.73]

35 To describe the social-psychological impact of the city, Franco Moretti argues, protagonists of urban novels “change overnight from ‘sons’ into ‘young men’: their affective ties are no longer vertical ones (between successive generations), but horizontal, within the same generation. [sent-71, score-0.577]

36 For him, the difference in number of characters is “not just a matter of quantity. [sent-75, score-0.349]

37 As the number of characters increases, Moretti argues (following Bakhtin in his logic), social interactions of different kinds and durations multiply, displacing the family-centered and conversational logic of village or rural fictions. [sent-79, score-1.25]

38 This argument about how novelistic setting produces different forms of social interaction is precisely what our method seeks to evaluate. [sent-81, score-0.444]

39 Here, social relations are largely financial or commercial in character. [sent-85, score-0.344]

40 We conversely define rural to describe texts that are set in a country or village zone, where agriculture is the primary activity, and where land-owning, non-productive, rent-collecting gentry are socially predominant. [sent-86, score-0.353]

41 Hypothesis #1: there is an inverse correlation between the amount of dialogue in a novel and the number of characters in that novel. [sent-92, score-0.583]

42 One basic, shared assumption of these theorists is that as the network of characters expands (as, in Moretti’s words, a quantitative change becomes qualitative), the importance, and in fact amount, of dialogue decreases. [sent-93, score-0.652]

43 This hypothesis is based on the contrast between Williams’s rural “knowable communities” and the sprawling, populous, less conversational urban fictions of Moretti’s and Eagleton’s analyses. [sent-97, score-0.671]

44 If true, it would suggest that the inverse relationship of hypothesis #1 (more characters means less conversation) can be correlated to, and perhaps even caused by, the geography of a novel’s setting. [sent-98, score-0.418]

45 The claims about novelistic geography and social interaction have usually been based on comparisons of a selected few novelists (Jane Austen and Charles Dickens preeminently). [sent-99, score-0.507]

46 [4 Extracting Conversational Networks from Literature] In order to test these hypotheses, we developed a novel approach to extracting social networks from literary texts themselves, building on existing analysis tools. [sent-101, score-0.945]

47 In a conversational network, vertices represent characters (assumed to be named entities) and edges indicate at least one instance of dialogue interaction between two characters over the course of the novel. [sent-103, score-1.106]

48 We define a conversation as a continuous span of narrative time featuring a set of characters in which the following conditions are met: [sent-105, score-0.511]

49 1. The characters are in the same place at the same time; [sent-106, score-0.349]

50 2. The characters are mutually aware of each other and each character’s speech is mutually intended for the other to hear. [sent-108, score-0.45]

51 We also pre-processed the texts to normalize formatting, detect headings and chapter breaks, remove metadata, and identify likely instances of quoted speech (that is, mark up spans of text that fall between quotation marks, assumed to be a superset of the quoted speech present in the text). [sent-128, score-0.58]

52 Moreover, this design decision emphasizes the precision of the social networks over their recall. [sent-145, score-0.522]

53 This tilts “in favor” of hypothesis #1 (that there are fewer social interactions in larger communities); however, we shall see that despite the emphasis of precision over recall, we identify a sufficient mass of interactions in the texts to constitute evidence against this hypothesis. [sent-146, score-0.596]

54 [3 Constructing social networks] We then applied the results from our character identification and quoted speech attribution methods toward the construction of conversational networks from literature. [sent-148, score-1.345]

55 We found that a network that included incidental or single-mention named entities became too noisy to function effectively, so we filtered out the entities that are mentioned fewer than three times in the novel or are responsible for less than 1% of the named entity mentions in the novel. [sent-153, score-0.41]
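The filtering rule above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes mentions have already been resolved to one canonical name per character, and the function and parameter names (`filter_characters`, `min_count`, `min_share`) are hypothetical.

```python
from collections import Counter


def filter_characters(mentions, min_count=3, min_share=0.01):
    """Return the entities that survive both frequency cut-offs.

    mentions: one canonical name per named-entity mention in the novel
    (assumed already resolved). An entity is dropped if it has fewer than
    `min_count` mentions OR accounts for less than `min_share` of all mentions.
    """
    counts = Counter(mentions)
    total = sum(counts.values())
    return {
        name
        for name, n in counts.items()
        if n >= min_count and n / total >= min_share
    }
```

Because both conditions must hold for an entity to be kept, a name with three mentions can still be dropped in a long novel where three mentions fall below 1% of all entity mentions.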

56 We assigned undirected edges between vertices that represent adjacency in quoted speech fragments. [sent-154, score-0.374]

57 Specifically, we set the weight of each undirected edge between two character vertices to the total length, in words, of all quotes that either character speaks from among all pairs of adjacent quotes in which they both speak, implying face-to-face conversation. [sent-155, score-0.788]

58 When such an adjacency is found, the length of the quote is added to the edge weight, under the hypothesis that the significance of the relationship between two individuals is proportional to the length of the dialogue that they exchange. [sent-157, score-0.329]
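A minimal sketch of the adjacency-based edge weighting described above. It assumes quotes arrive as hypothetical `(speaker, text)` tuples in reading order and that "adjacent" simply means consecutive in that list; the paper's own adjacency test may be stricter. Crediting each quote at most once per edge is one reading of "the total length ... of all quotes", not a confirmed detail.

```python
from collections import defaultdict


def build_edge_weights(quotes):
    """Undirected edge weights from adjacent attributed quotes.

    quotes: list of (speaker, text) pairs in reading order. Assumption:
    two quotes are adjacent when they are consecutive in this list.
    Each quote is credited at most once to a given edge; the weight is
    the total word count of the credited quotes.
    """
    credited = defaultdict(set)  # frozenset({a, b}) -> indices of quotes
    for i in range(len(quotes) - 1):
        a, b = quotes[i][0], quotes[i + 1][0]
        if a != b:  # a speaker following himself is not an exchange
            credited[frozenset((a, b))].update((i, i + 1))
    return {
        edge: sum(len(quotes[j][1].split()) for j in idxs)
        for edge, idxs in credited.items()
    }
```

Using a `frozenset` as the key makes the edge undirected: an A-then-B exchange and a B-then-A exchange accumulate on the same edge.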

59 The “correlation” method divides the text into 10-paragraph segments and counts the number of mentions of each character in each segment (excluding mentions inside quoted speech). [sent-163, score-0.494]
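The "correlation" baseline can be sketched roughly as below. The source only states that mentions are counted per 10-paragraph segment, so several details are assumptions: paragraphs are plain strings with quoted speech already stripped, a mention is an exact match on a one-word name, and the pairwise score is Pearson's r over the per-segment count vectors. The helper names are hypothetical.

```python
import math
import re


def segment_counts(paragraphs, characters, seg_len=10):
    """Per-segment mention counts for each character.

    paragraphs: narration strings with quoted speech already removed
    (assumption). A mention is an exact match on a one-word name
    (assumption); real texts would need alias handling.
    """
    vectors = {c: [] for c in characters}
    for start in range(0, len(paragraphs), seg_len):
        segment = " ".join(paragraphs[start:start + seg_len])
        tokens = re.findall(r"[A-Za-z]+", segment)
        for c in characters:
            vectors[c].append(tokens.count(c))
    return vectors


def pearson(x, y):
    """Pearson's r between two equal-length count vectors (0.0 if undefined)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)
```

A pair's baseline score is then `pearson(v[a], v[b])`, a continuous value that still has to be thresholded before it can be compared with annotated conversations.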

60 The “spoken mention” method counts occurrences when one character refers to another in his or her quoted speech. [sent-168, score-0.4]

61 The intuition is that characters who refer to one another are likely to be in conversation. [sent-170, score-0.349]
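A sketch of the "spoken mention" baseline under simple assumptions: quotes are hypothetical `(speaker, text)` tuples, and a mention is a whole-word match on a character's name. Real novels would also need aliases and nicknames, which this sketch ignores.

```python
import re
from collections import Counter


def spoken_mentions(quotes, characters):
    """Count how often a speaker names another character in quoted speech.

    quotes: list of (speaker, text) pairs (hypothetical layout).
    Returns a Counter keyed by frozenset({speaker, named_character}).
    """
    counts = Counter()
    for speaker, text in quotes:
        for other in characters:
            if other == speaker:
                continue  # self-reference is not evidence of conversation
            pattern = r"\b" + re.escape(other) + r"\b"
            hits = len(re.findall(pattern, text))
            if hits:
                counts[frozenset((speaker, other))] += hits
    return counts
```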

62 [4 Evaluation] To check the accuracy of our method for extracting conversational networks, we conducted an evaluation involving four of the novels (The Sign of the Four, Emma, David Copperfield and The Portrait of a Lady). [sent-173, score-0.497]

63 We processed the annotation results by breaking down each multi-way conversation into all of its unique two-character interactions (for example, a conversation between four people indicates six bilateral interactions). [sent-178, score-0.354]
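Breaking a conversation among n participants into bilateral interactions yields the number of unordered pairs, C(n, 2) = n(n-1)/2; for four people that is 4·3/2 = 6, matching the example above. A short sketch (the function name is hypothetical):

```python
from itertools import combinations


def bilateral_interactions(conversations):
    """Expand annotated multi-way conversations into unique two-character pairs.

    conversations: iterable of participant collections, one per conversation.
    Returns a set of alphabetically ordered (a, b) tuples.
    """
    pairs = set()
    for participants in conversations:
        for a, b in combinations(sorted(set(participants)), 2):
            pairs.add((a, b))
    return pairs
```

Sorting the participants first means the same two characters always produce the same tuple, so a pair that talks in several conversations is counted once.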

64 To calculate inter-annotator agreement, we first compiled a list of all possible interactions between all characters in each text. [sent-179, score-0.424]

65 […] three methods for detecting bilateral conversations in literary texts. [sent-184, score-0.436]

66 95; this indicates that we can be confident in the specificity of the conversational networks that we automatically construct. [sent-190, score-0.335]

67 There were several reasons that we did not detect the missing links, including indirect speech, quotes attributed to anaphoras or coreferents, and “diffuse” conversations in which the characters do not speak in turn with one another. [sent-193, score-0.573]

68 To calculate precision and recall for the two baseline social networks, we set a threshold t to derive a binary prediction from the continuous edge weights. [sent-194, score-0.395]
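Turning a continuous edge weight into a binary prediction and scoring it against the annotated pairs might look like this. The strict `>` comparison, the data layout and the function name are assumptions; the source does not say how `t` was chosen.

```python
def precision_recall_f(weights, gold_pairs, t):
    """Score a weighted baseline network against annotated conversation pairs.

    weights: dict mapping frozenset({a, b}) to a continuous edge weight.
    gold_pairs: set of frozenset({a, b}) pairs the annotators marked as conversing.
    An edge counts as predicted when its weight exceeds t (assumed strict).
    """
    predicted = {pair for pair, w in weights.items() if w > t}
    tp = len(predicted & gold_pairs)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```

Gold pairs that never receive a weight still count against recall, because recall is divided by the full size of `gold_pairs`.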

69 Both baselines performed significantly worse in precision and F-measure than our quoted speech adjacency method for detecting conversations. [sent-196, score-0.336]

70 [1 Feature extraction] We extracted features from the conversational networks that emphasize the complexity of the social interactions found in each novel: [sent-198, score-0.754]

71 1. The number of characters and the number of speaking characters [sent-199, score-0.725]

72 2. The variance of the distribution of quoted speech (specifically, the proportion of quotes spoken by the n most frequent speakers, for 1 ≤ n ≤ 5) [sent-200, score-0.389]

73 3. The number of quotes, and the proportion of words in the novel that are quoted speech [sent-201, score-0.346]

74 4. The number of 3-cliques and 4-cliques in the social network [sent-202, score-0.511]

75 5. […] In other words, this determines the average number of characters connected to each character in the conversational network (“with how many people on average does a character converse?”). [sent-204, score-1.074]
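Several of the listed features can be computed directly from an undirected, unweighted conversational network. The brute-force sketch below computes average degree, density and counts of 3- and 4-cliques. Density here means the average share of the other characters that each character talks to. The function name and input format (a node list plus a list of character pairs, with no self-loops) are assumptions. The approach is only practical for the few dozen characters that survive filtering.

```python
from itertools import combinations


def network_features(nodes, edges):
    """Summary statistics for an undirected, unweighted conversational network.

    nodes: iterable of character names.
    edges: iterable of (a, b) pairs; duplicates are ignored and self-loops
    are assumed absent.
    """
    nodes = list(nodes)
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    n = len(nodes)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    avg_degree = 2 * m / n if n else 0.0
    # Density: average share of the *other* characters each character talks to.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def is_clique(group):
        return all(b in adj[a] for a, b in combinations(group, 2))

    # Brute-force enumeration: O(n^4) for 4-cliques, acceptable for small casts.
    cliques3 = sum(1 for g in combinations(nodes, 3) if is_clique(g))
    cliques4 = sum(1 for g in combinations(nodes, 4) if is_clique(g))
    return {"avg_degree": avg_degree, "density": density,
            "cliques3": cliques3, "cliques4": cliques4}
```

For larger graphs, a dedicated graph library's clique enumeration would replace the brute-force loops.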

76 Hypothesis #1, which we described in Section 3, claims that there is an inverse correlation between the amount of dialogue in a nineteenth-century novel and the number of characters in that novel. [sent-217, score-0.621]

77 16) between the number of quotes in a novel and the number of characters (normalizing the quote count for text length). [sent-220, score-0.633]

78 50) between the number of unique speakers (those characters who speak at least once) and the normalized number of quotes, suggesting that larger networks have more conversations than smaller ones. [sent-222, score-0.663]

79 Another way to interpret hypothesis #1 is that social networks with more characters tend to break apart and be less connected. [sent-224, score-0.915]

80 The correlation between the number of characters in each graph and the average degree (number of conversation partners) for each character was a positive, moderately strong r=. [sent-226, score-0.75]

81 This is not a given; a network can easily, for example, break into minimally connected or mutually exclusive subnetworks when more characters are involved. [sent-228, score-0.572]

82 Instead, we found that networks tend to stay close-knit regardless of their size: even the density of the graph (the percentage of the community that each character talks to) grows with the total population size at r=. [sent-229, score-0.446]

83 A higher number of characters (speaking or non-speaking) is also correlated with a higher rate of 3-cliques per character (r=. [sent-233, score-0.535]

84 Hypothesis #2, meanwhile, posited that a novel’s setting (urban or rural) would have an effect on the structure of its social network. [sent-237, score-0.344]

85 Surprisingly, the numbers of characters and speakers found in the urban novels were not significantly greater than those found in the rural novels. [sent-239, score-0.897]

86 The increase in degree seen in urban texts is not significant. [sent-242, score-0.341]

87 Figure 3: Conversational networks for first-person novels like Collins’s The Woman in White are less connected due to the structure imposed by the perspective. [sent-245, score-0.522]

88 Stories told in the third person had much more connected networks than stories told in the first person: not only did the average degree increase with statistical significance (by the homoscedastic t-test to p < . [sent-247, score-0.366]
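The homoscedastic (equal-variance) two-sample t-test mentioned above pools the variance of both groups, for example per-novel average degrees for third-person and first-person novels. The sketch computes only the t statistic and its degrees of freedom. The function name is hypothetical, and any input lists are placeholders, not the paper's data.

```python
import math


def pooled_t(x, y):
    """Two-sample Student's t statistic under the equal-variance assumption.

    Returns (t, degrees_of_freedom). Assumes both groups are non-empty,
    len(x) + len(y) > 2, and the values are not all identical.
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    df = nx + ny - 2
    pooled_var = ss / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mx - my) / se, df
```

For a p-value, SciPy's `scipy.stats.ttest_ind(x, y, equal_var=True)` performs the same pooled-variance test.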

89 Figure 3 shows the conversational network extracted for Collins’s The Woman in White, which is told in the first person. [sent-252, score-0.363]

90 Private conversations between auxiliary characters would not include the narrator, and thus do not appear in a first-hand account. [sent-256, score-0.445]

91 An “omniscient” third person narrator, by contrast, can eavesdrop on any pair of characters conversing. [sent-257, score-0.349]

92 One of the basic assumptions behind hypothesis #2 (that urban novels contain more characters, mirroring the masses of nineteenth-century cities) is not borne out by our data. [sent-261, score-0.594]

93 Our results do, however, strongly correlate a point of view (third-person narration) with more frequently connected characters, implying tighter and more talkative social networks. [sent-262, score-0.373]

94 We would propose that this suggests that the form of a given novel (the standpoint of the narrative voice, whether the voice is “omniscient” or not) is far more determinative of the kind of social network described in the novel than where it is set or even the number of characters involved. [sent-263, score-1.047]

95 We are suggesting that the important element of social networks in nineteenth-century fiction is not where the networks are set, but from what standpoint they are imagined or narrated. [sent-268, score-0.819]

96 [7 Conclusion] In this paper, we presented a method for characterizing a text of literary fiction by extracting the network of social conversations that occur between its characters. [sent-270, score-0.981]

97 In particular, we described a high-precision method for detecting face-to-face conversations between two named characters in a novel, and showed that as the number of characters in a novel grows, so too do the cohesion, interconnectedness and balance of their social network. [sent-272, score-1.332]

98 Our results thus far suggest further review of our methods, our corpus and our results for more insights into the social networks found in this and other genres of fiction. [sent-274, score-0.522]

99 Automated discovery and analysis of social networks from threaded discussions. [sent-332, score-0.522]

100 Topic and role discovery in social networks with experiments on enron and academic email. [sent-346, score-0.522]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('characters', 0.349), ('social', 0.344), ('novels', 0.315), ('literary', 0.255), ('urban', 0.235), ('quoted', 0.214), ('rural', 0.188), ('character', 0.186), ('networks', 0.178), ('network', 0.167), ('conversational', 0.157), ('moretti', 0.157), ('bakhtin', 0.141), ('quotes', 0.128), ('conversation', 0.116), ('dialogue', 0.098), ('conversations', 0.096), ('fiction', 0.094), ('novel', 0.085), ('village', 0.082), ('elson', 0.078), ('interactions', 0.075), ('quote', 0.071), ('austen', 0.063), ('chronotope', 0.063), ('narrator', 0.063), ('novelistic', 0.063), ('texts', 0.058), ('density', 0.057), ('coreferents', 0.055), ('argues', 0.055), ('communities', 0.053), ('correlation', 0.051), ('edge', 0.051), ('vertices', 0.051), ('speaker', 0.049), ('theories', 0.048), ('degree', 0.048), ('speech', 0.047), ('mentions', 0.047), ('bilateral', 0.047), ('eagleton', 0.047), ('fictions', 0.047), ('sherlock', 0.047), ('narrative', 0.046), ('metadata', 0.045), ('williams', 0.045), ('holmes', 0.045), ('hypothesis', 0.044), ('franco', 0.041), ('attribution', 0.041), ('speakers', 0.04), ('named', 0.04), ('told', 0.039), ('detecting', 0.038), ('claims', 0.038), ('jane', 0.038), ('theorists', 0.038), ('interaction', 0.037), ('adjacency', 0.037), ('connections', 0.035), ('stories', 0.033), ('period', 0.032), ('authorial', 0.031), ('darcy', 0.031), ('determinative', 0.031), ('dickens', 0.031), ('gruzd', 0.031), ('halpin', 0.031), ('interconnectedness', 0.031), ('knowable', 0.031), ('mostellar', 0.031), ('narration', 0.031), ('omniscient', 0.031), ('zone', 0.031), ('entity', 0.031), ('connected', 0.029), ('british', 0.029), ('face', 0.029), ('columbia', 0.028), ('city', 0.028), ('proportional', 0.028), ('mansfield', 0.027), ('verso', 0.027), ('populous', 0.027), ('cliques', 0.027), ('cho', 0.027), ('critic', 0.027), ('protagonists', 0.027), ('speaking', 0.027), ('mutually', 0.027), ('vertex', 0.026), ('population', 0.025), ('extracting', 0.025), ('surprisingly', 0.025), ('edges', 0.025), 
('ancient', 0.025), ('socially', 0.025), ('standpoint', 0.025), ('geography', 0.025)]
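The page does not document how its "tfidf model" similarity values are computed. A common way to compare tf-idf vectors is cosine similarity, sketched here over sparse `{word: weight}` dictionaries like the list above. Treat it as an illustration of the general technique, not as this site's actual scoring; the function name is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)
```

Only terms shared by both papers contribute to the dot product, so two papers with disjoint top words score 0.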

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 112 acl-2010-Extracting Social Networks from Literary Fiction

Author: David Elson ; Nicholas Dames ; Kathleen McKeown

2 0.11420868 178 acl-2010-Non-Cooperation in Dialogue

Author: Brian Pluss

Abstract: This paper presents ongoing research on computational models for non-cooperative dialogue. We start by analysing different levels of cooperation in conversation. Then, inspired by findings from an empirical study, we propose a technique for measuring non-cooperation in political interviews. Finally, we describe a research programme towards obtaining a suitable model and discuss previous accounts for conflictive dialogue, identifying the differences with our work.

3 0.098503381 204 acl-2010-Recommendation in Internet Forums and Blogs

Author: Jia Wang ; Qing Li ; Yuanzhu Peter Chen ; Zhangxi Lin

Abstract: The variety of engaging interactions among users in social media distinguishes it from traditional Web media. Such a feature should be utilized while attempting to provide intelligent services to social media participants. In this article, we present a framework to recommend relevant information in Internet forums and blogs using user comments, one of the most representative of user behaviors in online discussion. When incorporating user comments, we consider structural, semantic, and authority information carried by them. One of the most important observations from this work is that semantic contents of user comments can play a fairly different role in a different form of social media. When designing a recommendation system for this purpose, such a difference must be considered with caution.

4 0.084826864 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities

Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun

Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven question-answering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.

5 0.083806962 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management

Author: Pierre Lison

Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions of the state and action spaces.

6 0.082859822 29 acl-2010-An Exact A* Method for Deciphering Letter-Substitution Ciphers

7 0.078573048 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

8 0.077606566 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation

9 0.071927428 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images

10 0.071524329 224 acl-2010-Talking NPCs in a Virtual Game World

11 0.065662928 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

12 0.06319692 13 acl-2010-A Rational Model of Eye Movement Control in Reading

13 0.061214823 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data

14 0.058181599 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

15 0.054346897 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars

16 0.053082481 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems

17 0.052846666 16 acl-2010-A Statistical Model for Lost Language Decipherment

18 0.04909724 142 acl-2010-Importance-Driven Turn-Bidding for Spoken Dialogue Systems

19 0.045624278 196 acl-2010-Plot Induction and Evolutionary Search for Story Generation

20 0.04501345 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.134), (1, 0.07), (2, -0.05), (3, -0.093), (4, -0.006), (5, -0.074), (6, -0.059), (7, 0.033), (8, 0.021), (9, 0.016), (10, -0.005), (11, 0.005), (12, 0.011), (13, -0.05), (14, -0.021), (15, 0.043), (16, 0.015), (17, 0.059), (18, 0.028), (19, 0.008), (20, -0.064), (21, -0.032), (22, 0.04), (23, -0.02), (24, -0.043), (25, -0.012), (26, -0.071), (27, -0.019), (28, 0.077), (29, -0.091), (30, -0.076), (31, 0.168), (32, 0.112), (33, 0.131), (34, -0.222), (35, -0.057), (36, -0.008), (37, -0.047), (38, -0.177), (39, -0.066), (40, 0.021), (41, 0.012), (42, -0.038), (43, -0.065), (44, 0.048), (45, -0.045), (46, -0.04), (47, 0.005), (48, 0.117), (49, 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96537894 112 acl-2010-Extracting Social Networks from Literary Fiction

Author: David Elson ; Nicholas Dames ; Kathleen McKeown

2 0.64633286 178 acl-2010-Non-Cooperation in Dialogue

Author: Brian Pluss

3 0.60430932 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation

Author: Kornel Laskowski

Abstract: Substantial research effort has been invested in recent decades into the computational study and automatic processing of multi-party conversation. While most aspects of conversational speech have benefited from a wide availability of analytic, computationally tractable techniques, only qualitative assessments are available for characterizing multi-party turn-taking. The current paper attempts to address this deficiency by first proposing a framework for computing turn-taking model perplexity, and then by evaluating several multi-participant modeling approaches. Experiments show that direct multi-participant models do not generalize to held out data, and likely never will, for practical reasons. In contrast, the Extended-Degree-of-Overlap model represents a suitable candidate for future work in this area, and is shown to successfully predict the distribution of speech in time and across participants in previously unseen conversations.

4 0.54028898 204 acl-2010-Recommendation in Internet Forums and Blogs

Author: Jia Wang ; Qing Li ; Yuanzhu Peter Chen ; Zhangxi Lin

5 0.5172084 29 acl-2010-An Exact A* Method for Deciphering Letter-Substitution Ciphers

Author: Eric Corlett ; Gerald Penn

Abstract: Letter-substitution ciphers encode a document from a known or hypothesized language into an unknown writing system or an unknown encoding of a known writing system. It is a problem that can occur in a number of practical applications, such as in the problem of determining the encodings of electronic documents in which the language is known, but the encoding standard is not. It has also been used in relation to OCR applications. In this paper, we introduce an exact method for deciphering messages using a generalization of the Viterbi algorithm. We test this model on a set of ciphers developed from various web sites, and find that our algorithm has the potential to be a viable, practical method for efficiently solving decipherment problems.

6 0.46436718 179 acl-2010-Now, Where Was I? Resumption Strategies for an In-Vehicle Dialogue System

7 0.44929349 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs

8 0.44361067 58 acl-2010-Classification of Feedback Expressions in Multimodal Data

9 0.42329592 224 acl-2010-Talking NPCs in a Virtual Game World

10 0.41119742 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management

11 0.40434062 250 acl-2010-Untangling the Cross-Lingual Link Structure of Wikipedia

12 0.39772958 81 acl-2010-Decision Detection Using Hierarchical Graphical Models

13 0.39240062 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images

14 0.37841341 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms

15 0.37197378 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation

16 0.36804658 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

17 0.35627538 139 acl-2010-Identifying Generic Noun Phrases

18 0.34916174 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars

19 0.34814 61 acl-2010-Combining Data and Mathematical Models of Language Change

20 0.34631565 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.011), (14, 0.039), (25, 0.049), (39, 0.019), (42, 0.053), (44, 0.02), (59, 0.064), (72, 0.016), (73, 0.058), (78, 0.024), (80, 0.029), (83, 0.128), (84, 0.03), (88, 0.259), (97, 0.015), (98, 0.075)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79662293 112 acl-2010-Extracting Social Networks from Literary Fiction

Author: David Elson ; Nicholas Dames ; Kathleen McKeown

2 0.71915972 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

Author: Taniya Mishra ; Srinivas Bangalore

Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.

3 0.69117177 233 acl-2010-The Same-Head Heuristic for Coreference

Author: Micha Elsner ; Eugene Charniak

Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of noncoreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.

4 0.59121555 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

Author: Vincent Ng

Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.

5 0.58709693 73 acl-2010-Coreference Resolution with Reconcile

Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom

Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim to facilitate consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.

6 0.56949693 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."

7 0.56620991 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

8 0.56103331 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

9 0.55534875 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

10 0.5543834 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

11 0.55129802 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

12 0.54903948 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

13 0.54022419 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features

14 0.53968453 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information

15 0.53916389 214 acl-2010-Sparsity in Dependency Grammar Induction

16 0.53884244 81 acl-2010-Decision Detection Using Hierarchical Graphical Models

17 0.5387035 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

18 0.53813529 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

19 0.53656983 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

20 0.5351029 60 acl-2010-Collocation Extraction beyond the Independence Assumption