
156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

Source: pdf

Author: Jui-Yu Weng ; Cheng-Lun Yang ; Bo-Nian Chen ; Yen-Kai Wang ; Shou-De Lin

Abstract: This paper presents a system to summarize a Microblog post and its responses with the goal to provide readers a more constructive and concise set of information for efficient digestion. We introduce a novel two-phase summarization scheme. In the first phase, the post plus its responses are classified into four categories based on the intention: interrogation, sharing, discussion, and chat. For each type of post, in the second phase, we exploit different strategies, including opinion analysis, response pair identification, and response relevancy detection, to summarize and highlight critical information to display. This system provides an alternative way of thinking about machine summarization: by utilizing AI approaches, computers are capable of constructing deeper and more user-friendly abstractions.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This paper presents a system to summarize a Microblog post and its responses with the goal to provide readers a more constructive and concise set of information for efficient digestion. [sent-4, score-0.665]

2 In the first phase, the post plus its responses are classified into four categories based on the intention: interrogation, sharing, discussion, and chat. [sent-6, score-0.633]

3 For each type of post, in the second phase, we exploit different strategies, including opinion analysis, response pair identification, and response relevancy detection, to summarize and highlight critical information to display. [sent-7, score-0.924]

4 Take summarization for example: a Microblog user usually has to browse through tens or even hundreds of posts together with their responses daily, so an intelligent tool that assists in summarizing this information can be beneficial. [sent-10, score-1.187]

5 Automatic text summarization (ATS) has been investigated for over fifty years, but the majority of the existing techniques might not be appropriate for Microblog write-ups. [sent-11, score-0.194]

6 For instance, a popular kind of approach to summarization tries to identify a subset of information, usually in sentence form, from longer pieces of writing as the summary (Das and Martins, 2007). [sent-12, score-0.313]

7 Below we first describe some special characteristics that distinguish the Microblog summarization task from general text summarization. [sent-14, score-0.234]

8 Unlike normal blogs, there is a strict limitation on the number of characters for each post (e. [sent-17, score-0.161]

9 At least three different types of posts are observed in Microblogs, expressing feeling, sharing information, and asking questions. [sent-25, score-0.594]

10 Consequently, using one mold to fit all types of Microblog posts is not sufficient. [sent-29, score-0.457]

11 Different summarization schemes for posts with different purposes are preferred. [sent-30, score-0.651]

12 Posts and responses in Microblogs are more similar to a multi-person dialogue corpus. [sent-32, score-0.472]

13 Sometimes, the topic of discussion at the end of the thread is totally unrelated to that of the post. [sent-35, score-0.137]

14 This paper introduces a framework that summarizes a post with its responses. [sent-38, score-0.184]

15 Motivated by the abovementioned characteristics of Microblogs, we plan to use a two-phase summarization scheme to develop different summarization strategies for different types of posts (see Figure 1). [sent-39, score-0.917]

16 In the first phase, a post will be automatically classified into several categories including interrogation, discussion, sharing and chat based on the intention of the users. [sent-40, score-0.479]

17 In the second phase, the system chooses different summarization components for different types of posts. [sent-41, score-0.217]

18 Tactically, we argue that it is possible to integrate post-intention classification, opinion analysis, response relevancy and response-pair mining to create an intelligent summarization framework for Microblog posts and responses. [sent-51, score-1.23]

19 We also found that the content features are not as useful as the temporal or positional features for text mining in Microblog. [sent-52, score-0.153]

20 It is possible to go beyond the literal meaning of summarization to exploit advanced text mining methods to improve the quality and usability of a summarization system. [sent-55, score-0.436]

21 2 Summarization Framework and Experiments Below we discuss our two-phase summarization framework and the experiment results on each individual component. [sent-56, score-0.217]

22 Our observation is that Microblog posts can have different purposes. [sent-58, score-0.457]

23 The Interrogation posts are questions asked in public with the hope to obtain some useful answers from friends or other users. [sent-60, score-0.484]

24 The responses might serve the purpose for clarification or, even worse, have nothing to do with the question. [sent-62, score-0.439]

25 Hence we believe the most appropriate summarization process for this kind of post is to find out which replies really respond to the question. [sent-63, score-0.779]

26 We created a response relevance detection component to serve as its summarization mechanism. [sent-64, score-0.609]

27 The Sharing posts are very frequently observed in Microblog as Microbloggers like to share interesting websites, pictures, and videos with their friends. [sent-65, score-0.457]

28 We introduce the opinion analysis component that provides the analysis on whether the information shared is recommended by the respondents. [sent-68, score-0.128]

29 We also observe that some posts contain characteristics of both Interrogation and Sharing. [sent-69, score-0.497]

30 We create a category named Discussion for these posts, and apply both response ranking and opinion analysis engines on this type of posts. [sent-71, score-0.401]

31 Finally, there are posts which simply act as the solicitation for further chat. [sent-72, score-0.457]

32 This kind of posts can sometimes involve multiple persons and the topic may gradually drift to a different one. [sent-75, score-0.567]

33 We believe the plausible summarization strategy is to group different messages based on their topics. [sent-76, score-0.301]

34 Therefore, for Chat posts, we designed a response pair identification system to accomplish this goal. [sent-77, score-0.396]

35 We group the related responses together for display, and the number of groups represents the number of different topics in this thread. [sent-78, score-0.488]

36 Figure 1 shows the flow of our summarization … framework. [sent-79, score-0.194]

37 When an input post with responses comes in, the system first determines its intention, based on which the system adopts proper strategies for summarization. [sent-80, score-0.645]

38 1 Post Intention Classification This stage aims to classify each post into four categories, Interrogation, Sharing, Discussion, and Chat. [sent-83, score-0.161]

39 One tricky issue is that the Discussion label is essentially a combination of interrogation and sharing labels. [sent-84, score-0.371]

40 The system first checks whether the post contains URLs or pointers to files, then uses a binary classifier to determine whether the post is interrogative. [sent-89, score-0.681]
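
The URL check followed by the binary interrogation classifier can be read as a four-way decision. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the mapping of the two flags to labels is inferred from the remark that Discussion combines Interrogation and Sharing, `URL_PATTERN` is a hypothetical detector that ignores "pointers to files", and `looks_interrogative` is a toy stand-in for the language-model classifier.

```python
import re

# Hypothetical URL detector (assumption); the source also mentions
# "pointers to files", which this sketch does not cover.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def looks_interrogative(text: str) -> bool:
    """Toy stand-in for the binary interrogation classifier (assumption)."""
    return "?" in text


def classify_intention(post: str) -> str:
    """Combine the two binary checks into one of four intention labels."""
    has_link = bool(URL_PATTERN.search(post))
    is_question = looks_interrogative(post)
    if has_link and is_question:
        return "discussion"  # assumed: link plus question
    if has_link:
        return "sharing"
    if is_question:
        return "interrogation"
    return "chat"
```

In a real system the question-mark test would be replaced by a trained classifier; only the branching structure is the point here.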

41 For the experiment, we manually annotate 6000 posts consisting of 1840 interrogation, 2002 sharing, 1905 chat, and 254 discussion posts. [sent-90, score-0.499]

42 We train a 6-gram language model as the binary interrogation classifier. [sent-91, score-0.234]

43 Then we integrate the classifier into our system and test on 6000 posts to obtain a testing accuracy of 82. [sent-92, score-0.52]

44 The system classifies responses into 3 categories, positive, negative, and neutral. [sent-97, score-0.429]

45 First of all, we train a binary classifier to determine if a post or a reply is opinionative. [sent-99, score-0.242]

46 If the answer is yes, we then use another binary classifier to decide if the opinion is positive or negative. [sent-101, score-0.182]
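
The two-stage opinion decision can be sketched as a simple cascade. This is a minimal sketch under assumptions: treating non-opinionated text as "neutral" is inferred from the three output categories, and the keyword-based `toy_is_opinionated` and `toy_is_positive` functions are hypothetical placeholders for the trained classifiers.

```python
from typing import Callable


def classify_opinion(
    text: str,
    is_opinionated: Callable[[str], bool],
    is_positive: Callable[[str], bool],
) -> str:
    """Two-stage cascade: subjectivity check first, then polarity."""
    if not is_opinionated(text):
        return "neutral"  # assumed mapping for non-opinionated text
    return "positive" if is_positive(text) else "negative"


# Toy keyword lists, used only to make the sketch runnable (assumption).
POSITIVE_WORDS = {"great", "love", "nice"}
NEGATIVE_WORDS = {"bad", "hate", "boring"}


def toy_is_opinionated(text: str) -> bool:
    words = set(text.lower().split())
    return bool(words & (POSITIVE_WORDS | NEGATIVE_WORDS))


def toy_is_positive(text: str) -> bool:
    words = set(text.lower().split())
    return len(words & POSITIVE_WORDS) >= len(words & NEGATIVE_WORDS)
```

Passing the classifiers in as callables keeps the cascade independent of how each stage is trained.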

47 For polarity test, we exploit the built-in emoticons in Plurk to automatically extract posts with positive and negative opinions. [sent-106, score-0.586]

48 We collect 10,000 positive and 10,000 negative posts as training data to train a language model of Naïve Bayes classifier, and evaluate on manually annotated data of 3121 posts, with 1624 positive and 1497 negative to obtain accuracy of 0. [sent-107, score-0.605]

49 3 Response Pair Identification Conversation in micro-blogs tends to diverge into multiple topics as the number of responses grows. [sent-110, score-0.482]

50 Sometimes such divergence may result in responses that are irrelevant to the original post, thus creating problems for summarization. [sent-111, score-0.406]

51 Furthermore, because the messages are usually short, it is difficult to identify the main topics of these dialoguelike responses using only keywords in the content for summarization. [sent-112, score-0.609]

52 A Response Pair is a pair of responses in which the latter specifically responds to the former. [sent-114, score-0.427]

53 Based on those pairs we can then form clusters of messages to indicate different groups of topics and mes- [sent-115, score-0.597]

54-56 Table 1 [sent-116, score-0.428] [sent-117, score-0.559] [sent-118, score-0.452]

Feature | Description | Weight
Backward referencing | Latter response content contains former responder's display name | 0.055
Forward referencing of user name | Former response contains latter response's author's user name | 0.018
Response position difference | Number of responses in between responses | 0.13
Content similarity | Contents' cosine similarity | 0.

57 Looking at the content of micro-blogs, we observe that related responses are usually adjacent to each other, as users tend to closely follow whether their messages are responded to and reply quickly to the responses from others. [sent-123, score-1.058]

58 Therefore, besides content features, we decide to add the temporal and ordering features (see Table 1) to train a classifier that takes a pair of messages as input and returns whether they are related. [sent-124, score-0.231]

59 By identifying the response pairs, our summarization system is able to group the responses into different topic clusters and display the clusters separately. [sent-125, score-1.114]
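
One way to turn identified response pairs into topic clusters is to treat each pair as an edge and take connected components. The sketch below uses union-find for this; it is an assumption about how the grouping could be done, since the source does not name a clustering algorithm, and `cluster_responses` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_responses(
    num_responses: int, related_pairs: Iterable[Tuple[int, int]]
) -> List[List[int]]:
    """Group response indices into connected components of related pairs."""
    parent = list(range(num_responses))

    def find(i: int) -> int:
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for former, latter in related_pairs:
        root_a, root_b = find(former), find(latter)
        if root_a != root_b:
            # The smaller index becomes the root, so each cluster is
            # keyed by its earliest response.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters: Dict[int, List[int]] = {}
    for i in range(num_responses):
        clusters.setdefault(find(i), []).append(i)
    return [clusters[root] for root in sorted(clusters)]
```

With this sketch, the number of returned lists corresponds to the number of topic groups shown to the reader.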

60 For experiment, the model is trained using LIBSVM (Chang and Lin, 2001) (RBF kernel) with 6000 response pairs, half of the training set positive and half negative. [sent-127, score-0.434]

61 Responses with @user_name string in the content are matched with earlier responses by the author, user_name. [sent-129, score-0.47]

62 Based on the learned weights of the features, we observe that content feature is not very useful in determining the response pairs. [sent-130, score-0.412]

63 We also have noticed that there is high correlation between the responses relatedness and the number of other responses between them. [sent-132, score-0.812]

64 For example, users are less likely to respond to a response if there have been many replies about this response already. [sent-133, score-0.801]

65 Statistical analysis on positive training data shows that the average number of responses between related responses is 2. [sent-134, score-0.855]

66 We train the classifier using 6000 automatically extracted pairs of both positive and negative instances. [sent-136, score-0.164]

67 The baseline model which uses only content similarity feature reaches only 45% in accuracy. [sent-140, score-0.117]

68 4 Response Relevance Detection For interrogative posts, we think the best summary is to find out the relevant responses as potential answers. [sent-142, score-0.55]

69 We introduce a response relevancy detection component for the problem. [sent-143, score-0.485]

70 Temporal and Positional Features A common assertion is that the earlier responses have higher probability to be the answers of the question. [sent-151, score-0.406]

71 Based on the learned weights, it is not surprising that the most important feature is the position of the response in the response hierarchy. [sent-152, score-0.65]

72 Content Features We use the length of the message, the cosine similarity of the post and the responses, and the occurrence of the interrogative words in response sentences as content features. [sent-155, score-0.693]
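
The three content features can be illustrated with a small helper. This is a sketch under assumptions: tokenization by whitespace, length measured in characters, and the English `INTERROGATIVE_WORDS` list are all hypothetical choices, since the source does not specify them.

```python
import math
from collections import Counter

# Assumed word list for illustration; the source does not give one.
INTERROGATIVE_WORDS = {"what", "why", "how", "who", "where", "when"}


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between whitespace-tokenized term-count vectors."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[term] * counts_b[term] for term in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def content_features(post: str, response: str) -> dict:
    """Build the three content features for one (post, response) pair."""
    tokens = [t.strip("?.,!") for t in response.lower().split()]
    return {
        "length": len(response),  # assumed: length in characters
        "similarity": cosine_similarity(post, response),
        "has_interrogative": any(t in INTERROGATIVE_WORDS for t in tokens),
    }
```

These values would be combined with the temporal and positional features before training the relevance classifier.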

73 Because the interrogation posts in Plurk are relatively few, we manually find a total of 382 positive and 403 negative pairs for training and use 10-fold cross validation for evaluation. [sent-156, score-0.84]

74 The baseline is to always select the first response as the only relevant answer. [sent-158, score-0.325]

75 3 System Demonstration In this section, we show some snapshots of our summarization system with real examples using Plurk dataset. [sent-162, score-0.217]

76 Given a query term, our system first returns several posts containing the query string under the search bar. [sent-166, score-0.48]

77 When one of the posts is selected, it will generate a summary according to the detected intention and show it in a pop-up frame. [sent-167, score-0.592]

78 For interrogative posts, we perform the response relevancy detection. [sent-172, score-0.525]

79 Figure 4 is an example of summary of an interrogative post. [sent-174, score-0.144]

80 We can see that responses other than the first and the last responses are filtered because they are less relevant to the question. [sent-175, score-0.812]

81 For sharing posts, the summary consists of two parts. [sent-176, score-0.184]

82 Then the system picks three responses from the majority group or one response from each group if there is no significant difference. [sent-178, score-0.834]
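
The selection rule for sharing posts can be sketched as follows. This is a hedged illustration: the source does not define "significant difference" or how responses are chosen within a group, so the `margin` threshold and the first-come ordering in the hypothetical `pick_representatives` helper are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def pick_representatives(
    labeled: List[Tuple[str, str]], margin: float = 0.2, top_k: int = 3
) -> List[str]:
    """Pick responses to display from (text, opinion_label) pairs.

    If the largest opinion group leads the runner-up by more than `margin`
    (as a fraction of all responses), return up to `top_k` responses from
    it; otherwise return the first response of every group.
    """
    if not labeled:
        return []
    ranked = Counter(label for _, label in labeled).most_common()
    top_label, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if (top_count - runner_up) / len(labeled) > margin:
        return [text for text, label in labeled if label == top_label][:top_k]
    seen = set()
    picks = []
    for text, label in labeled:
        if label not in seen:
            seen.add(label)
            picks.append(text)
    return picks
```

The default `margin` of 0.2 is arbitrary and would need tuning against real opinion distributions.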

83 Figure 5 is an example that most friends of the user dfrag give positive feedback to the shared video link. [sent-179, score-0.159]

84 For discussion posts, we combine the response relevancy detection subsystem and the opinion analysis sub-system for summarization. [sent-180, score-0.578]

85 The former first eliminates the responses that are not likely to be the answer of the post. [sent-181, score-0.429]

86 The latter then generates a summary for the post and relevant responses. [sent-182, score-0.208]

87 For chat posts, we apply the response pair identification component to generate the summary. [sent-184, score-0.467]

88 In the example, Figure 6, the original Plurk post is about one topic while the responses diverge to one … [sent-185, score-0.637]

89 Our system clearly separates the responses into multiple groups. [sent-189, score-0.429]

90 The users no longer have to read interleaving responses from different topics and guess which topic group a response is referring to. [sent-191, score-0.893]

91 We found only one work that discusses the issues of summarization for Microblogs (Sharifi et al. [sent-193, score-0.194]

92 Their goal, however, is very different from ours as they try to summarize multiple posts and do not consider the responses. [sent-195, score-0.504]

93 They are essentially trying to solve a multi-document summarization problem, while our problem is more similar to short dialogue summarization, because the dialogue nature of Microblogs is one of the most challenging parts that we tried to overcome. [sent-197, score-0.454]

94 In dialogue summarization, many researchers have pointed out the importance of detecting response pairs in a conversation. [sent-198, score-0.42]

95 Zhou and Hovy (2005) concentrate on summarizing dialogue-style technical internet relay chats using supervised learning methods. [sent-202, score-0.127]

96 Zhou further clusters chat logs into several topics and then extracts some essential response pairs to form summaries. [sent-203, score-0.495]

97 Due to the intrinsic difference between the writing styles of Microblog and other online sources, our experiments show that the content feature is not as useful as the position and temporal features. [sent-207, score-0.13]

98 Our system uses an effective strategy to summarize the post/response by first determining the intention and then performing different analyses depending on the post type. [sent-210, score-0.319]

99 By utilizing text mining and analysis techniques, computers are capable of providing more intelligent summarization than information condensation. [sent-212, score-0.292]

100 Digesting virtual geek culture: The summarization of technical internet relay chats, in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005). [sent-248, score-0.232]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('posts', 0.457), ('responses', 0.406), ('microblog', 0.375), ('response', 0.325), ('interrogation', 0.234), ('summarization', 0.194), ('post', 0.161), ('plurk', 0.145), ('sharing', 0.137), ('microblogs', 0.136), ('relevancy', 0.103), ('interrogative', 0.097), ('intention', 0.088), ('opinion', 0.076), ('replies', 0.071), ('chat', 0.069), ('messages', 0.067), ('dialogue', 0.066), ('content', 0.064), ('libsvm', 0.062), ('rbf', 0.049), ('summarize', 0.047), ('chats', 0.047), ('rencing', 0.047), ('shrestha', 0.047), ('summary', 0.047), ('users', 0.044), ('positive', 0.043), ('thinking', 0.043), ('discussion', 0.042), ('topics', 0.042), ('summarizing', 0.042), ('reply', 0.041), ('sharifi', 0.041), ('group', 0.04), ('characteristics', 0.04), ('classifier', 0.04), ('temporal', 0.039), ('relay', 0.038), ('topic', 0.036), ('respond', 0.036), ('thread', 0.036), ('video', 0.035), ('phase', 0.034), ('diverge', 0.034), ('half', 0.033), ('serve', 0.033), ('detection', 0.032), ('strategies', 0.032), ('taiwan', 0.031), ('intelligent', 0.031), ('negative', 0.031), ('reaches', 0.031), ('usually', 0.03), ('display', 0.03), ('clusters', 0.03), ('positional', 0.029), ('pairs', 0.029), ('blogs', 0.028), ('concise', 0.028), ('emoticons', 0.028), ('user', 0.027), ('sometimes', 0.027), ('email', 0.027), ('friends', 0.027), ('styles', 0.027), ('exploit', 0.027), ('identification', 0.027), ('shared', 0.027), ('zhou', 0.026), ('name', 0.026), ('drift', 0.026), ('kernel', 0.025), ('component', 0.025), ('computers', 0.025), ('cosine', 0.024), ('validation', 0.024), ('categories', 0.024), ('framework', 0.023), ('unrelated', 0.023), ('twitter', 0.023), ('answer', 0.023), ('weights', 0.023), ('system', 0.023), ('das', 0.023), ('cross', 0.022), ('similarity', 0.022), ('subjectivity', 0.021), ('pair', 0.021), ('utilizing', 0.021), ('kind', 0.021), ('mining', 0.021), ('automaticallyextracted', 0.021), ('writings', 0.021), ('reconsider', 0.021), ('ats', 0.021), ('beaux', 0.021), ('hutton', 0.021), 
('possess', 0.021), ('respondents', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

Author: Jui-Yu Weng ; Cheng-Lun Yang ; Bo-Nian Chen ; Yen-Kai Wang ; Shou-De Lin

2 0.38269049 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

Author: Cheng-Te Li ; Chien-Yuan Wang ; Chien-Lin Tseng ; Shou-De Lin

Abstract: Micro-blogging services provide platforms for users to share their feelings and ideas on the move. In this paper, we present a search-based demonstration system, called MemeTube, to summarize the sentiments of microblog messages in an audiovisual manner. MemeTube provides three main functions: (1) recognizing the sentiments of messages, (2) generating a musical melody automatically based on detected sentiments, and (3) producing an animation of real-time piano playing for audiovisual display. Our MemeTube system can be accessed via: http://mslab.csie.ntu.edu.tw/memetube/ .

3 0.15174191 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

Author: Dong Wang ; Yang Liu

Abstract: This paper presents a pilot study of opinion summarization on conversations. We create a corpus containing extractive and abstractive summaries of speaker’s opinion towards a given topic using 88 telephone conversations. We adopt two methods to perform extractive summarization. The first one is a sentence-ranking method that linearly combines scores measured from different aspects including topic relevance, subjectivity, and sentence importance. The second one is a graph-based method, which incorporates topic and sentiment information, as well as additional information about sentence-to-sentence relations extracted based on dialogue structure. Our evaluation results show that both methods significantly outperform the baseline approach that extracts the longest utterances. In particular, we find that incorporating dialogue structure in the graph-based method contributes to the improved system performance.

4 0.12355303 194 acl-2011-Language Use: What can it tell us?

Author: Marjorie Freedman ; Alex Baron ; Vasin Punyakanok ; Ralph Weischedel

Abstract: For 20 years, information extraction has focused on facts expressed in text. In contrast, this paper is a snapshot of research in progress on inferring properties and relationships among participants in dialogs, even though these properties/relationships need not be expressed as facts. For instance, can a machine detect that someone is attempting to persuade another to action or to change beliefs or is asserting their credibility? We report results on both English and Arabic discussion forums.

5 0.11022379 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech

Author: Miao Chen ; Klaus Zechner

Abstract: This paper focuses on identifying, extracting and evaluating features related to syntactic complexity of spontaneous spoken responses as part of an effort to expand the current feature set of an automated speech scoring system in order to cover additional aspects considered important in the construct of communicative competence. Our goal is to find effective features, selected from a large set of features proposed previously and some new features designed in analogous ways from a syntactic complexity perspective that correlate well with human ratings of the same spoken responses, and to build automatic scoring models based on the most promising features by using machine learning methods. On human transcriptions with manually annotated clause and sentence boundaries, our best scoring model achieves an overall Pearson correlation with human rater scores of r=0.49 on an unseen test set, whereas correlations of models using sentence or clause boundaries from automated classifiers are around r=0.2.

6 0.1070159 31 acl-2011-Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations

7 0.098457009 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles

8 0.084160447 159 acl-2011-Identifying Noun Product Features that Imply Opinions

9 0.081437282 177 acl-2011-Interactive Group Suggesting for Twitter

10 0.079960197 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

11 0.075380154 76 acl-2011-Comparative News Summarization Using Linear Programming

12 0.074188441 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization

13 0.06886816 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

14 0.068176627 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization

15 0.067403756 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization

16 0.063893497 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents

17 0.063816108 185 acl-2011-Joint Identification and Segmentation of Domain-Specific Dialogue Acts for Conversational Dialogue Systems

18 0.063648835 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports

19 0.063266955 4 acl-2011-A Class of Submodular Functions for Document Summarization

20 0.060686205 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.148), (1, 0.138), (2, 0.011), (3, 0.064), (4, -0.117), (5, 0.059), (6, -0.071), (7, 0.022), (8, 0.012), (9, -0.022), (10, -0.084), (11, 0.024), (12, -0.044), (13, -0.028), (14, -0.077), (15, -0.007), (16, 0.029), (17, 0.011), (18, -0.004), (19, -0.003), (20, 0.01), (21, 0.046), (22, -0.016), (23, 0.046), (24, -0.031), (25, -0.033), (26, 0.1), (27, -0.049), (28, 0.007), (29, -0.102), (30, 0.062), (31, -0.019), (32, -0.104), (33, 0.086), (34, -0.018), (35, -0.05), (36, -0.162), (37, -0.007), (38, 0.023), (39, 0.039), (40, 0.062), (41, 0.067), (42, 0.047), (43, -0.011), (44, -0.096), (45, -0.043), (46, 0.135), (47, -0.172), (48, -0.182), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94851547 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

Author: Jui-Yu Weng ; Cheng-Lun Yang ; Bo-Nian Chen ; Yen-Kai Wang ; Shou-De Lin

2 0.63435411 194 acl-2011-Language Use: What can it tell us?

Author: Marjorie Freedman ; Alex Baron ; Vasin Punyakanok ; Ralph Weischedel

3 0.62314647 31 acl-2011-Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations

Author: Sara Rosenthal ; Kathleen McKeown

Abstract: We investigate whether wording, stylistic choices, and online behavior can be used to predict the age category of blog authors. Our hypothesis is that significant changes in writing style distinguish pre-social media bloggers from post-social media bloggers. Through experimentation with a range of years, we found that the birth dates of students in college at the time when social media such as AIM, SMS text messaging, MySpace and Facebook first became popular, enable accurate age prediction. We also show that internet writing characteristics are important features for age prediction, but that lexical content is also needed to produce significantly more accurate results. Our best results allow for 81.57% accuracy.

4 0.61795777 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

Author: Cheng-Te Li ; Chien-Yuan Wang ; Chien-Lin Tseng ; Shou-De Lin

5 0.53092772 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

Author: Cecilia Ovesdotter Alm

Abstract: This opinion paper discusses subjective natural language problems in terms of their motivations, applications, characterizations, and implications. It argues that such problems deserve increased attention because of their potential to challenge the status of theoretical understanding, problem-solving methods, and evaluation techniques in computational linguistics. The author supports a more holistic approach to such problems; a view that extends beyond opinion mining or sentiment analysis.

6 0.52331376 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

7 0.48312452 133 acl-2011-Extracting Social Power Relationships from Natural Language

8 0.46543935 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style

9 0.45369136 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

10 0.44869784 84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues

11 0.43700272 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles

12 0.42980132 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model

13 0.42929256 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination

14 0.42293337 35 acl-2011-An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling

15 0.40897131 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

16 0.39472005 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports

17 0.38935053 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents

18 0.38465968 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution

19 0.37477079 201 acl-2011-Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice

20 0.36585638 80 acl-2011-ConsentCanvas: Automatic Texturing for Improved Readability in End-User License Agreements

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.031), (13, 0.011), (17, 0.04), (26, 0.041), (32, 0.012), (36, 0.246), (37, 0.074), (39, 0.037), (41, 0.052), (53, 0.016), (59, 0.036), (72, 0.028), (88, 0.017), (91, 0.043), (96, 0.205)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89500844 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search

Author: Zhuowei Bao ; Benny Kimelfeld ; Yunyao Li

Abstract: Spelling correction for keyword-search queries is challenging in restricted domains such as personal email (or desktop) search, due to the scarcity of query logs, and due to the specialized nature of the domain. For that task, this paper presents an algorithm that is based on statistics from the corpus data (rather than the query log). This algorithm, which employs a simple graph-based approach, can incorporate different types of data sources with different levels of reliability (e.g., email subject vs. email body), and can handle complex spelling errors like splitting and merging of words. An experimental study shows the superiority of the algorithm over existing alternatives in the email domain.

same-paper 2 0.85107726 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

Author: Jui-Yu Weng ; Cheng-Lun Yang ; Bo-Nian Chen ; Yen-Kai Wang ; Shou-De Lin

3 0.72396934 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

Author: Yulan He ; Chenghua Lin ; Harith Alani

Abstract: Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.

4 0.71175599 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

Author: Cheng-Te Li ; Chien-Yuan Wang ; Chien-Lin Tseng ; Shou-De Lin

5 0.69554782 94 acl-2011-Deciphering Foreign Language

Author: Sujith Ravi ; Kevin Knight

Abstract: In this work, we tackle the task of machine translation (MT) without parallel training data. We frame the MT problem as a decipherment task, treating the foreign text as a cipher for English and present novel methods for training translation models from nonparallel text.

6 0.69094968 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

7 0.69078064 177 acl-2011-Interactive Group Suggesting for Twitter

8 0.68932879 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs

9 0.68750489 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation

10 0.68693101 117 acl-2011-Entity Set Expansion using Topic information

11 0.68530929 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

12 0.68524289 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

13 0.68481469 72 acl-2011-Collecting Highly Parallel Data for Paraphrase Evaluation

14 0.68467605 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

15 0.68450588 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech

16 0.6840288 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

17 0.68396544 11 acl-2011-A Fast and Accurate Method for Approximate String Search

18 0.68388653 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

19 0.68344772 220 acl-2011-Minimum Bayes-risk System Combination

20 0.6829828 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework