emnlp emnlp2013 emnlp2013-48 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou
Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that people with similar academic, business or social connections (e.g. co-major, co-university, and co-corporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.
Reference: text
sentIndex sentText sentNum sentScore
1 Personal profile information on social media like LinkedIn. [sent-6, score-0.855]
2 However, personal profiles usually lack organization confronted with the large amount of available information. [sent-9, score-0.519]
3 In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. [sent-11, score-2.126]
4 Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e. [sent-12, score-0.833]
5 To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. [sent-15, score-1.592]
6 0 has empowered people to actively interact with each other, forming social networks around mutually interesting information and publishing a large amount of useful user-generated content (UGC) online (Lappas et al. [sent-19, score-0.522]
7 One popular and important type of UGC is the personal profile, where people post detailed information on online portals about their education, experiences and other personal information. [sent-22, score-0.629]
8 com have created a viable business as profile portals, with the popularity and success partially attributed to their comprehensive personal profiles. [sent-25, score-0.766]
9 Generally, online personal profiles provide valuable resources for businesses, especially for human resource managers to find talents, and help people connect with others of similar backgrounds (Yang et al. [sent-26, score-0.585]
10 In this regard, it is highly desirable to develop reliable methods that automatically generate a summary of a person from his or her profile. [sent-31, score-0.711]
11 To the best of our knowledge, this is the first research that explores automatic summarization of personal profiles in social media. [sent-32, score-1.083]
12 A straightforward approach is to consider personal profile summarization as a traditional document summarization problem, which treats each personal profile independently and generates a summary for each personal profile individually. [sent-33, score-2.848]
13 As the centroid of social networking, people are usually connected to others with similar [page footer: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 715–725, Seattle, Washington, USA, 18-21 October 2013. ©2013 Association for Computational Linguistics] [sent-38, score-0.477]
14 background in social media (e. [sent-40, score-0.379]
15 Therefore, it is reasonable to leverage social connections to improve the performance of profile summarization. [sent-43, score-1.036]
16 The remaining challenge is how to incorporate both the profile textual information and the connection knowledge in the social networks. [sent-45, score-1.108]
17 In this study, we propose a collective factor graph model (CoFG) to summarize the text of personal profiles in social networks with local textual information and social connection information. [sent-46, score-2.088]
18 The CoFG framework utilizes both the local textual attribute functions of an individual person and the social connection factors between different persons to collectively summarize each person's profile. [sent-47, score-1.868]
19 In this study, we treat the profile summarization as a supervised learning task. [sent-48, score-0.728]
20 Specifically, we model each sentence of the profile as a vector. [sent-49, score-0.495]
21 In the training phase, we use the vectors with the social connection between each person to build the CoFG model; while in the testing phase, we perform collective inference for the importance of each sentence and select a subset of sentences as the summary according to the trained model. [sent-50, score-0.57]
22 com indicates that our proposed joint model and social connection information improve the performance of profile summarization. [sent-52, score-1.059]
23 2 Related Work In this section, we introduce related work on traditional topic-based summarization, social-based summarization, and factor graph models, respectively. [sent-61, score-0.406]
24 However, different from all existing studies, our work is the first attempt to consider both textual information and social relationship information for supervised summarization. [sent-73, score-0.489]
25 0 has empowered people to actively interact with each other, studies focusing on social media have attracted much attention recently (Meeder et al. [sent-76, score-0.527]
26 Social-based summarization is a special case of summarization in which social connections are employed to help obtain the summary. [sent-79, score-0.98]
27 Although topic-based summarization has been extensively studied, studies on social-based summarization are relatively new and rare. [sent-80, score-0.438]
28 , (2011) proposed an unsupervised PageRank-based social summarization approach by incorporating both document context and user context in the sentence evaluation process. [sent-82, score-0.629]
29 Unlike all the above studies, this paper focuses on a novel task, profile summarization. [sent-87, score-0.476]
30 Furthermore, we employ many other kinds of social information in profiles, such as co-major and co-corporation between two people. [sent-88, score-0.43]
31 They are shown to be very effective for profile summarization. [sent-89, score-0.476]
32 3 Factor Graph Model As social networks have been investigated for several years (Leskovec et al. [sent-91, score-0.396]
33 , 2010) and the Factor Graph Model (FGM) is a popular approach to describing relationships in social networks (Tang et al. [sent-95, score-0.44]
34 The Factor Graph Model builds a graph to represent the relationships among nodes in a social network, and factor functions are used to represent those relationships. [sent-98, score-0.669]
35 (2012) formalized the problem of social relationship learning into a semi-supervised framework, and proposed Partially-labeled Pairwise Factor Graph Model (PLP-FGM) for learning to infer the type of social ties. [sent-101, score-0.794]
36 (2012) gave a formal definition of link recommendation across heterogeneous networks, and proposed a ranking factor graph model (RFG) for predicting links in social networks, which effectively improves the predictive performance. [sent-103, score-0.611]
37 , (2011b) generated summaries by modeling tweets and social contexts into a dual wing factor graph (DWFG), which utilized the mutual reinforcement between Web documents and their associated social contexts. [sent-105, score-1.004]
38 Different from all of the above research, this paper proposes a pair-wise factor graph model that collectively utilizes both textual information and social connection factors to generate the summary of a profile. [sent-106, score-1.152]
39 3 Data Collection and Statistics Personal profile summarization is a novel task, and no existing data is available for assessing it. [sent-107, score-0.955]
40 Therefore, in this study, we collect a data set containing personal summaries with the corresponding knowledge, such as the self-introduction and personal profiles. [sent-108, score-0.623]
41 It contains a large number of personal profiles generated by users, containing various kinds of information, such as personal overview, summary, education, experience, projects and skills. [sent-113, score-0.817]
42 To begin with, 10 random people’s public profiles are selected as seed profiles, and then the profiles from their “People Also Viewed” field were collected. [sent-116, score-0.575]
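The seed-and-expand collection described above can be read as a breadth-first expansion over "People Also Viewed" links. The sketch below is only an illustration under that reading: `snowball_sample`, the `also_viewed` mapping, the `limit` parameter, and the toy graph are all hypothetical stand-ins, and no real page fetching or LinkedIn API is involved.

```python
from collections import deque


def snowball_sample(seeds, also_viewed, limit):
    """Breadth-first expansion from seed profile ids.

    also_viewed is a hypothetical mapping: id -> list of ids found in
    that profile's "People Also Viewed" field (a stand-in for crawling).
    """
    seen = set(seeds)
    order = list(seeds)      # collected ids, in discovery order
    queue = deque(seeds)
    while queue and len(order) < limit:
        current = queue.popleft()
        for neighbor in also_viewed.get(current, []):
            if neighbor not in seen and len(order) < limit:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


# Toy link structure, invented for illustration only.
toy_graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": ["e"]}
collected = snowball_sample(["a"], toy_graph, limit=4)
```

With a cap of four profiles, the expansion stops before profile "e" is reached; the source does not state how its own crawl was bounded.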
43 We do not collect personal names in public profiles to protect people’s privacy. [sent-118, score-0.541]
44 Figure 1 shows an example of a person’s profile from LinkedIn. [sent-119, score-0.476]
45 The profile includes the following fields: Overview: It gives a structured description of a person’s general information, such as current/previous position and workplace, brief 1 http://www. [sent-121, score-0.476]
46 However, compared with Overview, Summary, Experience, Education fields, they seem to be less important for summarization of personal profiles. [sent-131, score-0.479]
47 2 Data Statistics of Major Fields We collected 3,182 personal profiles from LinkedIn. [sent-134, score-0.519]
48 Table 1 (caption fragment): … number of non-empty fields and the average length for each field. From Table 1, we can see that the information in each profile is incomplete and inconsistent; that is, not every kind of field is available in every person’s profile. [sent-139, score-0.712]
49 Most people provide their experience and education information. [sent-140, score-0.361]
50 3 Corpus Construction and Annotation Among the 921 profiles that contain a summary, we manually select 497 profiles with high-quality summaries to construct the corpus for our research. [sent-146, score-0.651]
51 That is, they are written carefully, and could give an overview of a person and represent the education and experience information of a person. [sent-149, score-0.393]
52 Besides, we collect social context information from the Education and Experience fields, and these social contexts are included by LinkedIn explicitly. [sent-152, score-0.73]
53 Table 2 shows the average length of summary and experience fields we used for evaluating our summarization approach. [sent-153, score-0.613]
54 4 Motivation and Analysis In this section, we present the motivation for using social connections to address the task of personal profile summarization. [sent-160, score-1.327]
55 To preliminarily support the motivation, some statistics of the social connection are provided. [sent-161, score-0.56]
56 Figure 2: An example of personal profile network. [sent-162, score-0.745]
57 Red is for female, blue is for male, and the dotted line means the social connection between two persons. [sent-163, score-0.588]
58 We first describe the social connections which we used. [sent-164, score-0.354]
59 Figure 2 shows an example of social connection between people from the profiles of LinkedIn. [sent-165, score-0.876]
60 We find that people are sometimes connected by several social connections. [sent-166, score-0.458]
61 From LinkedIn, four kinds of social relationship between people are extracted from the Education field and Experience field. [sent-168, score-0.568]
62 They are: co_major, which denotes that two persons have the same major at school; co_univ, which denotes that two persons graduated from the same university; co_title, which denotes that two persons have the same title at a corporation. [sent-169, score-0.426]
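As a rough illustration of how such relations could be turned into edges, the sketch below links two persons whenever they share a field value. The record layout, the field names (`major`, `univ`, `title`), and the function `social_edges` are assumptions made for this example, not the authors' data format; exact string matching is also a simplification.

```python
from itertools import combinations

# Hypothetical profile records; field names and values are illustrative only.
profiles = {
    "p1": {"major": "computer science", "univ": "univ a", "title": "engineer"},
    "p2": {"major": "computer science", "univ": "univ b", "title": "engineer"},
    "p3": {"major": "physics", "univ": "univ b", "title": "analyst"},
}

# Relation name -> profile field it is derived from (assumed mapping).
RELATIONS = {"co_major": "major", "co_univ": "univ", "co_title": "title"}


def social_edges(profiles):
    """Return {(a, b): set of relation names} for every connected pair."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        shared = {
            name
            for name, field in RELATIONS.items()
            if profiles[a].get(field)
            and profiles[a].get(field) == profiles[b].get(field)
        }
        if shared:  # one shared attribute is enough to connect the pair
            edges[(a, b)] = shared
    return edges


edges = social_edges(profiles)
```

Here p1 and p2 end up connected by two relations at once, which matches the observation that people are sometimes linked by several social connections.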
63 Our basic motivation of using social connection lies in the fact that “connected” people will tend to hold related experience and similar summaries. [sent-171, score-0.834]
64 We then give the statistics of edges of social connection. [sent-172, score-0.393]
65 From Table 3, we can see that the number of users is 497 while the number of social connection edges is 14,307. [sent-174, score-0.599]
66 5 Collective Factor Graph Model In this section, we propose a collective factor graph (CoFG) model for learning and summarizing the text of personal profile with local textual information and social connection. [sent-178, score-1.471]
67 1 Overview of Our Framework To generate summaries for profiles, a straightforward approach is to treat each personal profile independently and generate a summary for each personal profile individually. [sent-180, score-1.746]
68 Instead, we formalize the problem of personal profile summarization in a pair-wise factor graph model and propose an approach that uses the Loopy Belief Propagation algorithm to learn the model for generating the summary of the profile. [sent-183, score-1.302]
69 Thus, the problem of collective personal profile summarization model learning is cast as learning model parameters that maximize the joint probability of the input continuous dynamic network. [sent-186, score-1.03]
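For readers unfamiliar with Loopy Belief Propagation, a generic sum-product version for binary labels on a pairwise graph might look like the sketch below. It is a textbook-style illustration, not the authors' implementation: the function `loopy_bp`, the potential tables, and the toy chain are all assumptions, and the learning step that would consume these marginals is omitted.

```python
import math


def loopy_bp(unary, edges, pair, iters=50):
    """Approximate marginals for binary labels on a pairwise graph.

    unary: {node: [phi(0), phi(1)]} local potentials
    edges: list of (i, j) undirected pairs
    pair:  symmetric function pair(a, b) -> potential for labels a, b
    """
    nbrs = {n: [] for n in unary}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msg[(i, j)][b]: message from i to j about j taking label b
    msg = {(i, j): [1.0, 1.0] for i in unary for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            out = []
            for b in (0, 1):
                total = 0.0
                for a in (0, 1):
                    prod = unary[i][a] * pair(a, b)
                    for k in nbrs[i]:
                        if k != j:  # exclude the recipient's own message
                            prod *= msg[(k, i)][a]
                    total += prod
                out.append(total)
            z = out[0] + out[1]
            new[(i, j)] = [out[0] / z, out[1] / z]  # normalize for stability
        msg = new
    marginals = {}
    for i in unary:
        belief = [unary[i][0], unary[i][1]]
        for k in nbrs[i]:
            for a in (0, 1):
                belief[a] *= msg[(k, i)][a]
        z = belief[0] + belief[1]
        marginals[i] = [belief[0] / z, belief[1] / z]
    return marginals


# Toy chain of three sentences; all potentials are invented for illustration.
chain_unary = {"s1": [1.0, 3.0], "s2": [2.0, 1.0], "s3": [1.0, 1.0]}
chain_edges = [("s1", "s2"), ("s2", "s3")]


def agree(a, b):
    # Assumed pairwise potential that favors equal labels.
    return math.exp(-1.0 * (a - b) ** 2)


marginals = loopy_bp(chain_unary, chain_edges, agree)
```

On a graph without cycles, such as this chain, the messages converge to the exact marginals; on graphs with cycles the result is only an approximation.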
70 2 Model Definition Formally, given a network G(V, SL, SU, X), each sentence si is associated with an attribute vector xi of the profile and a label yi indicating whether the sentence is selected for the summary of the profile (The value of yi is binary. [sent-192, score-1.492]
71 NB(i) denotes the set of social relationship neighbor nodes of i. [sent-207, score-0.457]
72 x, the jth The left figure shows the personal profile network. [sent-208, score-0.745]
73 Each dotted square denotes a person, and the grey square denotes the sentence selected in the summary, and the white square denotes a sentence that is not selected as the summary. [sent-210, score-0.315]
74 g(yi, yj) indicates the social connection factor function. [sent-215, score-0.696]
75 Local textual attribute functions {f ( xij, yi)}j : It denotes the attribute value associated with each sentence i. [sent-217, score-0.363]
76 721 Social connection factor function g( yi yj) : For the social correlation factor function, we define it through the pairwise network structure. [sent-228, score-0.967]
77 That is, if the person of sentence i and the person of sentence j have a social relationship, a factor funcy, tion for this social connection is defined (Tang et al. [sent-229, score-1.218]
79 , $g(y_i, y_j) = \exp\{\beta_{ij}(y_i - y_j)^2\}$ (7) The person-person social relationships are defined in Section 4, e. [sent-233, score-0.38]
79 We define that if two persons have at least one social connection edge, they have a social relationship. [sent-236, score-0.997]
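To make the interplay of the two factor types concrete, the sketch below scores every labeling of three sentences by brute force. Both functional forms are assumptions: the local factor is written as an exponentiated weighted feature sum, and the social factor as an exponentiated weighted squared label difference; the names `alpha` and `beta`, the features, and the single edge are invented for the example.

```python
import math
from itertools import product


def local_factor(x, y, alpha):
    # Assumed form: exp of a weighted feature sum, active when y == 1.
    return math.exp(y * sum(a * v for a, v in zip(alpha, x)))


def social_factor(y_i, y_j, beta):
    # Assumed form: exp of a weighted squared label difference.
    return math.exp(beta * (y_i - y_j) ** 2)


def joint_score(labels, features, edges, alpha, beta):
    """Unnormalized score of one labeling of all sentences."""
    score = 1.0
    for i, y in enumerate(labels):
        score *= local_factor(features[i], y, alpha)
    for i, j in edges:
        score *= social_factor(labels[i], labels[j], beta)
    return score


def best_labeling(features, edges, alpha, beta):
    # Exhaustive search; only feasible for a handful of sentences.
    candidates = product((0, 1), repeat=len(features))
    return max(candidates, key=lambda y: joint_score(y, features, edges, alpha, beta))


# Toy data: one feature per sentence; sentences 0 and 2 belong to connected persons.
features = [[1.0], [-1.0], [-0.2]]
edges = [(0, 2)]
alpha = [1.0]

best_with_social = best_labeling(features, edges, alpha, beta=-1.0)
best_without_social = best_labeling(features, edges, alpha, beta=0.0)
```

In this toy setting a negative weight penalizes disagreement across the edge, so sentence 2 is selected only when the social factor is switched on. Real inference would replace the exhaustive search with an approximate method.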
80 For example, f( v1 , yi) denotes the set of local textual attribute functions of yi . [sent-246, score-0.375]
81 , (y2, y4), (y3, y5)) based on the structure of the input personal profile social network. [sent-249, score-1.099]
82 For example, g(y3, y5) denotes the social connection between y3 and y5, which share the co_major relationship in the left figure. [sent-250, score-0.663]
83 We use (the weight of the social connection factor function g( yi yj) ) as the example to explain how we learn the parameters (the algorithm also applies to tune by simply replacing with ). [sent-256, score-0.789]
84 This can be obtained by $Y^* = \arg\max_{Y} L(Y \mid X, G, \theta)$ (11) Finally, we select a subset of sentences of each testing profile as the summary according to the trained models, taking the sentences with the top-n prediction scores given by Y* (Tang et al. [sent-272, score-0.681]
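The final selection step can be sketched as a plain top-n cut over per-sentence prediction scores. The function `select_summary` and the toy scores are hypothetical, and restoring the original sentence order after ranking is a design choice of this sketch rather than something the source specifies.

```python
def select_summary(sentences, scores, n):
    """Keep the n highest-scoring sentences, returned in document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:n])  # back to original order for readability
    return [sentences[i] for i in chosen]


# Invented sentences and scores, for illustration only.
toy_sentences = ["a", "b", "c", "d"]
toy_scores = [0.2, 0.9, 0.1, 0.7]
summary = select_summary(toy_sentences, toy_scores, n=2)
```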
85 The existing summaries in these profiles serve as the reference summaries (the standard answers). [sent-282, score-0.464]
86 We use 200 personal profiles as the testing data, and the remaining ones as the training data. [sent-286, score-0.546]
87 2 Experimental Results We compare the proposed CoFG approach with three baselines illustrated as follows: Random: we randomly select sentences of each profile to generate the summary for the profile. [sent-293, score-0.696]
88 HITS: we employ the HITS algorithm to perform profile summarization (Wan and Yang, 2008). [sent-294, score-0.686]
89 In detail, we first consider the words as hubs and the sentences as authorities; then, we rank the sentences by their authority scores for each profile individually; finally, the highest-ranked sentences are chosen to constitute the summary. [sent-295, score-0.557]
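A minimal reading of this HITS baseline treats the word-sentence incidence as a bipartite graph and alternates authority and hub updates. The sketch below follows that reading only; the pre-tokenized input, the fixed iteration count, the L2 normalization, and the function name `hits_rank` are assumptions, not details taken from the paper or from Wan and Yang (2008).

```python
import math


def hits_rank(sentences, iters=50):
    """Authority score per sentence; words act as hubs.

    sentences: list of token lists (tokenization is assumed done upstream).
    """
    vocab = sorted({w for s in sentences for w in s})
    hub = {w: 1.0 for w in vocab}
    auth = [1.0] * len(sentences)
    for _ in range(iters):
        # Authority of a sentence: sum of hub scores of its distinct words.
        auth = [sum(hub[w] for w in set(s)) for s in sentences]
        norm = math.sqrt(sum(a * a for a in auth)) or 1.0
        auth = [a / norm for a in auth]
        # Hub score of a word: sum of authorities of sentences containing it.
        hub = {w: 0.0 for w in vocab}
        for a, s in zip(auth, sentences):
            for w in set(s):
                hub[w] += a
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {w: h / norm for w, h in hub.items()}
    return auth


# Invented toy profile: two overlapping sentences and one unrelated sentence.
toy = [
    ["software", "engineer", "google"],
    ["engineer", "java", "software"],
    ["likes", "hiking"],
]
authority = hits_rank(toy)
```

Sentences that share vocabulary reinforce each other, so the two overlapping sentences outrank the unrelated one; the top-ranked sentences would then form the summary.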
90 PageRank: we employ the PageRank algorithm to perform profile summarization (Wan and Yang, 2008). [sent-296, score-0.686]
91 This result verifies the effectiveness of considering the social connection between the sentences in different profiles. Figure 6 shows the performance of our proposed CoFG model with different sizes of training data. [sent-305, score-0.61]
92 From Figure 6, we can see that the CoFG model with social connection always performs better than MaxEnt, and the performance of our approach degrades slowly as the training dataset becomes smaller. [sent-306, score-0.56]
93 Figure 6: The performance of CoFG with different training data size (x-axis: size of training data; series: PageRank, MaxEnt, CFG). Table 5 shows the contribution of the social edges with CoFG. [sent-311, score-0.393]
94 Table 5: … contribution of social edges. From Table 5, we can see that all of our proposed approaches, i. [sent-315, score-0.416]
95 This result is mainly due to the fact that the information of social connection is redundant. [sent-321, score-0.56]
96 7 Conclusion and Future Work In this paper, we present a novel task named profile summarization and propose a novel approach called collective factor graph model to address this task. [sent-323, score-0.957]
97 One distinguishing feature of the proposed approach lies in its incorporating the social connection. [sent-324, score-0.377]
98 Empirical studies demonstrate that the social connection is effective for profile summarization, which enables our approach to outperform some competitive supervised and unsupervised baselines. [sent-325, score-1.073]
99 The main contribution of this paper is to explore social context information to help generate the summary of the profiles, which represents an interesting research direction in social network mining. [sent-326, score-0.92]
100 In future work, we will explore more kinds of social context information and investigate better ways of incorporating them into profile summarization and a wider range of social network mining tasks. [sent-327, score-1.465]
wordName wordTfidf (topN-words)
[('profile', 0.476), ('social', 0.354), ('cofg', 0.343), ('personal', 0.269), ('profiles', 0.25), ('summarization', 0.21), ('connection', 0.206), ('experience', 0.186), ('summary', 0.151), ('factor', 0.136), ('tang', 0.133), ('proceeding', 0.11), ('education', 0.109), ('yi', 0.093), ('attribute', 0.091), ('persons', 0.083), ('field', 0.075), ('collective', 0.075), ('wan', 0.075), ('pagerank', 0.073), ('textual', 0.072), ('zhuang', 0.069), ('people', 0.066), ('fields', 0.066), ('person', 0.065), ('summaries', 0.063), ('loopy', 0.062), ('maxent', 0.062), ('graph', 0.06), ('denotes', 0.059), ('yang', 0.049), ('cocorporation', 0.047), ('xij', 0.047), ('relationship', 0.044), ('network', 0.042), ('hits', 0.042), ('edges', 0.039), ('radev', 0.038), ('recommendation', 0.038), ('connected', 0.038), ('networks', 0.038), ('guy', 0.037), ('reinforcement', 0.037), ('belief', 0.035), ('yj', 0.035), ('propagation', 0.034), ('actively', 0.033), ('overview', 0.033), ('xi', 0.032), ('empowered', 0.031), ('hammersley', 0.031), ('lbp', 0.031), ('linkedin', 0.031), ('ryang', 0.031), ('functions', 0.031), ('marginal', 0.029), ('local', 0.029), ('kinds', 0.029), ('extractive', 0.028), ('dotted', 0.028), ('attributes', 0.028), ('dong', 0.027), ('testing', 0.027), ('ugc', 0.027), ('celikyilmaz', 0.027), ('lappas', 0.027), ('sentences', 0.027), ('relationships', 0.026), ('ep', 0.026), ('longest', 0.025), ('ij', 0.025), ('portals', 0.025), ('rosenthal', 0.025), ('shoushan', 0.025), ('media', 0.025), ('gradient', 0.024), ('square', 0.024), ('shen', 0.023), ('treat', 0.023), ('authorities', 0.023), ('document', 0.023), ('proposed', 0.023), ('tan', 0.022), ('motivation', 0.022), ('leskovec', 0.022), ('collect', 0.022), ('business', 0.021), ('rao', 0.021), ('zhou', 0.02), ('nb', 0.02), ('infer', 0.019), ('rouge', 0.019), ('centroid', 0.019), ('framework', 0.019), ('summarize', 0.019), ('supervised', 0.019), ('sentence', 0.019), ('generate', 0.019), ('collectively', 0.018), ('studies', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou
Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that people with similar academic, business or social connections (e.g. co-major, co-university, and co-corporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.
Author: Tengfei Ma ; Hiroshi Nakagawa
Abstract: Document summarization is an important task in the area of natural language processing, which aims to extract the most important information from a single document or a cluster of documents. In various summarization tasks, the summary length is manually defined. However, how to find the proper summary length is quite a problem; and keeping all summaries restricted to the same length is not always a good choice. It is obviously improper to generate summaries with the same length for two clusters of documents which contain quite different quantity of information. In this paper, we propose a Bayesian nonparametric model for multidocument summarization in order to automatically determine the proper lengths of summaries. Assuming that an original document can be reconstructed from its summary, we describe the ”reconstruction” by a Bayesian framework which selects sentences to form a good summary. Experimental results on DUC2004 data sets and some expanded data demonstrate the good quality of our summaries and the rationality of the length determination.
3 0.11894923 65 emnlp-2013-Document Summarization via Guided Sentence Compression
Author: Chen Li ; Fei Liu ; Fuliang Weng ; Yang Liu
Abstract: Joint compression and summarization has been used recently to generate high quality summaries. However, such word-based joint optimization is computationally expensive. In this paper we adopt the ‘sentence compression + sentence selection’ pipeline approach for compressive summarization, but propose to perform summary guided compression, rather than generic sentence-based compression. To create an annotated corpus, the human annotators were asked to compress sentences while explicitly given the important summary words in the sentences. Using this corpus, we train a supervised sentence compression model using a set of word-, syntax-, and documentlevel features. During summarization, we use multiple compressed sentences in the integer linear programming framework to select . salient summary sentences. Our results on the TAC 2008 and 2011 summarization data sets show that by incorporating the guided sentence compression model, our summarization system can yield significant performance gain as compared to the state-of-the-art.
4 0.11866799 85 emnlp-2013-Fast Joint Compression and Summarization via Graph Cuts
Author: Xian Qian ; Yang Liu
Abstract: Extractive summarization typically uses sentences as summarization units. In contrast, joint compression and summarization can use smaller units such as words and phrases, resulting in summaries containing more information. The goal of compressive summarization is to find a subset of words that maximize the total score of concepts and cutting dependency arcs under the grammar constraints and summary length constraint. We propose an efficient decoding algorithm for fast compressive summarization using graph cuts. Our approach first relaxes the length constraint using Lagrangian relaxation. Then we propose to bound the relaxed objective function by the supermodular binary quadratic programming problem, which can be solved efficiently using graph max-flow/min-cut. Since finding the tightest lower bound suffers from local optimality, we use convex relaxation for initialization. Experimental results on TAC2008 dataset demonstrate our method achieves competitive ROUGE score and has good readability, while is much faster than the integer linear programming (ILP) method.
5 0.09492097 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model
Author: Lifu Huang ; Lian'en Huang
Abstract: Recently, much research focuses on event storyline generation, which aims to produce a concise, global and temporal event summary from a collection of articles. Generally, each event contains multiple sub-events and the storyline should be composed by the component summaries of all the sub-events. However, different sub-events have different part-whole relationship with the major event, which is important to correspond to users’ interests but seldom considered in previous work. To distinguish different types of sub-events, we propose a mixture-event-aspect model which models different sub-events into local and global aspects. Combining these local/global aspects with summarization requirements together, we utilize an optimization method to generate the component summaries along the timeline. We develop experimental systems on 6 distinctively different datasets. Evaluation and comparison results indicate the effectiveness of our proposed method.
6 0.082750335 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
8 0.077057719 174 emnlp-2013-Single-Document Summarization as a Tree Knapsack Problem
9 0.075523913 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
10 0.070651442 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers
11 0.06249008 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization
12 0.060475703 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
13 0.054033402 149 emnlp-2013-Overcoming the Lack of Parallel Data in Sentence Compression
14 0.054020129 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
15 0.051752537 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
16 0.051335 24 emnlp-2013-Application of Localized Similarity for Web Documents
17 0.046522949 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media
18 0.045236621 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
19 0.044686504 182 emnlp-2013-The Topology of Semantic Knowledge
20 0.043552559 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems
topicId topicWeight
[(0, -0.154), (1, 0.069), (2, -0.089), (3, 0.076), (4, -0.03), (5, -0.075), (6, 0.246), (7, -0.077), (8, 0.05), (9, 0.004), (10, 0.014), (11, 0.029), (12, 0.062), (13, -0.048), (14, -0.051), (15, 0.005), (16, 0.086), (17, -0.019), (18, -0.082), (19, -0.056), (20, -0.044), (21, -0.032), (22, -0.005), (23, 0.038), (24, -0.011), (25, 0.03), (26, 0.037), (27, -0.05), (28, -0.057), (29, -0.061), (30, -0.082), (31, 0.119), (32, 0.102), (33, -0.116), (34, 0.047), (35, -0.166), (36, -0.027), (37, 0.027), (38, 0.051), (39, -0.084), (40, 0.051), (41, 0.004), (42, 0.134), (43, -0.08), (44, -0.021), (45, 0.022), (46, -0.027), (47, -0.031), (48, 0.018), (49, -0.0)]
simIndex simValue paperId paperTitle
same-paper 1 0.97071338 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou
Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that people with similar academic, business or social connections (e.g. co-major, co-university, and co-corporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.
Author: Tengfei Ma ; Hiroshi Nakagawa
Abstract: Document summarization is an important task in the area of natural language processing, which aims to extract the most important information from a single document or a cluster of documents. In various summarization tasks, the summary length is manually defined. However, how to find the proper summary length is quite a problem; and keeping all summaries restricted to the same length is not always a good choice. It is obviously improper to generate summaries with the same length for two clusters of documents which contain quite different quantity of information. In this paper, we propose a Bayesian nonparametric model for multidocument summarization in order to automatically determine the proper lengths of summaries. Assuming that an original document can be reconstructed from its summary, we describe the ”reconstruction” by a Bayesian framework which selects sentences to form a good summary. Experimental results on DUC2004 data sets and some expanded data demonstrate the good quality of our summaries and the rationality of the length determination.
Author: Maria Liakata ; Simon Dobnik ; Shyamasree Saha ; Colin Batchelor ; Dietrich Rebholz-Schuhmann
Abstract: We present a method which exploits automatically generated scientific discourse annotations to create a content model for the summarisation of scientific articles. Full papers are first automatically annotated using the CoreSC scheme, which captures 11 contentbased concepts such as Hypothesis, Result, Conclusion etc at the sentence level. A content model which follows the sequence of CoreSC categories observed in abstracts is used to provide the skeleton of the summary, making a distinction between dependent and independent categories. Summary creation is also guided by the distribution of CoreSC categories found in the full articles, in order to adequately represent the article content. Fi- nally, we demonstrate the usefulness of the summaries by evaluating them in a complex question answering task. Results are very encouraging as summaries of papers from automatically obtained CoreSCs enable experts to answer 66% of complex content-related questions designed on the basis of paper abstracts. The questions were answered with a precision of 75%, where the upper bound for human summaries (abstracts) was 95%.
4 0.58315229 85 emnlp-2013-Fast Joint Compression and Summarization via Graph Cuts
Author: Xian Qian ; Yang Liu
Abstract: Extractive summarization typically uses sentences as summarization units. In contrast, joint compression and summarization can use smaller units such as words and phrases, resulting in summaries containing more information. The goal of compressive summarization is to find a subset of words that maximizes the total score of concepts and cutting dependency arcs under the grammar constraints and summary length constraint. We propose an efficient decoding algorithm for fast compressive summarization using graph cuts. Our approach first relaxes the length constraint using Lagrangian relaxation. Then we propose to bound the relaxed objective function by the supermodular binary quadratic programming problem, which can be solved efficiently using graph max-flow/min-cut. Since finding the tightest lower bound suffers from local optimality, we use convex relaxation for initialization. Experimental results on the TAC2008 dataset demonstrate that our method achieves a competitive ROUGE score and good readability, while being much faster than the integer linear programming (ILP) method.
5 0.56805086 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model
Author: Lifu Huang ; Lian'en Huang
Abstract: Recently, much research has focused on event storyline generation, which aims to produce a concise, global and temporal event summary from a collection of articles. Generally, each event contains multiple sub-events and the storyline should be composed of the component summaries of all the sub-events. However, different sub-events have different part-whole relationships with the major event, which are important for matching users’ interests but seldom considered in previous work. To distinguish different types of sub-events, we propose a mixture-event-aspect model which models different sub-events into local and global aspects. Combining these local/global aspects with summarization requirements, we utilize an optimization method to generate the component summaries along the timeline. We develop experimental systems on 6 distinctively different datasets. Evaluation and comparison results indicate the effectiveness of our proposed method.
6 0.53184849 174 emnlp-2013-Single-Document Summarization as a Tree Knapsack Problem
8 0.46376342 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
9 0.42852172 65 emnlp-2013-Document Summarization via Guided Sentence Compression
10 0.41636008 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
12 0.35143146 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!
14 0.30051747 26 emnlp-2013-Assembling the Kazakh Language Corpus
15 0.29463619 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization
16 0.29363024 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems
17 0.2759248 24 emnlp-2013-Application of Localized Similarity for Web Documents
18 0.25887936 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
19 0.25389874 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
20 0.25067207 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
topicId topicWeight
[(3, 0.04), (9, 0.014), (18, 0.024), (20, 0.124), (22, 0.055), (30, 0.076), (36, 0.013), (50, 0.014), (51, 0.171), (63, 0.071), (66, 0.043), (71, 0.039), (75, 0.065), (77, 0.018), (90, 0.023), (95, 0.017), (96, 0.083)]
simIndex simValue paperId paperTitle
same-paper 1 0.89543337 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou
Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that people with similar academic, business or social connections (e.g. co-major, co-university, and co-corporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.
2 0.860443 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
Author: Jesus Gonzalez-Rubio ; Daniel Ortiz-Martinez ; Jose-Miguel Benedi ; Francisco Casacuberta
Abstract: Current automatic machine translation systems are not able to generate error-free translations and human intervention is often required to correct their output. Alternatively, an interactive framework that integrates the human knowledge into the translation process has been presented in previous works. Here, we describe a new interactive machine translation approach that is able to work with phrase-based and hierarchical translation models, and integrates error-correction all in a unified statistical framework. In our experiments, our approach outperforms previous interactive translation systems, and achieves estimated effort reductions of as much as 48% relative over a traditional post-edition system.
3 0.84465659 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries
Author: Jerome White ; Douglas W. Oard ; Nitendra Rajput ; Marion Zalk
Abstract: Building search engines that can respond to spoken queries with spoken content requires that the system not just be able to find useful responses, but also that it know when it has heard enough about what the user wants to be able to do so. This paper describes a simulation study with queries spoken by non-native speakers that suggests that finding relevant content is often possible within a half minute, and that combining features based on automatically recognized words with features designed for automated prediction of query difficulty can serve as a useful basis for predicting when that useful content has been found.
Author: Baichuan Li ; Jing Liu ; Chin-Yew Lin ; Irwin King ; Michael R. Lyu
Abstract: Social media like forums and microblogs have accumulated a huge amount of user generated content (UGC) containing human knowledge. Currently, most of UGC is listed as a whole or in pre-defined categories. This “list-based” approach is simple, but hinders users from browsing and learning knowledge of certain topics effectively. To address this problem, we propose a hierarchical entity-based approach for structuralizing UGC in social media. By using a large-scale entity repository, we design a three-step framework to organize UGC in a novel hierarchical structure called “cluster entity tree (CET)”. With Yahoo! Answers as a test case, we conduct experiments and the results show the effectiveness of our framework in constructing CET. We further evaluate the performance of CET on UGC organization in both user and system aspects. From a user aspect, our user study demonstrates that, with CET-based structure, users perform significantly better in knowledge learning than using traditional list-based approach. From a system aspect, CET substantially boosts the performance of two information retrieval models (i.e., vector space model and query likelihood language model).
5 0.81493711 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
Author: Qiming Diao ; Jing Jiang
Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant messages. On the one hand, people tweet about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments show that our model can accurately identify meaningful events and that the event-topic affinity vectors are effective for event recommendation and grouping events by topics.
6 0.80286735 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs
7 0.8021813 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
8 0.80151451 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
9 0.79979658 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
10 0.79545844 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves
11 0.79534388 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
12 0.79508412 160 emnlp-2013-Relational Inference for Wikification
13 0.79279464 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
14 0.79229826 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
15 0.78849518 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts
16 0.78644758 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
17 0.78414899 143 emnlp-2013-Open Domain Targeted Sentiment
18 0.78410298 65 emnlp-2013-Document Summarization via Guided Sentence Compression
19 0.78335589 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
20 0.78226334 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes