emnlp emnlp2013 emnlp2013-200 knowledge-graph by maker-knowledge-mining

200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems


Source: pdf

Author: Julien Gaillard ; Marc El-Beze ; Eitan Altman ; Emmanuel Ethis

Abstract: Recommendation systems (RS) take advantage of product and user information in order to propose items to consumers. Collaborative, content-based and a few hybrid RS have been developed in the past. In contrast, we propose a new domain-independent semantic RS. By providing textually well-argued recommendations, we aim to give more responsibility to the end user in his decision. The system includes a new similarity measure keeping up both the accuracy of rating predictions and coverage. We propose an innovative way to apply a fast adaptation scheme at a semantic level, providing recommendations and arguments in phase with the very recent past. We have performed several experiments on film data, providing textually well-argued recommendations.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Well-argued recommendation: adaptive models based on words in recommender systems. Julien Gaillard, Marc El-Beze (University of Avignon, Agorantic, Avignon, France). [sent-1, score-0.352]

2 Abstract: Recommendation systems (RS) take advantage of product and user information in order to propose items to consumers. [sent-5, score-0.308]

3 By providing textually well-argued recommendations, we aim to give more responsibility to the end user in his decision. [sent-8, score-0.405]

4 The system includes a new similarity measure keeping up both the accuracy of rating predictions and coverage. [sent-9, score-0.373]

5 We propose an innovative way to apply a fast adaptation scheme at a semantic level, providing recommendations and arguments in phase with the very recent past. [sent-10, score-0.46]

6 We have performed several experiments on films data, providing textually well-argued recommendations. [sent-11, score-0.187]

7 1 Introduction Recommender systems aim at suggesting appropriate items to users from a large catalog of products. [sent-12, score-0.308]

8 Those systems are individually adapted by using a specific profile for each user and item, derived from the analysis of past ratings. [sent-13, score-0.251]

9 Nevertheless, after a few bad recommendations, users will not be convinced anymore by the RS. [sent-16, score-0.165]

10 To answer these key issues, we have designed a new semantic recommender system (SRS) including at least two innovative features (author block: Eitan Altman, INRIA Sophia Antipolis, Sophia-Antipolis, France; Emmanuel Ethis, University of Avignon, Agorantic, Avignon, France): [sent-19, score-0.351]

11 • Argumentation: each recommendation comes along with a textual argumentation, providing the reasons that led to that recommendation. [sent-23, score-0.255]

12 • Fast adaptation: the system is updated in a continuous way, as each new review is posted. [sent-24, score-0.05]

13 In doing so, the system will be perceived as less intrusive thanks to well-chosen words and its failures will be smoothed over. [sent-25, score-0.029]

14 It is therefore necessary to design a new generation of RS providing textually well-argued recommendations. [sent-26, score-0.187]

15 This way, the end user will have more elements to make a well-informed choice. [sent-27, score-0.218]

16 Moreover, the system parameters have to be dynamically and continuously updated, in order to provide recommendations and arguments in phase with the very recent past. [sent-28, score-0.194]

17 In the next section, we present the state of the art in recommendation systems and introduce some of the improvements we have made. [sent-33, score-0.188]

18 We describe the evaluation protocol and how we have performed some experiments in section 4. [sent-35, score-0.082]

19 […] of users, generally user ratings on items (Burke, 2007; Sarwar et al. [sent-40, score-0.486]

20 In these systems, the following assumption is made: if user a and user b rate n items similarly, they will rate other items in the same way (Deshpande and Karypis. [sent-42, score-0.722]

21 i.e., when new items or users appear, it is impossible to make a recommendation, due to the absence of rating data (Schein et al. [sent-45, score-0.580]

22 In Content Based Filtering (CBF) systems, users are supposed to be independent (Mehta et al. [sent-48, score-0.165]

23 Hence for a given user, recommendations rely only on items he previously rated. [sent-50, score-0.247]

24 Generally, they apply a concept-based approach to enhance the user modeling stage and employ standard vocabularies and ontology resources. [sent-52, score-0.248]

25 For instance, ePaper (a scientific-paper recommender) computes the matching between the concepts constituting user interests and the concepts describing an item by using hierarchical relationships of domain concepts (Maidel et al. [sent-53, score-0.512]

26 Codina and Ceccaroni (2010) propose to take advantage of semantics by using an interest-prediction method based on user ratings and browsing events. [sent-55, score-0.343]

27 However, none of them are actually based on the user opinion as it is expressed in natural language. [sent-56, score-0.248]

28 Resnick (1997) was one of the first to introduce the Pearson correlation coefficient to derive a similarity measure between two entities. [sent-60, score-0.091]

29 Other similarity measures such as Jaccard and Cosine have been proposed (Meyer, 2012). [sent-61, score-0.091]

30 Let S_u be the set of items rated by u, T_i the set of users who have rated item i, r_{u,i} the rating of user u on item i, and \bar{r}_x the mean rating of x (user or item). [sent-62, score-1.247]

31 PEA(i,j) stands for the Pearson similarity between items i and j and is computed as follows:

PEA(i,j) = \frac{\sum_{u \in T_i \cap T_j} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in T_i \cap T_j} (r_{u,i} - \bar{r}_i)^2} \sqrt{\sum_{u \in T_i \cap T_j} (r_{u,j} - \bar{r}_j)^2}}    (1)

In the remainder, the Pearson similarity measure will be used as a baseline. [sent-63, score-0.387]
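A minimal Python sketch of how formula (1) can be computed from raw ratings. This is illustration only, not the authors' code; the data layout (a dict mapping each item to a {user: rating} dict) and the choice of computing item means over all of an item's ratings are assumptions.

```python
import math

def pearson_item_similarity(ratings_by_item, i, j):
    """Pearson similarity PEA(i, j) between items i and j, computed over the
    users who rated both items (T_i ∩ T_j), as in formula (1)."""
    common_users = set(ratings_by_item[i]) & set(ratings_by_item[j])
    if not common_users:
        return 0.0
    # Item means taken over all ratings of each item (assumption).
    mean_i = sum(ratings_by_item[i].values()) / len(ratings_by_item[i])
    mean_j = sum(ratings_by_item[j].values()) / len(ratings_by_item[j])
    num = sum((ratings_by_item[i][u] - mean_i) * (ratings_by_item[j][u] - mean_j)
              for u in common_users)
    den_i = math.sqrt(sum((ratings_by_item[i][u] - mean_i) ** 2 for u in common_users))
    den_j = math.sqrt(sum((ratings_by_item[j][u] - mean_j) ** 2 for u in common_users))
    if den_i == 0.0 or den_j == 0.0:
        return 0.0
    return num / (den_i * den_j)
```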

32 The Manhattan Weighted and Corrected similarity (MWC), that we introduced in (Gaillard et al. [sent-64, score-0.091]

33 Again, none of them takes textual content into account. [sent-66, score-0.067]

34 2 Rating prediction Let i be a given item and u a given user. [sent-68, score-0.217]

35 Indeed, most social networks do not allow multiple ratings by the same user for one item. [sent-70, score-0.343]

36 In this framework, two rating prediction methods have to be defined: one user oriented and the other item oriented. [sent-71, score-0.676]

37 Sim stands for some similarity function in the following formula. [sent-72, score-0.12]

38 rating(u,i) = \frac{\sum_{v \in T_i} Sim(u,v) \times r_{v,i}}{\sum_{v \in T_i} |Sim(u,v)|}    (2)

A symmetrical formula for items, rating(i,u), is derived from and combined with (2). [sent-73, score-0.143]

39 In formula (2), Sim can be replaced by several similarity measures, such as Pearson, Cosine or MWC (Tan et al. [sent-75, score-0.211]
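A minimal sketch of the user-oriented prediction in formula (2). The similarity function is passed in as a parameter, so Pearson, Cosine, MWC or WBS can be plugged in; the data structures and the None return value for uncovered pairs are assumptions for illustration, not the authors' implementation.

```python
def predict_rating(u, i, ratings_by_item, sim):
    """Predict rating(u, i) as a similarity-weighted average of the ratings
    given to item i by the other users v in T_i, as in formula (2)."""
    numerator = 0.0
    denominator = 0.0
    for v, r_vi in ratings_by_item[i].items():
        if v == u:
            continue
        s = sim(u, v)
        numerator += s * r_vi
        denominator += abs(s)
    if denominator == 0.0:
        return None  # pair not covered: no similar users rated item i
    return numerator / denominator
```

The symmetrical item-oriented prediction rating(i, u) would be computed the same way over the items in S_u, and the paper combines the two.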

40 All these methods provide a measurement of the likeness between two objects. [sent-77, score-0.03]

41 We then conclude if two users (or items) are "alike" or not. [sent-78, score-0.165]

42 If two users rate the same movies with equal ratings, then these similarities will be maximal. [sent-80, score-0.165]

43 However, they may have rated identically but for completely different reasons, making them not alike at all. [sent-81, score-0.147]

44 Moreover, none of these similarity measures can express why two users or items are similar. [sent-82, score-0.429]

45 This is due to the fact that they rely on ratings only. [sent-83, score-0.125]

46 1 New similarity based on words We propose a new similarity method, taking into account words used by users in their past reviews about items. [sent-85, score-0.413]

47 Each user x (or item) has a vocabulary set V_x and each word w in it is associated (footnote 1: details on MWC can be found in the supplementary material) [sent-87, score-0.275]

48 with a set of ratings R_{w,x} and an average usage rating \bar{r}_w. [sent-88, score-0.366]

49 Common words and words w associated with very heterogeneous ratings R_{w,x} (i. [sent-90, score-0.125]

50 Nw is the number of items in which the word w appears. [sent-92, score-0.143]

51 Note that Fw has to be updated at each iteration. [sent-95, score-0.05]
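The exact definition of WBS is left to the paper and its supplementary material; the sketch below only illustrates the kind of statistics the extracted sentences mention (per-word rating sets R_{w,x}, average usage ratings, and an IDF-like weight in the spirit of N_w and F_w). The weighting and agreement scheme shown here, and the assumed 0-5 rating scale, are illustrative assumptions, not the authors' formula.

```python
import math
from collections import defaultdict

def build_word_profile(reviews):
    """reviews: list of (tokens, rating) pairs for one user or one item.
    Returns, for each word w, its ratings R_w and average usage rating."""
    ratings_per_word = defaultdict(list)
    for tokens, rating in reviews:
        for w in set(tokens):
            ratings_per_word[w].append(rating)
    return {w: (rs, sum(rs) / len(rs)) for w, rs in ratings_per_word.items()}

def word_based_similarity(profile_x, profile_y, items_with_word, n_items):
    """Word-based similarity sketch: shared words whose average usage ratings
    agree raise the score; rare words (IDF-like weight) count more."""
    shared = set(profile_x) & set(profile_y)
    if not shared:
        return 0.0
    score, norm = 0.0, 0.0
    for w in shared:
        idf = math.log(n_items / (1.0 + items_with_word.get(w, 0)))  # N_w-based weight
        _, avg_x = profile_x[w]
        _, avg_y = profile_y[w]
        agreement = 1.0 - abs(avg_x - avg_y) / 5.0  # assumes a 0-5 rating scale
        score += idf * agreement
        norm += idf
    return score / norm if norm > 0.0 else 0.0
```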

52 2 Adaptation An adaptive framework proposed in (Gaillard et al. [sent-97, score-0.088]

53 , 2013) allows the system to adapt dynamically over time, overcoming most of the drawbacks due to the cold-start. [sent-98, score-0.168]

54 The authors have designed a dynamic process following the principle that every update (u, i) needs to be instantly taken into account by the system. [sent-99, score-0.089]

55 We thus reduced the complexity by one degree, keeping our system very well fitted to dynamic adaptation. [sent-102, score-0.056]
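The adaptive framework of Gaillard et al. (2013) is only summarized here; the sketch below illustrates the general idea of keeping per-word sufficient statistics that are updated in place as each new review arrives, instead of recomputing profiles from scratch. The class and its fields are hypothetical, not the authors' implementation.

```python
class IncrementalWordStats:
    """Running per-word rating statistics, updated in O(|review|) per new review."""

    def __init__(self):
        self.count = {}  # word -> number of ratings observed with that word
        self.total = {}  # word -> sum of those ratings

    def update(self, tokens, rating):
        """Fold one new (review tokens, rating) pair into the statistics."""
        for w in set(tokens):
            self.count[w] = self.count.get(w, 0) + 1
            self.total[w] = self.total.get(w, 0.0) + rating

    def average(self, w):
        """Current average usage rating of word w, or None if unseen."""
        return self.total[w] / self.count[w] if self.count.get(w) else None
```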

56 3 Textual recommendation The main innovative feature of our proposal is to predict what a user is going to write on an item we recommend. [sent-104, score-0.743]

57 More precisely, we can tell the user why he is expected to like or dislike the recommended item. [sent-105, score-0.322]

58 This is possible thanks to the new similarity measure we have introduced (WBS). [sent-106, score-0.12]

59 To keep it simple, the system takes into account what u has written on other items in the past and what other users have written on item i, by using WBS. [sent-108, score-0.557]

60 The idea consists in extracting what elements of i have been liked or disliked by other users, and what u generally likes. [sent-109, score-0.03]

61 More details can be found in the supplementary material (footnote 2). [sent-110, score-0.057]

62 Then, by taking into account the ratings associated with each word, we define two sub-sets Pw and Nw. [sent-112, score-0.158]

63 Pw contains what user u is probably going to like in i, and Nw what u may dislike. [sent-113, score-0.279]

64 Finally, we provide the most relevant arguments contained in both Pw and Nw, and each of them is given in the context in which it has been used for item i. [sent-114, score-0.243]
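The construction of the argument sets Pw and Nw is only outlined in the extracted sentences above; the following sketch shows one plausible reading, not the authors' implementation. The neutral threshold, the 0-5 scale, and the word-profile layout (as in the earlier sketch) are assumptions.

```python
def argument_sets(item_profile, user_profile, threshold=2.5):
    """Split the words used about item i into positive (Pw) and negative (Nw)
    argument candidates, keeping only words that user u also uses."""
    positives, negatives = [], []
    for w, (_, avg_item) in item_profile.items():
        if w not in user_profile:
            continue  # only argue with words the user has used in past reviews
        _, avg_user = user_profile[w]
        if avg_item >= threshold and avg_user >= threshold:
            positives.append(w)  # others liked this aspect and u tends to like it
        elif avg_item < threshold:
            negatives.append(w)  # others disliked this aspect of item i
    return positives, negatives
```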

65 4 Evaluation criteria We present here the evaluation protocol we designed. [sent-117, score-0.082]

66 However, the cornerstone of a recommender system is the accuracy of rating predictions (Herlocker et al. [sent-120, score-0.576]

67 From this point of view, one could argue that the quality of a recommender engine could be assessed by its capacity to predict ratings. [sent-122, score-0.264]

68 It is thus possible to evaluate our system by comparing the prediction \hat{r}_{u,i} for a given pair (u, i) with the actual rating r_{u,i}. [sent-123, score-0.275]

69 Last but not least, we make the following assumption: if WBS results are as good as MWC’s, the words presented by the system to users as arguments are likely to be relevant. [sent-126, score-0.225]

70 5 Experiments This work has been carried out in partnership with the website Vodkaster (footnote 4), a cinema social network. [sent-127, score-0.03]

71 It is therefore strictly impossible to experiment with an SRS on such a dataset. [sent-130, score-0.031]

72 Users post micro-reviews (MR) to express their opinion on a movie and rate it, within a […] (footnote 3: details on metrics are given in the supplementary material). [sent-133, score-0.09]

73 We divided the corpus into three parts, chronologically sorted: training (Tr), development (D) and test (T). [sent-137, score-0.03]

74 Note that in our experiments, the date is taken into account since we also work on dynamic adaptation. [sent-138, score-0.089]

75 2 Results Figure 1 compares four different methods: the classical Pearson (PEA) method, which does not allow quick adaptation; the MWC method with and without quick adaptation (the latter denoted MNA); and ours (WBS). [sent-140, score-0.238]

76 Within the confidence interval, in terms of accuracy, […] (Figure 1: Evolution of accuracy as a function of coverage for PEA, MWC and WBS methods on the D corpus.) [sent-141, score-0.07]

77 Our word-based approach is thus able to offer arguments […] (footnote 5: note that the key point here is the comparison of results obtained with the baseline and with the method we propose). [sent-144, score-0.06]

78 Both of them have been evaluated with the same protocol: RMSE is computed with respect to rating predictions above some empirical threshold as done in (Gaillard et al. [sent-145, score-0.282]
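A sketch of the evaluation protocol described here: RMSE computed only over rating predictions above an empirical threshold, with the number of retained pairs giving the coverage. The threshold value and the data layout are assumptions for illustration.

```python
import math

def rmse_above_threshold(predictions, actuals, threshold=3.0):
    """RMSE restricted to (prediction, actual) pairs whose prediction is not None
    and exceeds the threshold; also returns the coverage (number of pairs kept)."""
    pairs = [(p, a) for p, a in zip(predictions, actuals)
             if p is not None and p >= threshold]
    if not pairs:
        return None, 0
    mse = sum((p - a) ** 2 for p, a in pairs) / len(pairs)
    return math.sqrt(mse), len(pairs)
```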

79 In Table 2, we set a constant coverage (2000 predictions) in order to be able to compare results obtained with different methods. [sent-148, score-0.036]

80 CI is the confidence-interval radius, estimated in % on accuracy (Acc. [sent-150, score-0.152]

81 MNA (MWC without adaptation) being better and more easily updated than Pearson (PEA), we have decided to use the adaptive framework only for MWC. [sent-152, score-0.138]

82 Moreover, for Pearson dynamic adaptation, the updating algorithm complexity is increased by one degree. [sent-153, score-0.056]

83 We want to point out that the results are the same for both MWC and WBS methods, within a confidence interval (CI) radius of 1. [sent-154, score-0.152]

84 Example of outputs: The movie Apocalypse Now is recommended to user Theo6 with a rating prediction equal to 4. [sent-157, score-0.57]

85 The data we have does not contain the information on the reaction of the user to the recommendation. [sent-167, score-0.218]

86 In particular, we do not know if the textual argumentation would have been sufficient for convincing Theo6 to see the film. [sent-168, score-0.127]

87 But we know that after seeing it, he put a good rating (4. [sent-169, score-0.241]

88 6 Conclusion and perspectives We have presented an innovative proposal for designing a domain-independent SRS relying on a word-based similarity function (WBS), providing textually well-argued recommendations to users. [sent-171, score-0.508]

89 Moreover, this system has been developed in a dynamic and adaptive framework. [sent-172, score-0.144]

90 Acknowledgment The authors would like to thank Vodkaster for providing the data. [sent-176, score-0.067]

91 Evaluation of an ontology-content based filtering method for a personalized newspaper. [sent-226, score-0.057]

92 Using filtering agents to improve prediction quality in the GroupLens research collaborative filtering system. [sent-251, score-0.235]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mwc', 0.309), ('wbs', 0.275), ('recommender', 0.264), ('rating', 0.241), ('gaillard', 0.241), ('user', 0.218), ('recommendation', 0.188), ('item', 0.183), ('users', 0.165), ('items', 0.143), ('agorantic', 0.137), ('herlocker', 0.137), ('pea', 0.136), ('avignon', 0.126), ('ratings', 0.125), ('textually', 0.12), ('pearson', 0.113), ('adaptation', 0.112), ('recommendations', 0.104), ('srs', 0.103), ('vodkaster', 0.103), ('similarity', 0.091), ('alike', 0.09), ('argumentation', 0.09), ('mna', 0.09), ('rs', 0.088), ('adaptive', 0.088), ('innovative', 0.087), ('collaborative', 0.087), ('konstan', 0.082), ('protocol', 0.082), ('fw', 0.078), ('vx', 0.076), ('altman', 0.069), ('codina', 0.069), ('maidel', 0.069), ('netflix', 0.069), ('radius', 0.069), ('resnick', 0.069), ('rmse', 0.069), ('sarwar', 0.069), ('schein', 0.069), ('france', 0.067), ('providing', 0.067), ('pw', 0.063), ('arguments', 0.06), ('recsys', 0.06), ('dislike', 0.06), ('deshpande', 0.06), ('filtering', 0.057), ('supplementary', 0.057), ('rated', 0.057), ('nw', 0.057), ('dynamic', 0.056), ('mehta', 0.055), ('updated', 0.05), ('interval', 0.049), ('quick', 0.048), ('bell', 0.044), ('recommended', 0.044), ('predictions', 0.041), ('proposal', 0.039), ('textual', 0.037), ('concepts', 0.037), ('moreover', 0.036), ('coverage', 0.036), ('idf', 0.036), ('confidence', 0.034), ('prediction', 0.034), ('account', 0.033), ('past', 0.033), ('iand', 0.033), ('tan', 0.033), ('movie', 0.033), ('impossible', 0.031), ('remainder', 0.031), ('sim', 0.031), ('classical', 0.03), ('phase', 0.03), ('likeness', 0.03), ('mcnee', 0.03), ('manhattan', 0.03), ('flash', 0.03), ('conceptbased', 0.03), ('shoval', 0.03), ('liked', 0.03), ('cornerstone', 0.03), ('ntot', 0.03), ('vu', 0.03), ('tois', 0.03), ('riedl', 0.03), ('grenoble', 0.03), ('chronologically', 0.03), ('partnership', 0.03), ('none', 0.03), ('stands', 0.029), ('thanks', 0.029), ('formula', 0.029), ('going', 0.028), ('ci', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems

Author: Julien Gaillard ; Marc El-Beze ; Eitan Altman ; Emmanuel Ethis

Abstract: Recommendation systems (RS) take advantage of product and user information in order to propose items to consumers. Collaborative, content-based and a few hybrid RS have been developed in the past. In contrast, we propose a new domain-independent semantic RS. By providing textually well-argued recommendations, we aim to give more responsibility to the end user in his decision. The system includes a new similarity measure keeping up both the accuracy of rating predictions and coverage. We propose an innovative way to apply a fast adaptation scheme at a semantic level, providing recommendations and arguments in phase with the very recent past. We have performed several experiments on film data, providing textually well-argued recommendations.

2 0.094740637 28 emnlp-2013-Automated Essay Scoring by Maximizing Human-Machine Agreement

Author: Hongbo Chen ; Ben He

Abstract: Previous approaches for automated essay scoring (AES) learn a rating model by minimizing either the classification, regression, or pairwise classification loss, depending on the learning algorithm used. In this paper, we argue that the current AES systems can be further improved by taking into account the agreement between human and machine raters. To this end, we propose a rankbased approach that utilizes listwise learning to rank algorithms for learning a rating model, where the agreement between the human and machine raters is directly incorporated into the loss function. Various linguistic and statistical features are utilized to facilitate the learning algorithms. Experiments on the publicly available English essay dataset, Automated Student Assessment Prize (ASAP), show that our proposed approach outperforms the state-of-the-art algorithms, and achieves performance comparable to professional human raters, which suggests the effectiveness of our proposed method for automated essay scoring.

3 0.081546023 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

Author: Qiming Diao ; Jing Jiang

Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant message. On the one hand, people tweets about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments shows that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.

4 0.078909166 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

Author: Baichuan Li ; Jing Liu ; Chin-Yew Lin ; Irwin King ; Michael R. Lyu

Abstract: Social media like forums and microblogs have accumulated a huge amount of user generated content (UGC) containing human knowledge. Currently, most of UGC is listed as a whole or in pre-defined categories. This “list-based” approach is simple, but hinders users from browsing and learning knowledge of certain topics effectively. To address this problem, we propose a hierarchical entity-based approach for structuralizing UGC in social media. By using a large-scale entity repository, we design a three-step framework to organize UGC in a novel hierarchical structure called “cluster entity tree (CET)”. With Yahoo! Answers as a test case, we conduct experiments and the results show the effectiveness of our framework in constructing CET. We further evaluate the performance of CET on UGC organization in both user and system aspects. From a user aspect, our user study demonstrates that, with CET-based structure, users perform significantly better in knowledge learning than using traditional list-based approach. From a system aspect, CET substantially boosts the performance of two information retrieval models (i.e., vector space model and query likelihood language model).

5 0.077354722 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts

Author: Morgane Ciot ; Morgan Sonderegger ; Derek Ruths

Abstract: While much work has considered the problem of latent attribute inference for users of social media such as Twitter, little has been done on non-English-based content and users. Here, we conduct the first assessment of latent attribute inference in languages beyond English, focusing on gender inference. We find that the gender inference problem in quite diverse languages can be addressed using existing machinery. Further, accuracy gains can be made by taking language-specific features into account. We identify languages with complex orthography, such as Japanese, as difficult for existing methods, suggesting a valuable direction for future research.

6 0.060837645 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

7 0.059328146 180 emnlp-2013-The Answer is at your Fingertips: Improving Passage Retrieval for Web Question Answering with Search Behavior Data

8 0.056898437 97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching

9 0.051190197 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

10 0.048862047 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

11 0.047669243 155 emnlp-2013-Question Difficulty Estimation in Community Question Answering Services

12 0.047510948 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior

13 0.043783132 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

14 0.043552559 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

15 0.042215783 4 emnlp-2013-A Dataset for Research on Short-Text Conversations

16 0.041527875 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

17 0.040659133 41 emnlp-2013-Building Event Threads out of Multiple News Articles

18 0.03858703 24 emnlp-2013-Application of Localized Similarity for Web Documents

19 0.038493425 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

20 0.038322244 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.13), (1, 0.045), (2, -0.076), (3, 0.001), (4, 0.032), (5, -0.01), (6, 0.029), (7, 0.049), (8, 0.044), (9, -0.039), (10, -0.08), (11, 0.07), (12, -0.022), (13, -0.063), (14, 0.07), (15, 0.005), (16, -0.003), (17, 0.027), (18, -0.058), (19, -0.011), (20, -0.043), (21, -0.001), (22, 0.005), (23, 0.015), (24, -0.014), (25, 0.021), (26, -0.119), (27, 0.005), (28, -0.155), (29, 0.048), (30, -0.1), (31, -0.03), (32, 0.025), (33, -0.021), (34, -0.059), (35, 0.075), (36, -0.115), (37, 0.115), (38, -0.121), (39, -0.091), (40, 0.172), (41, -0.188), (42, 0.048), (43, -0.06), (44, -0.073), (45, 0.039), (46, 0.075), (47, 0.065), (48, 0.121), (49, 0.137)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9561184 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems

Author: Julien Gaillard ; Marc El-Beze ; Eitan Altman ; Emmanuel Ethis

Abstract: Recommendation systems (RS) take advantage of product and user information in order to propose items to consumers. Collaborative, content-based and a few hybrid RS have been developed in the past. In contrast, we propose a new domain-independent semantic RS. By providing textually well-argued recommendations, we aim to give more responsibility to the end user in his decision. The system includes a new similarity measure keeping up both the accuracy of rating predictions and coverage. We propose an innovative way to apply a fast adaptation scheme at a semantic level, providing recommendations and arguments in phase with the very recent past. We have performed several experiments on film data, providing textually well-argued recommendations.

2 0.71641636 28 emnlp-2013-Automated Essay Scoring by Maximizing Human-Machine Agreement

Author: Hongbo Chen ; Ben He

Abstract: Previous approaches for automated essay scoring (AES) learn a rating model by minimizing either the classification, regression, or pairwise classification loss, depending on the learning algorithm used. In this paper, we argue that the current AES systems can be further improved by taking into account the agreement between human and machine raters. To this end, we propose a rankbased approach that utilizes listwise learning to rank algorithms for learning a rating model, where the agreement between the human and machine raters is directly incorporated into the loss function. Various linguistic and statistical features are utilized to facilitate the learning algorithms. Experiments on the publicly available English essay dataset, Automated Student Assessment Prize (ASAP), show that our proposed approach outperforms the state-of-the-art algorithms, and achieves performance comparable to professional human raters, which suggests the effectiveness of our proposed method for automated essay scoring.

3 0.46667615 199 emnlp-2013-Using Topic Modeling to Improve Prediction of Neuroticism and Depression in College Students

Author: Philip Resnik ; Anderson Garron ; Rebecca Resnik

Abstract: […] We investigate the value-add of topic modeling in text analysis for depression, and for neuroticism as a strongly associated personality measure. Using Pennebaker’s Linguistic Inquiry and Word Count (LIWC) lexicon to provide baseline features, we show that straightforward topic modeling using Latent Dirichlet Allocation (LDA) yields interpretable, psychologically relevant “themes” that add value in prediction of clinical assessments.

4 0.44933209 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts

Author: Morgane Ciot ; Morgan Sonderegger ; Derek Ruths

Abstract: While much work has considered the problem of latent attribute inference for users of social media such as Twitter, little has been done on non-English-based content and users. Here, we conduct the first assessment of latent attribute inference in languages beyond English, focusing on gender inference. We find that the gender inference problem in quite diverse languages can be addressed using existing machinery. Further, accuracy gains can be made by taking language-specific features into account. We identify languages with complex orthography, such as Japanese, as difficult for existing methods, suggesting a valuable direction for future research.

5 0.4396365 180 emnlp-2013-The Answer is at your Fingertips: Improving Passage Retrieval for Web Question Answering with Search Behavior Data

Author: Mikhail Ageev ; Dmitry Lagun ; Eugene Agichtein

Abstract: Passage retrieval is a crucial first step of automatic Question Answering (QA). While existing passage retrieval algorithms are effective at selecting document passages most similar to the question, or those that contain the expected answer types, they do not take into account which parts of the document the searchers actually found useful. We propose, to the best of our knowledge, the first successful attempt to incorporate searcher examination data into passage retrieval for question answering. Specifically, we exploit detailed examination data, such as mouse cursor movements and scrolling, to infer the parts of the document the searcher found interesting, and then incorporate this signal into passage retrieval for QA. Our extensive experiments and analysis demonstrate that our method significantly improves passage retrieval, compared to using textual features alone. As an additional contribution, we make available to the research community the code and the search behavior data used in this study, with the hope of encouraging further research in this area.

6 0.43380338 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

7 0.39888811 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

8 0.38959605 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior

9 0.38280812 155 emnlp-2013-Question Difficulty Estimation in Community Question Answering Services

10 0.38079593 131 emnlp-2013-Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs

11 0.36156759 123 emnlp-2013-Learning to Rank Lexical Substitutions

12 0.35596311 184 emnlp-2013-This Text Has the Scent of Starbucks: A Laplacian Structured Sparsity Model for Computational Branding Analytics

13 0.31485319 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers

14 0.30780455 202 emnlp-2013-Where Not to Eat? Improving Public Policy by Predicting Hygiene Inspections Using Online Reviews

15 0.3072286 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

16 0.30588654 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

17 0.2929911 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings

18 0.27500933 165 emnlp-2013-Scaling to Large3 Data: An Efficient and Effective Method to Compute Distributional Thesauri

19 0.26407981 94 emnlp-2013-Identifying Manipulated Offerings on Review Portals

20 0.26053667 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.028), (18, 0.039), (22, 0.045), (30, 0.064), (50, 0.011), (51, 0.145), (66, 0.025), (69, 0.404), (71, 0.037), (75, 0.02), (90, 0.016), (96, 0.076)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7379902 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems

Author: Julien Gaillard ; Marc El-Beze ; Eitan Altman ; Emmanuel Ethis

Abstract: Recommendation systems (RS) take advantage of product and user information in order to propose items to consumers. Collaborative, content-based and a few hybrid RS have been developed in the past. In contrast, we propose a new domain-independent semantic RS. By providing textually well-argued recommendations, we aim to give more responsibility to the end user in his decision. The system includes a new similarity measure keeping up both the accuracy of rating predictions and coverage. We propose an innovative way to apply a fast adaptation scheme at a semantic level, providing recommendations and arguments in phase with the very recent past. We have performed several experiments on film data, providing textually well-argued recommendations.

2 0.6967749 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

Author: Kai Zhao ; James Cross ; Liang Huang

Abstract: We present the first provably optimal polynomial time dynamic programming (DP) algorithm for best-first shift-reduce parsing, which applies the DP idea of Huang and Sagae (2010) to the best-first parser of Sagae and Lavie (2006) in a non-trivial way, reducing the complexity of the latter from exponential to polynomial. We prove the correctness of our algorithm rigorously. Experiments confirm that DP leads to a significant speedup on a probablistic best-first shift-reduce parser, and makes exact search under such a model tractable for the first time.

3 0.42155963 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

Author: Baichuan Li ; Jing Liu ; Chin-Yew Lin ; Irwin King ; Michael R. Lyu

Abstract: Social media like forums and microblogs have accumulated a huge amount of user generated content (UGC) containing human knowledge. Currently, most of UGC is listed as a whole or in pre-defined categories. This “list-based” approach is simple, but hinders users from browsing and learning knowledge of certain topics effectively. To address this problem, we propose a hierarchical entity-based approach for structuralizing UGC in social media. By using a large-scale entity repository, we design a three-step framework to organize UGC in a novel hierarchical structure called “cluster entity tree (CET)”. With Yahoo! Answers as a test case, we conduct experiments and the results show the effectiveness of our framework in constructing CET. We further evaluate the performance of CET on UGC organization in both user and system aspects. From a user aspect, our user study demonstrates that, with CET-based structure, users perform significantly better in knowledge learning than using traditional list-based approach. From a system aspect, CET substantially boosts the performance of two information retrieval models (i.e., vector space model and query likelihood language model).

4 0.41947421 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

Author: Qiming Diao ; Jing Jiang

Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant message. On the one hand, people tweets about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments shows that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.

5 0.41590324 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou

Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e.g. co-major, co-university, and cocorporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach. 1

6 0.41543427 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations

7 0.40920368 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts

8 0.40918633 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves

9 0.40565473 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

10 0.4049463 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

11 0.40386355 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

12 0.40153581 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

13 0.40104976 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

14 0.4006601 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution

15 0.39939636 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution

16 0.39902726 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

17 0.39817354 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

18 0.39816344 143 emnlp-2013-Open Domain Targeted Sentiment

19 0.39700356 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

20 0.39572567 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts