
133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression


Source: pdf

Author: James Foulds ; Padhraic Smyth

Abstract: When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. Experimental results on corpora from two well-known computer science conferences are used to illustrate and validate the proposed approach.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. [sent-3, score-0.721]

2 In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. [sent-4, score-1.042]

3 Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. [sent-5, score-2.037]

4 When we are first introduced to a new area of scientific study, it would be useful to automatically find the most important articles, and the relationships of influence between articles. [sent-9, score-0.603]

5 The importance of a scientific work has previously been measured chiefly through metrics derived from citation counts, such as impact factors. [sent-12, score-0.617]

6 In this work we introduce topical influence, a quantitative metric for measuring the latter type of scientific influence, defined in the context of an unsupervised generative model for scientific corpora. [sent-18, score-0.834]

7 The model posits that articles “coerce” the articles that cite them into having similar topical content to them. [sent-19, score-0.929]

8 Thus, articles with higher topical influence have a larger effect on the topics of the articles that cite them. [sent-20, score-1.395]

9 We model this influence mechanism via a regression on the parameters of the Dirichlet prior over topics in an LDA-style topic model. [sent-21, score-0.678]

10 We show how the models can be used to recover meaningful influence scores, both for articles and for specific citations. [sent-22, score-0.61]

11 By looking not just at the citation graph but also taking into account the content of the articles, topical influence can provide a better picture of scientific impact than simple citation counts. [sent-23, score-1.799]

12 Measures of importance such as PageRank (Brin and Page, 1998) can be derived recursively from the citation graph. [sent-30, score-0.4]

13 Such graphbased measures do not in general make use of the textual content of the articles, although it is possible to apply them to graphs where the edges between articles are determined based on the similarity of their content instead of the citation graph (Lin, 2008). [sent-31, score-0.637]

14 A variety of methods have previously been proposed for analyzing text and citation links together, such as modeling connections between words and citations (Cohn and Hofmann, 2001), classifying citation function (Teufel et al. [sent-32, score-0.858]

15 , 2006), and jointly modeling citation links and document content (Chang and Blei, 2009). [sent-33, score-0.426]

16 However, these methods do not directly measure article importance or influence relationships between articles given their citations. [sent-34, score-0.858]

17 In their approach, every word is assigned an extra latent variable, namely the cited article whose topic distribution the topic was drawn from. [sent-38, score-0.561]

18 (2007) also assume that the citation graph is bipartite, consisting of one set of citing articles and one set of cited articles; in contrast, our proposed models can handle arbitrary citation graphs in the form of directed 1A somewhat He et al. [sent-41, score-1.264]

19 While both the CIM and our approach can identify the influence of specific citations between articles, our model can also infer how influential each article is overall, and provides a flexible modeling framework which can handle different assumptions about influence. [sent-43, score-0.842]

20 In their model, citing articles “vote” on each cited article’s topic distribution in retrospect, via a network flow model. [sent-47, score-0.591]

21 Since this voting occurs in time-reversed order, it does not describe an influence mechanism and is not a generative model that can simulate or predict new documents. [sent-48, score-0.443]

22 Finally, the document influence model of Gerrish and Blei (2010) can be viewed as orthogonal to this work, in that it models the impact of documents on topics over time (specifically, how topics change over time) rather than how articles influence the specific articles that cite them. [sent-49, score-1.561]

23 Although there are many aspects by which the importance of a scientific article can be judged, in this work we are interested in the extent to which a given article has or will have subsequent articles that build upon it or are otherwise inspired by its ideas. [sent-52, score-0.831]

24 We begin by defining topical influence, a quantitative measure for this type of influence. [sent-53, score-0.461]

25 The presence of a citation from article b to article a therefore indicates that article b may have been influenced by the ideas in article a, to some unknown extent. [sent-57, score-1.245]

26 We hypothesize that the extent of this influence manifests itself in the language of b. [sent-58, score-0.393]

27 Using latent Dirichlet allocation (LDA) topics as a concrete proxy for the vague notion of “ideas”, we define the topical influence of a to be the extent to which article a coerces the documents which cite it to have similar topic distributions to it. [sent-59, score-1.344]

28 Topical influence will be made precise in the context of a generative model for scientific corpora, conditioned on the citation graph, called topical influence regression (TIR). [sent-60, score-1.859]

29 In our case, we want to model the influence that a document has on the topic distributions of the documents that cite it. [sent-71, score-0.675]

30 Let n(d) be the number of words in article d, nk(d) be the number of words in d assigned to topic k, and let C(d) be the set of articles that d cites. [sent-74, score-0.474]

31 Since the entries of z̄(c) sum to one, the topical influence l(c) of article c can be interpreted as the number of words of precision that it adds to the prior of the topic distributions of each document that cites it. [sent-79, score-1.282]

32 Thus, l(c) encodes the degree to which article c influences the topics of each of the articles that cite it. [sent-81, score-0.649]
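The prior construction described in the sentences above can be sketched in a few lines. This is only an illustration, assuming the Dirichlet parameter for article d is a base concentration plus, for each cited article c, the influence l(c) times c's normalized topic histogram. The function names, the symmetric base value `alpha0`, and the toy counts are hypothetical and do not come from the paper.

```python
def normalized_topic_counts(counts):
    """Return the normalized topic histogram (z-bar) for one article."""
    total = sum(counts)
    if total == 0:
        raise ValueError("article has no topic assignments")
    return [c / total for c in counts]


def dirichlet_prior(cited, influence, topic_counts, alpha0=0.1):
    """Assumed form: alpha(d)_k = alpha0 + sum over cited c of l(c) * z_bar(c)_k."""
    num_topics = len(next(iter(topic_counts.values())))
    alpha = [alpha0] * num_topics
    for c in cited:
        z_bar = normalized_topic_counts(topic_counts[c])
        for k in range(num_topics):
            alpha[k] += influence[c] * z_bar[k]
    return alpha


# Toy example: article d cites a (influential) and b (less influential).
topic_counts = {"a": [8, 2, 0], "b": [1, 1, 8]}
influence = {"a": 5.0, "b": 1.0}
alpha_d = dirichlet_prior(["a", "b"], influence, topic_counts)
```

Because each histogram sums to one, every cited article adds exactly l(c) to the total concentration of the prior, which matches the "words of precision" reading.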

33 From another perspective, marginalizing out θ(d), we can view the topic counts (in the standard LDA … Figure 1: The graphical model for the portion of the TIR model connected to article a (the links from the z’s and l’s to the α(d)’s are deterministic). [sent-82, score-0.403]

34 In our model, for each article c cited by article d we place l(c) balls, with colors distributed according to z̄(c), into article d’s urn initially. [sent-93, score-0.824]

35 Thus, article d’s topic assignments are more likely to be similar to those of the more influential articles that it cites. [sent-94, score-0.621]
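The urn view can be simulated directly. The sketch below assumes the urn for article d starts with (possibly fractional) ball weights given by its prior, and that each drawn ball is returned together with one extra ball of the same color. The function name and the toy prior are hypothetical.

```python
import random


def polya_urn_topics(alpha, num_words, rng):
    """Draw topic assignments from a Polya urn with initial weights alpha."""
    weights = list(alpha)          # initial (possibly fractional) ball counts
    assignments = []
    for _ in range(num_words):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        assignments.append(k)
        weights[k] += 1.0          # return the ball plus one more of its color
    return assignments


rng = random.Random(0)
alpha_d = [4.2, 1.2, 0.9]          # hypothetical prior for article d
z_d = polya_urn_topics(alpha_d, 50, rng)
```

Topics that receive a large initial weight from influential cited articles are more likely to be drawn early, and the reinforcement step then makes them more likely again.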

36 The total topical influence T(d) (Equation 2) measures the total impact (in a topical sense) of the article. [sent-101, score-0.501]
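A small sketch of the total score, assuming it is the article-level influence multiplied by the number of articles that cite the article (the breakdown the Table 2 caption describes as l(d) times citation count). The function name and toy data are hypothetical.

```python
def total_topical_influence(influence, cites):
    """Assumed form: T(a) = l(a) * (number of distinct articles citing a).

    `cites` maps each article to the list of articles it cites; every cited
    article must also appear as a key of `influence`.
    """
    citer_counts = {a: 0 for a in influence}
    for d, cited in cites.items():
        for a in set(cited):
            citer_counts[a] += 1
    return {a: influence[a] * citer_counts[a] for a in influence}


cites = {"a": [], "b": [], "d1": ["a"], "d2": ["a", "b"]}
influence = {"a": 2.0, "b": 0.5, "d1": 1.0, "d2": 1.0}
totals = total_topical_influence(influence, cites)
```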

37 2 Generative Model for Topical Influence Regression The full assumed generative process for articles in this model begins with a directed acyclic citation graph G = {V, E}. [sent-104, score-0.61]

38 We assume that G is a DAG so that influence relationships are consistent with some temporal ordering of the articles, and so that the resulting model is a Bayesian network. [sent-107, score-0.429]

39 Here, each vertex vi corresponds to an article di, edge e = (v1, v2) ∈ E iff d1 is cited by d2, and vertices (articles) are numbered in a topological ordering with respect to G. [sent-108, score-0.436]
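One way to obtain such a numbering is Kahn's algorithm, sketched below. This is a generic illustration rather than the authors' preprocessing; the function name and the dictionary representation of the citation graph are assumptions.

```python
from collections import deque


def topological_order(cites):
    """Order articles so every cited article precedes the articles citing it.

    `cites` maps each article to the list of articles it cites; every cited
    article must also be a key. Raises ValueError if the graph has a cycle.
    """
    remaining = {d: len(set(cited)) for d, cited in cites.items()}
    cited_by = {d: [] for d in cites}
    for d, cited in cites.items():
        for c in set(cited):
            cited_by[c].append(d)
    ready = deque(sorted(d for d, n in remaining.items() if n == 0))
    order = []
    while ready:
        c = ready.popleft()
        order.append(c)
        for d in cited_by[c]:
            remaining[d] -= 1
            if remaining[d] == 0:
                ready.append(d)
    if len(order) != len(cites):
        raise ValueError("citation graph contains a cycle")
    return order


order = topological_order({"a": [], "b": ["a"], "c": ["a", "b"]})
```

The cycle check matters in practice, since real citation data can contain loops that have to be broken before the graph is a DAG.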

40 Note that each α(d) is a function of the topics of the documents that it cites, parameterized by their topical influence values. [sent-111, score-0.95]

41 We therefore call this model topical influence regression (TIR). [sent-112, score-0.902]

42 The TIR model provides us with topical influence scores for each article, but it does not tell us about topical influence relationships between specific pairs of cited and citing articles. [sent-113, score-1.973]

43 To model such relationships, we can consider a hierarchical extension to TIR, with edge-wise topical influences for each edge (c, d) of the citation graph, l(c,d) ∼ TruncGaussian(l(c) , σ, l(c,d) ≥ 0). [sent-114, score-0.896]

44 This hierarchical setup allows us to continue to infer article-level topical influences, and provides a mechanism for sharing statistical strength between influences associated with one cited article. [sent-116, score-0.699]
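The edge-wise draw can be illustrated with a simple rejection sampler for a Gaussian truncated at zero. This is a sketch of the distribution only, not of the paper's inference procedure; the function name, the `max_tries` guard, and the toy values are hypothetical.

```python
import random


def sample_truncated_gaussian(mean, sigma, rng, max_tries=100000):
    """Draw from a Gaussian(mean, sigma) restricted to values >= 0 by rejection."""
    for _ in range(max_tries):
        x = rng.gauss(mean, sigma)
        if x >= 0.0:
            return x
    raise RuntimeError("rejection sampler failed; mean is far below zero")


rng = random.Random(1)
article_influence = 2.0                       # hypothetical l(c)
citers = ["d1", "d2", "d3"]
edge_influence = {
    (("c", d)): sample_truncated_gaussian(article_influence, 0.5, rng)
    for d in citers
}
```

All edges leaving the same cited article share one mean, which is how the hierarchy pools information across that article's citations.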

45 3 Relationship to Dirichlet-Multinomial Regression The TIR model can be viewed as an adaption of the Dirichlet-multinomial regression (DMR) framework of Mimno and McCallum (2008) to model topical influence. [sent-119, score-0.509]

46 The DMR model can also be applied to text corpora with citation information, by setting the feature vectors to be binary indicators of the presence of a citation to each article. [sent-121, score-0.73]

47 TIR differs in that the functional form of the regression is parameterized in a way that directly models influence, and also differs in that the regression takes advantage of the content of the cited articles via their topic assignments. [sent-122, score-0.629]
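For contrast, the DMR prior with citation indicators can be sketched as below, assuming the usual log-linear form in which the Dirichlet parameter for topic k is the exponential of a dot product between the document's feature vector and a per-topic weight vector. The function name and the toy weights are hypothetical.

```python
import math


def dmr_prior(features, lambdas):
    """Assumed DMR form: alpha_k = exp(sum_j features[j] * lambdas[k][j])."""
    return [
        math.exp(sum(x * w for x, w in zip(features, lam_k)))
        for lam_k in lambdas
    ]


# Binary indicators: this document cites article 0 but not article 1.
features = [1, 0]
lambdas = [[0.0, 5.0], [math.log(2.0), 0.0]]   # one weight vector per topic
alpha = dmr_prior(features, lambdas)
```

Here a citation shifts the prior through free regression weights, whereas in TIR the shift is tied to the cited article's own topic assignments and scaled by a single influence value.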

48 Because an article’s prior over topic distributions depends on the topic assignments of the articles that it cites, TIR induces a network of dependencies between the topic assignments of the documents. [sent-123, score-0.547]

49 Specifically, if we collapse out Θ, the dependencies between the z’s of each document form a Bayesian network whose graph is the citation graph. [sent-124, score-0.484]

50 To illustrate this, Figure 2 shows an example citation graph and the resulting Bayesian network. [sent-126, score-0.393]

51 In the figure, an edge in (a) from c to d corresponds to a citation of c by d. [sent-127, score-0.397]

52 Conditioned on the topics, the dependence relationships between z nodes in (b) follow the same structure as the citation graph. [sent-128, score-0.401]

53 In the case of TIR, in the collapsed model the full conditional posterior for the topical influence values l is Pr(l|z, λ) ∝ Pr(z|l)Pr(l|λ). [sent-144, score-0.846]

54 The topical influence values l can be sampled using Metropolis-Hastings updates, or slice sampling. [sent-146, score-0.824]
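A minimal sketch of one Metropolis-Hastings update follows. It assumes the collapsed likelihood of each document is the standard Dirichlet-multinomial (Polya) form, that the prior parameter is a base value plus influence-weighted normalized topic counts of cited articles, and that Pr(l|λ) is an exponential prior with rate `lam`. The exponential prior, the random-walk proposal, and all names and toy data are assumptions made for illustration.

```python
import math
import random


def log_polya(counts, alpha):
    """Collapsed log-probability of topic assignments with these counts."""
    n, a = sum(counts), sum(alpha)
    out = math.lgamma(a) - math.lgamma(n + a)
    for n_k, a_k in zip(counts, alpha):
        out += math.lgamma(n_k + a_k) - math.lgamma(a_k)
    return out


def log_posterior(l, cites, counts, alpha0, lam):
    """log Pr(z | l) + log Pr(l | lam), up to an additive constant."""
    total = -lam * sum(l.values())             # assumed exponential prior
    for d, cited in cites.items():
        alpha = [alpha0] * len(counts[d])
        for c in cited:
            n_c = sum(counts[c])               # must be non-zero
            for k, n_ck in enumerate(counts[c]):
                alpha[k] += l[c] * n_ck / n_c
        total += log_polya(counts[d], alpha)
    return total


def mh_step(a, l, cites, counts, rng, alpha0=0.1, lam=1.0, step=0.5):
    """One random-walk update of l[a]; returns True if the move is accepted."""
    proposal = dict(l)
    proposal[a] = l[a] + rng.gauss(0.0, step)
    if proposal[a] < 0.0:
        return False                           # outside the support: reject
    log_ratio = (log_posterior(proposal, cites, counts, alpha0, lam)
                 - log_posterior(l, cites, counts, alpha0, lam))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        l[a] = proposal[a]
        return True
    return False


counts = {"a": [9, 1], "d1": [8, 2], "d2": [7, 3]}
cites = {"a": [], "d1": ["a"], "d2": ["a"]}
l = {"a": 1.0, "d1": 1.0, "d2": 1.0}
rng = random.Random(0)
for _ in range(200):
    mh_step("a", l, cites, counts, rng)
```

The sketch recomputes the full posterior for clarity; only the documents that cite `a` actually change, so a practical sampler would restrict the sum to those.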

55 The derivative of the log-likelihood with respect to the topical influence l(a) of article a is d log Pr(z|l) / d l(a) = Σ_{d : a ∈ C(d)} … [sent-149, score-1.026]
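As a worked sketch of what such a gradient looks like, suppose each document's collapsed likelihood is the Dirichlet-multinomial form and that the prior parameter is $\alpha_k^{(d)} = \alpha + \sum_{c \in C(d)} l^{(c)} \bar z_k^{(c)}$. Both are assumptions here, and the result below is derived from them rather than copied from the paper. Since $\partial \alpha_k^{(d)} / \partial l^{(a)} = \bar z_k^{(a)}$ and the $\bar z_k^{(a)}$ sum to one, the chain rule gives:

```latex
\frac{\partial \log \Pr(z \mid l)}{\partial l^{(a)}}
  = \sum_{d \,:\, a \in C(d)} \Bigg[
      \Psi\Big(\sum_k \alpha_k^{(d)}\Big)
      - \Psi\Big(n^{(d)} + \sum_k \alpha_k^{(d)}\Big)
      + \sum_k \bar z_k^{(a)}
        \Big( \Psi\big(n_k^{(d)} + \alpha_k^{(d)}\big)
            - \Psi\big(\alpha_k^{(d)}\big) \Big)
    \Bigg]
```

Here $\Psi$ is the digamma function, $n^{(d)}$ is the number of words in article d, and $n_k^{(d)}$ is the number of those words assigned to topic k. The contribution of the prior on l is not included.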

56 The corpora both contained a small number (53 and 14, respectively) of citation graph loops due to insider knowledge of simultaneous publications. [sent-164, score-0.422]

57 This corresponds to only transmitting influence information downward in the citation DAG, but not transmitting “reverse influence” information upwards. [sent-167, score-0.816]

58 Preliminary experiments on synthetic data indicated that this did not significantly impact the ability of the model to recover the topical influence weights. [sent-168, score-0.892]

59 Figure 3: Topical influence per edge versus number of times cited by the citing article (NIPS). [sent-182, score-0.916]

60 With this in mind, we explore how topical influence scores relate to document metadata, which serves as a proxy for ground truth. [sent-188, score-0.885]

61 In many cases, if article c is repeatedly cited in the text of article d it may indicate that d builds heavily on c. [sent-189, score-0.579]

62 Overall, the “most influential” references were cited 171 times in the text of their citing articles, while the “least influential” references were cited 128 times. [sent-191, score-0.464]

63 Of the 45 articles where the counts were not tied, the most influential references had the higher citation counts 33 times. [sent-192, score-0.746]

64 A sign test rejects the null hypothesis that the median difference in citation counts between least and most influential references is zero at α = 0. [sent-193, score-0.519]
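The sign test on the non-tied pairs can be reproduced with an exact binomial tail. The sketch below assumes a two-sided test and uses the counts reported above (45 non-tied articles, 33 of which favor the most influential reference); whether the authors used a one-sided or two-sided version is not stated in this extract, and the function name is hypothetical.

```python
from math import comb


def sign_test_p_value(wins, losses):
    """Two-sided exact sign test p-value under a fair-coin null, ignoring ties."""
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


p = sign_test_p_value(33, 12)
```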

65 Authors often build upon their own work, so we would expect self-citations to have higher edge-wise topical influence on average. [sent-197, score-0.85]

66 For ACL the mean topical influence for a self citation edge is 2. [sent-198, score-1.253]

67 We selected roughly 10% of the articles in each corpus (170 and 330 documents for NIPS and ACL, respectively) for testing, chosen among the articles that made at least one citation. [sent-208, score-0.411]

68 For each algorithm, we burned in for 250 iterations, then executed 1000 iterations, optimizing topical influence weights/DMR parameters every 10th iteration. [sent-215, score-0.824]

69 ad superior predictive performance to LDA on these corpora, demonstrating that topical influence has predictive value (Table 1). [sent-218, score-0.824]

70 3 Exploring Topical Influence In this section we explore the inferred topical influence scores l(d), total topical influence scores T(d) and edge-wise topical influence scores l(c,d) (recall their definitions in Equations 1, 2 and 3, respectively). [sent-222, score-2.539]

71 Table 2 shows the most influential articles in the ACL corpus, according to citation counts, topical influence and total topical influence (the latter two inferred with the TIR model). [sent-223, score-2.391]

72 However, the BLEU article has a relatively low topical influence value of 0. [sent-227, score-1.026]

73 We emphasize that topical influence measures a specific dimension of scientific importance, namely the tendency of an article to influence the ideas (as mediated by the topics) of citing articles; papers with low topical influence such as the BLEU article may be important for other reasons. [sent-229, score-2.819]

74 Ranking papers by their influence weights l(d) (Table 2, middle) has the opposite difficulty to ranking by citation counts: the papers with the highest topical influence were typically cited only once, by the same authors. [sent-230, score-1.888]

75 A more useful metric, however, is the total topical influence (the bottom sub-table in Table 2). [sent-233, score-0.851]

76 This is the total number of words of prior concentration, summed over all of its citers, that the article has contributed, and is a measure of the total corpus-wide topical influence of the paper. [sent-234, score-1.109]

77 The ACL paper with the highest total topical influence, by David Chiang, won the ACL best paper award in 2005. [sent-236, score-0.458]

78 Figure 4: Topical influence for self and non-self citation edges. [sent-239, score-0.79]

79 Thus, although it has an important role as a landmark neural network success story, it does not score highly in terms of topical influence. [sent-249, score-0.461]

80 This paper is ranked 13th according to total topical influence, with a score of 1. [sent-250, score-0.458]

81 The top two ranked papers according to total topical influence, on Gaussian Process Regression and POMDPs respectively, were both seminal papers that spawned large bodies of related work. [sent-252, score-0.554]

82 It is only referenced three times, but has a very high topical influence of 19. [sent-255, score-0.853]

83 Although it is easy to see why this paper scores highly on topical influence, in this case the metric has perhaps overstated its importance. [sent-258, score-0.431]

84 A limitation of topical influence is that it can potentially give more credit than is due when an article is cited by a small number of topically similar papers, due to overfitting. [sent-259, score-0.842]

85 Using the TIRE model, we can also look at influence relationships between pairs of articles. [sent-262, score-0.429]

86 Tables 4 and 5 show the most and least topically influential references, and the most and least influenced citing papers, for three example articles from ACL and NIPS, respectively. [sent-263, score-0.493]

87 The model correctly assigns higher influence scores along the edges to and from relevant documents. [sent-264, score-0.42]

88 For the ACL papers, the BLEU algorithm’s article is inferred to have zero topical influence on Chiang’s paper, consistent with its role … Table caption: topical influence l(d) inferred by TIR (middle), and total topical influence T(d) inferred by TIR (bottom). [sent-265, score-1.997]

89 For total topical influence, the breakdown of T(d) = l(d) × citation count is shown in parentheses. [sent-266, score-0.823]

90 by TIR (middle), and total topical influence T(d) inferred by TIR (bottom). [sent-267, score-0.891]

91 Table 5: Least and most influential references and citers, and the influence weights along these edges, inferred by the TIRE model for three example NIPS articles. [sent-374, score-0.552]

92 In the NIPS corpus, the article by Bengio and Frasconi, on recurrent neural network architectures, extends previous work by the same authors, which is correctly assigned the highest topical influence. [sent-377, score-0.663]

93 A particularly interesting case is the paper by Dayan and Hinton, which is heavily influenced by a paper by Moore, and in turn strongly influences a later paper by Moore, thus illustrating the interplay of scientific influence between authors along the citation graph. [sent-378, score-1.034]

94 6 Conclusions / Discussion This paper introduced the notion of topical influence, a quantitative measure of scientific impact which arises from a latent variable model called topical influence regression. [sent-380, score-1.526]

95 The model builds upon the ideas of Dirichlet-multinomial regression to encode influence relationships between articles along the citation graph. [sent-381, score-0.763]

96 By training TIR, we can recover topical influence scores that give us insight into the impact of scientific articles. [sent-382, score-1.066]

97 In future work, the proposed framework could readily be extended to model other aspects of scientific influence, such as the effects of authors and journals on topical influence, and to exploit the context in which citations occur. [sent-384, score-0.733]

98 From an exploratory analysis perspective, it would be instructive to compare topical influence trajectories over time for different papers. [sent-385, score-0.824]

99 This could be further facilitated by explicitly modeling the dynamics of each article’s topical influence score. [sent-386, score-0.824]

100 The TIR framework could potentially also be applicable to other application domains such as modeling how interpersonal influence affects the spread of memes via social media. [sent-387, score-0.393]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('topical', 0.431), ('influence', 0.393), ('citation', 0.365), ('tir', 0.346), ('article', 0.202), ('articles', 0.192), ('cited', 0.175), ('scientific', 0.174), ('citations', 0.128), ('influential', 0.119), ('dmr', 0.115), ('citing', 0.114), ('cite', 0.114), ('tire', 0.1), ('cites', 0.086), ('topic', 0.08), ('polya', 0.08), ('regression', 0.078), ('topics', 0.073), ('influences', 0.068), ('xc', 0.067), ('lda', 0.061), ('document', 0.061), ('nips', 0.059), ('dietz', 0.058), ('mimno', 0.048), ('papers', 0.048), ('zk', 0.046), ('cim', 0.043), ('shaparenko', 0.043), ('kd', 0.043), ('urn', 0.043), ('impact', 0.043), ('inferred', 0.04), ('ideas', 0.038), ('balls', 0.038), ('cun', 0.038), ('relationships', 0.036), ('counts', 0.035), ('wallach', 0.035), ('pr', 0.035), ('importance', 0.035), ('influenced', 0.034), ('gerrish', 0.034), ('venue', 0.034), ('topically', 0.034), ('self', 0.032), ('edge', 0.032), ('dirichlet', 0.032), ('teufel', 0.03), ('truncated', 0.03), ('brin', 0.03), ('network', 0.03), ('blei', 0.03), ('quantitative', 0.03), ('prior', 0.029), ('xk', 0.029), ('annealed', 0.029), ('citers', 0.029), ('eoas', 0.029), ('foulds', 0.029), ('insider', 0.029), ('lmeoas', 0.029), ('predictively', 0.029), ('referenced', 0.029), ('smyth', 0.029), ('tciea', 0.029), ('topicflow', 0.029), ('transmitting', 0.029), ('xkc', 0.029), ('griffiths', 0.028), ('graph', 0.028), ('zi', 0.028), ('assignments', 0.028), ('documents', 0.027), ('topological', 0.027), ('nallapati', 0.027), ('gibbs', 0.027), ('edges', 0.027), ('total', 0.027), ('dl', 0.026), ('parameterized', 0.026), ('upon', 0.026), ('graphs', 0.025), ('generative', 0.025), ('bleu', 0.025), ('held', 0.025), ('mechanism', 0.025), ('recover', 0.025), ('nk', 0.024), ('latent', 0.024), ('likelihood', 0.024), ('tools', 0.024), ('color', 0.024), ('az', 0.023), ('lse', 0.023), ('flu', 0.023), ('handwritten', 0.023), ('cohn', 0.022), ('collapsed', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression


2 0.15059802 24 emnlp-2013-Application of Localized Similarity for Web Documents

Author: Peter Rebersek ; Mateja Verlic

Abstract: In this paper we present a novel approach to automatic creation of anchor texts for hyperlinks in a document pointing to similar documents. Methods used in this approach rank parts of a document based on the similarity to a presumably related document. Ranks are then used to automatically construct the best anchor text for a link inside original document to the compared document. A number of different methods from information retrieval and natural language processing are adapted for this task. Automatically constructed anchor texts are manually evaluated in terms of relatedness to linked documents and compared to baseline consisting of originally inserted anchor texts. Additionally we use crowdsourcing for evaluation of original anchors and automatically constructed anchors. Results show that our best adapted methods rival the precision of the baseline method.

3 0.12005487 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.

4 0.088414915 5 emnlp-2013-A Discourse-Driven Content Model for Summarising Scientific Articles Evaluated in a Complex Question Answering Task

Author: Maria Liakata ; Simon Dobnik ; Shyamasree Saha ; Colin Batchelor ; Dietrich Rebholz-Schuhmann

Abstract: We present a method which exploits automatically generated scientific discourse annotations to create a content model for the summarisation of scientific articles. Full papers are first automatically annotated using the CoreSC scheme, which captures 11 content-based concepts such as Hypothesis, Result, Conclusion etc at the sentence level. A content model which follows the sequence of CoreSC categories observed in abstracts is used to provide the skeleton of the summary, making a distinction between dependent and independent categories. Summary creation is also guided by the distribution of CoreSC categories found in the full articles, in order to adequately represent the article content. Finally, we demonstrate the usefulness of the summaries by evaluating them in a complex question answering task. Results are very encouraging as summaries of papers from automatically obtained CoreSCs enable experts to answer 66% of complex content-related questions designed on the basis of paper abstracts. The questions were answered with a precision of 75%, where the upper bound for human summaries (abstracts) was 95%.

5 0.084753826 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models

Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao

Abstract: One of the language phenomena that n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all ngrams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document.

6 0.084057108 121 emnlp-2013-Learning Topics and Positions from Debatepedia

7 0.066680439 41 emnlp-2013-Building Event Threads out of Multiple News Articles

8 0.064914174 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

9 0.064482436 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

10 0.062702052 61 emnlp-2013-Detecting Promotional Content in Wikipedia

11 0.057370357 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

12 0.055542909 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication

13 0.051918309 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

14 0.051416997 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

15 0.050706998 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

16 0.05061873 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

17 0.048716929 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation

18 0.046534251 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

19 0.045988023 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model

20 0.040276136 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.148), (1, 0.041), (2, -0.063), (3, 0.051), (4, 0.015), (5, -0.052), (6, 0.064), (7, 0.035), (8, -0.022), (9, -0.107), (10, -0.06), (11, -0.107), (12, -0.128), (13, 0.062), (14, 0.044), (15, 0.109), (16, 0.061), (17, 0.023), (18, -0.074), (19, 0.024), (20, -0.084), (21, 0.065), (22, -0.049), (23, 0.143), (24, 0.054), (25, -0.032), (26, 0.142), (27, 0.005), (28, 0.124), (29, -0.013), (30, 0.036), (31, 0.011), (32, -0.259), (33, 0.009), (34, 0.012), (35, -0.041), (36, -0.011), (37, -0.029), (38, 0.024), (39, -0.075), (40, 0.053), (41, -0.031), (42, 0.067), (43, -0.107), (44, -0.07), (45, 0.042), (46, -0.095), (47, 0.032), (48, 0.215), (49, -0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97563469 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression


2 0.62658775 24 emnlp-2013-Application of Localized Similarity for Web Documents

Author: Peter Rebersek ; Mateja Verlic

Abstract: In this paper we present a novel approach to automatic creation of anchor texts for hyperlinks in a document pointing to similar documents. Methods used in this approach rank parts of a document based on the similarity to a presumably related document. Ranks are then used to automatically construct the best anchor text for a link inside original document to the compared document. A number of different methods from information retrieval and natural language processing are adapted for this task. Automatically constructed anchor texts are manually evaluated in terms of relatedness to linked documents and compared to baseline consisting of originally inserted anchor texts. Additionally we use crowdsourcing for evaluation of original anchors and automatically constructed anchors. Results show that our best adapted methods rival the precision of the baseline method.

3 0.58438826 121 emnlp-2013-Learning Topics and Positions from Debatepedia

Author: Swapna Gottipati ; Minghui Qiu ; Yanchuan Sim ; Jing Jiang ; Noah A. Smith

Abstract: We explore Debatepedia, a community-authored encyclopedia of sociopolitical debates, as evidence for inferring a low-dimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.

4 0.52156484 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models

Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao

Abstract: One of the language phenomena that n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all ngrams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document.

5 0.51761836 199 emnlp-2013-Using Topic Modeling to Improve Prediction of Neuroticism and Depression in College Students

Author: Philip Resnik ; Anderson Garron ; Rebecca Resnik

Abstract: We investigate the value-add of topic modeling in text analysis for depression, and for neuroticism as a strongly associated personality measure. Using Pennebaker’s Linguistic Inquiry and Word Count (LIWC) lexicon to provide baseline features, we show that straightforward topic modeling using Latent Dirichlet Allocation (LDA) yields interpretable, psychologically relevant “themes” that add value in prediction of clinical assessments.

6 0.46764976 61 emnlp-2013-Detecting Promotional Content in Wikipedia

7 0.46663502 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

8 0.46076792 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches

9 0.44690022 5 emnlp-2013-A Discourse-Driven Content Model for Summarising Scientific Articles Evaluated in a Complex Question Answering Task

10 0.44325995 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

11 0.3918663 34 emnlp-2013-Automatically Classifying Edit Categories in Wikipedia Revisions

12 0.39170694 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

13 0.33678326 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

14 0.32801905 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

15 0.32131609 41 emnlp-2013-Building Event Threads out of Multiple News Articles

16 0.31386635 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

17 0.31384841 138 emnlp-2013-Naive Bayes Word Sense Induction

18 0.28946707 26 emnlp-2013-Assembling the Kazakh Language Corpus

19 0.28767642 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model

20 0.27212653 23 emnlp-2013-Animacy Detection with Voting Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.066), (9, 0.035), (16, 0.01), (18, 0.03), (22, 0.065), (30, 0.059), (47, 0.015), (50, 0.013), (51, 0.157), (65, 0.287), (66, 0.046), (71, 0.027), (74, 0.011), (75, 0.034), (77, 0.024), (96, 0.018)]
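The topic weights above are sparse (topicId, topicWeight) pairs. The listing does not say which measure produces the simValue scores, so the following is only a minimal sketch, assuming cosine similarity over such sparse vectors; the helper names `top_topics` and `cosine`, and the second vector `other_paper`, are hypothetical and used for illustration only.

```python
import math

# Sparse (topicId, topicWeight) pairs copied from the listing above.
paper_133 = [(3, 0.066), (9, 0.035), (16, 0.01), (18, 0.03), (22, 0.065),
             (30, 0.059), (47, 0.015), (50, 0.013), (51, 0.157), (65, 0.287),
             (66, 0.046), (71, 0.027), (74, 0.011), (75, 0.034), (77, 0.024),
             (96, 0.018)]

# Hypothetical topic vector for some other paper (not taken from the listing).
other_paper = [(51, 0.10), (65, 0.20), (12, 0.30)]


def top_topics(weights, k=3):
    """Return the k (topicId, topicWeight) pairs with the largest weights."""
    return sorted(weights, key=lambda tw: tw[1], reverse=True)[:k]


def cosine(a, b):
    """Cosine similarity of two sparse topic vectors given as (id, weight) pairs.

    Assumption: the listing's actual similarity measure is not documented;
    cosine is used here only as a common choice.
    """
    da, db = dict(a), dict(b)
    # Topics missing from one vector contribute zero to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(top_topics(paper_133))
    print(cosine(paper_133, other_paper))
```

Under this sketch, topics 65 and 51 carry the most weight for this paper, so papers that concentrate on the same two topics would rank closest.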

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80556798 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression

Author: James Foulds ; Padhraic Smyth

Abstract: When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. Experimental results on corpora from two well-known computer science conferences are used to illustrate and validate the proposed approach.

2 0.72464168 108 emnlp-2013-Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data

Author: Varada Kolhatkar ; Heike Zinsmeister ; Graeme Hirst

Abstract: Interpreting anaphoric shell nouns (ASNs) such as this issue and this fact is essential to understanding virtually any substantial natural language text. One obstacle in developing methods for automatically interpreting ASNs is the lack of annotated data. We tackle this challenge by exploiting cataphoric shell nouns (CSNs) whose construction makes them particularly easy to interpret (e.g., the fact that X). We propose an approach that uses automatically extracted antecedents of CSNs as training data to interpret ASNs. We achieve precisions in the range of 0.35 (baseline = 0.21) to 0.72 (baseline = 0.44), depending upon the shell noun.

3 0.59429038 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier

Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.

4 0.55575734 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.

5 0.55119908 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou

Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e.g. co-major, co-university, and co-corporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.

6 0.54916757 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

7 0.54908407 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

8 0.54868418 152 emnlp-2013-Predicting the Presence of Discourse Connectives

9 0.54651207 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

10 0.54635644 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts

11 0.54618841 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

12 0.54423255 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach

13 0.54382992 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training

14 0.54273927 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

15 0.54224116 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

16 0.54215485 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

17 0.5409146 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

18 0.54056567 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing

19 0.54052848 143 emnlp-2013-Open Domain Targeted Sentiment

20 0.53984547 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs