acl acl2011 acl2011-174 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zornitsa Kozareva ; Eduard Hovy
Abstract: Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.
Reference: text
sentIndex sentText sentNum sentScore
1 Text mining and data harvesting algorithms have become popular in the computational linguistics community. [sent-2, score-0.33]
2 They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. [sent-3, score-0.502]
3 They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. [sent-4, score-0.239]
4 1 Introduction Text mining / harvesting algorithms have been applied in recent years for various uses, including learning of semantic constraints for verb participants (Lin and Pantel, 2002), related pairs in various relations, such as part-whole (Girju et al. [sent-8, score-0.511]
5 They generally start with one or more seed terms and employ patterns that specify the desired information as it relates to the seed(s). [sent-13, score-0.27]
6 Generally, the harvesting procedure is recursive, in which data (terms or patterns) gathered in one step of a cycle are used as seeds in the following step, to gather more terms or patterns. [sent-17, score-0.395]
7 This method treats the source text as a graph or network, consisting of terms (words) as nodes and inter-term relations as edges. [sent-18, score-0.346]
8 Text mining is a process of network traversal, and faces the standard problems of handling cycles, ranking search alternatives, estimating yield maxima, etc. [sent-20, score-0.37]
9 The computational properties of large networks and large network traversal have been studied intensively (Sabidussi, 1966; Freeman, 1979; Watts and Strogatz, 1998) and especially, over the past years, in the context of the world wide web (Page et al. [sent-21, score-0.814]
10 It sometimes explains why text mining algo- [Footnote 1] These networks are generally far larger and more densely interconnected than the world wide web’s network of pages and hyperlinks. [sent-28, score-0.696]
11 In Section 3 we describe the general harvesting procedure, and follow with an examination of the various statistical properties of implicit semantic networks in Section 4, using our implemented harvester to provide illustrative statistics. [sent-33, score-0.834]
12 While clustering approaches tend to extract general facts, pattern-based approaches have been shown to produce more constrained but accurate lists of semantic terms. [sent-43, score-0.284]
13 Researchers outside computational linguistics have studied complex networks such as the World Wide Web, the Social Web, and the network of scientific papers, among others. [sent-46, score-0.677]
14 They have investigated the properties of these text-based networks with the objective of understanding their structure and applying this knowledge to determine node importance/centrality, connectivity, growth and decay of interest, etc. [sent-47, score-0.544]
15 However, no one has studied the properties of the text-based semantic networks induced by semantic relations between terms with the objective of understanding their structure and applying this knowledge to improve concept discovery. [sent-58, score-0.981]
16 Most relevant to this theme is the work of Steyvers and Tenenbaum (2004), who studied three manually built lexical networks (association norms, WordNet, and Roget’s Thesaurus (Roget, 1911)) and proposed a model of the growth of the semantic structure over time. [sent-59, score-0.606]
17 These networks are limited to the semantic relations among nouns. [sent-60, score-0.622]
18 In this paper we take a step further to explore the statistical properties of semantic networks relating proper names, nouns, verbs, and adjectives. [sent-61, score-0.585]
19 We implement a general harvesting procedure and show its results for these word types. [sent-63, score-0.249]
20 A fundamental difference from the work of Steyvers and Tenenbaum (2004) is that we study very large semantic networks built ‘naturally’ by (millions of) users rather than ‘artificially’ by a small set of experts. [sent-64, score-0.541]
21 The large networks capture the semantic intuitions and knowledge of the collective mass. [sent-65, score-0.541]
22 3 Inducing Semantic Networks in the Web Text mining algorithms such as those mentioned above raise certain questions, such as: Why are some seed terms more powerful (provide a greater yield) than others? [sent-67, score-0.293]
23 On the face of it, one would need to know the structure of the network a priori to be able to provide answers. [sent-72, score-0.253]
24 For example, in the text mining community, (Kozareva and Hovy, 2010b) have shown that one can obtain a quite accurate estimate of the eventual yield of a pattern and seed after only five steps of harvesting. [sent-74, score-0.395]
25 They do not provide an answer, but research from the network community does. [sent-76, score-0.258]
26 To illustrate the properties of networks of the kind induced by semantic relations, and to show the applicability of network research to text harvesting, we implemented a harvesting algorithm and applied it to a representative set of relations and seeds in two languages. [sent-77, score-1.214]
27 Since the goal of this paper is not the development of a new text harvesting algorithm, we implemented a version of an existing one: the so-called DAP (doubly-anchored pattern) algorithm (Kozareva et al. [sent-78, score-0.249]
28 , 2007), (5) can be formulated to learn semantic lexicons and relations for noun, verb and verb+preposition syntactic constructions; (6) functions equally well in different languages. [sent-81, score-0.262]
29 Next we describe the knowledge harvesting procedure and the construction of the text-mined semantic networks. [sent-82, score-0.43]
30 1 Harvesting to Induce Semantic Networks For a given semantic class of interest, say singers, the algorithm starts with a seed example of the class, say Madonna. [sent-84, score-0.324]
31 The seed term is inserted in the lexicosyntactic pattern “class such as seed and *”, which learns on the position of the * new terms of type class. [sent-85, score-0.473]
32 The output of the algorithm is a set of terms for the semantic class. [sent-87, score-0.25]
33 The algorithm is implemented as a breadth-first search and its mechanism is described as follows: The output of the knowledge harvesting algorithm is a network of semantic terms interconnected by the semantic relation captured in the pattern. [sent-88, score-0.967]
34 We can represent the traversed (implicit) network as a directed graph G(V, E) with nodes V (|V| = n) and edges E (|E| = m). [sent-89, score-0.457]
35 For example, given the sentence (where the pattern is in italics and the extracted term is underlined) “He loves singers such as Madonna and Michael Jackson”, two nodes Madonna and Michael Jackson with an edge e=(Madonna, Michael Jackson) would be created in the graph G. [sent-95, score-0.455]
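The breadth-first harvesting loop and the graph construction described above can be sketched as follows. This is a minimal illustration, not the authors' DAP implementation: the function name `harvest`, the toy `corpus`, the `max_steps` cutoff, and the rule that a harvested term is a run of capitalized words are all assumptions made for the example.

```python
import re
from collections import deque


def harvest(sentences, semantic_class, seed, max_steps=5):
    """Breadth-first harvesting with the pattern '<class> such as <seed> and *'.

    Returns the set of discovered terms (nodes) and the set of directed
    edges (seed_term, harvested_term).
    """
    edges = set()
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth >= max_steps:
            continue
        # Assumption: a harvested term is a run of capitalized words.
        pattern = re.compile(
            re.escape(f"{semantic_class} such as {term} and ")
            + r"((?:[A-Z][\w'-]*)(?: [A-Z][\w'-]*)*)"
        )
        for sentence in sentences:
            for match in pattern.finditer(sentence):
                new_term = match.group(1)
                edges.add((term, new_term))
                if new_term not in seen:
                    # Newly learned terms become seeds for the next step.
                    seen.add(new_term)
                    queue.append((new_term, depth + 1))
    return seen, edges


# Hypothetical toy corpus; the paper harvests from the Web instead.
corpus = [
    "He loves singers such as Madonna and Michael Jackson.",
    "We heard singers such as Michael Jackson and Prince on the radio.",
    "Fans of singers such as Prince and Madonna filled the hall.",
]
nodes, edges = harvest(corpus, "singers", "Madonna")
```

Each match adds one directed edge from the term used as seed to the term found in the `*` position, which mirrors the edge e=(Madonna, Michael Jackson) in the example sentence.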
36 The starting seed term Madonna is shown in red color and the harvested terms are in blue. [sent-97, score-0.371]
37 2 Data We harvested data from the Web for a representative selection of semantic classes and relations, of ! [sent-99, score-0.309]
38 , 2005; Pasca, 2007; Kozareva and Hovy, 2010a): semantic classes that can be learned using different seeds (e. [sent-116, score-0.252]
39 , “singers such as Madonna and *” and “singers such as Placido Domingo and *”); semantic classes that are expressed through different lexico-syntactic patterns (e. [sent-118, score-0.31]
40 , “expensive and * car”, “dogs run and *”); semantic relations with more complex lexicosyntactic structure (e. [sent-122, score-0.293]
41 In total, we collected 10GB of data which was part-of-speech tagged with TreeTagger (Schmid, 1994) and used for the semantic term extraction. [sent-131, score-0.243]
42 Table 1 summarizes the number of nodes and edges learned for each semantic network using pattern Pi and the initial seed shown in italics. [sent-132, score-0.799]
43 4 Statistical Properties of Text-Mined Semantic Networks In this section we apply a range of relevant measures from the network analysis community to the networks described above. [sent-134, score-0.618]
44 It measures the degree to which the network structure determines the importance of a node in the network (Sabidussi, 1966; Freeman, 1979). [sent-137, score-0.673]
45 We explore the effect of two centrality measures: indegree and outdegree. [sent-138, score-0.333]
46 The indegree of a node u, denoted as $\mathrm{indegree}(u) = \sum (v, u)$, considers the sum of all incoming edges to u and captures the ability of a semantic term to be discovered by other semantic terms. [sent-139, score-0.847]
47 The outdegree of a node u, denoted as $\mathrm{outdegree}(u) = \sum (u, v)$, considers the number of outgoing edges of the node u and measures the ability of a semantic term to discover new terms. [sent-140, score-0.525]
48 Since harvesting algorithms are notorious for extracting erroneous information, we use the two centrality measures to rerank the harvested elements. [sent-142, score-0.437]
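As a rough sketch of the two centrality measures and the reranking step, the snippet below counts incoming and outgoing edges and sorts terms by outdegree. The edge list is invented for illustration, and breaking ties alphabetically is an assumption, not something the source specifies.

```python
from collections import Counter


def degree_centrality(edges):
    """Count incoming and outgoing edges for every node of a directed graph."""
    indegree, outdegree = Counter(), Counter()
    for u, v in edges:
        outdegree[u] += 1  # u discovered v
        indegree[v] += 1   # v was discovered by u
    return indegree, outdegree


# Hypothetical harvested edges (discoverer, discovered).
edges = [
    ("Madonna", "Michael Jackson"),
    ("Madonna", "Prince"),
    ("Prince", "Madonna"),
    ("Prince", "Michael Jackson"),
]
indegree, outdegree = degree_centrality(edges)

# Rerank harvested terms by outdegree, highest first; ties broken by name.
terms = {node for edge in edges for node in edge}
ranked = sorted(terms, key=lambda term: (-outdegree[term], term))
```

Swapping `outdegree` for `indegree` in the sort key gives the alternative ranking compared in the accuracy table.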
49 Table 2 shows the accuracy2 of the singer semantic terms at different ranks using the in and out degree measures. [sent-143, score-0.395]
50 shows that for the text-mined semantic networks, the ability of a term to discover new terms is more important than the ability to be discovered. [sent-146, score-0.432]
51 Table 3 shows the top and bottom 10 terms of the semantic class. [sent-153, score-0.25]
52 The nodes with high outdegree correspond to famous or contemporary singers. [sent-155, score-0.54]
53 Potentially, knowing which terms have a high outdegree allows one to rerank candidate seeds for more effective harvesting. [sent-157, score-0.528]
54 , 2000) and social networks like Orkut and Flickr, the text-mined semantic networks also exhibit a power-law distribution. [sent-161, score-0.943]
55 This means that while a few terms have a significantly high degree, the majority of the semantic terms have small degree. [sent-162, score-0.319]
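One common way to estimate a power-law exponent γ from a list of node degrees is the continuous maximum-likelihood estimator γ = 1 + n / Σ ln(k_i / k_min). The sketch below assumes that estimator; the source does not say how the exponents were fitted, so treat the function name `powerlaw_exponent` and the `k_min` parameter as illustrative choices.

```python
import math


def powerlaw_exponent(degrees, k_min=1):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only degrees >= k_min enter the fit. This is an approximation for
    integer-valued degrees and is assumed here for illustration.
    """
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / k_min) for k in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree above k_min")
    return 1.0 + len(tail) / log_sum


# Hypothetical outdegree sample: many small values, a few large ones.
sample_degrees = [1, 1, 1, 1, 1, 2, 2, 3, 5, 21]
gamma_estimate = powerlaw_exponent(sample_degrees)
```

Running the estimator separately on the indegree and outdegree lists gives one exponent per distribution, which is the comparison made for γin and γout.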
56 Figure 2 shows the indegree and outdegree distributions for different semantic classes, lexico-syntactic patterns, and languages (English and Spanish). [sent-163, score-0.836]
57 It is interesting to note that the indegree power-law exponents for all semantic networks fall within the same range (γin ≈ 2. [sent-174, score-0.836]
58 4), and similarly for the outdegree exponents (γout ≈ 1. [sent-175, score-0.435]
59 However, the values of the indegree and outdegree exponents differ from each other. [sent-177, score-0.677]
60 The difference in the distributions can be explained by the link asymmetry of semantic terms: A discovering B does not necessarily mean that B will discover A. [sent-180, score-0.266]
61 In the text-mined semantic networks, this asymmetry is caused by patterns of language use, such as the fact that people use first adjectives of the size and then of the color (e. [sent-181, score-0.315]
62 3 Sparsity Another relevant property of the semantic networks concerns sparsity. [sent-186, score-0.541]
63 Sparsity can also be captured through the density of the network. All networks have low density, which suggests that the networks exhibit a sparse connectivity pattern. [sent-190, score-0.841]
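For reference, a standard definition of density for a directed graph without self-loops is m / (n(n − 1)), the fraction of possible directed edges that are present. The sketch below assumes that definition; the node and edge counts in the usage line are invented, not taken from Table 1.

```python
def density(n_nodes, n_edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    if n_nodes < 2:
        return 0.0
    return n_edges / (n_nodes * (n_nodes - 1))


# Hypothetical network with 1,000 terms and 5,000 harvested edges.
example_density = density(1000, 5000)
```

A value close to 0 indicates a sparse network; a value of 1 would mean every term discovers every other term.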
64 Similar behavior was reported for the WordNet and Roget’s semantic networks (Steyvers and Tenenbaum, 2004). [sent-192, score-0.541]
65 4 Connectedness For every network, we computed the strongly connected component (SCC) such that for all nodes (semantic terms) in the SCC, there is a path from any node to another node in the SCC considering the direction of the edges between the nodes. [sent-202, score-0.493]
66 Unlike WordNet and Roget’s semantic networks, where the SCC consists of 96% of all semantic terms, in the text-mined semantic networks only 12 to 55% of the terms are in the SCC. [sent-205, score-1.332]
67 This shows that not all nodes can reach (discover) every other node in the network. [sent-206, score-0.267]
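The share of terms inside the largest strongly connected component can be computed with any SCC algorithm. The sketch below uses an iterative version of Kosaraju's two-pass algorithm, chosen here for brevity; the source does not state which algorithm was used, and the small edge list is invented.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Return the SCCs of a directed graph as a list of node sets (Kosaraju)."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: record nodes in order of DFS completion on the original graph.
    order, visited = [], set()
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph, latest-finished nodes first.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical graph: a 3-cycle plus one term that is only ever discovered.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
components = strongly_connected_components(toy_edges)
largest = max(components, key=len)
scc_fraction = len(largest) / len({n for e in toy_edges for n in e})
```

In the toy graph, `d` can be reached from the cycle but cannot reach it back, so it falls outside the largest SCC, which is the situation described for terms that cannot discover every other term.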
68 5 Path Lengths and Diameter Next, we describe the properties of the shortest paths between the semantic terms in the SCC. [sent-211, score-0.294]
69 The diameter of the SCC is calculated as the maximum distance over all pairs of nodes (u, v), such that a node v is reachable from node u. [sent-215, score-0.497]
70 Table 5 shows the average distance and the diameter of the semantic networks. [sent-216, score-0.302]
71 The diameter shows the maximum number of steps necessary to reach from any node to any other, while the average distance shows the number of steps necessary on average. [sent-219, score-0.298]
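Average distance and diameter can both be read off one breadth-first search per node. The sketch below is an illustration over all reachable ordered pairs of whatever edge list it is given; restricting the input to the SCC's edges, as the text does, is left to the caller, and the function name `path_statistics` is hypothetical.

```python
from collections import defaultdict, deque


def path_statistics(edges):
    """Return (average shortest-path length, diameter) over reachable pairs."""
    graph, nodes = defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        nodes.update((u, v))

    distances = []
    for source in nodes:
        # Unweighted shortest paths from source via breadth-first search.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        distances.extend(d for target, d in dist.items() if target != source)

    if not distances:
        return 0.0, 0
    return sum(distances) / len(distances), max(distances)


# Hypothetical 3-cycle: every node reaches the others in 1 or 2 steps.
average_distance, diameter = path_statistics([("a", "b"), ("b", "c"), ("c", "a")])
```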
72 Overall, all networks have very short average path lengths and small diameters that are consistent with Watts’ finding for small-world networks. [sent-220, score-0.392]
73 Therefore, the yield of harvesting seeds can be predicted within five steps explaining (Kozareva and Hovy, 2010b; Vyas et al. [sent-221, score-0.392]
74 We also compute, for a randomly selected node in the semantic network, how many hops (steps) are necessary on average to reach another node. [sent-223, score-0.722]
75 6 Clustering The clustering coefficient (C) is another measure to study the connectivity structure of the networks (Watts and Strogatz, 1998). [sent-226, score-0.583]
76 The clustering coefficient C for the whole semantic network is the average clustering coefficient of all its nodes, $C = \frac{1}{n} \sum_{i} C_i$. [sent-230, score-0.681]
77 The value of the clustering coefficient ranges over [0, 1], where 0 indicates that the nodes do not have neighbors which are themselves connected, while 1 indicates that all nodes are connected. [sent-231, score-0.455]
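A small sketch of the averaged clustering coefficient follows. It uses the Watts and Strogatz local coefficient and, as a simplifying assumption, ignores edge direction; the source does not spell out how direction was handled, and nodes with fewer than two neighbors are assigned 0 by convention here.

```python
from collections import defaultdict
from itertools import combinations


def clustering_coefficient(edges):
    """Return (network average C, per-node C_i), treating edges as undirected."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)

    local = {}
    for node, adjacent in neighbours.items():
        k = len(adjacent)
        if k < 2:
            local[node] = 0.0  # convention assumed for this sketch
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(adjacent, 2) if b in neighbours[a])
        local[node] = 2.0 * links / (k * (k - 1))

    if not local:
        return 0.0, {}
    return sum(local.values()) / len(local), local


# Hypothetical graph: a triangle a-b-c with a pendant node d attached to c.
C, C_i = clustering_coefficient([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
```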
78 Table 6 shows the clustering coefficient for all text-mined semantic networks together with the number of closed and open triads (footnote 3). [sent-232, score-0.68]
79 Similarly, we are interested in understanding how the nodes of the semantic networks connect to each other. [sent-237, score-0.76]
80 JDD is approximated by the degree correlation function knn which maps the outdegree and the average [Footnote 3] A triad is three nodes that are connected by either two (open triad) or three (closed triad) directed ties. [sent-240, score-0.8]
81 indegree of all nodes connected to a node with that outdegree. [sent-242, score-0.555]
82 High values of knn indicate that high-degree nodes tend to connect to other highdegree nodes (forming a “core” in the network), while lower values of knn suggest that the highdegree nodes tend to connect to low-degree ones. [sent-243, score-0.806]
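The degree correlation function can be approximated as a mapping from each outdegree value to the average indegree of the nodes that nodes of that outdegree point to. The sketch below reads "connected to" as "successor of", which is an assumption; the function name `degree_correlation` and the edge list are illustrative only.

```python
from collections import Counter, defaultdict


def degree_correlation(edges):
    """Map each outdegree k to the mean indegree of the successors of
    nodes whose outdegree is k (a simple knn-style summary)."""
    indegree, outdegree = Counter(), Counter()
    successors = defaultdict(list)
    for u, v in edges:
        outdegree[u] += 1
        indegree[v] += 1
        successors[u].append(v)

    per_outdegree = defaultdict(list)
    for u, targets in successors.items():
        mean_in = sum(indegree[v] for v in targets) / len(targets)
        per_outdegree[outdegree[u]].append(mean_in)

    return {k: sum(vals) / len(vals) for k, vals in sorted(per_outdegree.items())}


# Hypothetical edges: "c" is discovered by three different terms.
knn = degree_correlation([("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")])
```

Plotting the keys against the values on a log-log scale gives the kind of curve described for the figure; a rising curve means high-outdegree terms point to high-indegree terms.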
83 The figure plots the outdegree and the average indegree of the semantic terms in the networks on a log-log scale. [sent-245, score-1.234]
84 We can see that for all networks the high-degree nodes tend to connect to other high-degree ones. [sent-246, score-0.579]
85 8 Assortivity The property of the nodes to connect to other nodes with similar degrees can be captured through the assortivity coefficient r (Newman, 2003). [sent-249, score-0.55]
86 A positive assortivity coefficient means that nodes tend to connect to nodes of similar degree, while a negative coefficient means that nodes are likely to connect to nodes with degree very different from their own. [sent-252, score-0.558]
87 We find that the assortivity coefficient of our semantic networks is positive, ranging from 0. [sent-253, score-0.633]
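A degree-correlation coefficient of this kind can be sketched as the Pearson correlation, taken over all edges, between the outdegree of the source node and the indegree of the target node. Choosing this particular out/in pairing is an assumption made for the example; the source only cites Newman (2003) and does not state which degree pairing was used.

```python
import math
from collections import Counter


def assortativity(edges):
    """Pearson correlation between source outdegree and target indegree.

    `edges` must be a list (it is traversed more than once). Returns 0.0
    when either degree sequence has no variance, where the coefficient
    is undefined; that fallback is a convention of this sketch.
    """
    indegree, outdegree = Counter(), Counter()
    for u, v in edges:
        outdegree[u] += 1
        indegree[v] += 1

    xs = [outdegree[u] for u, _ in edges]
    ys = [indegree[v] for _, v in edges]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical edges; low-outdegree terms point at the popular term "c".
r = assortativity([("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")])
```

The result lies in [-1, 1]: positive values indicate that well-connected terms link to other well-connected terms, negative values the opposite.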
88 In this respect, the semantic networks differ from the Web, which has a negative assortivity (Newman, 2003). [sent-256, score-0.622]
89 5 Discussion The above studies show that many of the properties discovered of the network formed by the web hold also for the networks induced by semantic relations in text mining applications, for various semantic classes, semantic relations, and languages. [sent-259, score-1.415]
90 The small-world phenomenon, for example, holds that any node is connected to any other node in at most six steps. [sent-261, score-0.264]
91 5 the semantic networks also exhibit this phenomenon, we can explain the observation of (Kozareva and Hovy, 2010b) that one can quite accurately predict the relative ‘goodness’ of a seed term (its eventual total yield and the number of steps required to obtain that) within five harvesting steps. [sent-263, score-1.11]
92 We have shown that due to the strongly connected components in text mining networks, not all elements within the harvested graph can discover each other. [sent-264, score-0.316]
93 This implies that harvesting algorithms have to be started with several seeds to obtain adequate Recall (Vyas et al. [sent-265, score-0.326]
94 We have shown that centrality measures can be used successfully to rank harvested terms to guide the network traversal, and to validate the correctness of the harvested terms. [sent-267, score-0.576]
95 6 Conclusion In this paper we describe the implicit ‘hidden’ semantic network graph structure induced over the text of the web and other sources by the semantic relations people use in sentences. [sent-269, score-0.856]
96 We describe how term harvesting patterns whose seed terms are harvested and then applied recursively can be used to discover these semantic term networks. [sent-270, score-0.975]
97 Learning arguments and supertypes of semantic relations using recursive patterns. [sent-347, score-0.262]
98 Espresso: Leveraging generic patterns for automatically harvesting semantic relations. [sent-384, score-0.488]
99 The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. [sent-446, score-0.393]
100 Ranking scientific publications using a simple model of network traffic. [sent-463, score-0.283]
wordName wordTfidf (topN-words)
[('outdegree', 0.382), ('networks', 0.36), ('harvesting', 0.249), ('indegree', 0.242), ('network', 0.222), ('semantic', 0.181), ('nodes', 0.158), ('kozareva', 0.153), ('seed', 0.143), ('singers', 0.141), ('node', 0.109), ('scc', 0.106), ('madonna', 0.106), ('broder', 0.101), ('hops', 0.101), ('tenenbaum', 0.101), ('harvested', 0.097), ('coefficient', 0.092), ('centrality', 0.091), ('degree', 0.089), ('web', 0.084), ('roget', 0.082), ('diameter', 0.082), ('mining', 0.081), ('relations', 0.081), ('assortivity', 0.081), ('seeds', 0.077), ('pasca', 0.073), ('traversal', 0.07), ('terms', 0.069), ('steyvers', 0.066), ('vyas', 0.065), ('knn', 0.065), ('zornitsa', 0.065), ('term', 0.062), ('scientific', 0.061), ('connect', 0.061), ('triad', 0.06), ('watts', 0.06), ('patterns', 0.058), ('singer', 0.056), ('pattern', 0.056), ('discover', 0.054), ('connectivity', 0.053), ('exponents', 0.053), ('etzioni', 0.052), ('eventual', 0.049), ('clauset', 0.049), ('hovy', 0.048), ('newman', 0.047), ('riloff', 0.047), ('clustering', 0.047), ('pagerank', 0.046), ('girju', 0.046), ('jackson', 0.046), ('connected', 0.046), ('eduard', 0.045), ('pantel', 0.044), ('properties', 0.044), ('social', 0.042), ('kleinberg', 0.042), ('suchanek', 0.042), ('cantantes', 0.04), ('difsfeemrenatn', 0.04), ('gasser', 0.04), ('highdegree', 0.04), ('huafeng', 0.04), ('jdd', 0.04), ('preiss', 0.04), ('radicchi', 0.04), ('sabidussi', 0.04), ('sayyadi', 0.04), ('strogatz', 0.04), ('distance', 0.039), ('edges', 0.039), ('graph', 0.038), ('adjectives', 0.038), ('people', 0.038), ('community', 0.036), ('freeman', 0.035), ('bombs', 0.035), ('ranking', 0.035), ('discovery', 0.035), ('studied', 0.034), ('density', 0.034), ('steps', 0.034), ('ability', 0.033), ('shepherd', 0.033), ('interconnected', 0.033), ('path', 0.032), ('citation', 0.032), ('yield', 0.032), ('relation', 0.032), ('wordnet', 0.032), ('distributions', 0.031), ('structure', 0.031), ('classes', 0.031), ('kempe', 0.031), ('xie', 0.031), ('scientists', 
0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 174 acl-2011-Insights from Network Structure for Text Mining
Author: Zornitsa Kozareva ; Eduard Hovy
Abstract: Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.
2 0.15558675 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal
Author: Apoorv Agarwal
Abstract: In my thesis, I propose to build a system that would enable extraction of social interactions from texts. To date I have defined a comprehensive set of social events and built a preliminary system that extracts social events from news articles. I plan to improve the performance of my current system by incorporating semantic information. Using domain adaptation techniques, I propose to apply my system to a wide range of genres. By extracting linguistic constructs relevant to social interactions, I will be able to empirically analyze different kinds of linguistic constructs that people use to express social interactions. Lastly, I will attempt to make convolution kernels more scalable and interpretable.
3 0.15196121 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
Author: Ahmed Hassan ; Amjad AbuJbara ; Rahul Jha ; Dragomir Radev
Abstract: We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in the areas of text classification, analysis of product review, analysis of responses to surveys, and mining online discussions. Identifying the semantic orientation of English words has been extensively studied in literature. Most of this work assumes the existence of resources (e.g. Wordnet, seeds, etc) that do not exist in foreign languages. In this work, we describe a method based on constructing a multilingual network connecting English and foreign words. We use this network to identify the semantic orientation of foreign words based on connection between words in the same language as well as multilingual connections. The method is experimentally tested using a manually labeled set of positive and negative words and has shown very promising results.
4 0.12138372 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping
Author: Tetsuo Kiso ; Masashi Shimbo ; Mamoru Komachi ; Yuji Matsumoto
Abstract: In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graph-based approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti’s Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg’s HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.
5 0.1047722 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
Author: Ivan Titov ; Alexandre Klementiev
Abstract: We propose a non-parametric Bayesian model for unsupervised semantic parsing. Following Poon and Domingos (2009), we consider a semantic parsing setting where the goal is to (1) decompose the syntactic dependency tree of a sentence into fragments, (2) assign each of these fragments to a cluster of semantically equivalent syntactic structures, and (3) predict predicate-argument relations between the fragments. We use hierarchical Pitman-Yor processes to model statistical dependencies between meaning representations of predicates and those of their arguments, as well as the clusters of their syntactic realizations. We develop a modification of the Metropolis-Hastings split-merge sampler, resulting in an efficient inference algorithm for the model. The method is experimentally evaluated by using the induced semantic representation for the question answering task in the biomedical domain.
6 0.10461446 177 acl-2011-Interactive Group Suggesting for Twitter
7 0.099754088 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
8 0.09952607 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
9 0.089341938 217 acl-2011-Machine Translation System Combination by Confusion Forest
10 0.087184116 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
11 0.087025195 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
12 0.081785552 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
13 0.078549907 167 acl-2011-Improving Dependency Parsing with Semantic Classes
14 0.078244902 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD
15 0.077687576 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts
16 0.076795429 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers
17 0.074653879 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content
18 0.074313648 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
19 0.073836043 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis
20 0.069664121 293 acl-2011-Template-Based Information Extraction without the Templates
topicId topicWeight
[(0, 0.182), (1, 0.064), (2, -0.085), (3, 0.005), (4, 0.013), (5, -0.021), (6, -0.011), (7, -0.034), (8, -0.066), (9, -0.089), (10, -0.018), (11, -0.051), (12, 0.052), (13, 0.038), (14, -0.073), (15, -0.164), (16, -0.016), (17, -0.111), (18, -0.0), (19, -0.051), (20, 0.015), (21, 0.065), (22, 0.032), (23, 0.085), (24, 0.055), (25, -0.073), (26, -0.01), (27, 0.189), (28, 0.003), (29, 0.029), (30, 0.011), (31, -0.04), (32, -0.023), (33, -0.036), (34, 0.133), (35, -0.068), (36, -0.022), (37, 0.012), (38, -0.06), (39, 0.05), (40, -0.003), (41, -0.154), (42, -0.062), (43, 0.03), (44, -0.016), (45, 0.063), (46, -0.023), (47, 0.004), (48, -0.011), (49, -0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.93769377 174 acl-2011-Insights from Network Structure for Text Mining
Author: Zornitsa Kozareva ; Eduard Hovy
Abstract: Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.
2 0.69641274 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
Author: Ahmed Hassan ; Amjad AbuJbara ; Rahul Jha ; Dragomir Radev
Abstract: We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in the areas of text classification, analysis of product review, analysis of responses to surveys, and mining online discussions. Identifying the semantic orientation of English words has been extensively studied in literature. Most of this work assumes the existence of resources (e.g. Wordnet, seeds, etc) that do not exist in foreign languages. In this work, we describe a method based on constructing a multilingual network connecting English and foreign words. We use this network to identify the semantic orientation of foreign words based on connection between words in the same language as well as multilingual connections. The method is experimentally tested using a manually labeled set of positive and negative words and has shown very promising results.
3 0.665923 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping
Author: Tetsuo Kiso ; Masashi Shimbo ; Mamoru Komachi ; Yuji Matsumoto
Abstract: In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graph-based approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti’s Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg’s HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.
4 0.62088424 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis
Author: Amjad Abu-Jbara ; Dragomir Radev
Abstract: In this paper we present Clairlib, an open-source toolkit for Natural Language Processing, Information Retrieval, and Network Analysis. Clairlib provides an integrated framework intended to simplify a number of generic tasks within and across those three areas. It has a command-line interface, a graphical interface, and a documented API. Clairlib is compatible with all the common platforms and operating systems. In addition to its own functionality, it provides interfaces to external software and corpora. Clairlib comes with comprehensive documentation and a rich set of tutorials and visual demos.
5 0.55684102 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal
Author: Apoorv Agarwal
Abstract: In my thesis, I propose to build a system that would enable extraction of social interactions from texts. To date I have defined a comprehensive set of social events and built a preliminary system that extracts social events from news articles. I plan to improve the performance of my current system by incorporating semantic information. Using domain adaptation techniques, I propose to apply my system to a wide range of genres. By extracting linguistic constructs relevant to social interactions, I will be able to empirically analyze different kinds of linguistic constructs that people use to express social interactions. Lastly, I will attempt to make convolution kernels more scalable and interpretable.
6 0.53710753 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
7 0.53590214 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
8 0.52387792 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD
9 0.51854408 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
10 0.51006317 177 acl-2011-Interactive Group Suggesting for Twitter
11 0.50415289 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
12 0.49592569 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content
13 0.46865413 200 acl-2011-Learning Dependency-Based Compositional Semantics
14 0.46378279 229 acl-2011-NULEX: An Open-License Broad Coverage Lexicon
15 0.45406288 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts
16 0.45310867 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis
17 0.45032385 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
18 0.44581023 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
19 0.43769857 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
20 0.43453625 315 acl-2011-Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment
topicId topicWeight
[(5, 0.028), (9, 0.017), (17, 0.05), (26, 0.031), (34, 0.232), (37, 0.062), (39, 0.039), (41, 0.048), (55, 0.027), (59, 0.066), (72, 0.033), (91, 0.064), (96, 0.182), (97, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.83093387 174 acl-2011-Insights from Network Structure for Text Mining
Author: Zornitsa Kozareva ; Eduard Hovy
Abstract: Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.
2 0.80889642 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
Author: Ines Rehbein ; Josef Ruppenhofer
Abstract: Active Learning (AL) has been proposed as a technique to reduce the amount of annotated data needed in the context of supervised classification. While various simulation studies for a number of NLP tasks have shown that AL works well on gold-standard data, there is some doubt whether the approach can be successful when applied to noisy, real-world data sets. This paper presents a thorough evaluation of the impact of annotation noise on AL and shows that systematic noise resulting from biased coder decisions can seriously harm the AL process. We present a method to filter out inconsistent annotations during AL and show that this makes AL far more robust when applied to noisy data.
3 0.78228641 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
Author: Colin Cherry ; Shane Bergsma
Abstract: Graph-based dependency parsing can be sped up significantly if implausible arcs are eliminated from the search-space before parsing begins. State-of-the-art methods for arc filtering use separate classifiers to make pointwise decisions about the tree; they label tokens with roles such as root, leaf, or attaches-to-the-left, and then filter arcs accordingly. Because these classifiers overlap substantially in their filtering consequences, we propose to train them jointly, so that each classifier can focus on the gaps of the others. We integrate the various pointwise decisions as latent variables in a single arc-level SVM classifier. This novel framework allows us to combine nine pointwise filters, and adjust their sensitivity using a shared threshold based on arc length. Our system filters 32% more arcs than the independently-trained classifiers, without reducing filtering speed. This leads to faster parsing with no reduction in accuracy.
4 0.69225109 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld
Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint; for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
5 0.69190598 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
Author: Ryan Gabbard ; Marjorie Freedman ; Ralph Weischedel
Abstract: As an alternative to requiring substantial supervised relation training data, many have explored bootstrapping relation extraction from a few seed examples. Most techniques assume that the examples are based on easily spotted anchors, e.g., names or dates. Sentences in a corpus which contain the anchors are then used to induce alternative ways of expressing the relation. We explore whether coreference can improve the learning process. That is, if the algorithm considered examples such as his sister, would accuracy be improved? With coreference, we see on average a 2-fold increase in F-Score. Despite using potentially errorful machine coreference, we see significant increase in recall on all relations. Precision increases in four cases and decreases in six.
6 0.68980348 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
7 0.68812168 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
8 0.68760765 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
9 0.68686301 177 acl-2011-Interactive Group Suggesting for Twitter
10 0.68645275 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
11 0.68568802 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
12 0.68494809 254 acl-2011-Putting it Simply: a Context-Aware Approach to Lexical Simplification
13 0.68469733 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
14 0.68368709 187 acl-2011-Jointly Learning to Extract and Compress
15 0.68359691 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework
16 0.68158078 178 acl-2011-Interactive Topic Modeling
17 0.68131065 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
18 0.68023372 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
19 0.6799252 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
20 0.67948127 117 acl-2011-Entity Set Expansion using Topic information