emnlp emnlp2013 emnlp2013-182 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jimmy Dubuisson ; Jean-Pierre Eckmann ; Christian Scheible ; Hinrich Schutze
Abstract: Studies of the graph of dictionary definitions (DD) (Picard et al., 2009; Levary et al., 2012) have revealed strong semantic coherence of local topological structures. The techniques used in these papers are simple and the main results are found by understanding the structure of cycles in the directed graph (where words point to definitions). Based on our earlier work (Levary et al., 2012), we study a different class of word definitions, namely those of the Free Association (FA) dataset (Nelson et al., 2004). These are responses by subjects to a cue word, which are then summarized by a directed, free association graph. We find that the structure of this network is quite different from both the Wordnet and the dictionary networks. This difference can be explained by the very nature of free association as compared to the more “logical” construction of dictionaries. It thus sheds some (quantitative) light on the psychology of free association. In NLP, semantic groups or clusters are interesting for various applications such as word sense disambiguation. The FA graph is tighter than the DD graph, because of the large number of triangles. This also makes drift of meaning quite measurable so that FA graphs provide a quantitative measure of the semantic coherence of small groups of words.
Reference: text
sentIndex sentText sentNum sentScore
1 The techniques used in these papers are simple and the main results are found by understanding the structure of cycles in the directed graph (where words point to definitions). [sent-8, score-0.625]
2 Traditionally, a popular lexical database of English is Wordnet (Miller, 1995; Miller and Fellbaum, 1998), which organizes the semantic network in terms of graph theory. [sent-20, score-0.277]
3 For example, it has become clear more recently that cycles and triangles play an important role in semantic networks, see e. [sent-22, score-0.7]
4 For DD, the nodes are words and the directed edges point from a word to its definition(s). [sent-32, score-0.336]
5 For FA, the nodes are again words, and each cue word has a directed edge to each association it elicits. [sent-33, score-0.351]
6 Although the links in these graphs were not constructed by following a rational centralized process, their graph exhibits very specific features and we concentrate on the study of its topological properties. [sent-34, score-0.418]
7 In particular, we find that the main strongly connected component of the FA graph (the so-called core) is very cyclic in nature and contains a large predominance of short cycles (i. [sent-44, score-0.639]
8 In contrast to the DD graph, bunches of triangles form well-delimited lexical fields of collective semantic knowledge. [sent-47, score-0.547]
9 To show the semantic coherence of these lexical fields of the FA graph, we perform an experiment with human raters and find that cycles are strongly semantically connected even when compared to close neighbors in the graph. [sent-52, score-0.458]
10 , in graphs with few links per node, triangles are extremely rare. [sent-56, score-0.542]
11 Therefore, if one does find many triangles in a graph, they must be not only a signal of non-randomness, but carry relevant information about the domain of research as shown earlier (Eckmann and Moses, 2002). [sent-57, score-0.394]
12 A directed graph is a pair G = (V, E) of a set V of vertices and a set E of ordered pairs of vertices, also called directed edges. [sent-73, score-1.037]
13 A vertex with null in-degree is called a source and a vertex with null out-degree is called a sink. [sent-80, score-0.306]
14 A directed path is a sequence of vertices such that a directed edge exists between each consecutive pair of vertices of the graph. [sent-81, score-0.942]
15 A directed graph is said to be strongly connected, (resp. [sent-82, score-0.427]
16 weakly connected) if for every pair of vertices in the graph, there exists a directed path (resp. [sent-83, score-0.504]
17 weakly connected component, WCC) of a directed graph G is a maximal strongly connected (resp. [sent-86, score-0.596]
18 A directed cycle is a directed path such that its start vertex is the same as its end vertex. [sent-88, score-0.633]
19 A colink is a directed cycle of length 2 and a triangle a directed cycle of length 3. [sent-89, score-0.638]
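The definitions of co-links and triangles above can be made concrete with a small sketch. It assumes the graph is given as a list of (source, target) pairs; the function names and the toy edge list are illustrative and do not come from the paper.

```python
def colinks(edges):
    """Return co-links (directed 2-cycles) as unordered vertex pairs."""
    edge_set = set(edges)
    return {frozenset((u, v)) for (u, v) in edge_set
            if u != v and (v, u) in edge_set}


def directed_triangles(edges):
    """Return directed 3-cycles, each reported once as a rotated tuple."""
    edge_set = set(edges)
    succ = {}
    for u, v in edge_set:
        succ.setdefault(u, set()).add(v)
    found = set()
    for u, v in edge_set:
        if u == v:
            continue
        for w in succ.get(v, ()):
            if w != u and w != v and (w, u) in edge_set:
                # Rotate so the smallest vertex comes first; orientation is kept.
                cyc = (u, v, w)
                k = cyc.index(min(cyc))
                found.add(cyc[k:] + cyc[:k])
    return found


# Hypothetical toy graph: one co-link (a, b) and one triangle a -> b -> c -> a.
toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "d")]
toy_colinks = colinks(toy_edges)
toy_triangles = directed_triangles(toy_edges)
```

Each triangle is found three times (once per starting edge) and deduplicated by rotation, so the two possible orientations of a vertex triple remain distinct cycles.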
20 The distance between two vertices in a graph is the number of edges in the shortest path connecting them. [sent-90, score-0.634]
21 The characteristic path length is the average distance between any two vertices of G. [sent-92, score-0.325]
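As a rough illustration of the distance and characteristic path length definitions, the sketch below runs a breadth-first search from every vertex. It assumes an adjacency-list dictionary and, as a simplification not stated in the text, averages only over ordered pairs that are actually reachable.

```python
from collections import deque


def bfs_distances(succ, source):
    """Shortest directed distances (in edges) from source to reachable vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(succ):
    """Average distance over all ordered pairs (s, t), s != t, with t reachable from s."""
    nodes = set(succ) | {v for targets in succ.values() for v in targets}
    total = 0
    pairs = 0
    for s in nodes:
        for t, d in bfs_distances(succ, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")


# Hypothetical example: a directed 3-cycle, where distances are 1 and 2.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
cpl = characteristic_path_length(cycle)
```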
22 The density of a directed graph G(V, E) is the proportion of existing edges over the total number of possible edges and is defined as: $d = \frac{|E|}{|V| (|V| - 1)}$. The neighborhood $N_i$ of a vertex $v_i$ is $N_i = \{v_j : e_{ij} \in E \text{ or } e_{ji} \in E\}$. [sent-93, score-0.864]
23 The local clustering coefficient $C_i$ for a vertex $v_i$ corresponds to the density of its neighborhood subgraph. [sent-94, score-0.372]
24 For a directed graph, it is thus given by: $C_i = \frac{|\{e_{jk} : v_j, v_k \in N_i, e_{jk} \in E\}|}{|N_i| (|N_i| - 1)}$. The clustering coefficient of a graph G is the average of the local clustering coefficients of all its vertices. [sent-95, score-0.505]
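The density and local clustering coefficient can be sketched as follows. The sketch assumes the usual directed form of the coefficient, namely the number of edges among the neighbors of a vertex divided by |Ni| (|Ni| − 1); the function names and the toy graph are illustrative only.

```python
def density(vertices, edges):
    """Directed density: |E| / (|V| (|V| - 1))."""
    n = len(vertices)
    return len(set(edges)) / (n * (n - 1)) if n > 1 else 0.0


def local_clustering(v, edges):
    """Density of the subgraph induced by the in- and out-neighbors of v."""
    edge_set = {(a, b) for a, b in edges if a != b}
    nbrs = ({b for a, b in edge_set if a == v}
            | {a for a, b in edge_set if b == v})
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for vertices with fewer than 2 neighbors
    links = sum(1 for a, b in edge_set if a in nbrs and b in nbrs)
    return links / (k * (k - 1))


# Hypothetical toy graph.
toy_vertices = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("d", "b")]
toy_density = density(toy_vertices, toy_edges)   # 5 edges out of 12 possible
toy_c_a = local_clustering("a", toy_edges)       # 2 of 6 possible neighbor edges
```

Averaging `local_clustering` over all vertices gives the clustering coefficient of the whole graph as defined above.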
25 The efficiency Eff of a directed graph G is an indicator of the traffic capacity of a network. [sent-96, score-0.383]
26 On the other hand, when ρD is negative, the graph features disassortative mixing and high-degree nodes tend to connect to low degree nodes. [sent-100, score-0.345]
27 We generate the corresponding graph by adding a directed edge for each cue-target pair of the dataset. [sent-106, score-0.45]
28 We only consider pairs whose target was normed in order to avoid overloading the graph with noisy data (e. [sent-107, score-0.273]
29 For comparison with dictionary definitions (DD), we construct a graph from the Wordnet2 dictionary (nouns only), following (Levary et al. [sent-112, score-0.301]
30 It consists of recursively removing the source and sink nodes from a weakly connected directed graph, which yields the subgraph induced by the union of its strongly connected components. [sent-121, score-0.859]
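The recursive pruning described above might be sketched as follows, assuming the graph is an edge list; the function name is hypothetical. Note that a vertex lying on a path between two cycles has both an incoming and an outgoing edge and therefore survives this pruning, so the sketch illustrates the removal step rather than a full strongly-connected-component computation.

```python
def prune_sources_and_sinks(edges):
    """Repeatedly drop vertices with no incoming or no outgoing edge."""
    edge_set = set(edges)
    while True:
        has_out = {u for u, _ in edge_set}
        has_in = {v for _, v in edge_set}
        nodes = has_out | has_in
        keep = has_out & has_in
        if keep == nodes:
            # No source or sink is left (or the graph is empty).
            return nodes, edge_set
        edge_set = {(u, v) for u, v in edge_set if u in keep and v in keep}


# Hypothetical toy graph: s is a source, t and u form a dangling tail.
toy_edges = [("s", "a"), ("a", "b"), ("b", "a"), ("b", "t"), ("t", "u")]
core_nodes, core_edges = prune_sources_and_sinks(toy_edges)
```

Removing u first turns t into a sink, which is why the removal has to be repeated until nothing changes.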
31 It turns out that the FA graph also contains a giant SCC, therefore getting the core consists more simply in extracting the main SCC of the initial graph. [sent-125, score-0.505]
32 3 Vertex degree analysis The FA core has a maximum in-degree of 313, a maximum out-degree of 33 and an average degree of 25. [sent-128, score-0.328]
33 4 Cycle decomposition of the core We define the vertex k-cycle multiplicity (resp. [sent-140, score-0.47]
34 Although the number of 4-shortest cycles is comparable in the core and core-ER graphs for example, there are in reality far more 4-cycles in the core (i. [sent-145, score-0.8]
35 We see that when considering shortest cycles, short cycles tend to hide long ones, and, as a large proportion of nodes in the core belong to 2- and 3-cycles, many longer cycles do not get counted at all. [sent-148, score-0.874]
36 The first thing we observe is that the core has a very high density of short cycles: the subset of nodes belonging to 2-cycles (i. [sent-151, score-0.429]
37 , nodes with 2-cycle multiplicities > 0) cover 95% of the core vertices and the 3-cycles cover 88% of the core vertices. [sent-153, score-0.775]
38 This shows that the core is very cyclic in nature and that it remains very well connected for short-length cycles: most vertices of the core indeed belong to at least one co-link or triangle. [sent-155, score-0.758]
39 In order to limit computation times, we only considered shortest cycles for lengths ≥ 3 and analyzed the distribution of the number of shortest cycles in the core compared to equivalent random graphs. [sent-156, score-0.753]
40 Whereas there are many more short cycles in the core, we observe a predominance of 4-, 5- and 6-cycles in core-ER graphs. [sent-157, score-0.318]
41 However, we find again a slight predominance of long cycles (length between 7 and 15) in the core (see Fig. [sent-158, score-0.544]
42 Longer cycles are more difficult to describe: Relations linking words of a given cycle exhibit semantic drift with increasing length (cf. [sent-177, score-0.427]
43 The large predominance of short cycles in the core may indeed be a natural consequence of the semantic information being acquired by means of associative learning (Ashcraft and Radvansky, 2009; Shanks, 1995). [sent-183, score-0.675]
44 These groups of highly interconnected vertices are called communities and convey important properties of the network. [sent-188, score-0.43]
45 , Ck} of the vertex set of a graph G = (V, E) represents a good community structure if the proportion of edges inside the Ci is higher than the proportion of edges between them (Fortunato, 2010). [sent-192, score-0.478]
46 Computing such communities in a large graph is generally computationally expensive (Lancichinetti and Fortunato, 2009). [sent-193, score-0.401]
47 The idea lying behind this algorithm is that random walks on a graph will tend to get trapped in the densely connected subgraphs. [sent-195, score-0.277]
48 Let $P^t_{ij}$ be the probability of going from vertex i to vertex j through a random walk of length t. [sent-196, score-0.338]
49 The distance between two vertices i and j of the graph is defined as: $r_{ij}(t) = \sqrt{\sum_{k=1}^{n} \frac{(P^t_{ik} - P^t_{jk})^2}{d(k)}}$ where d(k) is the degree of vertex k. [sent-197, score-0.659]
50 , 2001): at each step k, the two communities that minimize the mean $\sigma_k$ of the squared distances between each vertex and its community are merged: $\sigma_k = \frac{1}{n} \sum_{C \in P_k} \sum_{i \in C} r_{iC}^2$. [sent-204, score-0.396]
51 4.2 Clustering of the core. We first identify the communities of the FA core using the Walktrap algorithm. [sent-206, score-0.64]
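The random-walk distance that Walktrap relies on can be sketched in plain Python. The sketch assumes an undirected graph given as a symmetric 0/1 adjacency matrix in which every vertex has at least one neighbor, and it takes the distance to be the square root of the degree-weighted sum of squared differences between the t-step walk probabilities of the two vertices. Only the distance is shown, not the agglomerative merging; all names are illustrative.

```python
import math


def transition_matrix(adj):
    """Row-normalise an adjacency matrix into one-step walk probabilities."""
    return [[x / sum(row) for x in row] for row in adj]


def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_distance(adj, i, j, t):
    """Distance between vertices i and j based on random walks of length t >= 1."""
    P = transition_matrix(adj)
    Pt = P
    for _ in range(t - 1):
        Pt = mat_mul(Pt, P)
    deg = [sum(row) for row in adj]
    return math.sqrt(sum((Pt[i][k] - Pt[j][k]) ** 2 / deg[k]
                         for k in range(len(adj))))


# Hypothetical example: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
same_side = walk_distance(adj, 0, 1, 2)
across = walk_distance(adj, 0, 4, 2)
```

Vertices in the same triangle end up closer than vertices on opposite sides of the bridge, which is the intuition behind walks getting trapped in densely connected subgraphs.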
52 , for a length of 2, we find 35 communities whereas for a length of 9, we only find 8 communities). [sent-209, score-0.292]
53 For a path length of 2, the algorithm extracts 35 communities, 7 of which contain more than 100 vertices, 3 of which contain between 100 and 50 vertices and 25 of which contain less than 50 vertices. [sent-210, score-0.325]
54 3 Clustering of the core co-links We define the k-cycle induced subgraph of a graph G as the subgraph of G induced by the set of its vertices with k-cycle multiplicity > 0. [sent-222, score-1.058]
55 The co-link graph of a graph G(V, E) is the undirected graph obtained by replacing each co-link (i. [sent-223, score-0.683]
56 The co-link graph of the FA core has 4’508 vertices and 8’309 edges for a density of 8 × 10−4. [sent-226, score-0.832]
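The reported density can be checked with a short calculation. The sketch assumes the standard density of an undirected graph, 2|E| / (|V| (|V| − 1)), and also shows one way to build a co-link graph from a directed edge list; the function names are illustrative.

```python
def colink_graph(edges):
    """Undirected edges {u, v} for every pair linked in both directions."""
    edge_set = set(edges)
    return {frozenset((u, v)) for u, v in edge_set
            if u != v and (v, u) in edge_set}


def undirected_density(n_vertices, n_edges):
    """Density of an undirected graph: 2|E| / (|V| (|V| - 1))."""
    return 2 * n_edges / (n_vertices * (n_vertices - 1))


# Figures quoted in the text for the co-link graph of the FA core.
reported = undirected_density(4508, 8309)

# Hypothetical toy input: only the pair (a, b) is linked in both directions.
toy = colink_graph([("a", "b"), ("b", "a"), ("b", "c")])
```

With 4508 vertices and 8309 edges the value comes out at roughly 8.2 × 10−4, in line with the rounded figure above.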
57 Extracting the co-link graph is thus an efficient way of selecting the set of most important semantic links (i. [sent-228, score-0.319]
58 , the set of 2-cycles that appear in large predominance in the core compared to what is found in an equivalent random graph) while filtering out the noisy or negligible ones. [sent-230, score-0.302]
59 , 923 communities when length equals 2) whereas for longer paths the average size of the communities increases more and more. [sent-234, score-0.448]
60 The community detection exhibits thus a far finer degree of granularity for the core co-links graph than for the core itself. [sent-235, score-0.771]
61 Examples of communities found in the core colinks graph include (standards, values, morals, ethics), (hopeless, romantic, worthless, useless), (thesaurus, dictionary, vocabulary, encyclopedia) or (molecule, atom, electron, nucleus, proton, neutron). [sent-237, score-0.672]
62 4 DD core clustering vs FA core clustering The clustering of both cores has very different characteristics: We illustrate the neighborhoods of conflict for both cases in Fig. [sent-240, score-0.681]
63 Figure 2: Neighborhood of conflict in the FA core The set of words belonging to the neighborhood of conflict are clearly part of the same lexical field. [sent-242, score-0.466]
64 The high density of colinks leads to cyclicity and we see that many directed triangles are present in the local subgraph (e. [sent-243, score-0.81]
65 We can even find triangles of colinks that link together words semantically strongly related (e. [sent-246, score-0.483]
66 On one hand, the words in communities of the DD core are in most cases either synonyms, e. [sent-250, score-0.414]
67 On the other hand, communities of the FA core are generally composed of words belonging to the same lexical field and sharing the same level of abstraction. [sent-255, score-0.522]
68 Figure 3: Neighborhood of conflict in the DD core First, we note that the neighborhood has a lower density than in the FA core. [sent-262, score-0.43]
69 As it generally happens in the neighborhood subgraphs of the DD core, source nodes are rather specific words whereas sink nodes are generic words. [sent-264, score-0.333]
70 1 Extraction of the seed We already saw that most vertices of the core belong to directed 2- and 3-cycles. [sent-270, score-0.754]
71 We call seed the subgraph of the core induced by the set V3 of vertices belonging to directed triangles and shell the subgraph of the core induced by the set V \V3 (i. [sent-277, score-1.755]
72 Figure 4: Composition of the FA graph The graph of FA contains a giant SCC (the core). [sent-281, score-0.492]
73 The subgraph of the core induced by the set of nodes belonging to at least one triangle also forms a giant component we call the ‘seed’. [sent-282, score-0.628]
74 The subgraph of the core induced by the set of nodes not belonging to any triangle is called the ‘shell’ and is composed of many small SCCs, including single vertices. [sent-283, score-0.562]
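The split of the core into seed and shell can be sketched as below, assuming the core is given as a directed edge list. Vertices lying on at least one directed triangle form the set V3; the seed and shell are the subgraphs induced by V3 and by the remaining vertices. Function names and the toy graph are illustrative.

```python
def triangle_vertices(edges):
    """Vertices that lie on at least one directed 3-cycle."""
    edge_set = set(edges)
    succ = {}
    for u, v in edge_set:
        succ.setdefault(u, set()).add(v)
    in_triangle = set()
    for u, v in edge_set:
        if u == v:
            continue
        for w in succ.get(v, ()):
            if w != u and w != v and (w, u) in edge_set:
                in_triangle.update((u, v, w))
    return in_triangle


def seed_and_shell(edges):
    """Return ((seed vertices, seed edges), (shell vertices, shell edges))."""
    vertices = {x for e in edges for x in e}
    v3 = triangle_vertices(edges)
    seed_edges = {(u, v) for u, v in edges if u in v3 and v in v3}
    shell_edges = {(u, v) for u, v in edges if u not in v3 and v not in v3}
    return (v3, seed_edges), (vertices - v3, shell_edges)


# Hypothetical strongly connected toy core: triangle a-b-c plus a co-link d-e.
toy_core = [("a", "b"), ("b", "c"), ("c", "a"),
            ("c", "d"), ("d", "e"), ("e", "d"), ("e", "a")]
(seed_v, seed_e), (shell_v, shell_e) = seed_and_shell(toy_core)
```

Edges running between a seed vertex and a shell vertex belong to neither induced subgraph, which is why the two edge sets do not add up to the full core.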
75 The seed contains 4’313 vertices (89% of the core) and 54’197 edges. [sent-290, score-0.318]
76 The first thing to notice is that it has 100 times more co-links (7’895) and 20 times more triangles (13’119) than an equivalent random graph. [sent-291, score-0.394]
77 We call shortcuts the 32’773 edges of the seed that do not belong to 3-cycles, see Fig. [sent-292, score-0.306]
78 The seed obviously also contains cycles whose length is greater than 3. [sent-294, score-0.35]
79 By linking two triangles, these shortcuts permit to move two basic semantic units closer together and create longer cycles (i. [sent-299, score-0.411]
80 Long cycles can be thus considered as groupings of basic semantic units. [sent-302, score-0.306]
81 In the case of two triangles sharing one vertex for example, it is possible to add at most 6 shortcuts, whereas, for two triangles sharing two vertices, at most 2 shortcuts can be added. [sent-303, score-1.15]
82 Furthermore, there is a limit on the number of shortcuts that can possibly be added in the seed before it gets saturated, as all its vertices belong to at least one triangle. [sent-306, score-0.463]
83 We show that at most 16 shortcuts can be added between two isolated triangles, at most 6 between 2 triangles sharing 1 vertex and at most 2 between 2 triangles sharing 2 vertices (see Fig. [sent-307, score-1.392]
84 We focus on the arrangements of triangles as they constitute the set of elementary concepts. [sent-311, score-0.44]
85 Let t be the graph operator which transforms a graph G into the intersection graph tG of its 2-simplices (i. [sent-313, score-0.669]
86 From a topological perspective, we deduce that bunches of triangles (i. [sent-328, score-0.496]
87 We start by generating the initial graph (23’219 vertices and 325’589 edges), then extract its core (7’754 vertices and 247’172 edges) and its seed [sent-362, score-0.999]
88 Both FA cores have a high density of connected triangles, whereas cycles in the DD graph tend to be longer and most triangles are isolated. [sent-372, score-1.019]
89 As the topological properties of free association graphs reflect key aspects of semantic knowledge, we believe some graph theory metrics could be used efficiently to derive new ways of measuring semantic similarity between words. [sent-377, score-0.644]
90 (2005) recognize that triangles form semantically strongly cohesive groups and apply clustering coefficients for word sense disambiguation. [sent-396, score-0.531]
91 Their work focuses on undirected graphs of corpus co-occurrences whereas our work builds on directed associations. [sent-397, score-0.36]
92 8 Conclusion The cognitive process of discrete free association being an epiphenomenon of our semantic memory at work, the cumulative set of free associations of the USF dataset can be viewed as the projection of a collective semantic memory. [sent-399, score-0.444]
93 To analyze the semantic memory, we use the tools of graph theory, and compare it also to dictionary graphs. [sent-400, score-0.321]
94 In both cases, triangles play a crucial role in the local topology and they form the set of elementary concepts of the underlying graph. [sent-401, score-0.488]
95 , pairs of triangles sharing an edge or forming tetrahedra). [sent-404, score-0.513]
96 As the words of a graph of free associations acquire their meaning from the set of associations they are involved in (Deese, 1962), we go a step further by examining the neighborhood of nodes and extracting the statistics of cycles. [sent-405, score-0.674]
97 Comparing dictionaries to free association, we find the free association graph to be more concept-driven, with words in small clusters being on the same level of abstraction. [sent-407, score-0.973]
98 Moreover, we think that graphs of free associations could find interesting applications for Word Sense Disambiguation (e. [sent-408, score-0.282]
99 Finally, we believe that studying the dynamics of graphs of free associations may be of particular interest for observing the change in meaning of certain words (Deese, 1967), or more generally to follow the cultural evolution arising among a social group. [sent-414, score-0.325]
100 Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems. [sent-426, score-0.274]
wordName wordTfidf (topN-words)
[('triangles', 0.394), ('fa', 0.322), ('vertices', 0.242), ('cycles', 0.242), ('core', 0.226), ('graph', 0.213), ('dd', 0.204), ('communities', 0.188), ('directed', 0.17), ('vertex', 0.153), ('levary', 0.121), ('free', 0.107), ('graphs', 0.106), ('subgraph', 0.105), ('shortcuts', 0.105), ('neighborhood', 0.092), ('eckmann', 0.091), ('multiplicity', 0.091), ('cycle', 0.089), ('edges', 0.085), ('nodes', 0.081), ('cliques', 0.079), ('shell', 0.079), ('seed', 0.076), ('predominance', 0.076), ('associations', 0.069), ('associative', 0.067), ('edge', 0.067), ('density', 0.066), ('giant', 0.066), ('scc', 0.066), ('connected', 0.064), ('semantic', 0.064), ('clustering', 0.061), ('normed', 0.06), ('usf', 0.06), ('walktrap', 0.06), ('topological', 0.057), ('belonging', 0.056), ('triangle', 0.056), ('community', 0.055), ('wup', 0.053), ('dorow', 0.053), ('sharing', 0.052), ('degree', 0.051), ('path', 0.051), ('steyvers', 0.048), ('nelson', 0.048), ('topology', 0.048), ('conflict', 0.046), ('elementary', 0.046), ('bricks', 0.045), ('bunches', 0.045), ('colinks', 0.045), ('elisha', 0.045), ('gravino', 0.045), ('palermo', 0.045), ('shortcut', 0.045), ('wcc', 0.045), ('xxy', 0.045), ('dictionary', 0.044), ('fields', 0.044), ('strongly', 0.044), ('undirected', 0.044), ('meaning', 0.043), ('shortest', 0.043), ('links', 0.042), ('weakly', 0.041), ('eat', 0.04), ('belong', 0.04), ('whereas', 0.04), ('tenenbaum', 0.039), ('picard', 0.039), ('sink', 0.039), ('induced', 0.038), ('psychology', 0.036), ('stimulus', 0.034), ('association', 0.033), ('length', 0.032), ('cohesive', 0.032), ('florida', 0.032), ('intersection', 0.03), ('ashcraft', 0.03), ('bollob', 0.03), ('curvature', 0.03), ('cyclicity', 0.03), ('deese', 0.03), ('eff', 0.03), ('everitt', 0.03), ('exy', 0.03), ('fortunato', 0.03), ('hancock', 0.03), ('heylighen', 0.03), ('jenkins', 0.03), ('jonyer', 0.03), ('lancichinetti', 0.03), ('mirage', 0.03), ('motifs', 0.03), ('santo', 0.03), ('sccs', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 182 emnlp-2013-The Topology of Semantic Knowledge
2 0.080165267 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
Author: He He ; Hal Daume III ; Jason Eisner
Abstract: Feature computation and exhaustive search have significantly restricted the speed of graph-based dependency parsing. We propose a faster framework of dynamic feature selection, where features are added sequentially as needed, edges are pruned early, and decisions are made online for each sentence. We model this as a sequential decision-making problem and solve it by imitation learning techniques. We test our method on 7 languages. Our dynamic parser can achieve accuracies comparable or even superior to parsers using a full set of features, while computing fewer than 30% of the feature templates.
3 0.07969927 41 emnlp-2013-Building Event Threads out of Multiple News Articles
Author: Xavier Tannier ; Veronique Moriceau
Abstract: We present an approach for building multidocument event threads from a large corpus of newswire articles. An event thread is basically a succession of events belonging to the same story. It helps the reader to contextualize the information contained in a single article, by navigating backward or forward in the thread from this article. A specific effort is also made on the detection of reactions to a particular event. In order to build these event threads, we use a cascade of classifiers and other modules, taking advantage of the redundancy of information in the newswire corpus. We also share interesting comments concerning our manual annotation procedure for building a training and testing set.
4 0.072921291 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
Author: Joseph Le Roux ; Antoine Rozenknop ; Jennifer Foster
Abstract: It has recently been shown that different NLP models can be effectively combined using dual decomposition. In this paper we demonstrate that PCFG-LA parsing models are suitable for combination in this way. We experiment with the different models which result from alternative methods of extracting a grammar from a treebank (retaining or discarding function labels, left binarization versus right binarization) and achieve a labeled Parseval F-score of 92.4 on Wall Street Journal Section 23; this represents an absolute improvement of 0.7 and an error reduction rate of 7% over a strong PCFG-LA product-model baseline. Although we experiment only with binarization and function labels in this study, there is much scope for applying this approach to other grammar extraction strategies.
5 0.071441323 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues
Author: Matt Gardner ; Partha Pratim Talukdar ; Bryan Kisiel ; Tom Mitchell
Abstract: Automatically constructed Knowledge Bases (KBs) are often incomplete and there is a genuine need to improve their coverage. Path Ranking Algorithm (PRA) is a recently proposed method which aims to improve KB coverage by performing inference directly over the KB graph. For the first time, we demonstrate that addition of edges labeled with latent features mined from a large dependency parsed corpus of 500 million Web documents can significantly outperform previous PRA-based approaches on the KB inference task. We present extensive experimental results validating this finding. The resources presented in this paper are publicly available.
6 0.06132295 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
7 0.058983915 108 emnlp-2013-Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data
8 0.054350168 118 emnlp-2013-Learning Biological Processes with Global Constraints
9 0.051982228 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
10 0.051019955 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
11 0.049701266 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
12 0.04856674 145 emnlp-2013-Optimal Beam Search for Machine Translation
13 0.047566816 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections
14 0.047517017 138 emnlp-2013-Naive Bayes Word Sense Induction
15 0.046088703 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
16 0.045486532 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
17 0.044686504 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
18 0.043030191 149 emnlp-2013-Overcoming the Lack of Parallel Data in Sentence Compression
19 0.042904764 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
20 0.042518109 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
topicId topicWeight
[(0, -0.162), (1, 0.04), (2, -0.033), (3, 0.029), (4, -0.009), (5, 0.036), (6, -0.002), (7, -0.012), (8, -0.014), (9, 0.008), (10, 0.004), (11, -0.003), (12, -0.061), (13, 0.064), (14, -0.003), (15, 0.002), (16, -0.084), (17, 0.055), (18, 0.001), (19, 0.061), (20, 0.034), (21, -0.066), (22, 0.032), (23, 0.071), (24, -0.09), (25, -0.114), (26, 0.061), (27, 0.092), (28, -0.081), (29, -0.143), (30, -0.054), (31, -0.057), (32, -0.08), (33, -0.023), (34, -0.059), (35, 0.101), (36, 0.081), (37, 0.149), (38, -0.096), (39, -0.042), (40, -0.053), (41, -0.044), (42, 0.226), (43, -0.025), (44, 0.101), (45, 0.007), (46, -0.136), (47, -0.089), (48, -0.069), (49, -0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.95866597 182 emnlp-2013-The Topology of Semantic Knowledge
2 0.6212315 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues
3 0.50609297 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
4 0.47859922 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries
Author: Marius Pasca
Abstract: This paper introduces a method for extracting fine-grained class labels (“countries with double taxation agreements with india”) from Web search queries. The class labels are more numerous and more diverse than those produced by current extraction methods. Also extracted are representative sets of instances (singapore, united kingdom) for the class labels.
5 0.47028777 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
6 0.43163097 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections
7 0.40740904 23 emnlp-2013-Animacy Detection with Voting Models
8 0.39251766 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs
9 0.39147532 165 emnlp-2013-Scaling to Large³ Data: An Efficient and Effective Method to Compute Distributional Thesauri
10 0.3598024 58 emnlp-2013-Dependency Language Models for Sentence Completion
11 0.3506726 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI
12 0.34356552 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
13 0.32989952 138 emnlp-2013-Naive Bayes Word Sense Induction
14 0.32136458 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion
15 0.30890495 41 emnlp-2013-Building Event Threads out of Multiple News Articles
17 0.29992804 35 emnlp-2013-Automatically Detecting and Attributing Indirect Quotations
18 0.29901278 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!
19 0.29476264 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
20 0.29347479 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation
topicId topicWeight
[(3, 0.023), (9, 0.435), (18, 0.018), (22, 0.06), (30, 0.058), (50, 0.012), (51, 0.146), (66, 0.036), (71, 0.028), (75, 0.025), (77, 0.014), (96, 0.03)]
simIndex simValue paperId paperTitle
1 0.83205622 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues
Author: Matt Gardner ; Partha Pratim Talukdar ; Bryan Kisiel ; Tom Mitchell
Abstract: Automatically constructed Knowledge Bases (KBs) are often incomplete and there is a genuine need to improve their coverage. Path Ranking Algorithm (PRA) is a recently proposed method which aims to improve KB coverage by performing inference directly over the KB graph. For the first time, we demonstrate that addition of edges labeled with latent features mined from a large dependency parsed corpus of 500 million Web documents can significantly outperform previous PRA-based approaches on the KB inference task. We present extensive experimental results validating this finding. The resources presented in this paper are publicly available.
same-paper 2 0.8162083 182 emnlp-2013-The Topology of Semantic Knowledge
Author: Jimmy Dubuisson ; Jean-Pierre Eckmann ; Christian Scheible ; Hinrich Schutze
Abstract: Studies of the graph of dictionary definitions (DD) (Picard et al., 2009; Levary et al., 2012) have revealed strong semantic coherence of local topological structures. The techniques used in these papers are simple and the main results are found by understanding the structure of cycles in the directed graph (where words point to definitions). Based on our earlier work (Levary et al., 2012), we study a different class of word definitions, namely those of the Free Association (FA) dataset (Nelson et al., 2004). These are responses by subjects to a cue word, which are then summarized by a directed, free association graph. We find that the structure of this network is quite different from both the Wordnet and the dictionary networks. This difference can be explained by the very nature of free association as compared to the more “logical” construction of dictionaries. It thus sheds some (quantitative) light on the psychology of free association. In NLP, semantic groups or clusters are interesting for various applications such as word sense disambiguation. The FA graph is tighter than the DD graph, because of the large number of triangles. This also makes drift of meaning quite measurable so that FA graphs provide a quantitative measure of the semantic coherence of small groups of words.
3 0.74432296 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models
Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao
Abstract: One of the language phenomena that n-gram language models fail to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all n-grams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumptions, we found it is effective to assign a topic to only some parts of a document.
4 0.4482626 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier
Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.
5 0.43307826 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
Author: He He ; Hal Daume III ; Jason Eisner
Abstract: Feature computation and exhaustive search have significantly restricted the speed of graph-based dependency parsing. We propose a faster framework of dynamic feature selection, where features are added sequentially as needed, edges are pruned early, and decisions are made online for each sentence. We model this as a sequential decision-making problem and solve it by imitation learning techniques. We test our method on 7 languages. Our dynamic parser can achieve accuracies comparable or even superior to parsers using a full set of features, while computing fewer than 30% of the feature templates.
6 0.4188267 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
7 0.41794318 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
8 0.41422161 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression
9 0.41116402 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
10 0.41038454 138 emnlp-2013-Naive Bayes Word Sense Induction
11 0.40840012 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
12 0.40586066 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
13 0.40545461 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach
14 0.40355772 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
15 0.40291184 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches
16 0.40183115 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
17 0.39700341 118 emnlp-2013-Learning Biological Processes with Global Constraints
18 0.3940371 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation
19 0.3929382 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections
20 0.39061493 149 emnlp-2013-Overcoming the Lack of Parallel Data in Sentence Compression