emnlp emnlp2010 emnlp2010-124 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ioannis Klapaftis ; Suresh Manandhar
Abstract: Graph-based methods have gained attention in many areas of Natural Language Processing (NLP) including Word Sense Disambiguation (WSD), text summarization, keyword extraction and others. Most of the work in these areas formulate their problem in a graph-based setting and apply unsupervised graph clustering to obtain a set of clusters. Recent studies suggest that graphs often exhibit a hierarchical structure that goes beyond simple flat clustering. This paper presents an unsupervised method for inferring the hierarchical grouping of the senses of a polysemous word. The inferred hierarchical structures are applied to the problem of word sense disambiguation, where we show that our method performs sig- nificantly better than traditional graph-based methods and agglomerative clustering yielding improvements over state-of-the-art WSD systems based on sense induction.
Reference: text
sentIndex sentText sentNum sentScore
1 Most of the work in these areas formulate their problem in a graph-based setting and apply unsupervised graph clustering to obtain a set of clusters. [sent-6, score-0.243]
2 Recent studies suggest that graphs often exhibit a hierarchical structure that goes beyond simple flat clustering. [sent-7, score-0.33]
3 This paper presents an unsupervised method for inferring the hierarchical grouping of the senses of a polysemous word. [sent-8, score-0.374]
4 1 Introduction A number of NLP problems can be cast into a graph-based framework, in which entities are represented as vertices in a graph and relations between them are depicted by weighted or unweighted edges. [sent-10, score-0.423]
5 , 2006) have constructed word co-occurrence graphs for a target polysemous word and applied graph-clustering to obtain the clusters (senses) of that word. [sent-12, score-0.344]
6 represented as vertices in a graph and edges between them are drawn according to their common tokens or words of a given POS category, e. [sent-16, score-0.496]
7 , 2008) suggest that graphs exhibit a hierarchical structure (e. [sent-24, score-0.271]
8 a binary tree), in which vertices are divided into groups that are further subdivided into groups of groups, and so on, until we reach the leaves. [sent-26, score-0.234]
9 This hierarchical structure provides additional information as opposed to flat clustering by explicitly including organisation at all scales of a graph (Clauset et al. [sent-27, score-0.435]
10 In this paper, we present an unsupervised method for inferring the hierarchical structure (binary tree) of a graph, in which vertices are the contexts of a polysemous word and edges represent the similarity between contexts. [sent-29, score-0.684]
11 Thus, it induces the senses of that word at different levels of sense granularity. [sent-33, score-0.237]
12 Finally, we compare our results with state-of-the-art sense induction systems and show that our method yields improvements. [sent-38, score-0.231]
13 2 Related work Typically, graph-based methods, when applied to unsupervised sense disambiguation, represent each word wi co-occurring with the target word tw as a vertex. [sent-40, score-0.247]
14 Two vertices are connected via an edge if they co-occur in one or more contexts of tw. [sent-41, score-0.356]
15 Once the co-occurrence graph of tw has been constructed, different graph clustering algorithms are applied to induce the senses. [sent-42, score-0.411]
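As a plain illustration of this construction, here is a minimal sketch (not the authors' code) that links two words whenever they share a context of the target word; the toy contexts for paper are invented for the example.

    from collections import defaultdict
    from itertools import combinations

    def cooccurrence_graph(contexts):
        """Build an unweighted co-occurrence graph from the contexts of a target word.
        Each context is a list of co-occurring words (target word excluded); two words
        are linked if they appear together in at least one context."""
        adjacency = defaultdict(set)
        for context in contexts:
            for wi, wj in combinations(set(context), 2):
                adjacency[wi].add(wj)
                adjacency[wj].add(wi)
        return adjacency

    # Toy usage for the word "paper" (scholarly-article vs. newspaper senses).
    contexts = [["research", "cite", "journal"], ["journal", "publish"],
                ["daily", "headline", "print"], ["print", "publish"]]
    graph = cooccurrence_graph(contexts)
    print(sorted(graph["publish"]))  # words co-occurring with "publish"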
16 Figure 2 shows an example of a graph for the target word paper, which appears with two different senses, scholarly article and newspaper. [sent-44, score-0.329]
17 Véronis (2004) has shown that co-occurrence graphs are small-world networks that contain highly dense subgraphs representing the different clusters (senses) of the target word (Véronis, 2004). [sent-45, score-0.285]
18 The identified hub is then deleted along with its direct neighbours from the graph producing a new cluster. [sent-48, score-0.304]
19 The deleted region corresponds to the newspaper sense of the target word paper. [sent-50, score-0.207]
20 Véronis (2004) further processed the identified clusters (senses), in order to assign the rest of graph vertices to the identified clusters by utilising the minimum spanning tree of the original graph. [sent-51, score-0.547]
21 Nouns are represented as vertices, while edges between vertices are drawn, if their associated nouns co-occur in conjunctions or disjunctions more than a given number of times. [sent-58, score-0.392]
22 The resulting graph is then pruned by removing the target word and vertices with a low degree. [sent-60, score-0.438]
23 Finally, the MCL algorithm (Dongen, 2000) is used to cluster the graph and produce a set of clusters (senses) each one consisting of a set of contextually related words. [sent-61, score-0.224]
24 Chinese Whispers (CW) (Biemann, 2006) is a parameter-free1 graph clustering method that has been applied in sense induction to cluster the cooccurrence graph of a target word (Biemann, 2006), as well as a graph of collocations related to the target word (Klapaftis and Manandhar, 2008). [sent-62, score-0.938]
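For reference, a minimal sketch of the Chinese Whispers procedure as it is commonly described: every vertex starts in its own class and repeatedly adopts the class with the strongest support among its neighbours. The graph encoding is an illustrative assumption; the default of 200 iterations follows the footnote given later for CW.

    import random
    from collections import defaultdict

    def chinese_whispers(edges, iterations=200, seed=0):
        """edges: dict mapping a vertex to {neighbour: weight}. Returns vertex -> class id."""
        rng = random.Random(seed)
        labels = {v: i for i, v in enumerate(edges)}         # every vertex starts in its own class
        vertices = list(edges)
        for _ in range(iterations):
            rng.shuffle(vertices)
            for v in vertices:
                scores = defaultdict(float)
                for u, w in edges[v].items():
                    scores[labels[u]] += w                   # vote of each neighbouring class
                if scores:
                    labels[v] = max(scores, key=scores.get)  # adopt the strongest class
        return labels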
25 The evaluation of the collocational-graph method in the SemEval-2007 sense induction task (Agirre and Soroa, 2007) showed promising results. [sent-63, score-0.231]
26 All the described methods for sense induction apply flat graph clustering methods to derive the clusters (senses) of a target word. (Footnote 1: One needs to specify only the number of iterations.) [sent-64, score-0.231]
27 Figure 3: Running example of graph creation. [sent-68, score-0.543]
28 As a result, they neglect the fact that their constructed graphs often exhibit a hierarchical structure that is useful in several tasks including word sense disambiguation. [sent-69, score-0.436]
29 3 Building a graph of contexts This section describes the process of creating a graph of contexts for a polysemous target word. [sent-70, score-0.594]
30 In the example, the target word paper appears with the scholarly article sense in the contexts A, B, and with the newspaper sense in the contexts C and D. [sent-72, score-0.515]
31 2 Graph creation Graph vertices: To create the graph of vertices, we represent each context ci as a vertex in a graph G. [sent-84, score-0.456]
32 Graph edges: Edges between the vertices of the graph are drawn based on their similarity, defined in Equation 1, where simcl (ci, cj) is the collocational weight of contexts ci, cj and simwd(ci, cj) is their bag-of-words weight. [sent-85, score-0.808]
33 If the edge weight W(ci, cj) is above a prespecified threshold (parameter p3), then an edge is drawn between the corresponding vertices in the graph. [sent-86, score-0.389]
34 W(ci, cj) = (1/2) (simcl(ci, cj) + simwd(ci, cj))   (1) Collocational weight: The limited polysemy of collocations can be exploited to compute the similarity between contexts ci and cj. [sent-87, score-0.299]
35 At the end of this process, each context ci of tw is associated with a vector of collocations (vi). [sent-97, score-0.23]
36 (Footnote 2) Our definition of context is equivalent to an instance of the target word in the SemEval-2007 sense induction task dataset (Agirre and Soroa, 2007). [sent-99, score-0.273]
37 Given two contexts ci and cj, we calculate their collocational weight using the Jaccard coefficient on the collocational vectors, i. [sent-101, score-0.369]
38 Given two contexts ci and cj, we calculate their bag-of-words weight using the Jaccard coefficient on the word vectors, i. [sent-122, score-0.207]
39 The collocational weight and bag-of-words weight are averaged to derive the edge weight between two contexts as defined in Equation 1. [sent-125, score-0.287]
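A sketch of the edge-weighting scheme of Equation 1: Jaccard over collocational vectors, Jaccard over word vectors, and their average compared against the threshold p3. The vector contents and the threshold value shown are placeholders, not values from the paper.

    def jaccard(a, b):
        """Jaccard coefficient of two sets; 0 when both are empty."""
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def edge_weight(colloc_i, colloc_j, words_i, words_j):
        """Equation 1: average of the collocational and bag-of-words weights."""
        sim_cl = jaccard(colloc_i, colloc_j)
        sim_wd = jaccard(words_i, words_j)
        return 0.5 * (sim_cl + sim_wd)

    p3 = 0.2  # illustrative edge-inclusion threshold (parameter p3)
    w = edge_weight({"scholarly_article", "peer_review"}, {"peer_review"},
                    {"cite", "journal"}, {"journal", "publish"})
    if w >= p3:
        print("draw edge with weight", round(w, 3))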
40 This graph is the input to the hierarchical random graphs method (Clauset et al. [sent-127, score-0.433]
41 4 Hierarchical Random Graphs for sense induction In this section, we describe the process of inferring the hierarchical structure of the graph of contexts using hierarchical random graphs (Clauset et al. [sent-129, score-0.938]
42 Figure 4: Two dendrograms for the graph in Figure 3. [sent-131, score-0.325]
43 1 The Hierarchical Random Graph model A dendrogram is a binary tree with n leaves and n − 1 parents. [sent-133, score-0.277]
44 Given a set of n contexts that we need to arrange hierarchically, let us denote by G = (V, E) the graph of contexts, where V = {v0, v1 . [sent-136, score-0.246]
45 Given an undirected graph G, each of its n vertices is a leaf in a dendrogram, while the internal nodes of that dendrogram indicate the hierarchical relationships among the leaves. [sent-143, score-0.825]
46 The primary assumption in the hierarchical random graph model is that edges in G exist independently, but with a probability that is not identically distributed. [sent-150, score-0.389]
47 Let Dk be an internal node of dendrogram D and f(Dk) be the number of edges that actually exist in G between the vertices of the left and right subtrees rooted at Dk. [sent-156, score-0.762]
48 For example, in Figure 4(A), f(D2) = 1, because there is one edge in G connecting vertices B and C. [sent-157, score-0.272]
49 The likelihood of the hierarchical random graph (D, θ~) is defined in Equation 2, where A(Dk) = l(Dk)r(Dk) − f(Dk). [sent-160, score-0.279]
50 L(D, θ~) = ∏_{Dk ∈ D} θk^{f(Dk)} (1 − θk)^{A(Dk)}   (2). The probabilities θk that maximise the likelihood of a dendrogram D can be easily estimated using the method of MLE, i. [sent-161, score-0.217]
51 This means that high-likelihood dendrograms partition vertices into subtrees, such that the connections among their vertices in the observed graph are either very rare or very common (Clauset et al. [sent-166, score-0.793]
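A sketch of Equation 2 with each θk set to its maximum-likelihood value f(Dk) / (l(Dk) r(Dk)); the encoding of the dendrogram as (left-leaf-set, right-leaf-set) pairs is an assumption made for illustration.

    import math

    def log_likelihood(internal_nodes, edges):
        """internal_nodes: list of (left_leaves, right_leaves) pairs of sets, one per Dk.
        edges: set of frozensets {u, v} observed in the context graph G.
        Returns log L(D, theta) with theta_k set to its MLE f(Dk) / (l(Dk) * r(Dk))."""
        logL = 0.0
        for left, right in internal_nodes:
            possible = len(left) * len(right)                 # l(Dk) * r(Dk)
            f = sum(1 for u in left for v in right if frozenset((u, v)) in edges)
            theta = f / possible
            if 0 < theta < 1:                                 # terms with theta in {0, 1} contribute 0
                logL += f * math.log(theta) + (possible - f) * math.log(1 - theta)
        return logL

    # A dendrogram over leaves A, B, C, D would be encoded, for example, as
    # [({"A"}, {"B", "C", "D"}), ({"B"}, {"C", "D"}), ({"C"}, {"D"})]  (an assumed shape).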
52 n leaves in each dendrogram, the total number of different dendrograms is superexponential ((2n − 3)! [sent-184, score-0.196]
53 To deal with this problem, we use a Markov Chain Monte Carlo (MCMC) method that samples dendrograms from the space of dendrogram models with probability proportional to their likelihood. [sent-188, score-0.38]
54 Each time MCMC samples a dendrogram with a new highest likelihood, that dendrogram is stored. [sent-189, score-0.434]
55 Hence, our goal is to choose the highest likelihood dendrogram once MCMC has converged. [sent-190, score-0.217]
56 In particular, given a current dendrogram Dcurr, each internal node Dk of Dcurr is associated with three subtrees of Dcurr. [sent-193, score-0.441]
57 Figure 5: (A) current configuration for internal node Dk and its associated subtrees (B) first alternative configuration, (C) second alternative configuration. [sent-210, score-0.224]
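A compact sketch of the sampling step just described: propose an alternative configuration of a random internal node and accept it with the usual Metropolis rule, keeping the best dendrogram seen so far. The propose and loglik callables are hypothetical placeholders for a concrete dendrogram implementation (e.g. the log-likelihood sketch above).

    import math
    import random

    def sample_dendrograms(current, propose, loglik, steps=10000, seed=0):
        """Generic Metropolis sampler over dendrograms.
        propose(d, rng) must return an alternative configuration of a random internal node of d;
        loglik(d) is the hierarchical-random-graph log-likelihood of dendrogram d.
        Returns the highest-likelihood dendrogram encountered and the final state."""
        rng = random.Random(seed)
        cur_ll = loglik(current)
        best, best_ll = current, cur_ll
        for _ in range(steps):
            candidate = propose(current, rng)
            cand_ll = loglik(candidate)
            # Metropolis rule: always accept an improvement, otherwise accept with
            # probability L(next) / L(curr) = exp(cand_ll - cur_ll).
            if cand_ll >= cur_ll or rng.random() < math.exp(cand_ll - cur_ll):
                current, cur_ll = candidate, cand_ll
                if cur_ll > best_ll:
                    best, best_ll = current, cur_ll
        return best, current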
58 1 Sense mapping The output of HRG learning is a dendrogram D with n leaves (contexts) and n − 1 internal nodes. [sent-217, score-0.25]
59 Such a sense-tagged corpus is needed when induced word senses need to be mapped to a gold standard sense inventory. [sent-220, score-0.237]
60 Instead of using a hard mapping from the dendrogram internal nodes to the Gold Standard (GS) senses, we use a soft probabilistic mapping and calculate P(sk |Di), i. [sent-221, score-0.312]
61 Let F(Di) be the set of training contexts grouped by internal node Di. [sent-223, score-0.217]
62 Let F0(sk) be the set of training contexts that are tagged with sense sk. [sent-224, score-0.223]
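A sketch of the soft mapping P(sk | Di): for each internal node, the fraction of its training contexts F(Di) tagged with each gold-standard sense. The input dictionaries are assumed data structures, not the paper's format.

    from collections import Counter

    def sense_distribution(grouped_contexts, gold_tags):
        """grouped_contexts: internal node id -> set of training context ids below it (F(Di)).
        gold_tags: context id -> gold-standard sense.  Returns node id -> {sense: P(sense | Di)}."""
        dist = {}
        for node, ctxs in grouped_contexts.items():
            counts = Counter(gold_tags[c] for c in ctxs if c in gold_tags)
            total = sum(counts.values())
            dist[node] = {sense: n / total for sense, n in counts.items()} if total else {}
        return dist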
63 We followed the setting of SemEval-2007 sense induction task (Agirre and Soroa, 2007). [sent-231, score-0.256]
64 Then, the weight assigned to sense sk is the sum of weighted scores provided by each identified parent. [sent-237, score-0.241]
65 This is shown in Equation 6, where θi is the probability associated with each internal node Di from the hierarchical random graph (see Figure 4(A)). [sent-238, score-0.435]
66 Finally, the highest weight determines the winning sense for context cj (Equation 7). [sent-240, score-0.322]
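A sketch of the scoring in Equations 6 and 7 as paraphrased above: every identified parent Di contributes θi · P(sk | Di), the contributions are summed per sense, and the largest sum wins. How the parents of a new context are identified is not reproduced here; the inputs are assumed.

    def disambiguate(parent_nodes, theta, sense_dist):
        """parent_nodes: internal node ids identified as parents of the new context cj.
        theta: node id -> theta_i from the inferred hierarchical random graph.
        sense_dist: node id -> {sense: P(sense | Di)} (see the previous sketch).
        Returns (winning sense, per-sense scores)."""
        scores = {}
        for node in parent_nodes:
            for sense, prob in sense_dist.get(node, {}).items():
                scores[sense] = scores.get(sense, 0.0) + theta[node] * prob   # Equation 6
        winner = max(scores, key=scores.get) if scores else None              # Equation 7
        return winner, scores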
67 1 Evaluation setting & baselines We evaluate our method on the nouns of the SemEval-2007 word sense induction task (Agirre and Soroa, 2007) under the second evaluation setting of that task, i. [sent-252, score-0.312]
68 The first aim of our evaluation is to test whether inferring the hierarchical structure of the constructed graphs improves WSD performance. [sent-258, score-0.37]
69 For that reason our first baseline, Chinese Whispers Unweighted version (CWU), takes as input the same unweighted graph of contexts as HRGs in order to produce a flat clustering. [sent-259, score-0.332]
70 We followed the same sense mapping method as in the SemEval-2007 sense induction task (Agirre and Soroa, 2007). [sent-261, score-0.37]
71 Our second baseline, Chinese Whispers Weighted version (CWW), is similar to the previous one, with the difference that the edges of the input graph are weighted using Equation 1. [sent-262, score-0.239]
72 For clustering the graphs of CWU and CWW we employ Chinese Whispers4 (Biemann, 2006). [sent-263, score-0.21]
73 The second aim of our evaluation is to assess whether the hierarchical structure inferred by HRGs is more informative than the hierarchical structure inferred by traditional Hierarchical Agglomerative Clustering (HAC). [sent-264, score-0.234]
74 Hence, our third baseline takes as input a similarity matrix of the graph vertices and performs bottom-up clustering with average-linkage, which has already been used in WSI in (Pantel and Lin). (Footnote 4: The number of iterations for CW was set to 200.) [sent-265, score-0.491]
75 To calculate the similarity matrix of vertices we follow a process similar to the one used in Section 4. [sent-267, score-0.273]
76 The similarity between two vertices is calculated according to the degree of connectedness among their direct neighbours. [sent-269, score-0.273]
77 Given two vertices (contexts) ci and cj, let N(ci, cj) be the set of their neighbours and K(ci, cj) be the set of edges between the vertices in N(ci, cj). [sent-271, score-0.721]
78 The number of edges that could exist between vertices in N(ci, cj) is |N(ci, cj)|(|N(ci, cj)| − 1)/2. Thus, the similarity of ci, cj is set equal to the ratio of |K(ci, cj)| to that maximum. [sent-274, score-0.461]
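A sketch of this baseline similarity as reconstructed above: edges observed among the combined neighbourhood of ci and cj, divided by the maximum number of edges that neighbourhood could hold. Treating N(ci, cj) as the union of the two neighbourhoods, and the adjacency encoding, are assumptions.

    from itertools import combinations

    def neighbour_similarity(adjacency, ci, cj):
        """adjacency: vertex -> set of neighbouring vertices.
        Similarity of ci, cj = |K(ci, cj)| / C(|N(ci, cj)|, 2), where N is the combined
        neighbourhood and K the set of graph edges falling inside it."""
        n = adjacency[ci] | adjacency[cj]                      # N(ci, cj)
        possible = len(n) * (len(n) - 1) // 2                  # edges that could exist within N
        if possible == 0:
            return 0.0
        k = sum(1 for u, v in combinations(n, 2) if v in adjacency.get(u, ()))   # |K(ci, cj)|
        return k / possible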
79 It seems that using weighted edges creates a bias towards the MFS, in effect missing rare senses of a target word. [sent-314, score-0.217]
80 2) are associated with more than one sense of the target word and most strongly associated with the MFS. [sent-316, score-0.227]
81 Overall, the comparison of HRGs against the CWU and CWW baselines has shown that inferring the hierarchical structure of observed graphs leads to improved WSD performance as opposed to using flat clustering. [sent-318, score-0.403]
82 This is because HRGs are able to infer both the hierarchical structure of the graph and the probabilities, θk, associated with each internal node. [sent-319, score-0.397]
83 dendrograms in which vertices are less likely to be connected on small scales than on large ones, as well as mixtures of assortative and disassortative (Clauset et al. [sent-356, score-0.465]
84 Brody & Lapata (2009) presented a sense induction method that is related to Latent Dirichlet Allocation (Blei et al. [sent-369, score-0.231]
85 The inclusion of different feature types as separate models in the sense induction process can easily be modeled in our setting by inferring a different hierarchy of target word instances according to each feature type, and then combining all of them into a consensus tree. [sent-375, score-0.346]
86 (2007) developed a vector-based method that performs sense induction by grouping the contexts of a target word using three types of features, i. [sent-378, score-0.383]
87 Klapaftis & Manandhar (2008) developed a graph-based sense induction method, in which vertices correspond to collocations related to the target word and edges between vertices are drawn according to... (Table 3: HRGs against recent methods & baselines.) [sent-385, score-0.922]
88 The constructed graph is smoothed to identify more edges between vertices and then clustered using Chinese Whispers (Biemann, 2006). [sent-387, score-0.499]
89 Despite that, it is a flat clustering method that ignores the hierarchical structure exhibited by observed graphs. [sent-389, score-0.232]
90 The previous section has shown that inferring the hierarchical structure of graphs leads to superior WSD performance. [sent-390, score-0.344]
91 7 Conclusion & future work We presented an unsupervised method for inferring the hierarchical grouping of the senses of a polysemous word. [sent-396, score-0.374]
92 Our method creates a graph, in which vertices correspond to contexts of a polysemous target word and edges between them are drawn according to their similarity. [sent-397, score-0.52]
93 , 2008) was applied to the constructed graph in order to infer its hierarchical structure, i. [sent-399, score-0.305]
94 The learned tree provides an induction of the senses of a given word at different levels of sense granularity and was applied to the problem of WSD. [sent-402, score-0.356]
95 The WSD process mapped the tree’s internal nodes to GS senses using a sense tagged corpus, and then tagged new instances by exploiting the structural information provided by the tree. [sent-403, score-0.332]
96 Our experimental results have shown that our graphs exhibit hierarchical organisation that can be captured by HRGs, in effect providing improved WSD performance compared to flat clustering. [sent-404, score-0.371]
97 Additionally, our comparison against hierarchical agglomerative clustering with average-linkage has shown that HRGs perform significantly better than HAC when the graphs do not suffer from sparsity (disconnected graphs). [sent-405, score-0.353]
98 The comparison with state-of-the-art sense induction systems has shown that our method yields improvements. [sent-406, score-0.231]
99 dependency relations, second-order cooccurrences, named entities and others to construct our undirected graphs and then applying HRGs, in order to measure the impact of each feature type on the induced hierarchical structures within a WSD setting. [sent-409, score-0.271]
100 , 2008), we are also working on using MCMC in order to sample more than one dendrogram at equilibrium, and then combine them into a consensus tree. [sent-411, score-0.217]
wordName wordTfidf (topN-words)
[('hrgs', 0.475), ('dk', 0.314), ('hac', 0.267), ('vertices', 0.234), ('dendrogram', 0.217), ('clauset', 0.198), ('dendrograms', 0.163), ('graph', 0.162), ('cj', 0.155), ('graphs', 0.154), ('sense', 0.139), ('cwu', 0.136), ('hierarchical', 0.117), ('klapaftis', 0.109), ('wsd', 0.105), ('senses', 0.098), ('internal', 0.095), ('cww', 0.095), ('manandhar', 0.095), ('ci', 0.095), ('induction', 0.092), ('agirre', 0.09), ('contexts', 0.084), ('collocational', 0.081), ('dcurr', 0.081), ('collocations', 0.081), ('eronis', 0.081), ('neighbours', 0.081), ('whispers', 0.081), ('edges', 0.077), ('sk', 0.074), ('inferring', 0.073), ('subtrees', 0.068), ('soroa', 0.068), ('clusters', 0.062), ('polysemous', 0.06), ('flat', 0.059), ('clustering', 0.056), ('hrg', 0.054), ('ioannis', 0.054), ('suresh', 0.054), ('gs', 0.052), ('mcmc', 0.048), ('di', 0.048), ('disconnected', 0.046), ('dorow', 0.046), ('biemann', 0.046), ('target', 0.042), ('vj', 0.042), ('disassortative', 0.041), ('organisation', 0.041), ('simcl', 0.041), ('simwd', 0.041), ('similarity', 0.039), ('jaccard', 0.039), ('node', 0.038), ('equation', 0.038), ('edge', 0.038), ('vertex', 0.037), ('brody', 0.036), ('disambiguation', 0.035), ('beate', 0.035), ('hub', 0.035), ('edmonds', 0.035), ('distributionally', 0.035), ('pedersen', 0.034), ('exist', 0.033), ('leaves', 0.033), ('vi', 0.032), ('tw', 0.031), ('nouns', 0.031), ('widdows', 0.029), ('niu', 0.029), ('mcnemar', 0.029), ('threshold', 0.028), ('weight', 0.028), ('chinese', 0.027), ('assortative', 0.027), ('disjunctions', 0.027), ('dnext', 0.027), ('dyk', 0.027), ('ramage', 0.027), ('scholarly', 0.027), ('senseclusters', 0.027), ('slonim', 0.027), ('discriminating', 0.027), ('dense', 0.027), ('unweighted', 0.027), ('sparse', 0.027), ('tree', 0.027), ('constructed', 0.026), ('morristown', 0.026), ('deleted', 0.026), ('agglomerative', 0.026), ('grouping', 0.026), ('setting', 0.025), ('bc', 0.024), ('associated', 0.023), ('drawn', 0.023), ('dunning', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 124 emnlp-2010-Word Sense Induction Disambiguation Using Hierarchical Random Graphs
Author: Ioannis Klapaftis ; Suresh Manandhar
Abstract: Graph-based methods have gained attention in many areas of Natural Language Processing (NLP) including Word Sense Disambiguation (WSD), text summarization, keyword extraction and others. Most of the work in these areas formulate their problem in a graph-based setting and apply unsupervised graph clustering to obtain a set of clusters. Recent studies suggest that graphs often exhibit a hierarchical structure that goes beyond simple flat clustering. This paper presents an unsupervised method for inferring the hierarchical grouping of the senses of a polysemous word. The inferred hierarchical structures are applied to the problem of word sense disambiguation, where we show that our method performs sig- nificantly better than traditional graph-based methods and agglomerative clustering yielding improvements over state-of-the-art WSD systems based on sense induction.
2 0.14364544 66 emnlp-2010-Inducing Word Senses to Improve Web Search Result Clustering
Author: Roberto Navigli ; Giuseppe Crisafulli
Abstract: In this paper, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). We first acquire the senses (i.e., meanings) of a query by means of a graphbased clustering algorithm that exploits cycles (triangles and squares) in the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. Our experiments, conducted on datasets of ambiguous queries, show that our approach improves search result clustering in terms of both clustering quality and degree of diversification.
3 0.112677 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification
Author: Longhua Qian ; Guodong Zhou
Abstract: Seed sampling is critical in semi-supervised learning. This paper proposes a clusteringbased stratified seed sampling approach to semi-supervised learning. First, various clustering algorithms are explored to partition the unlabeled instances into different strata with each stratum represented by a center. Then, diversity-motivated intra-stratum sampling is adopted to choose the center and additional instances from each stratum to form the unlabeled seed set for an oracle to annotate. Finally, the labeled seed set is fed into a bootstrapping procedure as the initial labeled data. We systematically evaluate our stratified bootstrapping approach in the semantic relation classification subtask of the ACE RDC (Relation Detection and Classification) task. In particular, we compare various clustering algorithms on the stratified bootstrapping performance. Experimental results on the ACE RDC 2004 corpus show that our clusteringbased stratified bootstrapping approach achieves the best F1-score of 75.9 on the subtask of semantic relation classification, approaching the one with golden clustering.
4 0.10375385 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
Author: Amarnag Subramanya ; Slav Petrov ; Fernando Pereira
Abstract: We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.
5 0.097838014 77 emnlp-2010-Measuring Distributional Similarity in Context
Author: Georgiana Dinu ; Mirella Lapata
Abstract: The computation of meaning similarity as operationalized by vector-based models has found widespread use in many tasks ranging from the acquisition of synonyms and paraphrases to word sense disambiguation and textual entailment. Vector-based models are typically directed at representing words in isolation and thus best suited for measuring similarity out of context. In his paper we propose a probabilistic framework for measuring similarity in context. Central to our approach is the intuition that word meaning is represented as a probability distribution over a set of latent senses and is modulated by context. Experimental results on lexical substitution and word similarity show that our algorithm outperforms previously proposed models.
6 0.086800553 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?
7 0.069865316 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics
8 0.064886697 79 emnlp-2010-Mining Name Translations from Entity Graph Mapping
9 0.058567077 84 emnlp-2010-NLP on Spoken Documents Without ASR
10 0.056587767 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task
11 0.051255289 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks
12 0.046056461 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction
13 0.045449652 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
14 0.043499622 91 emnlp-2010-Practical Linguistic Steganography Using Contextual Synonym Substitution and Vertex Colour Coding
15 0.04199129 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions
16 0.041054901 39 emnlp-2010-EMNLP 044
17 0.040068813 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction
18 0.040034063 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing
19 0.039706394 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space
20 0.03909779 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
topicId topicWeight
[(0, 0.156), (1, 0.094), (2, -0.037), (3, 0.031), (4, 0.013), (5, 0.032), (6, -0.158), (7, 0.1), (8, 0.008), (9, 0.159), (10, 0.07), (11, -0.072), (12, -0.071), (13, -0.12), (14, -0.073), (15, -0.217), (16, 0.014), (17, -0.056), (18, 0.14), (19, 0.013), (20, 0.102), (21, 0.065), (22, -0.14), (23, -0.214), (24, 0.013), (25, 0.07), (26, -0.029), (27, 0.104), (28, -0.114), (29, 0.02), (30, -0.249), (31, 0.105), (32, 0.032), (33, 0.018), (34, -0.129), (35, -0.056), (36, 0.072), (37, -0.136), (38, 0.142), (39, -0.018), (40, -0.067), (41, -0.012), (42, 0.107), (43, 0.077), (44, 0.036), (45, -0.07), (46, 0.013), (47, -0.013), (48, -0.034), (49, 0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.95914459 124 emnlp-2010-Word Sense Induction Disambiguation Using Hierarchical Random Graphs
Author: Ioannis Klapaftis ; Suresh Manandhar
Abstract: Graph-based methods have gained attention in many areas of Natural Language Processing (NLP) including Word Sense Disambiguation (WSD), text summarization, keyword extraction and others. Most of the work in these areas formulate their problem in a graph-based setting and apply unsupervised graph clustering to obtain a set of clusters. Recent studies suggest that graphs often exhibit a hierarchical structure that goes beyond simple flat clustering. This paper presents an unsupervised method for inferring the hierarchical grouping of the senses of a polysemous word. The inferred hierarchical structures are applied to the problem of word sense disambiguation, where we show that our method performs sig- nificantly better than traditional graph-based methods and agglomerative clustering yielding improvements over state-of-the-art WSD systems based on sense induction.
2 0.58576876 66 emnlp-2010-Inducing Word Senses to Improve Web Search Result Clustering
Author: Roberto Navigli ; Giuseppe Crisafulli
Abstract: In this paper, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). We first acquire the senses (i.e., meanings) of a query by means of a graphbased clustering algorithm that exploits cycles (triangles and squares) in the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. Our experiments, conducted on datasets of ambiguous queries, show that our approach improves search result clustering in terms of both clustering quality and degree of diversification.
3 0.43143937 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task
Author: Roi Reichart ; Ari Rappoport
Abstract: Polysemy is a major characteristic of natural languages. Like words, syntactic forms can have several meanings. Understanding the correct meaning of a syntactic form is of great importance to many NLP applications. In this paper we address an important type of syntactic polysemy the multiple possible senses of tense syntactic forms. We make our discussion concrete by introducing the task of Tense Sense Disambiguation (TSD): given a concrete tense syntactic form present in a sentence, select its appropriate sense among a set of possible senses. Using English grammar textbooks, we compiled a syntactic sense dictionary comprising common tense syntactic forms and semantic senses for each. We annotated thousands of BNC sentences using the – defined senses. We describe a supervised TSD algorithm trained on these annotations, which outperforms a strong baseline for the task.
4 0.41974691 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
Author: Amarnag Subramanya ; Slav Petrov ; Fernando Pereira
Abstract: We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.
5 0.41230991 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics
Author: Joseph Reisinger ; Raymond Mooney
Abstract: We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.
6 0.41127047 77 emnlp-2010-Measuring Distributional Similarity in Context
7 0.34428865 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification
8 0.34253952 91 emnlp-2010-Practical Linguistic Steganography Using Contextual Synonym Substitution and Vertex Colour Coding
9 0.32782918 79 emnlp-2010-Mining Name Translations from Entity Graph Mapping
10 0.32297251 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference
11 0.30650803 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?
12 0.2475304 23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition
13 0.2467557 84 emnlp-2010-NLP on Spoken Documents Without ASR
14 0.21752793 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing
15 0.19509317 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks
16 0.18876648 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction
17 0.18677661 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
18 0.16718537 9 emnlp-2010-A New Approach to Lexical Disambiguation of Arabic Text
19 0.15463887 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions
20 0.15269044 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction
topicId topicWeight
[(10, 0.013), (12, 0.02), (17, 0.372), (29, 0.109), (30, 0.031), (32, 0.018), (52, 0.053), (56, 0.039), (62, 0.012), (66, 0.124), (72, 0.04), (76, 0.016), (79, 0.013), (83, 0.017), (87, 0.022), (89, 0.017)]
simIndex simValue paperId paperTitle
1 0.73229629 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution
Author: Karthik Raghunathan ; Heeyoung Lee ; Sudarshan Rangarajan ; Nate Chambers ; Mihai Surdeanu ; Dan Jurafsky ; Christopher Manning
Abstract: Most coreference resolution models determine if two mentions are coreferent using a single function over a set of constraints or features. This approach can lead to incorrect decisions as lower precision features often overwhelm the smaller number of high precision ones. To overcome this problem, we propose a simple coreference architecture based on a sieve that applies tiers of deterministic coreference models one at a time from highest to lowest precision. Each tier builds on the previous tier’s entity cluster output. Further, our model propagates global information by sharing attributes (e.g., gender and number) across mentions in the same cluster. This cautious sieve guarantees that stronger features are given precedence over weaker ones and that each decision is made using all of the information available at the time. The framework is highly modular: new coreference modules can be plugged in without any change to the other modules. In spite of its simplicity, our approach outperforms many state-of-the-art supervised and unsupervised models on several standard corpora. This suggests that sievebased approaches could be applied to other NLP tasks.
same-paper 2 0.72002077 124 emnlp-2010-Word Sense Induction Disambiguation Using Hierarchical Random Graphs
Author: Ioannis Klapaftis ; Suresh Manandhar
Abstract: Graph-based methods have gained attention in many areas of Natural Language Processing (NLP) including Word Sense Disambiguation (WSD), text summarization, keyword extraction and others. Most of the work in these areas formulate their problem in a graph-based setting and apply unsupervised graph clustering to obtain a set of clusters. Recent studies suggest that graphs often exhibit a hierarchical structure that goes beyond simple flat clustering. This paper presents an unsupervised method for inferring the hierarchical grouping of the senses of a polysemous word. The inferred hierarchical structures are applied to the problem of word sense disambiguation, where we show that our method performs sig- nificantly better than traditional graph-based methods and agglomerative clustering yielding improvements over state-of-the-art WSD systems based on sense induction.
3 0.46826631 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation
Author: Jacob Eisenstein ; Brendan O'Connor ; Noah A. Smith ; Eric P. Xing
Abstract: The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as “sports” or “entertainment” are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author’s geographic location from raw text, outperforming both text regression and supervised topic models.
4 0.43924734 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment
Author: Samuel Brody
Abstract: We reveal a previously unnoticed connection between dependency parsing and statistical machine translation (SMT), by formulating the dependency parsing task as a problem of word alignment. Furthermore, we show that two well known models for these respective tasks (DMV and the IBM models) share common modeling assumptions. This motivates us to develop an alignment-based framework for unsupervised dependency parsing. The framework (which will be made publicly available) is flexible, modular and easy to extend. Using this framework, we implement several algorithms based on the IBM alignment models, which prove surprisingly effective on the dependency parsing task, and demonstrate the potential of the alignment-based approach.
5 0.43679664 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
Author: Adria de Gispert ; Juan Pino ; William Byrne
Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignmentmodel. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteri- ors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-totarget and target-to-source alignment models is to build two separate systems and combine their output translation lattices.
6 0.43655607 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice
7 0.43476644 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study
8 0.43168876 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics
9 0.42969519 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
10 0.42848155 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
11 0.42801896 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
12 0.42772195 109 emnlp-2010-Translingual Document Representations from Discriminative Projections
13 0.42767376 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space
14 0.42727312 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
15 0.42654192 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
16 0.42626137 63 emnlp-2010-Improving Translation via Targeted Paraphrasing
17 0.42421925 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation
18 0.42395771 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks
19 0.42364711 104 emnlp-2010-The Necessity of Combining Adaptation Methods
20 0.42336532 84 emnlp-2010-NLP on Spoken Documents Without ASR