acl acl2012 acl2012-12 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Seokhwan Kim ; Gary Geunbae Lee
Abstract: Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performances. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision. This paper proposes a novel graph-based projection approach and demonstrates the merits of it by using a Korean relation extraction system based on projected dataset from an English-Korean parallel corpus.
Reference: text
sentIndex sentText sentNum sentScore
1 Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performances. [sent-5, score-0.406]
2 To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision. [sent-6, score-0.622]
3 This paper proposes a novel graph-based projection approach and demonstrates the merits of it by using a Korean relation extraction system based on projected dataset from an English-Korean parallel corpus. [sent-7, score-1.083]
4 1 Introduction Relation extraction aims to identify semantic relations of entities in a document. [sent-8, score-0.246]
5 Although many supervised machine learning approaches have been successfully applied to relation extraction tasks (Zelenko et al. [sent-9, score-0.406]
6 , 2006), applications of these approaches are still limited because they require a sufficient number of training examples to obtain good extraction results. [sent-11, score-0.176]
7 Several datasets that provide manual annotations of semantic relationships are available from MUC (Grishman and Sundheim, 1996) and ACE (Doddington et al. [sent-12, score-0.154]
8 Although these datasets encourage the development of relation extractors for these major languages, there are few labeled training samples for learning new systems in [sent-14, score-0.264]
9 Because manual annotation of semantic relations for such resource-poor languages is very expensive, we instead consider weakly supervised learning techniques (Riloff and Jones, 1999; Agichtein and Gravano, 2000; Zhang, 2004; Chen et al. [sent-18, score-0.311]
10 , 2006) to learn the relation extractors without significant annotation efforts. [sent-19, score-0.36]
11 , 2007) and Wikipedia (Wu and Weld, 2010), that were not specially constructed for relation extraction instead of using task-specific training or seed examples. [sent-22, score-0.373]
12 We previously proposed to leverage parallel corpora as a new kind of external resource for relation extraction (Kim et al. [sent-23, score-0.46]
13 To obtain training examples in the resource-poor target language, this approach exploited a cross-lingual annotation projection by propagating annotations that were generated by a relation extraction system in a resource-rich source language. [sent-25, score-1.146]
14 In this approach, projected annotations were determined in a single pass process by considering only alignments between entity candidates; we call this action direct projection. [sent-26, score-0.365]
15 In this paper, we propose a graph-based projection approach for weakly supervised relation extraction. [sent-27, score-0.829]
16 This approach utilizes a graph that is constructed with both instance and context information and that is operated in an iterative manner. [sent-28, score-0.284]
17 The goal of our graph-based approach is to improve the robustness of the extractor with respect to errors that are generated and accumulated by preprocessors. [sent-29, score-0.089]
18 Cross-lingual annotation projection intends to learn an extractor ft for good performance without significant effort toward building resources for a resource-poor target language Lt. [sent-43, score-0.804]
19 To accomplish that goal, the method automatically creates a set of annotated text for ft, utilizing a well-made extractor fs for a resource-rich source language Ls and a parallel corpus of Ls and Lt. [sent-44, score-0.245]
20 Figure 1 shows an example of annotation projection for relation extraction with a bi-text in Lt Korean and Ls English. [sent-45, score-0.952]
21 Given an English sentence, an instance ⟨Barack Obama, Honolulu⟩ is extracted as positive. [sent-46, score-0.063]
22 Early studies in cross-lingual annotation projection were accomplished for various natural language processing tasks (Yarowsky and Ngai, 2001; Yarowsky et al. [sent-48, score-0.614]
23 These studies adopted a simple direct projection strategy that propagates the annotations in the source language sentences to word-aligned target sentences, and a target system can bootstrap from these projected annotations. [sent-51, score-0.921]
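The direct projection strategy described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the span representation (inclusive token index pairs), and the alignment format (a list of source-target token index pairs) are all assumptions made for the example.

```python
def project_entity(span, alignment):
    """Map a source token span (start, end), inclusive, to a target span.

    `alignment` is a hypothetical list of (source_index, target_index)
    word-alignment links for one sentence pair.
    """
    targets = sorted(t for s, t in alignment if span[0] <= s <= span[1])
    if not targets:
        return None  # the entity has no aligned counterpart
    return (targets[0], targets[-1])


def direct_projection(instances, alignment):
    """Copy each source annotation onto the aligned target entity pair.

    `instances` holds (entity1_span, entity2_span, label) triples produced
    by a source-language extractor. Pairs with an unaligned entity are
    dropped, because there is nothing to project them onto.
    """
    projected = []
    for e1, e2, label in instances:
        t1 = project_entity(e1, alignment)
        t2 = project_entity(e2, alignment)
        if t1 is not None and t2 is not None:
            projected.append((t1, t2, label))
    return projected
```

Because each label is copied in a single pass, any error in the source extractor or in the word alignment is carried directly into the target annotations.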
24 However, these automatic annotations can be unreliable because of source text mis-classification and word alignment errors; thus, they can cause a critical falling-off in the annotation projection quality. [sent-61, score-0.718]
25 Although some noise reduction strategies for projecting semantic relations were proposed (Kim et al. [sent-62, score-0.145]
26 , 2010), the direct projection approach is still vulnerable to erroneous inputs generated by submodules. [sent-63, score-0.588]
27 We note two main causes for this limitation: (1) the direct projection approach considers only alignments between entity candidates, and it does not consider any contextual information; and, (2) it is performed by a single pass process. [sent-64, score-0.713]
28 To solve both of these problems at once, we propose a graph-based projection approach for relation extraction. [sent-65, score-0.737]
29 3 Graph Construction The most crucial factor in the success of graph-based learning approaches is how to construct a graph that is appropriate for the target task. [sent-66, score-0.216]
30 Das and Petrov (2011) proposed a graph-based bilingual projection of part-of-speech tagging by considering the tagged words in the source language as labeled examples and connecting them to the unlabeled words in the target language, while referring to the word alignments. [sent-67, score-0.815]
31 Graph construction for projecting semantic relationships is more complicated than part-of-speech tagging because the unit instance of projection is a pair of entities and not a word or morpheme that is equivalent to the alignment unit. [sent-68, score-0.756]
32 1 Graph Vertices To construct a graph for a relation projection, we define two types of vertices: instance vertices V and context vertices U. [sent-70, score-1.125]
33 Instance vertices are defined for all pairs of entity candidates in the source and target languages. [sent-71, score-0.476]
34 Each instance vertex has a soft label vector Y = [y+ y−], which contains the probabilities that the instance is positive or negative, respectively. [sent-72, score-0.516]
35 The larger the y+ value, the more likely the instance has a semantic relationship. [sent-73, score-0.108]
36 The initial label values of an instance vertex vsij ∈ Vs for the instance ⟨esi, esj⟩ in the source language are assigned based on the confidence score of the extractor fs. [sent-74, score-0.549]
37 With respect to the target language, every instance vertex vtij ∈ Vt has the same initial values of 0. [sent-75, score-0.432]
38 The other type of vertices, context vertices, are used for identifying relation descriptors that are contextual subtexts that represent semantic relationships of the positive instances. [sent-77, score-0.413]
39 Because the characteristics of these descriptive contexts vary depending on the language, context vertices should be defined to be language-specific. [sent-78, score-0.412]
40 In the case of English, we define the context vertex for each trigram that is located between a given entity pair that is semantically related. [sent-79, score-0.408]
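As a rough illustration of the trigram contexts described above, the sketch below collects every trigram lying between two entity mentions. The function name, the whitespace tokenization, and the example sentence are assumptions made for illustration; the source does not specify how the sentences are tokenized.

```python
def context_trigrams(tokens, e1_span, e2_span):
    """Return all token trigrams located strictly between two entity spans.

    Spans are (start, end) token indices, inclusive. The order of the two
    entities in the sentence does not matter.
    """
    left, right = sorted([e1_span, e2_span])
    between = tokens[left[1] + 1 : right[0]]
    return [tuple(between[i:i + 3]) for i in range(len(between) - 2)]


# Hypothetical example sentence for the instance <Barack Obama, Honolulu>.
tokens = "Barack Obama was born in Honolulu".split()
print(context_trigrams(tokens, (0, 1), (5, 5)))
```

Each distinct trigram would then serve as one context vertex; entity pairs separated by fewer than three tokens yield no context vertex under this sketch.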
41 If the context vertices Us for the source language sentences are defined, then the units of context in the target language can also be created based on the word alignments. [sent-80, score-0.583]
42 The aligned counterpart of each source language context vertex is used for generating a context vertex uti ∈ Ut in the target language. [sent-81, score-0.886]
43 Each context vertex us ∈ Us and ut ∈ Ut also has y+ and y−, which represent how likely the context is to denote semantic relationships. [sent-82, score-0.59]
44 The probability values for all of the context vertices in both of the languages are initially assigned to y+ = y− = 0. [sent-83, score-0.441]
45 2 Edge Weights The graph for our graph-based projection is constructed by connecting related vertex pairs by weighted edges. [sent-86, score-1.01]
46 If a given pair of vertices is likely to have the same label, then the edge connecting these vertices should have a large weight value. [sent-87, score-0.864]
47 The first type of edges consists of connections between an instance vertex and a context vertex in the same language. [sent-89, score-0.816]
48 For a pair of an instance vertex vi,j and a context vertex uk, these vertices are connected if the context sequence of vi,j contains uk as a subsequence. [sent-90, score-1.229]
49 Another edge category is for the pairs of context vertices in a language. [sent-97, score-0.498]
50 Because each context vertex is considered to be an n-gram pattern in our work, the weight value for each edge of this type represents the pattern similarity between two context vertices. [sent-98, score-0.563]
51 The edge weight w(uk , ul) is computed by Jaccard’s coefficient between uk and ul. [sent-99, score-0.186]
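A minimal sketch of the Jaccard-based edge weight between two context vertices follows. It assumes each n-gram is compared as a set of its tokens; the source does not state which sets the coefficient is computed over, so this choice and the function name are illustrative only.

```python
def jaccard(u_k, u_l):
    """Jaccard coefficient between two n-gram patterns, treated as token sets."""
    a, b = set(u_k), set(u_l)
    if not a and not b:
        return 0.0  # convention chosen here for two empty patterns
    return len(a & b) / len(a | b)


# Two trigrams sharing two of four distinct tokens.
print(jaccard(("was", "born", "in"), ("born", "in", "the")))  # 0.5
```

Under this assumption, identical patterns get weight 1 and patterns with no shared token get weight 0, so label mass flows mainly between similar contexts.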
52 While the previous two categories of edges are concerned with monolingual connections, the other type addresses bilingual alignments of context vertices between the source language and the target language. [sent-100, score-0.683]
53 We define the weight for a bilingual edge connecting usk and utl as the relative frequency of alignments, as follows: w(usk,utl) = count? [sent-101, score-0.213]
54 where count(us, ut) is the number of alignments between us and ut across the whole parallel corpus. [sent-105, score-0.236]
55 4 Label Propagation To induce labels for all of the unlabeled vertices on the graph constructed in Section 3, we utilize the label propagation algorithm (Zhu and Ghahramani, 2002), which is a graph-based semi-supervised learning algorithm. [sent-106, score-0.614]
56 First, we construct an n × n matrix T that represents transition probabilities for all of the vertex pairs. [sent-107, score-0.373]
57 After assigning all of the values on the matrix, we normalize the matrix for each row, to make the element values be probabilities. [sent-108, score-0.065]
58 The other input to the algorithm is an n × 2 matrix Y, which indicates the probabilities of whether a given vertex vi is positive or not. [sent-109, score-0.373]
59 The matrices T and Y are initialized by the values described in Section 3. [sent-110, score-0.065]
60 For the input matrices T and Y , label propagation is performed by multiplying the two matrices, to update the Y matrix. [sent-111, score-0.153]
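The propagation step can be sketched in plain Python as below. This follows the basic label propagation scheme of Zhu and Ghahramani (2002) under several assumptions that the source does not confirm: vertices with trusted initial labels are clamped after every update, Y is re-normalized per row, and the loop simply runs for a fixed number of iterations. It is not the Junto toolkit's implementation, and all names are hypothetical.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def label_propagation(weights, labels, clamped, iterations=10):
    """Iterate Y <- T Y on an n-vertex graph.

    weights: n x n edge-weight matrix (row-normalized here to give T).
    labels:  n x 2 initial soft labels [y+, y-] for every vertex.
    clamped: indices whose labels are reset to their initial values after
             each update (an assumption of this sketch).
    """
    T = row_normalize(weights)
    Y = [row[:] for row in labels]
    n = len(Y)
    for _ in range(iterations):
        # One propagation step: each vertex averages its neighbours' labels.
        Y_new = [
            [sum(T[i][k] * Y[k][c] for k in range(n)) for c in range(2)]
            for i in range(n)
        ]
        Y_new = row_normalize(Y_new)
        for i in clamped:
            Y_new[i] = labels[i][:]
        Y = Y_new
    return Y
```

For example, on a three-vertex chain where only the first vertex carries a confident positive label, the y+ values of the other two vertices rise above their neutral starting point after a few iterations.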
61 5 Implementation To demonstrate the effectiveness of the graph-based projection approach for relation extraction, we developed a Korean relation extraction system that was trained with projected annotations from English resources. [sent-114, score-1.245]
62 We used an English-Korean parallel corpus 1 that contains 266,892 bi-sentence pairs in English and Korean. [sent-115, score-0.072]
63 We obtained 155,409 positive instances from the English sentences using an off-the-shelf relation extraction system, ReVerb 2 (Fader et al. [sent-116, score-0.372]
64 1The parallel corpus collected is available in our website: http://isoft. [sent-118, score-0.072]
65 edu/ Table 1: Comparison between direct and graph-based projection approaches to extract semantic relationships for four relation types. Columns: Type; Direct (P, R, F); Graph-based (P, R, F). Acquisition 51. [sent-124, score-0.926]
66 3 The English sentence annotations in the parallel corpus were then propagated into the corresponding Korean sentences. [sent-154, score-0.135]
67 We used the GIZA++ software 3 (Och and Ney, 2003) to obtain the word alignments for each bi-sentence in the parallel corpus. [sent-155, score-0.137]
68 The graph-based projection was performed by the Junto toolkit 4 with the maximum number of iterations of 10 for each execution. [sent-156, score-0.518]
69 Projected instances were utilized as training examples to learn the Korean relation extractor. [sent-157, score-0.248]
70 In our model, we adopted the subtree kernel method for the shortest path dependency kernel (Bunescu and Mooney, 2005). [sent-159, score-0.182]
71 The dataset consists of 500 sentences for four relation types: Acquisition, Birthplace, Inventor of, and Won Prize. [sent-162, score-0.219]
72 The first experiment aimed to compare two systems constructed by the direct projection (Kim et al. [sent-164, score-0.623]
73 Table 1 shows the performances of the relation extraction of the two systems. [sent-166, score-0.385]
74 htm Table 2: Comparisons of our projection approach to heuristic and Wikipedia-based approaches. Columns: Approach; P; R; F. Heuristic-based 92. [sent-176, score-0.546]
75 30 system with direct projection for all of the four relation types. [sent-185, score-0.807]
76 To demonstrate the merits of our work against other approaches based on monolingual external resources, we performed comparisons with the following two baselines: heuristic-based (Banko et al. [sent-188, score-0.19]
77 7 Conclusions This paper presented a novel graph-based projection approach for relation extraction. [sent-196, score-0.737]
78 Our approach performed a label propagation algorithm on a proposed graph that represented the instance and context features of both the source and target languages. [sent-197, score-0.439]
79 The feasibility of our approach was demonstrated by our Korean relation extraction system. [sent-198, score-0.338]
80 Experimental results show that our graph-based projection helped to improve the performance of the cross-lingual annotation projection of the semantic relations, and our system outperforms the other systems, which incorporate monolingual external resources. [sent-199, score-1.261]
81 In this work, we operated the graph-based projection under very restricted conditions, because of high complexity of the algorithm. [sent-200, score-0.582]
82 For future work, we plan to relieve the complexity problem for dealing with more expanded graph structure to improve the performance of our proposed approach. [sent-201, score-0.088]
83 In Proceedings of the fifth ACM conference on Digital libraries, pages 85–94. [sent-209, score-0.053]
84 In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2670–2676. [sent-218, score-0.053]
85 In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 724–731. [sent-224, score-0.053]
86 In Proceedings of the 45th annual meeting of the Association for Computational Linguistics, volume 45, pages 576–583. [sent-230, score-0.053]
87 Relation extraction using label propagation based semi-supervised learning. [sent-237, score-0.117]
88 In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 129–136. [sent-238, score-0.053]
89 In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 600–609. [sent-244, score-0.053]
90 The automatic content extraction (ACE) program–tasks, data, and evaluation. [sent-253, score-0.119]
91 In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1535–1545. [sent-261, score-0.053]
92 In Proceedings of the 16th conference on Computational linguistics, volume 1, pages 466–471. [sent-267, score-0.053]
93 Bootstrapping parsers via syntactic projection across parallel texts. [sent-275, score-0.072]
94 In Proceedings of the European Conference on Machine Learning, pages 137–142. [sent-281, score-0.053]
95 In Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, pages 22–25. [sent-286, score-0.053]
96 Korean national corpus in the 21st century sejong project. [sent-298, score-0.064]
97 Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora. [sent-333, score-0.548]
98 Inducing multilingual text analysis tools via robust projection across aligned corpora. [sent-340, score-0.548]
99 A composite kernel to extract relations between entities with both flat and structured features. [sent-355, score-0.158]
100 Learning from labeled and unlabeled data with label propagation. [sent-366, score-0.079]
wordName wordTfidf (topN-words)
[('projection', 0.518), ('vertices', 0.343), ('vertex', 0.308), ('korean', 0.236), ('relation', 0.219), ('extraction', 0.119), ('bunescu', 0.11), ('projected', 0.107), ('ut', 0.099), ('annotation', 0.096), ('extractor', 0.089), ('graph', 0.088), ('edge', 0.086), ('kim', 0.082), ('yarowsky', 0.082), ('kernel', 0.076), ('parallel', 0.072), ('computationallinguistics', 0.072), ('direct', 0.07), ('uk', 0.069), ('context', 0.069), ('propagation', 0.069), ('alignments', 0.065), ('matrix', 0.065), ('ls', 0.064), ('inventor', 0.064), ('operated', 0.064), ('sejong', 0.064), ('annotations', 0.063), ('instance', 0.063), ('banko', 0.062), ('connecting', 0.061), ('target', 0.061), ('mooney', 0.06), ('agichtein', 0.056), ('birthplace', 0.056), ('ngai', 0.056), ('pohang', 0.056), ('zitouni', 0.056), ('korea', 0.055), ('rr', 0.055), ('aa', 0.054), ('pages', 0.053), ('weakly', 0.052), ('mke', 0.051), ('projecting', 0.051), ('zelenko', 0.051), ('external', 0.05), ('relations', 0.049), ('doddington', 0.048), ('merits', 0.048), ('won', 0.048), ('label', 0.048), ('performances', 0.047), ('relationships', 0.046), ('das', 0.046), ('semantic', 0.045), ('extractors', 0.045), ('bb', 0.045), ('fader', 0.045), ('pado', 0.045), ('oo', 0.043), ('uu', 0.043), ('associationfor', 0.043), ('fs', 0.043), ('source', 0.041), ('supervised', 0.04), ('ft', 0.04), ('graphbased', 0.039), ('riloff', 0.039), ('hwa', 0.038), ('ace', 0.038), ('weld', 0.037), ('grishman', 0.037), ('ministry', 0.036), ('economy', 0.036), ('matrices', 0.036), ('edges', 0.035), ('bilingual', 0.035), ('constructed', 0.035), ('ho', 0.035), ('monolingual', 0.034), ('soderland', 0.034), ('positive', 0.034), ('connections', 0.033), ('entities', 0.033), ('wikipedia', 0.032), ('entity', 0.031), ('unlabeled', 0.031), ('weight', 0.031), ('aligned', 0.03), ('comparisons', 0.03), ('shortest', 0.03), ('examples', 0.029), ('wu', 0.029), ('languages', 0.029), ('pass', 0.029), ('approaches', 0.028), ('cessing', 0.028), ('bk', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
Author: Seokhwan Kim ; Gary Geunbae Lee
Abstract: Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performances. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision. This paper proposes a novel graph-based projection approach and demonstrates the merits of it by using a Korean relation extraction system based on projected dataset from an English-Korean parallel corpus.
2 0.31486487 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
Author: Sungchul Kim ; Kristina Toutanova ; Hwanjo Yu
Abstract: In this paper we propose a method to automatically label multi-lingual data with named entity tags. We build on prior work utilizing Wikipedia metadata and show how to effectively combine the weak annotations stemming from Wikipedia metadata with information obtained through English-foreign language parallel Wikipedia sentences. The combination is achieved using a novel semi-CRF model for foreign sentence tagging in the context of a parallel English sentence. The model outperforms both standard annotation projection methods and methods based solely on Wikipedia metadata.
3 0.17135753 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
Author: Enrique Alfonseca ; Katja Filippova ; Jean-Yves Delort ; Guillermo Garrido
Abstract: We describe the use of a hierarchical topic model for automatically identifying syntactic and lexical patterns that explicitly state ontological relations. We leverage distant supervision using relations from the knowledge base FreeBase, but do not require any manual heuristic nor manual seed list selections. Results show that the learned patterns can be used to extract new relations with good precision.
4 0.1135877 191 acl-2012-Temporally Anchored Relation Extraction
Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo
Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graphbased document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve a 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.
5 0.11313251 42 acl-2012-Bootstrapping via Graph Propagation
Author: Max Whitney ; Anoop Sarkar
Abstract: Bootstrapping a classifier from a small set of seed rules can be viewed as the propagation of labels between examples via features shared between them. This paper introduces a novel variant of the Yarowsky algorithm based on this view. It is a bootstrapping learning method which uses a graph propagation algorithm with a well defined objective function. The experimental results show that our proposed bootstrapping algorithm achieves state of the art performance or better on several different natural language data sets.
6 0.11112321 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
7 0.10357422 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
8 0.097914457 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
9 0.096256234 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
10 0.089509904 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification
11 0.084786668 64 acl-2012-Crosslingual Induction of Semantic Roles
12 0.080442443 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
13 0.076213293 169 acl-2012-Reducing Wrong Labels in Distant Supervision for Relation Extraction
14 0.076189205 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
15 0.072967604 73 acl-2012-Discriminative Learning for Joint Template Filling
16 0.07168328 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
17 0.07149715 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
18 0.068356715 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
19 0.067863375 1 acl-2012-ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
20 0.067796811 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
topicId topicWeight
[(0, -0.225), (1, 0.081), (2, -0.042), (3, 0.089), (4, 0.082), (5, 0.018), (6, -0.07), (7, 0.035), (8, -0.039), (9, -0.06), (10, 0.175), (11, -0.034), (12, -0.02), (13, -0.036), (14, 0.087), (15, 0.085), (16, -0.118), (17, -0.046), (18, 0.156), (19, 0.113), (20, -0.115), (21, -0.006), (22, 0.02), (23, -0.029), (24, -0.045), (25, 0.134), (26, -0.103), (27, -0.082), (28, -0.111), (29, 0.24), (30, -0.009), (31, -0.003), (32, 0.055), (33, 0.177), (34, -0.195), (35, 0.028), (36, -0.092), (37, -0.07), (38, -0.005), (39, 0.113), (40, -0.052), (41, 0.018), (42, -0.056), (43, 0.126), (44, 0.173), (45, -0.009), (46, -0.083), (47, 0.068), (48, -0.062), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.94778079 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
Author: Seokhwan Kim ; Gary Geunbae Lee
Abstract: Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performances. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision. This paper proposes a novel graph-based projection approach and demonstrates the merits of it by using a Korean relation extraction system based on projected dataset from an English-Korean parallel corpus.
2 0.80331796 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
Author: Sungchul Kim ; Kristina Toutanova ; Hwanjo Yu
Abstract: In this paper we propose a method to automatically label multi-lingual data with named entity tags. We build on prior work utilizing Wikipedia metadata and show how to effectively combine the weak annotations stemming from Wikipedia metadata with information obtained through English-foreign language parallel Wikipedia sentences. The combination is achieved using a novel semi-CRF model for foreign sentence tagging in the context of a parallel English sentence. The model outperforms both standard annotation projection methods and methods based solely on Wikipedia metadata.
3 0.53339922 42 acl-2012-Bootstrapping via Graph Propagation
Author: Max Whitney ; Anoop Sarkar
Abstract: Bootstrapping a classifier from a small set of seed rules can be viewed as the propagation of labels between examples via features shared between them. This paper introduces a novel variant of the Yarowsky algorithm based on this view. It is a bootstrapping learning method which uses a graph propagation algorithm with a well defined objective function. The experimental results show that our proposed bootstrapping algorithm achieves state of the art performance or better on several different natural language data sets.
4 0.47002628 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
Author: Enrique Alfonseca ; Katja Filippova ; Jean-Yves Delort ; Guillermo Garrido
Abstract: We describe the use of a hierarchical topic model for automatically identifying syntactic and lexical patterns that explicitly state ontological relations. We leverage distant supervision using relations from the knowledge base FreeBase, but do not require any manual heuristic nor manual seed list selections. Results show that the learned patterns can be used to extract new relations with good precision.
5 0.44110313 1 acl-2012-ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
Author: Marcis Pinnis ; Radu Ion ; Dan Stefanescu ; Fangzhong Su ; Inguna Skadina ; Andrejs Vasiljevs ; Bogdan Babych
Abstract: The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multi-lingual text resources) which are much more widely available than parallel translation data. Our presented toolkit deals with parallel content extraction from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction of parallel sentences and (2) extraction and bilingual mapping of terms and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. This demonstration focuses on the English, Latvian, Lithuanian, and Romanian languages.
6 0.42939711 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
7 0.41524139 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
8 0.40155584 153 acl-2012-Named Entity Disambiguation in Streaming Data
9 0.39504075 73 acl-2012-Discriminative Learning for Joint Template Filling
10 0.37620151 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
11 0.36359316 191 acl-2012-Temporally Anchored Relation Extraction
12 0.36160195 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
13 0.35975933 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
14 0.35030431 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
15 0.34916711 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification
16 0.34050906 129 acl-2012-Learning High-Level Planning from Text
17 0.33994699 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
18 0.33741722 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
19 0.33555844 6 acl-2012-A Comprehensive Gold Standard for the Enron Organizational Hierarchy
20 0.32516593 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
topicId topicWeight
[(25, 0.024), (26, 0.026), (28, 0.049), (37, 0.061), (39, 0.042), (74, 0.036), (82, 0.359), (84, 0.018), (85, 0.025), (90, 0.096), (92, 0.039), (94, 0.029), (96, 0.023), (99, 0.087)]
simIndex simValue paperId paperTitle
1 0.8436588 188 acl-2012-Subgroup Detector: A System for Detecting Subgroups in Online Discussions
Author: Amjad Abu-Jbara ; Dragomir Radev
Abstract: We present Subgroup Detector, a system for analyzing threaded discussions and identifying the attitude of discussants towards one another and towards the discussion topic. The system uses attitude predictions to detect the split of discussants into subgroups of opposing views. The system uses an unsupervised approach based on rule-based opinion target detecting and unsupervised clustering techniques. The system is open source and is freely available for download. An online demo of the system is available at: http://clair.eecs.umich.edu/SubgroupDetector/
2 0.83801752 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens
Abstract: We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596. Our analysis of the dependency parser errors gives some insights into future research directions.
same-paper 3 0.8214758 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
Author: Seokhwan Kim ; Gary Geunbae Lee
Abstract: Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performance. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision. This paper proposes a novel graph-based projection approach and demonstrates its merits using a Korean relation extraction system based on a dataset projected from an English-Korean parallel corpus.
4 0.80508548 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
Author: Ioannis Konstas ; Mirella Lapata
Abstract: This paper proposes a data-driven method for concept-to-text generation, the task of automatically producing textual output from non-linguistic input. A key insight in our approach is to reduce the tasks of content selection (“what to say”) and surface realization (“how to say”) into a common parsing problem. We define a probabilistic context-free grammar that describes the structure of the input (a corpus of database records and text describing some of them) and represent it compactly as a weighted hypergraph. The hypergraph structure encodes exponentially many derivations, which we rerank discriminatively using local and global features. We propose a novel decoding algorithm for finding the best scoring derivation and generating in this setting. Experimental evaluation on the ATIS domain shows that our model outperforms a competitive discriminative system both using BLEU and in a judgment elicitation study.
5 0.65754277 187 acl-2012-Subgroup Detection in Ideological Discussions
Author: Amjad Abu-Jbara ; Pradeep Dasigi ; Mona Diab ; Dragomir Radev
Abstract: The rapid and continuous growth of social networking sites has led to the emergence of many communities of communicating groups. Many of these groups discuss ideological and political topics. It is not uncommon for the participants in such discussions to split into two or more subgroups. The members of each subgroup share the same opinion toward the discussion topic and are more likely to agree with members of the same subgroup and disagree with members from opposing subgroups. In this paper, we propose an unsupervised approach for automatically detecting discussant subgroups in online communities. We analyze the text exchanged between the participants of a discussion to identify the attitude they carry toward each other and toward the various aspects of the discussion topic. We use attitude predictions to construct an attitude vector for each discussant. We use clustering techniques to cluster these vectors and, hence, determine the subgroup membership of each participant. We compare our methods to text clustering and other baselines, and show that our method achieves promising results.
6 0.59049648 191 acl-2012-Temporally Anchored Relation Extraction
8 0.53705156 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
9 0.53032273 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
10 0.52055061 31 acl-2012-Authorship Attribution with Author-aware Topic Models
11 0.50371563 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
12 0.49941739 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
13 0.49222106 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text
14 0.49153492 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
15 0.48697436 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
16 0.48673135 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
17 0.4837153 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
18 0.48359716 71 acl-2012-Dependency Hashing for n-best CCG Parsing
19 0.48058856 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
20 0.47275716 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations