emnlp emnlp2012 emnlp2012-30 knowledge-graph by maker-knowledge-mining

30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing


Source: pdf

Author: Hui Yang

Abstract: Taxonomies can serve as browsing tools for document collections. However, given an arbitrary collection, pre-constructed taxonomies could not easily adapt to the specific topic/task present in the collection. This paper explores techniques to quickly derive task-specific taxonomies supporting browsing in arbitrary document collections. The supervised approach directly learns semantic distances from users to propose meaningful task-specific taxonomies. The approach aims to produce globally optimized taxonomy structures by incorporating path consistency control and user-generated task specification into the general learning framework. A comparison to state-of-the-art systems and a user study jointly demonstrate that our techniques are highly effective.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Constructing Task-Specific Taxonomies for Document Collection Browsing Hui Yang Department of Computer Science Georgetown University 37th and O street, NW Washington, DC, 20057 huiyang@ c s Abstract Taxonomies can serve as browsing tools for document collections. [sent-1, score-0.501]

2 However, given an arbitrary collection, pre-constructed taxonomies could not easily adapt to the specific topic/task present in the collection. [sent-2, score-0.366]

3 This paper explores techniques to quickly derive task-specific taxonomies supporting browsing in arbitrary document collections. [sent-3, score-0.915]

4 The supervised approach directly learns semantic distances from users to propose meaningful task-specific taxonomies. [sent-4, score-0.255]

5 The approach aims to produce globally optimized taxonomy structures by incorporating path consistency control and user-generated task specification into the general learning framework. [sent-5, score-0.81]

6 A comparison to state-of-the-art systems and a user study jointly demonstrate that our techniques are highly effective. [sent-6, score-0.033]

7 In fact, taxonomies serve as browsing tools in many venues, including the Library of Congress Subject Headings (LCSH, 2011) for the U. [sent-15, score-0.789]

8 When used for browsing, concepts1 in taxonomies are linked to documents containing them and taxonomic structures are navigated to find particular documents. [sent-20, score-0.467]

9 Users can navigate through a browsing taxonomy to explore the documents in the collection. [sent-21, score-0.918]

10 A browsing taxonomy benefits information access by providing corpus overview for a document collection and allowing more focused reading by presenting together documents about the same concept. [sent-22, score-1.02]

11 Most existing browsing taxonomies, such as LCSH and ODP, are manually constructed to support large collections in general domains. [sent-23, score-0.499]

12 In situations where document collections are given ad-hoc, such as search result organization (Carpineto et al. [sent-25, score-0.088]

13 , 2009), email collection exploration (Yang and Callan, 2008), and literature investigation (Chau et al. [sent-26, score-0.046]

14 , 2011), existing taxonomies may not even be able to provide the right coverage of concepts. [sent-27, score-0.366]

15 It is necessary to explore ad-hoc (semi-)automatic techniques to quickly derive task-specific browsing taxonomies for arbitrary document collections. [sent-28, score-0.887]

16 (Hovy, 2002) pointed out that one key challenge in taxonomy construction is multiple perspectives embedded in concepts and relations. [sent-29, score-0.809]

17 One cause for multiple perspectives is the inherent facets in concepts, e. [sent-30, score-0.147]

18 For example, when building a taxonomy for search results of the query trip to (footnote 1: English terms or entities; usually nouns or noun phrases). [sent-34, score-0.473]

19 DC, Jane may organize the concepts based on places of interest while Tom may organize them based on dates of the visit. [sent-37, score-0.45]

20 Typically, a taxonomy only conveys one or two perspectives from many choices. [sent-38, score-0.546]

21 One realistic solution is to leave the decision to the constructor independent of the confusion that comes from facets, task specification or personalization. [sent-40, score-0.094]

22 When multiple perspectives are present in the same taxonomy, it is not uncommon that the perspectives are mixed. [sent-41, score-0.19]

23 For example, along a path financial institute→bank→river bank, financial institute→bank shows one perspective and bank→river bank shows another. [sent-42, score-0.256]

24 Many approaches to automatic taxonomy construction suffer from this problem because their focus is on accurately identifying local relations between concept pairs (Etzioni et al. [sent-44, score-0.611]

25 , 2005; Pantel and Pennacchiotti, 2006) instead of on global control over the entire taxonomic structure. [sent-45, score-0.101]

26 More recently, approaches attempted to build the full taxonomy structure (Snow et al. [sent-46, score-0.503]

27 , 2006; Yang and Callan, 2009; Kozareva and Hovy, 2010); however, few have looked into how to incorporate task specifications into taxonomy construction. [sent-47, score-0.503]

28 In this paper, we extend an existing taxonomy construction approach (Yang and Callan, 2009) to build task-specific taxonomies for document collection browsing. [sent-48, score-0.986]

29 The extension comes in two parts: handling path consistency and incorporating specifications from users. [sent-49, score-0.317]

30 We uniquely employ pairwise semantic distance as an entry point to incrementally build browsing taxonomies. [sent-50, score-0.691]

31 A supervised distance learning algorithm not only allows us to incorporate multiple semantic features to evaluate the proximity between concepts, but also allows us to learn the metric function from personal preferences. [sent-51, score-0.216]

32 Users can thus manually modify the taxonomies and, to some extent, teach the algorithm to predict their way of organizing the concepts. [sent-52, score-0.456]

33 Moreover, by minimizing the overall semantic distances among concepts and restricting minimal semantic distances along a path, we find the best hierarchical structure as the browsing taxonomy. [sent-53, score-1.172]

34 2 Related Work Document collection browsing has been studied as an alternative to the ranked list representation for search results by the Information Retrieval (IR) community. [sent-55, score-0.491]

35 , 1992) and monothetic concept hierarchies (Sanderson and Croft, 1999; Lawrie et al. [sent-57, score-0.21]

36 Clustering approaches hierarchically cluster documents in a collection and label the clusters. [sent-61, score-0.068]

37 Monothetic approaches organize the concepts into hierarchies and link documents to related concepts. [sent-62, score-0.395]

38 Both approaches are mainly based on pure statistics, such as document frequency (Sanderson and Croft, 1999) and conditional probability (Lawrie et al. [sent-63, score-0.082]

39 The major drawback of these pure statistical approaches is their neglect of semantics among concepts. [sent-65, score-0.048]

40 The NLP community has extensively studied automatic taxonomy construction. [sent-67, score-0.451]

41 Although traditional research on taxonomy construction focuses on extracting local relations between concept pairs (Hearst, 1992; Berland and Charniak, 1999; Ravichandran and Hovy, 2002; Girju et al. [sent-68, score-0.611]

42 , 2006) proposed to estimate taxonomic structure via maximizing the overall likelihood of a taxonomy. [sent-73, score-0.101]

43 (Kozareva and Hovy, 2010) proposed to connect local concept pairs by finding the longest path in a subsumption graph. [sent-74, score-0.28]

44 Researchers have also attempted to carve out taxonomies from existing ones. [sent-76, score-0.388]

45 (Stoica and Hearst, 2007) managed to extract a browsing taxonomy from hypernym relations within WordNet (Fellbaum, 1998). [sent-78, score-0.931]

46 To support browsing in arbitrary collections, in this paper, we propose to incorporate task specification in a taxonomy. [sent-79, score-0.561]

47 One way to achieve it is to define task-specific distances among concepts. [sent-80, score-0.15]

48 Moreover, through controlling distance scores among concepts, we can enforce path consistency in taxonomies. [sent-81, score-0.435]

49 For example, when the distance between financial institute and river bank is large, the path financial institute→bank→river bank will be pruned and the concepts will be repositioned. [sent-82, score-0.83]

50 Inspired by ME (Yang and Callan, 2009), we take a distance learning approach to deal with path consistency (Section 3) and task specification (Section 4) in taxonomy construction. [sent-83, score-0.956]

51 3 Build Structure-Optimized Taxonomies This section presents how to automatically build taxonomies. [sent-84, score-0.03]

52 We take two steps to build a browsing taxonomy for a given document collection. [sent-85, score-0.982]

53 The first step is to extract the concepts and the second is to organize the concepts. [sent-86, score-0.338]

54 For concept extraction, we take a simple but effective approach: (1) We first parse the document collection and exhaustively extract nouns, noun phrases, and named entities that occur >5 times in the collection. [sent-87, score-0.225]

55 (2) We then test each candidate on the Web: we search each candidate concept in the Google search engine and remove a candidate if it appears <4 times within the top 10 Google snippets. [sent-89, score-0.123]

56 (3) We finally cluster similar concept candidates into groups by Latent Semantic Analysis (Bellegarda et al. [sent-90, score-0.123]

57 , 1996) and select the candidate with the highest tfidf value within a group to form the concept set C. [sent-91, score-0.123]
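
The three extraction steps above can be pieced together roughly as in the sketch below. This is not the authors' implementation: the Google-snippet test is abstracted as precomputed hit counts, LSA is approximated with scikit-learn's TruncatedSVD plus KMeans, and the function and parameter names are illustrative assumptions.

```python
# A minimal sketch of the three-step concept extraction described above.
# Assumptions not in the paper: snippet hit counts are precomputed elsewhere,
# LSA is approximated with TruncatedSVD + KMeans, and all names/thresholds
# below are illustrative.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def extract_concepts(docs, candidate_phrases, snippet_hits,
                     min_coll_freq=5, min_snippet_hits=4, n_groups=20):
    """Return a concept set C for a document collection.

    docs              : raw document strings
    candidate_phrases : nouns / noun phrases / named entities mined in step (1)
    snippet_hits      : phrase -> #occurrences in its top-10 web snippets
                        (a stand-in for the Google-snippet test of step (2))
    """
    # Step (1): keep candidates occurring more than min_coll_freq times.
    text = " ".join(docs).lower()
    freq = Counter({p: text.count(p.lower()) for p in candidate_phrases})
    cands = [p for p in candidate_phrases if freq[p] > min_coll_freq]

    # Step (2): drop candidates with too few web-snippet hits.
    cands = [p for p in cands if snippet_hits.get(p, 0) >= min_snippet_hits]
    if len(cands) < 2:
        return cands

    # Step (3): cluster candidates in a latent semantic space and keep the
    # candidate with the highest tf-idf mass within each group.
    vec = TfidfVectorizer(vocabulary=[p.lower() for p in cands],
                          ngram_range=(1, 3))
    tfidf = vec.fit_transform(docs)                      # docs x candidates
    n_comp = max(1, min(10, len(cands) - 1, len(docs) - 1))
    latent = TruncatedSVD(n_components=n_comp).fit_transform(tfidf.T)
    labels = KMeans(n_clusters=min(n_groups, len(cands)),
                    n_init=10).fit_predict(latent)

    weights = np.asarray(tfidf.sum(axis=0)).ravel()      # tf-idf mass per term
    concepts = []
    for group in set(labels):
        members = [i for i, g in enumerate(labels) if g == group]
        concepts.append(cands[max(members, key=lambda i: weights[i])])
    return concepts
```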

58 Although our extraction algorithm is very effective with 95% precision and 80% recall in a manual evaluation, sometimes C may still miss some important concepts for the collection. [sent-92, score-0.226]

59 This can be later corrected by users interactively through adding new concepts (Section 4). [sent-93, score-0.283]

60 To organize the concepts in C into taxonomic structures, we extend the incremental clustering framework proposed by ME (Yang and Callan, 2009). [sent-94, score-0.46]

61 At each insertion, a concept cz is tried at the parent (or child) position of every existing node in the current taxonomy. [sent-96, score-0.337]

62 The evaluation of the best position depends on the semantic distance between cz and its temporary child (or parent) node and the semantic distance among all other concepts in the taxonomy. [sent-97, score-0.9]
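
A highly simplified stand-in for this insertion loop is sketched below: each new concept is tried as a child of every existing node and the attachment with the smallest total edge distance is kept. The edge-distance score is only a rough proxy for ME's objective, and trying cz as a parent is omitted; all names here are illustrative.

```python
# A simplified sketch of the incremental insertion step. Each new concept cz is
# tried as a child of every existing node; the candidate taxonomy is scored by
# the sum of semantic distances over its parent-child edges, a rough proxy for
# ME's minimum-evolution objective (the real framework also tries cz as a
# parent and uses a richer objective).

def edge_score(taxonomy, dist):
    """taxonomy: dict child -> parent (root maps to None); dist: d(cx, cy)."""
    return sum(dist(child, parent)
               for child, parent in taxonomy.items() if parent is not None)

def insert_concept(taxonomy, cz, dist):
    """Attach cz at the position that minimizes the edge-distance score."""
    best_parent, best_score = None, float("inf")
    for node in list(taxonomy):
        candidate = dict(taxonomy)
        candidate[cz] = node                      # try cz as a child of node
        score = edge_score(candidate, dist)
        if score < best_score:
            best_parent, best_score = node, score
    taxonomy[cz] = best_parent
    return taxonomy

# Usage with a toy distance function:
toy_dist = lambda a, b: 0.1 if {a, b} == {"bank", "finance"} else 1.0
tax = {"finance": None}                           # root only
insert_concept(tax, "bank", toy_dist)             # {"finance": None, "bank": "finance"}
```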

63 An advantage of ME is that it allows incorporating various constraints on the taxonomic structure. [sent-98, score-0.101]

64 For example, ME can handle concept generality-specificity by learning different semantic distance functions for general concepts which are located at upper levels and specific concepts which are located at lower levels in a taxonomy. [sent-99, score-0.872]

65 In this section, we introduce a new semantic distance learning method (Section 3. [sent-100, score-0.216]

66 1) and extend ME by controlling path consistency (Section 3. [sent-101, score-0.31]

67 1 Estimating Semantic Distances Pair-wise semantic distances among concepts build the foundation for taxonomy construction. [sent-104, score-0.927]

68 ME models the semantic distance d(cx, cy) between concepts cx and cy as a linear combination of underlying feature functions. [sent-105, score-0.83]

69 Similar to ME, we also assume that “there are some underlying feature functions that measure semantic dissimilarity for concepts and a good semantic distance is a combination of these features”. [sent-106, score-0.584]

70 Different from ME, we model the semantic distance d(cx, cy) between concepts (cx, cy) as a Mahalanobis distance (Mahalanobis, 1936): d(cx, cy) = sqrt( Φ(cx, cy)^T W^{-1} Φ(cx, cy) ), where Φ(cx, cy) is the set of underlying feature functions {φk(cx, cy)} with k = 1, 2, ... [sent-107, score-0.615]
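
For concreteness, the distance above can be computed as follows on a toy feature vector; the feature values and the diagonal W are made up for illustration and are not from the paper.

```python
# Computing the Mahalanobis-form semantic distance
#     d(cx, cy) = sqrt( Phi(cx, cy)^T  W^{-1}  Phi(cx, cy) )
# on a toy feature vector. W below is an arbitrary positive definite matrix,
# not a learned one; the three feature values are invented.
import numpy as np

def semantic_distance(phi, W):
    """phi: feature vector Phi(cx, cy); W: positive definite weighting matrix."""
    z = np.linalg.solve(W, phi)       # solve W z = phi instead of inverting W
    return float(np.sqrt(phi @ z))

phi = np.array([0.2, 0.7, 0.1])       # e.g. pattern, co-occurrence, lexical features
W = np.diag([1.0, 0.5, 2.0])
print(semantic_distance(phi, W))      # ~1.01
```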

71 Mahalanobis distance is a general parametric function widely used in distance metric learning (Yang, 2006). [sent-116, score-0.327]

72 It measures the dissimilarity between two random vectors of the same distribution with a covariance matrix W, which scales the data points from their original values by W^{1/2}. [sent-117, score-0.045]

73 (1) It is in a parametric form, so it allows us to learn a distance function by supervised learning and provides an opportunity to assign a different weight to each type of semantic feature. [sent-120, score-0.271]

74 (2) When W is properly constrained to be positive semi-definite (PSD) (Bhatia, 2006), a distance in Mahalanobis form is guaranteed to satisfy non-negativity and the triangle inequality, which was not addressed in ME. [sent-121, score-0.203]

75 As long as these two conditions are satisfied, one may learn other forms of distance functions to represent a semantic distance. [sent-122, score-0.243]

76 We can estimate W by minimizing the squared errors between the training semantic distances d and the expected value dˆ. We also need to constrain W to be PSD to satisfy the triangle inequality and non-negativity. [sent-123, score-0.349]

77 The objective function for semantic distance estimation is: min_W Σ_(cx,cy) ( d(cx, cy) − dˆ(cx, cy) )^2 subject to W being PSD, where dˆ(cx, cy) = sqrt( Φ(cx, cy)^T W^{-1} Φ(cx, cy) ). [sent-124, score-0.237]
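
One way to fit such a metric is sketched below. For simplicity it learns M = W^{-1} directly by projected gradient descent, clipping negative eigenvalues to keep the matrix PSD; the paper does not commit to this particular optimizer, so treat these details as assumptions rather than the authors' method.

```python
# A sketch of fitting the metric by minimizing the squared error between the
# training distances d and predictions d_hat = sqrt(phi^T M phi), keeping the
# matrix PSD. It parameterizes M = W^{-1} directly and uses projected gradient
# descent; this optimizer is an assumption, not necessarily the paper's.
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues."""
    M = (M + M.T) / 2.0
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.clip(vals, 0.0, None)) @ vecs.T

def fit_metric(Phi, d, lr=0.01, iters=500, eps=1e-8):
    """Phi: (n_pairs, k) feature vectors; d: (n_pairs,) training distances."""
    k = Phi.shape[1]
    M = np.eye(k)                                 # start from the Euclidean metric
    for _ in range(iters):
        grad = np.zeros((k, k))
        for phi, target in zip(Phi, d):
            pred = np.sqrt(max(phi @ M @ phi, eps))
            # d/dM of (pred - target)^2  =  ((pred - target) / pred) * phi phi^T
            grad += ((pred - target) / pred) * np.outer(phi, phi)
        M = project_psd(M - lr * grad / len(d))
    return M
```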

78 To generate the training semantic distances, we collected 100 hypernym taxonomy fragments from WordNet (Fellbaum, 1998) and ODP. [sent-129, score-0.556]

79 The semantic distance for a concept pair (cx, cy) in a training taxonomy fragment is generated by assuming every edge is weighted as 1 and summing up the edge weights along the shortest path from cx to cy in the taxonomy fragment. [sent-130, score-1.81]
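
This edge-counting scheme can be reproduced with a small breadth-first search, as in the sketch below; the taxonomy fragment shown is invented for illustration rather than taken from WordNet or ODP.

```python
# A sketch of deriving training distances from one taxonomy fragment: every
# edge weighs 1, so d(cx, cy) is the shortest-path length between the two
# concepts. The fragment below is invented, not taken from WordNet or ODP.
from collections import deque
from itertools import combinations

def pairwise_training_distances(edges):
    """edges: (parent, child) pairs of a single taxonomy fragment."""
    adj = {}
    for p, c in edges:
        adj.setdefault(p, set()).add(c)
        adj.setdefault(c, set()).add(p)

    def shortest(src, dst):
        seen, queue = {src}, deque([(src, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == dst:
                return dist
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None                                # disconnected fragment

    return {(a, b): shortest(a, b) for a, b in combinations(adj, 2)}

fragment = [("institution", "bank"), ("institution", "school"),
            ("bank", "savings bank")]
print(pairwise_training_distances(fragment))
# e.g. ('bank', 'school') -> 2 and ('institution', 'savings bank') -> 2
```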

80 In Section 4, we will show how to use user inputs as training data to capture task specifications in taxonomy construction. [sent-131, score-0.504]

81 2 Enforcing Path Consistency In ME, the main taxonomy structure optimization framework is based on minimization of overall semantic distance among all concepts in the taxonomy and the minimum evolution assumption. [sent-133, score-1.447]

82 We extend the framework by introducing another optimization objective: the path consistency objective. [sent-134, score-0.328]

83 The idea is that in any root-to-leaf path in a taxonomy, all concepts on the path should be about the same topic or the same perspective. [sent-135, score-0.54]

84 Within a root-to-leaf path, the concepts need to be coherent no matter how far apart they are. [sent-136, score-0.226]

85 This suggests that the sum of the semantic distances along a good path should be small. [sent-137, score-0.22]

86 W ← min_W Σ_x Σ_{ctry ∈ N(ctrx)} ( d(ctrx, ctry) − sqrt( Φ(ctrx, ctry)^T W^{-1} Φ(ctrx, ctry) ) )^2 ; foreach cz ∈ C \ S : S ← S ∪ {cz} ; ... [sent-139, score-0.168]

87 ) Figure 1: An algorithm for taxonomy structure optimization with path consistency control. [sent-144, score-0.737]

88 C denotes the entire concept set, S the current concept set, and R the current relation set. [sent-145, score-0.246]

89 N(ctrx) is the neighborhood of a training concept ctrx, including its parent and child(ren). [sent-146, score-0.199]

90 ) indicates the set of relations between a new concept cz and all other existing concepts. [sent-148, score-0.313]

91 T is the taxonomy with concept set S and relation set R. [sent-149, score-0.574]

92 Therefore, we propose to minimize the sum of semantic distances along a root-to-leaf path. [sent-150, score-0.174]

93 Particularly, when adding a new concept cz into an existing browsing hierarchy T, we try it at different positions in T. [sent-151, score-0.781]

94 At each temporary position, we can calculate the sum of the semantic distances along the root-to-leaf path Pcz that contains the new concept cz. [sent-152, score-0.576]

95 The path consistency objective is given by: obj_path = min_{Pcz} Σ_{cx, cy ∈ Pcz, x < y} d(cx, cy), where x < y defines the order of the concepts to avoid counting the same pair of pair-wise distances twice. [sent-153, score-0.662]
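
Under the reconstruction above, the objective for one candidate position of cz can be evaluated roughly as follows; the parent-pointer representation and helper names are assumptions made for illustration.

```python
# A sketch of evaluating the path-consistency objective for a new concept cz at
# one candidate position: enumerate the root-to-leaf paths through cz and take
# the smallest sum of pairwise distances (each unordered pair counted once).
from itertools import combinations

def root_to_leaf_paths_through(taxonomy, cz):
    """taxonomy: dict node -> parent (root maps to None). Yields paths through cz."""
    children = {}
    for node, parent in taxonomy.items():
        children.setdefault(parent, []).append(node)

    up = []                                   # cz up to the root, then reversed
    node = cz
    while node is not None:
        up.append(node)
        node = taxonomy[node]
    up.reverse()

    def down(node, suffix):                   # extend downwards to every leaf
        kids = children.get(node, [])
        if not kids:
            yield suffix
        for kid in kids:
            yield from down(kid, suffix + [kid])

    for tail in down(cz, []):
        yield up + tail

def obj_path(taxonomy, cz, dist):
    return min(sum(dist(a, b) for a, b in combinations(path, 2))
               for path in root_to_leaf_paths_through(taxonomy, cz))
```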


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('taxonomy', 0.451), ('browsing', 0.445), ('taxonomies', 0.344), ('concepts', 0.226), ('cy', 0.222), ('cz', 0.168), ('cx', 0.166), ('path', 0.157), ('distances', 0.15), ('distance', 0.146), ('concept', 0.123), ('organize', 0.112), ('consistency', 0.108), ('objme', 0.104), ('objpath', 0.104), ('callan', 0.101), ('taxonomic', 0.101), ('perspectives', 0.095), ('specification', 0.094), ('semantic', 0.07), ('mahalanobis', 0.067), ('bank', 0.067), ('yang', 0.063), ('river', 0.061), ('document', 0.056), ('financial', 0.053), ('carpineto', 0.052), ('ctrx', 0.052), ('facets', 0.052), ('lawrie', 0.052), ('lcsh', 0.052), ('monothetic', 0.052), ('psd', 0.052), ('sanderson', 0.052), ('specifications', 0.052), ('stoica', 0.052), ('temporary', 0.052), ('minimization', 0.05), ('kozareva', 0.05), ('tw', 0.047), ('collection', 0.046), ('dissimilarity', 0.045), ('congress', 0.045), ('odp', 0.045), ('hovy', 0.043), ('minimizing', 0.037), ('hearst', 0.037), ('construction', 0.037), ('users', 0.035), ('hypernym', 0.035), ('hierarchies', 0.035), ('triangle', 0.035), ('inequality', 0.035), ('parametric', 0.035), ('croft', 0.033), ('pennacchiotti', 0.033), ('user', 0.033), ('collections', 0.032), ('evolution', 0.032), ('build', 0.03), ('etzioni', 0.029), ('supporting', 0.028), ('functions', 0.027), ('fellbaum', 0.027), ('dc', 0.027), ('snow', 0.027), ('located', 0.027), ('pure', 0.026), ('along', 0.024), ('controlling', 0.024), ('parent', 0.024), ('hierarchy', 0.023), ('library', 0.023), ('webbased', 0.022), ('cutting', 0.022), ('directory', 0.022), ('georgetown', 0.022), ('interactively', 0.022), ('neglect', 0.022), ('ofr', 0.022), ('researcher', 0.022), ('sdp', 0.022), ('ssh', 0.022), ('trip', 0.022), ('weigh', 0.022), ('wle', 0.022), ('child', 0.022), ('pantel', 0.022), ('documents', 0.022), ('arbitrary', 0.022), ('existing', 0.022), ('satisfy', 0.022), ('attempted', 0.022), ('objective', 0.021), ('optimization', 0.021), ('extend', 0.021), ('quickly', 0.02), ('inputs', 0.02), ('venues', 0.02), ('opportunity', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

Author: Hui Yang

Abstract: Taxonomies can serve as browsing tools for document collections. However, given an arbitrary collection, pre-constructed taxonomies could not easily adapt to the specific topic/task present in the collection. This paper explores techniques to quickly derive task-specific taxonomies supporting browsing in arbitrary document collections. The supervised approach directly learns semantic distances from users to propose meaningful task-specific taxonomies. The approach aims to produce globally optimized taxonomy structures by incorporating path consistency control and user-generated task specification into the general learning framework. A comparison to state-of-the-art systems and a user study jointly demonstrate that our techniques are highly effective.

2 0.10572877 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

Author: Liwei Chen ; Yansong Feng ; Lei Zou ; Dongyan Zhao

Abstract: In this paper, we investigate different usages of feature representations in the web person name disambiguation task which has been suffering from the mismatch of vocabulary and lack of clues in web environments. In literature, the latter receives less attention and remains more challenging. We explore the feature space in this task and argue that collecting person specific evidences from a corpus level can provide a more reasonable and robust estimation for evaluating a feature’s importance in a given web page. This can alleviate the lack of clues where discriminative features can be reasonably weighted by taking their corpus level importance into account, not just relying on the current local context. We therefore propose a topic-based model to exploit the person specific global importance and embed it into the person name similarity. The experimental results show that the corpus level topic in- formation provides more stable evidences for discriminative features and our method outperforms the state-of-the-art systems on three WePS datasets.

3 0.090247221 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

Author: Ni Lao ; Amarnag Subramanya ; Fernando Pereira ; William W. Cohen

Abstract: We study how to extend a large knowledge base (Freebase) by reading relational information from a large Web text corpus. Previous studies on extracting relational knowledge from text show the potential of syntactic patterns for extraction, but they do not exploit background knowledge of other relations in the knowledge base. We describe a distributed, Web-scale implementation of a path-constrained random walk model that learns syntactic-semantic inference rules for binary relations from a graph representation of the parsed text and the knowledge base. Experiments show significant accuracy improvements in binary relation prediction over methods that consider only text, or only the existing knowledge base.

4 0.081567407 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types

Author: Ndapandula Nakashole ; Gerhard Weikum ; Fabian Suchanek

Abstract: This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY taxonomy comprises 350,569 pattern synsets. Random-sampling-based evaluation shows a pattern accuracy of 84.7%. PATTY has 8,162 subsumptions, with a random-sampling-based precision of 75%. The PATTY resource is freely available for interactive access and download.

5 0.062418763 85 emnlp-2012-Local and Global Context for Supervised and Unsupervised Metonymy Resolution

Author: Vivi Nastase ; Alex Judea ; Katja Markert ; Michael Strube

Abstract: Computational approaches to metonymy resolution have focused almost exclusively on the local context, especially the constraints placed on a potentially metonymic word by its grammatical collocates. We expand such approaches by taking into account the larger context. Our algorithm is tested on the data from the metonymy resolution task (Task 8) at SemEval 2007. The results show that incorporation of the global context can improve over the use of the local context alone, depending on the types of metonymies addressed. As a second contribution, we move towards unsupervised resolution of metonymies, made feasible by considering ontological relations as possible readings. We show that such an unsupervised approach delivers promising results: it beats the supervised most frequent sense baseline and performs close to a supervised approach using only standard lexico-syntactic features.

6 0.053842328 97 emnlp-2012-Natural Language Questions for the Web of Data

7 0.036874857 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

8 0.036103103 121 emnlp-2012-Supervised Text-based Geolocation Using Language Models on an Adaptive Grid

9 0.034683228 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints

10 0.033465736 115 emnlp-2012-SSHLDA: A Semi-Supervised Hierarchical Topic Model

11 0.03308703 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

12 0.03216406 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories

13 0.030954866 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

14 0.030673487 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence

15 0.02969957 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

16 0.029612219 88 emnlp-2012-Minimal Dependency Length in Realization Ranking

17 0.029503731 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

18 0.02911696 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming

19 0.028348645 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

20 0.027731979 41 emnlp-2012-Entity based QA Retrieval


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.114), (1, 0.067), (2, 0.012), (3, 0.022), (4, -0.033), (5, -0.005), (6, 0.073), (7, 0.094), (8, -0.041), (9, -0.001), (10, 0.023), (11, -0.044), (12, -0.05), (13, 0.011), (14, 0.069), (15, -0.061), (16, -0.008), (17, -0.016), (18, 0.09), (19, 0.112), (20, 0.012), (21, 0.087), (22, 0.024), (23, -0.102), (24, 0.129), (25, 0.082), (26, -0.137), (27, 0.088), (28, 0.012), (29, 0.174), (30, -0.08), (31, -0.306), (32, -0.075), (33, -0.323), (34, 0.071), (35, -0.206), (36, 0.035), (37, -0.12), (38, 0.128), (39, -0.01), (40, 0.01), (41, 0.057), (42, -0.018), (43, 0.122), (44, -0.056), (45, -0.052), (46, -0.013), (47, -0.035), (48, 0.071), (49, -0.074)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97599906 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

Author: Hui Yang

Abstract: Taxonomies can serve as browsing tools for document collections. However, given an arbitrary collection, pre-constructed taxonomies could not easily adapt to the specific topic/task present in the collection. This paper explores techniques to quickly derive task-specific taxonomies supporting browsing in arbitrary document collections. The supervised approach directly learns semantic distances from users to propose meaningful task-specific taxonomies. The approach aims to produce globally optimized taxonomy structures by incorporating path consistency control and user-generated task specification into the general learning framework. A comparison to state-of-the-art systems and a user study jointly demonstrate that our techniques are highly effective.

2 0.58818656 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

Author: Ni Lao ; Amarnag Subramanya ; Fernando Pereira ; William W. Cohen

Abstract: We study how to extend a large knowledge base (Freebase) by reading relational information from a large Web text corpus. Previous studies on extracting relational knowledge from text show the potential of syntactic patterns for extraction, but they do not exploit background knowledge of other relations in the knowledge base. We describe a distributed, Web-scale implementation of a path-constrained random walk model that learns syntactic-semantic inference rules for binary relations from a graph representation of the parsed text and the knowledge base. Experiments show significant accuracy improvements in binary relation prediction over methods that consider only text, or only the existing knowledge base.

3 0.51724696 85 emnlp-2012-Local and Global Context for Supervised and Unsupervised Metonymy Resolution

Author: Vivi Nastase ; Alex Judea ; Katja Markert ; Michael Strube

Abstract: Computational approaches to metonymy resolution have focused almost exclusively on the local context, especially the constraints placed on a potentially metonymic word by its grammatical collocates. We expand such approaches by taking into account the larger context. Our algorithm is tested on the data from the metonymy resolution task (Task 8) at SemEval 2007. The results show that incorporation of the global context can improve over the use of the local context alone, depending on the types of metonymies addressed. As a second contribution, we move towards unsupervised resolution of metonymies, made feasible by considering ontological relations as possible readings. We show that such an unsupervised approach delivers promising results: it beats the supervised most frequent sense baseline and performs close to a supervised approach using only standard lexico-syntactic features.

4 0.47609055 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

Author: Liwei Chen ; Yansong Feng ; Lei Zou ; Dongyan Zhao

Abstract: In this paper, we investigate different usages of feature representations in the web person name disambiguation task which has been suffering from the mismatch of vocabulary and lack of clues in web environments. In literature, the latter receives less attention and remains more challenging. We explore the feature space in this task and argue that collecting person specific evidences from a corpus level can provide a more reasonable and robust estimation for evaluating a feature’s importance in a given web page. This can alleviate the lack of clues where discriminative features can be reasonably weighted by taking their corpus level importance into account, not just relying on the current local context. We therefore propose a topic-based model to exploit the person specific global importance and embed it into the person name similarity. The experimental results show that the corpus level topic in- formation provides more stable evidences for discriminative features and our method outperforms the state-of-the-art systems on three WePS datasets.

5 0.38145494 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types

Author: Ndapandula Nakashole ; Gerhard Weikum ; Fabian Suchanek

Abstract: This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY taxonomy comprises 350,569 pattern synsets. Random-sampling-based evaluation shows a pattern accuracy of 84.7%. PATTY has 8,162 subsumptions, with a random-sampling-based precision of 75%. The PATTY resource is freely available for interactive access and download.

6 0.26777735 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

7 0.23679657 97 emnlp-2012-Natural Language Questions for the Web of Data

8 0.18501709 115 emnlp-2012-SSHLDA: A Semi-Supervised Hierarchical Topic Model

9 0.17059132 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections

10 0.15755619 60 emnlp-2012-Generative Goal-Driven User Simulation for Dialog Management

11 0.15457374 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation

12 0.15318082 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion

13 0.15016465 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints

14 0.14790273 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories

15 0.14727394 50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level

16 0.14445895 121 emnlp-2012-Supervised Text-based Geolocation Using Language Models on an Adaptive Grid

17 0.13979565 38 emnlp-2012-Employing Compositional Semantics and Discourse Consistency in Chinese Event Extraction

18 0.13659282 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

19 0.13470136 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

20 0.13225308 88 emnlp-2012-Minimal Dependency Length in Realization Ranking


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.135), (9, 0.207), (12, 0.027), (16, 0.029), (25, 0.013), (34, 0.071), (45, 0.019), (60, 0.05), (63, 0.059), (64, 0.042), (65, 0.046), (73, 0.019), (74, 0.022), (76, 0.064), (80, 0.026), (86, 0.039), (95, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79189819 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

Author: Hui Yang

Abstract: Taxonomies can serve as browsing tools for document collections. However, given an arbitrary collection, pre-constructed taxonomies could not easily adapt to the specific topic/task present in the collection. This paper explores techniques to quickly derive task-specific taxonomies supporting browsing in arbitrary document collections. The supervised approach directly learns semantic distances from users to propose meaningful task-specific taxonomies. The approach aims to produce globally optimized taxonomy structures by incorporating path consistency control and user-generated task specification into the general learning framework. A comparison to state-of-the-art systems and a user study jointly demonstrate that our techniques are highly effective.

2 0.69522643 120 emnlp-2012-Streaming Analysis of Discourse Participants

Author: Benjamin Van Durme

Abstract: Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batchprocessing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Once in a streaming framework, and motivated by large data sets generated by social media services, we present novel results in approximate counting, showing its applicability to space efficient streaming classification.

3 0.45626447 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky

Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.

4 0.45110065 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

5 0.4453029 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

Author: Liwei Chen ; Yansong Feng ; Lei Zou ; Dongyan Zhao

Abstract: In this paper, we investigate different usages of feature representations in the web person name disambiguation task which has been suffering from the mismatch of vocabulary and lack of clues in web environments. In literature, the latter receives less attention and remains more challenging. We explore the feature space in this task and argue that collecting person specific evidences from a corpus level can provide a more reasonable and robust estimation for evaluating a feature’s importance in a given web page. This can alleviate the lack of clues where discriminative features can be reasonably weighted by taking their corpus level importance into account, not just relying on the current local context. We therefore propose a topic-based model to exploit the person specific global importance and embed it into the person name similarity. The experimental results show that the corpus level topic in- formation provides more stable evidences for discriminative features and our method outperforms the state-of-the-art systems on three WePS datasets.

6 0.44017375 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

7 0.43878862 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

8 0.43787211 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

9 0.43585926 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

10 0.43316442 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP

11 0.43009263 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

12 0.42895487 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction

13 0.42730784 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

14 0.42707798 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

15 0.42559916 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

16 0.42496255 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

17 0.4218218 95 emnlp-2012-N-gram-based Tense Models for Statistical Machine Translation

18 0.42180622 97 emnlp-2012-Natural Language Questions for the Web of Data

19 0.42094323 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields

20 0.42053914 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis