nips nips2005 nips2005-185 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Raymond J. Mooney, Razvan C. Bunescu
Abstract: We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. [sent-7, score-0.957]
2 This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. [sent-8, score-0.617]
3 Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach. [sent-9, score-0.782]
4 Information extraction (IE) involves the analysis of text documents, with the aim of identifying particular types of entities and relations among them. [sent-11, score-0.428]
5 Reliably extracting relations between entities in natural-language documents is still a difficult, unsolved problem. [sent-12, score-0.554]
6 For example, in the sentence “protesters seized several pumping stations”, the task is to identify a LOCATED AT relationship between protesters (a PERSON entity) and stations (a LOCATION entity). [sent-15, score-0.339]
7 For example, the sentence “TR6 specifically binds Fas ligand” asserts an interaction relationship between the two proteins TR6 and Fas ligand. [sent-17, score-0.429]
8 As in the case of the more traditional applications of IE, systems based on manually developed extraction rules [3, 4] were soon superseded by information extractors learned through training on supervised corpora [5, 6]. [sent-18, score-0.394]
9 One challenge posed by the biological domain is that current systems for doing part-of-speech (POS) tagging or parsing do not perform as well on the biomedical narrative as on the newspaper corpora on which they were originally trained. [sent-19, score-0.327]
10 Motivated by the task of extracting protein-protein interactions from biomedical corpora, we present a generalization of the subsequence kernel from [7] that works with sequences containing combinations of words and word classes. [sent-21, score-0.929]
11 This generalized kernel is further tailored for the task of relation extraction. [sent-22, score-0.369]
12 Experimental results show that the new relation kernel outperforms two previous rule-based methods for interaction extraction. [sent-23, score-0.457]
13 With a small modification, the same kernel is used for extracting top-level relations from ACE corpora, providing better results than a recent approach based on dependency tree kernels. [sent-24, score-0.585]
14 2 Background One of the first approaches to extracting protein interactions is that of Blaschke et al. [sent-25, score-0.315]
15 Between every two adjacent words is a number indicating the maximum number of intervening words allowed when matching the rule to a sentence. [sent-28, score-0.278]
16 A sentence matches the rule if and only if it satisfies the word constraints in the given order and respects the respective word gaps. [sent-30, score-0.516]
17 One of the given words must match the sentence at that position. [sent-37, score-0.341]
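To make the rule format concrete, here is a minimal, illustrative sketch (ours, not the code of [3] or ELCS) of matching a gap-constrained rule against a tokenized sentence; the rule encoding, with a single word per slot rather than a disjunction, and the function name are assumptions for illustration only.

```python
# Illustrative sketch: a rule alternates word constraints with the maximum
# number of intervening words allowed before the next constraint, e.g.
# ["interaction", 3, "between", 5, "and"].  A sentence matches if the words
# occur in order and every gap limit is respected.
def matches_rule(rule, tokens):
    words, gaps = rule[0::2], rule[1::2]
    pos = -1
    for k, word in enumerate(words):
        # search window allowed by the preceding gap (unbounded for the first word)
        limit = len(tokens) if k == 0 else pos + gaps[k - 1] + 2
        for j in range(pos + 1, min(limit, len(tokens))):
            if tokens[j] == word:
                pos = j
                break
        else:
            return False  # constraint not found within the allowed gap
    return True

print(matches_rule(["interaction", 3, "between", 5, "and"],
                   "the interaction observed between TR6 and Fas_ligand".split()))  # True
```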
18 3 Extraction using a Relation Kernel Both Blaschke and ELCS do interaction extraction based on a limited set of matching rules, where a rule is simply a sparse (gappy) subsequence of words or POS tags anchored on the two protein-name tokens. [sent-39, score-1.02]
19 Therefore, the two methods share a common limitation: either through manual selection (Blaschke), or as a result of the greedy learning procedure (ELCS), they end up using only a subset of all possible anchored sparse subsequences. [sent-40, score-0.221]
20 Ideally, we would want to use all such anchored sparse subsequences as features, with weights reflecting their relative accuracy. [sent-41, score-0.31]
21 However, explicitly creating for each sentence a vector with a position for each such feature is infeasible, due to the high dimensionality of the feature space. [sent-42, score-0.218]
22 Computing the dot-product between two such vectors amounts to calculating the number of common anchored subsequences between the two sentences. [sent-44, score-0.306]
23 This can be done very efficiently by modifying the dynamic programming algorithm used in the string kernel from [7] to account only for common sparse subsequences constrained to contain the two protein-name tokens. [sent-45, score-0.385]
24 • [B] Between: only words between the two entities are essential for asserting the relationship. [sent-48, score-0.367]
25 • [BA] Between–After: words between and after the two entity mentions are simultaneously used to express the relationship. [sent-50, score-0.418]
26 Another observation is that all these patterns use at most 4 words to express the relationship (not counting the two entity names). [sent-52, score-0.558]
27 Consequently, when computing the relation kernel, we restrict the counting of common anchored subsequences only to those having one of the three types described above, with a maximum word-length of 4. [sent-53, score-0.609]
28 This type of feature selection leads not only to a faster kernel computation, but also to less overfitting, which results in increased accuracy (see Section 5 for comparative experiments). [sent-54, score-0.2]
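As a concrete, if naive, reference for what is being counted, the sketch below (ours, not the paper's algorithm) materializes a simplified version of this feature space: lexicalized between-patterns only, at most 4 words, and no gap penalty. The kernel value is then just the dot product of pattern counts, which is what the dynamic program described later computes implicitly and far more efficiently.

```python
# Brute-force reference (illustrative only): enumerate all sparse word
# subsequences of the between-segment up to length 4; the naive kernel value
# is the dot product of the resulting pattern-count vectors.
from itertools import combinations
from collections import Counter

def between_patterns(between_words, max_len=4):
    patterns = Counter()
    for n in range(1, max_len + 1):
        for idx in combinations(range(len(between_words)), n):
            patterns[tuple(between_words[i] for i in idx)] += 1
    return patterns

def naive_kernel(s_between, t_between, max_len=4):
    ps = between_patterns(s_between, max_len)
    pt = between_patterns(t_between, max_len)
    return sum(count * pt[p] for p, count in ps.items())   # dot product of counts

print(naive_kernel("specifically binds".split(), "binds to".split()))  # 1 shared pattern: ('binds',)
```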
29 The sparsity of purely word-based patterns can be alleviated by categorizing words into classes with varying degrees of generality, and then allowing patterns to use both words and their classes. [sent-56, score-0.362]
30 Examples of word classes are POS tags and generalizations over POS tags such as Noun, Active Verb or Passive Verb. [sent-57, score-0.419]
31 The entity type can also be used, if the word is part of a known named entity, as well as the type of the chunk containing the word, when chunking information is available. [sent-58, score-0.527]
32 Patterns will then consist of sparse subsequences of words, POS tags, general POS (GPOS) tags, entity and chunk types, or WordNet synsets. [sent-60, score-0.476]
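A natural way to realize this is to represent each token position as a set of features drawn from these spaces, and to let the position similarity c(x, y) used by the kernel in the next section count the features two positions share. The sketch below is ours; the concrete feature names are illustrative.

```python
# Illustrative sketch: a token position as a feature set (word plus word classes),
# and c(x, y) as the number of features shared by two positions.
def token_features(word, pos, gpos=None, entity=None, chunk=None):
    feats = {("word", word.lower()), ("pos", pos)}
    if gpos:
        feats.add(("gpos", gpos))
    if entity:
        feats.add(("entity", entity))
    if chunk:
        feats.add(("chunk", chunk))
    return feats

def c(x, y):
    """Number of common word/word-class features between two positions."""
    return len(x & y)

x = token_features("binds", "VBZ", gpos="Verb", chunk="VP")
y = token_features("interacts", "VBZ", gpos="Verb", chunk="VP")
print(c(x, y))  # 3: they share the POS tag, the generalized POS and the chunk type
```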
33 4 Subsequence Kernels for Relation Extraction We are going to show how to compute the relation kernel described in the previous section in two steps. [sent-62, score-0.369]
34 In Section 4.1 we present a generalization of the subsequence kernel from [7]. [sent-64, score-0.403]
35 This new kernel works with patterns construed as mixtures of words and word classes. [sent-65, score-0.531]
36 Based on this generalized subsequence kernel, in Section 4.2 we formally define and show the efficient computation of the relation kernel used in our experiments. [sent-66, score-0.244] [sent-67, score-0.369]
38 The sequence s[i : j] is the contiguous subsequence si . . . sj of s. [sent-84, score-0.244]
39 We say that the sequence u ∈ Σ∗∪ is a (sparse) subsequence of s if there is a sequence of |u| indices i, with i1 < i2 < · · · < i|u|, such that uk ∈ sik, for all k = 1, . . . , |u|. [sent-98, score-0.302]
40 Finally, let Kn(s, t, λ) (Equation 1) be the number of weighted sparse subsequences u of length n common to s and t (i.e., u is a sparse subsequence of both s and t). [sent-103, score-0.266]
41 Through them, we try to capture head-modifier dependencies that are important for relation extraction; for lack of reliable dependency information, the larger the word gap is between two words, the less confident we are in the existence of a head-modifier relationship between them. [sent-108, score-0.481]
42 K′0(s, t) = 1, for all s, t
K′′i(sx, ty) = λ K′′i(sx, t) + λ² K′i−1(s, t) · c(x, y)
K′i(sx, t) = λ K′i(s, t) + K′′i(sx, t)
Kn(sx, t) = Kn(s, t) + Σj λ² K′n−1(s, t[1 : j − 1]) · c(x, t[j])
Figure 1: Computation of the generalized subsequence kernel. [sent-113, score-0.244]
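These recurrences translate directly into a dynamic program over prefixes of s and t. The sketch below is our straightforward transcription, assuming the position similarity c(x, y) from the previous section; it is not the authors' optimized implementation.

```python
# Direct transcription of the Figure 1 recurrences (illustrative sketch).
# s and t are sequences of feature sets, c(x, y) counts common features,
# and lam (λ) penalizes long, gappy subsequences.
import numpy as np

def subsequence_kernel(s, t, n, lam, c):
    """K_n(s, t, λ): weighted number of sparse subsequences of length n common to s and t."""
    S, T = len(s), len(t)
    Kp = np.zeros((n, S + 1, T + 1))   # Kp[i, p, q] = K'_i(s[:p], t[:q])
    Kp[0, :, :] = 1.0                  # K'_0 = 1 for all prefixes
    for i in range(1, n):
        Kpp = np.zeros((S + 1, T + 1))             # K''_i(s[:p], t[:q])
        for p in range(i, S + 1):
            for q in range(i, T + 1):
                Kpp[p, q] = (lam * Kpp[p, q - 1]
                             + lam ** 2 * Kp[i - 1, p - 1, q - 1] * c(s[p - 1], t[q - 1]))
                Kp[i, p, q] = lam * Kp[i, p - 1, q] + Kpp[p, q]
    # K_n(s, t): sum over the matching last positions of the two subsequences
    return sum(lam ** 2 * Kp[n - 1, p - 1, q - 1] * c(s[p - 1], t[q - 1])
               for p in range(n, S + 1) for q in range(n, T + 1))

s = [{"protein"}, {"binds", "VBZ"}, {"ligand"}]
t = [{"protein"}, {"interacts", "VBZ"}, {"receptor"}]
print(subsequence_kernel(s, t, 2, 0.75, lambda x, y: len(x & y)))  # 0.75**4
```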
43 4.2 Computing the Relation Kernel As described in Section 2, the input consists of a set of sentences, where each sentence contains exactly two entities (protein names in the case of interaction extraction). [sent-115, score-0.59]
44 In Figure 2 we show the segments that will be used for computing the relation kernel between two example sentences s and t. [sent-116, score-0.463]
45 In sentence s, for instance, x1 and x2 are the two entities, sf is the sentence segment before x1, sb is the segment between x1 and x2, and sa is the sentence segment after x2. [sent-117, score-0.982]
46 For convenience, we also include the auxiliary segment s′b = x1 sb x2, whose span is computed as l(s′b) = l(sb) + 2 (in all length computations, we consider x1 and x2 as contributing one unit only). [sent-118, score-0.335]
47 Figure 2: Sentence segments. s = sf x1 sb x2 sa and t = tf y1 tb y2 ta, with auxiliary segments s′b = x1 sb x2 and t′b = y1 tb y2. [sent-119, score-0.322]
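A small illustrative helper (ours) that produces these segments from a tokenized sentence and the indices of the two entity mentions:

```python
# Illustrative sketch: split a tokenized sentence into the segments of Figure 2,
# given the positions i1 < i2 of the two entity mentions.
def split_segments(tokens, i1, i2):
    sf = tokens[:i1]                               # before the first entity
    sb = tokens[i1 + 1:i2]                         # between the two entities
    sa = tokens[i2 + 1:]                           # after the second entity
    sb_prime = [tokens[i1]] + sb + [tokens[i2]]    # auxiliary segment x1 sb x2
    return sf, sb, sa, sb_prime

tokens = "TR6 specifically binds Fas_ligand".split()
sf, sb, sa, sb_prime = split_segments(tokens, 0, 3)
print(sb)              # ['specifically', 'binds']
print(len(sb_prime))   # l(s'_b) = l(s_b) + 2 = 4  (each entity counts as one unit)
```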
48 The relation kernel computes the number of common patterns between two sentences s and t, where the set of patterns is restricted to the three types introduced in Section 3. [sent-120, score-0.783]
49 All three sub-kernels include in their computation the counting of common subsequences between s′b and t′b. [sent-123, score-0.432]
50 In order to speed up the computation, all these common counts can be calculated separately in bKi, which is defined as the number of common subsequences of length i between s′b and t′b, anchored at x1/x2 and y1/y2 respectively (i.e., constrained to start with x1/y1 and to end with x2/y2). [sent-124, score-0.626]
51 Then fbK simply counts the number of subsequences that match j positions before the first entity and i positions between the entities, constrained to have length less than a constant fbmax. [sent-127, score-0.502]
52 In Section 3 we observed that all three subsequence patterns use at most 4 words to express a relation; therefore, we set the constants fbmax, bmax and bamax to 4. [sent-131, score-0.693]
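Figure 3 itself is not reproduced in this text, so the assembly below is only a simplified sketch of the idea: the two anchors must match, and fore-between, between, and between-after patterns are summed under the word-length limits above. It reuses the split_segments and subsequence_kernel sketches, with field names of our own choosing, and should not be read as the paper's exact recurrences (in particular, the bKi precomputation and the exact span weighting are omitted).

```python
# Hedged, simplified sketch of the relation kernel rK = fbK + bK + baK.
# `K` is a generalized subsequence kernel such as subsequence_kernel above;
# s and t are dicts holding the segments and the two entities' feature sets.
def relation_kernel(s, t, lam, c, K, max_len=4):
    sf, sb, sa, _ = s["segments"]
    tf, tb, ta, _ = t["segments"]
    anchors = c(s["e1"], t["e1"]) * c(s["e2"], t["e2"])   # both entity pairs must match
    if anchors == 0:
        return 0.0
    # between-only patterns: at most max_len words between the entities
    bK = sum(K(sb, tb, i, lam, c) for i in range(1, max_len + 1))
    # fore-between / between-after patterns: split the length budget across the two spans
    fbK = sum(K(sf, tf, j, lam, c) * K(sb, tb, i, lam, c)
              for j in range(1, max_len) for i in range(1, max_len - j + 1))
    baK = sum(K(sb, tb, i, lam, c) * K(sa, ta, j, lam, c)
              for i in range(1, max_len) for j in range(1, max_len - i + 1))
    return anchors * (fbK + bK + baK)
```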
53 5 Experimental Results The relation kernel (ERK) is evaluated on the task of extracting relations from two corpora with different types of narrative, which are described in more detail in the following sections. [sent-134, score-0.81]
54 In both cases, we assume that the entities and their labels are known. [sent-135, score-0.244]
55 All preprocessing steps – sentence segmentation, tokenization, POS tagging and chunking – were performed using the OpenNLP package. [sent-136, score-0.319]
56 If a sentence contains n entities (n ≥ 2), it is replicated into n(n−1)/2 sentences, each containing only two entities. [sent-137, score-0.5]
57 If the two entities are known to be in a relationship, then the replicated sentence is added to the set of corresponding positive sentences; otherwise it is added to the set of negative sentences. [sent-138, score-0.5]
58 During testing, a sentence having n entities (n ≥ 2) is again replicated into n(n−1)/2 sentences in a similar way. [sent-139, score-0.594]
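A small illustrative sketch (ours) of this example-generation step, with hypothetical field names and toy protein placeholders:

```python
# Illustrative sketch: replicate a sentence once per entity pair and label each
# copy according to the annotated interactions.
from itertools import combinations

def make_examples(tokens, entity_indices, interacting_pairs):
    examples = []
    for i1, i2 in combinations(sorted(entity_indices), 2):
        label = +1 if (i1, i2) in interacting_pairs else -1
        examples.append({"tokens": tokens, "e1": i1, "e2": i2, "label": label})
    return examples

sent = "PROT1 strongly binds PROT2 but not PROT3".split()
for ex in make_examples(sent, [0, 3, 6], {(0, 3)}):
    print(ex["e1"], ex["e2"], ex["label"])   # (0,3)->+1, (0,6)->-1, (3,6)->-1
```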
59 The relation kernel is used in conjunction with SVM learning in order to find a decision hyperplane that best separates the positive examples from the negative examples. [sent-140, score-0.369]
60 The performance is measured using precision (percentage of correctly extracted relations out of total extracted) and recall (percentage of correctly extracted relations out of total number of relations annotated in the corpus). [sent-144, score-0.482]
61 The graph points are obtained by varying a threshold on the minimum acceptable extraction confidence, based on the probability estimates from LibSVM. [sent-146, score-0.234]
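The paper pairs the kernel with SVM learning via LibSVM; the sketch below shows the same setup with scikit-learn's precomputed-kernel interface, which is our substitution for illustration, not the original tooling. The examples and the kernel function are assumed to follow the earlier sketches.

```python
# Hedged sketch: train an SVM on a precomputed Gram matrix of the relation kernel
# and sweep a confidence threshold to obtain precision-recall points.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_curve

def gram_matrix(rows, cols, kernel):
    return np.array([[kernel(a, b) for b in cols] for a in rows])

def train_and_curve(train, test, kernel):
    y_train = [ex["label"] for ex in train]
    y_test = [ex["label"] for ex in test]
    clf = SVC(kernel="precomputed", probability=True)
    clf.fit(gram_matrix(train, train, kernel), y_train)
    probs = clf.predict_proba(gram_matrix(test, train, kernel))
    pos = list(clf.classes_).index(1)              # column of the positive class
    # precision-recall pairs obtained by varying the acceptance threshold
    return precision_recall_curve(y_test, probs[:, pos])

# Example wiring (using the earlier sketches):
# kernel = lambda a, b: relation_kernel(a, b, 0.75, c, subsequence_kernel)
# precision, recall, thresholds = train_and_curve(train_examples, test_examples, kernel)
```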
62 5.1 Interaction Extraction from AImed We did comparative experiments on the AImed corpus, which has been previously used for training the protein interaction extraction systems in [6]. [sent-154, score-0.469]
63 The results, summarized in Figure 4(a), show that the relation kernel outperforms both ELCS and the manually written rules. [sent-161, score-0.369]
64 To evaluate the impact that the three types of patterns have on performance, we compare ERK with an ablated system (ERK-A) that uses all possible patterns, constrained only to be anchored on the two entity names. [sent-165, score-0.488]
65 5.2 Relation Extraction from ACE To evaluate how well this relation kernel ports to other types of narrative, we applied it to the problem of extracting top-level relations from the ACE corpus [2], the version used for the September 2002 evaluation. [sent-168, score-0.746]
66 This version of the ACE corpus contains three types of annotations: coreference, named entities and relations. [sent-170, score-0.398]
67 There are five types of entities – PERSON, ORGANIZATION, FACILITY, LOCATION, and GEO-POLITICAL ENTITY – which can participate in five general, top-level relations: ROLE, PART, LOCATED, NEAR, and SOCIAL. [sent-171, score-0.298]
68 A recent approach to extracting relations is described in [9]. [sent-172, score-0.266]
69 The authors use a generalized version of the tree kernel from [10] to compute a kernel over relation examples, where a relation example consists of the smallest dependency tree containing the two entities of the relation. [sent-173, score-1.207]
70 – [S2] One binary SVM is trained for relation detection, meaning that all positive relation instances are combined into one class. [sent-175, score-0.42]
71 The thresholded output of this binary classifier is used as training data for a second multi-class SVM, trained for relation classification. [sent-176, score-0.21]
72 We trained our relation kernel, under the first scenario, to recognize the same 5 top-level relation types. [sent-177, score-0.42]
73 While for interaction extraction we used only the lexicalized version of the kernel, here we utilize more features, corresponding to the following feature spaces: Σ1 is the word vocabulary, Σ2 is the set of POS tags, Σ3 is the set of generic POS tags, and Σ4 contains the 5 entity types. [sent-178, score-0.694]
74 We also used chunking information as follows: all (sparse) subsequences were created exclusively from the chunk heads, where a head is defined as the last word in a chunk. [sent-179, score-0.481]
75 The same criterion was used for computing the length of a subsequence – all words other than head words were ignored. [sent-180, score-0.573]
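An illustrative helper (ours) for this head-only filtering; the chunk encoding as (start, end) spans is our simplification of the chunker output.

```python
# Illustrative sketch: keep only the head (last) word of each chunk, so that
# non-head words neither appear in patterns nor count toward pattern length.
def chunk_heads(tokens, chunk_spans):
    return [end - 1 for (start, end) in chunk_spans]   # index of each chunk's last word

tokens = "the new prime minister of France visited Paris".split()
chunks = [(0, 4), (4, 5), (5, 6), (6, 7), (7, 8)]      # NP, PP, NP, VP, NP spans
print([tokens[i] for i in chunk_heads(tokens, chunks)])
# ['minister', 'of', 'France', 'visited', 'Paris']
```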
76 This is based on the observation that in general words other than the chunk head do not contribute to establishing a relationship between two entities outside of that chunk. [sent-181, score-0.537]
77 One exception is when both entities in the example sentence are contained in the same chunk. [sent-182, score-0.462]
78 In these cases, we let one chunk contribute both entity heads. [sent-186, score-0.284]
79 Also, an important difference from the interaction extraction case is that often the two entities in a relation do not have any words separating them, as for example in noun-noun compounds. [sent-187, score-0.899]
80 This pattern consists of a sequence of length two formed from the head words (or their word classes) of the two entities. [sent-189, score-0.339]
81 Correspondingly, we updated the relation kernel from Figure 3 with a new kernel term mK, as illustrated in Equation 4. [sent-190, score-0.528]
82 mK(s, t) = c(x1, y1) · c(x2, y2) · λ^(2+2)   (5) We present in Table 1 the results of using our updated relation kernel to extract relations from ACE, under the first scenario. [sent-192, score-0.499]
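As code, the new term is a one-line transcription of the equation above, using the position similarity c from the earlier sketch.

```python
# Direct transcription of the modifier term mK: a length-two pattern formed from
# the two entity heads, weighted as a subsequence spanning two positions in each
# sentence (λ^(2+2)).
def mK(x1, x2, y1, y2, lam, c):
    return c(x1, y1) * c(x2, y2) * lam ** (2 + 2)
```

In the simplified relation_kernel sketch above, the updated kernel would then be relation_kernel(s, t, ...) + mK(...).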
83 We also show the results presented in [9] for their best performing kernel K4 (a sum between a bag-of-words kernel and the dependency kernel) under both scenarios. [sent-193, score-0.413]
84 Therefore we expect to get an even more significant increase in performance by training our relation kernel in the same cascaded fashion. [sent-206, score-0.403]
85 6 Related Work In [10], a tree kernel is defined over shallow parse representations of text, together with an efficient algorithm for computing it. [sent-207, score-0.224]
86 Experiments on extracting PERSON-AFFILIATION and ORGANIZATION-LOCATION relations from 200 news articles show the advantage of using this new type of tree kernel over three feature-based algorithms. [sent-208, score-0.378]
87 The same kernel was slightly generalized in [9] and applied on dependency tree representations of sentences, with dependency trees being created from head-modifier relationships extracted from syntactic parse trees. [sent-209, score-0.453]
88 Experimental results show a clear win of the dependency tree kernel over a bag-of-words kernel. [sent-210, score-0.319]
89 For relation extraction, word order is important, and our experimental results support this claim – all subsequence patterns used in our approach retain the order between words. [sent-212, score-0.703]
90 For subsequence kernels, the semantics is known by definition: each subsequence pattern corresponds to a dimension in the Hilbert space. [sent-214, score-0.488]
91 This enabled us to easily restrict the types of patterns counted by the kernel to the three types that we deemed relevant for relation extraction. [sent-215, score-0.593]
92 7 Conclusion and Future Work We have presented a new relation extraction method based on a generalization of subsequence kernels. [sent-216, score-0.688]
93 After a small modification, the same kernel was evaluated on the task of extracting top-level relations from the ACE corpus, showing better performance when compared with a recent dependency tree kernel. [sent-218, score-0.585]
94 (2) using the relation kernel in a cascaded fashion, in order to improve the low recall caused by the highly unbalanced data distribution. [sent-220, score-0.45]
95 Currently, the method assumes the named entities are known. [sent-222, score-0.287]
96 A natural extension is to integrate named entity recognition with relation extraction. [sent-223, score-0.453]
97 Valencia, The frame-based module of the Suiseki information extraction system, IEEE Intelligent Systems 17 (2002) 14–20. [sent-242, score-0.234]
98 Craven, Representing sentence structure in hidden Markov models for information extraction, in: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001), Seattle, WA, 2001, pp. [sent-245, score-0.218]
99 Sorensen, Dependency tree kernels for relation extraction, in: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), Barcelona, Spain, 2004, pp. [sent-271, score-0.322]
100 Richardella, Kernel methods for relation extraction, Journal of Machine Learning Research 3 (2003) 1083–1106. [sent-276, score-0.21]
wordName wordTfidf (topN-words)
[('pos', 0.271), ('entities', 0.244), ('subsequence', 0.244), ('extraction', 0.234), ('sentence', 0.218), ('relation', 0.21), ('entity', 0.2), ('erk', 0.193), ('elcs', 0.174), ('kernel', 0.159), ('subsequences', 0.154), ('kn', 0.154), ('ace', 0.151), ('tags', 0.143), ('extracting', 0.136), ('word', 0.133), ('relations', 0.13), ('sb', 0.128), ('words', 0.123), ('corpora', 0.121), ('anchored', 0.118), ('patterns', 0.116), ('protein', 0.106), ('bk', 0.106), ('bki', 0.097), ('blaschke', 0.097), ('dependency', 0.095), ('sentences', 0.094), ('sx', 0.092), ('interaction', 0.088), ('chunk', 0.084), ('narrative', 0.077), ('ki', 0.077), ('bak', 0.077), ('tb', 0.077), ('interactions', 0.073), ('bmax', 0.067), ('chunking', 0.067), ('tree', 0.065), ('biomedical', 0.061), ('austin', 0.061), ('erson', 0.058), ('mentions', 0.058), ('mooney', 0.058), ('ocation', 0.058), ('sik', 0.058), ('corpus', 0.057), ('types', 0.054), ('ie', 0.049), ('proteins', 0.049), ('kernels', 0.047), ('recall', 0.047), ('mk', 0.047), ('sf', 0.046), ('precision', 0.045), ('documents', 0.044), ('language', 0.044), ('relationship', 0.043), ('head', 0.043), ('named', 0.043), ('counts', 0.041), ('comparative', 0.041), ('names', 0.04), ('fb', 0.04), ('length', 0.04), ('segment', 0.039), ('counting', 0.039), ('bamax', 0.039), ('bunescu', 0.039), ('extractors', 0.039), ('fas', 0.039), ('fore', 0.039), ('kate', 0.039), ('lexicalized', 0.039), ('noun', 0.039), ('ocated', 0.039), ('protesters', 0.039), ('razvan', 0.039), ('rganization', 0.039), ('stations', 0.039), ('syntactic', 0.039), ('tjk', 0.039), ('replicated', 0.038), ('sparse', 0.038), ('express', 0.037), ('sa', 0.037), ('cascaded', 0.034), ('url', 0.034), ('newspaper', 0.034), ('lexical', 0.034), ('ta', 0.034), ('tagging', 0.034), ('ty', 0.034), ('valencia', 0.034), ('common', 0.034), ('rk', 0.033), ('rule', 0.032), ('manual', 0.031), ('aimed', 0.031), ('asserts', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 185 nips-2005-Subsequence Kernels for Relation Extraction
Author: Raymond J. Mooney, Razvan C. Bunescu
Abstract: We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach. 1
2 0.14431806 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
Author: Purnamrita Sarkar, Andrew W. Moore
Abstract: This paper explores two aspects of social network modeling. First, we generalize a successful static model of relationships into a dynamic model that accounts for friendships drifting over time. Second, we show how to make it tractable to learn such models from data, even as the number of entities n gets large. The generalized model associates each entity with a point in p-dimensional Euclidian latent space. The points can move as time progresses but large moves in latent space are improbable. Observed links between entities are more likely if the entities are close in latent space. We show how to make such a model tractable (subquadratic in the number of entities) by the use of appropriate kernel functions for similarity in latent space; the use of low dimensional kd-trees; a new efficient dynamic adaptation of multidimensional scaling for a first pass of approximate projection of entities into latent space; and an efficient conjugate gradient update rule for non-linear local optimization in which amortized time per entity during an update is O(log n). We use both synthetic and real-world data on upto 11,000 entities which indicate linear scaling in computation time and improved performance over four alternative approaches. We also illustrate the system operating on twelve years of NIPS co-publication data. We present a detailed version of this work in [1]. 1
3 0.12604928 89 nips-2005-Group and Topic Discovery from Relations and Their Attributes
Author: Xuerui Wang, Natasha Mohanty, Andrew McCallum
Abstract: We present a probabilistic generative model of entity relationships and their attributes that simultaneously discovers groups among the entities and topics among the corresponding textual attributes. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the attributes (here, words) associated with certain relationships. Significantly, joint inference allows the discovery of topics to be guided by the emerging groups, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and thirteen years of similar data from the United Nations. We show that in comparison with traditional, separate latent-variable models for words, or Blockstructures for votes, the Group-Topic model’s joint inference discovers more cohesive groups and improved topics. 1
4 0.095348239 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
Author: Tong Zhang, Rie Kubota Ando
Abstract: We consider a framework for semi-supervised learning using spectral decomposition based un-supervised kernel design. This approach subsumes a class of previously proposed semi-supervised learning methods on data graphs. We examine various theoretical properties of such methods. In particular, we derive a generalization performance bound, and obtain the optimal kernel design by minimizing the bound. Based on the theoretical analysis, we are able to demonstrate why spectral kernel design based methods can often improve the predictive performance. Experiments are used to illustrate the main consequences of our analysis.
5 0.093912035 100 nips-2005-Interpolating between types and tokens by estimating power-law generators
Author: Sharon Goldwater, Mark Johnson, Thomas L. Griffiths
Abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process – the Pitman-Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
6 0.089446411 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
7 0.087044194 184 nips-2005-Structured Prediction via the Extragradient Method
8 0.080729395 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
9 0.073394865 16 nips-2005-A matching pursuit approach to sparse Gaussian process regression
10 0.073191382 195 nips-2005-Transfer learning for text classification
11 0.072611034 171 nips-2005-Searching for Character Models
12 0.071116179 48 nips-2005-Context as Filtering
13 0.059942778 103 nips-2005-Kernels for gene regulatory regions
14 0.057914339 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
15 0.056760695 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
16 0.054948747 95 nips-2005-Improved risk tail bounds for on-line algorithms
17 0.053930741 11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection
18 0.052736614 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
19 0.052245971 164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks
20 0.051103845 126 nips-2005-Metric Learning by Collapsing Classes
topicId topicWeight
[(0, 0.164), (1, 0.049), (2, -0.025), (3, 0.018), (4, 0.017), (5, -0.053), (6, 0.092), (7, 0.156), (8, 0.051), (9, 0.075), (10, -0.058), (11, -0.072), (12, -0.082), (13, -0.185), (14, -0.174), (15, 0.082), (16, -0.024), (17, 0.01), (18, -0.002), (19, -0.053), (20, -0.05), (21, -0.186), (22, 0.026), (23, -0.129), (24, -0.01), (25, 0.026), (26, -0.051), (27, -0.032), (28, 0.004), (29, 0.122), (30, -0.022), (31, 0.131), (32, -0.031), (33, 0.028), (34, 0.063), (35, -0.055), (36, -0.06), (37, 0.061), (38, -0.142), (39, 0.1), (40, 0.147), (41, -0.148), (42, 0.043), (43, -0.046), (44, -0.03), (45, 0.076), (46, -0.04), (47, -0.051), (48, -0.035), (49, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.95088798 185 nips-2005-Subsequence Kernels for Relation Extraction
Author: Raymond J. Mooney, Razvan C. Bunescu
Abstract: We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach. 1
2 0.70291215 89 nips-2005-Group and Topic Discovery from Relations and Their Attributes
Author: Xuerui Wang, Natasha Mohanty, Andrew McCallum
Abstract: We present a probabilistic generative model of entity relationships and their attributes that simultaneously discovers groups among the entities and topics among the corresponding textual attributes. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the attributes (here, words) associated with certain relationships. Significantly, joint inference allows the discovery of topics to be guided by the emerging groups, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and thirteen years of similar data from the United Nations. We show that in comparison with traditional, separate latent-variable models for words, or Blockstructures for votes, the Group-Topic model’s joint inference discovers more cohesive groups and improved topics. 1
3 0.50529534 171 nips-2005-Searching for Character Models
Author: Jaety Edwards, David Forsyth
Abstract: We introduce a method to automatically improve character models for a handwritten script without the use of transcriptions and using a minimum of document specific training data. We show that we can use searches for the words in a dictionary to identify portions of the document whose transcriptions are unambiguous. Using templates extracted from those regions, we retrain our character prediction model to drastically improve our search retrieval performance for words in the document.
4 0.48657015 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
Author: Jun Suzuki, Hideki Isozaki
Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when limited small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enable us to use larger sub-structures effectively. The proposed method, in order to execute efficiently, can be embedded into an original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem in the conventional method and compare the performance of a conventional method to that of the proposed method.
5 0.44721022 103 nips-2005-Kernels for gene regulatory regions
Author: Jean-philippe Vert, Robert Thurman, William S. Noble
Abstract: We describe a hierarchy of motif-based kernels for multiple alignments of biological sequences, particularly suitable to process regulatory regions of genes. The kernels incorporate progressively more information, with the most complex kernel accounting for a multiple alignment of orthologous regions, the phylogenetic tree relating the species, and the prior knowledge that relevant sequence patterns occur in conserved motif blocks. These kernels can be used in the presence of a library of known transcription factor binding sites, or de novo by iterating over all k-mers of a given length. In the latter mode, a discriminative classifier built from such a kernel not only recognizes a given class of promoter regions, but as a side effect simultaneously identifies a collection of relevant, discriminative sequence motifs. We demonstrate the utility of the motif-based multiple alignment kernels by using a collection of aligned promoter regions from five yeast species to recognize classes of cell-cycle regulated genes. Supplementary data is available at http://noble.gs.washington.edu/proj/pkernel. 1
6 0.41637662 100 nips-2005-Interpolating between types and tokens by estimating power-law generators
7 0.41609916 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
8 0.3736369 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
9 0.35315636 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression
10 0.35182816 16 nips-2005-A matching pursuit approach to sparse Gaussian process regression
11 0.32598066 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
12 0.32363889 195 nips-2005-Transfer learning for text classification
13 0.31892332 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
14 0.31549963 48 nips-2005-Context as Filtering
15 0.31464112 184 nips-2005-Structured Prediction via the Extragradient Method
16 0.29775926 52 nips-2005-Correlated Topic Models
17 0.28636655 71 nips-2005-Fast Krylov Methods for N-Body Learning
18 0.27654952 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
19 0.26026469 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours
20 0.25784218 69 nips-2005-Fast Gaussian Process Regression using KD-Trees
topicId topicWeight
[(3, 0.042), (10, 0.027), (11, 0.013), (27, 0.565), (31, 0.033), (34, 0.08), (41, 0.012), (55, 0.014), (69, 0.037), (73, 0.023), (88, 0.052), (91, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.92117196 185 nips-2005-Subsequence Kernels for Relation Extraction
Author: Raymond J. Mooney, Razvan C. Bunescu
Abstract: We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach. 1
2 0.90787768 87 nips-2005-Goal-Based Imitation as Probabilistic Inference over Graphical Models
Author: Deepak Verma, Rajesh P. Rao
Abstract: Humans are extremely adept at learning new skills by imitating the actions of others. A progression of imitative abilities has been observed in children, ranging from imitation of simple body movements to goalbased imitation based on inferring intent. In this paper, we show that the problem of goal-based imitation can be formulated as one of inferring goals and selecting actions using a learned probabilistic graphical model of the environment. We first describe algorithms for planning actions to achieve a goal state using probabilistic inference. We then describe how planning can be used to bootstrap the learning of goal-dependent policies by utilizing feedback from the environment. The resulting graphical model is then shown to be powerful enough to allow goal-based imitation. Using a simple maze navigation task, we illustrate how an agent can infer the goals of an observed teacher and imitate the teacher even when the goals are uncertain and the demonstration is incomplete.
3 0.90192199 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?
Author: Yan Karklin, Michael S. Lewicki
Abstract: Linear implementations of the efficient coding hypothesis, such as independent component analysis (ICA) and sparse coding models, have provided functional explanations for properties of simple cells in V1 [1, 2]. These models, however, ignore the non-linear behavior of neurons and fail to match individual and population properties of neural receptive fields in subtle but important ways. Hierarchical models, including Gaussian Scale Mixtures [3, 4] and other generative statistical models [5, 6], can capture higher-order regularities in natural images and explain nonlinear aspects of neural processing such as normalization and context effects [6,7]. Previously, it had been assumed that the lower level representation is independent of the hierarchy, and had been fixed when training these models. Here we examine the optimal lower-level representations derived in the context of a hierarchical model and find that the resulting representations are strikingly different from those based on linear models. Unlike the the basis functions and filters learned by ICA or sparse coding, these functions individually more closely resemble simple cell receptive fields and collectively span a broad range of spatial scales. Our work unifies several related approaches and observations about natural image structure and suggests that hierarchical models might yield better representations of image structure throughout the hierarchy.
4 0.75364619 155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares
Author: Jo-anne Ting, Aaron D'souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato
Abstract: An increasing number of projects in neuroscience requires the statistical analysis of high dimensional data sets, as, for instance, in predicting behavior from neural firing or in operating artificial devices from brain recordings in brain-machine interfaces. Linear analysis techniques remain prevalent in such cases, but classical linear regression approaches are often numerically too fragile in high dimensions. In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faithfully reconstructed with linear approaches from neural activity in primary motor cortex (M1). To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, regularizing against overfitting. In comparison with ordinary least squares, stepwise regression, partial least squares, LASSO regression and a brute force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of regularization against overfitting, computational efficiency and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques. As neuroscientific results, our analyses demonstrate that EMG data can be well predicted from M1 neurons, further opening the path for possible real-time interfaces between brains and machines. 1
5 0.54493397 109 nips-2005-Learning Cue-Invariant Visual Responses
Author: Jarmo Hurri
Abstract: Multiple visual cues are used by the visual system to analyze a scene; achromatic cues include luminance, texture, contrast and motion. Singlecell recordings have shown that the mammalian visual cortex contains neurons that respond similarly to scene structure (e.g., orientation of a boundary), regardless of the cue type conveying this information. This paper shows that cue-invariant response properties of simple- and complex-type cells can be learned from natural image data in an unsupervised manner. In order to do this, we also extend a previous conceptual model of cue invariance so that it can be applied to model simple- and complex-cell responses. Our results relate cue-invariant response properties to natural image statistics, thereby showing how the statistical modeling approach can be used to model processing beyond the elemental response properties visual neurons. This work also demonstrates how to learn, from natural image data, more sophisticated feature detectors than those based on changes in mean luminance, thereby paving the way for new data-driven approaches to image processing and computer vision. 1
6 0.50544173 36 nips-2005-Bayesian models of human action understanding
7 0.48813257 100 nips-2005-Interpolating between types and tokens by estimating power-law generators
8 0.47495487 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
9 0.46888831 119 nips-2005-Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods
10 0.43821999 203 nips-2005-Visual Encoding with Jittering Eyes
11 0.42672828 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
12 0.42612457 153 nips-2005-Policy-Gradient Methods for Planning
13 0.42552146 158 nips-2005-Products of ``Edge-perts
14 0.42121935 152 nips-2005-Phase Synchrony Rate for the Recognition of Motor Imagery in Brain-Computer Interface
15 0.41624719 48 nips-2005-Context as Filtering
16 0.41384178 199 nips-2005-Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions
17 0.40848228 173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception
18 0.40623415 169 nips-2005-Saliency Based on Information Maximization
19 0.38215226 35 nips-2005-Bayesian model learning in human visual perception
20 0.37585673 171 nips-2005-Searching for Character Models