nips nips2003 nips2003-99 knowledge-graph by maker-knowledge-mining

99 nips-2003-Kernels for Structured Natural Language Data


Source: pdf

Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda

Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper devises a novel kernel function for structured natural language data. [sent-6, score-0.242]

2 (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. [sent-9, score-0.116]

3 The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. [sent-10, score-0.066]

4 HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. [sent-11, score-0.635]

5 In this paper, we define the kernel function and show how it permits efficient calculation. [sent-12, score-0.048]

6 Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. [sent-13, score-0.229]

7 1 Introduction Recent developments in kernel technology enable us to handle discrete structures, such as sequences, trees, and graphs. [sent-16, score-0.108]

8 Convolution Kernels [4, 12] demonstrate how to build kernels over discrete structures. [sent-18, score-0.21]

9 Since texts can be analyzed as discrete structures, these discrete kernels have been applied to NLP tasks, such as sequence kernels [8, 9] for text categorization and tree kernels [1, 2] for (shallow) parsing. [sent-19, score-0.838]

10 In this paper, we focus on tasks in the application areas of NLP, such as Machine Translation, Text Summarization, Text Categorization and Question Answering. [sent-20, score-0.036]

11 In these tasks, richer types of information within texts, such as syntactic and semantic information, are required for higher performance. [sent-21, score-0.334]

12 However, syntactic information and semantic information are formed by very complex structures that cannot be written in simple structures, such as sequences and trees. [sent-22, score-0.395]

13 The motivation of this paper is to propose kernels specifically suited to structured natural language data. [sent-23, score-0.375]

14 The proposed kernels can handle several of the structures found within texts and calculate kernels with regard to these structures at a practical cost and time. [sent-24, score-0.774]

15 Accordingly, these kernels can be efficiently applied to learning and clustering problems in NLP applications. [sent-25, score-0.181]

16-20 [Figure 1 residue: fragments of the word, phrase-chunk, named-entity, and dependency-structure annotations for the example sentence "Koizumi is prime minister of Japan", with semantic tags such as [executive] and [Asian country]; no sentence text is recoverable from these fragments.]

21 Figure 1: Examples of structures within texts as determined by basic NLP tools 2 Structured Natural Language Data for Application Tasks in NLP In general, natural language data contain many kinds of syntactic and semantic structures. [sent-36, score-0.945]

22 These syntactic and semantic structures can provide important information for understanding natural language and, moreover, tackling real tasks in application areas of NLP. [sent-38, score-0.551]

23 The accuracies of basic NLP tools such as POS taggers, NP chunkers, NE taggers, and dependency structure analyzers have improved to the point that they can help to develop real applications. [sent-39, score-0.092]

24 This paper proposes a method to handle these syntactic and semantic structures in a single framework: We combine the results of basic NLP tools to make one hierarchically structured data set. [sent-40, score-0.596]

25 Figure 1 shows an example of structures within texts analyzed by basic NLP tools that are currently available and that offer easy use and high performance. [sent-41, score-0.295]

26 As shown in Figure 1, structures in texts can be hierarchical or recursive “graphs in graph”. [sent-42, score-0.474]

27 A certain node can be constructed or characterized by other graphs. [sent-43, score-0.14]

28 Nodes usually have several kinds of attributes, such as words, POS tags, semantic information such as WordNet [3], and classes of the named entities. [sent-44, score-0.193]

29 Therefore, we should employ a (1) directed, (2) multi-labeled, and (3) hierarchically structured graph to model structured natural language data. [sent-46, score-0.413]

30 Then, a graph G = (V, E) is called a directed graph if E is a set of directed links E ⊂ V × V . [sent-48, score-0.46]

31 Definition 2 (Hierarchically Structured Graph) Let Gi = (Vi , Ei ) be a subgraph in G = (V, E) where Vi ⊆ V and Ei ⊆ E, and G = {G1 , . . . , Gn } be a set of such subgraphs. [sent-51, score-0.045]

32 F ⊂ V × G represents a set of vertical links from a node v ∈ V to a subgraph Gi ∈ G. [sent-55, score-0.297]

33 Then, G = (V, E, G, F ) is called a hierarchically structured graph if each node has at most one vertical edge. [sent-56, score-0.397]

34 Intuitively, vertical link fi,Gj ∈ F from node vi to graph Gj indicates that node vi contains graph Gj . [sent-57, score-0.845]

35 Finally, in this paper, we successfully represent structured natural language data by using a multi-labeled hierarchical directed graph. [sent-58, score-0.5]

36 Definition 3 (Multi-Labeled Hierarchical Directed Graph) G = (V, E, M, G, F ) is a multi-labeled hierarchical directed graph. [sent-59, score-0.306]

37 In this paper, we call a multi-labeled hierarchical directed graph a hierarchical directed graph. [sent-61, score-0.695]
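To make Definitions 1-3 concrete, here is a minimal Python sketch of a multi-labeled hierarchical directed graph: each node carries a set of attributes (word, POS tag, NE class, semantic tag), directed links connect nodes at the same level, and at most one vertical link points from a node to the subgraph it contains. The class and field names (Node, HGraph, attrs, vertical) are illustrative assumptions, not notation from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node with a multi-label attribute set tau(v)."""
    nid: str
    attrs: frozenset                      # e.g. words, POS tags, NE classes
    vertical: "HGraph | None" = None      # at most one contained subgraph, Upsilon(v)

@dataclass
class HGraph:
    """A (sub)graph: its nodes plus the directed links E at this level."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (source_id, target_id) pairs

# Toy fragment of Figure 1: the NP chunk "prime minister" is a single node
# whose vertical link contains a word-level subgraph.
words = HGraph(
    nodes=[Node("w1", frozenset({"prime", "JJ"})),
           Node("w2", frozenset({"minister", "NN", "[executive]"}))],
    edges=[("w1", "w2")],
)
np_chunk = Node("np1", frozenset({"NP"}), vertical=words)
subject = Node("n0", frozenset({"Koizumi", "NNP", "PERSON"}))
sentence = HGraph(nodes=[subject, np_chunk], edges=[("n0", "np1")])
print(sentence)
```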

38 3 Kernels on Hierarchical Directed Acyclic Graph At first, in order to calculate kernels efficiently, we add one constraint: that the hierarchical directed graph has no cyclic paths. [sent-62, score-0.57]

39 First, we define a path on a Hierarchical Directed Graph. [sent-63, score-0.029]

40 If a node has no vertical link, then the node is called a terminal node, which is denoted as T ⊂ V ; otherwise it is a non-terminal node, which is denoted as T̄ ⊂ V . [sent-64, score-0.354]

41 Definition 4 (Hierarchical Path (HiP)) Let p = vi , ei,j , vj , . . . , vk , ek,l , vl be a path. [sent-65, score-0.257]

42 Let Υ(v) be a function that returns a subgraph Gi that is linked with v by a vertical link if v ∈ T̄ . [sent-69, score-0.208]

43 Let P(G) be a function that returns the set of all HiPs in G, where links between v ∈ G and v′ ∉ G are ignored. [sent-70, score-0.122]

44 , h(vk ), ek,l , h(vl ) is defined as a HiP, where h(v) returns vph , ph ∈ P(Gx ) s. [sent-74, score-0.47]

45 Intuitively, a HiP is constructed by a path in the path structure, e. [sent-77, score-0.058]

46 Definition 5 (Hierarchical Directed Acyclic Graph (HDAG)) hierarchical directed graph G = (V, E, M, G, F ) is an HDAG if there is no HiP from any node v to the same node v. [sent-83, score-0.669]
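The acyclicity constraint of Definition 5 can be checked in the usual way with a topological sort. The sketch below is a simplified check that looks only at the directed links of one level and ignores cycles passing through vertical links; the function and variable names are assumptions made for this example.

```python
from collections import defaultdict, deque

def is_acyclic(node_ids, edges):
    """Kahn's algorithm: True iff the directed links contain no cycle."""
    indeg = {v: 0 for v in node_ids}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(v for v, d in indeg.items() if d == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return visited == len(node_ids)   # every node popped <=> no cycle

print(is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c")]))              # True
print(is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))  # False
```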

47 A primitive feature for defining kernels on HDAGs is a hierarchical attribute subsequence. [sent-84, score-0.461]

48 Definition 6 (Hierarchical Attribute Subsequence (HiAS)) A HiAS is defined as a list of attributes with hierarchical information extracted from nodes on HiPs. [sent-85, score-0.293]

49 For example, let ph = ⟨vi , ei,j , vj ⟨vm , em,n , vn ⟩, . [sent-86, score-0.649]

50 , vk , ek,l , vl ⟩ be a HiP; then HiASs in ph are written as τ (ph ) = ⟨ai , aj ⟨am , an ⟩, . [sent-89, score-0.566]

51 , ak , al ⟩, which is the set of all combinations over ai ∈ τ (vi ), where τ (v) of node v is a function that returns the set of attributes allocated to node v, and τ (ph ) of HiP ph is a function that returns all possible HiASs extracted from HiP ph . [sent-92, score-1.353]
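For a flat, non-hierarchical path, τ(ph) reduces to the Cartesian product of the attribute sets of its nodes. The sketch below enumerates these attribute lists; extending it to nested HiPs and node skips is what the paper's recursive formulation handles, so treat this as a simplified illustration (the helper name hias_of_path is an assumption).

```python
from itertools import product

def hias_of_path(attr_sets):
    """All attribute lists obtained by picking one attribute per node of a
    flat path: tau(p_h) of Definition 6 without hierarchy or skips."""
    return [list(combo) for combo in product(*attr_sets)]

# Two nodes: {prime, JJ} followed by {minister, NN}
path_attrs = [{"prime", "JJ"}, {"minister", "NN"}]
for hias in hias_of_path(path_attrs):
    print(hias)
# prints the four combinations, e.g. ['prime', 'minister'], ['JJ', 'NN'], ...
```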

52 Γ∗ denotes all possible HiASs constructed by the attributes in Γ and γi ∈ Γ∗ denotes the i’th HiAS. [sent-93, score-0.099]

53 An explicit representation of a feature vector of an HDAG kernel is defined as φ(G) = (φ1 (G), . [sent-94, score-0.048]

54 , φ|Γ∗ | (G)), where φ represents the explicit feature mapping from HDAG to the numerical feature space. [sent-97, score-0.03]

55 [Figure legend residue: labels such as "subgraph G" and "directed link"; no sentence text is recoverable.] [sent-110, score-0.065]

56 KHDAG (G1 , G2 ) = Σ_{γi ∈ Γ∗} Σ_{γi ∈ τ(p1h) | p1h ∈ P(G1)} Σ_{γi ∈ τ(p2h) | p2h ∈ P(G2)} Wγi (p1h ) Wγi (p2h ),   (1) where Wγi (ph ) represents the weight value of HiAS γi in HiP ph . [sent-113, score-0.422]

57 In the case of NL data, for example, WΓ (a) might be given by the score of tf ∗ idf from large scale documents, WV (v) by the type of chunk such as word, phrase or named entity, WE (vi , vj ) by the type of relation between vi and vj , and WF (vi , Gj ) by the number of nodes in Gj . [sent-116, score-0.47]
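When the HiAS feature sets are small enough to enumerate explicitly, equation (1) is just an inner product of weighted HiAS bags. The sketch below shows that naive computation; the feature-extraction step is abstracted away, and the weighted bags are toy inputs, so this is a hedged illustration rather than the paper's efficient algorithm.

```python
from collections import Counter

def naive_structure_kernel(hias_weights_1, hias_weights_2):
    """Inner product of two weighted HiAS 'bags':
    K = sum over shared HiASs gamma of W1(gamma) * W2(gamma)."""
    small, large = sorted([hias_weights_1, hias_weights_2], key=len)
    return sum(w * large[g] for g, w in small.items() if g in large)

# Toy weighted feature bags; keys stand for HiASs, values for accumulated weights.
g1 = Counter({("prime", "minister"): 1.0, ("minister",): 2.0, ("Japan",): 1.0})
g2 = Counter({("prime", "minister"): 1.0, ("Japan",): 3.0})
print(naive_structure_kernel(g1, g2))   # 1*1 + 1*3 = 4.0
```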

58 Soft Structural Matching Frameworks Since HDAG kernels permit not only the exact matching of substructures but also approximate matching, we add the framework of node skip and relaxation of hierarchical information. [sent-117, score-0.631]

59 First, we discuss the framework of the node skip. [sent-118, score-0.14]

60 We introduce decay function ΛV (v) (0 < ΛV (v) ≤ 1), which represents the cost of skipping node v when extracting HiASs from the HiPs, which is almost the same architecture as [8]. [sent-119, score-0.17]

61 For example, a HiAS under the node skips is written as ⟨∗⟨a2 , a3 ⟩, ∗, a5 ⟩ from HiP ⟨v1 ⟨v2 , v3 ⟩, v4 , v5 ⟩, where ∗ is the explicit representation of a node that is skipped. [sent-120, score-0.28]

62 For the sake of simplicity, all the weights WV (v), WE (vi , vj ), WF (vi , Gj ), and WΓ (a) are taken as 1, and for all v, ΛV (v) = λ if v has at least one attribute; otherwise ΛV (v) = 1. [sent-124, score-0.136]
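As a rough illustration of the node-skip weighting, the sketch below enumerates the sub-patterns of a flat path and weights each by λ raised to the number of skipped nodes. It is a simplified stand-in that ignores hierarchy and multi-attribute nodes, and the helper name skip_weighted_features is an assumption.

```python
from itertools import combinations

def skip_weighted_features(attrs, lam):
    """For a flat path with one attribute per node, map each attribute
    subsequence (with '*' marking skipped nodes) to its accumulated weight
    lam ** (number of skipped nodes). All nodes here carry an attribute,
    so every skip costs lam, matching Lambda_V(v) = lambda above."""
    n = len(attrs)
    feats = {}
    for k in range(1, n + 1):
        for kept in combinations(range(n), k):
            pattern = tuple(attrs[i] if i in kept else "*" for i in range(n))
            weight = lam ** (n - k)
            feats[pattern] = feats.get(pattern, 0.0) + weight
    return feats

for pat, w in skip_weighted_features(["a1", "a2", "a3"], lam=0.5).items():
    print(pat, w)
# ('a1', '*', 'a3') gets weight 0.5; ('*', '*', 'a3') gets 0.25; etc.
```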

63 Let Q = {q1 , . . . , q|Q| } and R = {r1 , . . . , r|R| } represent nodes in G 1 and G 2 , respectively. [sent-142, score-0.066]

64 K(q, r) represents the sum of the weighted common HiASs that are extracted from the HiPs whose sink nodes are q and r. [sent-143, score-0.096]

65 K̂(q, r) = JG1,G2 (q, r)H(q, r) + H(q, r)I(q, r) + I(q, r)   (4) Function I(q, r) returns the weighted number of common attributes of nodes q and r, I(q, r) = WV (q)WV (r) Σ_{a1 ∈ τ(q)} Σ_{a2 ∈ τ(r)} WΓ (a1 )WΓ (a2 )δ(a1 , a2 ),   (5) where δ(a1 , a2 ) = 1 if a1 = a2 , and 0 otherwise. [sent-144, score-0.19]
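Equation (5) is straightforward to transcribe. The sketch below is a hedged, simplified version in which the weight functions WV and WΓ default to 1, matching the simplification adopted above; the function name node_attribute_overlap is an assumption.

```python
def node_attribute_overlap(attrs_q, attrs_r, w_v_q=1.0, w_v_r=1.0, w_gamma=None):
    """I(q, r) = W_V(q) W_V(r) * sum over attribute pairs of
    W_Gamma(a1) W_Gamma(a2) [a1 == a2]."""
    w_gamma = w_gamma or {}
    total = 0.0
    for a1 in attrs_q:
        for a2 in attrs_r:
            if a1 == a2:
                total += w_gamma.get(a1, 1.0) * w_gamma.get(a2, 1.0)
    return w_v_q * w_v_r * total

q = {"minister", "NN", "[executive]"}
r = {"minister", "NN", "PERSON"}
print(node_attribute_overlap(q, r))   # 2.0 with unit weights (two shared attributes)
```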

66 Let H(q, r) be a function that returns the sum of the weighted common HiASs between q and r including Υ(q) and Υ(r). [sent-145, score-0.078]

67 Function ψ(q) returns a set of nodes that have direct links to node q. [sent-148, score-0.328]

68 ψ(q) = ∅ means that no node has a direct link to q. [sent-149, score-0.047]

69 Next, we show the formula when using the framework of relaxation of hierarchical information. [sent-150, score-0.22]

70 According to equation (3), given the recursive definition of KHDAG (q, r), the value between two HDAGs can be calculated in time O(|Q||R|). [sent-154, score-0.032]

71 In actual use, we may want to evaluate only the subset of all HiASs whose sizes are under n when determining the kernel value because of the problem discussed in [1]. [sent-155, score-0.048]

72 Finally, we normalize the values of the HDAG kernels to remove any bias introduced by the number of nodes in the graphs. [sent-157, score-0.247]

73 This normalization corresponds to the standard unit norm normalization of examples in the feature space corresponding to the kernel space K̂(x, y) = K(x, y) · (K(x, x)K(y, y))^(−1/2) [4]. [sent-158, score-0.048]
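Applying the unit-norm normalization K̂(x, y) = K(x, y) · (K(x, x)K(y, y))^(−1/2) to a whole precomputed Gram matrix takes only a few lines. The sketch below assumes NumPy; the helper name normalize_gram is our own.

```python
import numpy as np

def normalize_gram(K):
    """Unit-norm normalize a Gram matrix: K_hat[i, j] = K[i, j] / sqrt(K[i, i] * K[j, j])."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

K = np.array([[4.0, 2.0], [2.0, 9.0]])
print(normalize_gram(K))   # diagonal becomes 1.0; off-diagonal 2 / (2 * 3) = 0.333...
```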

74 First, as a pre-process, the nodes are sorted under two conditions, V (Υ(v)) ≺ v and Ψ(v) ≺ v, where Ψ(v) represents all nodes that have a path to v. [sent-160, score-0.191]

75 The dynamic programming technique can be used to compute HDAG kernels very efficiently: By following the sorted order, the values that are needed to calculate K(q, r) have already been calculated in the previous calculation. [sent-161, score-0.181]
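The sort-then-reuse strategy can be illustrated with a stripped-down DAG kernel that counts weighted common attribute paths. It omits the hierarchy, node skips, and the H and J terms of the full HDAG recursion, so treat it only as a sketch of the dynamic-programming pattern; all function names here are assumptions.

```python
from collections import defaultdict, deque

def topo_order(nodes, edges):
    """Kahn topological sort so predecessors are processed before successors."""
    indeg = {v: 0 for v in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    order, queue = [], deque(v for v in nodes if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order

def dag_path_kernel(attrs1, edges1, attrs2, edges2):
    """Count common attribute paths of two DAGs by dynamic programming:
    C[q, r] = overlap(q, r) * (1 + sum over predecessor pairs of C[q', r'])."""
    pred1, pred2 = defaultdict(list), defaultdict(list)
    for u, v in edges1:
        pred1[v].append(u)
    for u, v in edges2:
        pred2[v].append(u)
    order1 = topo_order(list(attrs1), edges1)
    order2 = topo_order(list(attrs2), edges2)
    C = {}
    for q in order1:                       # predecessors of q already done
        for r in order2:
            overlap = len(attrs1[q] & attrs2[r])
            inner = sum(C[(qp, rp)] for qp in pred1[q] for rp in pred2[r])
            C[(q, r)] = overlap * (1 + inner)
    return sum(C.values())

a1 = {"x": {"Koizumi"}, "y": {"minister", "NN"}}
a2 = {"u": {"Koizumi"}, "v": {"minister"}}
print(dag_path_kernel(a1, [("x", "y")], a2, [("u", "v")]))  # 1 + 1*(1+1) = 3
```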

76 4 Experiments Our aim was to test the efficiency of using the richer syntactic and semantic structures available within texts, which can be treated now for the first time by our proposed method. [sent-162, score-0.454]

77 [Figure 4 residue: fragments of the hierarchical chunk and dependency annotations used as input data for the comparison methods; no sentence text is recoverable.] [sent-164, score-0.086]

78 Figure 4: Examples of input data of comparison methods Table 2: Results of question classification by SVM with comparison kernel functions evaluated by F-measure (rows: HDAG-K, DAG-K, DS-K, Seq-K, BOW-K; columns: the attribute-number threshold n; the numeric cells are not recoverable from this extraction). [sent-170, score-0.322]

79 We compared the performance of the proposed kernel, the HDAG Kernel (HDAG-K), with DAG kernels (DAG-K), Dependency Structure kernels (DS-K) [2], and sequence kernels (Seq-K) [9]. [sent-236, score-0.543]

80 Moreover, we evaluated the bag-of-words kernel (BOW-K) [6], that is, the bag-of-words with polynomial kernels, as the baseline method. [sent-237, score-0.083]
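The BOW-K baseline amounts to a polynomial kernel over word-count vectors. A hedged sketch with scikit-learn follows; the degree and the other settings are assumptions, since the summary does not state them.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import polynomial_kernel

docs = ["Koizumi is prime minister of Japan",
        "Who is the prime minister of Japan ?"]
X = CountVectorizer().fit_transform(docs)                      # bag-of-words counts
K = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)       # (x . y + 1)^2 kernel
print(np.round(K, 1))
```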

81 The main difference between each method is the ability to treat syntactic and semantic information within texts. [sent-238, score-0.275]

82 We used words, named entity tags, and semantic information [5] for attributes. [sent-241, score-0.193]

83 Seq-K only treats word order, DS-K and DAG-K treat dependency structures, and HDAG-K treats the NP and NE chunks with their dependency structures. [sent-242, score-0.165]

84 Comparing HDAG-K to DAG-K shows the difference in performance between handling the hierarchical structures and not handling them. [sent-244, score-0.363]

85 Note that though DAG-K and DS-K handle input objects of the same form, their kernel calculation methods differ as do their return values. [sent-246, score-0.079]

86 We evaluated the performance of the comparison methods with question types TIME TOP, ORGANIZATION, LOCATION, and NUMEX, which are defined in the CRL QA-data. [sent-250, score-0.087]

87 Table 2 shows the average F-measure as evaluated by 5-fold cross validation. [sent-251, score-0.035]

88 n in Table 2 indicates the threshold on the number of attributes; that is, we evaluated only those HiASs that contain fewer than n attributes for each kernel calculation. [sent-252, score-0.083]
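The evaluation protocol, an SVM per kernel scored by F-measure under 5-fold cross validation, can be reproduced with scikit-learn's precomputed-kernel interface once a Gram matrix such as the one from equation (1) is available. The sketch below uses stand-in data and a stand-in linear Gram matrix, not the CRL QA data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # stand-in feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # stand-in binary question types
gram = X @ X.T                                  # any precomputed kernel matrix fits here

clf = SVC(kernel="precomputed")
scores = cross_val_score(clf, gram, y, cv=5, scoring="f1")
print("5-fold F-measure: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```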

89 The experiments in this paper were designed to investigate how to improve the performance by using the richer syntactic and semantic structures within texts. [sent-254, score-0.454]

90 In the task of Question Classification, a given question is classified into Question Type, which reflects the intention of the question. [sent-255, score-0.088]

91 (footnote residue: edu/˜sekine/PROJECT/CRLQA/) The results indicate that our approach, incorporating richer structure features within texts, is well suited to the tasks in NLP applications. [sent-259, score-0.095]

92 The original DS-K requires exact matching of the tree structure, even when it is extended for more flexible matching. [sent-260, score-0.068]

93 The sequence, DAG, and HDAG kernels offer approximate matching by the framework of node skip, which produces better performance in the tasks that evaluate the intention of the texts. [sent-262, score-0.431]

94 The structure of HDAG approaches that of DAG if we do not consider the hierarchical structure. [sent-263, score-0.181]

95 In addition, the structures of sequences and trees are entirely included in that of DAG. [sent-264, score-0.12]

96 Thus, the HDAG kernel subsumes some of the discrete kernels, such as sequence, tree, and graph kernels. [sent-265, score-0.16]

97 5 Conclusions This paper proposed HDAG kernels, which can handle more of the rich syntactic and semantic information present within texts. [sent-266, score-0.306]

98 Our proposed method is a very generalized framework for handling structured natural language data. [sent-267, score-0.225]

99 We evaluated the performance of HDAG kernels with the real NLP task of question classification. [sent-268, score-0.268]

100 Our experiments showed that HDAG kernels offer better performance than sequence kernels, tree kernels, and the baseline method bag-of-words kernels if the target task requires the use of the richer information within texts. [sent-269, score-0.451]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ph', 0.392), ('hdag', 0.338), ('hiass', 0.214), ('nlp', 0.214), ('hias', 0.196), ('kernels', 0.181), ('hierarchical', 0.181), ('gj', 0.173), ('un', 0.157), ('vi', 0.157), ('semantic', 0.146), ('ic', 0.142), ('texts', 0.141), ('hip', 0.141), ('node', 0.14), ('wf', 0.139), ('syntactic', 0.129), ('directed', 0.125), ('structures', 0.12), ('executive', 0.107), ('koizumi', 0.107), ('wv', 0.107), ('gi', 0.104), ('vj', 0.1), ('attribute', 0.099), ('ime', 0.093), ('language', 0.09), ('ia', 0.085), ('graph', 0.083), ('ir', 0.082), ('returns', 0.078), ('structured', 0.074), ('hdags', 0.071), ('um', 0.07), ('nodes', 0.066), ('nk', 0.065), ('hierarchically', 0.062), ('richer', 0.059), ('dependency', 0.058), ('ul', 0.056), ('ak', 0.056), ('hips', 0.053), ('inister', 0.053), ('khdag', 0.053), ('ountr', 0.053), ('sasaki', 0.053), ('skip', 0.052), ('question', 0.052), ('aj', 0.052), ('vl', 0.049), ('chunks', 0.049), ('tr', 0.049), ('kernel', 0.048), ('link', 0.047), ('named', 0.047), ('dag', 0.046), ('attributes', 0.046), ('subgraph', 0.045), ('links', 0.044), ('np', 0.042), ('pos', 0.042), ('tags', 0.042), ('vk', 0.042), ('ei', 0.04), ('convolution', 0.04), ('relaxation', 0.039), ('vertical', 0.038), ('matching', 0.038), ('japanese', 0.037), ('acyclic', 0.037), ('im', 0.037), ('text', 0.037), ('otherwise', 0.036), ('tasks', 0.036), ('intention', 0.036), ('jg', 0.036), ('numex', 0.036), ('oun', 0.036), ('taggers', 0.036), ('wordnet', 0.036), ('evaluated', 0.035), ('tools', 0.034), ('er', 0.033), ('recursive', 0.032), ('handling', 0.031), ('suzuki', 0.031), ('discourse', 0.031), ('maeda', 0.031), ('ai', 0.031), ('handle', 0.031), ('natural', 0.03), ('tree', 0.03), ('represents', 0.03), ('uc', 0.029), ('categorization', 0.029), ('path', 0.029), ('discrete', 0.029), ('jun', 0.028), ('gx', 0.028), ('summarization', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 99 nips-2003-Kernels for Structured Natural Language Data

Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda

Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1

2 0.085217074 121 nips-2003-Log-Linear Models for Label Ranking

Author: Ofer Dekel, Yoram Singer, Christopher D. Manning

Abstract: Label ranking is the task of inferring a total order over a predefined set of labels for each given instance. We present a general framework for batch learning of label ranking functions from supervised data. We assume that each instance in the training data is associated with a list of preferences over the label-set, however we do not assume that this list is either complete or consistent. This enables us to accommodate a variety of ranking problems. In contrast to the general form of the supervision, our goal is to learn a ranking function that induces a total order over the entire set of labels. Special cases of our setting are multilabel categorization and hierarchical classification. We present a general boosting-based learning algorithm for the label ranking problem and prove a lower bound on the progress of each boosting iteration. The applicability of our approach is demonstrated with a set of experiments on a large-scale text corpus. 1

3 0.082713515 118 nips-2003-Link Prediction in Relational Data

Author: Ben Taskar, Ming-fai Wong, Pieter Abbeel, Daphne Koller

Abstract: Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph — entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation. 1

4 0.078701854 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning

Author: Maneesh Sahani

Abstract: Significant plasticity in sensory cortical representations can be driven in mature animals either by behavioural tasks that pair sensory stimuli with reinforcement, or by electrophysiological experiments that pair sensory input with direct stimulation of neuromodulatory nuclei, but usually not by sensory stimuli presented alone. Biologically motivated theories of representational learning, however, have tended to focus on unsupervised mechanisms, which may play a significant role on evolutionary or developmental timescales, but which neglect this essential role of reinforcement in adult plasticity. By contrast, theoretical reinforcement learning has generally dealt with the acquisition of optimal policies for action in an uncertain world, rather than with the concurrent shaping of sensory representations. This paper develops a framework for representational learning which builds on the relative success of unsupervised generativemodelling accounts of cortical encodings to incorporate the effects of reinforcement in a biologically plausible way. 1

5 0.074404418 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification

Author: Ting liu, Andrew W. Moore, Alexander Gray

Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers. 1

6 0.073696323 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering

7 0.064520389 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus

8 0.063196965 112 nips-2003-Learning to Find Pre-Images

9 0.050638679 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition

10 0.050352663 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs

11 0.049155299 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels

12 0.04839398 30 nips-2003-Approximability of Probability Distributions

13 0.048050057 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

14 0.046467688 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection

15 0.046134152 128 nips-2003-Minimax Embeddings

16 0.046060931 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

17 0.043524612 157 nips-2003-Plasticity Kernels and Temporal Statistics

18 0.043369416 42 nips-2003-Bounded Finite State Controllers

19 0.042872559 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

20 0.042270776 124 nips-2003-Max-Margin Markov Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.136), (1, -0.028), (2, -0.014), (3, 0.018), (4, -0.015), (5, -0.052), (6, -0.05), (7, -0.01), (8, 0.053), (9, 0.017), (10, 0.12), (11, -0.056), (12, 0.012), (13, -0.073), (14, 0.137), (15, -0.034), (16, -0.082), (17, -0.022), (18, 0.076), (19, 0.036), (20, -0.012), (21, -0.065), (22, -0.043), (23, 0.04), (24, -0.002), (25, -0.085), (26, 0.019), (27, 0.108), (28, -0.08), (29, -0.049), (30, 0.047), (31, 0.027), (32, 0.054), (33, -0.029), (34, -0.168), (35, -0.161), (36, -0.029), (37, 0.096), (38, -0.041), (39, -0.108), (40, 0.197), (41, -0.004), (42, -0.118), (43, -0.076), (44, 0.09), (45, 0.077), (46, 0.079), (47, 0.002), (48, 0.027), (49, 0.222)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96214354 99 nips-2003-Kernels for Structured Natural Language Data

Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda

Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1

2 0.47254202 118 nips-2003-Link Prediction in Relational Data

Author: Ben Taskar, Ming-fai Wong, Pieter Abbeel, Daphne Koller

Abstract: Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph — entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation. 1

3 0.42641342 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus

Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman

Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1

4 0.41657206 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition

Author: Xavier Carreras, Lluís Màrquez

Abstract: This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reflects to each local function its committed errors from a global point of view, and allows to train them together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently. 1

5 0.41383848 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification

Author: Ting liu, Andrew W. Moore, Alexander Gray

Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers. 1

6 0.40634233 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification

7 0.36180598 187 nips-2003-Training a Quantum Neural Network

8 0.36068228 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning

9 0.35699719 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering

10 0.35636035 112 nips-2003-Learning to Find Pre-Images

11 0.35610688 168 nips-2003-Salient Boundary Detection using Ratio Contour

12 0.35361087 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments

13 0.34726173 30 nips-2003-Approximability of Probability Distributions

14 0.32315552 121 nips-2003-Log-Linear Models for Label Ranking

15 0.30204612 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees

16 0.29942295 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

17 0.29720432 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process

18 0.27778155 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction

19 0.2728335 58 nips-2003-Efficient Multiscale Sampling from Products of Gaussian Mixtures

20 0.26671392 128 nips-2003-Minimax Embeddings


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.032), (11, 0.038), (16, 0.011), (26, 0.403), (30, 0.012), (35, 0.04), (53, 0.045), (55, 0.01), (71, 0.086), (76, 0.038), (85, 0.077), (91, 0.096), (99, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79974753 99 nips-2003-Kernels for Structured Natural Language Data

Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda

Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1

2 0.65730071 151 nips-2003-PAC-Bayesian Generic Chaining

Author: Jean-yves Audibert, Olivier Bousquet

Abstract: There exist many different generalization error bounds for classification. Each of these bounds contains an improvement over the others for certain situations. Our goal is to combine these different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester [1], which is interesting for averaging classifiers, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand [2]. This combination is quite natural since the generic chaining is based on the notion of majorizing measures, which can be considered as priors on the set of classifiers, and such priors also arise in the PAC-bayesian setting. 1

3 0.57145184 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

Author: Sanjiv Kumar, Martial Hebert

Abstract: In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and allows to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows the MAP inference for binary classification problems using the graph min-cut algorithms. The performance of the model was verified on the synthetic as well as the real-world images. The DRF model outperforms the MRF model in the experiments. 1

4 0.37910312 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection

Author: Jianxin Wu, James M. Rehg, Matthew D. Mullin

Abstract: Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as on-line learning. 1

5 0.36983311 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes

Author: Kevin P. Murphy, Antonio Torralba, William T. Freeman

Abstract: Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification. 1

6 0.36604041 117 nips-2003-Linear Response for Approximate Inference

7 0.36510059 121 nips-2003-Log-Linear Models for Label Ranking

8 0.36420652 41 nips-2003-Boosting versus Covering

9 0.36409768 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition

10 0.36319113 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction

11 0.36255857 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

12 0.36238086 3 nips-2003-AUC Optimization vs. Error Rate Minimization

13 0.36143643 124 nips-2003-Max-Margin Markov Networks

14 0.36127695 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games

15 0.36110753 189 nips-2003-Tree-structured Approximations by Expectation Propagation

16 0.36078835 158 nips-2003-Policy Search by Dynamic Programming

17 0.36067072 100 nips-2003-Laplace Propagation

18 0.3591738 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

19 0.35861725 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots

20 0.35822621 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions