nips nips2002 nips2002-85 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: In this paper we present a new algorithm suitable for matching discrete objects such as strings and trees in linear time, thus obviating dynamic programming with quadratic time complexity. Furthermore, prediction cost in many cases can be reduced to linear cost in the length of the sequence to be classified, regardless of the number of support vectors. This improvement on the currently available algorithms makes string kernels a viable alternative for the practitioner.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper we present a new algorithm suitable for matching discrete objects such as strings and trees in linear time, thus obviating dynamic programming with quadratic time complexity. [sent-13, score-0.576]
2 This improvement on the currently available algorithms makes string kernels a viable alternative for the practitioner. [sent-15, score-0.292]
3 Common examples include biological sequence analysis, where data is represented as strings [4], and Natural Language Processing (NLP), where the data is in the form of a parse tree [3]. [sent-17, score-0.417]
4 This paper presents a means of computing kernels on strings [15, 7, 12] and trees [3] in linear time in the size of the arguments, regardless of the weighting that is associated with any of the terms, plus linear time complexity for prediction, regardless of the number of support vectors. [sent-23, score-0.612]
5 Note that the method we present here is far more general than strings and trees, and it can be applied to finite state machines, formal languages, automata, etc. [sent-25, score-0.229]
6 In order to use this idea for matching strings (which have a quadratically increasing number of substrings) and trees (which can be transformed into strings), efficient sorting is realized by compressing the set of all substrings into a suffix tree. [sent-31, score-1.179]
7 Moreover, dictionary keeping allows us to use arbitrary weightings for each of the substrings and still compute the kernels in linear time. [sent-32, score-0.313]
8 The empty string is denoted by ε and A* represents the set of all non-empty strings defined over the alphabet A. [sent-42, score-0.475]
9 |x| denotes the length of x, uv ∈ A* the concatenation of two strings u, v, and au the concatenation of a character and a string. [sent-44, score-0.361]
10 We use x[i : j] with 1 ≤ i ≤ j ≤ |x| to denote the substring of x between locations i and j (both inclusive). [sent-45, score-0.424]
11 If x = uvw for some (possibly empty) u, v, w, then u is called a prefix of x, while v is called a substring (also denoted by v ⊑ x) and w is called a suffix of x. [sent-46, score-0.854]
12 • The k-spectrum kernel takes into account substrings of length k [12]. [sent-55, score-0.245]
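To make the definition concrete, the following is a minimal brute-force k-spectrum kernel (a sketch of ours, not code from the paper): it simply counts shared length-k substrings, which is exactly the kind of explicit enumeration the suffix-tree machinery below avoids.

```python
from collections import Counter

def spectrum_counts(x: str, k: int) -> Counter:
    """Count every length-k substring of x."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))

def k_spectrum_kernel(x: str, y: str, k: int) -> int:
    """Naive k-spectrum kernel: sum over shared length-k substrings of
    (#occurrences in x) * (#occurrences in y)."""
    cx, cy = spectrum_counts(x, k), spectrum_counts(y, k)
    return sum(cx[s] * cy[s] for s in cx if s in cy)

if __name__ == "__main__":
    # 'ab' occurs twice in 'ababc' and once in 'abba', 'ba' once in each -> 3
    print(k_spectrum_kernel("ababc", "abba", 2))
```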
13 The latter are important for two reasons: first, because the suffix tree representation of a string will be used to compute kernels efficiently, and second, because we may wish to compute kernels on trees, which will be carried out by reducing trees to strings and then applying a string kernel. [sent-60, score-1.486]
14 3 Tree Kernels A tree is defined as a connected directed graph with no cycles. [sent-61, score-0.231]
15 A node with no children is referred to as a leaf. A subtree rooted at node n is denoted as T_n, and t ⊑ T is used to indicate that t is a subtree of T. [sent-62, score-0.648]
16 If a set of nodes in the tree along with the corresponding edges forms a tree then we define it to be a subset tree. [sent-63, score-0.582]
17 If every node n of the tree contains a label, denoted by label(n), then the tree is called a labeled tree. [sent-64, score-0.528]
18 If only the leaf nodes contain labels then the tree is called a leaf-labeled tree. [sent-65, score-0.383]
19 Kernels on trees can be defined by defining kernels on matching subset trees as proposed by [3] or (more restrictively) by defining kernels on matching subtrees. [sent-66, score-0.953]
20 Σ_{t ⊑ T, t' ⊑ T'} (3) Ordering Trees An ordered tree is one in which the child nodes of every node are ordered as per the ordering defined on the node labels. [sent-68, score-0.649]
21 [Figure 1: Two equivalent trees] We will use these symbols to define tags for each node as follows: [sent-75, score-0.419]
22 For instance, the root nodes of both trees depicted above would be encoded as [[][[][]]]. [sent-89, score-0.394]
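A small sketch (ours, not the paper's) of the tag construction just illustrated: the tag of a node wraps the sorted tags of its children in brackets, so equivalent trees receive identical tags. Trees are assumed here to be (label, children) tuples, and the particular sort convention only needs to be fixed; it need not match the ordering used in the example above.

```python
def tag(node) -> str:
    """Canonical tag of a tree given as (label, children); label may be ''
    and children is a list of subtrees in arbitrary order."""
    label, children = node
    # Sorting the children's tags makes the encoding invariant under
    # permutations of the children (cf. claim 4 of Theorem 1).
    inner = "".join(sorted(tag(c) for c in children))
    return "[" + label + inner + "]"

if __name__ == "__main__":
    leaf = ("", [])
    t1 = ("", [leaf, ("", [leaf, leaf])])   # a leaf plus a two-leaf subtree
    t2 = ("", [("", [leaf, leaf]), leaf])   # same tree, children swapped
    print(tag(t1))             # '[[[][]][]]' under Python's string ordering
    print(tag(t1) == tag(t2))  # True: equivalent trees get identical tags
```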
23 We now prove that the tag of the root node is indeed a unique identifier and that it can be constructed in log-linear time. [sent-90, score-0.522]
24 Theorem 1 Denote by T a binary tree with l nodes and let λ be the maximum length of a label. [sent-91, score-0.345]
25 Then the following properties hold for the tag of the root node: 1. [sent-92, score-0.522]
26 tag(root) can be computed in (λ + 2)(l log_2 l) time and linear storage in l. [sent-93, score-0.421]
27 Substrings s of tag(root) starting with '[' and ending with a balanced ']' correspond to subtrees T' of T, where s is the tag of T'. [sent-95, score-0.47]
28 Arbitrary substrings s of tag(root) correspond to subset trees T' of T. [sent-97, score-0.293]
29 tag(root) is invariant under permutations of the leaves and allows the reconstruction of a unique element of the equivalence class (under permutation). [sent-99, score-0.419]
30 The tag of a leaf can be constructed in constant time by storing [, ], and a pointer to the label of the leaf (if it exists), that is, in 3 operations. [sent-101, score-0.661]
31 By our induction assumption we can construct the tags for n_1 and n_2 in (λ + 2)(l_1 log_2 l_1) and (λ + 2)(l_2 log_2 l_2) time, respectively. [sent-104, score-0.485]
32 Comparing the tags of n_1 and n_2 costs at most (λ + 2) min(l_1, l_2) operations, and the tag itself can be constructed in constant time and linear space by manipulating pointers. [sent-105, score-0.536]
33 One way of visualizing our ordering is by imagining that we perform a DFS (depth first search) on the tree T and emit a '[' followed by the label of the node when we visit a node for the first time, and a ']' when we leave a node for the last time. [sent-107, score-0.587]
34 It is clear that a balanced substring s of tag(root) is emitted only when the corresponding DFS on T' is completed. [sent-108, score-0.67]
35 We can emit a substring of tag(root) only if we can perform a DFS on the corresponding set of nodes. [sent-110, score-0.257]
36 This implies that these nodes constitute a tree and hence by definition are subset trees of T. [sent-111, score-0.483]
37 Since leaf nodes do not have children, their tag is clearly invariant under permutation. [sent-113, score-0.648]
38 For an internal node we perform lexicographic sorting on the tags of its children. [sent-114, score-0.239]
39 Concerning the reconstruction, we proceed as follows: each tag of a subtree starts with '[' and ends in a balanced ']', hence we can strip the first [] pair from the tag, take whatever is left outside brackets as the label of the root node, and repeat the procedure with the remaining balanced [ ] substrings. [sent-117, score-0.739]
40 This will construct a tree with the same tag as tag(root), thus proving claim 4. [sent-121, score-0.626]
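The reconstruction argument of claim 4 can be mirrored in a few lines: strip the outer [] pair and split the remainder into balanced chunks, one per child. This is a hedged sketch of ours, restricted to label-free tags and using the same tuple representation as the tag sketch above.

```python
def untag(s: str):
    """Rebuild a (label, children) tree from a label-free tag such as
    '[[][[][]]]'. Assumes the input is a well-formed, balanced tag."""
    assert s and s[0] == "[" and s[-1] == "]"
    inner, children = s[1:-1], []
    depth, start = 0, 0
    for i, ch in enumerate(inner):
        depth += 1 if ch == "[" else -1
        if depth == 0:                          # one balanced child tag closed
            children.append(untag(inner[start:i + 1]))
            start = i + 1
    return ("", children)

if __name__ == "__main__":
    print(untag("[[][[][]]]"))
    # ('', [('', []), ('', [('', []), ('', [])])])
```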
41 An extension to trees whose nodes have d children is straightforward (the cost increases to d log_2 d of the original cost), yet the proof, in particular (4), becomes more technical without providing additional insight, hence we omit this generalization for brevity. [sent-122, score-0.3]
42 Corollary 2 Kernels on trees T, T' can be computed via string kernels if we use tag(T), tag(T') as strings. [sent-123, score-0.309]
43 If only balanced substrings have nonzero weight w_s, then we obtain the subtree matching kernel defined in (3). [sent-127, score-0.522]
44 This reduces the problem of tree kernels to string kernels and all we need to show in the following is how the latter can be computed efficiently. [sent-128, score-0.615]
45 For this purpose we need to introduce suffix trees. [sent-129, score-0.448]
46 4 Suffix Trees and Matching Statistics Definition The suffix tree is a compacted trie that stores all suffixes of a given text string. [sent-130, score-0.664]
47 We denote the suffix tree of the string x by S(x). [sent-131, score-0.793]
48 Moreover, let nodes(S(x)) be the set of all nodes of S(x) and let root(S(x)) be the root of S(x). [sent-132, score-0.37]
49 For a node w, father(w) denotes its parent, T(w) denotes the subtree rooted at the node, lvs(w) denotes the number of leaves in the subtree, and path(w) := w is the path from the root to the node. [sent-133, score-0.732]
50 That is, we use the path w from root to node as the label of the node w. [sent-134, score-0.507]
51 We denote by words(S(x)) the set of all strings w such that wu ∈ nodes(S(x)) for some (possibly empty) string u, which means that words(S(x)) is the set of all possible substrings of x. [sent-135, score-0.495]
52 For every t ∈ words(S(x)) we define ceil(t) as the node w such that w = tu and u is the shortest (possibly empty) substring such that w ∈ nodes(S(x)). [Figure 2: Suffix tree of ababc] [sent-136, score-0.557]
53 Similarly, for every t ∈ words(S(x)) we define floor(t) as the node w such that t = wu and u is the shortest (possibly empty) substring such that w ∈ nodes(S(x)). [sent-137, score-0.445]
54 Given a string t and a suffix tree S(x), we can decide if t ∈ words(S(x)) in O(|t|) time by just walking down the corresponding edges of S(x). [sent-138, score-0.899]
55 If the sentinel character $ is added to the string x, then it can be shown that for any t ∈ words(S(x)), lvs(ceil(t)) gives us the number of occurrences of t in x [5]. [sent-139, score-0.26]
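This occurrence-counting property can be illustrated with a naive (uncompacted) suffix trie in which every node stores how many suffixes pass through it; that counter plays the role of lvs, and walking down along t reaches the position of ceil(t). The sketch below is ours and quadratic, unlike the compacted, linear-size suffix tree used in the paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # number of suffixes passing through (role of lvs)

def suffix_trie(x: str) -> TrieNode:
    """Naive suffix trie of x + '$' (quadratic, for illustration only)."""
    root = TrieNode()
    x = x + "$"  # sentinel character, as in the text
    for i in range(len(x)):
        node = root
        for ch in x[i:]:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1
    return root

def occurrences(root: TrieNode, t: str) -> int:
    """Number of occurrences of t in x, i.e. lvs(ceil(t)) in the paper's terms."""
    node = root
    for ch in t:
        if ch not in node.children:
            return 0
        node = node.children[ch]
    return node.count

if __name__ == "__main__":
    root = suffix_trie("ababc")
    print(occurrences(root, "ab"), occurrences(root, "ba"), occurrences(root, "abc"))  # 2 1 1
```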
56 Let aw be a node in S(x), and v be the longest suffix of w such that v ∈ nodes(S(x)). [sent-142, score-0.687]
57 An unlabeled edge aw → v is called a suffix link in S(x). [sent-143, score-0.524]
58 A suffix link of the form aw → w is called atomic. [sent-144, score-0.49]
59 It can be shown that all the suffix links in a suffix tree are atomic [5, Proposition 2. [sent-145, score-1.13]
60 We add suffix links to S(x) to allow us to perform efficient string matching: suppose we found that aw is a substring of x by parsing the suffix tree S(x). [sent-147, score-1.595]
61 If we want to locate the node corresponding to w, it would be wasteful to parse the tree again. [sent-149, score-0.372]
62 The suffix tree building algorithms make use of this property of suffix links to perform the construction in linear time. [sent-151, score-1.13]
63 The suffix tree construction algorithm of [13] constructs the suffix tree and all such suffix links in linear time. [sent-152, score-1.766]
64 The key observation is that v_{i+1} ≥ v_i - 1, since if x[i : v_i] is a substring of y then definitely x[i+1 : v_i] is also a substring of y. [Table 1: Matching statistics of abba with respect to S(ababc)] [sent-156, score-0.848]
65 Besides this, the matching substring in y that we find must have x[i+1 : v_i] as a prefix. [sent-158, score-0.592]
66 The Matching Statistics algorithm [2] exploits this observation and uses it to cleverly walk down the suffix links of S(y) in order to compute the matching statistics in O(|x|) time. [sent-159, score-0.762]
67 It then finds p_{i+1} = floor(x[i+1 : v_i]) by first walking down the suffix link of p_i and then walking down the edges corresponding to the remaining portion of x[i+1 : v_i] until it reaches floor(x[i+1 : v_i]). [sent-161, score-0.773]
68 Now v_{i+1} can be found easily by walking from p_{i+1} along the edges of S(y) that match the string x[i+1 : n], until we can go no further. [sent-162, score-0.287]
69 The value of v_1 is found by simply walking down S(y) to find the longest prefix of x which matches a substring of y. [sent-163, score-0.469]
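As a reference point, the matching statistics v can also be computed naively in O(|x|·|y|) time; the sketch below (ours) only pins down what is being computed, not how the paper computes it in O(|x|) via suffix links.

```python
def matching_statistics(x: str, y: str):
    """v[i] = length of the longest prefix of x[i:] that occurs somewhere in y.
    Naive quadratic version; the paper obtains v in O(|x|) by walking S(y)."""
    v = []
    for i in range(len(x)):
        length = 0
        while i + length < len(x) and x[i:i + length + 1] in y:
            length += 1
        v.append(length)
    return v

if __name__ == "__main__":
    print(matching_statistics("abba", "ababc"))  # [2, 1, 2, 1], as in Table 1
```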
70 [Table 1 (abba vs. S(ababc$)): String: a, b, b, a; v: 2, 1, 2, 1; c: ab, b, babc$, ab] Matching substrings Using v and c we can read off the number of matching substrings in x and y. [sent-164, score-0.45]
71 The useful observation here is that the only substrings which occur in both x and y are those which are prefixes of x[i : v_i]. [sent-165, score-0.304]
72 The number of occurrences of a substring in y can be found by lvs(ceil(w)) (see Section 4). [sent-166, score-0.28]
73 Lemma 3 w is a substring of x iff there is an i such that w is a prefix of x[i : n]. [sent-168, score-0.424]
74 Lemma 4 The set of matching substrings of x and y is the set of all prefixes of x[i : v_i]. [sent-170, score-0.616]
75 By the above lemma there is an i such that w is a prefix of x[i : n]. [sent-172, score-0.364]
76 Since v_i is the length of the maximal prefix of x[i : n] which is a substring of y, it follows that v_i ≥ |w|. [sent-173, score-0.611]
77 5 Weights and Kernels From the previous sections we know how to determine the set of all longest prefixes x[i : v_i] of x[i : n] in y in linear time. [sent-175, score-0.259]
78 Theorem 5 Let x and y be strings and let c and v be the matching statistics of x with respect to y. [sent-177, score-0.365]
79 Then k(x, y) can be computed in O(|x| + |y|) time as k(x, y) = Σ_{i=1}^{|x|} val(x[i : v_i]) = Σ_{i=1}^{|x|} [val(c_i) + lvs(ceil(x[i : v_i])) · W(y, x[i : v_i])] (6), where val(t) := lvs(ceil(t)) · W(y, t) + val(floor(t)). [sent-180, score-0.35]
80 Also, due to recursion, the second equality of (6) holds and we may compute each term in constant time by a simple lookup for val(c_i) and computation of W(y, x[i : v_i]). Since we have |x| terms, the whole procedure takes O(|x|) time, which proves the O(|x| + |y|) time complexity. [sent-185, score-0.485]
81 We know from Lemma 4 that the sum in (2) can be decomposed into the sum over matches between y and each of the prefixes of x[i : v_i] (this takes care of all the substrings in x matching with y). [sent-187, score-0.616]
82 This reduces the problem to showing that each term in the sum of (6) corresponds to the contribution of all prefixes of x[i : v_i]. Assume we descend down the path x[i : v_i] in S(y) (e. [sent-188, score-0.393]
83 g., for the string bab with respect to the tree of Figure 2 this would correspond to (root, b, bab)), then each of the prefixes t along the path (e. [sent-190, score-0.54]
84 Examples of such weighting schemes are the kernels suggested by [15], where w_i = λ^{-i}, [7], where w_i = 1, and [10]. Generic Weights In the case of generic weights we have several options: recall that one will often want to compute m^2 kernels k(x, x'), given m strings x ∈ X. [sent-201, score-0.504]
85 Hence we could build the suffix trees for x_i beforehand and annotate each of the nodes and characters on the edges explicitly (at super-linear cost per string), which means that later, for the dot products, we will only need to perform a table lookup of W(x, x'[i : v_i]). [sent-202, score-0.842]
86 We can build a suffix tree Σ of all strings in X. [sent-204, score-0.833]
87 Again, this can be done in time linear in the total length of all the strings (simply consider the concatenation of all strings). [sent-205, score-0.296]
88 It can be shown that for all x and all i, x[i : v_i] will be a node in this tree. [sent-206, score-0.347]
89 Now note that all the strings ending on the same edge in Σ will have the same weights assigned to them. [sent-210, score-0.263]
90 The benefit of (8) is twofold: we can compute the weights of all the nodes of Σ in time linear in the total length of strings in X. [sent-213, score-0.455]
91 Recall that, for prediction in a Support Vector Machine, we need to compute f(x) = Σ_i α_i k(x_i, x), which implies that we need to combine the contribution due to matching substrings from each one of the Support Vectors. [sent-219, score-0.346]
92 For a node v ∈ nodes(S(X_s)) we modify the definition of lvs(v) as the sum of weights associated with the subtree rooted at node v. [sent-222, score-0.491]
93 A straightforward application of the matching statistics algorithm of [2] shows that we can find the matching statistics of x with respect to all strings in X_s in O(|x|) time. [sent-223, score-0.563]
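For contrast, the naive prediction rule evaluates the kernel once per support vector; the merged suffix tree S(X_s) with weight-annotated lvs values replaces this loop by a single traversal of x. A hypothetical sketch of that naive baseline (names and the toy kernel are ours, not the paper's):

```python
from collections import Counter

def _substring_counts(x: str) -> Counter:
    return Counter(x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1))

def naive_kernel(x: str, y: str, lam: float = 0.5) -> float:
    """Brute-force weighted all-substrings kernel (see the earlier sketch)."""
    cx, cy = _substring_counts(x), _substring_counts(y)
    return sum((lam ** len(s)) * cx[s] * cy[s] for s in cx if s in cy)

def naive_svm_predict(x: str, support_vectors, alphas, b: float = 0.0) -> float:
    """f(x) = sum_i alpha_i * k(x_i, x) + b, with one kernel evaluation per
    support vector; the text replaces this loop by a single traversal of x
    against a merged, weight-annotated suffix tree built from all of X_s."""
    return sum(a * naive_kernel(sv, x)
               for sv, a in zip(support_vectors, alphas)) + b

if __name__ == "__main__":
    print(naive_svm_predict("ababc", ["abba", "abc"], [1.0, -0.5]))
```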
94 A length-weighted kernel was used and we assigned weights w_s = λ^{|s|} for all substring matches of length greater than 3, regardless of triplet boundaries. [sent-230, score-0.45]
95 Being a proof of concept, we did not try to tune the soft margin SVM parameters (the main point of the paper being the introduction of a novel means of evaluating string kernels efficiently rather than applications; a separate paper focusing on applications is in preparation). [sent-232, score-0.37]
96 spectrum kernel with k = 3 [12] and our string kernel with λ = 0. [sent-247, score-0.279]
97 7 Conclusion We have shown that string kernels need not come at a super-linear cost in SVMs and that prediction can be carried out at cost linear only in the length of the argument, thus providing optimal run-time behaviour. [sent-262, score-0.403]
98 For not-too-unbalanced trees (we assume that the tree shrinks at least by a constant factor at each coarsening), computation of the kernel over all coarsening levels can then be carried out at a cost still linear in the overall size of the tree. [sent-265, score-0.491]
99 Likewise, we can consider the strings generated by finite state machines and thereby compare the finite state machines themselves. [sent-268, score-0.261]
100 From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction. [sent-321, score-0.636]
wordName wordTfidf (topN-words)
[('suffix', 0.448), ('tag', 0.394), ('substring', 0.229), ('strings', 0.197), ('xli', 0.195), ('tree', 0.188), ('matching', 0.168), ('vi', 0.159), ('string', 0.157), ('trees', 0.152), ('node', 0.152), ('prefix', 0.144), ('substrings', 0.141), ('floor', 0.139), ('kernels', 0.135), ('root', 0.128), ('nodes', 0.114), ('ceil', 0.112), ('prefixes', 0.112), ('val', 0.097), ('ws', 0.083), ('subtree', 0.083), ('leaf', 0.081), ('ixl', 0.08), ('define', 0.064), ('ivs', 0.064), ('tl', 0.064), ('nl', 0.064), ('lxl', 0.063), ('kernel', 0.061), ('children', 0.059), ('coarsening', 0.056), ('occurrences', 0.051), ('tags', 0.051), ('walking', 0.051), ('li', 0.051), ('efficiently', 0.049), ('algorithmica', 0.048), ('bab', 0.048), ('dfs', 0.048), ('lvs', 0.048), ('sentinel', 0.048), ('balanced', 0.047), ('links', 0.046), ('proves', 0.045), ('longest', 0.045), ('claim', 0.044), ('defined', 0.043), ('length', 0.043), ('aw', 0.042), ('label', 0.04), ('roc', 0.039), ('empty', 0.039), ('pointer', 0.038), ('beforehand', 0.038), ('rooted', 0.038), ('india', 0.038), ('regardless', 0.037), ('weights', 0.037), ('compute', 0.037), ('efficient', 0.037), ('ix', 0.036), ('sorting', 0.036), ('path', 0.035), ('positives', 0.035), ('unlabeled', 0.034), ('cost', 0.034), ('uv', 0.033), ('lsi', 0.033), ('finite', 0.032), ('dynarrtic', 0.032), ('eprefix', 0.032), ('nums', 0.032), ('oftag', 0.032), ('parse', 0.032), ('tn', 0.032), ('ci', 0.031), ('wj', 0.03), ('xs', 0.03), ('xl', 0.03), ('character', 0.03), ('proof', 0.029), ('concatenation', 0.029), ('ending', 0.029), ('definition', 0.029), ('edges', 0.028), ('suffixes', 0.028), ('emit', 0.028), ('bangalore', 0.028), ('freq', 0.028), ('lookup', 0.028), ('time', 0.027), ('wi', 0.026), ('svm', 0.026), ('nonzero', 0.026), ('leaves', 0.025), ('occurrence', 0.025), ('lemma', 0.025), ('smola', 0.024), ('indian', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 85 nips-2002-Fast Kernels for String and Tree Matching
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: In this paper we present a new algorithm suitable for matching discrete objects such as strings and trees in linear time, thus obviating dynamic programming with quadratic time complexity. Furthermore, prediction cost in many cases can be reduced to linear cost in the length of the sequence to be classified, regardless of the number of support vectors. This improvement on the currently available algorithms makes string kernels a viable alternative for the practitioner.
2 0.16193798 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
Author: Eleazar Eskin, Jason Weston, William S. Noble, Christina S. Leslie
Abstract: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of fixed-length subsequences, counted with up to a fixed number of mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
3 0.14331993 119 nips-2002-Kernel Dependency Estimation
Author: Jason Weston, Olivier Chapelle, Vladimir Vapnik, André Elisseeff, Bernhard Schölkopf
Abstract: We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images. 1
4 0.13675284 191 nips-2002-String Kernels, Fisher Kernels and Finite State Automata
Author: Craig Saunders, Alexei Vinokourov, John S. Shawe-taylor
Abstract: In this paper we show how the generation of documents can be thought of as a k-stage Markov process, which leads to a Fisher kernel from which the n-gram and string kernels can be re-constructed. The Fisher kernel view gives a more flexible insight into the string kernel and suggests how it can be parametrised in a way that reflects the statistics of the training corpus. Furthermore, the probabilistic modelling approach suggests extending the Markov process to consider sub-sequences of varying length, rather than the standard fixed-length approach used in the string kernel. We give a procedure for determining which sub-sequences are informative features and hence generate a Finite State Machine model, which can again be used to obtain a Fisher kernel. By adjusting the parametrisation we can also influence the weighting received by the features . In this way we are able to obtain a logarithmic weighting in a Fisher kernel. Finally, experiments are reported comparing the different kernels using the standard Bag of Words kernel as a baseline. 1
5 0.10229916 203 nips-2002-Using Tarjan's Red Rule for Fast Dependency Tree Construction
Author: Dan Pelleg, Andrew W. Moore
Abstract: We focus on the problem of efficient learning of dependency trees. It is well-known that given the pairwise mutual information coefficients, a minimum-weight spanning tree algorithm solves this problem exactly and in polynomial time. However, for large data-sets it is the construction of the correlation matrix that dominates the running time. We have developed a new spanning-tree algorithm which is capable of exploiting partial knowledge about edge weights. The partial knowledge we maintain is a probabilistic confidence interval on the coefficients, which we derive by examining just a small sample of the data. The algorithm is able to flag the need to shrink an interval, which translates to inspection of more data for the particular attribute pair. Experimental results show running time that is near-constant in the number of records, without significant loss in accuracy of the generated trees. Interestingly, our spanning-tree algorithm is based solely on Tarjan’s red-edge rule, which is generally considered a guaranteed recipe for bad performance. 1
6 0.09760195 80 nips-2002-Exact MAP Estimates by (Hyper)tree Agreement
7 0.089160152 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization
8 0.086762108 167 nips-2002-Rational Kernels
9 0.083843403 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
10 0.083423577 104 nips-2002-How the Poverty of the Stimulus Solves the Poverty of the Stimulus
11 0.083179206 125 nips-2002-Learning Semantic Similarity
12 0.077032775 120 nips-2002-Kernel Design Using Boosting
13 0.073908038 151 nips-2002-Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines
14 0.072470248 156 nips-2002-On the Complexity of Learning the Kernel Matrix
15 0.071667969 69 nips-2002-Discriminative Learning for Label Sequences via Boosting
16 0.067070708 84 nips-2002-Fast Exact Inference with a Factored Model for Natural Language Parsing
17 0.066822141 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
18 0.06481798 96 nips-2002-Generalized² Linear² Models
19 0.062418822 113 nips-2002-Information Diffusion Kernels
20 0.062021896 106 nips-2002-Hyperkernels
topicId topicWeight
[(0, -0.164), (1, -0.094), (2, 0.034), (3, -0.067), (4, -0.103), (5, 0.023), (6, 0.093), (7, 0.091), (8, -0.002), (9, -0.084), (10, 0.004), (11, 0.062), (12, -0.009), (13, -0.001), (14, -0.119), (15, -0.093), (16, 0.159), (17, 0.131), (18, 0.026), (19, -0.15), (20, -0.061), (21, 0.115), (22, -0.176), (23, 0.02), (24, 0.043), (25, -0.089), (26, 0.003), (27, -0.114), (28, 0.054), (29, -0.196), (30, 0.092), (31, 0.105), (32, -0.041), (33, 0.095), (34, -0.02), (35, -0.001), (36, -0.021), (37, 0.014), (38, 0.066), (39, -0.041), (40, 0.108), (41, -0.011), (42, 0.031), (43, 0.012), (44, -0.028), (45, 0.035), (46, 0.024), (47, 0.014), (48, 0.009), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.96118164 85 nips-2002-Fast Kernels for String and Tree Matching
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: In this paper we present a new algorithm suitable for matching discrete objects such as strings and trees in linear time, thus obviating dynamic programming with quadratic time complexity. Furthermore, prediction cost in many cases can be reduced to linear cost in the length of the sequence to be classified, regardless of the number of support vectors. This improvement on the currently available algorithms makes string kernels a viable alternative for the practitioner.
2 0.64895177 84 nips-2002-Fast Exact Inference with a Factored Model for Natural Language Parsing
Author: Dan Klein, Christopher D. Manning
Abstract: We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization provides conceptual simplicity, straightforward opportunities for separately improving the component models, and a level of performance comparable to similar, non-factored models. Most importantly, unlike other modern parsing models, the factored model admits an extremely effective A* parsing algorithm, which enables efficient, exact inference.
3 0.57194072 203 nips-2002-Using Tarjan's Red Rule for Fast Dependency Tree Construction
Author: Dan Pelleg, Andrew W. Moore
Abstract: We focus on the problem of efficient learning of dependency trees. It is well-known that given the pairwise mutual information coefficients, a minimum-weight spanning tree algorithm solves this problem exactly and in polynomial time. However, for large data-sets it is the construction of the correlation matrix that dominates the running time. We have developed a new spanning-tree algorithm which is capable of exploiting partial knowledge about edge weights. The partial knowledge we maintain is a probabilistic confidence interval on the coefficients, which we derive by examining just a small sample of the data. The algorithm is able to flag the need to shrink an interval, which translates to inspection of more data for the particular attribute pair. Experimental results show running time that is near-constant in the number of records, without significant loss in accuracy of the generated trees. Interestingly, our spanning-tree algorithm is based solely on Tarjan’s red-edge rule, which is generally considered a guaranteed recipe for bad performance. 1
4 0.50477827 167 nips-2002-Rational Kernels
Author: Corinna Cortes, Patrick Haffner, Mehryar Mohri
Abstract: We introduce a general family of kernels based on weighted transducers or rational relations, rational kernels, that can be used for analysis of variable-length sequences or more generally weighted automata, in applications such as computational biology or speech recognition. We show that rational kernels can be computed efficiently using a general algorithm of composition of weighted transducers and a general single-source shortest-distance algorithm. We also describe several general families of positive definite symmetric rational kernels. These general kernels can be combined with Support Vector Machines to form efficient and powerful techniques for spoken-dialog classification: highly complex kernels become easy to design and implement and lead to substantial improvements in the classification accuracy. We also show that the string kernels considered in applications to computational biology are all specific instances of rational kernels.
5 0.49718311 191 nips-2002-String Kernels, Fisher Kernels and Finite State Automata
Author: Craig Saunders, Alexei Vinokourov, John S. Shawe-taylor
Abstract: In this paper we show how the generation of documents can be thought of as a k-stage Markov process, which leads to a Fisher kernel from which the n-gram and string kernels can be re-constructed. The Fisher kernel view gives a more flexible insight into the string kernel and suggests how it can be parametrised in a way that reflects the statistics of the training corpus. Furthermore, the probabilistic modelling approach suggests extending the Markov process to consider sub-sequences of varying length, rather than the standard fixed-length approach used in the string kernel. We give a procedure for determining which sub-sequences are informative features and hence generate a Finite State Machine model, which can again be used to obtain a Fisher kernel. By adjusting the parametrisation we can also influence the weighting received by the features . In this way we are able to obtain a logarithmic weighting in a Fisher kernel. Finally, experiments are reported comparing the different kernels using the standard Bag of Words kernel as a baseline. 1
6 0.4967775 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
7 0.4855307 80 nips-2002-Exact MAP Estimates by (Hyper)tree Agreement
8 0.48256028 119 nips-2002-Kernel Dependency Estimation
9 0.47926286 104 nips-2002-How the Poverty of the Stimulus Solves the Poverty of the Stimulus
10 0.4741855 35 nips-2002-Automatic Acquisition and Efficient Representation of Syntactic Structures
11 0.36154726 15 nips-2002-A Probabilistic Model for Learning Concatenative Morphology
12 0.35151657 179 nips-2002-Scaling of Probability-Based Optimization Algorithms
13 0.33441004 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization
14 0.30925658 120 nips-2002-Kernel Design Using Boosting
15 0.30894145 6 nips-2002-A Formulation for Minimax Probability Machine Regression
16 0.29282442 42 nips-2002-Bias-Optimal Incremental Problem Solving
17 0.29095238 185 nips-2002-Speeding up the Parti-Game Algorithm
18 0.28915432 151 nips-2002-Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines
19 0.28313431 187 nips-2002-Spikernels: Embedding Spiking Neurons in Inner-Product Spaces
20 0.27402407 96 nips-2002-Generalized² Linear² Models
topicId topicWeight
[(11, 0.045), (23, 0.019), (42, 0.05), (44, 0.291), (51, 0.027), (54, 0.133), (55, 0.035), (67, 0.013), (68, 0.022), (72, 0.016), (74, 0.102), (87, 0.012), (92, 0.05), (98, 0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.81301868 85 nips-2002-Fast Kernels for String and Tree Matching
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: In this paper we present a new algorithm suitable for matching discrete objects such as strings and trees in linear time, thus obviating dynamic programming with quadratic time complexity. Furthermore, prediction cost in many cases can be reduced to linear cost in the length of the sequence to be classified, regardless of the number of support vectors. This improvement on the currently available algorithms makes string kernels a viable alternative for the practitioner.
2 0.68134695 140 nips-2002-Margin Analysis of the LVQ Algorithm
Author: Koby Crammer, Ran Gilad-bachrach, Amir Navot, Naftali Tishby
Abstract: Prototypes based algorithms are commonly used to reduce the computational complexity of Nearest-Neighbour (NN) classifiers. In this paper we discuss theoretical and algorithmical aspects of such algorithms. On the theory side, we present margin based generalization bounds that suggest that these kinds of classifiers can be more accurate then the 1-NN rule. Furthermore, we derived a training algorithm that selects a good set of prototypes using large margin principles. We also show that the 20 years old Learning Vector Quantization (LVQ) algorithm emerges naturally from our framework. 1
3 0.66227549 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
Author: Peter Sykacek, Stephen J. Roberts
Abstract: We propose in this paper a probabilistic approach for adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques. We regard the parameters of the classifier as latent states in a first order Markov process and propose an algorithm which can be regarded as variational generalization of standard Kalman filtering. The variational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of infering competitive classifiers both in stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.
4 0.56246853 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
Author: Sepp Hochreiter, Klaus Obermayer
Abstract: We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs and we show that - under mild assumptions - these kernels correspond to dot products in some (unknown) feature space. Minimizing a bound for the generalization error of a linear classifier which has been obtained using covering numbers we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square. We then consider the case that row objects are interpreted as features. We suggest an additional constraint, which imposes sparseness on the row objects and show, that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microarrays, where “column” objects correspond to samples, “row” objects correspond to genes and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classification and support vector machines and K-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classification results. 1
5 0.56043833 27 nips-2002-An Impossibility Theorem for Clustering
Author: Jon M. Kleinberg
Abstract: Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. 1
6 0.55980319 53 nips-2002-Clustering with the Fisher Score
7 0.55793935 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
8 0.55757219 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
9 0.55746579 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
10 0.55738622 2 nips-2002-A Bilinear Model for Sparse Coding
11 0.55706203 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
12 0.5553658 124 nips-2002-Learning Graphical Models with Mercer Kernels
13 0.5541994 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
14 0.5541603 3 nips-2002-A Convergent Form of Approximate Policy Iteration
15 0.55350417 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
16 0.55287176 10 nips-2002-A Model for Learning Variance Components of Natural Images
17 0.55215603 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
18 0.55030745 191 nips-2002-String Kernels, Fisher Kernels and Finite State Automata
19 0.55003232 163 nips-2002-Prediction and Semantic Association
20 0.54951417 63 nips-2002-Critical Lines in Symmetry of Mixture Models and its Application to Component Splitting