nips nips2005 nips2005-175 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jun Suzuki, Hideki Isozaki
Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when limited small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enables us to use larger sub-structures effectively. To execute efficiently, the proposed method can be embedded into the original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem with the conventional method and compare its performance to that of the proposed method.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. [sent-6, score-0.747]
2 Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. [sent-7, score-0.649]
3 This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enables us to use larger sub-structures effectively. [sent-9, score-0.741]
4 To execute efficiently, the proposed method can be embedded into the original kernel calculation process by using sub-structure mining algorithms. [sent-10, score-0.497]
5 Experiments on real NLP tasks confirm the problem with the conventional method and compare its performance to that of the proposed method. [sent-11, score-0.286]
6 Conceptually, these proposed kernels are defined as instances of convolution kernels [3, 11], which provide the concept of kernels over discrete structures. [sent-13, score-1.191]
7 However, unfortunately, experiments have shown that in some cases there is a critical issue with convolution kernels in NLP tasks [2, 1, 10]. [sent-14, score-0.562]
8 That is, since natural language data contain many types of symbols, NLP tasks usually deal with an extremely high-dimensional and sparse feature space. [sent-15, score-0.161]
9 As a result, the convolution kernel approach can never be trained effectively, and it behaves like a nearest neighbor rule. [sent-16, score-0.313]
10 To avoid this issue, we generally eliminate large sub-structures from the set of features used. [sent-17, score-0.091]
11 However, the main reason for using convolution kernels is that we aim to use structural features easily and efficiently. [sent-18, score-0.552]
12 If their use is limited to only very small structures, this negates the advantages of using convolution kernels. [sent-19, score-0.204]
13 This paper discusses this issue of convolution kernels, in particular sequence and tree kernels, and proposes a new method based on a statistical significance test. [sent-20, score-0.639]
14 The proposed method deals only with those features that are statistically significant for solving the target task, and large significant sub-structures can be used without over-fitting. [sent-21, score-0.197]
15 Moreover, by using sub-structure mining algorithms, the proposed method can be executed efficiently by embedding it in the original kernel calculation process, which is defined by the dynamic-programming (DP) based calculation. [sent-22, score-0.499]
16 2 Convolution Kernels for Sequences and Trees Convolution kernels have been proposed as a concept of kernels for discrete structures, such as sequences, trees and graphs. [sent-23, score-0.802]
17 This framework defines the kernel function between input objects as the convolution of “sub-kernels”, i.e. [sent-24, score-0.313]
18 the kernels for the decompositions (parts or sub-structures) of the objects. [sent-26, score-0.292]
19 Conceptually, convolution kernels K(X, Y) enumerate all sub-structures occurring in X and Y and then calculate their inner product, which is simply written as: K(X, Y) = ⟨φ(X), φ(Y)⟩ = Σ_i φ_i(X) · φ_i(Y). [sent-28, score-0.611]
20 φ represents the feature mapping from the discrete object to the feature space; that is, φ(X) = (φ_1(X), φ_2(X), . . .). [sent-29, score-0.221]
21 Therefore, with sequence kernels, input objects X and Y are sequences, and φi (X) is a sub-sequence; with tree kernels, X and Y are trees, and φi (X) is a sub-tree. [sent-36, score-0.283]
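To make the inner-product view concrete, the sketch below computes a convolution kernel by explicit feature enumeration. It is a deliberately simplified illustration: it enumerates only contiguous sub-sequences (n-grams) up to length n, whereas the kernels discussed in this paper also include gapped sub-sequences with a decay factor; the function names are illustrative, not taken from the paper.

```python
from collections import Counter

def ngram_features(seq, n):
    """Enumerate contiguous sub-sequences (n-grams) of length 1..n as features.

    Simplification: the paper's kernels also allow gapped sub-sequences with a
    decay factor, which are omitted here.
    """
    feats = Counter()
    for length in range(1, n + 1):
        for i in range(len(seq) - length + 1):
            feats[tuple(seq[i:i + length])] += 1
    return feats

def naive_convolution_kernel(x, y, n=3):
    """K(X, Y) = sum_i phi_i(X) * phi_i(Y), computed by explicit enumeration."""
    fx, fy = ngram_features(x, n), ngram_features(y, n)
    return sum(cnt * fy[feat] for feat, cnt in fx.items())

# Example: two short word sequences share 'a', 'b', 'd' and 'a b'.
print(naive_convolution_kernel("a b c d".split(), "a b d".split(), n=3))
```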
22 Up to now, many kinds of sequence and tree kernels have been proposed for a variety of different tasks. [sent-37, score-0.62]
23 To clarify the discussion, this paper basically follows the framework of [1], which proposed a gapped word sequence kernel, and [5], which introduced a labeled ordered tree kernel. [sent-38, score-0.37]
24 We can treat a sequence as a special form of tree if we say sequences are rooted at their last symbol and each node has a single child, namely the previous symbol. [sent-39, score-0.411]
25 Then, let Ln be a set of symbols whose sizes are n and P (Ln ) be a set of trees that are constructed by Ln . [sent-42, score-0.165]
26 We denote a tree u ∈ P(L1^n) whose size is n or less, where L1^n = ∪_{m=1}^{n} L^m. [sent-44, score-0.186]
27 Let T be a tree and sub(T) be a function that returns a set of all possible sub-trees in T. [sent-45, score-0.247]
28 For example, a sub-tree ‘a-b-c-d’, where ‘a’, ‘b’, ‘c’ and ‘d’ represent symbols and ‘-’ represents an edge between symbols, covers sub-trees ‘d’, ‘a-c-d’ and ‘b-d’. [sent-47, score-0.09]
29 Formally, sequence and tree kernels can be defined in the same form as K_{SK,TK}(T^1, T^2) = Σ_{u∈P(L1^n)} ( Σ_{t^1∈sub(T^1)} C_u(t^1) λ^{γ_u(t^1)} ) ( Σ_{t^2∈sub(T^2)} C_u(t^2) λ^{γ_u(t^2)} )   (1). [sent-51, score-0.575]
30 Note that this formula also includes the node-skip framework that is generally introduced only in sequence kernels [7, 1]; λ is the decay factor that handles the gaps present in sub-trees u and t. [sent-52, score-0.159]
31 Sequence and tree kernels are defined by recursive formulas so that they can be calculated efficiently, instead of the explicit calculation of Equation (1). [sent-53, score-0.742]
32 Moreover, when implemented, these kernels can be calculated in O(n|T^1||T^2|), where |T| represents the number of nodes in T, by using the DP technique. [sent-54, score-0.351]
33 Note that if the kernel does not use a size restriction, the calculation cost becomes O(|T^1||T^2|). [sent-55, score-0.251]
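As an illustration of the DP idea, the sketch below implements the standard gap-weighted subsequence kernel recursion (in the style of Lodhi et al.), summing the contributions of common sub-sequences of length 1 to n with a decay factor lam per position and gap. It is a stand-in with the same O(n|s||t|) cost, not the authors' exact recursion, which also covers trees and the C_u counting.

```python
def gap_weighted_subseq_kernel(s, t, n, lam=0.5):
    """Sum of gap-weighted common-subsequence kernels K_m(s, t) for m = 1..n.

    Standard O(n*|s|*|t|) dynamic program; lam penalises both matched
    positions and gaps inside a matched sub-sequence.
    """
    S, T = len(s), len(t)
    # Kp[i][j] holds K'_{m-1}(s[:i], t[:j]) for the current length m (K'_0 == 1).
    Kp = [[1.0] * (T + 1) for _ in range(S + 1)]
    total = 0.0
    for m in range(1, n + 1):
        # Accumulate K_m(s, t): every matching pair of symbols extends a
        # length-(m-1) sub-sequence counted in Kp.
        Km = 0.0
        for i in range(1, S + 1):
            for j in range(1, T + 1):
                if s[i - 1] == t[j - 1]:
                    Km += lam * lam * Kp[i - 1][j - 1]
        total += Km
        if m == n:
            break
        # Build K'_m from K'_{m-1} via the auxiliary table K''_m.
        Kpp = [[0.0] * (T + 1) for _ in range(S + 1)]
        Kp_next = [[0.0] * (T + 1) for _ in range(S + 1)]
        for i in range(1, S + 1):
            for j in range(1, T + 1):
                Kpp[i][j] = lam * Kpp[i][j - 1]
                if s[i - 1] == t[j - 1]:
                    Kpp[i][j] += lam * lam * Kp[i - 1][j - 1]
                Kp_next[i][j] = lam * Kp_next[i - 1][j] + Kpp[i][j]
        Kp = Kp_next
    return total

# Example: kernel between two short word sequences, sub-sequences up to length 3.
print(gap_weighted_subseq_kernel("a b c d".split(), "a b d".split(), n=3, lam=0.5))
```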
34 3 Problem of Applying Convolution Kernels to Real Tasks According to the original definition of convolution kernels, all of the sub-structures are enumerated and calculated for the kernels. [sent-56, score-0.266]
35 As a result, the dimension of feature space becomes extremely high, and all kernel values K(X, Y ) are very small compared to the kernel value of the object itself, K(X, X). [sent-62, score-0.296]
36 In this situation, the convolution kernel approach can never be trained effectively, and it will behave like a nearest neighbor rule; we obtain a result that is very precise but with very low recall. [sent-63, score-0.313]
37 To avoid this, most conventional methods use an approach that involves smoothing the kernel values or eliminating features based on the sub-structure size. [sent-65, score-0.237]
38 For sequence kernels, [1] uses a feature elimination method based on the sub-sequence size n. [sent-66, score-0.206]
39 This means that the kernel calculation deals only with those sub-sequences whose length is n or less. [sent-67, score-0.279]
40 As with the sequence kernel, [2] proposed a method that restricts the features based on sub-tree depth for tree kernels. [sent-68, score-0.452]
41 The main reason for using these kernels is that they allow us to employ structural features simply and efficiently. [sent-73, score-0.348]
42 If the usable sub-structure size is limited to a very small value (e.g., n = 2 or 3), the full benefits of the kernels are missed. [sent-76, score-0.292]
43 4 Statistical Feature Mining Method for Sequence and Tree Kernels This section proposes a new approach to feature selection, which is based on a statistical significance test, in contrast to the conventional methods, which use sub-structure size. [sent-81, score-0.24]
44 In our approach, we test the statistical deviation of all sub-structures in the training samples between their appearance in positive samples and negative samples, and then select as features only the sub-structures whose values are larger than a certain threshold τ. [sent-83, score-0.112]
45 In this paper, we explain our proposed method by using the chi-squared (χ²) value as a statistical metric. [sent-85, score-0.12]
46 We note, however, that we can use many types of statistical metrics in our proposed method (Table 1: contingency table and notation for the chi-squared value). [sent-86, score-0.089]
47 First, we briefly explain how to calculate the χ² value by referring to Table 1. [sent-87, score-0.084]
48 In the kernel calculation with statistical feature selection, if χ²(u) < τ holds, that is, u is not statistically significant, then u is eliminated from the features, and the value of u is presumed to be 0 in the kernel value. [sent-95, score-0.519]
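A minimal sketch of the χ² computation from such a 2×2 contingency table is shown below. The argument names (o_uc, o_u, o_c, n) follow the notation of Table 1 as far as it can be recovered here; the exact layout of the table in the paper is assumed rather than reproduced.

```python
def chi_squared(o_uc, o_u, o_c, n):
    """Chi-squared statistic of the 2x2 contingency table for sub-structure u.

    o_uc: positive samples containing u    o_u: samples containing u
    o_c : positive samples                 n  : all training samples
    (Names follow Table 1 of the paper as far as it can be recovered here.)
    """
    # Observed cell counts: (u, c), (u, not-c), (not-u, c), (not-u, not-c)
    observed = [o_uc, o_u - o_uc, o_c - o_uc, n - o_u - o_c + o_uc]
    row = [o_u, n - o_u]
    col = [o_c, n - o_c]
    chi = 0.0
    for r in range(2):
        for c in range(2):
            expected = row[r] * col[c] / n
            if expected > 0:
                chi += (observed[2 * r + c] - expected) ** 2 / expected
    return chi

# Example: u appears in 30 of 40 positive samples and in 5 of 60 negative samples.
tau = 3.84  # e.g. the 5% critical value of chi-squared with 1 degree of freedom
print(chi_squared(o_uc=30, o_u=35, o_c=40, n=100), ">= tau selects u as a feature")
```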
49 Therefore, the sequence and tree kernel with feature selection (SK,TK+FS) can be defined as follows: K_{SK,TK+FS}(T^1, T^2) = Σ_{u∈{u | τ ≤ χ²(u), u∈P(L1^n)}} ( Σ_{t^1∈sub(T^1)} C_u(t^1) λ^{γ_u(t^1)} ) ( Σ_{t^2∈sub(T^2)} C_u(t^2) λ^{γ_u(t^2)} )   (2). [sent-96, score-0.516]
50 The difference from the original kernels is simply the condition on the first summation, which is τ ≤ χ²(u). [sent-97, score-0.328]
51 The basic idea of using a statistical metric to select features is quite natural, and is not in itself a novel approach. [sent-98, score-0.165]
52 We note, however, that it is not clear how to calculate these kernels efficiently with statistical feature selection. [sent-99, score-0.498]
53 It is computationally infeasible to calculate χ²(u) for all possible u with a naive exhaustive method. [sent-100, score-0.084]
54 In our approach, we take advantage of sub-structure mining algorithms in order to calculate χ²(u) efficiently and to embed statistical feature selection into the kernel calculation. [sent-101, score-0.533]
55 Formally, sub-structure mining finds the complete set, without duplication, of all significant (generally frequent) sub-structures in a dataset. [sent-102, score-0.172]
56 Specifically, we apply a combination of a sequential pattern mining technique, PrefixSpan [9], and a statistical metric pruning (SMP) method, Apriori SMP [8]. [sent-103, score-0.301]
57 Briefly, it finds all sub-sequences uw of size n by searching for a single symbol w in the projected database of the sub-sequence (prefix) u of size n − 1. [sent-105, score-0.842]
58 The projected database is a partial database which contains only the postfixes (pointers in the implementation) of the occurrences of the prefix u in the database. [sent-106, score-0.318]
59 It starts searching from n = 1; that is, it enumerates all the significant sub-sequences by the recursive calculation of pattern-growth, which searches the projected database of the prefix u and adds a symbol w to u, and prefix-projection, which makes the projected database of uw. [sent-107, score-0.727]
60 The upper bound of the χ² value of a sequence uv, which is the concatenation of sequences u and v, can be calculated from the contingency table of the prefix u [8]: χ²(uv) ≤ χ̂²(u) = max(chi(O_uc, O_uc), chi(O_u − O_uc, 0)). [sent-109, score-0.216]
61 In our context, we can eliminate all (super-)sequences uv from the feature candidates without the explicit evaluation of uv. [sent-111, score-0.231]
62 Using this property in the PrefixSpan algorithm, we can avoid evaluating all the (super-)sequences uv by evaluating the upper bound of the sequence u. [sent-112, score-0.25]
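One reading of this bound, in the style of the Apriori SMP bound of [8], is sketched below: the best any extension of u can do is to keep only u's positive occurrences, or only its negative occurrences. It assumes the chi_squared helper from the earlier sketch and the same (o_uc, o_u, o_c, n) notation.

```python
def chi_squared_upper_bound(o_uc, o_u, o_c, n):
    """Upper bound on chi-squared(uv) for every super-sequence uv of u.

    chi_squared() is the contingency-table helper sketched earlier; an
    extension of u can occur at most in the samples where u occurs, and its
    chi-squared value is maximised by keeping only the positive occurrences
    or only the negative ones.
    """
    all_positive = chi_squared(o_uc, o_uc, o_c, n)       # keep only positive hits
    all_negative = chi_squared(0, o_u - o_uc, o_c, n)    # keep only negative hits
    return max(all_positive, all_negative)

# If even the bound is below tau, u and every extension uv can be pruned.
if chi_squared_upper_bound(o_uc=2, o_u=4, o_c=50, n=100) < 3.84:
    print("prune the whole branch rooted at u")
```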
63 After finding the number of occurrences of each individual symbol w in the projected database of u, we evaluate uw under the following three conditions: (1) τ ≤ χ²(uw); (2) τ > χ²(uw) and τ > χ̂²(uw); and (3) τ > χ²(uw) and τ ≤ χ̂²(uw). [sent-113, score-0.825]
64 With condition (1), the sub-sequence uw is selected as a feature. [sent-114, score-0.564]
65 With condition (2), uw is pruned; that is, all uwv are also pruned from the search space. [sent-115, score-0.633]
66 With condition (3), uw is not significant, but uwv can be significant; thus uw is not selected as a feature, but mining continues to uwv. [sent-116, score-1.306]
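Putting the pieces together, the following sketch shows how a PrefixSpan-style pattern growth can apply the three conditions while mining labelled sequences. It is a simplified reconstruction, not the authors' implementation: it uses plain lists of (sample id, position) pointers as the projected database and reuses the chi_squared and chi_squared_upper_bound helpers sketched above.

```python
def mine_significant_subsequences(data, tau, max_len):
    """PrefixSpan-style pattern growth with statistical metric pruning (SMP).

    data: list of (sequence, label) pairs, label True for the positive class.
    Reuses chi_squared() and chi_squared_upper_bound() from the sketches above.
    Returns the set of sub-sequences selected as features.
    """
    n_total = len(data)
    o_c = sum(1 for _, label in data if label)
    selected = set()

    def grow(prefix, projected):
        # projected: (sample id, position just after the prefix match) pointers
        if len(prefix) >= max_len:
            return
        # Find every symbol w that can extend the prefix, with its new pointers.
        extensions = {}
        for sid, pos in projected:
            seq, _ = data[sid]
            seen = set()
            for k in range(pos, len(seq)):
                w = seq[k]
                if w not in seen:               # first occurrence only, per sample
                    seen.add(w)
                    extensions.setdefault(w, []).append((sid, k + 1))
        for w, new_projected in extensions.items():
            o_u = len(new_projected)
            o_uc = sum(1 for sid, _ in new_projected if data[sid][1])
            chi = chi_squared(o_uc, o_u, o_c, n_total)
            bound = chi_squared_upper_bound(o_uc, o_u, o_c, n_total)
            uw = prefix + (w,)
            if chi >= tau:                      # condition (1): uw is a feature
                selected.add(uw)
            if bound >= tau:                    # condition (1) or (3): keep growing
                grow(uw, new_projected)
            # condition (2): chi < tau and bound < tau, so the branch is pruned

    grow(tuple(), [(sid, 0) for sid in range(len(data))])
    return selected

# Toy usage: four labelled word sequences.
toy = [("a b c d".split(), True), ("a b d".split(), True),
       ("c d a".split(), False), ("b a".split(), False)]
print(mine_significant_subsequences(toy, tau=0.5, max_len=3))
```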
67 Figure 1 shows an example of searching and pruning the sub-sequences to select significant features by the PrefixSpan with SMP algorithm. [sent-117, score-0.203]
68 [Figure 1: an example of searching and pruning sub-sequences with the PrefixSpan with SMP algorithm, showing χ̂²(u′) and χ²(u′) at n = 1 and n = 2 together with the projected database (sample id: pointer).] [sent-165, score-0.102]
69 Thus, we take advantage of the string (sequence) encoding method for trees and treat them in sequence kernels. [sent-167, score-0.364]
70 Figure 2 shows an example of the string encoding for trees under the postorder traversal. [sent-168, score-0.291]
71 We treat these brackets as a special symbol during the sequential pattern mining phase. [sent-170, score-0.283]
72 We previously said that a sequence can be treated as a special form of tree. [sent-173, score-0.097]
73 We also encode sequences in this way; for example, a sequence 'a b c d' is encoded as '((((a) b) c) d)'. [sent-174, score-0.097]
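A small sketch of such a bracketed postorder encoding is given below. The exact bracket convention of Figure 2 cannot be recovered from the text, so the encoding here is chosen to reproduce the '((((a) b) c) d)' example for sequences; encode_tree and encode_sequence are illustrative names.

```python
def encode_tree(label, children=()):
    """Encode a labelled ordered tree as a bracketed postorder string.

    Children are emitted before the node label, so the string follows a
    postorder traversal; the brackets become extra symbols when the string is
    fed to the sequential pattern miner.
    """
    inner = " ".join(encode_tree(*child) for child in children)
    return "(" + (inner + " " if inner else "") + label + ")"

def encode_sequence(symbols):
    """A sequence is a tree rooted at its last symbol, each node with one child."""
    tree = (symbols[0], ())
    for s in symbols[1:]:
        tree = (s, (tree,))
    return encode_tree(*tree)

print(encode_sequence("a b c d".split()))            # -> ((((a) b) c) d)
# A small branching tree: root 'c' with children 'a' and 'b' -> ((a) (b) c)
print(encode_tree("c", (("a", ()), ("b", ()))))
```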
74 That is, we can define sequence and tree kernels with our feature selection method in the same form. [sent-175, score-0.73]
75 Sequence and Tree Kernels with Statistical Feature Mining: Sequence and tree kernels with our proposed feature selection method are defined by the following equations. [sent-176, score-0.492]
76 K_{SK,TK+FS}(T^1, T^2; D) = Σ_{1≤i≤|T^1|} Σ_{1≤j≤|T^2|} H_n(T^1_i, T^2_j; D)   (3). D represents the training data, and i and j represent indices of nodes in postorder of T^1 and T^2, respectively. [sent-177, score-0.117]
77 Let H_n(T^1_i, T^2_j; D) be a function that returns the sum of the values of all statistically significant common sub-sequences u if t^1_i = t^2_j and |u| ≤ n. [sent-178, score-0.098]
78 Then, let J_u(T^1_i, T^2_j; D), J′_u(T^1_i, T^2_j; D) and J″_u(T^1_i, T^2_j; D) be functions that calculate the value of the common sub-sequences between T^1_i and T^2_j recursively. [sent-180, score-0.084]
79 J_{uw}(T^1_i, T^2_j) = J_u(T^1_i, T^2_j; D) · I_w(t^1_i, t^2_j) if uw ∈ Γ_n(T^1_i, T^2_j; D), and 0 otherwise   (5), where I_w(t^1_i, t^2_j) is a function that returns 1 iff t^1_i = w and t^2_j = w, and 0 otherwise. [sent-181, score-0.589]
80 Γ′_n(T^1_i, T^2_j; D) = {u | u ∈ Γ_n(T^1_i, T^2_j; D), τ ≤ χ²(u), u_{|u|} ∈ ∩_{i=1}^{|u|−1} ans(u_i)}   (8). The condition u_{|u|} ∈ ∩_{i=1}^{|u|−1} ans(u_i) evaluates whether a sub-sequence u is a complete sub-tree, where ans(u_i) returns the ancestors of the node u_i. [sent-186, score-0.166]
81 The following two equations are introduced for the recursive set operations that calculate Γ_n(T^1_i, T^2_j; D) and Γ′_n(T^1_i, T^2_j; D). [sent-190, score-0.122]
82 In the implementation, χ²(uw) and χ̂²(uw), where uw represents the concatenation of a sequence u and a symbol w, can be calculated from the set of pointers of u into the data and the number of appearances of w after those pointer positions. [sent-192, score-0.842]
83 We note that the set of pointers of uw can be obtained simply from the previous search for u. [sent-193, score-0.602]
84 There are some techniques for calculating the kernel faster in the implementation. [sent-196, score-0.193]
85 After that, we look up the results in a TRIE instead of explicitly calculating χ²(u) again when the kernel finds the same sub-sequence. [sent-199, score-0.193]
86 Moreover, when the projected database is exactly the same, these sub-sequences can be merged, since the values of χ²(uv) and χ̂²(uv) for any postfix v are exactly the same. [sent-200, score-0.182]
87 By using this, we only have to look up the index of w to evaluate whether or not any uw is a significant feature. [sent-202, score-0.528]
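A dictionary keyed by the sub-sequence plays the role of the TRIE in the following minimal sketch of this memoisation; it again assumes the chi_squared and chi_squared_upper_bound helpers from the earlier sketches.

```python
# Cache chi-squared statistics per sub-sequence so that repeated look-ups during
# the Gram-matrix computation do not recompute them (a dict stands in for the TRIE).
_chi_cache = {}

def cached_chi(u, o_uc, o_u, o_c, n):
    """Return (chi-squared, chi-squared upper bound) for u, computing it only once."""
    if u not in _chi_cache:
        _chi_cache[u] = (chi_squared(o_uc, o_u, o_c, n),
                         chi_squared_upper_bound(o_uc, o_u, o_c, n))
    return _chi_cache[u]

# Example: the second call is answered from the cache.
print(cached_chi(("a", "b"), 30, 35, 40, 100))
print(cached_chi(("a", "b"), 30, 35, 40, 100))
```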
88 Equations (4) to (7) can be computed in the same way as the original DP-based kernel calculation. [sent-203, score-0.109]
89 Moreover, calculating χ²(u) and χ̂²(u) with sub-structure mining algorithms keeps the computation within the same order as the DP-based kernel calculation. [sent-264, score-0.365]
90 As a result, statistical feature selection can be embedded in the original DP-based kernel calculation. [sent-265, score-0.448]
91 Essentially, the worst-case time complexity of the proposed method becomes exponential, since we enumerate individual sub-structures in the sub-structure mining phase. [sent-266, score-0.279]
92 However, the actual calculation time in most cases of our experiments is even faster than the original kernel calculation, since search-space pruning efficiently removes wasted calculation and the implementation techniques briefly explained above provide practical calculation speed. [sent-267, score-0.62]
93 We note that if we set τ = 0, which means all features are dealt with in the kernel calculation, we get exactly the same kernel value as the original tree kernel. [sent-268, score-0.489]
94 5 Experiments and Results We evaluated the performance of the proposed method in actual NLP tasks, namely English question classification (EQC), subjectivity detection (SD) and polarity identification (PI) tasks. [sent-269, score-0.147]
95 By using these data, we compared the proposed method (SK+FS and TK+FS) with a conventional method (SK or TK), as discussed in Section 3, and with bag-of-words (BOW) Kernel (BOW-K)[4] as baseline methods. [sent-274, score-0.179]
96 We used word sequences for input objects of sequence kernels and word dependency trees for tree kernels. [sent-275, score-0.829]
97 6 Conclusions This paper proposed a statistical feature selection method for sequence kernels and tree kernels. [sent-301, score-0.819]
98 Our approach can select significant features automatically based on a statistical significance test. [sent-302, score-0.138]
99 The proposed method can be embedded in the original DP-based kernel calculation process by using sub-structure mining algorithms. [sent-303, score-0.528]
100 Our experiments demonstrated that our method is superior to conventional methods. [sent-304, score-0.103]
wordName wordTfidf (topN-words)
[('uw', 0.528), ('kernels', 0.292), ('convolution', 0.204), ('ju', 0.19), ('ouc', 0.19), ('tree', 0.186), ('mining', 0.172), ('pre', 0.168), ('fs', 0.151), ('xspan', 0.148), ('ou', 0.147), ('calculation', 0.142), ('nlp', 0.127), ('uv', 0.118), ('kernel', 0.109), ('trees', 0.107), ('smp', 0.106), ('database', 0.102), ('cu', 0.101), ('sequence', 0.097), ('postorder', 0.085), ('calculate', 0.084), ('symbol', 0.081), ('tk', 0.08), ('projected', 0.08), ('feature', 0.078), ('cant', 0.075), ('sub', 0.074), ('chi', 0.074), ('sk', 0.073), ('conventional', 0.072), ('string', 0.064), ('ans', 0.063), ('sequences', 0.063), ('returns', 0.061), ('dp', 0.06), ('pruning', 0.058), ('symbols', 0.058), ('features', 0.056), ('ln', 0.055), ('oc', 0.055), ('searching', 0.051), ('language', 0.048), ('pointers', 0.047), ('proposes', 0.046), ('selection', 0.046), ('signi', 0.046), ('ti', 0.045), ('proposed', 0.045), ('statistical', 0.044), ('eqc', 0.042), ('isozaki', 0.042), ('iw', 0.042), ('oij', 0.042), ('subjectivity', 0.042), ('suzuki', 0.042), ('trie', 0.042), ('uwv', 0.042), ('word', 0.042), ('ciently', 0.038), ('select', 0.038), ('ui', 0.038), ('recursive', 0.038), ('moreover', 0.037), ('statistically', 0.037), ('traversal', 0.037), ('subtree', 0.037), ('classi', 0.036), ('hn', 0.036), ('condition', 0.036), ('eliminate', 0.035), ('encoding', 0.035), ('tasks', 0.035), ('structures', 0.035), ('appeared', 0.034), ('ancestor', 0.034), ('eij', 0.034), ('jun', 0.034), ('ntt', 0.034), ('node', 0.033), ('discrete', 0.033), ('concept', 0.033), ('represents', 0.032), ('issue', 0.031), ('enumerate', 0.031), ('method', 0.031), ('appearance', 0.03), ('treat', 0.03), ('polarity', 0.029), ('dealt', 0.029), ('contingency', 0.029), ('sentences', 0.029), ('skip', 0.029), ('embedded', 0.029), ('deals', 0.028), ('search', 0.027), ('calculated', 0.027), ('sd', 0.027), ('cation', 0.027), ('metric', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
Author: Jun Suzuki, Hideki Isozaki
Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when limited small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enable us to use larger sub-structures effectively. The proposed method, in order to execute efficiently, can be embedded into an original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem in the conventional method and compare the performance of a conventional method to that of the proposed method.
2 0.14482473 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore
Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.
3 0.11976518 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil
Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distances functions and the ‘k’ in nearest neighbors. 1
4 0.10189565 103 nips-2005-Kernels for gene regulatory regions
Author: Jean-philippe Vert, Robert Thurman, William S. Noble
Abstract: We describe a hierarchy of motif-based kernels for multiple alignments of biological sequences, particularly suitable to process regulatory regions of genes. The kernels incorporate progressively more information, with the most complex kernel accounting for a multiple alignment of orthologous regions, the phylogenetic tree relating the species, and the prior knowledge that relevant sequence patterns occur in conserved motif blocks. These kernels can be used in the presence of a library of known transcription factor binding sites, or de novo by iterating over all k-mers of a given length. In the latter mode, a discriminative classifier built from such a kernel not only recognizes a given class of promoter regions, but as a side effect simultaneously identifies a collection of relevant, discriminative sequence motifs. We demonstrate the utility of the motif-based multiple alignment kernels by using a collection of aligned promoter regions from five yeast species to recognize classes of cell-cycle regulated genes. Supplementary data is available at http://noble.gs.washington.edu/proj/pkernel. 1
5 0.092977688 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer
Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lankriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constraint quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps for automatic model selection, improving the interpretability of the learning result and works for hundred thousands of examples or hundreds of kernels to be combined. 1
6 0.092027955 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
7 0.089446411 185 nips-2005-Subsequence Kernels for Relation Extraction
8 0.085784383 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
9 0.076031506 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
10 0.075721607 105 nips-2005-Large-Scale Multiclass Transduction
11 0.072141513 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories
12 0.067930624 77 nips-2005-From Lasso regression to Feature vector machine
13 0.067264363 1 nips-2005-AER Building Blocks for Multi-Layer Multi-Chip Neuromorphic Vision Systems
14 0.066935688 69 nips-2005-Fast Gaussian Process Regression using KD-Trees
15 0.062308516 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
16 0.06203175 151 nips-2005-Pattern Recognition from One Example by Chopping
17 0.059686914 153 nips-2005-Policy-Gradient Methods for Planning
18 0.057553817 78 nips-2005-From Weighted Classification to Policy Search
19 0.056649819 70 nips-2005-Fast Information Value for Graphical Models
20 0.056523543 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
topicId topicWeight
[(0, 0.185), (1, 0.054), (2, -0.021), (3, -0.001), (4, -0.02), (5, 0.064), (6, 0.051), (7, 0.212), (8, 0.103), (9, 0.075), (10, 0.033), (11, -0.144), (12, -0.049), (13, 0.001), (14, -0.085), (15, 0.056), (16, -0.028), (17, -0.128), (18, -0.002), (19, -0.002), (20, 0.037), (21, -0.138), (22, -0.02), (23, -0.15), (24, 0.077), (25, -0.02), (26, -0.051), (27, -0.152), (28, 0.036), (29, 0.096), (30, 0.014), (31, -0.135), (32, 0.014), (33, 0.065), (34, 0.031), (35, 0.042), (36, -0.007), (37, -0.106), (38, -0.053), (39, 0.044), (40, -0.025), (41, -0.018), (42, 0.029), (43, 0.034), (44, 0.146), (45, 0.009), (46, -0.108), (47, -0.089), (48, -0.02), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.96150249 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
Author: Jun Suzuki, Hideki Isozaki
Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when limited small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enable us to use larger sub-structures effectively. The proposed method, in order to execute efficiently, can be embedded into an original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem in the conventional method and compare the performance of a conventional method to that of the proposed method.
2 0.76499403 103 nips-2005-Kernels for gene regulatory regions
Author: Jean-philippe Vert, Robert Thurman, William S. Noble
Abstract: We describe a hierarchy of motif-based kernels for multiple alignments of biological sequences, particularly suitable to process regulatory regions of genes. The kernels incorporate progressively more information, with the most complex kernel accounting for a multiple alignment of orthologous regions, the phylogenetic tree relating the species, and the prior knowledge that relevant sequence patterns occur in conserved motif blocks. These kernels can be used in the presence of a library of known transcription factor binding sites, or de novo by iterating over all k-mers of a given length. In the latter mode, a discriminative classifier built from such a kernel not only recognizes a given class of promoter regions, but as a side effect simultaneously identifies a collection of relevant, discriminative sequence motifs. We demonstrate the utility of the motif-based multiple alignment kernels by using a collection of aligned promoter regions from five yeast species to recognize classes of cell-cycle regulated genes. Supplementary data is available at http://noble.gs.washington.edu/proj/pkernel. 1
3 0.66875321 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore
Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.
4 0.6087938 185 nips-2005-Subsequence Kernels for Relation Extraction
Author: Raymond J. Mooney, Razvan C. Bunescu
Abstract: We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach. 1
5 0.52709186 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer
Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lankriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constraint quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps for automatic model selection, improving the interpretability of the learning result and works for hundred thousands of examples or hundreds of kernels to be combined. 1
6 0.45832095 77 nips-2005-From Lasso regression to Feature vector machine
7 0.44870728 105 nips-2005-Large-Scale Multiclass Transduction
8 0.44690949 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
9 0.42165434 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
10 0.41503865 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression
11 0.4057143 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
12 0.40442204 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
13 0.37925321 69 nips-2005-Fast Gaussian Process Regression using KD-Trees
14 0.3721678 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
15 0.36625507 71 nips-2005-Fast Krylov Methods for N-Body Learning
16 0.36418262 11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection
17 0.35271272 125 nips-2005-Message passing for task redistribution on sparse graphs
18 0.34539193 62 nips-2005-Efficient Estimation of OOMs
19 0.34332404 121 nips-2005-Location-based activity recognition
20 0.34294614 196 nips-2005-Two view learning: SVM-2K, Theory and Practice
topicId topicWeight
[(3, 0.055), (9, 0.296), (10, 0.022), (27, 0.041), (31, 0.072), (34, 0.13), (39, 0.023), (41, 0.02), (50, 0.011), (55, 0.034), (67, 0.012), (69, 0.042), (73, 0.042), (77, 0.011), (88, 0.083), (91, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.76318634 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
Author: Jun Suzuki, Hideki Isozaki
Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when limited small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enable us to use larger sub-structures effectively. The proposed method, in order to execute efficiently, can be embedded into an original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem in the conventional method and compare the performance of a conventional method to that of the proposed method.
2 0.70472276 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
Author: Yoshua Bengio, Olivier Delalleau, Nicolas L. Roux
Abstract: We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smoothness prior – with similarity between examples expressed with a local kernel – are sensitive to the curse of dimensionality, or more precisely to the variability of the target. Our discussion covers supervised, semisupervised and unsupervised learning algorithms. These algorithms are found to be local in the sense that crucial properties of the learned function at x depend mostly on the neighbors of x in the training set. This makes them sensitive to the curse of dimensionality, well studied for classical non-parametric statistical learning. We show in the case of the Gaussian kernel that when the function to be learned has many variations, these algorithms require a number of training examples proportional to the number of variations, which could be large even though there may exist short descriptions of the target function, i.e. their Kolmogorov complexity may be low. This suggests that there exist non-local learning algorithms that at least have the potential to learn about such structured but apparently complex functions (because locally they have many variations), while not using very specific prior domain knowledge. 1
3 0.53655934 184 nips-2005-Structured Prediction via the Extragradient Method
Author: Ben Taskar, Simon Lacoste-Julian, Michael I. Jordan
Abstract: We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm. 1
4 0.53373879 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
Author: John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a family of approximation techniques for probabilistic graphical models, based on the use of graphical preconditioners developed in the scientific computing literature. Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. Experiments are presented that compare the new approximation schemes to variational methods. 1
5 0.53226525 78 nips-2005-From Weighted Classification to Policy Search
Author: Doron Blatt, Alfred O. Hero
Abstract: This paper proposes an algorithm to convert a T -stage stochastic decision problem with a continuous state space to a sequence of supervised learning problems. The optimization problem associated with the trajectory tree and random trajectory methods of Kearns, Mansour, and Ng, 2000, is solved using the Gauss-Seidel method. The algorithm breaks a multistage reinforcement learning problem into a sequence of single-stage reinforcement learning subproblems, each of which is solved via an exact reduction to a weighted-classification problem that can be solved using off-the-self methods. Thus the algorithm converts a reinforcement learning problem into simpler supervised learning subproblems. It is shown that the method converges in a finite number of steps to a solution that cannot be further improved by componentwise optimization. The implication of the proposed algorithm is that a plethora of classification methods can be applied to find policies in the reinforcement learning problem. 1
6 0.51820296 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification
7 0.51646525 144 nips-2005-Off-policy Learning with Options and Recognizers
8 0.51621121 50 nips-2005-Convex Neural Networks
9 0.51368171 47 nips-2005-Consistency of one-class SVM and related algorithms
10 0.51362038 105 nips-2005-Large-Scale Multiclass Transduction
11 0.51094115 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
12 0.50994152 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
13 0.50960821 178 nips-2005-Soft Clustering on Graphs
14 0.50925726 114 nips-2005-Learning Rankings via Convex Hull Separation
15 0.50897962 195 nips-2005-Transfer learning for text classification
16 0.5087893 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
17 0.50855541 23 nips-2005-An Application of Markov Random Fields to Range Sensing
18 0.50397032 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
19 0.50253129 77 nips-2005-From Lasso regression to Feature vector machine
20 0.50184542 98 nips-2005-Infinite latent feature models and the Indian buffet process