nips nips2005 nips2005-175 knowledge-graph by maker-knowledge-mining

175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining


Source: pdf

Author: Jun Suzuki, Hideki Isozaki

Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when only limited, small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enables us to use larger sub-structures effectively. To execute efficiently, the proposed method can be embedded into the original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem with the conventional method and compare its performance to that of the proposed method.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. [sent-6, score-0.747]

2 Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. [sent-7, score-0.649]

3 This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enables us to use larger sub-structures effectively. [sent-9, score-0.741]

4 To execute efficiently, the proposed method can be embedded into the original kernel calculation process by using sub-structure mining algorithms. [sent-10, score-0.497]

5 Experiments on real NLP tasks confirm the problem in the conventional method and compare the performance of a conventional method to that of the proposed method. [sent-11, score-0.286]

6 Conceptually, these proposed kernels are defined as instances of convolution kernels [3, 11], which provide the concept of kernels over discrete structures. [sent-13, score-1.191]

7 However, unfortunately, experiments have shown that in some cases there is a critical issue with convolution kernels in NLP tasks [2, 1, 10]. [sent-14, score-0.562]

8 That is, since natural language data contain many types of symbols, NLP tasks usually deal with an extremely high-dimensional and sparse feature space. [sent-15, score-0.161]

9 As a result, the convolution kernel approach can never be trained effectively, and it behaves like a nearest neighbor rule. [sent-16, score-0.313]

10 To avoid this issue, we generally eliminate large sub-structures from the set of features used. [sent-17, score-0.091]

11 However, the main reason for using convolution kernels is that we aim to use structural features easily and efficiently. [sent-18, score-0.552]

12 If their use is limited to only very small structures, this negates the advantages of using convolution kernels. [sent-19, score-0.204]

13 This paper discusses this issue of convolution kernels, in particular sequence and tree kernels, and proposes a new method based on a statistical significance test. [sent-20, score-0.639]

14 The proposed method deals only with those features that are statistically significant for solving the target task, and large significant sub-structures can be used without over-fitting. [sent-21, score-0.197]

15 Moreover, by using sub-structure mining algorithms, the proposed method can be executed efficiently by embedding it in the original kernel calculation process, which is defined by the dynamic-programming (DP) based calculation. [sent-22, score-0.499]

16 2 Convolution Kernels for Sequences and Trees Convolution kernels have been proposed as a concept of kernels for discrete structures, such as sequences, trees and graphs. [sent-23, score-0.802]

17 This framework defines the kernel function between input objects as the convolution of “sub-kernels”, i.e. [sent-24, score-0.313]

18 the kernels for the decompositions (parts or sub-structures) of the objects. [sent-26, score-0.292]

19 Conceptually, convolution kernels K(X, Y) enumerate all sub-structures occurring in X and Y and then calculate their inner product, which is simply written as $K(X, Y) = \langle \phi(X), \phi(Y) \rangle = \sum_i \phi_i(X) \cdot \phi_i(Y)$. [sent-28, score-0.611]
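
As a minimal sketch of this inner-product view (not taken from the paper), the following Python fragment enumerates contiguous sub-sequences as a simplified stand-in for the feature map φ and computes K(X, Y) by explicit counting; the actual kernels also cover gapped sub-sequences and sub-trees.

    from collections import Counter

    def phi(seq, max_n=3):
        # Simplified feature map: counts of contiguous sub-sequences up to length max_n.
        # The paper's kernels also allow gaps and tree-structured features.
        feats = Counter()
        for i in range(len(seq)):
            for n in range(1, max_n + 1):
                if i + n <= len(seq):
                    feats[tuple(seq[i:i + n])] += 1
        return feats

    def convolution_kernel(x, y, max_n=3):
        # K(X, Y) = sum_i phi_i(X) * phi_i(Y), computed by explicit enumeration.
        fx, fy = phi(x, max_n), phi(y, max_n)
        return sum(cnt * fy[f] for f, cnt in fx.items())

    print(convolution_kernel(list("abcd"), list("abd")))  # kernel value 4 for these toy inputs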

20 φ represents the feature mapping from the discrete object to the feature space; that is, $\phi(X) = (\phi_1(X), \phi_2(X), \ldots)$. [sent-29, score-0.221]

21 Therefore, with sequence kernels, input objects X and Y are sequences, and φi (X) is a sub-sequence; with tree kernels, X and Y are trees, and φi (X) is a sub-tree. [sent-36, score-0.283]

22 Up to now, many kinds of sequence and tree kernels have been proposed for a variety of different tasks. [sent-37, score-0.62]

23 To clarify the discussion, this paper basically follows the framework of [1], which proposed a gapped word sequence kernel, and [5], which introduced a labeled ordered tree kernel. [sent-38, score-0.37]

24 A sequence can be treated as a special form of tree if we regard it as rooted at its last symbol, with each node having a single child, namely the preceding symbol. [sent-39, score-0.411]

25 Then, let Ln be a set of symbols whose sizes are n and P (Ln ) be a set of trees that are constructed by Ln . [sent-42, score-0.165]

26 We denote a tree u ∈ P(L_1^n) whose size is n or less, where $\cup_{m=1}^{n} L_1^m = L_1^n$. [sent-44, score-0.186]

27 Let T be a tree and sub(T) be a function that returns a set of all possible sub-trees in T. [sent-45, score-0.247]

28 For example, a sub-tree ‘a-b-c-d’, where ‘a’, ‘b’, ‘c’ and ‘d’ represent symbols and ‘-’ represents an edge between symbols, covers sub-trees ‘d’, ‘a-c-d’ and ‘b-d’. [sent-47, score-0.09]
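
A rough sketch of this covering relation, under the reading that gaps are allowed and the root (the last symbol) is always kept; the decay factor that weights the gaps is omitted here:

    from itertools import combinations

    def covered_subchains(chain):
        # Gapped sub-chains of a chain that keep the root (its last symbol).
        *inner, root = chain.split("-")
        subs = []
        for r in range(len(inner) + 1):
            for picked in combinations(inner, r):
                subs.append("-".join(list(picked) + [root]))
        return subs

    print(covered_subchains("a-b-c-d"))
    # ['d', 'a-d', 'b-d', 'c-d', 'a-b-d', 'a-c-d', 'b-c-d', 'a-b-c-d']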

29 Formally, sequence and tree kernels can be defined in the same form as $K^{SK,TK}(T^1, T^2) = \sum_{u \in P(L_1^n)} \Big( \sum_{t^1 \in sub(T^1)} C_u(t^1)\,\gamma_u(t^1) \Big) \Big( \sum_{t^2 \in sub(T^2)} C_u(t^2)\,\gamma_u(t^2) \Big)$.  (1) [sent-51, score-0.575]

30 Note that this formula also includes the node-skip framework that is generally introduced only in sequence kernels [7, 1]; λ is the decay factor that handles the gaps present in sub-trees u and t. [sent-52, score-0.159]

31 Sequence and tree kernels are defined by recursive formulas so that they can be calculated efficiently, instead of the explicit calculation of Equation (1). [sent-53, score-0.742]

32 Moreover, when implemented, these kernels can be calculated in O(n|T^1||T^2|), where |T| represents the number of nodes in T, by using the DP technique. [sent-54, score-0.351]
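
As an illustration of the DP idea for the sequence case, the following sketch counts common, possibly non-contiguous, sub-sequences of length up to n in O(n|s||t|) time without enumerating them; the gap decay λ and the tree extension are left out, so this is a simplified stand-in for the paper's recursion rather than its exact formula.

    def subsequence_kernel(s, t, n):
        # Count common (possibly non-contiguous) sub-sequence occurrences of length <= n.
        # C[p][i][j]: matching occurrence pairs of length p between s[:i] and t[:j].
        S, T = len(s), len(t)
        C = [[[0] * (T + 1) for _ in range(S + 1)] for _ in range(n + 1)]
        for i in range(S + 1):
            for j in range(T + 1):
                C[0][i][j] = 1
        for p in range(1, n + 1):
            for i in range(1, S + 1):
                for j in range(1, T + 1):
                    C[p][i][j] = (C[p][i - 1][j] + C[p][i][j - 1] - C[p][i - 1][j - 1]
                                  + (C[p - 1][i - 1][j - 1] if s[i - 1] == t[j - 1] else 0))
        return sum(C[p][S][T] for p in range(1, n + 1))

    print(subsequence_kernel("abcd", "abd", n=3))  # 7: 3 of length 1, 3 of length 2, 1 of length 3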

33 Note that if the kernel does not use the size restriction, the calculation cost becomes O(|T^1||T^2|). [sent-55, score-0.251]

34 3 Problem of Applying Convolution Kernels to Real tasks According to the original definition of convolution kernels, all of the sub-structures are enumerated and calculated for the kernels. [sent-56, score-0.266]

35 As a result, the dimension of feature space becomes extremely high, and all kernel values K(X, Y ) are very small compared to the kernel value of the object itself, K(X, X). [sent-62, score-0.296]

36 In this situation, the convolution kernel approach can never be trained effectively, and it will behave like a nearest neighbor rule; we obtain a result that is very precise but with very low recall. [sent-63, score-0.313]

37 To avoid this, most conventional methods use an approach that involves smoothing the kernel values or eliminating features based on the sub-structure size. [sent-65, score-0.237]

38 For sequence kernels, [1] use a feature elimination method based on the size of sub-sequence n. [sent-66, score-0.206]

39 This means that the kernel calculation deals only with those sub-sequences whose length is n or less. [sent-67, score-0.279]

40 As well as the sequence kernel, [2] proposed a method that restricts the features based on subtree depth for tree kernels. [sent-68, score-0.452]

41 The main reason for using these kernels is that they allow us to employ structural features simply and efficiently. [sent-73, score-0.348]

42 If the sub-structure size is restricted to a very small value (e.g., n = 2 or 3), the full benefits of the kernels are missed. [sent-76, score-0.292]

43 4 Statistical Feature Mining Method for Sequence and Tree Kernels This section proposes a new approach to feature selection, which is based on a statistical significance test, in contrast to the conventional methods, which use sub-structure size. [sent-81, score-0.24]

44 In our approach, we test the statistical deviation of all sub-structures in the training samples between their appearance in positive samples and in negative samples, and then select as features only those sub-structures whose value exceeds a certain threshold τ. [sent-83, score-0.112]

45 In this paper, we explain our proposed method using the chi-squared (χ²) value as the statistical metric. [sent-85, score-0.12]

46 We note, however, that many types of statistical metrics can be used in our proposed method. (Table 1: Contingency table and notation for the chi-squared value.) [sent-86, score-0.089]

47 First, we briefly explain how to calculate the χ² value by referring to Table 1. [sent-87, score-0.084]

48 In the kernel calculation with the statistical feature selection, if χ2 (u) < τ holds, that is, u is not statistically significant, then u is eliminated from the features, and the value of u is presumed to be 0 for the kernel value. [sent-95, score-0.519]
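
A small sketch of this step, using the notation of Table 1 (O_uc: samples that contain u and are positive, O_u: samples that contain u, O_c: positive samples, N: all samples); the χ² statistic is the standard one, and the exact table layout in the paper may differ:

    def chi_squared(o_uc, o_u, o_c, n):
        # 2x2 contingency table for sub-structure u: rows = contains u / does not,
        # columns = positive / negative; chi^2 = sum (observed - expected)^2 / expected.
        observed = [o_uc, o_u - o_uc, o_c - o_uc, n - o_u - o_c + o_uc]
        margins = [(o_u, o_c), (o_u, n - o_c), (n - o_u, o_c), (n - o_u, n - o_c)]
        total = 0.0
        for obs, (row, col) in zip(observed, margins):
            expected = row * col / n
            if expected > 0:
                total += (obs - expected) ** 2 / expected
        return total

    def is_significant(o_uc, o_u, o_c, n, tau):
        # u is kept as a feature only if chi^2(u) >= tau; otherwise its value is treated as 0.
        return chi_squared(o_uc, o_u, o_c, n) >= tau

    print(chi_squared(o_uc=30, o_u=40, o_c=50, n=100))  # ~16.7 for this made-up table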

49 Therefore, the sequence and tree kernel with feature selection (SK,TK+FS) can be defined as follows: $K^{SK,TK+FS}(T^1, T^2) = \sum_{u \in \{u \mid \tau \le \chi^2(u),\ u \in P(L_1^n)\}} \Big( \sum_{t^1 \in sub(T^1)} C_u(t^1)\,\gamma_u(t^1) \Big) \Big( \sum_{t^2 \in sub(T^2)} C_u(t^2)\,\gamma_u(t^2) \Big)$.  (2) [sent-96, score-0.516]

50 The only difference from the original kernels is the condition on the first summation, which is τ ≤ χ²(u). [sent-97, score-0.328]

51 The basic idea of using a statistical metric to select features is quite natural, but it is not in itself a novel approach. [sent-98, score-0.165]

52 We note, however, that it is not clear how to calculate these kernels efficiently with statistical feature selection. [sent-99, score-0.498]

53 It is computationally infeasible to calculate χ2 (u) for all possible u with a naive exhaustive method. [sent-100, score-0.084]

54 In our approach, we take advantage of sub-structure mining algorithms in order to calculate χ²(u) efficiently and to embed statistical feature selection into the kernel calculation. [sent-101, score-0.533]

55 Formally, sub-structure mining finds the complete set, without duplication, of all significant (generally, frequent) sub-structures in a dataset. [sent-102, score-0.172]

56 Specifically, we apply a combination of a sequential pattern mining technique, PrefixSpan [9], and a statistical metric pruning (SMP) method, Apriori SMP [8]. [sent-103, score-0.301]

57 Briefly, it finds any sub-sequence uw whose size is n by searching for a single symbol w in the projected database of the sub-sequence (prefix) u of size n − 1. [sent-105, score-0.842]

58 The projected database is a partial database that contains only the postfixes (pointers, in the implementation) of the appearances of the prefix u in the database. [sent-106, score-0.318]

59 Searching starts from n = 1; that is, all the significant sub-sequences are enumerated by the recursive calculation of pattern-growth (searching in the projected database of the prefix u and adding a symbol w to u) and prefix-projection (making the projected database of uw). [sent-107, score-0.727]
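
A minimal sketch of the pattern-growth and prefix-projection steps (the SMP pruning is sketched separately below); the pointer representation of the projected database follows the description above, while the rest is a simplification:

    def project(database, projected, symbol):
        # Prefix-projection: from the projected database of prefix u (a list of
        # (sequence id, start position) pointers), build the one for u + [symbol].
        new_projected = []
        for sid, start in projected:
            seq = database[sid]
            for pos in range(start, len(seq)):
                if seq[pos] == symbol:
                    new_projected.append((sid, pos + 1))  # postfix starts after the match
                    break
        return new_projected

    def grow(database, prefix, projected, max_len, out):
        # Pattern-growth: extend the prefix by one symbol at a time, recursively.
        if len(prefix) >= max_len:
            return
        symbols = {database[sid][pos]
                   for sid, start in projected
                   for pos in range(start, len(database[sid]))}
        for w in sorted(symbols):
            proj_w = project(database, projected, w)
            out.append((prefix + [w], len(proj_w)))  # pattern and its support
            grow(database, prefix + [w], proj_w, max_len, out)

    db = [list("abcd"), list("acd"), list("bd")]
    patterns = []
    grow(db, [], [(i, 0) for i in range(len(db))], max_len=3, out=patterns)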

60 The upper bound of the χ² value of a sequence uv, which is the concatenation of sequences u and v, can be calculated from the contingency table of the prefix u [8]: χ²(uv) ≤ χ̂²(u) = max(chi(O_uc, O_uc), chi(O_u − O_uc, 0)). [sent-109, score-0.216]

61 In our context, this means we can eliminate all (super-)sequences uv from the feature candidates without explicitly evaluating uv. [sent-111, score-0.231]

62 Using this property in the PrefixSpan algorithm, we can avoid evaluating all the (super-)sequences uv by evaluating the upper bound for the sequence u. [sent-112, score-0.25]

63 After finding the number of appearances of each individual symbol w in the projected database of u, we evaluate uw under the following three conditions: (1) τ ≤ χ²(uw); (2) τ > χ²(uw) and τ > χ̂²(uw); (3) τ > χ²(uw) and τ ≤ χ̂²(uw). [sent-113, score-0.825]

64 With condition (1), sub-sequence uw is selected as the feature. [sent-114, score-0.564]

65 With condition (2), uw is pruned, that is, all uwv are also pruned from search space. [sent-115, score-0.633]

66 With condition (3), uw is not significant, but uwv can be significant; thus uw is not selected as a feature, but mining continues to uwv. [sent-116, score-1.306]
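
A sketch of this three-way decision, assuming the chi(support, positives) reading of the bound given above; this is an illustration of the pruning logic, not the paper's implementation:

    def chi(sup, pos, o_c, n):
        # chi^2 of a pattern occurring in `sup` samples, `pos` of them positive,
        # given o_c positive samples among n in total (same statistic as sketched earlier).
        observed = [pos, sup - pos, o_c - pos, n - sup - o_c + pos]
        margins = [(sup, o_c), (sup, n - o_c), (n - sup, o_c), (n - sup, n - o_c)]
        return sum((obs - r * c / n) ** 2 / (r * c / n)
                   for obs, (r, c) in zip(observed, margins) if r * c > 0)

    def chi_upper_bound(sup, pos, o_c, n):
        # hat-chi^2(u): bound over all extensions uv, whose occurrences are a subset of u's;
        # the extremes are "all surviving occurrences positive" and "all negative".
        return max(chi(pos, pos, o_c, n), chi(sup - pos, 0, o_c, n))

    def decide(sup_uw, pos_uw, o_c, n, tau):
        # (1) select uw as a feature ('select' also implies continuing the search),
        # (2) prune uw and every extension uwv,
        # (3) do not select uw but keep mining its extensions.
        score = chi(sup_uw, pos_uw, o_c, n)
        bound = chi_upper_bound(sup_uw, pos_uw, o_c, n)
        if score >= tau:
            return "select"
        if bound < tau:
            return "prune"
        return "continue"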

67 Figure 1 shows an example of searching and pruning the sub-sequences to select significant features by the PrefixSpan with SMP algorithm. [sent-117, score-0.203]

68 [Figure 1: an example of searching and pruning sub-sequences; each step shows χ̂²(u′) for prefixes of size n = 1, 2, together with the projected database given as (sample id: pointer) entries.] [sent-165, score-0.102]

69 Thus, we take advantage of the string (sequence) encoding method for trees and treat them in sequence kernels. [sent-167, score-0.364]

70 Figure 2 shows an example of the string encoding for trees under the postorder traversal. [sent-168, score-0.291]

71 We treat these brackets as a special symbol during the sequential pattern mining phase. [sent-170, score-0.283]

72 We said earlier that a sequence can be treated as a special form of tree. [sent-173, score-0.097]

73 We also encode sequences in the same way; for example, a sequence ‘a b c d’ is encoded as ‘((((a) b) c) d)’. [sent-174, score-0.097]
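
A small sketch of a postorder bracket encoding that reproduces this example; the exact encoding used in the paper may differ in details such as spacing:

    def encode(node):
        # Postorder bracket encoding: children first, then the node's label,
        # the whole node wrapped in brackets. A node is (label, [children]).
        label, children = node
        return "(" + " ".join([encode(c) for c in children] + [label]) + ")"

    # A sequence viewed as a chain rooted at its last symbol:
    chain = ("d", [("c", [("b", [("a", [])])])])
    print(encode(chain))  # ((((a) b) c) d)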

74 That is, we can define sequence and tree kernels with our feature selection method in the same form. [sent-175, score-0.73]

75 Sequence and Tree Kernels with Statistical Feature Mining: Sequence and tree kernels with our proposed feature selection method are defined by the following equations. [sent-176, score-0.492]

76 $K^{SK,TK+FS}(T^1, T^2; D) = \sum_{1 \le i \le |T^1|} \sum_{1 \le j \le |T^2|} H_n(T_i^1, T_j^2; D)$.  (3) D represents the training data, and i and j represent indices of nodes in postorder of T^1 and T^2, respectively. [sent-177, score-0.117]

77 Let $H_n(T_i^1, T_j^2; D)$ be a function that returns the sum of the values of all statistically significant common sub-sequences u if $t_i^1 = t_j^2$ and |u| ≤ n. [sent-178, score-0.098]

78 Then, let $J_u(T_i^1, T_j^2; D)$, $J'_u(T_i^1, T_j^2; D)$ and $J''_u(T_i^1, T_j^2; D)$ be functions that recursively calculate the value of the common sub-sequences between $T_i^1$ and $T_j^2$. [sent-180, score-0.084]

79 $J_{uw}(T_i^1, T_j^2) = \begin{cases} J_u(T_i^1, T_j^2; D) \cdot I_w(t_i^1, t_j^2) & \text{if } uw \in \Gamma_n(T_i^1, T_j^2; D) \\ 0 & \text{otherwise} \end{cases}$  (5), where $I_w(t_i^1, t_j^2)$ is a function that returns 1 iff $t_i^1 = w$ and $t_j^2 = w$, and 0 otherwise. [sent-181, score-0.589]

80 $\Gamma'_n(T_i^1, T_j^2; D) = \{u \mid u \in \Gamma_n(T_i^1, T_j^2; D),\ \tau \le \chi^2(u),\ u_{|u|} \in \cap_{i=1}^{|u|-1} ans(u_i)\}$.  (8) The condition $u_{|u|} \in \cap_{i=1}^{|u|-1} ans(u_i)$ evaluates whether a sub-sequence u is a complete sub-tree, where ans(u_i) returns the ancestors of the node u_i. [sent-186, score-0.166]

81 The following two equations are introduced for the recursive set operations used to calculate $\Gamma_n(T_i^1, T_j^2; D)$ and $\Gamma'_n(T_i^1, T_j^2; D)$. [sent-190, score-0.122]

82 In the implementation, χ²(uw) and χ̂²(uw), where uw represents the concatenation of a sequence u and a symbol w, can be calculated from a set of pointers of u into the data and the number of appearances of w after the positions indicated by those pointers. [sent-192, score-0.842]

83 We note that the set of pointers of uw can be simply obtained from previous search of u. [sent-193, score-0.602]

84 There are several techniques for calculating the kernel faster in the implementation. [sent-196, score-0.193]

85 After that, we look up the results in the TRIE instead of explicitly calculating χ²(u) again when the kernel finds the same sub-sequence. [sent-199, score-0.193]
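
A rough sketch of the lookup idea, with a plain dictionary standing in for the TRIE (a real TRIE additionally shares common prefixes between stored sub-sequences):

    class ChiCache:
        # Memoise chi^2 statistics per sub-sequence so repeated lookups during the
        # kernel calculation avoid recomputation.
        def __init__(self, chi_fn):
            self.chi_fn = chi_fn  # e.g. the chi() sketch above
            self.table = {}

        def lookup(self, u, sup, pos, o_c, n):
            key = tuple(u)
            if key not in self.table:
                self.table[key] = self.chi_fn(sup, pos, o_c, n)
            return self.table[key]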

86 Moreover, when the projected database is exactly the same, these sub-sequences can be merged, since the values of χ²(uv) and χ̂²(uv) for any postfix v are exactly the same. [sent-200, score-0.182]

87 By using this, we only have to look up the index of w to evaluate whether or not any uw is a significant feature. [sent-202, score-0.528]

88 Equations (4) to (7) can be computed in the same manner as the original DP-based kernel calculation. [sent-203, score-0.109]

89 Moreover, calculating χ²(u) and χ̂²(u) with sub-structure mining algorithms keeps the calculation in the same order as the DP-based kernel calculation. [sent-264, score-0.365]

90 As a result, statistical feature selection can be embedded in original kernel calculation based on the DP. [sent-265, score-0.448]

91 Essentially, the worst case time complexity of the proposed method will become exponential, since we enumerate individual sub-structures in sub-structure mining phase. [sent-266, score-0.279]

92 However, the actual calculation time in most cases of our experiments is even faster than the original kernel calculation, since search-space pruning efficiently removes wasted calculation and the implementation techniques briefly explained above provide practical calculation speed. [sent-267, score-0.62]

93 We note that if we set τ = 0, which means all features are dealt with in the kernel calculation, we can get exactly the same kernel value as the original kernel. [sent-268, score-0.489]

94 5 Experiments and Results We evaluated the performance of the proposed method in actual NLP tasks, namely English question classification (EQC), subjectivity detection (SD) and polarity identification (PI) tasks. [sent-269, score-0.147]

95 By using these data, we compared the proposed method (SK+FS and TK+FS) with a conventional method (SK or TK), as discussed in Section 3, and with bag-of-words (BOW) Kernel (BOW-K)[4] as baseline methods. [sent-274, score-0.179]

96 We used word sequences for input objects of sequence kernels and word dependency trees for tree kernels. [sent-275, score-0.829]

97 6 Conclusions This paper proposed a statistical feature selection method for sequence kernels and tree kernels. [sent-301, score-0.819]

98 Our approach can select significant features automatically based on a statistical significance test. [sent-302, score-0.138]

99 The proposed method can be embedded in the original DP based kernel calculation process by using sub-structure mining algorithms. [sent-303, score-0.528]

100 Our experiments demonstrated that our method is superior to conventional methods. [sent-304, score-0.103]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('uw', 0.528), ('kernels', 0.292), ('convolution', 0.204), ('ju', 0.19), ('ouc', 0.19), ('tree', 0.186), ('mining', 0.172), ('pre', 0.168), ('fs', 0.151), ('xspan', 0.148), ('ou', 0.147), ('calculation', 0.142), ('nlp', 0.127), ('uv', 0.118), ('kernel', 0.109), ('trees', 0.107), ('smp', 0.106), ('database', 0.102), ('cu', 0.101), ('sequence', 0.097), ('postorder', 0.085), ('calculate', 0.084), ('symbol', 0.081), ('tk', 0.08), ('projected', 0.08), ('feature', 0.078), ('cant', 0.075), ('sub', 0.074), ('chi', 0.074), ('sk', 0.073), ('conventional', 0.072), ('string', 0.064), ('ans', 0.063), ('sequences', 0.063), ('returns', 0.061), ('dp', 0.06), ('pruning', 0.058), ('symbols', 0.058), ('features', 0.056), ('ln', 0.055), ('oc', 0.055), ('searching', 0.051), ('language', 0.048), ('pointers', 0.047), ('proposes', 0.046), ('selection', 0.046), ('signi', 0.046), ('ti', 0.045), ('proposed', 0.045), ('statistical', 0.044), ('eqc', 0.042), ('isozaki', 0.042), ('iw', 0.042), ('oij', 0.042), ('subjectivity', 0.042), ('suzuki', 0.042), ('trie', 0.042), ('uwv', 0.042), ('word', 0.042), ('ciently', 0.038), ('select', 0.038), ('ui', 0.038), ('recursive', 0.038), ('moreover', 0.037), ('statistically', 0.037), ('traversal', 0.037), ('subtree', 0.037), ('classi', 0.036), ('hn', 0.036), ('condition', 0.036), ('eliminate', 0.035), ('encoding', 0.035), ('tasks', 0.035), ('structures', 0.035), ('appeared', 0.034), ('ancestor', 0.034), ('eij', 0.034), ('jun', 0.034), ('ntt', 0.034), ('node', 0.033), ('discrete', 0.033), ('concept', 0.033), ('represents', 0.032), ('issue', 0.031), ('enumerate', 0.031), ('method', 0.031), ('appearance', 0.03), ('treat', 0.03), ('polarity', 0.029), ('dealt', 0.029), ('contingency', 0.029), ('sentences', 0.029), ('skip', 0.029), ('embedded', 0.029), ('deals', 0.028), ('search', 0.027), ('calculated', 0.027), ('sd', 0.027), ('cation', 0.027), ('metric', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

Author: Jun Suzuki, Hideki Isozaki

Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when only limited, small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enables us to use larger sub-structures effectively. To execute efficiently, the proposed method can be embedded into the original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem with the conventional method and compare its performance to that of the proposed method.

2 0.14482473 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore

Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.

3 0.11976518 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning

Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil

Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distances functions and the ‘k’ in nearest neighbors. 1

4 0.10189565 103 nips-2005-Kernels for gene regulatory regions

Author: Jean-philippe Vert, Robert Thurman, William S. Noble

Abstract: We describe a hierarchy of motif-based kernels for multiple alignments of biological sequences, particularly suitable to process regulatory regions of genes. The kernels incorporate progressively more information, with the most complex kernel accounting for a multiple alignment of orthologous regions, the phylogenetic tree relating the species, and the prior knowledge that relevant sequence patterns occur in conserved motif blocks. These kernels can be used in the presence of a library of known transcription factor binding sites, or de novo by iterating over all k-mers of a given length. In the latter mode, a discriminative classifier built from such a kernel not only recognizes a given class of promoter regions, but as a side effect simultaneously identifies a collection of relevant, discriminative sequence motifs. We demonstrate the utility of the motif-based multiple alignment kernels by using a collection of aligned promoter regions from five yeast species to recognize classes of cell-cycle regulated genes. Supplementary data is available at http://noble.gs.washington.edu/proj/pkernel. 1

5 0.092977688 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm

Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer

Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lankriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constraint quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps for automatic model selection, improving the interpretability of the learning result and works for hundred thousands of examples or hundreds of kernels to be combined. 1

6 0.092027955 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning

7 0.089446411 185 nips-2005-Subsequence Kernels for Relation Extraction

8 0.085784383 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables

9 0.076031506 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis

10 0.075721607 105 nips-2005-Large-Scale Multiclass Transduction

11 0.072141513 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories

12 0.067930624 77 nips-2005-From Lasso regression to Feature vector machine

13 0.067264363 1 nips-2005-AER Building Blocks for Multi-Layer Multi-Chip Neuromorphic Vision Systems

14 0.066935688 69 nips-2005-Fast Gaussian Process Regression using KD-Trees

15 0.062308516 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

16 0.06203175 151 nips-2005-Pattern Recognition from One Example by Chopping

17 0.059686914 153 nips-2005-Policy-Gradient Methods for Planning

18 0.057553817 78 nips-2005-From Weighted Classification to Policy Search

19 0.056649819 70 nips-2005-Fast Information Value for Graphical Models

20 0.056523543 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.185), (1, 0.054), (2, -0.021), (3, -0.001), (4, -0.02), (5, 0.064), (6, 0.051), (7, 0.212), (8, 0.103), (9, 0.075), (10, 0.033), (11, -0.144), (12, -0.049), (13, 0.001), (14, -0.085), (15, 0.056), (16, -0.028), (17, -0.128), (18, -0.002), (19, -0.002), (20, 0.037), (21, -0.138), (22, -0.02), (23, -0.15), (24, 0.077), (25, -0.02), (26, -0.051), (27, -0.152), (28, 0.036), (29, 0.096), (30, 0.014), (31, -0.135), (32, 0.014), (33, 0.065), (34, 0.031), (35, 0.042), (36, -0.007), (37, -0.106), (38, -0.053), (39, 0.044), (40, -0.025), (41, -0.018), (42, 0.029), (43, 0.034), (44, 0.146), (45, 0.009), (46, -0.108), (47, -0.089), (48, -0.02), (49, -0.059)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96150249 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

Author: Jun Suzuki, Hideki Isozaki

Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when limited small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enable us to use larger sub-structures effectively. The proposed method, in order to execute efficiently, can be embedded into an original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem in the conventional method and compare the performance of a conventional method to that of the proposed method.

2 0.76499403 103 nips-2005-Kernels for gene regulatory regions

Author: Jean-philippe Vert, Robert Thurman, William S. Noble

Abstract: We describe a hierarchy of motif-based kernels for multiple alignments of biological sequences, particularly suitable to process regulatory regions of genes. The kernels incorporate progressively more information, with the most complex kernel accounting for a multiple alignment of orthologous regions, the phylogenetic tree relating the species, and the prior knowledge that relevant sequence patterns occur in conserved motif blocks. These kernels can be used in the presence of a library of known transcription factor binding sites, or de novo by iterating over all k-mers of a given length. In the latter mode, a discriminative classifier built from such a kernel not only recognizes a given class of promoter regions, but as a side effect simultaneously identifies a collection of relevant, discriminative sequence motifs. We demonstrate the utility of the motif-based multiple alignment kernels by using a collection of aligned promoter regions from five yeast species to recognize classes of cell-cycle regulated genes. Supplementary data is available at http://noble.gs.washington.edu/proj/pkernel. 1

3 0.66875321 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore

Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.

4 0.6087938 185 nips-2005-Subsequence Kernels for Relation Extraction

Author: Raymond J. Mooney, Razvan C. Bunescu

Abstract: We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach. 1

5 0.52709186 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm

Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer

Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lankriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constraint quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps for automatic model selection, improving the interpretability of the learning result and works for hundred thousands of examples or hundreds of kernels to be combined. 1

6 0.45832095 77 nips-2005-From Lasso regression to Feature vector machine

7 0.44870728 105 nips-2005-Large-Scale Multiclass Transduction

8 0.44690949 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables

9 0.42165434 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning

10 0.41503865 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression

11 0.4057143 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning

12 0.40442204 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

13 0.37925321 69 nips-2005-Fast Gaussian Process Regression using KD-Trees

14 0.3721678 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

15 0.36625507 71 nips-2005-Fast Krylov Methods for N-Body Learning

16 0.36418262 11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection

17 0.35271272 125 nips-2005-Message passing for task redistribution on sparse graphs

18 0.34539193 62 nips-2005-Efficient Estimation of OOMs

19 0.34332404 121 nips-2005-Location-based activity recognition

20 0.34294614 196 nips-2005-Two view learning: SVM-2K, Theory and Practice


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.055), (9, 0.296), (10, 0.022), (27, 0.041), (31, 0.072), (34, 0.13), (39, 0.023), (41, 0.02), (50, 0.011), (55, 0.034), (67, 0.012), (69, 0.042), (73, 0.042), (77, 0.011), (88, 0.083), (91, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76318634 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

Author: Jun Suzuki, Hideki Isozaki

Abstract: This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when limited small sub-structures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enable us to use larger sub-structures effectively. The proposed method, in order to execute efficiently, can be embedded into an original kernel calculation process by using sub-structure mining algorithms. Experiments on real NLP tasks confirm the problem in the conventional method and compare the performance of a conventional method to that of the proposed method.

2 0.70472276 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines

Author: Yoshua Bengio, Olivier Delalleau, Nicolas L. Roux

Abstract: We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smoothness prior – with similarity between examples expressed with a local kernel – are sensitive to the curse of dimensionality, or more precisely to the variability of the target. Our discussion covers supervised, semisupervised and unsupervised learning algorithms. These algorithms are found to be local in the sense that crucial properties of the learned function at x depend mostly on the neighbors of x in the training set. This makes them sensitive to the curse of dimensionality, well studied for classical non-parametric statistical learning. We show in the case of the Gaussian kernel that when the function to be learned has many variations, these algorithms require a number of training examples proportional to the number of variations, which could be large even though there may exist short descriptions of the target function, i.e. their Kolmogorov complexity may be low. This suggests that there exist non-local learning algorithms that at least have the potential to learn about such structured but apparently complex functions (because locally they have many variations), while not using very specific prior domain knowledge. 1

3 0.53655934 184 nips-2005-Structured Prediction via the Extragradient Method

Author: Ben Taskar, Simon Lacoste-Julian, Michael I. Jordan

Abstract: We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm. 1

4 0.53373879 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

Author: John D. Lafferty, Pradeep K. Ravikumar

Abstract: We present a family of approximation techniques for probabilistic graphical models, based on the use of graphical preconditioners developed in the scientific computing literature. Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. Experiments are presented that compare the new approximation schemes to variational methods. 1

5 0.53226525 78 nips-2005-From Weighted Classification to Policy Search

Author: Doron Blatt, Alfred O. Hero

Abstract: This paper proposes an algorithm to convert a T -stage stochastic decision problem with a continuous state space to a sequence of supervised learning problems. The optimization problem associated with the trajectory tree and random trajectory methods of Kearns, Mansour, and Ng, 2000, is solved using the Gauss-Seidel method. The algorithm breaks a multistage reinforcement learning problem into a sequence of single-stage reinforcement learning subproblems, each of which is solved via an exact reduction to a weighted-classification problem that can be solved using off-the-self methods. Thus the algorithm converts a reinforcement learning problem into simpler supervised learning subproblems. It is shown that the method converges in a finite number of steps to a solution that cannot be further improved by componentwise optimization. The implication of the proposed algorithm is that a plethora of classification methods can be applied to find policies in the reinforcement learning problem. 1

6 0.51820296 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification

7 0.51646525 144 nips-2005-Off-policy Learning with Options and Recognizers

8 0.51621121 50 nips-2005-Convex Neural Networks

9 0.51368171 47 nips-2005-Consistency of one-class SVM and related algorithms

10 0.51362038 105 nips-2005-Large-Scale Multiclass Transduction

11 0.51094115 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

12 0.50994152 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification

13 0.50960821 178 nips-2005-Soft Clustering on Graphs

14 0.50925726 114 nips-2005-Learning Rankings via Convex Hull Separation

15 0.50897962 195 nips-2005-Transfer learning for text classification

16 0.5087893 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

17 0.50855541 23 nips-2005-An Application of Markov Random Fields to Range Sensing

18 0.50397032 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables

19 0.50253129 77 nips-2005-From Lasso regression to Feature vector machine

20 0.50184542 98 nips-2005-Infinite latent feature models and the Indian buffet process