jmlr jmlr2008 jmlr2008-9 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Falk-Florian Henrich, Klaus Obermayer
Abstract: We introduce a computationally feasible, “constructive” active learning method for binary classification. The learning algorithm is initially formulated for separable classification problems, for a hyperspherical data space with constant data density, and for great spheres as classifiers. In order to reduce computational complexity the version space is restricted to spherical simplices and learning proceeds by subdividing the edges of maximal length. We show that this procedure optimally reduces a tight upper bound on the generalization error. The method is then extended to other separable classification problems using products of spheres as data spaces and isometries induced by charts of the sphere. An upper bound is provided for the probability of disagreement between classifiers (hence the generalization error) for non-constant data densities on the sphere. The emphasis of this work lies on providing mathematically exact performance estimates for active learning strategies. Keywords: active learning, spherical subdivision, error bounds, simplex halving
Reference: text
sentIndex sentText sentNum sentScore
1 The learning algorithm is initially formulated for separable classification problems, for a hyperspherical data space with constant data density, and for great spheres as classifiers. [sent-8, score-0.275]
2 In order to reduce computational complexity the version space is restricted to spherical simplices and learning proceeds by subdividing the edges of maximal length. [sent-9, score-0.394]
3 The method is then extended to other separable classification problems using products of spheres as data spaces and isometries induced by charts of the sphere. [sent-11, score-0.482]
4 Keywords: active learning, spherical subdivision, error bounds, simplex halving. [sent-14, score-0.818]
5 The article is organized as follows: After introducing the geometric setup of active learning for binary classification (Section 2), we formulate a learning method for data distributed on the unit sphere Sn ⊂ Rn+1 (Section 3). [sent-53, score-0.377]
6 Thereafter, we extend the basic algorithm to a broader class of separable, binary classification problems using isometries induced by charts of the sphere and products of hyperspheres (Sections 4, 5). [sent-54, score-0.472]
7 Upper bounds are provided for the generalization error for nonconstant data densities for binary classification problems on the sphere (Section 6) and for linear classifiers without bias in Rn (Section 7). [sent-56, score-0.394]
8 However, this is not the case, because Rn, or any submanifold therein, can be embedded into Sn, the n-sphere, by the inverse of stereographic projection (see Section 4). [sent-64, score-0.283]
9 The use of closed hemispheres as classifiers is appropriate for spherical data, since hemispheres are the direct analog to half spaces in Euclidean geometry. [sent-92, score-0.433]
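As an illustrative sketch (the function name and interface are not taken from the paper), a closed hemisphere with pole c ∈ Sn classifies a point x ∈ Sn by the sign of the Euclidean inner product, the direct spherical analog of a linear classifier without bias:

```python
import numpy as np

def classify(c, x):
    """Hemisphere classifier on the sphere (sketch): a point x in S^n is
    assigned the sign of its inner product with the classifier's pole c,
    the spherical analog of a linear classifier without bias."""
    return np.sign(np.dot(c, x))
```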
10 Since the total volume V := ∫_{Sn} ω of the unit sphere with respect to its canonical Riemannian metric and volume form is not equal to one, we have to normalize: dG(c1, c2) = (1/V) ∫_{D(c1,c2)} ω, which is a fixed multiple of the spherical distance d(c1, c2) between the classifiers. [sent-99, score-0.518]
11 Geometrically, V is a convex spherical polytope, a high-dimensional generalization of a spherical polygon whose edges are segments of great circles and whose (n − 1)-dimensional facets are segments of (n − 1)-dimensional great spheres within Sn. [sent-108, score-0.893]
12 This results in a set of admissible classifiers S ⊂ Sn which is an equilateral simplex on Sn . [sent-118, score-0.5]
13 Select one of the edges of maximal length of the current simplex S. [sent-120, score-0.637]
14 Replace the old simplex S by the part in direction of u, that is, the simplex which has as vertices m as well as all vertices of S with the exception of that end point e of the chosen longest edge for which ⟨u, e⟩ < 0. [sent-129, score-1.135]
15 Figure 1 illustrates one iteration of the simplex algorithm on the sphere S 2 . [sent-134, score-0.657]
16 [Figure 1: The drawing shows one iteration of the simplex algorithm for the two-dimensional case, that is, for spherical triangles on the unit sphere S2; the labels mark the simplex vertices, the longest edge, its midpoint m, and the cut normal u.] [sent-135, score-1.062]
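A minimal Python sketch of one such iteration under the conventions above; the array layout, the oracle interface, and the SVD-based computation of the cut normal u are illustrative assumptions (the authors report a C++ implementation):

```python
import numpy as np

def subdivision_step(simplex, oracle):
    """One iteration of the spherical simplex algorithm (sketch).

    simplex : (n+1, n+1) array; each row is a unit vector in R^{n+1},
              a vertex of the current spherical simplex in classifier space S^n.
    oracle  : callable mapping a query point u in S^n to its label +1 or -1.
    """
    k = simplex.shape[0]
    # 1. pick an edge of maximal spherical length arccos<v_i, v_j>
    best, (i, j) = -1.0, (0, 1)
    for a in range(k):
        for b in range(a + 1, k):
            d = np.arccos(np.clip(simplex[a] @ simplex[b], -1.0, 1.0))
            if d > best:
                best, (i, j) = d, (a, b)
    # 2. spherical midpoint m of that edge
    m = simplex[i] + simplex[j]
    m /= np.linalg.norm(m)
    # 3. normal u of the hyperplane H spanned by m and the remaining vertices;
    #    u is the point whose label is queried
    others = np.delete(simplex, [i, j], axis=0)
    basis = np.vstack([others, m])
    u = np.linalg.svd(basis)[2][-1]      # unit vector orthogonal to H
    u *= oracle(u)                       # orient u by the queried label
    # 4. keep the sub-simplex on the labeled side: replace the end point e
    #    of the longest edge with <u, e> < 0 by the midpoint m
    new_simplex = simplex.copy()
    new_simplex[i if u @ simplex[i] < 0 else j] = m
    return new_simplex
```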
17 The worst case generalization error after each iteration can be computed by evaluating the maximum spherical distance of the chosen classifier to the vertices of the spherical simplex. [sent-142, score-0.672]
18 To be explicit, the following statement holds: Proposition 3 If S is the current simplex from the simplex algorithm (see Definition 2) with vertices v1 , . [sent-143, score-0.953]
19 Moreover, if c ∈ S denotes the center of mass, then dG(c∗, c) ≤ max_i d(c, vi) = max_i arccos⟨c, vi⟩ is a tight and attainable upper bound for the generalization error. [sent-148, score-0.428]
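A short sketch of how this bound can be evaluated, assuming the simplex is stored as an array of unit vertex vectors as in the iteration sketch above (the function name is illustrative):

```python
import numpy as np

def center_of_mass_bound(simplex):
    """Center of mass of a spherical simplex (normalized vertex sum) and the
    attainable bound max_i arccos<c, v_i> on the generalization error (sketch)."""
    c = simplex.sum(axis=0)
    c /= np.linalg.norm(c)
    bound = max(np.arccos(np.clip(c @ v, -1.0, 1.0)) for v in simplex)
    return c, bound
```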
20 If all elements of the simplex are admissible classifiers, the bound is tight. [sent-151, score-0.443]
21 Clearly, the maximal edge length of the simplex S converges to zero. [sent-153, score-0.649]
22 In Appendix A, we derive O((n+1)^3) as a rough complexity estimate for one iteration of the simplex algorithm (see Definition 2). [sent-154, score-0.443]
23 Another question concerning the convergence of the algorithm is: How many iterations are needed until the maximum edge length of the initial simplex starts to drop? [sent-155, score-0.572]
24 Proposition 4 Let S be the initial equilateral simplex from the simplex algorithm (see Definition 2). [sent-157, score-0.943]
25 The easiest method to obtain isometries from the n-sphere to other data spaces is to consider charts of the sphere together with the induced metric. [sent-162, score-0.419]
26 Being isometries, they preserve the geometry, and any generalization bounds derived for the sphere can be applied without modifications. [sent-163, score-0.318]
27 The stereographic projection σN : Sn \ {N} → Rn, (x1 , . [sent-168, score-0.283]
28 Hence, we can identify Sn \ {N} with Rn. [Figure 2: The stereographic projection S2 \ {N} → R2 is a diffeomorphism from the sphere with the north pole removed to the plane; the labels mark the north pole N, a point on the sphere, the plane of projection (R2), and its image under projection.] [sent-179, score-1.302]
29 If one considers a ray from the north pole to some point on the plane, the stereographic projection will map the intersection point of this ray with the sphere onto its intersection point with the plane. [sent-181, score-0.656]
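A sketch of the standard stereographic projection from the north pole N = (0, ..., 0, 1) and its inverse; the concrete formulas are the textbook ones and are assumed here rather than quoted from the truncated equation above:

```python
import numpy as np

def stereographic(x):
    """Stereographic projection S^n \\ {N} -> R^n from N = (0, ..., 0, 1):
    (x_1, ..., x_{n+1}) maps to (x_1, ..., x_n) / (1 - x_{n+1})."""
    return x[:-1] / (1.0 - x[-1])

def stereographic_inv(y):
    """Inverse map R^n -> S^n \\ {N}; |y| -> infinity approaches the north pole."""
    s = y @ y
    return np.append(2.0 * y, s - 1.0) / (s + 1.0)
```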
30 When viewed under stereographic projection, our spherical classifiers fall in three categories: If the boundary of the hemisphere, which is a great (n − 1)-sphere within Sn, contains the north pole, its projection is a hyperplane through the origin in Rn. [sent-190, score-0.626]
31 The equatorial great sphere {x ∈ Sn ⊂ Rn+1 | xn+1 = 0} is projected onto Sn−1 ⊂ Rn. [sent-191, score-0.299]
32 All other great spheres become spheres intersecting Sn−1 ⊂ Rn orthogonally. [sent-192, score-0.369]
33 Hence, any data which is separable by these classes of hypersurfaces in Rn is separable by great spheres on Sn and vice versa. [sent-193, score-0.374]
34 [Figure 3: Illustration of the gnomonic projection for the case S2 ⊃ {x3 > 0} → R2; the labels mark the plane of projection (R2), the north pole N, a point on the sphere, and its image under projection.] [sent-207, score-1.272]
35 Their intersection points with the sphere are mapped to the corresponding intersection points with the plane. [sent-210, score-0.284]
36 The gnomonic projection distorts lengths as well as angles, but maps great circles to straight lines. [sent-211, score-0.315]
37 As was the case with the stereographic chart, gnomonic projection is an isometry from the upper half-sphere to (Rn, g). [sent-217, score-0.456]
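For comparison, a sketch of the gnomonic (central) projection of the upper half-sphere and its inverse; again the concrete formulas are the standard ones and are assumptions here:

```python
import numpy as np

def gnomonic(x):
    """Gnomonic projection of the upper half-sphere {x_{n+1} > 0} onto R^n:
    project along the ray from the origin; great circles map to straight lines."""
    return x[:-1] / x[-1]

def gnomonic_inv(y):
    """Inverse: lift y in R^n back onto the upper half-sphere."""
    return np.append(y, 1.0) / np.sqrt(y @ y + 1.0)
```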
38 Products of Spheres We now extend the simplex algorithm (see Definition 2) to other data manifolds and other sets of classifiers using a simple product construction. [sent-235, score-0.58]
39 The data manifold is a product M = (Sn1, g1) × · · · × (Snk, gk) (Equation 4), where each factor (Snj, gj) is a unit sphere of dimension nj with its standard metric gj. [sent-240, score-0.307]
40 This leads to the extended spherical simplex algorithm: Definition 6 (Extended simplex algorithm). [sent-260, score-1.137]
41 For each factor Snj, create an initial simplex using a random orthogonal matrix whose columns are interpreted as a basis for Rnj+1. [sent-262, score-0.443]
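A sketch of this initialization; using the QR decomposition of a Gaussian matrix to obtain a random orthogonal matrix is an implementation choice, not taken from the paper:

```python
import numpy as np

def initial_simplex(n, seed=None):
    """Equilateral initial simplex on S^n: the columns of a random orthogonal
    (n+1) x (n+1) matrix serve as the n+1 vertices; every edge then has
    spherical length pi/2 (sketch)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n + 1, n + 1)))
    return q.T          # rows are orthonormal unit vectors in R^{n+1}
```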
42 Find one of the edges of maximal length of each factor of the current simplex product S. [sent-265, score-0.695]
43 Replace the old simplex product S by the product of the parts in direction of u j . [sent-277, score-0.559]
44 Proposition 7 If S is the current product simplex from the extended simplex algorithm (see Definition 6) with maximal edge lengths d1 , . [sent-288, score-1.148]
45 The complexity estimate for one iteration of the extended simplex algorithm is O((n1 + 1)^3 + . [sent-294, score-0.443]
46 This estimate can be deduced directly from the complexity analysis of the simplex algorithm given in Appendix A. [sent-302, score-0.443]
47 For each factor in Equation 4, we may choose one of stereographic or gnomonic projection. [sent-305, score-0.342]
48 At each step of the extended simplex algorithm, the version space is a hypercube of equal edge length l. [sent-345, score-0.572]
49 The Riemannian volume form belonging to the metric g on M induces a volume form on Sn which, in general, is not equal to the spherical volume form ω. [sent-354, score-0.542]
50 c∈S Therefore, the simplex algorithm (see Definition 2) will still converge, as it reduces an upper bound of the generalization error. [sent-361, score-0.543]
51 Nevertheless, we can force a simple upper bound by assuming the deviation of the induced volume form from spherical volume to be small: sup |1 − f (x)| < ε. [sent-363, score-0.445]
52 Inserting the above upper bound into Proposition 3 we obtain Proposition 9: Let S be the current simplex from the simplex algorithm (see Definition 2) with vertices v1 , . [sent-371, score-1.077]
53 If c∗ ∈ Sn denotes the unknown true classifier, the generalization error of c is bounded by dG(c, c∗) ≤ ((1 + ε) Vol(Sn) / (π Vol(Sn))) · max_{i,j} d(vi, vj), where d(vi, vj) denotes the spherical distance of the vertices. [sent-375, score-0.354]
54 The usefulness of the above proposition depends on how much the scaled volume form f ω deviates from the canonical spherical volume ω. [sent-376, score-0.55]
55 The generalization distance is given by dG(c1, c2) = ∫_{D(c1,c2)} f dx, which can be rewritten as an integral over {s ∈ Sn−1 | Fs ⊂ D(c1, c2)}; here dx, dr, ds denote the canonical volume forms, and different fibers Fs have different mass in the sense that the map Sn−1 → R, s ↦ ∫_{Fs} f(s, ·) dr, is not constant. [sent-392, score-0.304]
56 Consequently, all results derived for the spherical simplex algorithm (see Definition 2) apply, including the hard upper bounds on the generalization error. [sent-402, score-0.834]
57 Proposition 11: If S is the current simplex from the simplex algorithm (see Definition 2) with vertices v1 , . [sent-405, score-0.953]
58 Let us denote this center by p∗ ∈ V ⊂ Sn−1 , where V ⊂ Sn−1 is the version space (a spherical polytope, see Section 3) on the hypersphere. [sent-416, score-0.303]
59 Motivated by these insights, they propose the following pool-based strategy for selecting an unlabeled data point x ∈ Sn−1 to be labeled: Choose x such that the (spherical) distance from the (n − 2)-dimensional great sphere X := {s ∈ Sn−1 | ⟨x, s⟩ = 0} to p∗ is minimal. [sent-418, score-0.304]
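A sketch of this selection rule; the spherical distance from a point p∗ to the great sphere {s | ⟨x, s⟩ = 0} equals |π/2 − arccos⟨x, p∗⟩|, which is what the code minimizes over a pool of candidates (the pool interface and function name are illustrative, not the cited authors' implementation):

```python
import numpy as np

def select_query(pool, p_star):
    """Pool-based selection (sketch): choose the unlabeled x whose decision
    great sphere X = {s : <x, s> = 0} lies closest to the current center p_star."""
    cos = np.clip(pool @ p_star, -1.0, 1.0)           # pool: (m, n) unit rows
    dist_to_X = np.abs(np.pi / 2.0 - np.arccos(cos))  # spherical point-to-X distance
    return pool[np.argmin(dist_to_X)]
```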
60 After the data point x is labeled, the version space will be cut into two pieces along the great sphere X ⊂ Sn−1 . [sent-419, score-0.304]
61 The goal of this strategy is to reduce the volume of the spherical polytope V as quickly as possible. [sent-420, score-0.37]
62 To date, no closed formula is known for the volume of an n-spherical simplex, let alone of a spherical polytope; hence, there is no way of computing the exact volume of a version space on a hypersphere. [sent-423, score-0.601]
63 The SVM algorithm works with the Euclidean scalar product on Rn, and therefore implicitly assumes the canonical Riemannian metric and volume form on the unit sphere. [sent-427, score-0.311]
64 Experimental Results The following results were obtained from a C++ implementation of the simplex algorithm (see Definition 2). [sent-430, score-0.443]
65 The numerically most sensitive operation of the algorithm is the computation of a normal vector u ∈ Sn to the hyperplane H ⊂ Rn+1 whose intersection I = H ∩ Sn with Sn cuts the current simplex S into two pieces. [sent-431, score-0.478]
66 Select the basis of H given by the midpoint of the longest edge and all vertices of the simplex excluding the end points of the longest edge. [sent-435, score-0.71]
67 In order to test the performance of the simplex algorithm on spheres of different dimensions a series of numerical experiments was conducted, where the following quantities were measured. [sent-441, score-0.602]
68 Let (v1 , . . . , vn+1) denote the vertices of the current simplex S of the simplex algorithm; the maximal edge length max_{i,j} d(vi, vj) is an upper bound on the generalization error, regardless of which point of the simplex is chosen as the learned classifier. [sent-445, score-1.702]
69 This yields a tight upper bound on the generalization error if we choose the center of mass as the learned classifier. [sent-448, score-0.286]
70 Test data were sampled (i) from a uniform density on the sphere and (ii) from an aspherical density with two distinct “clusters” at opposite poles. [sent-450, score-0.426]
71 The aspherical density was constructed by mapping the uniform distribution from an open parameter cuboid onto the sphere using n-spherical coordinates. [sent-451, score-0.34]
72 Figure 4 illustrates the relation between a sample drawn from this density on the sphere S2 and its stereographic projection onto the plane R2. [sent-452, score-0.598]
73 Any density with two peaks at p1, p2 ∈ Rn can be identified with a density on Sn with peaks at opposite poles: Firstly, we apply a translation to move the midpoint of the line p1 p2 to the origin. [sent-454, score-0.296]
74 Now inverse stereographic projection will map the peaks onto opposite poles. [sent-463, score-0.363]
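A sketch of this construction; the rescaling that places the two peaks at unit distance from the origin (so that their images under inverse stereographic projection become antipodal) is an assumption about the elided normalization, and stereographic_inv refers to the projection sketch given in Section 4 above:

```python
import numpy as np

def lift_two_peak_density(points, p1, p2):
    """Map data in R^n with density peaks at p1, p2 onto S^n so that the peaks
    land at opposite (antipodal) poles (sketch): translate the midpoint of the
    segment p1 p2 to the origin, rescale so the peaks sit at unit distance,
    then apply inverse stereographic projection."""
    mid = 0.5 * (p1 + p2)
    scale = 1.0 / np.linalg.norm(p1 - mid)
    shifted = (points - mid) * scale
    return np.array([stereographic_inv(y) for y in shifted])
```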
75 The above quantities were computed for each step of the simplex algorithm. [sent-469, score-0.443]
76 The simplex algorithm is initialized with an equilateral simplex. [sent-486, score-0.5]
77 During the first learning steps, the center of mass moves towards those vertices whose adjacent edges are cut already. [sent-487, score-0.293]
78 The simplex becomes a “thin pyramid” with small base, and the following drop in the plots then corresponds to a cut of a line connecting the apex to the base. [sent-489, score-0.482]
79 If the data is drawn from the spherical distribution, the approximate generalization error changes smoothly with the number of selected training data, and its variance is very small. [sent-494, score-0.315]
80 Nevertheless, the numerical experiments show that the spherical simplex algorithm performs well even in the case of non-uniform densities. [sent-496, score-0.694]
81 So far, we restricted the experiments to single spheres instead of products, because the simulation of the extended simplex algorithm on M = S n1 × . [sent-498, score-0.602]
82 We implemented the extended simplex algorithm on the n-torus. [sent-518, score-0.443]
83 Using this mapping, we can visualize the iterations of the extended simplex algorithm on von Mises distributed data. [sent-521, score-0.494]
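A sketch of a standard embedding of the 2-torus into R3 that can serve as such a visualization mapping; the radii below are illustrative choices, not values from the paper:

```python
import numpy as np

def torus_embedding(s, t, R=2.0, r=1.0):
    """Embed the flat torus S^1 x S^1, parametrized by angles (s, t),
    into R^3 for visualization (sketch)."""
    return np.array([(R + r * np.cos(s)) * np.cos(t),
                     (R + r * np.cos(s)) * np.sin(t),
                     r * np.sin(s)])
```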
84 Figure 9 depicts the stereographic projection of a sample drawn from the von Mises distribution together with the projected classifier in R2. [sent-523, score-0.368]
85 The resulting values were averaged over 1,000 runs. [Figure 8: The extended simplex algorithm on the 2-torus.] [sent-528, score-0.443]
86 The meaning of the nested regions is the following (light to dark): positively classified area of the true classifier, version space after initialization (step one of the extended simplex algorithm), version space after first iteration, version space after second iteration. [sent-531, score-0.472]
87 Figure 9: The image of a data sample from two superposed von Mises distributions on the two-dimensional torus under the stereographic projection (defined in Section 4) to R2. [sent-532, score-0.425]
88 Therefore, the initialization of the extended simplex algorithm yields a classifier whose error is far smaller than the average generalization error of its spherical counterpart. [sent-544, score-0.758]
89 The algorithm was first formulated for the generic case of a binary classification problem, where data lies on an n-dimensional hypersphere and where both classes are separable using (n − 1)-dimensional great spheres as classifiers. [sent-551, score-0.343]
90 The purpose of this appendix is to give a more detailed analysis of the complexity of the simplex algorithm (see Definition 2) as well as a proof of Proposition 4. [sent-569, score-0.443]
91 The edge lengths d(vi, vj) := arccos(⟨vi, vj⟩) between vertices vi, vj of the simplex must be computed in order to determine which edge is to be cut next. [sent-572, score-0.856]
92 After step 2, the current simplex is cut by a plane through the midpoint m of some edge (a, b). [sent-574, score-0.651]
93 Since the computational complexity of the other steps is negligible we obtain O((n + 1)^3) as a rough complexity estimate for one iteration of the simplex algorithm (see Definition 2). [sent-582, score-0.443]
94 The computation of the center of mass in step seven amounts to adding up all vertex vectors of the current simplex and normalizing their sum to length one. [sent-586, score-0.66]
95 We now restate and prove Proposition 4 from Section 3: Proposition 12: Let S be the initial equilateral simplex from the simplex algorithm (see Definition 2). [sent-588, score-0.943]
96 Since the degree of each vertex is n, each vertex of the initial simplex is an end point of an edge of full length. [sent-593, score-0.611]
97 Now the spherical law of cosines tells us that all edges not adjacent to e have length π/2. [sent-603, score-0.368]
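One way to verify this for the orthonormal initial vertices is a direct inner-product computation (a sketch of the step, not the paper's full argument): with m the normalized midpoint of the subdivided edge (vi, vj) and vk any other vertex,

$$\cos d(m, v_k) = \left\langle \frac{v_i + v_j}{\lVert v_i + v_j \rVert},\, v_k \right\rangle = 0 \quad\Longrightarrow\quad d(m, v_k) = \frac{\pi}{2},$$

since the initial vertices are pairwise orthogonal.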
98 It can be covered by two charts, stereographic projection from the north and south pole. [sent-623, score-0.324]
99 [Figure 11: A subdivision of a spherical simplex on S3 ⊂ R4 under stereographic projection (see Section 4).] [sent-628, score-1.011]
100 We use this equality in Section 4 to compute the metric in stereographic and gnomonic coordinates. [sent-636, score-0.396]
wordName wordTfidf (topN-words)
[('simplex', 0.443), ('sn', 0.341), ('spherical', 0.251), ('sphere', 0.214), ('stereographic', 0.205), ('riemannian', 0.19), ('vol', 0.184), ('spheres', 0.159), ('bermayer', 0.148), ('enrich', 0.148), ('gnomonic', 0.137), ('pherical', 0.137), ('ubdivision', 0.137), ('active', 0.124), ('isometries', 0.114), ('charts', 0.091), ('hemispheres', 0.091), ('torus', 0.091), ('proposition', 0.088), ('dxn', 0.08), ('hemisphere', 0.08), ('subsimplex', 0.08), ('manifolds', 0.079), ('volume', 0.079), ('projection', 0.078), ('edge', 0.078), ('mises', 0.078), ('maximal', 0.077), ('densities', 0.076), ('arccos', 0.069), ('mass', 0.069), ('aspherical', 0.068), ('hypersphere', 0.068), ('vertices', 0.067), ('edges', 0.066), ('separable', 0.065), ('tight', 0.065), ('classi', 0.064), ('generalization', 0.064), ('ers', 0.06), ('density', 0.058), ('product', 0.058), ('equilateral', 0.057), ('metric', 0.054), ('products', 0.053), ('canonical', 0.053), ('peaks', 0.052), ('center', 0.052), ('length', 0.051), ('great', 0.051), ('von', 0.051), ('vi', 0.051), ('lengths', 0.049), ('fs', 0.048), ('chart', 0.048), ('midpoint', 0.048), ('pole', 0.048), ('ber', 0.046), ('subdivisions', 0.046), ('earning', 0.045), ('vertex', 0.045), ('plane', 0.043), ('er', 0.043), ('disagreement', 0.041), ('geodesic', 0.041), ('north', 0.041), ('bounds', 0.04), ('attainable', 0.04), ('polytope', 0.04), ('cut', 0.039), ('tp', 0.039), ('distance', 0.039), ('unit', 0.039), ('constructive', 0.038), ('tong', 0.037), ('longest', 0.037), ('upper', 0.036), ('intersection', 0.035), ('henrich', 0.034), ('hypersurfaces', 0.034), ('subdivide', 0.034), ('subdivision', 0.034), ('volumina', 0.034), ('projected', 0.034), ('labeled', 0.033), ('det', 0.033), ('tangent', 0.033), ('manifold', 0.033), ('curves', 0.033), ('nk', 0.032), ('smooth', 0.032), ('dr', 0.032), ('euclidean', 0.032), ('balcan', 0.029), ('positively', 0.029), ('freund', 0.029), ('xn', 0.028), ('scalar', 0.028), ('opposite', 0.028), ('query', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 9 jmlr-2008-Active Learning by Spherical Subdivision
Author: Falk-Florian Henrich, Klaus Obermayer
Abstract: We introduce a computationally feasible, “constructive” active learning method for binary classification. The learning algorithm is initially formulated for separable classification problems, for a hyperspherical data space with constant data density, and for great spheres as classifiers. In order to reduce computational complexity the version space is restricted to spherical simplices and learning procedes by subdividing the edges of maximal length. We show that this procedure optimally reduces a tight upper bound on the generalization error. The method is then extended to other separable classification problems using products of spheres as data spaces and isometries induced by charts of the sphere. An upper bound is provided for the probability of disagreement between classifiers (hence the generalization error) for non-constant data densities on the sphere. The emphasis of this work lies on providing mathematically exact performance estimates for active learning strategies. Keywords: active learning, spherical subdivision, error bounds, simplex halving
2 0.11638141 69 jmlr-2008-Non-Parametric Modeling of Partially Ranked Data
Author: Guy Lebanon, Yi Mao
Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive computationally efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. A bias-variance analysis and an experimental study demonstrate the applicability of the proposed method. Keywords: ranked data, partially ordered sets, kernel smoothing
3 0.06474939 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine
Author: Ja-Yong Koo, Yoonkyung Lee, Yuwon Kim, Changyi Park
Abstract: The support vector machine has been successful in a variety of applications. Also on the theoretical front, statistical properties of the support vector machine have been studied quite extensively with a particular attention to its Bayes risk consistency under some conditions. In this paper, we study somewhat basic statistical properties of the support vector machine yet to be investigated, namely the asymptotic behavior of the coefficients of the linear support vector machine. A Bahadur type representation of the coefficients is established under appropriate conditions, and their asymptotic normality and statistical variability are derived on the basis of the representation. These asymptotic results do not only help further our understanding of the support vector machine, but also they can be useful for related statistical inferences. Keywords: asymptotic normality, Bahadur representation, classification, convexity lemma, Radon transform
4 0.059107691 75 jmlr-2008-Optimal Solutions for Sparse Principal Component Analysis
Author: Alexandre d'Aspremont, Francis Bach, Laurent El Ghaoui
Abstract: Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all target numbers of non zero coefficients, with total complexity O(n3 ), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n3 ) per pattern. We discuss applications in subset selection and sparse recovery and show on artificial examples and biological data that our algorithm does provide globally optimal solutions in many cases. Keywords: PCA, subset selection, sparse eigenvalues, sparse recovery, lasso
5 0.056628909 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
Author: Xianchao Xie, Zhi Geng
Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until none subset can be decomposed further. In our approach, search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how the recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning
6 0.055234812 52 jmlr-2008-Learning from Multiple Sources
7 0.053337272 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds
8 0.052104395 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
9 0.050806522 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs (Special Topic on Causality)
10 0.050644491 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
11 0.049157292 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers
12 0.043634217 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
13 0.043520637 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
14 0.039257489 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
15 0.037978396 7 jmlr-2008-A Tutorial on Conformal Prediction
16 0.037112687 96 jmlr-2008-Visualizing Data using t-SNE
17 0.036244899 57 jmlr-2008-Manifold Learning: The Price of Normalization
18 0.035772499 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
19 0.035612646 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
20 0.035004381 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
topicId topicWeight
[(0, 0.197), (1, 0.022), (2, -0.017), (3, -0.059), (4, 0.034), (5, -0.057), (6, 0.082), (7, -0.044), (8, 0.007), (9, -0.119), (10, 0.013), (11, -0.016), (12, -0.101), (13, 0.126), (14, 0.172), (15, 0.133), (16, 0.054), (17, 0.016), (18, 0.063), (19, 0.105), (20, -0.037), (21, -0.175), (22, -0.259), (23, -0.204), (24, -0.022), (25, -0.05), (26, 0.173), (27, -0.095), (28, 0.103), (29, -0.372), (30, 0.165), (31, -0.015), (32, -0.046), (33, 0.118), (34, -0.013), (35, 0.02), (36, 0.018), (37, 0.051), (38, 0.031), (39, -0.137), (40, 0.002), (41, -0.017), (42, -0.113), (43, 0.137), (44, -0.009), (45, 0.055), (46, -0.023), (47, 0.072), (48, 0.075), (49, 0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.94175678 9 jmlr-2008-Active Learning by Spherical Subdivision
Author: Falk-Florian Henrich, Klaus Obermayer
Abstract: We introduce a computationally feasible, “constructive” active learning method for binary classification. The learning algorithm is initially formulated for separable classification problems, for a hyperspherical data space with constant data density, and for great spheres as classifiers. In order to reduce computational complexity the version space is restricted to spherical simplices and learning procedes by subdividing the edges of maximal length. We show that this procedure optimally reduces a tight upper bound on the generalization error. The method is then extended to other separable classification problems using products of spheres as data spaces and isometries induced by charts of the sphere. An upper bound is provided for the probability of disagreement between classifiers (hence the generalization error) for non-constant data densities on the sphere. The emphasis of this work lies on providing mathematically exact performance estimates for active learning strategies. Keywords: active learning, spherical subdivision, error bounds, simplex halving
2 0.71938974 69 jmlr-2008-Non-Parametric Modeling of Partially Ranked Data
Author: Guy Lebanon, Yi Mao
Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive computationally efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. A bias-variance analysis and an experimental study demonstrate the applicability of the proposed method. Keywords: ranked data, partially ordered sets, kernel smoothing
3 0.305181 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine
Author: Ja-Yong Koo, Yoonkyung Lee, Yuwon Kim, Changyi Park
Abstract: The support vector machine has been successful in a variety of applications. Also on the theoretical front, statistical properties of the support vector machine have been studied quite extensively with a particular attention to its Bayes risk consistency under some conditions. In this paper, we study somewhat basic statistical properties of the support vector machine yet to be investigated, namely the asymptotic behavior of the coefficients of the linear support vector machine. A Bahadur type representation of the coefficients is established under appropriate conditions, and their asymptotic normality and statistical variability are derived on the basis of the representation. These asymptotic results do not only help further our understanding of the support vector machine, but also they can be useful for related statistical inferences. Keywords: asymptotic normality, Bahadur representation, classification, convexity lemma, Radon transform
4 0.27164707 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
Author: Vojtěch Franc, Bogdan Savchynskyy
Abstract: The max-sum classifier predicts n-tuple of labels from n-tuple of observable variables by maximizing a sum of quality functions defined over neighbouring pairs of labels and observable variables. Predicting labels as MAP assignments of a Random Markov Field is a particular example of the max-sum classifier. Learning parameters of the max-sum classifier is a challenging problem because even computing the response of such classifier is NP-complete in general. Estimating parameters using the Maximum Likelihood approach is feasible only for a subclass of max-sum classifiers with an acyclic structure of neighbouring pairs. Recently, the discriminative methods represented by the perceptron and the Support Vector Machines, originally designed for binary linear classifiers, have been extended for learning some subclasses of the max-sum classifier. Besides the max-sum classifiers with the acyclic neighbouring structure, it has been shown that the discriminative learning is possible even with arbitrary neighbouring structure provided the quality functions fulfill some additional constraints. In this article, we extend the discriminative approach to other three classes of max-sum classifiers with an arbitrary neighbourhood structure. We derive learning algorithms for two subclasses of max-sum classifiers whose response can be computed in polynomial time: (i) the max-sum classifiers with supermodular quality functions and (ii) the max-sum classifiers whose response can be computed exactly by a linear programming relaxation. Moreover, we show that the learning problem can be approximately solved even for a general max-sum classifier. Keywords: max-xum classifier, hidden Markov networks, support vector machines
5 0.24013366 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
Author: Andreas Maurer
Abstract: A method is introduced to learn and represent similarity with linear operators in kernel induced Hilbert spaces. Transferring error bounds for vector valued large-margin classifiers to the setting of Hilbert-Schmidt operators leads to dimension free bounds on a risk functional for linear representations and motivates a regularized objective functional. Minimization of this objective is effected by a simple technique of stochastic gradient descent. The resulting representations are tested on transfer problems in image processing, involving plane and spatial geometric invariants, handwritten characters and face recognition. Keywords: learning similarity, similarity, transfer learning
6 0.2257742 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers
7 0.21263893 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
8 0.2121373 52 jmlr-2008-Learning from Multiple Sources
9 0.2113533 75 jmlr-2008-Optimal Solutions for Sparse Principal Component Analysis
10 0.20309147 96 jmlr-2008-Visualizing Data using t-SNE
11 0.19062395 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
12 0.18746699 25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers
13 0.18643683 12 jmlr-2008-Algorithms for Sparse Linear Classifiers in the Massive Data Setting
14 0.18318684 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds
15 0.17730644 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
16 0.17699923 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs (Special Topic on Causality)
17 0.17607245 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
18 0.17534012 64 jmlr-2008-Model Selection in Kernel Based Regression using the Influence Function (Special Topic on Model Selection)
19 0.17139609 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
20 0.16294751 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
topicId topicWeight
[(0, 0.03), (5, 0.015), (40, 0.041), (54, 0.047), (58, 0.045), (60, 0.014), (61, 0.022), (66, 0.06), (76, 0.028), (78, 0.024), (88, 0.103), (89, 0.373), (92, 0.044), (94, 0.055), (99, 0.02)]
simIndex simValue paperId paperTitle
1 0.77560037 19 jmlr-2008-Bouligand Derivatives and Robustness of Support Vector Machines for Regression
Author: Andreas Christmann, Arnout Van Messem
Abstract: We investigate robustness properties for a broad class of support vector machines with non-smooth loss functions. These kernel methods are inspired by convex risk minimization in infinite dimensional Hilbert spaces. Leading examples are the support vector machine based on the ε-insensitive loss function, and kernel based quantile regression based on the pinball loss function. Firstly, we propose with the Bouligand influence function (BIF) a modification of F.R. Hampel’s influence function. The BIF has the advantage of being positive homogeneous which is in general not true for Hampel’s influence function. Secondly, we show that many support vector machines based on a Lipschitz continuous loss function and a bounded kernel have a bounded BIF and are thus robust in the sense of robust statistics based on influence functions. Keywords: Bouligand derivatives, empirical risk minimization, influence function, robustness, support vector machines
same-paper 2 0.75018078 9 jmlr-2008-Active Learning by Spherical Subdivision
Author: Falk-Florian Henrich, Klaus Obermayer
Abstract: We introduce a computationally feasible, “constructive” active learning method for binary classification. The learning algorithm is initially formulated for separable classification problems, for a hyperspherical data space with constant data density, and for great spheres as classifiers. In order to reduce computational complexity the version space is restricted to spherical simplices and learning procedes by subdividing the edges of maximal length. We show that this procedure optimally reduces a tight upper bound on the generalization error. The method is then extended to other separable classification problems using products of spheres as data spaces and isometries induced by charts of the sphere. An upper bound is provided for the probability of disagreement between classifiers (hence the generalization error) for non-constant data densities on the sphere. The emphasis of this work lies on providing mathematically exact performance estimates for active learning strategies. Keywords: active learning, spherical subdivision, error bounds, simplex halving
3 0.37295395 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
Author: Hsuan-Tien Lin, Ling Li
Abstract: Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble through embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with the stump kernel or the perceptron kernel performs similarly to SVM with the Gaussian radial basis function kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice. Keywords: ensemble learning, boosting, support vector machine, kernel
4 0.3697482 57 jmlr-2008-Manifold Learning: The Price of Normalization
Author: Yair Goldberg, Alon Zakai, Dan Kushnir, Ya'acov Ritov
Abstract: We analyze the performance of a class of manifold-learning algorithms that find their output by minimizing a quadratic form under some normalization constraints. This class consists of Locally Linear Embedding (LLE), Laplacian Eigenmap, Local Tangent Space Alignment (LTSA), Hessian Eigenmaps (HLLE), and Diffusion maps. We present and prove conditions on the manifold that are necessary for the success of the algorithms. Both the finite sample case and the limit case are analyzed. We show that there are simple manifolds in which the necessary conditions are violated, and hence the algorithms cannot recover the underlying manifolds. Finally, we present numerical results that demonstrate our claims. Keywords: dimensionality reduction, manifold learning, Laplacian eigenmap, diffusion maps, locally linear embedding, local tangent space alignment, Hessian eigenmap
5 0.36712962 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
6 0.36708498 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
7 0.36582553 96 jmlr-2008-Visualizing Data using t-SNE
8 0.36372373 58 jmlr-2008-Max-margin Classification of Data with Absent Features
9 0.36056554 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
10 0.36025834 86 jmlr-2008-SimpleMKL
11 0.35947791 56 jmlr-2008-Magic Moments for Structured Output Prediction
12 0.35936108 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
13 0.35860312 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
14 0.35801506 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
15 0.35694525 83 jmlr-2008-Robust Submodular Observation Selection
16 0.35672808 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
17 0.35510984 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
18 0.35383058 21 jmlr-2008-Classification with a Reject Option using a Hinge Loss
19 0.35313877 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers
20 0.35275236 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers