nips nips2003 nips2003-81 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anuj Srivastava, Washington Mio, Xiuwen Liu, Eric Klassen
Abstract: We present a geometric approach to statistical shape analysis of closed curves in images. The basic idea is to specify a space of closed curves satisfying given constraints, and exploit the differential geometry of this space to solve optimization and inference problems. We demonstrate this approach by: (i) defining and computing statistics of observed shapes, (ii) defining and learning a parametric probability model on shape space, and (iii) designing a binary hypothesis test on this space. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We present a geometric approach to statistical shape analysis of closed curves in images. [sent-9, score-0.833]
2 The basic idea is to specify a space of closed curves satisfying given constraints, and exploit the differential geometry of this space to solve optimization and inference problems. [sent-10, score-0.604]
3 We demonstrate this approach by: (i) defining and computing statistics of observed shapes, (ii) defining and learning a parametric probability model on shape space, and (iii) designing a binary hypothesis test on this space. [sent-11, score-0.633]
4 On the other hand, planar curves that represent contours of objects have been studied independently for a long time. [sent-15, score-0.47]
5 An emerging opinion in the vision community is that global features such as shapes of contours should also be taken into account for the successful detection and recognition of objects. [sent-16, score-0.607]
6 A common approach to analyzing curves in images is to treat them as level sets of functions, and algorithms involving such active contours are usually governed by partial differential equations (PDEs) driven by appropriate data terms and smoothness penalties (see, for example, [10]). [sent-17, score-0.414]
7 Regularized curve evolutions and region-based active contours offer alternatives in similar frameworks. [sent-18, score-0.285]
8 In this approach, a fundamental element is a space of closed curves, with additional constraints to impose equivalence of shapes under rotation, translation, and scale. [sent-21, score-0.577]
9 We exploit the geometry of these spaces using elements such as tangents, normals, geodesics and gradient flows, to solve optimization and statistical inference problems for a variety of cost functions and probability densities. [sent-22, score-0.493]
10 This framework differs from those employed in previous works on “geometry-driven flows” [8] in the sense that here both the geometry of the curves and the geometry of spaces of curves are utilized. [sent-23, score-0.623]
11 Here the dynamics of active contours is described by vector fields on spaces of curves. [sent-24, score-0.251]
12 It is important to emphasize that a shape space is usually a non-linear, infinite-dimensional manifold, and its elements are the individual curves of interest. [sent-25, score-0.73]
13 Several interesting applications can be addressed in this formulation, including: 1) Efficient deformations between any two curves are generated by geodesic paths connecting the elements they represent in the shape space. [sent-26, score-1.125]
14 Geodesic lengths also provide a natural metric for shape comparisons. [sent-27, score-0.493]
15 2) Given a set of curves (or shapes), one can define the concepts of mean and covariance using geodesic paths, and thus develop statistical frameworks for studying shapes. [sent-28, score-0.687]
16 Furthermore, one can define probabilities on a shape space to perform curve (or shape) classification via hypothesis testing. [sent-29, score-0.673]
17 The study of the structure of the shape space provides new insights into, and solutions for, problems involving dynamic contours and quantitative shape analysis. [sent-32, score-1.21]
18 Once the constraints are imposed in definitions of shape spaces, the resulting solutions automatically satisfy these constraints. [sent-33, score-0.524]
19 The main strength of this approach is its exploitation of the differential geometry of the shape space. [sent-35, score-0.709]
20 For instance, a geodesic or gradient flow Xt of an energy function E can be generated as a solution of an ordinary differential equation of the type dXt/dt = Π(∇E(Xt)), (1) where Π denotes an appropriate projection onto a tangent space. [sent-36, score-0.665]
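To make equation (1) concrete, the following sketch (ours, not the authors' code; the energy, its gradient, the step size, and the grid are illustrative assumptions) runs a simple Euler discretization of the projected flow, with the projection Π implemented by removing the components of the gradient along {1, cos θ, sin θ}, the normal directions identified later in Section 3.

    import numpy as np

    def projected_flow(X0, grad_E, Pi, step=0.01, n_steps=200, descend=True):
        """Euler discretization of dX_t/dt = Pi(grad E(X_t)).
        Set descend=True to flow downhill on E instead of uphill."""
        sign = -1.0 if descend else 1.0
        X = np.array(X0, dtype=float)
        for _ in range(n_steps):
            X = X + sign * step * Pi(grad_E(X), X)
        return X

    def Pi(f, theta):
        """Remove the components of f along span{1, cos theta, sin theta}."""
        B = np.stack([np.ones_like(theta), np.cos(theta), np.sin(theta)], axis=1)
        coef, *_ = np.linalg.lstsq(B, f, rcond=None)   # L2 projection onto span(B)
        return f - B @ coef

    # Illustrative use: flow a perturbed direction function toward the unit circle
    # theta(s) = s.  (For brevity this sketch omits the re-projection of theta
    # back onto the constrained space D described in Section 3.)
    n = 256
    s = (np.arange(n) + 0.5) * 2 * np.pi / n
    theta0 = s + 0.2 * np.sin(2 * s)                  # a perturbed circle
    E_grad = lambda th: th - s                        # gradient of E(th) = ||th - s||^2 / 2
    theta_final = projected_flow(theta0, E_grad, Pi)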
21 The geometry of shape space also enables us to derive statistical elements: probability measures, means and covariances; these quantities have rarely been treated in previous studies. [sent-38, score-0.76]
22 In shape extraction, the main focus in past works has been on solving PDEs driven by image features under smoothness constraints, and not on the statistical analysis of shapes of curves. [sent-39, score-1.125]
23 Geodesic paths, or piecewise geodesic paths, have also seen only limited use in the past. [sent-40, score-0.666]
24 One drawback is that curve evolutions cannot handle certain changes in topology, whose handling is one of the key features of level-set methods; a shape space is purposely set up so that curves cannot branch into several components. [sent-42, score-0.811]
25 Despite these limitations, the proposed methodology provides powerful algorithms for the analysis of planar curves as demonstrated by the examples presented later. [sent-44, score-0.293]
26 This paper is laid out as follows: Section 2 studies geometric representations of constrained curves as elements of a shape space. [sent-46, score-0.739]
27 Geometric analysis tools on the shape space are presented in Section 3. [sent-47, score-0.634]
28 Section 4 provides examples of statistical analysis on the shape space, while Section 5 concludes the paper with a brief summary. [sent-48, score-0.577]
29 2 Representations of Shapes In this paper we restrict the discussion to curves in R2, although curves in R3 can be handled similarly. [sent-49, score-0.329]
30 Consider the problem of studying shapes of contours or silhouettes of imaged objects as closed, planar curves in R2 , parametrized by arc length. [sent-56, score-1.002]
31 Since shapes are invariant to rigid motions (rotations and translations) and uniform scaling, a shape representation should be insensitive to these transformations. [sent-57, score-0.898]
32 Scaling can be resolved by fixing the length of α to be 2π, and translations by representing curves via their direction functions. [sent-58, score-0.27]
33 Thus, we represent curves by direction functions satisfying the average-π and closure conditions; we call this space of direction functions D. [sent-62, score-0.467]
34 Summarizing, D is the subspace of L2 consisting of all (direction) functions satisfying the constraints (1/2π) ∫_0^{2π} θ(s) ds = π ; ∫_0^{2π} cos(θ(s)) ds = 0 ; ∫_0^{2π} sin(θ(s)) ds = 0 . [sent-63, score-0.332]
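As a quick numerical illustration of these three constraints (a sketch under our own discretization assumptions, not code from the paper), one can check membership in D for a sampled direction function; θ(s) = s, the direction function of the unit circle, satisfies all three.

    import numpy as np

    def in_D(theta, tol=1e-6):
        """Check the average-pi and closure constraints for a direction function
        sampled at the midpoints of a uniform grid on [0, 2*pi)."""
        n = theta.size
        ds = 2 * np.pi / n
        avg = np.sum(theta) * ds / (2 * np.pi)      # should equal pi
        close_x = np.sum(np.cos(theta)) * ds        # should equal 0 (closure in x)
        close_y = np.sum(np.sin(theta)) * ds        # should equal 0 (closure in y)
        return abs(avg - np.pi) < tol and abs(close_x) < tol and abs(close_y) < tol

    n = 512
    s = (np.arange(n) + 0.5) * 2 * np.pi / n
    print(in_D(s))                    # True: the unit circle lies in D
    print(in_D(s + 0.5 * np.sin(s)))  # False: the perturbed curve does not close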
35 To remove the variability due to this re-parametrization group, define the quotient space C ≡ D/S^1 as the space of continuous, planar shapes. [sent-68, score-0.263]
36 3 Geometric Tools for Shape Analysis The main idea in the proposed framework is to use the geometric structure of a shape space to solve optimization and statistical inference problems on these spaces. [sent-70, score-0.723]
37 Thus, we must study issues related to the differential geometry and topology of a shape space. [sent-72, score-0.713]
38 In this paper we restrict attention to the tangent and normal bundles, exponential maps, and their inverses on these spaces. [sent-73, score-0.364]
39 3.1 Tangents and Normals to Shape Space The main reason for studying the tangential and normal structures is the following: We wish to employ iterative numerical methods in the simulation of geodesic and gradient flows on the shape space. [sent-75, score-0.946]
40 At each step in the iteration, we first flow in the linear space L2 using standard methods, and then project the new point back onto the shape space using our knowledge of the normal structure. [sent-76, score-0.694]
41 It is difficult to specify the tangent spaces to D directly, because they are infinite-dimensional. [sent-78, score-0.316]
42 It can be shown that a vector f ∈ L2 is tangent to D at θ if and only if f is orthogonal to the subspace spanned by {1, sin θ, cos θ}. [sent-80, score-0.386]
43 Implicitly, the tangent space is given as: Tθ(D) = {f ∈ L2 : f ⊥ span{1, cos θ, sin θ}}. [sent-82, score-0.375]
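The following sketch (our own illustration, with the L2 inner product approximated by a uniform-grid sum) projects an arbitrary sampled function f onto Tθ(D) by removing its components along span{1, cos θ, sin θ}, and then verifies the orthogonality numerically.

    import numpy as np

    def project_to_tangent(f, theta):
        """Orthogonal L2 projection of f onto T_theta(D):
        subtract the least-squares component of f in span{1, cos theta, sin theta}."""
        B = np.stack([np.ones_like(theta), np.cos(theta), np.sin(theta)], axis=1)
        coef, *_ = np.linalg.lstsq(B, f, rcond=None)
        return f - B @ coef

    n = 256
    ds = 2 * np.pi / n
    s = (np.arange(n) + 0.5) * ds          # grid on [0, 2*pi)
    theta = s                              # the unit circle, an element of D
    f = np.cos(3 * s) + 0.7                # an arbitrary function in L2
    g = project_to_tangent(f, theta)
    for b in (np.ones(n), np.cos(theta), np.sin(theta)):
        print(np.sum(g * b) * ds)          # all approximately zero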
44 Geodesics on D are realized as exponential maps from tangent spaces to D. [sent-87, score-0.355]
45 Therefore, we adopt an iterative strategy, where in each step, we first flow infinitesimally in the prescribed tangent direction in the space L2 , and then project the end point of the path to D. [sent-89, score-0.494]
46 Next, we parallel transport the velocity vector to the new point by projecting the previous velocity orthogonally onto the tangent space of D at the new point. [sent-90, score-0.407]
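A rough sketch of one such iteration is given below (our simplification of the scheme just described; the step sizes, the Newton projection onto D, and the norm-restoring approximation of parallel transport are all our own choices): move a small distance along f in L2, project the new point back onto D along the normal directions span{1, cos θ, sin θ}, and then transport f by projecting it onto the new tangent space and restoring its L2 norm.

    import numpy as np

    def constraints(theta, ds):
        """Residuals of the average-pi and closure constraints defining D."""
        return np.array([np.sum(theta) * ds / (2 * np.pi) - np.pi,
                         np.sum(np.cos(theta)) * ds,
                         np.sum(np.sin(theta)) * ds])

    def project_to_D(theta, ds, n_newton=5):
        """Newton iterations driving the constraint residuals to zero, moving
        theta only along the normal directions {1, cos theta, sin theta}."""
        for _ in range(n_newton):
            r = constraints(theta, ds)
            if np.max(np.abs(r)) < 1e-10:
                break
            B = np.stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
            J = np.empty((3, 3))          # first-order change of each residual
            J[0] = np.sum(B, axis=1) * ds / (2 * np.pi)
            J[1] = -np.sum(np.sin(theta) * B, axis=1) * ds
            J[2] = np.sum(np.cos(theta) * B, axis=1) * ds
            theta = theta + np.linalg.solve(J, -r) @ B
        return theta

    def tangent_project(f, theta):
        """Remove the components of f along span{1, cos theta, sin theta}."""
        B = np.stack([np.ones_like(theta), np.cos(theta), np.sin(theta)], axis=1)
        coef, *_ = np.linalg.lstsq(B, f, rcond=None)
        return f - B @ coef

    def geodesic_step(theta, f, dt, ds):
        """One iteration: flow along f in L2, project back onto D, then
        transport f to the new point (project and restore its L2 norm)."""
        theta_new = project_to_D(theta + dt * f, ds)
        f_new = tangent_project(f, theta_new)
        norm = lambda h: np.sqrt(np.sum(h * h) * ds)
        if norm(f_new) > 0:
            f_new = f_new * (norm(f) / norm(f_new))
        return theta_new, f_new

    # Illustrative use on a coarse grid:
    n = 256
    ds = 2 * np.pi / n
    theta = (np.arange(n) + 0.5) * ds                  # the unit circle
    f = tangent_project(np.sin(2 * theta), theta)      # an initial direction
    for _ in range(50):
        theta, f = geodesic_step(theta, f, dt=0.02, ds=ds)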
47 A geodesic can be specified by an initial condition θ ∈ D and a direction f ∈ Tθ(D), the space of all tangent directions at θ. [sent-96, score-0.714]
48 We will denote the corresponding geodesic by Ψ(θ, t, f ), where t is the time parameter. [sent-97, score-0.333]
49 3.3 Shape Logarithms Next, we focus on the problem of finding a geodesic path between any two given shapes θ1, θ2 ∈ D. [sent-101, score-0.771]
50 The main issue is to find the appropriate direction f ∈ Tθ1(D) such that a geodesic from θ1 in that direction passes through θ2 at time t = 1. [sent-103, score-0.58]
51 One can treat the search for this direction as an optimization problem over the tangent space Tθ1 (D). [sent-105, score-0.415]
52 For any two shapes θ1 , θ2 ∈ D, we have used a shooting method to find the optimal f [4]. [sent-109, score-0.405]
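A minimal sketch of such a shooting search is shown below (our illustration, not the method of [4]: the exponential map is assumed as a black box computing Ψ(θ1, 1, f), the tangent direction is parametrized by a small number of basis coefficients, and a generic optimizer stands in for the actual shooting update).

    import numpy as np
    from scipy.optimize import minimize

    def shape_log(theta1, theta2, exp_map, basis):
        """Shooting-method sketch: search for coefficients x of f = sum_i x_i e_i
        such that the geodesic from theta1 in direction f ends near theta2 at t = 1.
        exp_map(theta, f) is an assumed black box computing Psi(theta, 1, f);
        basis is an (m, n) array of tangent basis functions sampled like theta1."""
        def miss(x):
            f = x @ basis
            return np.sum((exp_map(theta1, f) - theta2) ** 2)
        x0 = np.zeros(basis.shape[0])
        res = minimize(miss, x0, method="Nelder-Mead")
        return res.x @ basis    # the estimated initial direction f in T_theta1(D)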
53 Finally, to find the shortest path between two shapes in C, we compute the shortest geodesic connecting representatives of the given shapes in D. [sent-111, score-1.268]
54 Shown in Figure 1 are three examples of geodesic paths in C connecting given shapes. [sent-113, score-0.451]
55 Drawn in between are shapes corresponding to equally spaced points along the geodesic paths. [sent-114, score-0.738]
56 Figure 1: Top panels show examples of shapes manually extracted from the images. [sent-120, score-0.512]
57 Bottom panels show examples of evolving one shape into another via a geodesic path. [sent-121, score-0.966]
58 In each case, the leftmost shape is θ1, the rightmost shape is θ2, and intermediate shapes are equi-spaced points along the geodesic. [sent-122, score-1.043]
59 4.1 Sample Means on Shape Spaces Algorithms for finding geodesic paths on the shape space allow us to compute means and covariances in these spaces. [sent-124, score-1.01]
60 Then, define the Karcher mean of the given shapes θ1, ..., θn to be any point µ ∈ C at which the variance function V(µ), the sum of squared geodesic distances from µ to the θj, attains a local minimum. [sent-131, score-0.441]
61 However, there may be collections of shapes for which µ is not unique. [sent-134, score-0.405]
62 An iterative algorithm for finding a Karcher mean of the given shapes is described in [4]. [sent-135, score-0.441]
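The sketch below shows the generic form such an iteration takes (our outline, not the algorithm of [4]: exp_map and log_map are assumed black boxes for the geodesic exponential map Ψ and the shape logarithm of Section 3.3, and the step size is illustrative).

    import numpy as np

    def karcher_mean(shapes, exp_map, log_map, step=0.5, n_iter=50, tol=1e-6):
        """Iterate toward a Karcher mean: average the tangent directions from the
        current estimate mu to each shape, then move mu along that average."""
        mu = shapes[0]
        for _ in range(n_iter):
            g_bar = np.mean([log_map(mu, theta) for theta in shapes], axis=0)
            if np.linalg.norm(g_bar) < tol:
                break                      # mu is (numerically) a critical point of V
            mu = exp_map(mu, step * g_bar)
        return mu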
63 4.2 Shape Learning Another important problem in statistical analysis of shapes is to “learn” probability models from the observed shapes. [sent-137, score-0.528]
64 Once the shapes are clustered, we assume that elements in the same cluster are (random) samples from the same probability model, and try to learn this model. [sent-138, score-0.516]
65 These models can then be used for future Bayesian discoveries of shapes or for classification of new shapes. [sent-139, score-0.405]
66 Learning a probability model amounts to estimating a probability density function on the shape space, a task that is rather difficult to perform precisely. [sent-140, score-0.604]
67 Tangent Space: Since C is a nonlinear manifold, we impose a probability density on a tangent space rather than on C directly. [sent-143, score-0.4]
68 For a mean shape µ ∈ C, the space of all tangents to the shape space at µ, Tµ (C) ⊂ L2 , is an infinite-dimensional vector space. [sent-144, score-1.21]
69 We approximate a tangent function by a truncated Fourier series to obtain a finite-dimensional representation. [sent-149, score-0.233]
70 Let a tangent element g ∈ Tµ(C) be represented by its Fourier expansion: g(s) = Σ_{i=1}^{m} xi ei(s), for a large positive integer m. [sent-151, score-0.29]
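As a small illustration of this finite-dimensional representation (our own sketch; the particular basis used in the paper may differ), the code below expands a sampled tangent function in the first m elements of a standard real Fourier basis on [0, 2π) and reconstructs it from the coefficients.

    import numpy as np

    def fourier_basis(m, s):
        """First m elements of a real Fourier basis on [0, 2*pi), orthonormal in L2."""
        ds = 2 * np.pi / s.size
        basis = [np.ones_like(s) / np.sqrt(2 * np.pi)]
        k = 1
        while len(basis) < m:
            basis.append(np.cos(k * s) / np.sqrt(np.pi))
            if len(basis) < m:
                basis.append(np.sin(k * s) / np.sqrt(np.pi))
            k += 1
        return np.stack(basis), ds

    n, m = 512, 20
    s = (np.arange(n) + 0.5) * 2 * np.pi / n
    E, ds = fourier_basis(m, s)
    g = np.exp(np.cos(s)) - 2.0                     # some tangent function g(s)
    x = np.array([np.sum(g * e) * ds for e in E])   # coefficients x_i = <g, e_i>
    g_approx = x @ E                                # truncated reconstruction of g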
71 Estimation of µ and K from the observed shapes follows the usual procedures. [sent-158, score-0.435]
72 Computation of the mean shape µ is described in [4]. [sent-159, score-0.529]
73 Using µ and the observed shapes θj, we find the tangent vectors gj ∈ Tµ(C) such that the geodesic from µ in the direction gj passes through θj in unit time. [sent-160, score-1.168]
74 This tangent vector is actually computed via its finite-dimensional representation and results in the corresponding vector of coefficients xj . [sent-161, score-0.233]
75 The density function associated with this family of shapes is given by: h(θ; µ, K) ≡ (1 / ((2π)^{m/2} det(K)^{1/2})) exp(−(x − µ)^T K^{−1} (x − µ)/2), (3) where Ψ(µ, g, 1) = θ and g = Σ_{i=1}^{m} (xi − µi) ei(s). [sent-164, score-0.438]
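The sketch below illustrates this step in coefficient space (our illustration with made-up data; in the actual method each row of X would be the Fourier coefficient vector xj of the tangent direction from µ to an observed shape θj): estimate the mean and covariance of the coefficients and evaluate the Gaussian density of equation (3).

    import numpy as np

    def fit_gaussian(X, ridge=1e-3):
        """Sample mean and covariance of coefficient vectors, one observation per
        row of X; a small ridge term keeps K invertible when samples are few
        (in practice one may instead restrict to the principal subspace)."""
        mu = X.mean(axis=0)
        Xc = X - mu
        K = Xc.T @ Xc / (X.shape[0] - 1) + ridge * np.eye(X.shape[1])
        return mu, K

    def log_density(x, mu, K):
        """Log of the Gaussian density h of equation (3), in coefficient space."""
        m = mu.size
        diff = x - mu
        _, logdet = np.linalg.slogdet(K)
        return -0.5 * (diff @ np.linalg.solve(K, diff)
                       + logdet + m * np.log(2 * np.pi))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 20))          # stand-in for 12 coefficient vectors xj
    mu, K = fit_gaussian(X)
    print(log_density(X[0], mu, K))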
76 An example of this shape learning is shown in Figure 2. [sent-165, score-0.493]
77 The top panels show infrared pictures of tanks, followed by their extracted contours in the second row of images. [sent-166, score-0.319]
78 These contours are then used in analyzing shapes of tanks. [sent-167, score-0.573]
79 As an example, the 12 panels in the bottom left show the observed contours of a tank when viewed from a variety of angles, and we are interested in capturing this shape variation. [sent-168, score-0.797]
80 Repeating the earlier process, the mean shape is shown in the top middle panel and the eigenvalues are plotted in the bottom middle panel. [sent-169, score-0.63]
81 Twelve panels on the right show shapes generated randomly from a parametric model h(θ; µ, Σ). [sent-170, score-0.628]
82 In Figure 3 we present an interesting example of samples from three different shape models. [sent-171, score-0.529]
83 Figure 2: Top two rows: Infrared images and extracted contours of two tanks M60 and T72 at different viewing angles. [sent-184, score-0.218]
84 Bottom row: For the 12 observed M60 shapes shown on the left, the middle panels show the mean shape and the principal eigenvalues of the covariance, and the right panels show 12 random samples from the Gaussian model h(θ; µ, K). [sent-185, score-1.251]
85 4.3 Hypothesis Testing This framework of shape representations and statistical models on shape spaces has important applications in decision theory. [sent-190, score-1.116]
86 One such application is to recognize an imaged object according to the shape of its boundary. [sent-191, score-0.549]
87 Statistical analysis on shape spaces can be used to make a variety of decisions such as: Does this shape belong to a given family of shapes? [sent-192, score-1.106]
88 Do these two families of shapes have similar means and variances? [sent-193, score-0.448]
89 Given a test shape and two competing probability models, which one explains the test shape better? [sent-194, score-1.025]
90 We restrict attention to the case of binary hypothesis testing since, for multiple hypotheses, one can find the best hypothesis using a sequence of binary hypothesis tests. [sent-195, score-0.258]
91 Consider two shape families specified by their probability models: h1 and h2 . [sent-196, score-0.575]
92 For an observed shape θ ∈ C, we are interested in selecting one of the following two hypotheses: H0 : θ ∼ h1 or H1 : θ ∼ h2. [sent-197, score-0.493]
93 Let x1 be the vector of Fourier coefficients that encode the tangent direction from µ1 to θ, and x2 be the same for the direction from µ2 to θ. [sent-201, score-0.417]
94 The curved nature of the shape space C makes the analysis of this test difficult. [sent-204, score-0.586]
95 As a first-order approximation, one can write x2 ∼ N(x̄, Σ1), where x̄ is the coefficient vector of the tangent direction in Tµ2(C) that corresponds to the geodesic from µ2 to µ1. [sent-206, score-0.658]
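Given the two fitted models, a simple version of the test compares the log-densities of the observed shape under the two classes, as in the sketch below (our outline; the exact decision rule, priors, and covariance estimates used in the paper may differ — here mean1/cov1 and mean2/cov2 are the coefficient-space means and covariances estimated for the two classes).

    import numpy as np

    def log_gauss(x, mean, cov):
        """Log Gaussian density in coefficient space, as in equation (3)."""
        d = x - mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + x.size * np.log(2 * np.pi))

    def select_hypothesis(x1, x2, mean1, cov1, mean2, cov2):
        """Return 'H0' if model 1 explains the observed shape better than model 2.
        x1 and x2 are the tangent-coefficient vectors of the shape relative to the
        two class mean shapes mu1 and mu2."""
        llr = log_gauss(x1, mean1, cov1) - log_gauss(x2, mean2, cov2)
        return "H0" if llr > 0 else "H1"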
96 5 Conclusion We have presented an overview of an ambitious framework for solving optimization and inference problems on a shape space. [sent-208, score-0.527]
97 The main idea is to exploit the differential geometry of the manifold to obtain simpler solutions as compared to those obtained with PDE-based methods. [sent-209, score-0.289]
98 In particular, these ideas lead to a novel statistical theory of shapes of planar objects with powerful tools for shape analysis. [sent-211, score-1.15]
99 Learning shape models from examples using automatic shape clustering and Procrustes analysis. [sent-225, score-0.986]
100 Analysis of planar shapes using geodesic paths on shape spaces. [sent-237, score-1.424]
wordName wordTfidf (topN-words)
[('shape', 0.493), ('shapes', 0.405), ('geodesic', 0.333), ('tangent', 0.233), ('contours', 0.168), ('curves', 0.145), ('geometry', 0.125), ('planar', 0.111), ('panels', 0.107), ('geodesics', 0.101), ('karcher', 0.101), ('mio', 0.101), ('tallahassee', 0.101), ('direction', 0.092), ('fourier', 0.086), ('spaces', 0.083), ('paths', 0.082), ('florida', 0.08), ('klassen', 0.076), ('srivastava', 0.076), ('tangents', 0.076), ('hypothesis', 0.073), ('ows', 0.07), ('fl', 0.07), ('subspace', 0.067), ('evolutions', 0.066), ('normals', 0.066), ('geometric', 0.065), ('differential', 0.063), ('ds', 0.061), ('ei', 0.057), ('space', 0.056), ('covariance', 0.056), ('imaged', 0.056), ('coef', 0.053), ('normal', 0.053), ('satisfying', 0.051), ('curve', 0.051), ('anuj', 0.05), ('elastica', 0.05), ('pdes', 0.05), ('tanks', 0.05), ('prescribed', 0.05), ('tools', 0.048), ('mathematics', 0.048), ('past', 0.047), ('statistical', 0.047), ('closed', 0.046), ('objects', 0.046), ('covariances', 0.046), ('manifold', 0.045), ('infrared', 0.044), ('cos', 0.043), ('sin', 0.043), ('families', 0.043), ('velocity', 0.041), ('quotient', 0.04), ('restrict', 0.039), ('probability', 0.039), ('exponential', 0.039), ('studying', 0.039), ('impose', 0.039), ('driven', 0.038), ('det', 0.038), ('textures', 0.037), ('xt', 0.037), ('analysis', 0.037), ('mean', 0.036), ('onto', 0.036), ('samples', 0.036), ('connecting', 0.036), ('elements', 0.036), ('middle', 0.036), ('passes', 0.035), ('rotations', 0.035), ('subtracting', 0.035), ('gj', 0.035), ('ow', 0.034), ('optimization', 0.034), ('vision', 0.034), ('evolving', 0.033), ('translations', 0.033), ('density', 0.033), ('path', 0.033), ('cients', 0.032), ('parametrized', 0.032), ('topology', 0.032), ('principal', 0.031), ('constraints', 0.031), ('closure', 0.031), ('frameworks', 0.031), ('image', 0.03), ('adopt', 0.03), ('usual', 0.03), ('bottom', 0.029), ('shortest', 0.028), ('ej', 0.028), ('exploit', 0.028), ('main', 0.028), ('parametric', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 81 nips-2003-Geometric Analysis of Constrained Curves
Author: Anuj Srivastava, Washington Mio, Xiuwen Liu, Eric Klassen
Abstract: We present a geometric approach to statistical shape analysis of closed curves in images. The basic idea is to specify a space of closed curves satisfying given constraints, and exploit the differential geometry of this space to solve optimization and inference problems. We demonstrate this approach by: (i) defining and computing statistics of observed shapes, (ii) defining and learning a parametric probability model on shape space, and (iii) designing a binary hypothesis test on this space. 1
2 0.21720676 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion
Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler
Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1
3 0.19157751 53 nips-2003-Discriminating Deformable Shape Classes
Author: Salvador Ruiz-correa, Linda G. Shapiro, Marina Meila, Gabriel Berson
Abstract: We present and empirically test a novel approach for categorizing 3-D free form object shapes represented by range data . In contrast to traditional surface-signature based systems that use alignment to match specific objects, we adapted the newly introduced symbolic-signature representation to classify deformable shapes [10]. Our approach constructs an abstract description of shape classes using an ensemble of classifiers that learn object class parts and their corresponding geometrical relationships from a set of numeric and symbolic descriptors. We used our classification engine in a series of large scale discrimination experiments on two well-defined classes that share many common distinctive features. The experimental results suggest that our method outperforms traditional numeric signature-based methodologies. 1 1
4 0.12197485 85 nips-2003-Human and Ideal Observers for Detecting Image Curves
Author: Fang Fang, Daniel Kersten, Paul R. Schrater, Alan L. Yuille
Abstract: This paper compares the ability of human observers to detect target image curves with that of an ideal observer. The target curves are sampled from a generative model which specifies (probabilistically) the geometry and local intensity properties of the curve. The ideal observer performs Bayesian inference on the generative model using MAP estimation. Varying the probability model for the curve geometry enables us investigate whether human performance is best for target curves that obey specific shape statistics, in particular those observed on natural shapes. Experiments are performed with data on both rectangular and hexagonal lattices. Our results show that human observers’ performance approaches that of the ideal observer and are, in general, closest to the ideal for conditions where the target curve tends to be straight or similar to natural statistics on curves. This suggests a bias of human observers towards straight curves and natural statistics.
5 0.10191377 154 nips-2003-Perception of the Structure of the Physical World Using Unknown Multimodal Sensors and Effectors
Author: D. Philipona, J.k. O'regan, J.-p. Nadal, Olivier Coenen
Abstract: Is there a way for an algorithm linked to an unknown body to infer by itself information about this body and the world it is in? Taking the case of space for example, is there a way for this algorithm to realize that its body is in a three dimensional world? Is it possible for this algorithm to discover how to move in a straight line? And more basically: do these questions make any sense at all given that the algorithm only has access to the very high-dimensional data consisting of its sensory inputs and motor outputs? We demonstrate in this article how these questions can be given a positive answer. We show that it is possible to make an algorithm that, by analyzing the law that links its motor outputs to its sensory inputs, discovers information about the structure of the world regardless of the devices constituting the body it is linked to. We present results from simulations demonstrating a way to issue motor orders resulting in “fundamental” movements of the body as regards the structure of the physical world. 1
6 0.098390371 149 nips-2003-Optimal Manifold Representation of Data: An Information Theoretic Approach
7 0.097362712 126 nips-2003-Measure Based Regularization
8 0.07721635 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
9 0.069082543 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
10 0.062570646 42 nips-2003-Bounded Finite State Controllers
11 0.061287202 51 nips-2003-Design of Experiments via Information Theory
12 0.061002243 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data
13 0.05989426 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics
14 0.057465073 150 nips-2003-Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
15 0.057112392 114 nips-2003-Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA
16 0.051773708 115 nips-2003-Linear Dependent Dimensionality Reduction
17 0.05132363 119 nips-2003-Local Phase Coherence and the Perception of Blur
18 0.051110648 73 nips-2003-Feature Selection in Clustering Problems
19 0.050329953 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
20 0.050300859 92 nips-2003-Information Bottleneck for Gaussian Variables
topicId topicWeight
[(0, -0.2), (1, -0.044), (2, 0.022), (3, -0.035), (4, -0.099), (5, -0.004), (6, 0.044), (7, -0.006), (8, 0.088), (9, -0.1), (10, -0.062), (11, 0.058), (12, 0.16), (13, 0.099), (14, -0.028), (15, -0.045), (16, 0.125), (17, -0.067), (18, 0.177), (19, -0.168), (20, 0.087), (21, -0.025), (22, -0.015), (23, 0.065), (24, -0.035), (25, 0.118), (26, 0.197), (27, 0.016), (28, -0.129), (29, 0.254), (30, -0.121), (31, -0.228), (32, 0.151), (33, -0.113), (34, -0.029), (35, 0.143), (36, -0.043), (37, -0.019), (38, -0.038), (39, -0.149), (40, -0.024), (41, 0.007), (42, 0.073), (43, 0.153), (44, -0.075), (45, 0.036), (46, 0.113), (47, -0.086), (48, 0.006), (49, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.97904712 81 nips-2003-Geometric Analysis of Constrained Curves
Author: Anuj Srivastava, Washington Mio, Xiuwen Liu, Eric Klassen
Abstract: We present a geometric approach to statistical shape analysis of closed curves in images. The basic idea is to specify a space of closed curves satisfying given constraints, and exploit the differential geometry of this space to solve optimization and inference problems. We demonstrate this approach by: (i) defining and computing statistics of observed shapes, (ii) defining and learning a parametric probability model on shape space, and (iii) designing a binary hypothesis test on this space. 1
2 0.71368206 53 nips-2003-Discriminating Deformable Shape Classes
Author: Salvador Ruiz-correa, Linda G. Shapiro, Marina Meila, Gabriel Berson
Abstract: We present and empirically test a novel approach for categorizing 3-D free form object shapes represented by range data . In contrast to traditional surface-signature based systems that use alignment to match specific objects, we adapted the newly introduced symbolic-signature representation to classify deformable shapes [10]. Our approach constructs an abstract description of shape classes using an ensemble of classifiers that learn object class parts and their corresponding geometrical relationships from a set of numeric and symbolic descriptors. We used our classification engine in a series of large scale discrimination experiments on two well-defined classes that share many common distinctive features. The experimental results suggest that our method outperforms traditional numeric signature-based methodologies. 1 1
3 0.68820238 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion
Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler
Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1
4 0.5660069 85 nips-2003-Human and Ideal Observers for Detecting Image Curves
Author: Fang Fang, Daniel Kersten, Paul R. Schrater, Alan L. Yuille
Abstract: This paper compares the ability of human observers to detect target image curves with that of an ideal observer. The target curves are sampled from a generative model which specifies (probabilistically) the geometry and local intensity properties of the curve. The ideal observer performs Bayesian inference on the generative model using MAP estimation. Varying the probability model for the curve geometry enables us investigate whether human performance is best for target curves that obey specific shape statistics, in particular those observed on natural shapes. Experiments are performed with data on both rectangular and hexagonal lattices. Our results show that human observers’ performance approaches that of the ideal observer and are, in general, closest to the ideal for conditions where the target curve tends to be straight or similar to natural statistics on curves. This suggests a bias of human observers towards straight curves and natural statistics.
5 0.35671443 149 nips-2003-Optimal Manifold Representation of Data: An Information Theoretic Approach
Author: Denis V. Chigirev, William Bialek
Abstract: We introduce an information theoretic method for nonparametric, nonlinear dimensionality reduction, based on the infinite cluster limit of rate distortion theory. By constraining the information available to manifold coordinates, a natural probabilistic map emerges that assigns original data to corresponding points on a lower dimensional manifold. With only the information-distortion trade off as a parameter, our method determines the shape of the manifold, its dimensionality, the probabilistic map and the prior that provide optimal description of the data. 1 A simple example Some data sets may not be as complicated as they appear. Consider the set of points on a plane in Figure 1. As a two dimensional set, it requires a two dimensional density ρ(x, y) for its description. Since the data are sparse the density will be almost singular. We may use a smoothing kernel, but then the data set will be described by a complicated combination of troughs and peaks with no obvious pattern and hence no ability to generalize. We intuitively, however, see a strong one dimensional structure (a curve) underlying the data. In this paper we attempt to capture this intuition formally, through the use of the infinite cluster limit of rate distortion theory. Any set of points can be embedded in a hypersurface of any intrinsic dimensionality if we allow that hypersurface to be highly “folded.” For example, in Figure 1, any curve that goes through all the points gives a one dimensional representation. We would like to avoid such solutions, since they do not help us discover structure in the data. Looking for a simpler description one may choose to penalize the curvature term [1]. The problem with this approach is that it is not easily generalized to multiple dimensions, and requires the dimensionality of the solution as an input. An alternative approach is to allow curves of all shapes and sizes, but to send the reduced coordinates through an information bottleneck. With a fixed number of bits, position along a highly convoluted curve becomes uncertain. This will penalize curves that follow the data too closely (see Figure 1). There are several advantages to this approach. First, it removes the artificiality introduced by Hastie [2] of adding to the cost function only orthogonal errors. If we believe that data points fall out of the manifold due to noise, there is no reason to treat the projection onto the manifold as exact. Second, it does not require the dimension- 9 8 Figure 1: Rate distortion curve for a data set of 25 points (red). We used 1000 points to represent the curve which where initialized by scattering them uniformly on the plane. Note that the produced curve is well defined, one dimensional and smooth. 7 6 5 4 3 2 1 0 2 4 6 8 10 12 ality of the solution manifold as an input. By adding extra dimensions, one quickly looses the precision with which manifold points are specified (due to the fixed information bottleneck). Hence, the optimal dimension emerges naturally. This also means that the method works well in many dimensions with no adjustments. Third, the method handles sparse data well. This is important since in high dimensional spaces all data sets are sparse, i.e. they look like points in Figure 1, and the density estimation becomes impossible. Luckily, if the data are truly generated by a lower dimensional process, then density estimation in the data space is not important (from the viewpoint of prediction or any other). 
What is critical is the density of the data along the manifold (known in latent variable modeling as a prior), and our algorithm finds it naturally. 2 Latent variable models and dimensionality reduction Recently, the problem of reducing the dimensionality of a data set has received renewed attention [3,4]. The underlying idea, due to Hotelling [5], is that most of the variation in many high dimensional data sets can often be explained by a few latent variables. Alternatively, we say that rather than filling the whole space, the data lie on a lower dimensional manifold. The dimensionality of this manifold is the dimensionality of the latent space and the coordinate system on this manifold provides the latent variables. Traditional tools of principal component analysis (PCA) and factor analysis (FA) are still the most widely used methods in data analysis. They project the data onto a hyperplane, so the reduced coordinates are easy to interpret. However, these methods are unable to deal with nonlinear correlations in a data set. To accommodate nonlinearity in a data set, one has to relax the assumption that the data is modeled by a hyperplane, and allow a general low dimensional manifold of unknown shape and dimensionality. The same questions that we asked in the previous section apply here. What do we mean by requiring that “the manifold models the data well”? In the next section, we formalize this notion by defining the manifold description of data as a doublet (the shape of the manifold and the projection map). Note that we do not require the probability distribution over the manifold (known for generative models [6,7] as a prior distribution over the latent variables and postulated a priori). It is completely determined by the doublet. Nonlinear correlations in data can also be accommodated implicitly, without constructing an actual low dimensional manifold. By mapping the data from the original space to an even higher dimensional feature space, we may hope that the correlations will become linearized and PCA will apply. Kernel methods [8] allow us to do this without actually constructing an explicit map to feature space. They introduce nonlinearity through an a priori nonlinear kernel. Alternatively, autoassociative neural networks [9] force the data through a bottleneck (with an internal layer of desired dimensionality) to produce a reduced description. One of the disadvantages of these methods is that the results are not easy to interpret. Recent attempts to describe a data set with a low dimensional representation generally follow into two categories: spectral methods and density modeling methods. Spectral methods (LLE [3], ISOMAP [4], Laplacian eigenmaps [10]) give reduced coordinates of an a priori dimensionality by introducing a quadratic cost function in reduced coordinates (hence eigenvectors are solutions) that mimics the relationships between points in the original data space (geodesic distance for ISOMAP, linear reconstruction for LLE). Density modeling methods (GTM [6], GMM [7]) are generative models that try to reproduce the data with fewer variables. They require a prior and a parametric generative model to be introduced a priori and then find optimal parameters via maximum likelihood. The approach that we will take is inspired by the work of Kramer [9] and others who tried to formulate dimensionality reduction as a compression problem. 
They tried to solve the problem by building an explicit neural network encoder-decoder system which restricted the information implicitly by limiting the number of nodes in the bottleneck layer. Extending their intuition with the tools of information theory, we recast dimensionality reduction as a compression problem where the bottleneck is the information available to manifold coordinates. This allows us to define the optimal manifold description as that which produces the best reconstruction of the original data set, given that the coordinates can only be transmitted through a channel of fixed capacity. 3 Dimensionality reduction as compression Suppose that we have a data set X in a high dimensional state space RD described by a density function ρ(x). We would like to find a “simplified” description of this data set. One may do so by visualizing a lower dimensional manifold M that “almost” describes the data. If we have a manifold M and a stochastic map PM : x → PM (µ|x) to points µ on the manifold, we will say that they provide a manifold description of the data set X. Note that the stochastic map here is well justified: if a data point does not lie exactly on the manifold then we should expect some uncertainty in the estimation of the value of its latent variables. Also note that we do not need to specify the inverse (generative) map: M → RD ; it can be obtained by Bayes’ rule. The manifold description (M, PM ) is a less than faithful representation of the data. To formalize this notion we will introduce the distortion measure D(M, PM , ρ): ρ(x)PM (µ|x) x − µ 2 dD xDµ. D(M, PM , ρ) = x∈RD (1) µ∈M Here we have assumed the Euclidean distance function for simplicity. The stochastic map, PM (µ|x), together with the density, ρ(x), define a joint probability function P (M, X) that allows us to calculate the mutual information between the data and its manifold representation: I(X, M) = P (x, µ) log x∈X µ∈M P (x, µ) dD xDµ. ρ(x)PM (µ) (2) This quantity tells us how many bits (on average) are required to encode x into µ. If we view the manifold representation of X as a compression scheme, then I(X, M) tells us the necessary capacity of the channel needed to transmit the compressed data. Ideally, we would like to obtain a manifold description {M, PM (M|X)} of the data set X that provides both a low distortion D(M, PM , ρ) and a good compression (i.e. small I(X, M)). The more bits we are willing to provide for the description of the data, the more detailed a manifold that can be constructed. So there is a trade off between how faithful a manifold representation can be and how much information is required for its description. To formalize this notion we introduce the concept of an optimal manifold. DEFINITION. Given a data set X and a channel capacity I, a manifold description (M, PM (M|X)) that minimizes the distortion D(M, PM , X), and requires only information I for representing an element of X, will be called an optimal manifold M(I, X). Note that another way to define an optimal manifold is to require that the information I(M, X) is minimized while the average distortion is fixed at value D. The shape and the dimensionality of optimal manifold depends on our information resolution (or the description length that we are willing to allow). This dependence captures our intuition that for real world, multi-scale data, a proper manifold representation must reflect the compression level we are trying to achieve. 
To find the optimal manifold (M(I), PM(I) ) for a given data set X, we must solve a constrained optimization problem. Let us introduce a Lagrange multiplier λ that represents the trade off between information and distortion. Then optimal manifold M(I) minimizes the functional: F(M, PM ) = D + λI. (3) Let us parametrize the manifold M by t (presumably t ∈ Rd for some d ≤ D). The function γ(t) : t → M maps the points from the parameter space onto the manifold and therefore describes the manifold. Our equations become: D = dD x dd t ρ(x)P (t|x) x − γ(t) 2 , I = dD x dd t ρ(x)P (t|x) log P (t|x) , P (t) F(γ(t), P (t|x)) = D + λI. (4) (5) (6) Note that both information and distortion measures are properties of the manifold description doublet {M, PM (M|X)} and are invariant under reparametrization. We require the variations of the functional to vanish for optimal manifolds δF/δγ(t) = 0 and δF/δP (t|x) = 0, to obtain the following set of self consistent equations: P (t) = γ(t) = P (t|x) = Π(x) = dD x ρ(x)P (t|x), 1 dD x xρ(x)P (t|x), P (t) P (t) − 1 x−γ (t) 2 e λ , Π(x) 2 1 dd t P (t)e− λ x−γ (t) . (7) (8) (9) (10) In practice we do not have the full density ρ(x), but only a discrete number of samples. 1 So we have to approximate ρ(x) = N δ(x − xi ), where N is the number of samples, i is the sample label, and xi is the multidimensional vector describing the ith sample. Similarly, instead of using a continuous variable t we use a discrete set t ∈ {t1 , t2 , ..., tK } of K points to model the manifold. Note that in (7 − 10) the variable t appears only as an argument for other functions, so we can replace the integral over t by a sum over k = 1..K. Then P (t|x) becomes Pk (xi ),γ(t) is now γ k , and P (t) is Pk . The solution to the resulting set of equations in discrete variables (11 − 14) can be found by an iterative Blahut-Arimoto procedure [11] with an additional EM-like step. Here (n) denotes the iteration step, and α is a coordinate index in RD . The iteration scheme becomes: (n) Pk (n) γk,α = = N 1 N (n) Pk (xi ) = Π(n) (xi ) N 1 1 (n) N P k where α (11) i=1 = (n) xi,α Pk (xi ), (12) i=1 1, . . . , D, K (n) 1 (n) Pk e− λ xi −γ k 2 (13) k=1 (n) (n+1) Pk (xi ) = (n) 2 Pk 1 . e− λ xi −γ k (n) (x ) Π i (14) 0 0 One can initialize γk and Pk (xi ) by choosing K points at random from the data set and 0 letting γk = xi(k) and Pk = 1/K, then use equations (13) and (14) to initialize the 0 association map Pk (xi ). The iteration procedure (11 − 14) is terminated once n−1 n max |γk − γk | < , (15) k where determines the precision with which the manifold points are located. The above algorithm requires the information distortion cost λ = −δD/δI as a parameter. If we want to find the manifold description (M, P (M|X)) for a particular value of information I, we can plot the curve I(λ) and, because it’s monotonic, we can easily find the solution iteratively, arbitrarily close to a given value of I. 4 Evaluating the solution The result of our algorithm is a collection of K manifold points, γk ∈ M ⊂ RD , and a stochastic projection map, Pk (xi ), which maps the points from the data space onto the manifold. Presumably, the manifold M has a well defined intrinsic dimensionality d. If we imagine a little ball of radius r centered at some point on the manifold of intrinsic dimensionality d, and then we begin to grow the ball, the number of points on the manifold that fall inside will scale as rd . 
On the other hand, this will not be necessarily true for the original data set, since it is more spread out and resembles locally the whole embedding space RD . The Grassberger-Procaccia algorithm [12] captures this intuition by calculating the correlation dimension. First, calculate the correlation integral: 2 C(r) = N (N − 1) N N H(r − |xi − xj |), (16) i=1 j>i where H(x) is a step function with H(x) = 1 for x > 0 and H(x) = 0 for x < 0. This measures the probability that any two points fall within the ball of radius r. Then define 0 original data manifold representation -2 ln C(r) -4 -6 -8 -10 -12 -14 -5 -4 -3 -2 -1 0 1 2 3 4 ln r Figure 2: The semicircle. (a) N = 3150 points randomly scattered around a semicircle of radius R = 20 by a normal process with σ = 1 and the final positions of 100 manifold points. (b) Log log plot of C(r) vs r for both the manifold points (squares) and the original data set (circles). the correlation dimension at length scale r as the slope on the log log plot. dcorr (r) = d log C(r) . d log r (17) For points lying on a manifold the slope remains constant and the dimensionality is fixed, while the correlation dimension of the original data set quickly approaches that of the embedding space as we decrease the length scale. Note that the slope at large length scales always tends to decrease due to finite span of the data and curvature effects and therefore does not provide a reliable estimator of intrinsic dimensionality. 5 5.1 Examples Semi-Circle We have randomly generated N = 3150 data points scattered by a normal distribution with σ = 1 around a semi-circle of radius R = 20 (Figure 2a). Then we ran the algorithm with K = 100 and λ = 8, and terminated the iterative algorithm once the precision = 0.1 had been reached. The resulting manifold is depicted in red. To test the quality of our solution, we calculated the correlation dimension as a function of spatial scale for both the manifold points and the original data set (Figure 2b). As one can see, the manifold solution is of fixed dimensionality (the slope remains constant), while the original data set exhibits varying dimensionality. One should also note that the manifold points have dcorr (r) = 1 well into the territory where the original data set becomes two dimensional. This is what we should expect: at a given information level (in this case, I = 2.8 bits), the information about the second (local) degree of freedom is lost, and the resulting structure is one dimensional. A note about the parameters. Letting K → ∞ does not alter the solution. The information I and distortion D remain the same, and the additional points γk also fall on the semi-circle and are simple interpolations between the original manifold points. This allows us to claim that what we have found is a manifold, and not an agglomeration of clustering centers. Second, varying λ changes the information resolution I(λ): for small λ (high information rate) the local structure becomes important. At high information rate the solution undergoes 3.5 3 3 3 2.5 2.5 2 2.5 2 2 1.5 1.5 1.5 1 1 1 0.5 0.5 0 0.5 -0.5 0 0 -1 5 -0.5 -0.5 4 1 3 0.5 2 -1 -1 0 1 -0.5 0 -1 -1.5 -1.5 -1 -0.5 0 0.5 1 1.5 -1.5 -1.5 -1 -0.5 0 0.5 1 1.5 Figure 3: S-shaped sheet in 3D. (a) N = 2000 random points on a surface of an S-shaped sheet in 3D. (b) Normal noise added. XY-plane projection of the data. (c) Optimal manifold points in 3D, projected onto an XY plane for easy visualization. 
a phase transition, and the resulting manifold becomes two dimensional to take into account the local structure. Alternatively, if we take λ → ∞, the cost of information rate becomes very high and the whole manifold collapses to a single point (becomes zero dimensional). 5.2 S-surface Here we took N = 2000 points covering an S-shaped sheet in three dimensions (Figure 3a), and then scattered the position of each point by adding Gaussian noise. The resulting manifold is difficult to visualize in three dimensions, so we provided its projection onto an XY plane for an illustrative purpose (Figure 3b). After running our algorithm we have recovered the original structure of the manifold (Figure 3c). 6 Discussion The problem of finding low dimensional manifolds in high dimensional data requires regularization to avoid hgihly folded, Peano curve like solutions which are low dimensional in the mathematical sense but fail to capture our geometric intuition. Rather than constraining geometrical features of the manifold (e.g., the curvature) we have constrained the mutual information between positions on the manifold and positions in the original data space, and this is invariant to all invertible coordinate transformations in either space. This approach enforces “smoothness” of the manifold only implicitly, but nonetheless seems to work. Our information theoretic approach has considerable generality relative to methods based on specific smoothing criteria, but requires a separate algorithm, such as LLE, to give the manifold points curvilinear coordinates. For data points not in the original data set, equations (9-10) and (13-14) provide the mapping onto the manifold. Eqn. (7) gives the probability distribution over the latent variable, known in the density modeling literature as “the prior.” The running time of the algorithm is linear in N . This compares favorably with other methods and makes it particularly attractive for very large data sets. The number of manifold points K usually is chosen as large as possible, given the computational constraints, to have a dense sampling of the manifold. However, a value of K << N is often sufficient, since D(λ, K) → D(λ) and I(λ, K) → I(λ) approach their limits rather quickly (the convergence improves for large λ and deteriorates for small λ). In the example of a semi-circle, the value of K = 30 was sufficient at the compression level of I = 2.8 bits. In general, the threshold value for K scales exponentially with the latent dimensionality (rather than with the dimensionality of the embedding space). The choice of λ depends on the desired information resolution, since I depends on λ. Ideally, one should plot the function I(λ) and then choose the region of interest. I(λ) is a monotonically decreasing function, with the kinks corresponding to phase transitions where the optimal manifold abruptly changes its dimensionality. In practice, we may want to run the algorithm only for a few choices of λ, and we would like to start with values that are most likely to correspond to a low dimensional latent variable representation. In this case, as a rule of thumb, we choose λ smaller, but on the order of the largest linear dimension (i.e. λ/2 ∼ Lmax ). The dependence of the optimal manifold M(I) on information resolution reflects the multi-scale nature of the data and should not be taken as a shortcoming. References [1] Bregler, C. & Omohundro, S. (1995) Nonlinear image interpolation using manifold learning. Advances in Neural Information Processing Systems 7. MIT Press. 
[2] Hastie, T. & Stuetzle, W. (1989) Principal curves. Journal of the American Statistical Association, 84(406), 502-516. [3] Roweis, S. & Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326. [4] Tenenbaum, J., de Silva, V., & Langford, J. (2000) A global geometric framework for nonlinear dimensionality reduction. Science, 290 , 2319–2323. [5] Hotelling, H. (1933) Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24:417-441,498-520. [6] Bishop, C., Svensen, M. & Williams, C. (1998) GTM: The generative topographic mapping. Neural Computation,10, 215–234. [7] Brand, M. (2003) Charting a manifold. Advances in Neural Information Processing Systems 15. MIT Press. [8] Scholkopf, B., Smola, A. & Muller K-R. (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299-1319. [9] Kramer, M. (1991) Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37, 233-243. [10] Belkin M. & Niyogi P. (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373-1396. [11] Blahut, R. (1972) Computation of channel capacity and rate distortion function. IEEE Trans. Inform. Theory, IT-18, 460-473. [12] Grassberger, P., & Procaccia, I. (1983) Characterization of strange attractors. Physical Review Letters, 50, 346-349.
7 0.33408508 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics
8 0.32052475 126 nips-2003-Measure Based Regularization
9 0.29399192 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
10 0.26059705 168 nips-2003-Salient Boundary Detection using Ratio Contour
11 0.24814585 66 nips-2003-Extreme Components Analysis
12 0.24061893 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
13 0.23468819 120 nips-2003-Locality Preserving Projections
14 0.23318268 88 nips-2003-Image Reconstruction by Linear Programming
15 0.22324376 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models
16 0.22066034 43 nips-2003-Bounded Invariance and the Formation of Place Fields
17 0.2204553 30 nips-2003-Approximability of Probability Distributions
18 0.21550892 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
19 0.21113299 98 nips-2003-Kernel Dimensionality Reduction for Supervised Learning
20 0.2076678 51 nips-2003-Design of Experiments via Information Theory
topicId topicWeight
[(0, 0.093), (11, 0.031), (29, 0.014), (30, 0.03), (35, 0.055), (53, 0.118), (66, 0.015), (69, 0.018), (71, 0.082), (74, 0.235), (76, 0.051), (85, 0.044), (91, 0.133), (99, 0.017)]
simIndex simValue paperId paperTitle
1 0.9206897 14 nips-2003-A Nonlinear Predictive State Representation
Author: Matthew R. Rudary, Satinder P. Singh
Abstract: Predictive state representations (PSRs) use predictions of a set of tests to represent the state of controlled dynamical systems. One reason why this representation is exciting as an alternative to partially observable Markov decision processes (POMDPs) is that PSR models of dynamical systems may be much more compact than POMDP models. Empirical work on PSRs to date has focused on linear PSRs, which have not allowed for compression relative to POMDPs. We introduce a new notion of tests which allows us to define a new type of PSR that is nonlinear in general and allows for exponential compression in some deterministic dynamical systems. These new tests, called e-tests, are related to the tests used by Rivest and Schapire [1] in their work with the diversity representation, but our PSR avoids some of the pitfalls of their representation—in particular, its potential to be exponentially larger than the equivalent POMDP. 1
2 0.91268188 84 nips-2003-How to Combine Expert (and Novice) Advice when Actions Impact the Environment?
Author: Daniela Pucci de Farias, Nimrod Megiddo
Abstract: The so-called “experts algorithms” constitute a methodology for choosing actions repeatedly, when the rewards depend both on the choice of action and on the unknown current state of the environment. An experts algorithm has access to a set of strategies (“experts”), each of which may recommend which action to choose. The algorithm learns how to combine the recommendations of individual experts so that, in the long run, for any fixed sequence of states of the environment, it does as well as the best expert would have done relative to the same sequence. This methodology may not be suitable for situations where the evolution of states of the environment depends on past chosen actions, as is usually the case, for example, in a repeated non-zero-sum game. A new experts algorithm is presented and analyzed in the context of repeated games. It is shown that asymptotically, under certain conditions, it performs as well as the best available expert. This algorithm is quite different from previously proposed experts algorithms. It represents a shift from the paradigms of regret minimization and myopic optimization to consideration of the long-term effect of a player’s actions on the opponent’s actions or the environment. The importance of this shift is demonstrated by the fact that this algorithm is capable of inducing cooperation in the repeated Prisoner’s Dilemma game, whereas previous experts algorithms converge to the suboptimal non-cooperative play. 1
3 0.88554132 129 nips-2003-Minimising Contrastive Divergence in Noisy, Mixed-mode VLSI Neurons
Author: Hsin Chen, Patrice Fleury, Alan F. Murray
Abstract: This paper presents VLSI circuits with continuous-valued probabilistic behaviour realized by injecting noise into each computing unit(neuron). Interconnecting the noisy neurons forms a Continuous Restricted Boltzmann Machine (CRBM), which has shown promising performance in modelling and classifying noisy biomedical data. The Minimising-Contrastive-Divergence learning algorithm for CRBM is also implemented in mixed-mode VLSI, to adapt the noisy neurons’ parameters on-chip. 1
same-paper 4 0.84097868 81 nips-2003-Geometric Analysis of Constrained Curves
Author: Anuj Srivastava, Washington Mio, Xiuwen Liu, Eric Klassen
Abstract: We present a geometric approach to statistical shape analysis of closed curves in images. The basic idea is to specify a space of closed curves satisfying given constraints, and exploit the differential geometry of this space to solve optimization and inference problems. We demonstrate this approach by: (i) defining and computing statistics of observed shapes, (ii) defining and learning a parametric probability model on shape space, and (iii) designing a binary hypothesis test on this space. 1
5 0.67726535 113 nips-2003-Learning with Local and Global Consistency
Author: Dengyong Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, Bernhard Schölkopf
Abstract: We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data. 1
6 0.67528498 107 nips-2003-Learning Spectral Clustering
7 0.67113382 78 nips-2003-Gaussian Processes in Reinforcement Learning
8 0.66452652 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data
9 0.66001105 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
10 0.65986365 55 nips-2003-Distributed Optimization in Adaptive Networks
11 0.65921324 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models
12 0.65876263 143 nips-2003-On the Dynamics of Boosting
13 0.65774935 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays
14 0.65698057 126 nips-2003-Measure Based Regularization
15 0.65518051 48 nips-2003-Convex Methods for Transduction
16 0.65499496 30 nips-2003-Approximability of Probability Distributions
17 0.65484464 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
18 0.65389359 73 nips-2003-Feature Selection in Clustering Problems
19 0.65284008 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
20 0.65208626 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms