nips nips2006 nips2006-95 knowledge-graph by maker-knowledge-mining

95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions


Source: pdf

Author: Christian Walder, Olivier Chapelle, Bernhard Schölkopf

Abstract: We consider the problem of constructing a function whose zero set is to represent a surface, given sample points with surface normal vectors. The contributions include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable properties previously only associated with fully supported bases, and show equivalence to a Gaussian process with modified covariance function. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We consider the problem of constructing a function whose zero set is to represent a surface, given sample points with surface normal vectors. [sent-4, score-0.284]

2 The contributions include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable properties previously only associated with fully supported bases, and show equivalence to a Gaussian process with modified covariance function. [sent-5, score-0.559]

3 We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. [sent-6, score-0.443]

4 We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data. [sent-7, score-0.082]

5 1 Introduction The problem of reconstructing a surface from a set of points frequently arises in computer graphics. [sent-8, score-0.229]

6 Numerous methods of sampling physical surfaces are now available, including laser scanners, optical triangulation systems and mechanical probing methods. [sent-9, score-0.102]

7 Inferring a surface from millions of points sampled with noise is a non-trivial task however, for which a variety of methods have been proposed. [sent-10, score-0.229]

8 The class of implicit or level set surface representations is a rather large one; however, other methods have also been suggested – for a review see [1]. [sent-11, score-0.342]

9 The first approach uses compactly supported kernel functions (we define and discuss kernel functions in Section 2), leading to fast algorithms that are easy to implement [4]. [sent-15, score-0.482]

10 As noted in [5], compactly supported basis functions “yield surfaces with many undesirable artifacts in addition to the lack of extrapolation across holes”. [sent-17, score-0.541]

11 The second means of overcoming the aforementioned computational problem does not suffer from these problems, however, as demonstrated by the FastRBF™ algorithm [5], which uses the Fast Multipole Method (FMM) [8] to overcome the computational problems of non-compactly supported kernels. [sent-19, score-0.054]

12 We believe that by applying them in a different manner, compactly supported basis functions can lead to high quality results, and the present work is an attempt to bring the reader to the same conclusion. [sent-21, score-0.439]

13 In Section 3 we introduce a new technique for regularising such basis functions. Figure 1: (a) Rendered implicit surface model of “Lucy”, constructed from 14 million points with normals. [sent-22, score-0.685]

14 (c) A black dot at each of the 364,982 compactly supported basis function centres which, along with the corresponding dilations and magnitudes, define the implicit. [sent-24, score-0.497]

15 2 Implicit Surface Fitting by Regularised Regression Here we discuss the use of regularised regression [2] for the problem of implicit surface fitting. [sent-29, score-0.558]

16 In Section 2.1 we motivate and introduce a clean and direct means of making use of normal vectors. [sent-31, score-0.055]

17 Section 2.3 discusses the choice of regulariser (and associated kernel function), as well as the associated computational problems that we overcome in Section 3. [sent-36, score-0.275]

18 The norm ||f||_H is a regulariser which takes on larger values for less “smooth” functions. [sent-39, score-0.261]

19 We take H to be a reproducing kernel Hilbert space (RKHS) with representer of evaluation (kernel function) k(·, ·), so that we have the reproducing property, f(x) = ⟨f, k(x, ·)⟩_H. [sent-40, score-0.259]
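
For context, the display equations surrounding this discussion – the regularised risk and its representer-theorem solution, apparently Equations (1) and (2) of the paper – are not reproduced by the sentence extraction. In their standard form they read (a reconstruction for context, not a verbatim quote):

\[
\min_{f \in H} \; C \sum_{i=1}^{m} c\big(f(x_i), y_i\big) + \|f\|_{H}^{2},
\qquad\qquad
f(x) = \sum_{i=1}^{m} \alpha_i \, k(x_i, x),
\]

and Theorem 1 below generalises exactly this statement by replacing the evaluation functionals with general linear operators L_i.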

20 Note as a technical aside that the thin-plate kernel case – which we will adopt – requires a somewhat more technical interpretation, as it is only conditionally positive definite. [sent-42, score-0.045]

21 We discuss the positive definite case for clarity only, as it is simpler and yet sufficient to demonstrate the ideas involved. [sent-43, score-0.028]

22 We now propose a more direct method, novel in the context of implicit fitting, which avoids these problems. [sent-47, score-0.146]

23 The approach is suggested by the fact that the normal direction of the implicit surface is given by the gradient of the embedding function – thus normal vectors can be incorporated by regression with gradient targets. [sent-48, score-0.522]

24 The function that we seek is the minimiser of: C1 Σ_{i=1}^m (f(x_i))² + C2 Σ_{i=1}^m ||(∇f)(x_i) − n_i||²_{R^d} + ||f||²_H, (3) which uses the given surface point/normal pairs (x_i, n_i) directly. [sent-49, score-0.31]

25 By imposing stationarity and using the reproducing property we can solve for the optimal f . [sent-50, score-0.092]

26 Here we provide only the result, which is that we have to solve for m coefficients α_i as well as a further md coefficients β_li to obtain the optimal solution f(x) = Σ_{i=1}^m α_i k(x_i, x) + Σ_{i=1}^m Σ_{l=1}^d β_li k_l(x_i, x), (4) [sent-52, score-0.08]

27 where we define k_l(x_i, x) = [(∇k)(x_i, x)]_l, the partial derivative of k in the l-th component of its first argument. [sent-53, score-0.046]

28 The coefficients α and β_l of the solution are found by solving the system given by 0 = (K + I/C1)α + Σ_l K_l β_l (5) and N_m = K_m α + (K_mm + I/C2)β_m + Σ_{l≠m} K_lm β_l, m = 1, . . . , d, (6) [sent-54, score-0.029]

29 where, writing k_lm for the second derivatives of k(·, ·) (defined similarly to the first), we’ve defined [N_l]_i = [n_i]_l; [β_l]_i = β_li; [K_l]_{i,j} = k_l(x_i, x_j); [α]_i = α_i; [K]_{i,j} = k(x_i, x_j); [K_lm]_{i,j} = k_lm(x_i, x_j). [sent-57, score-0.342]
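
To make the structure of this value-plus-gradient system concrete, here is a minimal numerical sketch in Python. It deliberately uses a Gaussian kernel (whose derivatives are simple to write down) instead of the paper's conditionally positive definite thin-plate kernel, and it assembles the analogous joint system directly, in the style of Gaussian-process regression with derivative observations; the function names, the choice of sigma, C1, C2 and the toy structure are ours, not the authors' code.

import numpy as np

def gaussian_blocks(X, Z, sigma):
    # value/gradient covariance blocks for k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    diff = X[:, None, :] - Z[None, :, :]                      # diff[i, j, l] = X[i, l] - Z[j, l]
    K = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))       # k(x_i, z_j)
    dK = K[:, :, None] * diff / sigma ** 2                    # d k(x_i, z_j) / d z_{j,l}
    eye = np.eye(X.shape[1])
    ddK = K[:, :, None, None] * (eye / sigma ** 2
          - diff[:, :, :, None] * diff[:, :, None, :] / sigma ** 4)  # d^2 k / d x_{i,l} d z_{j,l'}
    return K, dK, ddK

def fit_implicit(X, N, sigma, C1=1e3, C2=1e3):
    # f(x_i) ~ 0 and (grad f)(x_i) ~ n_i, solved as one m(d+1)-dimensional linear system
    m, d = X.shape
    K, dK, ddK = gaussian_blocks(X, X, sigma)
    top = np.hstack([K, dK.reshape(m, m * d)])
    bottom = np.hstack([dK.reshape(m, m * d).T,
                        ddK.transpose(0, 2, 1, 3).reshape(m * d, m * d)])
    G = np.vstack([top, bottom])
    noise = np.concatenate([np.full(m, 1.0 / C1), np.full(m * d, 1.0 / C2)])
    y = np.concatenate([np.zeros(m), N.reshape(-1)])          # value targets 0, gradient targets n_i
    return np.linalg.solve(G + np.diag(noise), y)

def evaluate(Xq, X, coeffs, sigma):
    # evaluate the fitted implicit at query points; its zero set approximates the surface
    m, d = X.shape
    Kq, dKq, _ = gaussian_blocks(Xq, X, sigma)
    return np.hstack([Kq, dKq.reshape(len(Xq), m * d)]) @ coeffs

For large m one would of course never form this dense m(d+1) × m(d+1) system – which is exactly the computational problem that Section 3 addresses.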

30 In summary, minimum norm approximation in an RKHS with gradient target values is optimally solved by a function in the span of the kernels and derivatives thereof as per Equation 4 (cf. [sent-58, score-0.088]

31 As we saw in the previous section however, if gradients also appear in the risk function to be minimised, then gradients of the kernel function appear in the optimal solution. [sent-63, score-0.045]

32 We now make a more general statement – the case in the previous section corresponds to the following if we choose the linear operators Li (which we define shortly) as either identities or partial derivatives. [sent-64, score-0.06]

33 The theorem is a generalisation of [9] (using the same proof idea) with equivalence if we choose all Li to be identity operators. [sent-65, score-0.044]

34 The case of general linear operators was in fact dealt with already in [2] (which merely states the earlier result in [10]) – but only for the case of a specific loss function c. [sent-66, score-0.06]

35 The following theorem therefore combines the two frameworks: Theorem 1 Denote by X a non-empty set, by k a reproducing kernel with reproducing kernel Hilbert space H, by Ω a strictly monotonic increasing real-valued function on [0, ∞), by c : R^m → R ∪ {∞} an arbitrary cost function, and by L_1, . . . , L_m bounded linear operators on H. [sent-67, score-0.206]

36 Each minimiser f ∈ H of the regularised risk functional c((L_1 f)(x_1), . . . , (L_m f)(x_m)) + Ω(||f||²_H) [sent-71, score-0.212]

37 admits the form f = Σ_{i=1}^m α_i L_i* k_{x_i}, (7) where k_x denotes k(·, x) and L_i* denotes the adjoint of L_i. [sent-74, score-0.134]

38 Decompose f into f = Σ_{i=1}^m α_i L_i* k_{x_i} + f_⊥, with α_i ∈ R and ⟨f_⊥, L_i* k_{x_i}⟩_H = 0 for each i = 1, . . . , m. [sent-77, score-0.268]

39 Due to the reproducing property we can write, for j = 1, . . . , m, [sent-81, score-0.058]

40 (L_j f)(x_j) = ⟨L_j f, k(·, x_j)⟩_H = Σ_{i=1}^m α_i ⟨L_j L_i* k_{x_i}, k(·, x_j)⟩_H + ⟨L_j f_⊥, k(·, x_j)⟩_H = Σ_{i=1}^m α_i ⟨L_j L_i* k_{x_i}, k(·, x_j)⟩_H, since ⟨L_j f_⊥, k(·, x_j)⟩_H = ⟨f_⊥, L_j* k_{x_j}⟩_H = 0. [sent-84, score-0.388]

41 Thus, the first term in Equation 7 is independent of f_⊥. [sent-85, score-0.04]

42 Moreover, it is clear due to orthogonality that if f_⊥ ≠ 0 then Ω(||Σ_{i=1}^m α_i L_i* k_{x_i} + f_⊥||²_H) > Ω(||Σ_{i=1}^m α_i L_i* k_{x_i}||²_H), so that for any fixed α_i ∈ R, Equation 7 is minimised when f_⊥ = 0. [sent-86, score-0.306]

43 [2]), the choice of regulariser (the function norm in Equation 3) leads to a particular kernel function k(·, ·) to be used in Equation 4. [sent-90, score-0.306]

44 For geometrical problems, an excellent regulariser is the thin-plate energy, which for arbitrary order m and dimension d is given by [2]: ||f||²_H = ⟨ψf, ψf⟩_{L²} (9) = Σ_{i1=1}^d ··· Σ_{im=1}^d ∫_{x1=−∞}^∞ ··· ∫_{xd=−∞}^∞ (∂/∂x_{i1} ··· ∂/∂x_{im} f)(∂/∂x_{i1} ··· ∂/∂x_{im} f) dx_1 ··· [sent-91, score-0.26]

45 dxd , (10) where ψ is a regularisation operator taking all partial derivatives of order m, which corresponds to a “radial” kernel function of the form k(x, y) = t(||x − y||), where [11] t(r) = r2m−d ln(r) if 2m > d and d is even, r2m−d otherwise. [sent-94, score-0.15]
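
The radial profile t(r) above is straightforward to evaluate; a small Python helper follows (with a guard at r = 0, where the even-dimension branch has a removable singularity; the function name and interface are ours):

import numpy as np

def thin_plate_t(r, m, d):
    # t(r) = r^(2m - d) * ln(r) if d is even (and 2m > d), and r^(2m - d) otherwise
    r = np.asarray(r, dtype=float)
    p = 2 * m - d
    if d % 2 == 0:
        safe = np.where(r > 0, r, 1.0)   # avoid log(0); the limit of r^p ln r at 0 is 0 for p > 0
        return np.where(r > 0, safe ** p * np.log(safe), 0.0)
    return r ** p

# e.g. the classic 2-D thin-plate spline corresponds to m = 2, d = 2: t(r) = r^2 ln r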

46 There are a number of good reasons to use this regulariser rather than those leading to compactly supported kernels, as we touched on in the introduction. [sent-95, score-0.507]

47 The scheme we propose in Section 3 solves these problems, previously associated with compactly supported basis functions, by defining and computing the regulariser separately from the function basis. [sent-97, score-0.632]

48 3 A Fast Scheme using Compactly Supported Basis Functions Here we present a fast approximate scheme for solving the problem of the previous Section, in which we restrict the class of functions to the span of a compactly supported, multi-scale basis, as described in Section 3.1, [sent-98, score-0.358]

49 and minimise the thin-plate regulariser within this span as per Section 3.2. [sent-99, score-0.322]

50 3.1 Restricting the Set of Available Functions Computationally, using the thin-plate spline leads to the problem that the linear system we need to solve (Equations 5 and 6), which is of size m(d + 1), is dense in the sense of having almost all non-zero entries. [sent-102, score-0.136]

51 Since solving such a system naïvely has a cubic time complexity in m, we propose forcing f(·) to take the form: f(·) = Σ_{k=1}^p π_k f_k(·), (11) where the individual basis functions are f_k(·) = φ(||v_k − ·||/s_k) for some function φ : R+ → R with support [0, 1). [sent-103, score-0.381]

52 The vk and sk are the basis function centres and dilations (or scales), respectively. [sent-104, score-0.22]
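
Because each f_k vanishes outside the ball of radius s_k around v_k, evaluating the implicit at a point only ever touches the few basis functions whose supports contain it. A sketch of that evaluation using a k-d tree for the range query follows; the Wendland-style profile at the end is merely a placeholder for whichever compactly supported φ is actually used, and all names are illustrative.

import numpy as np
from scipy.spatial import cKDTree

def make_evaluator(centres, scales, weights, phi):
    # f(x) = sum_k weights[k] * phi(||centres[k] - x|| / scales[k]),  phi supported on [0, 1)
    tree = cKDTree(centres)
    s_max = scales.max()

    def f(x):
        # only basis functions whose support ball can contain x need to be inspected
        idx = np.asarray(tree.query_ball_point(x, r=s_max), dtype=int)
        if idx.size == 0:
            return 0.0
        r = np.linalg.norm(centres[idx] - np.asarray(x), axis=1) / scales[idx]
        inside = r < 1.0
        return float(np.sum(weights[idx[inside]] * phi(r[inside])))

    return f

# a simple compactly supported bump standing in for the paper's phi
phi = lambda r: (1.0 - r) ** 4 * (4.0 * r + 1.0)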

53 The particular φ we adopt is given by Equation 12 of the paper, although this choice is rather inconsequential since, as we shall ensure, the regulariser is unrelated to the function basis – any smooth compactly supported basis function could be used. [sent-106, score-0.757]

54 In order to achieve the same interpolating properties as the thin-plate spline, we wish to minimise our regularised risk function given by Equation 3 within the span of Equation 11. [sent-107, score-0.301]

55 The key to doing this is to note that as given before in Equation 9, the regulariser (function norm) can be written as 2 f H = ψf, ψf L2 . [sent-108, score-0.23]

56 The computational advantage is that the coefficients that we need are now given by a sparse pdimensional positive semi-definite linear system, which can be constructed efficiently by simple code that takes advantage of software libraries for fast nearest neighbour type searches (see e. [sent-110, score-0.041]

57 The system can then be solved efficiently using conjugate gradient type methods. [sent-113, score-0.029]
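
Concretely, once the sparse p × p system is assembled (its non-zero pattern and entries come from the data terms of Equation 3 together with the regularisation matrix K_reg of the next subsection), the coefficients π fall out of a single sparse symmetric positive semi-definite solve, for which conjugate gradients is the natural tool. A toy sketch with SciPy – the tridiagonal matrix below is only a stand-in for the real system:

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

p = 1000
A = diags([np.full(p - 1, -1.0), np.full(p, 4.0), np.full(p - 1, -1.0)],
          offsets=[-1, 0, 1], format="csr")   # stand-in for the sparse SPD system matrix
b = np.ones(p)                                 # stand-in for the right-hand side

pi_coeffs, info = cg(A, b, maxiter=10_000)     # info == 0 signals convergence
assert info == 0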

58 In [1] we describe how we construct a basis with p m that results in a highly sparse linear system, but that still contains good solutions. [sent-114, score-0.125]

59 2 Computing the Regularisation Matrix We now come to the crucial point of calculating Kreg , which can be thought of as the regularisation matrix. [sent-117, score-0.105]

60 The present Section is highly related to [13], however there numerical methods were resorted to for the calculation of Kreg – presently we shall derive closed form solutions. [sent-118, score-0.038]

61 Also worth comparing to the present Section is [14], where a prior over the expansion coefficients (here the π) is used to mimic a given regulariser within an arbitrary basis, achieving a similar result but without the computational advantages we are aiming for. [sent-119, score-0.23]

62 As we have already noted we can write ||f||²_H = ⟨ψf, ψf⟩_{L²} [2], so that for the function given by Equation 11 we have: ||Σ_{j=1}^p π_j f_j(·)||²_H = ⟨ψ Σ_{j=1}^p π_j f_j(·), ψ Σ_{k=1}^p π_k f_k(·)⟩_{L²} = Σ_{j,k=1}^p π_j π_k ⟨ψf_j(·), ψf_k(·)⟩_{L²}. [sent-120, score-0.183]

63 To build the sparse matrix Kreg, a fast range search library (e.g. [sent-122, score-0.041]

64 [12]) can be used to identify the non-zero entries – that is, all those [Kreg ]i,j for which i and j satisfy vi − vj ≤ si + sj . [sent-124, score-0.03]
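
The overlap test ||v_i − v_j|| ≤ s_i + s_j is exactly what a k-d tree range query provides cheaply. A sketch of building the sparsity pattern this way, with scipy's cKDTree standing in for whatever range-search library [12] denotes (names are ours):

import numpy as np
from scipy.spatial import cKDTree

def kreg_sparsity_pattern(centres, scales):
    # index pairs (i, j), i < j, whose basis supports can overlap, plus the diagonal
    tree = cKDTree(centres)
    candidates = tree.query_pairs(r=2.0 * scales.max(), output_type="ndarray")  # conservative radius
    dist = np.linalg.norm(centres[candidates[:, 0]] - centres[candidates[:, 1]], axis=1)
    keep = dist <= scales[candidates[:, 0]] + scales[candidates[:, 1]]          # exact per-pair test
    diag = np.stack([np.arange(len(centres))] * 2, axis=1)                      # diagonal is always non-zero
    return np.vstack([diag, candidates[keep]])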

65 In order to evaluate ψfj (·), ψfk (·) L2 , it is necessary to solve the integral of Equation 10, the full derivation of which we relegate to [1] – here we just provide the main results. [sent-125, score-0.066]
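
The closed forms are derived in [1]; as an independent sanity check, ⟨ψf_j, ψf_k⟩_{L²} can also be approximated by brute force on a grid. For the familiar case d = 2, m = 2 the integrand of Equation 10 is f_xx g_xx + 2 f_xy g_xy + f_yy g_yy, and a few lines of finite differences reproduce the closed-form values approximately (a rough numerical cross-check, not the paper's method):

import numpy as np

def thin_plate_cross_energy_2d(fj, fk, h):
    # approximate <psi f_j, psi f_k>_{L2} for d = 2, m = 2 on a uniform grid with spacing h
    def second_derivatives(f):
        fx, fy = np.gradient(f, h)
        fxx, fxy = np.gradient(fx, h)
        fyx, fyy = np.gradient(fy, h)
        return fxx, 0.5 * (fxy + fyx), fyy

    jxx, jxy, jyy = second_derivatives(fj)
    kxx, kxy, kyy = second_derivatives(fk)
    return np.sum(jxx * kxx + 2.0 * jxy * kxy + jyy * kyy) * h * h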

66 It turns out that since the fi are all dilations and translations of the same function φ( · ), then it is sufficient solve for the following . [sent-126, score-0.091]

67 Figure 3: Ray traced three dimensional implicits, “Happy Buddha” (543K points with normals) and the “Thai Statue” (5 million points with normals). [sent-128, score-0.148]

68 where j 2 = −1, Φ = Fx [φ(x)], and by F (and F −1 ) we mean the Fourier (inverse Fourier) transform operators in the subscripted variable. [sent-129, score-0.06]

69 3 Interpretation as a Gaussian Process Presently we use ideas from [15] to demonstrate that the approximation described in this Section 3 is equivalent to inference in an exact Gaussian Process with covariance function depending on the choice of function basis. [sent-134, score-0.028]

70 Column one is the number of points, each of which has an associated normal vector, and column two is the number of basis vectors (the p of Section 3. [sent-188, score-0.209]

71 By comparison with (11) and (13) (but with C1 = 1/σ 2 , C2 = 0 and y = 0) we can see that the mean of the posterior distribution is identical to our approximate regularised solution based on compactly supported basis functions. [sent-191, score-0.576]

72 Experiments We fit models to 3D data sets of up to 14 million data points – timings are given in Table 1, where we also see that good compression ratios are attained, in that relatively few basis functions represent the shapes. [sent-193, score-0.277]

73 Also note that the fitting time scales rather well, from 38 seconds for the Stanford Bunny (35 thousand points with normals) to 4 hours 23 minutes for the Lucy statue (14 million points with normals ≈ 14×106 ×(1 value term + 3 gradient terms ) ≈ 56 million “regression targets”). [sent-194, score-0.383]

74 Some rendered examples are given in Figures 1 and 3, and the well-behaved nature of the implicit over the entire 3D volume of interest is shown for the Lucy data-set in the accompanying video. [sent-196, score-0.233]

75 In practice the system is extremely robust and produces excellent results without any parameter adjustment – smaller values of C1 and C2 in Equation 3 simply lead to the smoothing effect shown in Figure 2. [sent-197, score-0.059]

76 The system also handles missing and noisy data gracefully, as demonstrated in [1]. [sent-198, score-0.029]

77 We demonstrate this in the accompanying video, which shows that 4D surfaces yield superior 3D animation results in comparison to a sequence of 3D models. [sent-200, score-0.207]

78 Also interesting are interpolations in 4D – in the accompanying video we effectively interpolate between two three dimensional shapes. [sent-201, score-0.085]

79 5 Summary We have presented ideas both theoretically and practically useful for the computer graphics and machine learning communities, demonstrating them within the framework of implicit surface fitting. [sent-202, score-0.37]

80 Many authors have demonstrated fast but limited quality results that occur with compactly supported function bases. [sent-203, score-0.318]

81 The present work differs by precisely minimising a well justified regulariser within the span of such a basis, achieving fast and high quality results. [sent-204, score-0.385]

82 We also showed how normal vectors can be incorporated directly into the usual regression based implicit surface fitting framework, giving a generalisation of the representer theorem. [sent-205, score-0.609]

83 We demonstrated the algorithm on 3D problems of up to 14 million data points and in the accompanying video we showed the advantage of constructing a 4D surface (3D + time) for 3D animation, rather than a sequence of 3D surfaces. [sent-206, score-0.396]

84 Figure 4: Reconstruction of the Stanford bunny after adding Gaussian noise with standard deviation, from left to right, 0, 0. [sent-207, score-0.066]

85 6 percent of the radius of the smallest enclosing sphere – the normal vectors were similarly corrupted assuming they had length equal to this radius. [sent-210, score-0.055]

86 Implicit surface modelling with a globally regularised basis of compact support. [sent-216, score-0.495]

87 Interpolating implicit surfaces from scattered surface data using compactly supported radial basis functions. [sent-233, score-0.941]

88 Reconstruction and representation of 3d objects with radial basis functions. [sent-254, score-0.185]

89 A multi-scale approach to 3d scattered data interpolation with compactly supported basis functions. [sent-263, score-0.469]

90 How to choose the covariance for gaussian process regression independently of the basis. [sent-307, score-0.042]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kreg', 0.506), ('kxz', 0.352), ('regulariser', 0.23), ('compactly', 0.223), ('surface', 0.196), ('regularised', 0.174), ('implicit', 0.146), ('kxi', 0.134), ('ktz', 0.132), ('kzt', 0.132), ('basis', 0.125), ('kzx', 0.11), ('regularisation', 0.105), ('surfaces', 0.102), ('representer', 0.098), ('fx', 0.098), ('yx', 0.098), ('fk', 0.095), ('klm', 0.088), ('kxv', 0.088), ('kxvl', 0.088), ('lj', 0.087), ('normals', 0.087), ('million', 0.082), ('lucy', 0.077), ('spline', 0.073), ('equation', 0.067), ('bunny', 0.066), ('regularising', 0.066), ('statue', 0.066), ('walder', 0.066), ('operators', 0.06), ('radial', 0.06), ('xi', 0.058), ('reproducing', 0.058), ('yt', 0.058), ('dilations', 0.057), ('minimising', 0.057), ('turk', 0.057), ('span', 0.057), ('accompanying', 0.056), ('normal', 0.055), ('supported', 0.054), ('gr', 0.052), ('nl', 0.049), ('animation', 0.049), ('fourier', 0.047), ('kl', 0.046), ('kernel', 0.045), ('fj', 0.044), ('belyaev', 0.044), ('buddha', 0.044), ('dragon', 0.044), ('fastrbftm', 0.044), ('fmm', 0.044), ('implicits', 0.044), ('ohtake', 0.044), ('queensland', 0.044), ('thai', 0.044), ('xim', 0.044), ('generalisation', 0.044), ('regression', 0.042), ('fast', 0.041), ('cients', 0.041), ('xj', 0.04), ('ni', 0.038), ('presently', 0.038), ('centres', 0.038), ('greg', 0.038), ('minimised', 0.038), ('minimiser', 0.038), ('tting', 0.037), ('functions', 0.037), ('li', 0.036), ('coef', 0.036), ('interpolating', 0.035), ('christian', 0.035), ('cb', 0.035), ('scattered', 0.035), ('minimise', 0.035), ('solve', 0.034), ('points', 0.033), ('interpolation', 0.032), ('bernhard', 0.032), ('integral', 0.032), ('shape', 0.032), ('rkhs', 0.031), ('norm', 0.031), ('rendered', 0.031), ('excellent', 0.03), ('sj', 0.03), ('system', 0.029), ('olivier', 0.029), ('lm', 0.029), ('plate', 0.029), ('column', 0.029), ('video', 0.029), ('acm', 0.029), ('ideas', 0.028), ('incorporated', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions

Author: Christian Walder, Olivier Chapelle, Bernhard Schölkopf

Abstract: We consider the problem of constructing a function whose zero set is to represent a surface, given sample points with surface normal vectors. The contributions include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable properties previously only associated with fully supported bases, and show equivalence to a Gaussian process with modified covariance function. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data. 1

2 0.16516204 110 nips-2006-Learning Dense 3D Correspondence

Author: Florian Steinke, Volker Blanz, Bernhard Schölkopf

Abstract: Establishing correspondence between distinct objects is an important and nontrivial task: correctness of the correspondence hinges on properties which are difficult to capture in an a priori criterion. While previous work has used a priori criteria which in some cases led to very good results, the present paper explores whether it is possible to learn a combination of features that, for a given training set of aligned human heads, characterizes the notion of correct correspondence. By optimizing this criterion, we are then able to compute correspondence and morphs for novel heads. 1

3 0.12694672 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space

Author: Wenye Li, Kin-hong Lee, Kwong-sak Leung

Abstract: Kernel-based regularized learning seeks a model in a hypothesis space by minimizing the empirical error and the model’s complexity. Based on the representer theorem, the solution consists of a linear combination of translates of a kernel. This paper investigates a generalized form of representer theorem for kernel-based learning. After mapping predefined features and translates of a kernel simultaneously onto a hypothesis space by a specific way of constructing kernels, we proposed a new algorithm by utilizing a generalized regularizer which leaves part of the space unregularized. Using a squared-loss function in calculating the empirical error, a simple convex solution is obtained which combines predefined features with translates of the kernel. Empirical evaluations have confirmed the effectiveness of the algorithm for supervised learning tasks.

4 0.11505503 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation

Author: Gavin C. Cawley, Nicola L. Talbot, Mark Girolami

Abstract: Multinomial logistic regression provides the standard penalised maximumlikelihood solution to multi-class pattern recognition problems. More recently, the development of sparse multinomial logistic regression models has found application in text processing and microarray classification, where explicit identification of the most informative features is of value. In this paper, we propose a sparse multinomial logistic regression method, in which the sparsity arises from the use of a Laplace prior, but where the usual regularisation parameter is integrated out analytically. Evaluation over a range of benchmark datasets reveals this approach results in similar generalisation performance to that obtained using cross-validation, but at greatly reduced computational expense. 1

5 0.068842754 65 nips-2006-Denoising and Dimension Reduction in Feature Space

Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann

Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1

6 0.068255402 75 nips-2006-Efficient sparse coding algorithms

7 0.064675681 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization

8 0.064349689 116 nips-2006-Learning from Multiple Sources

9 0.059552431 7 nips-2006-A Local Learning Approach for Clustering

10 0.057280924 186 nips-2006-Support Vector Machines on a Budget

11 0.056940332 150 nips-2006-On Transductive Regression

12 0.054504409 203 nips-2006-implicit Online Learning with Kernels

13 0.054338019 128 nips-2006-Manifold Denoising

14 0.049782269 153 nips-2006-Online Clustering of Moving Hyperplanes

15 0.049182739 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods

16 0.049065258 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors

17 0.048629042 79 nips-2006-Fast Iterative Kernel PCA

18 0.048404448 111 nips-2006-Learning Motion Style Synthesis from Perceptual Observations

19 0.042725906 6 nips-2006-A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems

20 0.041949704 167 nips-2006-Recursive ICA


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.166), (1, 0.052), (2, 0.011), (3, 0.033), (4, -0.013), (5, 0.033), (6, -0.006), (7, -0.029), (8, -0.023), (9, 0.066), (10, -0.055), (11, -0.053), (12, -0.028), (13, -0.106), (14, 0.017), (15, 0.01), (16, -0.067), (17, -0.063), (18, -0.062), (19, 0.057), (20, 0.008), (21, 0.103), (22, 0.062), (23, -0.046), (24, -0.02), (25, -0.067), (26, 0.177), (27, -0.073), (28, -0.063), (29, -0.063), (30, 0.157), (31, -0.198), (32, -0.046), (33, 0.007), (34, 0.034), (35, 0.059), (36, 0.008), (37, -0.03), (38, 0.033), (39, 0.033), (40, 0.115), (41, 0.018), (42, -0.136), (43, -0.015), (44, -0.175), (45, -0.042), (46, -0.096), (47, -0.27), (48, 0.032), (49, -0.055)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92616218 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions

Author: Christian Walder, Olivier Chapelle, Bernhard Schölkopf

Abstract: We consider the problem of constructing a function whose zero set is to represent a surface, given sample points with surface normal vectors. The contributions include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable properties previously only associated with fully supported bases, and show equivalence to a Gaussian process with modified covariance function. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data. 1

2 0.65728122 110 nips-2006-Learning Dense 3D Correspondence

Author: Florian Steinke, Volker Blanz, Bernhard Schölkopf

Abstract: Establishing correspondence between distinct objects is an important and nontrivial task: correctness of the correspondence hinges on properties which are difficult to capture in an a priori criterion. While previous work has used a priori criteria which in some cases led to very good results, the present paper explores whether it is possible to learn a combination of features that, for a given training set of aligned human heads, characterizes the notion of correct correspondence. By optimizing this criterion, we are then able to compute correspondence and morphs for novel heads. 1

3 0.58953965 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation

Author: Gavin C. Cawley, Nicola L. Talbot, Mark Girolami

Abstract: Multinomial logistic regression provides the standard penalised maximumlikelihood solution to multi-class pattern recognition problems. More recently, the development of sparse multinomial logistic regression models has found application in text processing and microarray classification, where explicit identification of the most informative features is of value. In this paper, we propose a sparse multinomial logistic regression method, in which the sparsity arises from the use of a Laplace prior, but where the usual regularisation parameter is integrated out analytically. Evaluation over a range of benchmark datasets reveals this approach results in similar generalisation performance to that obtained using cross-validation, but at greatly reduced computational expense. 1

4 0.47708806 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space

Author: Wenye Li, Kin-hong Lee, Kwong-sak Leung

Abstract: Kernel-based regularized learning seeks a model in a hypothesis space by minimizing the empirical error and the model’s complexity. Based on the representer theorem, the solution consists of a linear combination of translates of a kernel. This paper investigates a generalized form of representer theorem for kernel-based learning. After mapping predefined features and translates of a kernel simultaneously onto a hypothesis space by a specific way of constructing kernels, we proposed a new algorithm by utilizing a generalized regularizer which leaves part of the space unregularized. Using a squared-loss function in calculating the empirical error, a simple convex solution is obtained which combines predefined features with translates of the kernel. Empirical evaluations have confirmed the effectiveness of the algorithm for supervised learning tasks.

5 0.41897532 6 nips-2006-A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems

Author: Yoshinobu Kawahara, Takehisa Yairi, Kazuo Machida

Abstract: In this paper, we present a subspace method for learning nonlinear dynamical systems based on stochastic realization, in which state vectors are chosen using kernel canonical correlation analysis, and then state-space systems are identified through regression with the state vectors. We construct the theoretical underpinning and derive a concrete algorithm for nonlinear identification. The obtained algorithm needs no iterative optimization procedure and can be implemented on the basis of fast and reliable numerical schemes. The simulation result shows that our algorithm can express dynamics with a high degree of accuracy. 1

6 0.38305348 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods

7 0.38030189 153 nips-2006-Online Clustering of Moving Hyperplanes

8 0.37850913 29 nips-2006-An Information Theoretic Framework for Eukaryotic Gradient Sensing

9 0.35827711 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms

10 0.32904139 75 nips-2006-Efficient sparse coding algorithms

11 0.32695273 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models

12 0.30977622 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors

13 0.30698007 179 nips-2006-Sparse Representation for Signal Classification

14 0.304104 180 nips-2006-Speakers optimize information density through syntactic reduction

15 0.30405039 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization

16 0.2944124 174 nips-2006-Similarity by Composition

17 0.2897884 107 nips-2006-Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis

18 0.28840563 79 nips-2006-Fast Iterative Kernel PCA

19 0.28759027 128 nips-2006-Manifold Denoising

20 0.27686799 147 nips-2006-Non-rigid point set registration: Coherent Point Drift


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.113), (3, 0.036), (6, 0.295), (7, 0.071), (8, 0.011), (9, 0.054), (12, 0.012), (20, 0.013), (22, 0.057), (44, 0.05), (45, 0.01), (57, 0.063), (64, 0.021), (65, 0.047), (69, 0.039), (83, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82324511 114 nips-2006-Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models

Author: Alexander T. Ihler, Padhraic Smyth

Abstract: Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas as humancomputer interaction, video surveillance, and Web data analysis. We propose a non-parametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what “factors” are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real–world data sets of count data involving both vehicles and people are used to illustrate the technique. 1

same-paper 2 0.78343654 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions

Author: Christian Walder, Olivier Chapelle, Bernhard Schölkopf

Abstract: We consider the problem of constructing a function whose zero set is to represent a surface, given sample points with surface normal vectors. The contributions include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable properties previously only associated with fully supported bases, and show equivalence to a Gaussian process with modified covariance function. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data. 1

3 0.71756083 5 nips-2006-A Kernel Method for the Two-Sample-Problem

Author: Arthur Gretton, Karsten M. Borgwardt, Malte Rasch, Bernhard Schölkopf, Alex J. Smola

Abstract: We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. The test statistic can be computed in O(m2 ) time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.

4 0.67129445 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments

Author: Jeremy Lewi, Robert Butera, Liam Paninski

Abstract: Adaptively optimizing experiments can significantly reduce the number of trials needed to characterize neural responses using parametric statistical models. However, the potential for these methods has been limited to date by severe computational challenges: choosing the stimulus which will provide the most information about the (typically high-dimensional) model parameters requires evaluating a high-dimensional integration and optimization in near-real time. Here we present a fast algorithm for choosing the optimal (most informative) stimulus based on a Fisher approximation of the Shannon information and specialized numerical linear algebra techniques. This algorithm requires only low-rank matrix manipulations and a one-dimensional linesearch to choose the stimulus and is therefore efficient even for high-dimensional stimulus and parameter spaces; for example, we require just 15 milliseconds on a desktop computer to optimize a 100-dimensional stimulus. Our algorithm therefore makes real-time adaptive experimental design feasible. Simulation results show that model parameters can be estimated much more efficiently using these adaptive techniques than by using random (nonadaptive) stimuli. Finally, we generalize the algorithm to efficiently handle both fast adaptation due to spike-history effects and slow, non-systematic drifts in the model parameters. Maximizing the efficiency of data collection is important in any experimental setting. In neurophysiology experiments, minimizing the number of trials needed to characterize a neural system is essential for maintaining the viability of a preparation and ensuring robust results. As a result, various approaches have been developed to optimize neurophysiology experiments online in order to choose the “best” stimuli given prior knowledge of the system and the observed history of the cell’s responses. The “best” stimulus can be defined a number of different ways depending on the experimental objectives. One reasonable choice, if we are interested in finding a neuron’s “preferred stimulus,” is the stimulus which maximizes the firing rate of the neuron [1, 2, 3, 4]. Alternatively, when investigating the coding properties of sensory cells it makes sense to define the optimal stimulus in terms of the mutual information between the stimulus and response [5]. Here we take a system identification approach: we define the optimal stimulus as the one which tells us the most about how a neural system responds to its inputs [6, 7]. We consider neural systems in † ‡ http://www.prism.gatech.edu/∼gtg120z http://www.stat.columbia.edu/∼liam which the probability p(rt |{xt , xt−1 , ..., xt−tk }, {rt−1 , . . . , rt−ta }) of the neural response rt given the current and past stimuli {xt , xt−1 , ..., xt−tk }, and the observed recent history of the neuron’s activity, {rt−1 , . . . , rt−ta }, can be described by a model p(rt |{xt }, {rt−1 }, θ), specified by a finite vector of parameters θ. Since we estimate these parameters from experimental trials, we want to choose our stimuli so as to minimize the number of trials needed to robustly estimate θ. 
Two inconvenient facts make it difficult to realize this goal in a computationally efficient manner: 1) model complexity — we typically need a large number of parameters to accurately model a system’s response p(rt |{xt }, {rt−1 }, θ); and 2) stimulus complexity — we are typically interested in neural responses to stimuli xt which are themselves very high-dimensional (e.g., spatiotemporal movies if we are dealing with visual neurons). In particular, it is computationally challenging to 1) update our a posteriori beliefs about the model parameters p(θ|{rt }, {xt }) given new stimulus-response data, and 2) find the optimal stimulus quickly enough to be useful in an online experimental context. In this work we present methods for solving these problems using generalized linear models (GLM) for the input-output relationship p(rt |{xt }, {rt−1 }, θ) and certain Gaussian approximations of the posterior distribution of the model parameters. Our emphasis is on finding solutions which scale well in high dimensions. We solve problem (1) by using efficient rank-one update methods to update the Gaussian approximation to the posterior, and problem (2) by a reduction to a highly tractable onedimensional optimization problem. Simulation results show that the resulting algorithm produces a set of stimulus-response pairs which is much more informative than the set produced by random sampling. Moreover, the algorithm is efficient enough that it could feasibly run in real-time. Neural systems are highly adaptive and more generally nonstatic. A robust approach to optimal experimental design must be able to cope with changes in θ. We emphasize that the model framework analyzed here can account for three key types of changes: stimulus adaptation, spike rate adaptation, and random non-systematic changes. Adaptation which is completely stimulus dependent can be accounted for by including enough stimulus history terms in the model p(rt |{xt , ..., xt−tk }, {rt−1 , ..., rt−ta }). Spike-rate adaptation effects, and more generally spike history-dependent effects, are accounted for explicitly in the model (1) below. Finally, we consider slow, non-systematic changes which could potentially be due to changes in the health, arousal, or attentive state of the preparation. Methods We model a neuron as a point process whose conditional intensity function (instantaneous firing rate) is given as the output of a generalized linear model (GLM) [8, 9]. This model class has been discussed extensively elsewhere; briefly, this class is fairly natural from a physiological point of view [10], with close connections to biophysical models such as the integrate-and-fire cell [9], and has been applied in a wide variety of experimental settings [11, 12, 13, 14]. The model is summarized as: tk λt = E(rt ) = f ta aj rt−j ki,t−l xi,t−l + i l=1 (1) j=1 In the above summation the filter coefficients ki,t−l capture the dependence of the neuron’s instantaneous firing rate λt on the ith component of the vector stimulus at time t − l, xt−l ; the model therefore allows for spatiotemporal receptive fields. For convenience, we arrange all the stimulus coefficients in a vector, k, which allows for a uniform treatment of the spatial and temporal components of the receptive field. The coefficients aj model the dependence on the observed recent activity r at time t − j (these terms may reflect e.g. refractory effects, burstiness, firing-rate adaptation, etc., depending on the value of the vector a [9]). 
For convenience we denote the unknown parameter vector as θ = {k; a}. The experimental objective is the estimation of the unknown filter coefficients, θ, given knowledge of the stimuli, xt , and the resulting responses rt . We chose the nonlinear stage of the GLM, the link function f (), to be the exponential function for simplicity. This choice ensures that the log likelihood of the observed data is a concave function of θ [9]. Representing and updating the posterior. As emphasized above, our first key task is to efficiently update the posterior distribution of θ after t trials, p(θt |xt , rt ), as new stimulus-response pairs are trial 100 trial 500 trial 2500 trial 5000 θ true 1 info. max. trial 0 0 random −1 (a) random info. max. 2000 Time(Seconds) Entropy 1500 1000 500 0 −500 0 1000 2000 3000 Iteration (b) 4000 5000 0.1 total time diagonalization posterior update 1d line Search 0.01 0.001 0 200 400 Dimensionality 600 (c) Figure 1: A) Plots of the estimated receptive field for a simulated visual neuron. The neuron’s receptive field θ has the Gabor structure shown in the last panel (spike history effects were set to zero for simplicity here, a = 0). The estimate of θ is taken as the mean of the posterior, µt . The images compare the accuracy of the estimates using information maximizing stimuli and random stimuli. B) Plots of the posterior entropies for θ in these two cases; note that the information-maximizing stimuli constrain the posterior of θ much more effectively than do random stimuli. C) A plot of the timing of the three steps performed on each iteration as a function of the dimensionality of θ. The timing for each step was well-fit by a polynomial of degree 2 for the diagonalization, posterior update and total time, and degree 1 for the line search. The times are an average over many iterations. The error-bars for the total time indicate ±1 std. observed. (We use xt and rt to abbreviate the sequences {xt , . . . , x0 } and {rt , . . . , r0 }.) To solve this problem, we approximate this posterior as a Gaussian; this approximation may be justified by the fact that the posterior is the product of two smooth, log-concave terms, the GLM likelihood function and the prior (which we assume to be Gaussian, for simplicity). Furthermore, the main theorem of [7] indicates that a Gaussian approximation of the posterior will be asymptotically accurate. We use a Laplace approximation to construct the Gaussian approximation of the posterior, p(θt |xt , rt ): we set µt to the peak of the posterior (i.e. the maximum a posteriori (MAP) estimate of θ), and the covariance matrix Ct to the negative inverse of the Hessian of the log posterior at µt . In general, computing these terms directly requires O(td2 + d3 ) time (where d = dim(θ); the time-complexity increases with t because to compute the posterior we must form a product of t likelihood terms, and the d3 term is due to the inverse of the Hessian matrix), which is unfortunately too slow when t or d becomes large. 
Therefore we further approximate p(θt−1 |xt−1 , rt−1 ) as Gaussian; to see how this simplifies matters, we use Bayes to write out the posterior: 1 −1 log p(θ|rt , xt ) = − (θ − µt−1 )T Ct−1 (θ − µt−1 ) + − exp {xt ; rt−1 }T θ 2 + rt {xt ; rt−1 }T θ + const d log p(θ|rt , xt ) −1 = −(θ − µt−1 )T Ct−1 + (2) − exp({xt ; rt−1 }T θ) + rt {xt ; rt−1 }T dθ d2 log p(θ|rt , xt ) −1 = −Ct−1 − exp({xt ; rt−1 }T θ){xt ; rt−1 }{xt ; rt−1 }T dθi dθj (3) Now, to update µt we only need to find the peak of a one-dimensional function (as opposed to a d-dimensional function); this follows by noting that that the likelihood only varies along a single direction, {xt ; rt−1 }, as a function of θ. At the peak of the posterior, µt , the first term in the gradient must be parallel to {xt ; rt−1 } because the gradient is zero. Since Ct−1 is non-singular, µt − µt−1 must be parallel to Ct−1 {xt ; rt−1 }. Therefore we just need to solve a one dimensional problem now to determine how much the mean changes in the direction Ct−1 {xt ; rt−1 }; this requires only O(d2 ) time. Moreover, from the second derivative term above it is clear that computing Ct requires just a rank-one matrix update of Ct−1 , which can be evaluated in O(d2 ) time via the Woodbury matrix lemma. Thus this Gaussian approximation of p(θt−1 |xt−1 , rt−1 ) provides a large gain in efficiency; our simulations (data not shown) showed that, despite this improved efficiency, the loss in accuracy due to this approximation was minimal. Deriving the (approximately) optimal stimulus. To simplify the derivation of our maximization strategy, we start by considering models in which the firing rate does not depend on past spiking, so θ = {k}. To choose the optimal stimulus for trial t + 1, we want to maximize the conditional mutual information I(θ; rt+1 |xt+1 , xt , rt ) = H(θ|xt , rt ) − H(θ|xt+1 , rt+1 ) (4) with respect to the stimulus xt+1 . The first term does not depend on xt+1 , so maximizing the information requires minimizing the conditional entropy H(θ|xt+1 , rt+1 ) = p(rt+1 |xt+1 ) −p(θ|rt+1 , xt+1 ) log p(θ|rt+1 , xt+1 )dθ = Ert+1 |xt+1 log det[Ct+1 ] + const. rt+1 (5) We do not average the entropy of p(θ|rt+1 , xt+1 ) over xt+1 because we are only interested in the conditional entropy for the particular xt+1 which will be presented next. The equality above is due to our Gaussian approximation of p(θ|xt+1 , rt+1 ). Therefore, we need to minimize Ert+1 |xt+1 log det[Ct+1 ] with respect to xt+1 . Since we set Ct+1 to be the negative inverse Hessian of the log-posterior, we have: −1 Ct+1 = Ct + Jobs (rt+1 , xt+1 ) −1 , (6) Jobs is the observed Fisher information. Jobs (rt+1 , xt+1 ) = −∂ 2 log p(rt+1 |ε = xt θ)/∂ε2 xt+1 xt t+1 t+1 (7) Here we use the fact that for the GLM, the likelihood depends only on the dot product, ε = xt θ. t+1 We can use the Woodbury lemma to evaluate the inverse: Ct+1 = Ct I + D(rt+1 , ε)(1 − D(rt+1 , ε)xt Ct xt+1 )−1 xt+1 xt Ct t+1 t+1 (8) where D(rt+1 , ε) = ∂ 2 log p(rt+1 |ε)/∂ε2 . Using some basic matrix identities, log det[Ct+1 ] = log det[Ct ] − log(1 − D(rt+1 , ε)xt Ct xt+1 ) t+1 = log det[Ct ] + D(rt+1 , ε)xt Ct xt+1 t+1 + o(D(rt+1 , ε)xt Ct xt+1 ) t+1 (9) (10) Ignoring the higher order terms, we need to minimize Ert+1 |xt+1 D(rt+1 , ε)xt Ct xt+1 . In our case, t+1 with f (θt xt+1 ) = exp(θt xt+1 ), we can use the moment-generating function of the multivariate Trial info. max. i.i.d 2 400 −10−4 0 0.05 −10−1 −2 ai 800 2 0 −2 −7 −10 i i.i.d k info. max. 
1 1 50 i 1 50 i 1 10 1 i (a) i 100 0 −0.05 10 1 1 (b) i 10 (c) Figure 2: A comparison of parameter estimates using information-maximizing versus random stimuli for a model neuron whose conditional intensity depends on both the stimulus and the spike history. The images in the top row of A and B show the MAP estimate of θ after each trial as a row in the image. Intensity indicates the value of the coefficients. The true value of θ is shown in the second row of images. A) The estimated stimulus coefficients, k. B) The estimated spike history coefficients, a. C) The final estimates of the parameters after 800 trials: dashed black line shows true values, dark gray is estimate using information maximizing stimuli, and light gray is estimate using random stimuli. Using our algorithm improved the estimates of k and a. Gaussian p(θ|xt , rt ) to evaluate this expectation. After some algebra, we find that to maximize I(θ; rt+1 |xt+1 , xt , rt ), we need to maximize 1 F (xt+1 ) = exp(xT µt ) exp( xT Ct xt+1 )xT Ct xt+1 . t+1 t+1 2 t+1 (11) Computing the optimal stimulus. For the GLM the most informative stimulus is undefined, since increasing the stimulus power ||xt+1 ||2 increases the informativeness of any putatively “optimal” stimulus. To obtain a well-posed problem, we optimize the stimulus under the usual power constraint ||xt+1 ||2 ≤ e < ∞. We maximize Eqn. 11 under this constraint using Lagrange multipliers and an eigendecomposition to reduce our original d-dimensional optimization problem to a onedimensional problem. Expressing Eqn. 11 in terms of the eigenvectors of Ct yields: 1 2 2 F (xt+1 ) = exp( u i yi + ci yi ) ci yi (12) 2 i i i = g( 2 ci yi ) ui yi )h( i (13) i where ui and yi represent the projection of µt and xt+1 onto the ith eigenvector and ci is the corresponding eigenvalue. To simplify notation we also introduce the functions g() and h() which are monotonically strictly increasing functions implicitly defined by Eqn. 12. We maximize F (xt+1 ) by breaking the problem into an inner and outer problem by fixing the value of i ui yi and maximizing h() subject to that constraint. A single line search over all possible values of i ui yi will then find the global maximum of F (.). This approach is summarized by the equation: max F (y) = max g(b) · y:||y||2 =e b max y:||y||2 =e,y t u=b 2 ci yi ) h( i Since h() is increasing, to solve the inner problem we only need to solve: 2 ci yi max y:||y||2 =e,y t u=b (14) i This last expression is a quadratic function with quadratic and linear constraints and we can solve it using the Lagrange method for constrained optimization. The result is an explicit system of 1 true θ random info. max. info. max. no diffusion 1 0.8 0.6 trial 0.4 0.2 400 0 −0.2 −0.4 800 1 100 θi 1 θi 100 1 θi 100 1 θ i 100 −0.6 random info. max. θ true θ i 1 0 −1 Entropy θ i 1 0 −1 random info. max. 250 200 i 1 θ Trial 400 Trial 200 Trial 0 (a) 0 −1 20 40 (b) i 60 80 100 150 0 200 400 600 Iteration 800 (c) Figure 3: Estimating the receptive field when θ is not constant. A) The posterior means µt and true θt plotted after each trial. θ was 100 dimensional, with its components following a Gabor function. To simulate nonsystematic changes in the response function, the center of the Gabor function was moved according to a random walk in between trials. We modeled the changes in θ as a random walk with a white covariance matrix, Q, with variance .01. 
In addition to the results for random and information-maximizing stimuli, we also show the µt given stimuli chosen to maximize the information under the (mistaken) assumption that θ was constant. Each row of the images plots θ using intensity to indicate the value of the different components. B) Details of the posterior means µt on selected trials. C) Plots of the posterior entropies as a function of trial number; once again, we see that information-maximizing stimuli constrain the posterior of θt more effectively. equations for the optimal yi as a function of the Lagrange multiplier λ1 . ui e yi (λ1 ) = ||y||2 2(ci − λ1 ) (15) Thus to find the global optimum we simply vary λ1 (this is equivalent to performing a search over b), and compute the corresponding y(λ1 ). For each value of λ1 we compute F (y(λ1 )) and choose the stimulus y(λ1 ) which maximizes F (). It is possible to show (details omitted) that the maximum of F () must occur on the interval λ1 ≥ c0 , where c0 is the largest eigenvalue. This restriction on the optimal λ1 makes the implementation of the linesearch significantly faster and more stable. To summarize, updating the posterior and finding the optimal stimulus requires three steps: 1) a rankone matrix update and one-dimensional search to compute µt and Ct ; 2) an eigendecomposition of Ct ; 3) a one-dimensional search over λ1 ≥ c0 to compute the optimal stimulus. The most expensive step here is the eigendecomposition of Ct ; in principle this step is O(d3 ), while the other steps, as discussed above, are O(d2 ). Here our Gaussian approximation of p(θt−1 |xt−1 , rt−1 ) is once again quite useful: recall that in this setting Ct is just a rank-one modification of Ct−1 , and there exist efficient algorithms for rank-one eigendecomposition updates [15]. While the worst-case running time of this rank-one modification of the eigendecomposition is still O(d3 ), we found the average running time in our case to be O(d2 ) (Fig. 1(c)), due to deflation which reduces the cost of matrix multiplications associated with finding the eigenvectors of repeated eigenvalues. Therefore the total time complexity of our algorithm is empirically O(d2 ) on average. Spike history terms. The preceding derivation ignored the spike-history components of the GLM model; that is, we fixed a = 0 in equation (1). Incorporating spike history terms only affects the optimization step of our algorithm; updating the posterior of θ = {k; a} proceeds exactly as before. The derivation of the optimization strategy proceeds in a similar fashion and leads to an analogous optimization strategy, albeit with a few slight differences in detail which we omit due to space constraints. The main difference is that instead of maximizing the quadratic expression in Eqn. 14 to find the maximum of h(), we need to maximize a quadratic expression which includes a linear term due to the correlation between the stimulus coefficients, k, and the spike history coefficients,a. The results of our simulations with spike history terms are shown in Fig. 2. Dynamic θ. In addition to fast changes due to adaptation and spike-history effects, animal preparations often change slowly and nonsystematically over the course of an experiment [16]. We model these effects by letting θ experience diffusion: θt+1 = θt + wt (16) Here wt is a normally distributed random variable with mean zero and known covariance matrix Q. This means that p(θt+1 |xt , rt ) is Gaussian with mean µt and covariance Ct + Q. 
To update the posterior and choose the optimal stimulus, we use the same procedure as described above1 . Results Our first simulation considered the use of our algorithm for learning the receptive field of a visually sensitive neuron. We took the neuron’s receptive field to be a Gabor function, as a proxy model of a V1 simple cell. We generated synthetic responses by sampling Eqn. 1 with θ set to a 25x33 Gabor function. We used this synthetic data to compare how well θ could be estimated using information maximizing stimuli compared to using random stimuli. The stimuli were 2-d images which were rasterized in order to express x as a vector. The plots of the posterior means µt in Fig. 1 (recall these are equivalent to the MAP estimate of θ) show that the information maximizing strategy converges an order of magnitude more rapidly to the true θ. These results are supported by the conclusion of [7] that the information maximization strategy is asymptotically never worse than using random stimuli and is in general more efficient. The running time for each step of the algorithm as a function of the dimensionality of θ is plotted in Fig. 1(c). These results were obtained on a machine with a dual core Intel 2.80GHz XEON processor running Matlab. The solid lines indicate fitted polynomials of degree 1 for the 1d line search and degree 2 for the remaining curves; the total running time for each trial scaled as O(d2 ), as predicted. When θ was less than 200 dimensions, the total running time was roughly 50 ms (and for dim(θ) ≈ 100, the runtime was close to 15 ms), well within the range of tolerable latencies for many experiments. In Fig. 2 we apply our algorithm to characterize the receptive field of a neuron whose response depends on its past spiking. Here, the stimulus coefficients k were chosen to follow a sine-wave; 1 The one difference is that the covariance matrix of p(θt+1 |xt+1 , rt+1 ) is in general no longer just a rankone modification of the covariance matrix of p(θt |xt , rt ); thus, we cannot use the rank-one update to compute the eigendecomposition. However, it is often reasonable to take Q to be white, Q = cI; in this case the eigenvectors of Ct + Q are those of Ct and the eigenvalues are ci + c where ci is the ith eigenvalue of Ct ; thus in this case, our methods may be applied without modification. the spike history coefficients a were inhibitory and followed an exponential function. When choosing stimuli we updated the posterior for the full θ = {k; a} simultaneously and maximized the information about both the stimulus coefficients and the spike history coefficients. The information maximizing strategy outperformed random sampling for estimating both the spike history and stimulus coefficients. Our final set of results, Fig. 3, considers a neuron whose receptive field drifts non-systematically with time. We take the receptive field to be a Gabor function whose center moves according to a random walk (we have in mind a slow random drift of eye position during a visual experiment). The results demonstrate the feasibility of the information-maximization strategy in the presence of nonstationary response properties θ, and emphasize the superiority of adaptive methods in this context. Conclusion We have developed an efficient implementation of an algorithm for online optimization of neurophysiology experiments based on information-theoretic criterion. 
Conclusion. We have developed an efficient implementation of an algorithm for online optimization of neurophysiology experiments based on an information-theoretic criterion. Reasonable approximations based on a GLM framework allow the algorithm to run in near-real time even for high-dimensional parameter and stimulus spaces, and in the presence of spike-rate adaptation and time-varying neural response properties. Despite these approximations the algorithm consistently provides significant improvements over random sampling; indeed, the differences in efficiency are large enough that the information-optimization strategy may permit robust system identification in cases where it is simply not otherwise feasible to estimate the neuron's parameters using random stimuli. Thus, in a sense, the proposed stimulus-optimization technique significantly extends the reach and power of classical neurophysiology methods.

Acknowledgments. JL is supported by the Computational Science Graduate Fellowship Program administered by the DOE under contract DE-FG02-97ER25308 and by the NSF IGERT Program in Hybrid Neural Microsystems at Georgia Tech via grant number DGE-0333411. LP is supported by grant EY018003 from the NEI and by a Gatsby Foundation Pilot Grant. We thank P. Latham for helpful conversations.

References
[1] I. Nelken, et al., Hearing Research 72, 237 (1994).
[2] P. Foldiak, Neurocomputing 38–40, 1217 (2001).
[3] K. Zhang, et al., Proceedings of the Computational and Systems Neuroscience Meeting (2004).
[4] R. C. deCharms, et al., Science 280, 1439 (1998).
[5] C. Machens, et al., Neuron 47, 447 (2005).
[6] A. Watson, et al., Perception and Psychophysics 33, 113 (1983).
[7] L. Paninski, Neural Computation 17, 1480 (2005).
[8] P. McCullagh, et al., Generalized Linear Models (Chapman and Hall, London, 1989).
[9] L. Paninski, Network: Computation in Neural Systems 15, 243 (2004).
[10] E. Simoncelli, et al., in The Cognitive Neurosciences, M. Gazzaniga, ed. (MIT Press, 2004), third edn.
[11] P. Dayan, et al., Theoretical Neuroscience (MIT Press, 2001).
[12] E. Chichilnisky, Network: Computation in Neural Systems 12, 199 (2001).
[13] F. Theunissen, et al., Network: Computation in Neural Systems 12, 289 (2001).
[14] L. Paninski, et al., Journal of Neuroscience 24, 8551 (2004).
[15] M. Gu, et al., SIAM Journal on Matrix Analysis and Applications 15, 1266 (1994).
[16] N. A. Lesica, et al., IEEE Trans. on Neural Systems and Rehabilitation Engineering 13, 194 (2005).

5 0.52449429 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

Author: Hamed Valizadegan, Rong Jin

Abstract: Maximum margin clustering was proposed recently and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machines to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origin, which makes it unsuitable for clustering unbalanced datasets. Third, it is sensitive to the choice of kernel functions, and requires an external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose a "generalized maximum margin clustering" framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries, including those not passing through the origin. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1

6 0.52409041 65 nips-2006-Denoising and Dimension Reduction in Feature Space

7 0.52145928 35 nips-2006-Approximate inference using planar graph decomposition

8 0.5205217 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation

9 0.51876873 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space

10 0.51860869 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

11 0.51860291 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds

12 0.51619107 75 nips-2006-Efficient sparse coding algorithms

13 0.51447874 167 nips-2006-Recursive ICA

14 0.51421911 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition

15 0.51413298 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment

16 0.51351571 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization

17 0.51340508 20 nips-2006-Active learning for misspecified generalized linear models

18 0.51166552 158 nips-2006-PG-means: learning the number of clusters in data

19 0.51146179 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights

20 0.51099437 117 nips-2006-Learning on Graph with Laplacian Regularization