nips nips2012 nips2012-340 knowledge-graph by maker-knowledge-mining

340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition


Source: pdf

Author: Francesco Dinuzzo, Bernhard Schölkopf

Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The representer theorem for Hilbert spaces: a necessary and sufficient condition Francesco Dinuzzo and Bernhard Schölkopf Max Planck Institute for Intelligent Systems Spemannstrasse 38, 72076 Tübingen Germany [fdinuzzo@tuebingen. [sent-1, score-0.808]

2 de] Abstract The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. [sent-5, score-0.941]

3 A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. [sent-6, score-1.739]

4 A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. [sent-7, score-2.337]

5 In this paper, we extend such result by weakening the assumptions on the regularization term. [sent-8, score-0.208]

6 In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. [sent-9, score-1.793]

7 In this paper, we focus on regularization problems defined over a real Hilbert space H. [sent-11, score-0.231]

8 A Hilbert space is a vector space endowed with an inner product and a norm that is complete. [sent-12, score-0.026]

9 The focus of our study is the general problem of minimizing an extended real-valued regularization functional J : H → R ∪ {+∞} of the form J(w) = f(L1 w, . . . , Lℓ w) + Ω(w). [sent-14, score-0.31]

10 The functional J is the sum of an error term f , which typically depends on empirical data, and a regularization term Ω that enforces certain desirable properties on the solution. [sent-21, score-0.368]

11 By allowing the error term f to take the value +∞, problems with hard constraints on the values Li w (for instance, interpolation problems) are included in the framework. [sent-22, score-0.077]

12 Moreover, by allowing Ω to take the value +∞, regularization problems of the Ivanov type are also taken into account. [sent-23, score-0.284]

13 It is easy to see that this setting is a particular case of (1), where the dependence on the data pairs (xi, yi) can be absorbed into the definition of f, and the Li are point-wise evaluation functionals, i.e., Li w = w(xi). [sent-26, score-0.055]

14 Several popular techniques can be cast in such a regularization framework. [sent-29, score-0.208]

15 Given binary labels yi = ±1, the SVM classifier (without bias) can be interpreted as a regularization method corresponding to the choice c((x1, y1, w(x1)), · · · , (xℓ, yℓ, w(xℓ))) = γ ∑_{i=1}^ℓ max{0, 1 − yi w(xi)}, and Ω(w) = ‖w‖². [sent-34, score-0.318]

16 Kernel PCA can be shown to be equivalent to a regularization problem where c((x1, y1, w(x1)), · · · , (xℓ, yℓ, w(xℓ))) = 0 if (1/ℓ) ∑_{i=1}^ℓ ( w(xi) − (1/ℓ) ∑_{j=1}^ℓ w(xj) )² = 1, and +∞ otherwise, and Ω is any strictly monotonically increasing function of the norm ‖w‖, see [4]. [sent-37, score-0.234]

17 In this problem, there are no labels yi , but the feature extractor function w is constrained to produce vectors with unitary empirical variance. [sent-38, score-0.09]
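
To make the role of the representer theorem in such problems concrete, here is a minimal numerical sketch (not taken from the paper): a kernel ridge regression in an RKHS with a Gaussian kernel, where the error term is a squared loss weighted by γ; the data, kernel width, and the helper gaussian_kernel below are illustrative assumptions.

import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Gram matrix with entries K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))              # inputs x_1, ..., x_l
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)   # noisy labels y_1, ..., y_l
gamma = 10.0                                      # weight on the error term

# Linear representer theorem: a minimizer of gamma * sum_i (y_i - w(x_i))^2 + ||w||^2
# can be written as w* = sum_i c_i K(x_i, .). Substituting this form yields the
# l-dimensional problem min_c gamma * ||y - G c||^2 + c' G c, whose stationarity
# condition reduces (when G is invertible) to (G + I/gamma) c = y.
G = gaussian_kernel(X, X)
c = np.linalg.solve(G + np.eye(len(y)) / gamma, y)

# The learned function is evaluated anywhere through the kernel sections K(x_i, .).
X_test = np.linspace(-3, 3, 200).reshape(-1, 1)
w_test = gaussian_kernel(X_test, X) @ c
print("training residual norm:", np.linalg.norm(G @ c - y))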

18 The possibility of choosing general continuous linear functionals Li in (1) makes it possible to consider a much broader class of regularization problems. [sent-39, score-0.668]

19 Given an "input signal" u : X → R, assume that the convolution u ∗ w is well-defined for any w ∈ H, and that the point-wise evaluated convolution functionals Li w = (u ∗ w)(xi) = ∫_X u(s) w(xi − s) ds are continuous. [sent-42, score-0.491]

20 A possible way to recover w from noisy measurements yi of the "output signal" is to solve regularization problems such as min_{w ∈ H} γ ∑_{i=1}^ℓ (yi − (u ∗ w)(xi))² + ‖w‖², where the objective functional is of the form (1). [sent-43, score-0.388]

21 Then, given output labels yi, one can learn an input-output relationship by solving regularization problems of the form min_{w ∈ H} c((y1, E_{P1}(w)), · · · , (yℓ, E_{Pℓ}(w))) + ‖w‖². [sent-47, score-0.286]

22 If the expectations are bounded linear functionals, such a regularization functional is of the form (1). [sent-48, score-0.372]

23 By allowing the regularization term Ω to take the value +∞, we can also take into account the whole class of Ivanov-type regularization problems of the form min_{w ∈ H} f(L1 w, . . . , Lℓ w) subject to φ(w) ≤ 1. [sent-50, score-0.551]

24 These problems can be reformulated as the minimization of a functional of the type (1), where Ω(w) = 0 if φ(w) ≤ 1, and Ω(w) = +∞ otherwise. [sent-53, score-0.13]

25 1 The representer theorem. Let’s now go back to the general formulation (1). [sent-55, score-0.676]

26 By the Riesz representation theorem [5, 6], J can be rewritten as J(w) = f(⟨w, w1⟩, . . . , ⟨w, wℓ⟩) + Ω(w). [sent-56, score-0.099]

27 Here, wi denotes the representer of the linear functional Li with respect to the inner product. [sent-59, score-0.737]

28 A family F of regularization functionals of the form (1) is said to admit a linear representer theorem if, for any J ∈ F, and any choice of bounded linear functionals Li, there exists a minimizer w∗ that can be written as a linear combination of the representers: w∗ = ∑_{i=1}^ℓ ci wi. [sent-62, score-2.122]

29 If a linear representer theorem holds, the regularization problem under study can be reduced to an ℓ-dimensional optimization problem on the scalar coefficients ci, independently of the dimension of H. [sent-63, score-0.911]
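
A minimal sketch of this reduction, spelled out here for concreteness (standard reasoning, not an excerpt of the extracted text): plugging w = ∑_{i=1}^ℓ ci wi into (1) and letting G denote the ℓ × ℓ Gram matrix G_{ij} = ⟨wi, wj⟩,
\[
\langle w, w_j \rangle = (Gc)_j, \qquad \|w\|^2 = c^\top G c, \qquad
J(w) = f(Gc) + \Omega\Big(\sum_{i=1}^{\ell} c_i w_i\Big),
\]
so the search can be restricted to the coefficient vector c ∈ R^ℓ; under (2) the last term becomes h(√(cᵀ G c)), and the whole problem is finite dimensional.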

30 Sufficient conditions under which a family of functionals admits a representer theorem have been widely studied in the literature of statistics, inverse problems, and machine learning. [sent-65, score-1.255]

31 The theorem also provides the foundations of learning techniques such as regularized kernel methods and support vector machines, see [7, 8, 9] and references therein. [sent-66, score-0.156]

32 Representer theorems are of particular interest when H is a reproducing kernel Hilbert space (RKHS) [10]. [sent-67, score-0.207]

33 Given a non-empty set X, an RKHS is a space of functions w : X → R such that point-wise evaluation functionals are bounded, namely, for any x ∈ X, there exists a non-negative real number Cx such that |w(x)| ≤ Cx ‖w‖, ∀w ∈ H. [sent-68, score-0.433]

34 It can be shown that an RKHS can be uniquely associated with a positive-semidefinite kernel function K : X × X → R (called the reproducing kernel), such that the so-called reproducing property holds: w(x) = ⟨w, Kx⟩, ∀(x, w) ∈ X × H, where the kernel sections Kx are defined as Kx(y) = K(x, y), ∀y ∈ X. [sent-69, score-0.397]

35 The reproducing property states that the representers of point-wise evaluation functionals coincide with the kernel sections. [sent-70, score-0.683]

36 Starting from the reproducing property, it is also easy to show that the representer of any bounded linear functional L is given by a function KL ∈ H such that KL (x) = LKx , ∀x ∈ X . [sent-71, score-0.853]

37 Therefore, in an RKHS, the representer of any bounded linear functional can be obtained explicitly in terms of the reproducing kernel. [sent-72, score-0.853]
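
Two illustrative instances of the rule KL(x) = L Kx, spelled out here as examples (standard facts, not quoted from the extracted text):
\[
L w = w(x_0) \;\Rightarrow\; K_L(x) = K_{x_0}(x) = K(x_0, x),
\qquad
L w = \int_X u(s)\, w(s)\, ds \;\Rightarrow\; K_L(x) = \int_X u(s)\, K(s, x)\, ds,
\]
so the representer of point-wise evaluation is the kernel section itself, while an integral functional (assuming it is indeed bounded on H) is represented by a kernel-smoothed version of its weight function.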

38 If the regularization functional (1) admits minimizers, and the regularization term Ω is a nondecreasing function of the norm, i.e. [sent-73, score-0.837]

39 Ω(w) = h(‖w‖), with h : R → R ∪ {+∞} nondecreasing, (2) then the linear representer theorem follows easily from the Pythagorean identity. [sent-75, score-0.703]
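
For completeness, the Pythagorean argument in its usual one-line form (a standard sketch, not an excerpt from the paper): write a minimizer as w∗ = u + v with u in the span of the representers and v orthogonal to it; then
\[
\langle w^*, w_i \rangle = \langle u, w_i \rangle \quad \text{for all } i,
\qquad
\|w^*\|^2 = \|u\|^2 + \|v\|^2 \ge \|u\|^2,
\]
so that, with Ω = h(‖·‖) and h nondecreasing, J(u) ≤ J(w∗); hence u is also a minimizer and it lies in the span of the representers.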

40 A proof that the condition (2) is sufficient appeared in [11] in the case where H is an RKHS and the Li are point-wise evaluation functionals. [sent-76, score-0.046]

41 Earlier instances of representer theorems can be found in [12, 13, 14]. [sent-77, score-0.615]

42 More recently, the question of whether condition (2) is also necessary for the existence of linear representer theorems has been investigated [15]. [sent-78, score-0.786]

43 In particular, [15] shows that, if Ω is differentiable (and certain technical existence conditions hold), then (2) is a necessary and sufficient condition for certain classes of regularization functionals to admit a representer theorem. [sent-79, score-1.524]

44 In the following, we indeed show that (2) is necessary and sufficient for the family of regularization functionals of the form (1) to admit a linear representer theorem, by merely assuming that Ω is lower semicontinuous and satisfies basic conditions for the existence of minimizers. [sent-81, score-1.81]

45 The proof is based on a characterization of radial nondecreasing functions defined on a Hilbert space. [sent-82, score-0.357]

46 2 A characterization of radial nondecreasing functions. In this section, we present a characterization of radial nondecreasing functions defined over Hilbert spaces. [sent-83, score-0.714]

47 The following Theorem provides a geometric characterization of radial nondecreasing functions defined on a Hilbert space that generalizes the analogous result of [15] for differentiable functions. [sent-88, score-0.397]

48 Let H denote a Hilbert space such that dim H ≥ 2, and Ω : H → R ∪ {+∞} a lower semicontinuous function. [sent-90, score-0.324]

49 Then, (2) holds if and only if Ω(x + y) ≥ max{Ω(x), Ω(y)}, ∀x, y ∈ H : ⟨x, y⟩ = 0. (3) [sent-91, score-0.025]

50 Since dim H ≥ 2, by fixing a generic vector x ∈ H \ {0} and a number λ ∈ [0, 1], there exists a vector y such that ‖y‖ = 1 and λ = 1 − cos²θ, where θ is the angle between x and y, i.e., cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖). [sent-96, score-0.034]

51 Since the last inequality trivially holds also when x = 0, we conclude that Ω(x) ≥ Ω(λx), ∀x ∈ H, ∀λ ∈ [0, 1], (4) so that Ω is nondecreasing along all the rays passing through the origin. [sent-98, score-0.272]

52 Now, for any c ≥ Ω(0), consider the sublevel sets Sc = {x ∈ H : Ω(x) ≤ c} . [sent-100, score-0.024]

53 In addition, since Ω is lower semicontinuous, Sc is also closed. [sent-102, score-0.028]

54 We now show that Sc is either a closed ball centered at the origin, or the whole space. [sent-103, score-0.084]

55 To this end, we show that, for any x ∈ Sc, the whole ball B = {y ∈ H : ‖y‖ ≤ ‖x‖} is contained in Sc. [sent-104, score-0.084]

56 First, take any y ∈ int(B) \ span{x}, where int denotes the interior. [sent-105, score-0.077]

57 Then, y has norm strictly smaller than that of x, that is, 0 < ‖y‖ < ‖x‖, and is not aligned with x, i.e., y ∉ span{x}. [sent-106, score-0.064]

58 See Figure 1 for a geometrical illustration of the sequence xk . [sent-111, score-0.232]

59 By orthogonality, we have ‖xk+1‖² = ‖xk‖² + ak² = ‖xk‖² (1 + tan²(θ/n)) = ‖y‖² (1 + tan²(θ/n))^{k+1}. (5) [sent-112, score-0.34]

60 In addition, the angle between xk+1 and xk is given by θk = arctan(ak / ‖xk‖) = θ/n, so that the total angle between y and xn is given by ∑_{k=0}^{n−1} θk = θ. [sent-113, score-0.585]

61 Since all the points xk belong to the subspace spanned by x and y, and the angle between x and xn is zero, we have that xn is positively aligned with x, that is, xn = λx, λ ≥ 0. [sent-114, score-0.47]

62 Indeed, from (5) we have λ² = ‖xn‖² / ‖x‖² = (‖y‖² / ‖x‖²) (1 + tan²(θ/n))^n, and it can be verified that lim_{n→+∞} (1 + tan²(θ/n))^n = 1, therefore λ ≤ 1 for a sufficiently large n. [sent-116, score-0.055]

63 Now, write the difference vector in the form λx − y = ∑_{k=0}^{n−1} (xk+1 − xk), and observe that ⟨xk+1 − xk, xk⟩ = 0. [sent-117, score-0.51]

64 Since Sc is closed and the closure of int(B) \ span{x} is the whole ball B, every point y ∈ B is also included in Sc . [sent-119, score-0.084]

65 This proves that Sc is either a closed ball centered at the origin, or the whole space H. [sent-120, score-0.084]
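
To make explicit how the shape of the sublevel sets yields (2) (a short connecting remark, not a quotation from the paper): since every Sc is a ball centered at the origin (or all of H), two points with the same norm belong to exactly the same sublevel sets, so Ω(x) depends on x only through ‖x‖; writing
\[
h(t) := \Omega(t\,e) \quad \text{for any fixed unit vector } e \in H,
\]
gives Ω(w) = h(‖w‖), and h is nondecreasing by (4), which is precisely condition (2).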

66 Figure 1: The sequence xk constructed in the proof of Theorem 1 is associated with a geometrical construction known as the spiral of Theodorus. [sent-122, score-0.256]

67 Starting from any y in the interior of the ball (excluding points aligned with x), a point of the type λx (with 0 ≤ λ ≤ 1) can be reached by using a finite number of right triangles. [sent-123, score-0.117]

68 3 Representer theorem: a necessary and sufficient condition. In this section, we prove that condition (2) is necessary and sufficient for suitable families of regularization functionals of the type (1) to admit a linear representer theorem. [sent-124, score-1.573]

69 Let F denote a family of functionals J : H → R ∪ {+∞} of the form (1) that admit minimizers, and assume that F contains a set of functionals of the form J_p^γ(w) = γ f(⟨w, p⟩) + Ω(w), ∀p ∈ H, ∀γ ∈ R+, (6) where f(z) is uniquely minimized at z = 1. [sent-127, score-1.154]

70 Then, for any lower semicontinuous Ω, the family F admits a linear representer theorem if and only if (2) holds. [sent-128, score-1.139]

71 The first part of the theorem (sufficiency) follows from an orthogonality argument. [sent-130, score-0.127]

72 Any minimizer w∗ of J can be uniquely decomposed as w∗ = u + v, with u ∈ R and v ∈ R⊥, where R denotes the subspace spanned by the representers w1, . . . , wℓ. [sent-136, score-0.107]

73 Now, let’s prove the second part of the theorem (necessity). [sent-138, score-0.099]

74 First of all, observe that the functional J_0^γ(w) = γ f(0) + Ω(w), obtained by setting p = 0 in (6), belongs to F. [sent-139, score-0.102]

75 In addition, by the representer theorem, the only admissible minimizer of J0 is the origin, that is Ω(y) ≥ Ω(0), ∀y ∈ H. [sent-141, score-0.625]

76 By the representer theorem, the functional J_p^γ of the form (6) with p = x / ‖x‖² admits a minimizer of the type w = λ(γ)x. [sent-143, score-0.846]

77 By using the fact that f(z) is minimized at z = 1, and the linear representer theorem, we have γ f(1) + Ω(λ(γ)x) ≤ γ f(λ(γ)) + Ω(λ(γ)x) = J_p^γ(λ(γ)x) ≤ J_p^γ(x + y) = γ f(1) + Ω(x + y). [sent-145, score-0.656]
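
Filling in the step between this chain and the limit argument below (an inferred reconstruction, flagged as such): the two outer terms give Ω(λ(γ)x) ≤ Ω(x + y), while comparing the second and last terms gives
\[
\Omega(x + y) \;\ge\; \Omega(\lambda(\gamma)x) + \gamma\big(f(\lambda(\gamma)) - f(1)\big),
\]
which is presumably the inequality (8) used below. Along a sequence γk → +∞, either the correction term γk(f(λ(γk)) − f(1)) is unbounded, in which case Ω(x + y) = +∞ because Ω(λ(γk)x) ≥ Ω(0), or it stays bounded, in which case λ(γk) → 1 and lower semicontinuity gives Ω(x + y) ≥ lim inf_k Ω(λ(γk)x) ≥ Ω(x); exchanging the roles of x and y yields condition (3).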

78 In the first case, we trivially have Ω (x + y) ≥ Ω(x). [sent-148, score-0.024]

79 Let γk denote a sequence such that lim_{k→+∞} γk = +∞, and consider the sequence ak = γk (f(λ(γk)) − f(1)). [sent-150, score-0.118]

80 Since z = 1 is the only minimizer of f(z), the sequence ak can remain bounded only if lim_{k→+∞} λ(γk) = 1. [sent-152, score-0.177]

81 By taking the limit inferior in (8) for γ → +∞, and using the fact that Ω is lower semicontinuous, we obtain condition (3). [sent-153, score-0.074]

82 The second part of Theorem 2 states that any lower semicontinuous regularization term Ω has to be of the form (2) in order for the family F to admit a linear representer theorem. [sent-155, score-1.308]

83 Observe that Ω is not required to be differentiable or even continuous. [sent-156, score-0.04]

84 Moreover, it need not have bounded lower level sets. [sent-157, score-0.063]

85 For the necessary condition to hold, the family F has to be broad enough to contain at least a set of regularization functionals of the form (6). [sent-158, score-0.818]

86 The following examples show how to apply the necessary condition of Theorem 2 to classes of regularization problems with standard loss functions. [sent-159, score-0.32]

87 • Let L : R2 → R ∪ {+∞} denote any loss function of the type L(y, z) = L(y − z), such that L(t) is uniquely minimized at t = 0. [sent-160, score-0.139]

88 Then, for any lower semicontinuous regularization term Ω, the family of regularization functionals of the form J(w) = γ ∑_{i=1}^ℓ L(yi, ⟨w, wi⟩) + Ω(w) admits a linear representer theorem if and only if (2) holds. [sent-161, score-1.84]

89 To see that the hypotheses of Theorem 2 are satisfied, it is sufficient to consider the subset of functionals with ℓ = 1, y1 = 1, and w1 = p ∈ H. [sent-162, score-0.457]

90 These functionals can be written in the form (6) with f (z) = L(1, z). [sent-163, score-0.433]

91 • The class of regularization problems with the hinge (SVM) loss of the form J(w) = γ ∑_{i=1}^ℓ max{0, 1 − yi ⟨w, wi⟩} + Ω(w), with Ω lower semicontinuous, admits a linear representer theorem if and only if Ω satisfies (2). [sent-164, score-1.139]

92 For instance, by choosing ℓ = 2, and (y1, w1) = (1, p), (y2, w2) = (−1, p/2), we obtain regularization functionals of the form (6) with f(z) = max{0, 1 − z} + max{0, 1 + z/2}, and it is easy to verify that f is uniquely minimized at z = 1. [sent-165, score-0.752]
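
The verification is a short case analysis (worked out here for the reader; it is elementary and not part of the extracted text):
\[
f(z) \;=\;
\begin{cases}
1 - z & z \le -2,\\
2 - z/2 & -2 \le z \le 1,\\
1 + z/2 & z \ge 1,
\end{cases}
\]
so f is strictly decreasing for z ≤ 1 and strictly increasing for z ≥ 1, with unique minimum f(1) = 3/2.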

93 4 Conclusions. Sufficiently broad families of regularization functionals defined over a Hilbert space with lower semicontinuous regularization term admit a linear representer theorem if and only if the regularization term is a radial nondecreasing function. [sent-166, score-2.564]

94 As a concluding remark, it is important to observe that other types of regularization terms are possible if the representer theorem is only required to hold for a restricted subset of the data functionals. [sent-168, score-0.884]

95 Exploring necessary conditions for the existence of representer theorems under different types of restrictions on the data functionals is an interesting future research direction. [sent-169, score-1.146]

96 Nonlinear component analysis as a kernel eigenvalue problem. [sent-195, score-0.057]

97 Sur une espèce de géométrie analytique des systèmes de fonctions sommables. [sent-199, score-0.109]

98 Comptes rendus de l’Académie des sciences Paris, 144:1409–1411, 1907. [sent-200, score-0.139]

99 Sur les ensembles de fonctions et les opérations linéaires. [sent-203, score-0.09]

100 Comptes rendus de l’Académie des sciences Paris, 144:1414–1416, 1907. [sent-204, score-0.139]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('representer', 0.577), ('functionals', 0.433), ('semicontinuous', 0.262), ('regularization', 0.208), ('nondecreasing', 0.199), ('sc', 0.175), ('xk', 0.17), ('hilbert', 0.13), ('admit', 0.122), ('reproducing', 0.112), ('radial', 0.102), ('functional', 0.102), ('theorem', 0.099), ('rkhs', 0.092), ('admits', 0.091), ('representers', 0.081), ('jp', 0.072), ('ak', 0.07), ('kx', 0.066), ('uk', 0.064), ('uniquely', 0.059), ('li', 0.057), ('kernel', 0.057), ('characterization', 0.056), ('xn', 0.055), ('yi', 0.055), ('family', 0.055), ('existence', 0.055), ('ivanov', 0.054), ('span', 0.053), ('int', 0.052), ('minimizers', 0.052), ('minimized', 0.052), ('ball', 0.051), ('des', 0.051), ('minimizer', 0.048), ('angle', 0.048), ('condition', 0.046), ('comptes', 0.044), ('mie', 0.044), ('rendus', 0.044), ('necessary', 0.043), ('lkopf', 0.043), ('differentiable', 0.04), ('theorems', 0.038), ('aligned', 0.038), ('geometrical', 0.038), ('tikhonov', 0.036), ('fonctions', 0.036), ('sur', 0.036), ('origin', 0.036), ('bounded', 0.035), ('sch', 0.035), ('unitary', 0.035), ('spline', 0.035), ('acad', 0.034), ('dim', 0.034), ('suf', 0.033), ('broad', 0.033), ('whole', 0.033), ('cx', 0.032), ('wi', 0.031), ('svm', 0.031), ('term', 0.029), ('convolution', 0.029), ('argyriou', 0.029), ('type', 0.028), ('lower', 0.028), ('orthogonality', 0.028), ('les', 0.027), ('linear', 0.027), ('subspace', 0.026), ('norm', 0.026), ('xi', 0.025), ('take', 0.025), ('holds', 0.025), ('paris', 0.025), ('sequence', 0.024), ('trivially', 0.024), ('rays', 0.024), ('syst', 0.024), ('om', 0.024), ('arctan', 0.024), ('sublevel', 0.024), ('dpi', 0.024), ('fdinuzzo', 0.024), ('spiral', 0.024), ('winston', 0.024), ('riesz', 0.024), ('esp', 0.024), ('hypotheses', 0.024), ('spanned', 0.023), ('problems', 0.023), ('orthogonal', 0.023), ('machines', 0.022), ('ceedings', 0.022), ('une', 0.022), ('epi', 0.022), ('deconvolution', 0.022), ('dinuzzo', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

Author: Francesco Dinuzzo, Bernhard Schölkopf

Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1

2 0.15180667 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

3 0.12184626 188 nips-2012-Learning from Distributions via Support Measure Machines

Author: Krikamol Muandet, Kenji Fukumizu, Francesco Dinuzzo, Bernhard Schölkopf

Abstract: This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provides several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (FlexSVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework. 1

4 0.12104792 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk)_{k=0}^{K}, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements (yk = Ak xk + ek)_{k=1}^{K}, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order, depending only on Sk, we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk)_{k=0}^{K} exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

5 0.10059207 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x ∈ Rn} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. 1

6 0.096456505 16 nips-2012-A Polynomial-time Form of Robust Regression

7 0.091748193 231 nips-2012-Multiple Operator-valued Kernel Learning

8 0.09158282 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

9 0.089856915 27 nips-2012-A quasi-Newton proximal splitting method

10 0.076277807 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

11 0.073837101 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

12 0.07075423 227 nips-2012-Multiclass Learning with Simplex Coding

13 0.06747853 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

14 0.060304966 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

15 0.058341879 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

16 0.054501504 247 nips-2012-Nonparametric Reduced Rank Regression

17 0.053994555 167 nips-2012-Kernel Hyperalignment

18 0.053351123 361 nips-2012-Volume Regularization for Binary Classification

19 0.049608536 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

20 0.047471397 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.136), (1, 0.032), (2, 0.089), (3, -0.062), (4, 0.099), (5, 0.066), (6, 0.015), (7, 0.016), (8, 0.1), (9, -0.016), (10, -0.052), (11, 0.059), (12, 0.094), (13, -0.059), (14, -0.022), (15, -0.137), (16, 0.022), (17, 0.036), (18, 0.014), (19, -0.007), (20, -0.074), (21, -0.021), (22, 0.009), (23, -0.068), (24, 0.037), (25, 0.074), (26, -0.013), (27, -0.044), (28, -0.022), (29, 0.006), (30, -0.021), (31, -0.047), (32, -0.031), (33, 0.122), (34, 0.064), (35, 0.018), (36, -0.003), (37, -0.079), (38, 0.01), (39, -0.017), (40, 0.025), (41, -0.017), (42, 0.054), (43, -0.006), (44, -0.023), (45, -0.009), (46, 0.057), (47, -0.003), (48, -0.019), (49, -0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94265139 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

Author: Francesco Dinuzzo, Bernhard Schölkopf

Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1

2 0.68793678 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

3 0.63572985 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk)_{k=0}^{K}, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements (yk = Ak xk + ek)_{k=1}^{K}, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order, depending only on Sk, we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk)_{k=0}^{K} exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

4 0.62840384 27 nips-2012-A quasi-Newton proximal splitting method

Author: Stephen Becker, Jalal Fadili

Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1

5 0.62358016 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x ∈ Rn} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. 1

6 0.6002298 231 nips-2012-Multiple Operator-valued Kernel Learning

7 0.59856272 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

8 0.59610683 188 nips-2012-Learning from Distributions via Support Measure Machines

9 0.58680081 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

10 0.55193347 167 nips-2012-Kernel Hyperalignment

11 0.53195447 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

12 0.52850503 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

13 0.50484347 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

14 0.50110316 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

15 0.47373965 16 nips-2012-A Polynomial-time Form of Robust Regression

16 0.46670589 34 nips-2012-Active Learning of Multi-Index Function Models

17 0.45409444 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

18 0.4381403 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

19 0.42420232 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means

20 0.41974187 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.016), (21, 0.034), (38, 0.107), (39, 0.011), (42, 0.014), (54, 0.017), (55, 0.478), (74, 0.023), (76, 0.121), (80, 0.059), (92, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81276405 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

Author: Francesco Dinuzzo, Bernhard Schölkopf

Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1

2 0.8110348 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

3 0.80844557 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

Author: Francisco Ruiz, Isabel Valera, Carlos Blanco, Fernando Pérez-Cruz

Abstract: The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, etc., of a representative sample of the U.S. population. In this paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model the subjects using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the nature of the data, we need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. Finally, the experiments over the NESARC database show that our model properly captures some of the hidden causes that model suicide attempts. 1

4 0.80056852 155 nips-2012-Human memory search as a random walk in a semantic network

Author: Joseph L. Austerweil, Joshua T. Abbott, Thomas L. Griffiths

Abstract: The human mind has a remarkable ability to store a vast amount of information in memory, and an even more remarkable ability to retrieve these experiences when needed. Understanding the representations and algorithms that underlie human memory search could potentially be useful in other information retrieval settings, including internet search. Psychological studies have revealed clear regularities in how people search their memory, with clusters of semantically related items tending to be retrieved together. These findings have recently been taken as evidence that human memory search is similar to animals foraging for food in patchy environments, with people making a rational decision to switch away from a cluster of related information as it becomes depleted. We demonstrate that the results that were taken as evidence for this account also emerge from a random walk on a semantic network, much like the random web surfer model used in internet search engines. This offers a simpler and more unified account of how people search their memory, postulating a single process rather than one process for exploring a cluster and one process for switching between clusters. 1

5 0.73224741 211 nips-2012-Meta-Gaussian Information Bottleneck

Author: Melanie Rey, Volker Roth

Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possibles applications of IB to continuous data and provides a solution more robust to outliers. 1

6 0.64793456 215 nips-2012-Minimizing Uncertainty in Pipelines

7 0.63935602 95 nips-2012-Density-Difference Estimation

8 0.50521481 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

9 0.49904141 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

10 0.49693769 231 nips-2012-Multiple Operator-valued Kernel Learning

11 0.49480566 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

12 0.48730129 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

13 0.48329461 193 nips-2012-Learning to Align from Scratch

14 0.47411859 298 nips-2012-Scalable Inference of Overlapping Communities

15 0.4625192 188 nips-2012-Learning from Distributions via Support Measure Machines

16 0.46177021 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines

17 0.45788154 210 nips-2012-Memorability of Image Regions

18 0.45699769 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

19 0.45042834 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

20 0.44599819 319 nips-2012-Sparse Prediction with the $k$-Support Norm