nips nips2012 nips2012-285 knowledge-graph by maker-knowledge-mining

285 nips-2012-Query Complexity of Derivative-Free Optimization


Source: pdf

Author: Ben Recht, Kevin G. Jamieson, Robert Nowak

Abstract: This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. [sent-8, score-0.606]

2 However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. [sent-9, score-0.301]

3 A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. [sent-10, score-0.085]

4 This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. [sent-11, score-0.366]

5 We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same. [sent-12, score-0.328]

6 With training data or simulations one can evaluate the relative merit, or incurred loss, of different parameter settings, but it may be unclear how each parameter influences the overall objective function. [sent-14, score-0.051]

7 In such cases, derivatives of the objective function with respect to the parameters are unavailable. [sent-15, score-0.029]

8 When function evaluations are noiseless, DFO methods can achieve the same rates of convergence as noiseless gradient methods up to a small factor depending on a low-order polynomial of the dimension [9, 5, 10]. [sent-17, score-0.443]

9 This leads one to wonder if the same equivalence can be extended to the case when function evaluations and gradients are noisy. [sent-18, score-0.286]

10 We show that when function evaluations are noisy, the optimization error of any DFO is Ω(√(1/T)), where T is the number of evaluations. [sent-20, score-0.224]

11 This lower bound holds even for strongly convex functions. [sent-21, score-0.241]

12 In contrast, noisy gradient methods exhibit Θ(1/T) error scaling for strongly convex functions [9, 11]. [sent-22, score-0.264]

13 A consequence of our theory is that finite differencing cannot achieve the rates of gradient methods when the function evaluations are noisy. [sent-23, score-0.336]
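
To make sentence 13 concrete, here is a small numerical sketch (our own illustration, not code from the paper): a two-point finite-difference estimate built from noisy evaluations has noise variance on the order of σ²/h², so shrinking the step h to reduce bias inflates the variance, which is one intuition behind the gap the lower bound formalizes.

```python
import numpy as np

# Illustration only (not from the paper): finite differencing with noisy
# function evaluations f(x) + w.  The difference quotient's noise standard
# deviation scales like sigma / h, so small steps h give very noisy gradients.
rng = np.random.default_rng(0)

def noisy_f(x, sigma=0.1):
    return x**2 + sigma * rng.standard_normal()

def fd_gradient(x, h, n_trials=10_000):
    estimates = [(noisy_f(x + h) - noisy_f(x)) / h for _ in range(n_trials)]
    return np.mean(estimates), np.std(estimates)

for h in (1.0, 0.1, 0.01):
    mean, std = fd_gradient(x=1.0, h=h)
    print(f"h={h:5.2f}  mean={mean:6.3f}  std={std:7.3f}")  # true gradient at x=1 is 2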

14 On the positive side, we also present a new derivative-free algorithm that achieves this lower bound with near optimal dimension dependence. [sent-24, score-0.14]

15 Moreover, the algorithm uses only boolean comparisons of function values, not actual function values. [sent-25, score-0.25]

16 This makes the algorithm applicable to situations in which the optimization is only able to probably correctly decide if the value of one configuration is better than the value of another. [sent-26, score-0.111]

17 This is especially interesting in optimization based on human subject feedback, where paired comparisons are often used instead of numerical scoring. [sent-27, score-0.338]

18 The convergence rate of the new algorithm is optimal in terms of T and near-optimal in terms of its dependence on the ambient dimension. [sent-28, score-0.134]

19 Surprisingly, our lower bounds show that this new algorithm that uses only function comparisons achieves the same rate in terms of T as any algorithm that has access to function evaluations. [sent-29, score-0.451]

20 2 Problem formulation and background. We now formalize the notation and conventions for our analysis of DFO. [sent-30, score-0.043]

21 A function f is strongly convex with constant τ on a convex set B ⊂ R^d if there exists a constant τ > 0 such that f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (τ/2)||x − y||² for all x, y ∈ B. [sent-31, score-0.286]
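
As a quick sanity check of the definition above (our own sketch, not from the paper), the quadratic f(x) = ||x||² is strongly convex with τ = 2, and the inequality in sentence 21 holds, here with equality:

```python
import numpy as np

# Sketch: numerically verify f(y) >= f(x) + <grad f(x), y - x> + (tau/2)*||x - y||^2
# for f(x) = ||x||^2, whose strong convexity constant is tau = 2.
rng = np.random.default_rng(1)
f = lambda z: float(z @ z)
grad = lambda z: 2 * z
tau = 2.0

for _ in range(5):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lhs = f(y)
    rhs = f(x) + grad(x) @ (y - x) + 0.5 * tau * np.linalg.norm(y - x) ** 2
    print(lhs >= rhs - 1e-12)  # True; for this quadratic the bound is tight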

22 The gradient of f, if it exists, denoted ∇f, is Lipschitz with constant L if ||∇f(x) − ∇f(y)|| ≤ L||x − y|| for some L > 0. [sent-32, score-0.246]

23 The class of strongly convex functions with Lipschitz gradients defined on a nonempty, convex set B ⊂ R^n which take their minimum in B with parameters τ and L is denoted by F_{τ,L,B}. [sent-33, score-0.298]

24 The problem we consider is minimizing a function f ∈ F_{τ,L,B}. [sent-34, score-0.029]

25 An optimization procedure may only query the function in one of the following two ways. [sent-36, score-0.141]

26 Function Evaluation Oracle: For any point x ∈ B an optimization procedure can observe E_f(x) = f(x) + w, where w ∈ R is a random variable with E[w] = 0 and E[w²] = σ². [sent-37, score-0.057]
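
A minimal sketch of this evaluation oracle (the function names are ours, not the paper's): the optimizer only ever sees f(x) corrupted by zero-mean noise of variance σ².

```python
import numpy as np

# Minimal sketch (hypothetical names): a noisy function evaluation oracle.
rng = np.random.default_rng(2)

def evaluation_oracle(f, x, sigma=1.0):
    """Return f(x) + w with E[w] = 0 and E[w^2] = sigma^2 (Gaussian here)."""
    return f(x) + sigma * rng.standard_normal()

f = lambda x: float(np.sum(x**2))      # a strongly convex test function
x = np.array([0.5, -1.0])
print(evaluation_oracle(f, x))         # f(x) = 1.25 plus zero-mean noise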

27 Function Comparison Oracle: For any pair of points x, y ∈ B an optimization procedure can observe a binary random variable C_f(x, y) satisfying P(C_f(x, y) = sign{f(y) − f(x)}) ≥ 1/2 + min{δ₀, µ|f(y) − f(x)|^(κ−1)} (1) for some 0 < δ₀ ≤ 1/2, µ > 0 and κ ≥ 1. [sent-38, score-0.057]

28 Note κ = 1 implies that the comparison oracle is correct with a probability that is greater than 1/2 and independent of x, y. [sent-40, score-0.499]

29 If κ > 1, then the oracle’s reliability decreases as the difference between f(x) and f(y) decreases. [sent-41, score-0.029]

30 To illustrate how the function comparison oracle and the function evaluation oracle relate to each other, suppose C_f(x, y) = sign{E_f(y) − E_f(x)}, where E_f(x) is a function evaluation oracle with additive noise w. [sent-42, score-1.309]

31 In fact, this choice of w corresponds to Thurston’s law of comparative judgment which is a popular model for outcomes of pairwise comparisons from human subjects [12]. [sent-44, score-0.406]

32 If w is a “spikier” distribution, such as a two-sided Gamma distribution with shape parameter in the range (0, 1], then all values of κ ∈ (1, 2] can be realized (see supplementary materials). [sent-45, score-0.03]
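
A sketch of the construction just described (hypothetical helper names): the comparison oracle is obtained by thresholding the difference of two independent noisy evaluations, C_f(x, y) = sign{E_f(y) − E_f(x)}, and repeated queries show that its reliability improves as |f(y) − f(x)| grows, consistent with the κ > 1 behavior described above.

```python
import numpy as np

# Sketch (hypothetical names): a Boolean comparison oracle built from a noisy
# evaluation oracle, C_f(x, y) = sign(E_f(y) - E_f(x)).
rng = np.random.default_rng(3)

def comparison_oracle(f, x, y, sigma=1.0):
    noisy_fx = f(x) + sigma * rng.standard_normal()
    noisy_fy = f(y) + sigma * rng.standard_normal()
    return np.sign(noisy_fy - noisy_fx)      # +1 suggests f(y) > f(x)

f = lambda z: float(np.sum(z**2))
x, y = np.array([0.1, 0.1]), np.array([1.0, 1.0])
votes = [comparison_oracle(f, x, y) for _ in range(1000)]
print(np.mean(votes))   # well above 0: the oracle usually reports f(y) > f(x)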

33 Interest in the function comparison oracle is motivated by certain popular derivative-free optimization procedures that use only comparisons of function evaluations (e.g. [sent-46, score-0.932]

34 [7]) and by optimization problems involving human subjects making paired comparisons (for instance, getting fitted for prescription lenses or a hearing aid, where unknown parameters specific to each person are tuned with the familiar queries “better or worse?”). [sent-48, score-0.761]

35 Pairwise comparisons have also been suggested as a novel way to tune web-search algorithms [13]. [sent-50, score-0.187]

36 Pairwise comparison strategies have previously been analyzed in the finite setting where the task is to identify the best alternative among a finite set of alternatives (sometimes referred to as the dueling-bandit problem) [13, 14]. [sent-51, score-0.105]

37 The function comparison oracle presented in this work and its analysis are novel. [sent-52, score-0.528]

38 While there are known theoretical results for DFO in the noiseless setting [15, 5, 10], to the best of our knowledge we are the first to characterize lower bounds for DFO in the stochastic setting. [sent-55, score-0.307]

39 Moreover, we believe we are the first to show a novel upper bound for stochastic DFO using a function comparison oracle (which also applies to the function evaluation oracle). [sent-56, score-0.764]

40 However, there are algorithms with upper bounds on the rates of convergence for stochastic DFO with the function evaluation oracle [15, 16]. [sent-57, score-0.788]

41 We discuss the relevant results in the next section following the lower bounds. [sent-58, score-0.158]

42 While there remain many open problems in stochastic DFO (see Section 6), rates of convergence with a stochastic gradient oracle are well known and were first lower bounded by Nemirovski and Yudin [15]. [sent-59, score-0.738]

43 These classic results were recently tightened to show a dependence on the dimension of the problem [17]. [sent-60, score-0.155]

44 And then tightened again to show a better dependence on the noise [11] which matches the upper bound achieved by stochastic gradient descent [9]. [sent-61, score-0.382]

45 The aim of this work is to start filling in the knowledge gaps of stochastic DFO so that it is as well understood as the stochastic gradient oracle. [sent-62, score-0.214]

46 Our bounds are based on simple techniques borrowed from the statistical learning literature that use natural functions and oracles in the same spirit as [11]. [sent-63, score-0.281]

47 3 Main results. The results below are presented with simplifying constants that encompass many factors to aid in exposition. [sent-64, score-0.166]

48 Explicit constants are given in the proofs in Sections 4 and 5. [sent-65, score-0.043]

49 The expectation in the bounds is with respect to the noise in the oracle queries and (possible) optimization algorithm randomization. [sent-67, score-0.717]

50 3.1 Query complexity of the function comparison oracle. Theorem 1. [sent-69, score-0.528]

51 For every f ∈ F_{τ,L,B} let C_f be a function comparison oracle with parameters (κ, µ, δ₀). [sent-70, score-0.528]

52 The constants c1, c2, c3 depend on the oracle and function class parameters, as well as the geometry of B, but are independent of T and n. [sent-72, score-0.49]

53 For upper bounds we propose a specific algorithm based on coordinate descent in Section 5 and prove the following theorem for the case of unconstrained optimization, that is, B = R^n. [sent-73, score-0.134]

54 For every f ∈ F_{τ,L,B} with B = R^n let C_f be a function comparison oracle with parameters (κ, µ, δ₀). [sent-75, score-0.528]
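
For intuition only, the sketch below is a hypothetical, bare-bones comparison-based coordinate search; it is not the algorithm analyzed in Section 5 of the paper, but it illustrates how majority votes over repeated comparison-oracle calls can drive a descent method that never observes function values.

```python
import numpy as np

# Hypothetical illustration only -- NOT the paper's Section 5 algorithm.
# Each step probes one coordinate in both directions and moves wherever a
# majority of noisy comparisons says the objective decreases.
rng = np.random.default_rng(4)
f = lambda z: float(np.sum(z**2))          # strongly convex objective

def comparison_oracle(x, y, sigma=0.5):
    return np.sign((f(y) + sigma * rng.standard_normal())
                   - (f(x) + sigma * rng.standard_normal()))

x = np.array([2.0, -3.0])
step = 1.0
for t in range(200):
    i = t % x.size                          # cycle through coordinates
    candidate = x.copy()
    candidate[i] += step
    # Majority vote over repeated queries to beat the oracle's noise.
    if sum(comparison_oracle(x, candidate) for _ in range(15)) < 0:
        x = candidate                       # oracle favors the + direction
    else:
        candidate[i] -= 2 * step            # try the - direction instead
        if sum(comparison_oracle(x, candidate) for _ in range(15)) < 0:
            x = candidate
        else:
            step *= 0.9                     # neither direction helps: shrink
print(x, f(x))   # should end up near the minimizer at the origin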


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dfo', 0.688), ('oracle', 0.418), ('comparisons', 0.161), ('evaluations', 0.157), ('cf', 0.146), ('oracles', 0.14), ('bounds', 0.103), ('madison', 0.103), ('queries', 0.096), ('rf', 0.094), ('tightened', 0.093), ('wisconsin', 0.092), ('noiseless', 0.088), ('strongly', 0.087), ('lipschitz', 0.084), ('comparison', 0.081), ('paired', 0.074), ('convex', 0.073), ('unavoidable', 0.071), ('nowak', 0.066), ('subjects', 0.066), ('gradients', 0.065), ('evaluation', 0.061), ('stochastic', 0.061), ('aid', 0.06), ('gradient', 0.058), ('optimization', 0.057), ('query', 0.055), ('materials', 0.055), ('lower', 0.055), ('situations', 0.054), ('access', 0.047), ('yudin', 0.047), ('differencing', 0.047), ('exposing', 0.047), ('noisy', 0.046), ('wi', 0.046), ('human', 0.046), ('rates', 0.045), ('constants', 0.043), ('conventions', 0.043), ('prescription', 0.043), ('brecht', 0.043), ('judgment', 0.043), ('noise', 0.043), ('pairwise', 0.042), ('hearing', 0.04), ('merit', 0.04), ('convergence', 0.04), ('usa', 0.04), ('borrowed', 0.038), ('mization', 0.038), ('dependence', 0.036), ('rn', 0.036), ('wonder', 0.035), ('nonempty', 0.035), ('encompass', 0.035), ('matches', 0.034), ('gaps', 0.034), ('lling', 0.034), ('involving', 0.033), ('near', 0.033), ('recht', 0.032), ('sign', 0.032), ('upper', 0.031), ('nemirovski', 0.031), ('familiar', 0.031), ('mum', 0.031), ('ambient', 0.031), ('derivative', 0.031), ('free', 0.031), ('boolean', 0.031), ('uences', 0.03), ('realized', 0.03), ('function', 0.029), ('reliability', 0.029), ('getting', 0.029), ('presence', 0.029), ('benjamin', 0.028), ('applies', 0.028), ('simplifying', 0.028), ('wider', 0.028), ('supremum', 0.028), ('rate', 0.027), ('kevin', 0.027), ('merely', 0.027), ('distinctive', 0.027), ('feedback', 0.027), ('bound', 0.026), ('tune', 0.026), ('incurred', 0.026), ('dimension', 0.026), ('person', 0.025), ('unclear', 0.025), ('tted', 0.025), ('outcomes', 0.024), ('proves', 0.024), ('exists', 0.024), ('alternatives', 0.024), ('comparative', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 285 nips-2012-Query Complexity of Derivative-Free Optimization

Author: Ben Recht, Kevin G. Jamieson, Robert Nowak

Abstract: This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same. 1

2 0.14680368 160 nips-2012-Imitation Learning by Coaching

Author: He He, Jason Eisner, Hal Daume

Abstract: Imitation Learning has been shown to be successful in solving many challenging real-world problems. Some recent approaches give strong performance guarantees by training the policy iteratively. However, it is important to note that these guarantees depend on how well the policy we found can imitate the oracle on the training data. When there is a substantial difference between the oracle’s ability and the learner’s policy space, we may fail to find a policy that has low error on the training set. In such cases, we propose to use a coach that demonstrates easy-to-learn actions for the learner and gradually approaches the oracle. By a reduction of learning by demonstration to online learning, we prove that coaching can yield a lower regret bound than using the oracle. We apply our algorithm to cost-sensitive dynamic feature selection, a hard decision problem that considers a user-specified accuracy-cost trade-off. Experimental results on UCI datasets show that our method outperforms state-of-the-art imitation learning methods in dynamic feature selection and two static feature selection methods. 1

3 0.1125403 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

Author: Andre Wibisono, Martin J. Wainwright, Michael I. Jordan, John C. Duchi

Abstract: We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods, where d is the problem dimension. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors. 1

4 0.10955525 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

Author: Emile Richard, Stephane Gaiffas, Nicolas Vayatis

Abstract: In the paper, we consider the problem of link prediction in time-evolving graphs. We assume that certain graph features, such as the node degree, follow a vector autoregressive (VAR) model and we propose to use this information to improve the accuracy of prediction. Our strategy involves a joint optimization procedure over the space of adjacency matrices and VAR matrices which takes into account both sparsity and low rank properties of the matrices. Oracle inequalities are derived and illustrate the trade-offs in the choice of smoothing parameters when modeling the joint effect of sparsity and low rank property. The estimate is computed efficiently using proximal methods through a generalized forward-backward algorithm. 1

5 0.083637282 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

Author: Jiarong Jiang, Adam Teichert, Jason Eisner, Hal Daume

Abstract: Users want inference to be both fast and accurate, but quality often comes at the cost of speed. The field has experimented with approximate inference algorithms that make different speed-accuracy tradeoffs (for particular problems and datasets). We aim to explore this space automatically, focusing here on the case of agenda-based syntactic parsing [12]. Unfortunately, off-the-shelf reinforcement learning techniques fail to learn good policies: the state space is simply too large to explore naively. An attempt to counteract this by applying imitation learning algorithms also fails: the “teacher” follows a far better policy than anything in our learner’s policy space, free of the speed-accuracy tradeoff that arises when oracle information is unavailable, and thus largely insensitive to the known reward function. We propose a hybrid reinforcement/apprenticeship learning algorithm that learns to speed up an initial policy, trading off accuracy for speed according to various settings of a speed term in the loss function. 1

6 0.076939426 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

7 0.074838892 324 nips-2012-Stochastic Gradient Descent with Only One Projection

8 0.067598626 36 nips-2012-Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

9 0.062034734 165 nips-2012-Iterative ranking from pair-wise comparisons

10 0.061185591 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

11 0.058522906 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

12 0.058319796 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

13 0.057206981 204 nips-2012-MAP Inference in Chains using Column Generation

14 0.055913553 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

15 0.055824 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

16 0.054128833 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

17 0.053882524 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

18 0.0503335 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

19 0.048710745 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

20 0.047690611 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.129), (1, -0.026), (2, 0.059), (3, -0.026), (4, 0.057), (5, 0.048), (6, -0.005), (7, 0.026), (8, -0.0), (9, 0.007), (10, -0.013), (11, 0.061), (12, -0.078), (13, 0.02), (14, -0.003), (15, 0.062), (16, -0.034), (17, -0.045), (18, 0.044), (19, 0.058), (20, -0.022), (21, -0.018), (22, 0.019), (23, -0.033), (24, -0.023), (25, -0.034), (26, -0.041), (27, 0.069), (28, 0.035), (29, -0.089), (30, 0.087), (31, 0.066), (32, -0.01), (33, -0.071), (34, -0.012), (35, -0.013), (36, -0.006), (37, -0.037), (38, 0.105), (39, -0.026), (40, -0.006), (41, 0.001), (42, -0.035), (43, 0.047), (44, 0.041), (45, -0.043), (46, -0.076), (47, 0.096), (48, 0.096), (49, -0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91436291 285 nips-2012-Query Complexity of Derivative-Free Optimization

Author: Ben Recht, Kevin G. Jamieson, Robert Nowak

Abstract: This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same. 1

2 0.62595546 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

Author: Andre Wibisono, Martin J. Wainwright, Michael I. Jordan, John C. Duchi

Abstract: We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods, where d is the problem dimension. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors. 1

3 0.59724343 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

Author: Xi Chen, Qihang Lin, Javier Pena

Abstract: This paper considers a wide spectrum of regularized stochastic optimization problems where both the loss function and regularizer can be non-smooth. We develop a novel algorithm based on the regularized dual averaging (RDA) method, that can simultaneously achieve the optimal convergence rates for both convex and strongly convex loss. In particular, for strongly convex loss, it achieves the optimal rate of O(1/N + 1/N²) for N iterations, which improves the rate O(log N / N) for previous regularized dual averaging algorithms. In addition, our method constructs the final solution directly from the proximal mapping instead of averaging of all previous iterates. For widely used sparsity-inducing regularizers (e.g., ℓ1-norm), it has the advantage of encouraging sparser solutions. We further develop a multistage extension using the proposed algorithm as a subroutine, which achieves the uniformly-optimal rate O(1/N + exp{−N}) for strongly convex loss. 1

4 0.56220818 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

Author: Alekh Agarwal, Sahand Negahban, Martin J. Wainwright

Abstract: We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse. Previous approaches are able to exploit only one of these two structures, yielding a O(d/T) convergence rate for strongly convex objectives in d dimensions and O(√(s(log d)/T)) convergence rate when the optimum is s-sparse. Our algorithm is based on successively solving a series of ℓ1-regularized optimization problems using Nesterov’s dual averaging algorithm. We establish that the error of our solution after T iterations is at most O(s(log d)/T), with natural extensions to approximate sparsity. Our results apply to locally Lipschitz losses including the logistic, exponential, hinge and least-squares losses. By recourse to statistical minimax results, we show that our convergence rates are optimal up to constants. The effectiveness of our approach is also confirmed in numerical simulations where we compare to several baselines on a least-squares regression problem.

5 0.55738252 160 nips-2012-Imitation Learning by Coaching

Author: He He, Jason Eisner, Hal Daume

Abstract: Imitation Learning has been shown to be successful in solving many challenging real-world problems. Some recent approaches give strong performance guarantees by training the policy iteratively. However, it is important to note that these guarantees depend on how well the policy we found can imitate the oracle on the training data. When there is a substantial difference between the oracle’s ability and the learner’s policy space, we may fail to find a policy that has low error on the training set. In such cases, we propose to use a coach that demonstrates easy-to-learn actions for the learner and gradually approaches the oracle. By a reduction of learning by demonstration to online learning, we prove that coaching can yield a lower regret bound than using the oracle. We apply our algorithm to cost-sensitive dynamic feature selection, a hard decision problem that considers a user-specified accuracy-cost trade-off. Experimental results on UCI datasets show that our method outperforms state-of-the-art imitation learning methods in dynamic feature selection and two static feature selection methods. 1

6 0.55153334 76 nips-2012-Communication-Efficient Algorithms for Statistical Optimization

7 0.5302915 217 nips-2012-Mixability in Statistical Learning

8 0.51862699 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

9 0.50591153 275 nips-2012-Privacy Aware Learning

10 0.49709141 36 nips-2012-Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

11 0.49269316 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

12 0.49065107 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

13 0.47564653 142 nips-2012-Generalization Bounds for Domain Adaptation

14 0.47217557 324 nips-2012-Stochastic Gradient Descent with Only One Projection

15 0.46491396 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

16 0.45623353 30 nips-2012-Accuracy at the Top

17 0.41898185 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

18 0.41672963 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

19 0.40001866 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

20 0.39954871 361 nips-2012-Volume Regularization for Binary Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.038), (34, 0.191), (38, 0.214), (42, 0.054), (54, 0.018), (55, 0.013), (74, 0.056), (76, 0.172), (80, 0.097), (92, 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95340854 322 nips-2012-Spiking and saturating dendrites differentially expand single neuron computation capacity

Author: Romain Cazé, Mark Humphries, Boris S. Gutkin

Abstract: The integration of excitatory inputs in dendrites is non-linear: multiple excitatory inputs can produce a local depolarization departing from the arithmetic sum of each input’s response taken separately. If this depolarization is bigger than the arithmetic sum, the dendrite is spiking; if the depolarization is smaller, the dendrite is saturating. Decomposing a dendritic tree into independent dendritic spiking units greatly extends its computational capacity, as the neuron then maps onto a two layer neural network, enabling it to compute linearly non-separable Boolean functions (lnBFs). How can these lnBFs be implemented by dendritic architectures in practise? And can saturating dendrites equally expand computational capacity? To address these questions we use a binary neuron model and Boolean algebra. First, we confirm that spiking dendrites enable a neuron to compute lnBFs using an architecture based on the disjunctive normal form (DNF). Second, we prove that saturating dendrites as well as spiking dendrites enable a neuron to compute lnBFs using an architecture based on the conjunctive normal form (CNF). Contrary to a DNF-based architecture, in a CNF-based architecture, dendritic unit tunings do not imply the neuron tuning, as has been observed experimentally. Third, we show that one cannot use a DNF-based architecture with saturating dendrites. Consequently, we show that an important family of lnBFs implemented with a CNF-architecture can require an exponential number of saturating dendritic units, whereas the same family implemented with either a DNF-architecture or a CNF-architecture always require a linear number of spiking dendritic units. This minimization could explain why a neuron spends energetic resources to make its dendrites spike. 1

same-paper 2 0.88726807 285 nips-2012-Query Complexity of Derivative-Free Optimization

Author: Ben Recht, Kevin G. Jamieson, Robert Nowak

Abstract: This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same. 1

3 0.83894664 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

Author: Azadeh Khaleghi, Daniil Ryabko

Abstract: The problem of multiple change point estimation is considered for sequences with unknown number of change points. A consistency framework is suggested that is suitable for highly dependent time-series, and an asymptotically consistent algorithm is proposed. In order for the consistency to be established the only assumption required is that the data is generated by stationary ergodic time-series distributions. No modeling, independence or parametric assumptions are made; the data are allowed to be dependent and the dependence can be of arbitrary form. The theoretical results are complemented with experimental evaluations. 1

4 0.83799201 179 nips-2012-Learning Manifolds with K-Means and K-Flats

Author: Guillermo Canas, Tomaso Poggio, Lorenzo Rosasco

Abstract: We study the problem of estimating a manifold from random samples. In particular, we consider piecewise constant and piecewise linear estimators induced by k-means and k-flats, and analyze their performance. We extend previous results for k-means in two separate directions. First, we provide new results for k-means reconstruction on manifolds and, secondly, we prove reconstruction bounds for higher-order approximation (k-flats), for which no known results were previously available. While the results for k-means are novel, some of the technical tools are well-established in the literature. In the case of k-flats, both the results and the mathematical tools are new. 1

5 0.83728188 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

Author: Alekh Agarwal, Sahand Negahban, Martin J. Wainwright

Abstract: We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse. Previous approaches are able to exploit only one of these two structures, yielding a O(d/T) convergence rate for strongly convex objectives in d dimensions and O(√(s(log d)/T)) convergence rate when the optimum is s-sparse. Our algorithm is based on successively solving a series of ℓ1-regularized optimization problems using Nesterov’s dual averaging algorithm. We establish that the error of our solution after T iterations is at most O(s(log d)/T), with natural extensions to approximate sparsity. Our results apply to locally Lipschitz losses including the logistic, exponential, hinge and least-squares losses. By recourse to statistical minimax results, we show that our convergence rates are optimal up to constants. The effectiveness of our approach is also confirmed in numerical simulations where we compare to several baselines on a least-squares regression problem.

6 0.83722794 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

7 0.83695304 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

8 0.83691669 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

9 0.83587223 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications

10 0.83575767 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

11 0.8357141 187 nips-2012-Learning curves for multi-task Gaussian process regression

12 0.83528739 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

13 0.83520234 361 nips-2012-Volume Regularization for Binary Classification

14 0.83494645 16 nips-2012-A Polynomial-time Form of Robust Regression

15 0.83480793 358 nips-2012-Value Pursuit Iteration

16 0.83458823 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

17 0.83418602 86 nips-2012-Convex Multi-view Subspace Learning

18 0.83359039 206 nips-2012-Majorization for CRFs and Latent Likelihoods

19 0.833368 34 nips-2012-Active Learning of Multi-Index Function Models

20 0.83325869 275 nips-2012-Privacy Aware Learning