nips nips2013 nips2013-311 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mehrdad Mahdavi, Tianbao Yang, Rong Jin
Abstract: In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information. We cast the stochastic multiple objective optimization problem into a constrained optimization problem by choosing one function as the objective and try to bound other objectives by appropriate thresholds. We first examine a two stages exploration-exploitation based algorithm which first approximates the stochastic objectives by sampling and then solves a constrained stochastic optimization problem by projected gradient method. This method attains a suboptimal convergence rate even under strong assumption on the objectives. Our second approach is an efficient primal-dual stochastic algorithm. It leverages on the theory of Lagrangian method √ conin strained optimization and attains the optimal convergence rate of O(1/ T ) in high probability for general Lipschitz continuous objectives.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information. [sent-6, score-0.512]
2 We cast the stochastic multiple objective optimization problem into a constrained optimization problem by choosing one function as the objective and try to bound other objectives by appropriate thresholds. [sent-7, score-1.121]
3 We first examine a two stages exploration-exploitation based algorithm which first approximates the stochastic objectives by sampling and then solves a constrained stochastic optimization problem by projected gradient method. [sent-8, score-0.833]
4 This method attains a suboptimal convergence rate even under strong assumption on the objectives. [sent-9, score-0.196]
5 Our second approach is an efficient primal-dual stochastic algorithm. [sent-10, score-0.176]
6 It leverages on the theory of Lagrangian method √ conin strained optimization and attains the optimal convergence rate of O(1/ T ) in high probability for general Lipschitz continuous objectives. [sent-11, score-0.248]
7 Unlike multiple objective optimization where we have access to the complete objective functions, in stochastic multiple objective optimization, only stochastic samples of objective functions are available for optimization. [sent-13, score-1.303]
8 Compared to the standard setup of stochastic optimization, the fundamental challenge of stochastic multiple objective optimization is how to make appropriate tradeoff between different objectives given that we only have access to stochastic oracles for different objectives. [sent-14, score-1.17]
9 In particular, an algorithm for this setting has to ponder conflicting objective functions and accommodate the uncertainty in the objectives. [sent-15, score-0.23]
10 A simple approach toward stochastic multiple objective optimization is to linearly combine multiple objectives with a fixed weight assigned to each objective. [sent-16, score-0.858]
11 It converts stochastic multiple objective optimization into a standard stochastic optimization problem, and is guaranteed to produce Pareto efficient solutions. [sent-17, score-0.767]
12 The main difficulty with this approach is how to decide an appropriate weight for each objective, which is particularly challenging when the complete objective functions are unavailable. [sent-18, score-0.279]
13 In this work, we consider an alternative formulation that casts multiple objective optimization into a constrained optimization problem. [sent-19, score-0.459]
14 More specifically, we choose one of the objectives as the target to be optimized, and use the rest of the objectives as constraints in order to ensure that each of these objectives is below a specified level. [sent-20, score-0.932]
15 Our assumption is that although full objective functions are unknown, their desirable levels can be provied due to the prior knowledge of the domain. [sent-21, score-0.292]
16 Below, we provide a few examples that demonstrate the application of stochastic multiple objective optimization in the form of stochastic constrained optimization. [sent-22, score-0.696]
17 The investor needs to consider conflicting objectives such as rate of return, liquidity and risk in maximizing his wealth [2]. [sent-26, score-0.428]
18 Let hypothesis class be a parametrized convex set W = {w → ⟨w, x⟩ : w ∈ Rd , ∥w∥ ≤ R} and for all (x, y) ∈ Ξ ≡ Rd × {−1, +1} the loss function ℓ : W × Ξ → R+ be a non-negative convex function. [sent-33, score-0.179]
19 In many applications in economics, most notably in welfare and utility theory, and management parameters are known only stochastically and it is unreasonable to assume that the objective functions and the solution domain are deterministically fixed. [sent-36, score-0.41]
20 In this paper, we first examine two methods that try to eliminate the multi-objective aspect or the stochastic nature of stochastic multiple objective optimization and reduce the problem to a standard convex optimization problem. [sent-39, score-0.943]
21 We show that both methods fail to tackle the problem of stochastic multiple objective optimization in general and require strong assumptions on the stochastic objectives, which limits their applications to real world problems. [sent-40, score-0.688]
22 We achieve this by √ an efficient primal-dual stochastic gradient descent method that is able to attain an O(1/ T ) convergence rate for all the objectives under the standard assumption of the Lipschitz continuity of objectives which is known to be optimal (see for instance [3]). [sent-42, score-1.037]
23 We note that there is a flurry of research on heuristics-based methods to address the multi-objective stochastic optimization problem (see e. [sent-43, score-0.273]
24 Finally, we would like to distinguish our work from robust optimization [5] and online learning with long term constraint [13]. [sent-47, score-0.25]
25 Robust optimization was designed to deal with uncertainty within the optimization systems. [sent-48, score-0.194]
26 Although it provides a principled framework for dealing with stochastic constraints, it often ends up with non-convex optimization problems that are not computationally tractable. [sent-49, score-0.273]
27 Instead of requiring the constraints to be satisfied by every solution generated by online learning, it allows the constraints to be satisfied by the entire sequence of solutions. [sent-51, score-0.259]
28 However, unlike stochastic multiple objective optimization, in online learning with long term constraints, constraint functions are fixed and known before the start of online learning. [sent-52, score-0.665]
29 Section 4 presents our efficient primal-dual stochastic optimization algorithm. [sent-57, score-0.273]
30 Statement of the Problem In this work, we generalize online stochastic convex optimization to the case of multiple objectives. [sent-65, score-0.471]
31 In particular, at each iteration, the learner is asked to present a 2 solution wt , which will be evaluated by multiple loss functions ft0 (w), ft1 (w), . [sent-66, score-0.967]
32 A fundamental difference between single- and multi-objective optimization is that for the latter it is not obvious how to evaluate the optimization quality. [sent-70, score-0.194]
33 Since it is impossible to simultaneously minimize multiple loss functions and in order to avoid complications caused by handling more than one objective, we choose one function as the objective and try to bound other objectives by appropriate thresholds. [sent-71, score-0.668]
34 Specifically, the goal of OCO with multiple objectives becomes to minimize ∑T 0 t=1 ft (wt ) and at the same time keep the other objective functions below a given threshold, i. [sent-72, score-0.612]
35 , wT are the solutions generated by the online learner and γi specifies the level of loss that is acceptable to the ith objective function. [sent-77, score-0.302]
36 , full adversarial setup) is challenging for online convex optimization even with two objectives [14], in this work, we consider a simple scenario where all the loss functions fti (w), i ∈ [m] are i. [sent-80, score-0.901]
37 We also note that our goal is NOT to find a Pareto efficient solution (a solution is Pareto efficient if it is not dominated by any solution in the decision space). [sent-83, score-0.237]
38 Instead, we aim to find a solution that (i) optimizes one selected objective, and (ii) satisfies all the other objectives with respect to the specified thresholds. [sent-84, score-0.364]
39 , m the expected loss function of sampled function fti (w). [sent-88, score-0.354]
40 In stochastic multiple objective optimization, we assume that we do not have direct access to the expected loss functions and the only information available to the solver is through a stochastic oracle that returns a stochastic realization of the expected loss function at each call. [sent-89, score-0.953]
41 Our goal is to efficiently compute a solution wT after T trials that (i) obeys all the constraints, i. [sent-96, score-0.164]
42 ¯ ¯ f i (wT ) ≤ γi , i ∈ [m] and (ii) minimizes the objective f 0 with respect to the optimal solution w∗ , ¯ ¯0 (wT ) − f 0 (w∗ ). [sent-98, score-0.276]
43 objective function, and to ft (w) and f Before discussing the algorithms, we first mention a few assumptions made in our analysis. [sent-102, score-0.2]
44 We also make the standard assumption that all the loss functions, including both the objective function and constraint functions, are Lipschitz continuous, i. [sent-104, score-0.325]
45 , |fti (w) − fti (w′ )| ≤ L∥w − w′ ∥ for any w, w′ ∈ B. [sent-106, score-0.275]
46 3 Problem Reduction and its Limitations Here we examine two algorithms to cope with the complexity of stochastic optimization with multiple objectives and discuss some negative results which motivate the primal-dual algorithm presented in Section 4. [sent-107, score-0.639]
47 The first method transforms a stochastic multi-objective problem into a stochastic single-objective optimization problem and then solves the latter problem by any stochastic programming approach. [sent-108, score-0.649]
48 Alternatively, one can eliminate the randomness of the problem by estimating the stochastic objectives and transform the problem into a deterministic multi-objective problem. [sent-109, score-0.535]
49 Although this naive idea considerably facilitates the computational challenge of the problem, unfortunately, it is difficult to decide the weight for each objective, such that the specified levels for different objectives are obeyed. [sent-113, score-0.355]
50 Beyond the hardness of optimally determining the weight of individual functions, it is also unclear how to bound the sub-optimality of final solution for individual objective functions. [sent-114, score-0.314]
51 Our naive approach is to replace ¯ the expected constraint function f i (w) with its empirical estimation based on sampled objective functions. [sent-118, score-0.289]
52 This approach circumvents the problem of stochastically optimizing multiple objective 3 into the original online convex optimization with complex projections, and therefore can be solved by projected gradient descent. [sent-119, score-0.531]
53 More specifically, at trial t, given the current solution wt and received loss functions fti (w), i = 0, 1, . [sent-120, score-1.21]
54 One problem with the above approach is that although it is feasible to satisfy all the constraints based on the true expected constraint functions, there is no guarantee that the approximate domain Wt is not empty. [sent-124, score-0.199]
55 One way to address this issue is to estimate the expected constraint functions by burning the first bT trials, where b ∈ (0, 1) is a constant that needs to be adjusted to obtain the optimal performance, and keep the estimated constraint functions unchanged afterwards. [sent-125, score-0.359]
56 To ensure the correctness of the above approach, we need to establish some kind of uniform (strong) convergence assumption to make sure that the solutions obtained by projection onto the estimated domain W ′ will be close to the true domain W with high probability. [sent-133, score-0.335]
57 Using the estimated ˆ domain W ′ , for trial t ∈ [bT + 1, T ], we update the solution by wt+1 = ΠW ′ (wt − η ft0 (wt )). [sent-142, score-0.183]
58 Since the first bT trials are used for estimating the constraint functions, only the last (1−b)T trials are used for searching for the optimal solution. [sent-144, score-0.211]
59 The total amount of violation of individual constraint functions for the last (1 − b)T trials, ∑T ¯ given by t=bT +1 f i (wt ), is O((1 − b)b−q T 1−q ), where each of the (1 − b)T trials receives a −q violation of O([bT ] ). [sent-145, score-0.412]
60 Using the same trick as in [13], to obtain −q 1−q a + √solution with zero violation of constraints, we will have a regret bound O((1 − b)b T −1/2 −q (1 − b)T ), which yields a convergence rate of O(T + T ) which could be worse than the optimal rate O(T −1/2 ) when q < 1/2. [sent-147, score-0.319]
61 This is in contrast to the typical assumption of online learning where only the solution is memorized. [sent-149, score-0.209]
62 We finally remark on the uniform convergence assumption, which holds when the constraint functions are linear [25], but unfortunately does not hold for general convex Lipschitz functions. [sent-151, score-0.305]
63 In particular, one can simply show examples where there is no uniform convergence for stochastic convex Lipchitz functions in infinite dimensional spaces [21]. [sent-152, score-0.397]
64 Without uniform convergence assumption, the approximate domain W ′ may depart from the true W significantly at some unknown point, which makes the above approach to fail for general convex objectives. [sent-153, score-0.211]
65 We note that our result is closely related to the recent studies of learning from the viewpoint of optimization [23], which state that solutions found by stochastic gradient descent can be statistically consistent even when uniform convergence theorem does not hold. [sent-155, score-0.464]
66 , T do 4: Submit the solution wt 5: Receive loss functions fti , i = 0, 1, . [sent-159, score-1.183]
67 , m 6: Compute the gradients fti (wt ), i = 0, 1, . [sent-162, score-0.303]
68 , m 7: Update the solution w and λ by ]) ( [ m ∑ wt+1 = ΠB (wt − η w Lt (wt , λt )) = ΠB wt − η ft0 (wt ) + λi fti (wt ) , t i=1 ( λi = Π[0,λi ] λi + η t t+1 0 8: end for ∑T ˆ 9: Return wT = t=1 wt /T 4 λi ( [ ]) ) Lt (wt , λt ) = Π[0,λi ] λi + η fti (wt ) − γi . [sent-165, score-2.077]
69 We show that with a high probability, the solution found by the √ proposed algorithm will exactly satisfy the expected constraints and achieves a regret bound of O( T ). [sent-167, score-0.233]
70 The main idea of the proposed algorithm is to design an appro¯ ¯ priate objective that combines the loss function f 0 (w) with f i (w), i ∈ [m]. [sent-168, score-0.199]
71 As mentioned before, owing to the presence of conflicting goals and the randomness nature of the objective functions, we resort to seek for a solution that satisfies all the objectives instead of an optimal one. [sent-169, score-0.624]
72 To this end, we define the following objective function m ∑ ¯0 (w) + ¯ ¯ L(w, λ) = f λi (f i (w) − γi ). [sent-170, score-0.162]
73 i=1 Note that the objective function consists of both the primal variable w ∈ W and dual variable λ = (λ1 , . [sent-171, score-0.276]
74 By exploring convex-concave optimization theory [16], we will show √ that with a high probability, the solution of regret O( T ) exactly obeyes the constraints. [sent-176, score-0.22]
75 Since at each iteration, we only observed a randomly sampled loss functions fti (w), i = 0, 1, . [sent-180, score-0.401]
76 , m, the objective function given by m ∑ Lt (w, λ) = ft0 (w) + λi (fti (w) − γi ) i=1 ¯ provides an unbiased estimate of L(w, λ). [sent-183, score-0.162]
77 Given the approximate objective Lt (w, λ), the proposed algorithm tries to minimize the objective Lt (w, λ) with respect to the primal variable w and maximize the objective with respect to the dual variable λ. [sent-184, score-0.6]
78 , λm )⊤ as the optimal primal and dual solutions to the above ∗ ∗ convex-concave optimization problem, respectively, i. [sent-192, score-0.281]
79 We later show that this assumption holds under a mild condition on the objective functions. [sent-196, score-0.224]
80 Under the preceding assumption, in the following theorem, we show that under appropriate conditions, the average solution wT generated by of Algorithm 1 attains a convergence rate of √ O(1/ T ) for both the regret and the violation of the constraints. [sent-201, score-0.351]
81 The parameter θ ∈ R+ is a quantity that may be set to obtain sharper upper bound on the violation of constraints and may be chosen arbitrarily. [sent-207, score-0.201]
82 In particular, a larger value for θ imposes larger penalty on the violation of the constraints and results in a smaller violation for the objectives. [sent-208, score-0.28]
83 ¯ f T T δ To bound the violation of constraints we set w = w∗ , λi = λi , i ∈ [m], and λj = λj , j ̸= i in (6). [sent-227, score-0.201]
84 w∈B λ≥0 1≤i≤m (7) Evidently w∗ is the optimal primal solution to (7). [sent-239, score-0.163]
85 Let λa be the optimal dual solution to the problem in (7). [sent-240, score-0.179]
86 We have the following proposition that links λi , i ∈ [m], the optimal dual solution to (2), ∗ with λa , the optimal dual solution to (7). [sent-241, score-0.383]
87 Let∑ be the optimal dual solution to (7) and λi , i ∈ [m] be the optimal solution to λa ∗ m (2). [sent-243, score-0.293]
88 The purpose of introducing this assumption is to ensure that the optimal dual variable is well bounded from the above. [sent-250, score-0.183]
89 ∗ τ Finally, we note that by having λ∗ bounded, Assumption 2 is guaranteed by setting G2 = ( )2 ∑m ∑m ¯ max(L2 1 + i=1 λi , max i=1 (f i (w) − γi )2 ) which follows from Lipschitz continuity of 0 w∈B the objective functions. [sent-257, score-0.232]
90 5 Conclusions and Open Questions In this paper we have addressed the problem of stochastic convex optimization with multiple objectives underlying many applications in machine learning. [sent-259, score-0.688]
91 We first examined a simple problem reduction technique that eliminates the stochastic aspect of constraint functions by approximating them using the sampled functions from each iteration. [sent-260, score-0.424]
92 Then, we presented a novel efficient primal-dual algorithm which attains √ the optimal convergence rate O(1/ T ) for all the objectives relying only on the Lipschitz continuity of the objective functions. [sent-264, score-0.648]
93 In particular, it would be interesting to see whether or not making stronger assumptions on the analytical properties of objective functions such as smoothness or strong convexity may yield improved convergence rate. [sent-266, score-0.299]
94 Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization. [sent-293, score-0.247]
95 Non-asymptotic analysis of stochastic approximation algorithms for machine learning. [sent-298, score-0.176]
96 Stochastic approach versus multia n objective approach for obtaining efficient solutions in stochastic multiobjective programming problems. [sent-322, score-0.484]
97 Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. [sent-331, score-0.255]
98 Trading regret for efficiency: online convex optimization with long term constraints. [sent-353, score-0.28]
99 Making gradient descent optimal for strongly convex stochastic optimization. [sent-386, score-0.332]
100 Convergence analysis of sample average approximation methods for a class of stochastic mathematical programs with equality constraints. [sent-427, score-0.176]
wordName wordTfidf (topN-words)
[('wt', 0.724), ('objectives', 0.285), ('fti', 0.275), ('lt', 0.216), ('stochastic', 0.176), ('objective', 0.162), ('violation', 0.112), ('bt', 0.106), ('optimization', 0.097), ('solution', 0.079), ('convex', 0.071), ('icting', 0.068), ('pareto', 0.068), ('functions', 0.068), ('online', 0.068), ('multiobjective', 0.067), ('dual', 0.065), ('operational', 0.065), ('lipschitz', 0.064), ('constraint', 0.064), ('assumption', 0.062), ('multiple', 0.059), ('domain', 0.058), ('constraints', 0.056), ('trials', 0.056), ('portfolio', 0.053), ('investor', 0.051), ('convergence', 0.051), ('continuity', 0.05), ('primal', 0.049), ('jin', 0.047), ('mahdavi', 0.047), ('ln', 0.045), ('scalarization', 0.045), ('regret', 0.044), ('attains', 0.043), ('randomness', 0.042), ('ft', 0.038), ('inequality', 0.038), ('loss', 0.037), ('solutions', 0.035), ('optimal', 0.035), ('bound', 0.033), ('eliminate', 0.032), ('uniform', 0.031), ('wealth', 0.03), ('decide', 0.029), ('obeys', 0.029), ('gradient', 0.029), ('gradients', 0.028), ('trial', 0.027), ('aspect', 0.027), ('michigan', 0.026), ('constrained', 0.026), ('proposition', 0.025), ('utilizes', 0.025), ('rd', 0.025), ('min', 0.024), ('replaced', 0.024), ('programming', 0.024), ('european', 0.024), ('try', 0.024), ('theorem', 0.024), ('stochastically', 0.023), ('rate', 0.022), ('projected', 0.022), ('shamir', 0.022), ('examine', 0.022), ('ensure', 0.021), ('attain', 0.021), ('sampled', 0.021), ('naive', 0.021), ('goals', 0.021), ('expected', 0.021), ('robust', 0.021), ('descent', 0.021), ('weight', 0.02), ('optimally', 0.02), ('access', 0.02), ('remark', 0.02), ('subject', 0.02), ('max', 0.02), ('risk', 0.02), ('liquidity', 0.02), ('lipchitz', 0.02), ('oz', 0.02), ('multia', 0.02), ('welfare', 0.02), ('burning', 0.02), ('oco', 0.02), ('volkovs', 0.02), ('rongjin', 0.02), ('risky', 0.02), ('tyang', 0.02), ('return', 0.02), ('updating', 0.019), ('estimated', 0.019), ('setup', 0.019), ('strong', 0.018), ('casts', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
Author: Mehrdad Mahdavi, Tianbao Yang, Rong Jin
Abstract: In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information. We cast the stochastic multiple objective optimization problem into a constrained optimization problem by choosing one function as the objective and try to bound other objectives by appropriate thresholds. We first examine a two stages exploration-exploitation based algorithm which first approximates the stochastic objectives by sampling and then solves a constrained stochastic optimization problem by projected gradient method. This method attains a suboptimal convergence rate even under strong assumption on the objectives. Our second approach is an efficient primal-dual stochastic algorithm. It leverages on the theory of Lagrangian method √ conin strained optimization and attains the optimal convergence rate of O(1/ T ) in high probability for general Lipschitz continuous objectives.
2 0.57537115 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients
Author: Lijun Zhang, Mehrdad Mahdavi, Rong Jin
Abstract: For smooth and strongly convex optimizations, the optimal iteration complexity of √ the gradient-based algorithm is O( κ log 1/ǫ), where κ is the condition number. In the case that the optimization problem is ill-conditioned, we need to evaluate a large number of full gradients, which could be computationally expensive. In this paper, we propose to remove the dependence on the condition number by allowing the algorithm to access stochastic gradients of the objective function. To this end, we present a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize two kinds of gradients. A distinctive step in EMGD is the mixed gradient descent, where we use a combination of the full and stochastic gradients to update the intermediate solution. Theoretical analysis shows that EMGD is able to find an ǫ-optimal solution by computing O(log 1/ǫ) full gradients and O(κ2 log 1/ǫ) stochastic gradients. 1
3 0.32249624 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
Author: Abhradeep Guha Thakurta, Adam Smith
Abstract: We give differentially private algorithms for a large class of online learning algorithms, in both the full information and bandit settings. Our algorithms aim to minimize a convex loss function which is a sum of smaller convex loss terms, one for each data point. To design our algorithms, we modify the popular mirror descent approach, or rather a variant called follow the approximate leader. The technique leads to the first nonprivate algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work (due to Dwork, Naor, Pitassi and Rothblum (2010) and Jain, Kothari and Thakurta (2012)). In many cases, our algorithms (in both settings) match the dependence on the input length, T , of the optimal nonprivate regret bounds up to logarithmic factors in T . Our algorithms require logarithmic space and update time. 1
4 0.28819865 193 nips-2013-Mixed Optimization for Smooth Functions
Author: Mehrdad Mahdavi, Lijun Zhang, Rong Jin
Abstract: It is well known that the optimal convergence rate for stochastic optimization of √ smooth functions is O(1/ T ), which is same as stochastic optimization of Lipschitz continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of O(1/T 2 ). In this work, we consider a new setup for optimizing smooth functions, termed as Mixed Optimization, which allows to access both a stochastic oracle and a full gradient oracle. Our goal is to significantly improve the convergence rate of stochastic optimization of smooth functions by having an additional small number of accesses to the full gradient oracle. We show that, with an O(ln T ) calls to the full gradient oracle and an O(T ) calls to the stochastic oracle, the proposed mixed optimization algorithm is able to achieve an optimization error of O(1/T ). 1
5 0.2077072 89 nips-2013-Dimension-Free Exponentiated Gradient
Author: Francesco Orabona
Abstract: I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces. My analysis shows that the algorithm is implicitly able to estimate the L2 norm of the unknown competitor, U , achieving √ a regret bound of the order of O(U log(U T + 1)) T ), instead of the standard √ O((U 2 + 1) T ), achievable without knowing U . For this analysis, I introduce novel tools for algorithms with time-varying regularizers, through the use of local smoothness. Through a lower bound, I also show that the algorithm is optimal up to log(U T ) term for linear and Lipschitz losses. 1
6 0.17329165 26 nips-2013-Adaptive Market Making via Online Learning
7 0.15763012 233 nips-2013-Online Robust PCA via Stochastic Optimization
8 0.1555744 230 nips-2013-Online Learning with Costly Features and Labels
9 0.15415654 20 nips-2013-Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
10 0.15344207 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
11 0.10047433 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
12 0.098217651 325 nips-2013-The Pareto Regret Frontier
13 0.096153095 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
14 0.090936579 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
15 0.0780508 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
16 0.0769272 188 nips-2013-Memory Limited, Streaming PCA
17 0.075358428 318 nips-2013-Structured Learning via Logistic Regression
18 0.074926063 7 nips-2013-A Gang of Bandits
19 0.07384146 232 nips-2013-Online PCA for Contaminated Data
20 0.073410951 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
topicId topicWeight
[(0, 0.2), (1, -0.058), (2, 0.241), (3, -0.168), (4, 0.02), (5, 0.003), (6, -0.236), (7, 0.209), (8, 0.306), (9, 0.152), (10, -0.048), (11, 0.161), (12, 0.097), (13, -0.206), (14, -0.03), (15, 0.16), (16, -0.097), (17, 0.066), (18, 0.013), (19, -0.012), (20, 0.02), (21, 0.01), (22, -0.066), (23, -0.039), (24, 0.082), (25, 0.1), (26, -0.092), (27, 0.003), (28, -0.107), (29, -0.02), (30, 0.095), (31, -0.048), (32, 0.025), (33, 0.058), (34, -0.01), (35, 0.012), (36, 0.072), (37, -0.02), (38, -0.069), (39, -0.003), (40, -0.002), (41, 0.111), (42, -0.025), (43, -0.081), (44, 0.027), (45, -0.022), (46, 0.018), (47, -0.02), (48, -0.003), (49, 0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.97118908 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
Author: Mehrdad Mahdavi, Tianbao Yang, Rong Jin
Abstract: In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information. We cast the stochastic multiple objective optimization problem into a constrained optimization problem by choosing one function as the objective and try to bound other objectives by appropriate thresholds. We first examine a two stages exploration-exploitation based algorithm which first approximates the stochastic objectives by sampling and then solves a constrained stochastic optimization problem by projected gradient method. This method attains a suboptimal convergence rate even under strong assumption on the objectives. Our second approach is an efficient primal-dual stochastic algorithm. It leverages on the theory of Lagrangian method √ conin strained optimization and attains the optimal convergence rate of O(1/ T ) in high probability for general Lipschitz continuous objectives.
2 0.95142186 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients
Author: Lijun Zhang, Mehrdad Mahdavi, Rong Jin
Abstract: For smooth and strongly convex optimizations, the optimal iteration complexity of √ the gradient-based algorithm is O( κ log 1/ǫ), where κ is the condition number. In the case that the optimization problem is ill-conditioned, we need to evaluate a large number of full gradients, which could be computationally expensive. In this paper, we propose to remove the dependence on the condition number by allowing the algorithm to access stochastic gradients of the objective function. To this end, we present a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize two kinds of gradients. A distinctive step in EMGD is the mixed gradient descent, where we use a combination of the full and stochastic gradients to update the intermediate solution. Theoretical analysis shows that EMGD is able to find an ǫ-optimal solution by computing O(log 1/ǫ) full gradients and O(κ2 log 1/ǫ) stochastic gradients. 1
3 0.84952992 193 nips-2013-Mixed Optimization for Smooth Functions
Author: Mehrdad Mahdavi, Lijun Zhang, Rong Jin
Abstract: It is well known that the optimal convergence rate for stochastic optimization of √ smooth functions is O(1/ T ), which is same as stochastic optimization of Lipschitz continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of O(1/T 2 ). In this work, we consider a new setup for optimizing smooth functions, termed as Mixed Optimization, which allows to access both a stochastic oracle and a full gradient oracle. Our goal is to significantly improve the convergence rate of stochastic optimization of smooth functions by having an additional small number of accesses to the full gradient oracle. We show that, with an O(ln T ) calls to the full gradient oracle and an O(T ) calls to the stochastic oracle, the proposed mixed optimization algorithm is able to achieve an optimization error of O(1/T ). 1
4 0.75460839 20 nips-2013-Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
Author: Rie Johnson, Tong Zhang
Abstract: Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG). However, our analysis is significantly simpler and more intuitive. Moreover, unlike SDCA or SAG, our method does not require the storage of gradients, and thus is more easily applicable to complex problems such as some structured prediction problems and neural network learning. 1
5 0.66945851 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
Author: Abhradeep Guha Thakurta, Adam Smith
Abstract: We give differentially private algorithms for a large class of online learning algorithms, in both the full information and bandit settings. Our algorithms aim to minimize a convex loss function which is a sum of smaller convex loss terms, one for each data point. To design our algorithms, we modify the popular mirror descent approach, or rather a variant called follow the approximate leader. The technique leads to the first nonprivate algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work (due to Dwork, Naor, Pitassi and Rothblum (2010) and Jain, Kothari and Thakurta (2012)). In many cases, our algorithms (in both settings) match the dependence on the input length, T , of the optimal nonprivate regret bounds up to logarithmic factors in T . Our algorithms require logarithmic space and update time. 1
6 0.6180498 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
7 0.59188426 26 nips-2013-Adaptive Market Making via Online Learning
8 0.50804394 89 nips-2013-Dimension-Free Exponentiated Gradient
9 0.44879887 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
10 0.43067634 19 nips-2013-Accelerated Mini-Batch Stochastic Dual Coordinate Ascent
11 0.39699662 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization
12 0.37206757 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
13 0.37098199 230 nips-2013-Online Learning with Costly Features and Labels
14 0.34755781 314 nips-2013-Stochastic Optimization of PCA with Capped MSG
15 0.33060792 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
16 0.32966179 233 nips-2013-Online Robust PCA via Stochastic Optimization
17 0.32494166 111 nips-2013-Estimation, Optimization, and Parallelism when Data is Sparse
18 0.31653324 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
19 0.29205245 325 nips-2013-The Pareto Regret Frontier
20 0.28959945 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
topicId topicWeight
[(2, 0.017), (16, 0.047), (33, 0.193), (34, 0.07), (41, 0.103), (49, 0.027), (56, 0.121), (70, 0.031), (85, 0.067), (89, 0.034), (93, 0.037), (95, 0.025), (96, 0.136)]
simIndex simValue paperId paperTitle
same-paper 1 0.89755327 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
Author: Mehrdad Mahdavi, Tianbao Yang, Rong Jin
Abstract: In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information. We cast the stochastic multiple objective optimization problem into a constrained optimization problem by choosing one function as the objective and try to bound other objectives by appropriate thresholds. We first examine a two stages exploration-exploitation based algorithm which first approximates the stochastic objectives by sampling and then solves a constrained stochastic optimization problem by projected gradient method. This method attains a suboptimal convergence rate even under strong assumption on the objectives. Our second approach is an efficient primal-dual stochastic algorithm. It leverages on the theory of Lagrangian method √ conin strained optimization and attains the optimal convergence rate of O(1/ T ) in high probability for general Lipschitz continuous objectives.
2 0.88722217 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
Author: Ulrike Von Luxburg, Morteza Alamgir
Abstract: Consider an unweighted k-nearest neighbor graph on n points that have been sampled i.i.d. from some unknown density p on Rd . We prove how one can estimate the density p just from the unweighted adjacency matrix of the graph, without knowing the points themselves or any distance or similarity scores. The key insights are that local differences in link numbers can be used to estimate a local function of the gradient of p, and that integrating this function along shortest paths leads to an estimate of the underlying density. 1
3 0.88270473 208 nips-2013-Neural representation of action sequences: how far can a simple snippet-matching model take us?
Author: Cheston Tan, Jedediah M. Singer, Thomas Serre, David Sheinberg, Tomaso Poggio
Abstract: The macaque Superior Temporal Sulcus (STS) is a brain area that receives and integrates inputs from both the ventral and dorsal visual processing streams (thought to specialize in form and motion processing respectively). For the processing of articulated actions, prior work has shown that even a small population of STS neurons contains sufficient information for the decoding of actor invariant to action, action invariant to actor, as well as the specific conjunction of actor and action. This paper addresses two questions. First, what are the invariance properties of individual neural representations (rather than the population representation) in STS? Second, what are the neural encoding mechanisms that can produce such individual neural representations from streams of pixel images? We find that a simple model, one that simply computes a linear weighted sum of ventral and dorsal responses to short action “snippets”, produces surprisingly good fits to the neural data. Interestingly, even using inputs from a single stream, both actor-invariance and action-invariance can be accounted for, by having different linear weights. 1
Author: Arash Amini, Xuanlong Nguyen
Abstract: We propose a general formalism of iterated random functions with semigroup property, under which exact and approximate Bayesian posterior updates can be viewed as specific instances. A convergence theory for iterated random functions is presented. As an application of the general theory we analyze convergence behaviors of exact and approximate message-passing algorithms that arise in a sequential change point detection problem formulated via a latent variable directed graphical model. The sequential inference algorithm and its supporting theory are illustrated by simulated examples.
5 0.8720907 11 nips-2013-A New Convex Relaxation for Tensor Completion
Author: Bernardino Romera-Paredes, Massimiliano Pontil
Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1
6 0.86492205 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients
7 0.86417079 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
8 0.8580783 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
9 0.85490113 193 nips-2013-Mixed Optimization for Smooth Functions
10 0.85422736 20 nips-2013-Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
11 0.84955019 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
12 0.84441394 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
13 0.84326524 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
14 0.83935815 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
15 0.83920795 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
16 0.8377139 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
17 0.83687216 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors
18 0.83579952 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
19 0.83514172 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
20 0.8340351 233 nips-2013-Online Robust PCA via Stochastic Optimization