nips nips2013 nips2013-80 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Martin Mevissen, Emanuele Ragnoli, Jia Yuan Yu
Abstract: We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions. This set is a ball around a density function estimated from data samples, i.e., it is data-driven and random. Polynomial optimization problems are inherently hard due to nonconvex objectives and constraints. However, we show that by employing polynomial and histogram density estimates, we can introduce robustness with respect to distributional uncertainty sets without making the problem harder. We show that the optimum to the distributionally robust problem is the limit of a sequence of tractable semidefinite programming relaxations. We also give finite-sample consistency guarantees for the data-driven uncertainty sets. Finally, we apply our model and solution method in a water network optimization problem. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions. [sent-6, score-1.213]
2 However, we show that by employing polynomial and histogram density estimates, we can introduce robustness with respect to distributional uncertainty sets without making the problem harder. [sent-11, score-1.149]
3 We show that the optimum to the distributionally robust problem is the limit of a sequence of tractable semidefinite programming relaxations. [sent-12, score-0.763]
4 We also give finite-sample consistency guarantees for the data-driven uncertainty sets. [sent-13, score-0.375]
5 Non-convex polynomial functions are needed to describe the model accurately. [sent-19, score-0.265]
6 The resulting polynomial optimization problems are hard in general. [sent-20, score-0.439]
7 Another salient feature of real-world problems is uncertainty in the parameters of the problem (e.g., [sent-21, score-0.346]
8 due to measurement errors, fundamental principles, or incomplete information), and the need for optimal solutions to be robust against worst-case realizations of the uncertainty. [sent-23, score-0.271]
9 Robust optimization and polynomial optimization are already important topics in machine learning and operations research. [sent-24, score-0.461]
10 In this paper, we combine these polynomial and uncertainty features and consider robust polynomial optimization. [sent-25, score-0.919]
11 We introduce a new notion of data-driven distributional robustness: the uncertain problem parameter is a probability distribution from which samples can be observed. [sent-26, score-0.435]
12 Consequently, it is natural to take as the uncertainty set a set of functions, such as a norm ball around an estimated probability distribution. [sent-27, score-0.301]
13 This approach gives solutions that are less conservative than classical robust optimization with a set of values for the uncertain parameters. [sent-28, score-0.599]
14 It is easy to see that the classical set-uncertainty setting is an extreme case of a distributional uncertainty set composed of Dirac densities. [sent-29, score-0.847]
15 First, we take care to estimate the distribution of the uncertain parameter using polynomial basis functions. [sent-34, score-0.419]
16 This ensures that the resulting robust optimization problem can be reduced to a polynomial optimization problem. [sent-35, score-0.696]
17 Using tools from machine learning, we give a finite-sample consistency guarantee on the estimated uncertainty set. [sent-37, score-0.401]
18 Section 2 presents the model of data-driven distributionally robust polynomial optimization (DRO for short). [sent-39, score-0.94]
19 In Section 4, we consider the general case of uncertain multivariate distribution, which yields a generalized problem of moments for the distributionally robust counterpart. [sent-42, score-0.934]
20 In Section 5, we introduce an efficient histogram approximation for the case of uncertain univariate distributions, which yields instead a polynomial optimization problem for the distributionally robust counterpart. [sent-43, score-1.363]
21 2 Problem statement Consider the following polynomial optimization problem min_{x∈X} h(x, ξ), (1) where ξ ∈ Rn is an uncertain parameter of the problem. [sent-45, score-0.581]
22 We allow h to be a polynomial in x ∈ Rm and X to be a basic closed semialgebraic set. [sent-46, score-0.512]
23 The expectation Ef is with respect to a density function f , which belongs to an uncertainty set Dε,N . [sent-49, score-0.472]
24 This uncertainty set itself is a set of possible probability density functions constructed from a given sequence of samples ξ1, . . . , ξN drawn [sent-50, score-0.57]
25 according to the unknown density function f ∗ of the uncertain parameter ξ. [sent-56, score-0.351]
26 We call Dε,N a distributional uncertainty set; it is a random set constructed as follows: Dε,N = {f : f is a probability density function with ‖f − fN‖ ≤ ε}, where fN is a density estimate built from the samples. [sent-57, score-0.583]
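To make the min-max structure concrete: for a fixed decision x, the inner problem is a worst-case expectation over the densities in Dε,N. The sketch below is only an illustration under strong simplifying assumptions (ξ discretized on a grid, the ball Dε,N replaced by a finite family of candidate probability vectors, and a generic local solver for the outer minimization instead of the SDP relaxations developed later); the names h, xi_grid, weights, and x0 are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def dro_objective(x, h, xi_grid, weights):
    """sup_{f in D_eps_N} E_f[h(x, xi)], with the ball D_eps_N replaced by a
    finite family of candidate densities; weights[m, k] is the probability mass
    that the m-th candidate places on the grid point xi_grid[k]."""
    h_vals = np.array([h(x, xi) for xi in xi_grid])
    return np.max(weights @ h_vals)  # worst case over the finite candidate family

def solve_dro(h, xi_grid, weights, x0):
    # Outer minimization over x with a generic local solver; the objective is a
    # pointwise maximum and hence nonsmooth, which is one reason the paper
    # works with SDP relaxations instead.
    return minimize(lambda x: dro_objective(x, h, xi_grid, weights), x0)
```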
27 We describe the construction of the distributional uncertainty set in the cases of multivariate and univariate samples in Sections 4 and 5. [sent-64, score-0.703]
28 We say that a robust optimization problem is data-driven when the uncertainty set is an element of a sequence of uncertainty sets Dε,1 ⊇ Dε,2 ⊇ · · · [sent-65, score-1.001]
29 This definition allows us to completely separate the problem of robust optimization from that of constructing the appropriate uncertainty set Dε,N . [sent-69, score-0.634]
30 The underlying assumption is that the uncertainty set (due to finite-sample estimation of the parameter ξ) adapts continuously to the data as the sample size N increases. [sent-70, score-0.327]
31 A polynomial optimization problem (POP) is given by min_{x∈K} q(x), where K = {x ∈ Rd | g1(x) ≥ 0, . . . [sent-78, score-0.427]
32 One of our key results arises from the observation that the distributionally robust counterpart of a POP is a POP as well. [sent-85, score-0.543]
33 A set K defined by a finite number of multivariate polynomial inequality constraints is called a basic closed semialgebraic set. [sent-86, score-0.602]
34 As shown in [1], if the basic closed semialgebraic set K is compact and Archimedean, there is a hierarchy of SDP relaxations whose minima converge to the minimum of (4) for increasing order of the relaxation. [sent-87, score-0.318]
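To make the hierarchy concrete, the following is a minimal sketch of a low-order moment (Lasserre-type) relaxation for a toy univariate POP, assuming cvxpy with an SDP-capable solver such as SCS is installed; the polynomial and constraint are illustrative and not taken from the paper.

```python
import cvxpy as cp

# Toy POP: minimize q(x) = x^4 - 3x^2 + x over K = {x : 1 - x^2 >= 0}.
# Order-2 moment relaxation: y[k] plays the role of the moment E[x^k], k = 0..4.
y = cp.Variable(5)
M = cp.Variable((3, 3), PSD=True)  # moment matrix M_2(y)
L = cp.Variable((2, 2), PSD=True)  # localizing matrix for g(x) = 1 - x^2
constraints = [y[0] == 1]
# Moment matrix entries: M[i, j] = E[x^(i+j)] = y[i + j]
constraints += [M[i, j] == y[i + j] for i in range(3) for j in range(3)]
# Localizing matrix entries: L[i, j] = E[(1 - x^2) x^(i+j)] = y[i + j] - y[i + j + 2]
constraints += [L[i, j] == y[i + j] - y[i + j + 2] for i in range(2) for j in range(2)]
objective = y[4] - 3 * y[2] + y[1]  # q(x) expressed in the moments
prob = cp.Problem(cp.Minimize(objective), constraints)
prob.solve()
print(prob.value)  # a lower bound on min_{x in K} q(x); it tightens as the order grows
```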
35 Our work combines robust optimization with notions from statistical machine learning, such as density estimation and consistency. [sent-89, score-0.504]
36 Our data-driven robust polynomial optimization method applies to a number of machine learning problems. [sent-90, score-0.598]
37 One example arises in Markov decision problems where a high-dimensional value-function is approximated by a low-dimensional polynomial V . [sent-91, score-0.342]
38 A distributionally robust variant of value iteration can be cast as: max_{a∈A} min_{f∈Dε,N} Ef {r(x, a, ξ) + γ Σ_{x′∈X} P(x′ | x, a, ξ)V(x′)}, where ξ is a random parameter with unknown distribution and the uncertainty set Dε,N of possible distributions is constructed by estimation. [sent-92, score-1.141]
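A minimal sketch of one such robust backup follows, assuming ξ is discretized on a finite grid and the ball Dε,N is approximated by a finite family of candidate probability vectors; the array names are hypothetical.

```python
import numpy as np

def robust_bellman_backup(V, r, P, gamma, candidate_densities):
    """One distributionally robust backup at a fixed state x:
    max_a min_{f in D} E_f[ r(x, a, xi) + gamma * sum_x' P(x' | x, a, xi) V(x') ].

    V[s]                      : current value of state s
    r[a, k]                   : reward for action a when xi takes the k-th grid value
    P[a, k, s]                : probability of next state s under action a and the k-th xi value
    candidate_densities[m, k] : mass the m-th candidate density puts on the k-th xi value
    """
    continuation = r + gamma * P @ V                             # shape (num_actions, num_xi)
    worst = (candidate_densities @ continuation.T).min(axis=0)   # worst case per action
    return worst.max()                                           # best action under the worst case
```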
39 For instance, the problem of robust investment in a vector of (random) financial positions ξ ∈ Rn is min_{v∈∆n} sup_{Q∈Q} −EQ [U(v · ξ)], where Q denotes a set of probability distributions, U is a utility function, and v · ξ is an allocation among financial positions. [sent-109, score-0.393]
40 If U is polynomial, then the robust utility functional is a special case of DRO. [sent-110, score-0.263]
41 3 Our contribution in context To situate our work within the literature, it is important to note that we consider distributional uncertainty sets and polynomial constraints and objectives. [sent-111, score-0.891]
42 In this section, we outline related works with different and similar uncertainty sets, constraints and objectives. [sent-112, score-0.34]
43 Robust optimization problems of the form of (2) have been studied in the literature with different uncertainty sets. [sent-113, score-0.297]
44 In several works, the uncertainty sets are defined in terms of moment constraints [3, 4, 5]. [sent-114, score-0.43]
45 Moment-based uncertainty sets are motivated by the fact that probabilistic constraints can be replaced by constraints on the first and second moments in some cases [6]. [sent-115, score-0.474]
46 In contrast, we do not consider moment constraints, but distributional uncertainty sets based on probability density functions with the Lp -norm as the metric. [sent-116, score-0.807]
47 For example, uncertainty sets defined using the Kantorovich distance are considered in [5, Section 4] and [11], while [5, Section 3] and [12] consider distributional uncertainty with both measure bounds (of the form µ1 ≤ µ ≤ µ2) and moment constraints. [sent-122, score-0.937]
48 [13] considers distributional uncertainty sets with a φ-divergence metric. [sent-123, score-0.587]
49 A notion of distributional uncertainty set has also been studied in the setting of Markov decision problems [14]. [sent-124, score-0.623]
50 However, in those works, the uncertainty set is not data-driven. [sent-125, score-0.301]
51 Robust optimization formulations for polynomial optimization problems have been studied in [1, 15] with deterministic uncertainty sets (i.e. [sent-126, score-0.884]
52 A contribution is to show how to transform distributionally robust counterparts of polynomial optimization problems into polynomial optimization problems. [sent-129, score-1.472]
53 Another contribution of this work is to use sampled information to construct distributional uncertainty sets more suitable for problems where more and more data is collected over time. [sent-131, score-0.658]
54 4 Multivariate uncertainty around polynomial density estimate In this section, we construct a data-driven uncertainty set in the L2 space with the norm ‖·‖2. [sent-132, score-1.038]
55 Furthermore, we assume the support of ξ is contained in some basic closed semialgebraic set S := {z ∈ Rn | sj(z) ≥ 0, j = 1, . [sent-133, score-0.311]
56 In order to construct a data-driven distributional uncertainty set, we need to estimate the density f ∗ of the parameter ξ. [sent-137, score-0.717]
57 However, to ensure that the resulting robust optimization problem remains a polynomial optimization problem, we define the empirical density estimate fN as a multivariate polynomial (cf. [sent-142, score-1.183]
58 In turn, we define the following uncertainty set: Dd,ε,N = {f ∈ R[z]d | ∫_S f(z) dz = 1, ‖f − fN‖ ≤ ε}. [sent-153, score-0.301]
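For intuition, a univariate sketch of a polynomial density estimate of degree d is given below, using the orthogonal-series (Legendre) estimator; this particular construction is an assumption on our part, the paper's multivariate construction of fN may differ (e.g., a least-squares fit over a semialgebraic support), and samples are assumed to be rescaled into [−1, 1].

```python
import numpy as np
from numpy.polynomial import legendre

def polynomial_density_estimate(samples, degree):
    """Univariate orthogonal-series density estimate f_N on [-1, 1].

    Writes f_N(z) = sum_k c_k P_k(z) with Legendre polynomials P_k and
    c_k = (2k + 1) / 2 * (1/N) sum_i P_k(xi_i), i.e. the L2 projection of the
    empirical distribution onto polynomials of degree <= d.
    """
    coeffs = np.zeros(degree + 1)
    for k in range(degree + 1):
        e_k = np.zeros(k + 1)
        e_k[k] = 1.0                                   # coefficient vector selecting P_k
        coeffs[k] = (2 * k + 1) / 2.0 * legendre.legval(samples, e_k).mean()
    return lambda z: legendre.legval(z, coeffs)        # may dip below 0 for small N
```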
59 4.1 Solving the DRO Next, we present asymptotic guarantees for solving distributionally robust polynomial optimization through SDP relaxations. [sent-158, score-1.038]
60 There exist u ∈ R[x] and v ∈ R[z] such that u = u0 + Σ_{j=1}^{t} uj kj and v = v0 + Σ_{j=1}^{r} vj sj for some sum-of-squares polynomials {uj}_{j=0}^{t}, {vj}_{j=0}^{r}, and the level sets {x | u(x) ≥ 0} and {z | v(z) ≥ 0} are compact. [sent-169, score-0.271]
61 (ii) If (5) has a unique minimizer x∗, let m_r^V denote the sequence of subvectors of optimal solutions of SDPr associated with the first-order moments of monomials in x only. [sent-191, score-0.268]
62 4.2 Consistency of the uncertainty set In this section, we show that the uncertainty set that we constructed is consistent. [sent-195, score-0.639]
63 In other words, given constants ε and δ, we give the number of samples N needed to ensure that the closest polynomial to the unknown density f ∗ belongs to the uncertainty set Dd,ε,N with probability 1 − δ. [sent-196, score-0.825]
64 Let g∗d denote the polynomial function g∗d(x) = Σ_{α : |α| ≤ d} cα x^α. [sent-208, score-0.355]
65 5 Univariate uncertainty around histogram density estimate In this section, we describe an additional layer of approximation for the univariate uncertainty setting. [sent-215, score-0.944]
66 In contrast to Section 4, by approximating the uncertainty set Dε,N by a set of histogram density functions, we reduce the DRO problem to a polynomial optimization problem whose degree is identical to that of the original problem. [sent-216, score-0.963]
67 In this section, we approximate the uncertainty set Dε,N by a subset of the simplex in RK : Wε,N = {p ∈ ∆K : ‖p − pN,K‖∞ ≤ ε}, where p = (p1 , . [sent-241, score-0.301]
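For a fixed decision x, the inner worst-case expectation over Wε,N is a small linear program over the box-constrained simplex; below is a sketch assuming scipy is available, where h_bins is a hypothetical vector holding (an approximation of) h(x, ·) on the K bin representatives.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_expectation(h_bins, p_hat, eps):
    """sup { sum_k p_k * h_bins[k] : p in simplex, ||p - p_hat||_inf <= eps }."""
    K = len(p_hat)
    bounds = [(max(0.0, p_hat[k] - eps), min(1.0, p_hat[k] + eps)) for k in range(K)]
    # linprog minimizes, so minimize -h.p and flip the sign of the optimum
    res = linprog(c=-np.asarray(h_bins),
                  A_eq=np.ones((1, K)), b_eq=np.array([1.0]),
                  bounds=bounds)
    return -res.fun
```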
68 (ii) If (7) has a unique minimizer x∗, let m_r^W denote the sequence of subvectors of optimal solutions of SDPr associated with the first-order moments of the monomials in x only. [sent-257, score-0.268]
69 For every γ ≤ Kε/(B − A) and density function f ∈ Dγ,N , we have a density vector p ∈ Wε,N such that ∫_{z∈[A,B]} h(x, z) f(z) dz − Σ_{k=0}^{K−1} . [sent-268, score-0.342]
70 Consistency of the uncertainty set Given ε and δ, we consider in this section the number of samples N that we need to ensure that the unknown probability density is in the uncertainty set Dε,N with probability 1 − δ. [sent-270, score-0.835]
71 The consistency guarantee for the univariate histogram uncertainty set follows as a corollary of the following univariate Dvoretzky-Kiefer-Wolfowitz Inequality. [sent-271, score-0.617]
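For a back-of-the-envelope sample size, the classical DKW bound P(sup_t |FN(t) − F(t)| > ε) ≤ 2 exp(−2Nε²) gives N ≥ ln(2/δ)/(2ε²); the small sketch below computes this, with the caveat that the paper's corollary for the histogram vector p may carry different constants.

```python
import math

def dkw_sample_size(eps, delta):
    """Smallest N with 2 * exp(-2 * N * eps**2) <= delta, i.e.
    N >= ln(2 / delta) / (2 * eps**2), so that sup_t |F_N(t) - F(t)| <= eps
    holds with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

# e.g. dkw_sample_size(0.1, 0.05) -> 185
```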
72 Let p∗ denote the histogram density vector of ξ induced by the true density f ∗ . [sent-278, score-0.443]
73 Provided that the density f ∗ is Lipschitz continuous, it follows that the optimal value of (A1) converges to the optimal value without uncertainty as the size ε of the uncertainty set tends to zero and the number of samples N tends to infinity. [sent-281, score-0.773]
74 6 Application to water network optimization In this section, we consider a problem of optimal operation of a water distribution network (WDN). [sent-282, score-0.362]
75 Let wi denote the pressure, ei the elevation, and ξ i the demand at node i ∈ V , qi,j the flow from i to j, and ℓi,j the loss caused by friction in case of flow from i to j for (i, j) ∈ E. [sent-286, score-0.271]
76 h(w, q, ξ) := Σ_{i∈V1} wi + σ Σ_{j∈V} (ξj − Σ_{k≠j} qk,j + Σ_{l≠j} qj,l)², (8) X := {(w, q) ∈ R|N|+2|E| | wmin ≤ wi ≤ wmax , qmin ≤ qi,j ≤ qmax , qi,j (wj + ej − wi − ei + ℓi,j (qi,j )) ≥ 0, wj + ej − wi − ei + ℓi,j (qi,j ) ≥ 0, ∀(i, j)}. [sent-288, score-0.39]
77 Therefore, the distributionally robust counterpart of (8) is of the form of ADRO (6). [sent-305, score-0.738]
78 In our experiment we assume the demands at all nodes, except at node 15, are fixed; for node 15 N = 120 samples of daily demands were collected over four months—the dataset is shown in Figure 1 (b). [sent-310, score-0.308]
79 First, we consider the uncertainty set Wε,N constructed from a histogram estimation with K = 5 bins. [sent-312, score-0.439]
80 We consider (a) the deterministic problem (8) with three values ξmin := mini ξi^15, ξ̄ := (1/N) Σi ξi^15, and ξmax := maxi ξi^15 as the demand at node 15, (b) the distributionally robust counterpart (A1) with ε = 0. [sent-313, score-0.945]
81 2 and σ = 1, and (c) the classical robust formulation of (8) with an uncertainty range [ξmin , ξmax ] without any distributional assumption, i.e. [sent-314, score-0.805]
82 We solve (9) by solving the two polynomial optimization problems. [sent-317, score-0.363]
83 All three cases (a)–(c) are polynomial optimization problems, which we solve by first applying the sparse SDP relaxation of first order [19] with SDPA [20] as the SDP solver, and then applying IPOPT [21] with the SparsePOP solution as starting point. [sent-318, score-0.458]
84 The results for the deterministic case (a) show that the optimal setting and the overall pressure sum Σ_{i∈V1} wi differ even when the demand at only one node changes, as reported in Table 1. [sent-346, score-0.414]
85 Comparing the distributionally robust (b) and robust (c) optimal solutions for the optimal PRV setting problem, we observe that the objective value of the distributionally robust counterpart is substantially smaller than the robust one. [sent-347, score-1.909]
86 Thus, the distributionally robust solution is less conservative than the robust solution. [sent-348, score-0.988]
87 Moreover, the distributionally robust setting is very close to the average-case deterministic solution ξ̄, but they do not coincide. [sent-349, score-0.737]
88 Moreover, note that solving the distributionally robust (and robust ) counterpart requires the same order of magnitude in computational time as the deterministic problem. [sent-351, score-1.009]
89 That may be due to the fact that both the deterministic and the robust problems are hard polynomial optimization problems. [sent-352, score-0.71]
90 7 Discussion We introduced a notion of distributional robustness for polynomial optimization problems. [sent-353, score-0.608]
91 The distributional uncertainty sets based on statistical estimates for the probability density functions have the advantage that they are data-driven and consistent with the data for increasing sample size. [sent-354, score-0.758]
92 Moreover, they give solutions that are less conservative than classical robust optimization with value-based uncertainty sets. [sent-355, score-0.772]
93 We have shown that these distributionally robust counterparts of polynomial optimization problems remain in the same class of problems from the perspective of computational complexity. [sent-356, score-0.959]
94 This methodology is promising for numerous real-world decision problems, where one faces the combined challenge of hard, non-convex models and uncertainty in the input parameters. [sent-357, score-0.333]
95 We can extend the histogram method of Section 5 to the case of multivariate uncertainty, but it is well-known that the sample complexity of histogram density estimation is greater than that of polynomial density estimation. [sent-358, score-0.518]
96 An alternative definition of the distributional uncertainty set Dε,N is to allow functions that are not proper density functions by removing some constraints; this gives a trade-off between reduced computational complexity and more conservative solutions. [sent-359, score-0.769]
97 Distributionally robust optimization under moment uncertainty with applications to data-driven problems. [sent-380, score-0.683]
98 Models and algorithms for distributionally robust least squares problems. [sent-397, score-0.675]
99 Robust solutions of optimization problems affected by uncertain probabilities. [sent-443, score-0.333]
100 SparsePOP: a sparse semidefinite programming relaxation of polynomial optimization problems. [sent-480, score-0.426]
wordName wordTfidf (topN-words)
[('distributionally', 0.44), ('uncertainty', 0.301), ('polynomial', 0.265), ('dro', 0.258), ('distributional', 0.245), ('robust', 0.235), ('sdp', 0.172), ('density', 0.171), ('semialgebraic', 0.159), ('sdpr', 0.155), ('uncertain', 0.154), ('pressure', 0.138), ('polynomials', 0.113), ('demand', 0.111), ('water', 0.107), ('histogram', 0.101), ('optimization', 0.098), ('pop', 0.094), ('fn', 0.09), ('dd', 0.083), ('wdn', 0.079), ('legendre', 0.077), ('pipes', 0.077), ('relaxations', 0.071), ('univariate', 0.07), ('wi', 0.069), ('ireland', 0.068), ('min', 0.064), ('sj', 0.064), ('counterpart', 0.063), ('demands', 0.063), ('mr', 0.063), ('node', 0.06), ('ibm', 0.06), ('moments', 0.054), ('rn', 0.054), ('mk', 0.053), ('closed', 0.053), ('conservative', 0.052), ('adro', 0.052), ('bertsimas', 0.052), ('nakata', 0.052), ('sdpa', 0.052), ('sparsepop', 0.052), ('valves', 0.052), ('multivariate', 0.051), ('moment', 0.049), ('ch', 0.048), ('tc', 0.048), ('consistency', 0.048), ('monomials', 0.045), ('prvs', 0.045), ('subvectors', 0.045), ('problems', 0.045), ('pk', 0.045), ('gd', 0.045), ('semide', 0.044), ('ow', 0.043), ('prv', 0.042), ('ridge', 0.042), ('sets', 0.041), ('investment', 0.039), ('months', 0.039), ('programming', 0.039), ('constraints', 0.039), ('max', 0.038), ('componentwise', 0.037), ('constructed', 0.037), ('deterministic', 0.036), ('samples', 0.036), ('solutions', 0.036), ('basic', 0.035), ('gx', 0.034), ('zi', 0.034), ('decision', 0.032), ('ei', 0.031), ('hard', 0.031), ('nonconvex', 0.03), ('uk', 0.029), ('kj', 0.028), ('utility', 0.028), ('degree', 0.027), ('sup', 0.027), ('corollary', 0.027), ('tools', 0.026), ('counterparts', 0.026), ('ej', 0.026), ('give', 0.026), ('assumption', 0.026), ('collected', 0.026), ('solution', 0.026), ('nancial', 0.026), ('unknown', 0.026), ('sequence', 0.025), ('uj', 0.025), ('employing', 0.025), ('network', 0.025), ('optimum', 0.024), ('relaxation', 0.024), ('classical', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
Author: Martin Mevissen, Emanuele Ragnoli, Jia Yuan Yu
Abstract: We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions. This set is a ball around a density function estimated from data samples, i.e., it is data-driven and random. Polynomial optimization problems are inherently hard due to nonconvex objectives and constraints. However, we show that by employing polynomial and histogram density estimates, we can introduce robustness with respect to distributional uncertainty sets without making the problem harder. We show that the optimum to the distributionally robust problem is the limit of a sequence of tractable semidefinite programming relaxations. We also give finite-sample consistency guarantees for the data-driven uncertainty sets. Finally, we apply our model and solution method in a water network optimization problem. 1
2 0.094152577 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes
Author: Shiau Hong Lim, Huan Xu, Shie Mannor
Abstract: An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior while taking advantage of well-behaving parts of the system. We consider a problem setting where some unknown parts of the state space can have arbitrary transitions while other parts are purely stochastic. We devise an algorithm that is adaptive to potentially adversarial behavior and show that it achieves similar regret bounds as the purely stochastic case. 1
3 0.072917409 280 nips-2013-Robust Data-Driven Dynamic Programming
Author: Grani Adiwena Hanasusanto, Daniel Kuhn
Abstract: In stochastic optimal control the distribution of the exogenous noise is typically unknown and must be inferred from limited data before dynamic programming (DP)-based solution schemes can be applied. If the conditional expectations in the DP recursions are estimated via kernel regression, however, the historical sample paths enter the solution procedure directly as they determine the evaluation points of the cost-to-go functions. The resulting data-driven DP scheme is asymptotically consistent and admits an efficient computational solution when combined with parametric value function approximations. If training data is sparse, however, the estimated cost-to-go functions display a high variability and an optimistic bias, while the corresponding control policies perform poorly in out-of-sample tests. To mitigate these small sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate. We show that the arising minmax problems in the DP recursions reduce to tractable conic programs. We also demonstrate that the proposed robust DP algorithm dominates various non-robust schemes in out-of-sample tests across several application domains. 1
4 0.072868101 269 nips-2013-Regression-tree Tuning in a Streaming Setting
Author: Samory Kpotufe, Francesco Orabona
Abstract: We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time. We prove that it is possible to maintain such a structure in time O (log n) at ˜ any time step n while achieving a nearly-optimal regression rate of O n−2/(2+d) in terms of the unknown metric dimension d. Finally we prove a new regression lower-bound which is independent of a given data size, and hence is more appropriate for the streaming setting. 1
5 0.068772651 158 nips-2013-Learning Multiple Models via Regularized Weighting
Author: Daniel Vainsencher, Shie Mannor, Huan Xu
Abstract: We consider the general problem of Multiple Model Learning (MML) from data, from the statistical and algorithmic perspectives; this problem includes clustering, multiple regression and subspace clustering as special cases. A common approach to solving new MML problems is to generalize Lloyd’s algorithm for clustering (or Expectation-Maximization for soft clustering). However this approach is unfortunately sensitive to outliers and large noise: a single exceptional point may take over one of the models. We propose a different general formulation that seeks for each model a distribution over data points; the weights are regularized to be sufficiently spread out. This enhances robustness by making assumptions on class balance. We further provide generalization bounds and explain how the new iterations may be computed efficiently. We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers. 1
6 0.067449912 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
7 0.062897883 63 nips-2013-Cluster Trees on Manifolds
8 0.062600642 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation
9 0.061678492 233 nips-2013-Online Robust PCA via Stochastic Optimization
10 0.061535154 184 nips-2013-Marginals-to-Models Reducibility
11 0.060775973 270 nips-2013-Regret based Robust Solutions for Uncertain Markov Decision Processes
12 0.058872804 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
13 0.058820434 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
14 0.05647219 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
15 0.056147777 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
16 0.055892427 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
17 0.055769697 75 nips-2013-Convex Two-Layer Modeling
18 0.055389926 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
19 0.054896869 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
20 0.054078672 91 nips-2013-Dirty Statistical Models
topicId topicWeight
[(0, 0.164), (1, 0.019), (2, 0.045), (3, 0.023), (4, 0.025), (5, 0.071), (6, 0.006), (7, 0.005), (8, -0.022), (9, 0.038), (10, 0.037), (11, 0.015), (12, 0.013), (13, 0.029), (14, 0.015), (15, 0.04), (16, 0.064), (17, 0.003), (18, 0.027), (19, -0.017), (20, -0.006), (21, 0.067), (22, -0.005), (23, -0.002), (24, -0.007), (25, 0.021), (26, -0.041), (27, 0.063), (28, -0.041), (29, -0.001), (30, -0.04), (31, -0.006), (32, -0.023), (33, 0.056), (34, -0.016), (35, 0.009), (36, -0.04), (37, -0.023), (38, -0.086), (39, 0.062), (40, -0.048), (41, -0.047), (42, -0.088), (43, 0.017), (44, -0.047), (45, 0.032), (46, 0.051), (47, 0.04), (48, 0.068), (49, -0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.94571823 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
Author: Martin Mevissen, Emanuele Ragnoli, Jia Yuan Yu
Abstract: We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions. This set is a ball around a density function estimated from data samples, i.e., it is data-driven and random. Polynomial optimization problems are inherently hard due to nonconvex objectives and constraints. However, we show that by employing polynomial and histogram density estimates, we can introduce robustness with respect to distributional uncertainty sets without making the problem harder. We show that the optimum to the distributionally robust problem is the limit of a sequence of tractable semidefinite programming relaxations. We also give finite-sample consistency guarantees for the data-driven uncertainty sets. Finally, we apply our model and solution method in a water network optimization problem. 1
2 0.61756957 158 nips-2013-Learning Multiple Models via Regularized Weighting
Author: Daniel Vainsencher, Shie Mannor, Huan Xu
Abstract: We consider the general problem of Multiple Model Learning (MML) from data, from the statistical and algorithmic perspectives; this problem includes clustering, multiple regression and subspace clustering as special cases. A common approach to solving new MML problems is to generalize Lloyd’s algorithm for clustering (or Expectation-Maximization for soft clustering). However this approach is unfortunately sensitive to outliers and large noise: a single exceptional point may take over one of the models. We propose a different general formulation that seeks for each model a distribution over data points; the weights are regularized to be sufficiently spread out. This enhances robustness by making assumptions on class balance. We further provide generalization bounds and explain how the new iterations may be computed efficiently. We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers. 1
Author: Yuchen Zhang, John Duchi, Michael Jordan, Martin J. Wainwright
Abstract: We establish lower bounds on minimax risks for distributed statistical estimation under a communication budget. Such lower bounds reveal the minimum amount of communication required by any procedure to achieve the centralized minimax-optimal rates for statistical estimation. We study two classes of protocols: one in which machines send messages independently, and a second allowing for interactive communication. We establish lower bounds for several problems, including various types of location models, as well as for parameter estimation in regression models. 1
4 0.5912587 280 nips-2013-Robust Data-Driven Dynamic Programming
Author: Grani Adiwena Hanasusanto, Daniel Kuhn
Abstract: In stochastic optimal control the distribution of the exogenous noise is typically unknown and must be inferred from limited data before dynamic programming (DP)-based solution schemes can be applied. If the conditional expectations in the DP recursions are estimated via kernel regression, however, the historical sample paths enter the solution procedure directly as they determine the evaluation points of the cost-to-go functions. The resulting data-driven DP scheme is asymptotically consistent and admits an efficient computational solution when combined with parametric value function approximations. If training data is sparse, however, the estimated cost-to-go functions display a high variability and an optimistic bias, while the corresponding control policies perform poorly in out-of-sample tests. To mitigate these small sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate. We show that the arising minmax problems in the DP recursions reduce to tractable conic programs. We also demonstrate that the proposed robust DP algorithm dominates various non-robust schemes in out-of-sample tests across several application domains. 1
5 0.57927257 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch
Abstract: In this paper we introduce a novel method that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. Our method can be regarded as a natural extension of the one-class SVM (OCSVM) algorithm that finds multiple parallel separating hyperplanes in a reproducing kernel Hilbert space. We call our method q-OCSVM, as it can be used to estimate q quantiles of a highdimensional distribution. For this purpose, we introduce a new global convex optimization program that finds all estimated sets at once and show that it can be solved efficiently. We prove the correctness of our method and present empirical results that demonstrate its superiority over existing methods. 1
6 0.57333726 184 nips-2013-Marginals-to-Models Reducibility
7 0.56786174 244 nips-2013-Parametric Task Learning
8 0.56297594 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
9 0.56052625 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
10 0.55740368 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
11 0.55533218 197 nips-2013-Moment-based Uniform Deviation Bounds for $k$-means and Friends
12 0.55309087 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
13 0.55093318 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
14 0.54537511 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
15 0.54196888 217 nips-2013-On Poisson Graphical Models
16 0.54040289 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
17 0.53790361 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
18 0.53558397 347 nips-2013-Variational Planning for Graph-based MDPs
19 0.53519803 63 nips-2013-Cluster Trees on Manifolds
20 0.53471249 31 nips-2013-Adaptivity to Local Smoothness and Dimension in Kernel Regression
topicId topicWeight
[(16, 0.021), (33, 0.14), (34, 0.089), (41, 0.031), (49, 0.029), (56, 0.143), (66, 0.261), (70, 0.026), (85, 0.04), (89, 0.07), (93, 0.039), (95, 0.031)]
simIndex simValue paperId paperTitle
1 0.79313725 352 nips-2013-What do row and column marginals reveal about your dataset?
Author: Behzad Golshan, John Byers, Evimaria Terzi
Abstract: Numerous datasets ranging from group memberships within social networks to purchase histories on e-commerce sites are represented by binary matrices. While this data is often either proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive, and may be furnished for analysis. Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i, j) of interest. We do this for all the cells of H simultaneously, without generating realizations, but rather via implicitly sampling the datasets that satisfy the input marginals. The end result is an efficient algorithm with asymptotic running time the same as that required by standard sampling techniques to generate a single dataset from the same dataspace. Our experimental evaluation demonstrates the efficiency and the efficacy of our framework in multiple settings. 1
2 0.7834751 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing
Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman
Abstract: We consider the problem of sampling from a probability distribution defined over a high-dimensional discrete set, specified for instance by a graphical model. We propose a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods. Our scheme can leverage fast combinatorial optimization tools as a blackbox and, unlike MCMC methods, samples produced are guaranteed to be within an (arbitrarily small) constant factor of the true probability distribution. We demonstrate that by using state-of-the-art combinatorial search tools, PAWS can efficiently sample from Ising grids with strong interactions and from software verification instances, while MCMC and variational methods fail in both cases. 1
same-paper 3 0.78301877 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
Author: Martin Mevissen, Emanuele Ragnoli, Jia Yuan Yu
Abstract: We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions. This set is a ball around a density function estimated from data samples, i.e., it is data-driven and random. Polynomial optimization problems are inherently hard due to nonconvex objectives and constraints. However, we show that by employing polynomial and histogram density estimates, we can introduce robustness with respect to distributional uncertainty sets without making the problem harder. We show that the optimum to the distributionally robust problem is the limit of a sequence of tractable semidefinite programming relaxations. We also give finite-sample consistency guarantees for the data-driven uncertainty sets. Finally, we apply our model and solution method in a water network optimization problem. 1
Author: Anima Anandkumar, Daniel Hsu, Majid Janzamin, Sham M. Kakade
Abstract: Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of “higher order” expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allow for overcomplete models, and require the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus, we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity which is contained in the class of Tucker decompositions, but is more general than the Candecomp/Parafac (CP) decomposition. Keywords: Overcomplete representation, admixture models, generic identifiability, tensor decomposition.
5 0.66825962 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes
Author: Shiau Hong Lim, Huan Xu, Shie Mannor
Abstract: An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior while taking advantage of well-behaving parts of the system. We consider a problem setting where some unknown parts of the state space can have arbitrary transitions while other parts are purely stochastic. We devise an algorithm that is adaptive to potentially adversarial behavior and show that it achieves similar regret bounds as the purely stochastic case. 1
6 0.66726273 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
7 0.66622066 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
8 0.66513073 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
9 0.66487932 249 nips-2013-Polar Operators for Structured Sparse Estimation
10 0.66406316 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
11 0.66399324 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
12 0.66227829 233 nips-2013-Online Robust PCA via Stochastic Optimization
13 0.66213226 224 nips-2013-On the Sample Complexity of Subspace Learning
14 0.66047752 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
15 0.65952426 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation
16 0.65882593 184 nips-2013-Marginals-to-Models Reducibility
17 0.65858656 344 nips-2013-Using multiple samples to learn mixture models
18 0.65799046 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
19 0.65792435 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
20 0.65776318 355 nips-2013-Which Space Partitioning Tree to Use for Search?