nips nips2009 nips2009-11 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yao-liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári
Abstract: Surjectivity of linear projections between distribution families with fixed mean and covariance (regardless of dimension) is re-derived by a new proof. We further extend this property to distribution families that respect additional constraints, such as symmetry, unimodality and log-concavity. By combining our results with classic univariate inequalities, we provide new worst-case analyses for natural risk criteria arising in classification, optimization, portfolio selection and Markov decision processes. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Surjectivity of linear projections between distribution families with fixed mean and covariance (regardless of dimension) is re-derived by a new proof. [sent-3, score-0.308]
2 We further extend this property to distribution families that respect additional constraints, such as symmetry, unimodality and log-concavity. [sent-4, score-0.621]
3 By combining our results with classic univariate inequalities, we provide new worst-case analyses for natural risk criteria arising in classification, optimization, portfolio selection and Markov decision processes. [sent-5, score-0.713]
4 A parametric approach to handling uncertainty in such a setting would be to fit a specific parametric model to the known moments and then apply stochastic programming techniques to solve for an optimal decision. [sent-11, score-0.207]
5 However, such a parametric strategy can be too bold, hard to justify, and might incur significant loss if the fitted distribution does not match the true underlying distribution well. [sent-13, score-0.156]
6 Such a minimax formulation has been studied in several fields [1; 2; 3; 4; 5; 6] and is also the focus of this paper. [sent-15, score-0.22]
7 Although Bayesian optimal decision theory is a rightfully well-established approach for decision making under uncertainty, minimax has proved to be a useful alternative in many domains, such as finance, where it is difficult to formulate appropriate priors over models. [sent-16, score-0.235]
8 In these fields, the minimax formulation combined with stochastic programming [7] has been extensively studied and successfully applied. [sent-17, score-0.298]
9 We make a contribution to minimax probability theory and apply the results to problems arising in four different areas. [sent-18, score-0.161]
10 Specifically, we generalize a classic result on the linear projection property of distribution families: we show that any linear projection between distribution families with fixed mean and covariance, regardless of their dimensions, is surjective. [sent-19, score-0.62]
11 Our proof imposes no conditions on the deterministic matrix X, hence extends the classic projection result in [6], which assumes X is a vector. [sent-21, score-0.21]
12 We furthermore extend this surjective property to some restricted distribution 1 families, which allows additional prior information to be incorporated and hence less conservative solutions to be obtained. [sent-22, score-0.361]
13 In particular, we prove that surjectivity of linear projections continues to hold for distribution families that are additionally symmetric, log-concave, or symmetric linear unimodal. [sent-23, score-0.502]
14 An immediate application of these results is to reduce the worst-case analysis of multivariate expectations to the univariate (or reduced multivariate) case, which has long been studied and has produced many fruitful results. [sent-25, score-0.332]
15 In this direction, we conduct worst-case analyses of some common restricted distribution families. [sent-26, score-0.136]
16 We illustrate our results on problems that incorporate a classic worst-case value-at-risk constraint: minimax probability classification [2]; chance-constrained linear programming (CCLP) [3]; portfolio selection [4]; and Markov decision processes (MDPs) with reward uncertainty [8]. [sent-27, score-0.912]
17 Additionally, we provide extensions to other constrained distribution families, which makes the minimax formulation less conservative in each case. [sent-29, score-0.321]
18 These results are then extended to the more recent conditional value-at-risk constraint, and new bounds are proved, including a new bound on the survival function for symmetric unimodal distributions. [sent-30, score-0.511]
19 2 A General Projection Property First we establish a generalized linear projection property for distribution families. [sent-31, score-0.202]
20 The key application will be to reduce worst-case multivariate stochastic programming problems to lower dimensional equivalents; see Corollary 1. [sent-32, score-0.178]
21 Popescu [6] has proved the special case of reduction to one dimension; however, we provide a simpler proof that can be more easily extended to other distribution families. [sent-33, score-0.137]
22 Let (µ, Σ) denote the family of distributions sharing common mean µ and covariance Σ, and let µ_X = X^T µ and Σ_X = X^T Σ X. [sent-34, score-0.139]
23 Theorem 1 (General Projection Property (GPP)) For all µ, Σ ≽ 0, and X ∈ R^{m×d}, the projection X^T R = r from m-variate distributions R ∼ (µ, Σ) to d-variate distributions r ∼ (µ_X, Σ_X) is surjective and many-to-one. [sent-37, score-0.263]
24 Although this establishes the general result, we extend it to distribution families under additional constraints below. [sent-44, score-0.363]
25 In such cases, if a general linear projection property can still be shown to hold, the additional assumptions can be used to make the minimax approach less conservative in a simple, direct manner. [sent-46, score-0.449]
26 We thus consider a number of additionally restricted distribution families. [sent-47, score-0.132]
27 Definition 1 A random vector X is called (centrally) symmetric about µ, if for all vectors x, Pr(X ≥ µ + x) = Pr(X ≤ µ − x). [sent-48, score-0.125]
28 A univariate random variable is called unimodal about a if its cumulative distribution function (c.d.f.) is convex on (−∞, a) and concave on (a, ∞). [sent-49, score-0.444]
29 A random m-vector X is called linear unimodal about 0_m if for all a ∈ R^m, a^T X is (univariate) unimodal about 0. [sent-57, score-0.477]
30 Let (µ, Σ)_SU denote the family of distributions that are additionally symmetric and linear unimodal about µ. [sent-59, score-0.515]
31 Lemma 1 (a) If random vector X is symmetric about 0, then AX + µ is symmetric about µ. [sent-61, score-0.25]
32 (b) If X, Y are independent and both symmetric about 0, Z = X + Y is also symmetric about 0. [sent-62, score-0.25]
33 Although it was once believed otherwise, it is now clear that the convolution of two (univariate) unimodal distributions need not be unimodal. [sent-63, score-0.308]
34 However, for symmetric, unimodal distributions we have Lemma 2 ([10], Theorem 1.6). [sent-64, score-0.28]
35 If two independent random variables x and y are both symmetric and unimodal about 0, then z = x + y is also unimodal about 0. [sent-65, score-0.573]
36 There are several non-equivalent extensions of unimodality to multivariate random variables. [sent-66, score-0.331]
37 Theorem 2 (GPP for Symmetric, Log-concave, and Symmetric Linear Unimodal Distributions) For all µ, Σ ≽ 0 and X ∈ R^{m×d}, the projection X^T R = r from m-variate R ∼ (µ, Σ)_S to d-variate r ∼ (µ_X, Σ_X)_S is surjective and many-to-one. [sent-78, score-0.151]
38 Then, respectively, symmetry of the constructed R follows from Lemma 1; log-concavity of R follows from Lemma 3; and linear unimodality of R follows from the definition and Lemma 2. [sent-81, score-0.423]
39 An immediate application of the general projection property is to reduce worst-case analyses of multivariate expectations to the univariate case. [sent-83, score-0.475]
40 Corollary 1 For any matrix X and any function g(·) (including in particular when X is a vector), sup_{R∼(µ,Σ)} E[g(X^T R)] = sup_{r∼(µ_X, Σ_X)} E[g(r)]. [sent-85, score-0.198]
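As a quick sanity check on the moment matching underlying Corollary 1, the sketch below (plain NumPy, with hypothetical dimensions m = 5, d = 2 and an arbitrary Gaussian member of (µ, Σ)) verifies empirically that the projected samples X^T R have mean µ_X = X^T µ and covariance Σ_X = X^T Σ X. It illustrates only the easy direction; the substance of Theorem 1 is that every r ∼ (µ_X, Σ_X) arises as such a projection.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 5, 2, 200_000                       # hypothetical dimensions and sample size

mu = rng.normal(size=m)
A = rng.normal(size=(m, m))
Sigma = A @ A.T                               # an arbitrary positive semidefinite covariance
X = rng.normal(size=(m, d))                   # the deterministic projection matrix

# Draw R from one particular member of (mu, Sigma) -- here a Gaussian -- and project it.
R = rng.multivariate_normal(mu, Sigma, size=n)        # shape (n, m)
r = R @ X                                             # samples of X^T R, shape (n, d)

print("max |mean - mu_X|   :", np.abs(r.mean(axis=0) - X.T @ mu).max())
print("max |cov  - Sigma_X|:", np.abs(np.cov(r.T, bias=True) - X.T @ Sigma @ X).max())
```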
41 3 Application to Worst-case Value-at-risk: We now apply these projection properties to analyze the worst-case value-at-risk (VaR), a useful risk criterion in many application areas. [sent-90, score-0.29]
42 Consider the following constraint on a distribution R: Pr(−x^T R ≤ α) ≥ 1 − ϵ. (2) A sufficient but not necessary condition for log-concavity is having a log-concave density. [sent-91, score-0.12]
43 In the univariate case, log-concave distributions are called strongly unimodal, and they form a proper subset of the univariate unimodal distributions [10]. [sent-93, score-0.682]
44 If X is a vector, we can also extend this theorem to other multivariate unimodalities, such as symmetric star/block/convex unimodality. [sent-94, score-0.332]
45 The closure of (µ, Σ), (µ, Σ)_S, (µ, Σ)_L, and (µ, Σ)_SU under linear projection is critical for Corollary 1 to hold. [sent-95, score-0.117]
46 Corollary 1 fails for other kinds of multivariate unimodalities, such as symmetric star/block/convex unimodal. [sent-96, score-0.194]
47 Within certain restricted distribution families, such as Q-radially symmetric distributions, (2) can be (equivalently) transformed to a deterministic second-order cone constraint (depending on the range of ϵ) [3]. [sent-101, score-0.287]
48 Suppose however that one knew the distribution of R belonged to a certain family, such as (µ, Σ). [sent-103, score-0.14]
49 If we have additional information about the underlying distribution, such as symmetry or unimodality, the worst-case ϵ-VaR can be reduced. [sent-107, score-0.209]
50 Proof: From Corollary 1 it follows that inf_{R∼(µ,Σ)} Pr(−x^T R ≤ α) = inf_{r∼(−µ_x, σ_x²)} Pr(r ≤ α) = 1 − sup_{r∼(−µ_x, σ_x²)} Pr(r > α). [sent-118, score-0.247]
51 [2] and [3] provide a proof of (4) based on the multivariate Chebyshev inequality in [12]; [4] proves (4) from dual optimality; and the proof in [6] utilizes the two-point support property of the general constraint (3). [sent-124, score-0.29]
52 Figure 1: Comparison of the coefficients in front of σ_x for different distribution families in Proposition 1 (left) and Proposition 2 (right). [sent-129, score-0.111]
53 Figure 1 compares the coefficients on σ_x in the worst-case VaR across the different distribution families. [sent-132, score-0.111]
54 The large gap between coefficients for general and symmetric (linear) unimodal distributions demonstrates how additional constraints can generate much less conservative solutions while still ensuring robustness. [sent-133, score-0.58]
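To make the gap concrete, the following snippet tabulates illustrative coefficients in front of σ_x for the worst-case VaR of the three families. The general-family coefficient √((1−ϵ)/ϵ) is the classical one-sided Chebyshev (Cantelli) value from (4); the symmetric and symmetric-unimodal coefficients shown are the ones implied by the two-sided Chebyshev and Vysochanskij-Petunin-type inequalities, and are meant only to illustrate the qualitative gap in Figure 1; the exact constants and ranges of ϵ in Proposition 1 may differ.

```python
import numpy as np

eps = np.array([0.01, 0.05, 0.10, 0.25])

coeff_general   = np.sqrt((1 - eps) / eps)   # (mu, Sigma):    one-sided Chebyshev (Cantelli)
coeff_symmetric = np.sqrt(1 / (2 * eps))     # (mu, Sigma)_S:  two-sided Chebyshev halved by symmetry
coeff_sym_unim  = np.sqrt(2 / (9 * eps))     # (mu, Sigma)_SU: Vysochanskij-Petunin-type bound

for e, g, s, u in zip(eps, coeff_general, coeff_symmetric, coeff_sym_unim):
    print(f"eps={e:.2f}:  general={g:5.2f}  symmetric={s:5.2f}  symmetric+unimodal={u:5.2f}")
```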
55 Beyond simplifying existing proofs, Proposition 1 can be used to extend some of the uses of the VaR criterion in different application areas. [sent-134, score-0.133]
56 From the data, the distribution families (µ_1, Σ_1) and (µ_2, Σ_2) can be estimated. [sent-138, score-0.234]
57 Then a robust hyperplane can be recovered by minimizing the worst-case error: min_{x≠0, α, ϵ} ϵ s.t. [sent-139, score-0.153]
58 inf_{R1∼(µ_1,Σ_1)} Pr(x^T R1 ≤ α) ≥ 1 − ϵ and inf_{R2∼(µ_2,Σ_2)} Pr(x^T R2 ≥ α) ≥ 1 − ϵ, (12) where x is the normal vector of the hyperplane, α is the offset, and ϵ controls the error probability. [sent-141, score-0.148]
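A minimal sketch of how (12) could be solved for ϵ < 1/2, assuming cvxpy and hypothetical class statistics: by (4), each worst-case constraint becomes a second-order cone constraint, and maximizing the common margin reduces to the familiar minimax-probability-machine program over the normal vector x.

```python
import numpy as np
import cvxpy as cp

# Hypothetical class statistics (mu1, Sigma1), (mu2, Sigma2); in practice these are estimated from data.
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S1 = np.array([[1.0, 0.3], [0.3, 0.5]])
S2 = np.array([[0.8, -0.2], [-0.2, 1.2]])
A1, A2 = np.linalg.cholesky(S1), np.linalg.cholesky(S2)

# MPM-style reduction (cf. [2]): find x maximizing the margin kappa with
#   alpha - mu1^T x >= kappa*||A1^T x||  and  mu2^T x - alpha >= kappa*||A2^T x||,
# which is equivalent to minimizing the sum of the two norms under a normalization.
x = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.norm(A1.T @ x) + cp.norm(A2.T @ x)),
                  [(mu2 - mu1) @ x == 1])
prob.solve()

kappa = 1.0 / prob.value                 # optimal margin
eps_opt = 1.0 / (1.0 + kappa ** 2)       # worst-case error for the general (mu, Sigma) family, via (4)
alpha = mu1 @ x.value + kappa * np.linalg.norm(A1.T @ x.value)
print("worst-case error eps:", eps_opt, " offset alpha:", alpha)
```

Under the additional symmetry or unimodality assumptions, only the mapping from the margin to ϵ changes (the coefficient in Proposition 1), which is how the less conservative variants enter.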
59 In this case, suppose we knew in advance that the optimal ϵ lay in [1/2, 1), meaning that no hyperplane predicts better than random guessing. [sent-145, score-0.113]
60 Then the constraints in (12) become linear, covariance information becomes useless in determining the optimal hyperplane, and the optimization concentrates solely on separating the means of two classes. [sent-146, score-0.158]
61 Although such a result might seem surprising, it is a direct consequence of symmetry: the worst-case distributions are forced to put probability mass arbitrarily far away on both sides of the mean, thereby eliminating any information brought by covariance. [sent-147, score-0.13]
62 When the optimal ϵ lies in (0, 1/2), however, covariance information becomes meaningful, since the worst-case distributions can no longer put probability mass arbitrarily far away on both sides of the mean (owing to the existence of a hyperplane that predicts labels better than random guessing). [sent-148, score-0.207]
63 Calafiore and El Ghaoui studied this problem in [3], and imposed the inequality constraint with high probability, leading to the so-called chance-constrained linear program (CCLP): min_x a^T x s.t. [sent-154, score-0.238]
64 Depending on the value of ϵ, the chance constraint can be equivalently transformed into a second-order cone constraint or a linear constraint. [sent-158, score-0.215]
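A sketch of a single robust chance constraint in the CCLP, again assuming cvxpy and hypothetical data: for ϵ < 1/2 and the general family, (4) turns Pr(c^T x ≤ b) ≥ 1 − ϵ with c ∼ (µ_c, Σ_c) into the second-order cone constraint µ_c^T x + κ(ϵ)||Σ_c^{1/2} x|| ≤ b; the symmetric and unimodal families would only shrink κ(ϵ).

```python
import numpy as np
import cvxpy as cp

a = np.array([1.0, 2.0])                              # hypothetical cost vector
mu_c, b, eps = np.array([0.5, 1.5]), 3.0, 0.05        # uncertain row: mean, right-hand side, risk level
Sigma_c = np.array([[0.2, 0.05], [0.05, 0.1]])
L = np.linalg.cholesky(Sigma_c)
kappa = np.sqrt((1 - eps) / eps)                      # general (mu, Sigma) family

x = cp.Variable(2)
constraints = [mu_c @ x + kappa * cp.norm(L.T @ x) <= b,   # worst-case chance constraint
               cp.sum(x) == 1, x >= 0]                     # illustrative feasibility constraints
cp.Problem(cp.Minimize(a @ x), constraints).solve()
print("robust solution:", x.value)
```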
65 The work in [3] concentrates on the general and symmetric distribution families. [sent-159, score-0.206]
66 Portfolio Selection [4]: In portfolio selection, let R represent the (uncertain) returns of a suite of financial assets, and x the weighting one would like to put on the various assets. [sent-163, score-0.313]
67 Previous work has not addressed the case when additional symmetry or linear unimodality information is available. [sent-170, score-0.434]
68 However, comparing the minimal value of α in Proposition 1, we see that such additional information, such as symmetry or unimodality, indeed decreases our potential loss, as shown clearly in Figure 1. [sent-171, score-0.181]
69 This makes sense, since the more one knows about uncertain returns the less risk one should have to bear. [sent-172, score-0.133]
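A sketch of the resulting worst-case VaR portfolio problem, assuming cvxpy, hypothetical asset statistics, and the same illustrative family coefficients as above: for each family one minimizes α = −µ^T x + κ(ϵ)·√(x^T Σ x) over the simplex, and the optimal worst-case loss shrinks as more distributional information is assumed.

```python
import numpy as np
import cvxpy as cp

mu = np.array([0.08, 0.05, 0.12])                     # hypothetical mean returns
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.02, 0.01],
                  [0.00, 0.01, 0.09]])                # hypothetical return covariance
L = np.linalg.cholesky(Sigma)
eps = 0.05

for name, kappa in [("general",            np.sqrt((1 - eps) / eps)),
                    ("symmetric",          np.sqrt(1 / (2 * eps))),
                    ("symmetric unimodal", np.sqrt(2 / (9 * eps)))]:
    x = cp.Variable(3)
    var = -mu @ x + kappa * cp.norm(L.T @ x)          # worst-case VaR of the loss -x^T R
    cp.Problem(cp.Minimize(var), [cp.sum(x) == 1, x >= 0]).solve()
    print(f"{name:>18s}: worst-case VaR = {var.value:.4f}")
```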
70 Delage and Mannor [8] extend this problem to the uncertain case by employing the value-at-risk type constraint (2) and assuming the unknown reward model and transition kernel are drawn from a known distribution (Gaussian and Dirichlet respectively). [sent-177, score-0.303]
71 Unfortunately, [8] also proves that the constraint (2) is generally NP-hard to satisfy unless one assumes some very restricted form of distribution, such as Gaussian. [sent-178, score-0.115]
72 Alternatively, note that one can use the worst-case value-at-risk formulation (3) to obtain a tractable approximation to (2): [sent-179, score-0.129]
73 min_{x,α} α s.t. inf_{R∼(µ,Σ)} Pr(−x^T R ≤ α) ≥ 1 − ϵ, (15) where R is the reward function (unknown but assumed to belong to (µ, Σ)) and x represents a discounted-stationary state-action visitation distribution (which can be used to recover an optimal behavior policy). [sent-181, score-0.176]
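A minimal sketch of the tractable surrogate (15) on a tiny synthetic MDP, assuming cvxpy and, for brevity, an isotropic reward covariance Σ = σ²I (all names and numbers here are hypothetical): x is constrained to the discounted visitation polytope via the standard dual-LP flow constraints, and the worst-case VaR of the loss is minimized; recovering a policy from x is omitted.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
S, A, gamma, eps = 4, 2, 0.9, 0.1                  # hypothetical small MDP and risk level
P = rng.dirichlet(np.ones(S), size=(S, A))          # P[s, a, :] = next-state distribution (assumed known)
rho = np.ones(S) / S                                # initial state distribution
mu = rng.normal(size=(S, A))                        # mean of the uncertain reward R(s, a)
sigma = 0.2                                         # reward covariance Sigma = sigma^2 * I (for brevity)
kappa = np.sqrt((1 - eps) / eps)                    # general (mu, Sigma) family

x = cp.Variable((S, A), nonneg=True)                # discounted state-action visitation frequencies
flow = [cp.sum(x[s, :]) ==
        rho[s] + gamma * sum(P[sp, ap, s] * x[sp, ap] for sp in range(S) for ap in range(A))
        for s in range(S)]                          # dual-LP flow constraints

# Surrogate (15): minimize alpha = -mu^T x + kappa * sqrt(x^T Sigma x) = -<mu, x> + kappa*sigma*||x||_F.
objective = cp.Minimize(-cp.sum(cp.multiply(mu, x)) + kappa * sigma * cp.norm(x, "fro"))
cp.Problem(objective, flow).solve()
print("robust visitation frequencies:\n", np.round(x.value, 3))
```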
74 Thus, given reasonable constraints, the minimax approach does not have to be overly conservative, while providing robustness and tractability. [sent-183, score-0.161]
75 4 Application to Worst-case Conditional Value-at-risk Finally, we investigate the more refined conditional value-at-risk (CVaR) criterion that bounds the conditional expectation of losses beyond the value-at-risk (VaR). [sent-184, score-0.16]
76 For example, if ϵ = 0, the optimization problem trivially says that the loss of any portfolio will be no larger than ∞. [sent-192, score-0.35]
77 Although one potential remedy might be to use Monte Carlo techniques to approximate the expectation, we instead take a robust approach: As before, suppose one knew the distribution of R belonged to a certain family, such as (µ, Σ). [sent-197, score-0.189]
78 Given such knowledge, it is natural to consider the worst-case CVaR: f = sup_{R∼(µ,Σ)} min_α { α + (1/ϵ) E[(−x^T R − α)^+] } = min_α sup_{R∼(µ,Σ)} { α + (1/ϵ) E[(−x^T R − α)^+] }, (18) where the interchangeability of the min and sup operators follows from the classic minimax theorem [15]. [sent-198, score-0.661]
79 If one has additional information about the underlying distribution, such as symmetry or unimodality, the worst-case CVaR can be reduced. [sent-200, score-0.209]
80 Proof: We know from Corollary 1 that sup_{R∼(µ,Σ)} E[(−x^T R − α)^+] = sup_{r∼(−µ_x, σ_x²)} E[(r − α)^+], (23) which reduces the problem to the univariate case. [sent-207, score-0.371]
81 To proceed, we will need to make use of the univariate results given in Proposition 3 below. [sent-208, score-0.173]
82 This is a convex univariate optimization problem in α. [sent-210, score-0.21]
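As a sketch of this univariate reduction, assuming the classical tight bound sup_{r∼(m,σ²)} E[(r − α)^+] = ½(√((α − m)² + σ²) − (α − m)) for the general family, one can minimize over α numerically and compare with the known closed form m + σ√((1 − ϵ)/ϵ); the numbers below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

m, sigma, eps = 0.1, 1.0, 0.05            # hypothetical mean/std of the loss r = -x^T R, and risk level

def worst_expected_excess(alpha):
    # Tight bound on E[(r - alpha)^+] over all r with mean m and variance sigma^2.
    return 0.5 * (np.sqrt((alpha - m) ** 2 + sigma ** 2) - (alpha - m))

res = minimize_scalar(lambda a: a + worst_expected_excess(a) / eps)
print("numerical worst-case CVaR:", res.fun)
print("closed form m + sigma*sqrt((1-eps)/eps):", m + sigma * np.sqrt((1 - eps) / eps))
```

For the general family the resulting value coincides with the worst-case VaR of Proposition 1, which is consistent with the observation below that, over (µ, Σ), CVaR and VaR select the same portfolio.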
83 Comparing VaR and CVaR in Figure 1 shows that unimodality has less impact on improving CVaR. [sent-216, score-0.262]
84 A key component of Proposition 2 is its reliance on the following important univariate results. [sent-217, score-0.173]
85 The following proposition gives tight bounds on the expected survival function for the various families. [sent-218, score-0.269]
86 Given the space constraints, we can only discuss the direct application of worst-case CVaR to the portfolio selection problem. [sent-222, score-0.377]
87 Next, when additional symmetry information is taken into account and ϵ ∈ (0, 1/2), CVaR and VaR again select the same portfolio but under different worst-case distributions. [sent-226, score-0.494]
88 When unimodality is added, the CVaR criterion finally begins to select different portfolios than VaR does. [sent-227, score-0.326]
89 5 Concluding Remarks We have provided a simpler yet broader proof of the general linear projection property for distribution families with given mean and covariance. [sent-228, score-0.479]
90 The proof strategy can be easily extended to more restricted distribution families. [sent-229, score-0.178]
91 A direct implication of our results is that worst-case analyses of multivariate expectations can often be reduced to univariate ones. [sent-230, score-0.318]
92 By combining this trick with classic univariate inequalities, we were able to provide worst-case analyses of two widely adopted constraints (based on value-at-risk criteria). [sent-231, score-0.329]
93 Our analysis recovers some existing results in a simpler way while also providing new insights into incorporating additional information. [sent-232, score-0.111]
94 Above, we assumed the first and second moments of the underlying distribution were precisely known, which of course is questionable in practice. [sent-233, score-0.141]
95 One strategy, proposed in [2], is to construct a (bounded and convex) uncertainty set U over (µ, Σ), and then apply a similar minimax formulation but with respect to (µ, Σ) ∈ U. [sent-235, score-0.253]
96 A second approach is simply to lower one’s confidence in the constraints and rely on the fact that the moment estimates are close to their true values within some additional confidence bound [17]. [sent-237, score-0.155]
97 That is, instead of enforcing constraint (3) or (18) with certainty, one can plug in the estimated moments and argue that the constraints will be satisfied with some diminished probability. [sent-238, score-0.181]
98 Except for the very recent work of [9] on portfolio selection. [sent-243, score-0.313]
99 “Worst-case value-at-risk and robust portfolio optimization: a conic programming approach”. [sent-266, score-0.412]
100 “Worst-case conditional value-at-risk with application to robust portfolio management”. [sent-271, score-0.425]
wordName wordTfidf (topN-words)
[('cvar', 0.477), ('portfolio', 0.313), ('unimodality', 0.262), ('unimodal', 0.224), ('families', 0.187), ('proposition', 0.174), ('univariate', 0.173), ('su', 0.163), ('minimax', 0.161), ('var', 0.149), ('pr', 0.149), ('xt', 0.142), ('symmetry', 0.132), ('symmetric', 0.125), ('cclp', 0.119), ('sup', 0.099), ('im', 0.098), ('uncertain', 0.09), ('projection', 0.088), ('conservative', 0.084), ('inf', 0.074), ('constraint', 0.073), ('ioana', 0.072), ('unimodalities', 0.072), ('multivariate', 0.069), ('hyperplane', 0.068), ('classic', 0.067), ('moments', 0.066), ('criterion', 0.064), ('worst', 0.064), ('uncertainty', 0.063), ('surjective', 0.063), ('survival', 0.063), ('corollary', 0.062), ('chebyshev', 0.057), ('laurent', 0.057), ('surely', 0.056), ('distributions', 0.056), ('reward', 0.055), ('proof', 0.055), ('inequalities', 0.052), ('el', 0.051), ('programming', 0.05), ('robust', 0.049), ('additional', 0.049), ('belonged', 0.048), ('delage', 0.048), ('gpp', 0.048), ('analyses', 0.047), ('distribution', 0.047), ('covariance', 0.045), ('ghaoui', 0.045), ('knew', 0.045), ('alberta', 0.043), ('risk', 0.043), ('additionally', 0.043), ('operations', 0.043), ('constraints', 0.042), ('ore', 0.042), ('surjectivity', 0.042), ('restricted', 0.042), ('mdps', 0.041), ('chance', 0.04), ('lemma', 0.039), ('csaba', 0.038), ('ingenuity', 0.038), ('szepesv', 0.038), ('extend', 0.038), ('family', 0.038), ('property', 0.038), ('sides', 0.038), ('optimization', 0.037), ('decision', 0.037), ('min', 0.036), ('brought', 0.036), ('simpler', 0.035), ('bound', 0.035), ('strategy', 0.034), ('concentrates', 0.034), ('mum', 0.034), ('selection', 0.033), ('proofs', 0.033), ('nance', 0.032), ('conditional', 0.032), ('bounds', 0.032), ('application', 0.031), ('studied', 0.03), ('coef', 0.03), ('program', 0.03), ('moment', 0.029), ('expectations', 0.029), ('rm', 0.029), ('cients', 0.029), ('formulation', 0.029), ('linear', 0.029), ('theorem', 0.028), ('underlying', 0.028), ('convolution', 0.028), ('stochastic', 0.028), ('incorporating', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 11 nips-2009-A General Projection Property for Distribution Families
Author: Yao-liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári
Abstract: Surjectivity of linear projections between distribution families with fixed mean and covariance (regardless of dimension) is re-derived by a new proof. We further extend this property to distribution families that respect additional constraints, such as symmetry, unimodality and log-concavity. By combining our results with classic univariate inequalities, we provide new worst-case analyses for natural risk criteria arising in classification, optimization, portfolio selection and Markov decision processes. 1
2 0.18989947 178 nips-2009-On Stochastic and Worst-case Models for Investing
Author: Elad Hazan, Satyen Kale
Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1
Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright
Abstract: We study minimax rates for estimating high-dimensional nonparametric regression models with sparse additive structure and smoothness constraints. More precisely, our goal is to estimate a function f* : R^p → R that has an additive decomposition of the form f*(X_1, . . . , X_p) = Σ_{j∈S} h*_j(X_j), where each component function h*_j lies in some class H of “smooth” functions, and S ⊂ {1, . . . , p} is an unknown subset with cardinality s = |S|. Given n i.i.d. observations of f*(X) corrupted with additive white Gaussian noise, where the covariate vectors (X_1, X_2, X_3, ..., X_p) are drawn with i.i.d. components from some distribution P, we determine lower bounds on the minimax rate for estimating the regression function with respect to squared-L2(P) error. Our main result is a lower bound on the minimax rate that scales as max{ s log(p/s)/n , s ε_n²(H) }. The first term reflects the sample size required for performing subset selection, and is independent of the function class H. The second term s ε_n²(H) is an s-dimensional estimation term corresponding to the sample size required for estimating a sum of s univariate functions, each chosen from the function class H. It depends linearly on the sparsity index s but is independent of the global dimension p. As a special case, if H corresponds to functions that are m-times differentiable (an mth-order Sobolev space), then the s-dimensional estimation term takes the form s ε_n²(H) ≍ s n^{−2m/(2m+1)}. Either of the two terms may be dominant in different regimes, depending on the relation between the sparsity and smoothness of the additive decomposition.
4 0.12680678 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
Author: Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
Abstract: Despite a large literature on upper bounds on complexity of convex optimization, relatively less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining a understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes. We also discuss implications of these results for the understanding the inherent complexity of large-scale learning and estimation problems. 1
5 0.080660194 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos
Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
6 0.079398774 129 nips-2009-Learning a Small Mixture of Trees
7 0.076204427 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes
8 0.074444108 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
9 0.067417555 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
10 0.065969966 154 nips-2009-Modeling the spacing effect in sequential category learning
11 0.064574331 27 nips-2009-Adaptive Regularization of Weight Vectors
12 0.064406887 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
13 0.064030878 48 nips-2009-Bootstrapping from Game Tree Search
14 0.059315454 45 nips-2009-Beyond Convexity: Online Submodular Minimization
15 0.058916263 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties
16 0.057480175 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
17 0.057445858 128 nips-2009-Learning Non-Linear Combinations of Kernels
18 0.05635713 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation
19 0.055982906 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions
20 0.055535242 55 nips-2009-Compressed Least-Squares Regression
topicId topicWeight
[(0, -0.188), (1, 0.159), (2, 0.083), (3, -0.034), (4, 0.08), (5, 0.028), (6, 0.036), (7, -0.028), (8, 0.005), (9, -0.03), (10, 0.03), (11, -0.043), (12, -0.012), (13, 0.004), (14, 0.086), (15, -0.03), (16, -0.055), (17, 0.01), (18, 0.061), (19, 0.07), (20, 0.042), (21, -0.013), (22, -0.044), (23, 0.067), (24, -0.128), (25, -0.088), (26, 0.005), (27, -0.022), (28, -0.082), (29, -0.007), (30, -0.021), (31, -0.051), (32, -0.027), (33, -0.011), (34, 0.058), (35, -0.002), (36, -0.135), (37, 0.079), (38, -0.057), (39, -0.059), (40, -0.041), (41, 0.07), (42, -0.014), (43, 0.115), (44, 0.035), (45, 0.009), (46, -0.003), (47, -0.094), (48, 0.026), (49, -0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.94257528 11 nips-2009-A General Projection Property for Distribution Families
Author: Yao-liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári
Abstract: Surjectivity of linear projections between distribution families with fixed mean and covariance (regardless of dimension) is re-derived by a new proof. We further extend this property to distribution families that respect additional constraints, such as symmetry, unimodality and log-concavity. By combining our results with classic univariate inequalities, we provide new worst-case analyses for natural risk criteria arising in classification, optimization, portfolio selection and Markov decision processes. 1
2 0.75652945 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
Author: Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
Abstract: Despite a large literature on upper bounds on complexity of convex optimization, relatively less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining a understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes. We also discuss implications of these results for the understanding the inherent complexity of large-scale learning and estimation problems. 1
3 0.67526895 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness
Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright
Abstract: We study minimax rates for estimating high-dimensional nonparametric regression models with sparse additive structure and smoothness constraints. More precisely, our goal is to estimate a function f* : R^p → R that has an additive decomposition of the form f*(X_1, . . . , X_p) = Σ_{j∈S} h*_j(X_j), where each component function h*_j lies in some class H of “smooth” functions, and S ⊂ {1, . . . , p} is an unknown subset with cardinality s = |S|. Given n i.i.d. observations of f*(X) corrupted with additive white Gaussian noise, where the covariate vectors (X_1, X_2, X_3, ..., X_p) are drawn with i.i.d. components from some distribution P, we determine lower bounds on the minimax rate for estimating the regression function with respect to squared-L2(P) error. Our main result is a lower bound on the minimax rate that scales as max{ s log(p/s)/n , s ε_n²(H) }. The first term reflects the sample size required for performing subset selection, and is independent of the function class H. The second term s ε_n²(H) is an s-dimensional estimation term corresponding to the sample size required for estimating a sum of s univariate functions, each chosen from the function class H. It depends linearly on the sparsity index s but is independent of the global dimension p. As a special case, if H corresponds to functions that are m-times differentiable (an mth-order Sobolev space), then the s-dimensional estimation term takes the form s ε_n²(H) ≍ s n^{−2m/(2m+1)}. Either of the two terms may be dominant in different regimes, depending on the relation between the sparsity and smoothness of the additive decomposition.
4 0.60050237 178 nips-2009-On Stochastic and Worst-case Models for Investing
Author: Elad Hazan, Satyen Kale
Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1
5 0.58788824 69 nips-2009-Discrete MDL Predicts in Total Variation
Author: Marcus Hutter
Abstract: The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed. 1
6 0.56161982 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
7 0.54255521 154 nips-2009-Modeling the spacing effect in sequential category learning
8 0.53295279 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes
9 0.51865107 27 nips-2009-Adaptive Regularization of Weight Vectors
10 0.50576544 206 nips-2009-Riffled Independence for Ranked Data
11 0.50236666 220 nips-2009-Slow Learners are Fast
12 0.49261451 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation
13 0.46449631 129 nips-2009-Learning a Small Mixture of Trees
14 0.46086338 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
15 0.45222816 7 nips-2009-A Data-Driven Approach to Modeling Choice
16 0.45180696 180 nips-2009-On the Convergence of the Concave-Convex Procedure
17 0.44924229 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
18 0.43502554 16 nips-2009-A Smoothed Approximate Linear Program
19 0.43215299 48 nips-2009-Bootstrapping from Game Tree Search
20 0.42509228 75 nips-2009-Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models
topicId topicWeight
[(24, 0.077), (25, 0.051), (35, 0.041), (36, 0.072), (39, 0.025), (58, 0.072), (61, 0.03), (71, 0.495), (86, 0.056)]
simIndex simValue paperId paperTitle
1 0.97489178 53 nips-2009-Complexity of Decentralized Control: Special Cases
Author: Martin Allen, Shlomo Zilberstein
Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs) is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1
2 0.97406471 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall
Author: Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei
Abstract: We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability. 1
3 0.96757013 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models
Author: Laura Dietz, Valentin Dallmeier, Andreas Zeller, Tobias Scheffer
Abstract: We devise a graphical model that supports the process of debugging software by guiding developers to code that is likely to contain defects. The model is trained using execution traces of passing test runs; it reflects the distribution over transitional patterns of code positions. Given a failing test case, the model determines the least likely transitional pattern in the execution trace. The model is designed such that Bayesian inference has a closed-form solution. We evaluate the Bernoulli graph model on data of the software projects AspectJ and Rhino. 1
same-paper 4 0.94894868 11 nips-2009-A General Projection Property for Distribution Families
Author: Yao-liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári
Abstract: Surjectivity of linear projections between distribution families with fixed mean and covariance (regardless of dimension) is re-derived by a new proof. We further extend this property to distribution families that respect additional constraints, such as symmetry, unimodality and log-concavity. By combining our results with classic univariate inequalities, we provide new worst-case analyses for natural risk criteria arising in classification, optimization, portfolio selection and Markov decision processes. 1
5 0.86323327 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization
Author: Massih Amini, Nicolas Usunier, Cyril Goutte
Abstract: We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions which may complete the missing views in an approximate way. This situation corresponds for example to learning text classifiers from multilingual collections where documents are not available in all languages. In that case, Machine Translation (MT) systems may be used to translate each document in the missing languages. We derive a generalization error bound for classifiers learned on examples with multiple artificially created views. Our result uncovers a trade-off between the size of the training set, the number of views, and the quality of the view generating functions. As a consequence, we identify situations where it is more interesting to use multiple views for learning instead of classical single view learning. An extension of this framework is a natural way to leverage unlabeled multi-view data in semi-supervised learning. Experimental results on a subset of the Reuters RCV1/RCV2 collections support our findings by showing that additional views obtained from MT may significantly improve the classification performance in the cases identified by our trade-off. 1
6 0.85662067 56 nips-2009-Conditional Neural Fields
7 0.78121924 205 nips-2009-Rethinking LDA: Why Priors Matter
8 0.77621073 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
9 0.68726975 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains
10 0.68623966 204 nips-2009-Replicated Softmax: an Undirected Topic Model
11 0.67119485 206 nips-2009-Riffled Independence for Ranked Data
12 0.66648191 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
13 0.66379339 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution
14 0.64443302 226 nips-2009-Spatial Normalized Gamma Processes
15 0.64163715 96 nips-2009-Filtering Abstract Senses From Image Search Results
16 0.63997263 260 nips-2009-Zero-shot Learning with Semantic Output Codes
17 0.63541257 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
18 0.62984025 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference
19 0.62941575 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory
20 0.6205163 154 nips-2009-Modeling the spacing effect in sequential category learning