nips nips2011 nips2011-69 knowledge-graph by maker-knowledge-mining

69 nips-2011-Differentially Private M-Estimators


Source: pdf

Author: Jing Lei

Abstract: This paper studies privacy preserving M-estimators using perturbed histograms. The proposed approach allows the release of a wide class of M-estimators with both differential privacy and statistical utility without knowing a priori the particular inference procedure. The performance of the proposed method is demonstrated through a careful study of the convergence rates. A practical algorithm is given and applied on a real world data set containing both continuous and categorical variables. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper studies privacy preserving M-estimators using perturbed histograms. [sent-3, score-0.791]

2 The proposed approach allows the release of a wide class of M-estimators with both differential privacy and statistical utility without knowing a priori the particular inference procedure. [sent-4, score-0.685]

3 Among various notions of privacy, differential privacy [1, 2] provides a mathematically rigorous privacy guarantee and protects against essentially all kinds of identity attacks, regardless of the auxiliary information that may be available to the attackers. [sent-8, score-0.982]

4 Differential privacy requires that the presence or absence of any individual data record can never greatly change the outcome and hence the user can hardly learn much about any individual data record from the output. [sent-9, score-0.5]

5 However, designing differentially private statistical inference procedures has been a challenging problem. [sent-10, score-0.711]

6 Differential privacy protects individual data by introducing uncertainty in the outcome, which generally requires the output of any inference procedure to be random even for a fixed input data set. [sent-11, score-0.501]

7 This makes differentially private statistical analysis different from most traditional statistical inference procedures, which are deterministic once the data set is given. [sent-12, score-0.705]

8 Most existing works [3, 4, 5] focus on the interactive data release where a particular statistical inference problem is chosen a priori and the randomized output for that particular inference is released to the users. [sent-13, score-0.216]

9 In this work we study M-estimators under a differentially private framework. [sent-15, score-0.611]

10 The proposed method uses perturbed histograms to provide a systematic way of releasing a class of M-estimators in a non-interactive fashion. [sent-16, score-0.427]

11 Such a non-interactive method uses randomization independent of any particular inference procedure; therefore, it allows the users to apply different inference procedures on the same synthetic data set without additional privacy compromise. [sent-17, score-0.539]

12 The accuracy of these privacy-preserving estimates has also been studied, and we prove that, under mild conditions on the contrast functions of the M-estimators, the proposed differentially private M-estimators are consistent. [sent-18, score-1.002]

13 The convexity is used to ensure the existence and stability of the M-estimator whereas the bounded derivative controls the bias caused by the perturbed histogram. [sent-21, score-0.473]

14 This is further evidence of the natural connection between robustness and differential privacy [4]. [sent-23, score-0.555]

15 Related Work: The perturbed histogram was first described in the context of differential privacy in [1]. [sent-27, score-1.16]

16 The problem of non-interactive release has also been studied by [6], which aims at releasing the differentially private distribution function or the density function in a non-parametric setting. [sent-28, score-0.773]

17 Several aspects of parameter estimation problems have been studied with differential privacy under the interactive framework. [sent-31, score-0.552]

18 In particular, [4] shows that many robust estimators can be made differentially private and that general private estimators can be obtained from composition of robust location and scale estimators. [sent-32, score-1.043]

19 [5] shows that statistical estimators with generic asymptotic normality can be made differentially private with the same asymptotic variance. [sent-33, score-0.687]

20 Both works involve estimating the inter-quartile range in a differentially private manner, where the algorithm may output “No Response” [4], or the data is assumed to have known upper and lower bounds [5]. [sent-34, score-0.611]

21 In a slightly different context, [3] considers penalized logistic regression as a special case of empirical risk minimization, where the penalized logistic regression coefficients are estimated with differential privacy by minimizing a perturbed objective function. [sent-35, score-1.041]

22 The objective of data privacy is to release useful information from the data set while protecting information about any individual data entry. [sent-48, score-0.537]

23 A randomized function $T(D)$ gives $\alpha$-differential privacy if, for all pairs of databases $(D, D')$ with $H(D, D') = 1$ and all measurable subsets $E$ of the image of $T$: $\log \frac{P(T(D) \in E)}{P(T(D') \in E)} \le \alpha$. [sent-50, score-0.456]

24 Suppose $[0, 1]^d$ is partitioned into cubic cells with equal bandwidth $h_n$ such that $k_n = h_n^{-1}$ is an integer. [sent-57, score-0.483]

25 Denote each cell as $B_r = \prod_{j=1}^{d} [(r_j - 1)h_n,\, r_j h_n)$, for all $r = (r_1, \ldots, r_d) \in \{1, \ldots, k_n\}^d$. [sent-58, score-0.325]

26 (Footnote 1: To make sure that the $B_r$'s do form a partition of $[0, 1]^d$, the interval should be $[(k_n - 1)h_n, 1]$ when $r_j = k_n$.) [sent-65, score-0.437]

27 The histogram density estimator is then $\hat f_{hist}(x) = h_n^{-d} \sum_r \frac{n_r}{n}\, 1(x \in B_r)$, (2) where $n_r := \sum_{i=1}^{n} 1(X_i \in B_r)$ is the number of data points in $B_r$. [sent-66, score-0.622]

28 Clearly the density estimator described above depends on the data only through the histogram counts $(n_r,\ r \in \{1, \ldots, k_n\}^d)$.

29 If we can find a differentially private version of $(n_r,\ r \in \{1, \ldots, k_n\}^d)$,

30 then the corresponding density estimator $\hat f$ will also be differentially private by a simple change-of-measure argument. [sent-74, score-0.802]

31 We consider the following perturbed histogram as described in [1]: $\hat n_r = n_r + z_r$, for all $r \in \{1, \ldots, k_n\}^d$, (3)

32 where the $z_r$'s are independent with density $\frac{\alpha}{4} \exp(-\alpha |z| / 2)$. [sent-78, score-0.298]

33 Substituting $n_r$ by $\hat n_r$ in (2), we obtain a differentially private version of $\hat f_{hist}$: $\hat f_{PH}(x) = h_n^{-d} \sum_r \frac{\hat n_r}{n}\, 1(x \in B_r)$. (4) [sent-88, score-1.383]

34 To avoid these undesirable properties, [6] uses $\tilde n_r = (\hat n_r \vee 0)$ instead of $\hat n_r$, and $\tilde n = \sum_r \tilde n_r$ instead of $n$, so that the resulting density estimator is non-negative and integrates to 1. [sent-90, score-0.83]
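
As a concrete illustration of (3), the following sketch (our own hypothetical Python helper, not code from the paper) assumes the data have already been rescaled to $[0,1]^d$: it bins the data into $k_n^d$ cubic cells, adds independent Laplace noise with scale $2/\alpha$ (density $(\alpha/4)\exp(-\alpha|z|/2)$), and also returns the clipped counts used in [6].

import numpy as np

def perturbed_histogram(X, k_n, alpha, rng=None):
    # Sketch of the perturbed histogram in (3); hypothetical helper, not from the paper.
    # X: (n, d) array with entries in [0, 1]; k_n: cells per dimension; alpha: privacy budget.
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X)
    n, d = X.shape
    idx = np.minimum((X * k_n).astype(int), k_n - 1)   # cell index; points at 1.0 go to the last cell
    counts = np.zeros((k_n,) * d)
    np.add.at(counts, tuple(idx.T), 1)                 # n_r: raw cell counts
    noisy = counts + rng.laplace(scale=2.0 / alpha, size=counts.shape)   # n_r + z_r
    clipped = np.maximum(noisy, 0.0)                   # nonnegative version, as in [6]
    return counts, noisy, clipped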

35 3 Differentially private M-estimators: Combining equations (4) and (5) gives a differentially private objective function: $M_{n,PH}(\theta) = \int_{[0,1]^d} \hat f_{PH}(x)\, m(x, \theta)\, dx$. (6) [sent-97, score-0.967]

36 We wish to use the minimizer of $M_{n,PH}$ as a differentially private estimate of $\theta^*$. [sent-98, score-0.666]

37 Under conditions (A1)-(A3), if $h_n \asymp (\sqrt{\log n}/n)^{2/(d+2)}$, then there exists a local minimizer $\hat\theta^*_{PH}$ of $M_{n,PH}$ such that $|\hat\theta^*_{PH} - \theta^*| = O_P\!\left(n^{-1/2} \vee (\sqrt{\log n}/n)^{2/(d+2)}\right)$. [sent-109, score-0.273]

38 The approximation error of $M_{n,PH}(\theta)$ can be decomposed into three parts: $\int (\hat f_{PH}(x) - f(x))\, g(x, \theta)\, dx = n^{-1} \sum_r z_r h_n^{-d} \int_{B_r} g(x, \theta)\, dx + \left( n^{-1} \sum_r n_r h_n^{-d} \int_{B_r} g(x, \theta)\, dx - n^{-1} \sum_i g(X_i, \theta) \right) + \left( n^{-1} \sum_i g(X_i, \theta) - E\, g(X, \theta) \right)$. [sent-112, score-0.395]

39 As in the general theory of histogram estimators, the approximation error depends on the choice of bandwidth hn . [sent-114, score-0.619]

40 Generally speaking, if the bandwidth is small, then the histogram bias term will be small. [sent-115, score-0.425]

41 As a result, there is a trade-off between the histogram bias and the Laplacian noise in the choice of bandwidth. [sent-117, score-0.412]

42 We also comment on practical choices of hn in Section 4. [sent-119, score-0.237]

43 Second, using the Lipschitz property of g, the histogram bias term in (8) is of order $O(h_n)$. [sent-122, score-0.342]

44 Therefore it suffices to show that $\sup_{\theta \in \Theta_0} \left| n^{-1} \sum_r z_r h_n^{-d} \int_{B_r} m(x, \theta)\, dx \right| = O_P\!\left( (\sqrt{\log n}/n)\, h_n^{-d/2} \right)$, which can be established using a concentration inequality due to Talagrand [11] (see also [12, Equation 1. …]). [sent-123, score-0.215]

45 Algorithm based on perturbed histogram: In practice, exact integration of $\hat f_{PH}(x)\, m(x, \theta)$ over each cell $B_r$ may be computationally expensive, and approximations must be adopted to make the implementation feasible. [sent-126, score-0.681]

46 Formally, we introduce the following algorithm: Algorithm 1 (M-estimator using perturbed histogram) Input: D = {X1 , · · · , Xn }, m(·, ·), α, hn . [sent-129, score-0.573]

47 Construct perturbed histogram with bandwidth hn and privacy parameter α as in (3). [sent-131, score-1.375]

48 Let $M_{n,PH}(\theta) = n^{-1} \sum_r \hat n_r\, m(a_r, \theta)$, where $a_r \in [0,1]^d$ is the center of $B_r$, with $a_r(j) = (r_j - 0.5)\, h_n$. [sent-133, score-0.429]

49 Compared to $\hat\theta^*_{n,PH}$ obtained by minimizing the exact integral, the only term in (8) impacted by using $g(a_r, \theta)$ instead of $h_n^{-d} \int_{B_r} g(x, \theta)\, dx$ is the histogram bias term. [sent-137, score-0.342]

50 Under Assumptions (A1)-(A3), if $M_{n,PH}(\theta)$ is given by Algorithm 1 with $h_n \asymp (\sqrt{\log n}/n)^{2/(2+d)}$, then there exists a local minimizer $\hat\theta_{PH}$ of $M_{n,PH}(\theta)$ such that $|\hat\theta_{PH} - \theta^*| = O_P\!\left( 1/\sqrt{n} \vee (\sqrt{\log n}/n)^{2/(2+d)} \right)$. [sent-140, score-0.309]
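
A minimal sketch of Algorithm 1 (illustrative code only, reusing the hypothetical perturbed_histogram helper above): it treats the cell centers $a_r$ as weighted pseudo-data and minimizes $M_{n,PH}(\theta) = n^{-1}\sum_r \hat n_r\, m(a_r,\theta)$ numerically; the example contrast function and the choice of optimizer are ours, not prescribed by the paper.

import numpy as np
from scipy.optimize import minimize

def m_estimator_ph(noisy_counts, k_n, m, theta0, n):
    # Sketch of Algorithm 1: minimize (1/n) * sum_r n_hat_r * m(a_r, theta) over theta,
    # where a_r is the center of cell B_r, i.e. a_r(j) = (r_j - 0.5) * h_n with h_n = 1/k_n.
    d = noisy_counts.ndim
    axes = [(np.arange(k_n) + 0.5) / k_n] * d
    grids = np.meshgrid(*axes, indexing="ij")
    centers = np.stack([g.ravel() for g in grids], axis=1)   # (k_n^d, d) cell centers a_r
    weights = noisy_counts.ravel()                           # perturbed counts n_hat_r

    def objective(theta):
        return np.sum(weights * m(centers, theta)) / n

    return minimize(objective, theta0, method="Nelder-Mead").x

# Example use (private mean of the first coordinate, with m(x, theta) = (x[:, 0] - theta)^2):
# _, noisy, _ = perturbed_histogram(X, k_n, alpha)
# theta_hat = m_estimator_ph(noisy, k_n, lambda x, t: (x[:, 0] - t) ** 2, np.array([0.5]), len(X))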

51 To be specific, suppose [0, 1] is partitioned into equal-sized cells (Br , 1 ≤ r ≤ kn ) as in the ordinary univariate histogram. [sent-146, score-0.189]

52 The joint histogram for (X, Y ) is constructed by counting the number of data points in each of the product cells Br,j := Br × {j} for j = 0, 1. [sent-147, score-0.366]

53 Note that Theorems 3 and 4 do not guarantee the uniqueness or even existence of a global minimizer for the perturbed objective function Mn,P H (θ). [sent-150, score-0.417]

54 This is because, with small probability, some perturbed histogram count $\hat n_r$ can be negative, and hence the corresponding objective function $M_{n,PH}$ may not be convex. [sent-151, score-0.906]

55 Algorithm 1 (Perturbed histogram with nonnegative counts) Input: D = {X1 , · · · , Xn }, m(·, ·), α, hn . [sent-154, score-0.536]

56 1 Construct perturbed histogram with bandwidth hn and privacy parameter α as in (3). [sent-155, score-1.375]

57 2. Let $\tilde M_{n,PH}(\theta) = n^{-1} \sum_r \tilde n_r\, m(a_r, \theta)$, where $\tilde n_r = \max(\hat n_r, 0)$. [sent-156, score-0.49]

58 Under Assumptions (A1)-(A3) and $h_n \asymp (\log n / n)^{1/(1+d)}$, the estimator given by Algorithm 1 satisfies $|\tilde\theta_{PH} - \theta^*| = O_P\!\left( (\log n / n)^{1/(1+d)} \right)$. [sent-160, score-0.28]

59 The proof follows essentially from that of Theorem 3, with a different choice of bandwidth hn . [sent-162, score-0.32]

60 The concentration inequality result no longer holds for $\sum_r \tilde z_r\, g(a_r, \theta)$, where $\tilde z_r = \max(z_r, -n_r)$, because the $\tilde z_r$'s are not independent. [sent-163, score-0.479]

61 The histogram bias is still $O(h_n)$, as we mentioned in the discussion of Algorithm 1. [sent-166, score-0.342]

62 Therefore the convergence rate is optimized by choosing $h_n \asymp (\log n / n)^{1/(1+d)}$. [sent-167, score-0.26]
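
As a sketch of the truncated-count variant (reusing the hypothetical helpers above), only the weights and the bandwidth change; the bandwidth rule below simply instantiates the rate-optimal choice stated above.

import numpy as np

def k_n_for_truncated(n, d):
    # Bandwidth h_n ~ (log n / n)^(1/(1+d)) for the nonnegative-count variant;
    # return the corresponding number of cells per dimension, k_n = 1/h_n (rounded).
    h_n = (np.log(n) / n) ** (1.0 / (1 + d))
    return max(1, int(round(1.0 / h_n)))

# k_n = k_n_for_truncated(n, d)
# _, _, clipped = perturbed_histogram(X, k_n, alpha)
# theta_tilde = m_estimator_ph(clipped, k_n, m, theta0, n)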

63 The robustness of sample quantiles also makes them good candidates for differentially private data release. [sent-174, score-0.761]

64 Differentially private quantile estimators are indeed major building blocks for some existing privacy preserving statistical estimators [4, 5]. [sent-175, score-1.01]

65 Our result in this subsection shows that perturbed histograms can give simple, consistent, and differentially private quantile estimators. [sent-176, score-1.133]

66 Under conditions (B1)-(B3) and $h_n \asymp (\sqrt{\log n}/n)^{2/(2+d)}$, any minimizer $\hat\theta_{PH}$ of $M_{n,PH}$ given by Algorithm 1 satisfies (9). [sent-181, score-0.328]

67 For quantiles the contrast function is piecewise linear, so for most cells in the histogram there would be no approximation error if the data points are approximated by the cell center. [sent-193, score-0.585]

68 For quantiles, we have d = 1 and the quantile estimators described above can be extended to any continuous random variable whose density function is supported on (−∞, ∞). [sent-197, score-0.268]

69 Applying the perturbed histogram quantile estimator on $\{X_i,\ i = 1, \ldots, n\}$,

70 we obtain $\hat q_{X,PH}(\tau)$, the differentially private $\tau$-th quantile of $X$, which is $1/\sqrt{n}$-consistent by Corollary 6. [sent-208, score-0.281]
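
For the univariate case the estimator reduces to a weighted quantile of the cell centers; here is a sketch under the same assumptions as before (data rescaled to [0,1]; our own helper, not the paper's code).

import numpy as np

def ph_quantile(X, tau, alpha, k_n, rng=None):
    # Sketch of a differentially private tau-quantile from a perturbed univariate histogram:
    # clip the noisy counts at 0, form the normalized cumulative sum, and invert it at tau.
    rng = np.random.default_rng() if rng is None else rng
    counts, _ = np.histogram(X, bins=k_n, range=(0.0, 1.0))
    noisy = np.maximum(counts + rng.laplace(scale=2.0 / alpha, size=k_n), 0.0)
    centers = (np.arange(k_n) + 0.5) / k_n
    cdf = np.cumsum(noisy) / np.sum(noisy)
    idx = min(int(np.searchsorted(cdf, tau)), k_n - 1)
    return centers[idx]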

71 Once the histogram is constructed, following Algorithm 1, we can view it as a set of $h_n^{-d} = O(n^{2d/(2+d)})$ weighted data points $a_r$, $r \in \{1, \ldots, h_n^{-1}\}^d$.

72 Let $\{B_r,\ r \in \{1, \ldots, h_n^{-1}\}^{d_1}\}$ be the corresponding set of histogram cells in $[0, 1]^{d_1}$. [sent-244, score-0.366]

73 Then the joint histogram for $(X, Y)$ is constructed with cells $B_{r,y} := B_r \times \{y\}$, $r \in \{1, \ldots, h_n^{-1}\}^{d_1}$, $y \in \prod_{j=1}^{d_2} \{1, \ldots, K_j\}$.

74 Because only the continuous variables have histogram approximation error, the theoretical results developed in Section 3 are applicable with sample size $n$ and dimensionality $d_1$. [sent-252, score-0.342]

75 To alleviate this problem, we threshold the histogram with an enhanced cut-off value: $\tilde n_r = \hat n_r\, 1(\hat n_r \ge A \log n / \alpha)$, where $A > 0$ is a tuning parameter. [sent-256, score-0.871]
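
A short sketch of this thresholding step (hypothetical helper; A is the tuning parameter above).

import numpy as np

def threshold_counts(noisy_counts, n, alpha, A=1.0):
    # Enhanced cut-off: keep a perturbed count only if it is at least A*log(n)/alpha,
    # otherwise set that cell's count to zero. Illustrative sketch only.
    cutoff = A * np.log(n) / alpha
    return np.where(noisy_counts >= cutoff, noisy_counts, 0.0)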

76 Application to housing price data: As an illustration, we apply our method to a housing price data set consisting of 348,189 houses sold in the San Francisco Bay Area between 2003 and 2006. [sent-260, score-0.316]

77 The inference problem of interest is to study the relationship between housing price and other variables [14]. [sent-262, score-0.202]

78 In our case, we want to build a simple linear regression model to predict the housing price using the other variables while protecting each individual transaction record with differential privacy. [sent-263, score-0.398]

79 For each (year, county) combination, a perturbed histogram is constructed over the two continuous variables with privacy parameter α and K levels in each continuous dimension. [sent-268, score-1.141]

80 Then there are 4 × 6 × K^2 cells, each having a perturbed histogram count. [sent-269, score-0.336]

81 As described above, the perturbed data can be viewed as a data set with 24K^2 data points weighted by the perturbed histogram counts. [sent-271, score-0.971]

82 Differentially private regression coefficients are obtained by applying weighted least squares regression to this data set. [sent-272, score-0.726]
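
A sketch of this regression step, assuming the cell centers and dummy-coded (year, county) levels have already been arranged into a feature matrix and a response vector (that construction, and the function name, are ours for illustration): weighted least squares with the clipped perturbed counts as weights.

import numpy as np

def private_wls(features, response, weights):
    # Weighted least squares on perturbed-histogram pseudo-data:
    # features: (num_cells, p) cell centers / dummy-coded levels,
    # response: (num_cells,) response value attached to each cell center,
    # weights:  clipped perturbed counts for each cell.
    X = np.column_stack([np.ones(len(features)), features])   # add intercept
    sw = np.sqrt(np.maximum(weights, 0.0))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], response * sw, rcond=None)
    return beta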

83 To assess the performance, the privacy preserving regression coefficients are compared with those given by the non-private ordinary least squares (OLS) estimates. [sent-273, score-0.555]

84 Recall that a smaller value of α indicates a stronger privacy guarantee. [sent-278, score-0.42]

85 For α = 1 the coefficients given by the perturbed histogram are close to those given by OLS with most relative deviances below 5%. [sent-281, score-0.635]

86 For α = 0.1, the perturbed histogram still gives reasonably close estimates, with average deviance below 10% for all parameters. (Table 1: Linear regression coefficients using the Bay Area housing data.) [sent-285, score-0.804]

87 We compare estimates given by (1) the perturbed histogram (PH, Algorithm 1) and (2) the perturbed histogram with enhanced thresholding (THLD) as described in Subsection 4. [sent-287, score-1.364]

88 The histograms use K = 10 segments in each continuous dimension. [sent-290, score-0.342]

89 This variable has the smallest OLS coefficient among all county dummy variables, so weight fluctuation in the histogram causes a relatively larger impact on the relative deviance. [sent-325, score-0.446]

90 Even so, the perturbed histogram still gives at least qualitatively correct estimates. [sent-326, score-0.635]

91 We also observe that the thresholded histogram gives more accurate estimates for all coefficients except for County4 when α = 0.1. [sent-327, score-0.299]

92 Our theory suggests K = O(n2/(2+d) ) where d is the dimensionality of the histogram and hence equals the number of continuous variables. [sent-330, score-0.342]

93 5 Further Discussions: We demonstrate how histograms can be used as a basic tool for statistical parameter estimation under strong privacy constraints. [sent-337, score-0.48]

94 The perturbed histogram adds to each histogram count a double-exponential noise with constant parameter depending only on the privacy budget α. [sent-338, score-0.354]

95 The histogram approximation bias and the additive noise on the cell counts result in a bias-variance trade-off as usually seen for histogram-based methods. [sent-339, score-0.476]

96 One possibility is to perturb the minimum sufficient statistics because the dimensionality of minimum sufficient statistics is usually much smaller than the number of histogram cells. [sent-342, score-0.323]

97 The perturbed histogram is also related to “errors-in-variables” inference problems. [sent-346, score-0.679]

98 Suppose the original data is just the histogram; then the perturbed version can be thought of as the true histogram counts contaminated by some measurement errors. [sent-347, score-0.696]

99 However, plugging in the perturbed values does not necessarily give the best inference procedure, and better alternatives may be possible; see [15] for a hypothesis testing example in contingency tables. [sent-349, score-0.38]

100 A positive answer to this question would help establish a lower bound on the approximation error and better understand the power and limits of perturbed histograms. [sent-351, score-0.336]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('privacy', 0.42), ('perturbed', 0.336), ('private', 0.33), ('histogram', 0.299), ('differentially', 0.281), ('nr', 0.245), ('hn', 0.237), ('br', 0.176), ('zr', 0.15), ('quantiles', 0.12), ('differential', 0.105), ('op', 0.101), ('quantile', 0.098), ('housing', 0.098), ('county', 0.098), ('kn', 0.096), ('ar', 0.092), ('ols', 0.084), ('bandwidth', 0.083), ('fp', 0.073), ('noises', 0.07), ('cells', 0.067), ('dx', 0.06), ('price', 0.06), ('releasing', 0.056), ('thld', 0.056), ('minimizer', 0.055), ('release', 0.054), ('subsection', 0.053), ('categorical', 0.053), ('density', 0.052), ('estimators', 0.051), ('thresholding', 0.048), ('enhanced', 0.046), ('cell', 0.046), ('ph', 0.045), ('bay', 0.045), ('inference', 0.044), ('estimator', 0.043), ('lipschitz', 0.043), ('year', 0.043), ('continuous', 0.043), ('bias', 0.043), ('rj', 0.042), ('regression', 0.041), ('laplacian', 0.041), ('mn', 0.039), ('counts', 0.038), ('utility', 0.037), ('dwork', 0.037), ('fhist', 0.037), ('protecting', 0.037), ('protects', 0.037), ('supr', 0.037), ('talagrand', 0.037), ('log', 0.036), ('logistic', 0.036), ('kj', 0.035), ('histograms', 0.035), ('convexity', 0.035), ('preserving', 0.035), ('coef', 0.034), ('cients', 0.033), ('square', 0.033), ('lei', 0.033), ('qz', 0.033), ('caused', 0.032), ('procedures', 0.031), ('xi', 0.031), ('fz', 0.03), ('deviance', 0.03), ('robustness', 0.03), ('perturbation', 0.029), ('record', 0.029), ('concentration', 0.029), ('transaction', 0.028), ('derivative', 0.027), ('ces', 0.027), ('piecewise', 0.027), ('ordinal', 0.027), ('jing', 0.027), ('interactive', 0.027), ('additive', 0.026), ('contrast', 0.026), ('objective', 0.026), ('ordinary', 0.026), ('dummy', 0.025), ('statistical', 0.025), ('mle', 0.024), ('usually', 0.024), ('variable', 0.024), ('condition', 0.024), ('discretize', 0.024), ('convergence', 0.023), ('theorem', 0.023), ('house', 0.023), ('measurement', 0.023), ('outcome', 0.022), ('released', 0.022), ('arg', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 69 nips-2011-Differentially Private M-Estimators

Author: Jing Lei

Abstract: This paper studies privacy preserving M-estimators using perturbed histograms. The proposed approach allows the release of a wide class of M-estimators with both differential privacy and statistical utility without knowing a priori the particular inference procedure. The performance of the proposed method is demonstrated through a careful study of the convergence rates. A practical algorithm is given and applied on a real world data set containing both continuous and categorical variables. 1

2 0.074320845 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation

Author: Daniel J. Lizotte

Abstract: Fitted value iteration (FVI) with ordinary least squares regression is known to diverge. We present a new method, “Expansion-Constrained Ordinary Least Squares” (ECOLS), that produces a linear approximation but also guarantees convergence when used with FVI. To ensure convergence, we constrain the least squares regression operator to be a non-expansion in the ∞-norm. We show that the space of function approximators that satisfy this constraint is more rich than the space of “averagers,” we prove a minimax property of the ECOLS residual error, and we give an efficient algorithm for computing the coefficients of ECOLS based on constraint generation. We illustrate the algorithmic convergence of FVI with ECOLS in a suite of experiments, and discuss its properties. 1

3 0.071910143 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

Author: Po-ling Loh, Martin J. Wainwright

Abstract: Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings. 1

4 0.066349171 198 nips-2011-On U-processes and clustering performance

Author: Stéphan J. Clémençcon

Abstract: Many clustering techniques aim at optimizing empirical criteria that are of the form of a U -statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U -processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the √ excess of clustering risk is proved to be of the order OP (1/ n). Based on recent results related to the tail behavior of degenerate U -processes, it is also shown how to establish tighter rate bounds. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered. 1

5 0.057569683 186 nips-2011-Noise Thresholds for Spectral Clustering

Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh

Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1

6 0.057141099 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

7 0.05457107 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes

8 0.052139048 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

9 0.05129762 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

10 0.049308255 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure

11 0.048065882 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

12 0.047876686 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

13 0.046408694 123 nips-2011-How biased are maximum entropy models?

14 0.044427995 305 nips-2011-k-NN Regression Adapts to Local Intrinsic Dimension

15 0.043786999 265 nips-2011-Sparse recovery by thresholded non-negative least squares

16 0.041935321 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

17 0.041835241 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels

18 0.041252967 165 nips-2011-Matrix Completion for Multi-label Image Classification

19 0.040960606 26 nips-2011-Additive Gaussian Processes

20 0.04035842 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.133), (1, -0.008), (2, -0.002), (3, -0.054), (4, -0.038), (5, -0.009), (6, -0.007), (7, -0.041), (8, 0.011), (9, 0.042), (10, -0.012), (11, -0.007), (12, 0.004), (13, -0.02), (14, 0.026), (15, 0.023), (16, 0.046), (17, -0.027), (18, -0.019), (19, 0.06), (20, -0.014), (21, 0.039), (22, -0.01), (23, -0.003), (24, 0.008), (25, 0.065), (26, -0.066), (27, -0.007), (28, 0.053), (29, 0.033), (30, 0.1), (31, 0.043), (32, 0.031), (33, 0.017), (34, -0.029), (35, -0.085), (36, 0.103), (37, 0.077), (38, 0.003), (39, -0.002), (40, -0.177), (41, -0.041), (42, -0.022), (43, -0.108), (44, 0.037), (45, -0.006), (46, 0.055), (47, -0.054), (48, -0.019), (49, 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90968084 69 nips-2011-Differentially Private M-Estimators

Author: Jing Lei

Abstract: This paper studies privacy preserving M-estimators using perturbed histograms. The proposed approach allows the release of a wide class of M-estimators with both differential privacy and statistical utility without knowing a priori the particular inference procedure. The performance of the proposed method is demonstrated through a careful study of the convergence rates. A practical algorithm is given and applied on a real world data set containing both continuous and categorical variables. 1

2 0.61802948 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison

Author: Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, Masashi Sugiyama

Abstract: Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test. However, since density-ratio functions often possess high fluctuation, divergence estimation is still a challenging task in practice. In this paper, we propose to use relative divergences for distribution comparison, which involves approximation of relative density-ratios. Since relative density-ratios are always smoother than corresponding ordinary density-ratios, our proposed method is favorable in terms of the non-parametric convergence speed. Furthermore, we show that the proposed divergence estimator has asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach. 1

3 0.5787133 306 nips-2011-t-divergence Based Approximate Inference

Author: Nan Ding, Yuan Qi, S.v.n. Vishwanathan

Abstract: Approximate inference is an important technique for dealing with large, intractable graphical models based on the exponential family of distributions. We extend the idea of approximate inference to the t-exponential family by defining a new t-divergence. This divergence measure is obtained via convex duality between the log-partition function of the t-exponential family and a new t-entropy. We illustrate our approach on the Bayes Point Machine with a Student’s t-prior. 1

4 0.56812245 295 nips-2011-Unifying Non-Maximum Likelihood Learning Objectives with Minimum KL Contraction

Author: Siwei Lyu

Abstract: When used to learn high dimensional parametric probabilistic models, the classical maximum likelihood (ML) learning often suffers from computational intractability, which motivates the active developments of non-ML learning methods. Yet, because of their divergent motivations and forms, the objective functions of many non-ML learning methods are seemingly unrelated, and there lacks a unified framework to understand them. In this work, based on an information geometric view of parametric learning, we introduce a general non-ML learning principle termed as minimum KL contraction, where we seek optimal parameters that minimizes the contraction of the KL divergence between the two distributions after they are transformed with a KL contraction operator. We then show that the objective functions of several important or recently developed non-ML learning methods, including contrastive divergence [12], noise-contrastive estimation [11], partial likelihood [7], non-local contrastive objectives [31], score matching [14], pseudo-likelihood [3], maximum conditional likelihood [17], maximum mutual information [2], maximum marginal likelihood [9], and conditional and marginal composite likelihood [24], can be unified under the minimum KL contraction framework with different choices of the KL contraction operators. 1

5 0.56668276 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression

Author: Sham M. Kakade, Varun Kanade, Ohad Shamir, Adam Kalai

Abstract: Generalized Linear Models (GLMs) and Single Index Models (SIMs) provide powerful generalizations of linear regression, where the target variable is assumed to be a (possibly unknown) 1-dimensional function of a linear predictor. In general, these problems entail non-convex estimation procedures, and, in practice, iterative local search heuristics are often used. Kalai and Sastry (2009) provided the first provably efficient method, the Isotron algorithm, for learning SIMs and GLMs, under the assumption that the data is in fact generated under a GLM and under certain monotonicity and Lipschitz (bounded slope) constraints. The Isotron algorithm interleaves steps of perceptron-like updates with isotonic regression (fitting a one-dimensional non-decreasing function). However, to obtain provable performance, the method requires a fresh sample every iteration. In this paper, we provide algorithms for learning GLMs and SIMs, which are both computationally and statistically efficient. We modify the isotonic regression step in Isotron to fit a Lipschitz monotonic function, and also provide an efficient O(n log(n)) algorithm for this step, improving upon the previous O(n2 ) algorithm. We provide a brief empirical study, demonstrating the feasibility of our algorithms in practice. 1

6 0.55661589 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

7 0.55627435 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation

8 0.51873481 123 nips-2011-How biased are maximum entropy models?

9 0.48692647 92 nips-2011-Expressive Power and Approximation Errors of Restricted Boltzmann Machines

10 0.47720271 197 nips-2011-On Tracking The Partition Function

11 0.47646448 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers

12 0.47082028 162 nips-2011-Lower Bounds for Passive and Active Learning

13 0.46646917 55 nips-2011-Collective Graphical Models

14 0.44716212 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

15 0.44242048 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression

16 0.43572882 240 nips-2011-Robust Multi-Class Gaussian Process Classification

17 0.43055338 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection

18 0.42986768 239 nips-2011-Robust Lasso with missing and grossly corrupted observations

19 0.42782369 33 nips-2011-An Exact Algorithm for F-Measure Maximization

20 0.42489868 198 nips-2011-On U-processes and clustering performance


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (4, 0.028), (9, 0.273), (20, 0.046), (26, 0.044), (31, 0.065), (33, 0.014), (43, 0.11), (45, 0.11), (48, 0.014), (57, 0.032), (74, 0.049), (83, 0.042), (84, 0.01), (99, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.81601459 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

Author: Hai-son P. Le, Ziv Bar-joseph

Abstract: Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions. 1

same-paper 2 0.78428996 69 nips-2011-Differentially Private M-Estimators

Author: Jing Lei

Abstract: This paper studies privacy preserving M-estimators using perturbed histograms. The proposed approach allows the release of a wide class of M-estimators with both differential privacy and statistical utility without knowing a priori the particular inference procedure. The performance of the proposed method is demonstrated through a careful study of the convergence rates. A practical algorithm is given and applied on a real world data set containing both continuous and categorical variables. 1

3 0.7632935 181 nips-2011-Multiple Instance Learning on Structured Data

Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence

Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper explores the research problem as multiple instance learning on structured data (MILSD) and formulates a novel framework that considers additional structure information. In particular, an effective and efficient optimization algorithm has been proposed to solve the original non-convex optimization problem by using a combination of ConcaveConvex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has the nice convergence property, with specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods. 1

4 0.62101847 220 nips-2011-Prediction strategies without loss

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14 T and loss √ 2 2 T e− T in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O( n(log N + log T )), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property. 1

5 0.57454425 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

Author: Po-ling Loh, Martin J. Wainwright

Abstract: Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings. 1

6 0.5741421 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning

7 0.57329804 258 nips-2011-Sparse Bayesian Multi-Task Learning

8 0.57280725 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

9 0.57219726 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

10 0.57090163 239 nips-2011-Robust Lasso with missing and grossly corrupted observations

11 0.57029748 198 nips-2011-On U-processes and clustering performance

12 0.56880468 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

13 0.56867158 186 nips-2011-Noise Thresholds for Spectral Clustering

14 0.56717718 25 nips-2011-Adaptive Hedge

15 0.5668726 29 nips-2011-Algorithms and hardness results for parallel large margin learning

16 0.56621325 265 nips-2011-Sparse recovery by thresholded non-negative least squares

17 0.56536627 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation

18 0.56515741 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection

19 0.56492388 22 nips-2011-Active Ranking using Pairwise Comparisons

20 0.56457466 294 nips-2011-Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning