nips nips2012 nips2012-18 knowledge-graph by maker-knowledge-mining

18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release


Source: pdf

Author: Moritz Hardt, Katrina Ligett, Frank Mcsherry

Abstract: We present a new algorithm for differentially private data release, based on a simple combination of the Multiplicative Weights update rule with the Exponential Mechanism. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 com Abstract We present a new algorithm for differentially private data release, based on a simple combination of the Multiplicative Weights update rule with the Exponential Mechanism. [sent-5, score-0.378]

2 1 Introduction Sensitive statistical data on individuals are ubiquitous, and publishable analysis of such private data is an important objective. [sent-7, score-0.227]

3 When releasing statistics or synthetic data based on sensitive data sets, one must balance the inherent tradeoff between the usefulness of the released information and the privacy of the affected individuals. [sent-8, score-0.415]

4 Against this backdrop, differential privacy [1, 2, 3] has emerged as a compelling privacy definition that allows one to understand this tradeoff via formal, provable guarantees. [sent-9, score-0.799]

5 In recent years, the theoretical literature on differential privacy has provided a large repertoire of techniques for achieving the definition in a variety of settings (see, e. [sent-10, score-0.459]

6 However, data analysts have found that several algorithms for achieving differential privacy add unacceptable levels of noise. [sent-13, score-0.459]

7 In this work we develop a broadly applicable, simple, and easy-to-implement algorithm, capable of substantially improving the performance of linear queries on many realistic datasets. [sent-14, score-0.334]

8 Linear queries are equivalent to statistical queries (in the sense of [6]) and can serve as the basis of a wide range of data analysis and learning algorithms (see [7] for some examples). [sent-15, score-0.59]

9 We present experimental results for differentially private data release for a variety of problems studied in prior work: range queries as studied by [11, 12], contingency table release across a collection of statistical benchmarks as in [13], and datacube release as studied by [14]. [sent-18, score-1.333]

10 We empirically evaluate the accuracy of the differentially private data produced by MWEM using the same query class and accuracy metric proposed by each of the corresponding prior works, improving on all. [sent-19, score-0.623]

11 Beyond empirical improvements in these settings, MWEM matches the best known and nearly optimal theoretical accuracy guarantees for differentially private data analysis with linear queries. [sent-20, score-0.404]

12 Finally, we describe a scalable implementation of MWEM capable of processing datasets of substantial complexity. [sent-23, score-0.065]

13 Producing synthetic data for the classes of queries we consider is known to be computationally hard in the worst-case [15, 16]. [sent-24, score-0.312]

14 We repeatedly improve the accuracy of this approximation with respect to the private dataset and the desired query set by selecting and posing a query poorly served by our approximation and improving the approximation to better reflect the true answer to this query. [sent-29, score-0.641]

15 We select and pose queries using the Exponential [10] and Laplace Mechanisms [3], whose definitions and privacy properties we review in Subsection 2. [sent-30, score-0.621]

16 1 Differential Privacy and Mechanisms Differential privacy is a constraint on a randomized computation requiring that the computation not reveal specifics of individual records present in the input. [sent-35, score-0.441]

17 It places this constraint by requiring the mechanism to behave almost identically on any two datasets that are sufficiently close. [sent-36, score-0.209]

18 Imagine a dataset A whose records are drawn from some abstract domain D, and which is described as a function from D to the natural numbers N, with A(x) indicating the frequency (number of occurrences) of x in the dataset. [sent-37, score-0.16]

19 We use ‖A − B‖ to indicate the sum of the absolute values of the differences in frequencies (how many records would have to be added or removed to change A to B). [sent-38, score-0.101]

20 A mechanism M mapping datasets to distributions over an output space R provides (ε, δ)-differential privacy if for every S ⊆ R and for all data sets A, B where ‖A − B‖ ≤ 1, Pr[M(A) ∈ S] ≤ e^ε · Pr[M(B) ∈ S] + δ. [sent-41, score-0.549]

21 The Exponential Mechanism [10] is an ε-differentially private mechanism that can be used to select among the best of a discrete set of alternatives, where “best” is defined by a function relating each alternative to the underlying secret data. [sent-43, score-0.394]

22 Intuitively, the mechanism selects result r biased exponentially by its quality score. [sent-48, score-0.188]
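As a hedged illustration of the selection rule just described (not the authors' code; the function name and signature are hypothetical), sampling a candidate with probability exponentially biased by its quality score can be sketched in Python as:

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0, rng=random):
    """Pick one candidate with probability proportional to exp(epsilon * score / (2 * sensitivity))."""
    scores = [score(c) for c in candidates]
    max_s = max(scores)  # subtract the max for numerical stability; cancels in the ratio
    weights = [math.exp(epsilon * (s - max_s) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]  # guard against floating-point round-off
```

With a large epsilon the mechanism concentrates on the highest-scoring candidate; with epsilon near zero it approaches a uniform choice, which is how the privacy/accuracy trade-off shows up.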

23 A linear query (also referred to as a counting query or statistical query) is specified by a function q mapping data records to the interval [−1, +1]. [sent-50, score-0.479]

24 The answer of a linear query on a data set B, denoted q(B), is the sum Σ_{x∈D} q(x) × B(x). [sent-51, score-0.168]

25 The Laplace Mechanism is an ε-differentially private mechanism which reports approximate sums of bounded functions across a dataset. [sent-52, score-0.394]
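A minimal sketch of answering a linear query and of the Laplace Mechanism, assuming the dataset is stored as a dict of record frequencies (an illustration only, not the paper's implementation):

```python
import math
import random

def linear_query_answer(q, B):
    """q(B) = sum over x of q(x) * B(x), with B a dict mapping records to counts."""
    return sum(q(x) * count for x, count in B.items())

def laplace_mechanism(q, B, epsilon, rng=random):
    """Report q(B) plus Laplace noise with scale 1/epsilon: adding or removing one
    record changes q(B) by at most 1 when q maps records into [-1, +1]."""
    u = rng.random() - 0.5  # inverse-CDF sampling of Laplace(1/epsilon)
    noise = -(1.0 / epsilon) * math.copysign(math.log(1 - 2 * abs(u)), u)
    return linear_query_answer(q, B) + noise
```

The noise scale 1/epsilon here reflects the sensitivity-1 case; a query with larger sensitivity would scale the noise accordingly.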

26 Inputs: Data set B over a universe D; Set Q of linear queries; Number of iterations T ∈ N; Privacy parameter ε > 0; Number of records n. [sent-55, score-0.142]

27 Exponential Mechanism: Select a query qi ∈ Q using the Exponential Mechanism parameterized with epsilon value ε/2T and the score function si(B, q) = |q(Ai−1) − q(B)|. [sent-61, score-0.299]
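Putting the inputs and the selection step above together, a simplified MWEM loop over a small explicit domain might be sketched as follows; this is an illustration under the stated parameters, not the authors' implementation, and all helper names are hypothetical:

```python
import math
import random

def mwem(B, queries, T, epsilon, n, rng=None):
    """Sketch of MWEM. B maps domain elements to counts; queries map elements to [-1, +1]."""
    rng = rng or random.Random(0)
    D = list(B.keys())
    A = {x: n / len(D) for x in D}  # A_0: uniform weights scaled to n records

    def answer(q, dataset):
        return sum(q(x) * dataset[x] for x in D)

    def laplace(b):  # inverse-CDF Laplace(b) sample
        u = rng.random() - 0.5
        return -b * math.copysign(math.log(1 - 2 * abs(u)), u)

    for _ in range(T):
        # Exponential Mechanism with parameter epsilon/2T and score |q(A) - q(B)|
        scores = [abs(answer(q, A) - answer(q, B)) for q in queries]
        m = max(scores)
        w = [math.exp((epsilon / (2 * T)) * (s - m) / 2) for s in scores]
        r = rng.random() * sum(w)
        i, acc = 0, w[0]
        while acc < r and i < len(queries) - 1:
            i += 1
            acc += w[i]
        q = queries[i]
        # Laplace Mechanism: noisy measurement of the selected query
        measurement = answer(q, B) + laplace(2 * T / epsilon)
        # Multiplicative Weights update toward the measurement, then renormalize to n
        err = measurement - answer(q, A)
        for x in D:
            A[x] *= math.exp(q(x) * err / (2 * n))
        total = sum(A.values())
        for x in D:
            A[x] *= n / total
    return A
```

Each iteration spends ε/2T on selecting a poorly approximated query and ε/2T on measuring it, matching the budget split described in the paper's pseudocode.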

28 Note that these bounds are worst-case bounds, over adversarially chosen data and query sets. [sent-69, score-0.168]

29 The algorithm is embarrassingly parallel: query evaluation can be conducted independently, implemented using modern database technology; the only required serialization is that the T steps must proceed in sequence, but within each step essentially all work is parallelizable. [sent-74, score-0.168]

30 [17] show that for worst case data, producing differentially private synthetic data for a set of counting queries requires time |D|0. [sent-76, score-0.732]

31 Moreover, Ullman and Vadhan [16] showed that similar lower bounds also hold for more basic query classes such as we consider in Section 3. [sent-78, score-0.168]

32 Despite these hardness results, we provide an alternate implementation of our algorithm in Section 4 and demonstrate that its running time is acceptable on real-world data even in cases where |D| is as large as 2^77, and on simple synthetic input datasets where |D| is as large as 2^1000. [sent-80, score-0.123]

33 Second, in each iteration we apply the multiplicative weights update rule for all measurements taken, multiple times; as long as any measurements do not agree with the approximating distribution (within error) we can improve the result. [sent-85, score-0.195]

34 Finally, it is occasionally helpful to initialize A0 by performing a noisy count for each element of the domain; this consumes from the privacy budget and lessens the accuracy of subsequent queries, but is often a good trade-off. [sent-86, score-0.366]

35 (i.e., polynomial in |D|) as well as algorithms that work in the interactive query setting. [sent-93, score-0.211]

36 The latest of these results is the private Multiplicative Weights method of Hardt and Rothblum [8], which achieves error rates of O(√(n log |Q|)) for (ε, δ)-differential privacy (which is the same dependence achieved by applying k-fold adaptive composition [19] and optimizing T in our Theorem 2. [sent-94, score-0.59]

37 While their algorithm works in the interactive setting, it can also be used non-interactively to produce synthetic data, albeit at a computational overhead of O(n). [sent-96, score-0.074]

38 Prior work on linear queries includes Fienberg et al. [sent-99, score-0.31]

39 [22] on range queries (and substantial related work [23, 24, 22, 11, 12, 25] which Li and Miklau [11, 25] show can all be seen as instances of the matrix mechanism of [22]); and Ding et al. [sent-102, score-0.528]

40 3 Experimental Evaluation We evaluate MWEM across a variety of query classes, datasets, and metrics as explored by prior work, demonstrating improvement in the quality of approximation (often significant) in each case. [sent-106, score-0.189]

41 The problems we consider are: (1) range queries under the total squared error metric, (2) binary contingency table release under the relative entropy metric, and (3) datacube release under the average absolute error metric. [sent-107, score-0.918]

42 Although contingency table release and datacube release are very similar, prior work on the two have had different focuses: small datasets over many binary attributes vs. [sent-108, score-0.691]

43 large datasets over few categorical attributes, low-order marginals vs. [sent-109, score-0.093]

44 Our general conclusion is that intelligently selecting the queries to measure can result in significant accuracy improvements in settings where accuracy is a scarce resource. [sent-112, score-0.333]

45 When the privacy parameters are very lax, or the query set very simple, direct measurement of all queries yields better results than expending some fraction of the privacy budget determining what to measure. [sent-113, score-1.148]

46 On the other hand, in the more challenging case of restrictions on privacy for complex data and query sets, MWEM can substantially outperform previous algorithms. [sent-114, score-0.536]

47 1 Range Queries A range query over a domain D = {1, . . . , N} is a counting query specified by the indicator function of an interval I ⊆ D. [sent-116, score-0.228] [sent-119, score-0.210]

49 Over a d-dimensional domain D^d, a range query is defined by the product of indicator functions. [sent-123, score-0.196]

50 Differentially private algorithms for range queries were specifically considered by [18, 23, 24, 22, 11, 12, 25]. [sent-124, score-0.536]

51 As noted in [11, 25], all previously implemented algorithms for range queries can be seen as instances of the matrix mechanism of [22]. [sent-125, score-0.476]

52 Moreover, [11, 25] show a lower bound on the total squared error achieved by the matrix mechanism in terms of the singular values of a matrix associated with the set of queries. [sent-126, score-0.19]

53 The y-axis measures the average squared error per query, averaged over 5 independent repetitions of the experiment, as epsilon varies. [sent-168, score-0.125]

54 The improvement is most significant for small epsilon, diminishing as epsilon increases. [sent-169, score-0.102]

55 We chose these data sets as they feature numerical attributes of suitable size. [sent-171, score-0.145]

56 In Figure 2, we compare the performance of MWEM on sets of randomly chosen range queries against the SVD lower bound proved by [11, 25], varying ε while keeping the number of queries fixed. [sent-172, score-0.59]

57 The SVD lower bound holds for algorithms achieving the strictly weaker guarantee of (ε, δ)-differential privacy with δ > 0, permitting some probability of unbounded disclosure. [sent-173, score-0.34]

58 Figure 3: Relative entropy (y-axis) as a function of epsilon (x-axis) for the mildew, rochdale, and czech datasets, respectively. [sent-215, score-0.14]

59 The solid black horizontal line is the stated relative entropy values from Fienberg et al. [sent-219, score-0.088]

60 bound depends on δ; in our experiments we fixed δ = 1/n when instantiating the SVD bound, as any larger value of δ permits mechanisms capable of exact release of individual records. [sent-221, score-0.17]

61 [21] describe an approach to differentially private contingency table release using linear queries defined by the Hadamard matrix. [sent-225, score-0.954]

62 Importantly, all k-dimensional marginals can be exactly recovered by examination of relatively few such queries: roughly (d choose k) out of the possible 2^d, improving over direct measurement of the marginals by a factor of 2^k. [sent-226, score-0.146]

63 We use several statistical datasets from Fienberg et al. [sent-232, score-0.071]

64 We compare against a variant of [21] which combines its observations using multiplicative weights (we find that without this modification, [21] is terrible with respect to relative entropy). [sent-234, score-0.177]

65 These experiments are therefore largely assessing the selective choice of measurements to take, rather than the efficacy of multiplicative weights. [sent-235, score-0.152]

66 Our findings here are fairly uniform across the datasets: the ability to measure only those queries that are informative about the dataset results in substantial savings over taking all possible measurements. [sent-237, score-0.331]

67 For our initial settings, maintaining all three-way marginals, we see similar behavior as above: the ability to choose the measurements that are important allows substantially higher accuracy on those that matter. [sent-241, score-0.093]

68 The red (dashed) curve represents Barak et al, and the multiple blue (solid) curves represent MWEM, with 20, 30, and 40 queries (top to bottom, respectively). [sent-292, score-0.31]

69 As before, the x-axis is the value of epsilon guaranteed, and the y-axis is the relative entropy between the produced distribution and the actual dataset. [sent-294, score-0.161]

70 3 Data Cubes We now change our terminology and objectives, shifting our view of contingency tables to one of datacubes. [sent-298, score-0.176]

71 The two concepts are interchangeable, a contingency table corresponding to the datacube, and a marginal corresponding to a cuboid. [sent-299, score-0.153]

72 Although MWEM is defined with respect to a single query at a time, it generalizes to sets of counting queries, as reflected in a cuboid. [sent-302, score-0.21]

73 The Exponential Mechanism can select a cuboid to measure using a quality score function summing the absolute values of the errors within the cells of the cuboid. [sent-303, score-0.116]

74 We also (heuristically) subtract the number of cells from the score of a cuboid to bias the selection away from cuboids with many cells, which would collect Laplace error in each cell. [sent-304, score-0.147]

75 An entire cuboid can be measured with a single differentially private query, as any record contributes to at most one cell (this is a generalization of the Laplace Mechanism to multiple dimensions, from [3]). [sent-306, score-0.504]

76 Finally, Multiplicative Weights works unmodified, increasing and decreasing weights based on the over- or under-estimation of the count to which the record contributes. [sent-307, score-0.074]

77 Figure 5: Comparison of MWEM (T = 10) with the custom approaches from [14], varying epsilon through the reported values from [14]. [sent-314, score-0.102]

78 Each cuboid (marginal) is assessed by its average error, and either the average or maximum over all 256 marginals is taken to evaluate the technique. [sent-315, score-0.146]

79 As the number of attributes grows, the universe D grows exponentially, and it can quickly become infeasible to track the distribution explicitly. [sent-321, score-0.186]

80 Recall that the heart of MWEM maintains a distribution Ai over D that is then used in the Exponential Mechanism to select queries poorly approximated by the current distribution. [sent-323, score-0.304]

81 From the definition of the Multiplicative Weights distribution, we see that the weight Ai(x) can be determined from the history Hi = {(qj, mj) : j ≤ i}: Ai(x) ∝ exp( Σ_{j≤i} qj(x) × (mj − qj(Aj−1)) / 2n ). [sent-324, score-0.154]

82 ji We explicitly record the scaling factors lj = mj qj (Aj {(qj , mj , lj ) : j  i}, to remove the dependence on prior Aj . [sent-325, score-0.191]

83 If we partition these attributes into disjoint parts D1, D2, . . . , Dk so that no query in Hi involves attributes from more than one part, then the distribution produced by Multiplicative Weights is a product distribution over D1 × D2 × · · · × Dk. [sent-327, score-0.145] [sent-330, score-0.313]

85 For query classes that factorize over the attributes of the domain (for example, range queries, marginal queries, and cuboid queries) we can rewrite and efficiently perform the integration over D using Σ_{x∈D} q(x) × Ai(x) = Π_j ( Σ_{xj∈Dj} q(xj) × Ai^j(xj) ). [sent-334, score-0.468]

86 Here each Ai^j is a mini Multiplicative Weights distribution over the attributes in part Dj, using only the relevant queries. So long as the measurements taken reflect modest groups of independent attributes, the integration can be efficiently performed. [sent-339, score-0.465]
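Under this factorization, evaluating a query against the product distribution reduces to a product of per-part sums, avoiding enumeration of the full cross-product domain. A hedged sketch (function and variable names are hypothetical):

```python
def factored_query_answer(parts, q_factors):
    """Evaluate sum_x q(x) * A(x) for q(x) = prod_j q_j(x_j) and a product
    distribution A(x) = prod_j A_j(x_j): the sum over the cross-product domain
    factorizes into a product of per-part sums."""
    result = 1.0
    for A_j, q_j in zip(parts, q_factors):
        result *= sum(q_j(x) * w for x, w in A_j.items())
    return result
```

For k parts of size s each, this costs O(k·s) per query instead of O(s^k), which is what makes domains like 2^77 tractable when the measured queries respect the partition.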

87 Experimentally, we are able to process a binarized form of the Adult dataset with 27 attributes efficiently (taking 80 seconds to process completely), and the addition of 50 new independent binary attributes, corresponding to a domain of size 2^77, results in negligible performance impact. [sent-342, score-0.204]

88 5 Conclusions We introduced MWEM, a simple algorithm for releasing data maintaining a high fidelity to the protected source data, as well as differential privacy with respect to the records. [sent-344, score-0.503]

89 An analyst does not require a complicated mathematical understanding of the nature of the queries (as the community has for linear algebra [11] and the Hadamard transform [21]), but rather only needs to enumerate those measurements that should be preserved. [sent-348, score-0.32]

90 The promise of differential privacy: A tutorial on algorithmic techniques. [sent-370, score-0.119]

91 A multiplicative weights mechanism for interactive privacy-preserving data analysis. [sent-382, score-0.366]

92 An adaptive mechanism for accurate query answering under differential privacy. [sent-395, score-0.48]

93 Differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables. [sent-399, score-0.493]

94 Differentially private data cubes: optimizing noise sources and consistency. [sent-402, score-0.227]

95 On the complexity of differentially private data release: efficient algorithms and hardness results. [sent-407, score-0.428]

96 PCPs and the hardness of generating private synthetic data. [sent-411, score-0.308]

97 On the complexity of differentially private data release: efficient algorithms and hardness results. [sent-420, score-0.428]

98 The median mechanism: Interactive and efficient privacy with multiple queries. [sent-429, score-0.34]

99 Privacy, accuracy, and consistency too: a holistic solution to contingency table release. [sent-438, score-0.153]

100 Measuring the achievable error of query sets under differential privacy. [sent-455, score-0.31]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mwem', 0.584), ('privacy', 0.34), ('queries', 0.281), ('private', 0.227), ('query', 0.168), ('mechanism', 0.167), ('contingency', 0.153), ('differentially', 0.151), ('attributes', 0.145), ('release', 0.142), ('barak', 0.122), ('differential', 0.119), ('multiplicative', 0.113), ('dwork', 0.102), ('epsilon', 0.102), ('records', 0.101), ('cuboid', 0.095), ('cynthia', 0.088), ('laplace', 0.077), ('mcsherry', 0.077), ('svd', 0.072), ('datacube', 0.067), ('gerome', 0.067), ('hardt', 0.067), ('adult', 0.064), ('fienberg', 0.063), ('rothblum', 0.059), ('ai', 0.052), ('pods', 0.051), ('marginals', 0.051), ('katrina', 0.05), ('miklau', 0.05), ('salil', 0.05), ('tcc', 0.05), ('transfusion', 0.05), ('qj', 0.05), ('hardness', 0.05), ('ligett', 0.044), ('releasing', 0.044), ('interactive', 0.043), ('weights', 0.043), ('datasets', 0.042), ('counting', 0.042), ('stoc', 0.042), ('chao', 0.041), ('moritz', 0.041), ('universe', 0.041), ('measurements', 0.039), ('exponential', 0.039), ('aj', 0.038), ('entropy', 0.038), ('focs', 0.037), ('ka', 0.035), ('mj', 0.034), ('hay', 0.033), ('kobbi', 0.033), ('reingold', 0.033), ('guy', 0.032), ('hadamard', 0.032), ('domain', 0.032), ('synthetic', 0.031), ('aaron', 0.031), ('record', 0.031), ('hi', 0.031), ('frank', 0.029), ('avrim', 0.029), ('cubes', 0.029), ('cuboids', 0.029), ('monetary', 0.029), ('rastogi', 0.029), ('ullman', 0.029), ('et', 0.029), ('qi', 0.029), ('mechanisms', 0.028), ('substantially', 0.028), ('range', 0.028), ('footprint', 0.027), ('recency', 0.027), ('blum', 0.027), ('experimentally', 0.027), ('dataset', 0.027), ('roth', 0.027), ('accuracy', 0.026), ('answering', 0.026), ('capital', 0.026), ('naor', 0.026), ('improving', 0.025), ('tables', 0.023), ('error', 0.023), ('delity', 0.023), ('substantial', 0.023), ('maintains', 0.023), ('li', 0.021), ('quality', 0.021), ('lj', 0.021), ('relative', 0.021), ('shaded', 0.02), ('ding', 0.02), ('history', 0.02), ('measurement', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

Author: Moritz Hardt, Katrina Ligett, Frank Mcsherry

Abstract: We present a new algorithm for differentially private data release, based on a simple combination of the Multiplicative Weights update rule with the Exponential Mechanism. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques. 1

2 0.36220768 275 nips-2012-Privacy Aware Learning

Author: Martin J. Wainwright, Michael I. Jordan, John C. Duchi

Abstract: We study statistical risk minimization problems under a version of privacy in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, measured by convergence rate, of any statistical estimator. 1

3 0.36087579 237 nips-2012-Near-optimal Differentially Private Principal Components

Author: Kamalika Chaudhuri, Anand Sarwate, Kaushik Sinha

Abstract: Principal components analysis (PCA) is a standard tool for identifying good lowdimensional approximations to data sets in high dimension. Many current data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We demonstrate that on real data, there is a large performance gap between the existing method and our method. We show that the sample complexity for the two procedures differs in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. 1

4 0.091430791 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

Author: Aaron Wilson, Alan Fern, Prasad Tadepalli

Abstract: We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent’s goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection. 1

5 0.07409855 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. 1

6 0.061321992 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

7 0.054096315 279 nips-2012-Projection Retrieval for Classification

8 0.051615272 215 nips-2012-Minimizing Uncertainty in Pipelines

9 0.050225683 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

10 0.048957929 34 nips-2012-Active Learning of Multi-Index Function Models

11 0.046329368 285 nips-2012-Query Complexity of Derivative-Free Optimization

12 0.043183636 289 nips-2012-Recognizing Activities by Attribute Dynamics

13 0.041521978 359 nips-2012-Variational Inference for Crowdsourcing

14 0.039656162 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

15 0.038867291 37 nips-2012-Affine Independent Variational Inference

16 0.036913037 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

17 0.036349386 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

18 0.03440538 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

19 0.034027677 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

20 0.033277858 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.116), (1, 0.005), (2, -0.008), (3, -0.032), (4, 0.006), (5, 0.033), (6, 0.002), (7, 0.085), (8, -0.011), (9, -0.014), (10, 0.013), (11, -0.033), (12, -0.072), (13, -0.078), (14, 0.004), (15, 0.08), (16, 0.037), (17, -0.439), (18, 0.109), (19, 0.098), (20, -0.28), (21, 0.009), (22, -0.008), (23, -0.181), (24, 0.252), (25, -0.012), (26, 0.027), (27, -0.047), (28, 0.049), (29, -0.042), (30, -0.003), (31, -0.007), (32, 0.075), (33, -0.013), (34, 0.079), (35, -0.051), (36, -0.048), (37, -0.014), (38, -0.056), (39, -0.061), (40, -0.021), (41, 0.064), (42, 0.005), (43, 0.061), (44, 0.014), (45, -0.003), (46, -0.016), (47, -0.015), (48, 0.031), (49, 0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94263184 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

Author: Moritz Hardt, Katrina Ligett, Frank Mcsherry

Abstract: We present a new algorithm for differentially private data release, based on a simple combination of the Multiplicative Weights update rule with the Exponential Mechanism. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques. 1

2 0.87867165 237 nips-2012-Near-optimal Differentially Private Principal Components

Author: Kamalika Chaudhuri, Anand Sarwate, Kaushik Sinha

Abstract: Principal components analysis (PCA) is a standard tool for identifying good lowdimensional approximations to data sets in high dimension. Many current data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We demonstrate that on real data, there is a large performance gap between the existing method and our method. We show that the sample complexity for the two procedures differs in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. 1

3 0.75857681 275 nips-2012-Privacy Aware Learning

Author: Martin J. Wainwright, Michael I. Jordan, John C. Duchi

Abstract: We study statistical risk minimization problems under a version of privacy in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, measured by convergence rate, of any statistical estimator. 1

4 0.35447887 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

Author: Sejong Yoon, Vladimir Pavlovic

Abstract: Probabilistic approaches to computer vision typically assume a centralized setting, with the algorithm granted access to all observed data points. However, many problems in wide-area surveillance can benefit from distributed modeling, either because of physical or computational constraints. Most distributed models to date use algebraic approaches (such as distributed SVD) and as a result cannot explicitly deal with missing data. In this work we present an approach to estimation and learning of generative probabilistic models in a distributed context where certain sensor data can be missing. In particular, we show how traditional centralized models, such as probabilistic PCA and missing-data PPCA, can be learned when the data is distributed across a network of sensors. We demonstrate the utility of this approach on the problem of distributed affine structure from motion. Our experiments suggest that the accuracy of the learned probabilistic structure and motion models rivals that of traditional centralized factorization methods while being able to handle challenging situations such as missing or noisy observations. 1

5 0.2479071 46 nips-2012-Assessing Blinding in Clinical Trials

Author: Ognjen Arandjelovic

Abstract: The interaction between the patient’s expected outcome of an intervention and the inherent effects of that intervention can have extraordinary effects. Thus in clinical trials an effort is made to conceal the nature of the administered intervention from the participants in the trial i.e. to blind it. Yet, in practice perfect blinding is impossible to ensure or even verify. The current standard is follow up the trial with an auxiliary questionnaire, which allows trial participants to express their belief concerning the assigned intervention and which is used to compute a measure of the extent of blinding in the trial. If the estimated extent of blinding exceeds a threshold the trial is deemed sufficiently blinded; otherwise, the trial is deemed to have failed. In this paper we make several important contributions. Firstly, we identify a series of fundamental problems of the aforesaid practice and discuss them in context of the most commonly used blinding measures. Secondly, motivated by the highlighted problems, we formulate a novel method for handling imperfectly blinded trials. We too adopt a post-trial feedback questionnaire but interpret the collected data using an original approach, fundamentally different from those previously proposed. Unlike previous approaches, ours is void of any ad hoc free parameters, is robust to small changes in auxiliary data and is not predicated on any strong assumptions used to interpret participants’ feedback. 1

6 0.23066697 285 nips-2012-Query Complexity of Derivative-Free Optimization

7 0.22922343 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

8 0.19709666 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

9 0.18645851 254 nips-2012-On the Sample Complexity of Robust PCA

10 0.18640435 34 nips-2012-Active Learning of Multi-Index Function Models

11 0.18610831 279 nips-2012-Projection Retrieval for Classification

12 0.18100613 76 nips-2012-Communication-Efficient Algorithms for Statistical Optimization

13 0.17945881 359 nips-2012-Variational Inference for Crowdsourcing

14 0.17889968 215 nips-2012-Minimizing Uncertainty in Pipelines

15 0.17010367 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

16 0.16619973 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

17 0.15959597 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

18 0.15785801 217 nips-2012-Mixability in Statistical Learning

19 0.15708563 102 nips-2012-Distributed Non-Stochastic Experts

20 0.1570607 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.018), (17, 0.011), (21, 0.16), (28, 0.071), (38, 0.105), (39, 0.01), (42, 0.038), (53, 0.012), (54, 0.027), (55, 0.031), (74, 0.077), (76, 0.104), (80, 0.055), (86, 0.012), (92, 0.046), (93, 0.119)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84374541 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

2 0.83508754 195 nips-2012-Learning visual motion in recurrent neural networks

Author: Marius Pachitariu, Maneesh Sahani

Abstract: We present a dynamic nonlinear generative model for visual motion based on a latent representation of binary-gated Gaussian variables. Trained on sequences of images, the model learns to represent different movement directions in different variables. We use an online approximate inference scheme that can be mapped to the dynamics of networks of neurons. Probed with drifting grating stimuli and moving bars of light, neurons in the model show patterns of responses analogous to those of direction-selective simple cells in primary visual cortex. Most model neurons also show speed tuning and respond equally well to a range of motion directions and speeds aligned to the constraint line of their respective preferred speed. We show how these computations are enabled by a specific pattern of recurrent connections learned by the model.

same-paper 3 0.81299096 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

Author: Moritz Hardt, Katrina Ligett, Frank Mcsherry

Abstract: We present a new algorithm for differentially private data release, based on a simple combination of the Multiplicative Weights update rule with the Exponential Mechanism. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques.
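The combination described in this abstract — select a badly approximated query with the Exponential Mechanism, measure it with Laplace noise, then apply a Multiplicative Weights update to a synthetic histogram — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `mwem` function name, the even budget split across iterations, and the unit sensitivity scaling are all simplifying assumptions.

```python
import numpy as np

def mwem(true_hist, queries, n_iters, eps, rng=None):
    """Toy MWEM sketch over a normalized histogram (sums to 1).

    queries: list of weight vectors, one per linear query.
    """
    rng = rng or np.random.default_rng(0)
    # Start from the uniform distribution over the data domain.
    synth = np.full_like(true_hist, 1.0 / len(true_hist))
    eps_t = eps / (2 * n_iters)  # simplistic per-iteration budget split
    for _ in range(n_iters):
        # Exponential Mechanism: sample a query, favoring large error on synth.
        errs = np.array([abs(q @ true_hist - q @ synth) for q in queries])
        probs = np.exp(eps_t * errs / 2)
        probs /= probs.sum()
        i = rng.choice(len(queries), p=probs)
        # Laplace mechanism: noisy answer for the chosen query.
        q = queries[i]
        m = q @ true_hist + rng.laplace(scale=1.0 / eps_t)
        # Multiplicative Weights update toward the noisy measurement.
        synth *= np.exp(q * (m - q @ synth) / 2.0)
        synth /= synth.sum()
    return synth
```

The update reweights domain elements that the chosen query touches, pushing the synthetic distribution's answer toward the noisy measurement while keeping it a proper distribution.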

4 0.78944981 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.
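The proximal-splitting template behind abstracts like this one alternates a (possibly inexact) gradient step on the smooth part with a proximal step on the nonsmooth part. The sketch below is a generic proximal-gradient loop on an ℓ1-regularized least-squares toy problem with an optional injected, non-vanishing gradient error; it is illustrative only and is not the paper's incremental nonconvex framework.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def inexact_prox_gradient(A, b, lam, step, n_iters, grad_noise=0.0, rng=None):
    """Proximal-gradient loop for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    grad_noise > 0 injects a bounded gradient error to mimic the inexact
    setting; this is a toy sketch of the splitting idea, nothing more.
    """
    rng = rng or np.random.default_rng(0)
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)                 # gradient of the smooth part
        if grad_noise:
            grad = grad + rng.normal(scale=grad_noise, size=grad.shape)
        x = soft_threshold(x - step * grad, step * lam)  # prox step
    return x
```

With `A = I` the iterates converge to the componentwise soft-thresholded solution, which makes the split between the smooth gradient step and the nonsmooth prox step easy to verify by hand.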

5 0.78101516 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy. 1
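The pruning loop this abstract describes — accumulate samples, freeze each variable once its majority state is clear, and stop sampling frozen variables — can be sketched for binary variables. Everything here is a simplifying assumption: the `sample_fn(active)` interface, the batch schedule, and the normal-approximation certainty test are illustrative stand-ins, not the paper's factor-graph machinery.

```python
import numpy as np

def adaptive_mmp(sample_fn, n_vars, z=2.0, batch=50, max_batches=100):
    """Sketch of dynamic pruning for MMP over binary variables.

    sample_fn(active) is assumed to return one joint 0/1 sample of length
    n_vars (hypothetical interface). A variable is frozen once a simple
    normal-approximation test says its majority state is clear.
    """
    counts = np.zeros(n_vars)            # times each variable sampled as 1
    totals = np.zeros(n_vars)            # samples drawn while still active
    active = np.ones(n_vars, dtype=bool)
    decision = np.zeros(n_vars, dtype=int)
    for _ in range(max_batches):
        if not active.any():
            break
        for _ in range(batch):
            s = sample_fn(active)
            counts[active] += s[active]
            totals[active] += 1
        phat = counts / np.maximum(totals, 1)
        se = np.sqrt(np.maximum(phat * (1 - phat), 1e-12) / np.maximum(totals, 1))
        done = active & (np.abs(phat - 0.5) > z * se)   # confident majority
        decision[done] = (phat[done] > 0.5).astype(int)
        active &= ~done                                  # prune from sampling
    # Fall back to the plain majority vote for anything never frozen.
    decision[active] = (counts[active] / np.maximum(totals[active], 1) > 0.5).astype(int)
    return decision
```

Only variables still in `active` keep consuming samples, which is the source of the speedup the abstract claims.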

6 0.77930129 60 nips-2012-Bayesian nonparametric models for ranked data

7 0.7600913 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

8 0.75970364 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

9 0.73438811 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

10 0.72268134 23 nips-2012-A lattice filter model of the visual pathway

11 0.71676576 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

12 0.71430469 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis

13 0.70429456 201 nips-2012-Localizing 3D cuboids in single-view images

14 0.70406646 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

15 0.70264727 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

16 0.69944811 94 nips-2012-Delay Compensation with Dynamical Synapses

17 0.69820863 303 nips-2012-Searching for objects driven by context

18 0.6975916 190 nips-2012-Learning optimal spike-based representations

19 0.69702268 237 nips-2012-Near-optimal Differentially Private Principal Components

20 0.69591391 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration