nips nips2010 nips2010-216 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Oliver Williams, Frank Mcsherry
Abstract: We identify and investigate a strong connection between probabilistic inference and differential privacy, the latter being a recent privacy definition that permits only indirect observation of data through noisy measurement. Previous research on differential privacy has focused on designing measurement processes whose output is likely to be useful on its own. We consider the potential of applying probabilistic inference to the measurements and measurement process to derive posterior distributions over the data sets and model parameters thereof. We find that probabilistic inference can improve accuracy, integrate multiple observations, measure uncertainty, and even provide posterior distributions over quantities that were not directly measured. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We identify and investigate a strong connection between probabilistic inference and differential privacy, the latter being a recent privacy definition that permits only indirect observation of data through noisy measurement. [sent-3, score-1.221]
2 Previous research on differential privacy has focused on designing measurement processes whose output is likely to be useful on its own. [sent-4, score-1.027]
3 We consider the potential of applying probabilistic inference to the measurements and measurement process to derive posterior distributions over the data sets and model parameters thereof. [sent-5, score-0.564]
4 We find that probabilistic inference can improve accuracy, integrate multiple observations, measure uncertainty, and even provide posterior distributions over quantities that were not directly measured. [sent-6, score-0.316]
5 1 Introduction There has recently been significant interest in the analysis of data sets whose individual records are too sensitive to expose directly, examples of which include medical information, financial data, and personal data from social networking sites. [sent-7, score-0.274]
6 Although agencies with the resources to collate such data are unable to grant outside parties direct access to them, they may be able to safely release aggregate statistics of the data set. [sent-9, score-0.145]
7 Progress in this area has so far been driven by researchers inventing sophisticated learning algorithms which are applied directly to the data and output model parameters which can be proven to respect the privacy of the data set. [sent-10, score-0.814]
8 Proving these privacy properties requires an intricate analysis of each algorithm on a case-by-case basis. [sent-11, score-0.688]
9 While this does result in many valuable algorithms and results, it is not a scalable solution for two reasons: first, to solve a new learning problem, one must invent and analyze a new privacy-preserving algorithm; second, one must then convince the owner of the data to run this algorithm. [sent-12, score-0.225]
10 In this paper, we show a natural connection between differential privacy, one of the leading privacy definitions, and probabilistic inference. [sent-14, score-1.007]
11 Specifically, differential privacy exposes the conditional distribution of its observable outputs given any input data set. [sent-15, score-1.059]
12 Combining the conditional distributions of differentially-private observations with generative models for the data permits new inferences about the data without the need to invent and analyze new differentially-private computations. [sent-16, score-0.243]
13 In some cases, one can rely on previously reported differentially private measurements. [sent-17, score-0.301]
14 As well as this flexibility, probabilistic inference can improve the accuracy of existing approaches, provide a measure of uncertainty in any predictions made, combine multiple observations in a principled way, and integrate prior knowledge about the data or parameters. [sent-19, score-0.334]
15 In Section 3 we explore the marginal likelihood of the differentially-private observations given generative model parameters for the data. [sent-21, score-0.151]
16 Section 4 shows several experimental results validating our hypothesis that probabilistic inference can be fruitfully applied to differentially-private computation. [sent-23, score-0.206]
17 In particular, we show how the application of principled, probabilistic inference to measurements made by an existing, heuristic algorithm for logistic regression improves performance, as well as providing confidence on the predictions made. [sent-24, score-0.49]
18 1 Related work There is a substantial amount of research on privacy, and differential privacy in particular, connected with machine learning and statistics. [sent-26, score-0.902]
19 Nonetheless, we are unaware of any research that uses exact knowledge of the conditional distribution over outputs given inputs to perform inference over model parameters, or other features of the data. [sent-27, score-0.127]
20 Chaudhuri and Monteleoni [5, 6] introduced the NIPS community to the problem of differentially-private logistic regression. [sent-31, score-0.151]
21 Although we will also consider the problem of logistic regression (and compare our findings with theirs) we should stress that the aim of the paper is not specifically to attack the problem of logistic regression. [sent-32, score-0.195]
22 Rather, the problem serves as a good example where prior work on differentially-private logistic regression can be improved through probabilistic inference. [sent-33, score-0.198]
23 2 Differential Privacy Differential privacy [7] applies to randomized computations executed against a dataset and returning an aggregate result for the entire set. [sent-34, score-0.763]
24 It prevents inference about specific records by requiring that the result of the computation yield nearly identical distributions for similar data sets. [sent-35, score-0.295]
25 Formally, a randomized computation M satisfies ε-differential privacy if for any two possible input data sets A and B, and any subset of possible outputs S, P(M(A) ∈ S) ≤ P(M(B) ∈ S) × exp(ε × |A ⊖ B|), (1) where A ⊖ B is the set of records in A or B, but not both. [sent-36, score-0.948]
26 When A ⊖ B is small, the relative bound on probabilities limits the inference an attacker can make about whether the true underlying data were actually A or B. [sent-37, score-0.253]
27 Inferences about the presence, absence, or specific values of individual records are strongly constrained. [sent-38, score-0.154]
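The defining inequality (1) can be checked pointwise for the simplest standard mechanism, a Laplace-noised count. The sketch below is our own illustration (the function names are ours, not the paper's); it verifies that for two data sets whose counts differ, the ratio of output densities stays within exp(ε × |difference|).

```python
import math

def laplace_pdf(z, mu, b):
    """Density of a Laplace(mu, b) random variable at z."""
    return math.exp(-abs(z - mu) / b) / (2.0 * b)

def dp_ratio_bound_holds(count_a, count_b, epsilon, z):
    """Check the differential-privacy inequality (1) pointwise for a
    Laplace-noised count: adding noise of scale 1/epsilon to a count
    (a sensitivity-1 query) keeps the output density ratio within
    exp(epsilon * |count_a - count_b|)."""
    b = 1.0 / epsilon
    ratio = laplace_pdf(z, count_a, b) / laplace_pdf(z, count_b, b)
    return ratio <= math.exp(epsilon * abs(count_a - count_b)) + 1e-12
```

The check holds for every output z because |z − count_a| and |z − count_b| can differ by at most |count_a − count_b| (triangle inequality), which is exactly the kind of bounded density ratio that the definition requires.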
28 One example of a differentially private computation is the exponential mechanism [8], characterized by a function φ : D^n × Z → R scoring each pair of data set and possible result with a real value; the mechanism outputs z with probability P(z|X) = φ(z, X) / Σ_{z′∈Z} φ(z′, X). (2) [sent-39, score-0.413]
29 While any differentially-private mechanism can be expressed as a φ function, verifying that a function φ satisfies the constraint | ln φ(z, A) − ln φ(z, B)| ≤ |A ⊖ B| is generally not easy, and requires some form of proof on a case-by-case basis. [sent-41, score-0.228]
30 This subclass is useful practically, as data providers can ensure differential privacy by clamping each φ(z, x) value to the range [e^{−1}, e^{+1}], without having to understand the φ function. [sent-43, score-1.04]
31 We will refer to this subclass as the factored exponential mechanism. [sent-44, score-0.232]
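A minimal sketch of how the factored exponential mechanism might be implemented (illustrative code of ours, not the authors'): each record contributes one multiplicative score factor per candidate output, each factor is clamped to [e^-1, e^+1] as described above, and an output is sampled in proportion to the product of factors.

```python
import math
import random

def factored_exp_mechanism(data, candidates, phi, rng=random.random):
    """Sample an output z with probability proportional to the product of
    per-record scores phi(z, x). Each factor is clamped to [e^-1, e^+1],
    so one record can shift a candidate's total log-score by at most 2."""
    lo, hi = math.exp(-1.0), math.exp(1.0)
    # accumulate log-scores for numerical stability
    log_w = [sum(math.log(min(max(phi(z, x), lo), hi)) for x in data)
             for z in candidates]
    m = max(log_w)
    weights = [math.exp(s - m) for s in log_w]
    # inverse-CDF sampling over the discrete candidate set
    r = rng() * sum(weights)
    for z, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return z
    return candidates[-1]
```

With a score function that strongly favors one candidate on every record, the favored candidate is returned with overwhelming probability; the clamping is what keeps any single record from dominating that choice.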
32 As we can see from the definition of the exponential mechanism, a differentially-private mechanism draws its guarantees from its inherent randomness, rather than from secrecy about its specification. [sent-45, score-0.246]
33 Although differential privacy has many other redeeming features, it is this feature alone that we rely on. [sent-46, score-0.902]
34 Figure 1: (a) If the data X = {xi} are directly observable (shaded nodes), the canonical learning task is to infer the posterior over θ given a model relating X and θ. [sent-51, score-0.242]
35 (b) In the private setting, the data are not observable; instead we observe the private measurement z, related to X by a known measurement process. [sent-52, score-0.64]
36 By the same token, although there are many other privacy definitions with varying guarantees, we can apply inference to any definition exhibiting one key feature: an explicit probabilistic relationship between the input data sets and output observations. [sent-54, score-1.024]
37 3 Inference and privacy Differential privacy limits what can be inferred about a single record in a data set, but does not directly limit inference about larger scale, aggregate properties of data sets. [sent-55, score-1.685]
38 For example, many tasks in machine learning and statistics infer global parameters describing a model of the data set without explicit dependence on any single record, and we may still expect to see a meaningful relationship between differentially-private measurements and model parameters. [sent-56, score-0.154]
39 One way to model a data set is to propose a generative probabilistic model for the data p(X|θ). [sent-57, score-0.185]
40 The marginal likelihood of a measurement z is p(z|θ) = ∫ dX p(X|θ) P(z|X). (3) Armed with the marginal likelihood, it is possible to bring all the techniques of probabilistic inference to bear. [sent-62, score-0.271]
41 This will generally include adding a prior distribution over θ, and combining multiple measurements to form a posterior p(θ|z1, ..., zm) ∝ p(θ) ∏_j p(zj|θ). [sent-63, score-0.175]
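As a concrete sketch of this combination step (our own illustrative model, not one from the paper: a coin with bias θ, and each measurement a Laplace-noised count of heads), the posterior over θ on a grid is a flat prior times the product of per-measurement marginal likelihoods p(zj|θ):

```python
import math

def laplace_pdf(z, mu, b):
    """Density of a Laplace(mu, b) variable at z."""
    return math.exp(-abs(z - mu) / b) / (2.0 * b)

def binom_pmf(k, n, p):
    """Binomial probability of k heads in n flips with bias p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def posterior_grid(noisy_counts, n, epsilon, grid_size=101):
    """Posterior over a coin's bias theta, on a grid with a flat prior,
    given several Laplace-noised counts of heads out of n flips.
    Each p(z|theta) marginalizes over the unobserved true count k."""
    b = 1.0 / epsilon
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = []
    for t in thetas:
        ll = 0.0
        for z in noisy_counts:
            ll += math.log(sum(binom_pmf(k, n, t) * laplace_pdf(z, k, b)
                               for k in range(n + 1)))
        log_post.append(ll)
    m = max(log_post)
    w = [math.exp(p - m) for p in log_post]
    s = sum(w)
    return thetas, [x / s for x in w]
```

With several noisy counts near 14 out of 20 flips, the posterior mode lands near θ ≈ 0.7, and the spread of the grid weights gives the measure of uncertainty discussed above.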
42 Therefore, the remainder of this section is devoted to the development of several bounds on the marginal likelihood for cases in which the measurement is generated via the factored exponential mechanism. [sent-69, score-0.479]
43 3.1 Factored exponential mechanism The factored exponential mechanism of Section 2 is a special case of the exponential mechanism that admits efficient approximation of the marginal likelihood. [sent-72, score-0.775]
44 We will be able to use the independence in p(X|θ) = ∏_i p(xi|θ) and φ(z, X) = ∏_i φ(z, xi) to factorize lower and upper bounds on the integral (3), resulting in a small number of integrals over only the domain of records, rather than the domain of data sets. [sent-73, score-0.266]
45 The bounds are

p(z|θ) ≥ [ Σ_{z′∈Z} ( ∫ dx p(x|θ) φ(z′, x) / φ(z, x) )^n ]^{−1}  (5a)

p(z|θ) ≤ e^{−H[q]} ( ∫ dx p(x|θ) φ(z, x) / ∏_{z′∈Z} φ(z′, x)^{q(z′)} )^n  (5b)

where the upper bound is defined in terms of a variational distribution q(z) [9] such that Σ_{z∈Z} q(z) = 1. [sent-75, score-0.41]
46 Notice that the integrations appearing in either bound are over the space of a single record in a data set and not over the entire dataset as they were in (3). [sent-77, score-0.169]
47 Proof of lower bound To prove the lower bound, we will apply Jensen’s inequality with the function f (x) = 1/x to the marginal likelihood of the exponential mechanism. [sent-78, score-0.342]
48 p(z|θ) = ∫ dX ∏_i p(xi|θ) φ(z, X) / Σ_{z′∈Z} φ(z′, X) ≥ [ ∫ dX ∏_i p(xi|θ) Σ_{z′∈Z} φ(z′, X)/φ(z, X) ]^{−1} = [ Σ_{z′∈Z} ∏_i ∫ dxi p(xi|θ) φ(z′, xi)/φ(z, xi) ]^{−1} = [ Σ_{z′∈Z} ( ∫ dx p(x|θ) φ(z′, x)/φ(z, x) )^n ]^{−1}. [sent-82, score-0.347]
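For discrete records the integrals become sums, and the lower bound can be sanity-checked numerically on a tiny example (our own illustrative code): enumerate every data set of size n, compute the exact marginal likelihood of the mechanism's output, and compare it with the factored bound.

```python
import itertools
import math

def exact_marginal(px, phi, Z, n, z):
    """Exact p(z|theta) = sum_X p(X|theta) * phi(z, X) / sum_z' phi(z', X),
    enumerating all data sets X of n discrete records.
    px: dict x -> p(x|theta); phi: dict (z, x) -> per-record score."""
    total = 0.0
    for xs in itertools.product(sorted(px), repeat=n):
        p_x = math.prod(px[x] for x in xs)
        scores = {zz: math.prod(phi[(zz, x)] for x in xs) for zz in Z}
        total += p_x * scores[z] / sum(scores.values())
    return total

def lower_bound(px, phi, Z, n, z):
    """Factored Jensen lower bound (5a) on p(z|theta): one sum per
    record domain instead of one per data-set domain."""
    return 1.0 / sum(
        sum(px[x] * phi[(zz, x)] / phi[(z, x)] for x in px) ** n
        for zz in Z)
```

The enumeration costs |domain|^n operations while the bound costs only |Z| × |domain|, which is the whole point of the factorization.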
49 Proof of upper bound We can lower bound the normalizing term Σ_{z′∈Z} φ(z′, X) in (2) by introducing a variational distribution q(z) and applying Jensen's inequality with the function f(x) = log x, giving Σ_{z′∈Z} φ(z′, X) ≥ e^{H[q]} ∏_{z′∈Z} φ(z′, X)^{q(z′)}. [sent-83, score-0.272]
50 Applying this bound to the marginal likelihood gives ∫ dX p(X|θ) φ(z, X) / Σ_{z′∈Z} φ(z′, X) ≤ e^{−H[q]} ∫ dX p(X|θ) φ(z, X) / ∏_{z′∈Z} φ(z′, X)^{q(z′)} = e^{−H[q]} ∏_i ∫ dxi p(xi|θ) φ(z, xi) / ∏_{z′∈Z} φ(z′, xi)^{q(z′)} = e^{−H[q]} ( ∫ dx p(x|θ) φ(z, x) / ∏_{z′∈Z} φ(z′, x)^{q(z′)} )^n. [sent-85, score-0.487]
51 While the upper bound is true for any q distribution, the tightest bound is found for the q which minimizes the bound. [sent-86, score-0.186]
52 Figure 2: Error in upper and lower bounds for the coin-flipping problem (legend: upper bound, lower bound, actual p(z|θ)). [sent-99, score-0.218]
54 (a) For each epsilon, we plot the maximum across all θ of the error between the true distribution and each of the upper and lower bounds. [sent-100, score-0.197]
55 (b) For one value of epsilon, we show the shape of the upper bound, lower bound, and true distribution given the differentially-private measurement z that was returned. [sent-102, score-0.225]
56 3.1.1 Choosing a φ function The upper and lower bounds in (5) are true for any admissible φ function, but leave unanswered the question of what to choose in this role. [sent-106, score-0.204]
57 In the absence of privacy we might try to find a good fit for the parameters θ by maximum likelihood. [sent-107, score-0.688]
58 In the private setting this is not possible because the data are not directly observable, but the output of the factored exponential mechanism has a very similar form: [sent-108, score-0.617]
59 Max likelihood: θ* = arg max_{θ∈Θ} ∏_i p(xi|θ) (6a); Exp. mechanism: z* = noisymax_{z∈Z} ∏_i φ(z, xi) (6b), where noisymax_{z∈Z} f(z) samples from f(z) / Σ_{z′∈Z} f(z′). [sent-109, score-0.146]
60 By making the analogy between (6a) and (6b), we might let z range over elements of Θ (or a finite subset), and take φ(z, xi) to be the likelihood of xi under parameters z. [sent-110, score-0.167]
61 The exponential mechanism is then likely to choose parameters z that fit the data well, informing us that the posterior over θ is likely in the vicinity of z. [sent-111, score-0.35]
62 For φ to be admissible, we must clamp very small values of φ up to 1/e, limiting the ability of very poorly fit records to influence our decisions strongly. [sent-112, score-0.154]
63 3.2 Evaluation of the bounds To demonstrate the effectiveness of these bounds we consider a problem in which it is possible to analytically compute the marginal likelihood. [sent-114, score-0.213]
64 We see in figure 2a that the error in both the upper and lower bounds, across the entire density function, is essentially zero for small epsilon. [sent-122, score-0.123]
65 As epsilon increases the bounds deteriorate, but we are most interested in the case of small values of epsilon, where privacy guarantees are meaningfully strong. [sent-123, score-0.979]
66 Figure 2b shows the shape of the two bounds, and the true density between them, for epsilon = 0.1. [sent-124, score-0.193]
67 This large value was chosen as it is in the region for which the bounds are less tight and the difference between the bounds and the truth can be seen. [sent-126, score-0.148]
68 The upper bound is defined in terms of a variational distribution q. [sent-127, score-0.164]
69 In general, however, these (and other) tests show that both bounds are equally good for reasonable values of epsilon, and we therefore use the lower bound for the experiments in this paper, since it is simpler to compute. [sent-129, score-0.182]
70 First, we consider applying probabilistic inference to an existing differentially-private computation, specifically a logistic regression heuristic taken from a suite of differentially-private algorithms. [sent-131, score-0.401]
71 The heuristic is not representable in the factored exponential mechanism, and as such we must attempt to approximate the full integral over the space of data sets directly. [sent-132, score-0.364]
72 In our second experiment, we choose a problem and measurement process appropriate for the factored exponential mechanism, principal components analysis, previously only ever addressed through noisy observation of the covariance matrix. [sent-133, score-0.356]
73 4.1 Logistic Regression To examine the potential of probabilistic inference to improve the quality of existing differentially-private computations, we consider a heuristic algorithm for logistic regression included in the Privacy Integrated Queries distribution [10]. [sent-135, score-0.483]
74 This heuristic uses a noisy sum primitive to repeatedly compute and step in the direction of an approximate gradient. [sent-136, score-0.185]
75 When the number of records is large compared to the noise introduced, the approximate gradient is relatively accurate, and the algorithm performs well. [sent-137, score-0.178]
76 When the records are fewer or the privacy requirements demand more noise, its performance suffers. [sent-138, score-0.842]
77 Probabilistic inference has the potential to improve performance by properly integrating the information extracted from the data across the multiple gradient measurements and managing the uncertainty associated with the noisy measurements. [sent-139, score-0.329]
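A simplified reconstruction of such a noisy-gradient heuristic (our own sketch, not the PINQ implementation; the clipping rule and the Laplace noise scale are illustrative assumptions): each step releases only a Laplace-noised sum of clipped per-record gradients.

```python
import math
import random

def sample_laplace(scale):
    """Draw one Laplace(0, scale) variate by inverse CDF."""
    u = random.random() - 0.5
    return -math.copysign(scale * math.log(1.0 - 2.0 * abs(u)), u)

def noisy_gradient_logreg(X, y, epsilon_per_step, steps=50, lr=0.5, clip=1.0):
    """Gradient descent on logistic loss where each step sees only a noisy
    sum of per-record gradients, each clipped to norm <= clip so that one
    record's influence on the released sum is bounded."""
    d = len(X[0])
    w = [0.0] * d
    scale = clip / epsilon_per_step  # Laplace scale for the clipped sum
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-margin))
            g = [(p - yi) * xj for xj in xi]  # per-record gradient
            norm = math.sqrt(sum(v * v for v in g))
            if norm > clip:                   # clip its influence
                g = [v * clip / norm for v in g]
            grad = [a + v for a, v in zip(grad, g)]
        noisy = [a + sample_laplace(scale) for a in grad]
        w = [wj - lr * nj / len(X) for wj, nj in zip(w, noisy)]
    return w
```

With many records or a generous per-step budget the noise is negligible and the descent behaves normally; with few records or small ε the noise swamps the gradient, which is exactly the regime where the paper's posterior inference over the released measurements helps.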
78 We test our proposals against three synthetic data sets (CM1 and CM2 from [5] and one of our own: SYNTH) and two data sets from the UCI repository (PIMA and ADULT) [11]. [sent-140, score-0.189]
79 Table 1: Data sets used and their statistics. [sent-144, score-0.194]

                     SYNTH    CM1    CM2   PIMA   ADULT
  Records             1000  17500  17500    691   16000
  Dimensions             4     10     10      8       6
  Positive examples    497   8770   8694    237    7841
  Test set records    1000  17500  17500   767*    8000
80 PIMA and ADULT are standard data sets [11] containing diabetes records and census data, respectively, both of which correspond to the types of data one might expect to be protected by differential privacy. [sent-147, score-0.374]
81 4.1.1 Error Rates and Log-Likelihood Tables 2 and 3 report the classification accuracy of several approaches when the privacy parameter is set to 0. [sent-151, score-0.715]
82 These results are computed from 50 executions of the heuristic gradient descent algorithm. [sent-154, score-0.17]
83 We can see a trend of general improvement from the heuristic approach to the probabilistic inference, both in terms of the average error rate and the standard deviation. [sent-155, score-0.234]
84 1 All measurements are in per cent; errors are reported as the mean ± one standard deviation computed from 50 independent executions with random starting points. [sent-188, score-0.133]
85 Benchmark is the best maximum likelihood solution found by gradient ascent when the data are directly observable and forms a baseline for expected performance. [sent-191, score-0.248]
86 Each colored path describes an execution with a fixed level of accuracy in each iteration, and all are plotted on the common scale of total privacy consumption. [sent-226, score-0.715]
87 Even with very few additional examples, probabilistic inference is capable of exploiting this information and performance improves dramatically. [sent-234, score-0.206]
88 4.2 Principal components To demonstrate inference on another model, and to highlight the applicability of the factored exponential mechanism, we consider the problem of probabilistically finding the first principal component of a data set, where we model the data as iid draws from a Gaussian p(x|θ) = N(0, θθᵀ + σ²I). [sent-236, score-0.309] [sent-254, score-0.128]
89 (b) Incorporating non-private observations A compelling benefit of probabilistic inference is how easily alternate sources of information are added. [sent-246, score-0.237]
90 The horizontal line indicates the performance of the benchmark maximum likelihood solution computed from the data without privacy. [sent-247, score-0.133]
91 Figure 4: Posterior distribution as a function of ε. The same synthetic data set under differentially-private measurements with varying epsilon. [sent-251, score-0.239]
92 The posterior is noticeably more concentrated and accurate as epsilon increases. [sent-253, score-0.279]
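Under this model, a natural per-record score φ(z, x) for the factored mechanism is the Gaussian likelihood of x under a candidate direction z, clamped to [e^-1, e^+1]. A 2-D sketch under our own illustrative parameter choices (the function name and σ² value are assumptions, not from the paper):

```python
import math

def gaussian_score(z, x, sigma2=0.1):
    """Per-record score phi(z, x): likelihood of 2-vector x under
    N(0, z z^T + sigma2 * I) for a candidate unit direction z,
    clamped to [e^-1, e^+1] for use in the factored mechanism."""
    # 2x2 covariance C = z z^T + sigma2 * I
    a = z[0] * z[0] + sigma2
    b = z[0] * z[1]
    d = z[1] * z[1] + sigma2
    det = a * d - b * b
    # closed-form inverse of the 2x2 covariance
    inv00, inv01, inv11 = d / det, -b / det, a / det
    quad = (x[0] * (inv00 * x[0] + inv01 * x[1])
            + x[1] * (inv01 * x[0] + inv11 * x[1]))
    lik = math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))
    return min(max(lik, math.exp(-1.0)), math.exp(1.0))
```

Candidate directions aligned with the data's principal axis receive higher scores, so the mechanism is likely to return a direction near the true first principal component, and the posterior over θ concentrates around it.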
94 Figure 4 demonstrates three instances of inference applied to the same data set with three different values of epsilon. [sent-256, score-0.141]
95 We stress that the posterior and its concentration are returned to the analyst; each image is the result of a single differentially-private measurement, rather than a visualization of multiple runs. [sent-258, score-0.178]
96 5 Conclusions Most work in the area of learning from private data forms an intrinsic analysis. [sent-262, score-0.237]
97 That is, a complex algorithm is run by the owner of the data, directly on that data, and a single output is produced which appropriately indicates the desired parameters (modulo noise). [sent-263, score-0.126]
98 In contrast, this paper has shown that it is possible to do a great deal with an extrinsic analysis, where standard, primitive, measurements are made against the data, and a posterior over model parameters is inferred post hoc. [sent-264, score-0.199]
99 This paper brings together two complementary lines of research: the design and analysis of differentially-private algorithms, and probabilistic inference. [sent-265, score-0.129]
100 Our primary goal is not to weigh in on new differentially-private algorithms, nor to find new methods for probabilistic inference: it is to present the observation that the two approaches are complementary in a way that can be mutually enriching. [sent-266, score-0.178]
wordName wordTfidf (topN-words)
[('privacy', 0.688), ('differential', 0.214), ('private', 0.197), ('epsilon', 0.193), ('synth', 0.165), ('records', 0.154), ('mechanism', 0.152), ('pima', 0.124), ('dx', 0.123), ('factored', 0.11), ('probabilistic', 0.105), ('differentially', 0.104), ('measurement', 0.103), ('heuristic', 0.102), ('inference', 0.101), ('adult', 0.098), ('mcsherry', 0.096), ('measurements', 0.089), ('posterior', 0.086), ('differentiallyprivate', 0.082), ('bounds', 0.074), ('exponential', 0.072), ('logistic', 0.069), ('observable', 0.067), ('bound', 0.066), ('marginal', 0.065), ('xi', 0.056), ('cent', 0.055), ('invent', 0.055), ('owner', 0.055), ('likelihood', 0.055), ('chaudhuri', 0.054), ('upper', 0.054), ('subclass', 0.05), ('inferences', 0.049), ('monteleoni', 0.048), ('theta', 0.048), ('noisy', 0.045), ('dwork', 0.044), ('executions', 0.044), ('variational', 0.044), ('aggregate', 0.043), ('lower', 0.042), ('sets', 0.04), ('data', 0.04), ('record', 0.039), ('ln', 0.038), ('ascent', 0.038), ('benchmark', 0.038), ('primitive', 0.038), ('mountain', 0.036), ('admissible', 0.034), ('concentration', 0.033), ('stress', 0.033), ('computations', 0.032), ('jensen', 0.032), ('observations', 0.031), ('integration', 0.031), ('uncertainty', 0.03), ('rates', 0.03), ('repository', 0.029), ('varying', 0.028), ('permits', 0.028), ('accuracy', 0.027), ('error', 0.027), ('principal', 0.026), ('valuable', 0.026), ('returned', 0.026), ('iid', 0.026), ('graphical', 0.026), ('outputs', 0.026), ('infer', 0.025), ('run', 0.025), ('directly', 0.024), ('paths', 0.024), ('permitted', 0.024), ('exposes', 0.024), ('clamping', 0.024), ('convince', 0.024), ('multiples', 0.024), ('integrations', 0.024), ('calibrating', 0.024), ('nissim', 0.024), ('owners', 0.024), ('armed', 0.024), ('attacker', 0.024), ('extrinsic', 0.024), ('meaningfully', 0.024), ('oliver', 0.024), ('providers', 0.024), ('gradient', 0.024), ('complementary', 0.024), ('regression', 0.024), ('concerned', 0.023), ('output', 0.022), ('draws', 
0.022), ('uci', 0.022), ('limits', 0.022), ('collate', 0.022), ('protection', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 216 nips-2010-Probabilistic Inference and Differential Privacy
Author: Oliver Williams, Frank Mcsherry
2 0.34919572 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
Author: Manas Pathak, Shantanu Rane, Bhiksha Raj
Abstract: As increasing amounts of sensitive personal information finds its way into data repositories, it is important to develop analysis mechanisms that can derive aggregate information from these repositories without revealing information about individual data instances. Though the differential privacy model provides a framework to analyze such mechanisms for databases belonging to a single party, this framework has not yet been considered in a multi-party setting. In this paper, we propose a privacy-preserving protocol for composing a differentially private aggregate classifier using classifiers trained locally by separate mutually untrusting parties. The protocol allows these parties to interact with an untrusted curator to construct additive shares of a perturbed aggregate classifier. We also present a detailed theoretical analysis containing a proof of differential privacy of the perturbed aggregate classifier and a bound on the excess risk introduced by the perturbation. We verify the bound with an experimental evaluation on a real dataset. 1
3 0.095496155 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
Author: Sivan Sabato, Nathan Srebro, Naftali Tishby
Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the γ-adapted-dimension, which is a simple function of the spectrum of a distribution’s covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the γ-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions. 1
4 0.092896104 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
Author: Yangqing Jia, Mathieu Salzmann, Trevor Darrell
Abstract: Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing non-convex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multiview learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. We show that our approach outperforms state-of-the-art methods on the task of human pose estimation. 1
5 0.076978229 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti
Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.
6 0.059771847 268 nips-2010-The Neural Costs of Optimal Control
7 0.059486713 290 nips-2010-t-logistic regression
8 0.057792764 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
9 0.057064027 284 nips-2010-Variational bounds for mixed-data factor analysis
10 0.056655128 265 nips-2010-The LASSO risk: asymptotic results and real world examples
11 0.056275528 99 nips-2010-Gated Softmax Classification
12 0.056014661 101 nips-2010-Gaussian sampling by local perturbations
13 0.054870028 217 nips-2010-Probabilistic Multi-Task Feature Selection
14 0.05310861 148 nips-2010-Learning Networks of Stochastic Differential Equations
15 0.051381003 118 nips-2010-Implicit Differentiation by Perturbation
16 0.049891334 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models
17 0.04866584 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
18 0.0454899 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
19 0.045218319 22 nips-2010-Active Estimation of F-Measures
20 0.044920765 283 nips-2010-Variational Inference over Combinatorial Spaces
topicId topicWeight
[(0, 0.165), (1, 0.028), (2, 0.032), (3, 0.016), (4, -0.039), (5, 0.062), (6, -0.037), (7, 0.012), (8, -0.053), (9, -0.054), (10, -0.063), (11, -0.053), (12, 0.016), (13, -0.063), (14, -0.069), (15, 0.021), (16, 0.041), (17, 0.107), (18, 0.076), (19, 0.021), (20, -0.036), (21, -0.039), (22, -0.086), (23, -0.032), (24, 0.039), (25, -0.025), (26, -0.0), (27, 0.109), (28, -0.044), (29, -0.097), (30, 0.237), (31, -0.165), (32, -0.054), (33, -0.125), (34, -0.078), (35, -0.074), (36, 0.106), (37, -0.03), (38, -0.194), (39, -0.156), (40, -0.248), (41, -0.169), (42, -0.061), (43, 0.096), (44, -0.104), (45, -0.042), (46, -0.091), (47, -0.208), (48, 0.13), (49, 0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.91366684 216 nips-2010-Probabilistic Inference and Differential Privacy
Author: Oliver Williams, Frank Mcsherry
2 0.82684028 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
Author: Manas Pathak, Shantanu Rane, Bhiksha Raj
3 0.40815827 284 nips-2010-Variational bounds for mixed-data factor analysis
Author: Mohammad E. Khan, Guillaume Bouchard, Kevin P. Murphy, Benjamin M. Marlin
Abstract: We propose a new variational EM algorithm for fitting factor analysis models with mixed continuous and categorical observations. The algorithm is based on a simple quadratic bound to the log-sum-exp function. In the special case of fully observed binary data, the bound we propose is significantly faster than previous variational methods. We show that EM is significantly more robust in the presence of missing data compared to treating the latent factors as parameters, which is the approach used by exponential family PCA and other related matrix-factorization methods. A further benefit of the variational approach is that it can easily be extended to the case of mixtures of factor analyzers, as we show. We present results on synthetic and real data sets demonstrating several desirable properties of our proposed method. 1
4 0.35266834 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
Author: Matthias Broecheler, Lise Getoor
Abstract: Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficiency. Continuous MRFs are a general purpose probabilistic modeling tool and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence.
5 0.35191178 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti
Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.
6 0.34164256 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
7 0.33694226 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models
8 0.32657999 125 nips-2010-Inference and communication in the game of Password
9 0.32382908 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
10 0.31966588 262 nips-2010-Switched Latent Force Models for Movement Segmentation
11 0.31694224 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models
12 0.30653015 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
13 0.29989448 9 nips-2010-A New Probabilistic Model for Rank Aggregation
14 0.29312259 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
15 0.2898567 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
16 0.28782007 82 nips-2010-Evaluation of Rarity of Fingerprints in Forensics
17 0.28104377 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning
18 0.26283395 2 nips-2010-A Bayesian Approach to Concept Drift
19 0.26049122 148 nips-2010-Learning Networks of Stochastic Differential Equations
20 0.25703081 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
topicId topicWeight
[(13, 0.047), (17, 0.014), (20, 0.239), (27, 0.047), (30, 0.074), (45, 0.207), (50, 0.101), (52, 0.03), (60, 0.039), (77, 0.051), (78, 0.02), (90, 0.052)]
simIndex simValue paperId paperTitle
1 0.86805058 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
Author: Chris Barber, Joseph Bockhorst, Paul Roebber
Abstract: Accurate short-term wind forecasts (STWFs), with time horizons from 0.5 to 6 hours, are essential for efficient integration of wind power to the electrical power grid. Physical models based on numerical weather predictions are currently not competitive, and research on machine learning approaches is ongoing. Two major challenges confronting these efforts are missing observations and weather-regime induced dependency shifts among wind variables. In this paper we introduce approaches that address both of these challenges. We describe a new regime-aware approach to STWF that use auto-regressive hidden Markov models (AR-HMM), a subclass of conditional linear Gaussian (CLG) models. Although AR-HMMs are a natural representation for weather regimes, as with CLG models in general, exact inference is NP-hard when observations are missing (Lerner and Parr, 2001). We introduce a simple approximate inference method for AR-HMMs, which we believe has applications in other problem domains. In an empirical evaluation on publicly available wind data from two geographically distinct regions, our approach makes significantly more accurate predictions than baseline models, and uncovers meteorologically relevant regimes.
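The generative model behind an AR-HMM can be illustrated with a toy regime-switching AR(1) simulator: a hidden Markov chain picks the regime, and each regime has its own autoregressive coefficient and noise level. This is a minimal sketch under assumed parameters, not the paper's multivariate wind model or its approximate inference method.

```python
import numpy as np

def simulate_ar_hmm(T, trans, ar_coefs, noise_stds, rng=None):
    """Simulate a toy regime-switching AR(1) series.

    trans[i] is the row of transition probabilities out of regime i;
    regime z contributes x_t = ar_coefs[z] * x_{t-1} + N(0, noise_stds[z])."""
    rng = np.random.default_rng(rng)
    z = 0          # hidden regime
    x = 0.0        # observed value
    states, xs = [], []
    for _ in range(T):
        z = rng.choice(len(ar_coefs), p=trans[z])
        x = ar_coefs[z] * x + rng.normal(0.0, noise_stds[z])
        states.append(z)
        xs.append(x)
    return np.array(states), np.array(xs)
```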
same-paper 2 0.79373318 216 nips-2010-Probabilistic Inference and Differential Privacy
Author: Oliver Williams, Frank Mcsherry
Abstract: We identify and investigate a strong connection between probabilistic inference and differential privacy, the latter being a recent privacy definition that permits only indirect observation of data through noisy measurement. Previous research on differential privacy has focused on designing measurement processes whose output is likely to be useful on its own. We consider the potential of applying probabilistic inference to the measurements and measurement process to derive posterior distributions over the data sets and model parameters thereof. We find that probabilistic inference can improve accuracy, integrate multiple observations, measure uncertainty, and even provide posterior distributions over quantities that were not directly measured.
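The connection described above can be illustrated in the simplest setting: a counting query released through the Laplace mechanism has a known measurement density, so it can be used as a likelihood and combined with a prior to get a posterior over the true count. This is a minimal grid-based sketch of the idea, not the paper's general treatment; the uniform prior and the support range are illustrative assumptions.

```python
import numpy as np

def laplace_mechanism(true_count, epsilon, rng=None):
    # A counting query has sensitivity 1, so the noise scale is 1/epsilon.
    rng = np.random.default_rng(rng)
    return true_count + rng.laplace(scale=1.0 / epsilon)

def posterior_over_count(y, epsilon, prior_support):
    """Grid posterior P(count | noisy measurement y) under a uniform prior,
    using the known Laplace measurement density as the likelihood."""
    lik = np.exp(-epsilon * np.abs(y - prior_support))  # Laplace kernel
    return lik / lik.sum()
```

The same recipe extends to multiple measurements: multiply the Laplace likelihoods before normalizing, which is one way inference "integrates multiple observations".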
3 0.72123915 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
Author: Dahua Lin, Eric Grimson, John W. Fisher
Abstract: We present a novel method for constructing dependent Dirichlet processes. The approach exploits the intrinsic relationship between Dirichlet and Poisson processes in order to create a Markov chain of Dirichlet processes suitable for use as a prior over evolving mixture models. The method allows for the creation, removal, and location variation of component models over time while maintaining the property that the random measures are marginally DP distributed. Additionally, we derive a Gibbs sampling algorithm for model inference and test it on both synthetic and real data. Empirical results demonstrate that the approach is effective in estimating dynamically varying mixture models.
Author: Matthias Broecheler, Lise Getoor
Abstract: Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficiency. Continuous MRFs are a general purpose probabilistic modeling tool and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence.
5 0.71765995 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
Author: Surya Ganguli, Haim Sompolinsky
Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance.
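The "measurement matrix derived from a dynamical system" in the abstract above can be made concrete: for a linear network x_t = W x_{t−1} + v s_t, the state after T steps is x_T = Σ_t W^(T−1−t) v s_t, i.e. a linear measurement of the whole input history with measurement columns W^k v. The sketch below checks that identity numerically for a random orthogonal W (obtained via QR); the sparse-recovery analysis itself is not reproduced here.

```python
import numpy as np

def network_state(W, v, signal):
    """Run x_t = W @ x_{t-1} + v * s_t over the input sequence.

    The final state is a linear measurement of the full past:
    x_T = sum_t W^(T-1-t) v s_t."""
    x = np.zeros(W.shape[0])
    for s in signal:
        x = W @ x + v * s
    return x

rng = np.random.default_rng(0)
n, T = 8, 12
W, _ = np.linalg.qr(rng.normal(size=(n, n)))   # orthogonal connectivity
v = rng.normal(size=n)
signal = rng.normal(size=T)
# Effective measurement matrix: column t is W^(T-1-t) v, acting on s_t
cols = [np.linalg.matrix_power(W, T - 1 - t) @ v for t in range(T)]
A = np.stack(cols, axis=1)
```

Because W is orthogonal, the columns W^k v all have the same norm, so old inputs are rotated rather than attenuated; this is the property the paper exploits to exceed the n-step memory limit for sparse inputs.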
6 0.71761405 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
7 0.71602505 63 nips-2010-Distributed Dual Averaging In Networks
8 0.71545428 148 nips-2010-Learning Networks of Stochastic Differential Equations
9 0.71238893 117 nips-2010-Identifying graph-structured activation patterns in networks
10 0.71179211 222 nips-2010-Random Walk Approach to Regret Minimization
11 0.71103144 243 nips-2010-Smoothness, Low Noise and Fast Rates
12 0.71065962 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
13 0.7106176 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
14 0.70985556 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
15 0.70976782 158 nips-2010-Learning via Gaussian Herding
16 0.70937556 202 nips-2010-Parallelized Stochastic Gradient Descent
17 0.70899916 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices
18 0.70891482 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
19 0.70865726 282 nips-2010-Variable margin losses for classifier design
20 0.7081269 155 nips-2010-Learning the context of a category