nips nips2008 nips2008-185 knowledge-graph by maker-knowledge-mining

185 nips-2008-Privacy-preserving logistic regression


Source: pdf

Author: Kamalika Chaudhuri, Claire Monteleoni

Abstract: This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [6] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [6]. We provide learning guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. Experiments demonstrate improved learning performance of our method, versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Privacy-preserving logistic regression Kamalika Chaudhuri Information Theory and Applications University of California, San Diego kamalika@soe. [sent-1, score-0.292]

2 edu Abstract This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. [sent-5, score-1.056]

3 This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. [sent-9, score-0.481]

4 We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. [sent-10, score-0.392]

5 We prove that our algorithm preserves privacy in the model due to [6]. [sent-11, score-0.894]

6 We provide learning guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. [sent-12, score-0.303]

7 Experiments demonstrate improved learning performance of our method, versus the sensitivity method. [sent-13, score-0.179]

8 Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. [sent-14, score-0.266]

9 Moreover, private data such as medical and financial records are increasingly being digitized, stored, and managed by independent companies. [sent-17, score-0.174]

10 In the literature on cryptography and information security, data privacy definitions have been proposed; however, designing machine learning algorithms that adhere to them has not been well-explored. [sent-18, score-0.906]

11 On the other hand, data-mining algorithms have been introduced that aim to respect other notions of privacy that may be less formally justified. [sent-19, score-0.828]

12 Our goal is to bridge the gap between approaches in the cryptography and information security community, and those in the data-mining community. [sent-20, score-0.104]

13 This is necessary, as there is a tradeoff between the privacy of a protocol, and the learnability of functions that respect the protocol. [sent-21, score-0.859]

14 In this work, we provide algorithms for learning in a privacy model introduced by Dwork et al. [sent-23, score-0.828]

15 The ε-differential privacy model limits how much information an adversary can gain about a particular private value, by observing a function learned from a database containing that value, even if she knows every other value in the database. [sent-25, score-1.146]

16 An initial positive result [6] in this setting depends on the sensitivity of the function to be learned, which is the maximum amount the function value can change due to an arbitrary change in one input. [sent-26, score-0.179]

17 Using this method requires bounding the sensitivity of the function class to be learned, and then adding noise proportional to the sensitivity. [sent-27, score-0.255]

18 First we apply the sensitivity-based method of designing privacy-preserving algorithms [6] to a specific machine learning algorithm, logistic regression. [sent-31, score-0.308]

19 Then we present a second privacy-preserving logistic regression algorithm. [sent-32, score-0.292]

20 We prove that the new method is private in the ε-differential privacy model. [sent-34, score-1.01]

21 We provide learning performance guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. [sent-35, score-0.303]

22 However, this is problematic, because an adversary may have some auxiliary information, which may even be publicly available, and which can be used to breach privacy. [sent-40, score-0.084]

23 To formally address this issue, we need a definition of privacy which works in the presence of auxiliary knowledge by the adversary. [sent-42, score-0.808]

24 We explain this definition and privacy model in more detail in Section 2. [sent-45, score-0.808]

25 [8] shows how to find classifiers that preserve ε-differential privacy; however, their algorithm takes time exponential in d for inputs in Rd . [sent-48, score-0.098]

26 [3] provides a general method for publishing data-sets while preserving ε-differential privacy such that the answers to all queries of a certain type with low VC-dimension are approximately correct. [sent-49, score-0.869]

27 There has been a substantial amount of work on privacy in the literature, spanning several communities. [sent-52, score-0.808]

28 Much work on privacy has been done in the data-mining community [1, 7], [14, 10]; however, the privacy definitions used in these papers are different, and some are susceptible to attacks when the adversary has some prior information. [sent-53, score-1.773]

29 In contrast, the privacy definition we use avoids these attacks, and is very strong. [sent-54, score-0.808]

30 2 Sensitivity and the ε-differential privacy model Before we define the privacy model that we study, we will note a few preliminary points. [sent-55, score-1.616]

31 Both in that model, and for our algorithm and analyses, we assume that each value in the database is a real vector with norm at most one. [sent-56, score-0.086]

32 The database consists of x1 , . . . , xn , where xi ∈ Rd and ||xi || ≤ 1 for all i ∈ {1, . . . , n}. [sent-60, score-0.165]

33 In both privacy-preserving logistic regression algorithms that we state, the output is a parameter vector w, which makes the prediction SGN(w · x) on a point x. [sent-67, score-0.347]

34 The privacy definition we use is due to Dwork et al. [sent-71, score-0.808]

35 In this model, users have access to private data about individuals through a sanitization mechanism, usually denoted by M . [sent-73, score-0.247]

36 The interaction between the sanitization mechanism and an adversary is modelled as a sequence of queries, made by the adversary, and responses, made by the sanitizer. [sent-74, score-0.175]

37 The sanitizer, which is typically a randomized algorithm, is said to preserve ε-differential privacy if the private value of any one individual in the data set does not affect the likelihood of a specific answer by the sanitizer by more than ε. [sent-75, score-0.284]

38 More formally, ε-differential privacy can be defined as follows. [sent-76, score-0.808]

39 ε-differential privacy is therefore a very strong notion of privacy. [sent-78, score-0.808]

40 [6] also provides a general method for computing an approximation to any function f while preserving ε-differential privacy. [sent-79, score-0.061]

41 Definition 2 For any function f with n inputs, we define the sensitivity S(f ) as the maximum, over all inputs, of the difference in the value of f when one input of f is changed. [sent-81, score-0.179]

42 That is, S(f ) = max(a,a′ ) |f (x1 , . . . , xn−1 , xn = a) − f (x1 , . . . , xn−1 , xn = a′ )|. [6] shows that for any input x1 , . . . , xn , [sent-88, score-0.103]

43 releasing f (x1 , . . . , xn ) + η, where η is a random variable drawn from a Laplace distribution with mean 0 and standard deviation S(f )/ε, preserves ε-differential privacy. [sent-94, score-0.18]
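
The release rule just described can be sketched in a few lines. This is an illustrative implementation of the standard Laplace mechanism under my own interface (the function name and arguments are not from the paper):

```python
import numpy as np

def laplace_mechanism(f_value, sensitivity, epsilon, rng=None):
    """Release f(x) plus Laplace noise calibrated to sensitivity/epsilon.

    Adding Lap(0, S(f)/epsilon) noise to each coordinate of f's output
    is the sensitivity method of [6] for epsilon-differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon  # noise scale S(f)/epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=np.shape(f_value))
    return f_value + noise
```

For example, a counting query has sensitivity 1, so `laplace_mechanism(count, 1.0, 0.1)` would add noise of scale 10; lower ε (stronger privacy) means more noise.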

44 Although this method sometimes requires adding a smaller amount of noise to preserve privacy, in general, smoothed sensitivity of a function can be hard to compute. [sent-97, score-0.28]

45 3 A Simple Algorithm Based on [6], one can come up with a simple algorithm for privacy-preserving logistic regression, which adds noise to the classifier obtained by logistic regression, proportional to its sensitivity. [sent-98, score-0.502]

46 From Corollary 2, the sensitivity of logistic regression is at most 2/(nλ). [sent-99, score-0.471]

47 This leads to Algorithm 1, which obeys the privacy guarantees in Theorem 1. [sent-100, score-0.842]

48 Compute w∗ , the classifier obtained by regularized logistic regression on the labelled examples (x1 , y1 ), . . . , (xn , yn ). [sent-102, score-0.336]

49 Let (x1 , y1 ), . . . , (xn , yn ) be a set of labelled points over Rd such that ||xi || ≤ 1 for all i. [sent-114, score-0.115]

50 Proof: The proof follows by a combination of [6] and Corollary 2, which states that the sensitivity of logistic regression is at most 2/(nλ). [sent-116, score-0.471]
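
As a concrete sketch of Algorithm 1 (assuming labels in {−1, +1} and ||xi|| ≤ 1; the function name and the use of scipy's L-BFGS solver are my own choices, not the paper's): train regularized logistic regression, then add a noise vector with a uniformly random direction and Gamma(d, 2/(nλε))-distributed norm, which matches a density proportional to exp(−nλε||η||/2) and the sensitivity bound 2/(nλ):

```python
import numpy as np
from scipy.optimize import minimize

def private_lr_sensitivity(X, y, lam, epsilon, rng=None):
    """Output perturbation (Algorithm 1 sketch) for regularized
    logistic regression; assumes ||x_i|| <= 1 and y_i in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape

    def objective(w):
        # (1/2) lam ||w||^2 + (1/n) sum_i log(1 + exp(-y_i w.x_i))
        margins = -y * (X @ w)
        return 0.5 * lam * (w @ w) + np.mean(np.logaddexp(0.0, margins))

    w_star = minimize(objective, np.zeros(d), method="L-BFGS-B").x
    # Noise with density prop. to exp(-n*lam*eps*||eta||/2):
    # uniform direction, Gamma(d, 2/(n*lam*eps)) norm.
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    norm = rng.gamma(shape=d, scale=2.0 / (n * lam * epsilon))
    return w_star + norm * direction
```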

51 If w1 = argminw G(w) and w2 = argminw G(w) + g(w), then, ||w1 − w2 || ≤ G1 . [sent-118, score-0.138]

52 Given examples x1 , . . . , xn with labels y1 , . . . , yn , such that for all i, ||xi || ≤ 1, the sensitivity of logistic regression with regularization parameter λ is at most 2/(nλ). [sent-128, score-0.613]

53 For a classifier w, we use L(w) to denote the expected loss of w over the data distribution, and L̂(w) to denote the empirical average loss of w over the training data. [sent-133, score-0.076]

54 The empirical loss is L̂(w) = (1/n) Σi log(1 + e−yi w·xi ). Further, for a classifier w, we use the notation fλ (w) to denote the quantity (1/2)λ||w||2 + L(w), and f̂λ (w) to denote the quantity (1/2)λ||w||2 + L̂(w). [sent-138, score-0.056]

55 Our guarantees on this algorithm can be summarized by the following lemma. [sent-139, score-0.064]

56 Lemma 3 Given a logistic regression problem with regularization parameter λ, let w1 be the classifier that minimizes f̂λ , and w2 be the classifier output by Algorithm 1, respectively. [sent-140, score-0.407]

57 Then, with probability 1 − δ over the randomness in the privacy mechanism, f̂λ (w2 ) ≤ f̂λ (w1 ) + 2d2 (1 + λ) log2 (d/δ)/(λ2 n2 ε2 ). [sent-141, score-0.856]

58 One question is, can we get a privacy-preserving approximation to logistic regression, which has better performance bounds for small λ? [sent-144, score-0.212]

59 4 A New Algorithm In this section, we provide a new privacy-preserving algorithm for logistic regression. [sent-146, score-0.242]

60 The input is a set of examples x1 , . . . , xn over Rd such that ||xi || ≤ 1 for all i, a set of labels y1 , . [sent-150, score-0.103]

61 , yn for the examples, a regularization constant λ and a privacy parameter ε, and the output is a vector w∗ in Rd . [sent-153, score-0.985]

62 Given labels y1 , . . . , yn and a regularization constant λ, we compute w∗ = argminw (1/2)λwT w + bT w/n + (1/n) Σi log(1 + e−yi wT xi ). [sent-165, score-0.274]

63 We observe that our method solves a convex programming problem very similar to the logistic regression convex program, and therefore it has running time similar to that of logistic regression. [sent-167, score-0.594]
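
One way to implement Algorithm 2 under the same assumptions (labels in {−1, +1}, ||xi|| ≤ 1) is sketched below; the function name, the Gamma-based sampling of ||b||, and the L-BFGS solver are implementation choices of mine. The vector b is drawn with density proportional to exp(−ε||b||/2), and the perturbed objective is then minimized exactly as in ordinary regularized logistic regression:

```python
import numpy as np
from scipy.optimize import minimize

def private_lr_objective(X, y, lam, epsilon, rng=None):
    """Objective perturbation (Algorithm 2 sketch): add a random
    linear term b.w/n to the regularized logistic loss and minimize."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # b with density prop. to exp(-eps*||b||/2):
    # uniform direction, Gamma(d, 2/eps) norm.
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    b = rng.gamma(shape=d, scale=2.0 / epsilon) * direction

    def objective(w):
        margins = -y * (X @ w)
        return (0.5 * lam * (w @ w) + (b @ w) / n
                + np.mean(np.logaddexp(0.0, margins)))

    return minimize(objective, np.zeros(d), method="L-BFGS-B").x
```

Note the perturbation is drawn once, before optimization, so the perturbed problem is still convex and solvable at the usual cost of logistic regression.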

64 In the sequel, we show that the output of Algorithm 2 is privacy preserving. [sent-168, score-0.843]

65 Given labelled examples (x1 , y1 ), . . . , (xn , yn ), where for each i, ||xi || ≤ 1, the output of Algorithm 2 preserves ε-differential privacy. [sent-175, score-0.185]

66 This uniqueness holds, because both the regularization function and the loss functions are differentiable everywhere. [sent-185, score-0.11]

67 Since w∗ is the value that minimizes both the optimization problems, the derivative of both optimization functions at w∗ is 0. [sent-187, score-0.08]

68 Pr[w∗ | x1 , . . . , xn−1 , y1 , . . . , yn−1 , xn = a, yn = y] / Pr[w∗ | x1 , . . . , xn−1 , y1 , . . . , yn−1 , xn = a′ , yn = y′ ] = h(b1 )/h(b2 ) = e−(ε/2)(||b1 ||−||b2 ||) , [sent-197, score-0.197]

69 where h(bi ) for i = 1, 2 is the density of bi . [sent-203, score-0.197]

70 We notice that the privacy guarantee for our algorithm does not depend on λ; in other words, for any value of λ, our algorithm is private. [sent-206, score-0.893]

71 On the other hand, as we show in Section 5, the performance of our algorithm does degrade with decreasing λ in the worst case, although the degradation is better than that of Algorithm 1 for λ < 1. [sent-207, score-0.079]

72 Our algorithm for privacy-preserving logistic regression can be generalized to provide privacy-preserving outputs for more general convex optimization problems, so long as the problems satisfy certain conditions. [sent-209, score-0.377]

73 Let {x1 , . . . , xn } be a database containing private data of individuals. [sent-214, score-0.31]

74 Suppose we would like to compute a vector w that minimizes the function F (w) = G(w) + Σi l(w, xi ) over w ∈ Rd for some d, such that all of the following hold: 1. [sent-215, score-0.063]

75 G(w) is strongly convex and l(w, xi ) are convex for all i 3. [sent-217, score-0.093]

76 Then, computing w∗ = argminw G(w) + Σi l(w, xi ) + bT w provides ε-differential privacy. [sent-220, score-0.1]
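
The generalized technique can be sketched as follows. Here `G` and each loss are supplied as callables, and it is the caller's responsibility to ensure the conditions above (strong convexity of G, convexity and bounded gradients of the losses) hold; the names and interface are illustrative, not from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def private_convex_minimizer(G, losses, d, epsilon, rng=None):
    """Objective perturbation for F(w) = G(w) + sum_i l(w, x_i).

    Draws b with density proportional to exp(-epsilon*||b||/2)
    (uniform direction, Gamma(d, 2/epsilon) norm) and then minimizes
    G(w) + sum_i l_i(w) + b.w.
    """
    rng = np.random.default_rng() if rng is None else rng
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    b = rng.gamma(shape=d, scale=2.0 / epsilon) * direction

    def objective(w):
        return G(w) + sum(l(w) for l in losses) + b @ w

    return minimize(objective, np.zeros(d), method="L-BFGS-B").x
```

Algorithm 2 is recovered by taking G(w) = (n/2)·λ||w||² scaled appropriately and l(w, xi) the logistic loss on example i.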

77 Lemma 4 Given a logistic regression problem with regularization parameter λ, let w1 be the classifier that minimizes f̂λ , and w2 be the classifier output by Algorithm 2, respectively. [sent-224, score-0.407]

78 Then, with probability 1 − δ over the randomness in the privacy mechanism, f̂λ (w2 ) ≤ f̂λ (w1 ) + 8d2 log2 (d/δ)/(λn2 ε2 ). [sent-225, score-0.838]

79 If the training examples are drawn independently from the data distribution, and if n > C max(||w0 ||2 /εg2 , d log(d/δ)||w0 ||/(εg ε)) for some constant C, then, with probability 1 − δ, the classifier output by Algorithm 2 has loss at most L + εg over the data distribution. [sent-230, score-0.073]

80 Proof: Let w∗ be the classifier that minimizes fλ (w) over the data distribution, and let w1 and w2 be the classifiers that minimize f̂λ (w) and f̂λ (w) + bT w/n, respectively. [sent-231, score-0.064]

81 Note that for problems in which (non-private) logistic regression performs well, ||w0 || ≥ 1 if w0 has low loss, since otherwise one can show that the loss of w0 would be lower bounded by log(1 + 1/e). [sent-243, score-0.33]

82 Thus the performance guarantee for Algorithm 2 is significantly stronger than for Algorithm 1, for problems in which one would typically apply logistic regression. [sent-244, score-0.255]

83 We include some simulations that compare the two privacy-preserving methods, and demonstrate that using our privacy-preserving approach to logistic regression does not severely degrade learning performance relative to standard logistic regression. [sent-261, score-0.548]

84 Performance degradation is inevitable however, as in both cases, in order to address privacy concerns, we are adding noise, either to the learned classifier, or to the objective. [sent-262, score-0.85]

85 In order to obtain a clean comparison between the various logistic regression variants, we first experimented with artificial data that is separable through the origin. [sent-263, score-0.334]

86 We used a uniform distribution on the hypersphere in 10 dimensions with zero mass within a small margin (0. [sent-265, score-0.095]

87 Then we experimented on uniform data that is not linearly separable. [sent-267, score-0.056]

88 In both simulations, our new method is superior to the sensitivity method, although it incurs more error than standard (non-private) logistic regression. [sent-274, score-0.437]

89 For both problems, we tuned the logistic regression parameter, λ, to minimize the test error of standard logistic regression, using five-fold cross-validation on a holdout set of 10,000 points (the tuned values are: λ = 0. [sent-275, score-0.504]

90 Not only does the new method reach a lower final error than the sensitivity method, but it also has better performance at most smaller training set sizes. [sent-365, score-0.207]

91 In order to observe the effect of the level of privacy on the learning performance of the privacy-preserving learning algorithms, in Figure 2c)-d) we vary ε, the privacy parameter to the two algorithms, on both the uniform, low-margin data and the unseparable data. [sent-366, score-1.795]

92 As per the definition of ε-differential privacy in Section 2, strengthening the privacy guarantee corresponds to reducing ε. [sent-367, score-1.641]

93 For the majority of values of ε that we tested, the new method is superior in managing the tradeoff between privacy and learning performance. [sent-369, score-0.878]

94 For very small ε, corresponding to extremely stringent privacy requirements, the sensitivity method performs better, but it also has a prediction accuracy close to chance, which is not useful for machine learning purposes. [sent-370, score-1.015]

95 7 Conclusion In conclusion, we show two ways to construct a privacy-preserving linear classifier through logistic regression. [sent-371, score-0.212]

96 Using the ε-differential privacy definition of Dwork et al. [sent-373, score-0.808]

97 We provide learning performance guarantees for the two algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. [sent-375, score-0.303]

98 In simulations, our new algorithm outperforms the method based on [6]. [sent-376, score-0.058]

99 Our work reveals an interesting connection between regularization and privacy: the larger the regularization constant, the less sensitive the logistic regression function is to any one individual example, and as a result, the less noise one needs to add to make it privacy-preserving. [sent-377, score-0.415]
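
The regularization-privacy connection can be illustrated numerically: with the sensitivity bound 2/(nλ), the Laplace noise scale for the sensitivity method is 2/(nλε), which shrinks as the regularization constant grows. The numbers below are illustrative, not from the paper's experiments:

```python
# Noise scale for the sensitivity method, 2/(n*lam*eps), as a
# function of the regularization constant lam (illustrative values).
n, eps = 10_000, 0.1
for lam in (0.01, 0.1, 1.0):
    scale = 2.0 / (n * lam * eps)
    print(f"lambda={lam:5.2f}  noise scale={scale:.4f}")
# lam=0.01 -> 0.2, lam=0.1 -> 0.02, lam=1.0 -> 0.002
```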

100 Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. [sent-486, score-0.808]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('privacy', 0.808), ('logistic', 0.212), ('sensitivity', 0.179), ('private', 0.174), ('dwork', 0.156), ('unseparable', 0.119), ('xn', 0.103), ('er', 0.101), ('yn', 0.094), ('adversary', 0.084), ('regression', 0.08), ('nissim', 0.079), ('roof', 0.069), ('argminw', 0.069), ('classi', 0.06), ('margin', 0.06), ('rd', 0.06), ('chaudhuri', 0.059), ('security', 0.056), ('avg', 0.056), ('lambda', 0.056), ('preserves', 0.056), ('mcsherry', 0.052), ('fold', 0.051), ('mechanism', 0.051), ('regularization', 0.048), ('epsilon', 0.048), ('attacks', 0.048), ('cryptography', 0.048), ('preserve', 0.046), ('lr', 0.04), ('eyw', 0.04), ('gehrke', 0.04), ('kamalika', 0.04), ('pods', 0.04), ('raskhodnikova', 0.04), ('sanitization', 0.04), ('sanitizer', 0.04), ('tighter', 0.039), ('loss', 0.038), ('lemma', 0.036), ('output', 0.035), ('uniform', 0.035), ('guarantees', 0.034), ('cross', 0.033), ('database', 0.033), ('individuals', 0.033), ('preserving', 0.033), ('minimizes', 0.032), ('nw', 0.032), ('separator', 0.032), ('convex', 0.031), ('xi', 0.031), ('pick', 0.03), ('algorithm', 0.03), ('randomness', 0.03), ('designing', 0.03), ('pr', 0.029), ('method', 0.028), ('knows', 0.028), ('quantity', 0.028), ('noise', 0.027), ('degrades', 0.027), ('learnability', 0.027), ('nition', 0.026), ('theorem', 0.026), ('degrade', 0.026), ('ey', 0.026), ('community', 0.025), ('guarantee', 0.025), ('ss', 0.025), ('randomized', 0.024), ('tradeoff', 0.024), ('optimization', 0.024), ('differentiable', 0.024), ('regularized', 0.023), ('stoc', 0.023), ('perturbed', 0.023), ('degradation', 0.023), ('norm', 0.023), ('inputs', 0.022), ('corollary', 0.022), ('triangle', 0.022), ('laplace', 0.022), ('labelled', 0.021), ('experimented', 0.021), ('deviation', 0.021), ('proportional', 0.021), ('separable', 0.021), ('surface', 0.02), ('pages', 0.02), ('algorithms', 0.02), ('curves', 0.019), ('learned', 0.019), ('technique', 0.018), ('superior', 0.018), ('apply', 0.018), ('simulations', 0.018), ('log', 0.018), 
('lecture', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 185 nips-2008-Privacy-preserving logistic regression

Author: Kamalika Chaudhuri, Claire Monteleoni

Abstract: This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [6] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [6]. We provide learning guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. Experiments demonstrate improved learning performance of our method, versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy. 1

2 0.085711531 138 nips-2008-Modeling human function learning with Gaussian processes

Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish

Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1

3 0.070371099 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

Author: Pierre Garrigues, Laurent E. Ghaoui

Abstract: It has been shown that the problem of 1 -penalized least-square regression commonly referred to as the Lasso or Basis Pursuit DeNoising leads to solutions that are sparse and therefore achieves model selection. We propose in this paper RecLasso, an algorithm to solve the Lasso with online (sequential) observations. We introduce an optimization problem that allows us to compute an homotopy from the current solution to the solution after observing a new data point. We compare our method to Lars and Coordinate Descent, and present an application to compressive sensing with sequential observations. Our approach can easily be extended to compute an homotopy from the current solution to the solution that corresponds to removing a data point, which leads to an efficient algorithm for leave-one-out cross-validation. We also propose an algorithm to automatically update the regularization parameter after observing a new data point. 1

4 0.069897667 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization

Author: Tong Zhang

Abstract: We study learning formulations with non-convex regularizaton that are natural for sparse linear models. There are two approaches to this problem: • Heuristic methods such as gradient descent that only find a local minimum. A drawback of this approach is the lack of theoretical guarantee showing that the local minimum gives a good solution. • Convex relaxation such as L1 -regularization that solves the problem under some conditions. However it often leads to sub-optimal sparsity in reality. This paper tries to remedy the above gap between theory and practice. In particular, we investigate a multi-stage convex relaxation scheme for solving problems with non-convex regularization. Theoretically, we analyze the behavior of a resulting two-stage relaxation scheme for the capped-L1 regularization. Our performance bound shows that the procedure is superior to the standard L1 convex relaxation for learning sparse targets. Experiments confirm the effectiveness of this method on some simulation and real data. 1

5 0.06483569 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost

Author: Hamed Masnadi-shirazi, Nuno Vasconcelos

Abstract: The machine learning problem of classifier design is studied from the perspective of probability elicitation, in statistics. This shows that the standard approach of proceeding from the specification of a loss, to the minimization of conditional risk is overly restrictive. It is shown that a better alternative is to start from the specification of a functional form for the minimum conditional risk, and derive the loss function. This has various consequences of practical interest, such as showing that 1) the widely adopted practice of relying on convex loss functions is unnecessary, and 2) many new losses can be derived for classification problems. These points are illustrated by the derivation of a new loss which is not convex, but does not compromise the computational tractability of classifier design, and is robust to the contamination of data with outliers. A new boosting algorithm, SavageBoost, is derived for the minimization of this loss. Experimental results show that it is indeed less sensitive to outliers than conventional methods, such as Ada, Real, or LogitBoost, and converges in fewer iterations. 1

6 0.059871864 239 nips-2008-Tighter Bounds for Structured Estimation

7 0.05945003 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

8 0.05919173 78 nips-2008-Exact Convex Confidence-Weighted Learning

9 0.055221397 62 nips-2008-Differentiable Sparse Coding

10 0.054577753 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising

11 0.052729134 196 nips-2008-Relative Margin Machines

12 0.051906697 202 nips-2008-Robust Regression and Lasso

13 0.049078606 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints

14 0.048879828 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions

15 0.048877299 228 nips-2008-Support Vector Machines with a Reject Option

16 0.046785522 99 nips-2008-High-dimensional support union recovery in multivariate regression

17 0.044720188 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers

18 0.043584924 214 nips-2008-Sparse Online Learning via Truncated Gradient

19 0.043181352 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

20 0.042966504 226 nips-2008-Supervised Dictionary Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.14), (1, -0.026), (2, -0.081), (3, 0.029), (4, -0.014), (5, 0.013), (6, -0.024), (7, 0.009), (8, -0.041), (9, 0.066), (10, 0.017), (11, 0.067), (12, -0.015), (13, -0.03), (14, -0.0), (15, -0.004), (16, -0.012), (17, -0.032), (18, 0.042), (19, 0.082), (20, -0.032), (21, -0.045), (22, 0.018), (23, 0.033), (24, 0.006), (25, 0.039), (26, 0.022), (27, -0.009), (28, -0.044), (29, -0.067), (30, 0.072), (31, -0.062), (32, -0.005), (33, -0.037), (34, -0.008), (35, -0.004), (36, -0.088), (37, 0.005), (38, -0.029), (39, -0.007), (40, -0.113), (41, 0.021), (42, 0.061), (43, 0.03), (44, 0.093), (45, 0.125), (46, -0.018), (47, -0.009), (48, -0.006), (49, 0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90049613 185 nips-2008-Privacy-preserving logistic regression

Author: Kamalika Chaudhuri, Claire Monteleoni

Abstract: This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [6] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [6]. We provide learning guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. Experiments demonstrate improved learning performance of our method, versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy. 1

2 0.6686933 228 nips-2008-Support Vector Machines with a Reject Option

Author: Yves Grandvalet, Alain Rakotomamonjy, Joseph Keshet, Stéphane Canu

Abstract: We consider the problem of binary classification where the classifier may abstain instead of classifying each observation. The Bayes decision rule for this setup, known as Chow’s rule, is defined by two thresholds on posterior probabilities. From simple desiderata, namely the consistency and the sparsity of the classifier, we derive the double hinge loss function that focuses on estimating conditional probabilities only in the vicinity of the threshold points of the optimal decision rule. We show that, for suitable kernel machines, our approach is universally consistent. We cast the problem of minimizing the double hinge loss as a quadratic program akin to the standard SVM optimization problem and propose an active set method to solve it efficiently. We finally provide preliminary experimental results illustrating the interest of our constructive approach to devising loss functions. 1

3 0.63387024 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost

Author: Hamed Masnadi-shirazi, Nuno Vasconcelos

Abstract: The machine learning problem of classifier design is studied from the perspective of probability elicitation, in statistics. This shows that the standard approach of proceeding from the specification of a loss, to the minimization of conditional risk is overly restrictive. It is shown that a better alternative is to start from the specification of a functional form for the minimum conditional risk, and derive the loss function. This has various consequences of practical interest, such as showing that 1) the widely adopted practice of relying on convex loss functions is unnecessary, and 2) many new losses can be derived for classification problems. These points are illustrated by the derivation of a new loss which is not convex, but does not compromise the computational tractability of classifier design, and is robust to the contamination of data with outliers. A new boosting algorithm, SavageBoost, is derived for the minimization of this loss. Experimental results show that it is indeed less sensitive to outliers than conventional methods, such as Ada, Real, or LogitBoost, and converges in fewer iterations. 1

4 0.60660374 163 nips-2008-On the Efficient Minimization of Classification Calibrated Surrogates

Author: Richard Nock, Frank Nielsen

Abstract: Bartlett et al (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties up the minimization of the surrogates and classification risks, and left as an important problem the algorithmic questions about the minimization of these surrogates. In this paper, we propose an algorithm which provably minimizes any classification calibrated surrogate strictly convex and differentiable — a set whose losses span the exponential, logistic and squared losses —, with boosting-type guaranteed convergence rates under a weak learning assumption. A particular subclass of these surrogates, that we call balanced convex surrogates, has a key rationale that ties it to maximum likelihood estimation, zerosum games and the set of losses that satisfy some of the most common requirements for losses in supervised learning. We report experiments on more than 50 readily available domains of 11 flavors of the algorithm, that shed light on new surrogates, and the potential of data dependent strategies to tune surrogates. 1

5 0.60192692 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

Author: Pierre Garrigues, Laurent E. Ghaoui

Abstract: It has been shown that the problem of ℓ1-penalized least-squares regression, commonly referred to as the Lasso or Basis Pursuit DeNoising, leads to solutions that are sparse and therefore achieve model selection. We propose in this paper RecLasso, an algorithm to solve the Lasso with online (sequential) observations. We introduce an optimization problem that allows us to compute an homotopy from the current solution to the solution after observing a new data point. We compare our method to Lars and Coordinate Descent, and present an application to compressive sensing with sequential observations. Our approach can easily be extended to compute an homotopy from the current solution to the solution that corresponds to removing a data point, which leads to an efficient algorithm for leave-one-out cross-validation. We also propose an algorithm to automatically update the regularization parameter after observing a new data point.
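This is not the RecLasso homotopy update itself, but a minimal pure-Python illustration of the sparsity/model-selection claim: with a single standardized feature, the ℓ1-penalized least-squares solution is the soft-thresholded least-squares estimate, so small coefficients are set exactly to zero (function names are ours, for illustration):

```python
def soft_threshold(z, lam):
    # proximal operator of lam * |x|: shrinks z toward 0 and clips to exactly 0
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# with one standardized feature, the Lasso coefficient is soft_threshold(ols, lam)
print(soft_threshold(0.8, 0.5))   # shrunk toward zero
print(soft_threshold(0.3, 0.5))   # exactly zero: the feature is dropped
```

The hard zero produced by the soft-thresholding operator is what makes ℓ1 penalization perform model selection, and it is also why adding or removing one observation can flip a coefficient's active/inactive status, which is the event the homotopy path above tracks.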

6 0.60081166 196 nips-2008-Relative Margin Machines

7 0.57162464 239 nips-2008-Tighter Bounds for Structured Estimation

8 0.55901998 78 nips-2008-Exact Convex Confidence-Weighted Learning

9 0.55164713 138 nips-2008-Modeling human function learning with Gaussian processes

10 0.51390135 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence

11 0.50951564 85 nips-2008-Fast Rates for Regularized Objectives

12 0.49774271 226 nips-2008-Supervised Dictionary Learning

13 0.49667162 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization

14 0.47434932 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization

15 0.47058624 62 nips-2008-Differentiable Sparse Coding

16 0.46693265 5 nips-2008-A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning

17 0.46334898 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

18 0.45732233 249 nips-2008-Variational Mixture of Gaussian Process Experts

19 0.44424084 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints

20 0.43381745 25 nips-2008-An interior-point stochastic approximation method and an L1-regularized delta rule


Similar papers computed by LDA model

LDA for this paper:

topicId topicWeight

[(6, 0.082), (7, 0.075), (12, 0.028), (28, 0.152), (52, 0.313), (57, 0.055), (59, 0.014), (63, 0.023), (64, 0.01), (71, 0.029), (77, 0.039), (78, 0.01), (83, 0.067)]

Similar papers list:

simIndex simValue paperId paperTitle

1 0.7665633 229 nips-2008-Syntactic Topic Models

Author: Jordan L. Boyd-graber, David M. Blei

Abstract: We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents.

same-paper 2 0.69567811 185 nips-2008-Privacy-preserving logistic regression

Author: Kamalika Chaudhuri, Claire Monteleoni

Abstract: This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [6] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [6]. We provide learning guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. Experiments demonstrate improved learning performance of our method, versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy.
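A minimal sketch of the sensitivity ("output perturbation") method the abstract describes: train ℓ2-regularized logistic regression, then add Laplace noise whose scale comes from the sensitivity bound 2/(n·λ·ε) used in this line of work. All names, the 1-d weight, and the toy data are our illustrative assumptions, not the authors' code, and the sensitivity bound assumes inputs with norm at most 1:

```python
import math, random

def train_logreg(data, lam, steps=2000, lr=0.1):
    # gradient descent on (1/n) * sum log(1 + exp(-y*w*x)) + (lam/2) * w^2
    w = 0.0
    n = len(data)
    for _ in range(steps):
        grad = lam * w
        for x, y in data:  # labels y in {-1, +1}
            grad += -y * x / (n * (1.0 + math.exp(y * w * x)))
        w -= lr * grad
    return w

def private_logreg(data, lam, eps, rng):
    # output perturbation: release w + Laplace noise scaled to the sensitivity
    n = len(data)
    sensitivity = 2.0 / (n * lam)          # bound assuming ||x|| <= 1
    scale = sensitivity / eps              # Laplace scale for eps-privacy
    u = rng.random() - 0.5                 # Laplace sample via inverse CDF
    noise = -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return train_logreg(data, lam) + noise

rng = random.Random(0)
data = [(0.9, 1), (0.7, 1), (-0.8, -1), (-0.6, -1)]
w_priv = private_logreg(data, lam=0.1, eps=1.0, rng=rng)
print(w_priv)
```

Note the tradeoff the paper studies: smaller λ or ε, or fewer records n, all inflate the noise scale, which is why the authors' alternative of perturbing the optimization problem itself can give tighter learning guarantees.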

3 0.68641174 196 nips-2008-Relative Margin Machines

Author: Tony Jebara, Pannagadatta K. Shivaswamy

Abstract: In classification problems, Support Vector Machines maximize the margin of separation between two classes. While the paradigm has been successful, the solution obtained by SVMs is dominated by the directions with large data spread and biased to separate the classes by cutting along large spread directions. This article proposes a novel formulation to overcome such sensitivity and maximizes the margin relative to the spread of the data. The proposed formulation can be efficiently solved and experiments on digit datasets show drastic performance improvements over SVMs.

4 0.56045556 62 nips-2008-Differentiable Sparse Coding

Author: J. A. Bagnell, David M. Bradley

Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a Laplacian (ℓ1) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance.
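This is not the paper's KL-regularization, but a toy 1-d contrast illustrating the stability point: the ℓ1 (Laplacian) MAP estimate is non-differentiable and snaps to exact zero, while a smoother penalty (log-cosh here, purely an illustrative stand-in) yields a MAP estimate that varies continuously with the data:

```python
import math

def map_l1(a, lam):
    # argmin over x of 0.5*(x - a)**2 + lam*|x|: soft threshold, exact zeros
    return math.copysign(max(abs(a) - lam, 0.0), a)

def map_smooth(a, lam, steps=5000, lr=0.05):
    # argmin over x of 0.5*(x - a)**2 + lam*log(cosh(x)), by gradient descent
    x = 0.0
    for _ in range(steps):
        x -= lr * ((x - a) + lam * math.tanh(x))
    return x

print(map_l1(0.3, 0.5))      # exactly 0: small perturbations of a leave it at 0
print(map_smooth(0.3, 0.5))  # small but nonzero: moves smoothly with a
```

The exact zeros of the ℓ1 MAP estimate are what make it unstable as an input to downstream prediction, and the smooth estimate is also what makes implicit differentiation through the MAP solution possible.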

5 0.56037235 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

Author: Francis R. Bach

Abstract: For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1-norm or the block ℓ1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

6 0.55653965 202 nips-2008-Robust Regression and Lasso

7 0.55645347 194 nips-2008-Regularized Learning with Networks of Features

8 0.55529743 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

9 0.5548442 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models

10 0.55410182 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

11 0.55361915 143 nips-2008-Multi-label Multiple Kernel Learning

12 0.55148953 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias

13 0.55009651 75 nips-2008-Estimating vector fields using sparse basis field expansions

14 0.54999584 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

15 0.54963511 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

16 0.54954803 226 nips-2008-Supervised Dictionary Learning

17 0.54949158 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

18 0.54873526 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

19 0.54857612 179 nips-2008-Phase transitions for high-dimensional joint support recovery

20 0.54800004 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations