jmlr jmlr2012 jmlr2012-94 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that near-optimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
Reference: text
sentIndex sentText sentNum sentScore
1 We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. [sent-22, score-0.386]
2 We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. [sent-24, score-0.501]
3 Our results demonstrate that near-optimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. [sent-25, score-0.271]
4 We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. [sent-26, score-0.336]
5 We consider how an adversary can systematically discover blind spots by querying a fixed or learning-based detector to find a low-cost (for some cost function) instance that the detector does not filter. [sent-38, score-0.401]
6 By observing the responses of the detector, the adversary can search for a modification while using as few queries as possible. [sent-42, score-0.507]
7 We also show that near-optimal evasion does not require reverse engineering the classifier’s decision boundary, which is the approach taken by Lowd and Meek (2005) for evading linear classifiers in a continuous domain. [sent-48, score-0.268]
8 We present algorithms for evasion that are near-optimal under weighted ℓ1 costs in Section 3, and we consider minimizing general ℓ p costs in Section 4. [sent-62, score-0.356]
9 We conclude the paper by discussing future directions for near-optimal evasion in Section 5. [sent-63, score-0.266]
10 • Even though our algorithms find solutions for a more general family of classifiers, they still use only polynomially-many queries in the dimension of the feature space and the accuracy of the desired approximation. [sent-72, score-0.279]
11 (e.g., Brent, 1973), which exhibit superlinear convergence under certain conditions on the query function; that is, the number of queries is inversely quadratic in the desired error tolerance. [sent-100, score-0.361]
12 We assume the feature space representation is known by the adversary and there are no restrictions on the adversary’s queries; that is, any point x in feature space X can be queried by the adversary to learn the classifier’s prediction at that point. [sent-120, score-0.328]
13 1 Adversarial Cost We assume the adversary has a notion of utility over the feature space, which we quantify with a cost function A : X → ℜ0+ (the non-negative reals); for example, for a spammer, this could be a string edit distance on email messages. [sent-135, score-0.267]
14 We focus on the general class of weighted ℓp (0 < p ≤ ∞) cost functions relative to the target x^A given by A_p^{(c)}(x − x^A) = ( ∑_{d=1}^{D} c_d |x_d − x_d^A|^p )^{1/p}, (1) where 0 < c_d < ∞ is the relative cost the adversary associates with altering the d-th feature. [sent-138, score-0.556]
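As a concrete illustration of Equation (1), here is a minimal NumPy sketch (our own; the function name and signature are not from the paper), taking the weighted ℓ∞ case as max_d c_d |x_d − x_d^A|:

```python
import numpy as np

def weighted_lp_cost(x, x_target, c, p):
    """Weighted l_p cost A_p^{(c)}(x - x_target) as in Equation (1).
    c : per-feature weights with 0 < c_d < inf;  p : norm order, 0 < p <= inf."""
    c = np.asarray(c, dtype=float)
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(x_target, dtype=float))
    if np.isinf(p):
        return float(np.max(c * diff))            # weighted l_inf convention
    return float(np.sum(c * diff ** p) ** (1.0 / p))
```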
15 Nevertheless, the objective of this paper is not to provide practical evasion algorithms but rather to understand the theoretic capabilities of an adversary on the analytically tractable, albeit practically restrictive, family of ℓ p costs. [sent-147, score-0.387]
16 Weighted ℓ1 costs are particularly appropriate for adversarial problems in which the adversary is interested in some features more than others and his cost is assessed based on the degree to which a feature is altered. [sent-148, score-0.381]
17 Lowd and Meek (2005) define the minimal adversarial cost (MAC) of a classifier f to be MAC(f, A) = inf_{x ∈ Xf−} A(x − x^A); that is, the greatest lower bound on the cost obtained by any negative instance. [sent-154, score-0.301]
18 Namely, if the adversary can determine whether an intermediate cost establishes a new upper or lower bound on the MAC, then binary search strategies can iteratively reduce the gap between the bounds C_t^- and C_t^+ with the fewest steps. [sent-166, score-0.465]
19 The t-th query is C_t^{(∗)} = √(C_t^- · C_t^+), the stopping criterion is G_t ≤ 1 + ε, and the search achieves ε-multiplicative optimality in L_ε^{(∗)} = ⌈ log_2( log_2 G_0^{(∗)} / log_2(1 + ε) ) ⌉ (4) steps. [sent-176, score-0.284]
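A hedged sketch of this binary search (our own code; `is_negative_at` is a hypothetical membership oracle that probes a point at a given cost along one fixed search direction), shrinking the multiplicative gap with the geometric midpoint until G_t ≤ 1 + ε:

```python
import math

def multiplicative_binary_search(is_negative_at, c_upper, c_lower, eps):
    """c_lower and c_upper are the current lower/upper bounds maintained by the
    search; is_negative_at(C) is True when the probe at cost C is classified
    negative (which tightens the upper bound)."""
    while c_upper / c_lower > 1.0 + eps:
        c_t = math.sqrt(c_upper * c_lower)        # geometric midpoint, as in the text
        if is_negative_at(c_t):
            c_upper = c_t                         # new upper bound
        else:
            c_lower = c_t                         # probe still positive: new lower bound
    return c_lower, c_upper

# The iteration count matches Equation (4):
# L_eps = ceil(log2(log2(c_upper / c_lower) / log2(1 + eps))).
```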
20 Additive optimality is not scale invariant (scale invariance would mean that any instance x† that satisfies the optimality criterion for cost A also satisfies it for A′(x) = s · A(x) for any s > 0), whereas multiplicative optimality is scale invariant. [sent-186, score-0.279]
21 Additive optimality is, however, invariant to constant shifts of the cost (i.e., any instance x† that satisfies the optimality criterion for cost A also satisfies it for A′(x) = s + A(x) for any s ≥ 0), whereas multiplicative optimality is not. [sent-189, score-0.279]
22 Scale invariance is more salient in near-optimal evasion because if the cost function is also scale invariant (all proper norms are) then the optimality condition is invariant to a rescaling of the underlying feature space; for example, a change in units for all features. [sent-190, score-0.349]
23 Finally, while we express query complexity in the sequel in terms of the multiplicative L_ε^{(∗)}, note that L_ε = Θ(log(1/ε)) and so in this way our query complexities can be rewritten to directly depend on ε. [sent-210, score-0.294]
24 using polynomially-many membership queries in D and Lε. [sent-225, score-0.313]
25 Unlike Lowd and Meek’s approach for continuous spaces, our algorithms construct queries to provably find an ε-IMAC without reverse engineering the classifier’s decision boundary; that is, estimating the decision surface of f or estimating the parameters that specify it. [sent-228, score-0.29]
26 Both approaches use queries to reduce the size of the version space F̂ ⊂ F; that is, the set of classifiers consistent with the adversary’s membership queries. [sent-232, score-0.313]
27 Nonetheless for certain cost functions A, it is easy to determine whether a particular cost ball B C (A) is completely contained within a convex set. [sent-259, score-0.334]
28 We present an efficient algorithm for optimizing (weighted) ℓ1 costs when Xf+ is convex and a polynomial randomized algorithm for optimizing any convex cost when Xf− is convex. [sent-265, score-0.354]
29 The existence of an efficient query algorithm relies on three facts: (1) xA ∈ Xf+ ; (2) every ℓ1 cost C-ball centered at xA intersects with Xf− only if at least one of its vertices is in Xf− ; and (3) C-balls of ℓ1 costs only have 2 · D vertices. [sent-279, score-0.343]
30 The factor C/c_d in Equation (5) normalizes for the weight c_d on the d-th feature. Lemma 3 For all C > 0, if there exists some x ∈ Xf− that achieves a cost of C = A_1^{(c)}(x − x^A), then there is some feature d such that a vertex of the form of Equation (5) is in Xf− (and also achieves cost C by Equation 1). [sent-284, score-0.378]
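For concreteness, a small sketch (ours) that enumerates the 2·D vertices referred to here, reading Equation (5) as the axis-aligned points x^A ± (C/c_d) e^(d), consistent with the normalization remark above:

```python
import numpy as np

def l1_ball_vertices(x_target, c, C):
    """Vertices of the weighted l1 C-ball around x_target: x_target +/- (C / c_d) e_d."""
    x_target = np.asarray(x_target, dtype=float)
    c = np.asarray(c, dtype=float)
    vertices = []
    for d in range(len(x_target)):
        e_d = np.zeros(len(x_target))
        e_d[d] = C / c[d]                 # dividing by c_d normalizes the weight
        vertices.extend([x_target + e_d, x_target - e_d])
    return vertices
```

By Lemma 3, querying these 2·D points at cost C is enough to decide whether the C-cost ℓ1 ball lies entirely inside Xf+ when Xf+ is convex.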
31 (a) Weighted ℓ1 balls are centered around the target xA and have 2 · D vertices; (b) Search directions in multi-line search radiate from xA to probe specific costs; (c) In general, we leverage convexity of the cost function when searching to evade. [sent-301, score-0.302]
32 By probing all search directions at a specific cost, the convex hull of the positive queries bounds the ℓ1 cost ball contained within it. [sent-302, score-0.69]
33 (i.e., the convex hull of these queries will either form an upper or lower bound on the MAC) to determine whether B^C(A) ⊂ Xf+. [sent-307, score-0.389]
34 Once a negative instance is found at cost C, we cease further queries at cost C since a single negative instance is sufficient to establish an upper bound. [sent-308, score-0.504]
35 Further, when an upper bound is established for a cost C (a negative vertex is found), our algorithm prunes all directions that were positive at cost C. [sent-310, score-0.392]
36 Finally, by performing a binary search on the cost, MULTILINESEARCH finds an ε-IMAC with no more than |W| · Lε queries but at least |W| + Lε queries. [sent-312, score-0.317]
37 In this case, the search issues at most 2 · D queries to determine whether B C (A1 ) is a subset of Xf+ and so Algorithm 2 is O (Lε · D). [sent-321, score-0.317]
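A hedged end-to-end sketch in the spirit of the multi-line search just described (our own rendering, not the paper's Algorithm 2; `query` is a hypothetical membership oracle returning True for positive points, `directions` are unit-cost directions from x_target, and the pruning bookkeeping is simplified):

```python
import math

def multi_line_search(query, x_target, directions, c_upper, c_lower, eps):
    """x_target and directions are NumPy arrays; 0 < c_lower < c_upper are
    initial bounds on the MAC.  Assumes x_target is positive and Xf+ is convex."""
    W = list(directions)
    best_negative = None
    while c_upper / c_lower > 1.0 + eps and W:
        c_t = math.sqrt(c_upper * c_lower)          # geometric midpoint
        positive_idx, found_negative = [], False
        for i, e in enumerate(W):
            if query(x_target + c_t * e):
                positive_idx.append(i)              # positive at cost c_t
            else:
                best_negative, c_upper = x_target + c_t * e, c_t
                found_negative = True
                break                               # cease queries at this cost
        if found_negative:
            # prune directions already seen positive at cost c_t; by convexity
            # of Xf+ they remain positive at all lower costs as well
            W = [e for i, e in enumerate(W) if i not in positive_idx]
        else:
            c_lower = c_t                           # all probed vertices positive
    return best_negative, c_lower, c_upper
```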
38 The number of search directions required to adequately bound an ℓ p cost ball for p > 1 can be exponential in D. [sent-333, score-0.354]
39 This strategy uses fewer queries to shrink the cost gap on symmetrically rounded bodies but is unable to do so for asymmetrically elongated bodies. [sent-340, score-0.377]
40 At each phase, the K-STEP MULTILINESEARCH (Algorithm 3) chooses a single direction e and queries it for K steps to generate candidate bounds B^- and B^+ on the MAC. [sent-342, score-0.289]
41 It then iteratively queries all remaining directions at the candidate lower bound B+ (a breadth-first strategy). [sent-344, score-0.36]
42 Lower Bound: Here we find a lower bound on the number of queries required by any algorithm to find an ε-IMAC when Xf+ is convex, for any convex cost function. [sent-355, score-0.523]
43 Theorem 5 For any D > 0, any positive convex function A : ℜ^D → ℜ^+, any initial bounds 0 < C_0^+ < C_0^- on the MAC, and 0 < ε < C_0^-/C_0^+ − 1, all algorithms must submit at least max{D, L_ε^{(∗)}} membership queries in the worst case to be ε-multiplicatively optimal on F^{convex,'+'}. [sent-360, score-0.406]
44 Because linear classifiers are a special case of convex-inducing classifiers, Algorithm 2 can be applied, and our K-STEP MULTILINESEARCH algorithm improves on the complexity of Lowd and Meek’s reverse-engineering technique’s O(Lε · D) queries and applies to a broader family of classifiers. [sent-376, score-0.279]
45 While Algorithm 2 has superior complexity, it uses 2 · D search directions rather than the D directions used in the approach of Lowd and Meek, which may require our technique to issue more queries in some practical settings. [sent-377, score-0.477]
46 For instance, given a set W of search directions, t queries {x^i}_{i=1}^t and their corresponding responses {y^i}_{i=1}^t, a search direction e can be eliminated from W if for all C_t^+ ≤ α < C_t^- there does not exist any classifier f ∈ F consistent with all previous queries. [sent-379, score-0.684]
47 That is, e is feasible if and only if it is the only search direction among the set of remaining search directions, W , that would be classified as a negative for a cost α by some consistent classifier. [sent-385, score-0.305]
48 Further, since subsequent queries only restrict the feasible space of α and the set of consistent classifiers F̂, pruning these infeasible directions is sound for the remainder of the search. [sent-386, score-0.322]
49 Such programs can be efficiently solved and may allow the adversary to rapidly eliminate infeasible search directions without issuing additional queries. [sent-396, score-0.319]
50 Extending MULTILINESEARCH Algorithms to Weights cd = ∞ or cd = 0: In Algorithm 2, we reweighted the d-th axis-aligned directions by a factor 1/c_d to make unit cost vectors by implicitly assuming cd ∈ (0, ∞). [sent-398, score-0.366]
51 This algorithm performs a halving search on the exponent along a single direction to find a positive example, then queries the remaining directions at this candidate bound. [sent-410, score-0.421]
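A simplified stand-in (ours) for this initialization: halve the probe cost along one direction until a positive example appears, yielding candidate bounds with a gap of 2 that the remaining directions are then checked against. The paper's algorithm searches over the exponent directly; this linear halving, and the `query` oracle, are only illustrative:

```python
def find_initial_bounds(query, x_target, e, c_start, max_halvings=64):
    """c_start is a cost whose probe along direction e is negative; halve the
    cost until a positive probe is found.  Returns (candidate_lower, candidate_upper)."""
    c_upper = c_start
    for _ in range(max_halvings):
        c = c_upper / 2.0
        if query(x_target + c * e):        # positive probe found: candidate lower bound
            return c, c_upper
        c_upper = c                        # still negative: smaller upper bound
    raise RuntimeError("no positive probe found within the halving budget")
```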
52 Further, in this algorithm, multiple directions are probed only during iterations with positive queries and it makes at most one positive query for each direction. [sent-417, score-0.441]
53 This precursor step requires at most |W| · T queries to initialize the MULTILINESEARCH algorithm with a gap such that Lε = (T − 1) + ⌈log_2(1 / log_2(1 + ε))⌉ according to Equation (4). [sent-432, score-0.274]
54 ε-IMAC Learning for a Convex Xf−: Here, we minimize a convex cost function A with bounded cost balls (we focus on weighted ℓ1 costs in Equation 1) when the feasible set Xf− is convex. [sent-436, score-0.405]
55 Then given access to an oracle returning separating hyperplanes for the A cost balls, Algorithm 7 will find an ε-IMAC using O*(D^5) queries with high probability. [sent-447, score-0.391]
56 Though Xf− may be unbounded, we are minimizing a cost with bounded cost balls, so we can instead use the set P^0 = Xf− ∩ B^{2R}(A_1; x^-) (where R = A(x^- − x^A) > C_t), which is a convex bounded subset of Xf−. [sent-472, score-0.276]
57 For any point y ∉ B^C(A_1), the (sub)gradient of the ℓ1 cost is given by h_d^y = c_d · sign(y_d − x_d^A), (6) and is a separating hyperplane for y and B^C(A_1). [sent-498, score-0.282]
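A small sketch (ours) of the cut implied by Equation (6): the subgradient at a point y outside the cost ball yields a halfspace that keeps the whole C-cost ball while discarding y.

```python
import numpy as np

def l1_cut(y, x_target, c):
    """Subgradient h_d = c_d * sign(y_d - x_target_d) of the weighted l1 cost at y.
    Every x with A_1(x - x_target) <= C < A_1(y - x_target) satisfies h @ x < h @ y,
    so (h, h @ y) defines a separating halfspace cut."""
    y = np.asarray(y, dtype=float)
    h = np.asarray(c, dtype=float) * np.sign(y - np.asarray(x_target, dtype=float))
    return h, float(h @ y)
```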
58 Because every iteration in Algorithm 5 requires N = O*(D) samples, each of which needs K = O*(D^3) random walk steps, and there are T = O*(D) iterations, the total number of membership queries required by Algorithm 5 is O*(D^5). [sent-509, score-0.338]
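A high-level sketch (our own abstraction, not the paper's Algorithm 5 or 7) of one such phase, reusing the weighted_lp_cost and l1_cut helpers sketched earlier: draw samples from the current body (in the paper these come from hit-and-run random walks, which is where the membership queries are spent), accept any sample or centroid that is already cheap enough, and otherwise cut the body through the centroid with the Equation (6) hyperplane. The `sample_body` and `cut_body` callbacks are assumptions of this sketch.

```python
import numpy as np

def intersect_search_phase(sample_body, cut_body, x_target, c, C, T):
    """Decide whether the current convex body (maintained inside the negative
    set, so samples and their centroid are negative instances) contains a point
    of weighted l1 cost <= C.  sample_body() returns roughly-uniform samples;
    cut_body(h, b) replaces the body by its intersection with {x : h @ x < b}."""
    for _ in range(T):
        samples = sample_body()
        cheap = [x for x in samples if weighted_lp_cost(x, x_target, c, 1) <= C]
        if cheap:
            return cheap[0]                      # negative instance of cost <= C
        z = np.mean(samples, axis=0)             # centroid of the samples
        if weighted_lp_cost(z, x_target, c, 1) <= C:
            return z                             # the centroid itself is cheap enough
        h, b = l1_cut(z, x_target, c)            # cut keeps the C-cost ball side
        cut_body(h, b)
    return None                                  # treated as "no intersection" at cost C
```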
59 This algorithm, the ROUNDING algorithm as described by Lovász and Vempala (2003), uses O*(D^4) membership queries to find a transformation that places P^0 into a near-isotropic position and produces an initial set of samples from it. [sent-531, score-0.313]
60 That is, we show that finding an ε-IMAC for this family can require exponentially many queries in D and Lε. [sent-553, score-0.279]
61 Here we build on the techniques of Section 3.1 to find solutions to the near-optimal evasion problem for ℓ p cost functions with p ≠ 1. [sent-557, score-0.289]
62 Figure 8 demonstrates how queries can be used to construct upper and lower bounds on general ℓ p costs. [sent-559, score-0.265]
63 Lemma 7 The largest ℓ p (p > 1) ball enclosed within a C-cost ℓ1 ball has a cost of C · D^{(1−p)/p}; for p = ∞ the cost is C · D^{−1}. [sent-561, score-0.322]
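A quick numeric sanity check (ours) of this bound; the tight point is the one with all coordinates equal:

```python
import numpy as np

def largest_lp_in_l1(C, D, p):
    """Cost of the largest l_p ball (p > 1) enclosed in a C-cost l_1 ball in D
    dimensions, per Lemma 7 as read above: C * D**((1 - p) / p); C / D for p = inf."""
    return C / D if np.isinf(p) else C * D ** ((1.0 - p) / p)

# The all-equal-coordinates point x with ||x||_p = r has ||x||_1 = r * D**((p-1)/p),
# which equals C exactly when r = largest_lp_in_l1(C, D, p).
C, D, p = 1.0, 5, 2.0
r = largest_lp_in_l1(C, D, p)
x = np.full(D, r * D ** (-1.0 / p))
assert np.isclose(np.abs(x).sum(), C)
```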
64 The lower bound provided by those M positive points is the cost of the largest ℓ p cost ball that fits entirely within their convex hull; let’s say this cost is C^† ≤ C_0^+. [sent-565, score-0.475]
65 The first ratio C_0^-/C_0^+ is controlled solely by the accuracy ε achieved by running the multiline search algorithm for Lε steps whereas the second ratio C_0^+/C^† depends only on how well the ℓ p ball is approximated by the convex hull of the M search directions. [sent-568, score-0.346]
66 Each row represents a unique set of positive (red '+' points) and negative (black '∗' points) queries and each column shows the implied upper bound (the green dashed ball) and lower bound (the solid blue ball) for a different ℓ p cost. [sent-584, score-0.346]
67 In the first row, the body is defined by a random set of seven queries, in the second, the queries are along the coordinate axes, and in the third, the queries are around a circle. [sent-585, score-0.512]
68 Moreover, if the M search directions yield C_0^+/C^† = 1 for the cost function A, then MULTILINESEARCH algorithms can achieve ε-multiplicative optimality with a query complexity that is polynomial in M and L_ε^{(∗)} for any ε > 0. [sent-590, score-0.437]
69 In fact, for some fixed values of ε, there is no query-based strategy that can bound ℓ p costs using polynomially-many queries in D as the following result shows. [sent-606, score-0.365]
70 MULTILINE SEARCH FOR p = 2: Theorem 11 For any D > 1, any initial bounds 0 < C_0^+ < C_0^- on the MAC, and 0 < ε < C_0^-/C_0^+ − 1, all algorithms must submit at least α_ε^{(D−2)/2} membership queries (where α_ε = (1 + ε)^2 / ((1 + ε)^2 − 1)) in the worst case to be ε-multiplicatively optimal on F^{convex,'+'} for ℓ2 costs. [sent-614, score-0.336]
71 This result says that there is no algorithm that can generally achieve ε-multiplicative optimality for ℓ2 costs for any fixed ε > 0 using only polynomially-many queries in D since the ratio C_0^-/C_0^+ could be arbitrarily large. [sent-616, score-0.387]
72 Interestingly, substituting this lower bound on ε into the bound given by Theorem 11, we get that the number of required queries for ε > √D − 1 need only be at least ((1 + ε)^2 / ((1 + ε)^2 − 1))^{(D−2)/2} = (D/(D−1))^{(D−2)/2}, which is a monotonically increasing function in D that asymptotes at √e ≈ 1.6487. [sent-619, score-0.318]
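A short check (ours) of the limiting value quoted here:

```python
import math

# For eps = sqrt(D) - 1 the bound becomes (D/(D-1))**((D-2)/2):
for D in (3, 10, 100, 10_000):
    print(D, (D / (D - 1)) ** ((D - 2) / 2))
# ~1.22, ~1.52, ~1.64, ~1.649, approaching sqrt(e):
print("sqrt(e) =", math.exp(0.5))
```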
73 They are given by the equivalent (sub)-gradients for ℓ p cost balls: h^y_{p,d} = c_d · sign(y_d − x_d^A) · ( |y_d − x_d^A| / A_p^{(c)}(y − x^A) )^{p−1}, and h^y_{∞,d} = c_d · sign(y_d − x_d^A) · I[ c_d |y_d − x_d^A| = A_∞^{(c)}(y − x^A) ]. [sent-625, score-0.313]
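The same cut in code form (ours), covering both finite p and the weighted ℓ∞ case as read above; `lp_cut` is our own name for this helper:

```python
import numpy as np

def lp_cut(y, x_target, c, p):
    """Separating-hyperplane normal for a weighted l_p cost ball at a point y
    outside the ball, following the (sub)gradients quoted above."""
    c = np.asarray(c, dtype=float)
    z = np.asarray(y, dtype=float) - np.asarray(x_target, dtype=float)
    if np.isinf(p):
        a_inf = np.max(c * np.abs(z))
        # subgradient of max_d c_d|z_d|: nonzero only on (near-)maximizing coordinates
        return c * np.sign(z) * np.isclose(c * np.abs(z), a_inf)
    a_p = float(np.sum(c * np.abs(z) ** p) ** (1.0 / p))
    return c * np.sign(z) * (np.abs(z) / a_p) ** (p - 1)
```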
74 By only changing the cost function A and the separating hyperplane hy used for the halfspace cut in Algorithms 5 and 7, the randomized ellipsoid method can also be applied for any weighted ℓ p cost A_p^{(c)} with p > 1. [sent-626, score-0.42]
75 For more general convex costs A, we still have that every C-cost ball is a convex set (i.e., every sublevel set of a convex function is convex). [sent-627, score-0.283]
76 Further, since for any D > C, B C (A) ⊂ B D (A), the separating hyperplane of the D-cost ball is also a separating hyperplane of the C cost ball and can be re-used in our Algorithm 7. [sent-630, score-0.421]
77 Thus, this procedure is applicable for any convex cost function, A, so long as we can compute the separating hyperplanes of any cost ball of A for any point y not in the cost ball. [sent-631, score-0.483]
78 We present membership query algorithms that efficiently accomplish ε-IMAC search on this family. [sent-636, score-0.265]
79 If the adversary is unaware of which set is convex, they can trivially run both searches to discover an ε-IMAC with a combined polynomial query complexity. [sent-639, score-0.283]
80 By doing so, these algorithms only require polynomially-many queries in spite of the size of the family of all convex-inducing classifiers. [sent-643, score-0.279]
81 Exploring near-optimal evasion is important for understanding how an adversary may circumvent learners in security-sensitive settings. [sent-651, score-0.35]
82 The algorithms we present are invaluable tools not for an adversary to develop better attacks but rather for analysts to better understand the vulnerabilities of their filters: our framework provides the query complexity in the worst-case setting when an adversary can directly query the classifier. [sent-653, score-0.566]
83 However, our analysis and algorithms do not completely answer the evasion problem and also generally cannot be easily used by an adversary since there are several real-world obstacles that are not incorporated into our framework. [sent-654, score-0.35]
84 Most importantly, an adversary may not be able to query all x ∈ X ; instead their queries must be legitimate objects (such as email) that are mapped into X . [sent-656, score-0.525]
85 Can an adversary efficiently perform ε-IMAC search when his cost is defined in an alternate feature space to the classifier’s? [sent-660, score-0.342]
86 Query Complexity for the K-STEP MULTILINESEARCH Algorithm: We consider the evasion problem as a game between the classifier (playing first) and the adversary (playing second), who wishes to evade detection by the classifier. [sent-672, score-0.35]
87 To analyze the worst-case query complexity of K-STEP MULTILINESEARCH (Algorithm 3), we consider a worst-case classifier that seeks to maximize the number of queries submitted by the adversary. [sent-673, score-0.361]
88 The primary decision for the worst-case classifier occurs when the adversary begins querying other directions besides e. [sent-681, score-0.275]
89 We conservatively assume that the gap only decreases for case 1, which decouples the analysis of the queries for C1 and C2 and allows us to upper bound the total number of queries. [sent-687, score-0.312]
90 At every iteration in case one, the adversary makes exactly K + |W_t| − 1 queries, where W_t is the set of feasible directions remaining at the t-th iteration. [sent-690, score-0.516]
91 Proof of Theorem 5 Suppose a query-based algorithm submits N < D + 1 membership queries x^1, . . . , x^N. [sent-707, score-0.336]
92 Suppose that the responses to the queries are consistent with the classifier f defined as: f(x) = +1 if A(x − x^A) < C_0^-, and f(x) = −1 otherwise. [sent-712, score-0.268]
93 Since G contains all positive queries and C_0^+ < C_0^-, the convex set X_g^+ is consistent with the observed responses: B^{C_0^+}(A) ⊂ X_g^+ by definition, and B^{C_0^-}(A) ⊄ X_g^+ since the positive queries are all inside the open C_0^--sublevel set. [sent-727, score-0.554]
94 (…[−∞, 0] if a_i = −1.) Proof of Theorem 10 Suppose a query-based algorithm submits N membership queries x^1, . . . , x^N. [sent-784, score-0.336]
95 By intersecting each data point’s orthant with the set Xf+ and taking the convex hull of these regions, G is convex, contains x^A, and is a subset of Xf+ consistent with all the query responses of f; that is, each of the M positive queries is in X_g^+ and all the negative queries are in X_g^-. [sent-803, score-0.889]
96 We refer to the orthants used in G to cover the M positive queries as covering orthants and their corresponding vertices form a covering of the hypercube. [sent-808, score-0.436]
97 Since the displacement d defined above is greater than 0, by applying Lemma 13, this separating hyperplane upper bounds the cost of the largest ℓ p ball enclosed in G as MAC(g, A_p) ≤ C_0^- (D − K)^{(p−1)/p} · ‖w‖_{p/(p−1)}^{−1} = C_0^- ((D − K)/D)^{(p−1)/p} for 1 < p < ∞, and C_0^- (D − K)/D for p = ∞. [sent-835, score-0.285]
98 Γ((D+1)/2) Γ(D/2) / (Γ(1 + D/2) Γ((D−1)/2)) = (D−1)/D. Proof of Theorem 11 Suppose a query-based algorithm submits N membership queries x^1, . . . , x^N. [sent-864, score-0.336]
99 Now consider the projection of each of the positive queries onto the surface of the ℓ2 ball B^{C_0^-}(A_2), given by the points z^i = C_0^- (x^i − x^A) / A_2(x^i − x^A). [sent-883, score-0.3]
100 On the power of membership queries in agnostic learning. [sent-934, score-0.313]
wordName wordTfidf (topN-words)
[('xf', 0.495), ('xa', 0.374), ('queries', 0.242), ('earch', 0.225), ('evasion', 0.186), ('ct', 0.184), ('adversary', 0.164), ('mac', 0.161), ('ulti', 0.133), ('query', 0.119), ('elson', 0.115), ('oseph', 0.115), ('ygar', 0.115), ('nducing', 0.109), ('trategies', 0.109), ('uery', 0.109), ('vading', 0.109), ('ine', 0.106), ('rubinstein', 0.106), ('cost', 0.103), ('lowd', 0.103), ('ao', 0.098), ('lassifiers', 0.093), ('costs', 0.085), ('onvex', 0.082), ('directions', 0.08), ('uang', 0.076), ('search', 0.075), ('xg', 0.075), ('membership', 0.071), ('convex', 0.07), ('ers', 0.067), ('meek', 0.065), ('bertsimas', 0.063), ('optimality', 0.06), ('classi', 0.06), ('ball', 0.058), ('multiplicative', 0.056), ('hyperplane', 0.055), ('ee', 0.055), ('gt', 0.054), ('orthant', 0.053), ('ellipsoid', 0.053), ('er', 0.052), ('cd', 0.051), ('reverse', 0.048), ('separating', 0.046), ('ntersect', 0.046), ('vempala', 0.045), ('orthants', 0.044), ('balls', 0.044), ('intrusion', 0.042), ('caps', 0.04), ('vertex', 0.04), ('hull', 0.039), ('bound', 0.038), ('hypersphere', 0.038), ('family', 0.037), ('nelson', 0.037), ('vertices', 0.036), ('covering', 0.035), ('blaine', 0.034), ('orth', 0.034), ('piral', 0.034), ('halfspace', 0.034), ('fe', 0.034), ('evading', 0.034), ('spam', 0.032), ('gap', 0.032), ('querying', 0.031), ('hz', 0.031), ('th', 0.03), ('spherical', 0.03), ('intersection', 0.03), ('hypercube', 0.029), ('adversarial', 0.029), ('acre', 0.029), ('multiline', 0.029), ('additive', 0.028), ('negative', 0.028), ('body', 0.028), ('rounding', 0.027), ('centroid', 0.027), ('responds', 0.027), ('xd', 0.027), ('randomized', 0.026), ('hamming', 0.026), ('responses', 0.026), ('walk', 0.025), ('cap', 0.025), ('prune', 0.025), ('direction', 0.024), ('ap', 0.024), ('sin', 0.023), ('bounds', 0.023), ('edi', 0.023), ('searchability', 0.023), ('searchable', 0.023), ('submits', 0.023), ('wyner', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000023 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
2 0.32717669 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
Author: Ji Liu, Peter Wonka, Jieping Ye
Abstract: We consider the following sparse signal recovery (or feature selection) problem: given a design matrix X ∈ Rn×m (m ≫ n) and a noisy observation vector y ∈ Rn satisfying y = Xβ∗ + ε where ε is the noise vector following a Gaussian distribution N(0, σ2 I), how to recover the signal (or parameter vector) β∗ when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal β∗ . We show that if X obeys a certain condition, then with a large probability ˆ the difference between the solution β estimated by the proposed method and the true solution β∗ measured in terms of the ℓ p norm (p ≥ 1) is bounded as ˆ β − β∗ p ≤ C(s − N)1/p log m + ∆ σ, where C is a constant, s is the number of nonzero entries in β∗ , the risk of the oracle estimator ∆ is independent of m and is much smaller than the first term, and N is the number of entries of β∗ larger √ than a certain value in the order of O (σ log m). The proposed √ method improves the estimation √ bound of the standard Dantzig selector approximately from Cs1/p log mσ to C(s − N)1/p log mσ where the value N depends on the number of large entries in β∗ . When N = s, the proposed algorithm achieves the oracle solution with a high probability, where the oracle solution is the projection of the observation vector y onto true features. In addition, with a large probability, the proposed method can select the same number of correct features under a milder condition than the Dantzig selector. Finally, we extend this multi-stage procedure to the LASSO case. Keywords: multi-stage, Dantzig selector, LASSO, sparse signal recovery
3 0.11389235 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
Author: Ofer Dekel, Claudio Gentile, Karthik Sridharan
Abstract: We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. Keywords: online learning, regret, label-efficient, crowdsourcing
4 0.10595021 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
5 0.070249639 3 jmlr-2012-A Geometric Approach to Sample Compression
Author: Benjamin I.P. Rubinstein, J. Hyam Rubinstein
Abstract: The Sample Compression Conjecture of Littlestone & Warmuth has remained unsolved for a quarter century. While maximum classes (concept classes meeting Sauer’s Lemma with equality) can be compressed, the compression of general concept classes reduces to compressing maximal classes (classes that cannot be expanded without increasing VC dimension). Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes. Simple arrangements of hyperplanes in hyperbolic space are shown to represent maximum classes, generalizing the corresponding Euclidean result. We show that sweeping a generic hyperplane across such arrangements forms an unlabeled compression scheme of size VC dimension and corresponds to a special case of peeling the one-inclusion graph, resolving a recent conjecture of Kuzmin & Warmuth. A bijection between finite maximum classes and certain arrangements of piecewise-linear (PL) hyperplanes in either a ball or Euclidean space is established. Finally we show that d-maximum classes corresponding to PL-hyperplane arrangements in Rd have cubical complexes homeomorphic to a d-ball, or equivalently complexes that are manifolds with boundary. A main result is that PL arrangements can be swept by a moving hyperplane to unlabeled d-compress any finite maximum class, forming a peeling scheme as conjectured by Kuzmin & Warmuth. A corollary is that some d-maximal classes cannot be embedded into any maximum class of VC-dimension d + k, for any constant k. The construction of the PL sweeping involves Pachner moves on the one-inclusion graph, corresponding to moves of a hyperplane across the intersection of d other hyperplanes. This extends the well known Pachner moves for triangulations to c
6 0.063200802 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
7 0.062978528 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity
8 0.062504634 12 jmlr-2012-Active Clustering of Biological Sequences
9 0.05916959 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
10 0.05657132 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
11 0.053083584 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
12 0.052035004 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package
13 0.050613903 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
14 0.047816053 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
15 0.047211889 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
16 0.045877397 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs
17 0.04458984 80 jmlr-2012-On Ranking and Generalization Bounds
18 0.043197114 13 jmlr-2012-Active Learning via Perfect Selective Classification
19 0.043010384 84 jmlr-2012-Online Submodular Minimization
20 0.040640805 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
topicId topicWeight
[(0, -0.202), (1, -0.046), (2, -0.068), (3, -0.027), (4, -0.184), (5, 0.254), (6, 0.244), (7, -0.206), (8, 0.095), (9, 0.391), (10, -0.264), (11, -0.112), (12, -0.081), (13, 0.204), (14, -0.011), (15, 0.174), (16, -0.031), (17, -0.102), (18, 0.126), (19, 0.029), (20, 0.07), (21, -0.045), (22, -0.061), (23, 0.014), (24, 0.021), (25, 0.025), (26, -0.042), (27, -0.114), (28, 0.007), (29, 0.03), (30, -0.027), (31, 0.001), (32, -0.092), (33, 0.002), (34, -0.026), (35, -0.038), (36, -0.04), (37, 0.031), (38, -0.048), (39, 0.004), (40, 0.024), (41, -0.019), (42, -0.024), (43, 0.002), (44, -0.029), (45, -0.002), (46, -0.02), (47, 0.08), (48, 0.101), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.93741781 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
2 0.74476373 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
Author: Ji Liu, Peter Wonka, Jieping Ye
Abstract: We consider the following sparse signal recovery (or feature selection) problem: given a design matrix X ∈ Rn×m (m ≫ n) and a noisy observation vector y ∈ Rn satisfying y = Xβ∗ + ε where ε is the noise vector following a Gaussian distribution N(0, σ2 I), how to recover the signal (or parameter vector) β∗ when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal β∗ . We show that if X obeys a certain condition, then with a large probability ˆ the difference between the solution β estimated by the proposed method and the true solution β∗ measured in terms of the ℓ p norm (p ≥ 1) is bounded as ˆ β − β∗ p ≤ C(s − N)1/p log m + ∆ σ, where C is a constant, s is the number of nonzero entries in β∗ , the risk of the oracle estimator ∆ is independent of m and is much smaller than the first term, and N is the number of entries of β∗ larger √ than a certain value in the order of O (σ log m). The proposed √ method improves the estimation √ bound of the standard Dantzig selector approximately from Cs1/p log mσ to C(s − N)1/p log mσ where the value N depends on the number of large entries in β∗ . When N = s, the proposed algorithm achieves the oracle solution with a high probability, where the oracle solution is the projection of the observation vector y onto true features. In addition, with a large probability, the proposed method can select the same number of correct features under a milder condition than the Dantzig selector. Finally, we extend this multi-stage procedure to the LASSO case. Keywords: multi-stage, Dantzig selector, LASSO, sparse signal recovery
3 0.39501622 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
4 0.31632969 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
Author: Ofer Dekel, Claudio Gentile, Karthik Sridharan
Abstract: We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. Keywords: online learning, regret, label-efficient, crowdsourcing
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them given pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(ε−6 n log5 n) preference labels for a regret of ε times the optimal loss. As a function of n, this is asymptotically better than standard (non-adaptive) learning bounds achievable for the same problem. Our main result takes us a step closer toward settling an open problem posed by learning-torank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? To further show the power and practicality of our solution, we analyze a typical test case in which a large margin linear relaxation is used for efficiently solving the simpler learning problems in our decomposition. Keywords: statistical learning theory, active learning, ranking, pairwise ranking, preferences
6 0.23824987 12 jmlr-2012-Active Clustering of Biological Sequences
7 0.23174708 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
8 0.23132172 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
9 0.22100103 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
10 0.20737503 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs
11 0.20715357 3 jmlr-2012-A Geometric Approach to Sample Compression
12 0.20039637 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
13 0.18722898 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
14 0.18042271 13 jmlr-2012-Active Learning via Perfect Selective Classification
15 0.16545045 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
16 0.15779835 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
17 0.15627724 91 jmlr-2012-Plug-in Approach to Active Learning
18 0.15489483 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
19 0.15222232 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
20 0.14938143 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
topicId topicWeight
[(21, 0.034), (26, 0.028), (27, 0.028), (29, 0.029), (34, 0.015), (35, 0.035), (49, 0.017), (56, 0.013), (64, 0.445), (69, 0.014), (75, 0.059), (77, 0.011), (79, 0.033), (92, 0.066), (96, 0.078)]
simIndex simValue paperId paperTitle
1 0.93893933 107 jmlr-2012-Smoothing Multivariate Performance Measures
Author: Xinhua Zhang, Ankan Saha, S.V.N. Vishwanathan
Abstract: Optimizing multivariate performance measure is an important task in Machine Learning. Joachims (2005) introduced a Support Vector Method whose underlying optimization problem is commonly solved by cutting plane methods (CPMs) such as SVM-Perf and BMRM. It can be shown that CPMs 1 converge to an ε accurate solution in O λε iterations, where λ is the trade-off parameter between the regularizer and the loss function. Motivated by the impressive convergence rate of CPM on a number of practical problems, it was conjectured that these rates can be further improved. We disprove this conjecture in this paper by constructing counter examples. However, surprisingly, we further discover that these problems are not inherently hard, and we develop a novel smoothing strategy, which in conjunction with Nesterov’s accelerated gradient method, can find an ε accuiterations. Computationally, our smoothing technique is also rate solution in O∗ min 1 , √1 ε λε particularly advantageous for optimizing multivariate performance scores such as precision/recall break-even point and ROCArea; the cost per iteration remains the same as that of CPMs. Empirical evaluation on some of the largest publicly available data sets shows that our method converges significantly faster than CPMs without sacrificing generalization ability. Keywords: non-smooth optimization, max-margin methods, multivariate performance measures, Support Vector Machines, smoothing
same-paper 2 0.76963419 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
3 0.69891298 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
Author: Stephen Gould
Abstract: We present an open-source platform-independent C++ framework for machine learning and computer vision research. The framework includes a wide range of standard machine learning and graphical models algorithms as well as reference implementations for many machine learning and computer vision applications. The framework contains Matlab wrappers for core components of the library and an experimental graphical user interface for developing and visualizing machine learning data flows. Keywords: machine learning, graphical models, computer vision, open-source software
4 0.68561977 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
5 0.38764304 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
6 0.32535559 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
7 0.32416639 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
8 0.32415378 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
9 0.31925505 84 jmlr-2012-Online Submodular Minimization
10 0.31777763 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
11 0.31697592 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
12 0.3147296 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
13 0.31253424 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
14 0.31000999 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
15 0.30980012 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
16 0.30803329 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
17 0.30577073 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss
18 0.30502644 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
20 0.30012792 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis