jmlr jmlr2012 jmlr2012-94 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that near-optimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
Reference: text
sentIndex sentText sentNum sentScore
1 We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. [sent-22, score-0.386]
2 We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. [sent-24, score-0.501]
3 Our results demonstrate that near-optimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. [sent-25, score-0.271]
4 We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. [sent-26, score-0.336]
5 We consider how an adversary can systematically discover blind spots by querying a fixed or learning-based detector to find a low-cost (for some cost function) instance that the detector does not filter. [sent-38, score-0.401]
6 By observing the responses of the detector, the adversary can search for a modification while using as few queries as possible. [sent-42, score-0.507]
7 We also show that near-optimal evasion does not require reverse engineering the classifier’s decision boundary, which is the approach taken by Lowd and Meek (2005) for evading linear classifiers in a continuous domain. [sent-48, score-0.268]
8 We present algorithms for evasion that are near-optimal under weighted ℓ1 costs in Section 3, and we consider minimizing general ℓ p costs in Section 4. [sent-62, score-0.356]
9 We conclude the paper by discussing future directions for near-optimal evasion in Section 5. [sent-63, score-0.266]
10 • Even though our algorithms find solutions for a more general family of classifiers, they still use only polynomially-many queries in the dimension of the feature space and the accuracy of the desired approximation. [sent-72, score-0.279]
11 (e.g., Brent, 1973), which exhibit superlinear convergence under certain conditions on the query function; that is, the number of queries is inversely quadratic in the desired error tolerance. [sent-100, score-0.361]
12 We assume the feature space representation is known by the adversary and there are no restrictions on the adversary’s queries; that is, any point x in feature space X can be queried by the adversary to learn the classifier’s prediction at that point. [sent-120, score-0.328]
13 1 Adversarial Cost We assume the adversary has a notion of utility over the feature space, which we quantify with a cost function A : X → ℜ0+ (the non-negative reals); for example, for a spammer, this could be a string edit distance on email messages. [sent-135, score-0.267]
14 We focus on the general class of weighted ℓp (0 < p ≤ ∞) cost functions relative to the target x^A given by A_p^{(c)}(x − x^A) = ( ∑_{d=1}^{D} c_d |x_d − x_d^A|^p )^{1/p}, (1) where 0 < c_d < ∞ is the relative cost the adversary associates with altering the d-th feature. [sent-138, score-0.556]
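As a concrete illustration of Equation (1), here is a minimal NumPy sketch (our own; the function name and signature are not from the paper), taking the weighted ℓ∞ case as max_d c_d |x_d − x_d^A|:

```python
import numpy as np

def weighted_lp_cost(x, x_target, c, p):
    """Weighted l_p cost A_p^{(c)}(x - x_target) as in Equation (1).
    c : per-feature weights with 0 < c_d < inf;  p : norm order, 0 < p <= inf."""
    c = np.asarray(c, dtype=float)
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(x_target, dtype=float))
    if np.isinf(p):
        return float(np.max(c * diff))            # weighted l_inf convention
    return float(np.sum(c * diff ** p) ** (1.0 / p))
```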
15 Nevertheless, the objective of this paper is not to provide practical evasion algorithms but rather to understand the theoretic capabilities of an adversary on the analytically tractable, albeit practically restrictive, family of ℓ p costs. [sent-147, score-0.387]
16 Weighted ℓ1 costs are particularly appropriate for adversarial problems in which the adversary is interested in some features more than others and his cost is assessed based on the degree to which a feature is altered. [sent-148, score-0.381]
17 Lowd and Meek (2005) define the minimal adversarial cost (MAC) of a classifier f to be MAC(f, A) = inf_{x ∈ Xf−} A(x − x^A); that is, the greatest lower bound on the cost obtained by any negative instance. [sent-154, score-0.301]
18 Namely, if the adversary can determine whether an intermediate cost establishes a new upper or lower bound on the MAC, then binary search strategies can iteratively reduce the gap between the bounds C_t^- and C_t^+ with the fewest steps. [sent-166, score-0.465]
19 The t-th query is C_t^{(∗)} = √(C_t^- · C_t^+), the stopping criterion is G_t ≤ 1 + ε, and the search achieves ε-multiplicative optimality in L_ε^{(∗)} = ⌈ log_2( log_2 G_0^{(∗)} / log_2(1 + ε) ) ⌉ (4) steps. [sent-176, score-0.284]
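A hedged sketch of this binary search (our own code; `is_negative_at` is a hypothetical membership oracle that probes a point at a given cost along one fixed search direction), shrinking the multiplicative gap with the geometric midpoint until G_t ≤ 1 + ε:

```python
import math

def multiplicative_binary_search(is_negative_at, c_upper, c_lower, eps):
    """c_lower and c_upper are the current lower/upper bounds maintained by the
    search; is_negative_at(C) is True when the probe at cost C is classified
    negative (which tightens the upper bound)."""
    while c_upper / c_lower > 1.0 + eps:
        c_t = math.sqrt(c_upper * c_lower)        # geometric midpoint, as in the text
        if is_negative_at(c_t):
            c_upper = c_t                         # new upper bound
        else:
            c_lower = c_t                         # probe still positive: new lower bound
    return c_lower, c_upper

# The iteration count matches Equation (4):
# L_eps = ceil(log2(log2(c_upper / c_lower) / log2(1 + eps))).
```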
20 Additive optimality is not scale invariant (scale invariance would mean that any instance x† that satisfies the optimality criterion for cost A also satisfies it for A′(x) = s · A(x) for any s > 0), whereas multiplicative optimality is scale invariant. [sent-186, score-0.279]
21 Additive optimality is, however, invariant to constant shifts of the cost (i.e., any instance x† that satisfies the optimality criterion for cost A also satisfies it for A′(x) = s + A(x) for any s ≥ 0), whereas multiplicative optimality is not. [sent-189, score-0.279]
22 Scale invariance is more salient in near-optimal evasion because if the cost function is also scale invariant (all proper norms are) then the optimality condition is invariant to a rescaling of the underlying feature space; for example, a change in units for all features. [sent-190, score-0.349]
23 Finally, while we express query complexity in the sequel in terms of the multiplicative L_ε^{(∗)}, note that L_ε = Θ(log(1/ε)) and so in this way our query complexities can be rewritten to directly depend on ε. [sent-210, score-0.294]
24 using polynomially-many membership queries in D and Lε. [sent-225, score-0.313]
25 Unlike Lowd and Meek’s approach for continuous spaces, our algorithms construct queries to provably find an ε-IMAC without reverse engineering the classifier’s decision boundary; that is, estimating the decision surface of f or estimating the parameters that specify it. [sent-228, score-0.29]
26 Both approaches use queries to reduce the size of the version space F̂ ⊂ F; that is, the set of classifiers consistent with the adversary’s membership queries. [sent-232, score-0.313]
27 Nonetheless for certain cost functions A, it is easy to determine whether a particular cost ball B C (A) is completely contained within a convex set. [sent-259, score-0.334]
28 We present an efficient algorithm for optimizing (weighted) ℓ1 costs when Xf+ is convex and a polynomial randomized algorithm for optimizing any convex cost when Xf− is convex. [sent-265, score-0.354]
29 The existence of an efficient query algorithm relies on three facts: (1) xA ∈ Xf+ ; (2) every ℓ1 cost C-ball centered at xA intersects with Xf− only if at least one of its vertices is in Xf− ; and (3) C-balls of ℓ1 costs only have 2 · D vertices. [sent-279, score-0.343]
30 The factor C/c_d in Equation (5) normalizes for the weight c_d on the d-th feature. Lemma 3 For all C > 0, if there exists some x ∈ Xf− that achieves a cost of C = A_1^{(c)}(x − x^A), then there is some feature d such that a vertex of the form of Equation (5) is in Xf− (and also achieves cost C by Equation 1). [sent-284, score-0.378]
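For concreteness, a small sketch (ours) that enumerates the 2·D vertices referred to here, reading Equation (5) as the axis-aligned points x^A ± (C/c_d) e^(d), consistent with the normalization remark above:

```python
import numpy as np

def l1_ball_vertices(x_target, c, C):
    """Vertices of the weighted l1 C-ball around x_target: x_target +/- (C / c_d) e_d."""
    x_target = np.asarray(x_target, dtype=float)
    c = np.asarray(c, dtype=float)
    vertices = []
    for d in range(len(x_target)):
        e_d = np.zeros(len(x_target))
        e_d[d] = C / c[d]                 # dividing by c_d normalizes the weight
        vertices.extend([x_target + e_d, x_target - e_d])
    return vertices
```

By Lemma 3, querying these 2·D points at cost C is enough to decide whether the C-cost ℓ1 ball lies entirely inside Xf+ when Xf+ is convex.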
31 (a) Weighted ℓ1 balls are centered around the target xA and have 2 · D vertices; (b) Search directions in multi-line search radiate from xA to probe specific costs; (c) In general, we leverage convexity of the cost function when searching to evade. [sent-301, score-0.302]
32 By probing all search directions at a specific cost, the convex hull of the positive queries bounds the ℓ1 cost ball contained within it. [sent-302, score-0.69]
33 (i.e., the convex hull of these queries will either form an upper or lower bound on the MAC) to determine whether B^C(A) ⊂ Xf+. [sent-307, score-0.389]
34 Once a negative instance is found at cost C, we cease further queries at cost C since a single negative instance is sufficient to establish an upper bound. [sent-308, score-0.504]
35 Further, when an upper bound is established for a cost C (a negative vertex is found), our algorithm prunes all directions that were positive at cost C. [sent-310, score-0.392]
36 Finally, by performing a binary search on the cost, MULTILINESEARCH finds an ε-IMAC with no more than |W| · Lε queries but at least |W| + Lε queries. [sent-312, score-0.317]
37 In this case, the search issues at most 2 · D queries to determine whether B C (A1 ) is a subset of Xf+ and so Algorithm 2 is O (Lε · D). [sent-321, score-0.317]
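A hedged end-to-end sketch in the spirit of the multi-line search just described (our own rendering, not the paper's Algorithm 2; `query` is a hypothetical membership oracle returning True for positive points, `directions` are unit-cost directions from x_target, and the pruning bookkeeping is simplified):

```python
import math

def multi_line_search(query, x_target, directions, c_upper, c_lower, eps):
    """x_target and directions are NumPy arrays; 0 < c_lower < c_upper are
    initial bounds on the MAC.  Assumes x_target is positive and Xf+ is convex."""
    W = list(directions)
    best_negative = None
    while c_upper / c_lower > 1.0 + eps and W:
        c_t = math.sqrt(c_upper * c_lower)          # geometric midpoint
        positive_idx, found_negative = [], False
        for i, e in enumerate(W):
            if query(x_target + c_t * e):
                positive_idx.append(i)              # positive at cost c_t
            else:
                best_negative, c_upper = x_target + c_t * e, c_t
                found_negative = True
                break                               # cease queries at this cost
        if found_negative:
            # prune directions already seen positive at cost c_t; by convexity
            # of Xf+ they remain positive at all lower costs as well
            W = [e for i, e in enumerate(W) if i not in positive_idx]
        else:
            c_lower = c_t                           # all probed vertices positive
    return best_negative, c_lower, c_upper
```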
38 The number of search directions required to adequately bound an ℓ p cost ball for p > 1 can be exponential in D. [sent-333, score-0.354]
39 This strategy uses fewer queries to shrink the cost gap on symmetrically rounded bodies but is unable to do so for asymmetrically elongated bodies. [sent-340, score-0.377]
40 At each phase, the K-STEP MULTILINESEARCH (Algorithm 3) chooses a single direction e and queries it for K steps to generate candidate bounds B^- and B^+ on the MAC. [sent-342, score-0.289]
41 It then iteratively queries all remaining directions at the candidate lower bound B+ (a breadth-first strategy). [sent-344, score-0.36]
42 Lower Bound: Here we find a lower bound on the number of queries required by any algorithm to find an ε-IMAC when Xf+ is convex, for any convex cost function. [sent-355, score-0.523]
43 Theorem 5 For any D > 0, any positive convex function A : ℜ^D → ℜ^+, any initial bounds 0 < C_0^+ < C_0^- on the MAC, and 0 < ε < C_0^-/C_0^+ − 1, all algorithms must submit at least max{D, L_ε^{(∗)}} membership queries in the worst case to be ε-multiplicatively optimal on F^{convex,'+'}. [sent-360, score-0.406]
44 Because linear classifiers are a special case of convex-inducing classifiers, Algorithm 2 can be applied, and our K-STEP MULTILINESEARCH algorithm improves on the complexity of Lowd and Meek’s reverse-engineering technique’s O(Lε · D) queries and applies to a broader family of classifiers. [sent-376, score-0.279]
45 While Algorithm 2 has superior complexity, it uses 2 · D search directions rather than the D directions used in the approach of Lowd and Meek, which may require our technique to issue more queries in some practical settings. [sent-377, score-0.477]
46 For instance, given a set W of search directions, t queries {x^i}_{i=1}^t and their corresponding responses {y^i}_{i=1}^t, a search direction e can be eliminated from W if for all C_t^+ ≤ α < C_t^- there does not exist any classifier f ∈ F consistent with all previous queries. [sent-379, score-0.684]
47 That is, e is feasible if and only if it is the only search direction among the set of remaining search directions, W , that would be classified as a negative for a cost α by some consistent classifier. [sent-385, score-0.305]
48 Further, since subsequent queries only restrict the feasible space of α and the set of consistent classifiers F̂, pruning these infeasible directions is sound for the remainder of the search. [sent-386, score-0.322]
49 Such programs can be efficiently solved and may allow the adversary to rapidly eliminate infeasible search directions without issuing additional queries. [sent-396, score-0.319]
50 Extending MULTILINESEARCH Algorithms to Weights cd = ∞ or cd = 0: In Algorithm 2, we reweighted the d-th axis-aligned directions by a factor 1/c_d to make unit cost vectors by implicitly assuming cd ∈ (0, ∞). [sent-398, score-0.366]
51 This algorithm performs a halving search on the exponent along a single direction to find a positive example, then queries the remaining directions at this candidate bound. [sent-410, score-0.421]
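A simplified stand-in (ours) for this initialization: halve the probe cost along one direction until a positive example appears, yielding candidate bounds with a gap of 2 that the remaining directions are then checked against. The paper's algorithm searches over the exponent directly; this linear halving, and the `query` oracle, are only illustrative:

```python
def find_initial_bounds(query, x_target, e, c_start, max_halvings=64):
    """c_start is a cost whose probe along direction e is negative; halve the
    cost until a positive probe is found.  Returns (candidate_lower, candidate_upper)."""
    c_upper = c_start
    for _ in range(max_halvings):
        c = c_upper / 2.0
        if query(x_target + c * e):        # positive probe found: candidate lower bound
            return c, c_upper
        c_upper = c                        # still negative: smaller upper bound
    raise RuntimeError("no positive probe found within the halving budget")
```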
52 Further, in this algorithm, multiple directions are probed only during iterations with positive queries and it makes at most one positive query for each direction. [sent-417, score-0.441]
53 This precursor step requires at most |W| · T queries to initialize the MULTILINESEARCH algorithm with a gap such that Lε = (T − 1) + ⌈log_2(1 / log_2(1 + ε))⌉ according to Equation (4). [sent-432, score-0.274]
54 ε-IMAC Learning for a Convex Xf−: Here, we minimize a convex cost function A with bounded cost balls (we focus on weighted ℓ1 costs in Equation 1) when the feasible set Xf− is convex. [sent-436, score-0.405]
55 Then given access to an oracle returning separating hyperplanes for the A cost balls, Algorithm 7 will find an ε-IMAC using O*(D^5) queries with high probability. [sent-447, score-0.391]
56 Though Xf− may be unbounded, we are minimizing a cost with bounded cost balls, so we can instead use the set P^0 = Xf− ∩ B^{2R}(A_1; x^-) (where R = A(x^- − x^A) > C_t), which is a convex bounded subset of Xf−. [sent-472, score-0.276]
57 For any point y ∉ B^C(A_1), the (sub)gradient of the ℓ1 cost is given by h_d^y = c_d · sign(y_d − x_d^A), (6) and is a separating hyperplane for y and B^C(A_1). [sent-498, score-0.282]
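A small sketch (ours) of the cut implied by Equation (6): the subgradient at a point y outside the cost ball yields a halfspace that keeps the whole C-cost ball while discarding y.

```python
import numpy as np

def l1_cut(y, x_target, c):
    """Subgradient h_d = c_d * sign(y_d - x_target_d) of the weighted l1 cost at y.
    Every x with A_1(x - x_target) <= C < A_1(y - x_target) satisfies h @ x < h @ y,
    so (h, h @ y) defines a separating halfspace cut."""
    y = np.asarray(y, dtype=float)
    h = np.asarray(c, dtype=float) * np.sign(y - np.asarray(x_target, dtype=float))
    return h, float(h @ y)
```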
58 Because every iteration in Algorithm 5 requires N = O*(D) samples, each of which needs K = O*(D^3) random walk steps, and there are T = O*(D) iterations, the total number of membership queries required by Algorithm 5 is O*(D^5). [sent-509, score-0.338]
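A high-level sketch (our own abstraction, not the paper's Algorithm 5 or 7) of one such phase, reusing the weighted_lp_cost and l1_cut helpers sketched earlier: draw samples from the current body (in the paper these come from hit-and-run random walks, which is where the membership queries are spent), accept any sample or centroid that is already cheap enough, and otherwise cut the body through the centroid with the Equation (6) hyperplane. The `sample_body` and `cut_body` callbacks are assumptions of this sketch.

```python
import numpy as np

def intersect_search_phase(sample_body, cut_body, x_target, c, C, T):
    """Decide whether the current convex body (maintained inside the negative
    set, so samples and their centroid are negative instances) contains a point
    of weighted l1 cost <= C.  sample_body() returns roughly-uniform samples;
    cut_body(h, b) replaces the body by its intersection with {x : h @ x < b}."""
    for _ in range(T):
        samples = sample_body()
        cheap = [x for x in samples if weighted_lp_cost(x, x_target, c, 1) <= C]
        if cheap:
            return cheap[0]                      # negative instance of cost <= C
        z = np.mean(samples, axis=0)             # centroid of the samples
        if weighted_lp_cost(z, x_target, c, 1) <= C:
            return z                             # the centroid itself is cheap enough
        h, b = l1_cut(z, x_target, c)            # cut keeps the C-cost ball side
        cut_body(h, b)
    return None                                  # treated as "no intersection" at cost C
```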
59 This algorithm, the ROUNDING algorithm as described by Lovász and Vempala (2003), uses O*(D^4) membership queries to find a transformation that places P^0 into a near-isotropic position and produces an initial set of samples from it. [sent-531, score-0.313]
60 That is, we show that finding an ε-IMAC for this family can require exponentially many queries in D and Lε. [sent-553, score-0.279]
61 Here we build on the techniques of Section 3.1 to find solutions to the near-optimal evasion problem for ℓ p cost functions with p ≠ 1. [sent-557, score-0.289]
62 Figure 8 demonstrates how queries can be used to construct upper and lower bounds on general ℓ p costs. [sent-559, score-0.265]
63 Lemma 7 The largest ℓ p (p > 1) ball enclosed within a C-cost ℓ1 ball has a cost of C · D^{(1−p)/p}; for p = ∞ the cost is C · D^{−1}. [sent-561, score-0.322]
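A quick numeric sanity check (ours) of this bound; the tight point is the one with all coordinates equal:

```python
import numpy as np

def largest_lp_in_l1(C, D, p):
    """Cost of the largest l_p ball (p > 1) enclosed in a C-cost l_1 ball in D
    dimensions, per Lemma 7 as read above: C * D**((1 - p) / p); C / D for p = inf."""
    return C / D if np.isinf(p) else C * D ** ((1.0 - p) / p)

# The all-equal-coordinates point x with ||x||_p = r has ||x||_1 = r * D**((p-1)/p),
# which equals C exactly when r = largest_lp_in_l1(C, D, p).
C, D, p = 1.0, 5, 2.0
r = largest_lp_in_l1(C, D, p)
x = np.full(D, r * D ** (-1.0 / p))
assert np.isclose(np.abs(x).sum(), C)
```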
64 The lower bound provided by those M positive points is the cost of the largest ℓ p cost ball that fits entirely within their convex hull; let’s say this cost is C^† ≤ C_0^+. [sent-565, score-0.475]
65 The first ratio C_0^-/C_0^+ is controlled solely by the accuracy ε achieved by running the multiline search algorithm for Lε steps whereas the second ratio C_0^+/C^† depends only on how well the ℓ p ball is approximated by the convex hull of the M search directions. [sent-568, score-0.346]
66 Each row represents a unique set of positive (red '+' points) and negative (black '∗' points) queries and each column shows the implied upper bound (the green dashed ball) and lower bound (the solid blue ball) for a different ℓ p cost. [sent-584, score-0.346]
67 In the first row, the body is defined by a random set of seven queries, in the second, the queries are along the coordinate axes, and in the third, the queries are around a circle. [sent-585, score-0.512]
68 Moreover, if the M search directions yield C_0^+/C^† = 1 for the cost function A, then MULTILINESEARCH algorithms can achieve ε-multiplicative optimality with a query complexity that is polynomial in M and L_ε^{(∗)} for any ε > 0. [sent-590, score-0.437]
69 In fact, for some fixed values of ε, there is no query-based strategy that can bound ℓ p costs using polynomially-many queries in D as the following result shows. [sent-606, score-0.365]
70 MULTILINE SEARCH FOR p = 2: Theorem 11 For any D > 1, any initial bounds 0 < C_0^+ < C_0^- on the MAC, and 0 < ε < C_0^-/C_0^+ − 1, all algorithms must submit at least α_ε^{(D−2)/2} membership queries (where α_ε = (1 + ε)^2 / ((1 + ε)^2 − 1)) in the worst case to be ε-multiplicatively optimal on F^{convex,'+'} for ℓ2 costs. [sent-614, score-0.336]
71 This result says that there is no algorithm that can generally achieve ε-multiplicative optimality for ℓ2 costs for any fixed ε > 0 using only polynomially-many queries in D since the ratio C_0^-/C_0^+ could be arbitrarily large. [sent-616, score-0.387]
72 Interestingly, substituting this lower bound on ε into the bound given by Theorem 11, we get that the number of required queries for ε > √D − 1 need only be at least ((1 + ε)^2 / ((1 + ε)^2 − 1))^{(D−2)/2} = (D/(D−1))^{(D−2)/2}, which is a monotonically increasing function in D that asymptotes at √e ≈ 1.6487. [sent-619, score-0.318]
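A short check (ours) of the limiting value quoted here:

```python
import math

# For eps = sqrt(D) - 1 the bound becomes (D/(D-1))**((D-2)/2):
for D in (3, 10, 100, 10_000):
    print(D, (D / (D - 1)) ** ((D - 2) / 2))
# ~1.22, ~1.52, ~1.64, ~1.649, approaching sqrt(e):
print("sqrt(e) =", math.exp(0.5))
```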
73 They are given by the equivalent (sub)-gradients for ℓ p cost balls: h^y_{p,d} = c_d · sign(y_d − x_d^A) · ( |y_d − x_d^A| / A_p^{(c)}(y − x^A) )^{p−1}, and h^y_{∞,d} = c_d · sign(y_d − x_d^A) · I[ c_d |y_d − x_d^A| = A_∞^{(c)}(y − x^A) ]. [sent-625, score-0.313]
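The same cut in code form (ours), covering both finite p and the weighted ℓ∞ case as read above; `lp_cut` is our own name for this helper:

```python
import numpy as np

def lp_cut(y, x_target, c, p):
    """Separating-hyperplane normal for a weighted l_p cost ball at a point y
    outside the ball, following the (sub)gradients quoted above."""
    c = np.asarray(c, dtype=float)
    z = np.asarray(y, dtype=float) - np.asarray(x_target, dtype=float)
    if np.isinf(p):
        a_inf = np.max(c * np.abs(z))
        # subgradient of max_d c_d|z_d|: nonzero only on (near-)maximizing coordinates
        return c * np.sign(z) * np.isclose(c * np.abs(z), a_inf)
    a_p = float(np.sum(c * np.abs(z) ** p) ** (1.0 / p))
    return c * np.sign(z) * (np.abs(z) / a_p) ** (p - 1)
```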
74 By only changing the cost function A and the separating hyperplane hy used for the halfspace cut in Algorithms 5 and 7, the randomized ellipsoid method can also be applied for any weighted ℓ p cost A_p^{(c)} with p > 1. [sent-626, score-0.42]
75 For more general convex costs A, we still have that every C-cost ball is a convex set (i.e., every sublevel set of a convex function is convex). [sent-627, score-0.283]
76 Further, since for any D > C, B C (A) ⊂ B D (A), the separating hyperplane of the D-cost ball is also a separating hyperplane of the C cost ball and can be re-used in our Algorithm 7. [sent-630, score-0.421]
77 Thus, this procedure is applicable for any convex cost function, A, so long as we can compute the separating hyperplanes of any cost ball of A for any point y not in the cost ball. [sent-631, score-0.483]
78 We present membership query algorithms that efficiently accomplish ε-IMAC search on this family. [sent-636, score-0.265]
79 If the adversary is unaware of which set is convex, they can trivially run both searches to discover an ε-IMAC with a combined polynomial query complexity. [sent-639, score-0.283]
80 By doing so, these algorithms only require polynomially-many queries in spite of the size of the family of all convex-inducing classifiers. [sent-643, score-0.279]
81 Exploring near-optimal evasion is important for understanding how an adversary may circumvent learners in security-sensitive settings. [sent-651, score-0.35]
82 The algorithms we present are invaluable tools not for an adversary to develop better attacks but rather for analysts to better understand the vulnerabilities of their filters: our framework provides the query complexity in the worst-case setting when an adversary can directly query the classifier. [sent-653, score-0.566]
83 However, our analysis and algorithms do not completely answer the evasion problem and also generally cannot be easily used by an adversary since there are several real-world obstacles that are not incorporated into our framework. [sent-654, score-0.35]
84 Most importantly, an adversary may not be able to query all x ∈ X ; instead their queries must be legitimate objects (such as email) that are mapped into X . [sent-656, score-0.525]
85 Can an adversary efficiently perform ε-IMAC search when his cost is defined in an alternate feature space to the classifier’s? [sent-660, score-0.342]
86 Query Complexity for the K-STEP MULTILINESEARCH Algorithm: We consider the evasion problem as a game between the classifier (playing first) and the adversary (playing second), who wishes to evade detection by the classifier. [sent-672, score-0.35]
87 To analyze the worst-case query complexity of K-STEP MULTILINESEARCH (Algorithm 3), we consider a worst-case classifier that seeks to maximize the number of queries submitted by the adversary. [sent-673, score-0.361]
88 The primary decision for the worst-case classifier occurs when the adversary begins querying other directions besides e. [sent-681, score-0.275]
89 We conservatively assume that the gap only decreases for case 1, which decouples the analysis of the queries for C1 and C2 and allows us to upper bound the total number of queries. [sent-687, score-0.312]
90 At every iteration in case one, the adversary makes exactly K + |W_t| − 1 queries, where W_t is the set of feasible directions remaining at the t-th iteration. [sent-690, score-0.516]
91 Proof of Theorem 5 Suppose a query-based algorithm submits N < D + 1 membership queries x^1, . . . , x^N. [sent-707, score-0.336]
92 Suppose that the responses to the queries are consistent with the classifier f defined as: f(x) = +1 if A(x − x^A) < C_0^-, and f(x) = −1 otherwise. [sent-712, score-0.268]
93 Since G contains all positive queries and C_0^+ < C_0^-, the convex set X_g^+ is consistent with the observed responses: B^{C_0^+}(A) ⊂ X_g^+ by definition, and B^{C_0^-}(A) ⊄ X_g^+ since the positive queries are all inside the open C_0^--sublevel set. [sent-727, score-0.554]
94 (…[−∞, 0] if a_i = −1.) Proof of Theorem 10 Suppose a query-based algorithm submits N membership queries x^1, . . . , x^N. [sent-784, score-0.336]
95 By intersecting each data point’s orthant with the set Xf+ and taking the convex hull of these regions, G is convex, contains x^A, and is a subset of Xf+ consistent with all the query responses of f; that is, each of the M positive queries is in X_g^+ and all the negative queries are in X_g^-. [sent-803, score-0.889]
96 We refer to the orthants used in G to cover the M positive queries as covering orthants and their corresponding vertices form a covering of the hypercube. [sent-808, score-0.436]
97 Since the displacement d defined above is greater than 0, by applying Lemma 13, this separating hyperplane upper bounds the cost of the largest ℓ p ball enclosed in G as MAC(g, A_p) ≤ C_0^- (D − K)^{(p−1)/p} · ‖w‖_{p/(p−1)}^{−1} = C_0^- ((D − K)/D)^{(p−1)/p} for 1 < p < ∞, and C_0^- (D − K)/D for p = ∞. [sent-835, score-0.285]
98 Γ((D+1)/2) Γ(D/2) / (Γ(1 + D/2) Γ((D−1)/2)) = (D−1)/D. Proof of Theorem 11 Suppose a query-based algorithm submits N membership queries x^1, . . . , x^N. [sent-864, score-0.336]
99 Now consider the projection of each of the positive queries onto the surface of the ℓ2 ball B^{C_0^-}(A_2), given by the points z^i = C_0^- (x^i − x^A) / A_2(x^i − x^A). [sent-883, score-0.3]
100 On the power of membership queries in agnostic learning. [sent-934, score-0.313]
wordName wordTfidf (topN-words)
[('xf', 0.495), ('xa', 0.374), ('queries', 0.242), ('earch', 0.225), ('evasion', 0.186), ('ct', 0.184), ('adversary', 0.164), ('mac', 0.161), ('ulti', 0.133), ('query', 0.119), ('elson', 0.115), ('oseph', 0.115), ('ygar', 0.115), ('nducing', 0.109), ('trategies', 0.109), ('uery', 0.109), ('vading', 0.109), ('ine', 0.106), ('rubinstein', 0.106), ('cost', 0.103), ('lowd', 0.103), ('ao', 0.098), ('lassifiers', 0.093), ('costs', 0.085), ('onvex', 0.082), ('directions', 0.08), ('uang', 0.076), ('search', 0.075), ('xg', 0.075), ('membership', 0.071), ('convex', 0.07), ('ers', 0.067), ('meek', 0.065), ('bertsimas', 0.063), ('optimality', 0.06), ('classi', 0.06), ('ball', 0.058), ('multiplicative', 0.056), ('hyperplane', 0.055), ('ee', 0.055), ('gt', 0.054), ('orthant', 0.053), ('ellipsoid', 0.053), ('er', 0.052), ('cd', 0.051), ('reverse', 0.048), ('separating', 0.046), ('ntersect', 0.046), ('vempala', 0.045), ('orthants', 0.044), ('balls', 0.044), ('intrusion', 0.042), ('caps', 0.04), ('vertex', 0.04), ('hull', 0.039), ('bound', 0.038), ('hypersphere', 0.038), ('family', 0.037), ('nelson', 0.037), ('vertices', 0.036), ('covering', 0.035), ('blaine', 0.034), ('orth', 0.034), ('piral', 0.034), ('halfspace', 0.034), ('fe', 0.034), ('evading', 0.034), ('spam', 0.032), ('gap', 0.032), ('querying', 0.031), ('hz', 0.031), ('th', 0.03), ('spherical', 0.03), ('intersection', 0.03), ('hypercube', 0.029), ('adversarial', 0.029), ('acre', 0.029), ('multiline', 0.029), ('additive', 0.028), ('negative', 0.028), ('body', 0.028), ('rounding', 0.027), ('centroid', 0.027), ('responds', 0.027), ('xd', 0.027), ('randomized', 0.026), ('hamming', 0.026), ('responses', 0.026), ('walk', 0.025), ('cap', 0.025), ('prune', 0.025), ('direction', 0.024), ('ap', 0.024), ('sin', 0.023), ('bounds', 0.023), ('edi', 0.023), ('searchability', 0.023), ('searchable', 0.023), ('submits', 0.023), ('wyner', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000023 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
2 0.32717669 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
Author: Ji Liu, Peter Wonka, Jieping Ye
Abstract: We consider the following sparse signal recovery (or feature selection) problem: given a design matrix X ∈ Rn×m (m ≫ n) and a noisy observation vector y ∈ Rn satisfying y = Xβ∗ + ε where ε is the noise vector following a Gaussian distribution N(0, σ2 I), how to recover the signal (or parameter vector) β∗ when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal β∗ . We show that if X obeys a certain condition, then with a large probability ˆ the difference between the solution β estimated by the proposed method and the true solution β∗ measured in terms of the ℓ p norm (p ≥ 1) is bounded as ˆ β − β∗ p ≤ C(s − N)1/p log m + ∆ σ, where C is a constant, s is the number of nonzero entries in β∗ , the risk of the oracle estimator ∆ is independent of m and is much smaller than the first term, and N is the number of entries of β∗ larger √ than a certain value in the order of O (σ log m). The proposed √ method improves the estimation √ bound of the standard Dantzig selector approximately from Cs1/p log mσ to C(s − N)1/p log mσ where the value N depends on the number of large entries in β∗ . When N = s, the proposed algorithm achieves the oracle solution with a high probability, where the oracle solution is the projection of the observation vector y onto true features. In addition, with a large probability, the proposed method can select the same number of correct features under a milder condition than the Dantzig selector. Finally, we extend this multi-stage procedure to the LASSO case. Keywords: multi-stage, Dantzig selector, LASSO, sparse signal recovery
3 0.11389235 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
Author: Ofer Dekel, Claudio Gentile, Karthik Sridharan
Abstract: We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. Keywords: online learning, regret, label-efficient, crowdsourcing
4 0.10595021 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
5 0.070249639 3 jmlr-2012-A Geometric Approach to Sample Compression
Author: Benjamin I.P. Rubinstein, J. Hyam Rubinstein
Abstract: The Sample Compression Conjecture of Littlestone & Warmuth has remained unsolved for a quarter century. While maximum classes (concept classes meeting Sauer’s Lemma with equality) can be compressed, the compression of general concept classes reduces to compressing maximal classes (classes that cannot be expanded without increasing VC dimension). Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes. Simple arrangements of hyperplanes in hyperbolic space are shown to represent maximum classes, generalizing the corresponding Euclidean result. We show that sweeping a generic hyperplane across such arrangements forms an unlabeled compression scheme of size VC dimension and corresponds to a special case of peeling the one-inclusion graph, resolving a recent conjecture of Kuzmin & Warmuth. A bijection between finite maximum classes and certain arrangements of piecewise-linear (PL) hyperplanes in either a ball or Euclidean space is established. Finally we show that d-maximum classes corresponding to PL-hyperplane arrangements in Rd have cubical complexes homeomorphic to a d-ball, or equivalently complexes that are manifolds with boundary. A main result is that PL arrangements can be swept by a moving hyperplane to unlabeled d-compress any finite maximum class, forming a peeling scheme as conjectured by Kuzmin & Warmuth. A corollary is that some d-maximal classes cannot be embedded into any maximum class of VC-dimension d + k, for any constant k. The construction of the PL sweeping involves Pachner moves on the one-inclusion graph, corresponding to moves of a hyperplane across the intersection of d other hyperplanes. This extends the well known Pachner moves for triangulations to c
6 0.063200802 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
7 0.062978528 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity
8 0.062504634 12 jmlr-2012-Active Clustering of Biological Sequences
9 0.05916959 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
10 0.05657132 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
11 0.053083584 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
12 0.052035004 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package
13 0.050613903 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
14 0.047816053 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
15 0.047211889 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
16 0.045877397 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs
17 0.04458984 80 jmlr-2012-On Ranking and Generalization Bounds
18 0.043197114 13 jmlr-2012-Active Learning via Perfect Selective Classification
19 0.043010384 84 jmlr-2012-Online Submodular Minimization
20 0.040640805 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
topicId topicWeight
[(0, -0.202), (1, -0.046), (2, -0.068), (3, -0.027), (4, -0.184), (5, 0.254), (6, 0.244), (7, -0.206), (8, 0.095), (9, 0.391), (10, -0.264), (11, -0.112), (12, -0.081), (13, 0.204), (14, -0.011), (15, 0.174), (16, -0.031), (17, -0.102), (18, 0.126), (19, 0.029), (20, 0.07), (21, -0.045), (22, -0.061), (23, 0.014), (24, 0.021), (25, 0.025), (26, -0.042), (27, -0.114), (28, 0.007), (29, 0.03), (30, -0.027), (31, 0.001), (32, -0.092), (33, 0.002), (34, -0.026), (35, -0.038), (36, -0.04), (37, 0.031), (38, -0.048), (39, 0.004), (40, 0.024), (41, -0.019), (42, -0.024), (43, 0.002), (44, -0.029), (45, -0.002), (46, -0.02), (47, 0.08), (48, 0.101), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.93741781 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
2 0.74476373 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
Author: Ji Liu, Peter Wonka, Jieping Ye
Abstract: We consider the following sparse signal recovery (or feature selection) problem: given a design matrix X ∈ Rn×m (m ≫ n) and a noisy observation vector y ∈ Rn satisfying y = Xβ∗ + ε where ε is the noise vector following a Gaussian distribution N(0, σ2 I), how to recover the signal (or parameter vector) β∗ when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal β∗ . We show that if X obeys a certain condition, then with a large probability ˆ the difference between the solution β estimated by the proposed method and the true solution β∗ measured in terms of the ℓ p norm (p ≥ 1) is bounded as ˆ β − β∗ p ≤ C(s − N)1/p log m + ∆ σ, where C is a constant, s is the number of nonzero entries in β∗ , the risk of the oracle estimator ∆ is independent of m and is much smaller than the first term, and N is the number of entries of β∗ larger √ than a certain value in the order of O (σ log m). The proposed √ method improves the estimation √ bound of the standard Dantzig selector approximately from Cs1/p log mσ to C(s − N)1/p log mσ where the value N depends on the number of large entries in β∗ . When N = s, the proposed algorithm achieves the oracle solution with a high probability, where the oracle solution is the projection of the observation vector y onto true features. In addition, with a large probability, the proposed method can select the same number of correct features under a milder condition than the Dantzig selector. Finally, we extend this multi-stage procedure to the LASSO case. Keywords: multi-stage, Dantzig selector, LASSO, sparse signal recovery
3 0.39501622 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
4 0.31632969 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
Author: Ofer Dekel, Claudio Gentile, Karthik Sridharan
Abstract: We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. Keywords: online learning, regret, label-efficient, crowdsourcing
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them given pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(ε−6 n log5 n) preference labels for a regret of ε times the optimal loss. As a function of n, this is asymptotically better than standard (non-adaptive) learning bounds achievable for the same problem. Our main result takes us a step closer toward settling an open problem posed by learning-torank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? To further show the power and practicality of our solution, we analyze a typical test case in which a large margin linear relaxation is used for efficiently solving the simpler learning problems in our decomposition. Keywords: statistical learning theory, active learning, ranking, pairwise ranking, preferences
6 0.23824987 12 jmlr-2012-Active Clustering of Biological Sequences
7 0.23174708 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
8 0.23132172 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
9 0.22100103 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
10 0.20737503 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs
11 0.20715357 3 jmlr-2012-A Geometric Approach to Sample Compression
12 0.20039637 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
13 0.18722898 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
14 0.18042271 13 jmlr-2012-Active Learning via Perfect Selective Classification
15 0.16545045 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
16 0.15779835 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
17 0.15627724 91 jmlr-2012-Plug-in Approach to Active Learning
18 0.15489483 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
19 0.15222232 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
20 0.14938143 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
topicId topicWeight
[(21, 0.034), (26, 0.028), (27, 0.028), (29, 0.029), (34, 0.015), (35, 0.035), (49, 0.017), (56, 0.013), (64, 0.445), (69, 0.014), (75, 0.059), (77, 0.011), (79, 0.033), (92, 0.066), (96, 0.078)]
simIndex simValue paperId paperTitle
1 0.93893933 107 jmlr-2012-Smoothing Multivariate Performance Measures
Author: Xinhua Zhang, Ankan Saha, S.V.N. Vishwanathan
Abstract: Optimizing multivariate performance measure is an important task in Machine Learning. Joachims (2005) introduced a Support Vector Method whose underlying optimization problem is commonly solved by cutting plane methods (CPMs) such as SVM-Perf and BMRM. It can be shown that CPMs 1 converge to an ε accurate solution in O λε iterations, where λ is the trade-off parameter between the regularizer and the loss function. Motivated by the impressive convergence rate of CPM on a number of practical problems, it was conjectured that these rates can be further improved. We disprove this conjecture in this paper by constructing counter examples. However, surprisingly, we further discover that these problems are not inherently hard, and we develop a novel smoothing strategy, which in conjunction with Nesterov’s accelerated gradient method, can find an ε accuiterations. Computationally, our smoothing technique is also rate solution in O∗ min 1 , √1 ε λε particularly advantageous for optimizing multivariate performance scores such as precision/recall break-even point and ROCArea; the cost per iteration remains the same as that of CPMs. Empirical evaluation on some of the largest publicly available data sets shows that our method converges significantly faster than CPMs without sacrificing generalization ability. Keywords: non-smooth optimization, max-margin methods, multivariate performance measures, Support Vector Machines, smoothing
same-paper 2 0.76963419 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
3 0.69891298 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
Author: Stephen Gould
Abstract: We present an open-source platform-independent C++ framework for machine learning and computer vision research. The framework includes a wide range of standard machine learning and graphical models algorithms as well as reference implementations for many machine learning and computer vision applications. The framework contains Matlab wrappers for core components of the library and an experimental graphical user interface for developing and visualizing machine learning data flows. Keywords: machine learning, graphical models, computer vision, open-source software
4 0.68561977 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
5 0.38764304 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
6 0.32535559 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
7 0.32416639 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
8 0.32415378 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
9 0.31925505 84 jmlr-2012-Online Submodular Minimization
10 0.31777763 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
11 0.31697592 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
12 0.3147296 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
13 0.31253424 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
14 0.31000999 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
15 0.30980012 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
16 0.30803329 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
17 0.30577073 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss
18 0.30502644 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
20 0.30012792 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis