jmlr jmlr2012 jmlr2012-104 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Marius Kloft, Pavel Laskov
Abstract: Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning algorithms may have to cope with manipulated data aimed at hampering decision making. Although some previous work addressed the issue of handling malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution,1 we analyze the performance of a particular method—online centroid anomaly detection—in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of attack efficiency and limitations. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: attacker’s full or limited control over the traffic and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker’s gain) if external constraints are properly used. Our experimental evaluation, carried out on real traces of HTTP and exploit traffic, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms. Keywords: anomaly detection, adversarial, security analysis, support vector data description, computer security, network intrusion detection
Reference: text
sentIndex sentText sentNum sentScore
1 Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of attack efficiency and limitations. [sent-12, score-1.632]
2 We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: attacker’s full or limited control over the traffic and bounded false positive rate. [sent-13, score-1.472]
3 Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker’s gain) if external constraints are properly used. [sent-14, score-1.012]
4 One particular kind of attack is the so-called “poisoning” in which specially crafted data points are injected to cause the decision function to misclassify a given malicious point as benign. [sent-85, score-0.896]
5 This attack makes sense when an attacker does not have “write” permission to the training data, hence cannot manipulate it directly. [sent-86, score-1.03]
6 The poisoning attack against online centroid anomaly detection has been considered by Nelson and Joseph (2006) for the case of an infinite window, that is, when a learning algorithm memorizes all data seen so far. [sent-88, score-1.433]
7 Their main result was surprisingly optimistic: it was shown that the number of attack data points that must be injected grows exponentially as a function of the impact on the learned hypothesis. [sent-89, score-0.871]
8 As a further contribution, we analyze the algorithm under two additional constraints: (a) the fraction of the traffic controlled by an attacker is bounded by ν, and (b) the false positive rate induced by an attack is bounded by α. [sent-94, score-1.08]
9 Such an analysis may take different forms, for example calculation of the probability for an attack to succeed, estimation of the required number of attack iterations, calculation of the geometric impact of an attack (a shift towards an insecure state), etc. [sent-112, score-2.464]
10 …(the taxonomy of Barreno et al. (2006) is given in brackets): • whether an attack is staged during the training (causative) or the deployment of an algorithm (exploratory), or • whether an attack attempts to increase the false negative or the false positive rate at the deployment stage (integrity/availability). [sent-126, score-1.787]
11 The poisoning attack addressed in our work can be classified as a causative integrity attack. [sent-127, score-0.998]
12 Other common attack types are the mimicry attack—alteration of malicious data to resemble innocuous data (an exploratory integrity attack) and the “red herring” attack—sending junk data that causes false alarms (an exploratory availability attack). [sent-129, score-1.014]
13 3 Poisoning Attack The goal of a poisoning attack is to force an algorithm, at some learning iteration i, to accept the attack point A that lies outside of the normality ball, that is, ||A − ci || > r. [sent-200, score-1.934]
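Concretely, the acceptance rule used throughout is ||x − c|| ≤ r. A minimal sketch of this detector (illustrative only; all names and the toy data are ours, not the paper's code):

```python
import numpy as np

def is_accepted(x, center, r):
    """A point is treated as innocuous iff it lies inside the normality ball."""
    return np.linalg.norm(x - center) <= r

# Toy setup: center of mass of n innocuous training points, fixed radius r.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # innocuous training data
center = X.mean(axis=0)                # centroid c
r = 1.5                                # radius of the normality ball
A = center + np.array([4.0, 0.0])      # attack point outside the ball

print(is_accepted(A, center, r))       # False: ||A - c|| > r, so A is rejected
```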
14 As illustrated in Figure 2, the poisoning attack amounts to injecting specially crafted points that are accepted as innocuous but shift the center of mass in the direction of the attack point until the latter appears innocuous as well. [sent-204, score-2.104]
15 Intuitively, one can expect that the optimal one-step displacement of the center of mass is achieved by placing attack point xi along the line connecting c and A such that ||xi − c|| = r. [sent-206, score-1.067]
16 Definition 1 (Relative displacement) Let A be an attack point and a = (A − c0)/||A − c0|| be the attack direction unit vector. [sent-209, score-1.632]
17 The relative displacement Di = ((ci − c0) · a)/r measures the length of the projection of the accrued change to ci onto the attack direction a in terms of the radius r of the normality ball. [sent-211, score-1.116]
18 Definition 2 An attack strategy that maximizes the displacement Di in each iteration i is referred to as greedy-optimal. [sent-216, score-0.985]
19 Attack Effectiveness for Infinite Horizon Centroid Learner The effectiveness of a poisoning attack for the infinite horizon learner has been analyzed in Nelson and Joseph (2006). [sent-218, score-1.115]
20 Theorem 3 The i-th relative displacement Di of the online centroid learner with an infinite horizon under a poisoning attack is bounded by Di ≤ ln(1 + i/n), (3) where i is the number of attack points and n is the number of initial training points. [sent-221, score-2.314]
21 Proof We first determine the greedy-optimal attack strategy and then bound the attack progress. [sent-222, score-1.645]
22 (a) Let A be an attack point and denote by a the corresponding attack direction vector. [sent-223, score-1.632]
23 Thus the greedy-optimal attack is given by xi = ci + r·a. [sent-230, score-0.971]
24 In other words, an attacker’s effort grows prohibitively fast with respect to the separability of the attack from innocuous data. [sent-236, score-0.918]
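Inverting the bound of Theorem 3 makes this explicit: to reach a relative displacement D, the attacker needs at least i ≥ n·(e^D − 1) points. A quick sketch (illustrative; it assumes the bound holds with equality):

```python
import math

def min_attack_points(D, n):
    """Minimum i implied by D <= ln(1 + i/n), i.e., i >= n * (exp(D) - 1)."""
    return math.ceil(n * (math.exp(D) - 1))

n = 100  # initial training points
for D in [1, 2, 5, 10]:  # target displacement in units of the radius r
    print(D, min_attack_points(D, n))
# i grows exponentially in D: 172, 639, 14742, 2202547
```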
25 For a kernelized centroid learner, the greedy-optimal attack may not be valid, as there may not exist a point in the input space corresponding to the optimal attack image in the feature space. [sent-237, score-1.81]
26 However, an attacker can construct points in the input space that are close enough to the greedy-optimal point for the attack to succeed, with a moderate constant cost factor; cf. [sent-238, score-1.025]
27 The analysis can be carried out theoretically for the average-out and random-out update rules; for the nearest-out rule, an optimal attack can be stated as an optimization problem and the attack effectiveness can be analyzed empirically. [sent-247, score-1.661]
28 Theorem 4 The i-th relative displacement Di of the online centroid learner with the average-out update rule under a worst-case optimal poisoning attack is Di = i/n, where i is the number of attack points and n is the training window size. [sent-255, score-2.302]
29 By explicitly writing out the recurrence between subsequent displacements, we conclude that the greedy-optimal attack is also attained by placing an attack point along the line connecting ci and A at the edge of the sphere (cf. [sent-257, score-1.766]
30 It follows that the relative displacement under the greedy-optimal attack is Di+1 = Di + 1/n. [sent-259, score-0.972]
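A short simulation (a sketch under the simplifying assumption that every attack point is accepted; names are ours) reproduces the linear progress Di = i/n of Theorem 4:

```python
import numpy as np

n, r = 100, 1.0
a = np.array([1.0, 0.0])               # attack direction, ||a|| = 1
c0 = np.zeros(2)
c = c0.copy()
for i in range(500):
    x = c + r * a                      # greedy-optimal point on the ball boundary
    c = c + (x - c) / n                # average-out update of the center
D = (c - c0) @ a / r                   # relative displacement D_i
print(D, 500 / n)                      # both 5.0: D_i = i/n, linear in i
```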
31 The oldest-out rule can also be handled similarly to the average-out rule by observing that in both cases some fixed point known in advance is removed from a working set, which allows an attacker to easily find an optimal attack point. [sent-266, score-1.011]
32 1 GREEDY-OPTIMAL ATTACK The greedy-optimal attack should provide a maximal gain for an attacker in a single iteration. [sent-273, score-1.023]
33 For the infinite-horizon learner (and hence also for the average-out learner, as it uses the same recurrence in a proof), it is possible to show that the greedy-optimal attack yields the maximum gain for the entire sequence of attack iterations; that is, it is (globally) optimal. [sent-274, score-1.707]
34 An optimal attack point is placed at the “corner” of a Voronoi cell (including possibly a round boundary of the centroid) to cause the largest displacement of the centroid along the attack direction. [sent-295, score-1.988]
35 Once the candidate attack locations are found for each of the n Voronoi cells, the one that has the highest value of the objective function yj(x∗j) is injected, and the respective center xj∗ of the j∗-th Voronoi cell is expunged from the training set: j∗ = argmax_{j∈{1,…,n}} yj(x∗j). [sent-296, score-0.913]
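The paper states the per-cell problem as an optimization problem; the following is only a rough brute-force stand-in we use for illustration (not the authors' formulation): sample candidate points inside the normality ball, bucket them by Voronoi cell, and keep the replacement with the largest displacement along a.

```python
import numpy as np

def nearest_out_step(X, center, r, a, rng, m=5000):
    """Illustrative stand-in for the per-cell optimization of the nearest-out
    rule: sample candidates in the ball, assign each to the Voronoi cell of
    its nearest training point, and take the replacement that maximally
    shifts the centroid along the attack direction a."""
    n, d = X.shape
    u = rng.normal(size=(m, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    radii = r * rng.random((m, 1)) ** (1.0 / d)
    cand = center + radii * u                       # candidates inside the ball
    cells = np.argmin(((cand[:, None, :] - X[None, :, :]) ** 2).sum(-1), axis=1)
    gain = (cand - X[cells]) @ a / n                # centroid shift along a
    best = int(np.argmax(gain))
    X[cells[best]] = cand[best]                     # expunge the cell's center
    return X, X.mean(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
a = np.array([1.0, 0.0, 0.0])                       # attack direction
c, r = X.mean(axis=0), 2.0
for _ in range(100):
    X, c = nearest_out_step(X, c, r, a, rng)
print(c @ a)                                        # the center drifts along a
```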
36 An attack direction a, ||a|| = 1, is chosen randomly, and 500 attack iterations (5·n) are generated using the procedure presented in Sections 4. [sent-345, score-1.632]
37 The relative displacement of the center in the direction of the attack is measured at each iteration. [sent-350, score-1.003]
38 Figure 4(a) shows the observed progress of the greedy-optimal attack against the nearest-out learner and compares it to the behavior of the theoretical bounds for the infinite-horizon learner (the bound of Nelson and Joseph, 2006) and the average-out learner. [sent-352, score-0.993]
39 The attack effectiveness is measured for all three cases by the relative displacement as a function of the number of iterations. [sent-353, score-0.988]
40 First, the attack progress, that is, the functional dependence of the relative displacement of the greedy-optimal attack against the nearest-out learner with respect to the number of iterations, is linear. [sent-356, score-1.863]
41 Second, the slope of the linear attack progress increases with the dimensionality of the problem. [sent-358, score-0.877]
42 A further illustration of the behavior of the greedy-optimal attack is given in Figure 4(b), showing the dependence of the average attack slope on the dimensionality. [sent-364, score-1.644]
43 One can see that the attack slope increases logarithmically with the dimensionality and levels off at a constant after the dimensionality exceeds the number of training data points. [sent-365, score-0.891]
44 3 Concluding Remarks To summarize our analysis for the case of the attacker’s full control over the training data, we conclude that an optimal poisoning attack successfully subverts a finite-horizon online centroid learner for all outgoing point selection rules. [sent-370, score-1.302]
45 Figure 4: Effectiveness of the poisoning attack for the nearest-out rule as a function of input space dimensionality. [sent-378, score-1.02]
46 (a) The displacement of the centroid along the attack direction grows linearly with the number of injected points. [sent-379, score-1.175]
47 The key factor for the success of a poisoning attack in the nearest-out case lies in high dimensionality of the feature space. [sent-385, score-1.02]
48 The progress of an optimal poisoning attack depends on the size of the Voronoi cells induced by the training data points. [sent-386, score-1.056]
49 With the increasing dimensionality of the feature space, the volume of the sphere increases exponentially, which leads to a higher attack progress rate. [sent-390, score-0.879]
50 The mixing of innocuous and attack points is modeled by a Bernoulli random variable with the parameter ν, which denotes the probability that an adversarial point is presented to the learner. [sent-406, score-0.99]
51 Adversarial points Ai are chosen according to the attack function f depending on the actual state of the learner ci . [sent-407, score-1.025]
52 For simplicity, we make one additional assumption in this chapter: all innocuous points are accepted by the learner at any time of the attack, independently of their actual distance to the center of mass. [sent-410, score-1.05]
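A direct simulation of this mixing model (a sketch; the distribution of the innocuous traffic and all names are our assumptions, and innocuous points are always accepted as stated above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, nu, steps = 100, 0.05, 500_000
a = np.array([1.0, 0.0])               # attack direction, ||a|| = 1 (r = 1)
c = np.zeros(2)
for _ in range(steps):
    if rng.random() < nu:              # B_i ~ Bernoulli(nu): attacker's turn
        x = c + a                      # greedy-optimal attack point f(c_i)
    else:                              # innocuous point, accepted by assumption
        x = rng.normal(scale=0.5, size=2)
    c = c + (x - c) / n                # average-out update
print(c @ a, nu / (1 - nu))            # displacement saturates near nu/(1-nu)
```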
53 The attack strategy is a function that maps a vector (the center) to an attack location. [sent-433, score-1.645]
54 This raises the question of which attack strategies are optimal in the sense that an attacker reaches his goal of concealing a predefined attack direction vector in a minimal number of iterations. [sent-435, score-1.827]
55 As in the previous sections, an attack’s progress is measured by projecting the current center of mass onto the attack direction vector: Di = ci · a . [sent-436, score-1.023]
56 Attack strategies maximizing the displacement Di in each iteration i are referred to as greedy-optimal attack strategies. [sent-437, score-0.972]
57 Then the greedy-optimal attack strategy f against the online centroid learner is given by f(ci) := ci + a. [sent-441, score-1.234]
58 Note that the displacement measures the projection of the change of the centroid onto the attack direction vector. [sent-446, score-1.15]
59 Theorem 8 For the displacement Di of the centroid learner under an optimal poisoning attack, (a) E(Di) = (1 − ai) · ν/(1 − ν), (b) Var(Di) ≤ γi · (ν/(1 − ν))² + δn, where ai := (1 − (1 − ν)/n)^i, bi := (1 − ((1 − ν)/n)(2 − 1/n))^i, γi = ai − bi, and δn := (ν² + (1 − bi)) / ((2n − 1)(1 − ν)²). [sent-450, score-0.881]
60 Proof (a) Inserting the greedy-optimal attack strategy of Equation (16) into Equation (15) of Axiom 6, we have: ci+1 = ci + (1/n)(Bi(ci + a) + (1 − Bi)xi − ci), which can be rewritten as: ci+1 = (1 − (1 − Bi)/n) ci + (Bi/n) a + ((1 − Bi)/n) xi. [sent-451, score-1.224]
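The closed form of Theorem 8(a), as reconstructed above, can be evaluated directly; a small sketch showing how E(Di) rises toward the ceiling ν/(1 − ν) and never exceeds it:

```python
def expected_displacement(i, n, nu):
    """E(D_i) = (1 - a_i) * nu / (1 - nu), with a_i = (1 - (1 - nu) / n)**i."""
    a_i = (1 - (1 - nu) / n) ** i
    return (1 - a_i) * nu / (1 - nu)

n, nu = 100, 0.05
for i in [n, 2 * n, 5 * n, 50 * n]:
    print(i / n, expected_displacement(i, n, nu))
# the values approach the ceiling nu / (1 - nu) ~ 0.0526 from below
```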
61 Figure 5: Theoretical behavior of the displacement of a centroid under a poisoning attack for a bounded fraction of traffic under attacker’s control. [sent-454, score-1.346]
62 Figure 6: Comparison of the empirical displacement of the centroid under a poisoning attack with attacker’s limited control (ν = 0. [sent-475, score-1.332]
63 Figure 6 shows a typical displacement curve over the first 500,000 attack iterations. [sent-480, score-0.972]
64 If the latter is exceeded, we assume the adversary’s attack to have failed and a safe state of the learner to be loaded. [sent-495, score-0.891]
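The text does not give pseudo-code for this mechanism; the following is our illustrative sketch of the idea (class and parameter names are ours): monitor the false positive rate on audited innocuous traffic and reload a stored safe state whenever it exceeds α.

```python
import numpy as np

class GuardedCentroid:
    """Sketch of false positive protection: reject points outside the ball,
    audit the false positive rate on points known to be innocuous, and
    reload a safe state whenever that rate exceeds the admissible alpha."""

    def __init__(self, center, r, n, alpha, window=1000):
        self.c = center.copy()
        self.safe = center.copy()      # last known-safe state of the learner
        self.r, self.n, self.alpha, self.window = r, n, alpha, window
        self.audit = []                # recent reject/accept decisions

    def update(self, x, known_innocuous=False):
        rejected = np.linalg.norm(x - self.c) > self.r
        if not rejected:
            self.c += (x - self.c) / self.n          # average-out update
        if known_innocuous:                          # audited innocuous point
            self.audit = (self.audit + [rejected])[-self.window:]
            if np.mean(self.audit) > self.alpha:     # FP rate exceeded:
                self.c = self.safe.copy()            # attack assumed failed,
                self.audit.clear()                   # safe state reloaded
        return not rejected
```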
65 Optimal attack strategies are characterized in terms of the displacement as in the previous sections. [sent-510, score-0.972]
66 The intuition behind the symmetry assumption in Axiom 10 is that it ensures that resetting the centroid’s center to zero (initiated by the false positive protection) does not lead to a positive shift of the centroid toward the attack direction. [sent-512, score-1.08]
67 Proposition 11 Let a be an attack direction vector and consider the centroid learner with maximal false positive rate α as defined in Axiom 10. [sent-516, score-1.136]
68 Then the greedy-optimal attack strategy f is given by f(ci) := ci + a. [sent-517, score-0.949]
69 Figure 7: Theoretical behavior of the displacement of a centroid (average-out rule) under a poisoning attack for different levels of false positive protection α. [sent-556, score-1.421]
70 Note that, in practice, the attacker can only construct attack points in the input space and not directly in the feature space. [sent-586, score-1.025]
71 This is admissible for security analysis; it is the underestimation of the attack capability that would have been problematic. [sent-592, score-0.919]
72 Conventional defenses against such malicious software rest on abuse detection; that is, identifying attacks using known patterns of abuse, so-called attack signatures. [sent-608, score-0.926]
73 The time span required for crafting a signature from a newly discovered attack is insufficient for protecting against rapidly propagating malicious code (e. [sent-610, score-0.872]
74 We refer to these embedded attacks as the attack points; that is, the points in the feature space that the adversary would like to have declared as non-anomalous. [sent-666, score-0.914]
75 This implies that, although the effective dimensionality of the HTTP traffic is significantly smaller than the number of training data points, it still remains sufficiently high so that the attack progress rate approaches 1, which is similar to the simple average-out learner. [sent-727, score-0.884]
76 5 Geometrical Constraints of HTTP Data Several technical difficulties arising from data geometry have to be overcome in launching a poisoning attack in practice. [sent-729, score-0.998]
77 Then we draw a random instance from each of the 20 attack classes and for each of these 20 attack instances generate a poisoning attack as described in Section 8. [sent-765, score-2.63]
78 An attack succeeds when the attack point is accepted as innocuous by the learning algorithm. [sent-767, score-1.746]
79 For each attack instance, the number of iterations needed for an attack to succeed and the respective displacement of the center of mass are recorded. [sent-768, score-1.848]
80 Figure 9 shows, for each attack instance, the behavior of the relative displacement at the point of success as a function of the number of iterations. [sent-769, score-0.972]
81 For example, we use a k-gram spectrum kernel, so each poisoning attack point is restricted to have unit norm in feature space. [sent-794, score-0.998]
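A minimal sketch of such a normalized k-gram (spectrum) embedding, illustrating why every point lies on the unit sphere in feature space (this is illustrative, not the feature extractor used in the experiments):

```python
from collections import Counter
import math

def kgram_embedding(s: bytes, k: int = 3) -> dict:
    """Normalized k-gram spectrum map: k-gram counts scaled to unit norm."""
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {g: v / norm for g, v in counts.items()}

def kernel(x: dict, y: dict) -> float:
    """Spectrum kernel = inner product of the sparse embeddings."""
    return sum(v * y.get(g, 0.0) for g, v in x.items())

req = b"GET /index.html HTTP/1.1"
phi = kgram_embedding(req)
print(round(kernel(phi, phi), 6))      # 1.0: every request has unit norm
```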
82 The practicality of the poisoning attack is further emphasized by the small number of iterations needed for an attack to succeed: it suffices to overwrite between 2 and 35 percent of the initial number of points in the training data to subvert the nearest-out learner. [sent-795, score-1.869]
83 7 Critical Traffic Ratios of HTTP Attacks For the case of the attacker’s limited control over the data, the success of a poisoning attack largely depends on the attacker’s constraints, as shown in the analysis in Sections 5 and 6. [sent-797, score-0.998]
84 The analysis (cf. Theorem 8 and Figure 5) shows that the displacement of a poisoning attack is bounded from above by a constant depending on the traffic ratio ν controlled by an attacker. [sent-801, score-1.154]
85 Hence the susceptibility of the learner to a particular attack depends on the value of this constant. [sent-802, score-0.891]
86 If an attacker does not control a sufficiently large traffic portion and the potential displacement is bounded by a constant smaller than the distance from the initial center of mass to the attack point, then the attack fails. [sent-803, score-2.043]
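This failure condition can be turned into a critical traffic ratio per attack. Assuming the asymptotic ceiling ν/(1 − ν) from Theorem 8 (in units of r), the smallest sufficient ν solves ν/(1 − ν) = D_A with D_A = (||A − c0|| − r)/r, giving ν_crit = D_A/(1 + D_A); this derivation and all names in the sketch are ours:

```python
import numpy as np

def critical_nu(A, c0, r):
    """Smallest traffic fraction for which the ceiling nu/(1-nu) reaches the
    attack point: nu_crit = D_A / (1 + D_A)."""
    D_A = (np.linalg.norm(A - c0) - r) / r   # required relative displacement
    return D_A / (1 + D_A)

c0, r = np.zeros(2), 1.0
for dist in [1.5, 2.0, 5.0, 10.0]:
    A = np.array([dist, 0.0])
    print(dist, round(critical_nu(A, c0, r), 3))
# 0.333, 0.5, 0.8, 0.9: far attack points require control of most traffic
```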
87 To illustrate this observation, we compute critical traffic rates needed for the success of attacks from each of the 20 attack classes in our malicious pool. [sent-804, score-0.926]
88 The analysis (cf. Theorem 8 and Figure 5) shows that the displacement of a poisoning attack is bounded from above by a number depending on the traffic ratio ν and the maximal false positive rate α. [sent-917, score-1.221]
89 A false positive threshold that is too low would lead to a quasi-permanent shutdown of online updates; a high tolerance to false positives would allow an attack to slip in unnoticed. [sent-923, score-0.958]
90 We then proceed by presenting normal data (drawn with replacement from the online training set) mixed with poisoning attack points [sent-935, score-1.063]
91 (using the IIS 5.0 WebDAV exploit as target) and measuring the false positive rate for each attack iteration. [sent-936, score-0.884]
92 Figure 11: Simulation of a poisoning attack against the IIS 5.0 WebDAV exploit. [sent-956, score-0.998]
93 In Figure 10 the maximal observed false positive rate is shown for various values of ν, where the maximum is taken over all attack iterations and 10 runs. [sent-958, score-0.883]
94 …IIS 5.0 exploit) and run a poisoning attack against the average-out centroid learner for various values of ν ∈ [0.… [sent-969, score-1.251]
95 In another experiment, we therefore consider the ALT-N WebAdmin Overflow; that is, the most optimistic scenario for the attacker and the closest attack to the centroid. [sent-984, score-1.011]
96 Figure 12: Simulation of a poisoning attack against the ALT-N WebAdmin Overflow under limited control. [sent-1005, score-0.998]
97 It differs from the scenario considered in our work in that the attack takes place after the model has been trained, whereas the poisoning attack affects the model in the course of training. [sent-1035, score-1.814]
98 However, once the attacker has knowledge about individual micro-models, a poisoning attack similar to the one considered in our work can be constructed. [sent-1045, score-1.193]
99 Clearly, the greedy attack will result in a progress of r/n (we will move one of the points by r but the center’s displacement will be discounted by 1/n). [sent-1109, score-1.013]
100 Lemma 18 Let C be an online centroid learner with maximal false positive rate α satisfying the optimal attack strategy. [sent-1138, score-1.168]
wordName wordTfidf (topN-words)
[('attack', 0.816), ('attacker', 0.195), ('poisoning', 0.182), ('centroid', 0.178), ('displacement', 0.156), ('anomaly', 0.141), ('ci', 0.12), ('security', 0.103), ('di', 0.103), ('innocuous', 0.102), ('bi', 0.097), ('axiom', 0.091), ('detection', 0.084), ('askov', 0.081), ('ecurity', 0.077), ('entroid', 0.077), ('nomaly', 0.077), ('learner', 0.075), ('loft', 0.069), ('attacks', 0.069), ('traf', 0.06), ('etection', 0.059), ('adversarial', 0.058), ('voronoi', 0.057), ('false', 0.055), ('intrusion', 0.054), ('rieck', 0.047), ('laskov', 0.044), ('nline', 0.044), ('malicious', 0.041), ('nalysis', 0.04), ('nelson', 0.036), ('xi', 0.035), ('protection', 0.034), ('fogla', 0.033), ('ai', 0.032), ('online', 0.032), ('iis', 0.031), ('center', 0.031), ('barreno', 0.029), ('mass', 0.029), ('progress', 0.027), ('horizon', 0.026), ('byte', 0.025), ('joseph', 0.025), ('injected', 0.025), ('radius', 0.024), ('dimensionality', 0.022), ('adances', 0.022), ('subvert', 0.022), ('exi', 0.022), ('cell', 0.022), ('malware', 0.02), ('hypersphere', 0.019), ('raid', 0.019), ('stolfo', 0.019), ('training', 0.019), ('webdav', 0.018), ('privacy', 0.017), ('spam', 0.017), ('effectiveness', 0.016), ('dalvi', 0.016), ('impact', 0.016), ('kernel', 0.015), ('hofmeyr', 0.015), ('qsi', 0.015), ('xold', 0.015), ('signature', 0.015), ('adversary', 0.015), ('network', 0.015), ('inserting', 0.015), ('sphere', 0.014), ('deployed', 0.014), ('anomalous', 0.014), ('window', 0.014), ('points', 0.014), ('fraction', 0.014), ('requests', 0.014), ('formula', 0.014), ('constraints', 0.014), ('strategy', 0.013), ('sch', 0.013), ('xk', 0.013), ('carried', 0.013), ('exploit', 0.013), ('deployment', 0.013), ('forrest', 0.013), ('intrusions', 0.013), ('polymorphic', 0.013), ('crit', 0.013), ('qclp', 0.013), ('lowd', 0.013), ('slope', 0.012), ('maximal', 0.012), ('accepted', 0.012), ('cells', 0.012), ('pool', 0.011), ('dekel', 0.011), ('wang', 0.011), ('operational', 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
Author: Marius Kloft, Pavel Laskov
2 0.053083584 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
3 0.048640385 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
Author: Gil Tahan, Lior Rokach, Yuval Shahar
Abstract: This paper proposes several novel methods, based on machine learning, to detect malware in executable files without any need for preprocessing, such as unpacking or disassembling. The basic method (Mal-ID) is a new static (form-based) analysis methodology that uses common segment analysis in order to detect malware files. By using common segment analysis, Mal-ID is able to discard malware parts that originate from benign code. In addition, Mal-ID uses a new kind of feature, termed meta-feature, to better capture the properties of the analyzed segments. Rather than using the entire file, as is usually the case with machine learning based techniques, the new approach detects malware on the segment level. This study also introduces two Mal-ID extensions that improve the Mal-ID basic method in various aspects. We rigorously evaluated Mal-ID and its two extensions with more than ten performance measures, and compared them to the highly rated boosted decision tree method under identical settings. The evaluation demonstrated that Mal-ID and the two Mal-ID extensions outperformed the boosted decision tree method in almost all respects. In addition, the results indicated that by extracting meaningful features, it is sufficient to employ one simple detection rule for classifying executable files. Keywords: computer security, malware detection, common segment analysis, supervised learning
4 0.040491614 12 jmlr-2012-Active Clustering of Biological Sequences
Author: Konstantin Voevodski, Maria-Florina Balcan, Heiko Röglin, Shang-Hua Teng, Yu Xia
Abstract: Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that given a point s ∈ S return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points that we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We use our procedure to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire data set. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification. Keywords: clustering, active clustering, k-median, approximation algorithms, approximation stability, clustering accuracy, protein sequences ∗. A preliminary version of this article appeared under the title Efficient Clustering with Limited Distance Information in the Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, AUAI Press, Corvallis, Oregon, 632-641. †. Most of this work was completed at Boston University.
5 0.035479993 100 jmlr-2012-Robust Kernel Density Estimation
Author: JooSeuk Kim, Clayton D. Scott
Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation
6 0.031612266 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
7 0.028520448 102 jmlr-2012-Sally: A Tool for Embedding Strings in Vector Spaces
8 0.02822542 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
9 0.027955553 81 jmlr-2012-On the Convergence Rate oflp-Norm Multiple Kernel Learning
10 0.023848232 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
11 0.023629041 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
12 0.022550436 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression
13 0.022017663 44 jmlr-2012-Feature Selection via Dependence Maximization
14 0.020876072 13 jmlr-2012-Active Learning via Perfect Selective Classification
15 0.01990504 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
16 0.019852933 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
17 0.019519718 97 jmlr-2012-Regularization Techniques for Learning with Matrices
18 0.019481288 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
19 0.018858643 82 jmlr-2012-On the Necessity of Irrelevant Variables
20 0.018823741 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
topicId topicWeight
[(0, -0.097), (1, -0.002), (2, 0.035), (3, 0.022), (4, -0.004), (5, 0.013), (6, 0.033), (7, -0.002), (8, 0.031), (9, 0.068), (10, -0.049), (11, -0.052), (12, -0.046), (13, -0.01), (14, -0.01), (15, 0.114), (16, 0.053), (17, -0.138), (18, -0.006), (19, -0.003), (20, 0.035), (21, 0.053), (22, -0.036), (23, -0.031), (24, 0.106), (25, -0.115), (26, -0.088), (27, 0.013), (28, -0.004), (29, -0.038), (30, 0.34), (31, -0.207), (32, 0.024), (33, -0.085), (34, 0.229), (35, 0.055), (36, 0.063), (37, 0.012), (38, 0.111), (39, 0.029), (40, -0.067), (41, 0.049), (42, 0.049), (43, -0.166), (44, -0.07), (45, 0.263), (46, 0.018), (47, -0.062), (48, 0.149), (49, 0.154)]
simIndex simValue paperId paperTitle
same-paper 1 0.95096254 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
Author: Marius Kloft, Pavel Laskov
2 0.52477199 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
Author: Gil Tahan, Lior Rokach, Yuval Shahar
Abstract: This paper proposes several novel methods, based on machine learning, to detect malware in executable files without any need for preprocessing, such as unpacking or disassembling. The basic method (Mal-ID) is a new static (form-based) analysis methodology that uses common segment analysis in order to detect malware files. By using common segment analysis, Mal-ID is able to discard malware parts that originate from benign code. In addition, Mal-ID uses a new kind of feature, termed meta-feature, to better capture the properties of the analyzed segments. Rather than using the entire file, as is usually the case with machine learning based techniques, the new approach detects malware on the segment level. This study also introduces two Mal-ID extensions that improve the Mal-ID basic method in various aspects. We rigorously evaluated Mal-ID and its two extensions with more than ten performance measures, and compared them to the highly rated boosted decision tree method under identical settings. The evaluation demonstrated that Mal-ID and the two Mal-ID extensions outperformed the boosted decision tree method in almost all respects. In addition, the results indicated that by extracting meaningful features, it is sufficient to employ one simple detection rule for classifying executable files. Keywords: computer security, malware detection, common segment analysis, supervised learning
3 0.32292682 12 jmlr-2012-Active Clustering of Biological Sequences
Author: Konstantin Voevodski, Maria-Florina Balcan, Heiko Röglin, Shang-Hua Teng, Yu Xia
Abstract: Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that given a point s ∈ S return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points that we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We use our procedure to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire data set. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification. Keywords: clustering, active clustering, k-median, approximation algorithms, approximation stability, clustering accuracy, protein sequences ∗. A preliminary version of this article appeared under the title Efficient Clustering with Limited Distance Information in the Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, AUAI Press, Corvallis, Oregon, 632-641. †. Most of this work was completed at Boston University.
4 0.29298565 102 jmlr-2012-Sally: A Tool for Embedding Strings in Vector Spaces
Author: Konrad Rieck, Christian Wressnegger, Alexander Bikadorov
Abstract: Strings and sequences are ubiquitous in many areas of data analysis. However, only few learning methods can be directly applied to this form of data. We present Sally, a tool for embedding strings in vector spaces that allows for applying a wide range of learning methods to string data. Sally implements a generalized form of the bag-of-words model, where strings are mapped to a vector space that is spanned by a set of string features, such as words or n-grams of words. The implementation of Sally builds on efficient string algorithms and enables processing millions of strings and features. The tool supports several data formats and is capable of interfacing with common learning environments, such as Weka, Shogun, Matlab, or Pylab. Sally has been successfully applied for learning with natural language text, DNA sequences and monitored program behavior. Keywords: string embedding, bag-of-words models, learning with sequential data
5 0.26957649 3 jmlr-2012-A Geometric Approach to Sample Compression
Author: Benjamin I.P. Rubinstein, J. Hyam Rubinstein
Abstract: The Sample Compression Conjecture of Littlestone & Warmuth has remained unsolved for a quarter century. While maximum classes (concept classes meeting Sauer’s Lemma with equality) can be compressed, the compression of general concept classes reduces to compressing maximal classes (classes that cannot be expanded without increasing VC dimension). Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes. Simple arrangements of hyperplanes in hyperbolic space are shown to represent maximum classes, generalizing the corresponding Euclidean result. We show that sweeping a generic hyperplane across such arrangements forms an unlabeled compression scheme of size VC dimension and corresponds to a special case of peeling the one-inclusion graph, resolving a recent conjecture of Kuzmin & Warmuth. A bijection between finite maximum classes and certain arrangements of piecewise-linear (PL) hyperplanes in either a ball or Euclidean space is established. Finally we show that d-maximum classes corresponding to PL-hyperplane arrangements in Rd have cubical complexes homeomorphic to a d-ball, or equivalently complexes that are manifolds with boundary. A main result is that PL arrangements can be swept by a moving hyperplane to unlabeled d-compress any finite maximum class, forming a peeling scheme as conjectured by Kuzmin & Warmuth. A corollary is that some d-maximal classes cannot be embedded into any maximum class of VC-dimension d + k, for any constant k. The construction of the PL sweeping involves Pachner moves on the one-inclusion graph, corresponding to moves of a hyperplane across the intersection of d other hyperplanes. This extends the well known Pachner moves for triangulations to c
6 0.25757119 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
7 0.25205618 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
8 0.23382182 100 jmlr-2012-Robust Kernel Density Estimation
9 0.20547056 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
10 0.1982616 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
11 0.17774013 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
12 0.16627504 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
13 0.15625386 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
14 0.15477772 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss
15 0.14530247 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
16 0.14073779 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs
17 0.13977885 19 jmlr-2012-An Introduction to Artificial Prediction Markets for Classification
18 0.13596708 109 jmlr-2012-Stability of Density-Based Clustering
19 0.13584162 81 jmlr-2012-On the Convergence Rate oflp-Norm Multiple Kernel Learning
20 0.13144076 37 jmlr-2012-Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks
topicId topicWeight
[(7, 0.011), (21, 0.029), (26, 0.043), (27, 0.012), (29, 0.044), (34, 0.346), (35, 0.041), (49, 0.027), (56, 0.018), (64, 0.023), (69, 0.016), (75, 0.072), (77, 0.018), (79, 0.014), (81, 0.04), (89, 0.016), (92, 0.07), (96, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.72824091 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
Author: Marius Kloft, Pavel Laskov
Abstract: Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning algorithms may have to cope with manipulated data aimed at hampering decision making. Although some previous work addressed the issue of handling malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution,1 we analyze the performance of a particular method—online centroid anomaly detection—in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of attack efficiency and limitations. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: attacker’s full or limited control over the traffic and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker’s gain) if external constraints are properly used. Our experimental evaluation, carried out on real traces of HTTP and exploit traffic, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms. Keywords: anomaly detection, adversarial, security analysis, support vector data description, computer security, network intrusion detection
2 0.36145836 78 jmlr-2012-Nonparametric Guidance of Autoencoder Representations using Label Information
Author: Jasper Snoek, Ryan P. Adams, Hugo Larochelle
Abstract: While unsupervised learning has long been useful for density modeling, exploratory data analysis and visualization, it has become increasingly important for discovering features that will later be used for discriminative tasks. Discriminative algorithms often work best with highly-informative features; remarkably, such features can often be learned without the labels. One particularly effective way to perform such unsupervised learning has been to use autoencoder neural networks, which find latent representations that are constrained but nevertheless informative for reconstruction. However, pure unsupervised learning with autoencoders can find representations that may or may not be useful for the ultimate discriminative task. It is a continuing challenge to guide the training of an autoencoder so that it finds features which will be useful for predicting labels. Similarly, we often have a priori information regarding what statistical variation will be irrelevant to the ultimate discriminative task, and we would like to be able to use this for guidance as well. Although a typical strategy would be to include a parametric discriminative model as part of the autoencoder training, here we propose a nonparametric approach that uses a Gaussian process to guide the representation. By using a nonparametric model, we can ensure that a useful discriminative function exists for a given set of features, without explicitly instantiating it. We demonstrate the superiority of this guidance mechanism on four data sets, including a real-world application to rehabilitation research. We also show how our proposed approach can learn to explicitly ignore statistically significant covariate information that is label-irrelevant, by evaluating on the small NORB image recognition problem in which pose and lighting labels are available. Keywords: autoencoder, gaussian process, gaussian process latent variable model, representation learning, unsupervised learning
Author: Neil D. Lawrence
Abstract: We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is a nonlinear generalization of principal component analysis. We relate the model to Laplacian eigenmaps and isomap. We show that parameter fitting in the locally linear embedding (LLE) is approximate maximum likelihood MEU. We introduce a variant of LLE that performs maximum likelihood exactly: Acyclic LLE (ALLE). We show that MEU and ALLE are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.
4 0.35691246 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
Author: Ofer Dekel, Claudio Gentile, Karthik Sridharan
Abstract: We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. Keywords: online learning, regret, label-efficient, crowdsourcing
5 0.35626447 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, when using an online procedure with rank-one gradients. We use this algorithm, L ORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. L ORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt L ORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. L ORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multi-label image classification task. Keywords: low rank, Riemannian manifolds, metric learning, retractions, multitask learning, online learning
6 0.35401058 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
7 0.35174397 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
8 0.34998554 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
9 0.3493554 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
10 0.34797311 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
11 0.34729478 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
12 0.34644258 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
13 0.34622523 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
14 0.34599775 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
15 0.34564674 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
16 0.34417987 81 jmlr-2012-On the Convergence Rate oflp-Norm Multiple Kernel Learning
17 0.34253865 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
18 0.34176546 1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach
19 0.34130543 44 jmlr-2012-Feature Selection via Dependence Maximization
20 0.34101805 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity