jmlr jmlr2006 jmlr2006-97 knowledge-graph by maker-knowledge-mining

97 jmlr-2006-97 On Inferring Application Protocol Behaviors in Encrypted Network Traffic (Special Topic on Machine Learning for Computer Security)


Source: pdf

Author: Charles V. Wright, Fabian Monrose, Gerald M. Masson

Abstract: Several fundamental security mechanisms for restricting access to network resources rely on the ability of a reference monitor to inspect the contents of traffic as it traverses the network. However, with the increasing popularity of cryptographic protocols, the traditional means of inspecting packet contents to enforce security policies is no longer a viable approach as message contents are concealed by encryption. In this paper, we investigate the extent to which common application protocols can be identified using only the features that remain intact after encryption—namely packet size, timing, and direction. We first present what we believe to be the first exploratory look at protocol identification in encrypted tunnels which carry traffic from many TCP connections simultaneously, using only post-encryption observable features. We then explore the problem of protocol identification in individual encrypted TCP connections, using much less data than in other recent approaches. The results of our evaluation show that our classifiers achieve accuracy greater than 90% for several protocols in aggregate traffic, and, for most protocols, greater than 80% when making fine-grained classifications on single connections. Moreover, perhaps most surprisingly, we show that one can even estimate the number of live connections in certain classes of encrypted tunnels to within, on average, better than 20%. Keywords: traffic classification, hidden Markov models, network security

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we investigate the extent to which common application protocols can be identified using only the features that remain intact after encryption—namely packet size, timing, and direction. [sent-8, score-0.505]

2 We first present what we believe to be the first exploratory look at protocol identification in encrypted tunnels which carry traffic from many TCP connections simultaneously, using only post-encryption observable features. [sent-9, score-0.866]

3 We then explore the problem of protocol identification in individual encrypted TCP connections, using much less data than in other recent approaches. [sent-10, score-0.547]

4 Moreover, perhaps most surprisingly, we show that one can even estimate the number of live connections in certain classes of encrypted tunnels to within, on average, better than 20%. [sent-12, score-0.49]

5 Even more problematic for such traffic characterization techniques is the fact that with the increased use of cryptographic protocols such as SSL (Rescorla, 2000) and SSH (Ylonen, 1996), fewer and fewer packets in legitimate traffic become available for inspection. [sent-27, score-0.572]

6 Here we investigate the extent to which common Internet application protocols remain distinguishable even when packet payloads and TCP headers have been stripped away, leaving only extremely lean data which includes nothing more than the packets’ timing, size, and direction. [sent-40, score-0.526]

7 We begin our analysis in §3 by exploring protocol recognition techniques for traffic aggregates where all flows carry the same application protocol. [sent-41, score-0.466]

8 Since we do not have access to packet payloads in these traces, we do not attempt to determine the “ground truth” of which connections truly belong to which protocols. [sent-54, score-0.481]

9 We encode the packet’s direction in the sign bit of the packet’s size, so that packets sent from server to client have size less than zero and those from client to server have size greater than zero. [sent-58, score-0.58]
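The direction-in-sign encoding described above can be sketched as follows (a minimal illustration with assumed field names, not the authors' code):

```python
# Fold a packet's direction into the sign of its size: packets sent from
# server to client become negative, client-to-server packets stay positive.
def encode_packet(size, from_server):
    return -size if from_server else size

# A server-sent 1500-byte packet and a client-sent 52-byte packet:
stream = [encode_packet(1500, from_server=True), encode_packet(52, from_server=False)]
print(stream)  # [-1500, 52]
```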

10 Traffic Classification in Aggregate Encrypted Traffic Here we investigate the problem of determining the application protocol in use in aggregate traffic composed of several TCP connections which all employ the same application protocol. [sent-64, score-0.74]

11 The techniques we develop here can be used to quickly and efficiently infer the nature of the application protocol used in aggregate traffic without demultiplexing or reassembling the individual flows from the aggregate. [sent-67, score-0.49]

12 To evaluate the techniques developed in this section, we assemble traffic aggregates for each protocol using several TCP connections extracted from the GMU data as described in §2. [sent-73, score-0.716]

13 For each 10-minute trace and each protocol, we select all connections for the given protocol in the given trace, and interleave their packets into a single unified stream, sorted in order of arrival on the link. [sent-74, score-0.988]

14 Currently, we group packets into four types; any packet is classified as either small (i. [sent-83, score-0.529]

15 In general, when we consider M different packet types, this splitting and counting procedure yields a vector-valued count of packets nt = (nt1, nt2, . . . , ntM). [sent-88, score-0.605]
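The splitting-and-counting step can be sketched as below; the (timestamp, type_index) packet representation and the epoch alignment are assumptions made for illustration, not the paper's exact bookkeeping.

```python
import math

def epoch_counts(packets, s, M):
    """Split a time-sorted stream of (timestamp, type_index) packets into
    consecutive epochs of length s seconds and count packets of each of the
    M types, yielding one count vector n_t per epoch."""
    if not packets:
        return []
    t0 = packets[0][0]
    n_epochs = math.floor((packets[-1][0] - t0) / s) + 1
    counts = [[0] * M for _ in range(n_epochs)]
    for ts, m in packets:
        counts[int((ts - t0) // s)][m] += 1
    return counts

pkts = [(0.1, 0), (0.4, 1), (1.2, 0), (1.3, 0)]
print(epoch_counts(pkts, 1.0, 2))  # [[1, 1], [2, 0]]
```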

16 We then assemble single-protocol aggregates from this day’s traces for each protocol in the study, yielding a list of vectors n1, n2, . . . [sent-100, score-0.528]

17 Larger values of s mean that each epoch includes packets from a greater number of connections, so it is not surprising that, as s increases, the mix of packets observed in a given epoch approaches the mix of packets the protocol tends to produce overall. [sent-113, score-1.45]

18 On the other hand, smaller values of s allow us to analyze shorter traces and should make it more difficult for an adversary to successfully masquerade one protocol as another. [sent-114, score-0.48]

19 Given a sequence of packets corresponding to a traffic aggregate, we begin by preprocessing it into a sequence of vectors of packet counts and normalizing each vector just as we did for each of the aggregates in the training set. [sent-118, score-0.6]

20 This classifier is able to correctly recognize 100% of the aggregates for several of the protocols with many different values of k, leading us to believe that the vectors of packet counts observed for each of these protocols tend to cluster together into perhaps a few large groups. [sent-125, score-0.85]

21 The results in this section show that, by using the Kullback-Leibler distance to construct a kNearest Neighbor classifier for short slices of time, we can then build a classifier for longer traces which performs quite well on aggregate traffic where only a single application protocol is involved. [sent-127, score-0.587]
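The slice-level classifier described above can be sketched as follows: normalized packet-count vectors are compared with the Kullback-Leibler distance, and each slice receives the majority label among its k nearest training slices. The smoothing constant and tie-breaking are assumptions for the sketch, not necessarily the paper's exact choices.

```python
import math
from collections import Counter

def kl(p, q, eps=1e-9):
    """Kullback-Leibler distance between two normalized count vectors,
    with a small epsilon to avoid division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def knn_label(x, train, k=3):
    """Majority label among the k training slices nearest to x under KL.
    train is a list of (normalized_vector, protocol_label) pairs."""
    nearest = sorted(train, key=lambda vl: kl(x, vl[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [([0.8, 0.2], "ssh"), ([0.7, 0.3], "ssh"), ([0.1, 0.9], "http")]
print(knn_label([0.75, 0.25], train, k=3))  # ssh
```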

22 However, we may not always be able to assume that all flows in the aggregate carry the same application protocol. [sent-128, score-0.49]

23 If we are concerned only with detecting instances of a given target protocol (or indeed, a set of target protocols), we simply label the vectors in the training set based on whether they contain an instance of the target protocol(s). [sent-198, score-0.442]

24 We flag the aggregate as an instance of the target protocol if and only if the percentage of the time slices for which the classifier returns True is above some threshold. [sent-200, score-0.49]
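The aggregate-level decision rule above reduces to a simple thresholded vote over slice-level classifier outputs; a minimal sketch:

```python
def flag_aggregate(slice_votes, threshold):
    """Flag the aggregate as the target protocol iff the fraction of
    slice-level True votes exceeds the threshold."""
    frac = sum(slice_votes) / len(slice_votes)
    return frac > threshold

# Three of four slices voted True: flagged at threshold 0.5.
print(flag_aggregate([True, True, False, True], 0.5))  # True
```

Raising the threshold trades detection rate for a lower false-detection rate, which is exactly the knob swept in the detection-rate curves described below.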

25 Figure 2 shows the detection rates for the k-Nearest Neighbor-based multi-flow protocol detectors for AIM, HTTP, FTP, and SMTP-in, with k = 7. [sent-202, score-0.614]

26 In each graph, the x-axis represents the threshold level, and the plots show the probability that the given detector, when set with a particular threshold, flags instances of each protocol in the study. [sent-203, score-0.442]

27 Overall, the multi-flow protocol detectors seem to perform quite well detecting broad classes of protocol behavior. [sent-204, score-0.926]

28 The FTP detector’s rates (d) show that, when observed in a multi-flow aggregate, the more interactive protocols exhibit very similar on-the-wire behaviors; after FTP itself, the FTP detector is most likely to flag instances of AIM, SSH, and Telnet. [sent-208, score-0.48]

29 Even if the connections use SSL or TLS to encrypt their packets, the administrator could perform more in-depth analysis to determine the application protocol used in each individual TCP connection. [sent-216, score-0.719]

30 Machine Learning Techniques for the Analysis of Single Flows We now relax the earlier assumption that all TCP connections in a given set carry the same application protocol, but retain the assumption that the individual TCP connections can be demultiplexed. [sent-219, score-0.5]

31 We present an approach based on building statistical models for the sequence of packets produced by each protocol of interest, and then use these models to identify the protocol in use in new TCP connections. [sent-221, score-1.134]

32 Identifying protocols in this setting is fairly difficult due to the fact that certain application protocols exhibit more than one typical behavior pattern (e. [sent-224, score-0.548]

33 Given models λ1, . . . , λn that correspond to the protocols of interest (say AIM, SMTP, FTP), the goal is to pick the model that best describes the sequences of encrypted packets observed in the different connections. [sent-237, score-0.725]

34 Given a set of connections for training, we begin by constructing an initial model (see Figure 4) such that the length of the chain of states in the model is equal to the average length (in packets) of the connections in the training set. [sent-247, score-0.5]

35 To allow for variations between the observed sequences of packets in connections of the same protocol, the model has two additional states for each position in the chain. [sent-260, score-0.572]

36 Transitions from the Delete state in each column to Insert state in the next column allow for a normal packet at the given position to be removed and replaced with a packet which does not fit the profile. [sent-263, score-0.462]

37 In our case, the addition of a second match state per position was intended to allow the model to better represent the correlation between successive packets in TCP connections (Wright et al. [sent-273, score-0.548]

38 Since TCP uses sliding windows and positive acknowledgments to achieve reliable data transfer, the direction of a packet is often closely correlated (either positively or negatively) to the direction of the previous packet in the connection. [sent-275, score-0.462]

39 Therefore, the Server Match state matches only packets observed traveling from the server to the client, and the Client Match state matches packets traveling in the opposite direction. [sent-276, score-0.72]

40 For example, a transition from a Client Match state to a Server Match state indicates that a typical packet (for the given protocol) was observed traveling from the client to the server, followed by a similarly typical packet on its way from the server to the client. [sent-277, score-0.624]

41 In practice, the Insert states represent duplicate packets and retransmissions, while the Delete states account for packets lost in the network or dropped by the detector. [sent-278, score-0.632]

42 Our first such classifier assigns protocol labels to sequences according to the principle of maximum likelihood. [sent-287, score-0.442]
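Maximum-likelihood labeling amounts to scoring a connection's packet sequence under each protocol's model and taking the argmax. In this hedged sketch the scoring callables are toy stand-ins for the paper's profile-HMM likelihood computations:

```python
def classify(seq, models):
    """Assign the protocol label whose model gives the packet sequence
    the highest log-likelihood. models maps protocol -> scoring callable."""
    return max(models, key=lambda proto: models[proto](seq))

# Toy stand-in scores (log-likelihoods) for two protocol models:
models = {"smtp": lambda s: -10.0, "aim": lambda s: -3.5}
print(classify([52, -1500], models))  # aim
```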

43 For a given experiment, we select one day for use as a training set. [sent-305, score-0.472]

44 From this day’s traces, we randomly select approximately 400 connections of each protocol and use these to build our profile HMMs. [sent-338, score-0.703]

45 Then, for each of the remaining 8 days, we randomly select approximately 400 connections for each protocol and use the model-based classifier to assign class labels to each of them. [sent-339, score-0.668]

46 As a result, we believe the detection rates presented here could be improved for a given network by including the relative frequencies of the protocols (i. [sent-342, score-0.44]

47 With the exception of the connections for FTP and SSH, the Viterbi classifier correctly identifies the protocol more than 73% of the time. [sent-353, score-0.668]

48 As such, we also report the true detection (TD) and false detection (FD) rates when we group protocols into the following equivalence classes: {[AIM], [HTTP, HTTPS], [SMTP − in, SMTP − out], [FTP], [SSH, Telnet]} where the latter class represents the grouping of the interactive protocols. [sent-358, score-0.521]

49 We find the Viterbi classifier to be slightly more accurate than the Maximum Likelihood classifier in almost every case, but the protocol whose recognition rates are most improved with the Viterbi method is SSH. [sent-359, score-0.464]

50 We choose 400 because it is the largest size for which we can select the same number of instances of each protocol on every day in the data set. [sent-361, score-0.472]

51 We therefore split the packets into two sets: those sent from the client to the server, and those sent from server to client. [sent-403, score-0.439]

52 In this section, we evaluate the suitability of our profile HMMs for a slightly different task: identifying the TCP connections that belong to a given protocol of interest. [sent-424, score-0.668]

53 Such a detector could be used, for example, by a network administrator to detect policy violations by a user running a prohibited application (such as instant messenger) or remotely accessing a rogue SMTP server over an encrypted connection. [sent-426, score-0.449]

54 We have the system flag a detection when a sequence is classified as belonging to the protocol of interest. [sent-428, score-0.502]

55 While we believe this cost to be warranted when we are interested in determining which protocol generated what connections in the network, at other times one may simply be interested in determining whether or not connections belong to a target protocol. [sent-433, score-0.918]

56 As in the previous sections, we build a profile HMM λP for the target protocol P. [sent-437, score-0.453]

57 Intuitively, this model is intended to represent the packets we expect to see in background traffic, so we estimate its parameters using connections from all protocols in the study. [sent-440, score-0.822]

58 In our simplest (and most efficient) protocol detector, a test connection whose log odds score falls above the threshold is immediately flagged as an instance of the given protocol. [sent-453, score-0.476]
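The simplest detector described above is a log-odds test: score a connection by log P(seq | protocol model) minus log P(seq | unigram noise model), and flag it when the score exceeds a threshold. The scoring callables below are placeholders for the paper's HMM likelihood computations:

```python
def log_odds(seq, log_p_target, log_p_noise):
    """Log-odds score of the sequence under the target model vs. the noise model."""
    return log_p_target(seq) - log_p_noise(seq)

def detect(seq, log_p_target, log_p_noise, threshold):
    """Flag the connection iff the log odds exceed the threshold."""
    return log_odds(seq, log_p_target, log_p_noise) > threshold

# Toy scores: the target model likes this sequence more than the noise model.
print(detect([52, -1500], lambda s: -5.0, lambda s: -9.0, threshold=2.0))  # True
```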

59 The goal, of course, is to build detectors which simultaneously achieve high detection rates for their target protocols and near-zero detection rates for the other, non-target protocols. [sent-454, score-0.635]

60 To empirically evaluate the extent to which our protocol detectors are able to do so, we run each detector on a number of instances of each protocol. [sent-455, score-0.611]

61 We then extract 400 connections from the training set for each of the protocols we want to detect, and use these connections to build one profile HMM for each protocol and one unigram HMM for the noise model. [sent-458, score-1.227]

62 Similarly, we randomly select 400 connections from the holdout set for each protocol, and use these to determine thresholds for a range of detection rates between 1% and 99% for each protocol detector. [sent-459, score-0.83]

63 Finally, we randomly select 400 connections of each protocol from the test set, and run each protocol detector on all 3200 test connections. [sent-460, score-1.213]

64 Similarly, detectors for HTTPS, SMTP and FTP built on this basic design are also able to distinguish their respective protocols from most of the other protocols in our test set, with reasonable accuracy. [sent-467, score-0.614]

65 This is not surprising, since FTP and SMTP share a similar “numeric code and status message” format and generate sequences of packets that look very similar, even when examining packet payloads. [sent-470, score-0.553]

66 Nevertheless, we are able to improve our initial false positive rates for these two protocols using a technique based on iteratively refining the set of protocols that we suspect a connection might belong to. [sent-472, score-0.628]

67 To build an improved protocol detector, we construct profile HMMs not only for the target protocol, but also for any other similarly-behaving protocols with which it is frequently confused. [sent-473, score-0.727]

68 We then use the Viterbi classifier with the models for the frequently-confused protocols to identify the other (non-target) protocol most likely to have generated the sequence of packets in the connection. [sent-478, score-0.99]

69 Only if the model for the target protocol produces a higher Viterbi path probability than this protocol’s model, do we flag the connection as an instance of the target protocol. [sent-479, score-0.452]
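The refinement step above can be sketched as a runoff among models: the connection is flagged only when the target model's (Viterbi path) score beats the best score among the frequently-confused protocols' models. The scoring callables are assumed stand-ins for the paper's HMM scores:

```python
def refined_flag(seq, target_score, confused_scores):
    """Flag the connection as the target protocol only if the target model
    outscores every frequently-confused protocol's model on this sequence.
    confused_scores maps protocol -> scoring callable."""
    best_other = max(f(seq) for f in confused_scores.values())
    return target_score(seq) > best_other

# Toy scores: the target narrowly beats FTP and SMTP on this sequence.
print(refined_flag([52, -1500], lambda s: -3.0,
                   {"ftp": lambda s: -4.0, "smtp": lambda s: -6.0}))  # True
```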

70 Tracking the Number of Live Connections in Encrypted Tunnels In §3, we showed that it is often possible to determine the application protocol used in aggregate traffic without demultiplexing or reassembling the TCP connections in the aggregate. [sent-488, score-0.74]

71 We now turn our attention to the case where we cannot demultiplex the flows or determine which packets in the aggregate belong to which flows, as is the case when aggregate traffic is encrypted at the network layer using IPsec Encapsulating Security Payload (Kent and Atkinson, 1998) or SSH tunneling. [sent-490, score-0.639]

72 Specifically, we develop a model-based technique which enables us to accurately track the number of connections in a network-layer tunnel which carries traffic for only a single application protocol. [sent-491, score-0.433]

73 As an example of this scenario, consider a proxy server which listens for clients’ requests on one edge network and forwards them through an encrypted tunnel across the Internet to a set of servers on another edge network. [sent-492, score-0.455]

74 Despite our inability to demultiplex the flows inside such a tunnel, the technique developed in §3 still enables us to correctly identify the application protocol much of the time. [sent-493, score-0.45]

75 We now go on to show how we can, given the application protocol, derive an estimate for the number of connections in the tunnel at each point in time. [sent-494, score-0.433]

76 Assumption 3 For each packet type m, each connection in the tunnel generates packets of type m according to a homogeneous Poisson process with constant rate γm , which is determined by the application protocol in use in the connection. [sent-501, score-1.164]

77 Implications It follows from Assumption 1 and Assumption 2 that, in each timeslice, the number of connections in the tunnel will have a Gaussian distribution with mean equal to the number of connections in the tunnel during the previous timeslice. [sent-502, score-0.866]

78 Accordingly, the set of packet rates {γm} provides a sufficiently descriptive model for the given application protocol (in this scenario). [sent-504, score-0.695]

79 We use these observations in the following section to build models that enable us to extract information about the number of tunneled TCP connections from the observed sequence of packet arrivals. [sent-505, score-0.516]

80 1 A Model for Multi-Flow Tunnels To track the number of connections in a multi-flow tunnel, we build a statistical model which relates the stochastic process describing the number of live connections to the stochastic process of packet arrivals. [sent-507, score-0.808]

81 Here, the hidden state transition process describes the changing number of connections Nt in the tunnel, and the symbol output process describes the arrival of packets on the link. [sent-509, score-0.57]

82 States in the HMM therefore correspond to connection counts, and the event that the HMM is in state i at time t corresponds to the event that we see packets from i distinct connections during time slice t. [sent-510, score-0.582]

83 These parameters are: first, the standard deviation σ of the number of live connections in each epoch, and second, the set of base packet rates {γm : packet types m}. [sent-513, score-0.8]

84 Under Assumption 2, the average number of live connections in a time slice follows a Normal distribution with mean equal to the average number of live connections in the previous interval and standard deviation σ. [sent-514, score-0.584]

85 Due to Assumption 3, that each live connection generates packets independently according to a Poisson process, we expect the total packet arrival rate to increase linearly with the number of live connections. [sent-521, score-0.669]

86 Then, when there are j connections in the tunnel, the number of type-m packet arrivals in an interval of length s will follow a Poisson distribution with parameter equal to jγm s. [sent-522, score-0.481]

87 Therefore, the probability of the joint event that we observe ntm packets of each type m during an interval of length s, when there are j connections in the tunnel, is given by b̂j(nt) = ∏(m=1..M) e^(−jγm·s) (jγm·s)^(ntm) / ntm!. [sent-523, score-0.592]
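Putting Assumption 2 (Gaussian-shaped changes in the connection count) and the Poisson emission probability above together, the tunnel-tracking HMM can be sketched as a Viterbi pass over candidate connection counts. The state cap, the unnormalized transition weights, and the parameter values below are simplifications for the sketch, not the paper's implementation:

```python
import math

def track_connections(obs, gammas, s, sigma, max_n=10):
    """Most likely sequence of live-connection counts, given per-slice
    packet-count observations obs[t] = [n_t1, ..., n_tM]."""
    def trans(i, j):
        # Gaussian-shaped weight on the change in connection count
        # (Assumption 2); left unnormalized for simplicity.
        return math.exp(-((j - i) ** 2) / (2.0 * sigma ** 2))

    def emit(j, counts):
        # Poisson emission probability b_j(n_t) from the formula above
        # (Assumption 3), with mean j * gamma_m * s per packet type.
        p = 1.0
        for n_m, g in zip(counts, gammas):
            lam = max(j * g * s, 1e-9)
            p *= math.exp(-lam) * lam ** n_m / math.factorial(n_m)
        return p

    states = range(1, max_n + 1)
    score = {j: emit(j, obs[0]) for j in states}
    backptrs = []
    for counts in obs[1:]:
        prev = score
        score, ptr = {}, {}
        for j in states:
            best_i = max(prev, key=lambda i: prev[i] * trans(i, j))
            score[j] = prev[best_i] * trans(best_i, j) * emit(j, counts)
            ptr[j] = best_i
        backptrs.append(ptr)
    # Trace back the most likely path of connection counts.
    path = [max(score, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Four type-0 packets per 1-second slice with gamma = 2 pkts/sec per
# connection: two live connections fit best in every slice.
print(track_connections([[4], [4], [4]], [2.0], 1.0, 0.5))  # [2, 2, 2]
```

For long traces a real implementation would work in log space to avoid underflow; the sketch multiplies raw probabilities for readability.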

88 For each s-length interval t, we measure (1) the number of connections Nt in the tunnel during the interval, and (2) the vector of packet counts nt = (nt1, nt2, . . . , ntM). [sent-528, score-0.509]

89 That is, γm gives us the rate at which the number of packets observed increases with the number of connections in the tunnel. [sent-533, score-0.548]

90 We estimate σ as the sample conditional standard deviation of the number of connections in the tunnel Nt during an interval t, given the number Nt−1 in the tunnel in the preceding interval. [sent-534, score-0.616]
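The parameter-estimation step can be sketched from a labeled training trace: γm is the per-connection, per-second arrival rate of type-m packets, and σ is the standard deviation of the step Nt − Nt−1 (whose conditional mean is Nt−1 under Assumption 2). The simple method-of-moments estimators below are assumptions for the sketch, not necessarily the paper's exact ones:

```python
import math

def estimate_params(N, counts, s):
    """Estimate per-type rates {gamma_m} and step deviation sigma.
    N[t] is the connection count in slice t; counts[t][m] is the number of
    type-m packets observed in slice t; s is the slice length in seconds."""
    M = len(counts[0])
    # gamma_m: total type-m packets divided by total connection-seconds.
    gammas = [sum(c[m] for c in counts) / (s * sum(N)) for m in range(M)]
    # sigma: std. dev. of N_t about its conditional mean N_{t-1}.
    steps = [N[t] - N[t - 1] for t in range(1, len(N))]
    sigma = math.sqrt(sum(d * d for d in steps) / len(steps))
    return gammas, sigma

# 2, 2, then 4 connections; one packet type arriving at 2 pkts/sec each:
print(estimate_params([2, 2, 4], [[4], [4], [8]], 1.0))
```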

91 The models for AIM and HTTPS are able to track the true number of connections in their tunnels especially well: on average, their predictions differ from the true number of connections by only 22% and 19%, respectively. [sent-564, score-0.569]

92 Between 40 and 65 ticks, the HTTP tunnel produces a sequence of packets where the relative frequencies of the different packet types are out of proportion to those on the training day. [sent-568, score-0.712]

93 Zhang and Paxson (2000a) present one of the earliest studies of techniques for network protocol recognition without using port numbers, based on matching patterns in the packet payloads. [sent-574, score-0.717]

94 (2003), where a decision tree classifier that used n-grams of packets was proposed for distinguishing among flows from HTTP, SMTP, FTP, SSH and Telnet servers based on average packet size, average inter-arrival time, and TCP flags. [sent-578, score-0.554]

95 (2006) build a classifier based on k-means clustering of the sizes of the first five packets in each connection to identify application protocols “on the fly.” [sent-581, score-0.641]

96 (2001) show that the interarrival times of packets in SSH (version 1) connections can be used to infer information about the user’s keystrokes and thereby reduce the search space for cracking login passwords. [sent-616, score-0.548]

97 We also show that, when it is possible to demultiplex the flows, more in-depth analysis of the packets in each flow can lead to even more robust and accurate classification even when a mix of several protocols are included. [sent-622, score-0.627]

98 Finally, and perhaps most surprisingly, we show that encrypted tunnels which carry only a single application protocol leak sufficient information about the flows in the tunnel to allow us to accurately track their number. [sent-623, score-0.799]

99 We also intend to extend our techniques to more general types of encrypted tunnels, with the ultimate goal of being able to track the number of connections of each protocol inside a full IPsec VPN (Kent and Atkinson, 1998). [sent-628, score-0.797]

100 Table 4 shows the full confusion matrix for the Viterbi classifier when analyzing TCP connections as sequences of packet sizes. [sent-634, score-0.505]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('protocol', 0.418), ('packets', 0.298), ('protocols', 0.274), ('traf', 0.257), ('connections', 0.25), ('ssh', 0.234), ('packet', 0.231), ('tcp', 0.204), ('tunnel', 0.183), ('ftp', 0.176), ('https', 0.152), ('encrypted', 0.129), ('detector', 0.127), ('viterbi', 0.11), ('telnet', 0.107), ('hmm', 0.101), ('security', 0.096), ('smtp', 0.095), ('ows', 0.085), ('detection', 0.084), ('server', 0.082), ('nt', 0.076), ('asson', 0.076), ('ehaviors', 0.076), ('nferring', 0.076), ('onrose', 0.076), ('rotocol', 0.076), ('aggregate', 0.072), ('tunnels', 0.069), ('detectors', 0.066), ('pplication', 0.064), ('traces', 0.062), ('td', 0.062), ('client', 0.059), ('timing', 0.057), ('day', 0.054), ('pro', 0.052), ('administrator', 0.051), ('karagiannis', 0.051), ('paxson', 0.051), ('aggregates', 0.048), ('epoch', 0.046), ('rates', 0.046), ('ntm', 0.044), ('fd', 0.044), ('percentile', 0.043), ('live', 0.042), ('aim', 0.041), ('hmms', 0.04), ('er', 0.04), ('le', 0.038), ('gmu', 0.038), ('hosts', 0.038), ('codebook', 0.038), ('network', 0.036), ('build', 0.035), ('classi', 0.034), ('connection', 0.034), ('interactive', 0.033), ('port', 0.032), ('demultiplex', 0.032), ('holdout', 0.032), ('ssl', 0.032), ('usenix', 0.032), ('internet', 0.029), ('http', 0.029), ('intrusion', 0.029), ('epochs', 0.027), ('wright', 0.027), ('fabian', 0.027), ('encryption', 0.027), ('moore', 0.026), ('faxon', 0.025), ('papagiannaki', 0.025), ('pviterbi', 0.025), ('quantize', 0.025), ('schliep', 0.025), ('servers', 0.025), ('vern', 0.025), ('quantization', 0.025), ('ow', 0.025), ('detect', 0.024), ('detecting', 0.024), ('august', 0.024), ('sequences', 0.024), ('eddy', 0.024), ('krogh', 0.024), ('stepping', 0.024), ('threshold', 0.024), ('mix', 0.023), ('counts', 0.023), ('arrival', 0.022), ('days', 0.022), ('ags', 0.021), ('gerald', 0.021), ('bulk', 0.021), ('monrose', 0.021), ('headers', 0.021), ('bytes', 0.021), ('traveling', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 97 jmlr-2006-97 On Inferring Application Protocol Behaviors in Encrypted Network Traffic (Special Topic on Machine Learning for Computer Security)

Author: Charles V. Wright, Fabian Monrose, Gerald M. Masson

Abstract: Several fundamental security mechanisms for restricting access to network resources rely on the ability of a reference monitor to inspect the contents of traffic as it traverses the network. However, with the increasing popularity of cryptographic protocols, the traditional means of inspecting packet contents to enforce security policies is no longer a viable approach as message contents are concealed by encryption. In this paper, we investigate the extent to which common application protocols can be identified using only the features that remain intact after encryption—namely packet size, timing, and direction. We first present what we believe to be the first exploratory look at protocol identification in encrypted tunnels which carry traffic from many TCP connections simultaneously, using only post-encryption observable features. We then explore the problem of protocol identification in individual encrypted TCP connections, using much less data than in other recent approaches. The results of our evaluation show that our classifiers achieve accuracy greater than 90% for several protocols in aggregate traffic, and, for most protocols, greater than 80% when making fine-grained classifications on single connections. Moreover, perhaps most surprisingly, we show that one can even estimate the number of live connections in certain classes of encrypted tunnels to within, on average, better than 20%. Keywords: traffic classification, hidden Markov models, network security

2 0.24351022 59 jmlr-2006-Machine Learning for Computer Security    (Special Topic on Machine Learning for Computer Security)

Author: Philip K. Chan, Richard P. Lippmann

Abstract: The prevalent use of computers and internet has enhanced the quality of life for many people, but it has also attracted undesired attempts to undermine these systems. This special topic contains several research studies on how machine learning algorithms can help improve the security of computer systems. Keywords: computer security, spam, images with embedded text, malicious executables, network protocols, encrypted traffic

3 0.054993711 69 jmlr-2006-One-Class Novelty Detection for Seizure Analysis from Intracranial EEG

Author: Andrew B. Gardner, Abba M. Krieger, George Vachtsevanos, Brian Litt

Abstract: This paper describes an application of one-class support vector machine (SVM) novelty detection for detecting seizures in humans. Our technique maps intracranial electroencephalogram (EEG) time series into corresponding novelty sequences by classifying short-time, energy-based statistics computed from one-second windows of data. We train a classifier on epochs of interictal (normal) EEG. During ictal (seizure) epochs of EEG, seizure activity induces distributional changes in feature space that increase the empirical outlier fraction. A hypothesis test determines when the parameter change differs significantly from its nominal value, signaling a seizure detection event. Outputs are gated in a “one-shot” manner using persistence to reduce the false alarm rate of the system. The detector was validated using leave-one-out cross-validation (LOO-CV) on a sample of 41 interictal and 29 ictal epochs, and achieved 97.1% sensitivity, a mean detection latency of ©2005 Andrew B. Gardner, Abba M. Krieger, George Vachtsevanos and Brian Litt GARDNER, KRIEGER, VACHTSEVANOS AND LITT -7.58 seconds, and an asymptotic false positive rate (FPR) of 1.56 false positives per hour (Fp/hr). These results are better than those obtained from a novelty detection technique based on Mahalanobis distance outlier detection, and comparable to the performance of a supervised learning technique used in experimental implantable devices (Echauz et al., 2001). The novelty detection paradigm overcomes three significant limitations of competing methods: the need to collect seizure data, precisely mark seizure onset and offset times, and perform patient-specific parameter tuning for detector training. Keywords: seizure detection, novelty detection, one-class SVM, epilepsy, unsupervised learning 1

4 0.044724915 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection

Author: Hichem Sahbi, Donald Geman

Abstract: We introduce a computational design for pattern detection based on a tree-structured network of support vector machines (SVMs). An SVM is associated with each cell in a recursive partitioning of the space of patterns (hypotheses) into increasingly finer subsets. The hierarchy is traversed coarse-to-fine and each chain of positive responses from the root to a leaf constitutes a detection. Our objective is to design and build a network which balances overall error and computation. Initially, SVMs are constructed for each cell with no constraints. This “free network” is then perturbed, cell by cell, into another network, which is “graded” in two ways: first, the number of support vectors of each SVM is reduced (by clustering) in order to adjust to a pre-determined, increasing function of cell depth; second, the decision boundaries are shifted to preserve all positive responses from the original set of training data. The limits on the numbers of clusters (virtual support vectors) result from minimizing the mean computational cost of collecting all detections subject to a bound on the expected number of false positives. When applied to detecting faces in cluttered scenes, the patterns correspond to poses and the free network is already faster and more accurate than applying a single pose-specific SVM many times. The graded network promotes very rapid processing of background regions while maintaining the discriminatory power of the free network. Keywords: statistical learning, hierarchy of classifiers, coarse-to-fine computation, support vector machines, face detection

5 0.041753851 30 jmlr-2006-Evolutionary Function Approximation for Reinforcement Learning

Author: Shimon Whiteson, Peter Stone

Abstract: Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods require a function approximator to represent the value function. However, using function approximators requires manually making crucial representational decisions. This paper investigates evolutionary function approximation, a novel approach to automatically selecting function approximator representations that enable efficient individual learning. This method evolves individuals that are better able to learn. We present a fully implemented instantiation of evolutionary function approximation which combines NEAT, a neuroevolutionary optimization technique, with Q-learning, a popular TD method. The resulting NEAT+Q algorithm automatically discovers effective representations for neural network function approximators. This paper also presents on-line evolutionary computation, which improves the on-line performance of evolutionary computation by borrowing selection mechanisms used in TD methods to choose individual actions and using them in evolutionary computation to select policies for evaluation. We evaluate these contributions with extended empirical studies in two domains: 1) the mountain car task, a standard reinforcement learning benchmark on which neural network function approximators have previously performed poorly and 2) server job scheduling, a large probabilistic domain drawn from the field of autonomic computing. The results demonstrate that evolutionary function approximation can significantly improve the performance of TD methods and on-line evolutionary computation can significantly improve evolutionary methods. This paper also presents additional tests that offer insight into what factors can make neural network function approximation difficult in practice. 
Keywords: reinforcement learning, temporal difference methods, evolutionary computation, neuroevolution, on-line evolutionary computation

6 0.039654057 80 jmlr-2006-Segmental Hidden Markov Models with Random Effects for Waveform Modeling

7 0.037262581 38 jmlr-2006-Incremental Support Vector Learning: Analysis, Implementation and Applications     (Special Topic on Machine Learning and Optimization)

8 0.032343019 7 jmlr-2006-A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events

9 0.031864524 31 jmlr-2006-Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization     (Special Topic on Machine Learning and Optimization)

10 0.027787173 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints     (Special Topic on Machine Learning and Optimization)

11 0.02701886 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

12 0.026398202 12 jmlr-2006-Active Learning with Feedback on Features and Instances

13 0.023597751 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation

14 0.023306957 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples

15 0.022097226 85 jmlr-2006-Statistical Comparisons of Classifiers over Multiple Data Sets

16 0.020044126 91 jmlr-2006-The Interplay of Optimization and Machine Learning Research     (Special Topic on Machine Learning and Optimization)

17 0.019531542 10 jmlr-2006-Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

18 0.019261181 25 jmlr-2006-Distance Patterns in Structural Similarity

19 0.018558076 49 jmlr-2006-Learning Parts-Based Representations of Data

20 0.018527227 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.117), (1, -0.036), (2, -0.058), (3, 0.11), (4, -0.005), (5, -0.015), (6, -0.176), (7, -0.415), (8, 0.064), (9, 0.008), (10, -0.45), (11, -0.164), (12, -0.026), (13, -0.123), (14, -0.055), (15, -0.004), (16, 0.093), (17, -0.063), (18, -0.229), (19, -0.115), (20, 0.035), (21, 0.156), (22, 0.057), (23, -0.032), (24, 0.009), (25, -0.026), (26, 0.024), (27, 0.023), (28, -0.013), (29, -0.028), (30, -0.011), (31, -0.026), (32, 0.008), (33, -0.027), (34, 0.007), (35, -0.051), (36, 0.02), (37, -0.02), (38, 0.058), (39, -0.023), (40, -0.006), (41, 0.066), (42, 0.011), (43, 0.041), (44, -0.047), (45, -0.009), (46, -0.01), (47, -0.013), (48, 0.014), (49, -0.009)]
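The bracketed numbers above are this paper's 50-dimensional LSI topic-weight vector, and the simValue column below plausibly reflects a vector-space similarity between such vectors. As a sketch only (the site does not document its exact metric, and the listed scores need not come from precisely this formula), cosine similarity over dense topic vectors looks like:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against the all-zero vector, whose direction is undefined.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A vector compared with itself scores 1.0, orthogonal vectors score 0.0, and papers with similar topic mixtures score close to 1.0.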

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96504688 97 jmlr-2006- (Special Topic on Machine Learning for Computer Security)

Author: Charles V. Wright, Fabian Monrose, Gerald M. Masson

Abstract: Several fundamental security mechanisms for restricting access to network resources rely on the ability of a reference monitor to inspect the contents of traffic as it traverses the network. However, with the increasing popularity of cryptographic protocols, the traditional means of inspecting packet contents to enforce security policies is no longer a viable approach as message contents are concealed by encryption. In this paper, we investigate the extent to which common application protocols can be identified using only the features that remain intact after encryption—namely packet size, timing, and direction. We first present what we believe to be the first exploratory look at protocol identification in encrypted tunnels which carry traffic from many TCP connections simultaneously, using only post-encryption observable features. We then explore the problem of protocol identification in individual encrypted TCP connections, using much less data than in other recent approaches. The results of our evaluation show that our classifiers achieve accuracy greater than 90% for several protocols in aggregate traffic, and, for most protocols, greater than 80% when making fine-grained classifications on single connections. Moreover, perhaps most surprisingly, we show that one can even estimate the number of live connections in certain classes of encrypted tunnels to within, on average, better than 20%. Keywords: traffic classification, hidden Markov models, network security
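The abstract above relies on hidden Markov models over post-encryption observables (packet size, timing, direction). A generic illustration, not the authors' model: quantize an observable such as packet size into discrete symbols, fit one HMM per protocol, and classify a connection by which HMM assigns its symbol sequence the highest forward-algorithm likelihood. The toy parameters below are invented for illustration.

```python
import math

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of an observation sequence under a discrete HMM,
    computed with the forward algorithm.

    obs   : sequence of symbol indices (e.g. quantized packet sizes)
    start : start[i]    = P(initial state = i)
    trans : trans[i][j] = P(next state = j | current state = i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    log_alpha = [math.log(start[i] * emit[i][obs[0]]) for i in range(n)]
    for sym in obs[1:]:
        log_alpha = [
            math.log(sum(math.exp(log_alpha[i]) * trans[i][j]
                         for i in range(n))) + math.log(emit[j][sym])
            for j in range(n)
        ]
    # Marginalize over the final state.
    return math.log(sum(math.exp(a) for a in log_alpha))
```

Classification would then be `max(models, key=lambda m: forward_loglik(obs, *m.params))` over one model per protocol; for long connections the sums should be kept in log space throughout to avoid underflow.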

2 0.93525153 59 jmlr-2006-Machine Learning for Computer Security    (Special Topic on Machine Learning for Computer Security)

Author: Philip K. Chan, Richard P. Lippmann

Abstract: The prevalent use of computers and the internet has enhanced the quality of life for many people, but it has also attracted undesired attempts to undermine these systems. This special topic contains several research studies on how machine learning algorithms can help improve the security of computer systems. Keywords: computer security, spam, images with embedded text, malicious executables, network protocols, encrypted traffic

3 0.22517265 69 jmlr-2006-One-Class Novelty Detection for Seizure Analysis from Intracranial EEG

Author: Andrew B. Gardner, Abba M. Krieger, George Vachtsevanos, Brian Litt

Abstract: This paper describes an application of one-class support vector machine (SVM) novelty detection for detecting seizures in humans. Our technique maps intracranial electroencephalogram (EEG) time series into corresponding novelty sequences by classifying short-time, energy-based statistics computed from one-second windows of data. We train a classifier on epochs of interictal (normal) EEG. During ictal (seizure) epochs of EEG, seizure activity induces distributional changes in feature space that increase the empirical outlier fraction. A hypothesis test determines when the parameter change differs significantly from its nominal value, signaling a seizure detection event. Outputs are gated in a “one-shot” manner using persistence to reduce the false alarm rate of the system. The detector was validated using leave-one-out cross-validation (LOO-CV) on a sample of 41 interictal and 29 ictal epochs, and achieved 97.1% sensitivity, a mean detection latency of -7.58 seconds, and an asymptotic false positive rate (FPR) of 1.56 false positives per hour (Fp/hr). These results are better than those obtained from a novelty detection technique based on Mahalanobis distance outlier detection, and comparable to the performance of a supervised learning technique used in experimental implantable devices (Echauz et al., 2001). The novelty detection paradigm overcomes three significant limitations of competing methods: the need to collect seizure data, precisely mark seizure onset and offset times, and perform patient-specific parameter tuning for detector training. Keywords: seizure detection, novelty detection, one-class SVM, epilepsy, unsupervised learning

4 0.16430795 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection

Author: Hichem Sahbi, Donald Geman

Abstract: We introduce a computational design for pattern detection based on a tree-structured network of support vector machines (SVMs). An SVM is associated with each cell in a recursive partitioning of the space of patterns (hypotheses) into increasingly finer subsets. The hierarchy is traversed coarse-to-fine and each chain of positive responses from the root to a leaf constitutes a detection. Our objective is to design and build a network which balances overall error and computation. Initially, SVMs are constructed for each cell with no constraints. This “free network” is then perturbed, cell by cell, into another network, which is “graded” in two ways: first, the number of support vectors of each SVM is reduced (by clustering) in order to adjust to a pre-determined, increasing function of cell depth; second, the decision boundaries are shifted to preserve all positive responses from the original set of training data. The limits on the numbers of clusters (virtual support vectors) result from minimizing the mean computational cost of collecting all detections subject to a bound on the expected number of false positives. When applied to detecting faces in cluttered scenes, the patterns correspond to poses and the free network is already faster and more accurate than applying a single pose-specific SVM many times. The graded network promotes very rapid processing of background regions while maintaining the discriminatory power of the free network. Keywords: statistical learning, hierarchy of classifiers, coarse-to-fine computation, support vector machines, face detection

5 0.15378693 30 jmlr-2006-Evolutionary Function Approximation for Reinforcement Learning

Author: Shimon Whiteson, Peter Stone

Abstract: Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods require a function approximator to represent the value function. However, using function approximators requires manually making crucial representational decisions. This paper investigates evolutionary function approximation, a novel approach to automatically selecting function approximator representations that enable efficient individual learning. This method evolves individuals that are better able to learn. We present a fully implemented instantiation of evolutionary function approximation which combines NEAT, a neuroevolutionary optimization technique, with Q-learning, a popular TD method. The resulting NEAT+Q algorithm automatically discovers effective representations for neural network function approximators. This paper also presents on-line evolutionary computation, which improves the on-line performance of evolutionary computation by borrowing selection mechanisms used in TD methods to choose individual actions and using them in evolutionary computation to select policies for evaluation. We evaluate these contributions with extended empirical studies in two domains: 1) the mountain car task, a standard reinforcement learning benchmark on which neural network function approximators have previously performed poorly and 2) server job scheduling, a large probabilistic domain drawn from the field of autonomic computing. The results demonstrate that evolutionary function approximation can significantly improve the performance of TD methods and on-line evolutionary computation can significantly improve evolutionary methods. This paper also presents additional tests that offer insight into what factors can make neural network function approximation difficult in practice. 
Keywords: reinforcement learning, temporal difference methods, evolutionary computation, neuroevolution, on-line evolutionary computation

6 0.14153892 80 jmlr-2006-Segmental Hidden Markov Models with Random Effects for Waveform Modeling

7 0.11271366 31 jmlr-2006-Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization     (Special Topic on Machine Learning and Optimization)

8 0.11087344 49 jmlr-2006-Learning Parts-Based Representations of Data

9 0.10574377 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation

10 0.10247242 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints     (Special Topic on Machine Learning and Optimization)

11 0.10119373 38 jmlr-2006-Incremental Support Vector Learning: Analysis, Implementation and Applications     (Special Topic on Machine Learning and Optimization)

12 0.09723074 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity

13 0.096720941 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation

14 0.094795063 71 jmlr-2006-Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis

15 0.094416589 89 jmlr-2006-Structured Prediction, Dual Extragradient and Bregman Projections     (Special Topic on Machine Learning and Optimization)

16 0.090441473 7 jmlr-2006-A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events

17 0.085640192 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples

18 0.084021389 10 jmlr-2006-Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

19 0.082317516 12 jmlr-2006-Active Learning with Feedback on Features and Instances

20 0.08008039 78 jmlr-2006-Rearrangement Clustering: Pitfalls, Remedies, and Applications


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.012), (35, 0.011), (36, 0.057), (45, 0.03), (50, 0.022), (61, 0.01), (63, 0.035), (68, 0.014), (70, 0.048), (76, 0.014), (78, 0.016), (79, 0.015), (81, 0.029), (84, 0.012), (90, 0.012), (91, 0.025), (95, 0.489), (96, 0.066)]
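The LDA entry above stores only the non-zero (topicId, topicWeight) pairs. One plausible way to compare two such sparse topic distributions is cosine similarity over dict representations; this is again a sketch, since the simValues listed below need not come from exactly this metric (note the same-paper score is not 1.0).

```python
import math

def sparse_cosine(u, v):
    """Cosine similarity between two sparse topic-weight mappings,
    each a dict of {topic_id: weight} holding only non-zero entries."""
    # Only topics present in both dicts contribute to the dot product.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Iterating over the smaller dict keeps the comparison cheap even when the topic vocabulary is large, which is the point of the sparse representation.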

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79230934 97 jmlr-2006- (Special Topic on Machine Learning for Computer Security)

Author: Charles V. Wright, Fabian Monrose, Gerald M. Masson

Abstract: Several fundamental security mechanisms for restricting access to network resources rely on the ability of a reference monitor to inspect the contents of traffic as it traverses the network. However, with the increasing popularity of cryptographic protocols, the traditional means of inspecting packet contents to enforce security policies is no longer a viable approach as message contents are concealed by encryption. In this paper, we investigate the extent to which common application protocols can be identified using only the features that remain intact after encryption—namely packet size, timing, and direction. We first present what we believe to be the first exploratory look at protocol identification in encrypted tunnels which carry traffic from many TCP connections simultaneously, using only post-encryption observable features. We then explore the problem of protocol identification in individual encrypted TCP connections, using much less data than in other recent approaches. The results of our evaluation show that our classifiers achieve accuracy greater than 90% for several protocols in aggregate traffic, and, for most protocols, greater than 80% when making fine-grained classifications on single connections. Moreover, perhaps most surprisingly, we show that one can even estimate the number of live connections in certain classes of encrypted tunnels to within, on average, better than 20%. Keywords: traffic classification, hidden Markov models, network security

2 0.26723665 59 jmlr-2006-Machine Learning for Computer Security    (Special Topic on Machine Learning for Computer Security)

Author: Philip K. Chan, Richard P. Lippmann

Abstract: The prevalent use of computers and the internet has enhanced the quality of life for many people, but it has also attracted undesired attempts to undermine these systems. This special topic contains several research studies on how machine learning algorithms can help improve the security of computer systems. Keywords: computer security, spam, images with embedded text, malicious executables, network protocols, encrypted traffic

3 0.20966603 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor

Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs

4 0.20226349 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

Author: Francis R. Bach, Michael I. Jordan

Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled data sets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram. Keywords: spectral clustering, blind source separation, computational auditory scene analysis

5 0.2021189 44 jmlr-2006-Large Scale Transductive SVMs

Author: Ronan Collobert, Fabian Sinz, Jason Weston, Léon Bottou

Abstract: We show how the concave-convex procedure can be applied to transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at http://www.kyb.tuebingen.mpg.de/bs/people/fabee/transduction. html. Keywords: transduction, transductive SVMs, semi-supervised learning, CCCP

6 0.20098916 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation

7 0.19733307 53 jmlr-2006-Learning a Hidden Hypergraph

8 0.19721857 51 jmlr-2006-Learning Sparse Representations by Non-Negative Matrix Factorization and Sequential Cone Programming     (Special Topic on Machine Learning and Optimization)

9 0.19602397 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error

10 0.19520208 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

11 0.19517246 61 jmlr-2006-Maximum-Gain Working Set Selection for SVMs     (Special Topic on Machine Learning and Optimization)

12 0.19289081 14 jmlr-2006-An Efficient Implementation of an Active Set Method for SVMs    (Special Topic on Machine Learning and Optimization)

13 0.19272171 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms

14 0.1925893 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples

15 0.19119391 72 jmlr-2006-Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems     (Special Topic on Machine Learning and Optimization)

16 0.19114377 88 jmlr-2006-Streamwise Feature Selection

17 0.19113022 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting

18 0.19093914 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis

19 0.19068733 89 jmlr-2006-Structured Prediction, Dual Extragradient and Bregman Projections     (Special Topic on Machine Learning and Optimization)

20 0.19002733 48 jmlr-2006-Learning Minimum Volume Sets