jmlr jmlr2006 jmlr2006-80 knowledge-graph by maker-knowledge-mining

80 jmlr-2006-Segmental Hidden Markov Models with Random Effects for Waveform Modeling


Source: pdf

Author: Seyoung Kim, Padhraic Smyth

Abstract: This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. A segmental hidden Markov model (HMM) is used to characterize waveform shape and shape variation is captured by adding random effects to the segmental model. The resulting probabilistic framework provides a basis for learning of waveform models from data as well as parsing and recognition of new waveforms. Expectation-maximization (EM) algorithms are derived and investigated for fitting such models to data. In particular, the “expectation conditional maximization either” (ECME) algorithm is shown to provide significantly faster convergence than a standard EM procedure. Experimental results on two real-world data sets demonstrate that the proposed approach leads to improved accuracy in classification and segmentation when compared to alternatives such as Euclidean distance matching, dynamic time warping, and segmental HMMs without random effects. Keywords: waveform recognition, random effects, segmental hidden Markov models, EM algorithm, ECME

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Department of Computer Science, University of California, Irvine, Irvine, CA 92697-3425, USA. Editor: Sam Roweis. Abstract: This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. [sent-5, score-0.391]

2 A segmental hidden Markov model (HMM) is used to characterize waveform shape and shape variation is captured by adding random effects to the segmental model. [sent-6, score-1.936]

3 The resulting probabilistic framework provides a basis for learning of waveform models from data as well as parsing and recognition of new waveforms. [sent-7, score-0.404]

4 Experimental results on two real-world data sets demonstrate that the proposed approach leads to improved accuracy in classification and segmentation when compared to alternatives such as Euclidean distance matching, dynamic time warping, and segmental HMMs without random effects. [sent-10, score-0.682]

5 Keywords: waveform recognition, random effects, segmental hidden Markov models, EM algorithm, ECME. [sent-11, score-1.019]

6 Waveform analysis has also attracted attention in information retrieval and data mining, with a focus on algorithms that can take a waveform as an input query and search a large database to find similar waveforms that match the query waveform. [sent-13, score-0.946]

7 While the human visual system can easily recognize the characteristic signature of a particular waveform shape (a heartbeat waveform, for example), the problem can be quite difficult for automated methods. [sent-18, score-0.811]

8 For example, Figure 1 shows a set of time-series waveforms collected during turbulent fluid-flow experiments, where the shape of each waveform is determined by the nature of the underlying interaction. [sent-19, score-0.665]

9 Figure 1(a) shows an example waveform from a particular type of interaction. [sent-22, score-0.364]

10 Although all of these waveforms belong to the same interaction class, there is significant variability in shape among those waveforms. [sent-24, score-0.321]

11 An example waveform from a different class is shown in Figure 1(c), and a set of such waveforms are shown in Figure 1(d). [sent-26, score-0.582]

12 In this paper we address the problem of detecting and classifying general classes of waveforms based on their shape and propose a new statistical model that directly addresses within-class shape variability. [sent-28, score-0.37]

13 That is, all of the waveform measurements are assumed to be available at the time of analysis rather than arriving sequentially in an online fashion. [sent-35, score-0.413]

14 For waveform modeling, a useful extension is the so-called segmental HMM, originally introduced in speech recognition (Russell, 1993) and more recently used for more general waveform modeling (Ge and Smyth, 2000). [sent-45, score-1.428]

15 The segmental model allows for the observed data within each segment (a sequence of states with the same value) to follow a general parametric regression form, such as a linear function of time with additive noise. [sent-46, score-0.819]
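
To make the segment model concrete, the following minimal sketch (not taken from the paper; all parameter values are illustrative) draws one segment as a linear function of time with additive Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_segment(intercept, slope, noise_std, length, rng):
    """Draw one segment: y_t = intercept + slope * t + Gaussian noise."""
    t = np.arange(length)
    return intercept + slope * t + rng.normal(0.0, noise_std, size=length)

# Illustrative values only (not taken from the paper).
segment = sample_linear_segment(intercept=1.0, slope=-0.05, noise_std=0.1,
                                length=40, rng=rng)
print(segment[:5])
```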

16 This allows us to model the shape of the waveform directly, in this case as a sequence of piecewise linear components—Figure 2(a) shows a piecewise linear representation of the waveform in Figure 1(a). [sent-47, score-0.858]

17 Thus, the only source of variability in an observed waveform arises from variation in the lengths of the segments and observation noise added to the functional form in each segment. [sent-49, score-0.405]

18 The limitation of this can clearly be seen in Figure 2(b), where a segmental HMM has been trained on the data in Figure 1(b) and then used to generate maximum-likelihood estimates of segment boundaries, slopes, and intercepts for the new waveform in Figure 2(b). [sent-50, score-1.19]

19 For example, in the first segment the intercept is clearly too low on the y-axis, in the second segment the slope is too small, and so forth. [sent-53, score-0.541]

20 By using the same fixed parameters for all waveforms, the model cannot fully account for variability in waveform shapes. [sent-54, score-0.482]

21 To address this limitation, in this paper we combine segmental HMMs with random effects models. [sent-57, score-0.794]

22 By extending the segmental HMM to include random effects, we can allow the slopes and intercepts within each segment of each waveform to vary according to a prior distribution. [sent-59, score-1.255]

23 As illustrated in Figure 3, in the hierarchical setup of our model each waveform (at the bottom level) has its own slope and intercept parameters (as shown at the middle level) that come from a shape template (at the top level) modeled as a population prior. [sent-60, score-0.75]

24 For example, we can in principle learn that the slopes across multiple waveforms for the first segment in Figure 1(b) tend to have a characteristic mean slope and standard deviation. [sent-63, score-0.505]

25 Figure 2(c) shows a visual example of how a random effects model (constructed from the training data in Figure 1(b)) is used to generate maximum-likelihood estimates of segment boundaries and segment slopes and intercepts for the waveform in Figure 1(a). [sent-65, score-0.961]

26 Earlier work (Kim et al., 2004) described preliminary results using random effects segmental HMMs for waveform parsing and recognition. [sent-67, score-1.177]

27 This is a result of the large amount of missing information present (due to the random effects component of the model), compared to a standard segmental HMM. [sent-69, score-0.794]

28 In this paper we use the “expectation conditional maximization either” (ECME) algorithm (Liu and Rubin, 1994) for parameter estimation of random effects segmental HMMs. [sent-70, score-0.794]

29 This dramatically speeds up convergence relative to the EM algorithm, making the model much more practical to use for real-world waveform recognition problems. [sent-71, score-0.431]

30 We begin our discussion by reviewing related work on segmental HMMs and random effects models in Section 2. [sent-72, score-0.794]

31 In Section 5, we evaluate our model on two applications involving bubble-probe interaction data and ECG data, and compare random effects segmental HMMs to other waveform-matching algorithms such as Euclidean distance matching, dynamic time warping, and segmental HMMs without random effects. [sent-76, score-1.477]

32 Related Work and Contributions A general approach to waveform recognition is to extract characteristic features from the waveform in the time-domain or the frequency-domain, and then perform classification in the resulting feature-vector space. [sent-79, score-0.749]

33 Using classifiers in this manner requires training data from both positive and negative classes as well as the extraction of reliable discriminative features from the raw waveform data. [sent-81, score-0.364]

34 In the approach described in this paper we avoid these requirements by learning models from the positive class only and by modeling the waveform directly in the time-domain without any need for feature extraction. [sent-82, score-0.391]

35 Other techniques have been pursued in the area of waveform query-matching for information retrieval involving time-series data. [sent-83, score-0.364]

36 This allows us (for example) to learn models from data, to handle within-class waveform variability, and to generate maximum-likelihood segmentations of waveforms. [sent-89, score-0.364]

37 As mentioned in Section 1, standard discrete-time finite-state HMMs are not ideal for modeling waveform shapes since the generative model implicitly assumes a geometric distribution on segment lengths and a piecewise constant shape model. [sent-90, score-0.724]

38 (1994) and Russell (1993) extended these models to segmental HMMs by modeling dependencies between observations from the same state with a parametric trajectory model that changes over time. [sent-94, score-0.713]

39 (1996) reviewed variations of segmental HMMs from a speech recognition perspective. [sent-96, score-0.673]

40 Ge and Smyth (2000) introduced the idea of using segmental HMMs for general waveform recognition problems. [sent-99, score-1.011]

41 The idea of using random effects with segmental HMMs to model parameter variability across waveforms is implicit in the speech recognition work of both Gales and Young (1993) and, later, Holmes and Russell (1999). [sent-100, score-1.128]

42 This work can be viewed as a precursor to the more general random effects segmental HMM framework we present in this paper. [sent-101, score-0.794]

43 In earlier work (Kim et al., 2004), we noted that Holmes and Russell’s model could be formalized within a random effects framework, and derived a more general EM framework for such models, taking advantage of ideas developed separately in speech recognition and in statistics. [sent-105, score-0.243]

44 In the statistical literature there is a significant body of work on modeling a hierarchical data-generating process with a random effects model and estimating the parameters of this model (Searle et al.). [sent-106, score-0.275]

45 There appears to be no work in the statistical literature on applying random effects to segmental HMMs. [sent-113, score-0.794]

46 In this context, the primary contribution of this paper is a general framework for random-effects segmental hidden Markov models. [sent-114, score-0.626]

47 We demonstrate how such models can be used for waveform modeling, recognition, and segmentation, with experimental comparisons of the random effects approach with alternative methods such as dynamic time warping, using two real-world data sets. [sent-115, score-0.532]

48 We extend earlier approaches for learning the parameters of random effects segmental HMMs by deriving a provably correct EM algorithm with monotonic convergence. [sent-116, score-0.818]

49 We further extend the standard EM algorithm to develop an ECME algorithm for fitting random effects segmental HMMs. [sent-118, score-0.794]

50 We also show that this inference algorithm can be applied to full covariance models rather than assuming (as in Holmes and Russell, 1999) that the intercept and slope in the segment distribution are conditionally independent. [sent-122, score-0.415]

51 Segmental HMMs. A segmental HMM with M states is described by an M × M transition matrix, a probability distribution over duration for each state, and a segment model for each state. [sent-125, score-0.86]

52 In waveform modeling, we typically constrain the transition matrix to allow only left-to-right transitions and do not allow self-transitions. [sent-131, score-0.364]

53 Once the process enters state k, a duration d is drawn, and state k produces a segment of observations of length d from the segment distribution model. [sent-141, score-0.435]
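
The generative process just described can be illustrated in code. The snippet below is a simplified sketch, assuming a three-state left-to-right transition matrix, a Poisson duration distribution, and linear segment models; none of the parameter values or distributional choices are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 3-state left-to-right segmental HMM (values are made up).
M = 3
A = np.array([[0.0, 1.0, 0.0],   # left-to-right transitions, no self-transitions
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])  # last row kept absorbing so the matrix stays stochastic
mean_duration = np.array([30, 20, 40])   # one duration model per state (Poisson mean here)
beta = np.array([[1.0, -0.05],           # per-state (intercept, slope)
                 [0.2,  0.08],
                 [2.0, -0.02]])
sigma = 0.1                              # observation noise standard deviation

def sample_waveform(n_segments=3):
    """Sample one waveform: enter a state, draw a duration, emit a linear segment."""
    pieces, state = [], 0
    for seg in range(n_segments):
        d = max(1, rng.poisson(mean_duration[state]))    # segment duration
        t = np.arange(d)
        intercept, slope = beta[state]
        pieces.append(intercept + slope * t + rng.normal(0.0, sigma, size=d))
        if seg < n_segments - 1:
            state = rng.choice(M, p=A[state])            # move left-to-right
    return np.concatenate(pieces)

waveform = sample_waveform()
print(waveform.shape)
```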

54 In speech recognition, using the mid-point of a segment as a parameter in the model instead of the intercept has been shown to lead to better recognition performance (Holmes and Russell, 1999). [sent-144, score-0.412]

55 Nonetheless, parametrization of the model via the intercept worked well in our experiments, and for this reason we use the intercept in the models discussed in this paper. [sent-145, score-0.278]

56 Inference algorithms for segmental HMMs provide a natural way to evaluate the performance of the model on test data. [sent-158, score-0.654]

57 The F-B algorithm scores a previously unseen waveform y by calculating the likelihood p(y|θ) = ∑_s p(y, s|θ) = ∑_k α_T(k) (Equation 4), where s represents a sequence of unknown state labels for observations y. [sent-159, score-0.435]

58 The Viterbi algorithm can provide a segmentation of a waveform by computing the most likely state sequence. [sent-160, score-0.423]

59 The addition of duration distributions in segmental HMMs increases the time complexity of both the F-B and Viterbi algorithms from O(M²T) for standard HMMs to O(M²T²), where T is the length of the waveform. [sent-163, score-1.031]
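
The following sketch of the forward recursion with explicit durations shows where the O(M²T²) cost comes from, assuming the per-segment log-likelihoods have been precomputed; the function and array names are chosen here for illustration and are not the paper's notation.

```python
import numpy as np

def segmental_forward(log_seg_lik, log_A, log_dur, log_init):
    """
    Forward recursion for a segmental HMM (a sketch, not the paper's code).

    log_seg_lik[k, s, t] : log p(y_{s..t} | state k), precomputed for all s <= t
    log_A[j, k]          : log transition probability j -> k
    log_dur[k, d-1]      : log p(duration = d | state k)
    log_init[k]          : log probability of starting in state k
    Returns log_alpha[t, k] = log p(y_{0..t}, a segment ends at t in state k).
    """
    M, T, _ = log_seg_lik.shape
    log_alpha = np.full((T, M), -np.inf)
    for t in range(T):                      # O(T) segment end points
        for k in range(M):                  # O(M) current states
            terms = []
            for d in range(1, t + 2):       # O(T) durations -> O(M^2 T^2) overall
                s = t - d + 1
                emit = log_dur[k, d - 1] + log_seg_lik[k, s, t]
                if s == 0:
                    terms.append(log_init[k] + emit)
                else:
                    prev = log_alpha[s - 1] + log_A[:, k]   # O(M) predecessor states
                    terms.append(np.logaddexp.reduce(prev) + emit)
            log_alpha[t, k] = np.logaddexp.reduce(terms)
    return log_alpha

# The waveform log-likelihood is the log-sum-exp over states of log_alpha[T-1].
```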

60 Segmental HMMs with Random Effects. A random effects model is a general statistical framework for settings in which the data generation process has a hierarchical structure, coupling a population-level model with individual-level variation. [sent-167, score-0.224]

61 By combining segmental HMMs and random effects models we can take advantage of the strength of each in waveform modeling. [sent-172, score-1.158]

62 Random effects models add one level of hierarchy to the probabilistic structure of segmental HMMs, defining a population distribution over the possible shapes of waveform segments. [sent-173, score-1.217]

63 Instead of requiring all waveforms to be modeled with a single set of parameters, individual waveforms are allowed to have their own parameters but are coupled by a common population prior across all waveforms. [sent-174, score-0.484]

64 The Model. Beginning with the segmental HMMs described in Section 3, we add random effects via a new variable u_i to the segment distribution part of the model as follows. [sent-176, score-1.151]

65 Consider the rth segment y_i^r of length d from the ith individual waveform y_i, generated by state k. [sent-177, score-0.694]

66 Following the discussion in Laird and Ware, … (Figure 3 diagram: shape template β_k's at the top level; waveform-specific (β_k + û_i^r)'s for the ith waveform, i = 1, …, 5, in the middle; observed data at the bottom.) [sent-178, score-0.528]

67 Figure 3: A visual illustration of the random effects segmental HMM, using fluid-flow waveform data as an example (as described in Section 5). [sent-181, score-0.158]

68 The top level shows the population-level parameters β_k for the waveform shape. [sent-183, score-0.456]

69 The plots in the middle level show the posterior estimates (combining both the data and the prior) of û_i and ŝ_i, using Equation (8) and the Viterbi algorithm respectively. [sent-185, score-0.251]

70 β_k represents the mean regression parameters for segment k, and u_i^r represents the variation in regression (or shape) parameters for the ith individual waveform. [sent-188, score-0.377]

71 At this level, the individual random effects u_i^r as well as β_k and σ² are viewed as parameters. [sent-189, score-0.332]

72 At the top level, u_i^r is viewed as a random variable with distribution u_i^r ∼ N_2(0, Ψ_k) (Equation 6), where Ψ_k is a 2 × 2 covariance matrix and u_i^r is independent of e_i^r. [sent-190, score-0.563]

73 It can be shown that y_i^r and u_i^r have the following joint distribution: (y_i^r, u_i^r) ∼ N_{d+2}( (X_i^r β_k, 0), [[X_i^r Ψ_k X_i^r′ + σ² I_d, X_i^r Ψ_k], [Ψ_k X_i^r′, Ψ_k]] ). [sent-192, score-0.444]
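
A minimal sketch of this two-stage segment model (illustrative values only; the design matrix with an intercept column and a linear time column is an assumption consistent with the piecewise-linear segments described earlier):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters for one state k (not taken from the paper).
d = 40                                            # segment length
X = np.column_stack([np.ones(d), np.arange(d)])   # design matrix: [1, t]
beta_k = np.array([1.0, -0.05])                   # population (intercept, slope) for state k
Psi_k = np.array([[0.10, 0.01],                   # 2 x 2 covariance of the random effects
                  [0.01, 0.001]])
sigma2 = 0.01                                     # observation noise variance

# Two-stage generation: waveform-specific offset u, then the observed segment.
u = rng.multivariate_normal(np.zeros(2), Psi_k)                 # u ~ N(0, Psi_k)
y = X @ (beta_k + u) + rng.normal(0.0, np.sqrt(sigma2), size=d)

# The implied marginal over y (integrating u out) is
# N(X beta_k, X Psi_k X' + sigma2 * I_d), matching the joint distribution above.
marginal_cov = X @ Psi_k @ X.T + sigma2 * np.eye(d)
print(y.shape, marginal_cov.shape)
```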

74 (a) segment model in segmental HMMs, (b) a two-stage model with random effects parameters in random effects segmental HMMs, and (c) the model after integrating out random effects parameters from (b). [sent-194, score-2.053]

75 …, R} given the segmentation ŝ_i of waveform y_i into R segments. [sent-199, score-0.514]

76 … of waveform y. Figure 3 conceptually illustrates the hierarchical setup of the model. [sent-204, score-0.364]

77 The plots at the middle level show the posterior estimates (combining both the data and the prior) of û_i and ŝ_i, using Equation (8) and the Viterbi algorithm respectively. [sent-207, score-0.251]

78 From a generative model perspective, the shape templates in the middle row, (β_k + u_i^r)'s, i = 1, … [sent-208, score-0.287]

79 Figure 4 shows plate diagrams for the segment distribution part of segmental HMMs and random effects segmental HMMs, illustrating the generative process for N waveforms, y_1, …, [sent-214, score-1.618]

80 …, y_N, under the simplifying assumption that each waveform comes from a single segment of length D corresponding to state k. [sent-217, score-0.561]

81 Replacing the two-level segment distribution with this marginal distribution, and collapsing the hierarchy into a single level, we can use the same F-B and Viterbi algorithms as in segmental HMMs in the marginalized space over the random effects parameters u_i^r. [sent-220, score-1.164]
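
A direct way to evaluate this marginal segment distribution, as a sketch (O(d³) per segment; the efficient form is discussed in the sentences below):

```python
import numpy as np
from scipy.stats import multivariate_normal

def marginal_segment_loglik(y, beta_k, Psi_k, sigma2):
    """
    Log-likelihood of one observed segment under the marginal segment
    distribution N(X beta_k, X Psi_k X' + sigma2 * I), with the random
    effects integrated out.  Direct evaluation; a sketch only.
    """
    d = len(y)
    X = np.column_stack([np.ones(d), np.arange(d)])   # [1, t] design, assumed here
    cov = X @ Psi_k @ X.T + sigma2 * np.eye(d)
    return multivariate_normal(mean=X @ beta_k, cov=cov).logpdf(y)

# Plugging this segment score into the forward recursion sketched earlier
# gives the F-B and Viterbi algorithms in the marginalized space over u.
```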

82 The likelihood of a waveform y, given fixed parameters θ = {A, Λ, θ_f = {β_k, Ψ_k, σ² | k = 1, … [sent-223, score-0.427]

83 As in segmental HMMs, the Viterbi algorithm can be used as a method to segment a waveform by computing the most likely state sequence. [sent-227, score-1.187]

84 What appears to make the inference in random effects segmental HMMs computationally much more expensive than in segmental HMMs is the inversion of the d × d covariance matrix of the marginal segment distribution, X_i^r Ψ_k X_i^r′ + σ² I_d, during the evaluation of the likelihood of a segment. [sent-228, score-1.663]

85 For example, in the F-B algorithm, the likelihood of a segment y_i^r of length d given state k, p(y_i^r | β_k, Ψ_k, σ²), needs to be calculated for all possible durations d in each of the α_t(k) and β_t(k) expressions at each recursion. [sent-229, score-0.294]

86 In the case of a simpler model with a diagonal covariance matrix for Ψ_k, Holmes and Russell (1999) derived a method for computing the segment likelihood with time complexity O(M²T³). [sent-232, score-0.253]

87 By setting u_i^r to û_i^r as in Equation (9), we can simplify the expression for the segment likelihood to p(y_i^r | β_k, Ψ_k) = (2π)^{−d/2} σ^{−d} |Ψ_{û_i^r}|^{1/2} / |Ψ_k|^{1/2} exp(−S_i^r / (2σ²)) (Equation 12), where [sent-238, score-0.532]

88 S_i^r = (y_i^r − X_i^r β_k − X_i^r û_i^r)′ (y_i^r − X_i^r β_k − X_i^r û_i^r) + σ² û_i^r′ Ψ_k^{−1} û_i^r. This can be further simplified using Equation (9): S_i^r = (y_i^r − X_i^r β_k)′ (y_i^r − X_i^r β_k − X_i^r û_i^r). [sent-239, score-0.658]
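
The simplification can be checked numerically. The sketch below re-derives the segment log-likelihood through the posterior mean û of the random effects and compares it with the direct evaluation of the marginal Gaussian; variable names are mine, and the values are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def segment_loglik_via_u_hat(y, X, beta_k, Psi_k, sigma2):
    """Segment log-likelihood via the posterior mean of the random effects,
    avoiding the d x d inversion (an independent re-derivation of Equation (12))."""
    d = len(y)
    resid = y - X @ beta_k
    Psi_u = np.linalg.inv(X.T @ X / sigma2 + np.linalg.inv(Psi_k))   # posterior cov of u
    u_hat = Psi_u @ X.T @ resid / sigma2                             # posterior mean of u
    S = resid @ (resid - X @ u_hat)                                  # simplified S_i^r
    logdet = 0.5 * (np.linalg.slogdet(Psi_u)[1] - np.linalg.slogdet(Psi_k)[1])
    return (-0.5 * d * np.log(2 * np.pi) - d * np.log(np.sqrt(sigma2))
            + logdet - S / (2 * sigma2))

# Sanity check against the direct marginal evaluation (illustrative values).
rng = np.random.default_rng(3)
d, sigma2 = 25, 0.01
X = np.column_stack([np.ones(d), np.arange(d)])
beta_k = np.array([0.5, 0.03])
Psi_k = np.array([[0.05, 0.0], [0.0, 0.0005]])
y = X @ beta_k + rng.normal(0, 0.2, size=d)
direct = multivariate_normal(X @ beta_k, X @ Psi_k @ X.T + sigma2 * np.eye(d)).logpdf(y)
print(np.isclose(direct, segment_loglik_via_u_hat(y, X, beta_k, Psi_k, sigma2)))
```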

89 For segmental HMMs this computational complexity can be further reduced to O(M²T²) by precomputing the segment likelihood and storing the values in a table (Mitchell et al.). [sent-242, score-0.83]

90 Parameter Estimation. In this section, we describe how to obtain maximum-likelihood estimates of the parameters from a training set of multiple waveforms for a random effects segmental HMM using the EM algorithm. [sent-246, score-1.036]

91 We augment the observed waveform data with both (a) state sequences and (b) random effects parameters (both are considered to be hidden). [sent-247, score-0.588]

92 The sufficient statistics, E[u_i^r | s_i^r = k, Y, θ^(t)] and E[u_i^r u_i^r′ | s_i^r = k, Y, θ^(t)], for P(u_i^r | s_i^r = k, y_i^r, θ^(t)) in Equation (15) can be directly obtained from Equations (9) and (10). [sent-260, score-0.55]
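
A sketch of these conditional expectations for one segment assigned to state k (the Gaussian posterior moments; function and variable names are chosen here, not taken from the paper):

```python
import numpy as np

def random_effect_posterior_moments(y, X, beta_k, Psi_k, sigma2):
    """
    E-step sufficient statistics for one segment assigned to state k:
    E[u | y, s=k] and E[u u' | y, s=k] under the Gaussian posterior
    (a sketch of the quantities in Equations (9) and (10)).
    """
    Psi_u = np.linalg.inv(X.T @ X / sigma2 + np.linalg.inv(Psi_k))   # posterior covariance
    u_hat = Psi_u @ X.T @ (y - X @ beta_k) / sigma2                  # posterior mean
    E_u = u_hat
    E_uu = Psi_u + np.outer(u_hat, u_hat)                            # posterior second moment
    return E_u, E_uu
```

In an EM pass, statistics of this form would be accumulated over segments and states (weighted by the posterior segmentation probabilities from the F-B recursions) before the M-step; the exact update equations are those given in the paper.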

93 The time complexity for an E step is O(M²T³N), where N is the number of waveforms (and assuming each waveform is of length T). [sent-261, score-0.582]

94 In practice, the algorithm often converges relatively slowly, compared to segmental HMMs, due to the additional missing information in random effects parameters U. [sent-268, score-0.818]

95 The segmental HMM converges much faster but to a lower log-likelihood value. [sent-270, score-0.626]

96 Holmes and Russell (1999) augmented the observed waveform data with state sequences after integrating out the random effects parameters, and used Dcomplete = {Y, S} in the E step. [sent-272, score-0.564]

97 In this case the parameters for the segment distribution {β_k, σ², Ψ_k} do not decouple in the complete-data log-likelihood and there is no closed form solution for those parameters in the M step. [sent-273, score-0.23]

98 ECME for the random effects segmental HMM, x-axis on a log-scale. [sent-277, score-0.794]

99 Liu and Rubin report slightly faster convergence from the latter, but in our application of the ECME algorithm to random effects segmental HMMs we use the first version with closed form CM steps, thus retaining the monotone convergence property of EM. [sent-344, score-0.847]
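
As a rough illustration of the ECME idea (not the paper's actual update equations, which also involve the transition and duration parameters and the hidden segmentation), the sketch below performs one ECME-style iteration for a plain linear mixed model: one conditional step updates the variance parameters in closed form from the expected complete-data statistics, and the "either" step updates β by maximizing the actual likelihood directly (generalized least squares).

```python
import numpy as np

def ecme_iteration(Y, X, beta, Psi, sigma2):
    """
    One ECME-style iteration for a simplified linear mixed model
    y_i = X beta + X u_i + eps_i,  u_i ~ N(0, Psi),  eps_i ~ N(0, sigma2 * I).
    Y: (N, d) array of segments, X: (d, p) design matrix.  Illustrative only.
    """
    N, d = Y.shape

    # E-step: posterior moments of each u_i at the current parameters.
    Psi_u = np.linalg.inv(X.T @ X / sigma2 + np.linalg.inv(Psi))
    resid = Y - X @ beta                       # (N, d) residuals
    U_hat = resid @ X @ Psi_u.T / sigma2       # (N, p) posterior means of u_i

    # CM step 1 (closed form, maximizes the EM Q-function): update Psi and sigma2.
    Psi_new = Psi_u + U_hat.T @ U_hat / N
    fitted = U_hat @ X.T                       # (N, d) rows are X u_i_hat
    sse = np.sum((resid - fitted) ** 2)
    sigma2_new = (sse + N * np.trace(X @ Psi_u @ X.T)) / (N * d)

    # CM step 2 ("either" step, maximizes the actual log-likelihood): GLS for beta.
    V = X @ Psi_new @ X.T + sigma2_new * np.eye(d)
    Vinv_X = np.linalg.solve(V, X)
    beta_new = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ Y.mean(axis=0))
    return beta_new, Psi_new, sigma2_new

# Iterating this map until the log-likelihood stabilizes gives ML estimates
# for this simplified model; the paper's algorithm has the same overall structure.
```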

100 For random effects segmental HMMs we partition the parameters θ into θ_1 = {A, Λ, Ψ_k, σ² | k = 1, … [sent-345, score-0.818]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('segmental', 0.626), ('waveform', 0.364), ('hmms', 0.315), ('ecme', 0.236), ('waveforms', 0.218), ('segment', 0.165), ('em', 0.165), ('ui', 0.164), ('effects', 0.139), ('intercept', 0.125), ('hmm', 0.093), ('holmes', 0.092), ('slope', 0.086), ('egmental', 0.072), ('ffects', 0.072), ('myth', 0.072), ('russell', 0.071), ('cm', 0.07), ('si', 0.065), ('viterbi', 0.064), ('shape', 0.062), ('andom', 0.061), ('yi', 0.058), ('dcomplete', 0.051), ('rubin', 0.051), ('ware', 0.051), ('arkov', 0.05), ('idden', 0.05), ('measurements', 0.049), ('odels', 0.044), ('smyth', 0.044), ('ecm', 0.043), ('id', 0.043), ('duration', 0.041), ('variability', 0.041), ('ecg', 0.041), ('uir', 0.041), ('im', 0.039), ('likelihood', 0.039), ('gw', 0.036), ('slopes', 0.036), ('intercepts', 0.035), ('laird', 0.034), ('generative', 0.033), ('ow', 0.032), ('state', 0.032), ('dr', 0.031), ('gales', 0.031), ('warping', 0.031), ('dempster', 0.029), ('random', 0.029), ('model', 0.028), ('modeling', 0.027), ('segmentation', 0.027), ('wth', 0.026), ('speech', 0.026), ('irvine', 0.025), ('shapes', 0.025), ('population', 0.024), ('parameters', 0.024), ('tth', 0.023), ('log', 0.023), ('level', 0.022), ('young', 0.021), ('xi', 0.021), ('ei', 0.021), ('covariance', 0.021), ('agrawal', 0.021), ('ferguson', 0.021), ('heartbeat', 0.021), ('heartbeats', 0.021), ('ics', 0.021), ('keogh', 0.021), ('levinson', 0.021), ('padhraic', 0.021), ('seyoung', 0.021), ('turbulent', 0.021), ('yir', 0.021), ('yti', 0.021), ('recognition', 0.021), ('kim', 0.02), ('piecewise', 0.02), ('ri', 0.02), ('template', 0.02), ('parsing', 0.019), ('inference', 0.018), ('equations', 0.018), ('iteration', 0.018), ('convergence', 0.018), ('probe', 0.017), ('koski', 0.017), ('rth', 0.017), ('xti', 0.017), ('faloutsos', 0.017), ('sir', 0.017), ('tep', 0.017), ('hierarchy', 0.017), ('closed', 0.017), ('bottom', 0.017), ('liu', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 80 jmlr-2006-Segmental Hidden Markov Models with Random Effects for Waveform Modeling

Author: Seyoung Kim, Padhraic Smyth

Abstract: This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. A segmental hidden Markov model (HMM) is used to characterize waveform shape and shape variation is captured by adding random effects to the segmental model. The resulting probabilistic framework provides a basis for learning of waveform models from data as well as parsing and recognition of new waveforms. Expectation-maximization (EM) algorithms are derived and investigated for fitting such models to data. In particular, the “expectation conditional maximization either” (ECME) algorithm is shown to provide significantly faster convergence than a standard EM procedure. Experimental results on two real-world data sets demonstrate that the proposed approach leads to improved accuracy in classification and segmentation when compared to alternatives such as Euclidean distance matching, dynamic time warping, and segmental HMMs without random effects. Keywords: waveform recognition, random effects, segmental hidden Markov models, EM algorithm, ECME

2 0.062829442 57 jmlr-2006-Linear State-Space Models for Blind Source Separation

Author: Rasmus Kongsgaard Olsson, Lars Kai Hansen

Abstract: We apply a type of generative modelling to the problem of blind source separation in which prior knowledge about the latent source signals, such as time-varying auto-correlation and quasiperiodicity, are incorporated into a linear state-space model. In simulations, we show that in terms of signal-to-error ratio, the sources are inferred more accurately as a result of the inclusion of strong prior knowledge. We explore different schemes of maximum-likelihood optimization for the purpose of learning the model parameters. The Expectation Maximization algorithm, which is often considered the standard optimization method in this context, results in slow convergence when the noise variance is small. In such scenarios, quasi-Newton optimization yields substantial improvements in a range of signal to noise ratios. We analyze the performance of the methods on convolutive mixtures of speech signals. Keywords: blind source separation, state-space model, independent component analysis, convolutive model, EM, speech modelling

3 0.04374183 79 jmlr-2006-Second Order Cone Programming Approaches for Handling Missing and Uncertain Data     (Special Topic on Machine Learning and Optimization)

Author: Pannagadatta K. Shivaswamy, Chiranjib Bhattacharyya, Alexander J. Smola

Abstract: We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.

4 0.040147338 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates

Author: Tzu-Kuo Huang, Ruby C. Weng, Chih-Jen Lin

Abstract: The Bradley-Terry model for obtaining individual skill from paired comparisons has been popular in many areas. In machine learning, this model is related to multi-class probability estimates by coupling all pairwise classification results. Error correcting output codes (ECOC) are a general framework to decompose a multi-class problem to several binary problems. To obtain probability estimates under this framework, this paper introduces a generalized Bradley-Terry model in which paired individual comparisons are extended to paired team comparisons. We propose a simple algorithm with convergence proofs to solve the model and obtain individual skill. Experiments on synthetic and real data demonstrate that the algorithm is useful for obtaining multi-class probability estimates. Moreover, we discuss four extensions of the proposed model: 1) weighted individual skill, 2) home-field advantage, 3) ties, and 4) comparisons with more than two teams. Keywords: Bradley-Terry model, probability estimates, error correcting output codes, support vector machines

5 0.039654057 97 jmlr-2006- (Special Topic on Machine Learning for Computer Security)

Author: Charles V. Wright, Fabian Monrose, Gerald M. Masson

Abstract: Several fundamental security mechanisms for restricting access to network resources rely on the ability of a reference monitor to inspect the contents of traffic as it traverses the network. However, with the increasing popularity of cryptographic protocols, the traditional means of inspecting packet contents to enforce security policies is no longer a viable approach as message contents are concealed by encryption. In this paper, we investigate the extent to which common application protocols can be identified using only the features that remain intact after encryption—namely packet size, timing, and direction. We first present what we believe to be the first exploratory look at protocol identification in encrypted tunnels which carry traffic from many TCP connections simultaneously, using only post-encryption observable features. We then explore the problem of protocol identification in individual encrypted TCP connections, using much less data than in other recent approaches. The results of our evaluation show that our classifiers achieve accuracy greater than 90% for several protocols in aggregate traffic, and, for most protocols, greater than 80% when making fine-grained classifications on single connections. Moreover, perhaps most surprisingly, we show that one can even estimate the number of live connections in certain classes of encrypted tunnels to within, on average, better than 20%. Keywords: traffic classification, hidden Markov models, network security

6 0.037100941 71 jmlr-2006-Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis

7 0.035853423 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis

8 0.032387998 22 jmlr-2006-Considering Cost Asymmetry in Learning Classifiers

9 0.032342087 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

10 0.031591039 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints     (Special Topic on Machine Learning and Optimization)

11 0.030430451 59 jmlr-2006-Machine Learning for Computer Security    (Special Topic on Machine Learning for Computer Security)

12 0.029414399 85 jmlr-2006-Statistical Comparisons of Classifiers over Multiple Data Sets

13 0.029141074 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification

14 0.028726052 49 jmlr-2006-Learning Parts-Based Representations of Data

15 0.028050497 31 jmlr-2006-Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization     (Special Topic on Machine Learning and Optimization)

16 0.023187105 76 jmlr-2006-QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines

17 0.021181332 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error

18 0.020604825 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests

19 0.020582128 81 jmlr-2006-Some Discriminant-Based PAC Algorithms

20 0.019292895 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.116), (1, -0.012), (2, -0.041), (3, 0.035), (4, -0.082), (5, -0.063), (6, -0.068), (7, -0.094), (8, 0.045), (9, -0.013), (10, -0.037), (11, -0.05), (12, -0.08), (13, -0.07), (14, 0.005), (15, -0.035), (16, -0.076), (17, 0.101), (18, -0.024), (19, -0.147), (20, 0.005), (21, 0.067), (22, -0.296), (23, -0.109), (24, -0.099), (25, -0.024), (26, -0.087), (27, -0.02), (28, 0.111), (29, -0.126), (30, -0.048), (31, -0.394), (32, 0.01), (33, 0.049), (34, 0.03), (35, -0.085), (36, -0.02), (37, 0.009), (38, -0.155), (39, -0.071), (40, -0.091), (41, -0.241), (42, -0.245), (43, -0.197), (44, 0.333), (45, 0.06), (46, -0.048), (47, 0.063), (48, -0.09), (49, 0.046)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96633434 80 jmlr-2006-Segmental Hidden Markov Models with Random Effects for Waveform Modeling

Author: Seyoung Kim, Padhraic Smyth

Abstract: This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. A segmental hidden Markov model (HMM) is used to characterize waveform shape and shape variation is captured by adding random effects to the segmental model. The resulting probabilistic framework provides a basis for learning of waveform models from data as well as parsing and recognition of new waveforms. Expectation-maximization (EM) algorithms are derived and investigated for fitting such models to data. In particular, the “expectation conditional maximization either” (ECME) algorithm is shown to provide significantly faster convergence than a standard EM procedure. Experimental results on two real-world data sets demonstrate that the proposed approach leads to improved accuracy in classification and segmentation when compared to alternatives such as Euclidean distance matching, dynamic time warping, and segmental HMMs without random effects. Keywords: waveform recognition, random effects, segmental hidden Markov models, EM algorithm, ECME

2 0.32799226 57 jmlr-2006-Linear State-Space Models for Blind Source Separation

Author: Rasmus Kongsgaard Olsson, Lars Kai Hansen

Abstract: We apply a type of generative modelling to the problem of blind source separation in which prior knowledge about the latent source signals, such as time-varying auto-correlation and quasiperiodicity, are incorporated into a linear state-space model. In simulations, we show that in terms of signal-to-error ratio, the sources are inferred more accurately as a result of the inclusion of strong prior knowledge. We explore different schemes of maximum-likelihood optimization for the purpose of learning the model parameters. The Expectation Maximization algorithm, which is often considered the standard optimization method in this context, results in slow convergence when the noise variance is small. In such scenarios, quasi-Newton optimization yields substantial improvements in a range of signal to noise ratios. We analyze the performance of the methods on convolutive mixtures of speech signals. Keywords: blind source separation, state-space model, independent component analysis, convolutive model, EM, speech modelling

3 0.30971509 79 jmlr-2006-Second Order Cone Programming Approaches for Handling Missing and Uncertain Data     (Special Topic on Machine Learning and Optimization)

Author: Pannagadatta K. Shivaswamy, Chiranjib Bhattacharyya, Alexander J. Smola

Abstract: We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.

4 0.20574839 31 jmlr-2006-Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization     (Special Topic on Machine Learning and Optimization)

Author: Olvi L. Mangasarian

Abstract: Support vector machines utilizing the 1-norm, typically set up as linear programs (Mangasarian, 2000; Bradley and Mangasarian, 1998), are formulated here as a completely unconstrained minimization of a convex differentiable piecewise-quadratic objective function in the dual space. The objective function, which has a Lipschitz continuous gradient and contains only one additional finite parameter, can be minimized by a generalized Newton method and leads to an exact solution of the support vector machine problem. The approach here is based on a formulation of a very general linear program as an unconstrained minimization problem and its application to support vector machine classification problems. The present approach which generalizes both (Mangasarian, 2004) and (Fung and Mangasarian, 2004) is also applied to nonlinear approximation where a minimal number of nonlinear kernel functions are utilized to approximate a function from a given number of function values.

5 0.19825855 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints     (Special Topic on Machine Learning and Optimization)

Author: Radu Stefan Niculescu, Tom M. Mitchell, R. Bharat Rao

Abstract: The task of learning models for many real-world problems requires incorporating domain knowledge into learning algorithms, to enable accurate learning from a realistic volume of training data. This paper considers a variety of types of domain knowledge for constraining parameter estimates when learning Bayesian networks. In particular, we consider domain knowledge that constrains the values or relationships among subsets of parameters in a Bayesian network with known structure. We incorporate a wide variety of parameter constraints into learning procedures for Bayesian networks, by formulating this task as a constrained optimization problem. The assumptions made in module networks, dynamic Bayes nets and context specific independence models can be viewed as particular cases of such parameter constraints. We present closed form solutions or fast iterative algorithms for estimating parameters subject to several specific classes of parameter constraints, including equalities and inequalities among parameters, constraints on individual parameters, and constraints on sums and ratios of parameters, for discrete and continuous variables. Our methods cover learning from both frequentist and Bayesian points of view, from both complete and incomplete data. We present formal guarantees for our estimators, as well as methods for automatically learning useful parameter constraints from data. To validate our approach, we apply it to the domain of fMRI brain image analysis. Here we demonstrate the ability of our system to first learn useful relationships among parameters, and then to use them to constrain the training of the Bayesian network, resulting in improved cross-validated accuracy of the learned model. Experiments on synthetic data are also presented. Keywords: Bayesian networks, constrained optimization, domain knowledge c 2006 Radu Stefan Niculescu, Tom Mitchell and Bharat Rao. N ICULESCU , M ITCHELL AND R AO

6 0.1865457 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates

7 0.1802983 71 jmlr-2006-Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis

8 0.16315156 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error

9 0.16271982 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis

10 0.15825281 22 jmlr-2006-Considering Cost Asymmetry in Learning Classifiers

11 0.15492085 76 jmlr-2006-QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines

12 0.15414381 97 jmlr-2006- (Special Topic on Machine Learning for Computer Security)

13 0.15191752 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification

14 0.14200611 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

15 0.13896672 49 jmlr-2006-Learning Parts-Based Representations of Data

16 0.12381052 19 jmlr-2006-Causal Graph Based Decomposition of Factored MDPs

17 0.11815582 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

18 0.10741991 7 jmlr-2006-A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events

19 0.10725599 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification

20 0.10224967 36 jmlr-2006-In Search of Non-Gaussian Components of a High-Dimensional Distribution


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.017), (36, 0.049), (45, 0.02), (50, 0.03), (63, 0.025), (68, 0.02), (76, 0.014), (78, 0.013), (79, 0.025), (81, 0.582), (84, 0.019), (90, 0.016), (91, 0.025), (96, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93911731 80 jmlr-2006-Segmental Hidden Markov Models with Random Effects for Waveform Modeling

Author: Seyoung Kim, Padhraic Smyth

Abstract: This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. A segmental hidden Markov model (HMM) is used to characterize waveform shape and shape variation is captured by adding random effects to the segmental model. The resulting probabilistic framework provides a basis for learning of waveform models from data as well as parsing and recognition of new waveforms. Expectation-maximization (EM) algorithms are derived and investigated for fitting such models to data. In particular, the “expectation conditional maximization either” (ECME) algorithm is shown to provide significantly faster convergence than a standard EM procedure. Experimental results on two real-world data sets demonstrate that the proposed approach leads to improved accuracy in classification and segmentation when compared to alternatives such as Euclidean distance matching, dynamic time warping, and segmental HMMs without random effects. Keywords: waveform recognition, random effects, segmental hidden Markov models, EM algorithm, ECME

2 0.91020751 84 jmlr-2006-Stability Properties of Empirical Risk Minimization over Donsker Classes

Author: Andrea Caponnetto, Alexander Rakhlin

Abstract: We study some stability properties of algorithms which minimize (or almost-minimize) empirical error over Donsker classes of functions. We show that, as the number n of samples grows, the L_2 diameter of the set of almost-minimizers of empirical error with tolerance ξ(n) = o(n^{−1/2}) converges to zero in probability. Hence, even in the case of multiple minimizers of expected error, as n increases it becomes less and less likely that adding a sample (or a number of samples) to the training set will result in a large jump to a new hypothesis. Moreover, under some assumptions on the entropy of the class, along with an assumption of Komlos-Major-Tusnady type, we derive a power rate of decay for the diameter of almost-minimizers. This rate, through an application of a uniform ratio limit inequality, is shown to govern the closeness of the expected errors of the almost-minimizers. In fact, under the above assumptions, the expected errors of almost-minimizers become closer with a rate strictly faster than n^{−1/2}. Keywords: empirical risk minimization, empirical processes, stability, Donsker classes

3 0.81437999 22 jmlr-2006-Considering Cost Asymmetry in Learning Classifiers

Author: Francis R. Bach, David Heckerman, Eric Horvitz

Abstract: Receiver Operating Characteristic (ROC) curves are a standard way to display the performance of a set of binary classifiers for all feasible ratios of the costs associated with false positives and false negatives. For linear classifiers, the set of classifiers is typically obtained by training once, holding constant the estimated slope and then varying the intercept to obtain a parameterized set of classifiers whose performances can be plotted in the ROC plane. We consider the alternative of varying the asymmetry of the cost function used for training. We show that the ROC curve obtained by varying both the intercept and the asymmetry, and hence the slope, always outperforms the ROC curve obtained by varying only the intercept. In addition, we present a path-following algorithm for the support vector machine (SVM) that can compute efficiently the entire ROC curve, and that has the same computational complexity as training a single classifier. Finally, we provide a theoretical analysis of the relationship between the asymmetric cost model assumed when training a classifier and the cost model assumed in applying the classifier. In particular, we show that the mismatch between the step function used for testing and its convex upper bounds, usually used for training, leads to a provable and quantifiable difference around extreme asymmetries. Keywords: support vector machines, receiver operating characteristic (ROC) analysis, linear classification

4 0.59535992 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor

Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs

5 0.54251951 66 jmlr-2006-On Model Selection Consistency of Lasso

Author: Peng Zhao, Bin Yu

Abstract: Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection. Therefore it is important to study Lasso for model selection purposes. In this paper, we prove that a single condition, which we call the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large. Based on these results, sufficient conditions that are verifiable in practice are given to relate to previous works and help applications of Lasso for feature selection and sparse representation. This Irrepresentable Condition, which depends mainly on the covariance of the predictor variables, states that Lasso selects the true model consistently if and (almost) only if the predictors that are not in the true model are “irrepresentable” (in a sense to be clarified) by predictors that are in the true model. Furthermore, simulations are carried out to provide insights and understanding of this result. Keywords: Lasso, regularization, sparsity, model selection, consistency

6 0.42812178 73 jmlr-2006-Pattern Recognition for Conditionally Independent Data

7 0.4237785 82 jmlr-2006-Some Theory for Generalized Boosting Algorithms

8 0.41125062 58 jmlr-2006-Lower Bounds and Aggregation in Density Estimation

9 0.38896123 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates

10 0.38479102 87 jmlr-2006-Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation

11 0.37740496 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting

12 0.37567466 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity

13 0.36770129 49 jmlr-2006-Learning Parts-Based Representations of Data

14 0.35502237 24 jmlr-2006-Consistency of Multiclass Empirical Risk Minimization Methods Based on Convex Loss

15 0.35399827 10 jmlr-2006-Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

16 0.35222736 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification

17 0.35100055 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

18 0.34846786 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix

19 0.34585333 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis

20 0.34325749 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification