jmlr jmlr2012 jmlr2012-70 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mario Frank, Andreas P. Streich, David Basin, Joachim M. Buhmann
Abstract: We propose a probabilistic model for clustering Boolean data where an object can be simultaneously assigned to multiple clusters. By explicitly modeling the underlying generative process that combines the individual source emissions, highly structured data are expressed with substantially fewer clusters compared to single-assignment clustering. As a consequence, such a model provides robust parameter estimators even when the number of samples is low. We extend the model with different noise processes and demonstrate that maximum-likelihood estimation with multiple assignments consistently infers source parameters more accurately than single-assignment clustering. Our model is primarily motivated by the task of role mining for role-based access control, where users of a system are assigned one or more roles. In experiments with real-world access-control data, our model exhibits better generalization performance than state-of-the-art approaches. Keywords: clustering, multi-assignments, overlapping clusters, Boolean data, role mining, latent feature models
Reference: text
sentIndex sentText sentNum sentScore
1 We extend the model with different noise processes and demonstrate that maximum-likelihood estimation with multiple assignments consistently infers source parameters more accurately than single-assignment clustering. [sent-13, score-0.325]
2 While clustering data into disjoint clusters is conceptually simple, the exclusive assignment of data to clusters is often overly restrictive, especially when data is structured. [sent-18, score-0.327]
3 In our approach, an object can be assigned to multiple clusters at the same time, that is, the assignments of an object can sum to a number larger than 1. [sent-39, score-0.26]
4 The combined emissions from individual sources generate an object by the Boolean OR operation. [sent-44, score-0.283]
5 This task requires inferring a user-role assignment matrix and a role-permission assignment matrix from a Boolean user-permission assignment relation that defines an access-control system. [sent-50, score-0.521]
6 $x_{id} = \bigvee_{k=1}^{K} \left( z_{ik} \wedge u_{kd} \right)$ (1). Belohlavek and Vychodil (2010) show that the set cover problem is equivalent to Boolean factor analysis, where each factor corresponds to a row of u. [sent-86, score-0.572]
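To make the generative rule of Equation 1 concrete, here is a minimal Python sketch that OR-combines source emissions into objects; the function and variable names are our own illustration, not taken from the authors' implementation.

```python
import numpy as np

def boolean_or_generate(z, u):
    """Equation 1: x_id = OR_k (z_ik AND u_kd). A bit is 1 iff at least
    one source assigned to the object emits a 1 in that dimension."""
    # The integer matrix product counts emitting sources; > 0 realizes the OR.
    return (z @ u > 0).astype(int)

z = np.array([[1, 0],
              [0, 1],
              [1, 1]])                 # N x K: object 2 has a multi-assignment
u = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0]])           # K x D source emissions
print(boolean_or_generate(z, u))       # [[1 1 0 0] [0 1 1 0] [1 1 1 0]]
```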
7 ROLE MINING AND RBAC: The goal of role mining is to automatically decompose a binary user-permission assignment matrix x into a role-based access control (RBAC) configuration consisting of a binary user-role assignment matrix z and a binary role-permission assignment matrix u. [sent-178, score-0.671]
8 Instead of directly assigning permissions to users, users are assigned to one or more roles (represented by the matrix z) and obtain the permissions contained in these roles (represented by u). [sent-181, score-0.81]
9 Second, assigning users to just a few roles is easier than assigning them to hundreds of individual permissions. [sent-188, score-0.359]
10 The role mining step should ideally migrate the regularities of the assignment matrix x to RBAC, while filtering out the remaining exceptional permission assignments. [sent-202, score-0.372]
11 We model exceptional assignments (all three cases) with a noise model described in Section 3. [sent-203, score-0.316]
12 The association of data items to sources is encoded in the binary assignment matrix $z \in \{0, 1\}^{N \times K}$, with $z_{ik} = 1$ if and only if data item i belongs to source k, and $z_{ik} = 0$ otherwise. [sent-240, score-0.644]
13 The sum of the assignment variables for data item i, $\sum_k z_{ik}$, can be larger than 1, which denotes that data item i is assigned to multiple clusters. [sent-241, score-0.338]
14 Let $\mathbb{L}$ be the set of all possible assignment sets and $\mathcal{L} \in \mathbb{L}$ be one such assignment set. [sent-248, score-0.31]
15 The value of $x^S_{id}$ is a Boolean disjunction of the values at dimension d of all sources to which object i is assigned. [sent-249, score-0.808]
16 The Boolean disjunction in the generation process of an $x^S_{id}$ results in a probability for $x^S_{id} = 1$ that is strictly non-decreasing in the number of associated sources $|\mathcal{L}_i|$: if any of the sources in $\mathcal{L}_i$ emits a 1 in dimension d, then $x^S_{id} = 1$. [sent-250, score-2.106]
17 Conversely, $x^S_{id} = 0$ requires that all contributing sources have emitted a 0 in dimension d. [sent-251, score-0.755]
18 This parameter matrix $\beta \in [0, 1]^{K \times D}$ allows us to simplify notation and to write $p_S(x^S_{id} = 0 \mid z_{i*}, \beta) = \prod_{k=1}^{K} \beta_{kd}^{z_{ik}}$ and $p_S(x^S_{id} = 1 \mid z_{i*}, \beta) = 1 - p_S(x^S_{id} = 0 \mid z_{i*}, \beta)$. [sent-253, score-1.744]
19 Employing the notion of assignment sets, one can interpret the product $\beta_{\mathcal{L}_i d} := \prod_{k=1}^{K} \beta_{kd}^{z_{ik}}$ (2) as the source of the assignment set $\mathcal{L}_i$. [sent-255, score-0.396]
20 The probability distribution of an $x^S_{id}$ generated from this structure model, given the assignments $\mathcal{L}_i$ and the sources $\beta$, is then $p_S(x^S_{id} \mid \mathcal{L}_i, \beta) = (1 - \beta_{\mathcal{L}_i d})^{x^S_{id}} (\beta_{\mathcal{L}_i d})^{1 - x^S_{id}}$ (3). [sent-259, score-1.427]
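A small sketch of Equations 2 and 3, assuming the convention stated above that $\beta_{kd}$ is the probability that source k emits a 0 in dimension d; the helper names are hypothetical.

```python
import numpy as np

def assignment_set_source(beta, L):
    """Equation 2: beta_{L,d} = prod_{k in L} beta[k, d], the probability
    that none of the sources in the assignment set L emits a 1 at d."""
    return np.prod(beta[list(L), :], axis=0)

def p_signal(x_i, beta, L):
    """Equation 3: p_S(x_id | L, beta) = (1 - beta_Ld)^x_id * beta_Ld^(1 - x_id)."""
    beta_L = assignment_set_source(beta, L)
    return (1 - beta_L) ** x_i * beta_L ** (1 - x_i)

beta = np.array([[0.9, 0.2, 0.8, 0.95],      # K x D, beta_kd = P(source k emits 0)
                 [0.85, 0.3, 0.1, 0.9]])
print(p_signal(np.array([1, 1, 0, 0]), beta, L=(0, 1)))
```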
21 STRUCTURE COMPLEXITY AND SINGLE-ASSIGNMENT CLUSTERING: In the general case, when no restrictions on the assignment sets are given, there are $L = 2^K$ possible assignment sets. [sent-268, score-0.31]
22 Noise Models and their Relationship: In this section, we first present the mixture noise model, which interprets the observed data as a mixture of independent emissions from the structure part and a noise source. [sent-277, score-0.405]
23 MIXTURE NOISE MODEL: In the mixture noise model, each $x_{id}$ is generated either by the signal distribution or by a noise process. [sent-288, score-0.91]
24 The binary indicator variable $\xi_{id}$ indicates whether $x_{id}$ is a noisy bit ($\xi_{id} = 1$) or a signal bit ($\xi_{id} = 0$). [sent-289, score-0.693]
25 The observed $x_{id}$ is then generated by $x_{id} = (1 - \xi_{id})\, x^S_{id} + \xi_{id}\, x^N_{id}$, where the generative process for the signal bit $x^S_{id}$ is described either by the deterministic rule in Equation 1 or by the probability distribution in Equation 3. [sent-290, score-2.405]
26 The noise bit $x^N_{id}$ follows a Bernoulli distribution that is independent of the object index i and the dimension index d: $p_N(x^N_{id} \mid r) = r^{x^N_{id}} (1 - r)^{1 - x^N_{id}}$. [sent-291, score-1.362]
27 Combining the signal and noise distributions, the overall probability of an observed $x_{id}$ is $p_{\mathrm{mix}}(x_{id} \mid \mathcal{L}_i, \beta, r, \xi_{id}) = p_N(x_{id} \mid r)^{\xi_{id}}\, p_S(x_{id} \mid \mathcal{L}_i, \beta)^{1 - \xi_{id}}$. [sent-293, score-0.865]
28 The joint probability of $x_{id}$ and $\xi_{id}$ given the assignment matrix z and all parameters is thus $p_M^{\mathrm{mix}}(x_{id}, \xi_{id} \mid z, \beta, r, \varepsilon) = p_M^{\mathrm{mix}}(x_{id} \mid z, \beta, r, \xi_{id}) \cdot \varepsilon^{\xi_{id}} (1 - \varepsilon)^{1 - \xi_{id}}$. [sent-295, score-0.888]
29 Since the different $x_{id}$ are conditionally independent given the assignments z and the parameters $\Theta^{\mathrm{mix}}$, we have $p_M^{\mathrm{mix}}(x, \xi \mid z, \beta, r) = \prod_{i,d} p_M^{\mathrm{mix}}(x_{id}, \xi_{id} \mid z, \beta, r)$. [sent-296, score-0.938]
30 The noise indicators $\xi_{id}$ cannot be observed. [sent-297, score-0.265]
31 We therefore marginalize out all $\xi_{id}$ to derive the probability of x as $p_M^{\mathrm{mix}}(x \mid z, \beta, r, \varepsilon) = \sum_{\{\xi\}} p_M^{\mathrm{mix}}(x, \xi \mid z, \beta, r, \varepsilon) = \prod_{i,d} \left( \varepsilon \cdot p_N(x_{id}) + (1 - \varepsilon) \cdot p_S(x_{id}) \right)$. [sent-298, score-0.266]
32 The observed data x is thus a mixture between the emissions of the structure part (which has weight $1 - \varepsilon$) and the noise emissions (with weight $\varepsilon$). [sent-299, score-0.402]
33 Introducing the auxiliary variable $q^{\mathrm{mix}}_{\mathcal{L}_i d} := p_M^{\mathrm{mix}}(x_{id} = 1 \mid z, \beta, r, \varepsilon) = \varepsilon r + (1 - \varepsilon)(1 - \beta_{\mathcal{L}_i d})$ to represent the probability that $x_{id} = 1$ under this model, we get a data-centric representation of the probability of x as $p_M^{\mathrm{mix}}(x \mid z, \beta, r, \varepsilon) = \prod_{i,d} \left( x_{id}\, q^{\mathrm{mix}}_{\mathcal{L}_i d} + (1 - x_{id})(1 - q^{\mathrm{mix}}_{\mathcal{L}_i d}) \right)$ (6). [sent-300, score-1.536]
34 The parameters of the mixture noise model are $\Theta^{\mathrm{mix}} := (\varepsilon, r)$. [sent-301, score-0.304]
35 $\mathcal{L}_i$ is the assignment set of object i, indicating which Boolean sources from u generated it. [sent-304, score-0.367]
36 The bit $\xi_{id}$ selects whether the noise-free bit $x^S_{id}$ or the noise bit $x^N_{id}$ is observed. [sent-305, score-1.433]
37 It is defined by the probability $p_S(x^S_{id} \mid \mathcal{L}_i, \beta)$ (Equation 3). [sent-313, score-0.572]
38 This noise process is described by the probability $p_\alpha(x_{id} \mid x^S_{id}, \Theta^\alpha_N)$, where $\alpha$ identifies the noise model and $\Theta^\alpha_N$ are the parameters of noise model $\alpha$. [sent-316, score-0.417]
39 The overall probability of an observation $x_{id}$ given all parameters is thus $p_\alpha(x_{id} \mid \mathcal{L}_i, \beta, \Theta^\alpha_N) = \sum_{x^S_{id}} p_S(x^S_{id} \mid \mathcal{L}_i, \beta) \cdot p_\alpha(x_{id} \mid x^S_{id}, \Theta^\alpha_N)$. [sent-317, score-1.716]
40 MIXTURE NOISE MODEL: The mixture noise model assumes that each $x_{id}$ is explained either by the structure model or by an independent global noise process. [sent-320, score-0.889]
41 Therefore, the joint probability $p_{\mathrm{mix}}(x_{id} \mid x^S_{id}, \Theta^{\mathrm{mix}}_N)$ can be factored as $p_{\mathrm{mix}}(x_{id} \mid x^S_{id}, \Theta^{\mathrm{mix}}_N) = p_M^{\mathrm{mix}}(x_{id} \mid x^S_{id}, x^N_{id}, \xi_{id}) \cdot p_N^{\mathrm{mix}}(x^N_{id} \mid r)$, with $p_M^{\mathrm{mix}}(x_{id} \mid x^S_{id}, x^N_{id}, \xi_{id}) = \mathbb{I}_{\{x^S_{id} = x_{id}\}}^{1 - \xi_{id}}\, \mathbb{I}_{\{x^N_{id} = x_{id}\}}^{\xi_{id}}$. [sent-321, score-4.349]
42 Summing out the unobserved variables $x^N_{id}$ and $x^S_{id}$ yields $p_M^{\mathrm{mix}}(x_{id} \mid \mathcal{L}_i, \beta, r, \xi_{id}) = \sum_{x^S_{id}=0}^{1} \sum_{x^N_{id}=0}^{1} p_M^{\mathrm{mix}}(x_{id}, x^S_{id}, x^N_{id} \mid \mathcal{L}_i, \beta, r, \xi_{id}) = p_S(x_{id} \mid \mathcal{L}_i, \beta)^{1 - \xi_{id}} \cdot p_N^{\mathrm{mix}}(x_{id} \mid r)^{\xi_{id}} = (1 - \xi_{id})\, p_S(x_{id} \mid \mathcal{L}_i, \beta) + \xi_{id}\, p_N^{\mathrm{mix}}(x_{id} \mid r)$. [sent-323, score-4.536]
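The following hedged sketch samples from the mixture noise process and evaluates the data-centric likelihood of Equation 6; eps and r stand for $\varepsilon$ and r above, and beta_L holds $\beta_{\mathcal{L}_i d}$ per entry. All names are our own.

```python
import numpy as np

def sample_mixture_noise(x_signal, eps, r, rng):
    """With probability eps a bit is replaced by an independent Bernoulli(r)
    noise bit; otherwise the signal bit passes through unchanged."""
    xi = rng.random(x_signal.shape) < eps                 # noise indicators
    x_noise = (rng.random(x_signal.shape) < r).astype(int)
    return np.where(xi, x_noise, x_signal)

def mixture_loglik(x, beta_L, eps, r):
    """Equation 6: q_Ld = eps*r + (1 - eps)*(1 - beta_Ld) is P(x_id = 1)."""
    q = eps * r + (1 - eps) * (1 - beta_L)
    return float(np.sum(np.log(x * q + (1 - x) * (1 - q))))

rng = np.random.default_rng(0)
x_signal = np.array([[1, 1, 0, 0], [0, 1, 1, 0]])
beta_L = np.array([[0.1, 0.1, 0.9, 0.9],                  # P(signal bit = 0)
                   [0.9, 0.1, 0.1, 0.9]])
x_obs = sample_mixture_noise(x_signal, eps=0.1, r=0.5, rng=rng)
print(mixture_loglik(x_obs, beta_L, eps=0.1, r=0.5))
```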
43 FLIP NOISE MODEL: In contrast to the previous noise model, where the likelihood is a mixture of independent noise and signal distributions, the flip noise model assumes that the effect of the noise depends on the signal itself. [sent-327, score-0.637]
44 Formally, the generative process for a bit $x_{id}$ is described by $x_{id} = x^S_{id} \oplus \xi_{id}$, where $\oplus$ denotes addition modulo 2. [sent-330, score-1.812]
45 Again, the generative process for the structure bit $x^S_{id}$ is described by either Equation 1 or Equation 3. [sent-331, score-0.668]
46 The value of $\xi_{id}$ indicates whether the bit $x^S_{id}$ is to be flipped ($\xi_{id} = 1$) or not ($\xi_{id} = 0$). [sent-332, score-0.622]
47 Given the flip indicator $\xi_{id}$ and the signal bit $x^S_{id}$, the final observation is deterministic: $p_M^{\mathrm{flip}}(x_{id} \mid \xi_{id}, x^S_{id}) = x_{id}^{\mathbb{I}\{\xi_{id} \neq x^S_{id}\}}\, (1 - x_{id})^{\mathbb{I}\{\xi_{id} = x^S_{id}\}}$. [sent-335, score-2.485]
48 The joint probability distribution is then given by $p_{\mathrm{flip}}(x_{id} \mid x^S_{id}, \Theta^{\mathrm{flip}}_N) = \sum_{\xi_{id}=0}^{1} p_M^{\mathrm{flip}}(x_{id} \mid \xi_{id}, x^S_{id}) \cdot p(\xi_{id} \mid x^S_{id}, \varepsilon_0, \varepsilon_1)$. [sent-336, score-0.572]
49 RELATION BETWEEN THE NOISE PARAMETERS [sent-338, score-0.572]
50 Our unified formulation of the noise models allows us to compare the influence of the noise processes on the clean signal under different noise models. [sent-339, score-0.438]
51 Conversely, we have that the flip noise model with $\Theta^{\mathrm{flip}}_N = (\varepsilon_0, \varepsilon_1)$ is equivalent to the mixture noise model with $\Theta^{\mathrm{mix}} = \left( \varepsilon_0 + \varepsilon_1,\ \frac{\varepsilon_0}{\varepsilon_0 + \varepsilon_1} \right)$. [sent-341, score-0.317]
52 We therefore use only the mixture noise model in the remainder of this paper and omit the indicator $\alpha$ that differentiates between the noise models. [sent-343, score-0.317]
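As a quick check of the equivalence just stated, the sketch below implements the flip noise channel and verifies the parameter correspondence numerically; we assume $\varepsilon_0$ is the probability of flipping a 0 to a 1 and $\varepsilon_1$ the probability of flipping a 1 to a 0.

```python
import numpy as np

def sample_flip_noise(x_signal, eps0, eps1, rng):
    """x_id = x_id^S XOR xi_id, where a 0 flips with probability eps0
    and a 1 flips with probability eps1 (assumed convention)."""
    p_flip = np.where(x_signal == 1, eps1, eps0)
    xi = (rng.random(x_signal.shape) < p_flip).astype(int)
    return x_signal ^ xi

# Correspondence: eps = eps0 + eps1 and r = eps0 / (eps0 + eps1).
eps0, eps1 = 0.05, 0.15
eps, r = eps0 + eps1, eps0 / (eps0 + eps1)
assert np.isclose(eps * r, eps0)          # P(x=1 | x^S=0) matches in both models
assert np.isclose(eps * (1 - r), eps1)    # P(x=0 | x^S=1) matches in both models
```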
53 We define the empirical risk of assigning an object i to the set of clusters $\mathcal{L}$ as the negative log-likelihood of the feature vector $x_{i*}$ being generated by the sources contained in $\mathcal{L}$: $R_{i\mathcal{L}} := -\log p(x_{i\cdot} \mid \mathcal{L}, \Theta) = -\sum_d \log \left( x_{id} (1 - q_{\mathcal{L} d}) + (1 - x_{id})\, q_{\mathcal{L} d} \right)$. [sent-393, score-0.865]
54 The derivative of the expected risk with respect to a parameter $\theta$ is then $\frac{\partial}{\partial \theta} \sum_{i, \mathcal{L}} \gamma_{i\mathcal{L}}\, R_{i\mathcal{L}} = -\sum_i \sum_{\mathcal{L}} \gamma_{i\mathcal{L}} \sum_d \frac{(1 - 2 x_{id})\, \partial q_{\mathcal{L} d} / \partial \theta}{x_{id} (1 - q_{\mathcal{L} d}) + (1 - x_{id})\, q_{\mathcal{L} d}}$. [sent-404, score-1.144]
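A hedged sketch of the risk computation: empirical_risk evaluates $R_{i\mathcal{L}}$ with $q_{\mathcal{L}d}$ read as the probability of observing a 0, and best_assignment_set picks the minimizer; both names are our own.

```python
import numpy as np

def empirical_risk(x_i, q_L):
    """R_iL = -sum_d log(x_id*(1 - q_Ld) + (1 - x_id)*q_Ld)."""
    return float(-np.sum(np.log(x_i * (1 - q_L) + (1 - x_i) * q_L)))

def best_assignment_set(x_i, q_by_set):
    """Hard assignment: choose the assignment set with minimal empirical risk."""
    return min(q_by_set, key=lambda L: empirical_risk(x_i, q_by_set[L]))

q_by_set = {(0,): np.array([0.1, 0.8, 0.9]),      # q_Ld = P(x_id = 0) per set
            (1,): np.array([0.9, 0.2, 0.1]),
            (0, 1): np.array([0.09, 0.16, 0.09])}
print(best_assignment_set(np.array([1, 1, 1]), q_by_set))   # -> (0, 1)
```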
55 Evaluation Criteria: For synthetic data, we evaluate the estimated sources by their Hamming distance to the true sources used to generate the data. [sent-419, score-0.397]
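Since estimated sources are only identified up to a permutation, the Hamming evaluation needs to match estimated sources to true ones first; the brute-force matching below is our simplification, feasible only for small K (the paper's exact matching procedure may differ).

```python
import numpy as np
from itertools import permutations

def min_hamming_distance(u_true, u_est):
    """Hamming distance between true and estimated source matrices,
    minimized over row permutations of the estimate."""
    K = u_true.shape[0]
    return min(int(np.sum(u_true != u_est[list(p), :]))
               for p in permutations(range(K)))

u_true = np.array([[1, 1, 0, 0], [0, 1, 1, 0]])
u_est = np.array([[0, 1, 1, 0], [1, 1, 0, 1]])   # rows swapped, one bit off
print(min_hamming_distance(u_true, u_est))        # -> 1
```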
56 Figure 3: Overlapping sources (left) and orthogonal sources (right) used in the experiments with synthetic data. [sent-455, score-0.681]
57 Since we assume that the noise associated with the features of different objects is independent, we deduce from a low generalization error that the algorithm can infer sources that explain—up to residual noise—the features of new objects from the same distribution. [sent-463, score-0.441]
58 The assignment sets $z'$ of the new objects are recomputed by comparing all source combinations with a fraction $\kappa$ of the bits of these objects. [sent-468, score-0.374]
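A sketch of this generalization-error protocol under the stated setup: choose a new object's assignment set from a random κ-fraction of its bits, then score the held-out bits; enumerating all assignment sets as below is only practical for small K, and the names are our own.

```python
import numpy as np
from itertools import chain, combinations

def generalization_error(x_new, u, kappa, rng):
    """Fit the assignment set of a new object on a kappa-fraction of its bits,
    then report the mismatch rate on the held-out bits."""
    K, D = u.shape
    idx = rng.permutation(D)
    seen, held = idx[:int(kappa * D)], idx[int(kappa * D):]
    sets_ = chain.from_iterable(combinations(range(K), m) for m in range(1, K + 1))
    best = min(sets_, key=lambda L: int(np.sum(
        x_new[seen] != u[list(L), :].any(axis=0)[seen])))
    x_hat = u[list(best), :].any(axis=0).astype(int)
    return float(np.mean(x_hat[held] != x_new[held]))

rng = np.random.default_rng(1)
u = np.array([[1, 1, 0, 0], [0, 1, 1, 0]])
print(generalization_error(np.array([1, 1, 1, 0]), u, kappa=0.5, rng=rng))
```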
59 We vary them between overlapping sources and orthogonal sources, the fraction of bits that are affected by the noise process, and the kind of noise process. [sent-488, score-0.571]
60 The overlapping sources are used as shown in Figure 3(a), and the structure is randomly perturbed by a mixture noise process. [sent-504, score-0.404]
61 BICA performs better on data that was modified by the symmetric mixture noise process than on data from a noisy-OR noise process. [sent-517, score-0.317]
62 Since BICA does not have a noise model, the data containing noise from the noisy-OR noise process leads to extra 1s in the source estimators. [sent-518, score-0.503]
63 The assumption of orthogonal source centroids is essential for BICA’s performance as the poor results on data with non-orthogonal sources show. [sent-526, score-0.332]
64 We use (circle, square, star) symmetric Bernoulli noise and overlapping sources with three different fractions of multi-assignment data, (x) orthogonal sources and symmetric noise, and (+) overlapping sources and a noisy-OR noise process. [sent-534, score-0.963]
65 In a setting where most of the data is created by a combination of sources, DBPS will first select a single source that equals the disjunction of the true sources because this covers the most 1s. [sent-538, score-0.293]
66 Here, the candidate set does not contain sources that correspond to combinations of true sources, and the greedy optimization algorithm can only select a candidate source that corresponds to a true single source. [sent-549, score-0.294]
67 DBPS thus performs best with respect to source parameter estimation when the generating sources are orthogonal. [sent-550, score-0.269]
68 Furthermore, in contrast to BICA, DBPS, and all MAC variants, INO determines the number of sources by itself and might obtain a value different from the number of sources used to generate the data. [sent-557, score-0.366]
69 If INO estimates a larger set of sources than the true one, the best-matching INO sources are used. [sent-559, score-0.366]
70 In all settings, except the case where 80% of the data items are generated by multiple sources, INO yields perfect source estimators up to noise levels of 30%. [sent-562, score-0.277]
71 In comparison to the experiments with overlapping sources described in the previous paragraph, MAC profits from orthogonal centroids and yields superior parameter accuracy for noise levels above 50%. [sent-575, score-0.428]
72 Interestingly, MAC’s accuracy peaks when the noise is generated by a noisy-OR noise process. [sent-578, score-0.278]
73 The reason is that observing a 1 at a particular bit creates a much higher entropy of the parameter estimate than observing a 0: a 1 can be explained by all possible combinations of sources having a 1 at this position, whereas a 0 gives strong evidence that all sources of the object have a 0 at this position. [sent-579, score-0.47]
74 The objects are sampled from the overlapping sources depicted in Figure 3(a). [sent-591, score-0.267]
75 For a low fraction of noisy bits (< 50%), the estimators are perfect when a noise model is used, but are already wrong at 10% noise when no noise model is used. [sent-596, score-0.484]
76 MAC explains the observations with combinations of sources whereas SAC assigns each object to a single source only. [sent-603, score-0.323]
77 Second, using the same source in different combinations with other sources implicitly provides a consistency check for the source parameter estimates. [sent-605, score-0.38]
78 The sharp decrease in accuracy is shifted to higher noise levels and appears in a smaller noise window when more data is available. [sent-611, score-0.278]
79 Experiments on Role Mining Data: To evaluate the performance of our algorithm on real data, we apply MAC to mining RBAC roles from access control configurations. [sent-613, score-0.299]
80 SETTING AND TASK DESCRIPTION: As explained in Section 2, role mining must find a suitable RBAC configuration based on a binary user-permission assignment matrix x. [sent-617, score-0.273]
81 An RBAC configuration consists of the assignment of permissions to K roles and the assignment of users to these roles. [sent-618, score-0.638]
82 We emphasize the importance of the generalization ability of the RBAC configuration: the goal is not primarily to compress the existing user-permission matrix x, but rather to infer a set of roles that generalizes well to new users. [sent-621, score-0.293]
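For intuition, here is a minimal sketch of how well an RBAC configuration reproduces a user-permission matrix; the mismatched bits counted below correspond to the exceptional assignments that the noise component absorbs. Names are our own.

```python
import numpy as np

def rbac_mismatch(x, z, u):
    """Fraction of user-permission bits where the RBAC reconstruction
    OR_k(z_ik AND u_kd) disagrees with the observed matrix x."""
    x_hat = (z @ u > 0).astype(int)
    return float(np.mean(x_hat != x))

x = np.array([[1, 1, 0], [0, 1, 1]])   # observed user-permission bits
z = np.array([[1, 0], [0, 1]])          # user-role assignments
u = np.array([[1, 1, 0], [0, 1, 0]])    # role-permission assignments
print(rbac_mismatch(x, z, u))           # one exceptional bit -> ~0.167
```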
83 An RBAC system’s security and maintainability improve when the roles do not need to be redefined whenever there is a small change in the enterprise, such as a new user being added to the system or users changing positions within the enterprise. [sent-622, score-0.308]
84 Such exceptional assignments are represented by the noise component of the mixture model. [sent-624, score-0.355]
85 Rows and columns of the right matrix are reordered such that users with the same role set and permissions of the same role are adjacent to each other, if possible. [sent-630, score-0.285]
86 We increase the number of roles until the generalization error increases. [sent-663, score-0.265]
87 To restrict the cardinality of the assignment sets (for MAC), we make one trial run with a large number of roles and observe how many of the roles are involved in role combinations. [sent-666, score-0.662]
88 In our experiments on Corig , for instance, 10% of K = 100 roles are used in role combinations and no roles appear in combinations with more than two roles. [sent-668, score-0.557]
89 Therefore, for subsequent runs of the algorithm, we set M = 2 and limit the number of roles that can belong to a multiple assignment set to 10% of K. [sent-669, score-0.383]
90 Restricting the number of roles that can belong to a multiple assignment set risks having too few role combinations available to fit the data at hand. [sent-673, score-0.459]
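A sketch of how such a cardinality restriction shrinks the candidate set of assignment sets, assuming M = 2 and that only a 10% subset of the roles may appear in multi-assignments; which roles qualify is a modeling choice, and taking the first ones here is purely illustrative.

```python
from itertools import combinations

def restricted_assignment_sets(K, M, multi_fraction=0.1):
    """All singletons, plus combinations of up to M roles drawn from the
    subset of roles allowed to appear in multi-assignment sets."""
    n_multi = max(1, round(multi_fraction * K))
    sets_ = [(k,) for k in range(K)]
    for m in range(2, M + 1):
        sets_.extend(combinations(range(n_multi), m))
    return sets_

print(len(restricted_assignment_sets(K=100, M=2)))   # 100 + C(10, 2) = 145
```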
91 Therefore, the most trivial role set where roles are assigned no permissions already yields a generalization error of 13. [sent-694, score-0.461]
92 The lower row of Figure 8 shows the average role overlap between the roles obtained by the different methods. [sent-714, score-0.303]
93 The decrease in the difference in performance between BICA and the other models after processing the modified data set indicates that the main difficulty for models that can represent overlapping roles is the increased noise level rather than the overlapping structure. [sent-720, score-0.453]
94 For the data sets ‘customer’, ‘americas small’, and ‘firewall 1’, we first make a trial run with many roles to identify the maximum cardinality of assignment sets M that MAC uses. [sent-746, score-0.383]
95 Comparison with the results on synthetic data suggests that their differing performance on different data sets is either due to different fractions of random noise or to true underlying sources with different overlap. [sent-764, score-0.377]
96 The DBPS implementation in C++ is impressively fast, while the trend of the generalization error over the number of roles is roughly comparable to that of MAC and BICA. [sent-866, score-0.265]
97 Assuming an arbitrary but fixed numbering of the assignment sets in $\mathbb{L}$, $z^{\mathbb{L}}_{lk} = 1$ means that the l-th assignment set contains source k, and $z^{\mathbb{L}}_{lk} = 0$ otherwise. [sent-875, score-0.396]
98 Hence, the assignment matrix z decomposes into $z = z^L * z^{\mathbb{L}}$, where $z^L \in \{0, 1\}^{N \times L}$ denotes the exclusive assignment of objects to assignment sets ($z^L_{il} = 1$ iff object i has assignment set l, and $\sum_l z^L_{il} = 1$ for all i). [sent-876, score-0.756]
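A small sketch of this decomposition for K = 3: z_bbL enumerates all $2^K$ assignment sets, z_L one-hot-selects a set per object, and their Boolean product recovers z. The names z_L and z_bbL are our stand-ins for $z^L$ and $z^{\mathbb{L}}$.

```python
import numpy as np
from itertools import chain, combinations

K = 3
subsets = list(chain.from_iterable(
    combinations(range(K), m) for m in range(K + 1)))            # all 2^K sets
z_bbL = np.array([[1 if k in L else 0 for k in range(K)]
                  for L in subsets])                             # L x K

z_L = np.zeros((2, len(subsets)), dtype=int)                     # N x L, one-hot rows
z_L[0, subsets.index((0,))] = 1            # object 0 -> assignment set {0}
z_L[1, subsets.index((0, 2))] = 1          # object 1 -> assignment set {0, 2}

z = (z_L @ z_bbL > 0).astype(int)          # Boolean product recovers z (N x K)
print(z)                                   # [[1 0 0] [1 0 1]]
```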
99 In our generative model, the clusters are the sources that generate the structure in the data and irregularities are explained by an independent noise process. [sent-889, score-0.458]
100 Role mining — revealing business roles for security administration using data mining technology. [sent-1010, score-0.329]
wordName wordTfidf (topN-words)
[('xid', 0.572), ('mac', 0.259), ('roles', 0.228), ('bica', 0.217), ('dbps', 0.196), ('sources', 0.183), ('ino', 0.174), ('boolean', 0.158), ('assignment', 0.155), ('sac', 0.154), ('basin', 0.14), ('noise', 0.139), ('pmix', 0.133), ('id', 0.126), ('oolean', 0.119), ('ssignment', 0.119), ('rbac', 0.105), ('treich', 0.105), ('assignments', 0.1), ('permissions', 0.098), ('lustering', 0.097), ('uhmann', 0.09), ('source', 0.086), ('exceptional', 0.077), ('bits', 0.067), ('ulti', 0.063), ('mix', 0.063), ('ril', 0.063), ('clustering', 0.062), ('users', 0.057), ('streich', 0.056), ('clusters', 0.055), ('items', 0.052), ('role', 0.051), ('ip', 0.05), ('bit', 0.05), ('li', 0.049), ('corig', 0.049), ('emissions', 0.049), ('zik', 0.048), ('assigned', 0.047), ('ps', 0.046), ('generative', 0.046), ('item', 0.044), ('overlapping', 0.043), ('mario', 0.042), ('qmix', 0.042), ('oise', 0.042), ('objects', 0.041), ('mixture', 0.039), ('mining', 0.039), ('il', 0.038), ('joachim', 0.038), ('zl', 0.038), ('centroids', 0.037), ('generalization', 0.037), ('americas', 0.035), ('comedy', 0.035), ('emea', 0.035), ('irregularities', 0.035), ('rewall', 0.035), ('sacmat', 0.035), ('hp', 0.035), ('frank', 0.034), ('odel', 0.034), ('access', 0.032), ('xs', 0.031), ('synthetic', 0.031), ('miettinen', 0.03), ('object', 0.029), ('matrix', 0.028), ('vaidya', 0.028), ('ql', 0.027), ('mismatch', 0.026), ('orthogonal', 0.026), ('assigning', 0.026), ('combinations', 0.025), ('variants', 0.025), ('overlap', 0.024), ('responsibilities', 0.024), ('disjunction', 0.024), ('agrawal', 0.024), ('ene', 0.024), ('fractions', 0.024), ('sip', 0.024), ('inference', 0.023), ('security', 0.023), ('individual', 0.022), ('temperature', 0.022), ('ths', 0.022), ('permission', 0.022), ('rank', 0.021), ('signal', 0.021), ('belohlavek', 0.021), ('classics', 0.021), ('cormen', 0.021), ('dominos', 0.021), ('employee', 0.021), ('ganter', 0.021), ('ingliar', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
2 0.063200802 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
3 0.052327219 12 jmlr-2012-Active Clustering of Biological Sequences
Author: Konstantin Voevodski, Maria-Florina Balcan, Heiko Röglin, Shang-Hua Teng, Yu Xia
Abstract: Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that given a point s ∈ S return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points that we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We use our procedure to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire data set. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification. Keywords: clustering, active clustering, k-median, approximation algorithms, approximation stability, clustering accuracy, protein sequences ∗. A preliminary version of this article appeared under the title Efficient Clustering with Limited Distance Information in the Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, AUAI Press, Corvallis, Oregon, 632-641. †. Most of this work was completed at Boston University. c 2012 Konstantin Voevodski, Maria-Florina Balcan, Heiko R¨ glin, Shang-Hua Teng and Yu Xia. o ¨ VOEVODSKI , BALCAN , R OGLIN , T ENG AND X IA
Author: Michael U. Gutmann, Aapo Hyvärinen
Abstract: We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we are considering the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation can then not be used without resorting to numerical approximations which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities. Keywords: statistics unnormalized models, partition function, computation, estimation, natural image
5 0.038243055 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
Author: Kay H. Brodersen, Christoph Mathys, Justin R. Chumbley, Jean Daunizeau, Cheng Soon Ong, Joachim M. Buhmann, Klaas E. Stephan
Abstract: Classification algorithms are frequently used on data with a natural hierarchical structure. For instance, classifiers are often trained and tested on trial-wise measurements, separately for each subject within a group. One important question is how classification outcomes observed in individual subjects can be generalized to the population from which the group was sampled. To address this question, this paper introduces novel statistical models that are guided by three desiderata. First, all models explicitly respect the hierarchical nature of the data, that is, they are mixed-effects models that simultaneously account for within-subjects (fixed-effects) and across-subjects (random-effects) variance components. Second, maximum-likelihood estimation is replaced by full Bayesian inference in order to enable natural regularization of the estimation problem and to afford conclusions in terms of posterior probability statements. Third, inference on classification accuracy is complemented by inference on the balanced accuracy, which avoids inflated accuracy estimates for imbalanced data sets. We introduce hierarchical models that satisfy these criteria and demonstrate their advantages over conventional methods using MCMC implementations for model inversion and model selection on both synthetic and empirical data. We envisage that our approach will improve the sensitivity and validity of statistical inference in future hierarchical classification studies. Keywords: beta-binomial, normal-binomial, balanced accuracy, Bayesian inference, group studies
6 0.032647572 1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach
7 0.029628538 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
8 0.029139388 82 jmlr-2012-On the Necessity of Irrelevant Variables
9 0.028273726 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems
10 0.027689489 101 jmlr-2012-SVDFeature: A Toolkit for Feature-based Collaborative Filtering
11 0.027140891 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
12 0.026522458 55 jmlr-2012-Learning Algorithms for the Classification Restricted Boltzmann Machine
13 0.025633128 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise
14 0.025542101 42 jmlr-2012-Facilitating Score and Causal Inference Trees for Large Observational Studies
15 0.024259858 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage
16 0.023891032 59 jmlr-2012-Linear Regression With Random Projections
17 0.023763658 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
18 0.023641694 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression
19 0.023261301 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions
20 0.023243617 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning
topicId topicWeight
[(0, -0.125), (1, 0.044), (2, 0.069), (3, -0.036), (4, -0.01), (5, 0.01), (6, 0.055), (7, 0.0), (8, -0.007), (9, 0.051), (10, -0.097), (11, -0.005), (12, 0.036), (13, 0.037), (14, -0.046), (15, 0.1), (16, -0.02), (17, -0.008), (18, 0.123), (19, -0.115), (20, -0.029), (21, -0.009), (22, 0.049), (23, -0.061), (24, -0.04), (25, 0.028), (26, -0.089), (27, 0.054), (28, -0.095), (29, 0.146), (30, 0.153), (31, 0.02), (32, -0.128), (33, -0.068), (34, 0.118), (35, -0.008), (36, 0.1), (37, 0.092), (38, -0.084), (39, 0.076), (40, 0.169), (41, 0.232), (42, -0.093), (43, 0.265), (44, 0.028), (45, -0.056), (46, 0.155), (47, 0.201), (48, -0.157), (49, -0.155)]
simIndex simValue paperId paperTitle
same-paper 1 0.94919932 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
2 0.48600453 12 jmlr-2012-Active Clustering of Biological Sequences
3 0.27888453 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems
Author: Daniel L. Ly, Hod Lipson
Abstract: A hybrid dynamical system is a mathematical model suitable for describing an extensive spectrum of multi-modal, time-series behaviors, ranging from bouncing balls to air traffic controllers. This paper describes multi-modal symbolic regression (MMSR): a learning algorithm to construct non-linear symbolic representations of discrete dynamical systems with continuous mappings from unlabeled, time-series data. MMSR consists of two subalgorithms—clustered symbolic regression, a method to simultaneously identify distinct behaviors while formulating their mathematical expressions, and transition modeling, an algorithm to infer symbolic inequalities that describe binary classification boundaries. These subalgorithms are combined to infer hybrid dynamical systems as a collection of apt, mathematical expressions. MMSR is evaluated on a collection of four synthetic data sets and outperforms other multi-modal machine learning approaches in both accuracy and interpretability, even in the presence of noise. Furthermore, the versatility of MMSR is demonstrated by identifying and inferring classical expressions of transistor modes from recorded measurements. Keywords: hybrid dynamical systems, evolutionary computation, symbolic piecewise functions, symbolic binary classification
4 0.25892863 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
Author: Sivan Sabato, Naftali Tishby
Abstract: In the supervised learning setting termed Multiple-Instance Learning (MIL), the examples are bags of instances, and the bag label is a function of the labels of its instances. Typically, this function is the Boolean OR. The learner observes a sample of bags and the bag labels, but not the instance labels that determine the bag labels. The learner is then required to emit a classification rule for bags based on the sample. MIL has numerous applications, and many heuristic algorithms have been used successfully on this problem, each adapted to specific settings or applications. In this work we provide a unified theoretical analysis for MIL, which holds for any underlying hypothesis class, regardless of a specific application or problem domain. We show that the sample complexity of MIL is only poly-logarithmically dependent on the size of the bag, for any underlying hypothesis class. In addition, we introduce a new PAC-learning algorithm for MIL, which uses a regular supervised learning algorithm as an oracle. We prove that efficient PAC-learning for MIL can be generated from any efficient non-MIL supervised learning algorithm that handles one-sided error. The computational complexity of the resulting algorithm is only polynomially dependent on the bag size. Keywords: multiple-instance learning, learning theory, sample complexity, PAC learning, supervised classification
5 0.25862324 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions
Author: Franz J. Király, Paul von Bünau, Frank C. Meinecke, Duncan A.J. Blythe, Klaus-Robert Müller
Abstract: We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion. Keywords: computational algebraic geometry, approximate algebra, unsupervised Learning
6 0.25319821 109 jmlr-2012-Stability of Density-Based Clustering
8 0.21749204 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
9 0.21717265 82 jmlr-2012-On the Necessity of Irrelevant Variables
10 0.21633679 101 jmlr-2012-SVDFeature: A Toolkit for Feature-based Collaborative Filtering
11 0.21469536 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
12 0.21236327 93 jmlr-2012-Quantum Set Intersection and its Application to Associative Memory
13 0.20114338 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables
14 0.2004216 22 jmlr-2012-Bounding the Probability of Error for High Precision Optical Character Recognition
15 0.19603276 90 jmlr-2012-Pattern for Python
16 0.18230441 60 jmlr-2012-Local and Global Scaling Reduce Hubs in Space
17 0.17322087 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
18 0.17192893 1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach
19 0.17058678 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
20 0.1678834 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
topicId topicWeight
[(7, 0.022), (21, 0.029), (26, 0.032), (29, 0.036), (35, 0.031), (49, 0.022), (56, 0.022), (57, 0.471), (64, 0.011), (69, 0.014), (75, 0.038), (77, 0.015), (79, 0.016), (92, 0.071), (96, 0.072)]
simIndex simValue paperId paperTitle
1 0.87107289 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
Author: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman
Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency. Keywords: high-dimensional undirected graph estimation, glasso, huge, semiparametric graph estimation, data-dependent model selection, lossless screening, lossy screening 1. Overview Undirected graphs is a natural approach to describe the conditional independence among many variables. Each node of the graph represents a single variable and no edge between two variables implies that they are conditional independent given all other variables. In the past decade, significant progress has been made on designing efficient algorithms to learn undirected graphs from high-dimensional observational data sets. Most of these methods are based on either the penalized maximum-likelihood estimation (Friedman et al., 2007) or penalized regression methods (Meinshausen and B¨ hlmann, 2006). Existing packages include glasso, Covpath and CLIME. In particuu ∗. Also in the Department of Biostatistics. †. Also in the Department of Machine Learning. c 2012 Zhao, Liu, Roeder, Lafferty and Wasserman. Z HAO , L IU , ROEDER , L AFFERTY AND WASSERMAN lar, the glasso package has been widely adopted by statisticians and computer scientists due to its friendly user-inference and efficiency. In this paper1 we describe a newly developed R package named huge (High-dimensional Undirected Graph Estimation) coded in C. The package includes a wide range of functional modules and addresses some drawbacks of the graphical lasso algorithm. To gain more scalability, the package supports two modes of screening, lossless (Witten et al., 2011) and lossy screening. When using lossy screening, the user can select the desired screening level to scale up for high-dimensional problems, but this introduces some estimation bias. 2. Software Design and Implementation The package huge aims to provide a general framework for high-dimensional undirected graph estimation. The package includes Six functional modules (M1-M6) facilitate a flexible pipeline for analysis (Figure 1). M1. Data Generator: The function huge.generator() can simulate multivariate Gaussian data with different undirected graphs, including hub, cluster, band, scale-free, and Erd¨ s-R´ nyi o e random graphs. The sparsity level of the obtained graph and signal-to-noise ratio can also be set up by users. M2. Semiparametric Transformation: The function huge.npn() implements the nonparanormal method (Liu et al., 2009, 2012) for estimating a semiparametric Gaussian copula model.The nonparanormal family extends the Gaussian distribution by marginally transforming the variables. 
Computationally, the nonparanormal transformation only requires one pass through the data matrix. M3. Graph Screening: The scr argument in the main function huge() controls the use of largescale correlation screening before graph estimation. The function supports the lossless screening (Witten et al., 2011) and the lossy screening. Such screening procedures can greatly reduce the computational cost and achieve equal or even better estimation by reducing the variance at the expense of increased bias. Figure 1: The graph estimation pipeline. M4. Graph Estimation: Similar to the glasso package, the method argument in the huge() function supports two estimation methods: (i) the neighborhood pursuit algorithm (Meinshausen and B¨ hlmann, 2006) and (ii) the graphical lasso algorithm (Friedman et al., 2007). We apply u the coordinate descent with active set and covariance update, as well as other tricks suggested in Friedman et al. (2010). We modified the warm start trick to address the potential divergence problem of the graphical lasso algorithm (Mazumder and Hastie, 2011). The code is also memory-optimized using the sparse matrix data structure when estimating and storing full regularization paths for large 1. This paper is only a summary of the package huge. For more details please refer to the online vignette. 1060 H IGH - DIMENSIONAL U NDIRECTED G RAPH E STIMATION data sets. we also provide a complementary graph estimation method based on thresholding the sample correlation matrix, which is computationally efficient and widely applied in biomedical research. M5. Model Selection: The function huge.select() provides two regularization parameter selection methods: the stability approach for regularization selection (StARS) (Liu et al., 2010); and rotation information criterion (RIC). We also provide a likelihood-based extended Bayesian information criterion. M6. Graph Visualization: The plotting functions huge.plot() and plot() provide visualizations of the simulated data sets, estimated graphs and paths. The implementation is based on the igraph package. 3. User Interface by Example We illustrate the user interface by analyzing a stock market data which we contribute to the huge package. We acquired closing prices from all stocks in the S&P; 500 for all the days that the market was open between Jan 1, 2003 and Jan 1, 2008. This gave us 1258 samples for the 452 stocks that remained in the S&P; 500 during the entire time period. > > > > > library(huge) data(stockdata) # Load the data x = log(stockdata$data[2:1258,]/stockdata$data[1:1257,]) # Preprocessing x.npn = huge.npn(x, npn.func=
2 0.85073251 41 jmlr-2012-Exploration in Relational Domains for Model-based Reinforcement Learning
Author: Tobias Lang, Marc Toussaint, Kristian Kersting
Abstract: A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E 3 and R- MAX algorithms. Efficient exploration in exponentially large state spaces needs to exploit the generalization of the learned model: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be a well-known context in which exploitation is promising. To address this we introduce relational count functions which generalize the classical notion of state and action visitation counts. We provide guarantees on the exploration efficiency of our framework using count functions under the assumption that we had a relational KWIK learner and a near-optimal planner. We propose a concrete exploration algorithm which integrates a practically efficient probabilistic rule learner and a relational planner (for which there are no guarantees, however) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results in noisy 3D simulated robot manipulation problems and in domains of the international planning competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques. Keywords: reinforcement learning, statistical relational learning, exploration, relational transition models, robotics
same-paper 3 0.75713742 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data
4 0.35188895 72 jmlr-2012-Multi-Target Regression with Rule Ensembles
Author: Timo Aho, Bernard Ženko, Sašo Džeroski, Tapio Elomaa
Abstract: Methods for learning decision rules are being successfully applied to many problem domains, in particular when understanding and interpretation of the learned model is necessary. In many real life problems, we would like to predict multiple related (nominal or numeric) target attributes simultaneously. While several methods for learning rules that predict multiple targets at once exist, they are all based on the covering algorithm, which does not work well for regression problems. A better solution for regression is the rule ensemble approach that transcribes an ensemble of decision trees into a large collection of rules. An optimization procedure is then used to select the best (and much smaller) subset of these rules and to determine their respective weights. We introduce the F IRE algorithm for solving multi-target regression problems, which employs the rule ensembles approach. We improve the accuracy of the algorithm by adding simple linear functions to the ensemble. We also extensively evaluate the algorithm with and without linear functions. The results show that the accuracy of multi-target regression rule ensembles is high. They are more accurate than, for instance, multi-target regression trees, but not quite as accurate as multi-target random forests. The rule ensembles are significantly more concise than random forests, and it is also possible to create compact rule sets that are smaller than a single regression tree but still comparable in accuracy. Keywords: multi-target prediction, rule learning, rule ensembles, regression ∗. Also in Microtask, Tampere, Finland. †. Also in the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Ljubljana, Slovenia. ‡. Also in the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Ljubljana, Slovenia and the Joˇ ef Stefan International Postgraduate School, Ljubljana, Slovenia. z ˇ c 2012 Timo Aho, Bernard Zenko, Saˇo Dˇ eroski and Tapio Elomaa. s z ˇ ˇ
5 0.33410254 116 jmlr-2012-Transfer in Reinforcement Learning via Shared Features
Author: George Konidaris, Ilya Scheidwasser, Andrew Barto
Abstract: We present a framework for transfer in reinforcement learning based on the idea that related tasks share some common features, and that transfer can be achieved via those shared features. The framework attempts to capture the notion of tasks that are related but distinct, and provides some insight into when transfer can be usefully applied to a problem sequence and when it cannot. We apply the framework to the knowledge transfer problem, and show that an agent can learn a portable shaping function from experience in a sequence of tasks to significantly improve performance in a later related task, even given a very brief training period. We also apply the framework to skill transfer, to show that agents can learn portable skills across a sequence of tasks that significantly improve performance on later related tasks, approaching the performance of agents given perfectly learned problem-specific skills. Keywords: reinforcement learning, transfer, shaping, skills
6 0.29673755 42 jmlr-2012-Facilitating Score and Causal Inference Trees for Large Observational Studies
7 0.29653659 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
8 0.29058453 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
9 0.28844923 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
10 0.28762317 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions
11 0.28663313 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems
12 0.2840319 34 jmlr-2012-Dynamic Policy Programming
13 0.2827965 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
14 0.28163078 80 jmlr-2012-On Ranking and Generalization Bounds
15 0.28129691 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
16 0.27731454 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
17 0.27429855 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
18 0.27327156 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
19 0.272838 93 jmlr-2012-Quantum Set Intersection and its Application to Associative Memory
20 0.27272314 1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach