jmlr jmlr2011 jmlr2011-39 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shuheng Zhou, Philipp Rütimann, Min Xu, Peter Bühlmann
Abstract: Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using ℓ1-penalization methods. We propose and study the following method. We combine a multiple regression approach with ideas of thresholding and refitting: first we infer a sparse undirected graphical model structure via thresholding of each among many ℓ1-norm penalized regression functions; we then estimate the covariance matrix and its inverse using the maximum likelihood estimator. We show that under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. We also derive an explicit bound for the Kullback-Leibler divergence. Keywords: graphical model selection, covariance estimation, Lasso, nodewise regression, thresholding. © 2011 Shuheng Zhou, Philipp Rütimann, Min Xu and Peter Bühlmann.
Reference: text
sentIndex sentText sentNum sentScore
1 We show that under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. [sent-14, score-0.209]
2 Keywords: graphical model selection, covariance estimation, Lasso, nodewise regression, thresholding. © 2011 Shuheng Zhou, Philipp Rütimann, Min Xu and Peter Bühlmann. [sent-16, score-0.334]
3 Introduction There has been a lot of recent activity on the estimation of high-dimensional covariance and inverse covariance matrices, where the dimension p of the matrix may greatly exceed the sample size n. [sent-18, score-0.253]
4 We focus here on the latter class with permutation invariant estimation and we aim for an estimator which is accurate for both the covariance matrix Σ and its inverse, the precision matrix Σ−1 . [sent-26, score-0.234]
5 The GLasso approach is simple to use, at least when relying on publicly available software such as the glasso package in R. [sent-30, score-0.309]
6 We follow the nodewise regression approach from Meinshausen and Bühlmann [2006], but we make use of recent results for variable selection in linear models assuming the much weaker restricted eigenvalue condition [Bickel et al. [sent-51, score-0.209]
7 In some sense, the novelty of our theory beyond Zhou [2010a] is the analysis of covariance and inverse covariance estimation and of risk consistency based on an estimated sparse graph, as mentioned above. [sent-53, score-0.302]
8 Our regression and thresholding results build upon analysis of the thresholded Lasso estimator as studied in Zhou [2010a]. [sent-54, score-0.212]
9 Another relation can be made to the method by Rütimann and Bühlmann [2009] for covariance and inverse covariance estimation based on a directed acyclic graph. [sent-63, score-0.297]
10 For example, if all non-zero off-diagonal elements θ_{0,ij} of the ith row are larger in absolute value than λ√θ_{0,ii}, the value s^i_{0,n} coincides with the node degree s^i. [sent-134, score-0.248]
11 We first estimate the undirected graph with edge set E_0 as in (6), and we then use the maximum likelihood estimator based on the estimate E_n; that is, the non-zero elements of Θ_n correspond to the estimated edges in E_n. [sent-139, score-0.19]
12 Inferring the edge set E_0 can be based on the following approach, as proposed and theoretically justified in Meinshausen and Bühlmann [2006]: perform p regressions using the Lasso to obtain p vectors of regression coefficients β^1, . [sent-140, score-0.197]
13 Consider the Lasso for each of the nodewise regressions: β^i_init = argmin_{β^i} ∑_{r=1}^n ( X_i^{(r)} − ∑_{j≠i} β^i_j X_j^{(r)} )² + λ_n ∑_{j≠i} |β^i_j|, for i = 1, . [sent-150, score-0.301]
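A minimal sketch of this nodewise Lasso step, using scikit-learn (an assumption of this illustration, not the authors' implementation; the function name nodewise_lasso and the tuning value are likewise illustrative). Note that scikit-learn's Lasso rescales the squared error by 1/(2n), so its alpha corresponds to λ_n/(2n) in (10).

```python
# Sketch of the nodewise Lasso regressions in (10); scikit-learn is an assumption here.
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(X, lam_n):
    """Regress each X_i on the remaining variables; return B with B[i, j] = beta^i_{j,init}."""
    n, p = X.shape
    B = np.zeros((p, p))
    for i in range(p):
        others = [j for j in range(p) if j != i]
        # scikit-learn minimizes (1/(2n))||y - Xw||^2 + alpha ||w||_1,
        # so alpha = lam_n / (2n) matches the unnormalized objective in (10).
        fit = Lasso(alpha=lam_n / (2.0 * n), fit_intercept=False).fit(X[:, others], X[:, i])
        B[i, others] = fit.coef_
    return B

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
lam_n = 2.0 * n * np.sqrt(2.0 * np.log(p) / n)   # heuristic of the order suggested by the theory
B_init = nodewise_lasso(X, lam_n)
```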
14 The use of thresholding has clear benefits from a theoretical point of view: the number of false positive selections may be much larger without thresholding (when tuned for good prediction). [sent-157, score-0.186]
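Continuing the sketch above, the thresholding step and the construction of the estimated edge set can be written as follows (the threshold t_0 = f_0·λ and the OR rule for symmetrizing are illustrative choices of this sketch; an AND rule would replace | with &).

```python
# Sketch: keep coefficients with |beta^i_{j,init}| >= tau, then symmetrize to get an edge set.
import numpy as np

def edge_set_from_coefficients(B_init, tau):
    p = B_init.shape[0]
    selected = np.abs(B_init) >= tau      # I^i = {j : |beta^i_{j,init}| >= tau}
    adjacency = selected | selected.T     # include edge (i, j) if either regression selects it
    np.fill_diagonal(adjacency, False)
    return adjacency

tau = 0.5 * np.sqrt(2.0 * np.log(p) / n)  # illustrative t_0 = f_0 * lambda with f_0 = 0.5
E_hat = edge_set_from_coefficients(B_init, tau)
```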
15 The estimator for the concentration matrix, in view of (1), is: Θ_n(E) = argmin_{Θ ∈ M_{p,E}} ( tr(ΘΓ_n) − log|Θ| ), where M_{p,E} = { Θ ∈ R^{p×p} : Θ ≻ 0 and θ_{ij} = 0 for all (i,j) ∉ E with i ≠ j } (13) defines the constrained set of positive definite Θ. [sent-165, score-0.217]
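The refitting step (13) is a convex problem; one hedged way to solve it is with cvxpy (an assumption of this sketch — the paper does not prescribe a solver), constraining the off-diagonal entries outside the estimated edge set to zero.

```python
# Sketch of the constrained MLE in (13), reusing X and E_hat from the snippets above.
import numpy as np
import cvxpy as cp

def constrained_mle(Gamma_n, adjacency):
    p = Gamma_n.shape[0]
    Theta = cp.Variable((p, p), symmetric=True)
    constraints = [Theta >> 1e-8 * np.eye(p)]           # positive definiteness
    for i in range(p):
        for j in range(i + 1, p):
            if not adjacency[i, j]:
                constraints.append(Theta[i, j] == 0)    # theta_ij = 0 for (i, j) not in E
    objective = cp.Minimize(cp.trace(Theta @ Gamma_n) - cp.log_det(Theta))
    cp.Problem(objective, constraints).solve()
    return Theta.value

Gamma_n = np.corrcoef(X, rowvar=False)   # sample correlation matrix used as input (illustrative)
Theta_hat = constrained_mle(Gamma_n, E_hat)
Sigma_hat = np.linalg.inv(Theta_hat)
```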
16 If the edge set E is sparse, having relatively few edges only, the estimator in (13) is already sufficiently regularized by the constraints; hence, no additional penalization is used at this stage. [sent-175, score-0.214]
17 Given (8), (7), and Θ_0, define C²_diag := min{ max_{i=1,...,p} θ²_{0,ii}, (max_i s^i_{0,n}/S_{0,n}) · ‖diag(Θ_0)‖²_2 }. [sent-209, score-0.314]
18 We note that convergence rates for the estimated covariance matrix and for predictive risk depend on the rate in Frobenius norm of the estimated inverse covariance matrix. [sent-224, score-0.279]
19 In this case, we have ‖Θ_n − Θ_0‖_F = O_P( √((p + S_{0,n}) log max(n,p)/n) ), ‖Σ_n − Σ_0‖_F = O_P( √((p + S_{0,n}) log max(n,p)/n) ), and R(Θ_n) − R(Θ_0) = O_P( (p + S_{0,n}) log max(n,p)/n ). [sent-262, score-0.216]
20 It is also of interest to understand the bias of the estimator caused by using the estimated edge set En instead of the true edge set E0 . [sent-266, score-0.203]
21 Although we do not intend to make precise the exact conditions and choices of tuning parameters in regression and thresholding needed to achieve (21), we state Theorem 5 for the case that (21) holds together with the following condition: the number of false positives is bounded as |E \ E_0| = O(S). [sent-283, score-0.183]
22 Suppose that on some event E with P(E) ≥ 1 − d/p², for a small constant d, we obtain an edge set E such that E_0 ⊆ E and |E \ E_0| = O(S). [sent-286, score-0.215]
23 First, we note that to obtain exact edge recovery, E = E_0, with high probability, we again need sufficiently large non-zero edge weights and some restricted eigenvalue (RE) conditions on the covariance matrix as defined in Section A, even for the multi-stage procedure. [sent-295, score-0.227]
24 [2009], where the second-stage estimator β corresponding to (11) is obtained with nodewise regressions using the adaptive Lasso [Zou, 2006] rather than thresholding as in the present work, in order to recover the edge set E_0 with high probability under an assumption which is stronger than (A0). [sent-297, score-0.41]
25 We first prove an oracle result on the nodewise regressions in Theorem 15. [sent-324, score-0.33]
26 , p, for thresholding our initial regression coefficients β^i_init. [sent-330, score-0.228]
27 Hence the entries in the covariance matrix Σn for the chosen set of edges in E and the diagonal entries are set to their corresponding values in Sn . [sent-357, score-0.228]
28 Indeed, we can derive the above relationships via the Lagrangian form, where we add Lagrange constants γ_jk for edges in D: ℓ_C(Θ) = log|Θ| − tr(S_n Θ) − ∑_{(j,k)∈D} γ_jk θ_jk. (26) [sent-358, score-0.419]
29 Now the gradient equation of (26) is Θ^{-1} − S_n − Γ = 0, where Γ is a matrix of Lagrange parameters such that γ_jk may be non-zero for (j,k) ∈ D and γ_jk = 0 otherwise. [sent-359, score-0.23]
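A quick numerical sanity check of these stationarity relations, reusing the hypothetical constrained_mle output from the earlier sketch: the inverse of the fitted concentration matrix should agree with the input matrix on the diagonal and on the selected edges, while the fitted concentration matrix is (numerically) zero elsewhere.

```python
# Sketch: verify the gradient/KKT relations of the constrained MLE numerically.
import numpy as np

p = Theta_hat.shape[0]
Sigma_fit = np.linalg.inv(Theta_hat)
keep = E_hat | np.eye(p, dtype=bool)                    # selected edges plus the diagonal
print("max |Sigma_fit - Gamma_n| on E and diagonal:", np.abs(Sigma_fit - Gamma_n)[keep].max())
print("max |Theta_hat| outside E (should be ~0):   ", np.abs(Theta_hat)[~keep].max())
```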
30 Subject to the constraint that Ω ∈ S_n as defined in (28), we write the maximum likelihood estimate for Ω_0 as Ω_n(E) := arg min_{Ω ∈ S_n} R_n(Ω) = arg min_{Ω ∈ S^p_{++} ∩ S^p_E} ( tr(ΩΓ_n) − log|Ω| ), (29) which yields the following relationships regarding Ω_n(E), its inverse Ψ_n = (Ω_n(E))^{-1}, and Γ_n. [sent-390, score-0.261]
31 Let E be some event such that P(E) ≥ 1 − d/p² for a small constant d. [sent-402, score-0.332]
32 (30) Let Ω_n(E) be as defined in (29). Suppose the sample size satisfies, for C_3 ≥ 4√(5/3): n > (144σ⁴_max/M²_low) (4C_3 + 13M_upp/(12σ²_min))² max{ 2|E| log max(n,p), C²_bias 2S_{0,n} log p }. [sent-408, score-0.415]
33 (31) Then with probability ≥ 1 − (d+1)/p², we have, for M = (9σ⁴_max/(2k²)) · (4C_3 + 13M_upp/(12σ²_min)): ‖Ω_n(E) − Ω_0‖_F ≤ (M+1) max{ √(2|E| log max(n,p)/n), C_bias √(2S_{0,n} log(p)/n) }. [sent-409, score-0.343]
34 We compare our estimation method with the GLasso, the Space method and a simplified Gelato estimator without thresholding for inferring the conditional independence graph. [sent-416, score-0.188]
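For the GLasso baseline in such a comparison, one option (an assumption of this sketch — the paper uses the R package glasso, and the Space method has no scikit-learn counterpart) is scikit-learn's graphical lasso with cross-validated regularization, reusing X and E_hat from the snippets above.

```python
# Sketch: GLasso baseline via scikit-learn, compared with the Gelato edge set.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

glasso = GraphicalLassoCV().fit(X)
Theta_glasso = glasso.precision_
edges_glasso = (np.abs(Theta_glasso) > 1e-8) & ~np.eye(X.shape[1], dtype=bool)
print("GLasso edges:", int(edges_glasso.sum()) // 2, "| Gelato edges:", int(E_hat.sum()) // 2)
```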
35 For the Space method, estimation of Θ_0 is done via maximum likelihood as in (13), based on the edge set E_n^(Space) from the estimated sparse partial correlation matrix. [sent-425, score-0.181]
36 Our Gelato method also addresses the bias problem inherent in the GLasso estimator since we no longer shrink the entries in the covariance matrix corresponding to the selected edge set En in the maximum likelihood estimate, as shown in Section 3. [sent-609, score-0.289]
37 Consider nodewise regressions as in (2), where we are given vectors of parameters {β^i_j, j = 1, . [sent-633, score-0.201]
38 With respect to the degree of node i, for each i we define s^i_0 ≤ s^i ≤ s as the smallest integer such that ∑_{j=1, j≠i}^p min( (β^i_j)², λ² Var(V_i) ) ≤ s^i_0 λ² Var(V_i), where λ = √(2 log p / n), (35) and where s^i is as defined in (7). [sent-640, score-0.692]
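The "essential sparsity" s^i_0 in (35) is a simple population quantity; the following sketch computes it from a given coefficient vector β^i (excluding the own index) and Var(V_i). The function name and interface are illustrative, not from the paper.

```python
# Sketch: essential sparsity s^i_0 of display (35).
import numpy as np

def essential_sparsity(beta_i, var_vi, n, p):
    lam = np.sqrt(2.0 * np.log(p) / n)
    total = np.sum(np.minimum(beta_i ** 2, lam ** 2 * var_vi))
    # smallest integer s with total <= s * lam^2 * var_vi
    return int(np.ceil(total / (lam ** 2 * var_vi)))
```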
39 For s^i_0 as in (35), define s_0 := max_{i=1,...,p} s^i_0 ≤ s and S_{0,n} := ∑_{i=1}^p s^i_0. [sent-643, score-0.343]
40 For an integer m ≤ p, we define the smallest and largest m-sparse eigenvalues of Σ_0 as follows: ρ_min(m) := min_{t≠0; m-sparse} ‖Σ_0^{1/2} t‖²_2 / ‖t‖²_2 and ρ_max(m) := max_{t≠0; m-sparse} ‖Σ_0^{1/2} t‖²_2 / ‖t‖²_2. [sent-651, score-0.207]
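For small p, the m-sparse eigenvalues can be computed by brute force: over every support of size m, the restricted Rayleigh quotient ranges over the eigenvalues of the corresponding principal submatrix. The sketch below (names illustrative) only makes the definition concrete; it is exponential in m.

```python
# Sketch: brute-force m-sparse eigenvalues rho_min(m), rho_max(m) of Sigma0 (small p only).
import numpy as np
from itertools import combinations

def sparse_eigenvalues(Sigma0, m):
    p = Sigma0.shape[0]
    rho_min, rho_max = np.inf, 0.0
    for support in combinations(range(p), m):
        w = np.linalg.eigvalsh(Sigma0[np.ix_(support, support)])
        rho_min, rho_max = min(rho_min, w[0]), max(rho_max, w[-1])
    return rho_min, rho_max
```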
41 In fact it holds that |θ_{0,ij}| < λ√θ_{0,ii}, which is equivalent to |β^i_j| < λσ_{V_i}, for all j ≥ s^i_0 + 1 + I_{i ≤ s^i_0+1}, (38) where I{·} is the indicator function, if we order the regression coefficients in decreasing absolute value. [sent-671, score-0.185]
42 The following event R provides an upper bound on K(s_0, k_0, X) for a given k_0 > 0 when Σ_0 satisfies the RE(s_0, k_0, Σ_0) condition: R(θ) := { X : RE(s_0, k_0, X) holds with 0 < K(s_0, k_0, X) ≤ K(s_0, k_0, Σ_0) }. [sent-704, score-0.209]
43 Then, for a random design X as generated by (15), we have P(X) := P( R(θ) ∩ F(θ) ∩ M(θ) ) ≥ 1 − 3 exp(−cθ²n/α⁴), as long as the sample size satisfies n > max{ (80sα⁴/θ²) log(5ep/(sθ)), (9c′α⁴/θ²) max{ 36K²(s_0, 4, Σ_0) f(s_0), log p } }. [sent-718, score-0.334]
44 For some a ≥ 6, let event C_a := { max_{j,k} |Z_jk| < (1+a) √(2 log p / n) }. [sent-740, score-0.423]
45 Consider, for some constant C_3 > 4√(5/3), the event X_0 := { max_{j,k} |∆_jk| < C_3 σ_j σ_k √(log max{p,n}/n) < 1/2 }. [sent-745, score-0.267]
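The events C_a and X_0 only require entrywise concentration of the sample (correlation) matrix, which is easy to probe empirically; the following Monte Carlo sketch (constants and sizes are illustrative, not the paper's) checks how often the maximal entrywise deviation exceeds a threshold of order √(log max(n,p)/n).

```python
# Sketch: Monte Carlo check of the entrywise concentration behind X_0.
import numpy as np

rng = np.random.default_rng(1)
n_sim, p_sim = 500, 50
C3 = 4.0 * np.sqrt(5.0 / 3.0)
Sigma0 = np.eye(p_sim)
threshold = C3 * np.sqrt(np.log(max(n_sim, p_sim)) / n_sim)
reps, exceed = 200, 0
for _ in range(reps):
    X_sim = rng.standard_normal((n_sim, p_sim))      # data with covariance Sigma0 = I
    S = X_sim.T @ X_sim / n_sim
    exceed += np.abs(S - Sigma0).max() > threshold
print("empirical exceedance frequency:", exceed / reps)
```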
46 It is also clear that event Ca is equivalent to the event to be defined in (44). [sent-747, score-0.312]
47 Lemma 12 also justifies the choice of λn in nodewise regressions (cf. [sent-748, score-0.201]
48 It is clear that event (44) is the same as event Ca . [sent-777, score-0.312]
49 By the union bound and by taking τ = C_2 √(log p / n) in (45), with σ_jk = 0, ∀ j ≠ k, where √2(1+a) ≥ C_2 > 2√(10/3) for a ≥ 6. [sent-779, score-0.199]
50 1 − P(C_a) = P( max_{j,k} |Z_jk| ≥ √2(1+a) √(log p / n) ) ≤ P( max_{j,k} |Z_jk| ≥ C_2 √(log p / n) ) ≤ (p² − p) exp(−3C²_2 log p / 10) < p² exp(−3C²_2 log p / 10) = p^{−3C²_2/10 + 2} < 1/p², where we apply Lemma 14 with ρ_jk = 0, ∀ j, k = 1, . [sent-780, score-0.978]
51 In order to bound the probability of event X_0, we first state the following bound for the non-diagonal entries of Σ_0, which follows immediately from Lemma 14 by plugging in σ²_i = σ_{0,ii} = 1, ∀ i = 1, . [sent-785, score-0.246]
52 , p, and using the fact that |σ_{0,jk}| = |ρ_jk σ_j σ_k| ≤ 1, ∀ j ≠ k, where ρ_jk is the correlation coefficient between variables X_j and X_k. [sent-788, score-0.333]
53 Then P( |∆_jk| > τ ) ≤ exp( −3nτ² / (10(1 + σ²_{0,jk})) ) ≤ exp( −3nτ²/20 ) for 0 ≤ τ ≤ Ψ_jk. [sent-790, score-0.3]
54 Bounds for Nodewise Regressions. In Theorem 15 and Lemma 16 we let s^i_0 be as in (35) and T^i_0 denote the locations of the s^i_0 largest coefficients of β^i in absolute value. [sent-798, score-0.248]
55 Consider the nodewise regressions in (10), where for each i we regress X_i onto the other variables {X_k; k ≠ i} following (2), where V_i ∼ N(0, Var(V_i)) is independent of X_j, ∀ j ≠ i, as in (3). [sent-812, score-0.201]
56 Let h^i := β^i_init − β^i. On C_a ∩ X, where X := R(θ) ∩ F(θ) ∩ M(θ), we have ‖β^i_init − β^i‖_2 ≤ λ √(s^i_0) d_0 √(2D²_0 + 2D²_1 + 2), ‖h^i_{T^i_{01}}‖_2 ≤ D_0 d_0 λ √(s^i_0), and ‖h^i_{T^{i,c}_0}‖_1 = ‖β^i_{init,T^{i,c}_0}‖_1 ≤ D_1 d_0 λ s^i_0, (48) where D_0, D_1 are defined in (82) and (83) respectively. [sent-816, score-0.448]
57 It is also clear that on C_a ∩ X, event T^i_a ∩ X holds for each regression when we invoke Theorem 33, with Y := X_i and X := X_{·\i}, for i = 1, . [sent-825, score-0.217]
58 Let β^i_init be an optimal solution to (10) with λ_n = d_0 λ, where d_0 = c_0 (1+θ)² ρ_max(s) ρ_max(3s_0); suppose for each regression we apply the same thresholding rule to obtain a subset I^i as follows: I^i = { j : j ≠ i, |β^i_{j,init}| ≥ t_0 = f_0 λ }, and D^i := {1, . [sent-840, score-0.193]
59 Then we have, on event C_a ∩ X: |I^i| ≤ s^i_0 (1 + D_1/D_4) and |I^i ∪ S^i| ≤ s^i + (D_1/D_4) s^i_0, and ‖β^i_D‖_2 ≤ d_0 λ √(s^i_0) √(1 + (D_0 + D_4)²), (49) where D is understood to be D^i. [sent-847, score-0.528]
60 Then on event C_a ∩ X, for Θ̃_0 as in (24) and E as in (23), we have, for S_{0,n} as in (36) and Θ_0 = (θ_{0,ij}): |E| ≤ (1 + D_1/D_4) S_{0,n}, where |E \ E_0| ≤ (D_1/D_4) S_{0,n}, (50) and ‖Θ̃_{0,D}‖_F := ‖Θ̃_0 − Θ_0‖_F ≤ √( min{ S_{0,n} max_{i=1,...,p} θ²_{0,ii}, s_0 ‖diag(Θ_0)‖²_2 } ) · √(1 + (D_0 + D_4)²) d_0 λ ≤ √(S_{0,n} (1 + (D_0 + D_4)²)) C_diag d_0 λ. [sent-852, score-0.251]
61 We have by (48): |I^i ∩ T^{i,c}_0| ≤ ‖β^i_{init,T^{i,c}_0}‖_1 / (f_0 λ) ≤ D_1 d_0 λ s^i_0 / (D_4 d_0 λ) = D_1 s^i_0 / D_4, (52) where D_1 is understood to be the same constant that appears in (48). [sent-872, score-0.248]
62 Suppose the sample size satisfies, for C_3 ≥ 4√(5/3): n > (106/k²) (4C_3 + 32/(31c²))² max{ 2|E| log max(n,p), C²_bias 2S_{0,n} log p }. [sent-890, score-0.239]
63 (55) Then on event E ∩ X_0, we have, for M = (9/(2k²)) · (4C_3 + 32/(31c²)): ‖Θ_n(E) − Θ_0‖_F ≤ (M+1) max{ √(2|E| log max(n,p)/n), C_bias √(2S_{0,n} log(p)/n) }. [sent-891, score-0.323]
64 We now state bounds on the rate of convergence, in Frobenius norm, of the covariance matrix estimator and on the KL divergence. [sent-897, score-0.175]
65 Suppose the sample size satisfies, for C_3 ≥ 4√(5/3) and C_bias, M as defined in Theorem 19: n > (106/(c²k⁴)) (4C_3 + 32/(31c²))² max{ 2|E| log max(p,n), C²_bias 2S_{0,n} log p }. [sent-904, score-0.239]
66 (57) Then on event E ∩ X_0, we have φ_min(Θ_n(E)) > c/2 > 0 and, for Σ_n(E) = (Θ_n(E))^{-1}: ‖Σ_n(E) − Σ_0‖_F ≤ (2(M+1)/c²) max{ √(2|E| log max(n,p)/n), C_bias √(2S_{0,n} log(p)/n) }. Theorem 21 Suppose all conditions, events, and bounds on |E| and ‖Θ̃_{0,D}‖_F . [sent-905, score-0.353]
67 Then on event E ∩ X_0, we have, for R(Θ_n(E)) − R(Θ_0) ≥ 0: R(Θ_n(E)) − R(Θ_0) ≤ M(C_3 + 1/8) max{ 2|E| log max(n,p)/n, C²_bias 2S_{0,n} log(p)/n }. [sent-909, score-0.323]
68 In view of Corollary 17, we have on E := X ∩ C_a, for C_diag as in (17): |E| ≤ (1 + D_1/D_4) S_{0,n} ≤ 2S_{0,n} for D_4 ≥ D_1, and ‖Θ̃_{0,D}‖_F := ‖Θ̃_0 − Θ_0‖_F ≤ C_bias √(2S_{0,n} log(p)/n) ≤ c/32, where C²_bias := min{ max_{i=1,...,p} θ²_{0,ii}, . [sent-912, score-0.176]
69 Then we have on event E: |E| ≤ (1 + D_1) S_{0,n} and |E \ E_0| ≤ D_1 S_{0,n} by (50); and on event E ∩ X_0, for C′_bias := C_diag d_0 √(1 + (D_0 + 1)²): ‖Θ_n(E) − Θ_0‖_F ≤ M max{ √(2(1 + D_1) S_{0,n} log max(n,p)/n), C′_bias √(2S_{0,n} log p / n) }. [sent-923, score-0.551]
70 Proof of Theorem 19. Suppose event E holds throughout this proof. [sent-926, score-0.182]
71 Define, for R_n as in expression (53): Q(Θ) := R_n(Θ) − R_n(Θ_0) = tr(ΘΓ_n) − log|Θ| − tr(Θ_0 Γ_n) + log|Θ_0| = tr( (Θ − Θ_0)(Γ_n − Σ_0) ) − ( log|Θ| − log|Θ_0| ) + tr( (Θ − Θ_0) Σ_0 ). [sent-934, score-0.514]
72 (62) r_n := max{ √(2|E| log max(n,p)/n), C_bias √(2S_{0,n} log p / n) }. Define, for ∆ ∈ U_n(Θ_0): G(∆) := Q(Θ_0 + ∆) = tr(∆(Γ_n − Σ_0)) − ( log|Θ_0 + ∆| − log|Θ_0| ) + tr(∆Σ_0). [sent-939, score-0.235]
73 (64) Conditioned on event X_0, by (70) and (55), max_{j,k} |Γ_{n,jk} − σ_{0,jk}| ≤ 4C_3 √(log max(n,p)/n) =: δ_n. [sent-958, score-0.228]
74 Thus on event E ∩ X_0, we have |tr( ∆(Γ_n − Σ_0) )| ≤ δ_n |offd(∆)|_1, where |offd(∆)|_1 ≤ √(‖offd(∆)‖_0) ‖offd(∆)‖_F ≤ √(2|E|) ‖∆‖_F, and hence tr( ∆(Γ_n − Σ_0) ) ≥ −4C_3 √(log max(n,p)/n) √(2|E|) ‖∆‖_F ≥ −4C_3 r_n ‖∆‖_F. [sent-959, score-0.889]
75 First we note that for ∆ ∈ T_n, we have on event E: ‖∆‖_2 ≤ ‖∆‖_F = M r_n < 7/(16k), (66) given (55): n > ((16/7) · (9/(2k)))² (4C_3 + 32/(31c²))² max{ 2|E| log max(n,p), C²_bias 2S_{0,n} log p }. [sent-961, score-0.323]
76 By Proposition 26 and the fact that G(∆) ≤ G(0) = 0 on event E, we have the following: on event E, if G(∆) > 0 for all ∆ ∈ T_n, then ‖∆‖_F < M r_n, given that ∆ ∈ U_n(Θ_0) \ (T_n ∪ W_n). [sent-970, score-0.312]
77 Remark 27. Clearly we have on event E, by (64): ‖(Θ̃_0)^{-1} − Σ_0‖_F ≤ ‖Θ̃_{0,D}‖_F / ( φ_min(Θ̃_0) φ_min(Θ_0) ) ≤ 32 C_bias √(2S_{0,n} log p / n). [sent-976, score-0.228]
78 Now suppose n > ((16/(7c)) · (9/(2k)))² (4C_3 + 32/(31c²))² max{ 2|E| log max(n,p), C²_bias 2S_{0,n} log p }, which clearly holds given (57). [sent-979, score-0.302]
79 Then in addition to the bound in (66), on event E ∩ X_0, we have M r_n < 7c/16, (68) for r_n as in (62). [sent-980, score-0.251]
80 Then, by Theorem 19, for the same M as therein, on event E ∩ X_0, we have ‖Θ_n(E) − Θ_0‖_F ≤ (M+1) max{ √(2|E| log max(n,p)/n), C_bias √(2S_{0,n} log(p)/n) }, given that the sample bound in (55) is clearly satisfied. [sent-981, score-0.35]
81 We now proceed to bound ‖Σ_n − Σ_0‖_F. First note that by (68), we have on event E ∩ X_0, for M > 7: φ_min(Θ_n(E)) ≥ φ_min(Θ_0) − ‖Θ_n − Θ_0‖_2 ≥ φ_min(Θ_0) − ‖Θ_n − Θ_0‖_F, given (56). [sent-982, score-0.183]
82 Now clearly on event E ∩ X_0, (58) holds by (56), and ‖Σ_n(E) − Σ_0‖_F ≤ ‖Θ_n(E) − Θ_0‖_F / ( φ_min(Θ_n(E)) φ_min(Θ_0) ) < (2/c²) ‖Θ_n(E) − Θ_0‖_F. [sent-984, score-0.182]
83 Lemma 28. On event E, we have, for C_bias, Θ̃_0, Θ_0 as in Theorem 19: 0 ≤ R(Θ̃_0) − R(Θ_0) ≤ (32/(31c))² C²_bias 2S_{0,n} log p / (2n) ≤ (32/(31c))² · r²_n/2 ≤ M r²_n/8, for r_n as in (62), where the last inequality holds given that M ≥ (9/2)(4C_3 + 32/(31c²)). [sent-990, score-0.39]
84 log|Θ_0 + ∆_0| − log|Θ_0| = tr(Σ_0 ∆_0) + ∫_0^1 (1 − v) (d²/dv²) log|Θ_0 + v∆_0| dv. We now obtain an upper bound on B ≥ 0. [sent-1001, score-0.243]
85 Now, conditioned on event E ∩ X_0, following the same arguments around (65), we have |tr( ∆(S_n − Σ_0) )| ≤ δ_n |offd(∆)|_1 ≤ δ_n √(2|E|) ‖offd(∆)‖_F ≤ M r_n C_3 √(2|E| log max(n,p)/n) ≤ M C_3 r²_n, where ‖offd(∆)‖_0 ≤ 2|E| by definition, and r_n is as defined in (62). [sent-1008, score-0.513]
86 On event X_0, the following holds for τ = C_3 √(log max{p,n}/n): | ‖X_i‖²_2 / (σ²_i n) − 1 | ≤ τ and | (1/n) ⟨X_i/σ_i, X_j/σ_j⟩ − ρ_{0,ij} | ≤ τ. [sent-1018, score-0.254]
87 k 2S0,n log(p)/n < √ 13 144σ2 max 4C3 + 12c2 σ2 min ≤ kc2 σ2 cσ2 k min min ≤ min , 48C3 σ2 13σ2 (48c2 σ2 C3 + 13)σ2 max max max min ≤ c/(13σ²_max). [sent-1025, score-0.785]
88 Suppose event E holds throughout this proof. [sent-1026, score-0.277]
89 Define, for R_n as in expression (27): Q(Ω) := R_n(Ω) − R_n(Ω_0) = tr(ΩΓ_n) − log|Ω| − tr(Ω_0 Γ_n) + log|Ω_0| = tr( (Ω − Ω_0)(Γ_n − Ψ_0) ) − ( log|Ω| − log|Ω_0| ) + tr( (Ω − Ω_0) Ψ_0 ). [sent-1036, score-0.514]
90 Indeed, by definition of ∆ ∈ T_n, we have φ_min(Ω_0 + ∆) > 0 on event E; thus φ_min(Ω_0 + (1+ε)∆) ≥ φ_min(Ω_0 + ∆) − ε‖∆‖_2 > 0, and φ_min(Ω_0 − ε∆) ≥ φ_min(Ω_0) − ε‖∆‖_2 > 12σ²_min c/13 − ε‖∆‖_2 > 0, for ε > 0 sufficiently small. [sent-1050, score-0.237]
91 First we note that for ∆ ∈ T_n, we have on event E: ‖∆‖_2 ≤ ‖∆‖_F = M r_n < 3σ²_max/(8k). [sent-1057, score-0.251]
92 (76) given (31): n > ((8k/(3σ²_max)) · (9σ⁴_max/(2k²)))² (4C_3 + 13/(12σ²_min c²))² max{ 2|E| log max(n,p), C²_bias 2S_{0,n} log p }. [sent-1058, score-0.239]
93 We have, by (72) and (34), following Rothman et al. [sent-1059, score-0.176]
94 [2008] (see page 502, proof of Theorem 1 therein): on event E, A ≥ ‖∆‖²_F / ( 2(φ_max(Ω_0) + ‖∆‖_2)² ) > ‖∆‖²_F / ( 2σ⁴_max (1/k + c/13 + 3/(8k))² ) > ‖∆‖²_F · 2k²/(9σ⁴_max). (77) [sent-1060, score-0.251]
95 Now on event E ∩ X_0, for all ∆ ∈ T_n, we have by (74), (77), (76), and (75): G(∆) > ‖∆‖²_F · 2k²/(9σ⁴_max) − (4C_3 + 13/(12σ²_min c²)) r_n ‖∆‖_F = ‖∆‖²_F · (2k²/(9σ⁴_max)) ( 1 − (9σ⁴_max/(2k²)) (4C_3 + 13/(12σ²_min c²)) / M ). [sent-1061, score-0.495]
96 Hence we have G(∆) > 0 for M large enough; M ≥ (9σ⁴_max/(2k²)) (4C_3 + 13/(12σ²_min c²)) suffices. [sent-1062, score-0.406]
97 We can now apply the union bound to obtain: P( max_{1≤j≤p} |Y_j| ≥ t ) ≤ p · ( c_1σ / (t√(πn)) ) · exp( −nt²/(2c²_1σ²) ) = exp( −nt²/(2c²_1σ²) + log p + log( c_1σ/(t√(πn)) ) ). [sent-1101, score-0.25]
98 By choosing t = c_1σ(1+a)√(2 log p / n), the right-hand side is bounded by ( √(π log p) · p^a )^{-1} for a ≥ 0. [sent-1102, score-0.216]
99 High dimensional sparse covariance estimation via directed acyclic graphs. [sent-1258, score-0.213]
100 High dimensional inverse covariance matrix estimation via linear programming. [sent-1315, score-0.219]
wordName wordTfidf (topN-words)
[('gelato', 0.372), ('glasso', 0.309), ('cbias', 0.236), ('uhlmann', 0.236), ('utimann', 0.236), ('hou', 0.201), ('mrn', 0.2), ('ovariance', 0.193), ('stimation', 0.166), ('event', 0.156), ('tr', 0.149), ('igh', 0.148), ('zhou', 0.138), ('nodewise', 0.136), ('se', 0.128), ('offd', 0.127), ('si', 0.124), ('tn', 0.106), ('jk', 0.1), ('init', 0.1), ('sn', 0.097), ('max', 0.095), ('thresholding', 0.093), ('rothman', 0.09), ('frobenius', 0.09), ('lasso', 0.084), ('cdiag', 0.082), ('min', 0.081), ('covariance', 0.079), ('vi', 0.074), ('log', 0.072), ('diag', 0.071), ('en', 0.071), ('ar', 0.069), ('dv', 0.069), ('op', 0.069), ('rn', 0.068), ('regressions', 0.065), ('leibler', 0.062), ('theorem', 0.06), ('edge', 0.059), ('concentration', 0.058), ('estimator', 0.057), ('kullback', 0.055), ('un', 0.053), ('ij', 0.053), ('stands', 0.052), ('sparse', 0.051), ('meinshausen', 0.048), ('edges', 0.047), ('bickel', 0.046), ('dashdotted', 0.045), ('dashes', 0.045), ('mlow', 0.045), ('dimensional', 0.045), ('ta', 0.045), ('vec', 0.045), ('var', 0.044), ('peng', 0.042), ('levina', 0.042), ('cand', 0.039), ('isoprenoid', 0.039), ('lam', 0.039), ('hlmann', 0.038), ('estimation', 0.038), ('suppose', 0.037), ('remark', 0.037), ('entries', 0.036), ('mupp', 0.036), ('timann', 0.036), ('lemma', 0.036), ('norm', 0.036), ('regression', 0.035), ('triangles', 0.035), ('stand', 0.033), ('correlation', 0.033), ('eigenvalues', 0.031), ('rudelson', 0.031), ('breast', 0.03), ('bounds', 0.03), ('matrix', 0.03), ('mle', 0.03), ('oracle', 0.029), ('tuning', 0.029), ('fan', 0.029), ('risk', 0.028), ('nt', 0.028), ('geer', 0.028), ('bias', 0.028), ('proposition', 0.027), ('undirected', 0.027), ('inverse', 0.027), ('scad', 0.027), ('shuheng', 0.027), ('thresholded', 0.027), ('bound', 0.027), ('graphical', 0.026), ('circles', 0.026), ('sparsity', 0.026), ('holds', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
Author: Shuheng Zhou, Philipp Rütimann, Min Xu, Peter Bühlmann
Abstract: Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using ℓ1-penalization methods. We propose and study the following method. We combine a multiple regression approach with ideas of thresholding and refitting: first we infer a sparse undirected graphical model structure via thresholding of each among many ℓ1-norm penalized regression functions; we then estimate the covariance matrix and its inverse using the maximum likelihood estimator. We show that under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. We also derive an explicit bound for the Kullback-Leibler divergence. Keywords: graphical model selection, covariance estimation, Lasso, nodewise regression, thresholding. © 2011 Shuheng Zhou, Philipp Rütimann, Min Xu and Peter Bühlmann.
2 0.16823754 37 jmlr-2011-Group Lasso Estimation of High-dimensional Covariance Matrices
Author: Jérémie Bigot, Rolando J. Biscay, Jean-Michel Loubes, Lillian Muñiz-Alvarez
Abstract: In this paper, we consider the Group Lasso estimator of the covariance matrix of a stochastic process corrupted by an additive noise. We propose to estimate the covariance matrix in a highdimensional setting under the assumption that the process has a sparse representation in a large dictionary of basis functions. Using a matrix regression model, we propose a new methodology for high-dimensional covariance matrix estimation based on empirical contrast regularization by a group Lasso penalty. Using such a penalty, the method selects a sparse set of basis functions in the dictionary used to approximate the process, leading to an approximation of the covariance matrix into a low dimensional space. Consistency of the estimator is studied in Frobenius and operator norms and an application to sparse PCA is proposed. Keywords: group Lasso, ℓ1 penalty, high-dimensional covariance estimation, basis expansion, sparsity, oracle inequality, sparse PCA
3 0.12173611 35 jmlr-2011-Forest Density Estimation
Author: Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, Larry Wasserman
Abstract: We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal’s algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models. Keywords: kernel density estimation, forest structured Markov network, high dimensional inference, risk consistency, structure selection consistency
4 0.10552069 97 jmlr-2011-Union Support Recovery in Multi-task Learning
Author: Mladen Kolar, John Lafferty, Larry Wasserman
Abstract: We sharply characterize the performance of different penalization schemes for the problem of selecting the relevant variables in the multi-task setting. Previous work focuses on the regression problem where conditions on the design matrix complicate the analysis. A clearer and simpler picture emerges by studying the Normal means model. This model, often used in the field of statistics, is a simplified model that provides a laboratory for studying complex procedures. Keywords: high-dimensional inference, multi-task learning, sparsity, normal means, minimax estimation
5 0.078224659 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
Author: Adam D. Bull
Abstract: In the efficient global optimization problem, we minimize an unknown function f , using as few observations f (x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f . We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior. Keywords: convergence rates, efficient global optimization, expected improvement, continuumarmed bandit, Bayesian optimization
6 0.077475414 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
7 0.077464037 101 jmlr-2011-Variable Sparsity Kernel Learning
8 0.068998657 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
9 0.06347008 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
10 0.058326326 8 jmlr-2011-Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
11 0.058086541 6 jmlr-2011-A Simpler Approach to Matrix Completion
12 0.054538984 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods
13 0.045826349 59 jmlr-2011-Learning with Structured Sparsity
14 0.044128891 55 jmlr-2011-Learning Multi-modal Similarity
15 0.044013713 91 jmlr-2011-The Sample Complexity of Dictionary Learning
16 0.043125536 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
17 0.042388756 54 jmlr-2011-Learning Latent Tree Graphical Models
18 0.041526105 69 jmlr-2011-Neyman-Pearson Classification, Convexity and Stochastic Constraints
19 0.03853951 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
20 0.038450576 40 jmlr-2011-Hyper-Sparse Optimal Aggregation
topicId topicWeight
[(0, 0.243), (1, -0.015), (2, 0.127), (3, 0.105), (4, 0.114), (5, 0.006), (6, -0.019), (7, -0.018), (8, -0.059), (9, 0.177), (10, 0.153), (11, -0.139), (12, -0.05), (13, -0.264), (14, -0.034), (15, 0.096), (16, 0.117), (17, -0.009), (18, -0.057), (19, 0.065), (20, 0.093), (21, -0.026), (22, -0.067), (23, -0.055), (24, -0.029), (25, -0.028), (26, 0.149), (27, 0.208), (28, 0.126), (29, 0.208), (30, 0.119), (31, -0.075), (32, -0.064), (33, -0.108), (34, 0.077), (35, -0.053), (36, -0.074), (37, -0.124), (38, 0.069), (39, -0.015), (40, -0.045), (41, 0.032), (42, -0.026), (43, -0.048), (44, 0.051), (45, -0.025), (46, 0.102), (47, 0.024), (48, 0.036), (49, 0.115)]
simIndex simValue paperId paperTitle
same-paper 1 0.94124472 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
Author: Shuheng Zhou, Philipp Rütimann, Min Xu, Peter Bühlmann
Abstract: Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using ℓ1-penalization methods. We propose and study the following method. We combine a multiple regression approach with ideas of thresholding and refitting: first we infer a sparse undirected graphical model structure via thresholding of each among many ℓ1-norm penalized regression functions; we then estimate the covariance matrix and its inverse using the maximum likelihood estimator. We show that under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. We also derive an explicit bound for the Kullback-Leibler divergence. Keywords: graphical model selection, covariance estimation, Lasso, nodewise regression, thresholding. © 2011 Shuheng Zhou, Philipp Rütimann, Min Xu and Peter Bühlmann.
2 0.73796773 37 jmlr-2011-Group Lasso Estimation of High-dimensional Covariance Matrices
Author: Jérémie Bigot, Rolando J. Biscay, Jean-Michel Loubes, Lillian Muñiz-Alvarez
Abstract: In this paper, we consider the Group Lasso estimator of the covariance matrix of a stochastic process corrupted by an additive noise. We propose to estimate the covariance matrix in a highdimensional setting under the assumption that the process has a sparse representation in a large dictionary of basis functions. Using a matrix regression model, we propose a new methodology for high-dimensional covariance matrix estimation based on empirical contrast regularization by a group Lasso penalty. Using such a penalty, the method selects a sparse set of basis functions in the dictionary used to approximate the process, leading to an approximation of the covariance matrix into a low dimensional space. Consistency of the estimator is studied in Frobenius and operator norms and an application to sparse PCA is proposed. Keywords: group Lasso, ℓ1 penalty, high-dimensional covariance estimation, basis expansion, sparsity, oracle inequality, sparse PCA
3 0.44997975 35 jmlr-2011-Forest Density Estimation
Author: Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, Larry Wasserman
Abstract: We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal’s algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models. Keywords: kernel density estimation, forest structured Markov network, high dimensional inference, risk consistency, structure selection consistency
4 0.41258797 97 jmlr-2011-Union Support Recovery in Multi-task Learning
Author: Mladen Kolar, John Lafferty, Larry Wasserman
Abstract: We sharply characterize the performance of different penalization schemes for the problem of selecting the relevant variables in the multi-task setting. Previous work focuses on the regression problem where conditions on the design matrix complicate the analysis. A clearer and simpler picture emerges by studying the Normal means model. This model, often used in the field of statistics, is a simplified model that provides a laboratory for studying complex procedures. Keywords: high-dimensional inference, multi-task learning, sparsity, normal means, minimax estimation
5 0.39485914 6 jmlr-2011-A Simpler Approach to Matrix Completion
Author: Benjamin Recht
Abstract: This paper provides the best bounds to date on the number of randomly sampled entries required to reconstruct an unknown low-rank matrix. These results improve on prior work by Candès and Recht (2009), Candès and Tao (2009), and Keshavan et al. (2009). The reconstruction is accomplished by minimizing the nuclear norm, or sum of the singular values, of the hidden matrix subject to agreement with the provided entries. If the underlying matrix satisfies a certain incoherence condition, then the number of entries required is equal to a quadratic logarithmic factor times the number of parameters in the singular value decomposition. The proof of this assertion is short, self contained, and uses very elementary analysis. The novel techniques herein are based on recent work in quantum information theory. Keywords: matrix completion, low-rank matrices, convex optimization, nuclear norm minimization, random matrices, operator Chernoff bound, compressed sensing
6 0.35944673 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
7 0.34982789 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
8 0.33041245 101 jmlr-2011-Variable Sparsity Kernel Learning
9 0.31982216 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
10 0.30736315 69 jmlr-2011-Neyman-Pearson Classification, Convexity and Stochastic Constraints
11 0.28471074 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
12 0.28158659 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
13 0.2691972 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
14 0.25800514 88 jmlr-2011-Structured Variable Selection with Sparsity-Inducing Norms
15 0.25013101 40 jmlr-2011-Hyper-Sparse Optimal Aggregation
16 0.2309064 91 jmlr-2011-The Sample Complexity of Dictionary Learning
17 0.22127575 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
18 0.21662244 58 jmlr-2011-Learning from Partial Labels
19 0.21504016 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
20 0.21385077 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
topicId topicWeight
[(4, 0.067), (9, 0.024), (10, 0.016), (24, 0.019), (31, 0.066), (32, 0.011), (41, 0.025), (60, 0.01), (73, 0.019), (78, 0.637), (90, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99000144 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
Author: Shuheng Zhou, Philipp Rütimann, Min Xu, Peter Bühlmann
Abstract: Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using ℓ1-penalization methods. We propose and study the following method. We combine a multiple regression approach with ideas of thresholding and refitting: first we infer a sparse undirected graphical model structure via thresholding of each among many ℓ1-norm penalized regression functions; we then estimate the covariance matrix and its inverse using the maximum likelihood estimator. We show that under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. We also derive an explicit bound for the Kullback-Leibler divergence. Keywords: graphical model selection, covariance estimation, Lasso, nodewise regression, thresholding. © 2011 Shuheng Zhou, Philipp Rütimann, Min Xu and Peter Bühlmann.
2 0.98673624 33 jmlr-2011-Exploiting Best-Match Equations for Efficient Reinforcement Learning
Author: Harm van Seijen, Shimon Whiteson, Hado van Hasselt, Marco Wiering
Abstract: This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of modelfree methods. Best-match learning works by approximating the solution to a set of best-match equations, which combine a sparse model with a model-free Q-value function constructed from samples not used by the model. We prove that, unlike regular sparse model-based methods, bestmatch learning is guaranteed to converge to the optimal Q-values in the tabular case. Empirical results demonstrate that best-match learning can substantially outperform regular sparse modelbased methods, as well as several model-free methods that strive to improve the sample efficiency of temporal-difference methods. In addition, we demonstrate that best-match learning can be successfully combined with function approximation. Keywords: reinforcement learning, on-line learning, temporal-difference methods, function approximation, data reuse
3 0.97568518 87 jmlr-2011-Stochastic Methods forl1-regularized Loss Minimization
Author: Shai Shalev-Shwartz, Ambuj Tewari
Abstract: We describe and analyze two stochastic methods for ℓ1 regularized loss minimization problems, such as the Lasso. The first method updates the weight of a single feature at each iteration while the second method updates the entire weight vector but only uses a single training example at each iteration. In both methods, the choice of feature or example is uniformly at random. Our theoretical runtime analysis suggests that the stochastic methods should outperform state-of-the-art deterministic approaches, including their deterministic counterparts, when the size of the problem is large. We demonstrate the advantage of stochastic methods by experimenting with synthetic and natural data sets.1 Keywords: L1 regularization, optimization, coordinate descent, mirror descent, sparsity
4 0.94329512 22 jmlr-2011-Differentially Private Empirical Risk Minimization
Author: Kamalika Chaudhuri, Claire Monteleoni, Anand D. Sarwate
Abstract: Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006), to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacypreserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance. Keywords: privacy, classification, optimization, empirical risk minimization, support vector machines, logistic regression
5 0.85149878 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
Author: Şeyda Ertekin, Cynthia Rudin
Abstract: We demonstrate that there are machine learning algorithms that can achieve success for two separate tasks simultaneously, namely the tasks of classification and bipartite ranking. This means that advantages gained from solving one task can be carried over to the other task, such as the ability to obtain conditional density estimates, and an order-of-magnitude reduction in computational time for training the algorithm. It also means that some algorithms are robust to the choice of evaluation metric used; they can theoretically perform well when performance is measured either by a misclassification error or by a statistic of the ROC curve (such as the area under the curve). Specifically, we provide such an equivalence relationship between a generalization of Freund et al.’s RankBoost algorithm, called the “P-Norm Push,” and a particular cost-sensitive classification algorithm that generalizes AdaBoost, which we call “P-Classification.” We discuss and validate the potential benefits of this equivalence relationship, and perform controlled experiments to understand P-Classification’s empirical performance. There is no established equivalence relationship for logistic regression and its ranking counterpart, so we introduce a logistic-regression-style algorithm that aims in between classification and ranking, and has promising experimental performance with respect to both tasks. Keywords: supervised classification, bipartite ranking, area under the curve, rank statistics, boosting, logistic regression
6 0.79892188 37 jmlr-2011-Group Lasso Estimation of High-dimensional Covariance Matrices
7 0.79281592 81 jmlr-2011-Robust Approximate Bilinear Programming for Value Function Approximation
8 0.77895862 40 jmlr-2011-Hyper-Sparse Optimal Aggregation
9 0.76109779 89 jmlr-2011-Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation
10 0.7584399 8 jmlr-2011-Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
11 0.73921835 29 jmlr-2011-Efficient Learning with Partially Observed Attributes
12 0.73122525 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms
13 0.70535767 36 jmlr-2011-Generalized TD Learning
14 0.70433074 1 jmlr-2011-A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
15 0.70408368 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
16 0.70349312 104 jmlr-2011-X-Armed Bandits
17 0.70076716 91 jmlr-2011-The Sample Complexity of Dictionary Learning
18 0.6963076 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem
19 0.69286543 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm
20 0.69082332 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning