nips nips2009 nips2009-225 knowledge-graph by maker-knowledge-mining

225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes


Source: pdf

Author: Mladen Kolar, Le Song, Eric P. Xing

Abstract: To estimate the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices, or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model — piecewise constant VCVS models. We analyze two main issues in this problem: inferring time points where structural changes occur and estimating model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies jump points of structural changes and then identifies relevant covariates to a response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with the increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to the brain computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: To estimate the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices, or uncovering the topology of an evolving gene network. [sent-4, score-0.257]

2 In this paper, we investigate sparsistent learning of a sub-family of this model — piecewise constant VCVS models. [sent-5, score-0.133]

3 We analyze two main issues in this problem: inferring time points where structural changes occur and estimating model structure (i. [sent-6, score-0.424]

4 We propose a two-stage adaptive procedure, which first identifies jump points of structural changes and then identifies relevant covariates to a response on each of the segments. [sent-9, score-0.985]

5 We provide an asymptotic analysis of the procedure, showing that with the increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. [sent-10, score-0.137]

6 We also consider how this applies to structure estimation of time-varying probabilistic graphical models. [sent-12, score-0.122]

7 , n, such as the prices of a set of stocks at time i, or the signals from some sensors deployed at location i; the noise ǫ1, …, ǫn. [sent-22, score-0.13]

8 Varying-coefficient models are a non-parametric extension of linear regression models which, unlike other non-parametric models, assume that there is a linear relationship (generalizable to a log-linear relationship) between the feature variables and the output variable, albeit a changing one. [sent-33, score-0.121]

9 At different blocks, only covariates with non-zero coefficients affect the response, e.g., as illustrated in panel (a). [sent-85, score-0.313]

10 (b) Schematic representation of the covariates affecting the response during the second block in panel (a), which is reminiscent of neighborhood selection in graph structure learning. [sent-90, score-0.503]

11 (c) and (d) Application of VCVS for graph structure estimation (see Section 7) of non-piecewise constant evolving graphs. [sent-91, score-0.206]

12 In this paper, we analyze VCVS as functions of time, and the main goal is to estimate the dynamic structure and jump points of the unknown vector function β(t). [sent-93, score-0.542]

13 We assume a block partition T = {0 = T0 < T1 < … < TB = 1}, 1 < B ≤ n, of the time interval (scaled to) [0, 1], such that β(t) = γj, t ∈ [Tj−1, Tj), for some constant vectors γj ∈ Rp, j = 1, …, B. [sent-100, score-0.141]

14 Furthermore, we assume that at each time point ti only a few covariates affect the response, i.e., the vector β(ti) is sparse. [sent-108, score-0.621]

15 A good estimation procedure would be able to identify the correct partition of the interval [0, 1] so that within each segment the coefficient function is constant. [sent-111, score-0.362]

16 This estimation problem is particularly important in applications where one needs to uncover dynamic relational information or model structures from time series data. [sent-115, score-0.199]

17 Another important problem is to identify structural changes in fields such as signal processing, EEG segmentation and analysis of seismic signals. [sent-117, score-0.216]

18 In all these problems, the goal is not to estimate the optimum value of β(t) for predicting Y , but to consistently uncover the zero and non-zero patterns in β(t) at time points of interest that reveal the changing structure of the model. [sent-118, score-0.255]

19 Our problem is remotely related to, but very different from, earlier works on linear regression models with structural changes [4], and the problem of change-point detection (e. [sent-120, score-0.303]

20 A number of existing methods are available to identify only one structural change in the data; in order to identify multiple changes these methods can be applied sequentially on smaller intervals that are assumed to harbor only one change [14]. [sent-123, score-0.296]

21 Another common approach is to assume that there are K changes and use Dynamic Programming to estimate them [4]. [sent-124, score-0.117]

22 In this paper, we propose and analyze a penalized least squares approach, which automatically adapts to the unknown number of structural changes present in the data and performs the variable selection on each of the constant regions. [sent-125, score-0.386]

23 Each βk is a spline function, with potential jump points at the observation times ti, i = 1, …, n. [sent-133, score-0.783]

24 In this particular case, the total variation penalty defined above allows us to conceptualize βk as a vector in Rn, whose components βk,i ≡ βk(ti) correspond to the function values at ti, i = 1, …, n. [sent-137, score-0.437]

25 Observe that the ℓ1 penalty encourages sparsity of the signal at each time point and enables a selection over the relevant coefficients, whereas the total variation penalty is used to partition the interval [0, 1] so that β̂k is constant within each segment. [sent-147, score-0.545]
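
A minimal sketch of how the two penalties act on an n × p array β whose entry (i, k) is βk(ti), matching the vector-in-Rn view above (the helper names are illustrative; the combined objective they enter is not reproduced here):

```python
# Sketch: the l1 penalty sums |beta_k(t_i)| over all covariates and time points,
# encouraging per-time-point sparsity, while the total-variation penalty sums
# |beta_k(t_i) - beta_k(t_{i-1})| per covariate, encouraging piecewise-constant paths.
import numpy as np

def l1_penalty(beta):          # beta: n x p array of coefficient values
    return np.abs(beta).sum()

def tv_penalty(beta):
    return np.abs(np.diff(beta, axis=0)).sum()   # successive differences along time
```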

26 Let Bj, j = 1, …, B, denote the set of time points that fall into the interval [Tj−1, Tj); when the meaning is clear from the context, we also use Bj as a shorthand for this interval. [sent-157, score-0.176]

27 For example, XBj and YBj represent the submatrix of X and the subvector of Y, respectively, that include only the elements corresponding to time points within the interval Bj. [sent-158, score-0.176]

28 The estimate β̂ defines a block partition T̂ = {T̂0 < … < T̂B̂} of [0, 1] (possibly a trivial one) and unique vectors γ̂j ∈ Rp, j = 1, …, B̂. [sent-163, score-0.132]

29 The set of relevant covariates during interval Bj, i.e., its support SBj. [sent-167, score-0.306]

30 The larger the estimated segments, the smaller the relative influence of the bias from the total variation, while the magnitude of the bias introduced by the ℓ1 penalty is uniform across different segments. [sent-184, score-0.126]

31 The additional bias coming from the total variation penalty was also noted in the problem of signal denoising [23]. [sent-185, score-0.117]

32 3 A two-step procedure for estimating time-varying structures. In this section, we propose a new algorithm for estimating the time-varying structure of the varying-coefficient model in Eq. [sent-187, score-0.205]

33 Estimate the block partition T , on which the coefficient vector is constant within each block. [sent-191, score-0.168]

34 This can be obtained by minimizing the following objective: ∑_{i=1}^{n} (Yi − Xi′β(ti))² + 2λ2 ∑_{k=1}^{p} ||βk||TV, (7) which we refer to as a temporal difference (TD) regression for reasons that will be clear shortly. [sent-192, score-0.123]

35 We rewrite Eq. (7) to turn it into an ℓ1-regularized regression problem, and solve it using the randomized Lasso. [sent-194, score-0.156]

36 For each block of the partition, B̂j, 1 ≤ j ≤ B̂, estimate γj by minimizing the Lasso objective within the block: γ̂j = argmin_{γ∈Rp} ∑_{ti∈B̂j} (Yi − Xi′γ)² + 2λ1||γ||1. (8) [sent-197, score-0.494]

37 We name this procedure TDB-Lasso (or TDBL), after the two steps (TD randomized Lasso, and Lasso within Blocks) given above. [sent-198, score-0.115]
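
A rough sketch of the second step (Lasso within blocks), using scikit-learn's Lasso as a stand-in solver; the function name, the `blocks` argument (a list of index arrays, one per estimated segment B̂j) and the penalty value are illustrative assumptions, and sklearn's `alpha` is only proportional to the λ1 of Eq. (8):

```python
# Sketch of step 2: fit one sparse coefficient vector per estimated block.
import numpy as np
from sklearn.linear_model import Lasso

def fit_within_blocks(X, Y, blocks, lam1=0.1):
    """X: n x p design, Y: n responses, blocks: list of index arrays (one per block)."""
    gammas = []
    for idx in blocks:
        # sklearn's Lasso minimizes (1/2m)||y - Xw||^2 + alpha||w||_1, so alpha
        # plays the role of a rescaled lambda_1 from Eq. (8)
        model = Lasso(alpha=lam1, fit_intercept=False)
        model.fit(X[idx], Y[idx])
        gammas.append(model.coef_.copy())
    return gammas
```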

38 We transform Eq. (7) into an equivalent ℓ1-penalized regression problem, which allows us to cast the estimation of T̂ as a feature selection problem. [sent-204, score-0.302]

39 Let β†k,i denote the temporal difference between the regression coefficients corresponding to the same covariate k at successive time points ti−1 and ti: β†k,i ≡ βk(ti) − βk(ti−1), k = 1, …, p. [sent-205, score-0.515]

40 This transformation, which yields the ℓ1-penalized problem (9), was proposed in [8] in the context of one-dimensional signal denoising; here, however, we are interested in the estimation of jump points in the context of a time-varying coefficient model. [sent-226, score-0.544]
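
A rough sketch of this transformation, under the simplifying assumption that βk(ti) = ∑_{l≤i} β†k,l (so the first "difference" carries the initial level βk(t1)); the exact construction and scaling in Eq. (9) may differ:

```python
# Sketch of the TD transformation: build an expanded design X_dagger so that the
# total-variation penalty on beta_k becomes an ordinary l1 penalty on the temporal
# differences beta_dagger[l, k] = beta_k(t_l) - beta_k(t_{l-1}).
import numpy as np

def td_design(X):
    """Map X (n x p) to X_dagger (n x (n*p)); column l*p + k covers covariate k at time l."""
    n, p = X.shape
    # since beta_k(t_i) = sum_{l <= i} beta_dagger[l, k], row i of X_dagger sees
    # X[i, k] in every column (l, k) with l <= i, and 0 otherwise
    mask = (np.arange(n)[:, None] >= np.arange(n)[None, :]).astype(float)  # lower-triangular
    return (mask[:, :, None] * X[:, None, :]).reshape(n, n * p)
```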

41 To deal with the problem of robustness, we employed the stability selection procedure of [22] (see also the bootstrap Lasso [2]; however, we decided to use stability selection because of its weaker assumptions). [sent-231, score-0.369]

42 The stability selection approach to estimating the jump points comprises two main components: i) simulating multiple datasets using the bootstrap, and ii) using the randomized Lasso outlined in Algorithm 1 (see also the Appendix) to solve (9). [sent-232, score-0.266]

43 While the bootstrap step improves the robustness of the estimator, the randomized Lasso weakens the conditions under which the estimator β̂† selects exactly the true features. [sent-233, score-0.167]

44 We obtain a stable estimate of the support by selecting variables that appear in multiple supports, Ĵτ = {k : (1/M) ∑_{b=1}^{M} I{k ∈ Ĵb†} ≥ τ}, (10) which is then used to obtain the block partition estimate T̂. [sent-237, score-0.208]

45 The parameter τ is a tuning parameter that controls the number of falsely identified jump points. [sent-238, score-0.432]
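
A sketch of the stability-selection step with a randomized Lasso, using the standard column-reweighting trick (per-feature penalty factors drawn from [α, 1], the "weakness") and illustrative values for the penalty level, subsample size, M and τ; this is not the paper's exact Algorithm 1:

```python
# Sketch: estimate selection frequencies over M subsampled datasets with randomized
# per-feature penalties, then keep the features selected in at least a fraction tau.
import numpy as np
from sklearn.linear_model import Lasso

def stability_select(X_dagger, Y, lam=0.05, alpha=0.5, M=100, tau=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n, q = X_dagger.shape
    counts = np.zeros(q)
    for _ in range(M):
        rows = rng.choice(n, size=n // 2, replace=False)     # subsample half the data
        w = rng.uniform(alpha, 1.0, size=q)                   # random penalty weights in [alpha, 1]
        model = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        # scaling column k by 1/w[k] is equivalent to penalizing it by lam * w[k]
        model.fit(X_dagger[rows] / w, Y[rows])
        counts += (model.coef_ != 0)
    return np.flatnonzero(counts / M >= tau)                  # stable support, cf. Eq. (10)
```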

46 Estimating jump points: We first address the issue of estimating jump points by analyzing the transformed TD-regression problem in Eq. (9). [sent-242, score-0.985]

47 To prove that all the jump points are included in Ĵτ, we first state a sparse eigenvalue condition on the design. [sent-245, score-0.463]

48 This condition (A1), a sparse eigenvalue condition involving φmin^{3/2}(CJ², X†), guarantees a correlation structure between TD-transformed covariates that allows for detection of the jump points. [sent-256, score-0.697]

49 Compared to the irrepresentable condition [30, 21, 27], which is necessary for the ordinary Lasso to perform feature selection, condition A1 is much weaker [22] and is sufficient for the randomized Lasso to select the relevant features with high probability (see also [26]). [sent-257, score-0.151]

50 If the minimum size of the jump is bounded away from zero as min_{k∈J†} |β†k| ≥ 0.3(CJ)^{3/2} λmin, (13) where λmin = 2σ†(√(CJ) + 1)√(log(np)/n) and (σ†)² ≥ Var(Yi†), then for np > 10 and J ≥ 7 there exists some δ = δJ ∈ (0, 1) such that for all τ ≥ 1 − δ, the collection of the estimated jump points Ĵτ satisfies P(Ĵτ = J†) ≥ 1 − 5/np. (14) [sent-259, score-0.433] [sent-260, score-0.552]

52 Remark: Note that Theorem 1 gives conditions under which we can recover every jump point in every covariate. [sent-261, score-0.432]

53 In particular, there are no assumptions on the number of covariates that change values at a jump point. [sent-262, score-0.696]

54 Assuming that multiple covariates change their values at a jump point, we could further relax the condition on the minimal size of a jump given in Eq. (13). [sent-263, score-1.088]

55 It was also pointed out to us that the framework of [18] may be a more natural way to estimate jump points. [sent-265, score-0.43]

56 Identifying correct covariates: Now we address the issue of selecting the relevant features for every estimated segment. [sent-267, score-0.39]

57 Under the conditions of Theorem 1, correct jump points will be detected with probability arbitrarily close to 1. [sent-268, score-0.498]

58 That means under the assumption A1, we can run the regular Lasso on each of the estimated segments to select the relevant features therein. [sent-269, score-0.134]

59 Assumption A2 (Eq. (15)) is a mild version of the mutual coherence condition used in [7], which is necessary for identification of the relevant covariates in each segment. [sent-273, score-0.306]

60 Then, for a sequence δ = δn → 0, if λ1 ≥ 4Lσ√(ln(4Kp/δ)/ρ) ∨ 8L ln(2Kp/δ)/ρ and min_{1≤j≤B} min_{k∈SBj} |γj,k| ≥ 2λ1, we have lim_{n→∞} P(B̂ = B) = 1, (16) lim_{n→∞} max_{1≤j≤B} P(||γ̂j − γj||1 = 0) = 1, (17) and lim_{n→∞} min_{1≤j≤B} P(ŜBj = SBj) = 1. [sent-282, score-0.123]

61 That is, it selects the correct jump points, and for each segment between two jump points it is able to select the correct covariates. [sent-285, score-1.059]

62 5 Practical considerations. As in the standard Lasso, the regularization parameters in TDB-Lasso need to be tuned appropriately to attain correct structural recovery. [sent-287, score-0.172]

63 The TD regression procedure requires three parameters: the penalty parameter λ2 , cut-off parameter τ , and weakness parameter α. [sent-288, score-0.29]

64 From our empirical experience, the recovered set of jump points T̂ varies very little with respect to these parameters over a wide range. [sent-289, score-0.463]

65 Theorem 1 in [22] gives a way to select the cutoff τ while controlling the number of falsely included jump points. [sent-291, score-0.475]

66 The weakness parameter can be chosen in quite a large interval (see Appendix on the randomized Lasso) and we report our results using the values α = 0. [sent-293, score-0.217]

67 The Lasso of Eq. (8) is then run on each estimated segment to select relevant variables, which requires a choice of the penalty parameter λ1. [sent-296, score-0.231]

68 In cases where the assumptions are violated, the resulting set of estimated jump points is larger than the true set T. [sent-299, score-0.512]

69 For example, points close to the true jump points get included into the resulting estimate T̂. [sent-301, score-0.572]

70 We propose to use an ad hoc heuristic to refine the initially selected set of jump points. [sent-302, score-0.392]

71 A commonly used procedure for estimation of linear regression models with structural changes [3] is a dynamic programming method that considers a possible structural change at every location ti, i = 1, …, n. [sent-303, score-0.927]

72 We modify this method to consider jump points only in the estimated set T̂, thus considerably reducing the computational complexity to O(|T̂|²), since |T̂| ≪ n. [sent-307, score-0.512]
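
A sketch of this refinement, assuming a plain within-segment least-squares cost (the paper's exact per-segment criterion may differ); `candidates` is the sorted set of candidate boundaries, i.e. T̂ augmented with the endpoints 0 and n, and K is the number of segments considered:

```python
# Sketch: dynamic programming over candidate boundaries only, so the cost is
# O(|T_hat|^2) segment fits instead of O(n^2).
import numpy as np

def dp_refine(X, Y, candidates, K):
    m = len(candidates)                       # requires K <= m - 1

    def seg_cost(a, b):                       # SSE of least squares on rows [a, b)
        Xi, Yi = X[a:b], Y[a:b]
        coef, res, *_ = np.linalg.lstsq(Xi, Yi, rcond=None)
        return res[0] if res.size else float(np.sum((Yi - Xi @ coef) ** 2))

    cost = np.full((K + 1, m), np.inf)
    back = np.zeros((K + 1, m), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(1, m):
            for i in range(j):
                c = cost[k - 1, i] + seg_cost(candidates[i], candidates[j])
                if c < cost[k, j]:
                    cost[k, j], back[k, j] = c, i
    bounds, j = [candidates[-1]], m - 1       # backtrack the chosen boundaries
    for k in range(K, 0, -1):
        j = back[k, j]
        bounds.append(candidates[j])
    return bounds[::-1]
```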

73 Figure 2: Comparison results of different estimation procedures on a synthetic dataset. [sent-318, score-0.136]

74 The sample size was varied up to 500 time points, and the number of covariates was fixed to p = 20. [sent-319, score-0.301]

75 The block partition was generated randomly and consists of ten blocks with minimum length set to 10 time points. [sent-320, score-0.218]

76 In each of the blocks, only 5 covariates out of 20 affected the response. [sent-321, score-0.264]
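
A sketch of a data generator matching this setup (up to 500 time points, p = 20 covariates, ten blocks of length at least 10, five active covariates per block); the block-boundary scheme, coefficient magnitudes and noise level are illustrative assumptions:

```python
# Sketch: piecewise-constant, sparse coefficient paths with a random block partition.
import numpy as np

def make_vcvs_data(n=500, p=20, n_blocks=10, min_len=10, n_active=5, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # cut points on a grid of min_len, so every block has length >= min_len
    cuts = np.sort(rng.choice(np.arange(1, n // min_len), size=n_blocks - 1, replace=False)) * min_len
    bounds = np.concatenate(([0], cuts, [n]))
    beta = np.zeros((n, p))
    for j in range(n_blocks):
        support = rng.choice(p, size=n_active, replace=False)
        gamma = np.zeros(p)
        gamma[support] = rng.uniform(1.0, 2.0, n_active) * rng.choice([-1, 1], n_active)
        beta[bounds[j]:bounds[j + 1]] = gamma              # constant within the block
    X = rng.standard_normal((n, p))
    Y = np.einsum("ij,ij->i", X, beta) + sigma * rng.standard_normal(n)
    return X, Y, beta, bounds
```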

77 A simple local regression method [13], which is commonly used for estimation in varying coefficient models, was used as the simplest baseline for comparing the relative performance of estimation. [sent-331, score-0.203]

78 Our first competitor is an extension of the baseline, which uses the following estimator [28]: min_{β∈R^{p×n}} ∑_{i′=1}^{n} ∑_{i=1}^{n} (Yi − Xi⊤βi′)² Kh(ti′ − ti) + ∑_{j=1}^{p} λj √(∑_{i′=1}^{n} β²i′,j), (19) where Kh(·) = (1/h)K(·/h) is the kernel function. [sent-332, score-0.525]

79 Another competitor uses ℓ1-penalized local regression independently at each time point, which leads to the following estimator of β(t): min_{β∈Rp} ∑_{i=1}^{n} (Yi − Xi′β)² Kh(ti − t) + ∑_{j=1}^{p} λj|βj|. [sent-334, score-0.307]

80 The difference between the two methods is that “Kernel ℓ1 /ℓ2 ” biases certain covariates toward zero at every time point, based on global information; whereas “Kernel ℓ1 ” biases covariates toward zero only based on local information. [sent-336, score-0.565]
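
A sketch of the "Kernel ℓ1" competitor: a kernel-weighted Lasso solved at a query time t by rescaling rows with √Kh(ti − t); the Gaussian kernel, the bandwidth and a single shared penalty (rather than per-coefficient λj) are simplifying assumptions:

```python
# Sketch: l1-penalized local regression at one time point t.
import numpy as np
from sklearn.linear_model import Lasso

def kernel_l1(X, Y, times, t, h=0.05, lam=0.1):
    w = np.exp(-0.5 * ((times - t) / h) ** 2) / h      # Gaussian kernel K_h(t_i - t)
    sw = np.sqrt(w)
    model = Lasso(alpha=lam, fit_intercept=False)
    model.fit(X * sw[:, None], Y * sw)                  # row scaling = kernel-weighted least squares
    return model.coef_                                  # estimate of beta(t)
```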

81 We report the relative estimation error, REE = 100 × (∑_{i=1}^{n} ∑_{j=1}^{p} |β̂i,j − β*i,j|) / (∑_{i=1}^{n} ∑_{j=1}^{p} |β̃i,j − β*i,j|), where β̃ is the baseline local linear estimator, as a measure of estimation accuracy. [sent-342, score-0.162]

82 To assess the performance of the model selection, we report precision, recall, and their harmonic mean (the F1 measure) when estimating the relevant covariates at each time point, together with the percentage of correctly identified irrelevant covariates. [sent-343, score-0.402]
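
A small sketch of the reported metrics, REE against the baseline local-linear estimate and F1 of the recovered support at a time point (the zero tolerance and function names are illustrative):

```python
# Sketch: relative estimation error and support F1 for one estimate.
import numpy as np

def ree(beta_hat, beta_base, beta_star):
    return 100.0 * np.abs(beta_hat - beta_star).sum() / np.abs(beta_base - beta_star).sum()

def support_f1(beta_hat, beta_star, tol=1e-8):
    est, true = np.abs(beta_hat) > tol, np.abs(beta_star) > tol
    tp = np.sum(est & true)
    prec, rec = tp / max(est.sum(), 1), tp / max(true.sum(), 1)
    return 2 * prec * rec / max(prec + rec, 1e-12)
```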

83 7 Application to Time-varying Graph Structure Estimation. An interesting application of the TDB-Lasso is in structural estimation of time-varying undirected graphical models [1, 17]. [sent-350, score-0.218]

84 Graph structure estimation can be posed as a neighborhood selection problem, in which the neighbors of each node are estimated independently. [sent-351, score-0.306]

85 Neighborhood selection in the time-varying Gaussian graphical model (GGM) is equivalent to model selection in VCVS, where the value of one node is regressed on the rest of the nodes. [sent-352, score-0.198]

86 Graphs estimated in this way will have neighborhoods of each node that are constant on a partition, but the graph as a whole changes more flexibly (Fig. [sent-354, score-0.164]
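
A sketch of this neighborhood-selection reduction; `tdb_lasso` is a hypothetical handle to the two-step estimator sketched earlier, assumed to return an n × (d−1) matrix of piecewise-constant coefficients:

```python
# Sketch: regress each node on the remaining nodes and read off its (time-varying,
# piecewise-constant) neighborhood from the nonzero coefficients.
import numpy as np

def time_varying_graph(Z, tdb_lasso):
    """Z: n x d matrix of node observations over time."""
    n, d = Z.shape
    neighborhoods = []
    for a in range(d):
        others = np.array([b for b in range(d) if b != a])
        beta_hat = tdb_lasso(Z[:, others], Z[:, a])        # n x (d-1) coefficients
        nbrs = [others[np.abs(beta_hat[i]) > 1e-8] for i in range(n)]
        neighborhoods.append(nbrs)                          # node a's neighbors at each t_i
    return neighborhoods
```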

87 The graph structure estimation using the TDB-Lasso is demonstrated on a real dataset of electroencephalogram (EEG) measurements. [sent-359, score-0.122]

88 We use the brain computer interface (BCI) dataset IVa from [11], in which EEG data is collected from 5 subjects who were given visual cues based on which they were required to imagine either the right hand or the right foot. (Figure 3: Brain interactions for the subject 'aa'.) [sent-360, score-0.142]

89 Fig. 3 gives a visualization of the brain interactions over the time of the experiment for the subject 'aa' while presented with visual cues for class 1 (right hand). [sent-365, score-0.179]

90 We also used the TDB-Lasso for estimating time-varying gene networks from microarray time series data, but due to space limits, results will be reported later in a biological paper. [sent-375, score-0.186]

91 Discussion. We have developed the TDB-Lasso procedure, a novel approach for model selection and variable estimation in varying-coefficient varying-structure models with piecewise constant functions. [sent-387, score-0.263]

92 Due to their flexibility, important classical problems, such as linear regression with structural changes and change point detection, and some more recent problems, like structure estimation of varying graphical models, can be modeled within this class of models. [sent-389, score-0.5]

93 The TDB-Lasso compares favorably to other commonly used [28] or more recent [1] techniques for estimation in this class of models, as demonstrated on the synthetic data. [sent-390, score-0.136]

94 The model selection properties of the TDB-Lasso, demonstrated on the synthetic data, are also supported by the theoretical analysis. [sent-391, score-0.154]

95 Application of the TDB-Lasso procedure goes beyond the linear varying coefficient regression models. [sent-393, score-0.168]

96 A direct extension is to generalized varying-coefficient models g(m(Xi, ti)) = Xi′β(ti), i = 1, …, n, where g(·) is a given link function and m(Xi, ti) = E[Y | X = Xi, t = ti] is the conditional mean. [sent-394, score-0.32] [sent-397, score-0.64]

98 Estimating and testing linear models with multiple structural changes. [sent-417, score-0.137]

99 Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1 + ℓ2 penalization. [sent-437, score-0.186]

100 Least-squares estimation of an unknown number of shifts in a time series. [sent-502, score-0.118]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('jump', 0.392), ('ti', 0.32), ('vcvs', 0.273), ('tv', 0.264), ('covariates', 0.264), ('lasso', 0.186), ('bj', 0.178), ('sbj', 0.174), ('structural', 0.137), ('coef', 0.112), ('eeg', 0.105), ('selection', 0.099), ('eric', 0.099), ('rp', 0.096), ('cj', 0.087), ('regression', 0.087), ('estimation', 0.081), ('weakness', 0.08), ('changes', 0.079), ('penalty', 0.077), ('yi', 0.076), ('bic', 0.074), ('points', 0.071), ('partition', 0.069), ('randomized', 0.069), ('interval', 0.068), ('kolar', 0.065), ('mladen', 0.065), ('block', 0.063), ('segment', 0.063), ('tj', 0.06), ('harchaoui', 0.06), ('estimating', 0.059), ('td', 0.058), ('kernel', 0.057), ('aa', 0.056), ('competitor', 0.056), ('za', 0.056), ('synthetic', 0.055), ('prices', 0.053), ('stock', 0.053), ('estimator', 0.051), ('kh', 0.05), ('rnp', 0.05), ('sparsistent', 0.05), ('varyingcoef', 0.05), ('xbj', 0.05), ('ybj', 0.05), ('estimated', 0.049), ('blocks', 0.049), ('cues', 0.049), ('evolving', 0.048), ('series', 0.047), ('interactions', 0.047), ('piecewise', 0.047), ('bootstrap', 0.047), ('procedure', 0.046), ('brain', 0.046), ('cients', 0.045), ('preprint', 0.045), ('jb', 0.043), ('cutoff', 0.043), ('jianqing', 0.043), ('tb', 0.043), ('tesla', 0.043), ('segments', 0.043), ('gene', 0.043), ('relevant', 0.042), ('william', 0.041), ('structure', 0.041), ('min', 0.041), ('variation', 0.04), ('change', 0.04), ('np', 0.04), ('ordinary', 0.04), ('stocks', 0.04), ('ggm', 0.04), ('falsely', 0.04), ('xi', 0.04), ('stability', 0.039), ('song', 0.039), ('estimate', 0.038), ('argmin', 0.037), ('bai', 0.037), ('time', 0.037), ('cient', 0.036), ('appendix', 0.036), ('neighborhood', 0.036), ('minimizing', 0.036), ('constant', 0.036), ('rn', 0.036), ('fused', 0.035), ('penalized', 0.035), ('correct', 0.035), ('varying', 0.035), ('changing', 0.034), ('sj', 0.034), ('francis', 0.034), ('econometrics', 0.034), ('uncover', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

Author: Mladen Kolar, Le Song, Eric P. Xing

Abstract: To estimate the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices, or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model — piecewise constant VCVS models. We analyze two main issues in this problem: inferring time points where structural changes occur and estimating model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies jump points of structural changes and then identifies relevant covariates to a response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with the increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to the brain computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models. 1

2 0.17772773 246 nips-2009-Time-Varying Dynamic Bayesian Networks

Author: Le Song, Mladen Kolar, Eric P. Xing

Abstract: Directed graphical models such as Bayesian networks are a favored formalism for modeling the dependency structures in complex multivariate systems such as those encountered in biology and neural science. When a system is undergoing dynamic transformation, temporally rewiring networks are needed for capturing the dynamic causal influences between covariates. In this paper, we propose time-varying dynamic Bayesian networks (TV-DBN) for modeling the structurally varying directed dependency structures underlying non-stationary biological/neural time series. This is a challenging problem due the non-stationarity and sample scarcity of time series data. We present a kernel reweighted 1 -regularized auto-regressive procedure for this problem which enjoys nice properties such as computational efficiency and provable asymptotic consistency. To our knowledge, this is the first practical and statistically sound method for structure learning of TVDBNs. We applied TV-DBNs to time series measurements during yeast cell cycle and brain response to visual stimuli. In both cases, TV-DBNs reveal interesting dynamics underlying the respective biological systems. 1

3 0.16037875 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

Author: Sahand Negahban, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar

Abstract: High-dimensional statistical inference deals with models in which the the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M -estimator) which combines a loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. The goal of this paper is to provide a unified framework for establishing consistency and convergence rates for such regularized M estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M -estimators have fast convergence rates. 1

4 0.15697215 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation

Author: Shuheng Zhou

Abstract: Given n noisy samples with p dimensions, where n ≪ p, we show that the multistep thresholding procedure can accurately estimate a sparse vector β ∈ Rp in a linear model, under the restricted eigenvalue conditions (Bickel-Ritov-Tsybakov 09). Thus our conditions for model selection consistency are considerably weaker than what has been achieved in previous works. More importantly, this method allows very significant values of s, which is the number of non-zero elements in the true parameter. For example, it works for cases where the ordinary Lasso would have failed. Finally, we show that if X obeys a uniform uncertainty principle and if the true parameter is sufficiently sparse, the Gauss-Dantzig selector (Cand` se Tao 07) achieves the ℓ2 loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle which would supply perfect information about which coordinates are non-zero and which are above the noise level, while selecting a sufficiently sparse model. 1

5 0.15301217 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties

Author: Sylvain Arlot, Francis R. Bach

Abstract: This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression or spline smoothing, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first estimates consistently the variance of the noise, based upon the concept of minimal penalty which was previously introduced in the context of model selection. Then, plugging our variance estimate in Mallows’ CL penalty is proved to lead to an algorithm satisfying an oracle inequality. Simulation experiments with kernel ridge regression and multiple kernel learning show that the proposed algorithm often improves significantly existing calibration procedures such as 10-fold cross-validation or generalized cross-validation. 1

6 0.12393834 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

7 0.10708622 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors

8 0.10412663 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness

9 0.10125539 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

10 0.096091881 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

11 0.09225554 133 nips-2009-Learning models of object structure

12 0.091394708 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output

13 0.090513639 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

14 0.085903108 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

15 0.085213587 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

16 0.085203215 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry

17 0.082787752 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

18 0.082316376 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

19 0.079334207 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases

20 0.077940062 67 nips-2009-Directed Regression


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.26), (1, 0.084), (2, 0.007), (3, 0.109), (4, -0.014), (5, -0.054), (6, 0.14), (7, -0.135), (8, 0.051), (9, -0.031), (10, 0.025), (11, -0.05), (12, 0.052), (13, -0.025), (14, 0.055), (15, 0.109), (16, -0.004), (17, -0.128), (18, -0.089), (19, 0.072), (20, 0.054), (21, 0.132), (22, -0.04), (23, 0.007), (24, 0.034), (25, 0.154), (26, -0.078), (27, 0.239), (28, -0.001), (29, -0.109), (30, -0.055), (31, 0.056), (32, -0.061), (33, -0.094), (34, -0.035), (35, 0.044), (36, -0.026), (37, 0.07), (38, -0.045), (39, 0.065), (40, 0.031), (41, -0.064), (42, -0.045), (43, -0.008), (44, 0.118), (45, -0.026), (46, -0.07), (47, 0.048), (48, -0.028), (49, -0.09)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94996589 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

Author: Mladen Kolar, Le Song, Eric P. Xing

Abstract: To estimate the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices, or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model — piecewise constant VCVS models. We analyze two main issues in this problem: inferring time points where structural changes occur and estimating model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies jump points of structural changes and then identifies relevant covariates to a response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with the increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to the brain computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models. 1

2 0.68154287 105 nips-2009-Grouped Orthogonal Matching Pursuit for Variable Selection and Prediction

Author: Grzegorz Swirszcz, Naoki Abe, Aurelie C. Lozano

Abstract: We consider the problem of variable group selection for least squares regression, namely, that of selecting groups of variables for best regression performance, leveraging and adhering to a natural grouping structure within the explanatory variables. We show that this problem can be efficiently addressed by using a certain greedy style algorithm. More precisely, we propose the Group Orthogonal Matching Pursuit algorithm (Group-OMP), which extends the standard OMP procedure (also referred to as “forward greedy feature selection algorithm” for least squares regression) to perform stage-wise group variable selection. We prove that under certain conditions Group-OMP can identify the correct (groups of) variables. We also provide an upperbound on the l∞ norm of the difference between the estimated regression coefficients and the true coefficients. Experimental results on simulated and real world datasets indicate that Group-OMP compares favorably to Group Lasso, OMP and Lasso, both in terms of variable selection and prediction accuracy. 1

3 0.67539936 246 nips-2009-Time-Varying Dynamic Bayesian Networks

Author: Le Song, Mladen Kolar, Eric P. Xing

Abstract: Directed graphical models such as Bayesian networks are a favored formalism for modeling the dependency structures in complex multivariate systems such as those encountered in biology and neural science. When a system is undergoing dynamic transformation, temporally rewiring networks are needed for capturing the dynamic causal influences between covariates. In this paper, we propose time-varying dynamic Bayesian networks (TV-DBN) for modeling the structurally varying directed dependency structures underlying non-stationary biological/neural time series. This is a challenging problem due the non-stationarity and sample scarcity of time series data. We present a kernel reweighted 1 -regularized auto-regressive procedure for this problem which enjoys nice properties such as computational efficiency and provable asymptotic consistency. To our knowledge, this is the first practical and statistically sound method for structure learning of TVDBNs. We applied TV-DBNs to time series measurements during yeast cell cycle and brain response to visual stimuli. In both cases, TV-DBNs reveal interesting dynamics underlying the respective biological systems. 1

4 0.62241381 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation

Author: Shuheng Zhou

Abstract: Given n noisy samples with p dimensions, where n ≪ p, we show that the multistep thresholding procedure can accurately estimate a sparse vector β ∈ Rp in a linear model, under the restricted eigenvalue conditions (Bickel-Ritov-Tsybakov 09). Thus our conditions for model selection consistency are considerably weaker than what has been achieved in previous works. More importantly, this method allows very significant values of s, which is the number of non-zero elements in the true parameter. For example, it works for cases where the ordinary Lasso would have failed. Finally, we show that if X obeys a uniform uncertainty principle and if the true parameter is sufficiently sparse, the Gauss-Dantzig selector (Cand` se Tao 07) achieves the ℓ2 loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle which would supply perfect information about which coordinates are non-zero and which are above the noise level, while selecting a sufficiently sparse model. 1

5 0.60500222 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

Author: Han Liu, Xi Chen

Abstract: This paper studies the forward greedy strategy in sparse nonparametric regression. For additive models, we propose an algorithm called additive forward regression; for general multivariate models, we propose an algorithm called generalized forward regression. Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. Our main emphasis is empirical: on both simulated and real data, these two simple greedy methods can clearly outperform several state-ofthe-art competitors, including LASSO, a nonparametric version of LASSO called the sparse additive model (SpAM) and a recently proposed adaptive parametric forward-backward algorithm called Foba. We also provide some theoretical justifications of specific versions of the additive forward regression. 1

6 0.60317844 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors

7 0.595137 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties

8 0.5846557 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

9 0.56469983 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

10 0.55875307 67 nips-2009-Directed Regression

11 0.54038411 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

12 0.52706158 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

13 0.49169219 197 nips-2009-Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs

14 0.4867931 138 nips-2009-Learning with Compressible Priors

15 0.4843514 185 nips-2009-Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis

16 0.48033077 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry

17 0.46208107 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

18 0.45717809 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

19 0.44416034 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing

20 0.42667469 91 nips-2009-Fast, smooth and adaptive regression in metric spaces


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.016), (21, 0.021), (24, 0.051), (25, 0.131), (35, 0.053), (36, 0.138), (39, 0.05), (55, 0.012), (58, 0.082), (61, 0.03), (71, 0.039), (81, 0.021), (86, 0.076), (91, 0.013), (98, 0.197)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90390575 206 nips-2009-Riffled Independence for Ranked Data

Author: Jonathan Huang, Carlos Guestrin

Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. 1

2 0.88958865 25 nips-2009-Adaptive Design Optimization in Experiments with People

Author: Daniel Cavagnaro, Jay Myung, Mark A. Pitt

Abstract: In cognitive science, empirical data collected from participants are the arbiters in model selection. Model discrimination thus depends on designing maximally informative experiments. It has been shown that adaptive design optimization (ADO) allows one to discriminate models as efficiently as possible in simulation experiments. In this paper we use ADO in a series of experiments with people to discriminate the Power, Exponential, and Hyperbolic models of memory retention, which has been a long-standing problem in cognitive science, providing an ideal setting in which to test the application of ADO for addressing questions about human cognition. Using an optimality criterion based on mutual information, ADO is able to find designs that are maximally likely to increase our certainty about the true model upon observation of the experiment outcomes. Results demonstrate the usefulness of ADO and also reveal some challenges in its implementation. 1

3 0.886967 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson

Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.

same-paper 4 0.86415082 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

Author: Mladen Kolar, Le Song, Eric P. Xing

Abstract: To estimate the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices, or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model — piecewise constant VCVS models. We analyze two main issues in this problem: inferring time points where structural changes occur and estimating model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies jump points of structural changes and then identifies relevant covariates to a response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with the increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to the brain computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models. 1

5 0.75780827 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

Author: Kwang I. Kim, Florian Steinke, Matthias Hein

Abstract: Semi-supervised regression based on the graph Laplacian suffers from the fact that the solution is biased towards a constant and the lack of extrapolating power. Based on these observations, we propose to use the second-order Hessian energy for semi-supervised regression which overcomes both these problems. If the data lies on or close to a low-dimensional submanifold in feature space, the Hessian energy prefers functions whose values vary linearly with respect to geodesic distance. We first derive the Hessian energy for smooth manifolds and continue to give a stable estimation procedure for the common case where only samples of the underlying manifold are given. The preference of ‘’linear” functions on manifolds renders the Hessian energy particularly suited for the task of semi-supervised dimensionality reduction, where the goal is to find a user-defined embedding function given some labeled points which varies smoothly (and ideally linearly) along the manifold. The experimental results suggest superior performance of our method compared with semi-supervised regression using Laplacian regularization or standard supervised regression techniques applied to this task. 1

6 0.74964446 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

7 0.7473436 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry

8 0.7470907 97 nips-2009-Free energy score space

9 0.74334484 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

10 0.7418561 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models

11 0.74118888 133 nips-2009-Learning models of object structure

12 0.73937804 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

13 0.73931187 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

14 0.73899001 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

15 0.73686975 157 nips-2009-Multi-Label Prediction via Compressed Sensing

16 0.73640597 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

17 0.73560292 175 nips-2009-Occlusive Components Analysis

18 0.73521209 129 nips-2009-Learning a Small Mixture of Trees

19 0.73291874 71 nips-2009-Distribution-Calibrated Hierarchical Classification

20 0.73226172 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes