jmlr jmlr2013 jmlr2013-106 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611-6130. Editor: Hui Zou. Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. [sent-6, score-0.248]
2 In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. [sent-9, score-0.284]
3 We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. [sent-12, score-0.259]
4 Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. [sent-15, score-0.249]
5 Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. [sent-16, score-0.24]
6 Introduction There has been an increasing interest in identifying network dynamics and topologies in the emerging scientific discipline of network science (e. [sent-18, score-0.264]
7 If the topology and evolution rules of the network are known, we can analyze the … [sent-24, score-0.164]
8 Similarly, the modeling and estimation of dynamical networks are of great importance for various domains including stock market (Mills and Markellos, 2008), brain network (Bullmore and Sporns, 2009) and social network (Hanneke et al. [sent-26, score-0.359]
9 In practice, we can obtain discrete observations of the network over a period of time, which can usually be modeled by multivariate time series from a statistical perspective. [sent-29, score-0.224]
10 These multivariate time series contain important information about the network topology and dynamics. [sent-31, score-0.187]
11 In estimating the transition matrix, one must bear in mind two key objectives: first, the estimate fitted on the training data should provide accurate prediction; second, a sparse topology that illustrates the most prominent network connections is desired. [sent-37, score-0.23]
12 Nevertheless, the major problem of the ℓ1 penalty for dynamical network learning is its inability to handle collinearity, which typically exists in network data as a result of the interaction between nodes. [sent-41, score-0.379]
13 We propose a Berhu thresholding operator to efficiently solve the Berhu penalized problem. [sent-45, score-0.157]
14 In real-world problems, raw observations from a dynamical network are usually preprocessed and stationarized before the application of PML (Stock and Watson, 2012; Hsu et al. [sent-46, score-0.197]
15 Hence, our work, distinguished from the existing ones, focuses on enforcing the stationarity guarantee in network estimation and topology identification, which is of great importance but has never been properly addressed. [sent-50, score-0.34]
16 The stationarity condition of the VAR model is its spectral radius being smaller than one (Reinsel, 1997). [sent-51, score-0.176]
17 Hence, we use a convex relaxation and come up with a stationarity constrained PML problem. [sent-54, score-0.17]
18 We propose an efficient algorithm, the Berhu iterative sparsity pursuit with stationarity (BISPS), to achieve stationary-sparse (S2 ) network learning. [sent-55, score-0.379]
19 For a network with p nodes and n observations, the number of unknown variables in the transition matrix is p2 , and we frequently face practical data sets with p2 ≫ n, for example, microarray data sets consisting of thousands of genes but fewer than a hundred observations. [sent-60, score-0.175]
20 First, the quantile thresholding iterative screening (QTIS) is designed to “preselect” connections for the BISPS algorithm in a supervised manner. [sent-63, score-0.349]
21 QTIS differs from existing screening techniques such as the sure independence screening (Fan and Lv, 2008) in that it takes into account the collinearity in the data. [sent-64, score-0.329]
22 Secondly, we propose the stationary bootstrap enhanced BISPS (SB-BISPS). [sent-65, score-0.21]
23 Our work applies this powerful tool with stationarity guarantee to network identification and provides a confidence level for the occurrence of each possible connection in the network. [sent-67, score-0.288]
24 The remainder of the paper is organized as follows: Section 2 introduces the stationary and sparse network model and formulates the S2 learning framework. [sent-68, score-0.209]
25 Section 3 proposes our algorithms, mainly the Berhu iterative sparsity pursuit with stationarity (BISPS) and the quantile thresholding iterative screening (QTIS), and provides theoretical proofs for their convergence. [sent-69, score-0.571]
26 Section 4 describes the stationary bootstrap enhanced BISPS (SB-BISPS). [sent-70, score-0.21]
27 We are interested in characterizing the observations of x at different time points using mathematical models, based on which we can conduct useful network analysis. [sent-78, score-0.144]
28 In (1), the transition matrix B = [b_ij]_{1≤i,j≤p} describes a network that represents the dynamical system: if b_ij ≠ 0, there is a Granger causal connection (Granger, 1969) from node j to node i with weight b_ij. [sent-84, score-0.336]
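To make the model concrete, here is a minimal simulation sketch (illustrative, not taken from the paper) that assumes model (1) has the VAR(1) form x_t = B x_{t-1} + u_t with Gaussian noise; the specific matrix entries are placeholders chosen only for the example.

```python
# Minimal sketch (illustrative, not from the paper): simulate a small network of
# the assumed VAR(1) form x_t = B x_{t-1} + u_t and read the Granger causal
# edges off the support of the transition matrix B.
import numpy as np

rng = np.random.default_rng(0)
p, n = 6, 200                                  # nodes, observations

B = np.zeros((p, p))                           # sparse transition matrix
B[0, 1], B[2, 0], B[3, 5], B[4, 4] = 0.6, -0.5, 0.4, 0.3

x = np.zeros((p, n))
for t in range(1, n):
    x[:, t] = B @ x[:, t - 1] + 0.1 * rng.standard_normal(p)

# An edge from node j to node i exists exactly when B[i, j] is nonzero.
edges = [(j, i) for i in range(p) for j in range(p) if B[i, j] != 0]
print("Granger causal edges (j -> i):", edges)
```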
29 For example, consider a network with 6 nodes, shown in Figure 1. [sent-86, score-0.146]
30 Figure 1: Example of a network (1) with transition matrix (2). Panels: (a) causality network; (b) multivariate time series. [sent-94, score-0.323]
31 Therefore, matrix B not only illustrates the dynamic rules that govern the evolution of the system, but also captures a linear causality network that describes the (Granger) causal relations between nodes. [sent-111, score-0.151]
32 1 Sparse Network Learning by Penalized Maximum Likelihood Estimation Given n observations of the dynamical network x1 , · · · , xn , we wish to estimate B. [sent-113, score-0.197]
33 macroeconomic data (Stock and Watson, 2012), which are stationary after proper transformations. [sent-158, score-0.187]
34 Figure 2: Example of stationary and nonstationary processes. [sent-162, score-0.183]
35 In this paper, we propose the framework of stationary-sparse (S2) network learning to address the inability of PML to guarantee the stationarity of the network. [sent-177, score-0.266]
36 We invoke the stationarity condition and design an efficient algorithm to solve the optimization problem. [sent-178, score-0.146]
37 A random process as in (1) is stationary if and only if its spectral radius ρ(A) satisfies the stationarity condition: ρ(A) ≜ max_i |λ_i| < 1 (4), where λ_i is the ith eigenvalue of A, possibly complex, and |·| is the complex norm (Reinsel, 1997). [sent-179, score-0.265]
38 We consider a reasonable convex relaxation of (4) as the stationarity constraint: ‖A‖_2 ≜ max_i |ν_i| ≤ 1, where ν_i is the ith singular value of A and thus ‖A‖_2 is the spectral norm. [sent-189, score-0.225]
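Both quantities are easy to compute; the following short sketch (an illustration, not the paper's code) checks condition (4) and its convex relaxation for a candidate matrix, using the fact that ρ(A) ≤ ‖A‖_2, so the spectral-norm constraint is a conservative convex surrogate.

```python
# Illustrative check of the stationarity condition (4) and its convex relaxation.
import numpy as np

A = np.array([[0.5, 0.4, 0.0],
              [0.0, 0.3, 0.6],
              [0.2, 0.0, 0.1]])

spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))  # rho(A) = max_i |lambda_i|
spectral_norm = np.linalg.norm(A, 2)                     # ||A||_2 = largest singular value

print(f"rho(A)  = {spectral_radius:.3f}  (stationary iff < 1)")
print(f"||A||_2 = {spectral_norm:.3f}  (relaxed constraint: <= 1)")
```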
39 Note that the stationarity constraint also has a “shrinking” effect on the estimate, which contributes to the shrinkage estimation we are seeking, as discussed in Section 2. [sent-195, score-0.173]
40 Coherence is often observed in real-world network data, especially when some nodes have similar dynamical behaviors or strong mutual influence. [sent-199, score-0.199]
41 Hence, a more appropriate penalty is needed for S2 network learning. [sent-201, score-0.206]
42 Figure 4: Penalty functions and corresponding solutions. Top row: Lasso, ridge, eNet, and Berhu penalties; bottom row: the corresponding soft, ridge, eNet, and Berhu thresholding rules. [sent-211, score-0.208]
45 The stationarity constraint adds more difficulties to the problem. [sent-233, score-0.146]
46 Computation of BISPS In this section, we propose an algorithm named Berhu iterative sparsity pursuit with stationarity (BISPS) to effectively solve the S2 learning problem (8). [sent-237, score-0.259]
47 Finally, to facilitate BISPS for ultra-high dimensional problems, we propose the quantile thresholding iterative screening (QTIS). [sent-241, score-0.312]
48 … on a PC with 4GB of memory; when the number of network nodes is larger than 100, both solvers ran out of memory. [sent-257, score-0.146]
49 In this paper, we reparameterize Berhu and develop its coupled thresholding rule, which allows us to solve the Berhu sparsity pursuit—problem (3) with Berhu penalty—in a simple and efficient way. [sent-292, score-0.142]
50 It can be verified that, as shown in Lemma 3, Θ_B(·; λ, η) defined in (10) is the coupled thresholding rule for the Berhu penalty P_B(·; λ, η): P_B(t; λ, η) = ∫_0^{|t|} (sup{s : Θ_B(s; λ, η) ≤ u} − u) du. [sent-298, score-0.19]
51 With ΘB (·; λ, η), we can solve the Berhu sparsity pursuit (without the stationarity constraint) using a simple iterative procedure: Ak+1 = ΘB (Ak + αk (ΣXY − ΣXX Ak ); λ, η). [sent-300, score-0.259]
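A hedged sketch of this iteration is given below. The piecewise zero/soft/ridge form used for Θ_B is an assumption consistent with the later description of Θ_B as a fusion of soft and ridge thresholding (by continuity the branches must meet at |t| = λ(1 + η)/η); the paper's exact parameterization in (10), the step-size rule, and the stopping criterion may differ.

```python
# Hedged sketch of the Berhu thresholding operator and the iterative update
# A <- Theta_B(A + alpha * (Sigma_XY - Sigma_XX A); lambda, eta).
# The piecewise form of Theta_B below is an assumption (see the lead-in text).
import numpy as np

def berhu_threshold(T, lam, eta):
    absT, out = np.abs(T), np.zeros_like(T)
    soft = (absT > lam) & (absT <= lam * (1 + eta) / eta)   # soft-thresholding branch
    ridge = absT > lam * (1 + eta) / eta                    # ridge-shrinkage branch
    out[soft] = np.sign(T[soft]) * (absT[soft] - lam)
    out[ridge] = T[ridge] / (1 + eta)
    return out

def berhu_sparsity_pursuit(X, Y, lam=0.1, eta=1.0, n_iter=500):
    """X: lagged observations (n x p), Y: current observations (n x p)."""
    n = X.shape[0]
    Sxx, Sxy = X.T @ X / n, X.T @ Y / n
    alpha = 0.9 / np.linalg.norm(Sxx, 2)    # step size kept below 1/||Sxx||_2 for stability
    A = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        A = berhu_threshold(A + alpha * (Sxy - Sxx @ A), lam, eta)
    return A
```

In practice the sparsity level is controlled jointly by λ, η, and the step size, and the stationarity projection step of BISPS (not shown here) would be applied on top of this plain sparsity pursuit.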
52 It is worth pointing out that, based on the construction rule (13), we can also define the thresholding operators for other penalties including Lasso, eNet and the ridge penalty, as shown in Figure 4. [sent-302, score-0.195]
53 The Berhu thresholding operator Θ_B(t; λ, η) offers a nonlinear fusion of the soft thresholding operator (coupled with Lasso) Θ_S(t; λ) = sgn(t)(|t| − λ)·1_{|t|≥λ} and the ridge thresholding operator Θ_R(t; η) = t/(1 + η). [sent-303, score-0.409]
54 For the difference between the Berhu thresholding and the eNet thresholding Θ_E(t; λ, η) = sgn(t)(|t| − λ)·1_{|t|≥λ}/(1 + η), see the discussion in Section 2. [sent-304, score-0.208]
55 1 is to enforce sparsity by Berhu thresholding and Step 2. [sent-320, score-0.142]
56 If it satisfies the stationarity condition (4), we accept and output this solution. [sent-342, score-0.146]
57 4 Quantile Thresholding Iterative Screening Nowadays, a great challenge for network identification and statistical learning comes from the large scale of the system. [sent-352, score-0.15]
58 For example, for a network with p = 1000 nodes, the number of variables in the transition matrix is as large as p^2 = 10^6, which poses a great challenge for any estimation algorithm in scalability and stability. [sent-353, score-0.179]
59 This idea can be adopted in network identification: if one is sure that the average number of connections for each node is much less than ⌈µn⌉ (say µ = 0.8) or the total number of connections in the network is much less than ⌈µpn⌉, one can first use fast screening techniques to select m = ⌈µpn⌉ candidate connections, and then apply BISPS restricted to the candidate connections for further selection and estimation. [sent-357, score-0.192] [sent-358, score-0.318]
61 If the screening technique can include all the true connections with high probability, dramatic computational gain can be attained with mild performance sacrifice. [sent-359, score-0.161]
62 Independence screening methods, such as the sure independence screening (SIS) (Fan and Lv, 2008) can be applied to preselect variables in a supervised manner. [sent-360, score-0.248]
63 In network settings, the nodes are interacting dynamically with each other, so there is usually high collinearity in the data. [sent-363, score-0.227]
64 To derive a new screening technique that can handle network data, we first observe that SIS corresponds to the first step of the iterative procedure in (11) with A^0 = 0 and hard thresholding Θ_H(t; λ) = t·1_{|t|≥λ} with a properly chosen λ. [sent-365, score-0.385]
65 We call this screening procedure the quantile thresholding iterative screening (QTIS). [sent-367, score-0.436]
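Based on this description, a rough sketch of QTIS is the gradient-style iteration started from A^0 = 0 in which the fixed hard threshold is replaced by a quantile threshold that keeps the m largest entries in magnitude; the iteration count and step size below are assumptions, not the paper's exact choices.

```python
# Hedged sketch of quantile thresholding iterative screening (QTIS): iterate the
# update from A = 0 and keep only the top-m entries by magnitude at each step,
# so that collinearity enters through Sigma_XX rather than marginal correlations.
import numpy as np

def qtis(X, Y, m, n_iter=20):
    """Return a boolean mask of m candidate connections; X, Y are (n x p) matrices."""
    n = X.shape[0]
    Sxx, Sxy = X.T @ X / n, X.T @ Y / n
    alpha = 0.9 / np.linalg.norm(Sxx, 2)
    A = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = A + alpha * (Sxy - Sxx @ A)
        cutoff = np.sort(np.abs(G), axis=None)[-m]      # m-th largest magnitude
        A = np.where(np.abs(G) >= cutoff, G, 0.0)       # quantile (top-m) hard thresholding
    return np.abs(A) > 0
```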
66 In this section, we propose the stationary bootstrap enhanced BISPS (SB-BISPS) which provides a confidence level about whether a connection exists in the network by measuring the frequency with which it is chosen by the BISPS algorithm. [sent-409, score-0.352]
67 1 The SB-BISPS Framework The SB-BISPS procedure completes the BISPS (or QTIS+BISPS) algorithm with a stationary bootstrap resampling step. [sent-411, score-0.207]
68 • Step 2: Draw B stationary bootstrap samples from X. [sent-415, score-0.183]
69 For example, if fi j = 80%, it means that in 80% of the bootstrap samples, a connection from node i to node j is detected. [sent-420, score-0.186]
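Computing these frequencies is straightforward once the bootstrap estimates are available; the sketch below (illustrative, with an assumed numerical tolerance for zero entries) counts how often each entry of the transition matrix is selected across replicates.

```python
# Illustrative computation of connection occurring frequencies from a stack of
# bootstrap estimates of the transition matrix (shape B x p x p).
import numpy as np

def connection_frequency(estimates, tol=1e-8):
    detected = np.abs(np.asarray(estimates)) > tol   # nonzero entry = detected connection
    return detected.mean(axis=0)                     # frequency in [0, 1] for each entry

reps = [np.array([[0.0, 0.5], [0.0, 0.3]]),
        np.array([[0.0, 0.4], [0.1, 0.0]]),
        np.array([[0.0, 0.6], [0.0, 0.2]])]
print(connection_frequency(reps))    # e.g., entry (0, 1) is detected in 3/3 replicates
```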
70 The stationary bootstrap (Politis and Romano, 1994) is a method with this property. [sent-431, score-0.183]
71 Fortunately, the sensitivity of γ in stationary bootstrap is less than that of l in block bootstrap. [sent-436, score-0.183]
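For reference, the stationary bootstrap resamples the series in blocks whose lengths are geometrically distributed with mean 1/γ and whose start points are uniform, wrapping around the end of the series; the sketch below generates one set of resampling indices (variable names are my own, and in practice a routine such as the tsboot function mentioned later can be used instead).

```python
# Illustrative sketch of stationary bootstrap resampling (Politis and Romano, 1994).
import numpy as np

def stationary_bootstrap_indices(n, gamma, rng):
    """Indices of one length-n resample; expected block length is 1/gamma."""
    idx = np.empty(n, dtype=int)
    t = rng.integers(n)                  # start of the first block
    for k in range(n):
        idx[k] = t
        if rng.random() < gamma:         # with probability gamma, start a new block
            t = rng.integers(n)
        else:                            # otherwise continue the current block, circularly
            t = (t + 1) % n
    return idx

rng = np.random.default_rng(1)
series = np.sin(np.arange(100) / 5.0)
resample = series[stationary_bootstrap_indices(len(series), gamma=0.1, rng=rng)]
```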
72 • Stationarity violation percentage (Pv): In N repeated experiments, if there are Nv experiments in which the estimate Â violates the stationarity condition (4), then the stationarity violation percentage is defined as Pv = Nv/N. [sent-441, score-0.292]
73 Then we use this estimate to forecast x_{t+h}, denoting the forecast as x̂_{t+h} and the forecasting error as e_t^h = ‖x̂_{t+h} − x_{t+h}‖_2. [sent-458, score-0.15]
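For a VAR(1) estimate Â, the h-step forecast from time t is Â^h x_t, so the rolling forecast error can be computed as in the sketch below (an illustration of the metric, not the paper's evaluation code); a nonstationary Â makes Â^h, and hence the error, blow up as h grows.

```python
# Illustrative h-step forecasting and rolling forecast error for a VAR(1) estimate.
import numpy as np

def rolling_mse(A_hat, x, h):
    """x: observations of shape (p, n); returns the average squared h-step forecast error."""
    p, n = x.shape
    Ah = np.linalg.matrix_power(A_hat, h)            # h-step forecast operator
    errs = [np.sum((Ah @ x[:, t] - x[:, t + h]) ** 2) for t in range(n - h)]
    return float(np.mean(errs))
```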
74 We generate the p × p transition matrix A with both sparsity and stationarity properties. [sent-466, score-0.213]
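One common recipe for such a construction (assumed here; the paper's exact generation scheme is not reproduced) is to draw a sparse random matrix and rescale it so that its spectral radius sits below one.

```python
# Assumed recipe: generate a sparse random transition matrix and rescale it so
# that its spectral radius equals a target value below 1 (hence stationary).
import numpy as np

def sparse_stationary_matrix(p, density=0.05, target_rho=0.8, rng=None):
    rng = rng or np.random.default_rng()
    A = rng.standard_normal((p, p)) * (rng.random((p, p)) < density)
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if rho > 0:
        A *= target_rho / rho          # rescaling sets rho(A) to target_rho < 1
    return A

A = sparse_stationary_matrix(p=50, rng=np.random.default_rng(2))
print(np.max(np.abs(np.linalg.eigvals(A))))   # approximately 0.8
```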
75 For example, for the network with 300 nodes, the number of parameters to be estimated is 9 × 10^4, which is extremely high compared with the number of observations (80). [sent-485, score-0.144]
76 As shown by Pv, no matter what penalty is used, it is possible for PML to give a nonstationary estimate, whereas the proposed S2 learning and BISPS can guarantee the stationarity of Â. [sent-493, score-0.326]
77 This indicates that adding the stationarity constraint into the sparsity pursuit does effectively prevent the estimate from becoming nonstationary. [sent-494, score-0.222]
78 Table 1 gives the rolling MSEs for different horizons h to illustrate both the short term and long term forecasting performance. [sent-496, score-0.152]
79 For PML estimates, the rolling MSEs grow explosively with h due to the existence of nonstationary estimates, while those of BISPS accumulate much more slowly. [sent-497, score-0.208]
80 To further illustrate the disadvantages of a nonstationary estimate, we find one run where Lasso gives a nonstationary estimate Â_Lasso. [sent-498, score-0.188]
81 3 Performance of QTIS To examine the performance of QTIS for connection screening, we first compare it with the sure independence screening (SIS) (Fan and Lv, 2008) by examining their ability to include all the true connections, which can be measured by the miss rate. [sent-567, score-0.197]
82 As a result, independence screening is not appropriate for network learning. [sent-574, score-0.244]
83 Top: sample from the true model; Middle: forecast from the nonstationary estimate Â_Lasso; Bottom: forecast from the BISPS estimate Â_BISPS. [sent-588, score-0.206]
84 We use the S2 network model (1) to detect Granger causal relations between these macroeconomic indices. [sent-667, score-0.26]
85 Table 2 shows the rolling MSEs of the Lasso and BISPS, normalized by that of the AR(4) model, which is a conventional benchmark of macroeconomic forecasting. [sent-685, score-0.212]
86 Therefore, by introducing the Granger causal interactions between different indices, we can build a multivariate network model that is more accurate than the univariate AR model in capturing the evolution of the U. [sent-687, score-0.162]
87 It indicates that adopting a fusion of ℓ1 and ℓ2 penalties and imposing the stationarity constraint can capture the network dynamics more accurately and achieve a stronger capability of forecasting. [sent-692, score-0.304]
88 As the horizon increases, the rolling MSE of Lasso grows exponentially, which clearly indicates that some estimates of the Lasso are nonstationary and thus fail to forecast for large horizons. [sent-696, score-0.264]
89 The stationary bootstrap samples are obtained using the R function tsboot (Dalgaard, 2008) with default parameter values. [sent-707, score-0.183]
90 The number of stationary bootstrap samples is set to be B = 100. [sent-708, score-0.183]
91 Distinguished from the existing works, we explicitly incorporated the stationarity concern in a possibly ultra-high dimensional scenario and provided a probabilistic measure for the occurrence of any causal connection. [sent-747, score-0.188]
92 We added a relaxed stationarity constraint in the penalized maximum likelihood estimation and proposed the BISPS algorithm which is easy to implement and computationally efficient. [sent-748, score-0.177]
93 We must point out that although the algorithm is designed for the Berhu penalty, the framework extends to any convex penalties and their coupled thresholding rules. [sent-749, score-0.166]
94 In network modeling, the number of unknown variables p2 is often much larger than the number of observations n, which confronts us with an ultra-high dimensional problem. [sent-750, score-0.144]
95 Therefore, we implanted the quantile thresholding iterative screening (QTIS) into the BISPS algorithm to improve scalability and computational efficiency. [sent-751, score-0.312]
96 Furthermore, the stationary bootstrap enhanced BISPS (SB-BISPS) was proposed to provide a confidence measure for each possible connection in the network. [sent-752, score-0.232]
97 Finally, we will proceed to nonlinear time series models for network identification to handle more complex network data. [sent-760, score-0.263]
98 Lemma 3 Given the Berhu penalty (9), the minimization problem min_B (1/2)‖B − Ξ‖_F^2 + P_B(B; λ, η) (12) has a unique optimal solution given by B̂ = Θ_B(Ξ; λ, η), where Θ_B(·; λ, η) is the Berhu thresholding rule defined in (10). [sent-766, score-0.19]
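Because the objective in (12) separates over the entries of B, the lemma can be illustrated numerically in the scalar case: build P_B from Θ_B via the integral construction quoted earlier, then check that the grid minimizer of (1/2)(b − ξ)^2 + P_B(b) matches Θ_B(ξ). The piecewise zero/soft/ridge form of Θ_B used below is the same assumption made in the earlier sketch, so this is an illustration of the mechanism rather than a verification of the paper's exact rule.

```python
# Scalar illustration of Lemma 3: P_B is built from Theta_B via
# P_B(t) = integral_0^{|t|} (sup{s: Theta_B(s) <= u} - u) du, and the minimizer of
# 0.5*(b - xi)^2 + P_B(b) should coincide with Theta_B(xi).
import numpy as np

lam, eta = 0.5, 1.0

def theta_b(t):
    a = abs(t)
    if a <= lam:
        return 0.0
    if a <= lam * (1 + eta) / eta:
        return np.sign(t) * (a - lam)
    return t / (1 + eta)

# Tabulate Theta_B on s >= 0 (it is nondecreasing there) and build P_B on a grid of |t|.
s_grid = np.linspace(0.0, 10.0, 10001)
theta_grid = np.array([theta_b(s) for s in s_grid])
u_grid = np.linspace(0.0, 3.0, 3001)
sup_s = s_grid[np.searchsorted(theta_grid, u_grid, side="right") - 1]
f = sup_s - u_grid
P_vals = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(u_grid))))

xi = 1.3
b_grid = np.linspace(-3.0, 3.0, 6001)
objective = 0.5 * (b_grid - xi) ** 2 + np.interp(np.abs(b_grid), u_grid, P_vals)
print(b_grid[np.argmin(objective)], theta_b(xi))   # both values are approximately 0.65
```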
99 Proof of Theorem 2 First, we introduce a quantile thresholding rule Θ# (·; m) as a variant of the hard thresholding rule. [sent-867, score-0.255]
100 The jackknife and the bootstrap for general stationary observations. [sent-1128, score-0.183]
wordName wordTfidf (topN-words)
[('bisps', 0.545), ('berhu', 0.431), ('qtis', 0.228), ('moderation', 0.211), ('ak', 0.193), ('stationarity', 0.146), ('enet', 0.138), ('pml', 0.138), ('screening', 0.124), ('ausality', 0.122), ('etwork', 0.122), ('network', 0.12), ('pb', 0.115), ('rolling', 0.114), ('thresholding', 0.104), ('lasso', 0.104), ('macroeconomic', 0.098), ('nonstationary', 0.094), ('bootstrap', 0.094), ('stationary', 0.089), ('penalty', 0.086), ('collinearity', 0.081), ('nonexpansive', 0.081), ('granger', 0.077), ('sis', 0.069), ('parse', 0.066), ('period', 0.057), ('akj', 0.057), ('forecast', 0.056), ('dynamical', 0.053), ('ridge', 0.053), ('xt', 0.053), ('miss', 0.051), ('aml', 0.049), ('scv', 0.049), ('quantile', 0.047), ('fan', 0.044), ('topology', 0.044), ('ml', 0.043), ('sgn', 0.043), ('causal', 0.042), ('earning', 0.042), ('cof', 0.041), ('overton', 0.041), ('sparsity', 0.038), ('pursuit', 0.038), ('penalties', 0.038), ('economic', 0.038), ('forecasting', 0.038), ('admm', 0.038), ('fb', 0.038), ('iterative', 0.037), ('connections', 0.037), ('xx', 0.037), ('stock', 0.036), ('xa', 0.035), ('node', 0.035), ('zou', 0.033), ('alasso', 0.033), ('bxt', 0.033), ('xy', 0.032), ('causality', 0.031), ('lv', 0.031), ('ty', 0.031), ('penalized', 0.031), ('mse', 0.03), ('great', 0.03), ('spectral', 0.03), ('subgradient', 0.029), ('transition', 0.029), ('regulation', 0.028), ('mses', 0.028), ('huber', 0.027), ('enhanced', 0.027), ('shrinkage', 0.027), ('nonconvex', 0.026), ('nodes', 0.026), ('singular', 0.025), ('tu', 0.025), ('pv', 0.025), ('browder', 0.024), ('curtis', 0.024), ('hoerl', 0.024), ('opial', 0.024), ('pf', 0.024), ('psgm', 0.024), ('topologies', 0.024), ('resampling', 0.024), ('observations', 0.024), ('convex', 0.024), ('series', 0.023), ('autoregressive', 0.023), ('lemma', 0.023), ('connection', 0.022), ('ai', 0.022), ('operator', 0.022), ('dv', 0.022), ('svd', 0.021), ('reinsel', 0.021), ('owen', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 106 jmlr-2013-Stationary-Sparse Causality Network Learning
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
2 0.095488034 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
Author: Tingni Sun, Cun-Hui Zhang
Abstract: We propose a new method of learning a sparse nonnegative-definite target matrix. Our primary example of the target matrix is the inverse of a population covariance or correlation matrix. The algorithm first estimates each column of the target matrix by the scaled Lasso and then adjusts the matrix estimator to be symmetric. The penalty level of the scaled Lasso for each column is completely determined by data via convex minimization, without using cross-validation. We prove that this scaled Lasso method guarantees the fastest proven rate of convergence in the spectrum norm under conditions of weaker form than those in the existing analyses of other ℓ1 regularized algorithms, and has faster guaranteed rate of convergence when the ratio of the ℓ1 and spectrum norms of the target inverse matrix diverges to infinity. A simulation study demonstrates the computational feasibility and superb performance of the proposed method. Our analysis also provides new performance bounds for the Lasso and scaled Lasso to guarantee higher concentration of the error at a smaller threshold level than previous analyses, and to allow the use of the union bound in column-by-column applications of the scaled Lasso without an adjustment of the penalty level. In addition, the least squares estimation after the scaled Lasso selection is considered and proven to guarantee performance bounds similar to that of the scaled Lasso. Keywords: precision matrix, concentration matrix, inverse matrix, graphical model, scaled Lasso, linear regression, spectrum norm
3 0.071561329 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
Author: Julien Mairal, Bin Yu
Abstract: We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs. Keywords: convex and non-convex optimization, network flow optimization, graph sparsity
4 0.062352385 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability
Author: Wei Sun, Junhui Wang, Yixin Fang
Abstract: Penalized regression models are popularly used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in literature, their performances largely depend on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the estimated prediction error or maximizing the posterior model probability, such as cross validation, AIC and BIC. This article introduces a general tuning parameter selection criterion based on variable selection stability. The key idea is to select the tuning parameters so that the resultant penalized regression model is stable in variable selection. The asymptotic selection consistency is established for both fixed and diverging dimensions. Its effectiveness is also demonstrated in a variety of simulated examples as well as an application to the prostate cancer data. Keywords: kappa coefficient, penalized regression, selection consistency, stability, tuning
Author: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto
Abstract: We introduce a new class of quadratic support (QS) functions, many of which already play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and inverse problems such as Kalman smoothing. Well known examples of QS penalties include the ℓ2 , Huber, ℓ1 and Vapnik losses. We build on a dual representation for QS functions, using it to characterize conditions necessary to interpret these functions as negative logs of true probability densities. This interpretation establishes the foundation for statistical modeling with both known and new QS loss functions, and enables construction of non-smooth multivariate distributions with specified means and variances from simple scalar building blocks. The main contribution of this paper is a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for estimation. In particular, a broad subclass of QS loss functions known as piecewise linear quadratic (PLQ) penalties has a dual representation that can be exploited to design interior point (IP) methods. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. We provide several numerical examples, along with a code that can be used to solve general PLQ problems. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing. This class comprises a wide variety of applications, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaus∗. The authors would like to thank Bradley M. Bell for insightful discussions and helpful suggestions. The research leading to these results has received funding from the European Union Seventh Framework Programme [FP7/2007-2013
6 0.046193648 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
7 0.039217707 76 jmlr-2013-Nonparametric Sparsity and Regularization
8 0.037764251 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
9 0.03726868 120 jmlr-2013-Variational Algorithms for Marginal MAP
10 0.036664441 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
11 0.035760909 104 jmlr-2013-Sparse Single-Index Model
12 0.034770623 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
13 0.032931965 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
14 0.031978197 53 jmlr-2013-Improving CUR Matrix Decomposition and the Nystrom Approximation via Adaptive Sampling
15 0.031839725 82 jmlr-2013-Optimally Fuzzy Temporal Memory
16 0.031575631 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
17 0.031484406 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
18 0.029857771 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
19 0.029616334 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
20 0.028901737 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models
topicId topicWeight
[(0, -0.161), (1, 0.024), (2, 0.004), (3, 0.062), (4, 0.118), (5, 0.011), (6, -0.004), (7, -0.015), (8, 0.175), (9, 0.072), (10, 0.013), (11, -0.076), (12, -0.136), (13, 0.006), (14, 0.004), (15, -0.123), (16, 0.013), (17, 0.139), (18, 0.088), (19, 0.105), (20, -0.022), (21, -0.066), (22, 0.036), (23, -0.017), (24, -0.049), (25, -0.034), (26, -0.078), (27, 0.025), (28, 0.105), (29, -0.015), (30, -0.041), (31, -0.027), (32, 0.109), (33, 0.034), (34, 0.152), (35, -0.01), (36, 0.036), (37, -0.034), (38, -0.033), (39, 0.209), (40, -0.004), (41, 0.165), (42, 0.061), (43, -0.132), (44, 0.012), (45, -0.045), (46, -0.07), (47, 0.257), (48, -0.13), (49, -0.094)]
simIndex simValue paperId paperTitle
same-paper 1 0.92340732 106 jmlr-2013-Stationary-Sparse Causality Network Learning
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
2 0.43053347 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
Author: Tingni Sun, Cun-Hui Zhang
Abstract: We propose a new method of learning a sparse nonnegative-definite target matrix. Our primary example of the target matrix is the inverse of a population covariance or correlation matrix. The algorithm first estimates each column of the target matrix by the scaled Lasso and then adjusts the matrix estimator to be symmetric. The penalty level of the scaled Lasso for each column is completely determined by data via convex minimization, without using cross-validation. We prove that this scaled Lasso method guarantees the fastest proven rate of convergence in the spectrum norm under conditions of weaker form than those in the existing analyses of other ℓ1 regularized algorithms, and has faster guaranteed rate of convergence when the ratio of the ℓ1 and spectrum norms of the target inverse matrix diverges to infinity. A simulation study demonstrates the computational feasibility and superb performance of the proposed method. Our analysis also provides new performance bounds for the Lasso and scaled Lasso to guarantee higher concentration of the error at a smaller threshold level than previous analyses, and to allow the use of the union bound in column-by-column applications of the scaled Lasso without an adjustment of the penalty level. In addition, the least squares estimation after the scaled Lasso selection is considered and proven to guarantee performance bounds similar to that of the scaled Lasso. Keywords: precision matrix, concentration matrix, inverse matrix, graphical model, scaled Lasso, linear regression, spectrum norm
3 0.38845536 95 jmlr-2013-Ranking Forests
Author: Stéphan Clémençon, Marine Depecker, Nicolas Vayatis
Abstract: The present paper examines how the aggregation and feature randomization principles underlying the algorithm R ANDOM F OREST (Breiman, 2001) can be adapted to bipartite ranking. The approach taken here is based on nonparametric scoring and ROC curve optimization in the sense of the AUC criterion. In this problem, aggregation is used to increase the performance of scoring rules produced by ranking trees, as those developed in Cl´ mencon and Vayatis (2009c). The present work e ¸ describes the principles for building median scoring rules based on concepts from rank aggregation. Consistency results are derived for these aggregated scoring rules and an algorithm called R ANK ING F OREST is presented. Furthermore, various strategies for feature randomization are explored through a series of numerical experiments on artificial data sets. Keywords: bipartite ranking, nonparametric scoring, classification data, ROC optimization, AUC criterion, tree-based ranking rules, bootstrap, bagging, rank aggregation, median ranking, feature randomization
4 0.37656596 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
Author: Julien Mairal, Bin Yu
Abstract: We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs. Keywords: convex and non-convex optimization, network flow optimization, graph sparsity
5 0.36362305 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
Author: Sohail Bahmani, Bhiksha Raj, Petros T. Boufounos
Abstract: Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and Compressed Sensing. A vast body of work has studied the sparsity-constrained optimization from theoretical, algorithmic, and application aspects in the context of sparse estimation in linear models where the fidelity of the estimate is measured by the squared error. In contrast, relatively less effort has been made in the study of sparsityconstrained optimization in cases where nonlinear models are involved or the cost function is not quadratic. In this paper we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. Should a cost function have a Stable Restricted Hessian (SRH) or a Stable Restricted Linearization (SRL), both of which are introduced in this paper, our algorithm is guaranteed to produce a sparse vector within a bounded distance from the true sparse optimum. Our approach generalizes known results for quadratic cost functions that arise in sparse linear regression and Compressed Sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic and real data, where the algorithm is employed for sparse logistic regression with and without ℓ2 -regularization. Keywords: sparsity, optimization, compressed sensing, greedy algorithm
6 0.33733943 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
7 0.32932952 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
8 0.3250736 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability
10 0.28020003 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
12 0.2785832 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
13 0.26520842 82 jmlr-2013-Optimally Fuzzy Temporal Memory
14 0.25670347 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
15 0.24415012 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models
16 0.23080246 76 jmlr-2013-Nonparametric Sparsity and Regularization
17 0.20312554 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
18 0.20148149 41 jmlr-2013-Experiment Selection for Causal Discovery
19 0.19982748 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
20 0.19551516 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
topicId topicWeight
[(0, 0.021), (5, 0.087), (6, 0.527), (10, 0.049), (20, 0.021), (23, 0.034), (41, 0.016), (44, 0.012), (68, 0.035), (70, 0.025), (75, 0.034), (85, 0.017), (87, 0.029), (93, 0.011)]
simIndex simValue paperId paperTitle
1 0.94769937 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization
Author: Nicholas Ruozzi, Sekhar Tatikonda
Abstract: Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean (and variances) of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite covariance matrix. As was observed by Malioutov et al. (2006), the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges. Keywords: belief propagation, Gaussian graphical models, graph covers
2 0.91191471 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model
Author: Dan Stowell, Mark D. Plumbley
Abstract: We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available. Keywords: multi-target tracking, clustering, point processes, flow network, sound
same-paper 3 0.8794592 106 jmlr-2013-Stationary-Sparse Causality Network Learning
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
4 0.54371232 82 jmlr-2013-Optimally Fuzzy Temporal Memory
Author: Karthik H. Shankar, Marc W. Howard
Abstract: Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register—a moving window extending into the past, requiring storage resources that linearly grows with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system. Keywords: temporal information compression, forecasting long range correlated time series
5 0.53363949 120 jmlr-2013-Variational Algorithms for Marginal MAP
Author: Qiang Liu, Alexander Ihler
Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models
6 0.48814434 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs
7 0.46964154 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
8 0.4662334 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
9 0.45582658 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
10 0.45085984 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
11 0.43744352 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies
12 0.43235943 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
13 0.42155987 22 jmlr-2013-Classifying With Confidence From Incomplete Information
14 0.41319433 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
16 0.40311822 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
17 0.39430135 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
18 0.39266121 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
20 0.38839301 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs