jmlr jmlr2013 jmlr2013-106 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611-6130. Editor: Hui Zou. Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. [sent-6, score-0.248]
2 In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. [sent-9, score-0.284]
3 We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. [sent-12, score-0.259]
4 Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. [sent-15, score-0.249]
5 Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. [sent-16, score-0.24]
6 Introduction There has been an increasing interest in identifying network dynamics and topologies in the emerging scientific discipline of network science (e. [sent-18, score-0.264]
7 If the topology and evolution rules of the network are known, we can analyze the … [sent-24, score-0.164]
8 Similarly, the modeling and estimation of dynamical networks are of great importance for various domains including stock market (Mills and Markellos, 2008), brain network (Bullmore and Sporns, 2009) and social network (Hanneke et al. [sent-26, score-0.359]
9 In practice, we can obtain discrete observations of the network over a period of time, which can usually be modeled by multivariate time series from a statistical perspective. [sent-29, score-0.224]
10 These multivariate time series contain important information about the network topology and dynamics. [sent-31, score-0.187]
11 In estimating the transition matrix, one must bear in mind two key objectives: first, the estimate fitted on the training data should provide accurate prediction; second, a sparse topology that illustrates the most prominent network connections is desired. [sent-37, score-0.23]
12 Nevertheless, the major problem of the ℓ1 penalty for dynamical network learning is its inability to handle collinearity, which typically exists in network data as a result of the interaction between nodes. [sent-41, score-0.379]
13 We propose a Berhu thresholding operator to efficiently solve the Berhu penalized problem. [sent-45, score-0.157]
14 In real-world problems, raw observations from a dynamical network are usually preprocessed and stationarized before the application of PML (Stock and Watson, 2012; Hsu et al. [sent-46, score-0.197]
15 Hence, our work, distinguished from the existing ones, focuses on enforcing the stationarity guarantee in network estimation and topology identification, which is of great importance but has never been properly addressed. [sent-50, score-0.34]
16 The stationarity condition of the VAR model is its spectral radius being smaller than one (Reinsel, 1997). [sent-51, score-0.176]
17 Hence, we use a convex relaxation and come up with a stationarity constrained PML problem. [sent-54, score-0.17]
18 We propose an efficient algorithm, the Berhu iterative sparsity pursuit with stationarity (BISPS), to achieve stationary-sparse (S2 ) network learning. [sent-55, score-0.379]
19 For a network with p nodes and n observations, the number of unknown variables in the transition matrix is p2 , and we frequently face practical data sets with p2 ≫ n, for example, microarray data sets consisting of thousands of genes but fewer than a hundred observations. [sent-60, score-0.175]
20 First, the quantile thresholding iterative screening (QTIS) is designed to “preselect” connections for the BISPS algorithm in a supervised manner. [sent-63, score-0.349]
21 QTIS differs from existing screening techniques such as the sure independence screening (Fan and Lv, 2008) in that it takes into account the collinearity in the data. [sent-64, score-0.329]
22 Secondly, we propose the stationary bootstrap enhanced BISPS (SB-BISPS). [sent-65, score-0.21]
23 Our work applies this powerful tool with stationarity guarantee to network identification and provides a confidence level for the occurrence of each possible connection in the network. [sent-67, score-0.288]
24 The remainder of the paper is organized as follows: Section 2 introduces the stationary and sparse network model and formulates the S2 learning framework. [sent-68, score-0.209]
25 Section 3 proposes our algorithms, mainly the Berhu iterative sparsity pursuit with stationarity (BISPS) and the quantile thresholding iterative screening (QTIS), and provides theoretical proofs for their convergence. [sent-69, score-0.571]
26 Section 4 describes the stationary bootstrap enhanced BISPS (SB-BISPS). [sent-70, score-0.21]
27 We are interested in characterizing the observations of x at different time points using mathematical models, based on which we can conduct useful network analysis. [sent-78, score-0.144]
28 In (1), the transition matrix B = [b_ij]_{1≤i,j≤p} describes a network that represents the dynamical system: if b_ij ≠ 0, there is a Granger causal connection (Granger, 1969) from node j to node i with weight b_ij. [sent-84, score-0.336]
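To make the model concrete, here is a minimal simulation sketch (illustrative, not taken from the paper) that assumes model (1) has the VAR(1) form x_t = B x_{t-1} + u_t with Gaussian noise; the specific matrix entries are placeholders chosen only for the example.

```python
# Minimal sketch (illustrative, not from the paper): simulate a small network of
# the assumed VAR(1) form x_t = B x_{t-1} + u_t and read the Granger causal
# edges off the support of the transition matrix B.
import numpy as np

rng = np.random.default_rng(0)
p, n = 6, 200                                  # nodes, observations

B = np.zeros((p, p))                           # sparse transition matrix
B[0, 1], B[2, 0], B[3, 5], B[4, 4] = 0.6, -0.5, 0.4, 0.3

x = np.zeros((p, n))
for t in range(1, n):
    x[:, t] = B @ x[:, t - 1] + 0.1 * rng.standard_normal(p)

# An edge from node j to node i exists exactly when B[i, j] is nonzero.
edges = [(j, i) for i in range(p) for j in range(p) if B[i, j] != 0]
print("Granger causal edges (j -> i):", edges)
```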
29 For example, consider a network with 6 nodes, shown in Figure 1. [sent-86, score-0.146]
30 Figure 1: Example of a network (1) with transition matrix (2). Panels: (a) causality network; (b) multivariate time series. [sent-94, score-0.323]
31 Therefore, matrix B not only illustrates the dynamic rules that govern the evolution of the system, but also captures a linear causality network that describes the (Granger) causal relations between nodes. [sent-111, score-0.151]
32 1 Sparse Network Learning by Penalized Maximum Likelihood Estimation Given n observations of the dynamical network x1 , · · · , xn , we wish to estimate B. [sent-113, score-0.197]
33 macroeconomic data (Stock and Watson, 2012), which are stationary after proper transformations. [sent-158, score-0.187]
34 Figure 2: Example of stationary and nonstationary processes. [sent-162, score-0.183]
35 In this paper, we propose the framework of stationary-sparse (S2) network learning to address the inability of PML to guarantee the stationarity of the network. [sent-177, score-0.266]
36 We invoke the stationarity condition and design an efficient algorithm to solve the optimization problem. [sent-178, score-0.146]
37 A random process as in (1) is stationary if and only if its spectral radius ρ(A) satisfies the stationarity condition: ρ(A) ≜ max_i |λ_i| < 1 (4), where λ_i is the ith eigenvalue of A, possibly complex, and |·| is the complex norm (Reinsel, 1997). [sent-179, score-0.265]
38 We consider a reasonable convex relaxation of (4) as the stationarity constraint: ‖A‖_2 ≜ max_i |ν_i| ≤ 1, where ν_i is the ith singular value of A and thus ‖A‖_2 is the spectral norm. [sent-189, score-0.225]
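Both quantities are easy to compute; the following short sketch (an illustration, not the paper's code) checks condition (4) and its convex relaxation for a candidate matrix, using the fact that ρ(A) ≤ ‖A‖_2, so the spectral-norm constraint is a conservative convex surrogate.

```python
# Illustrative check of the stationarity condition (4) and its convex relaxation.
import numpy as np

A = np.array([[0.5, 0.4, 0.0],
              [0.0, 0.3, 0.6],
              [0.2, 0.0, 0.1]])

spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))  # rho(A) = max_i |lambda_i|
spectral_norm = np.linalg.norm(A, 2)                     # ||A||_2 = largest singular value

print(f"rho(A)  = {spectral_radius:.3f}  (stationary iff < 1)")
print(f"||A||_2 = {spectral_norm:.3f}  (relaxed constraint: <= 1)")
```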
39 Note that the stationarity constraint also has a “shrinking” effect on the estimate, which contributes to the shrinkage estimation we are seeking, as discussed in Section 2. [sent-195, score-0.173]
40 Coherence is often observed in real-world network data, especially when some nodes have similar dynamical behaviors or strong mutual influence. [sent-199, score-0.199]
41 Hence, a more appropriate penalty is needed for S2 network learning. [sent-201, score-0.206]
42 Figure 4: Penalty functions and corresponding solutions. Top row: Lasso, ridge, eNet, and Berhu penalties; bottom row: the corresponding soft, ridge, eNet, and Berhu thresholding rules. [sent-211, score-0.208]
45 The stationarity constraint adds more difficulties to the problem. [sent-233, score-0.146]
46 Computation of BISPS In this section, we propose an algorithm named Berhu iterative sparsity pursuit with stationarity (BISPS) to effectively solve the S2 learning problem (8). [sent-237, score-0.259]
47 Finally, to facilitate BISPS for ultra-high dimensional problems, we propose the quantile thresholding iterative screening (QTIS). [sent-241, score-0.312]
48 … on a PC with 4GB of memory; when the number of network nodes is larger than 100, both solvers ran out of memory. [sent-257, score-0.146]
49 In this paper, we reparameterize Berhu and develop its coupled thresholding rule, which allows us to solve the Berhu sparsity pursuit—problem (3) with Berhu penalty—in a simple and efficient way. [sent-292, score-0.142]
50 It can be verified that, as shown in Lemma 3, Θ_B(·; λ, η) defined in (10) is the coupled thresholding rule for the Berhu penalty P_B(·; λ, η): P_B(t; λ, η) = ∫_0^{|t|} (sup{s : Θ_B(s; λ, η) ≤ u} − u) du. [sent-298, score-0.19]
51 With ΘB (·; λ, η), we can solve the Berhu sparsity pursuit (without the stationarity constraint) using a simple iterative procedure: Ak+1 = ΘB (Ak + αk (ΣXY − ΣXX Ak ); λ, η). [sent-300, score-0.259]
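A hedged sketch of this iteration is given below. The piecewise zero/soft/ridge form used for Θ_B is an assumption consistent with the later description of Θ_B as a fusion of soft and ridge thresholding (by continuity the branches must meet at |t| = λ(1 + η)/η); the paper's exact parameterization in (10), the step-size rule, and the stopping criterion may differ.

```python
# Hedged sketch of the Berhu thresholding operator and the iterative update
# A <- Theta_B(A + alpha * (Sigma_XY - Sigma_XX A); lambda, eta).
# The piecewise form of Theta_B below is an assumption (see the lead-in text).
import numpy as np

def berhu_threshold(T, lam, eta):
    absT, out = np.abs(T), np.zeros_like(T)
    soft = (absT > lam) & (absT <= lam * (1 + eta) / eta)   # soft-thresholding branch
    ridge = absT > lam * (1 + eta) / eta                    # ridge-shrinkage branch
    out[soft] = np.sign(T[soft]) * (absT[soft] - lam)
    out[ridge] = T[ridge] / (1 + eta)
    return out

def berhu_sparsity_pursuit(X, Y, lam=0.1, eta=1.0, n_iter=500):
    """X: lagged observations (n x p), Y: current observations (n x p)."""
    n = X.shape[0]
    Sxx, Sxy = X.T @ X / n, X.T @ Y / n
    alpha = 0.9 / np.linalg.norm(Sxx, 2)    # step size kept below 1/||Sxx||_2 for stability
    A = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        A = berhu_threshold(A + alpha * (Sxy - Sxx @ A), lam, eta)
    return A
```

In practice the sparsity level is controlled jointly by λ, η, and the step size, and the stationarity projection step of BISPS (not shown here) would be applied on top of this plain sparsity pursuit.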
52 It is worth pointing out that, based on the construction rule (13), we can also define the thresholding operators for other penalties including Lasso, eNet and the ridge penalty, as shown in Figure 4. [sent-302, score-0.195]
53 The Berhu thresholding operator Θ_B(t; λ, η) offers a nonlinear fusion of the soft thresholding operator (coupled with Lasso) Θ_S(t; λ) = sgn(t)(|t| − λ)·1_{|t|≥λ} and the ridge thresholding operator Θ_R(t; η) = t/(1 + η). [sent-303, score-0.409]
54 For the difference between the Berhu thresholding and the eNet thresholding Θ_E(t; λ, η) = sgn(t)(|t| − λ)·1_{|t|≥λ}/(1 + η), see the discussion in Section 2. [sent-304, score-0.208]
55 1 is to enforce sparsity by Berhu thresholding and Step 2. [sent-320, score-0.142]
56 If it satisfies the stationarity condition (4), we accept and output this solution. [sent-342, score-0.146]
57 4 Quantile Thresholding Iterative Screening Nowadays, a great challenge for network identification and statistical learning comes from the large scale of the system. [sent-352, score-0.15]
58 For example, for a network with p = 1000 nodes, the number of variables in the transition matrix is as large as p^2 = 10^6, which poses a great challenge for any estimation algorithm in scalability and stability. [sent-353, score-0.179]
59 This idea can be adopted in network identification: if one is sure that the average number of connections for each node is much less than ⌈µn⌉ (say µ = 0.8) or the total number of connections in the network is much less than ⌈µpn⌉, one can first use fast screening techniques to select m = ⌈µpn⌉ candidate connections, and then apply BISPS restricted to the candidate connections for further selection and estimation. [sent-357, score-0.192] [sent-358, score-0.318]
61 If the screening technique can include all the true connections with high probability, dramatic computational gain can be attained with mild performance sacrifice. [sent-359, score-0.161]
62 Independence screening methods, such as the sure independence screening (SIS) (Fan and Lv, 2008) can be applied to preselect variables in a supervised manner. [sent-360, score-0.248]
63 In network settings, the nodes are interacting dynamically with each other, so there is usually high collinearity in the data. [sent-363, score-0.227]
64 To derive a new screening technique that can handle network data, we first observe that SIS corresponds to the first step of the iterative procedure in (11) with A^0 = 0 and hard thresholding Θ_H(t; λ) = t·1_{|t|≥λ} with a properly chosen λ. [sent-365, score-0.385]
65 We call this screening procedure the quantile thresholding iterative screening (QTIS). [sent-367, score-0.436]
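Based on this description, a rough sketch of QTIS is the gradient-style iteration started from A^0 = 0 in which the fixed hard threshold is replaced by a quantile threshold that keeps the m largest entries in magnitude; the iteration count and step size below are assumptions, not the paper's exact choices.

```python
# Hedged sketch of quantile thresholding iterative screening (QTIS): iterate the
# update from A = 0 and keep only the top-m entries by magnitude at each step,
# so that collinearity enters through Sigma_XX rather than marginal correlations.
import numpy as np

def qtis(X, Y, m, n_iter=20):
    """Return a boolean mask of m candidate connections; X, Y are (n x p) matrices."""
    n = X.shape[0]
    Sxx, Sxy = X.T @ X / n, X.T @ Y / n
    alpha = 0.9 / np.linalg.norm(Sxx, 2)
    A = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = A + alpha * (Sxy - Sxx @ A)
        cutoff = np.sort(np.abs(G), axis=None)[-m]      # m-th largest magnitude
        A = np.where(np.abs(G) >= cutoff, G, 0.0)       # quantile (top-m) hard thresholding
    return np.abs(A) > 0
```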
66 In this section, we propose the stationary bootstrap enhanced BISPS (SB-BISPS) which provides a confidence level about whether a connection exists in the network by measuring the frequency with which it is chosen by the BISPS algorithm. [sent-409, score-0.352]
67 1 The SB-BISPS Framework The SB-BISPS procedure completes the BISPS (or QTIS+BISPS) algorithm with a stationary bootstrap resampling step. [sent-411, score-0.207]
68 • Step 2: Draw B stationary bootstrap samples from X. [sent-415, score-0.183]
69 For example, if fi j = 80%, it means that in 80% of the bootstrap samples, a connection from node i to node j is detected. [sent-420, score-0.186]
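Computing these frequencies is straightforward once the bootstrap estimates are available; the sketch below (illustrative, with an assumed numerical tolerance for zero entries) counts how often each entry of the transition matrix is selected across replicates.

```python
# Illustrative computation of connection occurring frequencies from a stack of
# bootstrap estimates of the transition matrix (shape B x p x p).
import numpy as np

def connection_frequency(estimates, tol=1e-8):
    detected = np.abs(np.asarray(estimates)) > tol   # nonzero entry = detected connection
    return detected.mean(axis=0)                     # frequency in [0, 1] for each entry

reps = [np.array([[0.0, 0.5], [0.0, 0.3]]),
        np.array([[0.0, 0.4], [0.1, 0.0]]),
        np.array([[0.0, 0.6], [0.0, 0.2]])]
print(connection_frequency(reps))    # e.g., entry (0, 1) is detected in 3/3 replicates
```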
70 The stationary bootstrap (Politis and Romano, 1994) is a method with this property. [sent-431, score-0.183]
71 Fortunately, the sensitivity of γ in stationary bootstrap is less than that of l in block bootstrap. [sent-436, score-0.183]
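For reference, the stationary bootstrap resamples the series in blocks whose lengths are geometrically distributed with mean 1/γ and whose start points are uniform, wrapping around the end of the series; the sketch below generates one set of resampling indices (variable names are my own, and in practice a routine such as the tsboot function mentioned later can be used instead).

```python
# Illustrative sketch of stationary bootstrap resampling (Politis and Romano, 1994).
import numpy as np

def stationary_bootstrap_indices(n, gamma, rng):
    """Indices of one length-n resample; expected block length is 1/gamma."""
    idx = np.empty(n, dtype=int)
    t = rng.integers(n)                  # start of the first block
    for k in range(n):
        idx[k] = t
        if rng.random() < gamma:         # with probability gamma, start a new block
            t = rng.integers(n)
        else:                            # otherwise continue the current block, circularly
            t = (t + 1) % n
    return idx

rng = np.random.default_rng(1)
series = np.sin(np.arange(100) / 5.0)
resample = series[stationary_bootstrap_indices(len(series), gamma=0.1, rng=rng)]
```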
72 • Stationarity violation percentage (Pv): In N repeated experiments, if there are Nv experiments in which the estimate Â violates the stationarity condition (4), then the stationarity violation percentage is defined as Pv = Nv/N. [sent-441, score-0.292]
73 Then we use this estimate to forecast x_{t+h}, denoting the forecast as x̂_{t+h} and the forecasting error as e_t^h = ‖x̂_{t+h} − x_{t+h}‖_2. [sent-458, score-0.15]
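For a VAR(1) estimate Â, the h-step forecast from time t is Â^h x_t, so the rolling forecast error can be computed as in the sketch below (an illustration of the metric, not the paper's evaluation code); a nonstationary Â makes Â^h, and hence the error, blow up as h grows.

```python
# Illustrative h-step forecasting and rolling forecast error for a VAR(1) estimate.
import numpy as np

def rolling_mse(A_hat, x, h):
    """x: observations of shape (p, n); returns the average squared h-step forecast error."""
    p, n = x.shape
    Ah = np.linalg.matrix_power(A_hat, h)            # h-step forecast operator
    errs = [np.sum((Ah @ x[:, t] - x[:, t + h]) ** 2) for t in range(n - h)]
    return float(np.mean(errs))
```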
74 We generate the p × p transition matrix A with both sparsity and stationarity properties. [sent-466, score-0.213]
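One common recipe for such a construction (assumed here; the paper's exact generation scheme is not reproduced) is to draw a sparse random matrix and rescale it so that its spectral radius sits below one.

```python
# Assumed recipe: generate a sparse random transition matrix and rescale it so
# that its spectral radius equals a target value below 1 (hence stationary).
import numpy as np

def sparse_stationary_matrix(p, density=0.05, target_rho=0.8, rng=None):
    rng = rng or np.random.default_rng()
    A = rng.standard_normal((p, p)) * (rng.random((p, p)) < density)
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if rho > 0:
        A *= target_rho / rho          # rescaling sets rho(A) to target_rho < 1
    return A

A = sparse_stationary_matrix(p=50, rng=np.random.default_rng(2))
print(np.max(np.abs(np.linalg.eigvals(A))))   # approximately 0.8
```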
75 For example, for the network with 300 nodes, the number of parameters to be estimated is 9 × 10^4, which is extremely high compared with the number of observations (80). [sent-485, score-0.144]
76 As shown by Pv, no matter what penalty is used, it is possible for PML to give a nonstationary estimate, whereas the proposed S2 learning and BISPS can guarantee the stationarity of Â. [sent-493, score-0.326]
77 This indicates that adding the stationarity constraint into the sparsity pursuit does effectively prevent the estimate from becoming nonstationary. [sent-494, score-0.222]
78 Table 1 gives the rolling MSEs for different horizons h to illustrate both the short term and long term forecasting performance. [sent-496, score-0.152]
79 For PML estimates, the rolling MSEs grow explosively with h due to the existence of nonstationary estimates, while those of BISPS accumulate much more slowly. [sent-497, score-0.208]
80 To further illustrate the disadvantages of a nonstationary estimate, we find one run where Lasso gives a nonstationary estimate Â_Lasso. [sent-498, score-0.188]
81 3 Performance of QTIS To examine the performance of QTIS for connection screening, we first compare it with the sure independence screening (SIS) (Fan and Lv, 2008) by examining their ability to include all the true connections, which can be measured by the miss rate. [sent-567, score-0.197]
82 As a result, independence screening is not appropriate for network learning. [sent-574, score-0.244]
83 Top: sample from the true model; Middle: forecast from the nonstationary estimate Â_Lasso; Bottom: forecast from the BISPS estimate Â_BISPS. [sent-588, score-0.206]
84 We use the S2 network model (1) to detect Granger causal relations between these macroeconomic indices. [sent-667, score-0.26]
85 Table 2 shows the rolling MSEs of the Lasso and BISPS, normalized by that of the AR(4) model, which is a conventional benchmark of macroeconomic forecasting. [sent-685, score-0.212]
86 Therefore, by introducing the Granger causal interactions between different indices, we can build a multivariate network model that is more accurate than the univariate AR model in capturing the evolution of the U. [sent-687, score-0.162]
87 It indicates that adopting a fusion of ℓ1 and ℓ2 penalties and imposing the stationarity constraint can capture the network dynamics more accurately and achieve a stronger capability of forecasting. [sent-692, score-0.304]
88 As the horizon increases, the rolling MSE of Lasso grows exponentially, which clearly indicates that some estimates of the Lasso are nonstationary and thus fail to forecast for large horizons. [sent-696, score-0.264]
89 The stationary bootstrap samples are obtained using the R function tsboot (Dalgaard, 2008) with default parameter values. [sent-707, score-0.183]
90 The number of stationary bootstrap samples is set to be B = 100. [sent-708, score-0.183]
91 Distinguished from the existing works, we explicitly incorporated the stationarity concern in a possibly ultra-high dimensional scenario and provided a probabilistic measure for the occurrence of any causal connection. [sent-747, score-0.188]
92 We added a relaxed stationarity constraint in the penalized maximum likelihood estimation and proposed the BISPS algorithm which is easy to implement and computationally efficient. [sent-748, score-0.177]
93 We must point out that although the algorithm is designed for the Berhu penalty, the framework extends to any convex penalties and their coupled thresholding rules. [sent-749, score-0.166]
94 In network modeling, the number of unknown variables p2 is often much larger than the number of observations n, which confronts us with an ultra-high dimensional problem. [sent-750, score-0.144]
95 Therefore, we implanted the quantile thresholding iterative screening (QTIS) into the BISPS algorithm to improve scalability and computational efficiency. [sent-751, score-0.312]
96 Furthermore, the stationary bootstrap enhanced BISPS (SB-BISPS) was proposed to provide a confidence measure for each possible connection in the network. [sent-752, score-0.232]
97 Finally, we will proceed to nonlinear time series models for network identification to handle more complex network data. [sent-760, score-0.263]
98 Lemma 3 Given the Berhu penalty (9), the minimization problem min_B (1/2)‖B − Ξ‖_F^2 + P_B(B; λ, η) (12) has a unique optimal solution given by B̂ = Θ_B(Ξ; λ, η), where Θ_B(·; λ, η) is the Berhu thresholding rule defined in (10). [sent-766, score-0.19]
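Because the objective in (12) separates over the entries of B, the lemma can be illustrated numerically in the scalar case: build P_B from Θ_B via the integral construction quoted earlier, then check that the grid minimizer of (1/2)(b − ξ)^2 + P_B(b) matches Θ_B(ξ). The piecewise zero/soft/ridge form of Θ_B used below is the same assumption made in the earlier sketch, so this is an illustration of the mechanism rather than a verification of the paper's exact rule.

```python
# Scalar illustration of Lemma 3: P_B is built from Theta_B via
# P_B(t) = integral_0^{|t|} (sup{s: Theta_B(s) <= u} - u) du, and the minimizer of
# 0.5*(b - xi)^2 + P_B(b) should coincide with Theta_B(xi).
import numpy as np

lam, eta = 0.5, 1.0

def theta_b(t):
    a = abs(t)
    if a <= lam:
        return 0.0
    if a <= lam * (1 + eta) / eta:
        return np.sign(t) * (a - lam)
    return t / (1 + eta)

# Tabulate Theta_B on s >= 0 (it is nondecreasing there) and build P_B on a grid of |t|.
s_grid = np.linspace(0.0, 10.0, 10001)
theta_grid = np.array([theta_b(s) for s in s_grid])
u_grid = np.linspace(0.0, 3.0, 3001)
sup_s = s_grid[np.searchsorted(theta_grid, u_grid, side="right") - 1]
f = sup_s - u_grid
P_vals = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(u_grid))))

xi = 1.3
b_grid = np.linspace(-3.0, 3.0, 6001)
objective = 0.5 * (b_grid - xi) ** 2 + np.interp(np.abs(b_grid), u_grid, P_vals)
print(b_grid[np.argmin(objective)], theta_b(xi))   # both values are approximately 0.65
```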
99 Proof of Theorem 2 First, we introduce a quantile thresholding rule Θ# (·; m) as a variant of the hard thresholding rule. [sent-867, score-0.255]
100 The jackknife and the bootstrap for general stationary observations. [sent-1128, score-0.183]
wordName wordTfidf (topN-words)
[('bisps', 0.545), ('berhu', 0.431), ('qtis', 0.228), ('moderation', 0.211), ('ak', 0.193), ('stationarity', 0.146), ('enet', 0.138), ('pml', 0.138), ('screening', 0.124), ('ausality', 0.122), ('etwork', 0.122), ('network', 0.12), ('pb', 0.115), ('rolling', 0.114), ('thresholding', 0.104), ('lasso', 0.104), ('macroeconomic', 0.098), ('nonstationary', 0.094), ('bootstrap', 0.094), ('stationary', 0.089), ('penalty', 0.086), ('collinearity', 0.081), ('nonexpansive', 0.081), ('granger', 0.077), ('sis', 0.069), ('parse', 0.066), ('period', 0.057), ('akj', 0.057), ('forecast', 0.056), ('dynamical', 0.053), ('ridge', 0.053), ('xt', 0.053), ('miss', 0.051), ('aml', 0.049), ('scv', 0.049), ('quantile', 0.047), ('fan', 0.044), ('topology', 0.044), ('ml', 0.043), ('sgn', 0.043), ('causal', 0.042), ('earning', 0.042), ('cof', 0.041), ('overton', 0.041), ('sparsity', 0.038), ('pursuit', 0.038), ('penalties', 0.038), ('economic', 0.038), ('forecasting', 0.038), ('admm', 0.038), ('fb', 0.038), ('iterative', 0.037), ('connections', 0.037), ('xx', 0.037), ('stock', 0.036), ('xa', 0.035), ('node', 0.035), ('zou', 0.033), ('alasso', 0.033), ('bxt', 0.033), ('xy', 0.032), ('causality', 0.031), ('lv', 0.031), ('ty', 0.031), ('penalized', 0.031), ('mse', 0.03), ('great', 0.03), ('spectral', 0.03), ('subgradient', 0.029), ('transition', 0.029), ('regulation', 0.028), ('mses', 0.028), ('huber', 0.027), ('enhanced', 0.027), ('shrinkage', 0.027), ('nonconvex', 0.026), ('nodes', 0.026), ('singular', 0.025), ('tu', 0.025), ('pv', 0.025), ('browder', 0.024), ('curtis', 0.024), ('hoerl', 0.024), ('opial', 0.024), ('pf', 0.024), ('psgm', 0.024), ('topologies', 0.024), ('resampling', 0.024), ('observations', 0.024), ('convex', 0.024), ('series', 0.023), ('autoregressive', 0.023), ('lemma', 0.023), ('connection', 0.022), ('ai', 0.022), ('operator', 0.022), ('dv', 0.022), ('svd', 0.021), ('reinsel', 0.021), ('owen', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 106 jmlr-2013-Stationary-Sparse Causality Network Learning
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
2 0.095488034 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
Author: Tingni Sun, Cun-Hui Zhang
Abstract: We propose a new method of learning a sparse nonnegative-definite target matrix. Our primary example of the target matrix is the inverse of a population covariance or correlation matrix. The algorithm first estimates each column of the target matrix by the scaled Lasso and then adjusts the matrix estimator to be symmetric. The penalty level of the scaled Lasso for each column is completely determined by data via convex minimization, without using cross-validation. We prove that this scaled Lasso method guarantees the fastest proven rate of convergence in the spectrum norm under conditions of weaker form than those in the existing analyses of other ℓ1 regularized algorithms, and has faster guaranteed rate of convergence when the ratio of the ℓ1 and spectrum norms of the target inverse matrix diverges to infinity. A simulation study demonstrates the computational feasibility and superb performance of the proposed method. Our analysis also provides new performance bounds for the Lasso and scaled Lasso to guarantee higher concentration of the error at a smaller threshold level than previous analyses, and to allow the use of the union bound in column-by-column applications of the scaled Lasso without an adjustment of the penalty level. In addition, the least squares estimation after the scaled Lasso selection is considered and proven to guarantee performance bounds similar to that of the scaled Lasso. Keywords: precision matrix, concentration matrix, inverse matrix, graphical model, scaled Lasso, linear regression, spectrum norm
3 0.071561329 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
Author: Julien Mairal, Bin Yu
Abstract: We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs. Keywords: convex and non-convex optimization, network flow optimization, graph sparsity
4 0.062352385 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability
Author: Wei Sun, Junhui Wang, Yixin Fang
Abstract: Penalized regression models are popularly used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in literature, their performances largely depend on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the estimated prediction error or maximizing the posterior model probability, such as cross validation, AIC and BIC. This article introduces a general tuning parameter selection criterion based on variable selection stability. The key idea is to select the tuning parameters so that the resultant penalized regression model is stable in variable selection. The asymptotic selection consistency is established for both fixed and diverging dimensions. Its effectiveness is also demonstrated in a variety of simulated examples as well as an application to the prostate cancer data. Keywords: kappa coefficient, penalized regression, selection consistency, stability, tuning
Author: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto
Abstract: We introduce a new class of quadratic support (QS) functions, many of which already play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and inverse problems such as Kalman smoothing. Well known examples of QS penalties include the ℓ2 , Huber, ℓ1 and Vapnik losses. We build on a dual representation for QS functions, using it to characterize conditions necessary to interpret these functions as negative logs of true probability densities. This interpretation establishes the foundation for statistical modeling with both known and new QS loss functions, and enables construction of non-smooth multivariate distributions with specified means and variances from simple scalar building blocks. The main contribution of this paper is a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for estimation. In particular, a broad subclass of QS loss functions known as piecewise linear quadratic (PLQ) penalties has a dual representation that can be exploited to design interior point (IP) methods. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. We provide several numerical examples, along with a code that can be used to solve general PLQ problems. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing. This class comprises a wide variety of applications, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaus∗. The authors would like to thank Bradley M. Bell for insightful discussions and helpful suggestions. The research leading to these results has received funding from the European Union Seventh Framework Programme [FP7/2007-2013
6 0.046193648 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
7 0.039217707 76 jmlr-2013-Nonparametric Sparsity and Regularization
8 0.037764251 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
9 0.03726868 120 jmlr-2013-Variational Algorithms for Marginal MAP
10 0.036664441 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
11 0.035760909 104 jmlr-2013-Sparse Single-Index Model
12 0.034770623 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
13 0.032931965 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
14 0.031978197 53 jmlr-2013-Improving CUR Matrix Decomposition and the Nystrom Approximation via Adaptive Sampling
15 0.031839725 82 jmlr-2013-Optimally Fuzzy Temporal Memory
16 0.031575631 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
17 0.031484406 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
18 0.029857771 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
19 0.029616334 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
20 0.028901737 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models
topicId topicWeight
[(0, -0.161), (1, 0.024), (2, 0.004), (3, 0.062), (4, 0.118), (5, 0.011), (6, -0.004), (7, -0.015), (8, 0.175), (9, 0.072), (10, 0.013), (11, -0.076), (12, -0.136), (13, 0.006), (14, 0.004), (15, -0.123), (16, 0.013), (17, 0.139), (18, 0.088), (19, 0.105), (20, -0.022), (21, -0.066), (22, 0.036), (23, -0.017), (24, -0.049), (25, -0.034), (26, -0.078), (27, 0.025), (28, 0.105), (29, -0.015), (30, -0.041), (31, -0.027), (32, 0.109), (33, 0.034), (34, 0.152), (35, -0.01), (36, 0.036), (37, -0.034), (38, -0.033), (39, 0.209), (40, -0.004), (41, 0.165), (42, 0.061), (43, -0.132), (44, 0.012), (45, -0.045), (46, -0.07), (47, 0.257), (48, -0.13), (49, -0.094)]
simIndex simValue paperId paperTitle
same-paper 1 0.92340732 106 jmlr-2013-Stationary-Sparse Causality Network Learning
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
2 0.43053347 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
Author: Tingni Sun, Cun-Hui Zhang
Abstract: We propose a new method of learning a sparse nonnegative-definite target matrix. Our primary example of the target matrix is the inverse of a population covariance or correlation matrix. The algorithm first estimates each column of the target matrix by the scaled Lasso and then adjusts the matrix estimator to be symmetric. The penalty level of the scaled Lasso for each column is completely determined by data via convex minimization, without using cross-validation. We prove that this scaled Lasso method guarantees the fastest proven rate of convergence in the spectrum norm under conditions of weaker form than those in the existing analyses of other ℓ1 regularized algorithms, and has faster guaranteed rate of convergence when the ratio of the ℓ1 and spectrum norms of the target inverse matrix diverges to infinity. A simulation study demonstrates the computational feasibility and superb performance of the proposed method. Our analysis also provides new performance bounds for the Lasso and scaled Lasso to guarantee higher concentration of the error at a smaller threshold level than previous analyses, and to allow the use of the union bound in column-by-column applications of the scaled Lasso without an adjustment of the penalty level. In addition, the least squares estimation after the scaled Lasso selection is considered and proven to guarantee performance bounds similar to that of the scaled Lasso. Keywords: precision matrix, concentration matrix, inverse matrix, graphical model, scaled Lasso, linear regression, spectrum norm
3 0.38845536 95 jmlr-2013-Ranking Forests
Author: Stéphan Clémençon, Marine Depecker, Nicolas Vayatis
Abstract: The present paper examines how the aggregation and feature randomization principles underlying the algorithm R ANDOM F OREST (Breiman, 2001) can be adapted to bipartite ranking. The approach taken here is based on nonparametric scoring and ROC curve optimization in the sense of the AUC criterion. In this problem, aggregation is used to increase the performance of scoring rules produced by ranking trees, as those developed in Cl´ mencon and Vayatis (2009c). The present work e ¸ describes the principles for building median scoring rules based on concepts from rank aggregation. Consistency results are derived for these aggregated scoring rules and an algorithm called R ANK ING F OREST is presented. Furthermore, various strategies for feature randomization are explored through a series of numerical experiments on artificial data sets. Keywords: bipartite ranking, nonparametric scoring, classification data, ROC optimization, AUC criterion, tree-based ranking rules, bootstrap, bagging, rank aggregation, median ranking, feature randomization
4 0.37656596 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
Author: Julien Mairal, Bin Yu
Abstract: We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs. Keywords: convex and non-convex optimization, network flow optimization, graph sparsity
5 0.36362305 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
Author: Sohail Bahmani, Bhiksha Raj, Petros T. Boufounos
Abstract: Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and Compressed Sensing. A vast body of work has studied the sparsity-constrained optimization from theoretical, algorithmic, and application aspects in the context of sparse estimation in linear models where the fidelity of the estimate is measured by the squared error. In contrast, relatively less effort has been made in the study of sparsityconstrained optimization in cases where nonlinear models are involved or the cost function is not quadratic. In this paper we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. Should a cost function have a Stable Restricted Hessian (SRH) or a Stable Restricted Linearization (SRL), both of which are introduced in this paper, our algorithm is guaranteed to produce a sparse vector within a bounded distance from the true sparse optimum. Our approach generalizes known results for quadratic cost functions that arise in sparse linear regression and Compressed Sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic and real data, where the algorithm is employed for sparse logistic regression with and without ℓ2 -regularization. Keywords: sparsity, optimization, compressed sensing, greedy algorithm
6 0.33733943 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
7 0.32932952 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
8 0.3250736 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability
10 0.28020003 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
12 0.2785832 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
13 0.26520842 82 jmlr-2013-Optimally Fuzzy Temporal Memory
14 0.25670347 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
15 0.24415012 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models
16 0.23080246 76 jmlr-2013-Nonparametric Sparsity and Regularization
17 0.20312554 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
18 0.20148149 41 jmlr-2013-Experiment Selection for Causal Discovery
19 0.19982748 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
20 0.19551516 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
topicId topicWeight
[(0, 0.021), (5, 0.087), (6, 0.527), (10, 0.049), (20, 0.021), (23, 0.034), (41, 0.016), (44, 0.012), (68, 0.035), (70, 0.025), (75, 0.034), (85, 0.017), (87, 0.029), (93, 0.011)]
simIndex simValue paperId paperTitle
1 0.94769937 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization
Author: Nicholas Ruozzi, Sekhar Tatikonda
Abstract: Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean (and variances) of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite covariance matrix. As was observed by Malioutov et al. (2006), the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges. Keywords: belief propagation, Gaussian graphical models, graph covers
2 0.91191471 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model
Author: Dan Stowell, Mark D. Plumbley
Abstract: We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available. Keywords: multi-target tracking, clustering, point processes, flow network, sound
same-paper 3 0.8794592 106 jmlr-2013-Stationary-Sparse Causality Network Learning
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
4 0.54371232 82 jmlr-2013-Optimally Fuzzy Temporal Memory
Author: Karthik H. Shankar, Marc W. Howard
Abstract: Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register—a moving window extending into the past, requiring storage resources that linearly grows with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system. Keywords: temporal information compression, forecasting long range correlated time series
5 0.53363949 120 jmlr-2013-Variational Algorithms for Marginal MAP
Author: Qiang Liu, Alexander Ihler
Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models
6 0.48814434 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs
7 0.46964154 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
8 0.4662334 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
9 0.45582658 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
10 0.45085984 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
11 0.43744352 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies
12 0.43235943 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
13 0.42155987 22 jmlr-2013-Classifying With Confidence From Incomplete Information
14 0.41319433 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
16 0.40311822 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
17 0.39430135 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
18 0.39266121 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
20 0.38839301 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs