jmlr jmlr2013 jmlr2013-106 jmlr2013-106-reference knowledge-graph by maker-knowledge-mining

106 jmlr-2013-Stationary-Sparse Causality Network Learning

Source: pdf

Author: Yuejia He, Yiyuan She, Dapeng Wu

Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identiﬁcation. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efﬁcient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efﬁciency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap

reference text

H. Akaike. A new look at the statistical model identiﬁcation. Automatic Control, IEEE Transactions on, 19(6):716–723, 1974. Ya. I. Alber, A. N. Iusem, and M. V. Solodov. On the projected subgradient method for nonsmooth convex optimization in a hilbert space. Mathematical Programming, pages 23–35, 1998. H. H. Bauschke and P. L. Combettes. A dykstra-like algorithm for two monotone operators. Paciﬁc Journal of Optimization, 4(3):383–391, Sep. 2008. T. Blumensath and M. E. Davies. Normalized iterative hard thresholding: Guaranteed stability and performance. IEEE Journal of Selected Topics in Signal Processing, 4(2), Apr. 2010. S. Boyd, N. Parikh, E. Chu, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Information Systems Journal, 3(1):1–118, 2010. S.P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. ISBN 9780521833783. J. P. Boyle and R. L Dykstra. A method for ﬁnding projections onto the intersection of convex sets in hilbert spaces. In Advances in Order Restricted Statistical Inference, pages 28–47. Springer, 1986. F. E. Browder. Nonexpansive nonlinear operators in a banach space. Proceedings of the National Academy of Sciences of the United States of America, 54(4):1041, 1965. 3101 H E , S HE AND W U F. E. Browder and W. V. Petryshyn. The solution by iteration of nonlinear functional equations in banach spaces. Bull. Amer. Math. Soc., 72:571–575, 1966. E. Bullmore and O. Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10:186–198, Mar. 2009. J. V. Burke, A. S. Lewis, and M. L. Overton. A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. on Optimization, 15(3):751–779, Mar. 2005. ISSN 1052-6234. J. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. on Optimization, 20(4):1956–1982, Mar. 2010. F. Curtis and M. Overton. A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization. SIAM Journal on Optimization, 22(2):474–500, 2012. P. Dalgaard. Introductory Statistics with R, Statistics and Computing. Springer, Aug. 2008. S. J. Davis and J. A. Kahn. Interpreting the great moderation: Changes in the volatility of economic activity at the macro and micro levels. Journal of Economic Perspectives, 22(4):155–180, 2008. D.L. Donoho. Compressed sensing. Information Theory, IEEE Transactions on, 52(4):1289–1306, Apr. 2006. R. L. Dykstra. An algorithm for restricted least squares regression. Journal of the American Statistical Association, 78(384):837–842, 1983. B. Efron. Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1):1–26, 1979. J. J. Faith, B. Hayete, J. T. Thaden, I. Mogno, J. Wierzbowski, G. Cottarel, S. Kasif, J. J. Collins, and T. S. Gardner. Large-scale mapping and validation of escherichia coli transcriptional regulation from a compendium of expression proﬁles. PLoS Biol, 5(1):e8, Jan. 2007. J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96:1348–1360, Dec. 2001. J. Fan and R. Li. Statistical challenges with high dimensionality: feature selection in knowledge discovery. In International Congress of Mathematicans, Aug. 2006. J. Fan and J. Lv. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5):849–911, 2008. J. Fan and J. Lv. A selective overview of variable selection in high dimensional feature space. Stat Sin., 20(1):101–148, Jan. 2010. J. Fan, R. Samworth, and Y. Wu. Ultrahigh dimensional feature selection: Beyond the linear model. Journal of Machine Learning Research, 10:2013–2038, Dec. 2009. ISSN 1532-4435. C. W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3):424–438, 1969. 3102 S TATIONARY-S PARSE C AUSALITY N ETWORK L EARNING M. C. Grant and S. P. Boyd. Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control, pages 95–110. Springer, 2008. G. E. Halkos and N. G. Tzeremes. Oil consumption and economic efﬁciency: A comparative analysis of advanced, developing and emerging economies. Ecological Economics, 70(7):1354– 1362, May 2011. S. Hanneke, W. Fu, and E. P. Xing. Discrete temporal models of social networks. Electronic Journal of Statistics, 4:585–605, 2010. B.S. He, H. Yang, and S.L. Wang. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. Journal of Optimization Theory and Applications, 106 (2):337–356, 2000. ISSN 0022-3239. A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970. N. Hsu, H. Hung, and Y. Chang. Subset selection for vector autoregressive processes using lasso. Computational Statistics and Data Analysis, 52(7):3645–3657, 2008. P. J. Huber. Robust Statistics. Wiley series in probability and mathematical statistics. Probability and mathematical statistics. Wiley, 1981. W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 361–379, 1961. D. W. Jorgenson and P. J. Wilcoxen. Environmental regulation and U.S. economic growth. RAND Journal of Economics, 21(2):314–340, Summer 1990. H. R. Kunsch. The jackknife and the bootstrap for general stationary observations. The Annals of Statistics, 17(3):1217–1241, 1989. T. G. Lewis. Network Science: Theory and Applications. Wiley Publishing, 2009. Z. Lin. Some software packages for partial SVD computation. CoRR, abs/1108.1548, 2011. H. L¨ tkepohl. New Introduction to Multiple Time Series Analysis. Springer, 2nd edition, 2007. u T.C. Mills and R.N. Markellos. The Econometric Modelling of Financial Time Series. Cambridge University Press, 2008. J. J. Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien. Reports of the Paris Academy of Sciences, Series A, 255:2897–2899, 1962. M. E. J. Newman and D. J. Watts. The Structure and Dynamics of Networks. Princeton University Press, 2006. Z. Opial. Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc., 73:591–597, 1967. 3103 H E , S HE AND W U M. L. Overton and R. S. Womersley. On minimizing the spectral radius of a nonsymmetric matrix function - optimality conditions and duality theory. SIAM J. Matrix Anal. Appl., 9(4):473–498, Oct. 1988. A. B. Owen. A robust hybrid of lasso and ridge regression. Prediction and Discovery (Contemporary Mathematics), 443:59–71, 2007. D. N. Politis and J. P. Romano. The stationary bootstrap. Journal of the American Statistical Association, 89(428):1303–1313, 1994. G. C. Reinsel. Elements of Multivariate Time Series Analysis. New York, Springer, 2nd edition, 1997. R.T. Rockafellar. Convex Analysis. Princeton University Press, 1970. Y. She. Thresholding-based iterative selection procedures for model selection and shrinkage. Electron. J. Statist, 3:384–415, 2009. Y. She. An iterative algorithm for ﬁtting nonconvex penalized generalized linear models with grouped predictors. Computational Statistics and Data Analysis, 56(10):2976–2990, Oct. 2012. C. A. Sims. Macroeconomics and reality. Econometrica, 48(1):1–48, Jan. 1980. J. Songsiri and L. Vandenberghe. Topology selection in graphical models of autoregressive processes. Journal of Machine Learning Research, 9999:2671–2705, Dec. 2010. ISSN 1532-4435. J. H. Stock and M. W. Watson. Generalized shrinkage methods for forecasting using many predictors. Journal of Business & Economic Statistics, 30(4):481–493, Oct. 2012. J. F. Sturm. Using sedumi 1.02, a matlab toolbox for optimization over symmetric cones, 1998. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58(1):267–288, 1996. Y. Tsaig and D. L. Donoho. Extensions of compressed sensing. Signal Processing, 86(3):549–571, 2006. R. H. T¨ t¨ nc¨ , K. C. Toh, and M. J. Todd. Solving semideﬁnite-quadratic-linear programs using uu u sdpt3. Mathematical Programming, 95:189–217, 2003. J. von Neumann. Some matrix inequalities and metrization of matrix space. Tomsk. Univ. Rev., 1: 153–167, 1937. H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101:1418–1429, 2006. H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005. 3104