
JMLR 2013, paper 82: Optimally Fuzzy Temporal Memory


Source: pdf

Author: Karthik H. Shankar, Marc W. Howard

Abstract: Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register, a moving window extending into the past, requiring storage resources that grow linearly with the timescale to be represented. However, an independent general-purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long-range correlations, implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general-purpose learner would be better off committing to such a fuzzy memory system. Keywords: temporal information compression, forecasting long range correlated time series
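The storage tradeoff in the abstract can be sketched in a few lines. The paper's actual construction builds on a scale-invariant representation of time (Shankar and Howard, 2012); the snippet below is not that construction, only a minimal illustration under a simplifying assumption (geometrically widening averaging bins, with hypothetical function names): a shift register needs storage linear in the longest lag it covers, while the same number of coarse bins reaches exponentially longer lags at the cost of blurring older information in a scale-free way.

```python
import numpy as np

def shift_register(signal, n_taps):
    """Exact memory of the last n_taps samples: storage grows
    linearly with the longest lag represented."""
    return np.asarray(signal[-n_taps:], dtype=float)

def fuzzy_memory(signal, n_nodes, c=2.0):
    """Illustrative fuzzy memory: n_nodes bins whose widths grow
    geometrically (ratio c) with lag, so n_nodes storage units reach
    lags of order c**n_nodes, while each bin blurs the signal over a
    window proportional to its lag (scale-free loss of accuracy)."""
    x = np.asarray(signal, dtype=float)
    T = len(x)
    # Bin edges in lag units: 0, 1, 2, 4, 8, ... for c = 2.
    edges = np.unique(np.concatenate(
        ([0], np.round(c ** np.arange(n_nodes)).astype(int))))
    nodes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if hi > T:
            break  # signal too short to fill this bin
        nodes.append(x[T - hi:T - lo].mean())  # mean over lags lo..hi-1
    return np.array(nodes)

# With 4 storage units on the ramp 0..15, the shift register sees only
# lags 0-3, while the fuzzy memory covers lags 0-7 with blurred bins.
print(shift_register(range(16), 4))   # 4 most recent samples, exact
print(fuzzy_memory(range(16), 4))     # 4 bins reaching twice as far back
```

The geometric bin spacing is what makes the accuracy loss scale-free: every bin's temporal blur is a fixed fraction of its lag, mirroring (in a crude, discrete way) the uniform relative error of the paper's fuzzy memory.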


References

R. T. Baillie. Long memory processes and fractional integration in econometrics. Journal of Econometrics, 73:5–59, 1996.
P. D. Balsam and C. R. Gallistel. Temporal maps and informativeness in associative learning. Trends in Neurosciences, 32(2):73–78, 2009.
J. Beran. Statistics for Long-Memory Processes. Chapman & Hall, New York, 1994.
G. Chechik, A. Globerson, N. Tishby, and Y. Weiss. Information bottleneck for Gaussian variables. Journal of Machine Learning Research, 6:165–188, 2005.
F. Creutzig and H. Sprekeler. Predictive coding and the slowness principle: An information-theoretic approach. Neural Computation, 20(4):1026–1041, 2008.
F. Creutzig, A. Globerson, and N. Tishby. Past-future information bottleneck in dynamical systems. Physical Review E, 79:041925, 2009.
C. Donkin and R. M. Nosofsky. A power-law model of psychological memory strength in short- and long-term recognition. Psychological Science, 23:625–634, 2012.
D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4:2379–2394, 1987.
C. R. Gallistel and J. Gibbon. Time, rate, and conditioning. Psychological Review, 107(2):289–344, 2000.
S. Ganguli, D. Huh, and H. Sompolinsky. Memory traces in dynamical systems. Proceedings of the National Academy of Sciences of the United States of America, 105(48):18970–18975, 2008.
D. L. Gilden. Cognitive emissions of 1/f noise. Psychological Review, 108:33–56, 2001.
C. W. J. Granger and R. Joyeux. An introduction to long-memory time series models and fractional differencing. Journal of Time Series Analysis, 1:15–29, 1980.
M. Hermans and B. Schrauwen. Memory in linear recurrent neural networks in continuous time. Neural Networks, 23(3):341–355, 2010.
M. Hermans and B. Schrauwen. Recurrent kernel machines: Computing with infinite echo state networks. Neural Computation, 24(1):104–133, 2012.
G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554, 2006.
J. R. M. Hosking. Fractional differencing. Biometrika, 68(1):165–176, 1981.
H. Jaeger. The echo state approach to analyzing and training recurrent networks. GMD Report 148, German National Research Institute for Information Technology, 2001.
H. Jaeger. Short term memory in echo state networks. GMD Report 152, German National Research Institute for Information Technology, 2002.
H. Jaeger, M. Lukosevicius, D. Popovici, and U. Siewert. Optimization and applications of echo state networks with leaky integrator neurons. Neural Networks, 20:335–352, 2007.
K. Linkenkaer-Hansen, V. V. Nikouline, J. M. Palva, and R. J. Ilmoniemi. Long-range temporal correlations and scaling behavior in human brain oscillations. Journal of Neuroscience, 21:1370–1377, 2001.
W. Maass, T. Natschläger, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531–2560, 2002.
B. Mandelbrot. The Fractal Geometry of Nature. W. H. Freeman, San Francisco, CA, 1982.
K. R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik. Predicting time series with support vector machines. In Proceedings of the International Conference on Artificial Neural Networks, 1997.
E. Post. Generalized differentiation. Transactions of the American Mathematical Society, 32:723–781, 1930.
B. C. Rakitin, J. Gibbon, T. B. Penny, C. Malapani, S. C. Hinton, and W. H. Meck. Scalar expectancy theory and peak-interval timing in humans. Journal of Experimental Psychology: Animal Behavior Processes, 24:15–33, 1998.
S. Roberts. Isolation of an internal clock. Journal of Experimental Psychology: Animal Behavior Processes, 7:242–268, 1981.
K. H. Shankar and M. W. Howard. A scale-invariant internal representation of time. Neural Computation, 24:134–193, 2012.
M. C. Smith. CS-US interval and US intensity in classical conditioning of the rabbit's nictitating membrane response. Journal of Comparative and Physiological Psychology, 66(3):679–687, 1968.
N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. In Proceedings of the 37th Allerton Conference on Communication, Control, and Computing, Monticello, IL, 1999.
A. Torralba and A. Oliva. Statistics of natural image categories. Network: Computation in Neural Systems, 14(3):391–412, 2003.
G. C. Van Orden, J. G. Holden, and M. T. Turvey. Self-organization of cognitive performance. Journal of Experimental Psychology: General, 132:331–350, 2003.
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
R. F. Voss and J. Clarke. 1/f noise in music and speech. Nature, 258:317–318, 1975.
E. J. Wagenmakers, S. Farrell, and R. Ratcliff. Estimation and interpretation of 1/f^α noise in human cognition. Psychonomic Bulletin & Review, 11(4):579–615, 2004.
E. Wallace, H. R. Maei, and P. E. Latham. Randomly connected networks have short temporal memory. Neural Computation, 25:1408–1439, 2013.
J. H. Wearden and H. Lejeune. Scalar properties in human timing: Conformity and violations. Quarterly Journal of Experimental Psychology, 61:569–587, 2008.
O. L. White, D. D. Lee, and H. Sompolinsky. Short-term memory in orthogonal neural networks. Physical Review Letters, 92(14):148102, 2004.
L. Wiskott and T. Sejnowski. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4):715–770, 2002.
F. Wyffels and B. Schrauwen. A comparative study of reservoir computing strategies for monthly time series prediction. Neurocomputing, 73:1958–1964, 2010.