Title: Training Energy-Based Models for Time-Series Imputation (JMLR, 2013)
Source: pdf
Authors: Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen
Abstract: Imputing missing values in high dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows complex neural models with convolutional and recurrent structures to be trained for imputation tasks. In this work, we use this training strategy to derive learning rules for three substantially different neural architectures. Inference in these models is done by either truncated gradient descent or variational mean-field iterations. In our experiments, we found that the training methods outperform the Contrastive Divergence learning algorithm. Moreover, the training methods can easily handle missing values in the training data itself during learning. We demonstrate the performance of this learning scheme and the three models we introduce on one artificial and two real-world data sets. Keywords: neural networks, energy-based models, time-series, missing values, optimization
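The abstract's core recipe (cf. Domke, 2012) is compact enough to sketch: impute the missing entries by running a fixed number of gradient-descent steps on a learned energy function, then train the energy's parameters by backpropagating through those unrolled inference steps. The sketch below is a minimal illustration only, written in JAX for its autodiff (the paper itself used Theano); it substitutes a toy Gaussian-Bernoulli free energy for the paper's convolutional and recurrent architectures, and the names, K, and step size are all hypothetical.

# Hypothetical sketch, not the authors' code: truncated gradient-descent
# inference on a toy energy, trained for imputation by differentiating
# through the unrolled inference loop.
import jax
import jax.numpy as jnp

def energy(params, x):
    # Toy Gaussian-Bernoulli RBM free energy: quadratic visible term
    # minus softplus of the hidden pre-activations.
    W, b = params
    return 0.5 * jnp.sum(x ** 2) - jnp.sum(jax.nn.softplus(x @ W + b))

def impute(params, x_obs, mask, K=10, step=0.1):
    # Truncated gradient-descent inference: K descent steps on the energy,
    # updating only the missing entries (mask == 1 marks observed values;
    # missing entries start at zero in x_obs).
    grad_x = jax.grad(energy, argnums=1)
    x = x_obs
    for _ in range(K):
        x = x - step * (1.0 - mask) * grad_x(params, x)
    return x

def imputation_loss(params, x_true, mask):
    # Train for imputation directly: squared error on the missing entries
    # of the imputed sequence.
    x_hat = impute(params, mask * x_true, mask)
    return jnp.mean(((1.0 - mask) * (x_hat - x_true)) ** 2)

# jax.grad differentiates through all K unrolled inference steps, giving
# the learning signal for the energy's parameters.
grad_params = jax.grad(imputation_loss)

A parameter update is then an ordinary gradient step, for example params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grad_params(params, x_true, mask)). Missing values in the training data itself can be accommodated by restricting the loss to entries that are observed in the data but hidden from the inference procedure.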
References:
A. Barbu. Training an active random field for real-time image denoising. IEEE Transactions on Image Processing, 18(11):2451–2462, 2009.
Y. Bengio and F. Gingras. Recurrent neural networks for missing or asynchronous data. Advances in Neural Information Processing Systems, pages 395–401, 1996.
J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math compiler in Python. In S. van der Walt and J. Millman, editors, Proceedings of the 9th Python in Science Conference, pages 3–10, 2010.
M. Bertalmío, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. In SIGGRAPH, pages 417–424, 2000.
D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1999.
C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006. ISBN 0387310738.
N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the 29th International Conference on Machine Learning. Omnipress, 2012.
P. Brakel, S. Dieleman, and B. Schrauwen. Training restricted Boltzmann machines with multi-tempering: Harnessing parallelization. In A. E. P. Villa, W. Duch, P. Érdi, F. Masulli, and G. Palm, editors, ICANN (2), volume 7553 of Lecture Notes in Computer Science, pages 92–99. Springer, 2012. ISBN 978-3-642-33265-4.
G. Desjardins, A. C. Courville, Y. Bengio, P. Vincent, and O. Delalleau. Tempered Markov chain Monte Carlo for training of restricted Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 145–152, 2010.
J. Domke. Parameter learning with truncated message-passing. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '11, pages 2937–2943, Washington, DC, USA, 2011. IEEE Computer Society. ISBN 978-1-4577-0394-2. doi: 10.1109/CVPR.2011.5995320.
J. Domke. Generic methods for optimization-based modeling. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 318–326, 2012.
S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
A. Frank and A. Asuncion. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
A. Freire, G. Barreto, M. Veloso, and A. Varela. Short-term memory mechanisms in neural network learning of robot navigation tasks: A case study. In 6th Latin American Robotics Symposium, pages 1–6, 2009. doi: 10.1109/LARS.2009.5418323.
Y. Freund and D. Haussler. Unsupervised learning of distributions on binary vectors using two layer networks. Technical report, Santa Cruz, CA, USA, 1994.
Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29(2-3):245–273, 1997.
Z. Ghahramani and S. T. Roweis. Learning nonlinear dynamical systems using an EM algorithm. In Advances in Neural Information Processing Systems, pages 599–605. MIT Press, 1999.
A. Gupta and M. S. Lam. Estimating missing values using neural networks. Journal of the Operational Research Society, pages 229–238, 1996.
G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
A. Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6:695–709, 2005.
V. Jain, J. F. Murray, F. Roth, S. C. Turaga, V. P. Zhigulin, K. L. Briggman, M. Helmstaedter, W. Denk, and H. S. Seung. Supervised learning of image restoration with convolutional networks. In ICCV, pages 1–8. IEEE, 2007.
D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
H. Larochelle and I. Murray. The neural autoregressive distribution estimator. JMLR: W&CP, 15:29–37, 2011.
N. D. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In S. Thrun, L. K. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems. MIT Press, 2003. ISBN 0-262-20152-6.
N. D. Lawrence. Learning for larger datasets with the Gaussian process latent variable model. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, pages 21–24. Omnipress, 2007.
Y. LeCun and F. Huang. Loss functions for discriminative training of energy-based models. In Proceedings of the 10th International Conference on Artificial Intelligence and Statistics, 2005.
Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, and F. Huang. A tutorial on energy-based learning. In G. Bakir, T. Hofman, B. Schölkopf, A. Smola, and B. Taskar, editors, Predicting Structured Data. MIT Press, 2006.
H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-516-1.
P. Mirowski and Y. LeCun. Dynamic factor graphs for time series modeling. In Proceedings of the European Conference on Machine Learning, 2009.
F. V. Nelwamondo, S. Mohamed, and T. Marwala. Missing data: A comparison of neural network and expectation maximisation techniques. arXiv preprint, 2007.
J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988. ISBN 0-934613-73-7.
B. A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 6:147–160, 1994.
B. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4:1–17, 1964.
S. Roth and M. J. Black. Fields of experts: A framework for learning image priors. In CVPR, pages 860–867, 2005.
R. Salakhutdinov and G. E. Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2009.
R. Salakhutdinov and H. Larochelle. Efficient learning of deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2010.
V. Stoyanov, A. Ropson, and J. Eisner. Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, volume 15, pages 725–733, Fort Lauderdale, 2011.
I. Sutskever, G. E. Hinton, and G. W. Taylor. The recurrent temporal restricted Boltzmann machine. In Advances in Neural Information Processing Systems, volume 21, Cambridge, MA, 2008. MIT Press.
M. F. Tappen. Utilizing variational optimization to learn Markov random fields. In CVPR. IEEE Computer Society, 2007.
G. W. Taylor, G. E. Hinton, and S. Roweis. Modeling human motion using binary latent variables. In Advances in Neural Information Processing Systems, volume 19, Cambridge, MA, 2007. MIT Press.
T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the International Conference on Machine Learning, 2008.
T. Tieleman and G. E. Hinton. Using fast weights to improve persistent contrastive divergence. In Proceedings of the 26th International Conference on Machine Learning, pages 1033–1040. ACM, New York, NY, USA, 2009.
P. Vincent, H. Larochelle, Y. Bengio, and P. A. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103, New York, NY, USA, 2008. ACM.
A. Waibel, T. Hanazawa, G. E. Hinton, K. Shikano, and K. J. Lang. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech and Signal Processing, 37(3):328–339, 1989.
M. J. Wainwright and M. I. Jordan. Graphical Models, Exponential Families, and Variational Inference. Now Publishers Inc., Hanover, MA, USA, 2008. ISBN 1601981848, 9781601981844.
J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):283–298, 2008.
C. K. Williams and G. E. Hinton. Mean field networks that learn to discriminate temporally distorted strings. In Connectionist Models: Proceedings of the 1990 Summer School, pages 18–22. San Mateo, CA: Morgan Kaufmann, 1991.