37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models


Source: pdf

Author: Percy Liang, Guillaume Bouchard, Francis R. Bach, Michael I. Jordan

Abstract: Many types of regularization schemes have been employed in statistical learning, each motivated by some assumption about the problem domain. In this paper, we present a unified asymptotic analysis of smooth regularizers, which allows us to see how the validity of these assumptions impacts the success of a particular regularizer. In addition, our analysis motivates an algorithm for optimizing regularization parameters, which in turn can be analyzed within our framework. We apply our analysis to several examples, including hybrid generative-discriminative learning and multi-task learning.


reference text

[1] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19:716–723, 1974.

[2] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In Advances in Neural Information Processing Systems (NIPS), pages 41–48, 2007.

[3] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83–99, 2003.

[4] M. S. Bartlett. Approximate confidence intervals. II. More than one unknown parameter. Biometrika, 40:306–317, 1953.

[5] P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497–1537, 2005.

[6] G. Bouchard. Bias-variance tradeoff in hybrid generative-discriminative models. In Sixth International Conference on Machine Learning and Applications (ICMLA), pages 124–129, 2007.

[7] G. Bouchard and B. Triggs. The trade-off between generative and discriminative classifiers. In International Conference on Computational Statistics, pages 721–728, 2004.

[8] O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002.

[9] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.

[10] P. Craven and G. Wahba. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31(4):377–403, 1978.

[11] Y. C. Eldar. Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481, 2009.

[12] T. Evgeniou, C. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.

[13] L. Jacob, F. Bach, and J. Vert. Clustered multi-task learning: A convex formulation. In Advances in Neural Information Processing Systems (NIPS), pages 745–752, 2009.

[14] W. James and C. Stein. Estimation with quadratic loss. In Fourth Berkeley Symposium in Mathematics, Statistics, and Probability, pages 361–380, 1961.

[15] J. A. Lasserre, C. M. Bishop, and T. P. Minka. Principled hybrids of generative and discriminative models. In Computer Vision and Pattern Recognition (CVPR), pages 87–94, 2006.

[16] P. Liang, F. Bach, G. Bouchard, and M. I. Jordan. Asymptotically optimal regularization in smooth parametric models. Technical report, ArXiv, 2010.

[17] P. Liang and M. I. Jordan. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. In International Conference on Machine Learning (ICML), 2008.

[18] A. McCallum, C. Pal, G. Druck, and X. Wang. Multi-conditional learning: Generative/discriminative training for clustering and classification. In Association for the Advancement of Artificial Intelligence (AAAI), 2006.

[19] B. Peters, H. Bui, S. Frankild, M. Nielsen, C. Lundegaard, E. Kostem, D. Basch, K. Lamberth, M. Harndahl, W. Fleri, S. S. Wilson, J. Sidney, O. Lund, S. Buus, and A. Sette. A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Computational Biology, 2, 2006.

[20] R. Raina, Y. Shen, A. Ng, and A. McCallum. Classification with hybrid generative/discriminative models. In Advances in Neural Information Processing Systems (NIPS), 2004.

[21] C. M. Stein. Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9(6):1135–1151, 1981.

[22] A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.