jmlr jmlr2013 jmlr2013-68 jmlr2013-68-reference knowledge-graph by maker-knowledge-mining

68 jmlr-2013-Machine Learning with Operational Costs

Source: pdf

Author: Theja Tulabandhula, Cynthia Rudin

Abstract: This work proposes a way to align statistical modeling with decision making. We provide a method that propagates the uncertainty in predictive modeling to the uncertainty in operational cost, where operational cost is the amount spent by the practitioner in solving the problem. The method allows us to explore the range of operational costs associated with the set of reasonable statistical models, so as to provide a useful way for practitioners to understand uncertainty. To do this, the operational cost is cast as a regularization term in a learning algorithm’s objective function, allowing either an optimistic or pessimistic view of possible costs, depending on the regularization parameter. From another perspective, if we have prior knowledge about the operational cost, for instance that it should be low, this knowledge can help to restrict the hypothesis space, and can help with generalization. We provide a theoretical generalization bound for this scenario. We also show that learning with operational costs is related to robust optimization. Keywords: statistical learning theory, optimization, covering numbers, decision theory

reference text

Sivan Aldor-Noiman, Paul D. Feigin, and Avishai Mandelbaum. Workload forecasting for a call center: Methodology and a case study. The Annals of Applied Statistics, 3(4):1403–1447, 2009. Martin Anthony and Peter L. Bartlett. Neural network Learning: Theoretical Foundations. Cambridge University Press, 1999. 2025 T ULABANDHULA AND RUDIN Kevin Bache and Moshe Lichman. http://archive.ics.uci.edu/ml. UCI machine learning repository, 2013. URL Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. Information Theory, IEEE Transactions on, 39(3):930–945, 1993. Peter L. Bartlett and Shahar Mendelson. Gaussian and Rademacher complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002. Russell R. Barton, Barry L. Nelson, and Wei Xie. A framework for input uncertainty analysis. In Winter Simulation Conference, pages 1189–1198. WSC, 2010. John R. Birge and Francois Louveaux. Introduction to Stochastic Programming. Springer Verlag, ¸ 1997. Pierre Bonami, Lorenz T. Biegler, Andrew R. Conn, G´ rard Cornu´ jols, Ignacio E. Grossmann, e e Carl D. Laird, Jon Lee, Andrea Lodi, Francois Margot, Nicolas W. Sawaya, and Andreas W¨ chter. ¸ a An algorithmic framework for convex mixed integer nonlinear programs. Discrete Optimization, 5(2):186–204, 2008. Olivier Bousquet. New approaches to statistical learning theory. Annals of the Institute of Statistical Mathematics, 55(2):371–389, 2003. Lawrence D. Brown, Ren Zhang, and Linda Zhao. Root-unroot methods for nonparametric density estimation and poisson random-effects models. Department of Statistics University of Pennsylvania, Tech. Rep, 2001. Olivier Chapelle, Bernhard Sch¨ lkopf, and Alexander Zien, editors. Semi-Supervised Learning. o MIT Press, Cambridge, MA, 2006. Je´us A. De Loera. The many aspects of counting lattice points in polytopes. Mathematische s Semesterberichte, 52(2):175–195, 2005. Simon French. Decision Theory: An Introduction to the Mathematics of Rationality. Halsted Press, 1986. Sven Ove Hansson. Decision Theory: A Brief Introduction. Online manuscript. Department of Philosophy and the History of Technology, Royal Institute of Technology, Stockholm, 1994. Yaochu Jin. Multi-Objective Machine Learning, In Studies in Computational Intelligence, volume 16. Springer, 2006. Lee K. Jones. A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. The Annals of Statistics, 20(1): 608–613, 1992. Andrey Nikolaevich Kolmogorov and Vladimir Mikhailovich Tikhomirov. ε-entropy and ε-capacity of sets in function spaces. Uspekhi Matematicheskikh Nauk, 14(2):3–86, 1959. Vladimir Koltchinskii and Dmitriy Panchenko. Complexities of convex combinations and bounding the generalization error in classiﬁcation. The Annals of Statistics, 33(4):1455–1496, 2005. 2026 M ACHINE L EARNING WITH O PERATIONAL C OSTS Hiroshi Konno and Hiroaki Yamazaki. Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market. Management Science, pages 519–531, 1991. Agop Koulakezian, Hazem M. Soliman, Tang Tang, and Alberto Leon-Garcia. Robust trafﬁc assignment in transportation networks using network criticality. In Proceedings of 2012 IEEE 76th Vehicular Technology Conference, 2012. S. Muthukrishnan, Martin Pal, and Zoya Svitkina. Stochastic models for budget optimization in search-based advertising. Internet and Network Economics, pages 131–142, 2007. John Ashworth Nelder and Roger Mead. A simplex method for function minimization. Computer Journal, 7(4):308–313, 1965. David Pollard. Convergence of Stochastic Processes. Springer, 1984. Cynthia Rudin and Robert E. Schapire. Margin-based ranking and an equivalence between AdaBoost and RankBoost. The Journal of Machine Learning Research, 10:2193–2232, 2009. Cynthia Rudin, Rebecca Passonneau, Axinia Radeva, Haimonti Dutta, Steve Ierome, and Delﬁna Isaac. A process for predicting manhole events in Manhattan. Machine Learning, 80:1–31, 2010. Cynthia Rudin, Rebecca Passonneau, Axinia Radeva, Steve Ierome, and Delﬁna Isaac. 21st-century data miners meet 19th-century electrical cables. IEEE Computer, 44(6):103–105, June 2011. Cynthia Rudin, David Waltz, Roger N. Anderson, Albert Boulanger, Ansaf Salleb-Aouissi, Maggie Chow, Haimonti Dutta, Philip Gross, Bert Huang, Steve Ierome, Delﬁna Isaac, Arthur Kressner, Rebecca J. Passonneau, Axinia Radeva, and Leon Wu. Machine learning for the New York City power grid. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2):328–345, February 2012. Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, pages 1651–1686, 1998. Maurice Sion. On general minimax theorems. Paciﬁc Journal of Mathematics, 8(1):171–176, 1958. Michel Talagrand. The Generic Chaining. Springer, 2005. Theja Tulabandhula and Cynthia Rudin. Machine learning with operational costs. In Proceedings of the International Symposium on Artiﬁcial Intelligence and Mathematics, 2012. Theja Tulabandhula, Cynthia Rudin, and Patrick Jaillet. The machine learning and traveling repairman problem. In Ronen I. Brafman, Fred S. Roberts, and Alexis Tsouki` s, editors, ADT, volume a 6992 of Lecture Notes in Computer Science, pages 262–276. Springer, 2011. Ian Urbina. Mandatory safety rules are proposed for electric utilities. New York Times, 2004. August 21, Late Edition, Section B, Column 3, Metropolitan Desk, Page 2. Robert J. Vanderbei. Linear Programming: Foundations and Extensions, Third Edition. Springer, 2008. 2027 T ULABANDHULA AND RUDIN Vladimir Naumovich Vapnik. Statistical Learning Theory, volume 2. Wiley New York, 1998. Huan Xu, Constantine Caramanis, and Shie Mannor. Robustness and regularization of support vector machines. Journal of Machine Learning Research, 10:1485–1510, December 2009. Hua Yu, Jianxin Chen, Xue Xu, Yan Li, Huihui Zhao, Yupeng Fang, Xiuxiu Li, Wei Zhou, Wei Wang, and Yonghua Wang. A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data. PLoS ONE, 5(7), 2012. Tong Zhang. Covering number bounds of certain regularized linear function classes. Journal of Machine Learning Research, 2:527–550, 2002. Xiaojin Zhu. Semi-supervised learning literature survey. Technical report, Computer Sciences TR 1530, University of Wisconsin Madison, December 2007. 2028