nips nips2012 nips2012-215 nips2012-215-reference knowledge-graph by maker-knowledge-mining

215 nips-2012-Minimizing Uncertainty in Pipelines

Source: pdf

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a conﬁdence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efﬁcient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

reference text

[1] Ashraf M. Abdelbar and Sandra M. Hedetniemi. Approximating maps for belief networks is np-hard and other theorems. Artif. Intell., 102(1):21–38, June 1998.

[2] Maria-Florina Balcan, Alina Beygelzimer, and John Langford. Agnostic active learning. J. Comput. Syst. Sci., 75(1):78–89, 2009.

[3] Kedar Bellare, Suresh Iyengar, Aditya Parameswaran, and Vibhor Rastogi. Active sampling for entity matching. In KDD, 2012.

[4] Alina Beygelzimer, Sanjoy Dasgupta, and John Langford. Importance weighted active learning. In ICML, page 7, 2009.

[5] Alina Beygelzimer, Daniel Hsu, John Langford, and Tong Zhang. Agnostic active learning without constraints. In NIPS, pages 199–207, 2010.

[6] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1 edition, 2007.

[7] Philip Bohannon, Srujana Merugu, Cong Yu, Vipul Agarwal, Pedro DeRose, Arun Iyer, Ankur Jain, Vinay Kakade, Mridul Muralidharan, Raghu Ramakrishnan, and Warren Shen. Purple sox extraction management system. SIGMOD Rec., 37:21–27, March 2009.

[8] Gregory F. Cooper. The computational complexity of probabilistic inference using bayesian belief networks. Artif. Intell., 42(2-3):393–405, 1990.

[9] Nilesh Dalvi, Ravi Kumar, Bo Pang, Raghu Ramakrishnan, Andrew Tomkins, Philip Bohannon, Sathiya Keerthi, and Srujana Merugu. A web of concepts (keynote). In PODS. Providence, Rhode Island, USA, June 2009.

[10] Nilesh Dalvi, Ravi Kumar, and Mohamed A. Soliman. Automatic wrappers for large scale web extraction. PVLDB, 4(4):219–230, 2011.

[11] Nilesh Dalvi, Aditya Parameswaran, and Vibhor Rastogi. Minimizing uncertainty in pipelines. Technical report, Stanford Infolab, 2012.

[12] Sanjoy Dasgupta and John Langford. Tutorial summary: Active learning. In ICML, page 178, 2009.

[13] Yoav Freund, H. Sebastian Seung, Eli Shamir, and Naftali Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3):133–168, 1997.

[14] Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Sengamedu, and Ashwin Tengli. Exploiting content redundancy for web information extraction. PVLDB, 3(1):578–587, 2010.

[15] Steve Hanneke. A bound on the label complexity of agnostic active learning. In ICML, pages 353–360, 2007.

[16] Don Herbison-Evans. Solving quartics and cubics for graphics. 1994.

[17] Nikos Karampatziakis and John Langford. Online importance weight aware updates. In UAI, pages 392–399, 2011.

[18] Richard M. Karp and Michael Luby. Monte-carlo algorithms for enumeration and reliability problems. In Proceedings of the 24th Annual Symposium on Foundations of Computer Science, pages 56–64, 1983.

[19] Andreas Krause and Carlos Guestrin. Near-optimal nonmyopic value of information in graphical models. In UAI, pages 324–331, 2005.

[20] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In AAAI, pages 1650–1654, 2007.

[21] J. Kwisthout. The Computational Complexity of Probabilistic Inference. Technical Report ICIS–R11003, Radboud University Nijmegen, April 2011.

[22] Michael L. Littman, Stephen M. Majercik, and Toniann Pitassi. Stochastic boolean satisﬁability. J. Autom. Reasoning, 27(3):251–296, 2001.

[23] A. Parameswaran, A. Das Sarma, H. Garcia-Molina, N. Polyzotis, and J. Widom. Human-assisted graph search: it’s okay to ask questions. In VLDB, 2011.

[24] James D. Park and Adnan Darwiche. Complexity results and approximation strategies for map explanations. J. Artif. Intell. Res. (JAIR), 21:101–133, 2004.

[25] J. Scott Provan and Michael O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4):777–788, 1983.

[26] Dan Roth. On the hardness of approximate reasoning. Artif. Intell., 82(1-2):273–302, 1996.

[27] Burr Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.

[28] Alice X. Zheng, Irina Rish, and Alina Beygelzimer. Efﬁcient test selection in active diagnosis via entropy approximation. In UAI, pages 675–, 2005. 9