
Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference (NIPS 2009)


Authors: Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum, Michael J. Black

Abstract: Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses causes an exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima: the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, the parameters of the factor graph are treated as a log-linear function approximator and learned with temporal difference (TD) methods; MAP inference is performed by executing the resulting policy on held-out test data. Our method allows efficient gradient updates because only the factors in the neighborhood of the variables affected by an action need to be computed, so the need to compute marginals is bypassed entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over the previous state of the art in that domain.
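The learning loop described in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption for illustration, not the paper's model: a hypothetical chain of binary labels with unary "agrees with the observation" and pairwise "agrees with the neighbour" factors, single-variable flips as actions, a ±1 reward for fixing or breaking a label, epsilon-greedy Q-learning as the TD method, and a stop-when-no-positive-action rule at test time. The part that mirrors the abstract is that the action value is log-linear in the feature change computed from the factors touching the flipped variable only.

```python
import random
from typing import List, Tuple

# Hypothetical toy model (not from the paper): a chain of binary labels y with
# observed bits x and two factor types, each contributing one feature:
#   unary factor on i:         1 if y[i] == x[i]
#   pairwise factor (i, i+1):  1 if y[i] == y[i+1]

def global_features(y: List[int], x: List[int]) -> List[float]:
    """Feature totals over the whole graph (used here only to check locality)."""
    unary = sum(1.0 for yi, xi in zip(y, x) if yi == xi)
    pair = sum(1.0 for a, b in zip(y, y[1:]) if a == b)
    return [unary, pair]

def local_features(y: List[int], x: List[int], i: int) -> List[float]:
    """Features of only the factors that touch variable i."""
    unary = 1.0 if y[i] == x[i] else 0.0
    pair = sum(1.0 for j in (i - 1, i + 1) if 0 <= j < len(y) and y[i] == y[j])
    return [unary, pair]

def delta_features(y: List[int], x: List[int], i: int) -> List[float]:
    """Feature change for the action 'flip y[i]'; y is restored before returning."""
    before = local_features(y, x, i)
    y[i] ^= 1
    after = local_features(y, x, i)
    y[i] ^= 1
    return [a - b for a, b in zip(after, before)]

def q_value(theta: List[float], y: List[int], x: List[int], i: int) -> float:
    """Log-linear action value: Q(s, a) = theta . delta_phi(s, a)."""
    return sum(t * d for t, d in zip(theta, delta_features(y, x, i)))

def best_action(theta: List[float], y: List[int], x: List[int]) -> Tuple[int, float]:
    """Greedy action (lowest index on ties) and its value."""
    values = [q_value(theta, y, x, i) for i in range(len(y))]
    i = max(range(len(y)), key=values.__getitem__)
    return i, values[i]

def train(data, episodes=200, steps=20, alpha=0.05, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy Q-learning (a TD method) on hypothetical (x, truth) pairs."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _episode in range(episodes):
        x, truth = rng.choice(data)
        y = [rng.randint(0, 1) for _ in x]
        for step in range(steps):
            if rng.random() < epsilon:
                i = rng.randrange(len(y))  # explore
            else:
                i = best_action(theta, y, x)[0]  # exploit
            phi = delta_features(y, x, i)
            q = sum(t * d for t, d in zip(theta, phi))
            # Assumed reward: +1 if the flip fixes a wrong label, -1 if it breaks a right one.
            r = 1.0 if y[i] != truth[i] else -1.0
            y[i] ^= 1  # take the action
            # Bootstrap from the best next action; the episode ends after the last step.
            q_next = 0.0 if step == steps - 1 else best_action(theta, y, x)[1]
            td_error = r + gamma * q_next - q
            # The gradient of theta . phi with respect to theta is phi itself.
            theta = [t + alpha * td_error * d for t, d in zip(theta, phi)]
    return theta

def greedy_map(theta, x, y0, max_steps=100):
    """Execute the greedy policy; stop when no flip has positive value (assumed rule)."""
    y = list(y0)
    for _ in range(max_steps):
        i, v = best_action(theta, y, x)
        if v <= 0:
            break
        y[i] ^= 1
    return y
```

Because `delta_features` reads only the factors adjacent to the flipped variable, each TD update costs time proportional to that variable's degree rather than to the size of the graph, and no marginals are computed; `global_features` exists only to confirm that the local difference equals the global one. A call such as `greedy_map(train(data), x, y0)` would then run the learned policy on a held-out input.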

References

[1] Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. In International Conference on Machine Learning (ICML), 2000.

[2] John D. Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning (ICML), 2001.

[3] Ben Taskar, Carlos Guestrin, and Daphne Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems (NIPS), 2003.

[4] Ryan McDonald and Fernando Pereira. Online learning of approximate dependency parsing algorithms. In European Chapter of the Association for Computational Linguistics (EACL), pages 81–88, 2006.

[5] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62, 2006.

[6] Brian Milch, Bhaskara Marthi, and Stuart Russell. BLOG: Relational Modeling with Unknown Objects. PhD thesis, University of California, Berkeley, 2006.

[7] Andrew McCallum, Khashayar Rohanimanesh, Michael Wick, Karl Schultz, and Sameer Singh. FACTORIE: Efficient probabilistic programming via imperative declarations of structure, inference and learning. In Neural Information Processing Systems (NIPS) Workshop on Probabilistic Programming, Vancouver, BC, Canada, 2008.

[8] Aria Haghighi and Dan Klein. Unsupervised coreference resolution in a nonparametric Bayesian model. In Association for Computational Linguistics (ACL), 2007.

[9] Hanna Pasula, Bhaskara Marthi, Brian Milch, Stuart Russell, and Ilya Shpitser. Identity uncertainty and citation matching. In Advances in Neural Information Processing Systems 15. MIT Press, 2003.

[10] Sonia Jain and Radford M. Neal. A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13:158–182, 2004.

[11] Aron Culotta. Learning and inference in weighted logic with application to natural language processing. PhD thesis, University of Massachusetts, May 2008.

[12] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, March 1998.

[13] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, pages 9–44, 1988.

[14] Robert H. Crites and Andrew G. Barto. Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 8, pages 1017–1023. MIT Press, 1996.

[15] Wei Zhang and Thomas G. Dietterich. Solving combinatorial optimization tasks by reinforcement learning: A general methodology applied to resource-constrained scheduling. Journal of Artificial Intelligence Research, 1, 2000.

[16] Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.

[17] Khashayar Rohanimanesh, Michael Wick, Sameer Singh, and Andrew McCallum. Reinforcement learning for MAP inference in large factor graphs. Technical Report #UM-CS-2008-040, University of Massachusetts, Amherst, 2008.

[18] Christopher J. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, 1989.

[19] Christopher J. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, May 1992.

[20] Andrew McCallum and Charles Sutton. Piecewise training with parameter independence diagrams: Comparing globally- and locally-trained linear-chain CRFs. In NIPS Workshop on Learning with Structured Outputs, 2004.

[21] Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.

[22] Culotta. First. In International Joint Conference on Artificial Intelligence, 2007.

[23] Khashayar Rohanimanesh, Michael Wick, and Andrew McCallum. Inference and learning in large factor graphs with adaptive proposal distributions. Technical Report #UM-CS-2009-028, University of Massachusetts, Amherst, 2009.

[24] AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Y. Halevy. Learning to map between ontologies on the semantic web. In WWW, page 662, 2002.

[25] Wei Zhang and Thomas G. Dietterich. A reinforcement learning approach to job-shop scheduling. In International Joint Conference on Artificial Intelligence (IJCAI), pages 1114–1120, 1995.

[26] Justin Boyan and Andrew W. Moore. Learning evaluation functions to improve optimization by local search. Journal of Machine Learning Research, 1:77–112, 2001.

[27] Hal Daumé III and Daniel Marcu. Learning as search optimization: approximate large margin methods for structured prediction. In International Conference on Machine Learning (ICML), 2005.

[28] Hal Daumé III, John Langford, and Daniel Marcu. Search-based structured prediction. Machine Learning, 2009.