
53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization


Source: pdf

Author: Bonnie Kirkpatrick, Alexandre Bouchard-Côté

Abstract: Pedigrees, or family trees, are directed graphs used to identify sites of the genome that are correlated with the presence or absence of a disease. With the advent of genotyping and sequencing technologies, there has been an explosion in the amount of data available, both in the number of individuals and in the number of sites. Some pedigrees number in the thousands of individuals. Meanwhile, analysis methods have remained limited to pedigrees of fewer than 100 individuals, which restricts analyses to many small independent pedigrees. Disease models, such as those used for the linkage analysis log-odds (LOD) estimator, have similarly been limited. This is because linkage analysis was originally designed with a different task in mind, that of ordering the sites in the genome, before there were technologies that could reveal the order. LODs are difficult to interpret and nontrivial to extend to consider interactions among sites. These developments and difficulties call for the creation of modern methods of pedigree analysis. Drawing from recent advances in graphical model inference and transducer theory, we introduce a simple yet powerful formalism for expressing genetic disease models. We show that these disease models can be turned into accurate and computationally efficient estimators. The technique we use for constructing the variational approximation has potential applications to inference in other large-scale graphical models. This method allows inference on larger pedigrees than previously analyzed in the literature, which improves disease site prediction.


reference text

[1] G. Mendel. Experiments in plant-hybridisation. In English Translation and Commentary by R. A. Fisher, J.H. Bennett, ed. Oliver and Boyd, Edinburgh 1965, 1866.

[2] A. H. Sturtevant. The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. Journal of Experimental Zoology, 14:43–59, 1913.

[3] G. R. Abecasis, S. S. Cherny, W. O. Cookson, et al. Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nature Genetics, 30:97–101, 2002.

[4] M. Silberstein, A. Tzemach, N. Dovgolevsky, M. Fishelson, A. Schuster, and D. Geiger. On-line system for faster linkage analysis via parallel execution on thousands of personal computers. American Journal of Human Genetics, 78(6):922–935, 2006.

[5] D. Geiger, C. Meek, and Y. Wexler. Speeding up HMM algorithms for genetic linkage analysis via chain reductions of the state space. Bioinformatics, 25(12):i196, 2009.

[6] C. A. Albers, M. A. R. Leisink, and H. J. Kappen. The cluster variation method for efficient linkage analysis on extended pedigrees. BMC Bioinformatics, 7(S-1), 2006.

[7] M. L. Metzker. Sequencing technologies–the next generation. Nat Rev Genet, 11(1):31–46, January 2010.

[8] M. Abney, C. Ober, and M. S. McPeek. Quantitative-trait homozygosity and association mapping and empirical genome wide significance in large, complex pedigrees: Fasting serum-insulin level in the Hutterites. American Journal of Human Genetics, 70(4):920–934, 2002.

[9] N. B. Sutter et al. A Single IGF1 Allele Is a Major Determinant of Small Size in Dogs. Science, 316(5821):112–115, 2007.

[10] M. Mohri. Handbook of Weighted Automata, chapter 6. Monographs in Theoretical Computer Science. Springer, 2009.

[11] A. Bouchard-Côté and M. I. Jordan. Variational Inference over Combinatorial Spaces. In Advances in Neural Information Processing Systems 23 (NIPS), 2010.

[12] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Bethe free energy, Kikuchi approximations and belief propagation algorithms. In Advances in Neural Information Processing Systems (NIPS), 2001.

[13] E. M. Wijsman. Penetrance. John Wiley & Sons, Ltd, 2005.

[14] R. C. Elston and J. Stewart. A general model for the analysis of pedigree data. Human Heredity, 21:523–542, 1971.

[15] E. S. Lander and P. Green. Construction of multilocus genetic linkage maps in humans. Proceedings of the National Academy of Sciences, 84(5):2363–2367, 1987.

[16] J. Marchini, P. Donnelly, and L. R. Cardon. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat. Genet., 37(4):413–417, 2005.

[17] Y. W. Teh, C. Blundell, and L. T. Elliott. Modelling genetic variations with fragmentation-coagulation processes. In Advances In Neural Information Processing Systems, 2011.

[18] A. Piccolboni and D. Gusfield. On the complexity of fundamental computational problems in pedigree analysis. Journal of Computational Biology, 10(5):763–773, 2003.

[19] S. L. Lauritzen and N. A. Sheehan. Graphical models for genetic analysis. Statistical Science, 18(4):489–514, 2003.

[20] A. Thomas, A. Gutin, V. Abkevich, and A. Bansal. Multilocus linkage analysis by blocked Gibbs sampling. Statistics and Computing, 10(3):259–269, July 2000.

[21] G. O. Roberts and S. K. Sahu. Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(2):291–317, 1997.

[22] A. Bouchard-Côté and M. I. Jordan. Optimization of structured mean field objectives. In Proceedings of the Twenty-Fifth Annual Conference on Uncertainty in Artificial Intelligence (UAI-09), pages 67–74, Corvallis, Oregon, 2009. AUAI Press.

[23] T. Minka and Y. Qi. Tree-structured approximations by expectation. In Advances in Neural Information Processing Systems (NIPS), 2003.

[24] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. In AISTATS, 2003.

[25] R. H. Swendsen and J.-S. Wang. Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett., 58:86–88, Jan 1987.

[26] J. Wakeley. Coalescent Theory: An Introduction. Roberts & Company Publishers, 1st edition, June 2008.

[27] B. Kirkpatrick, E. Halperin, and R. M. Karp. Haplotype inference in complex pedigrees. Journal of Computational Biology, 17(3):269–280, 2010.

[28] C. A. Albers, T. Heskes, and H. J. Kappen. Haplotype inference in general pedigrees using the cluster variation method. Genetics, 177(2):1101–1116, October 2007.

[29] The International HapMap Consortium. The international HapMap project. Nature, 426:789–796, 2003.