nips nips2012 nips2012-53 knowledge-graph by maker-knowledge-mining

53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization


Source: pdf

Author: Bonnie Kirkpatrick, Alexandre Bouchard-Côté

Abstract: Pedigrees, or family trees, are directed graphs used to identify sites of the genome that are correlated with the presence or absence of a disease. With the advent of genotyping and sequencing technologies, there has been an explosion in the amount of data available, both in the number of individuals and in the number of sites. Some pedigrees number in the thousands of individuals. Meanwhile, analysis methods have remained limited to pedigrees of < 100 individuals, which limits analyses to many small independent pedigrees. Disease models, such as those used for the linkage analysis log-odds (LOD) estimator, have similarly been limited. This is because linkage analysis was originally designed with a different task in mind, that of ordering the sites in the genome, before there were technologies that could reveal the order. LODs are difficult to interpret and nontrivial to extend to consider interactions among sites. These developments and difficulties call for the creation of modern methods of pedigree analysis. Drawing from recent advances in graphical model inference and transducer theory, we introduce a simple yet powerful formalism for expressing genetic disease models. We show that these disease models can be turned into accurate and computationally efficient estimators. The technique we use for constructing the variational approximation has potential applications to inference in other large-scale graphical models. This method allows inference on larger pedigrees than previously analyzed in the literature, which improves disease site prediction.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Meanwhile, analysis methods have remained limited to pedigrees of < 100 individuals, which limits analyses to many small independent pedigrees. [sent-8, score-0.341]

2 This is because linkage analysis was originally designed with a different task in mind, that of ordering the sites in the genome, before there were technologies that could reveal the order. [sent-10, score-0.306]

3 These developments and difficulties call for the creation of modern methods of pedigree analysis. [sent-12, score-0.481]

4 Drawing from recent advances in graphical model inference and transducer theory, we introduce a simple yet powerful formalism for expressing genetic disease models. [sent-13, score-0.818]

5 We show that these disease models can be turned into accurate and computationally efficient estimators. [sent-14, score-0.414]

6 This method allows inference on larger pedigrees than previously analyzed in the literature, which improves disease site prediction. [sent-16, score-0.931]

7 Finding genetic correlates of disease is a long-standing and important problem, with potential contributions to the diagnostics and treatment of disease. [sent-17, score-0.54]

8 The pedigree model for inheritance is one of the best defined models in biology, and it has been an area of active statistical and biological research for over a hundred years. [sent-18, score-0.587]

9 The most commonly used method to analyze genetic correlates of disease is quite old. [sent-19, score-0.54]

10 After Mendel introduced the basic model for the inheritance of genomic sites in 1866 [1], Sturtevant was the first, in 1913, to provide a method for ordering the sites of the genome [2]. [sent-20, score-0.53]

11 The method of Sturtevant became the foundation for linkage analysis with pedigrees [3, 4, 5, 6]. [sent-21, score-0.335]

12 In Sturtevant’s framework, the problem can be thought of as that of finding the position of a disease site relative to a map of existing sites. [sent-22, score-0.682]

13 Genomic sites are becoming considerably denser in the genome and technologies allow us to interrogate the genome for the position of sites [7]. [sent-25, score-0.512]

14 Additionally, most current pedigree analysis methods have a computational cost that is exponential either in the number of sites or in the number of individuals. [sent-26, score-0.636]

15 This is in contrast to the size of pedigrees being collected: for example the work of [8] includes a connected human pedigree containing 13 generations and 1623 individuals, and the work of [9] includes a connected non-human data set containing thousands of breeding dogs. [sent-28, score-0.766]

16 Apart from the issues of pedigree size, the LOD value is difficult to interpret, since there are few models for the distribution of the statistic. [sent-29, score-0.481]

17 These developments and difficulties call for the creation of modern methods of pedigree analysis. [sent-30, score-0.481]

18 In this work, we propose a new framework for expressing genetic disease models. [sent-31, score-0.523]

19 The key component of our models, the Haplotype-Phenotype Transducer (HPT), draws from recent advances in graphical model inference and transducer theory [10], and provides a simple and flexible formalism for building genetic disease models. [sent-32, score-0.818]

20 The output of inference over HPT models is a posterior distribution over disease sites, which is easier to interpret than LOD scores. [sent-33, score-0.449]

21 The cost of this modeling flexibility is that the graphical model corresponding to the inference problem is larger and has more loops than traditional pedigree graphical models. [sent-34, score-0.633]

22 Our framework can be specialized to create analogues of classical penetrance disease models [13]. [sent-41, score-0.486]

23 Our experiments show that even for these simpler cases, our approach can achieve significant gains in disease site identification accuracy compared to the most commonly used method, Merlin’s implementation of LOD scores [3, 5]. [sent-43, score-0.682]

24 Moreover, our inference method allows us to perform experiments on unprecedented pedigree sizes, well beyond the capacity of Merlin and other pedigree analysis tools typically used in practice. [sent-44, score-0.997]

25 While graphical models have played an important role in the development of pedigree analysis methods [14, 15], only recently were variational methods applied to the problem [6]. [sent-45, score-0.551]

26 Most current work on more advanced disease models has focused on a very different type of data, population data, for genome-wide association studies (GWAS) [16]. [sent-47, score-0.5]

27 The point at which the copying of the chromosomes switches from one of the grand-maternal (grand-paternal) chromosomes to the other is called a recombination breakpoint. [sent-50, score-0.33]

28 A site is a particular position in the genome at which we can obtain measurable values. [sent-51, score-0.354]

29 For the purposes of this paper, an allele is the nucleotide at a particular site on a particular chromosome. [sent-52, score-0.387]

30 If we had complete data, we would know the positions of all of the haplotypes, all of the recombination breakpoints as well as which allele came from which parent. [sent-54, score-0.4]

31 A pedigree is a directed acyclic graph with individuals as nodes, drawn as boxes for males and circles for females, and with edges directed downward from parent to child. [sent-60, score-0.68]

32 The individuals without parents in the graph are called founders, and the individuals with parents are non-founders. [sent-62, score-0.323]

33 The pedigree encodes a set of relationships that constrain the allowed inheritance options. [sent-63, score-0.587]

34 These inheritance options define a probability distribution which is investigated during pedigree analysis. [sent-64, score-0.587]

35 Assume a single-site disease model, where a diploid genotype, $G_D$, determines the affection status (phenotype), $P \in \{\text{'h'}, \text{'d'}\}$, according to the penetrance probabilities $f_2 = \mathbb{P}(P = \text{'d'} \mid G_D = 11)$, $f_1 = \mathbb{P}(P = \text{'d'} \mid G_D = 10)$, $f_0 = \mathbb{P}(P = \text{'d'} \mid G_D = 00)$. [sent-65, score-0.507]
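Sentence 35 above defines the single-site penetrance model. Here is a minimal Python sketch of it. It assumes alleles are coded 0/1, with 1 as the disease allele, and treats the heterozygotes 10 and 01 alike. Under those assumptions the model is a lookup from the number of disease alleles to the probability of phenotype 'd'. The function names and the two penetrance vectors are hypothetical illustrations, not values from the paper.

```python
import random


def penetrance(genotype, f):
    """Return P(phenotype = 'd' | G_D = genotype) for a single-site model.

    genotype: pair of alleles (0 or 1) at the disease site; 1 is the disease allele.
    f: tuple (f0, f1, f2) indexed by the number of disease alleles carried,
       so 00 -> f0, 10 or 01 -> f1, 11 -> f2.
    """
    return f[genotype[0] + genotype[1]]


def sample_phenotype(genotype, f, rng):
    """Draw an affection status: 'd' (diseased) or 'h' (healthy)."""
    return 'd' if rng.random() < penetrance(genotype, f) else 'h'


# Hypothetical penetrance vectors (f0, f1, f2); not the values used in the paper.
RECESSIVE = (0.0, 0.0, 1.0)  # only 11 homozygotes are affected
DOMINANT = (0.0, 1.0, 1.0)   # a single copy of allele 1 suffices

rng = random.Random(0)
print(penetrance((1, 0), RECESSIVE), penetrance((1, 0), DOMINANT))  # 0.0 1.0
print(sample_phenotype((1, 1), RECESSIVE, rng))  # d
```

The heterozygote entry f1 is what separates the two examples. Recessive models set it close to f0, and dominant models set it close to f2.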

36 Here the disease site usually has a disease allele, 1, that confers greater risk of having the disease. [sent-66, score-1.096]

37 Let the pedigree model for n individuals be specified by a pedigree graph, a disease model f, and the minor allele frequency, µ, for a single site of interest, k. [sent-68, score-1.89]

38 Between the disease site and site k, we model the per-chromosome, per-generation recombination fraction, ρ, which is the frequency with which recombinations occur between those two sites. [sent-77, score-1.205]
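The recombination fraction ρ in sentence 38 can be illustrated by simulating how one phased parent transmits a two-site haplotype. This is a minimal sketch under two assumptions: one parental chromosome is chosen uniformly at the disease site, and the copy switches to the other chromosome at site k with probability ρ. `transmit_two_sites` is a hypothetical helper, not part of any pedigree package.

```python
import random


def transmit_two_sites(parent, rho, rng):
    """Pass one gamete from a phased diploid parent across two linked sites.

    parent: (chrom_a, chrom_b), each an allele pair (disease site, site k).
    rho: per-chromosome, per-generation recombination fraction between them.
    Returns the transmitted allele pair and whether a recombination occurred.
    """
    start = rng.randrange(2)          # chromosome copied at the disease site
    recombined = rng.random() < rho   # switch chromosomes with probability rho
    end = 1 - start if recombined else start
    return (parent[start][0], parent[end][1]), recombined


rng = random.Random(42)
parent = ((1, 1), (0, 0))  # phase known: allele 1 at both sites on chrom_a
trials = 100_000
recombinants = 0
for _ in range(trials):
    gamete, _ = transmit_two_sites(parent, rho=0.1, rng=rng)
    # With this parent, a gamete mixes the two chromosomes exactly when it is (1, 0) or (0, 1).
    if gamete in ((1, 0), (0, 1)):
        recombinants += 1
print(recombinants / trials)  # should be close to rho = 0.1
```

The observed fraction of recombinant gametes estimates ρ. That is the sense in which ρ is "the frequency with which recombinations occur between those two sites".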

39 Other sites linked to k can contribute to our estimate via their arrangement in a single first-order Markov chain, with some sites falling to the left of the disease site and others to the right of the site of interest. [sent-78, score-1.26]
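Sentence 39 links sites through a first-order Markov chain. Below is a minimal sketch of the standard multipoint assumption for a single meiosis. The indicator of which grandparental chromosome is copied switches between adjacent sites with that interval's recombination fraction. The interval fractions `thetas` and both function names are hypothetical. The paper's model couples such chains across many individuals, which this sketch does not attempt.

```python
import random


def sample_inheritance_path(thetas, rng):
    """Sample which grandparental chromosome (0 or 1) is copied at each site.

    thetas[j] is the recombination fraction between ordered sites j and j + 1,
    so the indicators form a first-order Markov chain along the chromosome.
    """
    path = [rng.randrange(2)]  # uniform at the first site
    for theta in thetas:
        prev = path[-1]
        path.append(1 - prev if rng.random() < theta else prev)
    return path


def path_probability(path, thetas):
    """Probability of one indicator path under the same Markov chain."""
    prob = 0.5  # initial state is uniform
    for prev, nxt, theta in zip(path, path[1:], thetas):
        prob *= theta if nxt != prev else 1.0 - theta
    return prob


rng = random.Random(7)
print(sample_inheritance_path([0.1, 0.2, 0.05], rng))  # four 0/1 indicators
print(path_probability([0, 0, 1], [0.1, 0.2]))  # 0.5 * 0.9 * 0.2, about 0.09
```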

40 Previous work has shown that given a pedigree model, affection data, and genotype data, we can estimate ρ. [sent-79, score-0.603]

41 We define the likelihood as $L(\rho) = \mathbb{P}(P = p, G = g \mid \rho, f, \mu)$, where p and g are the observed phenotypes and genotypes, ρ is the recombination probability between the disease site and the first site, µ is the founder (minor) allele frequency, and f are the penetrance probabilities. [sent-80, score-1.197]

42 To test for linkage between the disease site and the other sites, we maximize the likelihood ratio to obtain the optimal recombination fraction $\rho^* = \arg\max_{\rho} L(\rho)/L(1/2)$. [sent-81, score-1.039]
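The sketch below makes the ratio in sentence 42 concrete using the textbook phase-known setting. There every meiosis is fully informative, so L(ρ) = ρ^R (1 − ρ)^(N − R) for R recombinants among N meioses. This is an assumed toy likelihood, not the pedigree likelihood of sentence 41. The sketch restricts ρ to [0, 1/2] and reports the LOD score as the base-10 logarithm of L(ρ*)/L(1/2).

```python
import math


def log10_likelihood(rho, n_meioses, n_recombinants):
    """log10 L(rho) = R log10(rho) + (N - R) log10(1 - rho), phase-known toy case."""
    r, n = n_recombinants, n_meioses
    # Use the convention 0 * log(0) = 0 so rho = 0 is valid when r = 0.
    term_r = r * math.log10(rho) if r > 0 else 0.0
    term_nr = (n - r) * math.log10(1.0 - rho) if n - r > 0 else 0.0
    return term_r + term_nr


def lod_score(n_meioses, n_recombinants):
    """Return (rho_star, LOD), maximizing L(rho)/L(1/2) over rho in [0, 1/2]."""
    rho_star = min(n_recombinants / n_meioses, 0.5)  # closed-form MLE clipped at 1/2
    lod = (log10_likelihood(rho_star, n_meioses, n_recombinants)
           - log10_likelihood(0.5, n_meioses, n_recombinants))
    return rho_star, lod


rho_star, lod = lod_score(n_meioses=20, n_recombinants=2)
print(rho_star, round(lod, 2))  # 0.1 3.2
```

L(1/2) does not depend on ρ, so maximizing the ratio gives the same ρ* as maximizing L(ρ) itself. The ratio only sets the scale of the reported score.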

43 In this section, we describe our model for inferring relationships between phenotypes and genotyped pedigree datasets. [sent-84, score-0.568]

44 The first step in this generative process consists of sampling a collection of disease model (DM) variables, which encode putative relationships between the genetic sites and the observed phenotypes. [sent-86, score-0.678]

45 There is one disease model variable, $D_s$, for each site s; to a first approximation, $D_s$ can be thought of as taking the value one if site s is the closest to the primary genetic factor involved in a disease and zero otherwise (a more elaborate example is presented in the Supplement). [sent-87, score-1.224]
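Under the first approximation in sentence 45, D is a one-hot vector over sites. Here is a minimal sketch of drawing it, assuming a uniform prior over which site is the disease site. That matches how the disease site is chosen when simulating data in sentence 82. `sample_disease_model` is a hypothetical name, and the Supplement's more elaborate disease models are not covered.

```python
import random


def sample_disease_model(n_sites, rng):
    """Draw D = (D_1, ..., D_S) with exactly one site flagged as the disease site.

    Assumes a uniform prior over which site is closest to the causal factor.
    """
    disease_site = rng.randrange(n_sites)
    return [1 if s == disease_site else 0 for s in range(n_sites)]


D = sample_disease_model(10, random.Random(3))
print(D.index(1))  # position of the flagged site
```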

46 We will define the distribution of $P_i$ conditionally on the haplotype of the individual in question, $H_i$, and on the global disease model D. [sent-95, score-0.644]

47 Figure 1: (a) The pedigree graphical model for independent sites. [sent-108, score-0.531]

48 The nodes are labeled as follows: M for the marriage node which enforces the Mendelian inheritance constraints, H for haplotype, L and L for the two alleles, D(1) for the disease site indicator, and D(2) for the disease allele value. [sent-110, score-1.364]

49 This transducer for HPT(·) models a recessive disease where the input at each state is the disease (top) and haplotype alleles (bottom). [sent-114, score-1.349]

50 The remaining variables (the non-founder individuals’ haplotype variables) are obtained deterministically from the values of the founders and the inheritance: $H_{i,s,x} = H_{x(i),s,R_{i,s,x}}$, where x(i) denotes the index of the father (mother) of i if x = ‘father’ (‘mother’). [sent-116, score-0.304]
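The deterministic copy rule in sentence 50 can be sketched with plain dictionaries. The data layout is an assumption made for illustration. Each individual's two haplotypes are keyed 'father' and 'mother', and the hypothetical R[i][x][s] names which haplotype of parent x(i) is copied at site s.

```python
def fill_nonfounder_haplotypes(order, parents, H, R):
    """Fill non-founder haplotypes deterministically: H[i][x][s] = H[x(i)][R[i][x][s]][s].

    order:   individuals sorted so that parents come before their children.
    parents: dict child -> {'father': id, 'mother': id}; founders are absent.
    H:       dict id -> {'father': [...], 'mother': [...]}; founders pre-filled.
    R:       dict child -> {'father': [...], 'mother': [...]}; R[i][x][s] is
             'father' or 'mother', naming which haplotype of parent x(i) is
             copied into haplotype x of child i at site s.
    """
    for i in order:
        if i not in parents:
            continue  # founder: haplotypes were sampled directly
        H[i] = {}
        for x in ('father', 'mother'):
            p = parents[i][x]  # x(i): the father or the mother of i
            H[i][x] = [H[p][R[i][x][s]][s] for s in range(len(R[i][x]))]
    return H


H = {
    'dad': {'father': [0, 0, 0], 'mother': [1, 1, 1]},
    'mom': {'father': [0, 1, 0], 'mother': [1, 0, 1]},
}
parents = {'kid': {'father': 'dad', 'mother': 'mom'}}
R = {'kid': {'father': ['father', 'father', 'mother'],  # crossover between sites 1 and 2
             'mother': ['mother', 'mother', 'mother']}}
fill_nonfounder_haplotypes(['dad', 'mom', 'kid'], parents, H, R)
print(H['kid'])  # {'father': [0, 0, 1], 'mother': [1, 0, 1]}
```

Processing individuals in a parents-first order ensures that H[x(i)] already exists when child i is filled in.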

51 The distribution on the founder haplotypes is a product of independent Bernoulli distributions, one for each site (the parameters of these Bernoulli distributions are not restricted to be identical across sites and can be estimated [3]). [sent-117, score-0.563]
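Sentence 51's founder model is a product of independent per-site Bernoulli draws with site-specific parameters. In Python this takes only a few lines. The sketch assumes alleles coded 0/1 and a hypothetical list `allele_freqs` of per-site probabilities of allele 1. Estimating those parameters, as in [3], is not shown.

```python
import random


def sample_founder_haplotype(allele_freqs, rng):
    """One founder haplotype: an independent Bernoulli draw at every site.

    allele_freqs[s] is the probability of allele 1 at site s; values may
    differ from site to site.
    """
    return [1 if rng.random() < mu else 0 for mu in allele_freqs]


def founder_haplotype_probability(haplotype, allele_freqs):
    """Probability of a haplotype under the product-of-Bernoullis model."""
    prob = 1.0
    for allele, mu in zip(haplotype, allele_freqs):
        prob *= mu if allele == 1 else 1.0 - mu
    return prob


freqs = [0.2, 0.3, 0.5]  # hypothetical per-site frequencies of allele 1
print(sample_founder_haplotype(freqs, random.Random(1)))
print(founder_haplotype_probability([1, 0, 1], freqs))  # 0.2 * 0.7 * 0.5, about 0.07
```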

52 Having generated all the haplotypes and disease variables, we denote the conditional distribution of the phenotypes as $P_i \mid (D, H_i) \sim \mathrm{HPT}(\cdot\,; D, H_i)$, where HPT stands for Haplotype-Phenotype Transducer. [sent-119, score-0.689]

53 We also make two simplifications to facilitate exposition: first, that the disease site is one of the observed sites, and second, that the disease allele is the less frequent (minor) allele (we show in the Supplement a slightly more complicated transducer that does not make these assumptions). [sent-125, score-1.522]

54 Under the two above assumptions, we claim that the state diagrams in Figure 1(b) specify an HPT transducer for a recessive disease model. [sent-126, score-0.682]

55 0 denotes that a disease indicator is emitted with weight one. [sent-129, score-0.414]

56 The set of valid paths along with their weights can be thought of as encoding a parametric disease model. [sent-136, score-0.456]

57 For example, with a recessive disease, shown in Figure 1(b), we can see that if the transducer is at the site of the disease (encoded as the current symbol in c being equal to 1), then only a homozygous input haplotype ‘AA’ will lead to an output disease phenotype ‘d’. [sent-137, score-1.605]

58 This formalism gives a considerable amount of flexibility to the modeler, who can go beyond simple Mendelian disease models by constructing different transducers. [sent-138, score-0.436]
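The recessive behavior described in sentence 57 can be mimicked with an ordinary state machine. It reads one symbol per site, made up of the disease indicator and the two alleles. It emits a phenotype with weight one on valid paths and weight zero otherwise. This is a hypothetical re-implementation under the two simplifying assumptions of sentence 53, written in plain Python rather than with a weighted-transducer library. Labeling the disease allele 'A' is also an assumption.

```python
def recessive_hpt(disease_indicators, haplotype_pairs, disease_allele='A'):
    """Toy recessive Haplotype-Phenotype Transducer run over the ordered sites.

    Input at site s: the disease indicator D_s (0 or 1) and the individual's
    two alleles at s. Returns (phenotype, weight): weight 1.0 on a valid path,
    0.0 when the input does not flag exactly one disease site.
    """
    state = 'search'  # the disease site has not been read yet
    for d, (a1, a2) in zip(disease_indicators, haplotype_pairs):
        if d == 0:
            continue  # non-disease sites leave the state unchanged
        if state != 'search':
            return None, 0.0  # a second disease site: no valid path
        homozygous = a1 == disease_allele and a2 == disease_allele
        state = 'affected' if homozygous else 'unaffected'
    if state == 'search':
        return None, 0.0  # no disease site flagged
    return ('d' if state == 'affected' else 'h'), 1.0


print(recessive_hpt([0, 1, 0], [('A', 'C'), ('A', 'A'), ('G', 'G')]))  # ('d', 1.0)
print(recessive_hpt([0, 1, 0], [('A', 'C'), ('A', 'C'), ('G', 'G')]))  # ('h', 1.0)
```

Swapping the `homozygous` test for "at least one disease allele" would give a dominant variant. That is the kind of substitution sentence 58 refers to.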

59 After explaining in more detail the graphical model of interest, we discuss in this section the approximation algorithm that we have used to infer haplotypes, disease loci, and other disease statistics. [sent-146, score-0.878]

60 Note that our graphical model has more cycles than standard pedigree graphical models [19]; even if we assumed the sites to be independent and the pedigree to be acyclic, our graphical model would still be cyclic. [sent-149, score-1.267]

61 Our inference method is based on the following observation: if we kept only one subtype of factors in the Supplement, say only those connected to the recombination variables R, then inference could be done easily. [sent-150, score-0.359]

62 The main difficulty arises when attempting to resample D: because of the deterministic constraints that arise even in the simplest disease model, it is necessary to sample D in a block also containing a large subset of R and H. [sent-157, score-0.414]

63 Disease predictions were used to validate the HPT disease model. [sent-196, score-0.414]

64 The pedigree is built starting from the oldest generation. [sent-200, score-0.481]

65 Figure 2 (panel (a): Forest-Cover Factors): The pedigree was generated with the following parameters: 20 generations and n = 15, which resulted in a pedigree with 424 individuals, 197 marriage nodes, and 47 founders. [sent-231, score-1.019]

66 Panel (a) shows the effect of removing factors from the forest cover of the pedigree; each line is labeled with the number of factors that the corresponding experiment contains. [sent-234, score-0.627]

67 This panel shows that the recombination parameter can be off by an order of magnitude while the haplotype reconstruction remains robust. [sent-243, score-0.47]

68 Genotype data were simulated on the simulated pedigree graph. [sent-244, score-0.539]

69 The founder haplotypes were drawn from an empirical distribution (see Supplement for details). [sent-245, score-0.295]

70 The recombination parameters used for inheritance are given in the Supplement. [sent-246, score-0.342]

71 We then simulated the inheritance and recombination process to obtain the haplotypes of the descendants using the external program [27]. [sent-247, score-0.559]

72 For the disease prediction comparison, an independently chosen 50% of individuals have missing phenotypes. [sent-251, score-0.659]

73 For the haplotype reconstruction, the inference being scored is, for each individual, the maximum a posteriori haplotype predicted by the marginal haplotype distribution. [sent-253, score-0.653]
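Sentence 73 scores the per-site maximum a posteriori haplotype. Here is a minimal sketch, assuming the marginals arrive as one dictionary per site that maps each allele to its posterior probability (a hypothetical layout). Each site is decoded independently, so the result can be inconsistent across relatives, as sentence 74 notes.

```python
def map_haplotype(marginals):
    """Pick the most probable allele independently at each site.

    marginals: one dict per site mapping allele -> posterior probability.
    Ties go to whichever allele max() meets first in dict order.
    """
    return [max(dist, key=dist.get) for dist in marginals]


print(map_haplotype([{'A': 0.9, 'G': 0.1}, {'C': 0.4, 'T': 0.6}]))  # ['A', 'T']
```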

74 These haplotypes are not necessarily Mendelian consistent, meaning that it is possible for a child to have an allele on the maternal haplotype that could not possibly be inherited from the mother according to the mother’s marginal distribution. [sent-254, score-0.581]
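The inconsistency described in sentence 74 can be detected by comparing decoded haplotypes directly. This minimal sketch assumes per-site allele lists for the child's maternal haplotype and for both of the mother's haplotypes. It only flags violations and does not attempt the global repair referenced in [28].

```python
def mendelian_inconsistent_sites(child_maternal, mother_pat, mother_mat):
    """Sites where the child's maternal allele appears on neither maternal haplotype.

    All three arguments are per-site allele lists, e.g. haplotypes decoded
    site by site for each individual separately.
    """
    return [s for s, (c, m1, m2) in enumerate(zip(child_maternal, mother_pat, mother_mat))
            if c != m1 and c != m2]


print(mendelian_inconsistent_sites([0, 1, 1], [0, 0, 0], [0, 0, 1]))  # [1]
```

This per-site test ignores the pattern of recombinations between sites. Passing it is therefore necessary but not sufficient for a consistent inheritance path.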

75 However, transforming the posterior distribution over haplotypes into a set of globally consistent haplotypes is somewhat orthogonal to the methods in this paper, and there exist methods for this task [28]. [sent-255, score-0.376]

76 The goal of this comparison is threefold: 1) to see if adding more factors improves inference, 2) to see if more iterations of the measure factorization algorithm help, and 3) to see if there is robustness of the results to the recombination parameters. [sent-256, score-0.301]

77 Synthetic founder haplotypes were simulated, see Supplement for details. [sent-257, score-0.295]

78 Each experiment was replicated 10 times, and for each replicate the founder haplotypes were sampled with a different random seed. [sent-258, score-0.313]

79 We computed a metric φ which is a normalized count of the number of sites that differ between the held-out haplotype and the predicted haplotype. [sent-259, score-0.361]
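Sentence 79 describes φ as a normalized count of mismatched sites. Below is a minimal sketch, assuming the count is normalized by the number of sites compared. The paper's exact normalization is not reproduced here.

```python
def haplotype_error_rate(held_out, predicted):
    """Fraction of sites where the predicted haplotype differs from the held-out one."""
    if len(held_out) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    mismatches = sum(a != b for a, b in zip(held_out, predicted))
    return mismatches / len(held_out)


print(haplotype_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25
```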

80 For disease prediction, the inference being scored is the ranking of the sites given by our Bayesian method as compared with LOD estimates computed by Merlin [3]. [sent-265, score-0.604]

81 The disease models we consider are recessive f = (0. [sent-266, score-0.494]

82 The disease site is chosen uniformly at random from among the sites. [sent-273, score-0.837]

83 The goal of this comparison is to see whether our disease model performs at least as well as the LOD estimator used by Merlin. [sent-274, score-0.414]

84 Table 1: This table gives the performance of our method and Merlin for recessive and dominant diseases as measured by the disease prediction metric. [sent-317, score-0.494]

85 The sizes of the simulated pedigrees are given in the first three columns, the disease model in the next three columns, and the performance of our method and that of Merlin in the final four columns. [sent-318, score-0.657]

86 The founder haplotypes were taken from the phased haplotypes of the JPT+CHB HapMap [29] populations, see Supplement for details. [sent-322, score-0.483]

87 Each experiment was replicated 10 times, and for each replicate the founder haplotypes were sampled with a different random seed. [sent-323, score-0.313]

88 We computed a metric ψ which is roughly the rank of the disease site in the sorted list of predictions given by each method. [sent-324, score-0.682]
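Sentence 88 describes ψ as roughly the rank of the disease site. Here is a minimal sketch of the raw 1-based rank. It assumes higher scores mean stronger evidence and resolves ties in favor of the true site. The paper's exact ψ may normalize or break ties differently.

```python
def disease_site_rank(scores, true_site):
    """1-based rank of the true disease site when sites are sorted by score.

    scores[s] is a method's evidence for site s (for example, a posterior
    probability or a LOD score); higher means more likely. Ties with the true
    site do not push it down the list.
    """
    true_score = scores[true_site]
    return 1 + sum(1 for score in scores if score > true_score)


print(disease_site_rank([0.1, 0.6, 0.3], true_site=2))  # 2
```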

89 Between delineated rows of the table, we can compare the effect of pedigree size, and we observe that larger pedigrees aid in disease site prediction. [sent-329, score-1.404]

90 Indeed, the largest pedigree, with 1276 individuals, reaches an accuracy of 6e−4. [sent-330, score-0.608]

91 To our knowledge, this is the largest pedigree analyzed in the literature. [sent-331, score-0.962]

92 This paper introduces a new disease model and a new variational inference method, which are applied to find a Bayesian solution to the disease-site correlation problem. [sent-332, score-0.469]

93 This is in contrast to traditional linkage analysis where a likelihood ratio statistic is computed to find the position of the disease site relative to a map of existing sites. [sent-333, score-0.803]

94 Instead, our approach uses a Haplotype-Phenotype Transducer to obtain the posterior probability that each site is the disease site. [sent-334, score-0.682]

95 Particularly with sequencing data, it is likely that either the disease site or a nearby site will be observed. [sent-336, score-0.983]

96 Our method performs well in practice both for genotype prediction and for disease site prediction. [sent-337, score-0.764]

97 In the presence of missing data, where for some individuals the whole genome is missing, our method is able to infer the missing genotypes with high accuracy. [sent-338, score-0.297]

98 Compared with the LOD linkage analysis method, our method was better able to predict the disease site when a single observed site was responsible for the disease. [sent-339, score-1.071]

99 A general model for the analysis of pedigree data. [sent-426, score-0.481]

100 On the complexity of fundamental computational problems in pedigree analysis. [sent-454, score-0.481]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pedigree', 0.481), ('disease', 0.414), ('site', 0.268), ('recombination', 0.236), ('pedigrees', 0.214), ('haplotype', 0.206), ('haplotypes', 0.188), ('transducer', 0.188), ('hpt', 0.165), ('sites', 0.155), ('lod', 0.13), ('individuals', 0.127), ('linkage', 0.121), ('allele', 0.119), ('genetic', 0.109), ('founder', 0.107), ('merlin', 0.107), ('inheritance', 0.106), ('phenotypes', 0.087), ('genome', 0.086), ('genotype', 0.082), ('recessive', 0.08), ('mother', 0.068), ('recomb', 0.067), ('forest', 0.061), ('mendelian', 0.053), ('penetrance', 0.053), ('zdm', 0.053), ('dm', 0.053), ('supplement', 0.052), ('father', 0.051), ('graphical', 0.05), ('alleles', 0.047), ('chromosome', 0.047), ('chromosomes', 0.047), ('hi', 0.044), ('affection', 0.04), ('whpt', 0.04), ('zhpt', 0.04), ('gd', 0.037), ('ds', 0.035), ('hapmap', 0.035), ('phenotype', 0.035), ('sturtevant', 0.035), ('inference', 0.035), ('factors', 0.034), ('generations', 0.033), ('sequencing', 0.033), ('missing', 0.031), ('factorization', 0.031), ('genetics', 0.031), ('aa', 0.03), ('technologies', 0.03), ('acyclic', 0.03), ('simulated', 0.029), ('genomic', 0.028), ('panel', 0.028), ('oe', 0.027), ('breakpoints', 0.027), ('delineated', 0.027), ('founders', 0.027), ('multilocus', 0.027), ('oval', 0.027), ('wdm', 0.027), ('wrecomb', 0.027), ('zrecomb', 0.027), ('graph', 0.025), ('individual', 0.024), ('albers', 0.024), ('collage', 0.024), ('kirkpatrick', 0.024), ('loci', 0.024), ('marriage', 0.024), ('reparameterization', 0.024), ('tractable', 0.023), ('formalism', 0.022), ('parents', 0.022), ('automaton', 0.022), ('genotypes', 0.022), ('panels', 0.022), ('paths', 0.021), ('valid', 0.021), ('variational', 0.02), ('index', 0.02), ('fk', 0.019), ('generation', 0.019), ('factor', 0.019), ('nodes', 0.019), ('connected', 0.019), ('analogues', 0.019), ('potentials', 0.018), ('incorrect', 0.018), ('came', 0.018), ('replicate', 0.018), ('male', 0.018), ('parent', 0.017), ('loops', 0.017), ('correlates', 0.017), ('ak', 0.017), ('cover', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

2 0.12959923 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

Author: Jonathan Huang, Daniel Alexander

Abstract: Accurate and detailed models of neurodegenerative disease progression are crucially important for reliable early diagnosis and the determination of effective treatments. We introduce the ALPACA (Alzheimer’s disease Probabilistic Cascades) model, a generative model linking latent Alzheimer’s progression dynamics to observable biomarker data. In contrast with previous works, which model disease progression as a fixed event ordering, we explicitly model the variability over such orderings among patients, which is more realistic, particularly for highly detailed progression models. We describe efficient learning algorithms for ALPACA and discuss promising experimental results on a real cohort of Alzheimer’s patients from the Alzheimer’s Disease Neuroimaging Initiative.

3 0.10886425 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

Author: Lloyd Elliott, Yee W. Teh

Abstract: We present a Bayesian nonparametric model for genetic sequence data in which a set of genetic sequences is modelled using a Markov model of partitions. The partitions at consecutive locations in the genome are related by the splitting and merging of their clusters. Our model can be thought of as a discrete analogue of the continuous fragmentation-coagulation process [Teh et al 2011], preserving the important properties of projectivity, exchangeability and reversibility, while being more scalable. We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining accuracies comparable to other state-of-the-art genotype imputation methods.

4 0.076606378 102 nips-2012-Distributed Non-Stochastic Experts

Author: Varun Kanade, Zhenming Liu, Bozidar Radunovic

Abstract: We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. At each time-step t, one of the k site nodes has to pick an expert from the set {1, ..., n}, and the same site receives information about payoffs of all experts for that round. The goal of the distributed system is to minimize regret at time horizon T, while simultaneously keeping communication to a minimum. The two extreme solutions to this problem are: (i) Full communication: This essentially simulates the non-distributed setting to obtain the optimal $O(\sqrt{\log(n) T})$ regret bound at the cost of T communication. (ii) No communication: Each site runs an independent copy; the regret is $O(\sqrt{\log(n) k T})$ and the communication is 0. This paper shows the difficulty of simultaneously achieving regret asymptotically better than $\sqrt{kT}$ and communication better than T. We give a novel algorithm that for an oblivious adversary achieves a non-trivial trade-off: regret $O(\sqrt{k^{5(1+\epsilon)/6} T})$ and communication $O(T/k^{\epsilon})$, for any value of $\epsilon \in (0, 1/5)$. We also consider a variant of the model, where the coordinator picks the expert. In this model, we show that the label-efficient forecaster of Cesa-Bianchi et al. (2005) already gives us a strategy that is near optimal in the regret vs. communication trade-off.

5 0.074488737 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which cannot be described by a simple parametric model. Therefore, we propose a kernel-based method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities.

6 0.057718329 37 nips-2012-Affine Independent Variational Inference

7 0.05669478 147 nips-2012-Graphical Models via Generalized Linear Models

8 0.049173005 151 nips-2012-High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

9 0.046838783 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

10 0.046348382 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

11 0.046202201 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

12 0.039803214 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

13 0.039013471 305 nips-2012-Selective Labeling via Error Bound Minimization

14 0.037066597 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

15 0.03585406 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

16 0.035515461 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

17 0.035473809 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

18 0.035008516 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

19 0.034591459 180 nips-2012-Learning Mixtures of Tree Graphical Models

20 0.033595674 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.098), (1, 0.024), (2, 0.003), (3, -0.005), (4, -0.055), (5, -0.015), (6, -0.006), (7, -0.03), (8, -0.078), (9, 0.039), (10, -0.02), (11, -0.013), (12, 0.004), (13, -0.017), (14, -0.041), (15, -0.029), (16, -0.0), (17, 0.012), (18, 0.012), (19, -0.048), (20, -0.004), (21, 0.025), (22, -0.089), (23, -0.068), (24, 0.047), (25, -0.048), (26, 0.022), (27, 0.035), (28, -0.064), (29, 0.013), (30, -0.012), (31, -0.08), (32, -0.0), (33, 0.016), (34, -0.029), (35, -0.03), (36, 0.07), (37, -0.031), (38, -0.047), (39, -0.024), (40, 0.002), (41, 0.098), (42, 0.01), (43, 0.05), (44, 0.149), (45, -0.016), (46, -0.083), (47, -0.066), (48, -0.059), (49, -0.039)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8799603 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

2 0.72410333 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

Author: Jonathan Huang, Daniel Alexander

Abstract: Accurate and detailed models of neurodegenerative disease progression are crucially important for reliable early diagnosis and the determination of effective treatments. We introduce the ALPACA (Alzheimer’s disease Probabilistic Cascades) model, a generative model linking latent Alzheimer’s progression dynamics to observable biomarker data. In contrast with previous works which model disease progression as a fixed event ordering, we explicitly model the variability over such orderings among patients which is more realistic, particularly for highly detailed progression models. We describe efficient learning algorithms for ALPACA and discuss promising experimental results on a real cohort of Alzheimer’s patients from the Alzheimer’s Disease Neuroimaging Initiative.

3 0.5238018 151 nips-2012-High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

Author: Hua Wang, Feiping Nie, Heng Huang, Jingwen Yan, Sungeun Kim, Shannon Risacher, Andrew Saykin, Li Shen

Abstract: Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by progressive impairment of memory and other cognitive functions. Regression analysis has been studied to relate neuroimaging measures to cognitive status. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored but important topic in AD research. We propose a novel high-order multi-task learning model to address this issue. The proposed model explores the temporal correlations existing in imaging and cognitive data by structured sparsity-inducing norms. The sparsity of the model enables the selection of a small number of imaging measures while maintaining high prediction accuracy. The empirical studies, using the longitudinal imaging and cognitive data of the ADNI cohort, have yielded promising results.

4 0.50034773 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which cannot be described by a simple parametric model. Therefore, we propose a kernel-based method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities.

5 0.50018412 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

Author: Won H. Kim, Deepti Pachauri, Charles Hatt, Moo K. Chung, Sterling Johnson, Vikas Singh

Abstract: Hypothesis testing on signals defined on surfaces (such as the cortical surface) is a fundamental component of a variety of studies in Neuroscience. The goal here is to identify regions that exhibit changes as a function of the clinical condition under study. As the clinical questions of interest move towards identifying very early signs of diseases, the corresponding statistical differences at the group level invariably become weaker and increasingly hard to identify. Indeed, after a multiple comparisons correction is adopted (to account for correlated statistical tests over all surface points), very few regions may survive. In contrast to hypothesis tests on point-wise measurements, in this paper, we make the case for performing statistical analysis on multi-scale shape descriptors that characterize the local topological context of the signal around each surface vertex. Our descriptors are based on recent results from harmonic analysis, that show how wavelet theory extends to non-Euclidean settings (i.e., irregular weighted graphs). We provide strong evidence that these descriptors successfully pick up group-wise differences, where traditional methods either fail or yield unsatisfactory results. Other than this primary application, we show how the framework allows performing cortical surface smoothing in the native space without mapping to a unit sphere.

6 0.44189355 46 nips-2012-Assessing Blinding in Clinical Trials

7 0.42500865 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

8 0.41892526 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

9 0.41168556 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

10 0.3938424 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

11 0.38344786 215 nips-2012-Minimizing Uncertainty in Pipelines

12 0.35673469 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

13 0.340137 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

14 0.33583373 22 nips-2012-A latent factor model for highly multi-relational data

15 0.33231986 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

16 0.32710704 206 nips-2012-Majorization for CRFs and Latent Likelihoods

17 0.32444695 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

18 0.32314584 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

19 0.31743324 234 nips-2012-Multiresolution analysis on the symmetric group

20 0.31551033 147 nips-2012-Graphical Models via Generalized Linear Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.035), (9, 0.027), (21, 0.023), (38, 0.089), (39, 0.017), (42, 0.017), (54, 0.017), (55, 0.019), (74, 0.052), (76, 0.089), (79, 0.413), (80, 0.077), (92, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.72189951 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

2 0.63303494 128 nips-2012-Fast Resampling Weighted v-Statistics

Author: Chunxiao Zhou, Jiseong Park, Yun Fu

Abstract: In this paper, a novel and computationally fast algorithm for computing weighted v-statistics in resampling both univariate and multivariate data is proposed. To avoid any real resampling, we have linked this problem with finite group action and converted it into a problem of orbit enumeration. For further computational cost reduction, an efficient method is developed to list all orbits by their symmetry orders and calculate all index function orbit sums and data function orbit sums recursively. The computational complexity analysis shows reduction in the computational cost from the $n!$ or $n^n$ level to a low-order polynomial level.

3 0.44572949 16 nips-2012-A Polynomial-time Form of Robust Regression

Author: Ozlem Aslan, Dale Schuurmans, Yao-liang Yu

Abstract: Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods.

4 0.38318375 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

Author: Lloyd Elliott, Yee W. Teh

Abstract: We present a Bayesian nonparametric model for genetic sequence data in which a set of genetic sequences is modelled using a Markov model of partitions. The partitions at consecutive locations in the genome are related by the splitting and merging of their clusters. Our model can be thought of as a discrete analogue of the continuous fragmentation-coagulation process [Teh et al. 2011], preserving the important properties of projectivity, exchangeability and reversibility, while being more scalable. We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining accuracies comparable to other state-of-the-art genotype imputation methods.

5 0.36628515 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples $n$ scales as $n = \Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2} \log p)$, where $\theta_{\min}$ is the minimum edge potential, $\delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $\eta$ is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs.

6 0.36623329 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

7 0.36618418 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

8 0.36429295 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

9 0.3641195 168 nips-2012-Kernel Latent SVM for Visual Recognition

10 0.36404076 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

11 0.36395562 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

12 0.36240244 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

13 0.36230177 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

14 0.36192289 197 nips-2012-Learning with Recursive Perceptual Representations

15 0.36032292 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

16 0.35988525 65 nips-2012-Cardinality Restricted Boltzmann Machines

17 0.35978824 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

18 0.35898641 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

19 0.35888633 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

20 0.35879242 200 nips-2012-Local Supervised Learning through Space Partitioning