
90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space


Source: pdf

Author: Kyung-ah Sohn, Eric P. Xing

Abstract: We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among a possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a 1st-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested Pólya urn scheme. On both simulated and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; in addition, it also infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among a possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. [sent-6, score-0.325]

2 The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a 1st-order Markov transition process. [sent-7, score-1.536]

3 Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. [sent-8, score-1.181]

4 1 Introduction Recombinations between ancestral chromosomes during meiosis play a key role in shaping the patterns of linkage disequilibrium (LD)—the non-random association of alleles at different loci—in a population. [sent-11, score-0.331]

5 The deluge of SNP data also fuels the long-standing interest in analyzing patterns of genetic variations to reconstruct the evolutionary history and ancestral structures of human populations, using, for example, variants of admixture models on genetic polymorphisms [Rosenberg et al. [sent-17, score-0.521]

6 In this paper, we leverage this approach and present a unified framework to model the complex genetic inheritance process that allows recombinations among possibly infinite founding alleles and coalescence-with-mutation events in the resulting genealogies. [sent-24, score-0.507]

7 Figure 1: An illustration of a hidden Markov Dirichlet process for haplotype recombination and inheritance. [sent-29, score-0.873]

8 We assume that individual chromosomes in a modern population originate from an unknown number of ancestral haplotypes via biased random recombinations and mutations (Fig 1). [sent-30, score-0.709]

9 The recombinations between the ancestors follow a state-transition process we refer to as the hidden Markov Dirichlet process (originating from the infinite HMM by Beal et al. [sent-31, score-0.328]

10 [2001]), which travels in an open ancestor space, with nonstationary recombination rates depending on the genetic distances between SNP loci. [sent-32, score-0.957]

11 Our model draws inspiration from the HMM proposed in [Greenspan and Geiger, 2003], but we employ a two-level Pólya urn scheme akin to the hierarchical DP [Teh et al. [sent-33, score-0.377]

12 , 2004] to accommodate an open ancestor space, and allow full posterior inference of the recombination sites, mutation rates, haplotype origin, ancestor patterns, etc. [sent-34, score-1.743]

13 2 Hidden Markov Dirichlet Process for Recombination Sequentially choosing recombination targets from a set of ancestral chromosomes can be modeled as a hidden Markov process [Niu et al. [sent-37, score-0.741]

14 In this section, we describe such an infinite HMM formalism, which we would like to call the hidden Markov Dirichlet process, for modeling recombination in an open ancestral space. [sent-42, score-0.638]

15 A haplotype refers to the joint allele configuration of a contiguous list of SNPs located on a single chromosome (Fig 1). [sent-46, score-0.513]

16 Under a well-known genetic model known as coalescence-with-mutation (but without recombination), one can treat a haplotype from a modern individual as a descendant of an unknown ancestor haplotype (i. [sent-47, score-1.462]

17 It can be shown that such a coalescent process in an infinite population leads to a partition of the population that can be succinctly captured by the following Pólya urn scheme. [sent-50, score-0.517]

18 At each step we either draw a ball from the urn and replace it with two balls of the same color, or we are given a ball of a new color which we place in the urn. [sent-52, score-0.433]
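
To make the urn dynamics concrete, the following is a minimal Python sketch of the single Pólya urn just described; the concentration parameter gamma and the seed are illustrative choices, not values from the paper.

```python
import random

def polya_urn(num_draws, gamma, seed=0):
    """Simulate the Polya urn above: draw an existing color with probability
    proportional to its ball count, or receive a ball of a brand-new color
    with probability gamma / (n + gamma)."""
    rng = random.Random(seed)
    counts = []   # counts[k] = number of balls of color k; n = sum(counts)
    draws = []
    for _ in range(num_draws):
        n = sum(counts)
        if rng.random() < gamma / (n + gamma):
            counts.append(0)          # a ball of a new color enters the urn
            color = len(counts) - 1
        else:
            color = rng.choices(range(len(counts)), weights=counts)[0]
        counts[color] += 1            # "replace it with two balls"
        draws.append(color)
    return draws, counts
```

The resulting partition of draws by color is exchangeable, which is what allows the coalescent partition to be captured this way.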

19 Let Hi = [Hi,1 , . . . , Hi,T ] denote a haplotype over T SNPs from chromosome i; let Ak = [Ak,1 , . . . , Ak,T ] denote an ancestor haplotype (indexed by k). [sent-62, score-0.513]

20 Let θk denote the mutation rate of ancestor k, and let Ci denote an inheritance variable that specifies the ancestor of haplotype Hi . [sent-65, score-2.342]

21 Following Xing et al. [2006], under a DP mixture, we have the following Pólya urn scheme (footnote 1: we ignore the parental origin index of the haplotype as used in Xing et al.). [sent-67, score-0.791]

22 – Sample the founder of haplotype i (indexed by ci ): ci | DP(τ, Q0 ) selects an existing founder {acj , θcj } if ci = cj for some j < i, or draws a new founder from the base measure Q0 otherwise (i.e., ci refers to a new founder). [sent-80, score-0.633]

23 – Sample the haplotype according to its founder: hi | ci ∼ Ph (·|aci , θci ). [sent-84, score-0.610]
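
As a hedged end-to-end sketch of this generative process (assuming binary SNP alleles and one shared copying probability theta, where the actual model makes this ancestor-specific):

```python
import random

def sample_population(N, T, tau, theta, seed=0):
    """Recombination-free DP-mixture sketch: each haplotype picks a founder
    c_i via the Polya urn, then copies the founder's alleles, keeping each
    allele with probability theta (cf. Eq. (2) below)."""
    rng = random.Random(seed)
    founders, counts, haplotypes = [], [], []
    for i in range(N):
        if rng.random() < tau / (i + tau):          # c_i is a new founder
            founders.append([rng.randint(0, 1) for _ in range(T)])
            counts.append(0)
            c = len(founders) - 1
        else:                                       # c_i = c_j for some j < i
            c = rng.choices(range(len(founders)), weights=counts)[0]
        counts[c] += 1
        # h_i | c_i ~ P_h(. | a_{c_i}, theta)
        haplotypes.append([a if rng.random() < theta else 1 - a
                           for a in founders[c]])
    return haplotypes, founders
```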

24 Notice that the above generative process assumes each modern haplotype originates from a single ancestor; this is only plausible for haplotypes spanning a short region of a chromosome. [sent-85, score-0.700]

25 Now we consider long haplotypes possibly bearing multiple ancestors due to recombinations between an unknown number of founders. [sent-86, score-0.386]

26 We set up a single “stock” urn at the top level, which contains balls of colors that are represented by at least one ball in one or multiple urns at the bottom level. [sent-99, score-0.404]

27 Now let's suppose that at time t the stock urn contains n balls of K distinct colors indexed by an integer set C = {1, 2, . . . , K}. [sent-103, score-0.364]

28 The number of balls of color k in this urn is denoted by nk , k ∈ C. [sent-106, score-0.364]

29 For the bottom-level urns u1 , . . . , uK , let mj,k denote the number of balls of color k in urn uj , and mj = Σk∈C mj,k denote the total number of balls in urn uj . [sent-110, score-0.696]

30 Then at time t, we either draw a ball randomly from urn uk and place back two balls of that color; or, with probability τ /(mk + τ ), we turn to the top level. [sent-112, score-0.345]

31 From the stock urn, we can either draw a ball randomly and put back two balls of that color to the stock urn and one to uk ; or, with probability γ/(n + γ), obtain a ball of a new color K + 1 and put back a ball of this color to both the stock urn and urn uk of the lower level. [sent-113, score-1.232]
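
A sketch of one draw under this two-level nested scheme; the dictionary-based urns and the parameter names gamma (top level) and tau (bottom level) are illustrative:

```python
import random

def nested_urn_draw(k, stock, hmm_urns, gamma, tau, rng):
    """One draw for ancestor k.  `stock` maps color -> count in the
    top-level urn; `hmm_urns[k]` maps color -> count in bottom urn u_k."""
    urn = hmm_urns.setdefault(k, {})
    m_k = sum(urn.values())
    if rng.random() >= tau / (m_k + tau):
        # stay at the bottom level: draw from u_k and put back two balls
        color = rng.choices(list(urn), weights=list(urn.values()))[0]
    else:
        # escalate to the stock urn
        n = sum(stock.values())
        if rng.random() < gamma / (n + gamma):
            color = len(stock)                 # brand-new color K + 1
        else:
            color = rng.choices(list(stock), weights=list(stock.values()))[0]
        stock[color] = stock.get(color, 0) + 1     # one ball to the stock urn
    urn[color] = urn.get(color, 0) + 1             # and one to u_k
    return color
```

Because every bottom urn escalates to the same stock urn, the ancestors share one global set of colors — exactly the coupling the hierarchical DP provides.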

32 3 HMDP Model for Recombination and Inheritance Now we describe a stochastic model, based on an HMDP, for generating individual haplotypes in a modern population from a hypothetical pool of ancestral haplotypes via recombination and mutations (i. [sent-126, score-1.134]

33 Let Ci = [Ci,1 , . . . , Ci,T ] denote the sequence of inheritance variables specifying the index of the ancestral chromosome at each SNP locus. [sent-132, score-0.437]

34 When no recombination takes place during the inheritance process that produces haplotype Hi (say, from ancestor k), then Ci,t = k, ∀t. [sent-133, score-1.423]

35 When a recombination occurs, say, between loci t and t + 1, we have Ci,t ≠ Ci,t+1 . [sent-134, score-0.517]

36 Under the HMDP, a source state (e.g., ancestor chromosome k) pairs with a target state between consecutive loci. [sent-138, score-0.500]

37 That is, the chain transitions to ancestor chromosome k′ between loci t and t+1 with probability (1 − e−dr )πkk′ , where d is the genetic distance between the loci and r is the recombination rate. [sent-140, score-0.628]

38 Hence, each haplotype Hi is a mosaic of segments of multiple ancestral chromosomes from the ancestral pool {Ak,· }∞ . [sent-141, score-0.894]
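
The transition process can be sketched as follows for a finite truncation of the ancestor pool; the transition matrix pi, recombination rate r, and inter-locus genetic distances are placeholders for quantities the HMDP treats nonparametrically:

```python
import math
import random

def sample_inheritance(pi, init_probs, distances, r, seed=0):
    """Sample C_i = [C_{i,1}, ..., C_{i,T}]: between loci t and t+1 a
    recombination fires with probability 1 - exp(-d * r); when it fires,
    the next ancestor is drawn from pi[current] (possibly re-selecting
    the current ancestor)."""
    rng = random.Random(seed)
    K = len(init_probs)
    c = [rng.choices(range(K), weights=init_probs)[0]]
    for d in distances:                    # len(distances) == T - 1
        if rng.random() < 1.0 - math.exp(-d * r):
            c.append(rng.choices(range(K), weights=pi[c[-1]])[0])
        else:
            c.append(c[-1])                # no recombination: same ancestor
    return c
```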

39 The emission process of the HMDP corresponds to an inheritance model from an ancestor to the matching descendant. [sent-144, score-0.602]

40 Following Xing et al. [2004]: p(ht |at , θ) = θ^I(ht =at ) · [(1 − θ)/(|B| − 1)]^I(ht ≠at ) , (2) where ht and at denote the alleles at locus t of an individual haplotype and its corresponding ancestor, respectively; θ indicates the ancestor-specific mutation rate; and |B| denotes the number of possible alleles. [sent-146, score-0.682]
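
Eq. (2), as reconstructed above, is a one-liner; note that under this parameterization θ is the probability of copying the ancestral allele:

```python
def emission_prob(h_t, a_t, theta, num_alleles=2):
    """p(h_t | a_t, theta) from Eq. (2): copy the ancestral allele a_t with
    probability theta; otherwise take each of the other |B| - 1 alleles
    with probability (1 - theta) / (|B| - 1)."""
    if h_t == a_t:
        return theta
    return (1.0 - theta) / (num_alleles - 1)
```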

41 Following Xing et al. [2004], assuming the mutation rate θ admits a Beta prior, the marginal conditional likelihood of a haplotype given its matching ancestor can be computed by integrating out θ under Bayes' rule. [sent-150, score-0.911]
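
As a sketch of that integration (the Beta hyperparameters α_h and β_h are assumptions, not values from the paper): if l₁ loci match the ancestor and l₂ mismatch, the standard Beta integral gives

```latex
p(h \mid a)
  = \int_0^1 \theta^{\,l_1}\Big(\frac{1-\theta}{|B|-1}\Big)^{l_2}
    \mathrm{Beta}(\theta \mid \alpha_h, \beta_h)\, d\theta
  = \frac{B(\alpha_h + l_1,\ \beta_h + l_2)}{B(\alpha_h, \beta_h)}
    \Big(\frac{1}{|B|-1}\Big)^{l_2},
```

where B(·, ·) is the Beta function.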

42 The variables of interest include {Ci,t }, the inheritance variables specifying the origins of SNP alleles of all loci on each haplotype; and {Ak,t }, the founding alleles at all loci of each ancestral haplotype. [sent-152, score-0.807]

43 First it samples the inheritance variables {ci,t }, conditioning on all given individual haplotypes h = {h1 , . . . , h2N }. [sent-154, score-0.372]

44 It also conditions on the most recently sampled configuration of the ancestor pool a = {a1 , . . . , aK }. [sent-157, score-0.419]

45 Then, given h and the current values of the ci,t 's, it samples every ancestor ak . [sent-160, score-0.451]

46 (For simplicity we omit the haplotype index i here and in the forthcoming expositions when it is clear from context that the statements or formulas apply to all individual haplotypes.) [sent-166, score-0.457]

47 Let m denote the counts of the HMM urns (i.e., the urns corresponding to the recombination choices by each ancestor) — {mk : ∀k} ∪ {mk,k′ : ∀k, k′ }. [sent-172, score-0.448]

48 And let lk denote the sufficient statistics associated with all haplotype instances originating from ancestor k. [sent-173, score-0.882]

49 Note that, naively, the sampling space of an inheritance block of length δ is |A|^δ , where |A| represents the cardinality of the ancestor pool. [sent-183, score-0.623]

50 However, if we assume that the recombination rate is low and block length is not too big, then the probability of having two or more recombination events within a δ-block is very small and thus can be ignored. [sent-184, score-0.842]

51 This reduces the sampling space to |A| × δ candidates, i.e., |A| possible recombination targets times δ possible recombination locations. [sent-187, score-0.778]
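
A sketch of the resulting candidate set for one δ-block (assuming, per the approximation above, at most one recombination inside the block):

```python
def block_configurations(current, ancestors, delta):
    """All inheritance configurations of a delta-block: |A| recombination
    targets times delta breakpoint positions.  Choosing a == current
    reproduces the no-recombination configuration."""
    configs = []
    for a in ancestors:                 # |A| possible targets
        for pos in range(delta):        # delta possible breakpoints
            configs.append([current] * pos + [a] * (delta - pos))
    return configs
```

The sampler then scores each candidate by its transition and emission probabilities rather than enumerating all |A|^δ configurations.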

52 , transition to a new ancestor, where πk represents the transition probability vector for ancestor k under HMDP, as defined in Eq. [sent-197, score-0.493]

53 Note that when a new ancestor aK+1 is instantiated, we need to immediately instantiate a new DP under F to model the transition probabilities from this ancestor to all instantiated ancestors (including itself). [sent-199, score-1.034]

54 For example, at the distal border of the δ-block, since ct+δ+1 always indexes a previously inherited ancestor (and therefore must be present in the stock urn), we have: p(ct+δ+1 |ct+δ = K + 1, m, n) = λ × nct+δ+1 /n. (5) [sent-205, score-0.439]

55 (2), by integrating out the mutation rate θ under a Beta prior (and also the ancestor a under a uniform prior if ct refers to an ancestor to be newly instantiated) [Xing et al. [sent-211, score-1.048]

56 In particular, if an ancestor gets no occupancy in either the stock or the HMM urns afterwards, we remove it from our repository. [sent-216, score-0.581]

57 When a new ancestor is to be drawn, we update n = n + 1, set nK+1 = 1, mct = mct + 1, mct ,K+1 = 1, and set up a new (empty) HMM urn with color K + 1. [sent-220, score-0.764]
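
The bookkeeping just described might look like this in code (the dictionary-based counters are illustrative, not the paper's data structures):

```python
def instantiate_new_ancestor(n_counts, hmm_urns, c_t):
    """Updates when step t draws a brand-new ancestor K+1: one ball of the
    new color in the stock urn (n_{K+1} = 1), one transition recorded out
    of the current ancestor c_t (m_{c_t,K+1} = 1), and a fresh empty HMM
    urn for the new color."""
    new_color = len(n_counts)                    # color index K + 1
    n_counts[new_color] = 1                      # stock urn gains the color
    urn = hmm_urns.setdefault(c_t, {})
    urn[new_color] = urn.get(new_color, 0) + 1   # m_{c_t} and m_{c_t,K+1}
    hmm_urns[new_color] = {}                     # new (empty) HMM urn
    return new_color
```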

58 To instantiate a complete ancestor, after sampling the alleles in the ancestor corresponding to the segment according to Eq. [sent-227, score-0.504]

59 4 Experiments We applied the HMDP model to both simulated and real haplotype data. [sent-230, score-0.473]

60 Figure 2: Analysis of simulated haplotype populations. [sent-237, score-0.473]

61 (a) A comparison of ancestor reconstruction errors for the five ancestors (indexed along x-axis). [sent-238, score-0.571]

62 (b) A plot of the empirical recombination rates along 100 SNP loci in one of the 30 populations. [sent-240, score-0.517]

63 (c) The true (panel 1) and estimated (panel 2 for HMDP, and panels 3-5 for the 3 HMMs) population maps of ancestral compositions in a simulated population. [sent-242, score-0.382]

64 1 Analyzing simulated haplotype population To simulate a population of individual haplotypes, we started with a fixed number, Ks (unknown to the HMDP model), of randomly generated ancestor haplotypes, on each of which a set of recombination hotspots is (randomly) pre-specified. [sent-245, score-1.581]

65 At the hotspots, we defined the recombination rate to be 0. [sent-247, score-0.389]

66 , 200 haplotypes) with 100 SNPs were generated from Ks = 5 ancestor haplotypes. [sent-253, score-0.419]

67 From samples of ancestor states {ak,t }, we reconstructed the ancestral haplotypes under the HMDP model. [sent-258, score-0.839]

68 We define the ancestor reconstruction error for each ancestor to be the ratio of incorrectly recovered loci over all the chromosomal sites. [sent-260, score-1.015]

69 Specifically, the average (over all simulated populations) fraction of SNP loci originating from each ancestor among all loci in the population is 0. [sent-269, score-0.842]

70 As one would expect, the higher an ancestor's population frequency, the better its reconstruction accuracy. [sent-275, score-0.534]

71 Interestingly, under the fixed-dimensional HMM, even when we use the correct number of ancestor states, i. [sent-276, score-0.419]

72 We conjecture that this is because the non-parametric Bayesian treatment of the transition rates and ancestor configurations under the HMDP model leads to a desirable adaptive smoothing effect and imposes fewer constraints on the model parameters, which allows them to be more accurately estimated. [sent-280, score-0.456]

73 (a) A plot of λe estimated via HMDP; and the haplotype block boundaries according to HMDP (black solid line), HMM [Daly et al. [sent-287, score-0.529]

74 (b) IT scores for haplotype blocks from each method. [sent-289, score-0.432]

75 LD-block Analysis From samples of the inheritance variables {ci,t } under HMDP, we can infer the recombination status of each locus of each haplotype. [sent-290, score-0.604]

76 We define the empirical recombination rates λe at each locus to be the ratio of individuals who had recombinations at that locus over the total number of haploids in the population. [sent-291, score-0.588]
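
Computing λe from posterior samples of the inheritance variables is direct; a minimal sketch:

```python
def empirical_recombination_rates(inheritance):
    """lambda_e between loci t and t+1: the fraction of haploids whose
    sampled ancestor index changes there.  `inheritance` holds one
    sequence {c_{i,t}} per haploid, all of length T."""
    num_haploids = len(inheritance)
    T = len(inheritance[0])
    return [sum(c[t] != c[t + 1] for c in inheritance) / num_haploids
            for t in range(T - 1)]
```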

77 We can identify the recombination hotspots directly from such a plot based on an empirical threshold λt (i. [sent-293, score-0.474]

78 For comparison, we also give the true recombination hotspots (depicted as dotted vertical lines) chosen in the ancestors for simulating the recombinant population. [sent-297, score-0.606]

79 Population Structural Analysis Finally, from samples of the inheritance variables {ci,t }, we can also uncover the genetic origins of all loci of each individual haplotype in a population. [sent-301, score-0.871]

80 For each individual, we define an empirical ancestor composition vector ηe , which records the fractions of every ancestor in all the ci,t 's of that individual. [sent-302, score-0.858]
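
A sketch of ηe for one individual:

```python
from collections import Counter

def ancestor_composition(c_i):
    """Empirical composition vector eta_e: the fraction of loci whose
    sampled inheritance variable c_{i,t} points to each ancestor."""
    counts = Counter(c_i)
    return {k: v / len(c_i) for k, v in counts.items()}
```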

81 In the population map, each individual is represented by a thin vertical line which is partitioned into colored segments in proportion to the ancestral fraction recorded by ηe . [sent-304, score-0.336]

82 Five population maps, corresponding to (1) true ancestor compositions, (2) ancestor compositions inferred by HMDP, and (3-5) ancestor compositions inferred by HMMs with 3, 5, 10 states, respectively, are shown in Fig 2c. [sent-305, score-1.494]

83 To assess the accuracy of our estimation, we calculated the distance between the true and estimated ancestor compositions as the mean squared distance between the true and estimated ηe over all individuals in a population, averaged over all 30 simulated populations. [sent-306, score-0.534]

84 2 Analyzing two real haplotype datasets We applied HMDP to two real haplotype datasets, the single-population Daly data [Daly et al. [sent-316, score-0.918]

85 We first analyzed the 256 individuals from the Daly data. We compared the recovered recombination hotspots with those reported in Daly et al. [sent-320, score-0.551]

86 Fig. 3a shows the plot of empirical recombination rates estimated under HMDP, side-by-side with the reported recombination hotspots. [sent-323, score-0.778]

87 There is no ground truth to judge which one is correct; hence we computed information-theoretic (IT) scores based on the estimated within-block haplotype frequencies and the between-block transition probabilities under each model for a comparison. [sent-324, score-0.469]

88 The left panel of Fig 3b shows the total pairwise mutual information between adjacent haplotype blocks segmented by the recombination hotspots uncovered by the three methods. [sent-325, score-0.906]

89 Note that the Daly and the MDL methods allow the number of haplotype founders to vary across blocks to get the most compact local ancestor constructions. [sent-330, score-0.91]

90 Thus their reported scores might be an underestimate of the true global score because certain segments of an ancestor haplotype that are not or rarely inherited are not counted in the score. [sent-331, score-0.892]

91 Thus the low IT scores achieved by HMDP suggest that HMDP can effectively avoid inferring spurious global and local ancestor patterns. [sent-332, score-0.419]

92 This is confirmed by the population map in Fig 4a: HMDP recovered 6 ancestors, of which the 3 dominant ones account for 98% of all the modern haplotypes in the population. [sent-333, score-0.574]

93 We found that the two dominant haplotypes covered over 85% of the CEPH population (and the overall breakdown among all four ancestors is 0. [sent-341, score-0.413]

94 On the other hand, the frequencies of each ancestor in the YRI population are 0. [sent-346, score-0.514]

95 Due to space limitations, we omit the recombination map of this dataset. [sent-353, score-0.389]

96 While our current model employs only phased haplotype data, it is straightforward to generalize it to unphased genotype data as provided by the HapMap project. [sent-360, score-0.482]

97 Finding haplotype block boundaries by using the minimum-description-length principle. [sent-367, score-0.475]

98 Bayesian haplotype inference for multiple linked single nucleotide polymorphisms. [sent-422, score-0.432]

99 Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. [sent-431, score-0.513]

100 Bayesian multi-population haplotype inference via a hierarchical dirichlet process mixture. [sent-488, score-0.542]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('haplotype', 0.432), ('ancestor', 0.419), ('recombination', 0.389), ('hmdp', 0.271), ('urn', 0.237), ('ancestral', 0.195), ('haplotypes', 0.186), ('inheritance', 0.161), ('ancestors', 0.132), ('loci', 0.128), ('genetic', 0.125), ('hmm', 0.105), ('daly', 0.102), ('ct', 0.096), ('population', 0.095), ('alleles', 0.085), ('hotspots', 0.085), ('snp', 0.085), ('chromosome', 0.081), ('xing', 0.077), ('dirichlet', 0.07), ('founder', 0.068), ('lya', 0.068), ('recombinations', 0.068), ('stock', 0.065), ('dp', 0.063), ('balls', 0.062), ('mutation', 0.06), ('founders', 0.059), ('urns', 0.059), ('mj', 0.056), ('et', 0.054), ('locus', 0.054), ('chromosomes', 0.051), ('compositions', 0.051), ('ci', 0.049), ('ball', 0.046), ('hapmap', 0.044), ('block', 0.043), ('fig', 0.042), ('color', 0.042), ('simulated', 0.041), ('states', 0.039), ('occupancy', 0.038), ('transition', 0.037), ('ld', 0.037), ('cj', 0.035), ('ceph', 0.034), ('greenspan', 0.034), ('novembre', 0.034), ('sohn', 0.034), ('yri', 0.034), ('ak', 0.032), ('originated', 0.031), ('populations', 0.031), ('hi', 0.031), ('beal', 0.03), ('hidden', 0.03), ('chromosomal', 0.029), ('mutations', 0.029), ('modern', 0.029), ('genetics', 0.028), ('dr', 0.028), ('rosenberg', 0.027), ('instantiated', 0.027), ('anderson', 0.027), ('teh', 0.026), ('ht', 0.026), ('founding', 0.025), ('niu', 0.025), ('phased', 0.025), ('thorisson', 0.025), ('uncovering', 0.025), ('individual', 0.025), ('genotype', 0.025), ('open', 0.024), ('hmms', 0.024), ('individuals', 0.023), ('nk', 0.023), ('mk', 0.023), ('process', 0.022), ('polymorphisms', 0.022), ('snps', 0.022), ('dps', 0.022), ('kk', 0.022), ('mct', 0.022), ('markov', 0.022), ('events', 0.021), ('act', 0.021), ('segments', 0.021), ('reconstruction', 0.02), ('records', 0.02), ('inherited', 0.02), ('blackwell', 0.02), ('mdl', 0.02), ('patil', 0.02), ('inferred', 0.02), ('geiger', 0.019), ('hierarchical', 0.018), ('bayesian', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space

Author: Kyung-ah Sohn, Eric P. Xing

Abstract: We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among a possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a 1st-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested Pólya urn scheme. On both simulated and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; in addition, it also infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations.

2 0.050741278 91 nips-2006-Hierarchical Dirichlet Processes with Random Effects

Author: Seyoung Kim, Padhraic Smyth

Abstract: Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. This group-level variability in the component parameters is handled using a random effects model. We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI). 1

3 0.046926159 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization

Author: Su-in Lee, Varun Ganapathi, Daphne Koller

Abstract: Markov networks are commonly used in a wide variety of applications, ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally efficient method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the log-linear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real world data sets: pixel values in the MNIST data, and genetic sequence variations in the human HapMap data. We show that our L1 -based method achieves considerably higher generalization performance than the more standard L2 -based method (a Gaussian parameter prior) or pure maximum-likelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than learning parameters alone, demonstrating the existence of a feasible method for this important problem.

4 0.041581664 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments

Author: Jeremy Lewi, Robert Butera, Liam Paninski

Abstract: Adaptively optimizing experiments can significantly reduce the number of trials needed to characterize neural responses using parametric statistical models. However, the potential for these methods has been limited to date by severe computational challenges: choosing the stimulus which will provide the most information about the (typically high-dimensional) model parameters requires evaluating a high-dimensional integration and optimization in near-real time. Here we present a fast algorithm for choosing the optimal (most informative) stimulus based on a Fisher approximation of the Shannon information and specialized numerical linear algebra techniques. This algorithm requires only low-rank matrix manipulations and a one-dimensional linesearch to choose the stimulus and is therefore efficient even for high-dimensional stimulus and parameter spaces; for example, we require just 15 milliseconds on a desktop computer to optimize a 100-dimensional stimulus. Our algorithm therefore makes real-time adaptive experimental design feasible. Simulation results show that model parameters can be estimated much more efficiently using these adaptive techniques than by using random (nonadaptive) stimuli. Finally, we generalize the algorithm to efficiently handle both fast adaptation due to spike-history effects and slow, non-systematic drifts in the model parameters.

5 0.036443207 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds

Author: Gloria Haro, Gregory Randall, Guillermo Sapiro

Abstract: The study of point cloud data sampled from a stratification, a collection of manifolds with possible different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning. 1

6 0.032363299 142 nips-2006-Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype

7 0.032077882 54 nips-2006-Comparative Gene Prediction using Conditional Random Fields

8 0.031446122 114 nips-2006-Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models

9 0.031104494 175 nips-2006-Simplifying Mixture Models through Function Approximation

10 0.030710449 108 nips-2006-Large Scale Hidden Semi-Markov SVMs

11 0.030421322 192 nips-2006-Theory and Dynamics of Perceptual Bistability

12 0.030191205 198 nips-2006-Unified Inference for Variational Bayesian Linear Gaussian State-Space Models

13 0.029919313 134 nips-2006-Modeling Human Motion Using Binary Latent Variables

14 0.02968966 124 nips-2006-Linearly-solvable Markov decision problems

15 0.029300231 69 nips-2006-Distributed Inference in Dynamical Systems

16 0.028538339 98 nips-2006-Inferring Network Structure from Co-Occurrences

17 0.027837355 40 nips-2006-Bayesian Detection of Infrequent Differences in Sets of Time Series with Shared Structure

18 0.027433202 154 nips-2006-Optimal Change-Detection and Spiking Neurons

19 0.026132576 161 nips-2006-Particle Filtering for Nonparametric Bayesian Matrix Factorization

20 0.025766833 115 nips-2006-Learning annotated hierarchies from relational data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.084), (1, -0.008), (2, 0.0), (3, -0.043), (4, 0.021), (5, -0.014), (6, 0.06), (7, -0.019), (8, -0.023), (9, 0.0), (10, -0.013), (11, 0.026), (12, 0.006), (13, 0.023), (14, -0.004), (15, -0.022), (16, 0.029), (17, 0.023), (18, -0.049), (19, -0.015), (20, -0.043), (21, -0.002), (22, 0.005), (23, -0.014), (24, -0.013), (25, -0.044), (26, 0.059), (27, -0.034), (28, -0.011), (29, 0.064), (30, -0.06), (31, -0.023), (32, -0.003), (33, -0.033), (34, 0.031), (35, -0.048), (36, 0.032), (37, 0.068), (38, 0.056), (39, 0.035), (40, -0.005), (41, 0.036), (42, -0.02), (43, -0.07), (44, 0.007), (45, -0.045), (46, 0.08), (47, -0.019), (48, 0.123), (49, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90173423 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space

Author: Kyung-ah Sohn, Eric P. Xing

Abstract: We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among a possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a 1st-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested Pólya urn scheme. On both simulated and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; in addition, it also infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations.

2 0.55205125 91 nips-2006-Hierarchical Dirichlet Processes with Random Effects

Author: Seyoung Kim, Padhraic Smyth

Abstract: Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. This group-level variability in the component parameters is handled using a random effects model. We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI). 1

3 0.49808532 114 nips-2006-Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models

Author: Alexander T. Ihler, Padhraic Smyth

Abstract: Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas such as human-computer interaction, video surveillance, and Web data analysis. We propose a non-parametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what “factors” are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real-world data sets of count data involving both vehicles and people are used to illustrate the technique. 1
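A rough generative sketch of this idea: days are assigned to categories by a Chinese restaurant process, and each category carries its own hourly intensity profile for the observed Poisson counts. The profile shapes, concentration parameter, and rates below are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.0                          # CRP concentration (illustrative)
profiles, sizes = [], []             # per-category intensity profiles and day counts
days = []

for day in range(30):
    probs = np.array(sizes + [alpha], dtype=float)
    k = rng.choice(len(probs), p=probs / probs.sum())
    if k == len(profiles):           # a new table: instantiate a new category
        profiles.append(rng.gamma(2.0, 2.0, size=24))  # random hourly intensities
        sizes.append(0)
    sizes[k] += 1
    days.append(rng.poisson(profiles[k]))   # 24 hourly counts for this day

print(len(profiles), "categories generated the 30 days")
```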

4 0.49328816 158 nips-2006-PG-means: learning the number of clusters in data

Author: Yu Feng, Greg Hamerly

Abstract: We present a novel algorithm called PG-means which is able to learn the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; it uses statistical hypothesis tests on one-dimensional projections of the data and model to determine if the examples are well represented by the model. In so doing, we are applying a statistical test for the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods. 1
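The core check is easy to sketch: fit a Gaussian mixture, project both the data and the fitted model onto a random one-dimensional direction, and test the projected data against the projected mixture CDF. In the sketch below a Kolmogorov-Smirnov test stands in for the paper's hypothesis test, and a single random projection replaces the full procedure (which repeats over projections and grows the number of clusters).

```python
import numpy as np
from scipy.stats import kstest, norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

def projected_mixture_cdf(v):
    """CDF of the fitted mixture after projection onto unit vector v."""
    mu = gmm.means_ @ v                                            # projected means
    sd = np.sqrt(np.einsum('i,kij,j->k', v, gmm.covariances_, v))  # projected stds
    w = gmm.weights_
    return lambda t: np.sum(w * norm.cdf((np.atleast_1d(t)[:, None] - mu) / sd), axis=1)

v = rng.normal(size=2)
v /= np.linalg.norm(v)                     # random unit projection direction
stat, pval = kstest(X @ v, projected_mixture_cdf(v))
print(f"KS p-value = {pval:.3f} (a small value would reject the 2-component fit)")
```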

5 0.47245055 131 nips-2006-Mixture Regression for Covariate Shift

Author: Masashi Sugiyama, Amos J. Storkey

Abstract: In supervised learning there is a typical presumption that the training and test points are taken from the same distribution. In practice this assumption is commonly violated. The situation where the training and test data come from different distributions is called covariate shift. Recent work has examined techniques for dealing with covariate shift in terms of minimisation of generalisation error. As yet the literature lacks a Bayesian generative perspective on this problem. This paper tackles this issue for regression models. Recent work on covariate shift can be understood in terms of mixture regression. Using this view, we obtain a general approach to regression under covariate shift, which reproduces previous work as a special case. The main advantages of this new formulation over previous models for covariate shift are that we no longer need to presume the test and training densities are known, the regression and density estimation are combined into a single procedure, and previous methods are reproduced as special cases of this procedure, shedding light on the implicit assumptions the methods are making. 1
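For contrast with this generative view, the standard remedy for covariate shift is importance weighting: reweight training losses by the test-to-training density ratio. A minimal sketch, assuming the two input densities are known Gaussians; in practice the ratio must be estimated, which is part of what the paper's formulation avoids.

```python
import numpy as np

rng = np.random.default_rng(3)

# Training inputs from N(1, 0.5^2); the test region is centered at 2 instead.
x_tr = rng.normal(1.0, 0.5, 200)
y_tr = np.sinc(x_tr) + rng.normal(0, 0.1, 200)

def gauss_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# p_test / p_train, computed from the true densities purely for illustration
w = gauss_pdf(x_tr, 2.0, 0.3) / gauss_pdf(x_tr, 1.0, 0.5)

# importance-weighted least squares for a linear fit y = a*x + b
A = np.column_stack([x_tr, np.ones_like(x_tr)])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y_tr * sw, rcond=None)
print("importance-weighted linear fit (a, b):", coef.round(3))
```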

6 0.47179148 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures

7 0.45065171 192 nips-2006-Theory and Dynamics of Perceptual Bistability

8 0.44656286 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes

9 0.42539191 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation

10 0.41840017 142 nips-2006-Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype

11 0.40908244 71 nips-2006-Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning

12 0.38815355 41 nips-2006-Bayesian Ensemble Learning

13 0.38189238 29 nips-2006-An Information Theoretic Framework for Eukaryotic Gradient Sensing

14 0.37607464 23 nips-2006-Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models

15 0.37350935 113 nips-2006-Learning Structural Equation Models for fMRI

16 0.37167937 40 nips-2006-Bayesian Detection of Infrequent Differences in Sets of Time Series with Shared Structure

17 0.36346117 5 nips-2006-A Kernel Method for the Two-Sample-Problem

18 0.35585719 1 nips-2006-A Bayesian Approach to Diffusion Models of Decision-Making and Response Time

19 0.35314423 98 nips-2006-Inferring Network Structure from Co-Occurrences

20 0.34414643 124 nips-2006-Linearly-solvable Markov decision problems


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.061), (3, 0.023), (7, 0.036), (9, 0.025), (20, 0.034), (22, 0.039), (28, 0.434), (44, 0.052), (54, 0.023), (57, 0.063), (65, 0.045), (69, 0.023), (71, 0.02), (90, 0.012), (93, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78919905 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space

Author: Kyung-ah Sohn, Eric P. Xing

Abstract: We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a 1st-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested Pólya urn scheme. On both simulated and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; and in addition it also infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations. 1

2 0.62023062 86 nips-2006-Graph-Based Visual Saliency

Author: Jonathan Harel, Christof Koch, Pietro Perona

Abstract: A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%. 1
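The activation step is simple enough to sketch directly: treat feature-map locations as nodes of a fully connected Markov chain whose edge weights grow with feature dissimilarity and decay with distance, and take the chain's stationary distribution as the activation map. The dissimilarity below follows the log-ratio form used in GBVS; sigma and the map size are illustrative choices.

```python
import numpy as np

def gbvs_activation(M, sigma=3.0):
    """Stationary distribution of a Markov chain over the locations of feature map M."""
    h, w = M.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    f = M.ravel()
    # edge weight = log-ratio dissimilarity * Gaussian falloff with distance
    dissim = np.abs(np.log(f[:, None] + 1e-9) - np.log(f[None, :] + 1e-9))
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    W = dissim * np.exp(-d2 / (2 * sigma ** 2))
    P = W / W.sum(axis=1, keepdims=True)   # row-normalize: Markov transition matrix
    pi = np.full(h * w, 1.0 / (h * w))
    for _ in range(200):                   # power iteration to the equilibrium
        pi = pi @ P
    return pi.reshape(h, w)

feature_map = np.random.default_rng(4).random((8, 8))
print(gbvs_activation(feature_map).round(3))
```

Mass concentrates at locations that differ strongly from their surroundings, which is what makes the equilibrium distribution behave as a conspicuity measure.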

3 0.38124466 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency

Author: Wolf Kienzle, Felix A. Wichmann, Matthias O. Franz, Bernhard Schölkopf

Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. 1
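The recipe reduces to supervised patch classification: collect image patches at fixated and non-fixated locations, train a classifier, and read its decision value as a saliency score. In the self-contained sketch below, synthetic patches stand in for real eye-movement data and the RBF-SVM is an illustrative classifier choice, not the paper's exact model.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)

def make_patches(n, fixated):
    """Synthetic 13x13 patches; 'fixated' ones get a center-surround-like bump."""
    P = rng.normal(0, 1, (n, 13, 13))
    if fixated:
        yy, xx = np.mgrid[-6:7, -6:7]
        P += 2.0 * np.exp(-(yy ** 2 + xx ** 2) / 8.0)
    return P.reshape(n, -1)

X = np.vstack([make_patches(300, True), make_patches(300, False)])
y = np.array([1] * 300 + [0] * 300)

clf = SVC(kernel='rbf', gamma='scale').fit(X, y)
saliency = clf.decision_function(make_patches(5, True))  # decision value = saliency score
print(saliency.round(2))
```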

4 0.28317064 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data

Author: Yevgeny Seldin, Noam Slonim, Naftali Tishby

Abstract: We present a general model-independent approach to the analysis of data in cases when these data do not appear in the form of co-occurrence of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X, Y). For example, in gene expression data, the expression level Z is a function of gene X and condition Y; or in movie ratings data the rating Z is a function of viewer X and movie Y. The approach represents a consistent extension of the Information Bottleneck method that has previously relied on the availability of co-occurrence statistics. By altering the relevance variable we eliminate the need for a sample from the joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering) it improves the currently best known results. 1

5 0.28138196 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization

Author: Rey Ramírez, Jason Palmer, Scott Makeig, Bhaskar D. Rao, David P. Wipf

Abstract: The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical properties of this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms. 1
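ARD itself is easy to exhibit in its classic linear-regression form, which is the building block these empirical Bayesian methods generalize. Below is a minimal sketch of the evidence-based (RVM-style) updates with an assumed known noise precision; it illustrates the mechanism, not any particular source-localization algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[[2, 7]] = [1.5, -2.0]             # only two relevant predictors
y = X @ w_true + rng.normal(0, 0.1, n)

alpha = np.ones(d)                       # per-weight precisions (relevances)
beta = 100.0                             # assumed known noise precision
for _ in range(50):
    Sigma = np.linalg.inv(np.diag(alpha) + beta * X.T @ X)   # posterior covariance
    mu = beta * Sigma @ X.T @ y                              # posterior mean
    gamma = 1.0 - alpha * np.diag(Sigma)                     # well-determinedness
    alpha = np.clip(gamma / (mu ** 2 + 1e-12), 1e-6, 1e6)    # MacKay-style update

print("weights pruned (alpha driven large):", int((alpha > 1e3).sum()), "of", d)
print("relevant predictors recovered at indices:", np.where(alpha < 10)[0])
```

Weights whose precisions diverge are effectively switched off, which is how ARD performs the model selection the abstract refers to.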

6 0.28084174 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation

7 0.27988496 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment

8 0.27823687 161 nips-2006-Particle Filtering for Nonparametric Bayesian Matrix Factorization

9 0.27752566 175 nips-2006-Simplifying Mixture Models through Function Approximation

10 0.27750468 34 nips-2006-Approximate Correspondences in High Dimensions

11 0.27744052 79 nips-2006-Fast Iterative Kernel PCA

12 0.27726948 65 nips-2006-Denoising and Dimension Reduction in Feature Space

13 0.27543071 115 nips-2006-Learning annotated hierarchies from relational data

14 0.27527133 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields

15 0.2742686 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation

16 0.27394256 154 nips-2006-Optimal Change-Detection and Spiking Neurons

17 0.27368188 42 nips-2006-Bayesian Image Super-resolution, Continued

18 0.27353376 41 nips-2006-Bayesian Ensemble Learning

19 0.27289584 98 nips-2006-Inferring Network Structure from Co-Occurrences

20 0.27253914 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis