nips nips2006 nips2006-91 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Seyoung Kim, Padhraic Smyth
Abstract: Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. This group-level variability in the component parameters is handled using a random effects model. We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI). 1
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper we extend hierarchical Dirichlet processes to model such data. [sent-6, score-0.289]
2 Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. [sent-7, score-0.816]
3 Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. [sent-8, score-0.503]
4 In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. [sent-9, score-0.515]
5 This group-level variability in the component parameters is handled using a random effects model. [sent-10, score-0.258]
6 We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI). [sent-11, score-0.674]
7 Hierarchical Dirichlet processes (DPs) provide a flexible framework for probabilistic modeling when data are observed in a grouped fashion and each group can be thought of as being generated from a mixture model. [sent-14, score-0.3]
8 In hierarchical DPs all of, or a subset of, the mixture components are shared by different groups, and the number of such components is inferred from the data using a DP prior. [sent-15, score-0.685]
9 Variability across groups is modeled by allowing different mixing proportions for different groups. [sent-16, score-0.284]
10 In this paper we focus on the problem of modeling systematic variation in the shared mixture component parameters and not just in the mixing proportions. [sent-17, score-0.5]
11 We will use the problem of modeling spatial fMRI activation across multiple brain images as a motivating application, where the images are obtained from one or more subjects performing the same cognitive tasks. [sent-18, score-0.663]
12 We assume that there is an unknown true template for mixture component parameters, and that the mixture components for each group are noisy realizations of the template components. [sent-20, score-0.847]
13 For our application, groups and data points correspond to images and pixels. [sent-21, score-0.156]
14 Given data from multiple groups (e.g., a set of images) we are interested in learning both the overall template model and the random variation relative to the template for each group. [sent-24, score-0.353]
15 For the fMRI application, we model the images as mixtures of activation patterns, assigning a mixture component to each spatial activation cluster in an image. [sent-25, score-1.032]
16 As shown in Figure 1 our goal is to extract activation patterns that are common across multiple images, while allowing for variation in fMRI signal intensity and activation location in individual images. [sent-26, score-0.711]
17 In our proposed approach, the amount of variation (called random effects) from the overall true component parameters is modeled as coming from a prior distribution on group-level component parameters (Gelman et al.). [sent-27, score-0.33]
18 By combining hierarchical DPs with a random effects model we let both mixing proportions and mixture component parameters adapt to the data in each group. [sent-29, score-0.791]
19 Although we focus on image data in this paper, the proposed approach is applicable to more general problems of modeling group-level random variation with mixture models. [sent-30, score-0.952] [sent-38, score-0.272]
Figure 1: Illustration of group-level variations from the template model (template mixture model, group-level mixture models, and fMRI brain activation).
20 Table 1: group-level mixture components under each model. Hierarchical DPs: θa × ma, θb × mb (components shared exactly); transformed DPs: θa + ∆a1, . . . , θa + ∆a,ma, θb + ∆b1, . . . , θb + ∆b,mb (a separate transformation per copy); hierarchical DPs with random effects: a single per-group offset shared by all copies within a group. [sent-31, score-0.376]
22 Hierarchical DPs (Teh et al., 2006) and transformed DPs (Sudderth et al., 2005) both address a similar problem of modeling groups of data using mixture models with mixture components shared across groups. [sent-40, score-0.594]
23 This is not suitable for modeling the type of group variation illustrated in Figure 1 because there is no direct way to enforce ∆a1 = . . . = ∆a,ma, i.e., to make all copies of a component within a group share the same transformation. [sent-50, score-0.204]
24 In this general context the model we propose here can be viewed as being closely related to both hierarchical DPs and transformed DPs, but having application to quite different types of problems in practice, e.g., as an intermediate between the highly constrained variation allowed by the hierarchical DP and the relatively unconstrained variation present in the computer vision scenes to which the transformed DP has been applied (Sudderth et al., 2005). [sent-57, score-0.303] [sent-59, score-0.399]
26 From an applications viewpoint the use of DPs for modeling multiple fMRI brain images is novel and shows considerable promise as a new tool for analyzing such data. [sent-60, score-0.191]
27 One exception is the approach of Penny and Friston (2003), who proposed a probabilistic mixture model for spatial activation modeling and demonstrated its advantages over voxel-wise analysis. [sent-62, score-0.521]
28 In a DP mixture model, yi, i = 1, . . . , N are observed data and zi is a component label for yi. [sent-70, score-0.157]
29 The probability that zi is assigned to a new component is proportional to α0. [sent-75, score-0.163]
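As a concrete illustration of this property, here is a minimal sketch (in Python with NumPy) of the Chinese-restaurant-process assignment probabilities for zi; the function name and data layout are ours, not the paper's code.

```python
import numpy as np

def crp_assignment_probs(z, i, alpha0):
    """Probabilities for reassigning label z[i] under a DP(alpha0) prior:
    an existing component k gets weight n_k (its count excluding point i),
    and a brand-new component gets weight alpha0."""
    counts = {}
    for j, zj in enumerate(z):
        if j != i:
            counts[zj] = counts.get(zj, 0) + 1
    labels = list(counts.keys()) + ["new"]
    weights = np.array([counts[k] for k in counts] + [alpha0], dtype=float)
    return labels, weights / weights.sum()

labels, probs = crp_assignment_probs([0, 0, 1, 2, 2, 2], i=1, alpha0=1.0)
```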
30 2 Hierarchical Dirichlet processes. When multiple groups of data are present and each group can be modeled as a mixture, it is often useful to let different groups share mixture components. [sent-78, score-0.587]
31 In hierarchical DPs (Teh et al., 2006) components are shared by different groups with varying mixing proportions for each group, and the number of components in the model can be inferred from the data. [sent-80, score-0.473]
32 Here j indexes the groups (j = 1, . . . , J), β denotes the global mixing proportions, πj the mixing proportions for group j, and α0, γ, H are the hyperparameters for the DP. [sent-87, score-0.37]
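A hedged sketch of the corresponding stick-breaking construction (Sethuraman's representation, which the paper invokes): β ~ GEM(γ) gives the global proportions and πj ~ DP(α0, β) the group proportions. The truncation level K and the hyperparameter values below are illustrative assumptions.

```python
import numpy as np

def stick_breaking(concentration, K, rng):
    """Truncated GEM(concentration): break a unit stick K times."""
    v = rng.beta(1.0, concentration, size=K)
    v[-1] = 1.0  # close off the stick at the truncation level
    pieces = np.empty(K)
    remaining = 1.0
    for k in range(K):
        pieces[k] = v[k] * remaining
        remaining *= 1.0 - v[k]
    return pieces

rng = np.random.default_rng(0)
gamma, alpha0, K = 1.0, 1.0, 20          # hyperparameters and truncation (illustrative)
beta = stick_breaking(gamma, K, rng)     # global mixing proportions beta ~ GEM(gamma)
pi_j = rng.dirichlet(alpha0 * beta)      # group-j proportions, pi_j ~ DP(alpha0, beta)
```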
33 Mixture components described by the φk's can be shared across the J groups. [sent-89, score-0.187]
34 The hierarchical DP has clustering properties similar to those of DP mixtures, i.e., data points in each group form local clusters, and local clusters are in turn linked to shared global clusters. [sent-90, score-0.243]
35 Notice that more than one local cluster in group j can be linked to the same global cluster. [sent-95, score-0.167]
36 3 Hierarchical Dirichlet processes with random effects. We now propose an extension of the standard hierarchical DP to a version that includes random effects. [sent-97, score-0.392]
37 We present the model in the context of the specific problem of modeling activation patterns in fMRI brain images. [sent-100, score-0.399]
38 Each group j is modeled as p(yji) = Σ_{k=1}^∞ πjk N(yji | ujk, σ²) (4): each group j has its own component mean ujk for the kth component, and these group-level parameters come from a common prior distribution N(µk, τk²). [sent-102, score-0.723]
39 Thus, µk can be viewed as a template, and ujk as a noisy observation of the template for group j with variance τk². [sent-103, score-0.664]
40 The random effects parameters ujk are generated once per group and shared by local clusters in group j that are assigned to the same global cluster k. [sent-104, score-0.994]
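The following sketch makes the generative role of the random effects concrete: one ujk per (group, global component) pair, drawn around the template mean µk. The Gaussian base measure for µk and the fixed τk and σ values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
J, K, sigma = 10, 3, 0.5                      # groups, components, noise sd (illustrative)
mu = rng.normal(0.0, 5.0, size=K)             # template means mu_k (Gaussian base measure assumed)
tau = np.full(K, 1.0)                         # random-effects scales tau_k (fixed for simplicity)

# One random-effects parameter u_jk per group j and global component k,
# shared by every local cluster in group j linked to k.
u = mu[None, :] + tau[None, :] * rng.standard_normal((J, K))

def draw_yji(j, k):
    """Observation for a point in group j assigned to global component k."""
    return rng.normal(u[j, k], sigma)
```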
41 Equations (5a)-(5c) give the conditional probabilities of assigning yji to, respectively, a global cluster already used in group j, a global cluster not yet used in group j, and a new cluster. In Equation (5a) the summation is over components in A = {k | some hji′ for i′ ≠ i is assigned to k}, representing global clusters that already have some local clusters in group j assigned to them. [sent-109, score-0.379]
42 In this case, since ujk is already known, we can simply compute the likelihood p(yji |ujk ). [sent-110, score-0.431]
43 In Equation (5b) the summation is over B = {k | no hji′ for i′ ≠ i is assigned to k}, representing global clusters that have not yet been assigned in group j. [sent-111, score-0.258]
44 For conjugate priors we can integrate over the unknown random effects parameter ujk to compute the likelihood using N(yji | µk, τk² + σ²), and sample ujk from the posterior distribution p(ujk | µk, τk², yji). [sent-112, score-1.234]
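For the conjugate normal case just described, both quantities have closed forms; a small sketch (the function names are ours):

```python
import numpy as np
from scipy.stats import norm

def marginal_likelihood(y, mu_k, tau2_k, sigma2):
    """p(y_ji | mu_k, tau_k^2) with u_jk integrated out: N(y | mu_k, tau_k^2 + sigma^2)."""
    return norm.pdf(y, loc=mu_k, scale=np.sqrt(tau2_k + sigma2))

def sample_ujk_posterior(y, mu_k, tau2_k, sigma2, rng):
    """Draw from the normal posterior p(u_jk | mu_k, tau_k^2, y_ji)."""
    prec = 1.0 / tau2_k + 1.0 / sigma2            # posterior precision
    mean = (mu_k / tau2_k + y / sigma2) / prec    # precision-weighted mean
    return rng.normal(mean, np.sqrt(1.0 / prec))
```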
45 The integral cannot be evaluated analytically, so we approximate it by sampling new values for µk, τk², and ujk from the prior distributions and evaluating p(yji | ujk) given these new values for the parameters (Neal, 1998). [sent-114, score-0.566]
46 As in the sampling of hji , if k is new in group j we can evaluate the integral analytically and sample ujk from the posterior distribution. [sent-137, score-0.764]
47 If k is a new component we approximate the integral by sampling new values for µk, τk², and ujk from the prior and evaluating the likelihood. [sent-138, score-0.6]
48 Given h and l we can update the component parameters µ, τ, and u using standard Gibbs sampling for a normal hierarchical model (Gelman et al.). [sent-139, score-0.407]
49 To address this problem and restore the correct correspondence between template components and group-level components we propose a move that swaps the labels for two group-level components at the end of each sampling iteration and accepts the move based on a Metropolis-Hastings acceptance rule. [sent-142, score-0.393]
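A sketch of such a swap move; `log_post` stands for the unnormalized log posterior of the group-level parameters (likelihood plus the N(µk, τk²) prior terms) and is an assumed callable, not something defined in the paper.

```python
import numpy as np

def swap_move(u, j, k1, k2, log_post, rng):
    """Propose swapping group j's parameters for components k1 and k2 and
    accept with the Metropolis-Hastings rule. The proposal is symmetric,
    so the acceptance ratio reduces to the posterior ratio."""
    u_prop = u.copy()
    u_prop[j, k1], u_prop[j, k2] = u[j, k2], u[j, k1]
    if np.log(rng.uniform()) < log_post(u_prop) - log_post(u):
        return u_prop   # accept: labels swapped
    return u            # reject: keep current labeling
```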
50 To illustrate the proposed model we simulated data from a mixture of one-dimensional Gaussian densities with known parameters and tested whether the sampling algorithm can recover those parameters from the data. [sent-143, score-0.263]
51 From a template mixture model with three mixture components we generated 10 group-level mixture models by adding random effects in the form of mean-shifts to the template means, sampled from N (0, 1). [sent-144, score-0.922]
52 Using varying mixing proportions for each group we generated 200 samples from each of the 10 mixture models. [sent-145, score-0.428]
53 We can see that the sampling algorithm was able to learn the original model successfully despite the variability in both component means and mixing proportions of the mixture model. [sent-148, score-0.515]
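The simulated-data setup is easy to reproduce; a sketch under the stated settings (three template components, N(0, 1) mean shifts, 10 groups of 200 points). The template means, noise level, and Dirichlet draw for the mixing proportions are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
template = np.array([-4.0, 0.0, 4.0])   # template means (illustrative values)
J, n, sigma = 10, 200, 0.5              # 10 groups of 200 samples; noise sd assumed

groups = []
for j in range(J):
    means_j = template + rng.normal(0.0, 1.0, size=3)  # mean-shift random effects ~ N(0, 1)
    pi_j = rng.dirichlet(np.ones(3))                   # group-specific mixing proportions
    z = rng.choice(3, size=n, p=pi_j)                  # component labels
    groups.append(rng.normal(means_j[z], sigma))       # observations for group j
```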
54 4 A model for fMRI activation surfaces. We now apply the general framework of the hierarchical DP with random effects to the problem of detecting and characterizing spatial activation patterns in fMRI brain images. [sent-149, score-1.043]
55 Our goal is to infer the unknown true activation from multiple such activation images. [sent-151, score-0.556]
56 We model each activation image using a mixture of experts model, with a component expert assigned to each local activation cluster (Rasmussen and Ghahramani, 2002). [sent-152, score-1.007]
57 By introducing a hierarchical DP into this model we allow activation clusters to be shared across images, inferring the number of such clusters from the data. [sent-153, score-0.746]
58 In addition, the random effects component can be incorporated to allow activation centers to shift slightly in pixel location and to vary in peak intensity across images. [sent-154, score-0.462]
59 We briefly discuss the mixture of experts model below (Kim et al.). [sent-159, score-0.219]
60 Assuming that yi, i = 1, . . . , N are conditionally independent of each other given the voxel position xi = (xi1, xi2) and the model parameters, we model the activation yi at voxel xi as a mixture of experts: p(yi | xi, θ) = Σ_{c∈C} p(yi | c, xi) P(c | xi), (7) where C = {cbg, cm, m = 1, . . . , M − 1} is a set of M expert component labels for the background cbg and the M − 1 activation components cm. [sent-164, score-0.746] [sent-167, score-0.561]
62 We model the expert for an activation component as a Gaussian-shaped surface centered at bm with width Σm and height hm, as follows. [sent-171, score-0.56] [sent-174, score-0.298]
Figure 4: Results from eight runs for subject 2 at Stanford. (a) Raw images for a cross section of right precentral gyrus and surrounding area. Activation components estimated from the images using (b) DP mixtures, (c) hierarchical DPs, and (d) hierarchical DP with random effects. [sent-172, score-0.176] [sent-173, score-0.679]
66 yi = hm exp(−(xi − bm)′ Σm⁻¹ (xi − bm)) + ε, (8) where ε is an additive noise term distributed as N(0, σact²). [sent-175, score-0.427]
67 The background component is modeled as yi = µ + ε, having a constant activation level µ with additive noise distributed as N(0, σbg²). [sent-176, score-0.415]
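A sketch of how Equation (8) and the background model would be evaluated at a voxel (the function names are ours):

```python
import numpy as np

def activation_expert(x, b_m, Sigma_m, h_m):
    """Mean activation under Equation (8): h_m * exp(-(x - b_m)' Sigma_m^{-1} (x - b_m))."""
    d = np.asarray(x, dtype=float) - np.asarray(b_m, dtype=float)
    return h_m * np.exp(-d @ np.linalg.solve(Sigma_m, d))

def background_expert(mu_bg):
    """Constant background level mu; N(0, sigma_bg^2) noise is added on top."""
    return mu_bg

# A bump of height 2 centered at voxel (10, 12), evaluated one voxel away:
Sigma = np.array([[4.0, 0.0], [0.0, 4.0]])
print(activation_expert([11.0, 12.0], [10.0, 12.0], Sigma, 2.0))
```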
68 The second term in Equation (7) is known as a gate function in the mixture of experts framework: it decides which expert should be used to make a prediction for the activation level at position xi. [sent-177, score-0.592]
69 For activation components, p(xi | cm) is a normal density with mean bm and covariance Σm. [sent-181, score-0.417]
70 The parameters bm and Σm are shared with the Gaussian surface model for the experts in Equation (8). [sent-182, score-0.297]
71 This implies that the probability of activating the mth expert is highest at the center of the activation and gradually decays as xi moves away from the center. [sent-183, score-0.372]
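A sketch of this gate computation; we assume a uniform spatial density over the image area for the background component, which the excerpt does not specify.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gate_probs(x, pi, centers, covs, image_area):
    """P(c | x) proportional to pi_c * p(x | c): the background gets a uniform
    spatial density 1/area (an assumption here); activation component m gets
    N(x | b_m, Sigma_m). `pi` lists [pi_bg, pi_1, ..., pi_{M-1}]."""
    lik = [1.0 / image_area]
    for b_m, S_m in zip(centers, covs):
        lik.append(multivariate_normal.pdf(x, mean=b_m, cov=S_m))
    w = np.asarray(pi) * np.asarray(lik)
    return w / w.sum()

p = gate_probs([11.0, 12.0], [0.7, 0.3],
               [np.array([10.0, 12.0])], [np.eye(2) * 4.0], image_area=64 * 64)
```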
72 We place a hierarchical DP prior on πc, and let the location parameters bm and the height parameters hm vary across individual images according to Normal prior distributions with variances ψbm² and ψhm², using a random effects model. [sent-186, score-0.818]
73 Since the surface model for the activation component is a highly non-linear model, without conjugate prior distributions it is not possible to evaluate the integrals in Equations (5b)-(5c) and (6) analytically in the sampling algorithm. [sent-189, score-0.469]
74 We rely on an approximation of the integrals, sampling new values for bm and hm from their priors and new values for the image-specific random effects parameters from their prior distributions. [sent-190, score-0.294]
75 In this experiment we analyze a 2D cross-section of the right precentral gyrus brain region, a region that is known to be activated by this sensorimotor task. [sent-195, score-0.162]
76 We fit our model to each set of eight β-maps for each of the subjects at each scanner, and compare the results with the models obtained from the hierarchical DP without random effects. [sent-197, score-0.384]
77 (a) DP mixture, (b) hierarchical DP, and (c) hierarchical DP with random effects. [sent-205, score-0.507]
78 Table 2: Predictive logP scores of test images averaged over eight cross-validation runs. [sent-232, score-0.159]
79 From the eight images one can see three primary activation bumps, subsets of which appear in different images with variability in location and intensity. [sent-235, score-0.586]
80 Figures 4 (b)-(d) each show a sample from the model learned on the data in Figure 4(a), where Figure 4(b) is for DP mixtures, Figure 4(c) for hierarchical DPs, and Figure 4(d) for hierarchical DPs with random effects. [sent-236, score-0.531]
81 The sampled activation components are overlaid as ellipses using one standard deviation of the width parameters Σm. [sent-237, score-0.41]
82 The thickness of the ellipses indicates the estimated height hm of the bump. [sent-238, score-0.173]
83 In Figures 4(b) and (c) ellipses for activation components shared across images are drawn with the same color. [sent-239, score-0.596]
84 The hierarchical DP with random effects is flexible enough to account for bumps that are shared across images but that have variability in their parameters. [sent-243, score-0.331]
85 Because they assume a fixed set of component parameters shared across images, the hierarchical DPs are too constrained and are unable to detect the more subtle features of individual images. [sent-245, score-0.453]
86 The model with random effects finds the three main bumps and a few more bumps with lower intensity for the background. [sent-247, score-0.174]
87 Thus, in terms of generalization, the model with random effects provides a good trade-off between the relatively unconstrained DP mixtures and overly-constrained hierarchical DPs. [sent-248, score-0.431]
88 We also perform a leave-one-image-out cross-validation to compare the predictive performance of hierarchical DPs and our proposed model. [sent-250, score-0.263]
89 For Subject 1 at Duke, the hierarchical DP gives a slightly better result, but the difference in scores is not significant. [sent-256, score-0.243]
90 Figure 6 shows the difference in the way the hierarchical DP and our proposed model fit the data. [sent-258, score-0.267]
91 The hierarchical DP in Figure 6(b) models the common bump with varying intensity in the middle of each image as a mixture of two components: one for the bump in the first two images with relatively high intensity, and another for the same bump in the rest of the images with lower intensity. [sent-260, score-0.522] [sent-261, score-0.259]
93 Our proposed model recovers the correspondence in the bumps with different intensity across images as shown in Figure 6(c). [sent-262, score-0.26]
94 Figure 6: (a) Raw images for a cross section of right precentral gyrus and surrounding area. [sent-264, score-0.176]
95 Activation components estimated from the images are shown in (b) for hierarchical DPs, and in (c) for the hierarchical DP with random effects. [sent-265, score-0.679]
96 6 Conclusions. In this paper we proposed a hierarchical DP model with random effects that allows each group (or image) to have group-level mixture component parameters as well as group-level mixing proportions. [sent-266, score-0.813]
97 Using fMRI brain activation images we demonstrated that our model can capture components shared across multiple groups with individual-level variation. [sent-267, score-0.705]
98 The random effects component provides an intermediate degree of flexibility in the model compared to DP mixtures and hierarchical DPs. [sent-269, score-0.325]
99 A nonparametric Bayesian approach to detecting spatial activation patterns in fMRI data. [sent-291, score-0.332]
100 Neal, R. M. (1998) Markov chain sampling methods for Dirichlet process mixture models. [sent-297, score-0.197]
wordName wordTfidf (topN-words)
[('ujk', 0.431), ('dps', 0.361), ('dp', 0.289), ('activation', 0.278), ('yji', 0.266), ('hierarchical', 0.243), ('fmri', 0.197), ('hji', 0.16), ('mixture', 0.156), ('zji', 0.144), ('bm', 0.139), ('template', 0.124), ('hm', 0.114), ('group', 0.109), ('images', 0.096), ('proportions', 0.087), ('effects', 0.085), ('dirichlet', 0.084), ('duke', 0.08), ('gelman', 0.08), ('ljt', 0.08), ('component', 0.078), ('jt', 0.076), ('components', 0.076), ('mixing', 0.076), ('shared', 0.074), ('bumps', 0.071), ('nh', 0.069), ('dujk', 0.064), ('eight', 0.063), ('subject', 0.061), ('variation', 0.06), ('groups', 0.06), ('brain', 0.06), ('mixtures', 0.058), ('expert', 0.056), ('variability', 0.053), ('bmb', 0.048), ('cbg', 0.048), ('clusters', 0.045), ('voxel', 0.045), ('stick', 0.045), ('zi', 0.044), ('mk', 0.042), ('stern', 0.042), ('ama', 0.042), ('gyrus', 0.042), ('sampling', 0.041), ('assigned', 0.041), ('experts', 0.039), ('rasmussen', 0.039), ('logp', 0.038), ('scanner', 0.038), ('mb', 0.038), ('precentral', 0.038), ('smyth', 0.038), ('xi', 0.038), ('across', 0.037), ('transformed', 0.036), ('cluster', 0.036), ('ji', 0.036), ('ellipses', 0.035), ('bump', 0.035), ('yi', 0.035), ('medical', 0.035), ('modeling', 0.035), ('irvine', 0.034), ('friston', 0.033), ('penny', 0.033), ('sudderth', 0.033), ('subjects', 0.033), ('teh', 0.033), ('intensity', 0.032), ('neal', 0.032), ('mcmc', 0.028), ('spatial', 0.028), ('imaging', 0.028), ('sethuraman', 0.028), ('kim', 0.027), ('prior', 0.027), ('patterns', 0.026), ('gate', 0.025), ('mu', 0.025), ('cm', 0.025), ('height', 0.024), ('modeled', 0.024), ('realizations', 0.024), ('model', 0.024), ('integral', 0.023), ('processes', 0.022), ('sensorimotor', 0.022), ('copies', 0.022), ('global', 0.022), ('surface', 0.021), ('stanford', 0.021), ('random', 0.021), ('plate', 0.021), ('image', 0.021), ('parameters', 0.021), ('predictive', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 91 nips-2006-Hierarchical Dirichlet Processes with Random Effects
Author: Seyoung Kim, Padhraic Smyth
Abstract: Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. This group-level variability in the component parameters is handled using a random effects model. We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI). 1
2 0.11758434 113 nips-2006-Learning Structural Equation Models for fMRI
Author: Enrico Simonotto, Heather Whalley, Stephen Lawrie, Lawrence Murray, David Mcgonigle, Amos J. Storkey
Abstract: Structural equation models can be seen as an extension of Gaussian belief networks to cyclic graphs, and we show they can be understood generatively as the model for the joint distribution of long term average equilibrium activity of Gaussian dynamic belief networks. Most use of structural equation models in fMRI involves postulating a particular structure and comparing learnt parameters across different groups. In this paper it is argued that there are situations where priors about structure are not firm or exhaustive, and given sufficient data, it is worth investigating learning network structure as part of the approach to connectivity analysis. First we demonstrate structure learning on a toy problem. We then show that for particular fMRI data the simple models usually assumed are not supported. We show that is is possible to learn sensible structural equation models that can provide modelling benefits, but that are not necessarily going to be the same as a true causal model, and suggest the combination of prior models and learning or the use of temporal information from dynamic models may provide more benefits than learning structural equations alone. 1
3 0.11749637 86 nips-2006-Graph-Based Visual Saliency
Author: Jonathan Harel, Christof Koch, Pietro Perona
Abstract: A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%. 1
4 0.10350916 167 nips-2006-Recursive ICA
Author: Honghao Shan, Lingyun Zhang, Garrison W. Cottrell
Abstract: Independent Component Analysis (ICA) is a popular method for extracting independent features from visual data. However, as a fundamentally linear technique, there is always nonlinear residual redundancy that is not captured by ICA. Hence there have been many attempts to try to create a hierarchical version of ICA, but so far none of the approaches have a natural way to apply them more than once. Here we show that there is a relatively simple technique that transforms the absolute values of the outputs of a previous application of ICA into a normal distribution, to which ICA maybe applied again. This results in a recursive ICA algorithm that may be applied any number of times in order to extract higher order structure from previous layers. 1
5 0.095287986 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
Author: Alexis Battle, Gal Chechik, Daphne Koller
Abstract: We present a probabilistic model applied to the fMRI video rating prediction task of the Pittsburgh Brain Activity Interpretation Competition (PBAIC) [2]. Our goal is to predict a time series of subjective, semantic ratings of a movie given functional MRI data acquired during viewing by three subjects. Our method uses conditionally trained Gaussian Markov random fields, which model both the relationships between the subjects’ fMRI voxel measurements and the ratings, as well as the dependencies of the ratings across time steps and between subjects. We also employed non-traditional methods for feature selection and regularization that exploit the spatial structure of voxel activity in the brain. The model displayed good performance in predicting the scored ratings for the three subjects in test data sets, and a variant of this model was the third place entrant to the 2006 PBAIC. 1
6 0.090707809 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
7 0.078326061 114 nips-2006-Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models
8 0.076356664 164 nips-2006-Randomized PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
9 0.076265283 131 nips-2006-Mixture Regression for Covariate Shift
10 0.065664783 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
11 0.065456547 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
12 0.064383402 175 nips-2006-Simplifying Mixture Models through Function Approximation
13 0.060423907 158 nips-2006-PG-means: learning the number of clusters in data
14 0.059311017 21 nips-2006-AdaBoost is Consistent
15 0.058861628 46 nips-2006-Blind source separation for over-determined delayed mixtures
16 0.056791037 66 nips-2006-Detecting Humans via Their Pose
17 0.056608569 12 nips-2006-A Probabilistic Algorithm Integrating Source Localization and Noise Suppression of MEG and EEG data
18 0.053412717 132 nips-2006-Modeling Dyadic Data with Binary Latent Factors
19 0.050741278 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
20 0.050531745 1 nips-2006-A Bayesian Approach to Diffusion Models of Decision-Making and Response Time
topicId topicWeight
[(0, -0.156), (1, 0.011), (2, 0.088), (3, -0.047), (4, -0.014), (5, -0.052), (6, 0.068), (7, -0.047), (8, -0.002), (9, 0.053), (10, 0.059), (11, 0.076), (12, 0.073), (13, 0.007), (14, -0.093), (15, -0.059), (16, 0.125), (17, -0.026), (18, -0.029), (19, -0.078), (20, -0.05), (21, -0.037), (22, -0.064), (23, 0.152), (24, 0.018), (25, -0.035), (26, 0.178), (27, -0.014), (28, -0.022), (29, 0.012), (30, -0.097), (31, 0.07), (32, -0.082), (33, -0.119), (34, 0.082), (35, -0.107), (36, 0.023), (37, 0.184), (38, -0.01), (39, 0.043), (40, -0.104), (41, 0.028), (42, -0.245), (43, -0.011), (44, -0.117), (45, 0.138), (46, 0.14), (47, 0.079), (48, 0.11), (49, -0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.95310634 91 nips-2006-Hierarchical Dirichlet Processes with Random Effects
Author: Seyoung Kim, Padhraic Smyth
Abstract: Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. This group-level variability in the component parameters is handled using a random effects model. We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI). 1
2 0.53778672 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
Author: Kenichi Kurihara, Max Welling, Nikos A. Vlassis
Abstract: Dirichlet Process (DP) mixture models are promising candidates for clustering applications where the number of clusters is unknown a priori. Due to computational considerations these models are unfortunately unsuitable for large scale data-mining applications. We propose a class of deterministic accelerated DP mixture models that can routinely handle millions of data-cases. The speedup is achieved by incorporating kd-trees into a variational Bayesian algorithm for DP mixtures in the stick-breaking representation, similar to that of Blei and Jordan (2005). Our algorithm differs in the use of kd-trees and in the way we handle truncation: we only assume that the variational distributions are fixed at their priors after a certain level. Experiments show that speedups relative to the standard variational algorithm can be significant. 1
3 0.52043992 113 nips-2006-Learning Structural Equation Models for fMRI
Author: Enrico Simonotto, Heather Whalley, Stephen Lawrie, Lawrence Murray, David Mcgonigle, Amos J. Storkey
Abstract: Structural equation models can be seen as an extension of Gaussian belief networks to cyclic graphs, and we show they can be understood generatively as the model for the joint distribution of long term average equilibrium activity of Gaussian dynamic belief networks. Most use of structural equation models in fMRI involves postulating a particular structure and comparing learnt parameters across different groups. In this paper it is argued that there are situations where priors about structure are not firm or exhaustive, and given sufficient data, it is worth investigating learning network structure as part of the approach to connectivity analysis. First we demonstrate structure learning on a toy problem. We then show that for particular fMRI data the simple models usually assumed are not supported. We show that is is possible to learn sensible structural equation models that can provide modelling benefits, but that are not necessarily going to be the same as a true causal model, and suggest the combination of prior models and learning or the use of temporal information from dynamic models may provide more benefits than learning structural equations alone. 1
4 0.51137507 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
Author: Kyung-ah Sohn, Eric P. Xing
Abstract: We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a 1st-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested P´ lya urn scheme. On both simulated o and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; and in addition it also infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations. 1
5 0.48435181 114 nips-2006-Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models
Author: Alexander T. Ihler, Padhraic Smyth
Abstract: Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas as humancomputer interaction, video surveillance, and Web data analysis. We propose a non-parametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what “factors” are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real–world data sets of count data involving both vehicles and people are used to illustrate the technique. 1
6 0.46775645 131 nips-2006-Mixture Regression for Covariate Shift
7 0.44522876 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
8 0.38417587 175 nips-2006-Simplifying Mixture Models through Function Approximation
9 0.35309839 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints
10 0.32604229 158 nips-2006-PG-means: learning the number of clusters in data
11 0.31391993 182 nips-2006-Statistical Modeling of Images with Fields of Gaussian Scale Mixtures
12 0.31065002 167 nips-2006-Recursive ICA
13 0.29697907 12 nips-2006-A Probabilistic Algorithm Integrating Source Localization and Noise Suppression of MEG and EEG data
14 0.28528851 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
15 0.28315178 1 nips-2006-A Bayesian Approach to Diffusion Models of Decision-Making and Response Time
16 0.27949232 86 nips-2006-Graph-Based Visual Saliency
17 0.27214471 192 nips-2006-Theory and Dynamics of Perceptual Bistability
18 0.27126303 41 nips-2006-Bayesian Ensemble Learning
19 0.25662646 194 nips-2006-Towards a general independent subspace analysis
20 0.25540787 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
topicId topicWeight
[(1, 0.118), (3, 0.02), (7, 0.052), (9, 0.036), (20, 0.042), (22, 0.061), (34, 0.031), (37, 0.306), (44, 0.043), (55, 0.013), (57, 0.081), (65, 0.031), (69, 0.04), (71, 0.015), (90, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.77984768 91 nips-2006-Hierarchical Dirichlet Processes with Random Effects
Author: Seyoung Kim, Padhraic Smyth
Abstract: Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. This group-level variability in the component parameters is handled using a random effects model. We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI). 1
2 0.68448603 163 nips-2006-Prediction on a Graph with a Perceptron
Author: Mark Herbster, Massimiliano Pontil
Abstract: We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph – the graph diameter and the cut size of a partitioning of the graph – which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated. 1
3 0.50327945 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
Author: Rey Ramírez, Jason Palmer, Scott Makeig, Bhaskar D. Rao, David P. Wipf
Abstract: The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical properties of this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms. 1
4 0.50326288 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature of Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
5 0.49319214 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
Author: Wolf Kienzle, Felix A. Wichmann, Matthias O. Franz, Bernhard Schölkopf
Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. 1
6 0.49273935 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
7 0.49259344 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
8 0.49241161 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
9 0.49132067 138 nips-2006-Multi-Task Feature Learning
10 0.4907594 65 nips-2006-Denoising and Dimension Reduction in Feature Space
11 0.48926243 130 nips-2006-Max-margin classification of incomplete data
12 0.48903465 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
13 0.48795721 179 nips-2006-Sparse Representation for Signal Classification
14 0.48775941 167 nips-2006-Recursive ICA
15 0.4875963 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
16 0.48483384 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
17 0.48478457 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
18 0.48474327 158 nips-2006-PG-means: learning the number of clusters in data
19 0.48412845 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
20 0.48379731 175 nips-2006-Simplifying Mixture Models through Function Approximation