nips nips2012 nips2012-28 knowledge-graph by maker-knowledge-mining

28 nips-2012-A systematic approach to extracting semantic information from functional MRI data


Source: pdf

Author: Francisco Pereira, Matthew Botvinick

Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. [sent-4, score-0.131]

2 The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. [sent-5, score-0.763]

3 1 Introduction Functional Magnetic Resonance Imaging (fMRI) is a technique used in psychological experiments to measure the blood oxygenation level throughout the brain, which is a proxy for neural activity; this measurement is called brain activation. [sent-7, score-0.217]

4 In a typical experiment, brain activation is measured during a task of interest, e. [sent-9, score-0.382]

5 reading nonsense words, with the goal of identifying brain locations where the two differ. [sent-13, score-0.272]

6 The most common analysis technique for doing this – statistical parametric mapping [4] – tests each voxel individually by regressing its time series on a predicted time series determined by the task contrast of interest. [sent-14, score-0.334]
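
As a rough illustration of this voxelwise testing step – a minimal sketch, not the authors' code, assuming a single predicted task regressor and hypothetical array names – the following Python snippet regresses each voxel's time series on the regressor and converts the fit into a per-voxel t-statistic and p-value:

import numpy as np
from scipy import stats

def voxelwise_glm(bold, regressor):
    """bold: (n_timepoints, n_voxels); regressor: (n_timepoints,) predicted response."""
    T = bold.shape[0]
    X = np.column_stack([regressor, np.ones(T)])          # design: task regressor + intercept
    beta, _, _, _ = np.linalg.lstsq(X, bold, rcond=None)  # (2, n_voxels) coefficients
    resid = bold - X @ beta
    dof = T - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof               # residual variance per voxel
    c = np.array([1.0, 0.0])                              # contrast: the task coefficient
    var_beta = sigma2 * (c @ np.linalg.inv(X.T @ X) @ c)
    t = beta[0] / np.sqrt(var_beta)
    p = 2 * stats.t.sf(np.abs(t), dof)                    # two-sided p-value per voxel
    return t, p

Thresholding the resulting map of p-values (or t-values) is what yields the voxel clusters described in the following sentence.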

7 This fit is scored and thresholded at a given statistical significance level to yield a brain image with clusters of voxels that respond very differently to the two tasks (colloquially, these are the images that show parts of the brain that “light up”). [sent-15, score-0.993]

8 The output of this process for a given experiment is a set of 3D coordinates of all the voxel clusters that appear reliably across all the subjects in a study. [sent-17, score-0.486]

9 This result is easy to interpret, since there is a lot of information about what processes each brain area may be involved in. [sent-18, score-0.217]

10 In recent years, there has been increasing awareness of the fact that there is information in the entire pattern of brain activation and not just in saliently active locations. [sent-20, score-0.478]

11 Classifiers have been the tool of choice for capturing this information, and have been used to make predictions ranging from what stimulus a subject is seeing, to what kind of object they are thinking about, to what decision they will make [12] [14] [8]. [sent-21, score-0.103]

12 The most common situation is to have each example correspond to the average brain image during one or a few performances of the task of interest, with voxels as the features; we will discuss various issues with this scenario in mind. [sent-22, score-0.68]

13 If only two conditions are being contrasted this is relatively straightforward as information is, at its simplest, a difference in activation of a voxel in the two conditions. [sent-24, score-0.519]

14 Often, this means the best accuracy is obtained using few voxels, from all across the brain, and that different voxels will be chosen in different cross-validation folds; this presents a problem for interpretability of the locations in question. [sent-26, score-0.529]

15 One approach to this problem is to try and regularize classifiers so that they include as many informative voxels as possible [2], thus identifying localizable clusters of voxels that may overlap across folds. [sent-27, score-0.956]

16 A different approach is to cross-validate classifiers over small sections of the grid covering the brain, known as searchlights [10]. [sent-28, score-0.309]

17 This can be used to produce a map of the cross-validated accuracy in the searchlight around each voxel, taking advantage of the pattern of activation across all the voxels contained in it. [sent-29, score-1.376]

18 Such a map can then be thresholded to leave only locations where accuracy is significantly above chance. [sent-30, score-0.102]

19 Knowing the location of a voxel does not suffice to interpret what it is doing, as it could be very different from stimulus to stimulus (rather than just active or not, as in the two condition situation). [sent-32, score-0.388]

20 This method is partially based on the notion of pattern feature introduced in an earlier paper by us [15], but has been developed much further so as to dispense with most parameters and allow the creation of spatial maps usable for group or exploratory analyses, as will be discussed later. [sent-36, score-0.157]

21 Data and Methods – Data: The grid covering the brain contains on the order of tens of thousands of voxels, measured over time every 1-2 seconds as tasks are performed, yielding hundreds to thousands of 3D images per experiment. [sent-38, score-0.441]

22 During an experiment a given task is performed a certain number of times – trials – and often the images collected during one trial are collapsed or averaged together, giving us one 3D image that can be clearly labeled with what happened in that trial, e. [sent-39, score-0.087]

23 Although the grid covers the entire head, only a fraction of its voxels contain cortex in a typical subject; hence we only consider these voxels as features. [sent-42, score-0.882]

24 (Footnote 1) Interpretation is more complicated if nonlinear classifiers are being used [6], [17], but this is far less common. (Footnote 2) A searchlight is a small section of the 3D grid, in our case a 3 × 3 × 3 = 27 voxel cube. [sent-43, score-0.917]

25 Analyses using searchlights generally entail computing a statistic [10] or cross-validating a classifier over the dataset containing just those voxels [16], and do so for the searchlight around each voxel in the brain, covering it in its entirety. [sent-44, score-1.635]

26 The intuition for this is that individual voxels are very noisy features, and an effect observed across a group of voxels is more trustworthy. [sent-45, score-0.892]

27 In the experiment performed to obtain our dataset 2 [13], subjects observed a word and a line drawing of an item, displayed on a screen for 3 seconds and followed by 8 seconds of a blank screen. [sent-46, score-0.128]

28 The items named/depicted belonged to one of 12 categories: animals, body parts, buildings, building parts, clothing, furniture, insects, kitchen, man-made objects, tools, vegetables and vehicles. [sent-47, score-0.134]

29 There were 5 different exemplars of each of the 12 categories and 6 experimental epochs. [sent-49, score-0.156]

30 During an experiment the task was repeated a total of 360 times, and a 3D image of the fMRI-measured brain activation was acquired every second. [sent-51, score-0.444]

31 Each example for classification purposes is the average image during a 4 second span while the subject was thinking about the item shown a few seconds earlier (a period which contains the peak of the signal during the trial); the dataset thus contains 360 examples, as many as there were trials. [sent-52, score-0.133]

32 The voxel size was 3 × 3 × 5 mm, with the number of voxels being between 20000 and 21000 depending on which of the 9 subjects was considered. [sent-53, score-0.811]

33 The features in each example are voxels, and the example labels are the category of the item being shown in the trial each example came from. [sent-54, score-0.165]

34 [Figure 1 panel annotations, garbled in extraction: for each pairwise classification task, cross-validate a classifier in every searchlight – a 3x3x3 voxel cube, one centered on each voxel in cortex, overlapping – and test the result at each searchlight, yielding a binary significance image; this is done for all 66 pairwise classification tasks; the binary vector of significance for each searchlight is rearranged into a binary confusion matrix over the 12 categories (animals, insects, tools, buildings, clothing, body parts, furniture, ..., vehicles); adjacent searchlights supporting similar pairwise distinctions are clustered together using modularity.]

37 Figure 1: Construction of data-driven searchlights. [sent-85, score-0.583]

38 Method: The goal of the experiment our dataset comes from is to understand how a certain semantic category is represented throughout the brain (e. [sent-87, score-0.299]

39 Intuitively, there is information in a given location if at least two categories can be distinguished looking at their respective patterns of activation there; otherwise, the pattern of activation is noise or common to all categories. [sent-91, score-0.664]

40 the construction of data-driven searchlights, parcels of the 3D grid where the same discriminations between pairs of categories can be made (these are generally larger than the 3 × 3 × 3 basic searchlight) 2. [sent-94, score-0.198]

41 the synthesis of pattern features from each data-driven searchlight, corresponding to the presence or absence of a certain pattern of activation across it 3. [sent-95, score-0.432]

42 the training and use of a classifier based on pattern features, and the generation of an anatomical map of the impact of each voxel on classification; these are described in detail in each of the following sections. [sent-96, score-0.641]

43 Construction of data-driven searchlights – Create pairwise searchlight maps: In order to identify informative locations we start by considering whether a given pair of categories can be distinguished in each of the thousands of 3 × 3 × 3 searchlights covering the brain: 1. [sent-99, score-1.447]

44 For each searchlight cross-validate a classifier using the voxels belonging to it, obtaining an accuracy value which will be assigned to the voxel at the center of the searchlight, as shown in part 1 of Figure 1. [sent-100, score-1.409]

45 The classifier used in this case was Linear Discriminant Analysis (LDA, [7]), with a shrinkage estimator for the covariance matrix [18], as this was shown to be effective at both modeling the joint activation of voxels in a searchlight and classification [16]. [sent-101, score-1.172]
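
A minimal sketch of this per-searchlight cross-validation step, assuming hypothetical inputs (a data matrix already restricted to the two categories being contrasted, and a precomputed list of 3 × 3 × 3 neighbourhood index sets, one per cortical voxel); scikit-learn's LDA with the "lsqr" solver and automatic shrinkage stands in for the shrinkage-regularized LDA described here:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def searchlight_accuracy_map(X, y, neighborhoods, n_folds=6):
    """X: (n_examples, n_voxels) for one category pair; y: the pair labels;
    neighborhoods: list of index arrays, one 3x3x3 searchlight per cortical voxel."""
    acc = np.zeros(len(neighborhoods))
    for center, idx in enumerate(neighborhoods):
        clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
        scores = cross_val_score(clf, X[:, idx], y, cv=n_folds)
        acc[center] = scores.mean()   # accuracy assigned to the voxel at the searchlight center
    return acc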

46 Transform the resulting brain image of per-voxel accuracies into a p-value brain image (the probability of obtaining accuracy as high or higher under the null hypothesis that the classes are not distinguishable, see [11]), as shown in part 1 of Figure 1. [sent-103, score-0.92]
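
One common way to obtain such a p-value is a binomial test against the chance level (0.5 for a two-category problem); the exact procedure of [11] may differ, so this is only an assumed stand-in:

from scipy import stats

def accuracy_to_pvalue(accuracy, n_test, chance=0.5):
    """P(at least this many correct) under a binomial null at the chance level."""
    k = round(accuracy * n_test)                  # number of correctly classified test examples
    return stats.binom.sf(k - 1, n_test, chance)  # survival function includes k itself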

47 Threshold the p-value brain image using False Discovery Rate [5] (q = 0. [sent-105, score-0.256]

48 01) to correct for multiple comparisons and get a binary brain image with candidate locations where this pair of categories can be distinguished, as shown in part 2 of Figure 1. [sent-106, score-0.471]
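
A sketch of the thresholding step, assuming the standard Benjamini-Hochberg procedure for False Discovery Rate control (the paper cites [5]):

import numpy as np

def fdr_threshold(pvals, q=0.01):
    """Benjamini-Hochberg: return a boolean significance image over voxels."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = p.size
    ranks = np.arange(1, m + 1)
    passed = p[order] <= q * ranks / m            # sorted p-values under the BH line
    sig = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)) + 1        # largest rank still under the line
        sig[order[:k]] = True                     # reject the k smallest p-values
    return sig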

49 The outcome for each pair of categories is a binary significance image, where a voxel is 1 if the categories can be distinguished in the searchlight surrounding it or 0 if not; this is shown for all pairs of categories in part 3 of Figure 1. [sent-107, score-1.496]

50 This can also be viewed per-searchlight, yielding a binary vector encoding which category pairs can be distinguished and which can be rearranged into a binary matrix, as shown in part 4 of Figure 1. [sent-108, score-0.3]
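
The rearrangement of the 66-entry pairwise vector into a symmetric 12 × 12 binary matrix can be sketched as follows (the ordering of category pairs is an assumption made for illustration):

import numpy as np
from itertools import combinations

def vector_to_pair_matrix(info_vector, n_categories=12):
    """Rearrange a 66-entry binary vector (one entry per unordered category pair)
    into a symmetric 12 x 12 binary matrix."""
    M = np.zeros((n_categories, n_categories), dtype=int)
    for entry, (i, j) in zip(info_vector, combinations(range(n_categories), 2)):
        M[i, j] = M[j, i] = entry
    return M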

51 That said, if the same categories are distinguishable in two adjacent searchlights – which overlap – then it is reasonable to assume that all their voxels put together would still be able to make the same distinctions. [sent-110, score-0.849]

52 At the same time we would like to constrain data-driven searchlights to the boundaries of known, large, anatomically determined regions of interest (ROI), both for computational efficiency and for interpretability, as will be described later. [sent-112, score-0.237]

53 At the start of the aggregation process, each searchlight is by itself and has an associated binary information vector with 66 entries corresponding to which pairs of classes can be distinguished in its surrounding searchlight (part 3 of Figure 1). [sent-113, score-1.322]

54 For each searchlight we compute the similarity of its information vector with those of all its neighbours, which yields a 3D grid similarity graph. [sent-114, score-0.617]

55 We then take the portion of the graph corresponding to each ROI in the AAL brain atlas [19], and use modularity [1] to divide it into a number of clusters of adjacent searchlights supporting similar distinctions, as shown in panel 5 of Figure 1. [sent-115, score-0.534]
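
A sketch of this aggregation step, using networkx's greedy modularity communities as a stand-in for the modularity algorithm cited as [1]; the similarity measure (fraction of matching entries between two 66-entry information vectors) and all input names are assumptions:

import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_data_driven_searchlights(info_vectors, neighbor_pairs, roi_labels):
    """info_vectors: (n_voxels, 66) binary matrix; neighbor_pairs: iterable of (i, j)
    pairs of adjacent cortical voxels; roi_labels: (n_voxels,) AAL ROI id per voxel."""
    clusters = []
    for roi in np.unique(roi_labels):
        G = nx.Graph()
        G.add_nodes_from(int(v) for v in np.flatnonzero(roi_labels == roi))
        for i, j in neighbor_pairs:
            if roi_labels[i] == roi and roi_labels[j] == roi:
                # similarity of the two information vectors becomes the edge weight
                sim = float(np.mean(info_vectors[i] == info_vectors[j]))
                G.add_edge(i, j, weight=sim)
        clusters.extend(greedy_modularity_communities(G, weight="weight"))
    return clusters   # each community is one data-driven searchlight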

56 After this is done for all ROIs we obtain a partition of the brain into a few hundred clusters, the data-driven searchlights. [sent-116, score-0.217]

57 Figure 2 depicts the granularity of a typical clustering across multiple brain slices of one of the participants. [sent-117, score-0.327]

58 The centroid for each cluster encodes the pairs of categories that can be distinguished in that datadriven searchlight. [sent-122, score-0.308]

59 The centroid is obtained by combining the binary information vectors for each of the searchlights in it using a soft-AND function, and is itself a binary information vector. [sent-123, score-0.302]

60 A given entry is 1 – the respective pair of categories is distinguishable – if it is 1 in at least q% of the cluster members (where q is the false discovery rate used earlier to threshold the binary image for that pair of categories). [sent-124, score-0.282]
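
The soft-AND rule just described, as a short sketch:

import numpy as np

def soft_and_centroid(member_vectors, q=0.01):
    """member_vectors: (n_searchlights_in_cluster, 66) binary matrix.
    A centroid entry is 1 if at least a fraction q of the cluster members have a 1 there."""
    fraction_on = member_vectors.mean(axis=0)
    return (fraction_on >= q).astype(int)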

61 Generation of pattern features from each data-driven searchlight. [Figure 3 panel annotations, garbled in extraction: the training data (examples × voxels) is processed cluster by cluster; pattern detectors are learned for the distinguishable category pairs in each cluster (e.g. animals vs insects, animals vs tools, body parts vs buildings, vegetables vs vehicles); applying the detectors yields pattern features, which are then reduced via SVD to singular-vector features.]

64 Figure 3: Construction of pattern detectors and pattern features from data-driven searchlights. [sent-139, score-0.706]

65 Construct two-way classifiers from each data-driven searchlight: Each data-driven searchlight has a set of pairs of categories that can be distinguished in it. [sent-140, score-1.414]

66 This indicates that there are particular patterns of activation across the voxels in it which are characteristic of one or more categories, and absent in others. [sent-141, score-0.654]

67 We can leverage this to convert the pattern of activation across the brain into a series of sub-patterns, one from each data-driven searchlight. [sent-142, score-0.522]

68 Use two-way classifiers to generate pattern features: The set of pattern detectors learned from each data-driven searchlight can be applied to any example, not just the ones from the categories that were used to learn them. [sent-144, score-0.843]

69 For each data-driven searchlight, we apply all of its detectors to all the examples in the training set, over the voxels belonging to the searchlight, as illustrated in part 2 of Figure 3. [sent-146, score-0.524]
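
A sketch of this step, assuming the pattern detectors are two-way LDA classifiers (as in the searchlight stage) and that their real-valued decision values are used as the pattern-feature outputs; both of these choices, and all names, are illustrative rather than a description of the authors' implementation:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pattern_features_for_searchlight(X_train, y_train, voxel_idx, distinguishable_pairs):
    """Train one two-way detector per distinguishable category pair, then apply every
    detector to all training examples over this data-driven searchlight's voxels."""
    Xs = X_train[:, voxel_idx]
    features = []
    for cat_a, cat_b in distinguishable_pairs:
        mask = np.isin(y_train, [cat_a, cat_b])
        detector = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
        detector.fit(Xs[mask], y_train[mask])
        # the detector is applied to *all* examples, not just the two categories it was trained on
        features.append(detector.decision_function(Xs))
    return np.column_stack(features)   # (n_examples, n_detectors) synthetic pattern features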

70 The output of each detector across all examples becomes a new, synthetic pattern feature. [sent-147, score-0.166]

71 The number of these pattern features varies per searchlight, as does the number of searchlights per subject, but at the end we will typically have between 10K and 20K of them. [sent-148, score-0.364]

72 ones that captured a pattern present in all animate object categories versus one present in all inanimate object ones); these will be highly correlated and redundant. [sent-151, score-0.229]

73 We address this by using Singular Value Decomposition (SVD, [7]) to reduce the dimensionality of the matrix of pattern features to the same as the number of examples (180), keeping all singular vectors; this is shown in part 3 of Figure 3. [sent-152, score-0.206]
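
A sketch of the SVD reduction, which keeps all singular vectors so the reduced representation has at most as many dimensions as there are training examples:

import numpy as np

def svd_reduce(F_train, F_test):
    """F_train: (n_train, n_pattern_features) matrix of pattern features."""
    U, s, Vt = np.linalg.svd(F_train, full_matrices=False)   # keep all singular vectors
    Z_train = U * s                # equivalent to F_train @ Vt.T
    Z_test = F_test @ Vt.T         # project test examples onto the same basis
    return Z_train, Z_test, Vt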

74 Given the low-dimensional pattern feature dataset, we train a one-versus-rest classifier (a linear SVM with λ = 1, [3]) for each category; these are then applied to each example in the test set, with the label prediction corresponding to the class with the highest class probability. [sent-157, score-0.096]
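
A sketch of the classification stage; mapping the paper's λ = 1 onto scikit-learn's C = 1.0 is an assumed correspondence, and Platt-scaled probabilities stand in for the class probabilities mentioned here:

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def train_and_predict(Z_train, y_train, Z_test):
    """One-versus-rest linear SVMs on the SVD-reduced pattern features; the prediction
    is the class whose classifier assigns the highest probability."""
    clf = OneVsRestClassifier(SVC(kernel="linear", C=1.0, probability=True))
    clf.fit(Z_train, y_train)
    proba = clf.predict_proba(Z_test)                  # (n_test, n_classes)
    return clf.classes_[np.argmax(proba, axis=1)]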

75 The classifiers can also be used to determine the extent to which each data-driven searchlight was responsible for correctly predicting each class. [sent-158, score-0.583]

76 A one-versus-rest category classifier consists of a vector of 180 weights, which can be converted into an equivalent classifier over pattern features by inverting the SVD, as shown in part 1 of Figure 4. [sent-159, score-0.21]

77 The impact of each pattern feature in correctly predicting this category can be calculated by multiplying each weight by the values taken by the corresponding pattern feature over examples in the category, and averaging across all examples; this is shown in part 2 of Figure 4. [sent-160, score-0.474]

78 These pattern-feature impact values can then be aggregated by the data-driven searchlight they came from, yielding a net impact value for that searchlight. [sent-161, score-0.894]

79 This is the value that is propagated to each voxel in the data-driven searchlight (part 3 of Figure 4) in order to generate an impact map. [sent-162, score-1.046]
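
A sketch of the impact-map computation described over the last few sentences, with hypothetical input names; aggregating the pattern-feature impacts within each data-driven searchlight by summation is an assumption:

import numpy as np

def impact_map(w_svd, Vt, F_category, feature_searchlight, searchlight_voxels, n_voxels):
    """w_svd: (n_svd_dims,) weights of one one-vs-rest classifier in SVD space;
    Vt: (n_svd_dims, n_pattern_features) right singular vectors from the SVD;
    F_category: (n_category_examples, n_pattern_features) pattern features of the
    examples of the category of interest;
    feature_searchlight: (n_pattern_features,) id of the data-driven searchlight each
    pattern feature came from; searchlight_voxels: dict id -> voxel indices."""
    w_feat = w_svd @ Vt                                   # invert the SVD: weights over pattern features
    impact_per_feature = (w_feat * F_category).mean(axis=0)
    voxel_impact = np.zeros(n_voxels)
    for sl, vox in searchlight_voxels.items():
        net = impact_per_feature[feature_searchlight == sl].sum()   # net impact of this searchlight
        voxel_impact[vox] = net                           # propagated to every voxel in it
    return voxel_impact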

80 Experiments and Discussion – Classification: Our goal in this experiment is to determine whether transforming the data from voxel features to pattern features preserves information, and how competitive the results are with a classifier combined with voxel selection. [sent-164, score-0.849]

81 If cross-validation inside a split-half training set is required, we use leave-one-epoch-out cross-validation. Baseline: We contrasted experimental results obtained with our method with a baseline of classification using voxel selection. [sent-166, score-0.354]

82 The scoring criterion used to rank each voxel was the accuracy of an LDA classifier – the same as described above – using the 3 × 3 × 3 searchlight around each voxel to do 12-category classification. [sent-167, score-1.295]

83 The number of voxels to use was selected by nested cross-validation inside the training set (see footnote 3). [sent-168, score-0.424]
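
A sketch of this baseline, with the candidate voxel counts taken from footnote 3; the simple 2-fold inner loop here is only an illustrative stand-in for the leave-one-epoch-out scheme described above:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def voxel_selection_baseline(X_train, y_train, X_test, voxel_scores,
                             candidates=(50, 100, 200, 400, 800, 1200, 1600,
                                         2000, 4000, 8000, 16000, None)):
    """voxel_scores: per-voxel 12-way searchlight LDA accuracy computed on the training
    set; a candidate of None means 'use all voxels'."""
    ranking = np.argsort(voxel_scores)[::-1]              # best-scoring voxels first
    best_k, best_acc = None, -np.inf
    for k in candidates:
        idx = ranking if k is None else ranking[:k]
        acc = cross_val_score(LinearSVC(C=1.0), X_train[:, idx], y_train, cv=2).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    idx = ranking if best_k is None else ranking[:best_k]
    clf = LinearSVC(C=1.0).fit(X_train[:, idx], y_train)
    return clf.predict(X_test[:, idx])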

84 The classifier used was a linear SVM (λ = 1, [3]), same as the whole brain classifier in our method. [sent-169, score-0.217]

85 Results: The results are shown in the first line of Table 1; across subjects, our method is better than voxel selection, with the p-value of a sign-test of this being < 0. [sent-170, score-0.378]

86 It is substantially better than a classifier using all the voxels in the brain directly. [sent-172, score-0.641]

87 The first is that some classes give rise to very similar patterns of activation (e. [sent-176, score-0.186]

88 The second factor is that subjects vary in their ability to stay focused on the task and avoid stray thoughts or remembering other parts of the experiment, hence examples may not belong to the class corresponding to the label or even any class at all. [sent-179, score-0.122]

89 Impact maps – Figure 5 (panels: tool, building): Average example for categories “tool” and “building” in participant P1 (slices ordered from inferior to superior; red is activation above the image mean, blue below). [sent-210, score-0.497]

90 3, an impact map can be produced for each category, showing the extent to which each data-driven searchlight helped classify that category correctly. [sent-213, score-0.791]

91 Figure 5 shows the average example for the two categories; note how similar the two examples are across the slices, indicating that most activation is shared between the two categories. [sent-215, score-0.235]

92 The impact maps for the same participant in Figure 6 show that much of the common activation is eliminated, and that the areas known to be informative are assigned high impact in their respective maps. (Footnote 3: possible choices were 50, 100, 200, 400, 800, 1200, 1600, 2000, 4000, 8000, 16000, or all voxels.) [sent-216, score-0.546]

93 Figure 6 (panels: tool, building): Impact map for categories “tool” and “building” in participant P1. [sent-217, score-0.279]

94 Figure 7 (panels: tool, building): Average impact map for categories “tool” and “building” across the nine participants. [sent-218, score-0.395]

95 Impact is positive, regardless of whether activation in each voxel involved is above or below the mean of the image; the activation of each voxel influences the classifier only in the context of its neighbours in each data-driven searchlight. [sent-220, score-1.022]

96 Finally, consider that impact maps can be averaged across subjects, as shown in Figure 7, or undergo t-tests or a more complex second-level group analysis. [sent-222, score-0.207]

97 Thresholding of statistical maps in functional neuroimaging using the false discovery rate. [sent-241, score-0.111]

98 A neurosemantic theory of concrete noun representation based on the underlying brain codes. [sent-255, score-0.217]

99 Predicting human brain activity associated with the meanings of nouns. [sent-290, score-0.239]

100 Classification of functional magnetic resonance imaging data using informative pattern features. [sent-304, score-0.226]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('searchlight', 0.583), ('voxels', 0.424), ('voxel', 0.334), ('searchlights', 0.237), ('brain', 0.217), ('activation', 0.165), ('categories', 0.133), ('impact', 0.129), ('insects', 0.103), ('buildings', 0.103), ('pattern', 0.096), ('animals', 0.085), ('er', 0.084), ('distinguished', 0.084), ('classi', 0.079), ('vs', 0.072), ('slices', 0.066), ('vegetables', 0.063), ('tools', 0.062), ('vehicles', 0.06), ('category', 0.059), ('participant', 0.057), ('neuroimage', 0.056), ('subjects', 0.053), ('clothing', 0.047), ('furniture', 0.047), ('distinctions', 0.046), ('ers', 0.045), ('across', 0.044), ('parts', 0.043), ('vj', 0.042), ('svd', 0.041), ('cluster', 0.039), ('body', 0.039), ('image', 0.039), ('covering', 0.038), ('tool', 0.037), ('functional', 0.037), ('mitchell', 0.036), ('locations', 0.036), ('grid', 0.034), ('maps', 0.034), ('thousands', 0.033), ('tens', 0.033), ('building', 0.032), ('clusters', 0.032), ('informative', 0.032), ('resonance', 0.032), ('fmri', 0.032), ('pereira', 0.032), ('yielding', 0.032), ('pairs', 0.031), ('vi', 0.031), ('features', 0.031), ('distinguishable', 0.031), ('anatomical', 0.031), ('detectors', 0.031), ('item', 0.029), ('magnetic', 0.029), ('singular', 0.029), ('marcel', 0.028), ('stimulus', 0.027), ('exploratory', 0.027), ('examples', 0.026), ('seconds', 0.026), ('mismatches', 0.026), ('rearranged', 0.026), ('roi', 0.026), ('cance', 0.025), ('trial', 0.025), ('princeton', 0.025), ('accuracy', 0.025), ('modularity', 0.024), ('neighbours', 0.024), ('xor', 0.024), ('adjacent', 0.024), ('part', 0.024), ('experiment', 0.023), ('exemplars', 0.023), ('neuroimaging', 0.022), ('activity', 0.022), ('binary', 0.022), ('thresholded', 0.021), ('came', 0.021), ('centroid', 0.021), ('invert', 0.021), ('patterns', 0.021), ('hundreds', 0.021), ('mri', 0.02), ('subject', 0.02), ('contrasted', 0.02), ('analyses', 0.02), ('map', 0.02), ('belonging', 0.019), ('around', 0.019), ('surrounding', 0.019), ('reading', 0.019), ('thinking', 0.019), ('discovery', 0.018), ('locate', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

Author: Francisco Pereira, Matthew Botvinick

Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure. 1

2 0.12800375 167 nips-2012-Kernel Hyperalignment

Author: Alexander Lorbert, Peter J. Ramadge

Abstract: We offer a regularized, kernel extension of the multi-set, orthogonal Procrustes problem, or hyperalignment. Our new method, called Kernel Hyperalignment, expands the scope of hyperalignment to include nonlinear measures of similarity and enables the alignment of multiple datasets with a large number of base features. With direct application to fMRI data analysis, kernel hyperalignment is well-suited for multi-subject alignment of large ROIs, including the entire cortex. We report experiments using real-world, multi-subject fMRI data. 1

3 0.12103223 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks

Author: Firdaus Janoos, Weichang Li, Niranjan Subrahmanya, Istvan Morocz, William Wells

Abstract: Identifying patterns from the neuroimaging recordings of brain activity related to the unobservable psychological or mental state of an individual can be treated as an unsupervised pattern recognition problem. The main challenges, however, for such an analysis of fMRI data are: a) defining a physiologically meaningful feature-space for representing the spatial patterns across time; b) dealing with the high-dimensionality of the data; and c) robustness to the various artifacts and confounds in the fMRI time-series. In this paper, we present a network-aware feature-space to represent the states of a general network, that enables comparing and clustering such states in a manner that is a) meaningful in terms of the network connectivity structure; b) computationally efficient; c) low-dimensional; and d) relatively robust to structured and random noise artifacts. This feature-space is obtained from a spherical relaxation of the transportation distance metric which measures the cost of transporting “mass” over the network to transform one function into another. Through theoretical and empirical assessments, we demonstrate the accuracy and efficiency of the approximation, especially for large problems. 1

4 0.085656866 200 nips-2012-Local Supervised Learning through Space Partitioning

Author: Joseph Wang, Venkatesh Saligrama

Abstract: We develop a novel approach for supervised learning based on adaptively partitioning the feature space into different regions and learning local region-specific classifiers. We formulate an empirical risk minimization problem that incorporates both partitioning and classification into a single global objective. We show that space partitioning can be equivalently reformulated as a supervised learning problem and consequently any discriminative learning method can be utilized in conjunction with our approach. Nevertheless, we consider locally linear schemes by learning linear partitions and linear region classifiers. Locally linear schemes can not only approximate complex decision boundaries and ensure low training error but also provide tight control on over-fitting and generalization error. We train locally linear classifiers by using LDA, logistic regression and perceptrons, and so our scheme is scalable to large data sizes and high-dimensions. We present experimental results demonstrating improved performance over state of the art classification techniques on benchmark datasets. We also show improved robustness to label noise.

5 0.060272109 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

Author: Chris Hinrichs, Vikas Singh, Jiming Peng, Sterling Johnson

Abstract: Multiple Kernel Learning (MKL) generalizes SVMs to the setting where one simultaneously trains a linear classifier and chooses an optimal combination of given base kernels. Model complexity is typically controlled using various norm regularizations on the base kernel mixing coefficients. Existing methods neither regularize nor exploit potentially useful information pertaining to how kernels in the input set ‘interact’; that is, higher order kernel-pair relationships that can be easily obtained via unsupervised (similarity, geodesics), supervised (correlation in errors), or domain knowledge driven mechanisms (which features were used to construct the kernel?). We show that by substituting the norm penalty with an arbitrary quadratic function Q 0, one can impose a desired covariance structure on mixing weights, and use this as an inductive bias when learning the concept. This formulation significantly generalizes the widely used 1- and 2-norm MKL objectives. We explore the model’s utility via experiments on a challenging Neuroimaging problem, where the goal is to predict a subject’s conversion to Alzheimer’s Disease (AD) by exploiting aggregate information from many distinct imaging modalities. Here, our new model outperforms the state of the art (p-values 10−3 ). We briefly discuss ramifications in terms of learning bounds (Rademacher complexity). 1

6 0.058434103 185 nips-2012-Learning about Canonical Views from Internet Image Collections

7 0.05711551 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

8 0.053354345 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

9 0.051142182 344 nips-2012-Timely Object Recognition

10 0.049882758 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

11 0.047445778 197 nips-2012-Learning with Recursive Perceptual Representations

12 0.047220964 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

13 0.046927068 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

14 0.046521846 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study

15 0.04559882 126 nips-2012-FastEx: Hash Clustering with Exponential Families

16 0.044623572 62 nips-2012-Burn-in, bias, and the rationality of anchoring

17 0.044623572 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

18 0.044179391 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

19 0.043065213 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

20 0.042770423 198 nips-2012-Learning with Target Prior


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.117), (1, 0.031), (2, -0.088), (3, -0.016), (4, 0.037), (5, -0.019), (6, -0.0), (7, 0.038), (8, -0.021), (9, 0.005), (10, -0.001), (11, -0.023), (12, 0.03), (13, -0.013), (14, 0.046), (15, -0.014), (16, -0.045), (17, 0.06), (18, -0.019), (19, -0.024), (20, -0.011), (21, 0.019), (22, -0.084), (23, -0.048), (24, 0.037), (25, -0.085), (26, -0.04), (27, 0.042), (28, 0.011), (29, 0.022), (30, -0.027), (31, 0.062), (32, 0.018), (33, -0.015), (34, 0.064), (35, -0.04), (36, -0.016), (37, 0.003), (38, -0.097), (39, -0.033), (40, -0.012), (41, -0.082), (42, 0.032), (43, 0.033), (44, 0.026), (45, -0.056), (46, -0.009), (47, 0.052), (48, 0.028), (49, -0.06)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90076101 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

Author: Francisco Pereira, Matthew Botvinick

Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure. 1

2 0.74444419 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study

Author: Uri Maoz, Shengxuan Ye, Ian Ross, Adam Mamelak, Christof Koch

Abstract: The ability to predict action content from neural signals in real time before the action occurs has been long sought in the neuroscientific study of decision-making, agency and volition. On-line real-time (ORT) prediction is important for understanding the relation between neural correlates of decision-making and conscious, voluntary action as well as for brain-machine interfaces. Here, epilepsy patients, implanted with intracranial depth microelectrodes or subdural grid electrodes for clinical purposes, participated in a “matching-pennies” game against an opponent. In each trial, subjects were given a 5 s countdown, after which they had to raise their left or right hand immediately as the “go” signal appeared on a computer screen. They won a fixed amount of money if they raised a different hand than their opponent and lost that amount otherwise. The question we here studied was the extent to which neural precursors of the subjects’ decisions can be detected in intracranial local field potentials (LFP) prior to the onset of the action. We found that combined low-frequency (0.1–5 Hz) LFP signals from 10 electrodes were predictive of the intended left-/right-hand movements before the onset of the go signal. Our ORT system predicted which hand the patient would raise 0.5 s before the go signal with 68±3% accuracy in two patients. Based on these results, we constructed an ORT system that tracked up to 30 electrodes simultaneously, and tested it on retrospective data from 7 patients. On average, we could predict the correct hand choice in 83% of the trials, which rose to 92% if we let the system drop 3/10 of the trials on which it was less confident. Our system demonstrates— for the first time—the feasibility of accurately predicting a binary action on single trials in real time for patients with intracranial recordings, well before the action occurs. 1 1

3 0.68890178 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

Author: Joan Fruitet, Alexandra Carpentier, Maureen Clerc, Rémi Munos

Abstract: Brain-computer interfaces (BCI) allow users to “communicate” with a computer without using their muscles. BCI based on sensori-motor rhythms use imaginary motor tasks, such as moving the right or left hand, to send control signals. The performances of a BCI can vary greatly across users but also depend on the tasks used, making the problem of appropriate task selection an important issue. This study presents a new procedure to automatically select as fast as possible a discriminant motor task for a brain-controlled button. We develop for this purpose an adaptive algorithm, UCB-classif , based on the stochastic bandit theory. This shortens the training stage, thereby allowing the exploration of a greater variety of tasks. By not wasting time on inefficient tasks, and focusing on the most promising ones, this algorithm results in a faster task selection and a more efficient use of the BCI training session. Comparing the proposed method to the standard practice in task selection, for a fixed time budget, UCB-classif leads to an improved classification rate, and for a fixed classification rate, to a reduction of the time spent in training by 50%. 1

4 0.61740005 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

Author: Pieter-jan Kindermans, Hannes Verschore, David Verstraeten, Benjamin Schrauwen

Abstract: The usability of Brain Computer Interfaces (BCI) based on the P300 speller is severely hindered by the need for long training times and many repetitions of the same stimulus. In this contribution we introduce a set of unsupervised hierarchical probabilistic models that tackle both problems simultaneously by incorporating prior knowledge from two sources: information from other training subjects (through transfer learning) and information about the words being spelled (through language models). We show, that due to this prior knowledge, the performance of the unsupervised models parallels and in some cases even surpasses that of supervised models, while eliminating the tedious training session. 1

5 0.57735169 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

Author: Won H. Kim, Deepti Pachauri, Charles Hatt, Moo. K. Chung, Sterling Johnson, Vikas Singh

Abstract: Hypothesis testing on signals defined on surfaces (such as the cortical surface) is a fundamental component of a variety of studies in Neuroscience. The goal here is to identify regions that exhibit changes as a function of the clinical condition under study. As the clinical questions of interest move towards identifying very early signs of diseases, the corresponding statistical differences at the group level invariably become weaker and increasingly hard to identify. Indeed, after a multiple comparisons correction is adopted (to account for correlated statistical tests over all surface points), very few regions may survive. In contrast to hypothesis tests on point-wise measurements, in this paper, we make the case for performing statistical analysis on multi-scale shape descriptors that characterize the local topological context of the signal around each surface vertex. Our descriptors are based on recent results from harmonic analysis, that show how wavelet theory extends to non-Euclidean settings (i.e., irregular weighted graphs). We provide strong evidence that these descriptors successfully pick up group-wise differences, where traditional methods either fail or yield unsatisfactory results. Other than this primary application, we show how the framework allows performing cortical surface smoothing in the native space without mapping to a unit sphere. 1

6 0.56808096 167 nips-2012-Kernel Hyperalignment

7 0.5607776 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks

8 0.53387392 198 nips-2012-Learning with Target Prior

9 0.53124768 46 nips-2012-Assessing Blinding in Clinical Trials

10 0.52200586 256 nips-2012-On the connections between saliency and tracking

11 0.49039647 146 nips-2012-Graphical Gaussian Vector for Image Categorization

12 0.48622921 289 nips-2012-Recognizing Activities by Attribute Dynamics

13 0.47769925 200 nips-2012-Local Supervised Learning through Space Partitioning

14 0.45733765 303 nips-2012-Searching for objects driven by context

15 0.4461849 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

16 0.43660456 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

17 0.43596283 151 nips-2012-High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

18 0.43412921 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification

19 0.43350995 155 nips-2012-Human memory search as a random walk in a semantic network

20 0.43339652 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.034), (11, 0.01), (21, 0.021), (38, 0.073), (42, 0.022), (54, 0.017), (55, 0.035), (74, 0.036), (76, 0.573), (77, 0.014), (80, 0.049), (92, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99343228 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch

Abstract: We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1

2 0.9827475 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

Author: Michael Osborne, Roman Garnett, Zoubin Ghahramani, David K. Duvenaud, Stephen J. Roberts, Carl E. Rasmussen

Abstract: Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a model-based method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model’s hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy. 1

3 0.98180956 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

Author: Kevin Tang, Vignesh Ramanathan, Li Fei-fei, Daphne Koller

Abstract: Typical object detectors trained on images perform poorly on video, as there is a clear distinction in domain between the two types of data. In this paper, we tackle the problem of adapting object detectors learned from images to work well on videos. We treat the problem as one of unsupervised domain adaptation, in which we are given labeled data from the source domain (image), but only unlabeled data from the target domain (video). Our approach, self-paced domain adaptation, seeks to iteratively adapt the detector by re-training the detector with automatically discovered target domain examples, starting with the easiest first. At each iteration, the algorithm adapts by considering an increased number of target domain examples, and a decreased number of source domain examples. To discover target domain examples from the vast amount of video data, we introduce a simple, robust approach that scores trajectory tracks instead of bounding boxes. We also show how rich and expressive features specific to the target domain can be incorporated under the same framework. We show promising results on the 2011 TRECVID Multimedia Event Detection [1] and LabelMe Video [2] datasets that illustrate the benefit of our approach to adapt object detectors to video. 1

same-paper 4 0.98163003 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

Author: Francisco Pereira, Matthew Botvinick

Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure. 1

5 0.98049676 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David Parks, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1

6 0.97428858 205 nips-2012-MCMC for continuous-time discrete-state systems

7 0.9741469 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

8 0.94630629 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

9 0.941248 247 nips-2012-Nonparametric Reduced Rank Regression

10 0.94079894 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

11 0.88596159 338 nips-2012-The Perturbed Variation

12 0.87380713 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

13 0.8713395 142 nips-2012-Generalization Bounds for Domain Adaptation

14 0.86447299 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

15 0.86428291 327 nips-2012-Structured Learning of Gaussian Graphical Models

16 0.8633064 41 nips-2012-Ancestor Sampling for Particle Gibbs

17 0.86023271 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

18 0.85660201 95 nips-2012-Density-Difference Estimation

19 0.85557318 291 nips-2012-Reducing statistical time-series problems to binary classification

20 0.85457921 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points