nips nips2013 nips2013-306 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chris Hinrichs, Vamsi Ithapu, Qinyuan Sun, Sterling C. Johnson, Vikas Singh
Abstract: Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. In order to correct for this phenomenon, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well known Bonferroni correction method, while simple to implement, is quite conservative, and can substantially under-power a study because it ignores dependencies between test statistics. Permutation testing, on the other hand, is an exact, non-parametric method of estimating the FWER for a given α-threshold, but for acceptably low thresholds the computational burden can be prohibitive. In this paper, we show that permutation testing in fact amounts to populating the columns of a very large matrix P. By analyzing the spectrum of this matrix, under certain conditions, we see that P has a low-rank plus a low-variance residual decomposition which makes it suitable for highly sub–sampled — on the order of 0.5% — matrix completion methods. Based on this observation, we propose a novel permutation testing methodology which offers a large speedup, without sacrificing the fidelity of the estimated FWER. Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly 50× can be achieved while recovering the FWER distribution up to very high accuracy. Further, we show that the estimated α-threshold is also recovered faithfully, and is stable. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract (project page fragment: edu/˜vamsi/pt_fast). Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. [sent-14, score-0.376]
2 The well known Bonferroni correction method, while simple to implement, is quite conservative, and can substantially under-power a study because it ignores dependencies between test statistics. [sent-16, score-0.164]
3 In this paper, we show that permutation testing in fact amounts to populating the columns of a very large matrix P. [sent-18, score-0.6]
4 By analyzing the spectrum of this matrix, under certain conditions, we see that P has a low-rank plus a low-variance residual decomposition which makes it suitable for highly sub–sampled — on the order of 0.5% — matrix completion methods. [sent-19, score-0.266]
5 Based on this observation, we propose a novel permutation testing methodology which offers a large speedup, without sacrificing the fidelity of the estimated FWER. [sent-21, score-0.568]
6 Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly 50× can be achieved while recovering the FWER distribution up to very high accuracy. [sent-22, score-0.528]
7 1 Introduction Suppose we have completed a placebo-controlled clinical trial of a promising new drug for a neurodegenerative disorder such as Alzheimer’s disease (AD) on a small sized cohort. [sent-24, score-0.281]
8 The rationale here is that, even if the drug does induce variations in cognitive symptoms, the brain changes are observable much earlier in the imaging data. [sent-28, score-0.197]
9 On the imaging front, this analysis checks for statistically significant differences between brain images of subjects assigned to the two trial arms: treatment and placebo. [sent-29, score-0.254]
10 Alternatively, consider a second scenario where we have completed a neuroimaging research study of a particular controlled factor, such as genotype, and the interest is to evaluate group-wise differences in the brain images: to identify which regions are affected as a function of class membership. [sent-30, score-0.247]
11 As one might expect, given the number of hypothesis tests v, multiple testing issues in this setting are quite severe, making it difficult to assess the true Family-Wise Type I Error Rate (FWER) [3]. [sent-42, score-0.239]
12 If we were to address this issue via Bonferroni correction [4], the enormous number of separate tests implies that certain weaker signals will almost certainly never be detected, even if they are real. [sent-43, score-0.148]
13 This directly affects studies of neurodegenerative disorders in which atrophy proceeds at a very slow rate and the therapeutic effects of a drug are likely to be mild to moderate anyway. [sent-44, score-0.168]
14 In the worst case, an otherwise real treatment effect of a drug may not survive correction, and the trial may be deemed a failure. [sent-47, score-0.173]
15 If so, then the extremely low Bonferroni corrected α-threshold crossings effectively become mutually exclusive, which makes the Union Bound (on which Bonferroni correction is based) nearly tight. [sent-52, score-0.171]
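For concreteness, the union-bound reasoning behind Bonferroni correction is the standard textbook derivation below (our notation, not reproduced from the paper):

```latex
% Union bound over the v per-test false-positive events A_i, and the
% Bonferroni per-test level that caps the right-hand side at alpha:
\mathrm{FWER} \;=\; \Pr\Big(\bigcup_{i=1}^{v} A_i\Big)
\;\le\; \sum_{i=1}^{v} \Pr(A_i)
\;\le\; v \cdot \alpha_{\text{per-test}}
\quad\Longrightarrow\quad
\alpha_{\text{per-test}} = \frac{\alpha}{v}.
```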
16 Thus, many methods have been developed to more accurately and efficiently estimate or approximate the FWER [5, 6, 7, 8], which is a subject of much interest in statistics [9], machine learning [10], bioinformatics [11], and neuroimaging [12]. [sent-54, score-0.212]
17 A commonly used method of directly and non-parametrically estimating the FWER is Permutation testing [12, 13], which is a method of sampling from the Global (i.e., Family-Wise) Null hypothesis. [sent-56, score-0.227]
18 Permutation testing ensures that any relevant dependencies present in the data carry through to the test statistics, giving an unbiased estimator of the FWER. [sent-59, score-0.255]
19 If we want to choose a threshold sufficient to exclude all spurious results with probability 1 − α, we can construct a histogram of sample maxima taken from permutation samples, and choose a threshold giving the 1 − α/2 quantile. [sent-60, score-0.467]
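The threshold step amounts to taking an empirical quantile of the sample maxima. A minimal NumPy sketch (the function name is ours; the two-sided 1 − α/2 convention follows the sentence above):

```python
import numpy as np

def fwer_threshold(max_null_samples, alpha=0.05):
    """Empirical 1 - alpha/2 quantile of the permutation max-Null samples,
    used as the FWER-controlling threshold."""
    return np.quantile(np.asarray(max_null_samples), 1.0 - alpha / 2.0)
```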
20 Unfortunately, reliable FWER estimates derived via permutation testing come at excessive (and often infeasible) computational cost – often tens of thousands or even millions of permutation samples are required, each of which requires a complete pass over the entire data set. [sent-61, score-0.925]
21 Observe that the very same dependencies between voxels that forced the usage of permutation testing indicate that the overwhelming majority of the work in computing so many highly correlated Null statistics is redundant. [sent-63, score-0.406]
22 Note that regardless of their description, strong dependencies of almost any kind will tend to concentrate most of their co-variation into a low-rank subspace, leaving a high-rank, low-variance residual [5]. [sent-64, score-0.22]
23 In fact, for Genome-Wide Association Studies (GWAS), many strategies calculate the ‘effective number’ (Meff) of independent tests corresponding to the rank of this subspace [16, 5]. [sent-65, score-0.174]
24 This paper is based on the observation that such a low-rank structure must also appear in permutation test samples. [sent-66, score-0.407]
25 Using ideas from online low-rank matrix completion [17] we can sample a few of the Null statistics and reconstruct the remainder as long as we properly account for the residual. [sent-67, score-0.249]
26 The contribution of our work is to significantly speed up permutation testing in neuroimaging, delivering running time improvements of up to 50×. [sent-69, score-0.525]
27 In other words, our algorithm does the same job as permutation testing, but takes anywhere from a few minutes up to a few hours, rather than days or weeks. [sent-70, score-0.361]
28 Further, based on recent work in random matrix theory, we provide an analysis which sheds additional light on the use of matrix completion methods in this context. [sent-71, score-0.291]
29 2 The Proposed Algorithm We first cover some basic concepts underlying permutation testing and low-rank matrix completion in more detail, before presenting our algorithm and the associated analysis. [sent-73, score-0.858]
30 2.1 Permutation testing Randomly sampled permutation testing [18] is a methodology for drawing samples under the Global (Family-Wise) Null hypothesis. [sent-75, score-0.689]
31 The basic idea of permutation testing is very simple, yet extremely powerful. [sent-78, score-0.525]
32 Suppose we have a set of labeled high dimensional data points, and a univariate test statistic which measures some interaction between labeled groups for every dimension (or feature). [sent-79, score-0.167]
33 The maximum over all of these statistics for every permutation sample is then used to construct a histogram, which therefore is a non-parametric estimate of the distribution of the sample maximum of Null statistics. [sent-81, score-0.455]
34 For a test statistic derived from the real labels, the FWER corrected p-value is then equal to the fraction of permutation samples which were more extreme. [sent-82, score-0.567]
35 Note that all of the permutation samples can be assembled into a matrix P ∈ Rv×T where v is the number of comparisons (voxels for images), and T is the number of permutation samples. [sent-83, score-0.797]
36 To reliably estimate, e.g., a 0.01 threshold from the Null sample maximum distribution, we require many thousands of permutation samples — each requires randomizing the labels and recalculating all test statistics, a very computationally expensive procedure when v is large. [sent-88, score-0.507]
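A minimal sketch of how P is populated, using pooled-variance two-sample t-statistics per voxel (consistent with Section 2.3 below); the data layout and function names are illustrative, not the authors' code:

```python
import numpy as np

def build_P(data, labels, T, seed=0):
    """data: (subjects, v) array; labels: binary group labels.
    Column t of P holds all v voxel-wise t-statistics under relabeling t."""
    rng = np.random.default_rng(seed)
    v = data.shape[1]
    P = np.empty((v, T))
    for t in range(T):
        perm = rng.permutation(labels)       # sample from the Global Null
        g1, g2 = data[perm == 0], data[perm == 1]
        n1, n2 = len(g1), len(g2)
        sp2 = ((n1 - 1) * g1.var(0, ddof=1)
               + (n2 - 1) * g2.var(0, ddof=1)) / (n1 + n2 - 2)
        P[:, t] = (g1.mean(0) - g2.mean(0)) / np.sqrt(sp2 * (1/n1 + 1/n2))
    return P
```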
37 2.2 Low-rank Matrix completion Low-rank matrix completion [19] seeks to reconstruct missing entries from a matrix, given only a small fraction of its entries. [sent-92, score-0.429]
38 By placing an ℓ1-norm penalty on the singular values of the recovered matrix via the nuclear norm [20, 21], we can ensure that the solution is as low rank as possible. [sent-95, score-0.369]
39 Alternatively, we can specify a rank r ahead of time, and estimate an orthogonal basis of that rank by following a gradient along the Grassmannian manifold [22, 17]. [sent-96, score-0.183]
40 Denoting the set of randomly subsampled entries as Ω, the matrix completion problem is given as $\min_{\tilde{P}} \|P_\Omega - \tilde{P}_\Omega\|_F^2$ s.t. $\mathrm{rank}(\tilde{P}) \le r$. [sent-97, score-0.301]
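For illustration, a simple "hard-impute" style solver for this objective is sketched below. The paper itself uses Grassmannian subspace tracking [22, 17]; this SVD-based stand-in only demonstrates the same rank-r objective:

```python
import numpy as np

def complete_rank_r(P_obs, mask, r, iters=50):
    """Fit a rank-r matrix to the observed entries (mask == True)."""
    X = np.where(mask, P_obs, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :r] * s[:r]) @ Vt[:r]   # rank-r projection
        X = np.where(mask, P_obs, low_rank)      # keep observed entries fixed
    return low_rank
```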
41 2.3 Low rank plus a long tail Real-world data often have a dominant low-rank component. [sent-104, score-0.162]
42 While the data may not be exactly characterized by a low-rank basis, the residual will not significantly alter the eigen-spectrum of the sample covariance in such cases. [sent-105, score-0.175]
43 Operations such as normalizing by pooled variances will contribute a long tail of eigenvalues, and so we require that this long tail either decays rapidly, or that it does not overlap with the dominant eigenvalues. [sent-110, score-0.222]
44 For t-statistics, the pooled variances are unlikely to change very much from one permutation sample to another (barring outliers) — hence we expect that the spectrum of P will resemble that of the data covariance, with the addition of a long, exponentially decaying tail. [sent-111, score-0.469]
45 A Central Limit argument appeals to the number of independent eigenfunctions that contribute to this residual, and the orthogonality of eigenfunctions implies that as more of them meaningfully contribute to each entry in the residual, the more independent those entries become. [sent-117, score-0.183]
46 If the tail decays slowly, then many eigenfunctions contribute meaningfully, making a Gaussian approximation reasonable for the residual; and if it decays rapidly, then the residual will perhaps be less Gaussian, but also more negligible. [sent-121, score-0.175]
47 Thus, our development in the next section makes no direct assumption about these eigenvalues themselves, but rather assumes that the residual corresponds to a low-variance i.i.d. random matrix. [sent-122, score-0.291]
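The "low rank plus i.i.d. residual" model is easy to sanity-check numerically; in this sketch all sizes and the residual scale σ are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
v, t, r, sigma = 2000, 500, 20, 0.05
U = np.linalg.qr(rng.standard_normal((v, r)))[0]            # orthogonal basis
W = np.linspace(5.0, 1.0, r)[:, None] * rng.standard_normal((r, t))
P = U @ W + sigma * rng.standard_normal((v, t))             # P = UW + S
sv = np.linalg.svd(P, compute_uv=False)
print(sv[r - 1] / sv[r])  # the r-th singular value stands well above the tail
```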
48 2.4 Our Method It still remains to model the residual numerically. [sent-127, score-0.175]
49 By sub-sampling we can reconstruct the low-rank portion of P via matrix completion, but in order to obtain the desired sample maximum distribution we must also recover the residual. [sent-128, score-0.212]
50 Exact recovery of the residual is essentially impossible; fortunately, for our purposes we only need its effect on the distribution of the maximum per permutation test. [sent-129, score-0.652]
51 During the training phase we conduct a small number of fully sampled permutation tests (100 permutations in our experiments). [sent-133, score-0.436]
52 From these permutation tests, we estimate U using sub-sampled matrix completion methods [22, 17], making multiple passes over the training set (with fixed sub-sampling rate), until convergence. [sent-134, score-0.577]
53 Then, we obtain a distribution of the residual S over the entire training set. [sent-136, score-0.214]
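A training-phase sketch under these assumptions (SVD stands in for the GRASTA updates of [17]; variable names are ours):

```python
import numpy as np

def train(P_train, r):
    """P_train: (v, ~100) fully sampled permutation columns.
    Returns the rank-r basis U and the residual's standard deviation."""
    U = np.linalg.svd(P_train, full_matrices=False)[0][:, :r]
    residual = P_train - U @ (U.T @ P_train)
    return U, residual.std()
```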
54 Next is the recovery phase, in which we sub-sample a small fraction of the entries of each successive column t, solve for the reconstruction coefficients W(·, t) in the basis U by least-squares, and then add random residuals using parameters estimated during training. [sent-137, score-0.185]
55 After that, we proceed exactly as in normal permutation testing, to recover the statistics. [sent-138, score-0.418]
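And a matching recovery-phase sketch, one column at a time (again with illustrative names; the residuals are drawn using the training-phase estimate):

```python
import numpy as np

def recover_column(p_obs, obs_idx, U, resid_std, rng):
    """p_obs: observed entries of one permutation column at rows obs_idx."""
    w = np.linalg.lstsq(U[obs_idx], p_obs, rcond=None)[0]  # coefficients W(:, t)
    col = U @ w + rng.normal(0.0, resid_std, size=U.shape[0])
    return col, col.max()  # the max feeds the Null sample-maximum histogram
```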
56 This sampling artifact has the effect of ‘shifting’ the distribution of the sample maximum towards 0. [sent-144, score-0.175]
57 3 Analysis We now discuss two results which show that as long as the variance of the residual is below a certain level, we can recover the distribution of the sample maximum. [sent-146, score-0.232]
58 We can then treat the residual S as a random matrix whose entries are i.i.d. [sent-148, score-0.289]
59 We arrive at our first result by analyzing how the low-rank portion of P’s singular spectrum interlaces with the contribution coming from the residual by treating P as a low-rank perturbation of a random matrix. [sent-152, score-0.287]
60 If this low-rank perturbation is sufficient to dominate the eigenvalues of the random matrix, then P can be recovered with high fidelity at a low sampling rate [22, 17]. [sent-153, score-0.339]
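A quick numerical illustration of this perturbation picture (sizes and σ are invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
v, t, r, sigma = 500, 2000, 10, 0.05
UW = rng.standard_normal((v, r)) @ rng.standard_normal((r, t))
S = sigma * rng.standard_normal((v, t))
lam_q = np.linalg.eigvalsh(UW @ UW.T)[::-1]           # eigenvalues of Q
lam_p = np.linalg.eigvalsh(UW @ UW.T + S @ S.T)[::-1]
print(np.abs(lam_p[:r] - lam_q[:r]) / lam_q[:r])      # small relative shifts
```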
61 The following development relies on the observation that the eigenvalues of PP^T are the squared singular values of P. [sent-155, score-0.173]
62 Thus, rather than analyzing the singular value spectrum of P directly, we can analyze the eigenvalues of PP^T using a recent result from [24]. [sent-156, score-0.228]
63 This is important because in order to ensure recovery of P, we require that its singular value spectrum will approximately retain the shape of UW’s. [sent-157, score-0.181]
64 The following theorem relates the rate at which eigenvalues are perturbed, δ, to the parameterization of S in terms of σ². [sent-168, score-0.156]
65 As $v, t \to \infty$ such that $v/t \gg 1$, the eigenvalues $\tilde{\lambda}_i$ of the perturbed matrix $Q + SS^T$ will satisfy $|\tilde{\lambda}_i - \lambda_i| < \delta\lambda_i$ for $i = 1, \dots, r$, and $\tilde{\lambda}_i < \delta\lambda_r$ for $i > r$. [sent-186, score-0.229]
66 Having justified the model in (2), the following theorem shows that the empirical distribution of the maximum Null statistic approximates the true distribution. [sent-205, score-0.168]
67 Let $m_t = \max_i P_{i,t}$ be the maximum observed test statistic at permutation trial $t$, and similarly let $\hat{m}_t = \max_i \hat{P}_{i,t}$ be the maximum reconstructed test statistic. [sent-208, score-0.794]
68 4 Experimental evaluations Our experimental evaluations include four separate neuroimaging datasets of Alzheimer’s Disease (AD) patients, cognitively healthy age-matched controls (CN), and in some cases Mild Cognitive Impairment (MCI) patients. [sent-213, score-0.509]
69 Our evaluations focus on three main questions: (i) Can we recover an acceptable approximation of the maximum statistic Null distribution from an approximation of the permutation test matrix? [sent-225, score-0.707]
70 (ii) What degree of computational speedup can we expect at various subsampling rates, and how does this affect the trade-off with approximation error? [sent-226, score-0.292]
71 (iii) How sensitive is the estimated α-level threshold with respect to the recovered Null distribution? [sent-227, score-0.157]
72 In all our experiments, the rank estimate for subspace tracking (to construct the low–rank basis U) was taken as the number of subjects. [sent-228, score-0.167]
73 Fig. 1 shows the KL and BD values obtained from three datasets, at 20 different subsampling rates. [sent-237, score-0.161]
74 The first observation from Fig. 1 is that both KL and BD measures between the recovered Null and the true distribution are < e−5 for sampling rates on the order of 0.5% and above. [sent-244, score-0.167]
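The KL and BD measures reported here can be computed from shared-bin histograms of the two max-Null samples; a sketch (the binning details are ours, not necessarily the paper's):

```python
import numpy as np

def kl_and_bd(true_max, rec_max, bins=100):
    lo = min(true_max.min(), rec_max.min())
    hi = max(true_max.max(), rec_max.max())
    p = np.histogram(true_max, bins=bins, range=(lo, hi))[0] + 1e-12
    q = np.histogram(rec_max, bins=bins, range=(lo, hi))[0] + 1e-12
    p, q = p / p.sum(), q / q.sum()
    kl = np.sum(p * np.log(p / q))         # KL divergence
    bd = -np.log(np.sum(np.sqrt(p * q)))   # Bhattacharyya distance
    return kl, bd
```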
75 Figure 1: KL (blue) and BD (red) measures between the true max Null distribution (given by the full matrix P) and that recovered by our method (thick lines), along with the baseline naive subsampling method (dotted lines); panels (a)–(c) correspond to Datasets A–C. [sent-246, score-0.254]
76 (Caption fragment from a timing figure:) The other three curves include: subsampling (blue), GRASTA recovery (red) and total time taken by our model (black). [sent-260, score-0.187]
77 This suggests that our model recovers both the shape (low BD) and position (low KL) of the null to high accuracy at extremely low sub-sampling. [sent-264, score-0.344]
78 We also see that above a certain minimum subsampling rate (on the order of 0.5%), the accuracy of the recovery does not improve further. [sent-265, score-0.158]
79 This is expected from the theory on matrix completion where after observing a minimum number of data samples, adding in new samples does not substantially increase information content. [sent-267, score-0.216]
80 Our experiments suggest that the speedup is substantial. [sent-272, score-0.174]
81 Figs. 3 and 2 compare the time taken to perform the complete permutation testing to that of our model. [sent-274, score-0.525]
82 Each plot contains 4 curves representing the time taken by our model, the corresponding sampling and GRASTA [17] recovery (plus training) times, and the total time to construct the entire matrix P (horizontal line). [sent-278, score-0.29]
83 Fig. 2 shows the scatter plot of computational speedup vs. approximation error. [sent-280, score-0.218]
84 Figure 2: Scatter plot of computational speedup vs. approximation error; the plot corresponds to the 20 different sampling rates on all 4 datasets (for 5 repeated sets of experiments). [sent-285, score-0.218]
85 For samplings where KL and BD were < e−5, the computation speed-up factor averaged over all datasets was 45×. [sent-287, score-0.178]
86 It is evident from Fig. 2 that there is a trade–off between the speedup factor and approximation error (KL or BD). [sent-295, score-0.174]
87 Overall the highest computational speedup factor achieved at a recovery level of e−5 on KL and BD is around 50× (and this occurred around a 0.5% subsampling rate). [sent-296, score-0.243]
88 It was observed that a speedup factor of up to 55× was obtained for Datasets C and D. [sent-300, score-0.174]
89 Fig. 4 shows the error in estimating the true max threshold. Figure 4: Error of estimated t statistic thresholds (red) for the 20 different subsampling rates on the four Datasets; panels (a) Datasets A, B and (b) Datasets C, D. [sent-310, score-0.387]
90 The x–axis corresponds to the 20 different sampling rates used and the y–axis shows the absolute difference of thresholds in log scale. [sent-318, score-0.168]
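The threshold error plotted in Fig. 4 reduces to comparing empirical quantiles; a sketch (the α level shown is illustrative):

```python
import numpy as np

def threshold_error(true_max, rec_max, alpha=0.05):
    """Absolute difference of the 1 - alpha thresholds (plotted on log scale)."""
    q = 1.0 - alpha
    return abs(np.quantile(true_max, q) - np.quantile(rec_max, q))
```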
91 Observe that for sampling rates higher than 3%, the mean and maximum differences were close to 0 (see Table 1). [sent-319, score-0.153]
92 Table 1: Errors of estimated t statistic thresholds on all datasets at two different subsampling rates. [sent-376, score-0.226]
93 The error increases for 1 − α > 0.995. [sent-377, score-0.185]
94 Experiments on four different neuroimaging datasets show that we can recover the distribution of the maximum Null statistic to a high degree of accuracy, while maintaining a computational speedup factor of roughly 50×. [sent-387, score-0.678]
95 Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. [sent-424, score-0.28]
96 Controlling the familywise error rate with plug-in estimator for the proportion of true null hypotheses. [sent-434, score-0.426]
97 Controlling the familywise error rate in functional neuroimaging: a comparative review. [sent-465, score-0.141]
98 Group imaging of task-related changes in cortical synchronisation using nonparametric permutation testing. [sent-473, score-0.406]
99 A comparison of random field theory and permutation methods for the statistical analysis of MEG data. [sent-482, score-0.361]
100 The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. [sent-545, score-0.233]
wordName wordTfidf (topN-words)
[('permutation', 0.361), ('null', 0.285), ('fwer', 0.247), ('bd', 0.22), ('neuroimaging', 0.212), ('uw', 0.177), ('residual', 0.175), ('speedup', 0.174), ('testing', 0.164), ('bonferroni', 0.155), ('completion', 0.141), ('statistic', 0.121), ('subsampling', 0.118), ('eigenvalues', 0.116), ('kl', 0.108), ('familywise', 0.101), ('mci', 0.101), ('voxels', 0.085), ('cn', 0.084), ('alzheimer', 0.082), ('subjects', 0.078), ('drug', 0.077), ('ad', 0.077), ('hinrichs', 0.076), ('uwst', 0.076), ('tests', 0.075), ('neuroimage', 0.075), ('evaluations', 0.075), ('matrix', 0.075), ('correction', 0.073), ('cance', 0.072), ('recovery', 0.069), ('tail', 0.068), ('datasets', 0.067), ('ppt', 0.067), ('artifact', 0.065), ('sampling', 0.063), ('thresholds', 0.062), ('sst', 0.062), ('recovered', 0.061), ('low', 0.059), ('rank', 0.058), ('recover', 0.057), ('singular', 0.057), ('disease', 0.056), ('clinical', 0.055), ('spectrum', 0.055), ('age', 0.055), ('treatment', 0.054), ('pooled', 0.053), ('threshold', 0.053), ('ashburner', 0.051), ('heredity', 0.051), ('impairment', 0.051), ('ithapu', 0.051), ('morphometry', 0.051), ('neurodegenerative', 0.051), ('swt', 0.051), ('uwwt', 0.051), ('veterans', 0.051), ('maximum', 0.047), ('test', 0.046), ('subsampled', 0.046), ('rv', 0.046), ('dependencies', 0.045), ('imaging', 0.045), ('adni', 0.045), ('cognitively', 0.045), ('plot', 0.044), ('estimated', 0.043), ('ut', 0.043), ('rates', 0.043), ('axis', 0.042), ('mt', 0.042), ('trial', 0.042), ('grasta', 0.041), ('subspace', 0.041), ('rate', 0.04), ('cognitive', 0.04), ('entire', 0.039), ('entries', 0.039), ('eigenfunctions', 0.039), ('nichols', 0.039), ('genotype', 0.039), ('corrected', 0.039), ('perturbed', 0.038), ('dataset', 0.037), ('grassmannian', 0.037), ('plus', 0.036), ('faithful', 0.035), ('healthy', 0.035), ('median', 0.035), ('brain', 0.035), ('tracking', 0.034), ('delity', 0.034), ('balzano', 0.034), ('basis', 0.034), ('reconstruct', 0.033), ('orthogonal', 0.033), ('contribute', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999851 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
Author: Chris Hinrichs, Vamsi Ithapu, Qinyuan Sun, Sterling C. Johnson, Vikas Singh
Abstract: Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. In order to correct for this phenomenon, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well known Bonferroni correction method, while simple to implement, is quite conservative, and can substantially under-power a study because it ignores dependencies between test statistics. Permutation testing, on the other hand, is an exact, non-parametric method of estimating the FWER for a given α-threshold, but for acceptably low thresholds the computational burden can be prohibitive. In this paper, we show that permutation testing in fact amounts to populating the columns of a very large matrix P. By analyzing the spectrum of this matrix, under certain conditions, we see that P has a low-rank plus a low-variance residual decomposition which makes it suitable for highly sub–sampled — on the order of 0.5% — matrix completion methods. Based on this observation, we propose a novel permutation testing methodology which offers a large speedup, without sacrificing the fidelity of the estimated FWER. Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly 50× can be achieved while recovering the FWER distribution up to very high accuracy. Further, we show that the estimated α-threshold is also recovered faithfully, and is stable. 1
2 0.19734547 300 nips-2013-Solving the multi-way matching problem by permutation synchronization
Author: Deepti Pachauri, Risi Kondor, Vikas Singh
Abstract: The problem of matching not just two, but m different sets of objects to each other arises in many contexts, including finding the correspondence between feature points across multiple images in computer vision. At present it is usually solved by matching the sets pairwise, in series. In contrast, we propose a new method, Permutation Synchronization, which finds all the matchings jointly, in one shot, via a relaxation to eigenvector decomposition. The resulting algorithm is both computationally efficient, and, as we demonstrate with theoretical arguments as well as experimental results, much more stable to noise than previous methods. 1
3 0.17720604 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
Author: Wojciech Zaremba, Arthur Gretton, Matthew Blaschko
Abstract: A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically Normal null distribution: this is by contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice. 1
4 0.162066 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
Author: Akshay Krishnamurthy, Aarti Singh
Abstract: We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees. Our algorithms exploit adaptivity to identify entries that are highly informative for learning the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analyses. In the absence of noise, we show that one can exactly recover an n × n matrix of rank r from merely Ω(nr^{3/2} log(r)) matrix entries. We also show that one can recover an order T tensor using Ω(nr^{T−1/2} T^2 log(r)) entries. For noisy recovery, our algorithm consistently estimates a low rank matrix corrupted with noise using Ω(nr^{3/2} polylog(n)) entries. We complement our study with simulations that verify our theory and demonstrate the scalability of our algorithms. 1
5 0.13027564 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem. 1
6 0.10691844 183 nips-2013-Mapping paradigm ontologies to and from the brain
7 0.10406408 73 nips-2013-Convex Relaxations for Permutation Problems
8 0.098859861 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
9 0.096042924 173 nips-2013-Least Informative Dimensions
10 0.094276935 185 nips-2013-Matrix Completion From any Given Set of Observations
11 0.091755763 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
12 0.090548359 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
13 0.090372361 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
14 0.089961462 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces
15 0.088563606 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
16 0.086843491 9 nips-2013-A Kernel Test for Three-Variable Interactions
17 0.082265854 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
18 0.081982829 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
19 0.081625983 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
20 0.07903146 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
topicId topicWeight
[(0, 0.215), (1, 0.101), (2, 0.052), (3, 0.109), (4, -0.024), (5, -0.016), (6, -0.001), (7, 0.0), (8, -0.111), (9, 0.009), (10, -0.022), (11, 0.007), (12, -0.0), (13, -0.015), (14, 0.004), (15, 0.033), (16, -0.05), (17, -0.001), (18, -0.139), (19, 0.005), (20, -0.012), (21, -0.156), (22, -0.094), (23, -0.027), (24, -0.001), (25, -0.026), (26, 0.01), (27, -0.015), (28, -0.092), (29, 0.045), (30, 0.085), (31, 0.003), (32, -0.002), (33, 0.02), (34, 0.076), (35, -0.014), (36, 0.018), (37, -0.05), (38, 0.064), (39, 0.089), (40, -0.028), (41, 0.078), (42, 0.089), (43, 0.088), (44, 0.059), (45, 0.113), (46, 0.027), (47, -0.129), (48, -0.032), (49, 0.16)]
simIndex simValue paperId paperTitle
same-paper 1 0.94146532 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
Author: Chris Hinrichs, Vamsi Ithapu, Qinyuan Sun, Sterling C. Johnson, Vikas Singh
Abstract: Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. In order to correct for this phenomenon, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well known Bonferroni correction method, while simple to implement, is quite conservative, and can substantially under-power a study because it ignores dependencies between test statistics. Permutation testing, on the other hand, is an exact, non-parametric method of estimating the FWER for a given α-threshold, but for acceptably low thresholds the computational burden can be prohibitive. In this paper, we show that permutation testing in fact amounts to populating the columns of a very large matrix P. By analyzing the spectrum of this matrix, under certain conditions, we see that P has a low-rank plus a low-variance residual decomposition which makes it suitable for highly sub–sampled — on the order of 0.5% — matrix completion methods. Based on this observation, we propose a novel permutation testing methodology which offers a large speedup, without sacrificing the fidelity of the estimated FWER. Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly 50× can be achieved while recovering the FWER distribution up to very high accuracy. Further, we show that the estimated α-threshold is also recovered faithfully, and is stable. 1
2 0.75415283 300 nips-2013-Solving the multi-way matching problem by permutation synchronization
Author: Deepti Pachauri, Risi Kondor, Vikas Singh
Abstract: The problem of matching not just two, but m different sets of objects to each other arises in many contexts, including finding the correspondence between feature points across multiple images in computer vision. At present it is usually solved by matching the sets pairwise, in series. In contrast, we propose a new method, Permutation Synchronization, which finds all the matchings jointly, in one shot, via a relaxation to eigenvector decomposition. The resulting algorithm is both computationally efficient, and, as we demonstrate with theoretical arguments as well as experimental results, much more stable to noise than previous methods. 1
3 0.67570567 73 nips-2013-Convex Relaxations for Permutation Problems
Author: Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre D'Aspremont
Abstract: Seriation seeks to reconstruct a linear order between variables using unsorted similarity information. It has direct applications in archeology and shotgun gene sequencing for example. We prove the equivalence between the seriation and the combinatorial 2-SUM problem (a quadratic minimization problem over permutations) over a class of similarity matrices. The seriation problem can be solved exactly by a spectral algorithm in the noiseless case and we produce a convex relaxation for the 2-SUM problem to improve the robustness of solutions in a noisy setting. This relaxation also allows us to impose additional structural constraints on the solution, to solve semi-supervised seriation problems. We present numerical experiments on archeological data, Markov chains and gene sequences. 1
4 0.62609148 284 nips-2013-Robust Spatial Filtering with Beta Divergence
Author: Wojciech Samek, Duncan Blythe, Klaus-Robert Müller, Motoaki Kawanabe
Abstract: The efficiency of Brain-Computer Interfaces (BCI) largely depends upon a reliable extraction of informative features from the high-dimensional EEG signal. A crucial step in this protocol is the computation of spatial filters. The Common Spatial Patterns (CSP) algorithm computes filters that maximize the difference in band power between two conditions, thus it is tailored to extract the relevant information in motor imagery experiments. However, CSP is highly sensitive to artifacts in the EEG data, i.e. few outliers may alter the estimate drastically and decrease classification performance. Inspired by concepts from the field of information geometry we propose a novel approach for robustifying CSP. More precisely, we formulate CSP as a divergence maximization problem and utilize the property of a particular type of divergence, namely beta divergence, for robustifying the estimation of spatial filters in the presence of artifacts in the data. We demonstrate the usefulness of our method on toy data and on EEG recordings from 80 subjects. 1
5 0.60798079 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
Author: Daniel Bartz, Klaus-Robert Müller
Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to crossvalidation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage –orthogonal complement shrinkage– which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience. 1
6 0.59816945 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
7 0.58409381 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
8 0.56670082 352 nips-2013-What do row and column marginals reveal about your dataset?
9 0.56173134 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
10 0.56137508 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
11 0.53980666 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
12 0.53844881 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
13 0.52371556 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
14 0.52187705 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
15 0.52113652 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers
16 0.49756637 9 nips-2013-A Kernel Test for Three-Variable Interactions
17 0.49638948 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
18 0.49534053 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
19 0.49013686 186 nips-2013-Matrix factorization with binary components
20 0.48864031 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
topicId topicWeight
[(16, 0.026), (33, 0.631), (34, 0.066), (49, 0.027), (56, 0.088), (70, 0.018), (85, 0.026), (89, 0.027), (93, 0.019)]
simIndex simValue paperId paperTitle
1 0.9951936 88 nips-2013-Designed Measurements for Vector Count Data
Author: Liming Wang, David Carlson, Miguel Rodrigues, David Wilcox, Robert Calderbank, Lawrence Carin
Abstract: We consider design of linear projection measurements for a vector Poisson signal model. The projections are performed on the vector Poisson rate, X ∈ R^n_+, and the observed data are a vector of counts, Y ∈ Z^m_+. The projection matrix is designed by maximizing mutual information between Y and X, I(Y ; X). When there is a latent class label C ∈ {1, . . . , L} associated with X, we consider the mutual information with respect to Y and C, I(Y ; C). New analytic expressions for the gradient of I(Y ; X) and I(Y ; C) are presented, with gradient performed with respect to the measurement matrix. Connections are made to the more widely studied Gaussian measurement model. Example results are presented for compressive topic modeling of a document corpus (word counting), and hyperspectral compressive sensing for chemical classification (photon counting). 1
2 0.99342549 217 nips-2013-On Poisson Graphical Models
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations still remain. To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data. 1
same-paper 3 0.99235731 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
Author: Chris Hinrichs, Vamsi Ithapu, Qinyuan Sun, Sterling C. Johnson, Vikas Singh
Abstract: Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. In order to correct for this phenomenon, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well known Bonferroni correction method, while simple to implement, is quite conservative, and can substantially under-power a study because it ignores dependencies between test statistics. Permutation testing, on the other hand, is an exact, non-parametric method of estimating the FWER for a given α-threshold, but for acceptably low thresholds the computational burden can be prohibitive. In this paper, we show that permutation testing in fact amounts to populating the columns of a very large matrix P. By analyzing the spectrum of this matrix, under certain conditions, we see that P has a low-rank plus a low-variance residual decomposition which makes it suitable for highly sub–sampled — on the order of 0.5% — matrix completion methods. Based on this observation, we propose a novel permutation testing methodology which offers a large speedup, without sacrificing the fidelity of the estimated FWER. Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly 50× can be achieved while recovering the FWER distribution up to very high accuracy. Further, we show that the estimated α-threshold is also recovered faithfully, and is stable. 1
4 0.99169648 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model
Author: Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov
Abstract: Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources – such as text data – both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions achieving hit rates of up to 18% across thousands of novel labels never seen by the visual model. 1
5 0.98347563 160 nips-2013-Learning Stochastic Feedforward Neural Networks
Author: Yichuan Tang, Ruslan Salakhutdinov
Abstract: Multilayer perceptrons (MLPs) or neural networks are popular models used for nonlinear regression and classification tasks. As regressors, MLPs model the conditional distribution of the predictor variables Y given the input variables X. However, this predictive distribution is assumed to be unimodal (e.g. Gaussian). For tasks involving structured prediction, the conditional distribution should be multi-modal, resulting in one-to-many mappings. By using stochastic hidden variables rather than deterministic ones, Sigmoid Belief Nets (SBNs) can induce a rich multimodal distribution in the output space. However, previously proposed learning algorithms for SBNs are not efficient and unsuitable for modeling real-valued data. In this paper, we propose a stochastic feedforward network with hidden layers composed of both deterministic and stochastic variables. A new Generalized EM training procedure using importance sampling allows us to efficiently learn complicated conditional distributions. Our model achieves superior performance on synthetic and facial expressions datasets compared to conditional Restricted Boltzmann Machines and Mixture Density Networks. In addition, the latent features of our model improves classification and can learn to generate colorful textures of objects. 1
6 0.97977757 46 nips-2013-Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models
7 0.97443849 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
8 0.97145665 253 nips-2013-Prior-free and prior-dependent regret bounds for Thompson Sampling
9 0.96518439 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
10 0.93520641 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
11 0.92007577 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors
12 0.89999241 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
13 0.89939803 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
14 0.89606154 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
15 0.89476281 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
16 0.89288431 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks
17 0.89185286 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
18 0.88892466 315 nips-2013-Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs
19 0.88652569 335 nips-2013-Transfer Learning in a Transductive Setting
20 0.88415587 226 nips-2013-One-shot learning by inverting a compositional causal process