nips nips2013 nips2013-9 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
Abstract: We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. [sent-8, score-1.064]
2 The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. [sent-9, score-0.337]
3 We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. [sent-10, score-0.155]
4 This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures. [sent-11, score-0.331]
5 1 Introduction The problem of nonparametric testing of interaction between variables has been widely treated in the machine learning and statistics literature. [sent-12, score-0.312]
6 Much of the work in this area focuses on measuring or testing pairwise interaction: for instance, the Hilbert-Schmidt Independence Criterion (HSIC) or Distance Covariance [1, 2, 3], kernel canonical correlation [4, 5, 6], and mutual information [7]. [sent-13, score-0.307]
7 In cases where more than two variables interact, however, the questions we can ask about their interaction become significantly more involved. [sent-14, score-0.222]
8 Asking whether all of the variables are mutually independent is already a more general question than pairwise independence, since pairwise independence does not imply total (mutual) independence, while the implication holds in the other direction. [sent-16, score-0.396]
9 If X and Y are independent and uniform on {−1, 1}, then (X, Y, XY) is a pairwise independent but mutually dependent triplet [9]. [sent-20, score-0.143]
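This triplet is easy to check numerically; the following is a minimal sketch (sample size and estimator choices are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.choice([-1, 1], size=n)
Y = rng.choice([-1, 1], size=n)
Z = X * Y  # the third variable is the product of the first two

# Every pair is uncorrelated; for uniform {-1, 1} variables this already
# pins down the pairwise joints, so each pair is independent.
print(np.corrcoef(X, Y)[0, 1], np.corrcoef(X, Z)[0, 1], np.corrcoef(Y, Z)[0, 1])

# Jointly dependent: E[XYZ] = E[X^2 Y^2] = 1, whereas under total
# independence it would equal E[X] E[Y] E[Z] = 0.
print(np.mean(X * Y * Z))  # ~1.0
```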
10 Tests of total and pairwise independence are insufficient, however, since they do not rule out all third order factorizations of the joint distribution. [sent-21, score-0.447]
11 An important class of higher-order interactions occurs when the simultaneous effect of two variables on a third may not be additive. [sent-22, score-0.121]
12 In particular, it may be possible that X ⊥⊥ Z and Y ⊥⊥ Z, whereas ¬((X, Y) ⊥⊥ Z) (for example, neither adding sugar to coffee nor stirring the coffee individually has an effect on its sweetness, but the joint presence of the two does). [sent-23, score-0.093]
13 In addition, the study of three-variable interactions can elucidate certain switching mechanisms between positive and negative correlation of the expressions of two genes, as controlled by a third gene [10]. [sent-24, score-0.098]
14 The presence of such interactions is typically tested using some form of analysis of variance (ANOVA) model which includes additional interaction terms, such as products of individual variables. [sent-25, score-0.264]
15 Since each such additional term requires a new hypothesis test, this increases the risk that some hypothesis test will produce a false positive by chance. [sent-26, score-0.186]
16 Therefore, a test that is able to directly detect the presence of any kind of higher-order interaction would be of broad interest in statistical modeling. [sent-27, score-0.302]
17 In the present work, we provide to our knowledge the first nonparametric test for three-variable interaction. [sent-28, score-0.113]
18 This work generalizes the HSIC test of pairwise independence, and has as its test statistic the norm of the embedding of an appropriate signed measure into a reproducing kernel Hilbert space (RKHS). [sent-29, score-0.793]
19 When the statistic is non-zero, all third order factorizations can be ruled out. [sent-30, score-0.188]
20 Moreover, this test is applicable to the cases where X, Y and Z are themselves multivariate objects, and may take values in non-Euclidean or structured domains. [sent-31, score-0.076]
21 One important application of interaction measures is in learning structure for graphical models. [sent-32, score-0.258]
22 If the graphical model is assumed to be Gaussian, then second order interaction statistics may be used to construct an undirected graph [11, 12]. [sent-33, score-0.199]
23 An alternative approach to structure learning is to employ conditional independence tests. [sent-35, score-0.228]
24 A number of algorithms have since extended the basic PC algorithm to arbitrary probability distributions over multivariate random variables [16, 17, 18], by using nonparametric kernel independence tests [19] and conditional dependence tests [20, 18]. [sent-39, score-0.783]
25 We begin our presentation in Section 2 with a definition of interaction measures, these being the signed measures we will embed in an RKHS. [sent-41, score-0.433]
26 We describe a statistic to test mutual independence for more than three variables, and provide a brief overview of the more complex high-order interactions that may be observed when four or more variables are considered. [sent-44, score-0.554]
27 2 Interaction Measure An interaction measure [21, 22] associated to a multidimensional probability distribution P of a random vector (X_1, …, X_D) taking values in the product space X_1 × ··· × X_D is a signed measure ∆P that vanishes whenever P can be factorised in a non-trivial way as a product of its (possibly multivariate) marginal distributions. [sent-49, score-0.303]
29 For the cases D = 2, 3 the correct interaction measure coincides with the notion introduced by Lancaster [21] as a formal product ∆_L P = ∏_{i=1}^{D} (P*_{X_i} − P_{X_i}) (1), where each product ∏_{j=1}^{D'} P*_{X_{i_j}} signifies the joint probability distribution P_{X_{i_1}···X_{i_{D'}}} of a subvector (X_{i_1}, …, X_{i_{D'}}). [sent-50, score-0.36]
30 We will term the signed measure in (1) the Lancaster interaction measure. [sent-54, score-0.404]
31 For D = 3, expanding (1) gives ∆_L P = P_{XYZ} − P_{XY} P_Z − P_{XZ} P_Y − P_{YZ} P_X + 2 P_X P_Y P_Z (2), and any non-trivial factorization of the joint distribution implies that this measure vanishes: (X, Y) ⊥⊥ Z ∨ (X, Z) ⊥⊥ Y ∨ (Y, Z) ⊥⊥ X ⟹ ∆_L P = 0 (3). For D > 3, however, (1) does not capture all possible factorizations of the joint distribution, e.g., P = P_{X_1 X_2} P_{X_3 X_4} for D = 4. [sent-57, score-0.078]
32 In this work, however, we will focus on the case of three variables and formulate interaction tests based on embedding of (2) into an RKHS. [sent-64, score-0.458]
33 The implication (3) states that the presence of Lancaster interaction rules out the possibility of any factorization of the joint distribution, but the converse is not generally true; see Appendix C for details. [sent-65, score-0.265]
34 In addition, it is important to note the distinction between the absence of Lancaster interaction and the total (mutual) independence of (X, Y, Z), i.e., P_{XYZ} = P_X P_Y P_Z. [sent-66, score-0.475]
35 While total independence implies the absence of Lancaster interaction, the signed measure ∆_tot P = P_{XYZ} − P_X P_Y P_Z associated to the total (mutual) independence of (X, Y, Z) does not vanish if, e.g., (X, Y) ⊥⊥ Z but X and Y are mutually dependent. [sent-69, score-0.757]
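Both measures can be evaluated exactly for discrete distributions, which makes the distinction concrete. A minimal sketch (our own illustration, with a hypothetical 2×2×2 pmf), using the D = 3 expansion (2):

```python
import numpy as np

def lancaster_measure(P):
    """Delta_L P = P_XYZ - P_XY P_Z - P_XZ P_Y - P_YZ P_X + 2 P_X P_Y P_Z
    for a discrete joint pmf P of shape (|X|, |Y|, |Z|)."""
    Pxy, Pxz, Pyz = P.sum(2), P.sum(1), P.sum(0)
    Px, Py, Pz = P.sum((1, 2)), P.sum((0, 2)), P.sum((0, 1))
    return (P
            - Pxy[:, :, None] * Pz[None, None, :]
            - Pxz[:, None, :] * Py[None, :, None]
            - Pyz[None, :, :] * Px[:, None, None]
            + 2 * Px[:, None, None] * Py[None, :, None] * Pz[None, None, :])

# A joint where X and Y are dependent, but the pair (X, Y) is independent of Z.
Pxy = np.array([[0.4, 0.1], [0.1, 0.4]])   # X, Y dependent
Pz = np.array([0.3, 0.7])
P = Pxy[:, :, None] * Pz[None, None, :]    # (X, Y) jointly independent of Z

print(np.abs(lancaster_measure(P)).max())  # ~0: implication (3) in action

# The total independence measure does NOT vanish here, since X and Y interact.
Px, Py = P.sum((1, 2)), P.sum((0, 2))
Dtot = P - Px[:, None, None] * Py[None, :, None] * Pz[None, None, :]
print(np.abs(Dtot).max())                  # > 0
```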
36 In this contribution, we construct the non-parametric test for the hypothesis ∆L P = 0 (no Lancaster interaction), as well as the non-parametric test for the hypothesis ∆tot P = 0 (total independence), based on the embeddings of the corresponding signed measures ∆L P and ∆tot P into an RKHS. [sent-72, score-0.567]
37 Both tests are particularly suited to the cases where X, Y and Z take values in a high-dimensional space and, moreover, they remain valid for a variety of non-Euclidean and structured domains, i.e., on any domain equipped with a suitable (ISPD) kernel. [sent-73, score-0.163]
38 In the case of total independence testing, our approach can be viewed as a generalization of the tests proposed in [24] based on the empirical characteristic functions. [sent-76, score-0.504]
39 3 Kernel Embeddings We review the embedding of signed measures to a reproducing kernel Hilbert space. [sent-77, score-0.494]
40 The RKHS norms of such embeddings will then serve as our test statistics. [sent-78, score-0.147]
41 For every symmetric, positive definite function (henceforth kernel) k : Z × Z → R, there is an associated reproducing kernel Hilbert space (RKHS) H_k of real-valued functions on Z with reproducing kernel k. [sent-81, score-0.374]
42 Denote by M(Z) the Banach space of all finite signed Borel measures on Z. [sent-83, score-0.234]
43 The notion of a feature map can then be extended to kernel embeddings of elements of M(Z) [25, Chapter 4]. [sent-84, score-0.196]
44 (Kernel embedding) Let k be a kernel on Z, and ν ∈ M(Z). [sent-86, score-0.125]
45 The kernel embedding of ν into the RKHS H_k is µ_k(ν) ∈ H_k such that ∫ f(z) dν(z) = ⟨f, µ_k(ν)⟩_{H_k} for all f ∈ H_k. [sent-87, score-0.198]
46 Alternatively, the kernel embedding can be defined by the Bochner integral µ_k(ν) = ∫ k(·, z) dν(z). [sent-88, score-0.198]
47 If a measurable kernel k is a bounded function, it is straightforward to show using the Riesz representation theorem that µk (ν) exists for all ν ∈ M(Z). [sent-89, score-0.125]
48 For many interesting bounded kernels k, including the Gaussian, Laplacian and inverse multiquadratics, the embedding µ_k : M(Z) → H_k is injective. [sent-90, score-0.126]
49 A related but weaker notion is that of a characteristic kernel [20, 27], which requires the kernel embedding to be injective only on the set M_+^1(Z) of probability measures. [sent-93, score-0.41]
50 In the case that k is ISPD, since H_k is a Hilbert space, we can introduce a notion of an inner product between two signed measures ν, ν′ ∈ M(Z): ⟨ν, ν′⟩_k := ⟨µ_k(ν), µ_k(ν′)⟩_{H_k} = ∬ k(z, z′) dν(z) dν′(z′). [sent-94, score-0.295]
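For empirical measures, this inner product reduces to a weighted double sum over a kernel matrix. A minimal sketch, assuming a Gaussian kernel (the kernel choice and function names are ours, for illustration only):

```python
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def inner(Z1, w1, Z2, w2, sigma=1.0):
    """<nu, nu'>_k for discrete signed measures nu = sum_i w1[i] delta_{Z1[i]}
    and nu' = sum_j w2[j] delta_{Z2[j]}: a weighted double sum of kernel values."""
    return w1 @ gauss_kernel(Z1, Z2, sigma) @ w2

rng = np.random.default_rng(0)
Zp = rng.normal(size=(200, 3))          # sample from P
Zq = rng.normal(size=(200, 3)) + 0.5    # sample from Q
wp = np.full(200, 1 / 200)
wq = np.full(200, 1 / 200)

# Squared MMD ||P - Q||_k^2 = <P,P> - 2<P,Q> + <Q,Q> on the empirical measures
mmd2 = inner(Zp, wp, Zp, wp) - 2 * inner(Zp, wp, Zq, wq) + inner(Zq, wq, Zq, wq)
print(mmd2)
```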
51 This fact has been used extensively in the literature to formulate: (a) a nonparametric two-sample test based on estimation of the maximum mean discrepancy, i.e., ‖P − Q‖_k for samples from P and Q [28], and (b) a nonparametric independence test based on estimation of ‖P_{XY} − P_X P_Y‖_{k⊗l} for a joint sample {(X_i, Y_i)}_{i=1}^n ∼ P_{XY} [19] (the latter is also called the Hilbert-Schmidt independence criterion), with kernel k ⊗ l on the product space defined as k(x, x′) l(y, y′). [sent-105, score-0.503]
53 When a bounded characteristic kernel is used, the above tests are consistent against all alternatives, and their alternative interpretation is as a generalization [29, 3] of energy distance [30, 31] and distance covariance [2, 32]. [sent-106, score-0.353]
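A minimal sketch of the biased (V-statistic) HSIC estimate of ‖P_{XY} − P_X P_Y‖²_{k⊗l}, using the standard identity (1/n²) tr(K H L H) with the centering matrix H = I − (1/n) 1 1ᵀ (again, the Gaussian kernel is our choice for illustration):

```python
import numpy as np

def gauss_gram(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic_v(X, Y, sigma=1.0):
    """Biased V-statistic estimate of ||P_XY - P_X P_Y||^2 in the tensor RKHS."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = gauss_gram(X, sigma), gauss_gram(Y, sigma)
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
Y = X + 0.5 * rng.normal(size=(300, 2))      # dependent pair
print(hsic_v(X, Y))                          # noticeably larger than ...
print(hsic_v(X, rng.normal(size=(300, 2))))  # ... the independent case
```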
54 When the kernel k is unbounded, one can still study embeddings of the signed measures in M_k^{1/2}(Z) ⊂ M(Z), i.e., those satisfying the finite moment condition ∫ k^{1/2}(z, z) d|ν|(z) < ∞. [sent-108, score-0.071]
55 Using the same arguments as in the tests of [28, 19], these tests are also consistent against all alternatives as long as ISPD kernels are used. [sent-112, score-0.379]
56 4.1 Two-Variable (Independence) Test We provide a short overview of the kernel independence test of [19], which we write as the RKHS norm of the embedding of a signed measure. [sent-119, score-0.677]
57 It will help define how to proceed when a third variable is introduced, and the signed measures become more involved. [sent-121, score-0.267]
58 Each of the terms above corresponds to an estimate of an inner product ⟨ν, ν′⟩_{k⊗l} for probability measures ν and ν′ taking values in {P_{XY}, P_X P_Y}, as summarized in Table 1. [sent-129, score-0.12]
59 When three variables are present, a two-variable test already allows us to determine whether, for instance, (X, Y) ⊥⊥ Z, i.e., whether P_{XYZ} = P_{XY} P_Z. [sent-134, score-0.099]
60 It is sufficient to treat (X, Y ) as a single variable on the product space X × Y, with the product kernel k ⊗ l. [sent-137, score-0.195]
61 The corresponding V-statistic is (1/n²) (K ∘ L ∘ M̃)_{++}, where M̃ = HMH, H = I − (1/n) 1 1ᵀ, and (·)_{++} denotes summation over all entries; what is not obvious, however, is whether a V-statistic for the Lancaster interaction (which can be thought of as a surrogate for the composite hypothesis of various factorizations) can be obtained in a similar form. [sent-139, score-0.254]
62 Again, it is easy to see that these can be expressed as certain expectations of kernel functions, and thereby can be calculated by an appropriate manipulation of the three Gram matrices. [sent-145, score-0.125]
63 Based on the individual RKHS inner product estimators, we can now easily derive estimators for various signed measures arising as linear combinations of PXY Z , PXY PZ , and so on. [sent-148, score-0.295]
64 The first such measure is an “incomplete” Lancaster interaction measure ∆^{(Z)} P = P_{XYZ} + P_X P_Y P_Z − P_{YZ} P_X − P_{XZ} P_Y, which vanishes if (Y, Z) ⊥⊥ X or (X, Z) ⊥⊥ Y, but not necessarily if (X, Y) ⊥⊥ Z. [sent-149, score-0.287]
65 In particular, one requires centering of all three kernel matrices to perform a “complete” Lancaster interaction test, as given by the following Proposition. [sent-155, score-0.357]
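The centering discussion above leads to the V-statistic (1/n²) (K̃ ∘ L̃ ∘ M̃)_{++}, with all three Gram matrices doubly centered, K̃ = HKH and so on. A minimal sketch (Gaussian kernels and the toy data are our own choices):

```python
import numpy as np

def gauss_gram(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def lancaster_v(X, Y, Z, sigma=1.0):
    """V-statistic for the Lancaster interaction:
    (1/n^2) * sum_ab (HKH o HLH o HMH)_ab, with H = I - (1/n) 1 1^T."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ gauss_gram(X, sigma) @ H
    Lc = H @ gauss_gram(Y, sigma) @ H
    Mc = H @ gauss_gram(Z, sigma) @ H
    return (Kc * Lc * Mc).sum() / n ** 2

rng = np.random.default_rng(2)
n = 200
X = rng.choice([-1.0, 1.0], size=(n, 1))
Y = rng.choice([-1.0, 1.0], size=(n, 1))
Z = X * Y + 0.1 * rng.normal(size=(n, 1))  # pairwise-independent causes, joint effect
print(lancaster_v(X, Y, Z))
```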
66 As we will demonstrate in the experiments in Section 5, while particularly useful for testing the factorization hypothesis, i.e., (X, Y) ⊥⊥ Z ∨ (X, Z) ⊥⊥ Y ∨ (Y, Z) ⊥⊥ X, the Lancaster statistic can also be used to construct a test for total independence. [sent-160, score-0.08]
67 The null distribution under each of these hypotheses can be estimated using a standard permutation-based approach described in Appendix D. [sent-168, score-0.101]
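We do not reproduce the details of Appendix D here; the following sketch shows one standard permutation scheme, simulating the sub-hypothesis (X, Y) ⊥⊥ Z by independently permuting the Z sample (the function name and the add-one correction are our choices):

```python
import numpy as np

def permutation_pvalue(stat_fn, X, Y, Z, n_perm=500, seed=0):
    """p-value for H0: (X, Y) independent of Z, by permuting the Z sample.
    stat_fn(X, Y, Z) returns a scalar statistic, e.g. lancaster_v above."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(X, Y, Z)
    null_stats = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(Z.shape[0])
        null_stats[i] = stat_fn(X, Y, Z[perm])  # breaks any (X, Y)-Z coupling
    # add-one correction keeps the p-value valid and strictly positive
    return (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
```

For example, `permutation_pvalue(lancaster_v, X, Y, Z)` returns a small p-value on the V-structure data generated above.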
68 Another way to obtain the Lancaster interaction statistic is as the RKHS norm of the joint “central moment” ΣXY Z = EXY Z [(kX − µX ) ⊗ (lY − µY ) ⊗ (mZ − µZ )] of RKHS-valued random variables kX , lY and mZ (understood as an element of the tensor RKHS Hk ⊗ Hl ⊗ Hm ). [sent-169, score-0.377]
69 This is related to a classical characterization of the Lancaster interaction [21, Ch. XII]: there is no Lancaster interaction between X, Y and Z if and only if cov[f(X), g(Y), h(Z)] = 0 for all L² functions f, g and h. [sent-170, score-0.199]
71 And finally, we give an estimator of the RKHS norm of the total independence measure ∆tot P . [sent-174, score-0.306]
72 While the test statistic for total independence has a somewhat more complicated form than that of Lancaster interaction, it can also be computed in quadratic time. [sent-179, score-0.468]
73 4.3 Interaction for D > 3 Streitberg's correction of the interaction measure for D > 3 has the form ∆_S P = Σ_π (−1)^{|π|−1} (|π| − 1)! J_π P, where the sum is taken over all partitions π of the set {1, …, D}, and the operator J_π maps the probability measure P to the product measure P_π = ∏_{j=1}^{|π|} P_{π_j}, where P_{π_j} is the marginal distribution of the subvector (X_i : i ∈ π_j). [sent-188, score-0.117]
75 While the Lancaster interaction has an interpretation in terms of joint central moments, Streitberg's correction corresponds to joint cumulants [22, Section 4]. [sent-190, score-0.408]
76 In particular, the joint central moment E_{X_1···X_n}[(k^{(1)}_{X_1} − µ_{X_1}) ⊗ ··· ⊗ (k^{(n)}_{X_n} − µ_{X_n})] does not capture the correct notion of the interaction measure for D > 3. [sent-194, score-0.199]
77 This can be viewed by analogy with the scalar case, where it is well known that the second and third cumulants coincide with the second and third central moments, whereas the higher order cumulants are neither moments nor central moments, but some other polynomials of the moments. [sent-196, score-0.292]
78 4.4 Total independence for D > 3 In general, the test statistic for total independence in the D-variable case is ‖P̂_{X_{1:D}} − ∏_{i=1}^{D} P̂_{X_i}‖²_{⊗_{i=1}^{D} k^{(i)}} = (1/n²) Σ_{a=1}^{n} Σ_{b=1}^{n} ∏_{i=1}^{D} K^{(i)}_{ab} − (2/n^{D+1}) Σ_{a=1}^{n} ∏_{i=1}^{D} Σ_{b=1}^{n} K^{(i)}_{ab} + (1/n^{2D}) ∏_{i=1}^{D} Σ_{a=1}^{n} Σ_{b=1}^{n} K^{(i)}_{ab}. [sent-198, score-0.696]
79 A similar statistic for total independence is discussed in [24], where testing of total independence based on empirical characteristic functions is considered. [sent-199, score-0.786]
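The three terms of this statistic map directly onto array operations over the per-variable Gram matrices; a minimal sketch (function name and usage are ours, assuming the Gram matrices K^{(i)} are precomputed):

```python
import numpy as np

def total_independence_v(grams):
    """V-statistic || P_hat_{X_1:D} - prod_i P_hat_{X_i} ||^2 from a list of
    D Gram matrices K^(1), ..., K^(D), each of shape (n, n)."""
    K = np.stack(grams)                    # shape (D, n, n)
    D, n, _ = K.shape
    term1 = K.prod(axis=0).sum() / n ** 2                       # sum_ab prod_i K_ab
    term2 = 2 * K.sum(axis=2).prod(axis=0).sum() / n ** (D + 1) # sum_a prod_i sum_b K_ab
    term3 = K.sum(axis=(1, 2)).prod() / n ** (2 * D)            # prod_i sum_ab K_ab
    return term1 - term2 + term3

rng = np.random.default_rng(3)
samples = [rng.normal(size=(100, 1)) for _ in range(4)]
grams = [np.exp(-((S[:, None] - S[None, :]) ** 2).sum(-1) / 2) for S in samples]
print(total_independence_v(grams))  # near 0 for mutually independent samples
```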
80 Our test has a direct interpretation in terms of characteristic functions as well, which is straightforward to see in the case of translation-invariant kernels on Euclidean spaces, using their Bochner representation, as in [27, Corollary 4]. [sent-200, score-0.194]
[Figure 1: Marginal independence tests on Dataset A and Dataset B; vertical axis: null acceptance rate (Type II error).] [sent-201, score-0.495]
In this case, (X, Y, Z) is clearly a pairwise independent but mutually dependent triplet. [sent-228, score-0.113]
83 The mutual dependence becomes increasingly difficult to detect as the dimensionality p increases. [sent-229, score-0.117]
[Figure 3: Factorization hypothesis: Lancaster statistic vs. a two-variable based test, and the conditional independence test for X ⊥⊥ Y | Z from [18]; horizontal axes: dimension (1 to 19).] [sent-252, score-0.116]
In all cases, we use permutation tests as described in Appendix D. [sent-253, score-0.163]
86 As expected, in dataset B, we see that dependence of Z on pair (X, Y ) is somewhat easier to detect than on X individually with two-variable tests. [sent-257, score-0.122]
87 In both datasets, however, the Lancaster interaction appears significantly more sensitive in detecting this dependence as the dimensionality p increases. [sent-258, score-0.274]
88 Figure 2 plots the Type II error of total independence tests with statistics ‖∆̂_L P‖²_{k⊗l⊗m} and ‖∆̂_tot P‖²_{k⊗l⊗m}. [sent-259, score-0.439]
89 The Lancaster statistic outperforms the total independence statistic everywhere apart from Dataset B when the number of dimensions is small (between 1 and 5). [sent-260, score-0.508]
90 Finally, we test the factorization hypothesis, i.e., test for (X, Y) ⊥⊥ Z ∨ (X, Z) ⊥⊥ Y ∨ (Y, Z) ⊥⊥ X, with the Lancaster statistic with Holm-Bonferroni correction as described in Appendix D, as well as with the two-variable based test (which performs three standard two-variable tests and applies the Holm-Bonferroni correction). [sent-263, score-0.463]
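For reference, the Holm-Bonferroni step-down procedure over the three sub-hypothesis p-values is a standard construction; a sketch of that step (not taken from Appendix D):

```python
import numpy as np

def holm_bonferroni(pvalues, alpha=0.05):
    """Boolean rejection decision per hypothesis (step-down Holm procedure)."""
    p = np.asarray(pvalues)
    order = np.argsort(p)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (m - rank):   # thresholds alpha/m, alpha/(m-1), ...
            reject[idx] = True
        else:
            break                          # stop at the first non-rejection
    return reject

# e.g. p-values for (X,Y) ind. Z, (X,Z) ind. Y, (Y,Z) ind. X
print(holm_bonferroni([0.004, 0.03, 0.20]))  # [ True False False]
```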
91 We also plot the Type II error for the conditional independence test for X ⊥⊥ Y | Z from [18]. [sent-264, score-0.304]
92 Under the assumption that X ⊥⊥ Y (correct on both datasets), the negation of each of these three hypotheses is equivalent to the presence of the V-structure X → Z ← Y, so rejection of the null can be viewed as a V-structure detection procedure. [sent-265, score-0.101]
93 As dimensionality increases, the Lancaster statistic appears significantly more sensitive to the interactions present than the competing approaches, which is particularly pronounced in Dataset A. [sent-266, score-0.181]
94 6 Conclusions We have constructed permutation-based nonparametric tests for three-variable interactions, including the Lancaster interaction and total independence. [sent-267, score-0.447]
95 The tests can be used in datasets where only higher-order interactions persist, i.e., where the variables are pairwise independent, as well as in cases where joint dependence may be easier to detect than pairwise dependence, for instance when the effect of two variables on a third is not additive. [sent-268, score-0.228]
97 The flexibility of the framework of RKHS embeddings of signed measures allows us to consider variables that are themselves multidimensional. [sent-271, score-0.328]
98 While the total independence case readily generalizes to more than three variables, the combinatorial nature of joint cumulants implies that detecting interactions of higher order requires significantly more costly computation. [sent-272, score-0.485]
99 Kernel-based conditional independence test and application in causal discovery. [sent-404, score-0.33]
100 Hilbert space embeddings and metrics on probability measures. [sent-467, score-0.071]
wordName wordTfidf (topN-words)
[('lancaster', 0.474), ('pxy', 0.453), ('py', 0.237), ('independence', 0.228), ('interaction', 0.199), ('tot', 0.185), ('px', 0.183), ('signed', 0.175), ('pz', 0.171), ('tests', 0.163), ('rkhs', 0.13), ('kernel', 0.125), ('statistic', 0.116), ('gretton', 0.107), ('hk', 0.088), ('kab', 0.084), ('fukumizu', 0.082), ('test', 0.076), ('cumulants', 0.074), ('pxi', 0.074), ('embedding', 0.073), ('embeddings', 0.071), ('pxz', 0.067), ('null', 0.066), ('interactions', 0.065), ('characteristic', 0.065), ('ex', 0.065), ('reproducing', 0.062), ('ey', 0.06), ('pairwise', 0.06), ('measures', 0.059), ('sriperumbudur', 0.056), ('hypothesis', 0.055), ('kernels', 0.053), ('testing', 0.053), ('exy', 0.05), ('hlh', 0.05), ('ispd', 0.05), ('streitberg', 0.05), ('hilbert', 0.048), ('total', 0.048), ('mutual', 0.046), ('dependence', 0.044), ('pc', 0.041), ('sejdinovic', 0.041), ('joint', 0.039), ('acceptance', 0.039), ('factorizations', 0.039), ('kely', 0.038), ('nonparametric', 0.037), ('sch', 0.036), ('gram', 0.036), ('hypotheses', 0.035), ('rp', 0.035), ('product', 0.035), ('bius', 0.034), ('hkh', 0.034), ('jyv', 0.034), ('npxy', 0.034), ('skyl', 0.034), ('third', 0.033), ('appendix', 0.033), ('centering', 0.033), ('correction', 0.032), ('mutually', 0.031), ('ip', 0.031), ('detecting', 0.031), ('measure', 0.03), ('triplet', 0.03), ('bochner', 0.03), ('moments', 0.028), ('sz', 0.028), ('vanishes', 0.028), ('coffee', 0.027), ('mz', 0.027), ('factorization', 0.027), ('dataset', 0.027), ('detect', 0.027), ('aij', 0.026), ('inner', 0.026), ('causal', 0.026), ('central', 0.025), ('hsic', 0.024), ('janzing', 0.024), ('ly', 0.024), ('directed', 0.024), ('individually', 0.024), ('proposition', 0.024), ('partitions', 0.024), ('hl', 0.023), ('variables', 0.023), ('canonical', 0.023), ('injective', 0.022), ('subvector', 0.022), ('topological', 0.022), ('dependent', 0.022), ('child', 0.022), ('lkopf', 0.022), ('kl', 0.022), ('expressions', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 9 nips-2013-A Kernel Test for Three-Variable Interactions
Author: Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
Abstract: We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
2 0.19208193 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
Author: Wojciech Zaremba, Arthur Gretton, Matthew Blaschko
Abstract: A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically Normal null distribution: this is by contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice. 1
3 0.18090761 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
Author: Remi Gribonval, Pierre Machart
Abstract: There are two major routes to address linear inverse problems. Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. The contribution of this paper is twofold. First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization. 1
4 0.16836326 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stockl, January Grewe, January Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
5 0.15720347 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
Author: Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf
Abstract: Many applications require the analysis of complex interactions between time series. These interactions can be non-linear and involve vector valued as well as complex data structures such as graphs or strings. Here we provide a general framework for the statistical analysis of these dependencies when random variables are sampled from stationary time-series of arbitrary objects. To achieve this goal, we study the properties of the Kernel Cross-Spectral Density (KCSD) operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series, as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvements in terms of detection errors, as well as the suitability of this approach for testing dependency in complex dynamical systems. This similarity measure enables us to identify different types of interactions in electrophysiological neural time series. 1
6 0.14963304 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
7 0.14530015 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
8 0.096389711 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
9 0.086843491 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
10 0.081064373 156 nips-2013-Learning Kernels Using Local Rademacher Complexity
11 0.080107756 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
12 0.077099971 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
13 0.076283805 5 nips-2013-A Deep Architecture for Matching Short Texts
14 0.072255582 88 nips-2013-Designed Measurements for Vector Count Data
15 0.069888704 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
16 0.069789022 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
17 0.067514107 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
18 0.05684955 224 nips-2013-On the Sample Complexity of Subspace Learning
19 0.055523902 289 nips-2013-Scalable kernels for graphs with continuous attributes
20 0.053056579 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
topicId topicWeight
[(0, 0.14), (1, 0.074), (2, 0.03), (3, 0.051), (4, -0.007), (5, 0.028), (6, 0.034), (7, -0.003), (8, -0.1), (9, 0.025), (10, -0.017), (11, -0.092), (12, -0.117), (13, -0.153), (14, 0.215), (15, 0.101), (16, 0.034), (17, 0.066), (18, -0.193), (19, -0.077), (20, -0.071), (21, -0.021), (22, 0.013), (23, -0.004), (24, -0.017), (25, -0.005), (26, 0.023), (27, 0.031), (28, 0.049), (29, 0.042), (30, 0.183), (31, -0.056), (32, 0.095), (33, -0.015), (34, -0.023), (35, -0.06), (36, 0.042), (37, 0.118), (38, -0.033), (39, -0.004), (40, -0.075), (41, -0.015), (42, 0.022), (43, 0.005), (44, 0.111), (45, -0.021), (46, -0.017), (47, -0.115), (48, -0.082), (49, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.96464527 9 nips-2013-A Kernel Test for Three-Variable Interactions
Author: Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
Abstract: We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
2 0.80226851 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
Author: Wojciech Zaremba, Arthur Gretton, Matthew Blaschko
Abstract: A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically Normal null distribution: this is by contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice. 1
3 0.76842755 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
Author: Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf
Abstract: Many applications require the analysis of complex interactions between time series. These interactions can be non-linear and involve vector valued as well as complex data structures such as graphs or strings. Here we provide a general framework for the statistical analysis of these dependencies when random variables are sampled from stationary time-series of arbitrary objects. To achieve this goal, we study the properties of the Kernel Cross-Spectral Density (KCSD) operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series, as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvements in terms of detection errors, as well as the suitability of this approach for testing dependency in complex dynamical systems. This similarity measure enables us to identify different types of interactions in electrophysiological neural time series. 1
4 0.65637594 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stöckl, Jan Grewe, Jan Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
5 0.60617727 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
Author: Le Song, Bo Dai
Abstract: Kernel embedding of distributions has led to many recent advances in machine learning. However, latent and low rank structures prevalent in real world distributions have rarely been taken into account in this setting. Furthermore, no prior work in kernel embedding literature has addressed the issue of robust embedding when the latent and low rank information are misspecified. In this paper, we propose a hierarchical low rank decomposition of kernels embeddings which can exploit such low rank structures in data while being robust to model misspecification. We also illustrate with empirical evidence that the estimated low rank embeddings lead to improved performance in density estimation. 1
6 0.57407576 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
7 0.5702461 327 nips-2013-The Randomized Dependence Coefficient
8 0.54433143 156 nips-2013-Learning Kernels Using Local Rademacher Complexity
9 0.4961372 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
10 0.42202488 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
11 0.41215393 337 nips-2013-Transportability from Multiple Environments with Limited Experiments
12 0.41201779 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
13 0.39390957 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
14 0.39068148 289 nips-2013-Scalable kernels for graphs with continuous attributes
15 0.39007348 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
16 0.34305462 329 nips-2013-Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections
17 0.34210026 76 nips-2013-Correlated random features for fast semi-supervised learning
18 0.34096932 332 nips-2013-Tracking Time-varying Graphical Structure
19 0.33972636 88 nips-2013-Designed Measurements for Vector Count Data
20 0.33069262 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
topicId topicWeight
[(2, 0.012), (16, 0.023), (19, 0.334), (33, 0.142), (34, 0.089), (36, 0.019), (41, 0.05), (49, 0.024), (56, 0.106), (70, 0.028), (85, 0.027), (89, 0.026), (93, 0.033), (95, 0.012)]
simIndex simValue paperId paperTitle
1 0.79208356 72 nips-2013-Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses
Author: Harish G. Ramaswamy, Shivani Agarwal, Ambuj Tewari
Abstract: The design of convex, calibrated surrogate losses, whose minimization entails consistency with respect to a desired target loss, is an important concept to have emerged in the theory of machine learning in recent years. We give an explicit construction of a convex least-squares type surrogate loss that can be designed to be calibrated for any multiclass learning problem for which the target loss matrix has a low-rank structure; the surrogate loss operates on a surrogate target space of dimension at most the rank of the target loss. We use this result to design convex calibrated surrogates for a variety of subset ranking problems, with target losses including the precision@q, expected rank utility, mean average precision, and pairwise disagreement. 1
same-paper 2 0.7907244 9 nips-2013-A Kernel Test for Three-Variable Interactions
Author: Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
Abstract: We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
3 0.76168501 119 nips-2013-Fast Template Evaluation with Vector Quantization
Author: Mohammad Amin Sadeghi, David Forsyth
Abstract: Applying linear templates is an integral part of many object detection systems and accounts for a significant portion of computation time. We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy. Our method is a combination of approximating scores by vector quantizing feature windows and a number of speedup techniques including cascade. Our procedure allows speed and accuracy to be traded off in two ways: by choosing the number of Vector Quantization levels, and by choosing to rescore windows or not. Our method can be directly plugged into any recognition system that relies on linear templates. We demonstrate our method to speed up the original Exemplar SVM detector [1] by an order of magnitude and Deformable Part models [2] by two orders of magnitude with no loss of accuracy. 1
4 0.62746489 201 nips-2013-Multi-Task Bayesian Optimization
Author: Kevin Swersky, Jasper Snoek, Ryan P. Adams
Abstract: Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost. 1
5 0.61330116 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized Mestimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem. 1
6 0.57858318 173 nips-2013-Least Informative Dimensions
7 0.56630075 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
8 0.56501913 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
9 0.56138134 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
10 0.55619669 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
11 0.54986191 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
12 0.54533535 137 nips-2013-High-Dimensional Gaussian Process Bandits
13 0.53728408 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
14 0.53446329 11 nips-2013-A New Convex Relaxation for Tensor Completion
15 0.53228205 318 nips-2013-Structured Learning via Logistic Regression
16 0.53206789 166 nips-2013-Learning invariant representations and applications to face verification
17 0.53088427 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
18 0.53084999 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
19 0.53032631 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
20 0.52975714 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result