nips nips2007 nips2007-192 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Moulines Eric, Francis R. Bach, Zaïd Harchaoui
Abstract: We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. Asymptotic null distributions under null hypothesis are derived, and consistency against fixed alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial and real datasets is provided. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Testing for Homogeneity with Kernel Fisher Discriminant Analysis Zaïd Harchaoui LTCI, TELECOM ParisTech and CNRS 46, rue Barrault, 75634 Paris cedex 13, France zaid. [sent-1, score-0.141]
2 org Éric Moulines LTCI, TELECOM ParisTech and CNRS 46, rue Barrault, 75634 Paris cedex 13, France eric. [sent-5, score-0.141]
3 fr Abstract We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. [sent-7, score-0.707]
4 Asymptotic null distributions under null hypothesis are derived, and consistency against fixed alternatives is assessed. [sent-8, score-0.786]
5 1 Introduction An important problem in statistics and machine learning consists in testing whether the distributions of two random variables are identical under the alternative that they may differ in some ways. [sent-10, score-0.162]
6 The problem consists in testing the null hypothesis H0 : P1 = P2 against the alternative HA : P1 ≠ P2 . [sent-18, score-0.475]
7 We shall allow the input space X to be quite general, including for example finite-dimensional Euclidean spaces or more sophisticated structures such as strings or graphs (see [17]) arising in applications such as bioinformatics [4]. [sent-20, score-0.208]
8 The most popular procedures are the two-sample Kolmogorov-Smirnov and Cramér-von Mises tests, which have been the standard for addressing these issues (at least when the dimension of the input space is small, and most often when X = R). [sent-22, score-0.099]
9 Although these tests are popular due to their simplicity, they are known to be insensitive to certain characteristics of the distribution, such as densities containing high-frequency components or local features such as bumps. [sent-23, score-0.145]
10 The low power of the traditional density-based statistics can be improved upon using test statistics based on kernel density estimators [2], [1] and wavelet estimators [6]. [sent-24, score-0.468]
11 Recent work [11] has shown that one could use the difference in means in RKHSs in order to consistently test for homogeneity. [sent-25, score-0.088]
12 In this paper, we show that taking into account the covariance structure in the RKHS makes it possible to obtain simple limiting distributions. [sent-26, score-0.144]
13 The paper is organized as follows: in Section 2 and Section 3, we state the main definitions and we construct the test statistics. [sent-27, score-0.088]
14 In Section 4, we give the asymptotic distribution of our test statistic under the null hypothesis, and investigate the consistency and the power of the test for fixed alternatives. [sent-28, score-0.911]
15 In Section 5 we provide experimental evidence of the performance of our test statistic on both artificial and real datasets. [sent-29, score-0.273]
16 2 Mean and covariance in reproducing kernel Hilbert spaces We first highlight the main assumptions we make in the paper on the reproducing kernel, then introduce operator-theoretic tools for working with distributions in infinite-dimensional spaces. [sent-31, score-0.562]
17 2.1 Reproducing kernel Hilbert spaces Let (X, d) be a separable metric space, and denote by X the associated σ-algebra. [sent-33, score-0.276]
18 The Hilbert space H is an RKHS if at each x ∈ X, the point evaluation operator δx : H → R, which maps f ∈ H to f (x) ∈ R, is a bounded linear functional. [sent-36, score-0.209]
19 Note that this is always the case if X is a separable metric space and if the kernel is continuous (see [18]). [sent-40, score-0.228]
20 Throughout this paper, we make the following two assumptions on the kernel: (A1) The kernel k is bounded, that is |k|∞ = sup(x,y)∈X×X k(x, y) < ∞. [sent-41, score-0.178]
21 The asymptotic normality of our test statistics is valid without assumption (A2), while consistency results against fixed alternatives do need (A2). [sent-43, score-0.532]
22 Assumption (A2) is true for translation-invariant kernels [8], and in particular for the Gaussian kernel on Rd [18]. [sent-44, score-0.178]
23 2.2 Mean element and covariance operator We shall need some operator-theoretic tools to define mean elements and covariance operators in RKHS. [sent-46, score-0.728]
24 A linear operator T is said to be bounded if there is a number C such that ‖T f‖H ≤ C ‖f‖H for all f ∈ H. [sent-47, score-0.241]
25 If ∫ k^{1/2}(x, x) P(dx) < ∞, the mean element µP is defined for all functions f ∈ H as the unique element in H satisfying ⟨µP , f⟩H = Pf = ∫ f dP . (1) [sent-50, score-0.426]
26 If furthermore ∫ k(x, x) P(dx) < ∞, then the covariance operator ΣP is defined as the unique linear operator on H satisfying, for all f, g ∈ H, ⟨f, ΣP g⟩H = ∫ (f − Pf)(g − Pg) dP . (2) [sent-51, score-0.866]
27 The operator ΣP is a self-adjoint nonnegative trace-class operator. [sent-53, score-0.242]
28 Given a sample {X1 , . . . , Xn }, the empirical estimates of the mean element and the covariance operator are then defined using empirical moments and lead to: µ̂ = n^{−1} ∑_{i=1}^n k(Xi , ·) , Σ̂ = n^{−1} ∑_{i=1}^n k(Xi , ·) ⊗ k(Xi , ·) − µ̂ ⊗ µ̂ . (3) [sent-58, score-0.591]
29 The operator Σ̂ is a self-adjoint nonnegative trace-class operator. [sent-59, score-0.242]
30 Hence, it can be diagonalized in an orthonormal basis, with a spectrum composed of a strictly decreasing sequence λp > 0 tending to zero and potentially a null space N (Σ) composed of functions f in H such that ∫ {f − Pf}² dP = 0 [5]. [sent-60, score-0.397]
31 The null space may be reduced to the null element (in particular for the Gaussian kernel), or may be infinite-dimensional. [sent-63, score-0.563]
32 Similarly, there may be infinitely many strictly positive eigenvalues (true nonparametric case) or finitely many (underlying finite dimensional problems). [sent-64, score-0.231]
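For concreteness, here is a minimal sketch (not code from the paper) of these empirical quantities, assuming a Gaussian RBF kernel; the helper names are ours, and the only fact used is that the nonzero eigenvalues of Σ̂ coincide with those of the doubly centered Gram matrix.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); bounded, so (A1) holds.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def mean_element(X, t, kernel=gaussian_kernel):
    # Empirical mean element of Eq. (3) evaluated at a point t: mu_hat(t) = (1/n) sum_i k(X_i, t).
    return float(np.mean([kernel(xi, t) for xi in X]))

def covariance_spectrum(X, kernel=gaussian_kernel):
    # The nonzero eigenvalues of the empirical covariance operator Sigma_hat of Eq. (3)
    # coincide with those of the doubly centered Gram matrix (1/n) H K H, H = I - (1/n) 1 1^T.
    n = len(X)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    H = np.eye(n) - np.ones((n, n)) / n
    eigvals = np.linalg.eigvalsh(H @ K @ H / n)
    return np.clip(eigvals, 0.0, None)[::-1]  # decreasing order, small negatives clipped to zero
```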
33 3 KFDA-based test statistic In the feature space, the two-sample homogeneity test procedure can be formulated as follows. [sent-65, score-0.558]
34 Denote by ΣW = (n1 /n)Σ1 + (n2 /n)Σ2 the pooled covariance operator, where n = n1 + n2 , corresponding to the within-class covariance matrix in the finite-dimensional setting (see [14]). [sent-74, score-0.949]
35 Let us denote by ΣB = (n1 n2 /n²)(µ2 − µ1 ) ⊗ (µ2 − µ1 ) the between-class covariance operator. [sent-75, score-0.144]
36 For a = 1, 2, denote by (µ̂a , Σ̂a ) respectively the empirical estimates of the mean element and the covariance operator, defined as previously stated in (3). [sent-76, score-0.595]
37 Denote by Σ̂W = (n1 /n)Σ̂1 + (n2 /n)Σ̂2 the empirical pooled covariance estimator, and Σ̂B = (n1 n2 /n²)(µ̂2 − µ̂1 ) ⊗ (µ̂2 − µ̂1 ) the empirical between-class covariance operator. [sent-77, score-0.679]
38 Let {γn }n≥0 be a sequence of strictly positive numbers. [sent-78, score-0.111]
39 The maximum Fisher discriminant ratio serves as a basis of our test statistics: (n1 n2 /n) ‖(Σ̂W + γn I)^{−1/2} δ̂‖²_H = n max_{f∈H} ⟨f, Σ̂B f⟩H / ⟨f, (Σ̂W + γn I) f⟩H , (4) where I denotes the identity operator and δ̂ = µ̂2 − µ̂1 . [sent-79, score-0.202]
40 When X = R^d , the kernel is linear k(x, y) = x⊤y, and γn = 0, this quantity matches the so-called Hotelling T² statistic in the two-sample case [15]. [sent-82, score-0.363]
41 Moreover, in practice it may be computed thanks to the kernel trick, adapted to the kernel Fisher discriminant analysis and outlined in [17, Chapter 6]. [sent-83, score-0.47]
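As an illustration only, the regularized quantity of (4) is straightforward to compute when one works with an explicit finite-dimensional feature map (e.g., random Fourier features) instead of the Gram-matrix formulation of [17]; the following sketch uses hypothetical names and is not the paper's implementation.

```python
import numpy as np

def kfda_statistic(Phi1, Phi2, gamma):
    """Regularized statistic (n1 n2 / n) * ||(Sigma_W + gamma I)^(-1/2) delta||^2 of Eq. (4),
    sketched with an explicit finite-dimensional feature map (rows of Phi1, Phi2 are feature
    vectors phi(X_i)) rather than the Gram-matrix / kernel-trick route of [17, Chapter 6]."""
    Phi1, Phi2 = np.asarray(Phi1, float), np.asarray(Phi2, float)
    n1, n2 = Phi1.shape[0], Phi2.shape[0]
    n = n1 + n2
    delta = Phi2.mean(axis=0) - Phi1.mean(axis=0)          # difference of mean elements
    S1 = np.cov(Phi1, rowvar=False, bias=True)             # empirical covariances (1/n_a norm.)
    S2 = np.cov(Phi2, rowvar=False, bias=True)
    Sw = (n1 / n) * S1 + (n2 / n) * S2                     # pooled within-class covariance
    reg = Sw + gamma * np.eye(Sw.shape[0])
    # delta^T (Sw + gamma I)^(-1) delta  ==  ||(Sw + gamma I)^(-1/2) delta||^2
    return (n1 * n2 / n) * float(delta @ np.linalg.solve(reg, delta))
```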
42 We shall make the following assumptions, respectively, on Σ1 and Σ2 : (B1) For u = 1, 2, the eigenvalues {λp (Σu )}p≥1 satisfy ∑_{p=1}^∞ λp^{1/2}(Σu ) < ∞. [sent-84, score-0.24]
43 (B2) For u = 1, 2, there are infinitely many strictly positive eigenvalues {λp (Σu )}p≥1 of Σu . [sent-85, score-0.168]
44 These roles, recentering and rescaling, will be played respectively by d1 (ΣW , γ) and d2 (ΣW , γ), where for a given compact operator Σ with decreasing eigenvalues λp (Σ), the quantity dr (Σ, γ) is defined for all r ≥ 1 as dr (Σ, γ) = ( ∑_{p=1}^∞ (λp + γ)^{−r} λp^r )^{1/r} . (5) [sent-87, score-0.755]
45 4 Theoretical results We consider in the sequel the following studentized test statistic: Tn (γn ) = { (n1 n2 /n) ‖(Σ̂W + γn I)^{−1/2} δ̂‖²_H − d1 (Σ̂W , γn ) } / { √2 d2 (Σ̂W , γn ) } . (6) [sent-88, score-0.128]
46 In this paper, we first consider the asymptotic behavior of Tn under the null hypothesis, and then against a fixed alternative. [sent-89, score-0.396]
47 This will establish that our nonparametric test procedure is consistent in power. [sent-90, score-0.216]
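Assuming the eigenvalues of the pooled covariance operator are available (estimated, or exact as in Section 5.1), the constants of (5) and the studentization of (6) could be computed along the following lines (a sketch, with names of our own choosing).

```python
import numpy as np

def d_r(eigvals, gamma, r):
    # d_r(Sigma, gamma) = ( sum_p (lambda_p + gamma)^(-r) lambda_p^r )^(1/r), Eq. (5).
    lam = np.asarray(eigvals, dtype=float)
    return float(np.sum((lam / (lam + gamma)) ** r) ** (1.0 / r))

def studentize(raw_stat, eigvals, gamma):
    # Tn(gamma) = ( raw KFDA statistic - d1(Sigma_W, gamma) ) / ( sqrt(2) d2(Sigma_W, gamma) ),
    # Eq. (6); eigvals are the (estimated or exact) eigenvalues of the pooled covariance.
    d1 = d_r(eigvals, gamma, r=1)
    d2 = d_r(eigvals, gamma, r=2)
    return (raw_stat - d1) / (np.sqrt(2.0) * d2)
```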
48 4.1 Asymptotic normality under the null hypothesis In this section, we derive the distribution of the test statistic under the null hypothesis H0 : P1 = P2 of homogeneity. [sent-92, score-1.096]
49 Under the assumptions of Theorem 1, the sequence of tests that rejects the null hypothesis when Tn (γn ) ≥ z1−α , where z1−α is the (1 − α)-quantile of the standard normal distribution, is asymptotically level α. [sent-98, score-0.581]
50 Note that the limiting distribution depends neither on the kernel nor on the regularization parameter. [sent-99, score-0.178]
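Since the limit is standard normal and free of the kernel and of γn, the asymptotic level-α decision rule of the corollary reduces to a one-sided comparison with a normal quantile, as in this one-line sketch (the function name is ours):

```python
from scipy.stats import norm

def kfda_reject(Tn, alpha=0.05):
    # Reject H0: P1 = P2 at asymptotic level alpha when Tn >= z_{1-alpha},
    # the (1 - alpha)-quantile of the standard normal distribution.
    return Tn >= norm.ppf(1.0 - alpha)
```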
51 4.2 Power consistency We study the power of the test based on Tn (γn ) under alternative hypotheses. [sent-101, score-0.242]
52 The minimal requirement is to prove that this sequence of tests is consistent in power. [sent-102, score-0.181]
53 A sequence of tests of constant level α is said to be consistent in power if the probability of accepting the null hypothesis of homogeneity goes to zero as the sample size goes to infinity under a fixed alternative. [sent-103, score-0.986]
54 The following proposition shows that the limit is finite, strictly positive and independent of the kernel otherwise (see [8] for similar results for canonical correlation analysis). [sent-104, score-0.294]
55 the population counterpart of ‖(ΣW + γn I)^{−1/2} δ‖²_H , upon which our test statistic is based. [sent-107, score-0.147]
56 5 Experiments In this section, we investigate the experimental performance of our KFDA test statistic, and compare it in terms of power against other nonparametric test statistics. [sent-121, score-0.518]
57 5.1 Artificial data We shall focus here on a particularly simple setting, in order to analyze the major issues arising when applying our approach in practice. [sent-123, score-0.126]
58 Indeed, we consider the periodic smoothing spline kernel (see [19] for a detailed derivation), for which explicit formulae are available for the eigenvalues of the corresponding covariance operator when the underlying distribution is uniform. [sent-124, score-0.297]
59 Table 1: Evolution of power of KFDA and MMD respectively, as γ goes to 0. [sent-137, score-0.135]
61 This allows us to alleviate the issue of estimating the spectrum of the covariance operator, and weigh up the practical impact of the regularization on the power of our test statistic. [sent-139, score-0.361]
62 Periodic smoothing spline kernel Consider X as the two-dimensional circle identified with the interval [0, 1] (with periodicity conditions). [sent-140, score-0.252]
63 We consider the strictly positive sequence Kν = (2πν)^{−2m} and the following norm: ‖f‖²_H = ⟨f, c0 ⟩²/K0 + ∑_{ν>0} ( ⟨f, cν ⟩² + ⟨f, sν ⟩² )/Kν , where cν (t) = √2 cos 2πνt and sν (t) = √2 sin 2πνt for ν ≥ 1 and c0 (t) = 1X . [sent-141, score-0.111]
64 This is always an RKHS norm, associated with the following kernel: K(s, t) = (−1)^{m−1}/(2m)! · B2m ((s − t) − ⌊s − t⌋) . [sent-142, score-0.178]
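A small sketch of this kernel, with the Bernoulli polynomial hard-coded for m = 1 (B2(u) = u² − u + 1/6); the function name is ours.

```python
import math

def periodic_spline_kernel(s, t, m=1):
    """K(s, t) = (-1)^(m-1) / (2m)! * B_{2m}({s - t}), with {x} = x - floor(x).
    Sketch with the Bernoulli polynomial hard-coded for m = 1, B_2(u) = u^2 - u + 1/6;
    higher m would require the general Bernoulli polynomials."""
    if m != 1:
        raise NotImplementedError("only m = 1 is hard-coded in this sketch")
    u = (s - t) - math.floor(s - t)              # fractional part of s - t
    b2 = u * u - u + 1.0 / 6.0                   # Bernoulli polynomial B_2(u)
    return ((-1) ** (m - 1)) / math.factorial(2 * m) * b2
```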
65 We consider the following testing problem: H0 : p1 = p2 against HA : p1 ≠ p2 , with p1 the uniform density [sent-145, score-0.113]
66 (i.e., the density with respect to the Lebesgue measure is equal to c0 ), and densities p2 = p1 (c0 + . [sent-147, score-0.088]
67 The covariance operator Σ(p1 ) has eigenvectors c0 , cν , sν with eigenvalues 0 for c0 and Kν for others. [sent-149, score-0.449]
68 All quantities involving the eigenvalues of the covariance operator were computed from their exact population counterparts instead of being estimated. [sent-152, score-0.449]
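As an assumed illustration of how this can be done, the exact eigenvalues Kν = (2πν)^{−2m} (each carried by both cν and sν, with eigenvalue 0 for c0) can be tabulated from a finite truncation and plugged into the d_r computation sketched earlier.

```python
import numpy as np

def spline_covariance_eigenvalues(m=1, n_terms=10000):
    # Exact spectrum of the covariance operator under the uniform density: eigenvalue 0 for c0,
    # and K_nu = (2 pi nu)^(-2m) carried by both c_nu and s_nu, truncated at n_terms frequencies.
    nu = np.arange(1, n_terms + 1, dtype=float)
    K = (2.0 * np.pi * nu) ** (-2 * m)
    return np.repeat(K, 2)   # each K_nu appears twice (cosine and sine eigenfunctions)

# These exact eigenvalues can then be fed to d_r(., gamma, r) from the earlier sketch,
# so no spectrum estimation is needed when studentizing the statistic in this example.
```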
69 5.2 Speaker verification We conducted experiments in a speaker verification task [3], on a subset of 8 female speakers using data from the NIST 2004 Speaker Recognition Evaluation. [sent-156, score-0.265]
70 For each pair of speakers, at each run we took 3000 samples of each speaker and ran our KFDA test to decide whether the samples came from the same speaker or not, and computed the type II error by comparing the prediction to the ground truth. [sent-159, score-0.525]
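A sketch of this evaluation loop (the two-sample test routine and the data layout are hypothetical placeholders, not the paper's code):

```python
import numpy as np
from itertools import combinations

def empirical_type_II_error(speaker_samples, two_sample_test, n_per_speaker=3000, alpha=0.05):
    """For every pair of distinct speakers, take n_per_speaker samples from each, run a
    two-sample test (two_sample_test(X1, X2, alpha) -> reject?, e.g. a KFDA test), and
    count missed detections; speaker_samples maps a speaker id to an array of features."""
    misses, trials = 0, 0
    for s1, s2 in combinations(sorted(speaker_samples), 2):
        X1 = np.asarray(speaker_samples[s1])[:n_per_speaker]
        X2 = np.asarray(speaker_samples[s2])[:n_per_speaker]
        reject = two_sample_test(X1, X2, alpha)   # H0: both samples come from the same speaker
        misses += int(not reject)                 # type II error: fail to reject although speakers differ
        trials += 1
    return misses / trials
```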
71 The level was set to α = 0.05, since the empirical level seemed to match the prescribed level for this value, as we noticed in the previous subsection. [sent-162, score-0.162]
72 We performed the same experiments for the Maximum Mean Discrepancy and the Tajvidi-Hall test statistic (TH, [13]). [sent-163, score-0.273]
73 Our method reaches good empirical power for a small value of the prescribed level (1 − β = 90% for α = 0.05). [sent-165, score-0.217]
74 6 Conclusion We proposed a well-calibrated test statistic, built on kernel Fisher discriminant analysis, for which we proved that the asymptotic limit distribution under the null hypothesis is the standard normal distribution. [sent-168, score-0.973]
75 Our test statistic can be readily computed from Gram matrices once a kernel is defined, and allows us to perform nonparametric hypothesis testing for homogeneity for high-dimensional data. [sent-169, score-0.451]
76 Figure 1: Comparison of ROC curves (power versus level) in a speaker verification task. [sent-178, score-0.71]
77 The KFDA-test statistic yields competitive performance for speaker identification. [sent-179, score-0.411]
78 7 Sketch of proof of asymptotic normality under the null hypothesis Outline. [sent-180, score-0.743]
79 The proof of the asymptotic normality of the test statistic under the null hypothesis follows four steps. [sent-181, score-0.89]
80 As a first step, we derive an asymptotic approximation of the test statistic as γn + γn^{−1} n^{−1/2} → 0 , where the only remaining stochastic term is δ̂. [sent-182, score-0.292]
81 The test statistic is then expanded onto the eigenbasis of Σ, and decomposed into two terms Bn and Cn . [sent-183, score-0.147]
82 The second step allows us to prove the asymptotic negligibility of Bn , while the third step establishes the asymptotic normality of Cn by a martingale central limit theorem (MCLT). [sent-184, score-0.727]
83 First, we may prove, using perturbation results for covariance operators, that, as γn + γn^{−1} n^{−1/2} → 0 , we have Tn (γn ) = { (n1 n2 /n) ‖(Σ + γI)^{−1/2} δ̂‖²_H − d1 (Σ, γ) } / { √2 d2 (Σ, γ) } + oP (1) . (9) [sent-186, score-0.144]
84 For ease of notation, in the following, we shall often omit Σ in quantities involving it. [sent-187, score-0.092]
85 Define Yn,p,i = (n2 /(n1 n))^{1/2} { ep (Xi^{(1)}) − E[ep (X1^{(1)})] } for 1 ≤ i ≤ n1 , and Yn,p,i = −(n1 /(n2 n))^{1/2} { ep (X_{i−n1}^{(2)}) − E[ep (X1^{(2)})] } for n1 + 1 ≤ i ≤ n . (10) [sent-189, score-0.43]
86 Denote Sn,p = Ãn^{−1} ∑_{i=1}^n Yn,p,i , with An = (n1 n2 /n)^{1/2} . (12) [sent-192, score-0.608]
87 Using (11), our test statistic now writes as Tn = (√2 d2,n )^{−1} { ‖(Σ + γn I)^{−1/2} δ̂‖² − d1,n } = ∑_{p=1}^∞ (λp + γn )^{−1} ( Sn,p² − E[Sn,p²] ) = Bn + 2Cn . (13) [sent-194, score-0.147]
88 where Bn and Cn are defined as Bn = ∑_{p=1}^∞ (λp + γn )^{−1} ∑_{i=1}^n ( Yn,p,i² − E[Yn,p,i²] ) and Cn = ∑_{p=1}^∞ (λp + γn )^{−1} ∑_{i=1}^n Yn,p,i ∑_{j=1}^{i−1} Yn,p,j . (14) [sent-195, score-0.608]
89 Since the variables Yn,p,i and Yn,q,j are independent if i ≠ j, we have Var(Bn ) = ∑_{i=1}^n vn,i , where vn,i = Var( ∑_{p=1}^∞ (λp + γn )^{−1} { Yn,p,i² − E[Yn,p,i²] } ) = ∑_{p,q=1}^∞ (λp + γn )^{−1} (λq + γn )^{−1} Cov(Yn,p,i², Yn,q,i²) . [sent-198, score-0.304]
90 We use the central limit theorem (MCLT) for triangular arrays of martingale differences (see e. [sent-202, score-0.182]
91 For i = 1, . . . , n, denote ξn,i = d_{2,n}^{−1} ∑_{p=1}^∞ (λp + γn )^{−1} Yn,p,i Mn,p,i−1 , where Mn,p,i = ∑_{j=1}^{i} Yn,p,j , (16) and let Fn,i = σ(Yn,p,j , p ∈ {1, . [sent-209, score-0.608]
92 Note that, by construction, ξn,i is a martingale increment. [sent-216, score-0.099]
93 The first step in the proof of the CLT is to establish that sn² = ∑_{i=1}^n E[ ξn,i² | Fn,i−1 ] →P 1/2 . (17) [sent-219, score-0.118]
94 The second step of the proof is to establish the negligibility condition. [sent-220, score-0.189]
95 We will establish the two conditions simultaneously by checking that E[ max_{1≤i≤n} ξn,i² ] = o(1) . [sent-223, score-0.104]
96 Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. [sent-253, score-0.231]
97 Integrating structured biological data by kernel maximum mean discrepancy. [sent-278, score-0.042]
98 Permutation tests for equality of distributions in high-dimensional settings. [sent-334, score-0.131]
99 Feature space Mahalanobis sequence kernels: Application to SVM speaker verification. [sent-353, score-0.265]
100 An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. [sent-366, score-0.396]
wordName wordTfidf (topN-words)
[('def', 0.304), ('null', 0.251), ('speaker', 0.226), ('operator', 0.209), ('homogeneity', 0.197), ('tn', 0.19), ('statistic', 0.185), ('kernel', 0.178), ('hypothesis', 0.153), ('bn', 0.152), ('asymptotic', 0.145), ('covariance', 0.144), ('mmd', 0.142), ('normality', 0.141), ('cn', 0.137), ('fisher', 0.13), ('kfda', 0.123), ('op', 0.118), ('discriminant', 0.114), ('dn', 0.105), ('tests', 0.099), ('martingale', 0.099), ('eigenvalues', 0.096), ('en', 0.095), ('power', 0.094), ('shall', 0.092), ('hilbert', 0.09), ('test', 0.088), ('rue', 0.085), ('rkhs', 0.084), ('reproducing', 0.08), ('ha', 0.079), ('operators', 0.078), ('spline', 0.074), ('strictly', 0.072), ('testing', 0.071), ('mclt', 0.071), ('negligibility', 0.071), ('pf', 0.071), ('veri', 0.068), ('gretton', 0.068), ('establish', 0.065), ('nonparametric', 0.063), ('ep', 0.063), ('barrault', 0.062), ('element', 0.061), ('consistency', 0.06), ('paris', 0.059), ('statistics', 0.059), ('moments', 0.057), ('france', 0.057), ('rasch', 0.056), ('cedex', 0.056), ('proof', 0.053), ('ltci', 0.053), ('paristech', 0.053), ('telecom', 0.053), ('pooled', 0.053), ('respectively', 0.052), ('var', 0.051), ('separable', 0.05), ('prescribed', 0.05), ('cnrs', 0.05), ('hall', 0.049), ('dp', 0.049), ('spaces', 0.048), ('couples', 0.047), ('dr', 0.047), ('densities', 0.046), ('borgwardt', 0.045), ('periodic', 0.045), ('limit', 0.044), ('nitely', 0.043), ('prove', 0.043), ('density', 0.042), ('couple', 0.042), ('discrepancy', 0.042), ('goes', 0.041), ('sequel', 0.04), ('alternatives', 0.039), ('checking', 0.039), ('negligible', 0.039), ('cov', 0.039), ('sequence', 0.039), ('conducted', 0.039), ('theorem', 0.039), ('level', 0.039), ('spectrum', 0.035), ('arising', 0.034), ('bioinformatics', 0.034), ('empirical', 0.034), ('roc', 0.033), ('nonnegative', 0.033), ('said', 0.032), ('nally', 0.032), ('distributions', 0.032), ('moulines', 0.031), ('launched', 0.031), ('francis', 0.031), ('ulm', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
Author: Moulines Eric, Francis R. Bach, Zaïd Harchaoui
Abstract: We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. Asymptotic null distributions under null hypothesis are derived, and consistency against fixed alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial and real datasets is provided. 1
2 0.1825431 7 nips-2007-A Kernel Statistical Test of Independence
Author: Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Schölkopf, Alex J. Smola
Abstract: Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m2 ), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
3 0.11696299 108 nips-2007-Kernel Measures of Conditional Dependence
Author: Kenji Fukumizu, Arthur Gretton, Xiaohai Sun, Bernhard Schölkopf
Abstract: We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments. 1
4 0.10605985 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan
Abstract: We develop and analyze an algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions. Our method is based on a variational characterization of f -divergences, which turns the estimation into a penalized convex risk minimization problem. We present a derivation of our kernel-based estimation algorithm and an analysis of convergence rates for the estimator. Our simulation results demonstrate the convergence behavior of the method, which compares favorably with existing methods in the literature. 1
5 0.093204573 156 nips-2007-Predictive Matrix-Variate t Models
Author: Shenghuo Zhu, Kai Yu, Yihong Gong
Abstract: It is becoming increasingly important to learn from a partially-observed random matrix and predict its missing elements. We assume that the entire matrix is a single sample drawn from a matrix-variate t distribution and suggest a matrixvariate t model (MVTM) to predict those missing elements. We show that MVTM generalizes a range of known probabilistic models, and automatically performs model selection to encourage sparse predictive models. Due to the non-conjugacy of its prior, it is difficult to make predictions by computing the mode or mean of the posterior distribution. We suggest an optimization method that sequentially minimizes a convex upper-bound of the log-likelihood, which is very efficient and scalable. The experiments on a toy data and EachMovie dataset show a good predictive accuracy of the model. 1
6 0.092785239 118 nips-2007-Learning with Transformation Invariant Kernels
7 0.091655344 59 nips-2007-Continuous Time Particle Filtering for fMRI
8 0.090051576 103 nips-2007-Inferring Elapsed Time from Stochastic Neural Processes
9 0.088291116 43 nips-2007-Catching Change-points with Lasso
10 0.085437365 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
11 0.083071418 147 nips-2007-One-Pass Boosting
12 0.08201056 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
13 0.081712119 91 nips-2007-Fitted Q-iteration in continuous action-space MDPs
14 0.080937169 160 nips-2007-Random Features for Large-Scale Kernel Machines
15 0.080571406 195 nips-2007-The Generalized FITC Approximation
16 0.080513604 135 nips-2007-Multi-task Gaussian Process Prediction
17 0.079664886 185 nips-2007-Stable Dual Dynamic Programming
18 0.079365216 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
19 0.078103699 13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l \alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections
20 0.076279663 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
topicId topicWeight
[(0, -0.214), (1, 0.016), (2, -0.06), (3, 0.111), (4, -0.017), (5, 0.056), (6, -0.005), (7, -0.06), (8, -0.206), (9, 0.036), (10, 0.058), (11, -0.041), (12, 0.038), (13, -0.001), (14, -0.057), (15, -0.188), (16, 0.224), (17, 0.034), (18, -0.079), (19, -0.038), (20, 0.129), (21, 0.128), (22, 0.025), (23, -0.145), (24, 0.049), (25, 0.055), (26, 0.113), (27, -0.061), (28, -0.009), (29, -0.004), (30, -0.001), (31, 0.034), (32, 0.106), (33, 0.013), (34, 0.004), (35, -0.027), (36, -0.03), (37, -0.017), (38, 0.142), (39, -0.087), (40, 0.103), (41, -0.035), (42, 0.007), (43, 0.056), (44, 0.049), (45, -0.084), (46, -0.043), (47, 0.128), (48, -0.007), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.97085923 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
Author: Moulines Eric, Francis R. Bach, Zaïd Harchaoui
Abstract: We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. Asymptotic null distributions under null hypothesis are derived, and consistency against fixed alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial and real datasets is provided. 1
2 0.77433532 7 nips-2007-A Kernel Statistical Test of Independence
Author: Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Schölkopf, Alex J. Smola
Abstract: Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m2 ), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
3 0.71239763 108 nips-2007-Kernel Measures of Conditional Dependence
Author: Kenji Fukumizu, Arthur Gretton, Xiaohai Sun, Bernhard Schölkopf
Abstract: We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments. 1
4 0.56803221 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan
Abstract: We develop and analyze an algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions. Our method is based on a variational characterization of f -divergences, which turns the estimation into a penalized convex risk minimization problem. We present a derivation of our kernel-based estimation algorithm and an analysis of convergence rates for the estimator. Our simulation results demonstrate the convergence behavior of the method, which compares favorably with existing methods in the literature. 1
Author: Ping Li, Trevor J. Hastie
Abstract: Many tasks (e.g., clustering) in machine learning only require the lα distances instead of the original data. For dimension reductions in the lα norm (0 < α ≤ 2), the method of stable random projections can efficiently compute the lα distances in massive datasets (e.g., the Web or massive data streams) in one pass of the data. The estimation task for stable random projections has been an interesting topic. We propose a simple estimator based on the fractional power of the samples (projected data), which is surprisingly near-optimal in terms of the asymptotic variance. In fact, it achieves the Cramér-Rao bound when α = 2 and α = 0+. This new result will be useful when applying stable random projections to distance-based clustering, classifications, kernels, massive data streams etc.
6 0.47204798 49 nips-2007-Colored Maximum Variance Unfolding
7 0.45266685 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
8 0.44623768 118 nips-2007-Learning with Transformation Invariant Kernels
9 0.44384599 103 nips-2007-Inferring Elapsed Time from Stochastic Neural Processes
10 0.44352427 101 nips-2007-How SVMs can estimate quantiles and the median
11 0.44088542 184 nips-2007-Stability Bounds for Non-i.i.d. Processes
12 0.42645851 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes
13 0.42565659 156 nips-2007-Predictive Matrix-Variate t Models
14 0.41509485 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
15 0.4136028 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
16 0.40928474 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
17 0.38386607 43 nips-2007-Catching Change-points with Lasso
18 0.38300377 160 nips-2007-Random Features for Large-Scale Kernel Machines
19 0.38101867 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
20 0.37612718 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
topicId topicWeight
[(5, 0.033), (13, 0.026), (16, 0.033), (19, 0.018), (21, 0.07), (34, 0.016), (35, 0.023), (47, 0.063), (49, 0.471), (83, 0.108), (90, 0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.83140743 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
Author: Moulines Eric, Francis R. Bach, Zaïd Harchaoui
Abstract: We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. Asymptotic null distributions under null hypothesis are derived, and consistency against fixed alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial and real datasets is provided. 1
2 0.79920244 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers
Author: Kaihua Zhu, Hao Wang, Hongjie Bai, Jian Li, Zhihuan Qiu, Hang Cui, Edward Y. Chang
Abstract: Support Vector Machines (SVMs) suffer from a widely recognized scalability problem in both memory use and computational time. To improve scalability, we have developed a parallel SVM algorithm (PSVM), which reduces memory use through performing a row-based, approximate matrix factorization, and which loads only essential data to each machine to perform parallel computation. Let n denote the number of training instances, p the reduced matrix dimension after factorization (p is significantly smaller than n), and m the number of machines. PSVM reduces the memory requirement from O(n2 ) to O(np/m), and improves computation time to O(np2 /m). Empirical study shows PSVM to be effective. PSVM Open Source is available for download at http://code.google.com/p/psvm/.
3 0.73632234 210 nips-2007-Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks
Author: Alex Graves, Marcus Liwicki, Horst Bunke, Jürgen Schmidhuber, Santiago Fernández
Abstract: In online handwriting recognition the trajectory of the pen is recorded during writing. Although the trajectory provides a compact and complete representation of the written output, it is hard to transcribe directly, because each letter is spread over many pen locations. Most recognition systems therefore employ sophisticated preprocessing techniques to put the inputs into a more localised form. However these techniques require considerable human effort, and are specific to particular languages and alphabets. This paper describes a system capable of directly transcribing raw online handwriting data. The system consists of an advanced recurrent neural network with an output layer designed for sequence labelling, combined with a probabilistic language model. In experiments on an unconstrained online database, we record excellent results using either raw or preprocessed data, well outperforming a state-of-the-art HMM based system in both cases. 1
4 0.73565775 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
Author: Kristiaan Pelckmans, Johan Suykens, Bart D. Moor
Abstract: This paper1 explores the use of a Maximal Average Margin (MAM) optimality principle for the design of learning algorithms. It is shown that the application of this risk minimization principle results in a class of (computationally) simple learning machines similar to the classical Parzen window classifier. A direct relation with the Rademacher complexities is established, as such facilitating analysis and providing a notion of certainty of prediction. This analysis is related to Support Vector Machines by means of a margin transformation. The power of the MAM principle is illustrated further by application to ordinal regression tasks, resulting in an O(n) algorithm able to process large datasets in reasonable time. 1
5 0.43494317 7 nips-2007-A Kernel Statistical Test of Independence
Author: Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Schölkopf, Alex J. Smola
Abstract: Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m2 ), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
6 0.4203665 108 nips-2007-Kernel Measures of Conditional Dependence
7 0.39577109 43 nips-2007-Catching Change-points with Lasso
8 0.38632527 40 nips-2007-Bundle Methods for Machine Learning
9 0.38369429 185 nips-2007-Stable Dual Dynamic Programming
10 0.3761431 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning
11 0.36617184 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
12 0.36285949 200 nips-2007-The Tradeoffs of Large Scale Learning
13 0.36087412 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
14 0.36000124 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
15 0.35927853 179 nips-2007-SpAM: Sparse Additive Models
16 0.35749099 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI
17 0.35732365 24 nips-2007-An Analysis of Inference with the Universum
18 0.35692453 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
19 0.35640013 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
20 0.3556875 56 nips-2007-Configuration Estimates Improve Pedestrian Finding