nips nips2007 nips2007-108 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kenji Fukumizu, Arthur Gretton, Xiaohai Sun, Bernhard Schölkopf
Abstract: We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. [sent-11, score-0.361]
2 Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. [sent-12, score-0.353]
3 1 Introduction Measuring dependence of random variables is one of the main concerns of statistical inference. [sent-15, score-0.137]
4 A typical example is the inference of a graphical model, which expresses the relations among variables in terms of independence and conditional independence. [sent-16, score-0.175]
5 Independent component analysis employs a measure of independence as the objective function, and feature selection in supervised learning looks for a set of features on which the response variable most depends. [sent-17, score-0.144]
6 Kernel methods have been successfully used for capturing (conditional) dependence of variables [1, 5, 8, 9, 16]. [sent-18, score-0.16]
7 With the ability to represent high order moments, mapping of variables into reproducing kernel Hilbert spaces (RKHSs) allows us to infer properties of the distributions, such as independence and homogeneity [7]. [sent-19, score-0.284]
8 A drawback of previous kernel dependence measures, however, is that their value depends not only on the distribution of the variables, but also on the kernel, in contrast to measures such as mutual information. [sent-20, score-0.334]
9 In this paper, we propose to use the Hilbert-Schmidt norm of the normalized conditional cross-covariance operator, and show that this operator encodes the dependence structure of random variables. [sent-21, score-0.301]
10 Our criterion includes a measure of unconditional dependence as a special case. [sent-22, score-0.138]
11 We prove in the limit of infinite data, under assumptions on the richness of the RKHS, that this measure has an explicit integral expression which depends only on the probability densities of the variables, despite being defined in terms of kernels. [sent-23, score-0.051]
12 Furthermore, we provide a general formulation for the “richness” of an RKHS, and a theoretically motivated kernel selection method. [sent-25, score-0.12]
13 2 Measuring conditional dependence with kernels The probability law of a random variable X is denoted by $P_X$, and the space of square-integrable functions with respect to a probability P by $L^2(P)$. [sent-27, score-0.246]
14 The null space and the range of an operator T are written N (T ) and R(T ), respectively. [sent-29, score-0.131]
15 2.1 Dependence measures with normalized cross-covariance operators Covariance operators on RKHSs have been successfully used for capturing dependence and conditional dependence of random variables, by incorporating higher-order moments [5, 8, 16]. [sent-31, score-0.436]
16 Suppose we have a random variable (X, Y ) on X × Y, and RKHSs HX and HY on X and Y, respectively, with measurable positive definite kernels kX and kY . [sent-33, score-0.119]
17 The cross-covariance operator $\Sigma_{YX} : H_X \to H_Y$ is defined as the unique bounded operator that satisfies $\langle g, \Sigma_{YX} f \rangle_{H_Y} = \mathrm{Cov}[f(X), g(Y)]$ $(= E[f(X)g(Y)] - E[f(X)]\,E[g(Y)])$ (1) for all $f \in H_X$ and $g \in H_Y$. [sent-36, score-0.262]
18 The operator $\Sigma_{YX}$ naturally extends the covariance matrix $C_{YX}$ on Euclidean spaces, and represents higher-order correlations of X and Y through f(X) and g(Y) with nonlinear kernels. [sent-38, score-0.158]
19 It is known [2] that the cross-covariance operator can be decomposed into the covariance of the marginals and the correlation; that is, there exists a unique bounded operator $V_{YX}$ such that $\Sigma_{YX} = \Sigma_{YY}^{1/2}\, V_{YX}\, \Sigma_{XX}^{1/2}$, (2) with $R(V_{YX}) \subset R(\Sigma_{YY})$ and $N(V_{YX})^{\perp} \subset R(\Sigma_{XX})$. [sent-39, score-0.289]
20 The operator norm of VY X is less than or equal to 1. [sent-40, score-0.156]
21 We call VY X the normalized cross-covariance operator (NOCCO, see also [4]). [sent-41, score-0.131]
22 While the operator $V_{YX}$ encodes the same information regarding the dependence of X and Y as $\Sigma_{YX}$, the former expresses this information more directly, with less influence of the marginals. [sent-42, score-0.244]
23 This relation can be understood as an analogue to the difference between the covariance $\mathrm{Cov}[X, Y]$ and the correlation $\mathrm{Cov}[X, Y] / (\mathrm{Var}(X)\,\mathrm{Var}(Y))^{1/2}$. [sent-43, score-0.05]
24 Note also that kernel canonical correlation analysis [1] uses the largest eigenvalue of VY X and its corresponding eigenfunctions [4]. [sent-44, score-0.169]
25 We then define the normalized conditional cross-covariance operator, $V_{YX|Z} = V_{YX} - V_{YZ} V_{ZX}$, (3) for measuring the conditional dependence of X and Y given Z, where $V_{YZ}$ and $V_{ZX}$ are defined similarly to Eq. (2). [sent-46, score-0.205]
26 The operator $V_{YX|Z}$ may be better understood by expressing it as $V_{YX|Z} = \Sigma_{YY}^{-1/2}\bigl(\Sigma_{YX} - \Sigma_{YZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX}\bigr)\Sigma_{XX}^{-1/2}$, where $\Sigma_{YX|Z} = \Sigma_{YX} - \Sigma_{YZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX}$ can be interpreted as a nonlinear extension of the conditional covariance matrix $C_{YX} - C_{YZ}C_{ZZ}^{-1}C_{ZX}$ of Gaussian random variables. [sent-48, score-0.184]
27 The operator $\Sigma_{YX}$ can be used to determine the independence of X and Y: roughly speaking, $\Sigma_{YX} = O$ if and only if $X \perp\!\!\!\perp Y$. [sent-49, score-0.25]
28 Similarly, a relation between $\Sigma_{YX|Z}$ and conditional independence, $X \perp\!\!\!\perp Y \mid Z$, has been established in [5]: if the extended variables $\ddot{X} = (X, Z)$ and $\ddot{Y} = (Y, Z)$ are used, $X \perp\!\!\!\perp Y \mid Z$ is equivalent to $\Sigma_{\ddot{Y}\ddot{X}|Z} = O$. [sent-50, score-0.056]
29 Noting that the conditions $\Sigma_{YX} = O$ and $\Sigma_{\ddot{Y}\ddot{X}|Z} = O$ are equivalent to $V_{YX} = O$ and $V_{\ddot{Y}\ddot{X}|Z} = O$, respectively, we propose to use the Hilbert-Schmidt norms of the latter operators as dependence measures. [sent-52, score-0.163]
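For reference, the two proposed measures can be written compactly as follows (a restatement in the notation above; the paper's own numbered equations for these definitions are not reproduced in this excerpt):

$$ I^{\mathrm{NOCCO}}(X, Y) = \|V_{YX}\|_{HS}^{2}, \qquad I^{\mathrm{COND}}(X, Y \mid Z) = \|V_{\ddot{Y}\ddot{X}|Z}\|_{HS}^{2}. $$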
30 Recall that an operator $A : H_1 \to H_2$ is called Hilbert-Schmidt if for complete orthonormal systems (CONSs) $\{\phi_i\}$ of $H_1$ and $\{\psi_j\}$ of $H_2$, the sum $\sum_{i,j} \langle \psi_j, A\phi_i \rangle_{H_2}^2$ is finite (see [13]). [sent-53, score-0.131]
31 For a Hilbert-Schmidt operator A, the Hilbert-Schmidt (HS) norm $\|A\|_{HS}$ is defined by $\|A\|_{HS}^2 = \sum_{i,j} \langle \psi_j, A\phi_i \rangle_{H_2}^2$. [sent-54, score-0.156]
32 A sufficient condition that these operators are Hilbert-Schmidt will be discussed in Section 2. [sent-57, score-0.05]
33 The HS norm of the finite rank operator VY X|Z is easy to calculate. [sent-71, score-0.156]
34 The empirical dependence measures are then $\hat{I}_n^{\mathrm{COND}} \equiv \|V^{(n)}_{\ddot{Y}\ddot{X}|Z}\|_{HS}^2 = \mathrm{Tr}\bigl[R_{\ddot{Y}} R_{\ddot{X}} - 2 R_{\ddot{Y}} R_{\ddot{X}} R_Z + R_{\ddot{Y}} R_Z R_{\ddot{X}} R_Z\bigr]$ (7) and $\hat{I}_n^{\mathrm{NOCCO}}(X, Y) \equiv \|V^{(n)}_{YX}\|_{HS}^2 = \mathrm{Tr}\bigl[R_Y R_X\bigr]$ (8), where the extended variables are used for $\hat{I}_n^{\mathrm{COND}}$. [sent-73, score-0.192]
35 These empirical estimators, and the use of $\varepsilon_n$, will be justified in Section 2.4. [sent-74, score-0.048]
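A minimal sketch of how Eqs. (7)-(8) can be computed from data. The construction $R_X = G_X (G_X + n\varepsilon_n I)^{-1}$ from the centered Gram matrix $G_X$ is not shown in this excerpt and is an assumption here, as are the Gaussian kernel width and the value of $\varepsilon_n$; all function names are illustrative.

```python
import numpy as np

def centered_gram(x, sigma=1.0):
    """Centered Gaussian Gram matrix G = H K H, with H = I - (1/n) 11^T."""
    x = np.atleast_2d(np.asarray(x, dtype=float)).reshape(len(x), -1)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def normalized_gram(G, eps_n):
    """R = G (G + n * eps_n * I)^(-1): the regularized, 'correlation-like' matrix."""
    n = G.shape[0]
    return G @ np.linalg.inv(G + n * eps_n * np.eye(n))

def i_nocco(x, y, sigma=1.0, eps_n=1e-3):
    """Empirical I_n^NOCCO = Tr[R_Y R_X], Eq. (8)."""
    Rx = normalized_gram(centered_gram(x, sigma), eps_n)
    Ry = normalized_gram(centered_gram(y, sigma), eps_n)
    return float(np.trace(Ry @ Rx))

def i_cond(x, y, z, sigma=1.0, eps_n=1e-3):
    """Empirical I_n^COND of Eq. (7), with extended variables (X,Z) and (Y,Z)."""
    x = np.atleast_2d(np.asarray(x, dtype=float)).reshape(len(x), -1)
    y = np.atleast_2d(np.asarray(y, dtype=float)).reshape(len(y), -1)
    z = np.atleast_2d(np.asarray(z, dtype=float)).reshape(len(z), -1)
    Rxd = normalized_gram(centered_gram(np.hstack([x, z]), sigma), eps_n)
    Ryd = normalized_gram(centered_gram(np.hstack([y, z]), sigma), eps_n)
    Rz = normalized_gram(centered_gram(z, sigma), eps_n)
    return float(np.trace(Ryd @ Rxd - 2.0 * Ryd @ Rxd @ Rz + Ryd @ Rz @ Rxd @ Rz))
```

The conditional statistic simply evaluates the trace expression of Eq. (7); $\varepsilon_n$ is the small regularization constant whose role is discussed with the consistency results.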
36 2.2 Inference on probabilities by characteristic kernels To relate $I^{\mathrm{NOCCO}}$ and $I^{\mathrm{COND}}$ to independence and conditional independence, respectively, the RKHS should contain a sufficiently rich class of functions, able to represent all higher-order moments. [sent-78, score-0.301]
37 Similar notions have already appeared in the literature: universal kernels on compact domains [15] and Gaussian kernels on the entire $\mathbb{R}^m$ characterize independence via the cross-covariance operator [8, 1]. [sent-79, score-0.447]
38 We now discuss a unified class of kernels for inference on probabilities. [sent-80, score-0.077]
39 Let (X , B) be a measurable space, X a random variable on X , and (H, k) an RKHS on X satisfying assumption (A-1). [sent-81, score-0.042]
40 The kernel k is said to be characteristic¹ if the map $M_k$, which sends a probability P to its mean element $m_P = E_{X \sim P}[k(\cdot, X)] \in H$, is injective, or equivalently, if the condition $E_{X\sim P}[f(X)] = E_{X\sim Q}[f(X)]$ $(\forall f \in H)$ implies $P = Q$. [sent-85, score-0.12]
41 The notion of a characteristic kernel is an analogue of the characteristic function $E_P[e^{\sqrt{-1}\, u^T X}]$, which is the expectation of the Fourier kernel $k_F(x, u) = e^{\sqrt{-1}\, u^T x}$. [sent-86, score-0.386]
42 Noting that mP = mQ iff EP [k(u, X)] = EQ [k(u, X)] for all u ∈ X , the definition of a characteristic kernel generalizes the well-known property of the characteristic function that EP [kF (u, X)] uniquely determines a Borel probability P on Rm . [sent-87, score-0.266]
43 The next lemma is useful to show that a kernel is characteristic. [sent-88, score-0.12]
44 ¹ Although the same notion was called “probability-determining” in [5], we call it “characteristic” by analogy with the characteristic function. [sent-89, score-0.073]
45 Suppose that (H, k) is an RKHS on a measurable space (X , B) with k measurable and bounded. [sent-92, score-0.084]
46 If H + R (the direct sum of the two RKHSs) is dense in L q (X , P ) for any probability P on (X , B), the kernel k is characteristic. [sent-93, score-0.154]
47 By the assumption, for any ε > 0 and a measurable set A, there is a function f ∈ H and c ∈ R such that |EP [f (X)] + c − P (A)| < ε and |EQ [f (Y )] + c − Q(A)| < ε, from which we have |P (A) − Q(A)| < 2ε. [sent-96, score-0.042]
48 For a compact metric space, it is easy to see that the RKHS given by a universal kernel [15] is dense in L2 (P ) for any P , and thus characteristic (see also [7] Theorem 3). [sent-99, score-0.227]
49 It is also important to consider kernels on non-compact spaces, since many standard random variables, such as Gaussian variables, are defined on non-compact spaces. [sent-100, score-0.077]
50 The next theorem implies that many kernels on the entire Rm , including Gaussian and Laplacian, are characteristic. [sent-101, score-0.103]
51 Let $\phi(z)$ be a continuous positive function on $\mathbb{R}^m$ with Fourier transform $\tilde{\phi}(u)$, and let k be a kernel of the form $k(x, y) = \phi(x - y)$. [sent-104, score-0.12]
52 If for any $\xi \in \mathbb{R}^m$ there exists $\tau_0$ such that $\int \tilde{\phi}(\tau(u + \xi))^2 / \tilde{\phi}(u)\, du < \infty$ for all $\tau > \tau_0$, then the RKHS associated with k is dense in $L^2(P)$ for any Borel probability P on $\mathbb{R}^m$. [sent-105, score-0.034]
53 Hence k is characteristic with respect to the Borel σ-field. [sent-106, score-0.073]
54 The assumptions to relate the operators with independence are well described by using characteristic kernels and denseness. [sent-107, score-0.319]
55 In addition to (A-1), assume that the product $k_{\ddot{X}} k_Y$ is a characteristic kernel on $(\mathcal{X} \times \mathcal{Z}) \times \mathcal{Y}$, and $H_Z + \mathbb{R}$ is dense in $L^2(P_Z)$. [sent-113, score-0.227]
56 From the above results, we can guarantee that $V_{YX}$ and $V_{\ddot{Y}\ddot{X}|Z}$ will detect independence and conditional independence, if we use a Gaussian or Laplacian kernel either on a compact set or on the whole of $\mathbb{R}^m$. [sent-115, score-0.286]
57 2.3 Kernel-free integral expression of the measures A remarkable property of $I^{\mathrm{NOCCO}}$ and $I^{\mathrm{COND}}$ is that, under some assumptions, they do not depend on the kernels, having integral expressions that contain only the probability density functions. [sent-118, score-0.132]
58 Let $\mu_X$ and $\mu_Y$ be measures on $\mathcal{X}$ and $\mathcal{Y}$, respectively, and assume that the probabilities $P_{XY}$ and $E_Z[P_{X|Z} \otimes P_{Y|Z}]$ are absolutely continuous with respect to $\mu_X \times \mu_Y$, with probability density functions $p_{XY}$ and $p_{X \perp\!\!\!\perp Y|Z}$, respectively. [sent-121, score-0.055]
59 While the empirical estimate from finite samples depends on the choice of kernels, it is a desirable property for the empirical dependence measure to converge to a value that depends only on the distributions of the variables. [sent-134, score-0.186]
60 Eq. (9) shows that, under the assumptions, $I^{\mathrm{NOCCO}}$ is equal to the mean square contingency, a well-known dependence measure [14] commonly used for discrete variables. [sent-136, score-0.137]
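For orientation (the displayed Eq. (9) itself is not reproduced in this excerpt), the mean square contingency in density form is standardly written as below; under the theorem's assumptions this is the value that $I^{\mathrm{NOCCO}}$ attains:

$$ \int_{\mathcal{X} \times \mathcal{Y}} \left( \frac{p_{XY}(x, y)}{p_X(x)\, p_Y(y)} - 1 \right)^{2} p_X(x)\, p_Y(y)\, d\mu_X\, d\mu_Y . $$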
61 By the consistency result in Section 2.4, $\hat{I}_n^{\mathrm{NOCCO}}$ works as a consistent kernel estimator of the mean square contingency. [sent-138, score-0.144]
62 Eq. (9) can be compared with the mutual information, $MI(X, Y) = \int_{\mathcal{X} \times \mathcal{Y}} p_{XY}(x, y) \log \frac{p_{XY}(x, y)}{p_X(x)\, p_Y(y)}\, d\mu_X\, d\mu_Y$. [sent-140, score-0.046]
63 Both the mutual information and the mean square contingency are nonnegative, and equal to zero if and only if X and Y are independent. [sent-141, score-0.102]
64 While the mutual information is the best-known dependence measure, its finite-sample empirical estimate is not straightforward, especially for continuous variables. [sent-143, score-0.183]
65 2.4 Consistency of the measures It is important to ask whether the empirical measures converge to the population values $I^{\mathrm{COND}}$ and $I^{\mathrm{NOCCO}}$, since this provides a theoretical justification for the empirical measures. [sent-146, score-0.158]
66 It is known [4] that $V^{(n)}_{YX}$ converges in probability to $V_{YX}$ in operator norm. [sent-147, score-0.131]
67 Although the proof is analogous to the operator-norm case, the argument for the HS norm is more involved. [sent-149, score-0.131]
68 2.5 Choice of kernels As with all empirical measures, the sample estimates $\hat{I}_n^{\mathrm{NOCCO}}$ and $\hat{I}_n^{\mathrm{COND}}$ depend on the kernel, and the problem of choosing a kernel has yet to be solved. [sent-156, score-0.221]
69 Unlike in supervised learning, there are no easy criteria for choosing a kernel for dependence measures. [sent-157, score-0.233]
70 We propose a method of choosing a kernel by considering the large-sample behavior. [sent-158, score-0.12]
71 The basic idea is that a kernel should be chosen so that the covariance operator detects independence of variables as effectively as possible. [sent-160, score-0.447]
72 It has been recently shown [10] that, under independence of the variables, nHSIC approaches an asymptotic distribution whose variance can be computed. [sent-161, score-0.119]
73 [Figure caption, right panel:] The marks “o” and “+” show $\hat{I}_n^{\mathrm{NOCCO}}$ for each angle and the 95th percentile of the permutation test, respectively. [sent-173, score-0.085]
74 We choose a kernel so that the bootstrapped variance $\mathrm{Var}_B[n\mathrm{HSIC}]$ of nHSIC is close to this theoretical limit variance. [sent-175, score-0.12]
75 We can expect that the chosen kernel uses the data effectively. [sent-182, score-0.12]
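A rough sketch of this kernel-selection rule as described above. The theoretical limit variance of nHSIC under independence (from [10]) is not derived in this excerpt, so it is passed in as a precomputed `target_var`; the paired bootstrap, the candidate grid, and all names are illustrative assumptions.

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Uncentered Gaussian Gram matrix."""
    x = np.atleast_2d(np.asarray(x, dtype=float)).reshape(len(x), -1)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def nhsic(K, L):
    """n * HSIC computed from uncentered Gram matrices (biased empirical estimate)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(H @ K @ H @ L)) / n

def choose_sigma(x, y, sigmas, target_var, n_boot=200, seed=0):
    """Pick the width whose bootstrapped Var_B[nHSIC] is closest to target_var."""
    rng = np.random.default_rng(seed)
    n = len(y)
    best_sigma, best_gap = None, np.inf
    for sigma in sigmas:
        K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
        stats = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)        # paired bootstrap resample
            stats.append(nhsic(K[np.ix_(idx, idx)], L[np.ix_(idx, idx)]))
        gap = abs(np.var(stats) - target_var)
        if gap < best_gap:
            best_sigma, best_gap = sigma, gap
    return best_sigma
```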
76 In the next section we see that the method gives a reasonable result for $\hat{I}_n^{\mathrm{NOCCO}}$ and $\hat{I}_n^{\mathrm{COND}}$. 3 Experiments To evaluate the dependence measures, we use a permutation test of independence for data sets with various degrees of dependence. [sent-184, score-0.297]
77 In the following experiments, we always use Gaussian kernels $k(x_1, x_2) = \exp\bigl(-\frac{1}{2\sigma^2}\|x_1 - x_2\|^2\bigr)$ and choose $\sigma$ by the method proposed in Section 2.5. [sent-197, score-0.077]
78 The random variables X (0) and Y (0) are independent and uniformly distributed on [−2, 2] and [a, b] ∪ [−b, −a], respectively, so that (X (0) , Y (0) ) has a scalar covariance matrix. [sent-200, score-0.051]
79 We perform permutation tests with $\hat{I}_n^{\mathrm{NOCCO}}$, $\mathrm{HSIC} = \|\Sigma^{(n)}_{YX}\|_{HS}^2$, and the mutual information (MI). [sent-204, score-0.154]
80 Since $\hat{I}_n^{\mathrm{NOCCO}}$ is an estimate of the mean square contingency, we also apply a relevant contingency-table-based independence test ([12]), partitioning the variables into bins. [sent-206, score-0.167]
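A sketch of the permutation test used in these experiments, written around the $\hat{I}_n^{\mathrm{NOCCO}}$ sketch given after Eq. (8); the number of permutations and the significance level are illustrative choices, not values taken from the paper.

```python
import numpy as np

def permutation_test_independence(x, y, stat_fn, n_perm=500, alpha=0.05, seed=0):
    """Permutation test of independence: the null distribution of the dependence
    statistic is formed by randomly re-pairing the Y sample with the X sample;
    independence is rejected when the observed statistic exceeds the
    (1 - alpha) quantile of the permuted statistics."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = stat_fn(x, y)
    null_stats = [stat_fn(x, y[rng.permutation(len(y))]) for _ in range(n_perm)]
    threshold = float(np.quantile(null_stats, 1.0 - alpha))
    return observed > threshold, observed, threshold

# Example with the earlier sketch (hypothetical data arrays x_sample, y_sample):
# reject, obs, thr = permutation_test_independence(
#     x_sample, y_sample, lambda a, b: i_nocco(a, b, sigma=1.0, eps_n=1e-3))
```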
81 We evaluate a chaotic time series derived from the coupled Hénon map. [sent-219, score-0.066]
82 Table 1: Comparison of dependence measures. [sent-240, score-0.113]
83 The number of times independence is accepted out of 100 permutation tests is shown. [sent-241, score-0.25]
84 “Median” is a heuristic method [8] that chooses $\sigma$ as the median of the pairwise distances of the data. [sent-246, score-0.045]
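A small sketch of that heuristic (assuming Euclidean data; the function name is illustrative):

```python
import numpy as np

def median_heuristic_sigma(x):
    """'Median' heuristic of [8]: sigma = median of the pairwise distances."""
    x = np.atleast_2d(np.asarray(x, dtype=float)).reshape(len(x), -1)
    d = np.sqrt(np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1))
    return float(np.median(d[np.triu_indices(len(x), k=1)]))
```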
85 [Figure caption, panels (c,d):] examples of $\hat{I}_n^{\mathrm{NOCCO}}$ and the thresholds of the permutation test with significance level 5% (black “+”). [sent-262, score-0.065]
86 Table 2 shows the results of permutation tests of independence for the instantaneous pairs $(X(t), Y(t))_{t=1}^{100}$. [sent-263, score-0.227]
87 We also use $\hat{I}_n^{\mathrm{COND}}$ to detect the causal structure of the same time series. [sent-265, score-0.05]
88 In Table 3, it is remarkable that $\hat{I}_n^{\mathrm{COND}}$ detects the small causal influence from $X_t$ to $Y_{t+1}$ even at small values of the coupling $\gamma$. [sent-269, score-0.055]
89 The data consist of three variables: creatinine clearance (C), digoxin clearance (D), and urine flow (U). [sent-273, score-0.06]
90 Table 4 shows the results of the permutation tests and a comparison with the linear method. [sent-279, score-0.108]
91 Table 2: Results of the independence tests for the chaotic time series. [sent-288, score-0.228]
92 The number of times independence was accepted out of 100 permutation tests is shown. [sent-289, score-0.25]
93 Table 3: Results of the permutation test of non-causality for the chaotic time series. [sent-305, score-0.131]
94 The number of times non-causality was accepted out of 100 tests is shown. [sent-306, score-0.066]
95 4 Concluding remarks There are many dependence measures, and further theoretical and experimental comparison is important. [sent-326, score-0.113]
96 That said, one unambiguous strength of the kernel measure we propose is its kernel-free population expression. [sent-327, score-0.145]
97 It is interesting to ask if other classical dependence measures, such as the mutual information, can be estimated by kernels (in a broader sense than the expansion about independence of [9]). [sent-328, score-0.384]
98 A relevant measure is the kernel generalized variance (KGV [1]), which is based on a sum of the logarithm of the eigenvalues of $V_{YX}$, while $I^{\mathrm{NOCCO}}$ is the sum of their squares. [sent-329, score-0.145]
99 Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. [sent-363, score-0.141]
100 On the influence of the kernel on the consistency of support vector machines. [sent-436, score-0.12]
wordName wordTfidf (topN-words)
[('vy', 0.648), ('occo', 0.425), ('hs', 0.177), ('px', 0.176), ('py', 0.174), ('hy', 0.145), ('operator', 0.131), ('hx', 0.121), ('kernel', 0.12), ('independence', 0.119), ('dependence', 0.113), ('kx', 0.113), ('vzx', 0.106), ('pxy', 0.09), ('kernels', 0.077), ('nhsic', 0.076), ('ry', 0.076), ('rz', 0.076), ('characteristic', 0.073), ('hsic', 0.072), ('rx', 0.072), ('rkhs', 0.072), ('xx', 0.071), ('chaotic', 0.066), ('permutation', 0.065), ('ky', 0.064), ('mx', 0.061), ('fukumizu', 0.061), ('measures', 0.055), ('yt', 0.054), ('mi', 0.054), ('operators', 0.05), ('gretton', 0.048), ('rkhss', 0.048), ('xt', 0.048), ('mutual', 0.046), ('con', 0.046), ('corr', 0.046), ('gz', 0.046), ('scholkopf', 0.046), ('spemannstra', 0.046), ('median', 0.045), ('tests', 0.043), ('measurable', 0.042), ('borel', 0.04), ('gx', 0.04), ('mp', 0.039), ('rm', 0.035), ('dense', 0.034), ('gy', 0.034), ('cy', 0.034), ('conditional', 0.032), ('contingency', 0.032), ('bach', 0.032), ('clearance', 0.03), ('conss', 0.03), ('kgv', 0.03), ('nocco', 0.03), ('ppxyy', 0.03), ('thresh', 0.03), ('varb', 0.03), ('ez', 0.03), ('causal', 0.029), ('expansion', 0.029), ('measuring', 0.028), ('covariance', 0.027), ('xp', 0.026), ('detects', 0.026), ('eigenfunctions', 0.026), ('mq', 0.026), ('pz', 0.026), ('richness', 0.026), ('tional', 0.026), ('zx', 0.026), ('theorem', 0.026), ('bingen', 0.026), ('mk', 0.026), ('cov', 0.025), ('measure', 0.025), ('norm', 0.025), ('variables', 0.024), ('kz', 0.024), ('zz', 0.024), ('square', 0.024), ('table', 0.024), ('medical', 0.024), ('empirical', 0.024), ('capturing', 0.023), ('correlation', 0.023), ('accepted', 0.023), ('kf', 0.023), ('respectively', 0.022), ('cybernetics', 0.022), ('hz', 0.022), ('laplacian', 0.021), ('detect', 0.021), ('reproducing', 0.021), ('angle', 0.02), ('bins', 0.02), ('germany', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 108 nips-2007-Kernel Measures of Conditional Dependence
Author: Kenji Fukumizu, Arthur Gretton, Xiaohai Sun, Bernhard Schölkopf
Abstract: We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments. 1
2 0.20429285 7 nips-2007-A Kernel Statistical Test of Independence
Author: Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Schölkopf, Alex J. Smola
Abstract: Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m2 ), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
3 0.15479878 184 nips-2007-Stability Bounds for Non-i.i.d. Processes
Author: Mehryar Mohri, Afshin Rostamizadeh
Abstract: The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds. A key advantage of these bounds is that they are designed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, which is clear in system diagnosis or time series prediction problems. This paper studies the scenario where the observations are drawn from a stationary mixing sequence, which implies a dependence between observations that weaken over time. It proves novel stability-based generalization bounds that hold even with this more general setting. These bounds strictly generalize the bounds given in the i.i.d. case. It also illustrates their application in the case of several general classes of learning algorithms, including Support Vector Regression and Kernel Ridge Regression.
4 0.11696299 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
Author: Moulines Eric, Francis R. Bach, Zaïd Harchaoui
Abstract: We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. Asymptotic null distributions under null hypothesis are derived, and consistency against fixed alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial and real datasets is provided. 1
5 0.088784881 118 nips-2007-Learning with Transformation Invariant Kernels
Author: Christian Walder, Olivier Chapelle
Abstract: This paper considers kernels invariant to translation, rotation and dilation. We show that no non-trivial positive definite (p.d.) kernels exist which are radial and dilation invariant, only conditionally positive definite (c.p.d.) ones. Accordingly, we discuss the c.p.d. case and provide some novel analysis, including an elementary derivation of a c.p.d. representer theorem. On the practical side, we give a support vector machine (s.v.m.) algorithm for arbitrary c.p.d. kernels. For the thinplate kernel this leads to a classifier with only one parameter (the amount of regularisation), which we demonstrate to be as effective as an s.v.m. with the Gaussian kernel, even though the Gaussian involves a second parameter (the length scale). 1
6 0.069195978 49 nips-2007-Colored Maximum Variance Unfolding
7 0.064728506 101 nips-2007-How SVMs can estimate quantiles and the median
8 0.059122406 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
9 0.058930092 109 nips-2007-Kernels on Attributed Pointsets with Applications
10 0.05379511 160 nips-2007-Random Features for Large-Scale Kernel Machines
11 0.053569168 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
12 0.050002988 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
13 0.049505949 146 nips-2007-On higher-order perceptron algorithms
14 0.047957875 185 nips-2007-Stable Dual Dynamic Programming
15 0.047788717 21 nips-2007-Adaptive Online Gradient Descent
16 0.046073146 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
17 0.044948041 91 nips-2007-Fitted Q-iteration in continuous action-space MDPs
18 0.042930946 135 nips-2007-Multi-task Gaussian Process Prediction
19 0.039520942 164 nips-2007-Receptive Fields without Spike-Triggering
20 0.039312657 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning
topicId topicWeight
[(0, -0.13), (1, 0.009), (2, -0.025), (3, 0.079), (4, -0.006), (5, -0.017), (6, 0.005), (7, 0.003), (8, -0.152), (9, 0.042), (10, 0.103), (11, -0.001), (12, 0.054), (13, 0.039), (14, -0.111), (15, -0.152), (16, 0.174), (17, -0.015), (18, -0.032), (19, 0.074), (20, 0.238), (21, 0.123), (22, 0.059), (23, -0.087), (24, 0.006), (25, 0.078), (26, 0.188), (27, -0.047), (28, 0.044), (29, -0.039), (30, -0.026), (31, -0.009), (32, 0.044), (33, -0.036), (34, 0.032), (35, -0.152), (36, -0.147), (37, 0.072), (38, 0.013), (39, 0.014), (40, 0.093), (41, 0.047), (42, -0.034), (43, -0.007), (44, 0.057), (45, 0.094), (46, 0.141), (47, -0.088), (48, -0.015), (49, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.95518655 108 nips-2007-Kernel Measures of Conditional Dependence
Author: Kenji Fukumizu, Arthur Gretton, Xiaohai Sun, Bernhard Schölkopf
Abstract: We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments. 1
2 0.84946018 7 nips-2007-A Kernel Statistical Test of Independence
Author: Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Schölkopf, Alex J. Smola
Abstract: Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m2 ), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
3 0.64210284 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
Author: Moulines Eric, Francis R. Bach, Zaïd Harchaoui
Abstract: We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. Asymptotic null distributions under null hypothesis are derived, and consistency against fixed alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial and real datasets is provided. 1
4 0.57543886 184 nips-2007-Stability Bounds for Non-i.i.d. Processes
Author: Mehryar Mohri, Afshin Rostamizadeh
Abstract: The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds. A key advantage of these bounds is that they are designed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, which is clear in system diagnosis or time series prediction problems. This paper studies the scenario where the observations are drawn from a stationary mixing sequence, which implies a dependence between observations that weaken over time. It proves novel stability-based generalization bounds that hold even with this more general setting. These bounds strictly generalize the bounds given in the i.i.d. case. It also illustrates their application in the case of several general classes of learning algorithms, including Support Vector Regression and Kernel Ridge Regression.
5 0.54304004 49 nips-2007-Colored Maximum Variance Unfolding
Author: Le Song, Arthur Gretton, Karsten M. Borgwardt, Alex J. Smola
Abstract: Maximum variance unfolding (MVU) is an effective heuristic for dimensionality reduction. It produces a low-dimensional representation of the data by maximizing the variance of their embeddings while preserving the local distances of the original data. We show that MVU also optimizes a statistical dependence measure which aims to retain the identity of individual observations under the distancepreserving constraints. This general view allows us to design “colored” variants of MVU, which produce low-dimensional representations for a given task, e.g. subject to class labels or other side information. 1
6 0.42260554 101 nips-2007-How SVMs can estimate quantiles and the median
7 0.41712016 118 nips-2007-Learning with Transformation Invariant Kernels
8 0.35031545 109 nips-2007-Kernels on Attributed Pointsets with Applications
9 0.29583722 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
10 0.28796414 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes
11 0.27222162 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
12 0.26611817 99 nips-2007-Hierarchical Penalization
13 0.24342415 15 nips-2007-A general agnostic active learning algorithm
14 0.23559482 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
15 0.23406591 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
16 0.23237012 160 nips-2007-Random Features for Large-Scale Kernel Machines
17 0.22338587 46 nips-2007-Cluster Stability for Finite Samples
18 0.21865633 91 nips-2007-Fitted Q-iteration in continuous action-space MDPs
19 0.21209037 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
20 0.21100003 185 nips-2007-Stable Dual Dynamic Programming
topicId topicWeight
[(5, 0.054), (13, 0.03), (16, 0.03), (19, 0.015), (21, 0.046), (31, 0.012), (34, 0.015), (35, 0.02), (46, 0.369), (47, 0.038), (49, 0.068), (55, 0.016), (83, 0.095), (85, 0.029), (90, 0.072)]
simIndex simValue paperId paperTitle
1 0.75189376 81 nips-2007-Estimating disparity with confidence from energy neurons
Author: Eric K. Tsang, Bertram E. Shi
Abstract: The peak location in a population of phase-tuned neurons has been shown to be a more reliable estimator for disparity than the peak location in a population of position-tuned neurons. Unfortunately, the disparity range covered by a phasetuned population is limited by phase wraparound. Thus, a single population cannot cover the large range of disparities encountered in natural scenes unless the scale of the receptive fields is chosen to be very large, which results in very low resolution depth estimates. Here we describe a biologically plausible measure of the confidence that the stimulus disparity is inside the range covered by a population of phase-tuned neurons. Based upon this confidence measure, we propose an algorithm for disparity estimation that uses many populations of high-resolution phase-tuned neurons that are biased to different disparity ranges via position shifts between the left and right eye receptive fields. The population with the highest confidence is used to estimate the stimulus disparity. We show that this algorithm outperforms a previously proposed coarse-to-fine algorithm for disparity estimation, which uses disparity estimates from coarse scales to select the populations used at finer scales and can effectively detect occlusions.
same-paper 2 0.73306024 108 nips-2007-Kernel Measures of Conditional Dependence
Author: Kenji Fukumizu, Arthur Gretton, Xiaohai Sun, Bernhard Schölkopf
Abstract: We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments. 1
3 0.70958984 199 nips-2007-The Price of Bandit Information for Online Optimization
Author: Varsha Dani, Sham M. Kakade, Thomas P. Hayes
Abstract: In the online linear optimization problem, a learner must choose, in each round, a decision from a set D ⊂ Rn in order to minimize an (unknown and changing) linear cost function. We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the price of bandit information, by which we mean the ratio of the best achievable regret in the bandit setting to that in the full-information setting. For the full informa√ tion case, the upper bound on the regret is O∗ ( nT ), where n is the ambient dimension and T is the time horizon. For the bandit case, we present an algorithm √ which achieves O∗ (n3/2 T ) regret — all previous (nontrivial) bounds here were O(poly(n)T 2/3 ) or worse. It is striking that the convergence rate for the bandit setting is only a factor of n worse than in the full information case — in stark contrast to the K-arm bandit setting, where the gap in the dependence on K is √ √ exponential ( T K√ vs. T log K). We also present lower bounds showing that this gap is at least n, which we conjecture to be the correct order. The bandit algorithm we present can be implemented efficiently in special cases of particular interest, such as path planning and Markov Decision Problems. 1
4 0.50892836 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
Author: Jonathan W. Pillow, Peter E. Latham
Abstract: Point process encoding models provide powerful statistical methods for understanding the responses of neurons to sensory stimuli. Although these models have been successfully applied to neurons in the early sensory pathway, they have fared less well capturing the response properties of neurons in deeper brain areas, owing in part to the fact that they do not take into account multiple stages of processing. Here we introduce a new twist on the point-process modeling approach: we include unobserved as well as observed spiking neurons in a joint encoding model. The resulting model exhibits richer dynamics and more highly nonlinear response properties, making it more powerful and more flexible for fitting neural data. More importantly, it allows us to estimate connectivity patterns among neurons (both observed and unobserved), and may provide insight into how networks process sensory input. We formulate the estimation procedure using variational EM and the wake-sleep algorithm, and illustrate the model’s performance using a simulated example network consisting of two coupled neurons.
5 0.46456295 189 nips-2007-Supervised Topic Models
Author: Jon D. Mcauliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression. 1
6 0.41520101 7 nips-2007-A Kernel Statistical Test of Independence
7 0.37123027 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
8 0.3707816 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers
9 0.36904669 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
10 0.36695266 5 nips-2007-A Game-Theoretic Approach to Apprenticeship Learning
11 0.36407498 194 nips-2007-The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information
12 0.36396843 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets
13 0.3634007 210 nips-2007-Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks
14 0.35922351 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions
15 0.3584885 70 nips-2007-Discriminative K-means for Clustering
16 0.35730222 185 nips-2007-Stable Dual Dynamic Programming
17 0.35709575 156 nips-2007-Predictive Matrix-Variate t Models
18 0.35630113 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
19 0.35602146 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
20 0.35438111 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs