nips nips2012 nips2012-175 knowledge-graph by maker-knowledge-mining

175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data


Source: pdf

Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch

Abstract: We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. [sent-10, score-0.182]

2 Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. [sent-12, score-0.138]

3 We provide the theoretical foundations of our test and show its superiority over existing methods. [sent-13, score-0.13]

4 1 Introduction The Kolmogorov-Smirnov (KS) test is efficient, simple, and often considered the method of choice for comparing distributions. [sent-14, score-0.091]

5 The goal of the KS test is to determine whether F = F′. [sent-25, score-0.091]
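
For reference, the special case being generalized — the classical one-dimensional two-sample KS test of whether F = F′ — can be run as in the following minimal sketch (the synthetic data and SciPy usage are illustrative assumptions, not material from the paper):

import numpy as np
from scipy.stats import ks_2samp

# Classical 1-D two-sample KS test on synthetic data (illustration only).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=300)   # sample from F
y = rng.normal(loc=0.5, size=300)   # sample from F', mean-shifted
statistic, p_value = ks_2samp(x, y)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.4f}")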

6 However, nonparametric extensions of this test to high-dimensional data are hard to define since there are 2^d − 1 ways to represent a d-dimensional distribution by a CDF. [sent-27, score-0.141]

7 Indeed, due to this limitation, several extensions of the KS test to more than one dimension have been proposed [17, 9] but their practical applications are mostly limited to a few dimensions. [sent-28, score-0.091]

8 One prominent approach to generalizing the KS test beyond one-dimensional data is that of Polonik [18]. [sent-29, score-0.091]

9 It is based on a generalized quantile transform to a set of high-density hierarchical regions. [sent-30, score-0.189]

10 Polonik’s transform is based on a density estimation over X . [sent-32, score-0.143]

11 It maps an input quantile in [0, 1] to a level set of the estimated density such that the expected probability that feature vectors lie within it equals the associated quantile. [sent-33, score-0.181]

12 The expected plots are the quantiles, and the empirical plots are fractions of examples in X that lie within each mapped region. [sent-34, score-0.142]
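
A minimal sketch of this idea using a kernel density estimate (the choice of estimator and the plug-in level-set thresholds are assumptions of the sketch, not the paper's exact procedure): for each quantile, take the level set of the fitted density that captures roughly that fraction of the reference sample, and record the fraction of a second sample falling inside it.

import numpy as np
from sklearn.neighbors import KernelDensity

def polonik_style_plots(X_ref, X_eval, alphas, bandwidth=0.5):
    # Fit a density estimate on the reference sample.
    kde = KernelDensity(bandwidth=bandwidth).fit(X_ref)
    log_dens_ref = kde.score_samples(X_ref)
    # Plug-in level-set thresholds: roughly alpha of X_ref lies above each one.
    thresholds = np.quantile(log_dens_ref, 1.0 - np.asarray(alphas))
    # Empirical plot: fraction of X_eval inside each mapped high-density region.
    log_dens_eval = kde.score_samples(X_eval)
    empirical = np.array([np.mean(log_dens_eval >= c) for c in thresholds])
    return np.asarray(alphas), empirical   # expected vs. empirical plots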

13 Polonik’s approach can handle multivariate data, but is hard to apply in high-dimensional or small-sample settings where reliable density estimation is hard. [sent-35, score-0.139]

14 However, instead of a density estimator, we use a novel hierarchical minimum-volume sets estimator to estimate the set of high-density regions directly. [sent-37, score-0.295]

15 Because the estimation of such regions is intrinsically simpler than density estimation, our test is more accurate than density-estimation approaches. [sent-38, score-0.281]

16 In addition, whereas Polonik’s work was largely theoretical, we take a practical approach and empirically show the superiority of our test over existing nonparametric tests in realistic, high-dimensional data. [sent-39, score-0.285]

17 We present here a novel method for approximate MV-sets estimation that guarantees the hierarchy, thus allowing the KS test to be generalized to high dimensions. [sent-42, score-0.152]

18 We test our method with two types of estimators: one-class SVMs (OCSVMs) and one-class neighbor machines (OCNMs). [sent-44, score-0.125]

19 While the statistical test introduced in this paper traces distributional changes in high-dimensional data in general, it is effective in particular for change detection in data streams. [sent-45, score-0.283]

20 Applications (e.g., process control) work in dynamic environments where streams of multivariate data are collected over time, during which unanticipated distributional changes in the data might prevent the proper operation of these applications. [sent-48, score-0.24]

21 We extensively evaluate our test on a collection of change-detection tasks. [sent-52, score-0.091]

22 We also show that our proposed test can be used for the classical setting of the two-sample problem using symmetric and asymmetric variations of our test. [sent-53, score-0.091]

23 2 Learning Hierarchical High-Density Regions Our approach for generalizing the KS test is based on estimating a hierarchical set of MV-sets in input space. [sent-54, score-0.15]

24 Following the notion of multivariate quantiles [8], let X = {x1 , . [sent-56, score-0.122]

25 Polonik introduced a new approach that uses a hierarchical set of MV-sets to generalize the KS test beyond one dimension. [sent-68, score-0.15]

26 Assume F has a density function f with respect to λ, and let Lf (c) = {x : f (x) ≥ c} be the level set of f at level c. [sent-69, score-0.083]

27 Hence, a density estimator was used to define a family of MV-sets {C(α), α ∈ [0, 1]} such that a hierarchy constraint C(α) ⊂ C(β) is satisfied for 0 ≤ α < β ≤ 1. [sent-73, score-0.207]

28 However, since density estimation is hard to apply to high-dimensional data, a more practical solution is proposed. [sent-75, score-0.116]

29 Instead of basing our method on the output of a density estimation method, we introduce a novel nonparametric method, which uses MV-set estimators (OCSVM and OCNM) as a basic component, to estimate hierarchical MV-sets without the need for a density estimation step. [sent-76, score-0.377]

30 Let H be a hypothesis space of half-space decision functions fC (x) = sgn ((w · Φ(x)) − ρ) such that fC (x) = +1 if x ∈ C, and −1 otherwise. [sent-80, score-0.086]

31 (w · Φ(xi)) ≥ ρ − ξi, ξi ≥ 0 (2), where ξ is the vector of slack variables, and 0 < ν < 1 is a regularization parameter related to the proportion of outliers in the training data. [sent-83, score-0.123]

32 The following statements hold: (1) ν is an upper bound on the fraction of outliers. [sent-90, score-0.102]

33 This theorem shows that we can use OCSVMs to estimate high-density regions in the input space while bounding the number of examples in X lying outside these regions. [sent-98, score-0.109]
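
The ν-property can be checked directly on synthetic data; the sketch below (data, kernel width, and the exact check are assumptions, not from the paper) shows that the fraction of training points an OCSVM rejects stays close to, and below, ν.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                      # synthetic 10-dimensional sample
for nu in (0.1, 0.3, 0.5):
    ocsvm = OneClassSVM(kernel="rbf", gamma=0.1, nu=nu).fit(X)
    outlier_frac = np.mean(ocsvm.predict(X) == -1)  # fraction classified as outliers
    print(f"nu = {nu:.1f}  training outlier fraction = {outlier_frac:.3f}")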

34 In the following algorithm, we propose a modified construction of these regions such that both the hierarchical constraint and the density assumption (Theorem 1) will hold for each region. [sent-102, score-0.242]

35 Given X and a kernel function k(·, ·), our hierarchical MV-sets estimator iteratively trains a set of q OCSVMs, one for each quantile, and returns a set of decision functions f̂C(α1), . [sent-107, score-0.206]

36 ..., f̂C(αq) that satisfy both hierarchy and density requirements. [sent-110, score-0.128]

37 Let f̂C(αi), SVbi be the decision function and the calculated outliers (bounded SVs) of the OCSVM trained for the i-th quantile. [sent-113, score-0.139]

38 At each iteration, Di contains examples in X that were not classified as outliers in previous iterations (not in Oi+1 ). [sent-115, score-0.136]

39 In addition, ν is set to the required fraction of outliers over Di that will keep the total fraction of outliers over X equal to 1 − αi. [sent-116, score-0.266]

40 After each iteration, f̂C(αi) corresponds to the intersection between the region associated with the previous decision function and the half-space associated with the currently learned OCSVM. [sent-117, score-0.09]

41 The outliers in Oi are points that lie strictly outside the constructed region. [sent-119, score-0.124]
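
The construction can be sketched with scikit-learn's OneClassSVM as follows; this is a simplified illustration of the iterative scheme described above (the quantile ordering, the exact ν adjustment, and the representation of each region as an intersection of acceptance regions are assumptions of the sketch), not a reproduction of the paper's Algorithm 1.

import numpy as np
from sklearn.svm import OneClassSVM

def fit_hierarchical_mv_sets(X, alphas, gamma=0.5):
    """Train one OCSVM per quantile, largest region first, removing points
    already declared outliers before each fit; each estimated MV-set is the
    intersection of all OCSVM acceptance regions learned so far."""
    X = np.asarray(X)
    n = len(X)
    alphas = sorted(alphas, reverse=True)          # e.g. 0.9, 0.8, ..., 0.1
    models, decisions = [], []
    outlier_mask = np.zeros(n, dtype=bool)
    for alpha in alphas:
        keep = ~outlier_mask                       # D_i: examples not yet rejected
        # Choose nu so the cumulative outlier fraction over X is about 1 - alpha.
        target = (1.0 - alpha) * n - outlier_mask.sum()
        nu = float(np.clip(target / keep.sum(), 1e-3, 1.0 - 1e-3))
        svm = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X[keep])
        models.append(svm)
        outlier_mask |= (svm.predict(X) == -1)     # accumulate rejected points
        frozen = list(models)                      # models defining this region
        decisions.append(lambda Z, ms=frozen: np.all(
            np.stack([m.predict(Z) == 1 for m in ms]), axis=0))
    return list(zip(alphas, decisions))            # (alpha_i, indicator of C(alpha_i))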

42 The following theorem shows that the regions specified by the decision functions f̂C(α1), . [sent-130, score-0.112]

43 Then, by the induction hypothesis, statements (2)-(3) hold for the first n − 1 iterations over the αq , . [sent-171, score-0.096]

44 We prove that statements (2)-(3) hold for f̂C(αq−n) in the next iteration. [sent-175, score-0.096]

45 Since f̂C(αq−n+1)(x) = −1 implies f̂C(αq−n)(x) = −1, the examples in Oq−n+1 are also outliers with respect to f̂C(αq−n). [sent-194, score-0.145]

46 Hence, following Theorem 1, the total proportion of outliers with respect to X is |Oq−n| = |SVbq−n| + |Oq−n+1| ≤ ν|Di| + |Oq−n+1| = (1 − αq−n)|X|, and |SVubq−n| + |Oq−n| ≥ (1 − αq−n)|X|. [sent-202, score-0.123]

47 Examples xi in the input space and their mapped vectors φ(xi) in F are contained in the same relative regions in both spaces. [sent-209, score-0.103]

48 2.2 Learning Minimum-Volume Sets with One-Class Neighbor Machine Estimators OCNM [15] is an alternative method to the OCSVM estimator for finding regions close to C(α). [sent-212, score-0.153]
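
As a rough stand-in for a neighborhood-based estimator (the OCNM objective itself is not reproduced here; the scoring rule below is an assumption of the sketch), one can score each point by the distance to its k-th nearest neighbor and keep the densest α fraction of the sample.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_mv_set_mask(X, alpha, k=10):
    # Sparsity score: distance to the k-th nearest neighbor (excluding the point itself).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    scores = dists[:, -1]
    # Keep the alpha fraction of points with the smallest scores (densest regions).
    threshold = np.quantile(scores, alpha)
    return scores <= threshold      # boolean indicator of the estimated MV-set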

49 In contrast to OCSVMs, when OCNMs are iteratively trained on X using a growing sequence of ν values, outliers need not be removed from previous iterations to ensure that the ν-property will hold for each decision function. [sent-227, score-0.165]

50 Since Theorem 2 relies on the ν-property of the estimator, it can be shown that similar statements to those of Theorem 2 also hold when OCNM is used. [sent-229, score-0.096]

51 As previously discussed, since the estimation of MV-sets is simpler than density estimation, our test can achieve higher accuracy than approaches based on density estimation. [sent-230, score-0.29]

52 We use the OCNM and OCSVM versions of our estimator to approximate hierarchical MV-sets for qα = 9 quantiles: α = 0. [sent-242, score-0.171]

53 The advantages of our approach can easily be seen: both versions of our estimator perform notably better, especially for small sample sizes. [sent-251, score-0.112]

54 3 Generalized Kolmogorov-Smirnov Test We now introduce a nonparametric, generalized Kolmogorov-Smirnov (GKS) statistical test for determining whether F = F′ in high-dimensional data. [sent-252, score-0.119]

55 Under the null hypothesis, assume F = F′ and let F^−1 be a quantile transform of F, i. [sent-258, score-0.102]

56 In practice, when |X| is finite, the expected proportion of examples that lie within each region Ĉ(αi) can only be approximated. Note that intersection is still needed (Algorithm 1, line 10) to ensure the hierarchical property on Ĉ(αi). [sent-270, score-0.163]

57 Our final test statistic is T̂n,m = sup_{1≤i≤q} |F̂n(ĈX(αi)) − F̂m(ĈX(αi))| (7), where F̂n(ĈX(αi)) is the empirical estimate of F(ĈX(αi)). [sent-273, score-0.116]

58 The two-sample KS statistical test is used over T̂n,m to calculate the resulting p-value. [sent-274, score-0.091]
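
A minimal sketch of computing the statistic and a p-value is given below; applying the asymptotic two-sample KS distribution to T̂n,m with an effective sample size of nm/(n+m) is an assumption of the sketch, not a restatement of the paper's exact procedure.

import numpy as np
from scipy.stats import kstwobign

def gks_statistic(region_indicators, X_ref, X_new):
    # region_indicators: one boolean-valued membership function per quantile,
    # indicating whether points fall inside the estimated MV-set for alpha_i.
    diffs = [abs(np.mean(r(X_ref)) - np.mean(r(X_new))) for r in region_indicators]
    return max(diffs)               # sup over the q quantiles, as in Eq. (7)

def gks_pvalue(T, n, m):
    # Asymptotic two-sample KS p-value applied to the statistic T (assumption).
    return float(kstwobign.sf(np.sqrt(n * m / (n + m)) * T))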

59 The test defined above works only in one direction, predicting whether the distributions of the samples share the same “concentrations” as regions estimated according to X, and not according to X′. [sent-275, score-0.165]

60 We may symmetrize it by running the non-symmetric test twice, once in each direction, and return twice their minimum p-value (Bonferroni correction). [sent-276, score-0.143]
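
A short sketch of the symmetrized variant, assuming the one-directional test is wrapped as a callable that returns a p-value:

def gks_two_directional(X, X_prime, one_directional_pvalue):
    # Run the one-directional test in each direction (MV-sets learned from the
    # first argument) and apply a Bonferroni correction over the two runs.
    p_forward = one_directional_pvalue(X, X_prime)
    p_backward = one_directional_pvalue(X_prime, X)
    return min(1.0, 2.0 * min(p_forward, p_backward))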

61 Note that by doing so in the context of a change-detection task, we pay in the runtime required for learning MV-sets for each X′. [sent-277, score-0.119]

62 4 Empirical Evaluation We first evaluated our test on concept-drift detection problems in data-stream classification tasks. [sent-278, score-0.178]

63 Concept drifts are associated with distributional changes in data streams that occur due to hidden context [22] — changes of which the classifier is unaware. [sent-279, score-0.249]

64 Statistical tests were evaluated with X and all possible X′ windows. [sent-292, score-0.105]

65 We compare our one-directional (GKS1d ) and two-directional (GKS2d ) tests to the following 5 reference tests: kdq-tree test (KDQ) [4], Metavariable Wald-Wolfowitz test (WW) [10], Kernel change detection (KCD) [5], Maximum mean discrepancy test (MMD) [12], and PAC-Bayesian margin test (PBM) [6]. [sent-324, score-0.626]

66 The implementation of the MMD test provided by the authors was used with default parameters (RBF kernels with automatic kernel width detection) and Rademacher bounds. [sent-327, score-0.121]

67 Note that we cannot compare our test to Polonik’s test since density estimation and level-set extraction are not practically feasible on high-dimensional data. [sent-329, score-0.33]

68 Results For better visualization, results are shown in two separate figures: Figure 3 shows the precision-recall plots averaged over the 33 experiments for the OCSVM version of our tests and the 5 reference tests. [sent-384, score-0.15]

69 Figure 4 shows the precision-recall plots averaged over the 33 experiments for the OCSVM and OCNM versions of our tests. [sent-385, score-0.099]

70 In terms of their break-even point (BEP) measure – the point at which precision equals recall – GKS1d outperforms the other 5 reference tests with a BEP of 0. [sent-392, score-0.223]
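
The BEP can be read off a precision-recall curve as in the generic helper below (an illustrative sketch, not code from the paper):

import numpy as np

def break_even_point(precision, recall):
    # Approximate the curve point at which precision equals recall.
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    i = int(np.argmin(np.abs(precision - recall)))
    return 0.5 * (precision[i] + recall[i])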

71 Mean precisions for each dataset were compared using the Wilcoxon statistical test with α = 0. [sent-395, score-0.091]

72 Although the plots for our GKS1d (OCSVM) test (Figure 4) look better than GKS2d , no significant difference was found. [sent-400, score-0.133]

73 This result is consistent with previous studies which claim that variants of solutions whose goal is to make the tests more symmetric have empirically shown no conclusive superiority [4]. [sent-401, score-0.144]

74 We also found that the GKS1d (OCSVM) version of our test has the least runtime and scales well with dimensionality, while the GKS1d (OSNM) version suffers from increased time complexity, especially in high dimensions, due to its expensive neighborhood measure. [sent-402, score-0.129]

75 As opposed to the KCD and PBM tests, our GKS1d test need not be retrained on each X′. [sent-404, score-0.091]

76 4.2 Topic Change Detection among Documents We evaluated our test on an additional setup of high-dimensional problems pertaining to the detection of topic changes in streams of documents. [sent-411, score-0.268]

77 Once again, our GKS1d test dominates the others with the best precision-recall compromise. [sent-429, score-0.091]

78 With regard to BEP values, GKS1d outperforms the other reference tests with a BEP of 0. [sent-430, score-0.151]

79 70 precision on average), while its second best competitor (MMD) does so with a BEP of 0. [sent-432, score-0.085]

80 According to the Wilcoxon statistical test with α = 0. [sent-435, score-0.091]

81 5 Related Work Our proposed test belongs to a family of nonparametric tests for detecting change in multivariate data that compare distributions without the intermediate density estimation step. [sent-437, score-0.412]

82 Our reference tests were thus taken from this family of studies. [sent-438, score-0.151]

83 The kdq-tree test (KDQ) [4] uses a spatial scheme (called kdq-tree) to partition the data into small cells. [sent-439, score-0.091]

84 A permutation (bootstrapping) test [7] is used to calculate the significance of the difference (p-value). [sent-441, score-0.091]

85 The metavariable Wald-Wolfowitz test (WW) [10] measures the differences between two samples according to the minimum spanning tree in the graph of distances between all pairs in both samples. [sent-442, score-0.18]

86 Then, the Wald-Wolfowitz test statistics are computed over the number of components left in the graph after removing edges between examples of different samples. [sent-443, score-0.126]

87 The kernel change detection test (KCD) [5] measures the distance between two samples according to a “Fisher-like” distance. [sent-444, score-0.139]

88 The maximum mean discrepancy test (MMD) [12] measures discrepancy according to a complete matrix of kernel-based dissimilarity measures between all examples, and test statistics are then computed. [sent-446, score-0.27]

89 (5) The PAC-Bayesian margin test (PBM) [6] measures the distance between two samples according to the average margins of a linear SVM classifier between the samples, and test statistics are computed. [sent-447, score-0.21]

90 As discussed in detail before, our test follows the general approach of Polonik but differs in three important ways: (1) While Polonik uses a density estimator for specifying the MV-sets, we introduce a simpler method that finds the MV-sets directly from the data. [sent-448, score-0.253]

91 (2) Once the MV-sets are defined, Polonik uses their hypothetical quantiles as the expected plots, and hence runs the KS test in its one-sample version (goodness-of-fit test). [sent-450, score-0.215]

92 Instead of using the hypothetical measures, we estimate the expected plots of X empirically and use the two-sample KS test instead. [sent-452, score-0.158]

93 (3) Unlike Polonik’s work, ours was evaluated empirically and its superiority demonstrated over a wide range of nonparametric tests. [sent-453, score-0.089]

94 Moreover, since Polonik’s test relies on density estimation and the ability to extract its level sets, it is not practically feasible in high-dimensional settings. [sent-454, score-0.236]

95 Second, it presents a nonparametric, generalized KS test that uses our representation method to detect distributional changes in high-dimensional data. [sent-461, score-0.202]

96 Our test was found superior to competing tests in the sense of average precision and BEP measures, especially in the context of change-detection tasks. [sent-462, score-0.24]

97 An interesting and still open question is how we should set the input α quantiles for our method. [sent-463, score-0.099]

98 The problem of determining the number of quantiles – and the gaps between consecutive ones – is related to the problem of histogram design. [sent-464, score-0.099]

99 Learning distributions by their density levels: A paradigm for learning without a teacher. [sent-468, score-0.083]

100 Level sets and minimum volume sets of probability density functions. [sent-559, score-0.106]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ocsvm', 0.469), ('fc', 0.405), ('ocnm', 0.3), ('polonik', 0.281), ('ocsvms', 0.225), ('oq', 0.206), ('ks', 0.188), ('bep', 0.132), ('mmd', 0.129), ('tests', 0.105), ('outliers', 0.101), ('quantiles', 0.099), ('kdq', 0.094), ('test', 0.091), ('oi', 0.085), ('density', 0.083), ('fn', 0.082), ('estimator', 0.079), ('svs', 0.076), ('quantile', 0.075), ('kcd', 0.075), ('ocnms', 0.075), ('pbm', 0.075), ('svbi', 0.075), ('regions', 0.074), ('ww', 0.072), ('statements', 0.07), ('cx', 0.067), ('distributional', 0.064), ('fm', 0.063), ('hierarchical', 0.059), ('hmve', 0.056), ('israel', 0.055), ('asymptotically', 0.055), ('detection', 0.054), ('streams', 0.053), ('nonparametric', 0.05), ('lf', 0.05), ('di', 0.049), ('changes', 0.047), ('reference', 0.046), ('hierarchy', 0.045), ('precision', 0.044), ('xm', 0.044), ('hypersphere', 0.043), ('plots', 0.042), ('competitor', 0.041), ('superiority', 0.039), ('decision', 0.038), ('runtime', 0.038), ('astronomical', 0.038), ('metavariable', 0.038), ('osnm', 0.038), ('osnms', 0.038), ('precisionrecall', 0.038), ('psv', 0.038), ('svbq', 0.038), ('technion', 0.036), ('estimations', 0.036), ('estimators', 0.036), ('drift', 0.035), ('examples', 0.035), ('neighbor', 0.034), ('estimation', 0.033), ('ptop', 0.033), ('notices', 0.033), ('versions', 0.033), ('fraction', 0.032), ('haifa', 0.032), ('cdfs', 0.031), ('discrepancy', 0.03), ('kernel', 0.03), ('lkopf', 0.03), ('hj', 0.029), ('xi', 0.029), ('practically', 0.029), ('monthly', 0.029), ('generalized', 0.028), ('region', 0.028), ('measures', 0.028), ('change', 0.027), ('wilcoxon', 0.027), ('transform', 0.027), ('hold', 0.026), ('hypothesis', 0.025), ('sup', 0.025), ('hypothetical', 0.025), ('annals', 0.025), ('sch', 0.025), ('intersection', 0.024), ('uci', 0.024), ('averaged', 0.024), ('lie', 0.023), ('multivariate', 0.023), ('bootstrapping', 0.023), ('fd', 0.023), ('sgn', 0.023), ('minimum', 0.023), ('setup', 0.023), ('proportion', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch

Abstract: We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1

2 0.1188648 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

Author: Arthur Gretton, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu, Bharath K. Sriperumbudur

Abstract: Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

3 0.11800414 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

Author: Xianghang Liu, James Petterson, Tibério S. Caetano

Abstract: We present a new formulation for binary classification. Instead of relying on convex losses and regularizers such as in SVMs, logistic regression and boosting, or instead non-convex but continuous formulations such as those encountered in neural networks and deep belief networks, our framework entails a non-convex but discrete formulation, where estimation amounts to finding a MAP configuration in a graphical model whose potential functions are low-dimensional discrete surrogates for the misclassification loss. We argue that such a discrete formulation can naturally account for a number of issues that are typically encountered in either the convex or the continuous non-convex approaches, or both. By reducing the learning problem to a MAP inference problem, we can immediately translate the guarantees available for many inference settings to the learning problem itself. We empirically demonstrate in a number of experiments that this approach is promising in dealing with issues such as severe label noise, while still having global optimality guarantees. Due to the discrete nature of the formulation, it also allows for direct regularization through cardinality-based penalties, such as the 0 pseudo-norm, thus providing the ability to perform feature selection and trade-off interpretability and predictability in a principled manner. We also outline a number of open problems arising from the formulation. 1

4 0.084104925 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

Author: Kumar Sricharan, Alfred O. Hero

Abstract: The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow - of order O(T −γ/d ), where T is the number of samples, and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T −1 ). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffer from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates. 1

5 0.070369154 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

Author: Azadeh Khaleghi, Daniil Ryabko

Abstract: The problem of multiple change point estimation is considered for sequences with unknown number of change points. A consistency framework is suggested that is suitable for highly dependent time-series, and an asymptotically consistent algorithm is proposed. In order for the consistency to be established the only assumption required is that the data is generated by stationary ergodic time-series distributions. No modeling, independence or parametric assumptions are made; the data are allowed to be dependent and the dependence can be of arbitrary form. The theoretical results are complemented with experimental evaluations. 1

6 0.062486473 16 nips-2012-A Polynomial-time Form of Robust Regression

7 0.061107457 206 nips-2012-Majorization for CRFs and Latent Likelihoods

8 0.057538819 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

9 0.055111229 338 nips-2012-The Perturbed Variation

10 0.051624298 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

11 0.05101721 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

12 0.04894089 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

13 0.04868615 200 nips-2012-Local Supervised Learning through Space Partitioning

14 0.04754309 303 nips-2012-Searching for objects driven by context

15 0.047140934 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

16 0.047122616 32 nips-2012-Active Comparison of Prediction Models

17 0.046704311 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

18 0.046660155 254 nips-2012-On the Sample Complexity of Robust PCA

19 0.045900211 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

20 0.044119317 95 nips-2012-Density-Difference Estimation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.139), (1, 0.02), (2, -0.0), (3, -0.044), (4, 0.018), (5, -0.012), (6, 0.007), (7, 0.062), (8, -0.055), (9, -0.043), (10, 0.002), (11, -0.03), (12, 0.071), (13, -0.057), (14, 0.029), (15, -0.074), (16, 0.043), (17, 0.002), (18, -0.034), (19, 0.029), (20, 0.033), (21, -0.036), (22, 0.058), (23, 0.015), (24, -0.049), (25, 0.031), (26, 0.024), (27, 0.019), (28, 0.035), (29, -0.03), (30, 0.071), (31, -0.014), (32, -0.041), (33, 0.004), (34, 0.046), (35, 0.005), (36, -0.022), (37, 0.077), (38, 0.036), (39, 0.03), (40, 0.007), (41, 0.0), (42, 0.013), (43, -0.017), (44, 0.056), (45, 0.057), (46, -0.052), (47, -0.033), (48, -0.068), (49, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90541202 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch

Abstract: We propose an efficient, generalized, nonparametric, statistical KolmogorovSmirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1

2 0.75128496 95 nips-2012-Density-Difference Estimation

Author: Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi

Abstract: We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference. However, such a two-step procedure does not necessarily work well because the first step is performed without regard to the second step and thus a small estimation error incurred in the first stage can cause a big error in the second stage. In this paper, we propose a single-shot procedure for directly estimating the density difference without separately estimating two densities. We derive a non-parametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be utilized in L2 -distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison such as class-prior estimation and change-point detection.

3 0.69449562 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

Author: Kumar Sricharan, Alfred O. Hero

Abstract: The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow - of order O(T −γ/d ), where T is the number of samples, and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T −1 ). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffer from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates. 1

4 0.67136329 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

Author: Han Liu, Larry Wasserman, John D. Lafferty

Abstract: We prove a new exponential concentration inequality for a plug-in estimator of the Shannon mutual information. Previous results on mutual information estimation only bounded expected error. The advantage of having the exponential inequality is that, combined with the union bound, we can guarantee accurate estimators of the mutual information for many pairs of random variables simultaneously. As an application, we show how to use such a result to optimally estimate the density function and graph of a distribution which is Markov to a forest graph. 1

5 0.66109091 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

Author: Florian T. Pokorny, Hedvig Kjellström, Danica Kragic, Carl Ek

Abstract: We present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car. 1

6 0.61014044 338 nips-2012-The Perturbed Variation

7 0.60371298 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

8 0.59894663 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

9 0.58844566 222 nips-2012-Multi-Task Averaging

10 0.58514124 291 nips-2012-Reducing statistical time-series problems to binary classification

11 0.56679368 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

12 0.56235027 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

13 0.52116257 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

14 0.51429754 188 nips-2012-Learning from Distributions via Support Measure Machines

15 0.4964785 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

16 0.49596617 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

17 0.4956153 330 nips-2012-Supervised Learning with Similarity Functions

18 0.49009168 16 nips-2012-A Polynomial-time Form of Robust Regression

19 0.48598924 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

20 0.47796908 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.028), (21, 0.011), (38, 0.064), (42, 0.02), (53, 0.011), (54, 0.022), (55, 0.013), (74, 0.036), (76, 0.635), (80, 0.041), (84, 0.014), (92, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99104649 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch

Abstract: We propose an efficient, generalized, nonparametric, statistical KolmogorovSmirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1

2 0.97877377 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

Author: Michael Osborne, Roman Garnett, Zoubin Ghahramani, David K. Duvenaud, Stephen J. Roberts, Carl E. Rasmussen

Abstract: Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a modelbased method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model’s hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy. 1

3 0.97569913 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David Parks, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1

4 0.97557575 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

Author: Kevin Tang, Vignesh Ramanathan, Li Fei-fei, Daphne Koller

Abstract: Typical object detectors trained on images perform poorly on video, as there is a clear distinction in domain between the two types of data. In this paper, we tackle the problem of adapting object detectors learned from images to work well on videos. We treat the problem as one of unsupervised domain adaptation, in which we are given labeled data from the source domain (image), but only unlabeled data from the target domain (video). Our approach, self-paced domain adaptation, seeks to iteratively adapt the detector by re-training the detector with automatically discovered target domain examples, starting with the easiest first. At each iteration, the algorithm adapts by considering an increased number of target domain examples, and a decreased number of source domain examples. To discover target domain examples from the vast amount of video data, we introduce a simple, robust approach that scores trajectory tracks instead of bounding boxes. We also show how rich and expressive features specific to the target domain can be incorporated under the same framework. We show promising results on the 2011 TRECVID Multimedia Event Detection [1] and LabelMe Video [2] datasets that illustrate the benefit of our approach to adapt object detectors to video. 1

5 0.97391701 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

Author: Francisco Pereira, Matthew Botvinick

Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure. 1

6 0.96837687 205 nips-2012-MCMC for continuous-time discrete-state systems

7 0.96767175 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

8 0.93846524 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

9 0.92952734 247 nips-2012-Nonparametric Reduced Rank Regression

10 0.92819303 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

11 0.86924601 338 nips-2012-The Perturbed Variation

12 0.85450161 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

13 0.85350984 142 nips-2012-Generalization Bounds for Domain Adaptation

14 0.84743327 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

15 0.84611744 41 nips-2012-Ancestor Sampling for Particle Gibbs

16 0.84317541 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

17 0.84212816 327 nips-2012-Structured Learning of Gaussian Graphical Models

18 0.83701575 163 nips-2012-Isotropic Hashing

19 0.83673739 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

20 0.83601081 291 nips-2012-Reducing statistical time-series problems to binary classification