
119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections


Source: pdf

Author: Ping Li, Cun-hui Zhang

Abstract: Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). [sent-4, score-0.783]

2 For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. [sent-7, score-0.512]

3 However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. [sent-8, score-0.254]

4 In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. [sent-9, score-0.538]

5 In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. [sent-10, score-0.78]

6 Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. [sent-11, score-1.365]

7 Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage. [sent-12, score-0.294]

8 1 Introduction Computing the Shannon entropy of massive data has important applications in neural computation [17], graph estimation [5], query log analysis in Web search [14], network anomaly detection [21], etc. [sent-13, score-0.641]

9 In modern applications, as massive datasets are often generated in a streaming fashion, entropy estimation in data streams has become a challenging and interesting problem. [sent-17, score-0.661]

10 Mining data streams at petabyte scale has become an important research area [1], as network data can easily reach that scale [20]. [sent-21, score-0.279]

11 In the standard turnstile model [15], a data stream is a vector $A_t$ of length $D$, where $D = 2^{64}$ or even $D = 2^{128}$ is possible in network applications, e. [sent-22, score-0.295]

12 At time $t$, there is an input stream $a_t = (i_t, I_t)$, $i_t \in [1, D]$, which updates $A_t$ by a linear rule: $A_t[i_t] = A_{t-1}[i_t] + I_t$. [sent-25, score-0.151]

13 For network traffic, normally $A_t[i] \geq 0$, which is called the strict turnstile model and suffices for describing certain natural phenomena. [sent-27, score-0.144]
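
To make the turnstile model concrete, here is a minimal Python sketch of the update rule; the toy dimension, the example update tuples, and the exact-entropy helper are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

def shannon_entropy(A):
    """Exact Shannon entropy of the (nonnegative) frequency vector A."""
    p = np.abs(A) / np.abs(A).sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Turnstile model: the stream is a sequence of (i_t, I_t) pairs, each applying
# the linear update A_t[i_t] = A_{t-1}[i_t] + I_t.
D = 16                                                # toy dimension; D can be 2^64 in practice
A = np.zeros(D)
stream = [(3, 5.0), (7, 2.0), (3, -1.0), (11, 4.0)]   # hypothetical updates (i_t, I_t)
for i_t, I_t in stream:
    A[i_t] += I_t          # the strict turnstile model additionally assumes A[i] stays >= 0

print(shannon_entropy(A))
```

The point of the paper, of course, is to approximate this entropy without ever storing the length-$D$ vector $A_t$.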

14 Also, many applications (such as anomaly detections of network traffic) require computing the summary statistics in real-time. [sent-32, score-0.234]

15 An effective and reliable measurement of network traffic in real-time is crucial for anomaly detection and network diagnosis; and one such measurement metric is the Shannon entropy [4, 8, 19, 2, 9, 21]. [sent-36, score-0.713]

16 The exact entropy measurement in real-time on high-speed links is however computationally prohibitive. [sent-37, score-0.332]

17 The Distributed Denial of Service (DDoS) attack is a representative example of network anomalies. [sent-38, score-0.187]

18 A DDoS attack attempts to make computers unavailable to intended users, either by forcing users to reset the computers or by exhausting the resources of service-hosting sites. [sent-39, score-0.165]

19 A DDoS attack normally changes the statistical distribution of network traffic, which could be reliably captured by the abnormal variations in the measurements of Shannon entropy [4]. [sent-42, score-0.481]

20 Figure 1: Entropy of the source IP address distribution (vertical axis, entropy value 0-10) versus packet counts in thousands (horizontal axis, 0-1200); this plot is reproduced from a DARPA conference [4]. [sent-44, score-0.334]

21 Apparently, the entropy measurements do not have to be “perfect” for detecting attacks. [sent-47, score-0.316]

22 3 Symmetric Stable Random Projections and Entropy Estimation Using Moments: It turns out that, for $0 < \alpha \leq 2$, one can use stable random projections to compute $F_{(\alpha)}$ efficiently because the turnstile model (1) is a linear model and the random projection operation is also linear (i. [sent-52, score-0.179]

23 Conceptually, we multiply the data stream vector $A_t \in \mathbb{R}^D$ by a random matrix $R \in \mathbb{R}^{D \times k}$, resulting in a vector $X = A_t \times R \in \mathbb{R}^k$ with entries $x_j = [A_t \times R]_j = \sum_{i=1}^{D} r_{ij} A_t[i]$, $j = 1, 2, \ldots, k$. [sent-55, score-0.336]

24 In data stream computations, the matrix R is not materialized. [sent-60, score-0.151]

25 When a stream element $a_t = (i_t, I_t)$ arrives, one updates the entries of $X$: $x_j \leftarrow x_j + I_t\, r_{i_t j}$, $j = 1, 2, \ldots, k$. (3) [sent-63, score-0.318]
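
A minimal sketch of this projection and its incremental maintenance, with $R$ materialized for readability (a real streaming implementation would regenerate entries on demand, as discussed later); scipy's levy_stable with beta = 0 is used here as a stand-in for the symmetric stable entries, and the stream tuples are hypothetical.

```python
import numpy as np
from scipy.stats import levy_stable

D, k, alpha = 16, 10, 1.05
# i.i.d. symmetric alpha-stable entries (beta = 0 gives the symmetric case)
R = levy_stable.rvs(alpha, 0, size=(D, k))

x = np.zeros(k)                                   # the sketch X = A_t x R
for i_t, I_t in [(3, 5.0), (7, 2.0), (3, -1.0)]:  # hypothetical stream updates
    x += I_t * R[i_t, :]                          # x_j <- x_j + I_t * r_{i_t, j}   (3)
```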

26 By a property of stable distributions, the samples $x_j$, $j = 1$ to $k$, are also i.i.d. [sent-67, score-0.219]

27 stable: $x_j = \sum_{i=1}^{D} r_{ij} A_t[i] \sim S\!\left(\alpha,\ F_{(\alpha)} = \sum_{i=1}^{D} |A_t[i]|^{\alpha}\right)$. (4) Therefore, the task boils down to estimating the scale parameter $F_{(\alpha)}$ from $k$ i.i.d. samples. [sent-70, score-0.279]

28 Because the Shannon entropy is essentially the derivative of the frequency moment at $\alpha = 1$, the popular approach is to approximate the Shannon entropy by the Tsallis entropy [18]: $T_\alpha = \frac{1}{\alpha-1}\left(1 - \frac{F_{(\alpha)}}{F_{(1)}^{\alpha}}\right)$, (5) [sent-74, score-1.037]

29 which approaches the Shannon entropy $H$ as $\alpha \to 1$. [sent-75, score-0.294]
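
As a quick numerical illustration of (5), using a made-up nonnegative vector and exact (not estimated) moments, the Tsallis entropy converges to the Shannon entropy as $\Delta = \alpha - 1$ shrinks:

```python
import numpy as np

A = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])   # toy data vector

def tsallis_from_moments(A, alpha):
    F_alpha = np.sum(np.abs(A) ** alpha)                  # F_(alpha)
    F_1 = np.sum(np.abs(A))                               # F_(1)
    return (1.0 - F_alpha / F_1 ** alpha) / (alpha - 1.0)

p = A / A.sum()
H = float(-np.sum(p * np.log(p)))                         # exact Shannon entropy

for delta in [0.1, 0.01, 0.001]:
    print(delta, tsallis_from_moments(A, 1.0 + delta), H)
```

In the streaming setting the moments themselves must be estimated from the sketch, which is where the difficulty discussed next comes from.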

30 The estimated moments are then plugged in (5) to estimate the Shannon entropy H. [sent-81, score-0.364]

31 Ideally, if $\mathrm{Var}\bigl(\hat{F}_{(\alpha)}/\hat{F}_{(1)}^{\alpha}\bigr) = O(\Delta^2)$, the variance of the estimated Tsallis entropy $\hat{T}_\alpha = \frac{1}{\alpha-1}\bigl(1 - \hat{F}_{(\alpha)}/\hat{F}_{(1)}^{\alpha}\bigr)$ will be essentially independent of $\Delta$; it turns out that finding an estimator with $\mathrm{Var}\bigl(\hat{F}_{(\alpha)}/\hat{F}_{(1)}^{\alpha}\bigr) = O(\Delta^2)$ is the difficult part. [sent-98, score-0.784]
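
The role of the $O(\Delta^2)$ requirement can be seen from a one-line calculation (a sketch of the argument, not the paper's formal statement): writing $\Delta = \alpha - 1$ and $R = \hat{F}_{(\alpha)}/\hat{F}_{(1)}^{\alpha}$, the plug-in estimator is $\hat{T}_\alpha = (1 - R)/\Delta$, so $\mathrm{Var}(\hat{T}_\alpha) = \mathrm{Var}(R)/\Delta^2$. Hence $\mathrm{Var}(R) = O(\Delta^2)$ is exactly what keeps $\mathrm{Var}(\hat{T}_\alpha)$ bounded as $\Delta \to 0$, whereas $\mathrm{Var}(R) = O(\Delta)$ still leaves a residual $1/\Delta$ factor.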

32 It is known that around $\alpha = 1$, the geometric mean estimator [10] is nearly statistically optimal. [sent-100, score-0.342]

33 Interestingly, our analysis and simulation show that using the geometric mean estimator, we can essentially only achieve $\mathrm{Var}\bigl(\hat{F}_{(\alpha)}/\hat{F}_{(1)}^{\alpha}\bigr) = O(\Delta)$, which, albeit a large improvement, is not sufficient to cancel the $O\!\left(\frac{1}{\Delta^2}\right)$ term. [sent-101, score-0.271]

34 Therefore, our second key component is a new estimator of Tα using a moment estimator which does not have (or barely has) finite variance. [sent-102, score-0.488]

35 Even though such an estimator is not good for estimating the single moment compared to the geometric mean, due to the high correlation, the ratio $\hat{F}_{(\alpha)}/\hat{F}_{(1)}^{\alpha}$ is still very well-behaved and its variance is essentially $O(\Delta^2)$, as shown in our theoretical analysis and experiments. [sent-103, score-0.55]

36 5 Compressed Counting (CC) for Nonnegative Data Streams The recent work [13] on Compressed Counting (CC) [11] provides an ideal solution to the problem of entropy estimation in nonnegative data streams. [sent-105, score-0.362]

37 This observation led to the conjecture that estimating $F_{(\alpha)}$ should also be easy if $\alpha \approx 1$, which consequently led to the development of Compressed Counting, which used maximally-skewed stable random projections instead of symmetric stable projections. [sent-110, score-0.406]

38 The most recent work of CC [13] provided a new moment estimator that achieves variance $\propto O(\Delta^2)$. [sent-111, score-0.341]

39 Unfortunately, for general data streams where entries can be negative, we have to resort to symmetric stable random projections. [sent-112, score-0.441]

40 Fundamentally, the reason that skewed projections work well on nonnegative data streams is because the data themselves are skewed. [sent-113, score-0.307]

41 [21] used the difference between data streams from a slightly different motivation. [sent-118, score-0.239]

42 The goal of [21] is to measure the entropies of all OD pairs (origin-destination) in a network, because entropy measurements are crucial for detecting anomaly events such as DDoS attacks and network failures. [sent-119, score-0.675]

43 They argued that the change of entropy of the traffic distribution may be invisible (i. [sent-120, score-0.327]

44 Instead, they proposed to measure the entropy from a number of locations across the network, i. [sent-123, score-0.294]

45 by examining the entropy of every OD flow in the network. [sent-125, score-0.294]

46 By a similar argument, a DDoS attack may be invisible in terms of the traffic volume change, if the attack is launched outside the network. [sent-126, score-0.279]

47 While [21] successfully demonstrated that measuring the Shannon entropy of OD flows is effective for detecting anomaly events, at that time they did not have the tools for efficiently estimating the entropy. [sent-127, score-0.502]

48 Using symmetric stable random projections and independent samples, they needed a large number of samples (e. [sent-128, score-0.269]

49 For anomaly detection, reducing the sample size ($k$) is crucial because $k$ determines the storage and estimation speed; and it is often required to detect the events in real time. [sent-131, score-0.231]

50 2 Our Proposed Algorithm: Recall that a data stream is a long vector $A_t[i]$, $i = 1$ to $D$. [sent-133, score-0.151]

51 Conceptually, we generate a random matrix $R \in \mathbb{R}^{D \times k}$ whose entries are sampled from a stable distribution and multiply it with $A_t$: $X = A_t \times R$. [sent-135, score-0.198]

52 The matrix multiplication is linear and can be conducted incrementally as the new stream elements arrive. [sent-136, score-0.187]

53 $R$ is not materialized; its entries are re-generated on demand using pseudorandom numbers, as is standard practice in data stream computations [7]. [sent-137, score-0.261]
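
One simple way to realize "entries re-generated on demand" is to derive $r_{ij}$ from a deterministic per-$(i, j)$ seed; the sketch below uses numpy's SeedSequence plus scipy's levy_stable as stand-ins (the constructions in [7] rely on pseudorandom generators with formal guarantees, which this sketch does not attempt to reproduce).

```python
import numpy as np
from scipy.stats import levy_stable

def r_entry(i, j, alpha, master_seed=12345):
    """Deterministically regenerate the projection entry r_{ij} from (i, j)."""
    seed = int(np.random.SeedSequence([master_seed, i, j]).generate_state(1)[0])
    # one symmetric alpha-stable variate (beta = 0), reproducible for this (i, j)
    return levy_stable.rvs(alpha, 0, random_state=seed)

def process_update(x, i_t, I_t, alpha):
    """Apply one stream update (i_t, I_t) to the sketch x without storing R."""
    for j in range(len(x)):
        x[j] += I_t * r_entry(i_t, j, alpha)
    return x
```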

54 Our method does not require $A_t[i] \geq 0$ and hence it can handle the difference between two streams (e. [sent-138, score-0.239]

55 1 The Symmetric Stable Law Our work utilizes the symmetric stable distribution. [sent-142, score-0.187]

56 random numbers: $w_{ij} \sim \exp(1)$, $u_{ij} \sim \mathrm{uniform}(-\pi/2, \pi/2)$, $i = 1, 2, \ldots$ [sent-150, score-0.172]

57 $\ldots, k$. (9) As new stream elements arrive, we incrementally maintain two sets of samples: [sent-156, score-0.187]

58 for $j = 1$ to $k$, $x_j = \sum_{i=1}^{D} A_t[i]\, g(w_{ij}, u_{ij}, 1)$, $y_j = \sum_{i=1}^{D} A_t[i]\, g(w_{ij}, u_{ij}, \alpha)$. (10) Note that $x_j$ and $y_j$ are highly correlated because they are generated using the same random numbers (with different $\alpha$). [sent-158, score-0.564]
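
Equation (7) defining $g(\cdot)$ is not reproduced in this summary; the sketch below assumes it is the standard Chambers-Mallows-Stuck transform for symmetric stable variables, which is the usual construction and fits the $w \sim \exp(1)$, $u \sim \mathrm{uniform}(-\pi/2, \pi/2)$ inputs of (9). The key point of (10) is that $x_j$ and $y_j$ reuse the same $(w_{ij}, u_{ij})$, only with different $\alpha$.

```python
import numpy as np

def g(w, u, alpha):
    """Chambers-Mallows-Stuck transform (assumed form of eq. (7)):
    maps w ~ Exp(1), u ~ Uniform(-pi/2, pi/2) to a standard symmetric
    alpha-stable variate."""
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def correlated_sketches(A, k, alpha, seed=0):
    """Batch version of eq. (10): x_j uses alpha = 1, y_j uses alpha,
    both built from the SAME (w_ij, u_ij), hence highly correlated."""
    D = len(A)
    rng = np.random.default_rng(seed)
    w = rng.exponential(1.0, size=(D, k))
    u = rng.uniform(-np.pi / 2, np.pi / 2, size=(D, k))
    x = A @ g(w, u, 1.0)      # samples whose scale is F_(1)
    y = A @ g(w, u, alpha)    # samples whose scale is F_(alpha)
    return x, y
```

In a true streaming implementation x and y would be updated entry by entry as in the earlier sketches; the batch form here is simply the easiest way to see the shared randomness.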

59 Our recommended estimator of the Tsallis entropy $T_\alpha$ is $\hat{T}_{\alpha,0.5}$, the case $\gamma = 1/2$ of the general estimator analyzed below. [sent-160, score-0.511]

60 When ∆ is sufficiently small, the estimated Tsallis entropy will be sufficiently close to the Shannon entropy. [sent-163, score-0.32]

61 While it is intuitively clear that it is beneficial to make $x_j$ and $y_j$ highly correlated for the sake of reducing the variance, it might not be as intuitive why $\hat{T}_{\alpha,0.5}$ [sent-165, score-0.225]

62 We will explain why the obvious geometric mean estimator [10] is not sufficient for entropy estimation. [sent-167, score-0.636]

63 3 The Geometric Mean Estimator: For estimating $F_{(\alpha)}$, the geometric mean estimator [10] is close to being statistically optimal (efficiency ≈ 80%) at $\alpha \approx 1$. [sent-168, score-0.382]

64 Thus, it was our first attempt to test the following estimator of the Tsallis entropy: $\hat{T}_{\alpha,gm} = \frac{1}{\alpha-1}\left(1 - \frac{\hat{F}_{(\alpha),gm}}{\hat{F}_{(1),gm}^{\alpha}}\right)$, where $\hat{F}_{(\alpha),gm} = \frac{\prod_{j=1}^{k} |y_j|^{\alpha/k}}{G^k(\alpha, \alpha/k)}$ and $\hat{F}_{(1),gm} = \frac{\prod_{j=1}^{k} |x_j|^{1/k}}{G^k(1, 1/k)}$, (12) with $G(\cdot)$ defined in (8). [sent-169, score-0.194]
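
The normalizing function $G(\cdot)$ of (8) is also not reproduced here; the sketch below assumes $G(\alpha, \lambda) = E|S(\alpha,1)|^{\lambda}$ and uses the classical fractional absolute-moment formula $\frac{2}{\pi}\Gamma(1-\lambda/\alpha)\Gamma(\lambda)\sin(\pi\lambda/2)$ for it, which is consistent with the role $G$ plays in making (12) unbiased.

```python
import numpy as np
from scipy.special import gamma

def G(alpha, lam):
    """Assumed form of G in (8): E|S(alpha,1)|^lam, valid for -1 < lam < alpha."""
    return (2.0 / np.pi) * gamma(1.0 - lam / alpha) * gamma(lam) * np.sin(np.pi * lam / 2.0)

def F_gm(samples, alpha):
    """Geometric mean estimator of the alpha-th frequency moment F_(alpha)."""
    k = len(samples)
    log_gm = (alpha / k) * np.sum(np.log(np.abs(samples)))   # log of prod_j |x_j|^(alpha/k)
    return np.exp(log_gm) / G(alpha, alpha / k) ** k

def tsallis_gm(x, y, alpha):
    """Plug-in Tsallis entropy estimator (12) built from the two sketches."""
    return (1.0 - F_gm(y, alpha) / F_gm(x, 1.0) ** alpha) / (alpha - 1.0)
```

With correlated_sketches from the previous sketch, tsallis_gm(x, y, 1 + delta) can be compared against the exact Shannon entropy for small delta; as the analysis below explains, its variance only improves to roughly $O(1/\Delta)$ rather than $O(1)$.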

65 Theoretical Analysis: The theoretical analysis of $\hat{T}_{\alpha,gm}$, however, turns out to be difficult, as it requires computing $E\!\left[\left(\frac{y_j}{x_j}\right)^{s\alpha/k}\right] = E\!\left[\left(\frac{\sum_{i=1}^{D} A_t[i]\, g(w_{ij}, u_{ij}, \alpha)}{\sum_{i=1}^{D} A_t[i]\, g(w_{ij}, u_{ij}, 1)}\right)^{s\alpha/k}\right]$, $s = 1, 2$, (13) where $g(\cdot)$ is defined in (7). [sent-172, score-0.307]

66 $\cdots\, 6822\,\gamma^2 \Delta^2 + O(\gamma \Delta^4) + O(\gamma^2 \Delta^3)$ (14) Note that we need to keep higher-order terms in order to prove Lemma 2, to show the properties of the geometric mean estimator when $D = 1$ (i. [sent-179, score-0.148]

67 In this case, the geometric mean estimator $\hat{T}_{\alpha,gm}$ is asymptotically unbiased with variance essentially free of $\frac{1}{\Delta}$, which is very encouraging. [sent-186, score-0.468]

68 2 and the theoretical analysis of a more general estimator in Sec. [sent-194, score-0.194]

69 We present some experimental results for evaluating $\hat{T}_{\alpha,gm}$, to demonstrate that (i) using correlation does substantially reduce the variance and hence the required sample size, and (ii) the variance (or MSE, the mean square error) of $\hat{T}_{\alpha,gm}$ is roughly $O\!\left(\frac{1}{\Delta}\right)$. [sent-198, score-0.18]

70 The left panels contain the results using independent sampling (i.e., the prior work [21]) and the geometric mean estimator. [sent-213, score-0.148]

71 The middle panels contain the results using correlated sampling (i.e., our proposal). [sent-214, score-0.17]

72 The right panels multiply the results of the middle panels by $\Delta$ to illustrate that the variance of the geometric mean estimator for entropy $\hat{T}_{\alpha,gm}$ is essentially $O\!\left(\frac{1}{\Delta}\right)$. [sent-217, score-0.991]

73 We conducted symmetric random projections using both independent sampling (left panels, as in [21]) and correlated sampling (middle panels, as our proposal). [sent-226, score-0.186]

74 The Tsallis entropy (of the difference vector) is estimated using the geometric mean estimator (12) with three sample sizes k = 10, 100, and 1000. [sent-227, score-0.686]

75 The normalized mean square errors (MSE: $E|\hat{T}_{\alpha,gm} - H|^2 / H^2$) verify that correlated sampling reduces the errors substantially. [sent-228, score-0.283]

76 4 The General Estimator: Since the geometric mean estimator could not satisfactorily solve the entropy estimation problem, we resort to estimators which behave dramatically differently from the geometric mean. [sent-229, score-0.808]

77 Thus, it is clear that, as long as $0 < \lambda < 1$, $\hat{F}_{(\alpha),\gamma}$ is a consistent estimator of $F_{(\alpha)}$ and $E\bigl[\hat{F}_{(\alpha),\gamma}\bigr]$ is finite. [sent-238, score-0.194]

78 Lemma 5: $E\bigl[\hat{F}_{(\alpha),\gamma}\bigr] = F_{(\alpha)} + O\!\left(\frac{1}{k}\right)$, $\mathrm{Var}\bigl[\hat{F}_{(\alpha),\gamma}\bigr] = \frac{F_{(\alpha)}^2}{k} \frac{\alpha^2}{\gamma^2} \frac{G(\alpha, 2\gamma) - G^2(\alpha, \gamma)}{G^2(\alpha, \gamma)} + O\!\left(\frac{1}{k^2}\right)$. The variance is unbounded if $\gamma = 0$. [sent-240, score-0.147]
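
The paper's exact definition of $\hat{F}_{(\alpha),\gamma}$ is not reproduced in this summary; a fractional-power-of-the-sample-mean form that is consistent with the mean and variance above is sketched below, and should be read as an assumed reconstruction rather than the paper's formula.

```python
import numpy as np
from scipy.special import gamma

def G(alpha, lam):
    """Assumed fractional moment E|S(alpha,1)|^lam (same assumption as before)."""
    return (2.0 / np.pi) * gamma(1.0 - lam / alpha) * gamma(lam) * np.sin(np.pi * lam / 2.0)

def F_general(samples, alpha, gam):
    """Fractional-moment estimator of F_(alpha):
    ( mean_j |y_j|^gam / G(alpha, gam) )^(alpha/gam).
    Finite variance needs 2*gam < alpha; gam near alpha/2 is the borderline
    ("barely bounded variance") regime the paper exploits."""
    m = np.mean(np.abs(samples) ** gam)
    return (m / G(alpha, gam)) ** (alpha / gam)

def tsallis_general(x, y, alpha, gam=0.5):
    """Plug-in estimator T_hat_{alpha,gamma} of (17); gamma = 0.5 is the
    recommended choice for entropy estimation."""
    return (1.0 - F_general(y, alpha, gam) / F_general(x, 1.0, gam) ** alpha) / (alpha - 1.0)
```

A quick check against Lemma 5: by the delta method, the variance of this form is $\frac{F_{(\alpha)}^2}{k}\,\frac{\alpha^2}{\gamma^2}\,\frac{G(\alpha,2\gamma)-G^2(\alpha,\gamma)}{G^2(\alpha,\gamma)}$ to first order, which matches the stated expression and is why this reconstruction seems plausible.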

79 In fact, when $\gamma \to 0$, $\hat{F}_{(\alpha),\gamma}$ converges to the geometric mean estimator $\hat{F}_{(\alpha),gm}$. [sent-243, score-0.243]

80 Theoretical Analysis: Based on Lemma 3 and Lemma 4 (which has a fairly technical proof), we know that the variance of the general estimator is essentially $\mathrm{Var}\bigl[\hat{T}_{\alpha,\gamma}\bigr] = O\!\left(\frac{\Delta^{2\gamma-1}}{k}\right)$, for fixed $\gamma \in (0, 1/2)$. [sent-246, score-0.392]

81 In other words, when $\gamma$ is close to 0, the variance of the entropy estimator is essentially on the order of $O(1/(k\Delta))$, while when $\gamma$ is close to 1/2, the variance is essentially $O(1/k)$ as desired. [sent-247, score-0.784]

82 Experimental Results: Figure 3 presents some empirical results for testing the general estimator $\hat{T}_{\alpha,\gamma}$ (17), using more word vector pairs (including the same 2 pairs as in Figure 2). [sent-261, score-0.222]

83 If the goal is to estimate the Shannon entropy within a few percent of the true value, then $k = 100 \sim 1000$ should be sufficient, because $\sqrt{\mathrm{MSE}}/H < 0.$ [sent-268, score-0.294]

84 5 Conclusion Entropy estimation is an important task in machine learning, data mining, network measurement, anomaly detection, neural computations, etc. [sent-270, score-0.239]

85 In modern applications, the data are often generated in a streaming fashion and many operations on the streams can only be conducted in a single pass over the data. [sent-271, score-0.326]

86 It has been a challenging problem to estimate the Shannon entropy of data streams. [sent-272, score-0.294]

87 The prior work [21] achieved some success in entropy estimation using symmetric stable random projections. [sent-273, score-0.51]

88 In our approach, we approximate the Shannon entropy using two highly correlated estimates of the frequency moments. [sent-277, score-0.366]

89 The positive correlation can substantially reduce the variance of the Shannon entropy estimate. [sent-278, score-0.369]

90 However, finding the appropriate estimator of the frequency moment is another challenging task. [sent-279, score-0.298]

91 We successfully find such an estimator and show that its variance (of the Shannon entropy estimate) is very small. [sent-280, score-0.563]

92 Figure 3 (horizontal axis $\Delta = \alpha - 1$, ranging from $10^{-5}$ to $10^{0}$): The first two rows are the normalized MSEs for the same two vectors used in Figure 2, for estimating the Shannon entropy using the general estimator $\hat{T}_{\alpha,\gamma}$ with $\gamma = 0.$ [sent-352, score-0.709]

93 The first column of panels are the results of using independent samples (i.e., the prior work [21]) and the geometric mean estimator. [sent-360, score-0.148]

94 The second column of panels are the results of using correlated samples and the geometric mean estimator. [sent-361, score-0.347]

95 The right three columns of panels are for the proposed general estimator $\hat{T}_{\alpha,\gamma}$ with $\gamma = 0.$ [sent-362, score-0.292]

96 Stable distributions, pseudorandom generators, embeddings, and data stream computation. [sent-396, score-0.2]

97 Data streaming algorithms for estimating entropy of network traffic. [sent-402, score-0.486]

98 A unified near-optimal estimator for dimension reduction in lα (0 < α ≤ 2) using stable random projections. [sent-412, score-0.32]

99 A new algorithm for compressed counting with applications in shannon entropy estimation in dynamic data. [sent-415, score-0.684]

100 A data streaming algorithm for estimating entropies of od flows. [sent-443, score-0.292]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mse', 0.521), ('entropy', 0.294), ('shannon', 0.266), ('gm', 0.226), ('streams', 0.215), ('estimator', 0.194), ('normalized', 0.181), ('ddos', 0.16), ('stream', 0.151), ('anomaly', 0.146), ('traf', 0.134), ('stable', 0.126), ('attack', 0.123), ('od', 0.12), ('geometric', 0.118), ('tsallis', 0.106), ('panels', 0.098), ('yj', 0.089), ('mses', 0.088), ('streaming', 0.088), ('ping', 0.081), ('turnstile', 0.08), ('uij', 0.077), ('love', 0.077), ('variance', 0.075), ('correlated', 0.072), ('moment', 0.072), ('ar', 0.072), ('cc', 0.065), ('food', 0.065), ('network', 0.064), ('xj', 0.064), ('symmetric', 0.061), ('news', 0.058), ('projections', 0.053), ('compressed', 0.053), ('wij', 0.051), ('essentially', 0.051), ('washington', 0.05), ('united', 0.05), ('pseudorandom', 0.049), ('attacks', 0.049), ('rij', 0.049), ('entropies', 0.044), ('unif', 0.044), ('moments', 0.044), ('mining', 0.043), ('counting', 0.042), ('kg', 0.042), ('review', 0.041), ('detection', 0.04), ('anukool', 0.04), ('imc', 0.04), ('lall', 0.04), ('mitsunori', 0.04), ('ogihara', 0.04), ('packet', 0.04), ('estimating', 0.04), ('nonnegative', 0.039), ('entries', 0.039), ('measurement', 0.038), ('recommend', 0.037), ('incrementally', 0.036), ('sigcomm', 0.035), ('ashwin', 0.035), ('massive', 0.035), ('multiply', 0.033), ('lemma', 0.033), ('orm', 0.033), ('invisible', 0.033), ('logs', 0.033), ('numbers', 0.032), ('frequency', 0.032), ('sin', 0.031), ('generators', 0.031), ('mean', 0.03), ('samples', 0.029), ('crucial', 0.029), ('estimation', 0.029), ('barely', 0.028), ('word', 0.028), ('conceptually', 0.027), ('ows', 0.027), ('events', 0.027), ('estimated', 0.026), ('carefully', 0.026), ('estimators', 0.025), ('jun', 0.024), ('detections', 0.024), ('ip', 0.024), ('difference', 0.024), ('fashion', 0.023), ('positively', 0.023), ('gk', 0.023), ('recommended', 0.023), ('soda', 0.023), ('store', 0.023), ('detecting', 0.022), ('computations', 0.022), ('computers', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

Author: Ping Li, Cun-hui Zhang

Abstract: Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.

2 0.39465889 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

Author: Kumar Sricharan, Alfred O. Hero

Abstract: The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow - of order O(T −γ/d ), where T is the number of samples, and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T −1 ). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffer from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates. 1

3 0.14283173 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

Author: Evan Archer, Il M. Park, Jonathan W. Pillow

Abstract: We consider the problem of estimating Shannon’s entropy H in the under-sampled regime, where the number of possible symbols may be unknown or countably infinite. Dirichlet and Pitman-Yor processes provide tractable prior distributions over the space of countably infinite discrete distributions, and have found major applications in Bayesian non-parametric statistics and machine learning. Here we show that they provide natural priors for Bayesian entropy estimation, due to the analytic tractability of the moments of the induced posterior distribution over entropy H. We derive formulas for the posterior mean and variance of H given data. However, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior on H, meaning the prior strongly determines the estimate in the under-sampled regime. We therefore define a family of continuous mixing measures such that the resulting mixture of Dirichlet or Pitman-Yor processes produces an approximately flat prior over H. We explore the theoretical properties of the resulting estimators and show that they perform well on data sampled from both exponential and power-law tailed distributions. 1

4 0.10745091 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

Author: Han Liu, Larry Wasserman, John D. Lafferty

Abstract: We prove a new exponential concentration inequality for a plug-in estimator of the Shannon mutual information. Previous results on mutual information estimation only bounded expected error. The advantage of having the exponential inequality is that, combined with the union bound, we can guarantee accurate estimators of the mutual information for many pairs of random variables simultaneously. As an application, we show how to use such a result to optimally estimate the density function and graph of a distribution which is Markov to a forest graph. 1

5 0.094401203 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means

Author: Suvrit Sra

Abstract: Symmetric positive definite (spd) matrices pervade numerous scientific disciplines, including machine learning and optimization. We consider the key task of measuring distances between two spd matrices; a task that is often nontrivial whenever the distance function must respect the non-Euclidean geometry of spd matrices. Typical non-Euclidean distance measures such as the Riemannian metric δR (X, Y ) = log(Y −1/2 XY −1/2 ) F , are computationally demanding and also complicated to use. To allay some of these difficulties, we introduce a new metric on spd matrices, which not only respects non-Euclidean geometry but also offers faster computation than δR while being less complicated to use. We support our claims theoretically by listing a set of theorems that relate our metric to δR (X, Y ), and experimentally by studying the nonconvex problem of computing matrix geometric means based on squared distances. 1

6 0.086720601 279 nips-2012-Projection Retrieval for Classification

7 0.076851398 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

8 0.073572166 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

9 0.070882186 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

10 0.06913162 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

11 0.066299893 319 nips-2012-Sparse Prediction with the $k$-Support Norm

12 0.061212953 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

13 0.057690896 222 nips-2012-Multi-Task Averaging

14 0.057408687 16 nips-2012-A Polynomial-time Form of Robust Regression

15 0.057099003 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

16 0.056583811 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

17 0.055889454 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

18 0.053782094 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

19 0.052966289 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

20 0.051971391 329 nips-2012-Super-Bit Locality-Sensitive Hashing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.144), (1, 0.035), (2, 0.02), (3, -0.042), (4, -0.014), (5, 0.038), (6, 0.0), (7, 0.092), (8, -0.029), (9, -0.077), (10, -0.022), (11, -0.041), (12, 0.046), (13, -0.122), (14, -0.034), (15, -0.099), (16, 0.219), (17, -0.049), (18, -0.05), (19, -0.071), (20, 0.059), (21, 0.03), (22, 0.231), (23, 0.014), (24, -0.029), (25, -0.023), (26, 0.195), (27, 0.044), (28, 0.191), (29, -0.036), (30, 0.128), (31, -0.053), (32, 0.155), (33, 0.097), (34, -0.053), (35, -0.099), (36, 0.188), (37, 0.154), (38, -0.103), (39, -0.046), (40, -0.015), (41, -0.099), (42, -0.029), (43, -0.071), (44, 0.09), (45, -0.018), (46, -0.092), (47, -0.042), (48, 0.017), (49, -0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97922504 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

Author: Ping Li, Cun-hui Zhang

Abstract: Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.

2 0.86878908 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

Author: Kumar Sricharan, Alfred O. Hero

Abstract: The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow - of order O(T −γ/d ), where T is the number of samples, and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T −1 ). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffer from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates. 1

3 0.71604306 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

Author: Han Liu, Larry Wasserman, John D. Lafferty

Abstract: We prove a new exponential concentration inequality for a plug-in estimator of the Shannon mutual information. Previous results on mutual information estimation only bounded expected error. The advantage of having the exponential inequality is that, combined with the union bound, we can guarantee accurate estimators of the mutual information for many pairs of random variables simultaneously. As an application, we show how to use such a result to optimally estimate the density function and graph of a distribution which is Markov to a forest graph. 1

4 0.69237 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

Author: Evan Archer, Il M. Park, Jonathan W. Pillow

Abstract: We consider the problem of estimating Shannon’s entropy H in the under-sampled regime, where the number of possible symbols may be unknown or countably infinite. Dirichlet and Pitman-Yor processes provide tractable prior distributions over the space of countably infinite discrete distributions, and have found major applications in Bayesian non-parametric statistics and machine learning. Here we show that they provide natural priors for Bayesian entropy estimation, due to the analytic tractability of the moments of the induced posterior distribution over entropy H. We derive formulas for the posterior mean and variance of H given data. However, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior on H, meaning the prior strongly determines the estimate in the under-sampled regime. We therefore define a family of continuous mixing measures such that the resulting mixture of Dirichlet or Pitman-Yor processes produces an approximately flat prior over H. We explore the theoretical properties of the resulting estimators and show that they perform well on data sampled from both exponential and power-law tailed distributions. 1

5 0.60012174 95 nips-2012-Density-Difference Estimation

Author: Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi

Abstract: We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference. However, such a two-step procedure does not necessarily work well because the first step is performed without regard to the second step and thus a small estimation error incurred in the first stage can cause a big error in the second stage. In this paper, we propose a single-shot procedure for directly estimating the density difference without separately estimating two densities. We derive a non-parametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be utilized in L2 -distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison such as class-prior estimation and change-point detection.

6 0.52745646 222 nips-2012-Multi-Task Averaging

7 0.51372987 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

8 0.49842897 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

9 0.49440229 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

10 0.39797065 279 nips-2012-Projection Retrieval for Classification

11 0.38687086 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

12 0.37255964 364 nips-2012-Weighted Likelihood Policy Search with Model Selection

13 0.34132335 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy

14 0.33508646 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA

15 0.32012385 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

16 0.31300187 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

17 0.30617827 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

18 0.29865322 16 nips-2012-A Polynomial-time Form of Robust Regression

19 0.2924611 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

20 0.28511426 329 nips-2012-Super-Bit Locality-Sensitive Hashing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.056), (21, 0.043), (31, 0.31), (38, 0.105), (39, 0.017), (42, 0.024), (54, 0.033), (55, 0.033), (74, 0.052), (76, 0.126), (80, 0.067), (92, 0.046)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7324394 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

Author: Ping Li, Cun-hui Zhang

Abstract: Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.

2 0.71046185 152 nips-2012-Homeostatic plasticity in Bayesian spiking networks as Expectation Maximization with posterior constraints

Author: Stefan Habenschuss, Johannes Bill, Bernhard Nessler

Abstract: Recent spiking network models of Bayesian inference and unsupervised learning frequently assume either inputs to arrive in a special format or employ complex computations in neuronal activation functions and synaptic plasticity rules. Here we show in a rigorous mathematical treatment how homeostatic processes, which have previously received little attention in this context, can overcome common theoretical limitations and facilitate the neural implementation and performance of existing models. In particular, we show that homeostatic plasticity can be understood as the enforcement of a ’balancing’ posterior constraint during probabilistic inference and learning with Expectation Maximization. We link homeostatic dynamics to the theory of variational inference, and show that nontrivial terms, which typically appear during probabilistic inference in a large class of models, drop out. We demonstrate the feasibility of our approach in a spiking WinnerTake-All architecture of Bayesian inference and learning. Finally, we sketch how the mathematical framework can be extended to richer recurrent network architectures. Altogether, our theory provides a novel perspective on the interplay of homeostatic processes and synaptic plasticity in cortical microcircuits, and points to an essential role of homeostasis during inference and learning in spiking networks. 1

3 0.64996248 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Author: Ronald Ortner, Daniil Ryabko

Abstract: We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Beside the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are H¨ lder continuity of rewards and transition o probabilities. 1

4 0.62577713 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

Author: Jennifer Gillenwater, Alex Kulesza, Ben Taskar

Abstract: Determinantal point processes (DPPs) have recently been proposed as computationally efficient probabilistic models of diverse sets for a variety of applications, including document summarization, image search, and pose estimation. Many DPP inference operations, including normalization and sampling, are tractable; however, finding the most likely configuration (MAP), which is often required in practice for decoding, is NP-hard, so we must resort to approximate inference. This optimization problem, which also arises in experimental design and sensor placement, involves finding the largest principal minor of a positive semidefinite matrix. Because the objective is log-submodular, greedy algorithms have been used in the past with some empirical success; however, these methods only give approximation guarantees in the special case of monotone objectives, which correspond to a restricted class of DPPs. In this paper we propose a new algorithm for approximating the MAP problem based on continuous techniques for submodular function maximization. Our method involves a novel continuous relaxation of the log-probability function, which, in contrast to the multilinear extension used for general submodular functions, can be evaluated and differentiated exactly and efficiently. We obtain a practical algorithm with a 1/4-approximation guarantee for a more general class of non-monotone DPPs; our algorithm also extends to MAP inference under complex polytope constraints, making it possible to combine DPPs with Markov random fields, weighted matchings, and other models. We demonstrate that our approach outperforms standard and recent methods on both synthetic and real-world data. 1

5 0.56302238 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1

6 0.53848565 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

7 0.53805476 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

8 0.53673398 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

9 0.53439635 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

10 0.53336716 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

11 0.53333002 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

12 0.53215253 260 nips-2012-Online Sum-Product Computation Over Trees

13 0.53181922 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

14 0.53128582 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

15 0.53075516 38 nips-2012-Algorithms for Learning Markov Field Policies

16 0.53071129 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

17 0.53061819 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

18 0.53061581 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

19 0.53051984 188 nips-2012-Learning from Distributions via Support Measure Machines

20 0.53040707 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs