nips nips2004 nips2004-114 knowledge-graph by maker-knowledge-mining

114 nips-2004-Maximum Likelihood Estimation of Intrinsic Dimension


Source: pdf

Author: Elizaveta Levina, Peter J. Bickel

Abstract: We propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. [sent-5, score-0.929]

2 We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and apply it to a number of simulated and real datasets. [sent-6, score-0.501]

3 We also show it has the best overall performance compared with two other intrinsic dimension estimators. [sent-7, score-0.768]

4 1 Introduction There is a consensus in the high-dimensional data analysis community that the only reason any methods work in very high dimensions is that, in fact, the data are not truly high-dimensional. [sent-8, score-0.12]

5 Then one can reduce dimension without losing much information for many types of real-life high-dimensional data, such as images, and avoid many of the “curses of dimensionality”. [sent-10, score-0.422]

6 Learning these data manifolds can improve performance in classification and other applications, but if the data structure is complex and nonlinear, dimensionality reduction can be a hard problem. [sent-11, score-0.151]

7 Traditional methods for dimensionality reduction include principal component analysis (PCA), which only deals with linear projections of the data, and multidimensional scaling (MDS), which aims at preserving pairwise distances and traditionally is used for visualizing data. [sent-12, score-0.242]

8 Recently, there has been a surge of interest in manifold projection methods (Locally Linear Embedding (LLE) [1], Isomap [2], Laplacian and Hessian Eigenmaps [3, 4], and others), which focus on finding a nonlinear low-dimensional embedding of high-dimensional data. [sent-13, score-0.312]

9 The dimension of the embedding is a key parameter for manifold projection methods: if the dimension is too small, important data features are “collapsed” onto the same dimension, and if the dimension is too large, the projections become noisy and, in some cases, unstable. [sent-15, score-1.576]

10 There is no consensus, however, on how this dimension should be determined. [sent-16, score-0.422]

11 LLE [1] and its variants assume the manifold dimension is provided by the user. [sent-17, score-0.578]

12 The charting algorithm, a recent LLE variant [7], uses a heuristic estimate of dimension which is essentially equivalent to the regression estimator of [8] discussed below. [sent-19, score-0.807]

13 Constructing a reliable estimator of intrinsic dimension and understanding its statistical properties will clearly facilitate further applications of manifold projection methods and improve their performance. [sent-20, score-1.226]

14 We note that for applications such as classification, cross-validation is in principle the simplest solution – just pick the dimension which gives the lowest classification error. [sent-21, score-0.422]

15 However, in practice the computational cost of cross-validating for the dimension is prohibitive, and an estimate of the intrinsic dimension will still be helpful, either to be used directly or to narrow down the range for cross-validation. [sent-22, score-1.3]

16 In this paper, we present a new estimator of intrinsic dimension, study its statistical properties, and compare it to other estimators on both simulated and real datasets. [sent-23, score-0.736]

17 In Section 3 we derive the estimator and give its approximate asymptotic bias and variance. [sent-25, score-0.411]

18 Section 4 presents results on datasets and compares our estimator to two other estimators of intrinsic dimension. [sent-26, score-0.729]

19 2 Previous Work on Intrinsic Dimension Estimation The existing approaches to estimating the intrinsic dimension can be roughly divided into two groups: eigenvalue or projection methods, and geometric methods. [sent-28, score-0.928]

20 Eigenvalue methods, from the early proposal of [9] to a recent variant [10], are based on a global or local PCA, with the intrinsic dimension determined by the number of eigenvalues greater than a given threshold. [sent-29, score-0.768]

21 The eigenvalue methods may be a good tool for exploratory data analysis, where one might plot the eigenvalues and look for a clear-cut boundary, but not for providing reliable estimates of intrinsic dimension. [sent-31, score-0.498]

22 The geometric methods exploit the intrinsic geometry of the dataset and are most often based on fractal dimensions or nearest neighbor (NN) distances. [sent-32, score-0.587]

23 Perhaps the most popular fractal dimension is the correlation dimension [12, 13], defined for a set S_n = {x_1, ..., x_n} of data points. [sent-33, score-1.046]

24 The correlation dimension (1) is then estimated by plotting log C_n(r) against log r and estimating the slope of the linear part [12]. [sent-37, score-0.732]
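
As a concrete illustration (not code from the paper), here is a minimal Python sketch of the standard log C_n(r) versus log r slope recipe for the correlation dimension; the radius grid, the percentile range, and the uniform test data are arbitrary choices.

    import numpy as np
    from scipy.spatial.distance import pdist

    def correlation_dimension(X, n_r=20):
        """Slope of log C_n(r) vs log r, with C_n(r) the fraction of pairs closer than r."""
        d = pdist(X)                                  # all pairwise Euclidean distances
        r_grid = np.exp(np.linspace(np.log(np.percentile(d, 1)),
                                    np.log(np.percentile(d, 50)), n_r))
        C = np.array([np.mean(d < r) for r in r_grid])
        slope, _ = np.polyfit(np.log(r_grid), np.log(C), 1)
        return slope

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(1000, 5))                   # 5-d uniform cube
    print(correlation_dimension(X))                   # close to 5, typically a bit below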

25 A recent variant [13] proposed plotting this estimate against the true dimension for some simulated data and then using this calibrating curve to estimate the dimension of a new dataset. [sent-38, score-1.071]

26 This requires a different curve for each n, and the choice of calibration data may affect performance. [sent-39, score-0.1]

27 The capacity dimension and packing numbers have also been used [14]. [sent-40, score-0.468]

28 While the fractal methods successfully exploit certain geometric aspects of the data, the statistical properties of these methods have not been studied. [sent-41, score-0.115]

29 The correlation dimension (1) implicitly uses NN distances, and there are methods that focus on them explicitly. [sent-42, score-0.51]

30 If X_1, ..., X_n are an i.i.d. sample from a density f(x) in R^m, and T_k(x) is the Euclidean distance from a fixed point x to its k-th NN in the sample, then k/n ≈ f(x) V(m) [T_k(x)]^m, (2) where V(m) = π^{m/2} [Γ(m/2 + 1)]^{-1} is the volume of the unit sphere in R^m. [sent-49, score-0.221]

31 That is, the proportion of sample points falling into a ball around x is roughly f (x) times the volume of the ball. [sent-50, score-0.171]
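
A small numerical check of relation (2) is sketched below, assuming a standard 2-d Gaussian as the test density and k = 10; these choices, and the function names, are illustrative only. V(m) is computed through the log-gamma function for numerical stability.

    import numpy as np
    from scipy.special import gammaln
    from scipy.spatial import cKDTree

    def unit_ball_volume(m):
        # V(m) = pi^(m/2) / Gamma(m/2 + 1), evaluated on the log scale
        return np.exp(0.5 * m * np.log(np.pi) - gammaln(0.5 * m + 1))

    rng = np.random.default_rng(0)
    m, n, k = 2, 5000, 10
    X = rng.standard_normal((n, m))
    Tk = cKDTree(X).query(np.zeros(m), k=k)[0][-1]    # distance from x = 0 to its k-th NN
    f0 = (2 * np.pi) ** (-m / 2)                      # true density at the origin
    print(k / n, f0 * unit_ball_volume(m) * Tk ** m)  # the two sides of (2) should be close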

32 The relationship (2) can be used to estimate the dimension by regressing log T̄_k on log k over a suitable range of k, where T̄_k = n^{-1} Σ_{i=1}^{n} T_k(X_i) is the average of distances from each point to its k-th NN [8, 11]. [sent-51, score-0.755]
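
A minimal sketch of this regression estimator follows, using the fact that (2) makes log T̄_k roughly linear in log k with slope 1/m, so m is estimated by the reciprocal of the fitted slope; the range of k is an arbitrary choice.

    import numpy as np
    from scipy.spatial import cKDTree

    def regression_dim(X, k_range=range(5, 31)):
        """Dimension from the slope of log(mean k-NN distance) on log k: m ~ 1/slope."""
        dists, _ = cKDTree(X).query(X, k=max(k_range) + 1)   # column 0 is the point itself
        logTbar = [np.log(dists[:, k].mean()) for k in k_range]
        slope, _ = np.polyfit(np.log(list(k_range)), logTbar, 1)
        return 1.0 / slope

    rng = np.random.default_rng(0)
    print(regression_dim(rng.standard_normal((2000, 5))))    # near 5, with some negative bias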

33 A comparison of this method to a local eigenvalue method [11] found that the NN method suffered more from underestimating dimension for high-dimensional datasets, but the eigenvalue method was sensitive to noise and parameter settings. [sent-52, score-0.518]

34 A more sophisticated NN approach was recently proposed in [15], where the dimension is estimated from the length of the minimal spanning tree on the geodesic NN distances computed by Isomap. [sent-53, score-0.557]

35 While there are certainly existing methods available for estimating intrinsic dimension, there are some issues that have not been adequately addressed. [sent-54, score-0.379]

36 3 A Maximum Likelihood Estimator of Intrinsic Dimension Here we derive the maximum likelihood estimator (MLE) of the dimension m from i. [sent-56, score-0.677]

37 The observations represent an embedding of a lower-dimensional sample, i. [sent-63, score-0.115]

38 The basic idea is to fix a point x, assume f(x) ≈ const in a small sphere S_x(R) of radius R around x, and treat the observations as a homogeneous Poisson process in S_x(R). [sent-67, score-0.212]

39 Consider the inhomogeneous process {N(t, x), 0 ≤ t ≤ R}, N(t, x) = Σ_{i=1}^{n} 1{X_i ∈ S_x(t)}, (3) which counts observations within distance t from x. [sent-68, score-0.121]
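
A minimal sketch of the counting process in (3): N(t, x) is simply the number of sample points within distance t of x. The uniform-cube test data and the radius values are arbitrary choices.

    import numpy as np
    from scipy.spatial import cKDTree

    def N(t, x, tree):
        # N(t, x) of (3): number of sample points X_i within distance t of x
        return len(tree.query_ball_point(x, r=t))

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(1000, 3))    # uniform cube, so the density is constant
    tree = cKDTree(X)
    print([N(t, np.zeros(3), tree) for t in (0.2, 0.4, 0.6)])   # counts grow roughly like t^3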

40 The MLEs must satisfy the likelihood equations ∂L/∂θ = ∫_0^R dN(t) − ∫_0^R λ(t) dt = N(R) − e^θ V(m) R^m = 0 (5) and ∂L/∂m = (1/m + V′(m)/V(m)) N(R) + ∫_0^R log t dN(t) − e^θ V(m) R^m (log R + V′(m)/V(m)) = 0. (6) [sent-73, score-0.143]

41 m̂_R(x) = [ (1/N(R, x)) Σ_{j=1}^{N(R,x)} log(R/T_j(x)) ]^{-1}. (7) In practice, it may be more convenient to fix the number of neighbors k rather than the radius of the sphere R. [sent-75, score-0.181]

42 Then the estimate in (7) becomes m̂_k(x) = [ (1/(k−1)) Σ_{j=1}^{k−1} log(T_k(x)/T_j(x)) ]^{-1}. (8) [sent-76, score-0.36]
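
A minimal Python sketch of the per-point estimate (8), using a k-d tree for the neighbor search; the function and variable names are my own, not the authors'.

    import numpy as np
    from scipy.spatial import cKDTree

    def mle_dim_pointwise(X, k):
        """m_hat_k(x) of (8) at every data point, from k-NN distances."""
        dists, _ = cKDTree(X).query(X, k=k + 1)   # column 0 is the query point itself
        Tj = dists[:, 1:k]                        # T_1(x), ..., T_{k-1}(x)
        Tk = dists[:, k:k + 1]                    # T_k(x)
        # m_hat_k(x) = [ (1/(k-1)) * sum_j log(T_k/T_j) ]^{-1}
        return (k - 1) / np.sum(np.log(Tk / Tj), axis=1)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 5))
    print(mle_dim_pointwise(X, k=20).mean())      # roughly 5 for a 5-d Gaussian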

43 One could divide by k − 2 rather than k − 1 to make the estimator asymptotically unbiased, as we show below. [sent-78, score-0.226]

44 For some applications, one may want to evaluate local dimension estimates at every data point, or average estimated dimensions within data clusters. [sent-80, score-0.576]

45 It can be the case that a dataset has different intrinsic dimensions at different scales, e. [sent-83, score-0.472]

46 In general, for our estimator to work well the sphere should be small and contain sufficiently many points, and we have work in progress on choosing such a k automatically. [sent-87, score-0.338]

47 We average the estimates over a range k = k_1, ..., k_2 to get the final estimates m̂_k = (1/n) Σ_{i=1}^{n} m̂_k(X_i) and m̂ = (1/(k_2 − k_1 + 1)) Σ_{k=k_1}^{k_2} m̂_k. (9) [sent-91, score-0.782]
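
A sketch of the averaging in (9) follows, with an optional switch to the k − 2 normalization mentioned above; the defaults k_1 = 10 and k_2 = 20 are illustrative choices, not prescriptions from the paper.

    import numpy as np
    from scipy.spatial import cKDTree

    def _mk_pointwise(X, k):
        # per-point m_hat_k(x) of (8), as in the sketch after (8)
        d, _ = cKDTree(X).query(X, k=k + 1)
        return (k - 1) / np.sum(np.log(d[:, k:k + 1] / d[:, 1:k]), axis=1)

    def mle_dim(X, k1=10, k2=20, unbiased=False):
        """Final estimate (9): average m_hat_k(X_i) over points, then over k = k1..k2."""
        mk = []
        for k in range(k1, k2 + 1):
            m_point = _mk_pointwise(X, k)
            if unbiased:
                m_point = m_point * (k - 2) / (k - 1)   # divide by k-2 instead of k-1
            mk.append(m_point.mean())
        return float(np.mean(mk))

    rng = np.random.default_rng(0)
    print(mle_dim(rng.standard_normal((1000, 5))))      # roughly 5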

48 The choice of k_1 and k_2 and the behavior of m̂_k as a function of k are discussed further in Section 4. [sent-92, score-0.251]

49 The only parameters to set for this method are k1 and k2 , and the computational cost is essentially the cost of finding k2 nearest neighbors for every point, which has to be done for most manifold projection methods anyway. [sent-93, score-0.241]

50 3.1 Asymptotic behavior of the estimator for m fixed, n → ∞. [sent-95, score-0.226]

51 Here we give a sketchy discussion of the asymptotic bias and variance of our estimator, to be elaborated elsewhere. [sent-96, score-0.226]

52 As we remarked, for a given x if n → ∞ and R → 0, the inhomogeneous binomial process N (t, x) in (3) converges weakly to the inhomogeneous Poisson process with rate λ(t) given by (4). [sent-98, score-0.216]

53 If we condition on the distance T_k(x) and assume the Poisson approximation is exact, then m log(T_k/T_j), 1 ≤ j ≤ k − 1, are distributed as the order statistics of a sample of size k − 1 from a standard exponential distribution. [sent-99, score-0.092]

54 With approximations (10), we have E[m̂] = E[m̂_k] = m, but the computation of Var(m̂) is complicated by the dependence among the m̂_k(X_i). [sent-103, score-0.531]

55 We have a heuristic argument (omitted for lack of space) that, by dividing the m̂_k(X_i) into n/k roughly independent groups of size k each, the variance can be shown to be of order n^{-1}, as it would be if the estimators were independent. [sent-104, score-0.408]
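
A small Monte Carlo sketch of these claims (mean close to m, spread shrinking with n) is given below; the sample sizes, k = 20, and the number of repetitions are arbitrary, and the estimator is a compact restatement of (8).

    import numpy as np
    from scipy.spatial import cKDTree

    def mk_mean(X, k=20):
        d, _ = cKDTree(X).query(X, k=k + 1)
        return np.mean((k - 1) / np.sum(np.log(d[:, k:k + 1] / d[:, 1:k]), axis=1))

    rng = np.random.default_rng(0)
    for n in (500, 2000):
        est = [mk_mean(rng.standard_normal((n, 5))) for _ in range(50)]
        print(n, round(np.mean(est), 2), round(np.std(est), 3))   # the std should shrink with n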

56 (Figure 1 plot area: dimension estimate m̂_k on the vertical axis, k on the horizontal axis; curves labelled m = 20, 10, 5, 2.) [sent-110, score-0.355]

57 Figure 1: The estimator m̂_k as a function of k. [sent-113, score-0.477]

58 We first investigate the properties of our estimator in detail by simulations, and then apply it to real datasets. [sent-116, score-0.252]

59 The first issue is the behavior of m̂_k as a function of k. [sent-117, score-0.251]

60 Fig. 1(a) shows m̂_k for a 5-d Gaussian as a function of k for several sample sizes n. [sent-121, score-0.317]

61 For very small k the approximation does not work yet and m̂_k is unreasonably high, but for k as small as 10, the estimate is near the true value m = 5. [sent-122, score-0.33]

62 The estimate shows some negative bias for large k, which decreases with growing sample size n, and, as Fig. [sent-123, score-0.27]

63 Note, however, that it is the intrinsic dimension m rather than the embedding dimension p ≥ m that matters; and as our examples below and many examples elsewhere show, the intrinsic dimension for real data is frequently low. [sent-125, score-2.089]

64 The best range k_1, ..., k_2 is different for every combination of m and n, but the estimator is fairly stable as a function of k, apart from the first few values. [sent-130, score-0.226]

65 In this range, the estimates are not affected much by sample size or the positive bias for very small k, at least for the range of m and n under consideration. [sent-135, score-0.305]
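
A sketch of this kind of check (m̂_k averaged over points, as a function of k, for a 5-d Gaussian at a few sample sizes) is given below; the grid of k values and the sample sizes are arbitrary choices.

    import numpy as np
    from scipy.spatial import cKDTree

    def mk_curve(X, kmax=100):
        """Average m_hat_k over the data points, for k = 2, ..., kmax."""
        d, _ = cKDTree(X).query(X, k=kmax + 1)
        return [np.mean((k - 1) / np.sum(np.log(d[:, k:k + 1] / d[:, 1:k]), axis=1))
                for k in range(2, kmax + 1)]

    rng = np.random.default_rng(0)
    for n in (200, 1000, 5000):
        curve = mk_curve(rng.standard_normal((n, 5)))
        print(n, [round(curve[k - 2], 2) for k in (5, 10, 20, 50, 100)])   # near 5 for moderate k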

66 Next, we investigate an important and often overlooked issue of what happens when the data are near a manifold as opposed to exactly on a manifold. [sent-136, score-0.183]

67 As ρ changes from 0 to 1, the dimension changes from 5 (full spherical Gaussian) to 1 (a line in R5 ), with intermediate values of ρ providing noisy versions. [sent-139, score-0.457]
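
A sketch of this experiment follows, using an equicorrelated 5-d normal with all pairwise correlations equal to ρ; the particular ρ values, n = 1000, and k = 20 are arbitrary choices, and the estimator is the compact restatement of (8) used above.

    import numpy as np
    from scipy.spatial import cKDTree

    def mle_dim(X, k=20):
        d, _ = cKDTree(X).query(X, k=k + 1)
        return np.mean((k - 1) / np.sum(np.log(d[:, k:k + 1] / d[:, 1:k]), axis=1))

    rng = np.random.default_rng(0)
    n, p = 1000, 5
    for rho in (0.0, 0.9, 0.99, 0.999):
        cov = (1 - rho) * np.eye(p) + rho * np.ones((p, p))   # equicorrelated 5-d normal
        X = rng.multivariate_normal(np.zeros(p), cov, size=n)
        print(rho, round(mle_dim(X), 2))   # stays near 5, dropping toward 1 only for rho very close to 1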

68 Figure 2: (a) Data near a manifold: estimated dimension for correlated 5-d normal as a function of 1 − ρ (log scale on the 1 − ρ axis). [sent-147, score-0.939]

69 (b) The MLE, regression, and correlation dimension for uniform distributions on spheres with n = 1000. [sent-148, score-0.51]

70 Fig. 2(a) shows that the MLE of dimension does not drop unless ρ is very close to 1, so the estimate is not affected by whether the data cloud is spherical or elongated. [sent-151, score-0.509]

71 For ρ close to 1, when the dimension really drops, the estimate depends significantly on the sample size, which is to be expected: n = 100 highly correlated points look like a line, but n = 2000 points fill out the space around the line. [sent-152, score-0.624]

72 This highlights the fundamental dependence of intrinsic dimension on the neighborhood scale, particularly when the data may be observed with noise. [sent-153, score-0.797]

73 A comparison of the MLE, the regression estimator (regressing log T̄_k on log k), and the correlation dimension is shown in Fig. [sent-155, score-0.915]

74 ...20 (the same as the MLE) for fair comparison, and the regression for the correlation dimension was based on the first 10, ..., 100 distinct values of log C_n(r), to reflect the fact that there are many more points for the log C_n(r) regression than for the log T̄_k regression. [sent-161, score-0.575] [sent-164, score-0.265]

76 The comparison shows that, while all methods suffer from negative bias for higher dimensions, the correlation dimension has the smallest bias, with the MLE coming in close second. [sent-166, score-0.636]

77 However, the variance of correlation dimension is much higher than that of the MLE (the SD is at least 10 times higher for all dimensions). [sent-167, score-0.551]

78 The regression estimator, on the other hand, has relatively low variance (though always higher than the MLE) but the largest negative bias. [sent-168, score-0.106]

79 On the balance of bias and variance, MLE is clearly the best choice. [sent-169, score-0.126]
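
A rough comparison harness in the spirit of this section is sketched below, using uniform data in an m-dimensional cube rather than the spheres used here; the three estimators are compact restatements of the earlier sketches, so the numbers are only indicative.

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.spatial.distance import pdist

    def mle_dim(X, k=20):
        d, _ = cKDTree(X).query(X, k=k + 1)
        return np.mean((k - 1) / np.sum(np.log(d[:, k:k + 1] / d[:, 1:k]), axis=1))

    def regression_dim(X, k_range=range(10, 21)):
        d, _ = cKDTree(X).query(X, k=max(k_range) + 1)
        logTbar = [np.log(d[:, k].mean()) for k in k_range]
        return 1.0 / np.polyfit(np.log(list(k_range)), logTbar, 1)[0]

    def correlation_dim(X, n_r=20):
        d = pdist(X)
        r = np.exp(np.linspace(np.log(np.percentile(d, 1)), np.log(np.percentile(d, 25)), n_r))
        C = np.array([np.mean(d < ri) for ri in r])
        return np.polyfit(np.log(r), np.log(C), 1)[0]

    rng = np.random.default_rng(0)
    for m in (2, 5, 10):
        X = rng.uniform(size=(1000, m))
        # all three underestimate for the larger m, as discussed above
        print(m, round(mle_dim(X), 2), round(regression_dim(X), 2), round(correlation_dim(X), 2))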

80 Finally, we compare the estimators on three popular manifold datasets (Table 1): the Swiss roll, and two image datasets shown in Fig. [sent-188, score-0.415]

81 For the Swiss roll, the MLE again provides the best combination of bias and variance. [sent-190, score-0.126]

82 The face database consists of images of an artificial face under three changing conditions: illumination, and vertical and horizontal orientation. [sent-191, score-0.097]

83 Hence the intrinsic dimension of the dataset should be 3, but only if we had the full 3-d images of the face. [sent-192, score-0.838]

84 All we have, however, are 2-d projections of the face, and it is clear that one needs more than one “basis” image to represent different poses (from casual inspection, front view and profile seem sufficient). [sent-193, score-0.093]

85 The estimated dimension of about 4 is therefore very reasonable. [sent-194, score-0.464]

86 The hand image data is a real video sequence of a hand rotating along a 1-d curve in space, but again several basis 2-d images are needed to represent different poses (in this case, front, back, and profile seem sufficient). [sent-195, score-0.122]

87 We note that the correlation dimension provides two completely different answers for this dataset, depending on which linear part of the curve is used; this is further evidence of its high variance, which makes it a less reliable estimate than the MLE. [sent-197, score-0.637]

88 5 Discussion In this paper, we have derived a maximum likelihood estimator of intrinsic dimension and some asymptotic approximations to its bias and variance. [sent-198, score-1.208]

89 We have shown that the MLE produces good results on a range of simulated and real datasets and outperforms two other dimension estimators. [sent-199, score-0.11] [sent-215, score-0.621]

90 (Footnote 1: This estimate is obtained from the range 500, ...; for this dataset, the correlation dimension curve has two distinct linear parts, with the first part over the range we would normally use, 10, ... .) [sent-203, score-0.61]

92 It does, however, suffer from a negative bias for high dimensions, which is a problem shared by all dimension estimators. [sent-216, score-0.548]

93 One reason for this is that our approximation is based on sufficiently many observations falling into a small sphere, and that requires very large sample sizes in high dimensions (we shall elaborate and quantify this further elsewhere). [sent-217, score-0.221]

94 One can potentially reduce the negative bias by removing the edge points by some criterion, but we found that the edge effects are small compared to the sample size problem, and we have been unable to achieve significant improvement in this manner. [sent-219, score-0.247]

95 Another option used by [13] is calibration on simulated datasets with known dimension, but since the bias depends on the sampling distribution, and a different curve would be needed for every sample size, calibration does not solve the problem either. [sent-220, score-0.465]

96 One should keep in mind, however, that for most interesting applications intrinsic dimension will not be very high – otherwise there is not much benefit in dimensionality reduction; hence in practice the MLE will provide a good estimate of dimension most of the time. [sent-221, score-1.325]

97 Laplacian eigenmaps and spectral techniques for embedding and clustering. [sent-240, score-0.126]

98 Hessian eigenmaps: New locally linear embedding techniques for high-dimensional data. [sent-247, score-0.105]

99 Estimating the intrinsic dimension of data with a fractal-based approach. [sent-308, score-0.768]

100 Geodesic entropic graphs for dimension and entropy estimation in manifold learning. [sent-320, score-0.578]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dimension', 0.422), ('mle', 0.397), ('intrinsic', 0.346), ('mk', 0.251), ('estimator', 0.226), ('manifold', 0.156), ('erent', 0.15), ('nn', 0.14), ('tk', 0.135), ('su', 0.134), ('bias', 0.126), ('sphere', 0.112), ('di', 0.102), ('roll', 0.092), ('estimators', 0.09), ('correlation', 0.088), ('swiss', 0.087), ('sd', 0.084), ('dimensionality', 0.083), ('dimensions', 0.083), ('poisson', 0.08), ('fractal', 0.079), ('embedding', 0.076), ('datasets', 0.067), ('sample', 0.066), ('regression', 0.065), ('sx', 0.063), ('rm', 0.062), ('pami', 0.062), ('isomap', 0.062), ('asymptotic', 0.059), ('calibration', 0.058), ('range', 0.058), ('log', 0.057), ('distances', 0.056), ('bickel', 0.053), ('mles', 0.053), ('regressing', 0.053), ('binomial', 0.052), ('lle', 0.052), ('estimate', 0.052), ('eigenmaps', 0.05), ('inhomogeneous', 0.048), ('eigenvalue', 0.048), ('simulated', 0.048), ('packing', 0.046), ('cube', 0.046), ('dataset', 0.043), ('volume', 0.043), ('projection', 0.043), ('charting', 0.042), ('exploratory', 0.042), ('neighbors', 0.042), ('estimated', 0.042), ('curve', 0.042), ('variance', 0.041), ('observations', 0.039), ('dn', 0.038), ('nonlinear', 0.037), ('consensus', 0.037), ('geodesic', 0.037), ('reduction', 0.037), ('geometric', 0.036), ('projections', 0.035), ('face', 0.035), ('spherical', 0.035), ('popular', 0.035), ('process', 0.034), ('plotting', 0.033), ('falling', 0.033), ('estimating', 0.033), ('cn', 0.033), ('reliable', 0.033), ('pca', 0.031), ('held', 0.031), ('tj', 0.031), ('manifolds', 0.031), ('front', 0.031), ('behaves', 0.031), ('preserving', 0.031), ('dt', 0.031), ('var', 0.03), ('ected', 0.03), ('hessian', 0.03), ('belkin', 0.03), ('estimates', 0.029), ('locally', 0.029), ('elsewhere', 0.029), ('points', 0.029), ('dependence', 0.029), ('likelihood', 0.029), ('poses', 0.027), ('near', 0.027), ('images', 0.027), ('nips', 0.027), ('ects', 0.027), ('radius', 0.027), ('real', 0.026), ('correlated', 0.026), ('size', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 114 nips-2004-Maximum Likelihood Estimation of Intrinsic Dimension

Author: Elizaveta Levina, Peter J. Bickel

Abstract: We propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators. 1

2 0.18307896 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons

Author: Jochen Triesch

Abstract: This paper explores the computational consequences of simultaneous intrinsic and synaptic plasticity in individual model neurons. It proposes a new intrinsic plasticity mechanism for a continuous activation model neuron based on low order moments of the neuron’s firing rate distribution. The goal of the intrinsic plasticity mechanism is to enforce a sparse distribution of the neuron’s activity level. In conjunction with Hebbian learning at the neuron’s synapses, the neuron is shown to discover sparse directions in the input. 1

3 0.17598934 131 nips-2004-Non-Local Manifold Tangent Learning

Author: Yoshua Bengio, Martin Monperrus

Abstract: We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation suggests to explore non-local manifold learning algorithms which attempt to discover shared structure in the tangent planes at different positions. A criterion for such an algorithm is proposed and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where a local non-parametric method fails. 1

4 0.1423886 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss

Author: Liam Paninski

Abstract: We develop a family of upper and lower bounds on the worst-case expected KL loss for estimating a discrete distribution on a finite number m of points, given N i.i.d. samples. Our upper bounds are approximationtheoretic, similar to recent bounds for estimating discrete entropy; the lower bounds are Bayesian, based on averages of the KL loss under Dirichlet distributions. The upper bounds are convex in their parameters and thus can be minimized by descent methods to provide estimators with low worst-case error; the lower bounds are indexed by a one-dimensional parameter and are thus easily maximized. Asymptotic analysis of the bounds demonstrates the uniform KL-consistency of a wide class of estimators as c = N/m → ∞ (no matter how slowly), and shows that no estimator is consistent for c bounded (in contrast to entropy estimation). Moreover, the bounds are asymptotically tight as c → 0 or ∞, and are shown numerically to be tight within a factor of two for all c. Finally, in the sparse-data limit c → 0, we find that the Dirichlet-Bayes (add-constant) estimator with parameter scaling like −c log(c) optimizes both the upper and lower bounds, suggesting an optimal choice of the “add-constant” parameter in this regime.

5 0.12635385 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

Author: Laurent Zwald, Gilles Blanchard, Pascal Massart, Régis Vert

Abstract: This paper investigates the effect of Kernel Principal Component Analysis (KPCA) within the classification framework, essentially the regularization properties of this dimensionality reduction method. KPCA has been previously used as a pre-processing step before applying an SVM but we point out that this method is somewhat redundant from a regularization point of view and we propose a new algorithm called Kernel Projection Machine to avoid this redundancy, based on an analogy with the statistical framework of regression for a Gaussian white noise model. Preliminary experimental results show that this algorithm reaches the same performances as an SVM. 1

6 0.12460911 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

7 0.1190272 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images

8 0.10928021 163 nips-2004-Semi-parametric Exponential Family PCA

9 0.10642462 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models

10 0.099320926 22 nips-2004-An Investigation of Practical Approximate Nearest Neighbor Algorithms

11 0.099084057 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning

12 0.090185061 160 nips-2004-Seeing through water

13 0.084275037 88 nips-2004-Intrinsically Motivated Reinforcement Learning

14 0.082555346 125 nips-2004-Multiple Relational Embedding

15 0.081600122 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

16 0.075738028 17 nips-2004-Adaptive Manifold Learning

17 0.074032046 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models

18 0.066552088 164 nips-2004-Semi-supervised Learning by Entropy Minimization

19 0.065786339 145 nips-2004-Parametric Embedding for Class Visualization

20 0.064430386 121 nips-2004-Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.221), (1, 0.027), (2, -0.058), (3, -0.055), (4, -0.007), (5, 0.063), (6, -0.102), (7, 0.014), (8, 0.068), (9, 0.116), (10, 0.066), (11, -0.215), (12, 0.049), (13, -0.267), (14, -0.163), (15, 0.029), (16, -0.095), (17, -0.038), (18, -0.066), (19, 0.087), (20, -0.029), (21, -0.034), (22, 0.0), (23, 0.024), (24, -0.004), (25, 0.137), (26, 0.061), (27, -0.031), (28, -0.001), (29, -0.027), (30, -0.087), (31, -0.081), (32, 0.029), (33, 0.139), (34, -0.137), (35, -0.211), (36, -0.007), (37, 0.104), (38, 0.014), (39, -0.062), (40, -0.085), (41, 0.018), (42, 0.088), (43, -0.24), (44, 0.039), (45, -0.007), (46, -0.016), (47, -0.008), (48, -0.039), (49, -0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97629291 114 nips-2004-Maximum Likelihood Estimation of Intrinsic Dimension

Author: Elizaveta Levina, Peter J. Bickel

Abstract: We propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators. 1

2 0.62240279 131 nips-2004-Non-Local Manifold Tangent Learning

Author: Yoshua Bengio, Martin Monperrus

Abstract: We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation suggests to explore non-local manifold learning algorithms which attempt to discover shared structure in the tangent planes at different positions. A criterion for such an algorithm is proposed and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where a local non-parametric method fails. 1

3 0.50536042 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons

Author: Jochen Triesch

Abstract: This paper explores the computational consequences of simultaneous intrinsic and synaptic plasticity in individual model neurons. It proposes a new intrinsic plasticity mechanism for a continuous activation model neuron based on low order moments of the neuron’s firing rate distribution. The goal of the intrinsic plasticity mechanism is to enforce a sparse distribution of the neuron’s activity level. In conjunction with Hebbian learning at the neuron’s synapses, the neuron is shown to discover sparse directions in the input. 1

4 0.44887263 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning

Author: Richard S. Zemel, Miguel Á. Carreira-Perpiñán

Abstract: Many machine learning algorithms for clustering or dimensionality reduction take as input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then partitioned (clustering) or used to redefine metric information (dimensionality reduction). There has been much recent work on new methods for graph-based clustering and dimensionality reduction, but not much on constructing the graph itself. Graphs typically used include the fullyconnected graph, a local fixed-grid graph (for image segmentation) or a nearest-neighbor graph. We suggest that the graph should adapt locally to the structure of the data. This can be achieved by a graph ensemble that combines multiple minimum spanning trees, each fit to a perturbed version of the data set. We show that such a graph ensemble usually produces a better representation of the data manifold than standard methods; and that it provides robustness to a subsequent clustering or dimensionality reduction algorithm based on the graph. 1

5 0.44458389 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss

Author: Liam Paninski

Abstract: We develop a family of upper and lower bounds on the worst-case expected KL loss for estimating a discrete distribution on a finite number m of points, given N i.i.d. samples. Our upper bounds are approximationtheoretic, similar to recent bounds for estimating discrete entropy; the lower bounds are Bayesian, based on averages of the KL loss under Dirichlet distributions. The upper bounds are convex in their parameters and thus can be minimized by descent methods to provide estimators with low worst-case error; the lower bounds are indexed by a one-dimensional parameter and are thus easily maximized. Asymptotic analysis of the bounds demonstrates the uniform KL-consistency of a wide class of estimators as c = N/m → ∞ (no matter how slowly), and shows that no estimator is consistent for c bounded (in contrast to entropy estimation). Moreover, the bounds are asymptotically tight as c → 0 or ∞, and are shown numerically to be tight within a factor of two for all c. Finally, in the sparse-data limit c → 0, we find that the Dirichlet-Bayes (add-constant) estimator with parameter scaling like −c log(c) optimizes both the upper and lower bounds, suggesting an optimal choice of the “add-constant” parameter in this regime.

6 0.43469745 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

7 0.40133774 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models

8 0.39661008 163 nips-2004-Semi-parametric Exponential Family PCA

9 0.38519385 22 nips-2004-An Investigation of Practical Approximate Nearest Neighbor Algorithms

10 0.36692715 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models

11 0.35804906 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images

12 0.35386732 125 nips-2004-Multiple Relational Embedding

13 0.35188845 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

14 0.34313819 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

15 0.33906078 127 nips-2004-Neighbourhood Components Analysis

16 0.33816651 96 nips-2004-Learning, Regularization and Ill-Posed Inverse Problems

17 0.32985792 160 nips-2004-Seeing through water

18 0.3099077 17 nips-2004-Adaptive Manifold Learning

19 0.29106298 88 nips-2004-Intrinsically Motivated Reinforcement Learning

20 0.29046854 78 nips-2004-Hierarchical Distributed Representations for Statistical Language Modeling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.56), (15, 0.11), (26, 0.034), (31, 0.016), (33, 0.131), (35, 0.011), (39, 0.023), (50, 0.018), (56, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98915941 85 nips-2004-Instance-Based Relevance Feedback for Image Retrieval

Author: Giorgio Giacinto, Fabio Roli

Abstract: High retrieval precision in content-based image retrieval can be attained by adopting relevance feedback mechanisms. These mechanisms require that the user judges the quality of the results of the query by marking all the retrieved images as being either relevant or not. Then, the search engine exploits this information to adapt the search to better meet user’s needs. At present, the vast majority of proposed relevance feedback mechanisms are formulated in terms of search model that has to be optimized. Such an optimization involves the modification of some search parameters so that the nearest neighbor of the query vector contains the largest number of relevant images. In this paper, a different approach to relevance feedback is proposed. After the user provides the first feedback, following retrievals are not based on knn search, but on the computation of a relevance score for each image of the database. This score is computed as a function of two distances, namely the distance from the nearest non-relevant image and the distance from the nearest relevant one. Images are then ranked according to this score and the top k images are displayed. Reported results on three image data sets show that the proposed mechanism outperforms other state-of-the-art relevance feedback mechanisms. 1 In t rod u ct i on A large number of content-based image retrieval (CBIR) systems rely on the vector representation of images in a multidimensional feature space representing low-level image characteristics, e.g., color, texture, shape, etc. [1]. Content-based queries are often expressed by visual examples in order to retrieve from the database the images that are “similar” to the examples. This kind of retrieval is often referred to as K nearest-neighbor retrieval. It is easy to see that the effectiveness of content-based image retrieval systems (CBIR) strongly depends on the choice of the set of visual features, on the choice of the “metric” used to model the user’s perception of image similarity, and on the choice of the image used to query the database [1]. Typically, if we allow different users to mark the images retrieved with a given query as relevant or non-relevant, different subsets of images will be marked as relevant. Accordingly, the need for mechanisms to adapt the CBIR system response based on some feedback from the user is widely recognized. It is interesting to note that while relevance feedback mechanisms have been first introduced in the information retrieval field [2], they are receiving more attention in the CBIR field (Huang). The vast majority of relevance feedback techniques proposed in the literature is based on modifying the values of the search parameters as to better represent the concept the user bears in mind. To this end, search parameters are computed as a function of the relevance values assigned by the user to all the images retrieved so far. As an example, relevance feedback is often formulated in terms of the modification of the query vector, and/or in terms of adaptive similarity metrics. [3]-[7]. Recently, pattern classification paradigms such as SVMs have been proposed [8]. Feedback is thus used to model the concept of relevant images and adjust the search consequently. Concept modeling may be difficult on account of the distribution of relevant images in the selected feature space. “Narrow domain” image databases allows extracting good features, so that images bearing similar concepts belong to compact clusters. 
On the other hand, “broad domain” databases, such as image collection used by graphic professionals, or those made up of images from the Internet, are more difficult to subdivide in cluster because of the high variability of concepts [1]. In these cases, it is worth extracting only low level, non-specialized features, and image retrieval is better formulated in terms of a search problem rather then concept modeling. The present paper aims at offering an original contribution in this direction. Rather then modeling the concept of “relevance” the user bears in mind, feedback is used to assign each image of the database a relevance score. Such a score depends only from two dissimilarities (distances) computed against the images already marked by the user: the dissimilarity from the set of relevant images, and the dissimilarity from the set of non-relevant images. Despite its computational simplicity, this mechanism allows outperforming state-of-the-art relevance feedback mechanisms both on “narrow domain” databases, and on “broad domain” databases. This paper is organized as follows. Section 2 illustrates the idea behind the proposed mechanism and provides the basic assumptions. Section 3 details the proposed relevance feedback mechanism. Results on three image data sets are presented in Section 4, where performances of other relevance feedback mechanisms are compared. Conclusions are drawn in Section 5. 2 In st an ce- b ased rel evan ce est i m at i on The proposed mechanism has been inspired by classification techniques based on the “nearest case” [9]-[10]. Nearest-case theory provided the mechanism to compute the dissimilarity of each image from the sets of relevant and non–relevant images. The ratio between the nearest relevant image and the nearest non-relevant image has been used to compute the degree of relevance of each image of the database [11]. The present section illustrates the rationale behind the use of the nearest-case paradigm. Let us assume that each image of the database has been represented by a number of low-level features, and that a (dis)similarity measure has been defined so that the proximity between pairs of images represents some kind of “conceptual” similarity. In other words, the chosen feature space and similarity metric is meaningful at least for a restricted number of users. A search in image databases is usually performed by retrieving the k most similar images with respect to a given query. The dimension of k is usually small, to avoid displaying a large number of images at a time. Typical values for k are between 10 and 20. However, as the “relevant” images that the user wishes to retrieve may not fit perfectly with the similarity metric designed for the search engine, the user may be interested in exploring other regions of the feature space. To this end, the user marks the subset of “relevant” images out of the k retrieved. Usually, such relevance feedback is used to perform a new k-nn search by modifying some search parameters, i.e., the position of the query point, the similarity metric, and other tuning parameters [1]-[7]. Recent works proposed the use of support vector machine to learn the distribution of relevant images [8]. These techniques require some assumption about the general form of the distribution of relevant images in the feature space. 
As it is difficult to make any assumption about such a distribution for broad domain databases, we propose to exploit the information about the relevance of the images retrieved so far in a nearest-neighbor fashion. Nearest-neighbor techniques, as used in statistical pattern recognition, case-based reasoning, or instance-based learning, are effective in all applications where it is difficult to produce a high-level generalization of a “class” of objects [9]-[10],[12][13]. Relevance learning in content base image retrieval may well fit into this definition, as it is difficult to provide a general model that can be adapted to represent different concepts of similarity. In addition, the number of available cases may be too small to estimate the optimal set of parameters for such a general model. On the other hand, it can be more effective to use each “relevant” image as well as each “non-relevant” image, as “cases” or “instances” against which the images of the database should be compared. Consequently, we assume that an image is as much as relevant as much as its dissimilarity from the nearest relevant image is small. Analogously, an image is as much as non-relevant as much as its dissimilarity from the nearest non-relevant image is small. 3 Rel evan ce S core Com p u t ati on According to previous section, each image of the database can be thus characterized by a “degree of relevance” and a “degree of non-relevance” according to the dissimilarities from the nearest relevant image, and from the nearest non-relevant image, respectively. However, it should be noted that these degrees should be treated differently because only “relevant” images represent a “concept” in the user’s mind, while “non-relevant” images may represent a number of other concepts different from user’s interest. In other words, while it is meaningful to treat the degree of relevance as a degree of membership to the class of relevant images, the same does not apply to the degree of non-relevance. For this reason, we propose to use the “degree of non-relevance” to weight the “degree of relevance”. Let us denote with R the subset of indexes j ∈ {1,...,k} related to the set of relevant images retrieved so far and the original query (that is relevant by default), and with NR the subset of indexes j ∈ (1,...,k} related to the set of non-relevant images retrieved so far. For each image I of the database, according to the nearest neighbor rule, let us compute the dissimilarity from the nearest image in R and the dissimilarity from the nearest image in NR. Let us denote these dissimilarities as dR(I) and dNR(I), respectively. The value of dR(I) can be clearly used to measure the degree of relevance of image I, assuming that small values of dR(I) are related to very relevant images. On the other hand, the hypothesis that image I is relevant to the user’s query can be supported by a high value of dNR(I). Accordingly, we defined the relevance score ! dR ( I ) $ relevance ( I ) = # 1 + dN ( I ) &

2 0.984029 27 nips-2004-Bayesian Regularization and Nonnegative Deconvolution for Time Delay Estimation

Author: Yuanqing Lin, Daniel D. Lee

Abstract: Bayesian Regularization and Nonnegative Deconvolution (BRAND) is proposed for estimating time delays of acoustic signals in reverberant environments. Sparsity of the nonnegative filter coefficients is enforced using an L1 -norm regularization. A probabilistic generative model is used to simultaneously estimate the regularization parameters and filter coefficients from the signal data. Iterative update rules are derived under a Bayesian framework using the Expectation-Maximization procedure. The resulting time delay estimation algorithm is demonstrated on noisy acoustic data.

3 0.97714269 176 nips-2004-Sub-Microwatt Analog VLSI Support Vector Machine for Pattern Classification and Sequence Estimation

Author: Shantanu Chakrabartty, Gert Cauwenberghs

Abstract: An analog system-on-chip for kernel-based pattern classification and sequence estimation is presented. State transition probabilities conditioned on input data are generated by an integrated support vector machine. Dot product based kernels and support vector coefficients are implemented in analog programmable floating gate translinear circuits, and probabilities are propagated and normalized using sub-threshold current-mode circuits. A 14-input, 24-state, and 720-support vector forward decoding kernel machine is integrated on a 3mm×3mm chip in 0.5µm CMOS technology. Experiments with the processor trained for speaker verification and phoneme sequence estimation demonstrate real-time recognition accuracy at par with floating-point software, at sub-microwatt power. 1

4 0.9551152 138 nips-2004-Online Bounds for Bayesian Algorithms

Author: Sham M. Kakade, Andrew Y. Ng

Abstract: We present a competitive analysis of Bayesian learning algorithms in the online learning setting and show that many simple Bayesian algorithms (such as Gaussian linear regression and Bayesian logistic regression) perform favorably when compared, in retrospect, to the single best model in the model class. The analysis does not assume that the Bayesian algorithms’ modeling assumptions are “correct,” and our bounds hold even if the data is adversarially chosen. For Gaussian linear regression (using logloss), our error bounds are comparable to the best bounds in the online learning literature, and we also provide a lower bound showing that Gaussian linear regression is optimal in a certain worst case sense. We also give bounds for some widely used maximum a posteriori (MAP) estimation algorithms, including regularized logistic regression. 1

same-paper 5 0.95385158 114 nips-2004-Maximum Likelihood Estimation of Intrinsic Dimension

Author: Elizaveta Levina, Peter J. Bickel

Abstract: We propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators. 1

6 0.91476405 36 nips-2004-Class-size Independent Generalization Analsysis of Some Discriminative Multi-Category Classification

7 0.87092239 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech

8 0.75569242 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons

9 0.6954968 28 nips-2004-Bayesian inference in spiking neurons

10 0.69193971 135 nips-2004-On-Chip Compensation of Device-Mismatch Effects in Analog VLSI Neural Networks

11 0.68818533 58 nips-2004-Edge of Chaos Computation in Mixed-Mode VLSI - A Hard Liquid

12 0.68213385 163 nips-2004-Semi-parametric Exponential Family PCA

13 0.68191183 22 nips-2004-An Investigation of Practical Approximate Nearest Neighbor Algorithms

14 0.68161976 131 nips-2004-Non-Local Manifold Tangent Learning

15 0.67682129 102 nips-2004-Learning first-order Markov models for control

16 0.67450094 116 nips-2004-Message Errors in Belief Propagation

17 0.66914248 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss

18 0.66643703 70 nips-2004-Following Curved Regularized Optimization Solution Paths

19 0.66334635 75 nips-2004-Heuristics for Ordering Cue Search in Decision Making

20 0.66214383 119 nips-2004-Mistake Bounds for Maximum Entropy Discrimination