nips nips2006 nips2006-175 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Finite mixture model is a powerful tool in many statistical learning problems. [sent-4, score-0.172]
2 The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. [sent-6, score-0.52]
3 By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. [sent-7, score-0.399]
4 Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy. [sent-9, score-0.263]
5 1 Introduction In many statistical learning problems, it is useful to obtain an estimate of the underlying probability density given a set of observations. [sent-10, score-0.127]
6 Such a density model can facilitate discovery of the underlying data structure in unsupervised learning, and can also yield, asymptotically, optimal discriminant procedures [7]. [sent-11, score-0.127]
7 In this paper, we focus on the finite mixture model, which describes the distribution by a mixture of simple parametric functions φ(·)'s as f(x) = Σ_{j=1}^n αj φ(x, θj). [sent-12, score-0.344]
8 The mixture model has been widely used in clustering and density estimation, where the model parameters can be estimated by the standard Expectation-Maximization (EM) algorithm. [sent-15, score-0.36]
9 On the other hand, note that in many learning processes using mixture models (such as particle filtering [6] and non-parametric belief propagation [13]), the computational requirement is also very demanding due to the large number of components involved in the model. [sent-17, score-0.252]
10 Recently, [5] proposes to reduce a large Gaussian mixture into a smaller one by minimizing a KL-based distance between the two mixtures. [sent-20, score-0.258]
11 In this paper, we propose a new algorithm for simplifying a given finite mixture model while preserving its component structures, with application to nonparametric density estimation and clustering. [sent-22, score-0.435]
12 The idea is to minimize an upper bound on the approximation error between the original and simplified mixture models. [sent-23, score-0.378]
13 By adopting the L2 norm as the error criterion, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measures. [sent-24, score-0.215]
14 In Section 3, we report experimental results on simplifying the Parzen window estimator, and color image segmentation through the mean shift clustering procedure. [sent-28, score-0.59]
15 2 Approximation Algorithm Given a mixture model f(x) = Σ_{j=1}^n αj φj(x), (1) we assume that the jth component φj(x) is of the form φj(x) = |Hj|^{1/2}⁻¹ K_{Hj}(x − xj) = |Hj|^{-1/2} K_{Hj}(x − xj), (2) with weight αj, center xj and covariance matrix Hj. [sent-30, score-0.524]
16 Our task is to approximate f with a simpler mixture model g(x) = Σ_{i=1}^m wi gi(x), (3) with m ≪ n, where each component gi also takes the form gi(x) = |H̃i|^{-1/2} K_{H̃i}(x − ti), (4) with weight wi, center ti, and covariance matrix H̃i. [sent-34, score-1.763]
17 Given a distance measure D(·, ·) between functions, the approximation error E = D(f, g) = D( Σ_{j=1}^n αj φj, Σ_{i=1}^m wi gi ) (5) is usually difficult to optimize. [sent-36, score-0.552]
18 Consider the L2 distance D(φ, φ′) = ∫ (φ(x) − φ′(x))² dx, and suppose that the mixture components {φj}_{j=1}^n are divided into disjoint clusters S1, . . . , Sm. [sent-38, score-0.356]
19 Then, it is easy to see that the approximation error E is bounded by E = ∫ ( Σ_{j=1}^n αj φj(x) − Σ_{i=1}^m wi gi(x) )² dx ≤ m Σ_{i=1}^m ∫ ( wi gi(x) − Σ_{j∈Si} αj φj(x) )² dx. [sent-42, score-0.823]
20 Denote this upper bound by Ē = m Σ_{i=1}^m Ēi, where Ēi = ∫ ( wi gi(x) − Σ_{j∈Si} αj φj(x) )² dx. (6) [sent-43, score-0.213]
21 Hence, if we can find a good representative wi gi for each cluster by minimizing the local approximation error Ēi, the overall approximation performance can also be guaranteed. [sent-45, score-0.779]
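As a concrete illustration of the quantities above, the sketch below (ours, not code from the paper) evaluates the squared L2 distance between two 1-D Gaussian mixtures in closed form, using the identity ∫ N(x; m1, v1) N(x; m2, v2) dx = N(m1; m2, v1 + v2); the same routine therefore serves both for the global error E in (5) and for each local error Ēi, since each is an L2 distance between (sub-)mixtures. The function names and the 1-D restriction are our assumptions.

```python
import numpy as np

def gauss(x, mu, var):
    """1-D normal density N(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def l2_sq(a, mu, v, w, t, ht):
    """Squared L2 distance between two 1-D Gaussian mixtures, in closed form.

    Uses  integral N(x;m1,v1) N(x;m2,v2) dx = N(m1; m2, v1+v2),
    so no numerical integration is needed.
    """
    a, mu, v = map(np.asarray, (a, mu, v))
    w, t, ht = map(np.asarray, (w, t, ht))
    ff = np.sum(a[:, None] * a[None, :] * gauss(mu[:, None], mu[None, :], v[:, None] + v[None, :]))
    gg = np.sum(w[:, None] * w[None, :] * gauss(t[:, None], t[None, :], ht[:, None] + ht[None, :]))
    fg = np.sum(a[:, None] * w[None, :] * gauss(mu[:, None], t[None, :], v[:, None] + ht[None, :]))
    return ff + gg - 2.0 * fg
```

In higher dimensions the same identity holds with multivariate normal densities, so only `gauss` would need to change.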
22 This suggests partitioning the original mixture components into compact clusters, wherein approximation can then be done much more easily. [sent-46, score-0.435]
23 1) Partition the set of mixture components (φj ’s) into m clusters where m ≪ n. [sent-50, score-0.308]
24 2) For each cluster, approximate the local mixture model Σ_{j∈Si} αj φj by a single component wi gi, where gi is defined in (4). [sent-55, score-0.918]
25 The simplified model g is obtained by g(x) = Σ_{i=1}^m wi gi(x). [sent-57, score-0.4]
26 1 Partitioning of Components In this section, we consider how to group similar components into the same cluster, so that the subsequent local approximation can be more accurate. [sent-62, score-0.222]
27 A useful algorithm for this task is the classic vector quantization (VQ) [4], where one iterates between partitioning a set of vectors and finding the best prototype for each partition until the distortion error converges. [sent-63, score-0.167]
28 By defining a distance D(·, ·) between mixture components φj ’s, we can partition the mixture components in a similar way. [sent-64, score-0.588]
29 That is, we find the best representative Ri for each cluster, reassign each component φj to the closest representative Rπ(j), and iterate until the error Σ_j αj D(φj, Rπ(j)) converges. [sent-81, score-0.27]
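A minimal sketch of this VQ-style grouping step is given below, assuming Gaussian components and using the squared distance between component centers as a stand-in for D(·, ·); the choice of D is left open in the text above, and all names here are ours.

```python
import numpy as np

def partition_components(mu, alpha, m, n_iter=20, seed=0):
    """VQ-style grouping of mixture components into m clusters.

    mu : (n, d) component centers, alpha : (n,) component weights.
    Returns an array of cluster labels in {0, ..., m-1}.
    D(phi_j, R_i) is taken here to be the squared distance between centers,
    a simple proxy; any distance between components could be plugged in.
    """
    rng = np.random.default_rng(seed)
    reps = mu[rng.choice(len(mu), size=m, replace=False)]          # initial representatives
    for _ in range(n_iter):
        d2 = ((mu[:, None, :] - reps[None, :, :]) ** 2).sum(-1)    # (n, m) distances
        labels = d2.argmin(axis=1)                                  # reassign each component
        for i in range(m):
            in_i = labels == i
            if in_i.any():                                          # refit representative R_i
                w = alpha[in_i] / alpha[in_i].sum()
                reps[i] = (w[:, None] * mu[in_i]).sum(axis=0)
    return labels
```

Any pairwise distance between components (for example the closed-form L2 distance sketched earlier) can replace the center distance without changing the loop structure.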
30 2 Local Approximation In this part, we consider how to obtain a good representative, wi gi in (4), for each local cluster Si . [sent-84, score-0.537]
31 The task is to determine the unknown variables wi, ti and H̃i associated with gi. [sent-85, score-0.626]
32 Using the L2 norm, the upper bound (6) of the local approximation error can be written as Ēi = ∫ ( wi gi(x) − Σ_{j∈Si} αj φj(x) )² dx = CK wi² / |2H̃i|^{1/2} − 2 CK wi Σ_{j∈Si} αj k(rij) / |Hj + H̃i|^{1/2} + ci. [sent-86, score-0.871]
33 Here, CK = ∫ k(x′x) dx is a kernel-dependent constant, ci = ∫ ( Σ_{j∈Si} αj φj(x) )² dx is a data-dependent constant (irrelevant to the unknown variables), and rij = (ti − xj)′ (Hj + H̃i)^{-1} (ti − xj). [sent-87, score-0.324]
34 To solve for the unknowns wi, ti and H̃i, one can set the corresponding partial derivatives of Ēi to zero. [sent-93, score-0.413]
35 First, observe that Ēi is a quadratic function of wi. [sent-96, score-0.187]
36 Therefore, given H̃i and ti, the minimum value of Ēi can be easily obtained as min Ēi = ci − CK |2H̃i|^{1/2} ( Σ_{j∈Si} αj k(rij) / |Hj + H̃i|^{1/2} )². [sent-97, score-0.226]
37 By setting ∂ti Ēi = 0, we have ti = Mi^{-1} Σ_{j∈Si} [ αj k′(rij) / |Hj + H̃i|^{1/2} ] (Hj + H̃i)^{-1} xj, (8) where Mi = Σ_{j∈Si} [ αj k′(rij) / |Hj + H̃i|^{1/2} ] (Hj + H̃i)^{-1}. [sent-102, score-0.338]
38 If H̃i is fixed, we can obtain ti by starting with an initial ti^{(0)} and then iterating (8) until convergence. [sent-104, score-0.49]
39 Now, to solve for H̃i, we set ∂H̃i Ēi = 0 and obtain H̃i = Pi^{-1} Σ_{j∈Si} [ αj / |Hj + H̃i|^{1/2} ] (H̃i + Hj)^{-1} [ k(rij) Hj + 4(−k′(rij))(xj − ti)(xj − ti)′ (H̃i + Hj)^{-1} H̃i ], (9) where Pi = Σ_{j∈Si} [ αj k(rij) / |Hj + H̃i|^{1/2} ] (H̃i + Hj)^{-1}. [sent-105, score-0.452]
40 In summary, we first initialize ti^{(0)} = Σ_{j∈Si} αj xj / ( Σ_{j∈Si} αj ), H̃i^{(0)} = Σ_{j∈Si} αj [ Hj + (ti^{(0)} − xj)(ti^{(0)} − xj)′ ] / ( Σ_{j∈Si} αj ), and then iterate (8) and (9) until convergence. [sent-106, score-0.6]
41 The converged values of ti and H̃i are substituted into ∂wi Ēi = 0 to obtain wi as wi = |2H̃i|^{1/2} Σ_{j∈Si} αj k(rij) / |Hj + H̃i|^{1/2}. (10) [sent-107, score-0.6]
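The sketch below strings the updates (8)-(10), as reconstructed above, together for a single cluster in 1-D, assuming the Gaussian profile k(r) = exp(−r/2) (so k′(r) = −k(r)/2) and the moment-matched initialization just described; the variable names, the iteration cap and the stopping test are ours.

```python
import numpy as np

def local_gaussian_fit(x, alpha, h, max_iter=100, tol=1e-10):
    """Fit one component w*g_i to the local mixture sum_{j in S_i} alpha_j*phi_j (1-D sketch).

    x, alpha, h : arrays with the centers, weights and variances of the components in S_i.
    Gaussian profile: k(r) = exp(-r/2), hence k'(r) = -k(r)/2.
    Returns (w_i, t_i, h_tilde_i).
    """
    k = lambda r: np.exp(-0.5 * r)
    kp = lambda r: -0.5 * np.exp(-0.5 * r)

    # Moment-matched initialization t_i^(0), H~_i^(0).
    a = alpha / alpha.sum()
    t = np.sum(a * x)
    ht = np.sum(a * (h + (t - x) ** 2))

    for _ in range(max_iter):
        s = h + ht                                   # H_j + H~_i
        r = (t - x) ** 2 / s                         # r_ij
        c = alpha / np.sqrt(s)                       # alpha_j / |H_j + H~_i|^(1/2)
        # (8): fixed-point update of the center t_i.
        t_new = np.sum(c * kp(r) / s * x) / np.sum(c * kp(r) / s)
        r = (t_new - x) ** 2 / s                     # refresh r_ij with the new center
        # (9): fixed-point update of the bandwidth H~_i.
        num = np.sum(c / s * (k(r) * h + 4.0 * (-kp(r)) * (x - t_new) ** 2 / s * ht))
        den = np.sum(c * k(r) / s)
        ht_new = num / den
        converged = abs(t_new - t) < tol and abs(ht_new - ht) < tol
        t, ht = t_new, ht_new
        if converged:
            break

    # (10): the optimal weight, given the converged t_i and H~_i.
    s = h + ht
    w = np.sqrt(2.0 * ht) * np.sum(alpha * k((t - x) ** 2 / s) / np.sqrt(s))
    return w, t, ht
```

Running this for every cluster Si and collecting the returned (wi, ti, H̃i) triples yields the simplified model g in (3).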
42 2 Complexity In the partitioning step, sequential sampling has a complexity of O(dmn), where n is the original model size, m is the number of clusters, and d the dimension. [sent-109, score-0.146]
43 In the local approximation step, the complexity is l Σ_{i=1}^m ni d³ = l n d³, where l is the maximum number of iterations needed. [sent-112, score-0.182]
44 Summing up these three terms, the overall complexity is O(dn log(m) + dnm + lnd) = O(dn(m + l)), which is linear in both the data size and dimension (in practice m and l are quite small). [sent-115, score-0.114]
45 To have better intuitions, we examine the special case of a Parzen window density estimator [11], where all φj ’s have the same weights and bandwidths (Hj = H for j = 1, 2, . [sent-120, score-0.406]
46 Equation (9) then reduces to H̃i = H + 4H̃i(H̃i + H)^{-1} Vi, (11) where Vi = Σ_{j∈Si} αj (−k′(rij))(xj − ti)(xj − ti)′ / Σ_{j∈Si} αj k(rij). [sent-124, score-0.452]
47 It shows that the bandwidth H̃i of gi can be decomposed into two parts: the bandwidth H of the original kernel density estimator, and the covariance Vi of the local cluster Si with an adjusting matrix Γi = 4H̃i(H̃i + H)^{-1}. [sent-125, score-0.811]
48 Moreover, h̃i is closely related to the spread of the local cluster. [sent-129, score-0.616]
49 Otherwise, the larger the spread of the local cluster, the larger is h̃i. [sent-133, score-0.616]
50 In other words, the bandwidths Hi ’s are adaptive to the local data distribution. [sent-134, score-0.133]
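In 1-D the Parzen-case fixed point (11) can be checked numerically; the short script below (ours) iterates H̃ = H + 4H̃V/(H̃ + H) and compares the result against the closed-form root of the underlying quadratic, H̃ = 2V + √(4V² + H²), which is our algebraic rearrangement of (11).

```python
import numpy as np

def parzen_bandwidth(v, h2, n_iter=100):
    """Iterate the 1-D form of (11): H~ = H + 4*H~*V/(H~ + H), with H = h2 and V = v.

    Returns the adapted variance H~_i; a larger cluster spread v gives a larger H~_i.
    """
    ht = h2 + v                      # simple starting point
    for _ in range(n_iter):
        ht = h2 + 4.0 * ht * v / (ht + h2)
    return ht

# The fixed point can also be written in closed form: H~ = 2V + sqrt(4V^2 + H^2).
h2 = 0.5
for v in [0.0, 0.1, 1.0]:
    assert np.isclose(parzen_bandwidth(v, h2), 2 * v + np.sqrt(4 * v ** 2 + h2 ** 2))
```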
51 Related works on simplifying mixture models (such as [5]) simply choose H̃i = H + Cov[Si]. [sent-135, score-0.245]
52 Interestingly, this is somewhat similar to the bandwidth matrix used in the manifold Parzen windows [14], which is designed for handling sparse, high-dimensional data more robustly. [sent-137, score-0.134]
53 Second, in determining the center of gi, (8) can be reduced to ti = Σ_{j∈Si} αj k′_{H+H̃i}(xj − ti) xj / Σ_{j∈Si} αj k′_{H+H̃i}(xj − ti). (12) [sent-142, score-1.043]
54 It is easy to verify that this iterative procedure is indeed locating the peak of the density function pi(x) = |H + H̃i|^{-1/2} Σ_{j∈Si} K_{H+H̃i}(x − xj). [sent-144, score-0.322]
55 Note, on the other hand, that what we want to approximate originally is the local density fi(x) = |H|^{-1/2} Σ_{j∈Si} K_H(x − xj). [sent-145, score-0.418]
56 In the 1-D case (with H = h² and H̃i = h̃i²), the bandwidth of pi (i.e., √(h² + h̃i²)) is larger than the bandwidth h of fi. [sent-146, score-0.165]
57 It appears intriguing that, in fitting a kernel density fi(x) estimated on the sample set {xj}_{j∈Si}, one needs to locate the maximum of another density function pi(x), instead of the maximum of fi(x) itself or, simply, the mean of the sample set {xj}_{j∈Si} as chosen in [5]. [sent-151, score-0.562]
58 Intuitively, when the data is asymmetric, the center ti should be biased towards the heavier side of the data distribution. [sent-153, score-0.368]
59 On the other hand, the mean of Si , though biased towards the heavier side, still lacks an accurate control on the degree of bias. [sent-155, score-0.122]
60 Note that pi (x) has a larger bandwidth than the original fi (x). [sent-157, score-0.27]
61 Therefore, its maximum will move towards the heavier side of the distribution compared with that of fi (x), with the degree of bias automatically controlled by the mean shift iterations in (12). [sent-158, score-0.351]
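The following toy script (ours, not the paper's experiment) makes this bias visible on a right-skewed 1-D cluster: it compares the sample mean with the modes of fi (bandwidth H = h²) and of pi (bandwidth H + H̃i), located here by a simple grid search; the gamma-distributed sample and the value chosen for H̃i are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.gamma(shape=2.0, scale=1.0, size=200)        # a right-skewed local cluster S_i
h2, ht2 = 0.3 ** 2, 0.8 ** 2                          # H = h^2 and an assumed H~_i = h~_i^2

def kde(grid, centers, var):
    """Equal-weight 1-D Parzen estimate with variance `var`, evaluated on `grid`."""
    d2 = (grid[:, None] - centers[None, :]) ** 2
    return np.exp(-0.5 * d2 / var).sum(axis=1) / (len(centers) * np.sqrt(2 * np.pi * var))

grid = np.linspace(xs.min() - 1, xs.max() + 1, 2000)
peak_fi = grid[kde(grid, xs, h2).argmax()]            # mode of f_i (bandwidth h)
peak_pi = grid[kde(grid, xs, h2 + ht2).argmax()]      # mode of p_i (bandwidth sqrt(h^2 + h~^2))
print(f"mean {xs.mean():.3f}  argmax f_i {peak_fi:.3f}  argmax p_i {peak_pi:.3f}")
```

For such a right-skewed cluster the mode of pi typically lies between the mode of fi and the sample mean, which is the controlled bias discussed above.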
62 Figure 1(a) shows the histogram of a local cluster Si , whose Parzen window estimator (fi ) is asymmetric. [sent-160, score-0.388]
63 Figure 1(b) plots the corresponding approximation error Ēi (6) at different bandwidths h̃i (the remaining parameter, wi, is set to the optimal value by (10)). [sent-161, score-0.863]
64 [Figure 1(b) plot residue: approximation error versus bandwidth, with curves for the local maximum, the local mean, and our method.] [sent-164, score-0.325]
65 (a) The histogram of a local cluster Si and its density fi. [sent-166, score-0.375]
66 Figure 1: Approximation of an asymmetric density using different center selection schemes. [sent-174, score-0.194]
67 3 Experiments In this section, we perform experiments to evaluate the performance of our mixture simplification scheme. [sent-175, score-0.172]
68 We focus on the Parzen window estimator which, given a set of samples S = {xi}_{i=1}^n in R^d, can be written as f̂(x) = (1/n) |H|^{-1/2} Σ_{j=1}^n K_H(x − xj). [sent-176, score-0.332]
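For reference, a direct (O(nd²) per query point) evaluation of this estimator with a full bandwidth matrix H might look as follows; this is a generic sketch, not the accelerated implementation used in the experiments.

```python
import numpy as np

def parzen(query, data, H):
    """Evaluate the Parzen window estimate f_hat at the rows of `query`.

    data : (n, d) samples, H : (d, d) bandwidth matrix, query : (q, d).
    """
    Hinv = np.linalg.inv(H)
    norm = 1.0 / (len(data) * np.sqrt(((2 * np.pi) ** data.shape[1]) * np.linalg.det(H)))
    diff = query[:, None, :] - data[None, :, :]            # (q, n, d)
    r = np.einsum('qnd,de,qne->qn', diff, Hinv, diff)      # (x - x_j)' H^{-1} (x - x_j)
    return norm * np.exp(-0.5 * r).sum(axis=1)
```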
69 Note that the Parzen window estimator is a limiting form of the mixture model, where the number of components equals the data size and can be quite huge. [sent-177, score-0.496]
70 In Section 3.1, we use the proposed approach to reduce the number of components in the kernel density estimator, and compare its performance with the algorithm in [5]. [sent-179, score-0.276]
71 In Section 3.2, we perform color image segmentation by running the mean shift clustering algorithm on the simplified density model. [sent-181, score-0.556]
72 1 Simplifying Nonparametric Density Models In this section, we reduce the number of kernels in the Parzen window estimator by using the proposed approach and the method in [5]. [sent-183, score-0.288]
73 Experiments are performed on a 1-D set with 1800 samples 6 4 8 drawn from the Gaussian mixture 18 N (−2. [sent-184, score-0.172]
74 Figure 2: Approximating the Parzen window estimator by simplifying mixture models. [sent-220, score-0.465]
75 Green: Parzen window estimator; black: simplified mixture model; blue-dashed: components of the mixture model. [sent-221, score-0.512]
76 To have a quantitative evaluation, we randomly generate the 3-Gaussian data 100 times, and compare the two algorithms (ours and [5]) using the following error criteria: 1) the L2 error (5); 2) the standard KL distance; 3) the local KL-distance used in [5]. [sent-222, score-0.146]
77 The local KL-distance between two mixtures, f = Σ_{j=1}^n αj φj and g = Σ_{i=1}^m wi gi, is defined as d(f, g) = Σ_{j=1}^n αj KL(φj || gπ(j)), where π(j) is the function that maps each component φj to the closest representative component gπ(j), i.e., π(j) = arg min_{i=1,2,...,m} KL(φj || gi). [sent-223, score-0.612]
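Since all components are Gaussian, this criterion can be evaluated exactly with the closed-form Gaussian KL divergence; the sketch below is ours and simply follows the definition above.

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ), closed form for Gaussians."""
    d = len(mu0)
    S1inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + diff @ S1inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def local_kl(alpha, mus, Hs, ts, Hts):
    """d(f, g) = sum_j alpha_j KL(phi_j || g_pi(j)), with pi(j) the closest representative."""
    total = 0.0
    for a, m, S in zip(alpha, mus, Hs):
        kls = [kl_gauss(m, S, t, St) for t, St in zip(ts, Hts)]
        total += a * min(kls)                # pi(j) = argmin_i KL(phi_j || g_i)
    return total
```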
78 2 Image Segmentation The Parzen window estimator can be used to reveal important clustering information, namely that its modes (or local maxima) correspond to dominant clusters in the data. [sent-239, score-0.411]
79 [Plot residue: curves of the L2-norm error, the KL distance, and the local KL error.]
80 This is the idea behind the mean shift clustering algorithm [1, 3], where every data point is moved along the density gradient until it reaches the nearest local density maximum. [sent-248, score-0.583]
81 The mean shift algorithm is robust, and can identify arbitrarily-shaped clusters in the feature space. [sent-249, score-0.25]
82 Recently, mean shift has been applied to color image segmentation and has proven to be quite successful [1]. [sent-250, score-0.392]
83 However, mean shift can be quite expensive due to the large number of kernels involved in the density estimator. [sent-252, score-0.375]
84 To reduce the computational requirement, we first reduce the density estimator f̂(x) to a simpler model g(x) using our simplification scheme, and then apply the iterative mean shift procedure on the simplified model g(x). [sent-253, score-0.559]
85 We use the Gaussian kernel with bandwidth h = 20. [sent-255, score-0.143]
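Below is a sketch of the mode-seeking step run on the simplified model g rather than on the full Parzen estimator; the matrix-weighted fixed-point update is a standard variable-bandwidth mean shift step for Gaussian components, the segmentation bookkeeping (grouping pixels that converge to the same mode) is omitted, and all names are ours rather than the paper's implementation.

```python
import numpy as np

def mean_shift_modes(points, w, t, Ht, n_iter=50):
    """Move each row of `points` uphill on g(x) = sum_i w_i N(x; t_i, H~_i).

    A fixed-point (mean shift style) iteration over the m simplified components,
    so each sweep costs O(m d^2) per point instead of O(n d^2) on the full model.
    """
    Hinv = np.array([np.linalg.inv(Hi) for Hi in Ht])              # (m, d, d)
    logdet = np.array([np.linalg.slogdet(Hi)[1] for Hi in Ht])     # log|H~_i|
    y = points.copy()
    for _ in range(n_iter):
        diff = y[:, None, :] - t[None, :, :]                       # (q, m, d)
        r = np.einsum('qmd,mde,qme->qm', diff, Hinv, diff)
        resp = w * np.exp(-0.5 * (r + logdet))                     # w_i N(y; t_i, H~_i), up to a common constant
        A = np.einsum('qm,mde->qde', resp, Hinv)                   # sum_i resp_i H~_i^{-1}
        b = np.einsum('qm,mde,me->qd', resp, Hinv, t)              # sum_i resp_i H~_i^{-1} t_i
        y = np.linalg.solve(A, b)
    return y
```

Because each sweep scales with the number of simplified components m rather than the data size n, this is the main computational saving of running mean shift on g.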
86 For comparison, we also implement the standard mean shift and its fast version using kd-trees (using the ANN library [10]). [sent-257, score-0.224]
87 As the "true" segmentation of an image is subjective, only a visual comparison is intended here. [sent-260, score-0.136]
88 Table 1: Total wall time (in seconds) on various segmentation tasks, and the number of components in g(x). [sent-261, score-0.172]
89 Table 1 data — image: squirrel, hand, house, lake; data size: 60,192 (209×288), 73,386 (243×302), 48,960 (192×255), 262,144 (512×512); mean shift (standard, kd-tree): 1215.
90 The rows, from top to bottom, are: the original image, segmentation results by standard mean shift and our approach. [sent-275, score-0.311]
91 We can see that our results are closer to those by the standard mean shift (applied on the original density estimator), with the number of components (Table 1) dramatically smaller than the data size n. [sent-276, score-0.426]
92 This demonstrates the success of our approximation scheme in maintaining the structure of the data distribution using highly compact models. [sent-277, score-0.129]
93 Our algorithm is also much faster than the standard mean shift and its fast version using kd-trees. [sent-278, score-0.194]
94 4 Conclusion Finite mixture is a powerful model in many statistical learning problems. [sent-280, score-0.172]
95 In this paper, we reduce the model complexity by first grouping the components into compact clusters, and then performing local function approximation that minimizes an upper bound on the approximation error. [sent-282, score-0.46]
96 Figure 4: Image segmentation by standard mean shift (2nd row), and ours (bottom). [sent-288, score-0.286]
97 The estimation of the gradient of a density function, with applications in pattern recognition. [sent-302, score-0.127]
98 Incremental density approximation and kernel-based Bayesian filtering for object tracking. [sent-321, score-0.195]
99 Very fast EM-based mixture model clustering using multiresolution kd-trees. [sent-344, score-0.233]
100 On estimation of a probability density function and mode. [sent-357, score-0.127]
wordName wordTfidf (topN-words)
[('hi', 0.513), ('hj', 0.283), ('rij', 0.273), ('si', 0.258), ('ti', 0.226), ('gi', 0.213), ('parzen', 0.19), ('wi', 0.187), ('mixture', 0.172), ('kh', 0.157), ('shift', 0.152), ('estimator', 0.132), ('density', 0.127), ('bandwidth', 0.112), ('xj', 0.112), ('segmentation', 0.092), ('window', 0.088), ('fi', 0.08), ('components', 0.08), ('vi', 0.078), ('local', 0.074), ('simplifying', 0.073), ('representative', 0.07), ('approximation', 0.068), ('ri', 0.066), ('cluster', 0.063), ('clustering', 0.061), ('simpli', 0.06), ('bandwidths', 0.059), ('clusters', 0.056), ('heavier', 0.055), ('vq', 0.055), ('partitioning', 0.053), ('pi', 0.053), ('kl', 0.052), ('dx', 0.051), ('dnm', 0.05), ('mount', 0.05), ('distance', 0.048), ('image', 0.044), ('reliable', 0.043), ('comaniciu', 0.043), ('quantization', 0.042), ('mean', 0.042), ('center', 0.04), ('complexity', 0.04), ('dn', 0.04), ('hong', 0.039), ('ann', 0.039), ('iterate', 0.038), ('reduce', 0.038), ('color', 0.038), ('compact', 0.037), ('kong', 0.037), ('error', 0.036), ('partition', 0.036), ('ei', 0.035), ('norm', 0.035), ('representatives', 0.035), ('component', 0.034), ('upper', 0.032), ('finite', 0.031), ('kernel', 0.031), ('histogram', 0.031), ('iterative', 0.03), ('adopting', 0.03), ('library', 0.03), ('kernels', 0.03), ('spread', 0.029), ('nonparametric', 0.029), ('sensitive', 0.029), ('sequential', 0.028), ('asymmetric', 0.027), ('jth', 0.027), ('adjusting', 0.027), ('covariance', 0.027), ('ck', 0.026), ('approximate', 0.025), ('original', 0.025), ('biased', 0.025), ('quite', 0.024), ('scheme', 0.024), ('bound', 0.023), ('hierarchical', 0.023), ('robust', 0.023), ('minimize', 0.022), ('transactions', 0.022), ('side', 0.022), ('illustration', 0.022), ('manifold', 0.022), ('gersho', 0.022), ('lake', 0.022), ('locate', 0.022), ('han', 0.022), ('reassign', 0.022), ('silverman', 0.022), ('bay', 0.022), ('kowloon', 0.022), ('kwok', 0.022), ('goldberger', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
2 0.22574271 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
3 0.21063465 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
Author: Andrea Vedaldi, Stefano Soatto
Abstract: Image Congealing (IC) is a non-parametric method for the joint alignment of a collection of images affected by systematic and unwanted deformations. The method attempts to undo the deformations by minimizing a measure of complexity of the image ensemble, such as the averaged per-pixel entropy. This enables alignment without an explicit model of the aligned dataset as required by other methods (e.g. transformed component analysis). While IC is simple and general, it may introduce degenerate solutions when the transformations allow minimizing the complexity of the data by collapsing them to a constant. Such solutions need to be explicitly removed by regularization. In this paper we propose an alternative formulation which solves this regularization issue on a more principled ground. We make the simple observation that alignment should simplify the data while preserving the useful information carried by them. Therefore we trade off fidelity and complexity of the aligned ensemble rather than minimizing the complexity alone. This eliminates the need for an explicit regularization of the transformations, and has a number of other useful properties such as noise suppression. We show the modeling and computational benefits of the approach to the some of the problems on which IC has been demonstrated. 1
4 0.18006371 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
5 0.12512578 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
Author: Gloria Haro, Gregory Randall, Guillermo Sapiro
Abstract: The study of point cloud data sampled from a stratification, a collection of manifolds with possible different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning. 1
6 0.12241675 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
7 0.11444797 131 nips-2006-Mixture Regression for Covariate Shift
8 0.093981415 50 nips-2006-Chained Boosting
9 0.091350794 80 nips-2006-Fundamental Limitations of Spectral Clustering
10 0.090748943 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
11 0.090575784 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
12 0.088366859 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
13 0.08745864 158 nips-2006-PG-means: learning the number of clusters in data
14 0.086419985 7 nips-2006-A Local Learning Approach for Clustering
15 0.082808547 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
16 0.077662647 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes
17 0.075846776 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
18 0.073064677 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
19 0.07009656 31 nips-2006-Analysis of Contour Motions
20 0.070093036 164 nips-2006-Randomized PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
topicId topicWeight
[(0, -0.24), (1, 0.053), (2, 0.038), (3, 0.094), (4, 0.05), (5, -0.016), (6, -0.018), (7, -0.029), (8, 0.027), (9, 0.273), (10, 0.001), (11, 0.033), (12, 0.318), (13, 0.24), (14, -0.015), (15, -0.061), (16, 0.088), (17, 0.089), (18, -0.051), (19, -0.094), (20, -0.06), (21, -0.024), (22, -0.148), (23, 0.051), (24, 0.129), (25, 0.029), (26, 0.027), (27, -0.095), (28, -0.011), (29, -0.025), (30, 0.03), (31, 0.004), (32, -0.035), (33, -0.053), (34, -0.059), (35, -0.108), (36, 0.029), (37, 0.013), (38, 0.016), (39, 0.045), (40, -0.071), (41, -0.008), (42, 0.029), (43, -0.077), (44, -0.045), (45, 0.012), (46, -0.127), (47, 0.013), (48, 0.032), (49, -0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.96474874 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
2 0.7516343 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
Author: Andrea Vedaldi, Stefano Soatto
Abstract: Image Congealing (IC) is a non-parametric method for the joint alignment of a collection of images affected by systematic and unwanted deformations. The method attempts to undo the deformations by minimizing a measure of complexity of the image ensemble, such as the averaged per-pixel entropy. This enables alignment without an explicit model of the aligned dataset as required by other methods (e.g. transformed component analysis). While IC is simple and general, it may introduce degenerate solutions when the transformations allow minimizing the complexity of the data by collapsing them to a constant. Such solutions need to be explicitly removed by regularization. In this paper we propose an alternative formulation which solves this regularization issue on a more principled ground. We make the simple observation that alignment should simplify the data while preserving the useful information carried by them. Therefore we trade off fidelity and complexity of the aligned ensemble rather than minimizing the complexity alone. This eliminates the need for an explicit regularization of the transformations, and has a number of other useful properties such as noise suppression. We show the modeling and computational benefits of the approach to the some of the problems on which IC has been demonstrated. 1
3 0.74098265 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
4 0.70677233 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
5 0.48493144 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
Author: Steffen Bickel, Tobias Scheffer
Abstract: We study a setting that is motivated by the problem of filtering spam messages for many users. Each user receives messages according to an individual, unknown distribution, reflected only in the unlabeled inbox. The spam filter for a user is required to perform well with respect to this distribution. Labeled messages from publicly available sources can be utilized, but they are governed by a distinct distribution, not adequately representing most inboxes. We devise a method that minimizes a loss function with respect to a user’s personal distribution based on the available biased sample. A nonparametric hierarchical Bayesian model furthermore generalizes across users by learning a common prior which is imposed on new email accounts. Empirically, we observe that bias-corrected learning outperforms naive reliance on the assumption of independent and identically distributed data; Dirichlet-enhanced generalization across users outperforms a single (“one size fits all”) filter as well as independent filters for all users. 1
6 0.4381375 91 nips-2006-Hierarchical Dirichlet Processes with Random Effects
7 0.43502879 131 nips-2006-Mixture Regression for Covariate Shift
8 0.41739753 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
9 0.3813076 158 nips-2006-PG-means: learning the number of clusters in data
10 0.36205241 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes
11 0.33844444 50 nips-2006-Chained Boosting
12 0.33703321 194 nips-2006-Towards a general independent subspace analysis
13 0.33303854 7 nips-2006-A Local Learning Approach for Clustering
14 0.33245793 121 nips-2006-Learning to be Bayesian without Supervision
15 0.32653499 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
16 0.32268208 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
17 0.31197229 129 nips-2006-Map-Reduce for Machine Learning on Multicore
18 0.30374926 80 nips-2006-Fundamental Limitations of Spectral Clustering
19 0.29259259 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
20 0.29036471 180 nips-2006-Speakers optimize information density through syntactic reduction
topicId topicWeight
[(1, 0.106), (3, 0.04), (7, 0.069), (9, 0.069), (20, 0.021), (22, 0.096), (30, 0.195), (44, 0.088), (57, 0.097), (65, 0.082), (69, 0.021), (71, 0.015), (90, 0.015)]
simIndex simValue paperId paperTitle
1 0.90565407 189 nips-2006-Temporal dynamics of information content carried by neurons in the primary visual cortex
Author: Danko Nikolić, Stefan Haeusler, Wolf Singer, Wolfgang Maass
Abstract: We use multi-electrode recordings from cat primary visual cortex and investigate whether a simple linear classifier can extract information about the presented stimuli. We find that information is extractable and that it even lasts for several hundred milliseconds after the stimulus has been removed. In a fast sequence of stimulus presentation, information about both new and old stimuli is present simultaneously and nonlinear relations between these stimuli can be extracted. These results suggest nonlinear properties of cortical representations. The important implications of these properties for the nonlinear brain theory are discussed.
same-paper 2 0.85613179 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
3 0.7527312 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
4 0.74012887 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
5 0.73902285 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
Author: Andrea Vedaldi, Stefano Soatto
Abstract: Image Congealing (IC) is a non-parametric method for the joint alignment of a collection of images affected by systematic and unwanted deformations. The method attempts to undo the deformations by minimizing a measure of complexity of the image ensemble, such as the averaged per-pixel entropy. This enables alignment without an explicit model of the aligned dataset as required by other methods (e.g. transformed component analysis). While IC is simple and general, it may introduce degenerate solutions when the transformations allow minimizing the complexity of the data by collapsing them to a constant. Such solutions need to be explicitly removed by regularization. In this paper we propose an alternative formulation which solves this regularization issue on a more principled ground. We make the simple observation that alignment should simplify the data while preserving the useful information carried by them. Therefore we trade off fidelity and complexity of the aligned ensemble rather than minimizing the complexity alone. This eliminates the need for an explicit regularization of the transformations, and has a number of other useful properties such as noise suppression. We show the modeling and computational benefits of the approach to the some of the problems on which IC has been demonstrated. 1
6 0.73564684 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
7 0.73315001 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
8 0.73081857 79 nips-2006-Fast Iterative Kernel PCA
9 0.72802043 61 nips-2006-Convex Repeated Games and Fenchel Duality
10 0.72776788 20 nips-2006-Active learning for misspecified generalized linear models
11 0.7268182 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
12 0.72399598 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
13 0.72029674 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
14 0.71977592 117 nips-2006-Learning on Graph with Laplacian Regularization
15 0.7192663 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
16 0.71889204 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments
17 0.7187348 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
18 0.71722901 194 nips-2006-Towards a general independent subspace analysis
19 0.71546513 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
20 0.715087 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation