nips nips2006 nips2006-175 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Finite mixture model is a powerful tool in many statistical learning problems. [sent-4, score-0.172]
2 The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. [sent-6, score-0.52]
3 By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. [sent-7, score-0.399]
4 Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy. [sent-9, score-0.263]
5 1 Introduction In many statistical learning problems, it is useful to obtain an estimate of the underlying probability density given a set of observations. [sent-10, score-0.127]
6 Such a density model can facilitate discovery of the underlying data structure in unsupervised learning, and can also yield, asymptotically, optimal discriminant procedures [7]. [sent-11, score-0.127]
7 In this paper, we focus on the finite mixture model, which describes the distribution by a mixture of simple parametric functions φ(·)'s as f(x) = Σ_{j=1}^n αj φ(x, θj). [sent-12, score-0.344]
8 The mixture model has been widely used in clustering and density estimation, where the model parameters can be estimated by the standard Expectation-Maximization (EM) algorithm. [sent-15, score-0.36]
9 On the other hand, note that in many learning processes using mixture models (such as particle filtering [6] and non-parametric belief propagation [13]), the computational requirement is also very demanding due to the large number of components involved in the model. [sent-17, score-0.252]
10 Recently, [5] proposes to reduce a large Gaussian mixture into a smaller one by minimizing a KL-based distance between the two mixtures. [sent-20, score-0.258]
11 In this paper, we propose a new algorithm for simplifying a given finite mixture model while preserving its component structures, with application to nonparametric density estimation and clustering. [sent-22, score-0.435]
12 The idea is to minimize an upper bound on the approximation error between the original and simplified mixture models. [sent-23, score-0.378]
13 By adopting the L2 norm as the error criterion, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measures. [sent-24, score-0.215]
14 In Section 3, we report experimental results on simplifying the Parzen window estimator, and color image segmentation through the mean shift clustering procedure. [sent-28, score-0.59]
15 2 Approximation Algorithm Given a mixture model f(x) = Σ_{j=1}^n αj φj(x), (1) we assume that the jth component φj(x) is of the form φj(x) = |Hj|^{1/2}⁻¹ K_{Hj}(x − xj) = |Hj|^{-1/2} K_{Hj}(x − xj), (2) with weight αj, center xj and covariance matrix Hj. [sent-30, score-0.524]
16 Our task is to approximate f with a simpler mixture model g(x) = Σ_{i=1}^m wi gi(x), (3) with m ≪ n, where each component gi also takes the form gi(x) = |H̃i|^{-1/2} K_{H̃i}(x − ti), (4) with weight wi, center ti, and covariance matrix H̃i. [sent-34, score-1.763]
17 Given a distance measure D(·, ·) between functions, the approximation error E = D(f, g) = D( Σ_{j=1}^n αj φj, Σ_{i=1}^m wi gi ) (5) is usually difficult to optimize. [sent-36, score-0.552]
18 Consider the L2 distance D(φ, φ′) = ∫ (φ(x) − φ′(x))² dx, and suppose that the mixture components {φj}_{j=1}^n are divided into disjoint clusters S1, . . . , Sm. [sent-38, score-0.356]
19 Then, it is easy to see that the approximation error E is bounded by E = ∫ ( Σ_{j=1}^n αj φj(x) − Σ_{i=1}^m wi gi(x) )² dx ≤ m Σ_{i=1}^m ∫ ( wi gi(x) − Σ_{j∈Si} αj φj(x) )² dx. [sent-42, score-0.823]
20 Denote this upper bound by Ē = m Σ_{i=1}^m Ēi, where Ēi = ∫ ( wi gi(x) − Σ_{j∈Si} αj φj(x) )² dx. (6) [sent-43, score-0.213]
21 Hence, if we can find a good representative wi gi for each cluster by minimizing the local approximation error Ēi, the overall approximation performance can also be guaranteed. [sent-45, score-0.779]
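As a concrete illustration of the quantities above, the sketch below (ours, not code from the paper) evaluates the squared L2 distance between two 1-D Gaussian mixtures in closed form, using the identity ∫ N(x; m1, v1) N(x; m2, v2) dx = N(m1; m2, v1 + v2); the same routine therefore serves both for the global error E in (5) and for each local error Ēi, since each is an L2 distance between (sub-)mixtures. The function names and the 1-D restriction are our assumptions.

```python
import numpy as np

def gauss(x, mu, var):
    """1-D normal density N(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def l2_sq(a, mu, v, w, t, ht):
    """Squared L2 distance between two 1-D Gaussian mixtures, in closed form.

    Uses  integral N(x;m1,v1) N(x;m2,v2) dx = N(m1; m2, v1+v2),
    so no numerical integration is needed.
    """
    a, mu, v = map(np.asarray, (a, mu, v))
    w, t, ht = map(np.asarray, (w, t, ht))
    ff = np.sum(a[:, None] * a[None, :] * gauss(mu[:, None], mu[None, :], v[:, None] + v[None, :]))
    gg = np.sum(w[:, None] * w[None, :] * gauss(t[:, None], t[None, :], ht[:, None] + ht[None, :]))
    fg = np.sum(a[:, None] * w[None, :] * gauss(mu[:, None], t[None, :], v[:, None] + ht[None, :]))
    return ff + gg - 2.0 * fg
```

In higher dimensions the same identity holds with multivariate normal densities, so only `gauss` would need to change.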
22 This suggests partitioning the original mixture components into compact clusters, wherein approximation can then be done much more easily. [sent-46, score-0.435]
23 1) Partition the set of mixture components (φj ’s) into m clusters where m ≪ n. [sent-50, score-0.308]
24 2) For each cluster, approximate the local mixture model Σ_{j∈Si} αj φj by a single component wi gi, where gi is defined in (4). [sent-55, score-0.918]
25 The simplified model g is obtained by g(x) = Σ_{i=1}^m wi gi(x). [sent-57, score-0.4]
26 1 Partitioning of Components In this section, we consider how to group similar components into the same cluster, so that the subsequent local approximation can be more accurate. [sent-62, score-0.222]
27 A useful algorithm for this task is the classic vector quantization (VQ) [4], where one iterates between partitioning a set of vectors and finding the best prototype for each partition until the distortion error converges. [sent-63, score-0.167]
28 By defining a distance D(·, ·) between mixture components φj ’s, we can partition the mixture components in a similar way. [sent-64, score-0.588]
29 That is, we find the best representative Ri for each cluster, reassign each component φj to the closest representative Rπ(j), and iterate until the error Σ_j αj D(φj, Rπ(j)) converges. [sent-81, score-0.27]
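A minimal sketch of this VQ-style grouping step is given below, assuming Gaussian components and using the squared distance between component centers as a stand-in for D(·, ·); the choice of D is left open in the text above, and all names here are ours.

```python
import numpy as np

def partition_components(mu, alpha, m, n_iter=20, seed=0):
    """VQ-style grouping of mixture components into m clusters.

    mu : (n, d) component centers, alpha : (n,) component weights.
    Returns an array of cluster labels in {0, ..., m-1}.
    D(phi_j, R_i) is taken here to be the squared distance between centers,
    a simple proxy; any distance between components could be plugged in.
    """
    rng = np.random.default_rng(seed)
    reps = mu[rng.choice(len(mu), size=m, replace=False)]          # initial representatives
    for _ in range(n_iter):
        d2 = ((mu[:, None, :] - reps[None, :, :]) ** 2).sum(-1)    # (n, m) distances
        labels = d2.argmin(axis=1)                                  # reassign each component
        for i in range(m):
            in_i = labels == i
            if in_i.any():                                          # refit representative R_i
                w = alpha[in_i] / alpha[in_i].sum()
                reps[i] = (w[:, None] * mu[in_i]).sum(axis=0)
    return labels
```

Any pairwise distance between components (for example the closed-form L2 distance sketched earlier) can replace the center distance without changing the loop structure.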
30 2 Local Approximation In this part, we consider how to obtain a good representative, wi gi in (4), for each local cluster Si . [sent-84, score-0.537]
31 The task is to determine the unknown variables wi, ti and H̃i associated with gi. [sent-85, score-0.626]
32 Using the L2 norm, the upper bound (6) of the local approximation error can be written as Ēi = ∫ ( wi gi(x) − Σ_{j∈Si} αj φj(x) )² dx = CK wi² / |2H̃i|^{1/2} − 2 CK wi Σ_{j∈Si} αj k(rij) / |Hj + H̃i|^{1/2} + ci. [sent-86, score-0.871]
33 Here, CK = ∫ k(x′x) dx is a kernel-dependent constant, ci = ∫ ( Σ_{j∈Si} αj φj(x) )² dx is a data-dependent constant (irrelevant to the unknown variables), and rij = (ti − xj)′ (Hj + H̃i)^{-1} (ti − xj). [sent-87, score-0.324]
34 To solve for the unknowns wi, ti and H̃i, one can set the corresponding partial derivatives of Ēi to zero. [sent-93, score-0.413]
35 First, observe that Ēi is a quadratic function of wi. [sent-96, score-0.187]
36 Therefore, given H̃i and ti, the minimum value of Ēi can be easily obtained as min Ēi = ci − CK |2H̃i|^{1/2} ( Σ_{j∈Si} αj k(rij) / |Hj + H̃i|^{1/2} )². [sent-97, score-0.226]
37 By setting ∂ti Ēi = 0, we have ti = Mi^{-1} Σ_{j∈Si} [ αj k′(rij) / |Hj + H̃i|^{1/2} ] (Hj + H̃i)^{-1} xj, (8) where Mi = Σ_{j∈Si} [ αj k′(rij) / |Hj + H̃i|^{1/2} ] (Hj + H̃i)^{-1}. [sent-102, score-0.338]
38 If H̃i is fixed, we can obtain ti by starting with an initial ti^{(0)} and then iterating (8) until convergence. [sent-104, score-0.49]
39 Now, to solve for H̃i, we set ∂H̃i Ēi = 0 and obtain H̃i = Pi^{-1} Σ_{j∈Si} [ αj / |Hj + H̃i|^{1/2} ] (H̃i + Hj)^{-1} [ k(rij) Hj + 4(−k′(rij))(xj − ti)(xj − ti)′ (H̃i + Hj)^{-1} H̃i ], (9) where Pi = Σ_{j∈Si} [ αj k(rij) / |Hj + H̃i|^{1/2} ] (H̃i + Hj)^{-1}. [sent-105, score-0.452]
40 In summary, we first initialize ti^{(0)} = Σ_{j∈Si} αj xj / ( Σ_{j∈Si} αj ), H̃i^{(0)} = Σ_{j∈Si} αj [ Hj + (ti^{(0)} − xj)(ti^{(0)} − xj)′ ] / ( Σ_{j∈Si} αj ), and then iterate (8) and (9) until convergence. [sent-106, score-0.6]
41 The converged values of ti and H̃i are substituted into ∂wi Ēi = 0 to obtain wi as wi = |2H̃i|^{1/2} Σ_{j∈Si} αj k(rij) / |Hj + H̃i|^{1/2}. (10) [sent-107, score-0.6]
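The sketch below strings the updates (8)-(10), as reconstructed above, together for a single cluster in 1-D, assuming the Gaussian profile k(r) = exp(−r/2) (so k′(r) = −k(r)/2) and the moment-matched initialization just described; the variable names, the iteration cap and the stopping test are ours.

```python
import numpy as np

def local_gaussian_fit(x, alpha, h, max_iter=100, tol=1e-10):
    """Fit one component w*g_i to the local mixture sum_{j in S_i} alpha_j*phi_j (1-D sketch).

    x, alpha, h : arrays with the centers, weights and variances of the components in S_i.
    Gaussian profile: k(r) = exp(-r/2), hence k'(r) = -k(r)/2.
    Returns (w_i, t_i, h_tilde_i).
    """
    k = lambda r: np.exp(-0.5 * r)
    kp = lambda r: -0.5 * np.exp(-0.5 * r)

    # Moment-matched initialization t_i^(0), H~_i^(0).
    a = alpha / alpha.sum()
    t = np.sum(a * x)
    ht = np.sum(a * (h + (t - x) ** 2))

    for _ in range(max_iter):
        s = h + ht                                   # H_j + H~_i
        r = (t - x) ** 2 / s                         # r_ij
        c = alpha / np.sqrt(s)                       # alpha_j / |H_j + H~_i|^(1/2)
        # (8): fixed-point update of the center t_i.
        t_new = np.sum(c * kp(r) / s * x) / np.sum(c * kp(r) / s)
        r = (t_new - x) ** 2 / s                     # refresh r_ij with the new center
        # (9): fixed-point update of the bandwidth H~_i.
        num = np.sum(c / s * (k(r) * h + 4.0 * (-kp(r)) * (x - t_new) ** 2 / s * ht))
        den = np.sum(c * k(r) / s)
        ht_new = num / den
        converged = abs(t_new - t) < tol and abs(ht_new - ht) < tol
        t, ht = t_new, ht_new
        if converged:
            break

    # (10): the optimal weight, given the converged t_i and H~_i.
    s = h + ht
    w = np.sqrt(2.0 * ht) * np.sum(alpha * k((t - x) ** 2 / s) / np.sqrt(s))
    return w, t, ht
```

Running this for every cluster Si and collecting the returned (wi, ti, H̃i) triples yields the simplified model g in (3).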
42 2 Complexity In the partitioning step, sequential sampling has a complexity of O(dmn), where n is the original model size, m is the number of clusters, and d the dimension. [sent-109, score-0.146]
43 In the local approximation step, the complexity is l Σ_{i=1}^m ni d³ = l n d³, where l is the maximum number of iterations needed. [sent-112, score-0.182]
44 Summing up these three terms, the overall complexity is O(dn log(m) + dnm + lnd) = O(dn(m + l)), which is linear in both the data size and dimension (in practice m and l are quite small). [sent-115, score-0.114]
45 To have better intuitions, we examine the special case of a Parzen window density estimator [11], where all φj ’s have the same weights and bandwidths (Hj = H for j = 1, 2, . [sent-120, score-0.406]
46 Equation (9) then reduces to H̃i = H + 4H̃i(H̃i + H)^{-1} Vi, (11) where Vi = Σ_{j∈Si} αj (−k′(rij))(xj − ti)(xj − ti)′ / Σ_{j∈Si} αj k(rij). [sent-124, score-0.452]
47 It shows that the bandwidth H̃i of gi can be decomposed into two parts: the bandwidth H of the original kernel density estimator, and the covariance Vi of the local cluster Si with an adjusting matrix Γi = 4H̃i(H̃i + H)^{-1}. [sent-125, score-0.811]
48 Moreover, h̃i is closely related to the spread of the local cluster. [sent-129, score-0.616]
49 Otherwise, the larger the spread of the local cluster, the larger is h̃i. [sent-133, score-0.616]
50 In other words, the bandwidths Hi ’s are adaptive to the local data distribution. [sent-134, score-0.133]
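In 1-D the Parzen-case fixed point (11) can be checked numerically; the short script below (ours) iterates H̃ = H + 4H̃V/(H̃ + H) and compares the result against the closed-form root of the underlying quadratic, H̃ = 2V + √(4V² + H²), which is our algebraic rearrangement of (11).

```python
import numpy as np

def parzen_bandwidth(v, h2, n_iter=100):
    """Iterate the 1-D form of (11): H~ = H + 4*H~*V/(H~ + H), with H = h2 and V = v.

    Returns the adapted variance H~_i; a larger cluster spread v gives a larger H~_i.
    """
    ht = h2 + v                      # simple starting point
    for _ in range(n_iter):
        ht = h2 + 4.0 * ht * v / (ht + h2)
    return ht

# The fixed point can also be written in closed form: H~ = 2V + sqrt(4V^2 + H^2).
h2 = 0.5
for v in [0.0, 0.1, 1.0]:
    assert np.isclose(parzen_bandwidth(v, h2), 2 * v + np.sqrt(4 * v ** 2 + h2 ** 2))
```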
51 Related works on simplifying mixture models (such as [5]) simply choose H̃i = H + Cov[Si]. [sent-135, score-0.245]
52 Interestingly, this is somewhat similar to the bandwidth matrix used in the manifold Parzen windows [14], which is designed for handling sparse, high-dimensional data more robustly. [sent-137, score-0.134]
53 Second, in determining the center of gi, (8) can be reduced to ti = Σ_{j∈Si} αj k′_{H+H̃i}(xj − ti) xj / Σ_{j∈Si} αj k′_{H+H̃i}(xj − ti). (12) [sent-142, score-1.043]
54 It is easy to verify that this iterative procedure is indeed locating the peak of the density function pi(x) = |H + H̃i|^{-1/2} Σ_{j∈Si} K_{H+H̃i}(x − xj). [sent-144, score-0.322]
55 Note, on the other hand, that what we want to approximate originally is the local density fi(x) = |H|^{-1/2} Σ_{j∈Si} K_H(x − xj). [sent-145, score-0.418]
56 In the 1-D case (with H = h² and H̃i = h̃i²), the bandwidth of pi (i.e., √(h² + h̃i²)) is larger than the bandwidth h of fi. [sent-146, score-0.165]
57 It appears intriguing that, in fitting a kernel density fi(x) estimated on the sample set {xj}_{j∈Si}, one needs to locate the maximum of another density function pi(x), instead of the maximum of fi(x) itself or, simply, the mean of the sample set {xj}_{j∈Si} as chosen in [5]. [sent-151, score-0.562]
58 Intuitively, when the data is asymmetric, the center ti should be biased towards the heavier side of the data distribution. [sent-153, score-0.368]
59 On the other hand, the mean of Si , though biased towards the heavier side, still lacks an accurate control on the degree of bias. [sent-155, score-0.122]
60 Note that pi (x) has a larger bandwidth than the original fi (x). [sent-157, score-0.27]
61 Therefore, its maximum will move towards the heavier side of the distribution compared with that of fi (x), with the degree of bias automatically controlled by the mean shift iterations in (12). [sent-158, score-0.351]
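The following toy script (ours, not the paper's experiment) makes this bias visible on a right-skewed 1-D cluster: it compares the sample mean with the modes of fi (bandwidth H = h²) and of pi (bandwidth H + H̃i), located here by a simple grid search; the gamma-distributed sample and the value chosen for H̃i are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.gamma(shape=2.0, scale=1.0, size=200)        # a right-skewed local cluster S_i
h2, ht2 = 0.3 ** 2, 0.8 ** 2                          # H = h^2 and an assumed H~_i = h~_i^2

def kde(grid, centers, var):
    """Equal-weight 1-D Parzen estimate with variance `var`, evaluated on `grid`."""
    d2 = (grid[:, None] - centers[None, :]) ** 2
    return np.exp(-0.5 * d2 / var).sum(axis=1) / (len(centers) * np.sqrt(2 * np.pi * var))

grid = np.linspace(xs.min() - 1, xs.max() + 1, 2000)
peak_fi = grid[kde(grid, xs, h2).argmax()]            # mode of f_i (bandwidth h)
peak_pi = grid[kde(grid, xs, h2 + ht2).argmax()]      # mode of p_i (bandwidth sqrt(h^2 + h~^2))
print(f"mean {xs.mean():.3f}  argmax f_i {peak_fi:.3f}  argmax p_i {peak_pi:.3f}")
```

For such a right-skewed cluster the mode of pi typically lies between the mode of fi and the sample mean, which is the controlled bias discussed above.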
62 Figure 1(a) shows the histogram of a local cluster Si , whose Parzen window estimator (fi ) is asymmetric. [sent-160, score-0.388]
63 Figure 1(b) plots the corresponding approximation error Ēi (6) at different bandwidths h̃i (the remaining parameter, wi, is set to the optimal value by (10)). [sent-161, score-0.863]
64 [Figure 1(b) plot residue: approximation error versus bandwidth, with curves for the local maximum, the local mean, and our method.] [sent-164, score-0.325]
65 (a) The histogram of a local cluster Si and its density fi. [sent-166, score-0.375]
66 Figure 1: Approximation of an asymmetric density using different center selection schemes. [sent-174, score-0.194]
67 3 Experiments In this section, we perform experiments to evaluate the performance of our mixture simplification scheme. [sent-175, score-0.172]
68 We focus on the Parzen window estimator which, given a set of samples S = {xi}_{i=1}^n in R^d, can be written as f̂(x) = (1/n) |H|^{-1/2} Σ_{j=1}^n K_H(x − xj). [sent-176, score-0.332]
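For reference, a direct (O(nd²) per query point) evaluation of this estimator with a full bandwidth matrix H might look as follows; this is a generic sketch, not the accelerated implementation used in the experiments.

```python
import numpy as np

def parzen(query, data, H):
    """Evaluate the Parzen window estimate f_hat at the rows of `query`.

    data : (n, d) samples, H : (d, d) bandwidth matrix, query : (q, d).
    """
    Hinv = np.linalg.inv(H)
    norm = 1.0 / (len(data) * np.sqrt(((2 * np.pi) ** data.shape[1]) * np.linalg.det(H)))
    diff = query[:, None, :] - data[None, :, :]            # (q, n, d)
    r = np.einsum('qnd,de,qne->qn', diff, Hinv, diff)      # (x - x_j)' H^{-1} (x - x_j)
    return norm * np.exp(-0.5 * r).sum(axis=1)
```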
69 Note that the Parzen window estimator is a limiting form of the mixture model, where the number of components equals the data size and can be quite huge. [sent-177, score-0.496]
70 In Section 3.1, we use the proposed approach to reduce the number of components in the kernel density estimator, and compare its performance with the algorithm in [5]. [sent-179, score-0.276]
71 In Section 3.2, we perform color image segmentation by running the mean shift clustering algorithm on the simplified density model. [sent-181, score-0.556]
72 1 Simplifying Nonparametric Density Models In this section, we reduce the number of kernels in the Parzen window estimator by using the proposed approach and the method in [5]. [sent-183, score-0.288]
73 Experiments are performed on a 1-D set with 1800 samples 6 4 8 drawn from the Gaussian mixture 18 N (−2. [sent-184, score-0.172]
74 Figure 2: Approximating the Parzen window estimator by simplifying mixture models. [sent-220, score-0.465]
75 Green: Parzen window estimator; black: simplified mixture model; blue-dashed: components of the mixture model. [sent-221, score-0.512]
76 To have a quantitative evaluation, we randomly generate the 3-Gaussian data 100 times, and compare the two algorithms (ours and [5]) using the following error criteria: 1) the L2 error (5); 2) the standard KL distance; 3) the local KL-distance used in [5]. [sent-222, score-0.146]
77 The local KL-distance between two mixtures, f = Σ_{j=1}^n αj φj and g = Σ_{i=1}^m wi gi, is defined as d(f, g) = Σ_{j=1}^n αj KL(φj || gπ(j)), where π(j) is the function that maps each component φj to the closest representative component gπ(j), i.e., π(j) = arg min_{i=1,2,...,m} KL(φj || gi). [sent-223, score-0.612]
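Since all components are Gaussian, this criterion can be evaluated exactly with the closed-form Gaussian KL divergence; the sketch below is ours and simply follows the definition above.

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ), closed form for Gaussians."""
    d = len(mu0)
    S1inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + diff @ S1inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def local_kl(alpha, mus, Hs, ts, Hts):
    """d(f, g) = sum_j alpha_j KL(phi_j || g_pi(j)), with pi(j) the closest representative."""
    total = 0.0
    for a, m, S in zip(alpha, mus, Hs):
        kls = [kl_gauss(m, S, t, St) for t, St in zip(ts, Hts)]
        total += a * min(kls)                # pi(j) = argmin_i KL(phi_j || g_i)
    return total
```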
78 2 Image Segmentation The Parzen window estimator can be used to reveal important clustering information, namely that its modes (or local maxima) correspond to dominant clusters in the data. [sent-239, score-0.411]
79 [Plot residue: curves of the L2-norm error, the KL distance, and the local KL error.]
80 This is the idea behind the mean shift clustering algorithm [1, 3], where every data point is moved along the density gradient until it reaches the nearest local density maximum. [sent-248, score-0.583]
81 The mean shift algorithm is robust, and can identify arbitrarily-shaped clusters in the feature space. [sent-249, score-0.25]
82 Recently, mean shift has been applied to color image segmentation and has proven to be quite successful [1]. [sent-250, score-0.392]
83 However, mean shift can be quite expensive due to the large number of kernels involved in the density estimator. [sent-252, score-0.375]
84 To reduce the computational requirement, we first reduce the density estimator f̂(x) to a simpler model g(x) using our simplification scheme, and then apply the iterative mean shift procedure on the simplified model g(x). [sent-253, score-0.559]
85 We use the Gaussian kernel with bandwidth h = 20. [sent-255, score-0.143]
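Below is a sketch of the mode-seeking step run on the simplified model g rather than on the full Parzen estimator; the matrix-weighted fixed-point update is a standard variable-bandwidth mean shift step for Gaussian components, the segmentation bookkeeping (grouping pixels that converge to the same mode) is omitted, and all names are ours rather than the paper's implementation.

```python
import numpy as np

def mean_shift_modes(points, w, t, Ht, n_iter=50):
    """Move each row of `points` uphill on g(x) = sum_i w_i N(x; t_i, H~_i).

    A fixed-point (mean shift style) iteration over the m simplified components,
    so each sweep costs O(m d^2) per point instead of O(n d^2) on the full model.
    """
    Hinv = np.array([np.linalg.inv(Hi) for Hi in Ht])              # (m, d, d)
    logdet = np.array([np.linalg.slogdet(Hi)[1] for Hi in Ht])     # log|H~_i|
    y = points.copy()
    for _ in range(n_iter):
        diff = y[:, None, :] - t[None, :, :]                       # (q, m, d)
        r = np.einsum('qmd,mde,qme->qm', diff, Hinv, diff)
        resp = w * np.exp(-0.5 * (r + logdet))                     # w_i N(y; t_i, H~_i), up to a common constant
        A = np.einsum('qm,mde->qde', resp, Hinv)                   # sum_i resp_i H~_i^{-1}
        b = np.einsum('qm,mde,me->qd', resp, Hinv, t)              # sum_i resp_i H~_i^{-1} t_i
        y = np.linalg.solve(A, b)
    return y
```

Because each sweep scales with the number of simplified components m rather than the data size n, this is the main computational saving of running mean shift on g.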
86 For comparison, we also implement the standard mean shift and its fast version using kd-trees (using the ANN library [10]). [sent-257, score-0.224]
87 As the "true" segmentation of an image is subjective, only a visual comparison is intended here. [sent-260, score-0.136]
88 Table 1: Total wall time (in seconds) on various segmentation tasks, and the number of components in g(x). [sent-261, score-0.172]
89 Table 1 data — image: squirrel, hand, house, lake; data size: 60,192 (209×288), 73,386 (243×302), 48,960 (192×255), 262,144 (512×512); mean shift (standard, kd-tree): 1215.
90 The rows, from top to bottom, are: the original image, segmentation results by standard mean shift and our approach. [sent-275, score-0.311]
91 We can see that our results are closer to those by the standard mean shift (applied on the original density estimator), with the number of components (Table 1) dramatically smaller than the data size n. [sent-276, score-0.426]
92 This demonstrates the success of our approximation scheme in maintaining the structure of the data distribution using highly compact models. [sent-277, score-0.129]
93 Our algorithm is also much faster than the standard mean shift and its fast version using kd-trees. [sent-278, score-0.194]
94 4 Conclusion Finite mixture is a powerful model in many statistical learning problems. [sent-280, score-0.172]
95 In this paper, we reduce the model complexity by first grouping the components into compact clusters, and then performing local function approximation that minimizes an upper bound on the approximation error. [sent-282, score-0.46]
96 Figure 4: Image segmentation by standard mean shift (2nd row), and ours (bottom). [sent-288, score-0.286]
97 The estimation of the gradient of a density function, with applications in pattern recognition. [sent-302, score-0.127]
98 Incremental density approximation and kernel-based Bayesian filtering for object tracking. [sent-321, score-0.195]
99 Very fast EM-based mixture model clustering using multiresolution kd-trees. [sent-344, score-0.233]
100 On estimation of a probability density function and mode. [sent-357, score-0.127]
wordName wordTfidf (topN-words)
[('hi', 0.513), ('hj', 0.283), ('rij', 0.273), ('si', 0.258), ('ti', 0.226), ('gi', 0.213), ('parzen', 0.19), ('wi', 0.187), ('mixture', 0.172), ('kh', 0.157), ('shift', 0.152), ('estimator', 0.132), ('density', 0.127), ('bandwidth', 0.112), ('xj', 0.112), ('segmentation', 0.092), ('window', 0.088), ('fi', 0.08), ('components', 0.08), ('vi', 0.078), ('local', 0.074), ('simplifying', 0.073), ('representative', 0.07), ('approximation', 0.068), ('ri', 0.066), ('cluster', 0.063), ('clustering', 0.061), ('simpli', 0.06), ('bandwidths', 0.059), ('clusters', 0.056), ('heavier', 0.055), ('vq', 0.055), ('partitioning', 0.053), ('pi', 0.053), ('kl', 0.052), ('dx', 0.051), ('dnm', 0.05), ('mount', 0.05), ('distance', 0.048), ('image', 0.044), ('reliable', 0.043), ('comaniciu', 0.043), ('quantization', 0.042), ('mean', 0.042), ('center', 0.04), ('complexity', 0.04), ('dn', 0.04), ('hong', 0.039), ('ann', 0.039), ('iterate', 0.038), ('reduce', 0.038), ('color', 0.038), ('compact', 0.037), ('kong', 0.037), ('error', 0.036), ('partition', 0.036), ('ei', 0.035), ('norm', 0.035), ('representatives', 0.035), ('component', 0.034), ('upper', 0.032), ('finite', 0.031), ('kernel', 0.031), ('histogram', 0.031), ('iterative', 0.03), ('adopting', 0.03), ('library', 0.03), ('kernels', 0.03), ('spread', 0.029), ('nonparametric', 0.029), ('sensitive', 0.029), ('sequential', 0.028), ('asymmetric', 0.027), ('jth', 0.027), ('adjusting', 0.027), ('covariance', 0.027), ('ck', 0.026), ('approximate', 0.025), ('original', 0.025), ('biased', 0.025), ('quite', 0.024), ('scheme', 0.024), ('bound', 0.023), ('hierarchical', 0.023), ('robust', 0.023), ('minimize', 0.022), ('transactions', 0.022), ('side', 0.022), ('illustration', 0.022), ('manifold', 0.022), ('gersho', 0.022), ('lake', 0.022), ('locate', 0.022), ('han', 0.022), ('reassign', 0.022), ('silverman', 0.022), ('bay', 0.022), ('kowloon', 0.022), ('kwok', 0.022), ('goldberger', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
2 0.22574271 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
3 0.21063465 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
Author: Andrea Vedaldi, Stefano Soatto
Abstract: Image Congealing (IC) is a non-parametric method for the joint alignment of a collection of images affected by systematic and unwanted deformations. The method attempts to undo the deformations by minimizing a measure of complexity of the image ensemble, such as the averaged per-pixel entropy. This enables alignment without an explicit model of the aligned dataset as required by other methods (e.g. transformed component analysis). While IC is simple and general, it may introduce degenerate solutions when the transformations allow minimizing the complexity of the data by collapsing them to a constant. Such solutions need to be explicitly removed by regularization. In this paper we propose an alternative formulation which solves this regularization issue on a more principled ground. We make the simple observation that alignment should simplify the data while preserving the useful information carried by them. Therefore we trade off fidelity and complexity of the aligned ensemble rather than minimizing the complexity alone. This eliminates the need for an explicit regularization of the transformations, and has a number of other useful properties such as noise suppression. We show the modeling and computational benefits of the approach to the some of the problems on which IC has been demonstrated. 1
4 0.18006371 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
5 0.12512578 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
Author: Gloria Haro, Gregory Randall, Guillermo Sapiro
Abstract: The study of point cloud data sampled from a stratification, a collection of manifolds with possible different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning. 1
6 0.12241675 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
7 0.11444797 131 nips-2006-Mixture Regression for Covariate Shift
8 0.093981415 50 nips-2006-Chained Boosting
9 0.091350794 80 nips-2006-Fundamental Limitations of Spectral Clustering
10 0.090748943 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
11 0.090575784 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
12 0.088366859 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
13 0.08745864 158 nips-2006-PG-means: learning the number of clusters in data
14 0.086419985 7 nips-2006-A Local Learning Approach for Clustering
15 0.082808547 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
16 0.077662647 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes
17 0.075846776 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
18 0.073064677 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
19 0.07009656 31 nips-2006-Analysis of Contour Motions
20 0.070093036 164 nips-2006-Randomized PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
topicId topicWeight
[(0, -0.24), (1, 0.053), (2, 0.038), (3, 0.094), (4, 0.05), (5, -0.016), (6, -0.018), (7, -0.029), (8, 0.027), (9, 0.273), (10, 0.001), (11, 0.033), (12, 0.318), (13, 0.24), (14, -0.015), (15, -0.061), (16, 0.088), (17, 0.089), (18, -0.051), (19, -0.094), (20, -0.06), (21, -0.024), (22, -0.148), (23, 0.051), (24, 0.129), (25, 0.029), (26, 0.027), (27, -0.095), (28, -0.011), (29, -0.025), (30, 0.03), (31, 0.004), (32, -0.035), (33, -0.053), (34, -0.059), (35, -0.108), (36, 0.029), (37, 0.013), (38, 0.016), (39, 0.045), (40, -0.071), (41, -0.008), (42, 0.029), (43, -0.077), (44, -0.045), (45, 0.012), (46, -0.127), (47, 0.013), (48, 0.032), (49, -0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.96474874 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
2 0.7516343 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
Author: Andrea Vedaldi, Stefano Soatto
Abstract: Image Congealing (IC) is a non-parametric method for the joint alignment of a collection of images affected by systematic and unwanted deformations. The method attempts to undo the deformations by minimizing a measure of complexity of the image ensemble, such as the averaged per-pixel entropy. This enables alignment without an explicit model of the aligned dataset as required by other methods (e.g. transformed component analysis). While IC is simple and general, it may introduce degenerate solutions when the transformations allow minimizing the complexity of the data by collapsing them to a constant. Such solutions need to be explicitly removed by regularization. In this paper we propose an alternative formulation which solves this regularization issue on a more principled ground. We make the simple observation that alignment should simplify the data while preserving the useful information carried by them. Therefore we trade off fidelity and complexity of the aligned ensemble rather than minimizing the complexity alone. This eliminates the need for an explicit regularization of the transformations, and has a number of other useful properties such as noise suppression. We show the modeling and computational benefits of the approach to the some of the problems on which IC has been demonstrated. 1
3 0.74098265 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
4 0.70677233 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
5 0.48493144 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
Author: Steffen Bickel, Tobias Scheffer
Abstract: We study a setting that is motivated by the problem of filtering spam messages for many users. Each user receives messages according to an individual, unknown distribution, reflected only in the unlabeled inbox. The spam filter for a user is required to perform well with respect to this distribution. Labeled messages from publicly available sources can be utilized, but they are governed by a distinct distribution, not adequately representing most inboxes. We devise a method that minimizes a loss function with respect to a user’s personal distribution based on the available biased sample. A nonparametric hierarchical Bayesian model furthermore generalizes across users by learning a common prior which is imposed on new email accounts. Empirically, we observe that bias-corrected learning outperforms naive reliance on the assumption of independent and identically distributed data; Dirichlet-enhanced generalization across users outperforms a single (“one size fits all”) filter as well as independent filters for all users. 1
6 0.4381375 91 nips-2006-Hierarchical Dirichlet Processes with Random Effects
7 0.43502879 131 nips-2006-Mixture Regression for Covariate Shift
8 0.41739753 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
9 0.3813076 158 nips-2006-PG-means: learning the number of clusters in data
10 0.36205241 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes
11 0.33844444 50 nips-2006-Chained Boosting
12 0.33703321 194 nips-2006-Towards a general independent subspace analysis
13 0.33303854 7 nips-2006-A Local Learning Approach for Clustering
14 0.33245793 121 nips-2006-Learning to be Bayesian without Supervision
15 0.32653499 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
16 0.32268208 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
17 0.31197229 129 nips-2006-Map-Reduce for Machine Learning on Multicore
18 0.30374926 80 nips-2006-Fundamental Limitations of Spectral Clustering
19 0.29259259 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
20 0.29036471 180 nips-2006-Speakers optimize information density through syntactic reduction
topicId topicWeight
[(1, 0.106), (3, 0.04), (7, 0.069), (9, 0.069), (20, 0.021), (22, 0.096), (30, 0.195), (44, 0.088), (57, 0.097), (65, 0.082), (69, 0.021), (71, 0.015), (90, 0.015)]
simIndex simValue paperId paperTitle
1 0.90565407 189 nips-2006-Temporal dynamics of information content carried by neurons in the primary visual cortex
Author: Danko Nikolić, Stefan Haeusler, Wolf Singer, Wolfgang Maass
Abstract: We use multi-electrode recordings from cat primary visual cortex and investigate whether a simple linear classifier can extract information about the presented stimuli. We find that information is extractable and that it even lasts for several hundred milliseconds after the stimulus has been removed. In a fast sequence of stimulus presentation, information about both new and old stimuli is present simultaneously and nonlinear relations between these stimuli can be extracted. These results suggest nonlinear properties of cortical representations. The important implications of these properties for the nonlinear brain theory are discussed.
same-paper 2 0.85613179 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
3 0.7527312 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
4 0.74012887 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
5 0.73902285 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
Author: Andrea Vedaldi, Stefano Soatto
Abstract: Image Congealing (IC) is a non-parametric method for the joint alignment of a collection of images affected by systematic and unwanted deformations. The method attempts to undo the deformations by minimizing a measure of complexity of the image ensemble, such as the averaged per-pixel entropy. This enables alignment without an explicit model of the aligned dataset as required by other methods (e.g. transformed component analysis). While IC is simple and general, it may introduce degenerate solutions when the transformations allow minimizing the complexity of the data by collapsing them to a constant. Such solutions need to be explicitly removed by regularization. In this paper we propose an alternative formulation which solves this regularization issue on a more principled ground. We make the simple observation that alignment should simplify the data while preserving the useful information carried by them. Therefore we trade off fidelity and complexity of the aligned ensemble rather than minimizing the complexity alone. This eliminates the need for an explicit regularization of the transformations, and has a number of other useful properties such as noise suppression. We show the modeling and computational benefits of the approach to the some of the problems on which IC has been demonstrated. 1
6 0.73564684 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
7 0.73315001 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
8 0.73081857 79 nips-2006-Fast Iterative Kernel PCA
9 0.72802043 61 nips-2006-Convex Repeated Games and Fenchel Duality
10 0.72776788 20 nips-2006-Active learning for misspecified generalized linear models
11 0.7268182 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
12 0.72399598 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
13 0.72029674 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
14 0.71977592 117 nips-2006-Learning on Graph with Laplacian Regularization
15 0.7192663 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
16 0.71889204 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments
17 0.7187348 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
18 0.71722901 194 nips-2006-Towards a general independent subspace analysis
19 0.71546513 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
20 0.715087 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation