nips nips2006 nips2006-127 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much more stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. [sent-10, score-0.396]
2 We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. [sent-11, score-0.223]
3 The modified locally linear embedding (MLLE) proposed in this paper is much more stable. [sent-12, score-0.206]
4 It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. [sent-13, score-0.294]
5 MLLE is also compared with the local tangent space alignment (LTSA). [sent-14, score-0.129]
6 1 Introduction The problem of nonlinear dimensionality reduction is to find the meaningful low-dimensional structure hidden in high dimensional data. [sent-16, score-0.108]
7 All these algorithms cover two common steps: learn the local geometry around each data point and nonlinearly map the high dimensional data points into a lower dimensional space using the learned local information [3]. [sent-18, score-0.165]
8 The performances of these algorithms, however, are different both in learning local information and in constructing global embedding, though each of them solves an eigenvalue problem eventually. [sent-19, score-0.057]
9 How effectively the local geometry is retrieved determines the efficiency of the methods. [sent-20, score-0.059]
10 This paper will focus on the reconstruction weights that characterize intrinsic geometric properties of each neighborhood in LLE [5]. [sent-21, score-0.126]
11 LLE has many applications such as image classification, image recognition, spectra reconstruction and data visualization because of its simple geometric intuitions, straightforward implementation, and global optimization [6, 11]. [sent-22, score-0.08]
12 It is, however, also reported that LLE may not be stable and may produce distorted embeddings if the manifold dimension is larger than one. [sent-23, score-0.224]
13 One of the causes that make LLE fail is that the local geometry exploited by the reconstruction weights is not well determined, since the constrained least squares (LS) problem solved to determine the local weights may be ill-conditioned. [sent-24, score-0.189]
14 The purpose of this paper is to improve LLE by making use of multiple local weight vectors. [sent-27, score-0.137]
15 We will show the existence of linearly independent weight vectors that are approximately optimal. [sent-28, score-0.186]
16 The local geometric structure determined by multiple weight vectors is much more stable and hence can be used to improve the standard LLE. [sent-29, score-0.234]
17 The modified LLE, named MLLE, uses multiple weight vectors for each point in the reconstruction of the lower-dimensional embedding. [sent-30, score-0.227]
18 It can stably retrieve the ideal isometric embedding. [sent-31, score-0.165]
19 MLLE has properties similar to LTSA both in measuring the linear dependence of neighborhoods and in constructing the (sparse) matrix whose smallest eigenvectors form the desired lower-dimensional embedding. [sent-36, score-0.246]
20 LLE constructs locally linear structures at each point xi by representing xi using its selected neighbor set Ni = {xj , j ∈ Ji }. [sent-43, score-0.269]
21 The optimal combination weights are determined by solving the constrained least squares problem min ||xi − Σ_{j∈Ji} wji xj|| subject to Σ_{j∈Ji} wji = 1. (2.1) [sent-44, score-0.438]
22 Once all the reconstruction weights {wji, j ∈ Ji}, i = 1, ..., N, are computed, LLE maps the set {x1, ..., xN} to a set T = {t1, ..., tN} in a lower-dimensional space R^d (d < m) that preserves the local combination properties. [sent-48, score-0.067]
23 The embedding is computed by minimizing min_{T=[t1,...,tN]} Σ_i ||ti − Σ_{j∈Ji} wji tj||². [sent-54, score-0.126]
24 The low-dimensional embedding T constructed by LLE depends tightly on the local weights. [sent-60, score-0.219]
25 To formulate the weight vector wi consisting of the local weights wji, j ∈ Ji, let us denote by Gi the matrix whose columns are the differences xj − xi, j ∈ Ji. [sent-61, score-0.616]
26 Using the constraint Σ_{j∈Ji} wji = 1, we can write the combination error as ||xi − Σ_{j∈Ji} wji xj|| = ||Gi wi||, and hence (2.1) is equivalent to min ||Gi wi|| subject to 1_{ki}^T wi = 1, [sent-68, score-0.838]
27 where 1_{ki} denotes the ki-dimensional vector of all 1's. [sent-71, score-0.192]
28 Theoretically, a null vector of Gi that is not orthogonal to 1ki can be normalized to be a weight vector as required. [sent-72, score-0.103]
29 Otherwise, a weight vector is given by wi = yi / (1_{ki}^T yi) with yi a solution to the linear system Gi^T Gi y = 1_{ki} [6]. [sent-73, score-0.305]
30 Indeed, one can formulate the solution using the singular value decomposition (SVD) of Gi. [sent-74, score-0.128]
31 The problem of solving min_{1^T w = 1} ||Gw|| is not stable if G^T G is singular (has zero eigenvalues) or nearly singular (has relatively small eigenvalues). [sent-80, score-0.281]
32 To regularize the problem, it is suggested in [5] to solve instead the regularized linear system (G^T G + γ ||G||²_F I) y = 1_k, w = y / (1_k^T y). (2.2) Here (·)⁺ denotes the Moore-Penrose generalized inverse of a matrix. [sent-81, score-0.058]
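For concreteness, a minimal NumPy sketch of the regularized weight computation (2.2) might look as follows; the function name, the default value of γ, and the row-per-point data layout are assumptions of this sketch rather than the authors' implementation.

```python
import numpy as np

def lle_weights(X, i, neighbor_idx, gamma=1e-3):
    """Sketch of the regularized LLE weights (2.2) for point x_i.

    X            : (N, m) data matrix, one point per row.
    neighbor_idx : indices J_i of the k_i selected neighbors of x_i.
    gamma        : small regularization parameter (illustrative default).
    """
    # Columns of G are the differences between x_i and its neighbors.
    G = (X[neighbor_idx] - X[i]).T                     # m x k_i
    k = G.shape[1]
    # Solve (G^T G + gamma * ||G||_F^2 I) y = 1_k.
    C = G.T @ G + gamma * np.linalg.norm(G, 'fro')**2 * np.eye(k)
    y = np.linalg.solve(C, np.ones(k))
    # Normalize so the weights sum to one: w = y / (1^T y).
    return y / y.sum()
```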
33 Figure 2: A 2D data set (◦-points) and computed coordinates (dot points) by LLE using different sets of optimal weight vectors (left two panels) or regularization weight vectors (right panel). [sent-101, score-0.349]
34 Let y(γ) be the unique solution to the regularized linear system. [sent-103, score-0.058]
35 Another factor that results in the instability of LLE is that the local linear structure learned from a single weight vector at each point is brittle. [sent-112, score-0.119]
36 LLE may give a wrong embedding even if every weight vector is approximated to high accuracy. [sent-113, score-0.208]
37 This is conceivable when Gi is rank deficient, since multiple optimal weight vectors exist in that case. [sent-114, score-0.176]
38 Figure 2 shows a small example of N = 20 two-dimensional points for which LLE fails even if exact optimal weight vectors are used. [sent-115, score-0.158]
39 We plot three sets of computed 2D embeddings T (j) (within an optimal affine transformation to the ideal X) by LLE with k = 4 using two sets of exact optimal weight vectors and one set of weight vectors that solve the regularized equations, respectively. [sent-116, score-0.381]
40 The errors ||X − Y^(j)|| = min_{c,L} ||X − (c 1^T + L T^(j))|| between the ideal set X and the computed sets, within an optimal affine transformation, are large in this example. [sent-117, score-0.068]
41 The uncertainty of w(γ) with small γ occurs because of the existence of small singular values of G. [sent-118, score-0.152]
42 Fortunately, it also implies the existence of multiple almost optimal weight vectors simultaneously. [sent-119, score-0.2]
43 Indeed, if G has s ≤ k small singular values, then there are s approximately optimal weight vectors that are linearly independent of each other. [sent-120, score-0.334]
44 Denote w^(ℓ) = (1 − α) w* + V H(:, ℓ), ℓ = 1, ..., s, where V is the matrix of right singular vectors of G corresponding to the s smallest singular values, α = (1/√s) ||V^T 1_k||, and H is a Householder matrix that satisfies H V^T 1_k = α 1_s. [sent-127, score-0.227]
45 For numerical stability, we replace w* by a regularized weight vector w(γ) as in LLE. [sent-144, score-0.119]
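The construction of the s approximately optimal weight vectors can be sketched as below; the helper name `multiple_weights`, the tolerance used for the degenerate Householder case, and the use of the regularized w(γ) in place of w* are assumptions of this sketch.

```python
import numpy as np

def multiple_weights(G, w_reg, s):
    """Sketch: s approximately optimal weight vectors for one neighborhood.

    G     : m x k local difference matrix G_i.
    w_reg : regularized single weight vector w_i(gamma), length k.
    s     : number of weight vectors to produce (s <= k).
    """
    k = G.shape[1]
    # Right singular vectors of G for the s smallest singular values.
    _, _, Vt = np.linalg.svd(G, full_matrices=True)
    V = Vt[-s:].T                              # k x s
    v = V.T @ np.ones(k)                       # V^T 1_k
    alpha = np.linalg.norm(v) / np.sqrt(s)
    # Householder matrix H with H v = alpha * 1_s.
    u = v - alpha * np.ones(s)
    if np.linalg.norm(u) < 1e-12:              # v already proportional to 1_s
        H = np.eye(s)
    else:
        H = np.eye(s) - 2.0 * np.outer(u, u) / (u @ u)
    # Column l is w^(l) = (1 - alpha) w(gamma) + V H(:, l); each column sums to 1.
    return (1.0 - alpha) * np.outer(w_reg, np.ones(s)) + V @ H
```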
46 MLLE: Modified locally linear embedding. It is justifiable to learn the local structure using multiple optimal weight vectors at each point, rather than a single one. [sent-151, score-0.422]
47 Though the exact optimal weight vector may be unique, multiple approximately optimal weight vectors exist by Theorem 2. [sent-152, score-0.311]
48 We will use these weight vectors to determine an improved and more stable embedding. [sent-154, score-0.157]
49 Below we show the details of the modified locally linear embedding using multiple local weight vectors. [sent-155, score-0.343]
50 Consider the neighbor set of xi with ki neighbors. [sent-156, score-0.306]
51 Assume that the first ri singular values of Gi are large compared with the remaining si = ki − ri singular values. [sent-157, score-0.651]
52 Let wi^(1), ..., wi^(si) be si ≤ ki linearly independent weight vectors, wi^(ℓ) = (1 − αi) wi(γ) + Vi Hi(:, ℓ), ℓ = 1, ..., si. [sent-162, score-0.829]
53 Here wi(γ) is the regularized solution defined in (2.2) with G = Gi. [sent-163, score-0.248]
54 Vi is the matrix of right singular vectors of Gi corresponding to the si smallest singular values, αi = (1/√si) ||vi|| with vi = Vi^T 1_{ki}, and Hi is a Householder matrix that satisfies Hi Vi^T 1_{ki} = αi 1_{si}. [sent-164, score-0.596]
55 MLLE then computes an embedding T = {t1, ..., tN} that minimizes the embedding cost function E(T) = Σ_{i=1}^{N} Σ_{ℓ=1}^{si} ||ti − Σ_{j∈Ji} wji^(ℓ) tj||². (3.1) [sent-168, score-0.586]
56 Denote by Wi = (1 − αi) wi(γ) 1_{si}^T + Vi Hi the local weight matrix, and let Ŵi ∈ R^{N×si} be the embedding of Wi into the N-dimensional space such that Ŵi(Ji, :) = Wi, Ŵi(i, :) = −1_{si}^T, and Ŵi(j, :) = 0 for j ∉ Ii = Ji ∪ {i}. [sent-170, score-0.169]
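With the embedded weight matrices Ŵi, the cost (3.1) equals trace(T Φ T^T) with Φ = Σ_i Ŵi Ŵi^T, which could be accumulated as in the sketch below; the dense representation of Φ and the variable names are illustrative (in practice Φ is sparse).

```python
import numpy as np

def alignment_matrix(N, neighborhoods, local_weights):
    """Sketch: Phi = sum_i W_hat_i W_hat_i^T (dense, for clarity).

    neighborhoods : list of index arrays J_i (neighbors of x_i, excluding i).
    local_weights : list of k_i x s_i matrices W_i of weight vectors.
    """
    Phi = np.zeros((N, N))
    for i in range(N):
        Ji = np.asarray(neighborhoods[i])
        Wi = local_weights[i]                      # k_i x s_i
        # Embed W_i into R^N: rows J_i get W_i, row i gets -1^T, rest zero.
        Ii = np.append(Ji, i)                      # I_i = J_i U {i}
        W_hat = np.vstack([Wi, -np.ones((1, Wi.shape[1]))])
        Phi[np.ix_(Ii, Ii)] += W_hat @ W_hat.T
    return Phi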
57 The computed coordinates are T = [u2, ..., u_{d+1}]^T, formed from the d eigenvectors of Φ corresponding to the 2nd to (d+1)st smallest eigenvalues. [sent-177, score-0.079]
58 3.1 Determination of the number si of approximately optimal weight vectors. Obviously, si should be selected such that σ_{ki−si+1}(Gi) is relatively small. [sent-179, score-0.492]
59 In general, if the data points are sampled from a d-dimensional manifold and the neighbor set is well selected, then σd(Gi) ≫ σ_{d+1}(Gi). [sent-180, score-0.106]
60 So si can be any integer satisfying si ≤ ki − d, and si = ki − d is the best choice. [sent-181, score-0.885]
61 It makes sense to choose si as large as possible if the ratio (λ^(i)_{ki−si+1} + ··· + λ^(i)_{ki}) / (λ^(i)_1 + ··· + λ^(i)_{ki−si}) is small, where λ^(i)_j = σ_j(Gi)² are the eigenvalues of Gi^T Gi. [sent-183, score-0.197]
62 There is a trade-off between the number of weight vectors and the approximation to the optimal combination error ||Gi wi*||. [sent-184, score-0.34]
63 We suggest si = max{ ℓ ≤ ki − d : (Σ_{j=ki−ℓ+1}^{ki} λ^(i)_j) / (Σ_{j=1}^{ki−ℓ} λ^(i)_j) < η }. (3.2) [sent-185, score-0.743]
64 The smaller η is, the smaller si is, and of course, the smaller the combination errors for the weight vectors used are. [sent-189, score-0.339]
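The selection rule (3.2) can be sketched as a small helper; here the threshold η is passed in as an argument, whereas the text sets it adaptively from the ratios ρi, and the function name and the fallback of at least one weight vector are assumptions of the sketch.

```python
import numpy as np

def choose_s(singular_values, d, eta):
    """Sketch of rule (3.2): largest l <= k - d whose tail/head eigenvalue ratio is below eta.

    singular_values : singular values of G_i (any order).
    """
    lam = np.sort(np.asarray(singular_values))[::-1] ** 2   # eigenvalues of G_i^T G_i, descending
    k = len(lam)
    s = 1                                                    # use at least one weight vector
    for ell in range(1, k - d + 1):
        ratio = lam[k - ell:].sum() / lam[:k - ell].sum()
        if ratio < eta:
            s = ell
    return s
```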
65 We use an adaptive strategy to set η as follows. [sent-190, score-0.192]
66 In general, if the manifold near xi is flat or has small curvatures and the neighbors are well selected, ρi is smaller than η and si = ki − d. [sent-199, score-0.34]
67 For those neighbor sets with large local curvatures, ρi > η and si < ki − d. [sent-200, score-0.435]
68 So fewer weight vectors are used in constructing the local linear structures, and the combination errors decrease. [sent-201, score-0.247]
69 1 Determine a neighbor set Ni = {xj, j ∈ Ji} of xi, i ∉ Ji. [sent-206, score-0.114]
70 Set ρi = (Σ_{j=d+1}^{ki} λ^(i)_j) / (Σ_{j=1}^{d} λ^(i)_j). [sent-218, score-0.192]
71 Compute the eigenvectors of Φ corresponding to the d + 1 smallest eigenvalues, pick the eigenvector matrix corresponding to the 2nd to (d+1)st smallest eigenvalues, and set T = [u2, ..., u_{d+1}]^T. [sent-231, score-0.153]
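This final step could be sketched as follows; a dense symmetric eigendecomposition is used here for clarity, whereas a sparse eigensolver would normally be preferred for large N.

```python
import numpy as np

def embed(Phi, d):
    """Sketch: d-dimensional coordinates from the alignment matrix Phi."""
    # Phi is symmetric positive semidefinite; eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(Phi)
    # Skip the eigenvector of the smallest eigenvalue (the near-constant vector)
    # and keep the 2nd through (d+1)st: T = [u_2, ..., u_{d+1}]^T.
    return eigvecs[:, 1:d + 1].T                  # d x N
```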
72 The additional flops of MLLE for computing the eigendecomposition of Gi^T Gi are O(ki³), and O(k³N) in total with k = max_i ki. [sent-236, score-0.216]
73 Note that the most computationally expensive steps in both LLE and MLLE are the neighborhood selection and the computation of the d + 1 eigenvectors of the alignment matrix Φ corresponding to small eigenvalues. [sent-237, score-0.152]
74 4 An analysis of MLLE for isometric manifolds Consider the application of MLLE on an isometric manifold M = f (Ω) with open set Ω ⊂ Rd and smooth function f . [sent-240, score-0.292]
75 Assume that {xi} are sampled from M, xi = f(τi), i = 1, ..., N. [sent-241, score-0.078]
76 We have xi − Σ_{j∈Ji} wji xj = τi − Σ_{j∈Ji} wji τj + O(εi²). (4.1) [sent-245, score-0.607]
77 If ki > d, then the optimal reconstruction error of τi should be zero. [sent-247, score-0.262]
78 For the approximately optimal weight vectors wi^(ℓ), we have ||xi − Σ_{j∈Ji} wji^(ℓ) xj|| ≈ σ_{ki−si+1}(Gi), and hence ||τi − Σ_{j∈Ji} wji^(ℓ) τj|| ≈ σ_{ki−si+1}(Gi) + O(εi²). [sent-249, score-0.762]
79 For T* = [τ1, ..., τN], we have E(T*) = Σ_{i=1}^{N} Σ_{ℓ=1}^{si} ||τi − Σ_{j∈Ji} wji^(ℓ) τj||² ≤ Σ_{i=1}^{N} si σ²_{ki−si+1}(Gi) + O(max_i εi²). [sent-255, score-0.575]
80 In this section, we compare MLLE and LTSA in terms of their measures of neighborhood linear dependence and their alignment matrices. [sent-263, score-0.109]
81 At most ki − d weight vectors are used in MLLE for each neighbor set. [sent-266, score-0.36]
82 The total combination error ε^{MLLE}(Ni) = Σ_{ℓ=1}^{ki−d} ||xi − Σ_{j∈Ji} wji^(ℓ) xj||² = ||Gi Wi||²_F of xi can be a measure of the linear dependence of the neighborhood Ni. [sent-269, score-0.751]
83 To compare it with the measure of linear dependence defined by LTSA, we denote by x̄i = (1/|Ii|) Σ_{j∈Ii} xj the mean over the whole neighbor set of xi including xi itself, and by X̄i = [..., xj − x̄i, ...]_{j∈Ii} the centered neighborhood matrix. [sent-270, score-0.333]
84 The MLLE-measure ε^{MLLE} and the LTSA-measure ε^{LTSA} of neighborhood linear dependence are similar, with ε^{LTSA}(Ni) = ||X̄i W̃i||²_F where W̃i is an orthogonal projection. [sent-280, score-0.284]
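For concreteness, the two neighborhood measures could be computed as in the sketch below; the LTSA-style measure coded here is the standard tangent-space reconstruction error ||X̄i − Qi Qi^T X̄i||²_F, which is one concrete choice consistent with the comparison above and an assumption of this sketch rather than the paper's exact formula.

```python
import numpy as np

def mlle_measure(G, W):
    """MLLE measure of neighborhood linear dependence: ||G_i W_i||_F^2."""
    return np.linalg.norm(G @ W, 'fro') ** 2

def ltsa_measure(X_nbhd, d):
    """LTSA-style measure (sketch): d-dimensional tangent-space reconstruction error.

    X_nbhd : m x |I_i| matrix whose columns are x_j, j in I_i (including x_i).
    """
    Xbar = X_nbhd - X_nbhd.mean(axis=1, keepdims=True)   # centered neighborhood
    U, _, _ = np.linalg.svd(Xbar, full_matrices=False)
    Q = U[:, :d]                                         # d principal directions
    return np.linalg.norm(Xbar - Q @ (Q.T @ Xbar), 'fro') ** 2
```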
85 Both MLLE and LTSA minimize a trace function of an alignment matrix Φ to obtain an embedding: min_{T T^T = I} trace(T Φ T^T). [sent-283, score-0.082]
86 The alignment matrix can be written in the same form Φ = Σ_{i=1}^{N} Si Φi Si^T, where Si is a selection matrix consisting of the columns j ∈ Ii of the identity matrix of order N. [sent-284, score-0.132]
87 In LTSA, the local matrix Φi is given by an orthogonal projection. [sent-285, score-0.089]
88 For MLLE, Φi is close to such a projection: its range space and the range space span(Vi) of Vi are tightly close to each other if the reconstruction error of xi is small. [sent-289, score-0.139]
89 The data points are generated from a rectangle with a rectangular strip punched out of the center, so the resulting Swiss roll is not convex. [sent-299, score-0.088]
90 In the top middle of Figure 3, we plot the computed coordinates by Isomap, and there is a dilation of the missing region and a warp on the rest of the embedding. [sent-301, score-0.099]
91 As seen in the top right of Figure 3, there is a strong distortion on the computed coordinates by LLE. [sent-302, score-0.059]
92 We now compare MLLE and LTSA for a 2D manifold with 3 peaks embedded in 3D space. [sent-304, score-0.088]
93 We generate N = 1225 3D-points xi = [ti , si , h(ti , si )]T , where ti and si are uniformly distributed in the interval [−1. [sent-305, score-0.609]
94 Figure 3: Left column: Swiss-roll data and generating coordinates with a missing rectangle. [sent-314, score-0.108]
95 It is easy to show that the manifold parameterized by f(t, s) = [t, s, h(t, s)]^T is approximately isometric since the Jacobian Jf(t, s) is approximately orthonormal. [sent-332, score-0.202]
96 In the right of Figure 4, we plot the computed coordinates by LTSA and MLLE with k = 12. [sent-333, score-0.059]
97 The deformations of the coordinates computed by LTSA near the peaks are prominent because the curvature of the 3-peak manifold varies greatly. [sent-334, score-0.164]
98 Most of the digit classes (digits '2'-'5', marked by different symbols) are well clustered in the resulting embedding of MLLE. [sent-342, score-0.129]
99 The first two coordinates of MLLE are plotted in the middle of Figure 6. [sent-346, score-0.08]
100 Figure 6: Images of faces mapped into the embedding described by the first two coordinates of MLLE, using the parameters k = 14 and d = 3. [sent-374, score-0.188]
wordName wordTfidf (topN-words)
[('mlle', 0.594), ('lle', 0.356), ('ltsa', 0.289), ('wji', 0.241), ('wi', 0.208), ('gi', 0.207), ('ji', 0.2), ('ki', 0.192), ('si', 0.167), ('embedding', 0.129), ('singular', 0.128), ('isometric', 0.102), ('vi', 0.101), ('weight', 0.079), ('xi', 0.078), ('manifold', 0.07), ('gw', 0.067), ('ni', 0.063), ('gt', 0.06), ('locally', 0.059), ('coordinates', 0.059), ('alignment', 0.057), ('isomap', 0.056), ('vectors', 0.053), ('smallest', 0.049), ('cond', 0.048), ('householder', 0.048), ('vki', 0.048), ('xj', 0.047), ('reconstruction', 0.044), ('lt', 0.043), ('vit', 0.042), ('neighborhood', 0.04), ('local', 0.04), ('regularized', 0.04), ('sa', 0.039), ('retrieve', 0.038), ('neighbor', 0.036), ('dependence', 0.034), ('span', 0.033), ('hi', 0.033), ('dimensional', 0.033), ('modi', 0.032), ('dist', 0.032), ('ud', 0.032), ('zhejiang', 0.032), ('tangent', 0.032), ('ti', 0.03), ('approximately', 0.03), ('eigenvalues', 0.03), ('eigenvectors', 0.03), ('generating', 0.03), ('ii', 0.028), ('wit', 0.028), ('reduction', 0.027), ('optimal', 0.026), ('nonlinear', 0.026), ('curvatures', 0.025), ('charting', 0.025), ('ideal', 0.025), ('matrix', 0.025), ('stable', 0.025), ('existence', 0.024), ('orthogonal', 0.024), ('rectangle', 0.024), ('swiss', 0.024), ('totally', 0.024), ('combination', 0.023), ('weights', 0.023), ('instability', 0.022), ('dimensionality', 0.022), ('handwritten', 0.022), ('roll', 0.021), ('column', 0.021), ('af', 0.021), ('middle', 0.021), ('digits', 0.021), ('ls', 0.02), ('china', 0.02), ('lighting', 0.02), ('theorem', 0.019), ('geometry', 0.019), ('tj', 0.019), ('geometric', 0.019), ('missing', 0.019), ('linear', 0.018), ('zhang', 0.018), ('ri', 0.018), ('multiple', 0.018), ('panels', 0.018), ('manifolds', 0.018), ('variations', 0.018), ('rd', 0.018), ('peaks', 0.018), ('constructing', 0.017), ('qi', 0.017), ('visualization', 0.017), ('curvature', 0.017), ('errors', 0.017), ('tightly', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much more stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
2 0.18006371 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
3 0.16125923 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
4 0.14615399 120 nips-2006-Learning to Traverse Image Manifolds
Author: Piotr DollĂĄr, Vincent Rabaud, Serge J. Belongie
Abstract: We present a new algorithm, Locally Smooth Manifold Learning (LSML), that learns a warping function from a point on an manifold to its neighbors. Important characteristics of LSML include the ability to recover the structure of the manifold in sparsely populated regions and beyond the support of the provided data. Applications of our proposed technique include embedding with a natural out-of-sample extension and tasks such as tangent distance estimation, frame rate up-conversion, video compression and motion transfer. 1
5 0.14442673 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
Author: Andrea Vedaldi, Stefano Soatto
Abstract: Image Congealing (IC) is a non-parametric method for the joint alignment of a collection of images affected by systematic and unwanted deformations. The method attempts to undo the deformations by minimizing a measure of complexity of the image ensemble, such as the averaged per-pixel entropy. This enables alignment without an explicit model of the aligned dataset as required by other methods (e.g. transformed component analysis). While IC is simple and general, it may introduce degenerate solutions when the transformations allow minimizing the complexity of the data by collapsing them to a constant. Such solutions need to be explicitly removed by regularization. In this paper we propose an alternative formulation which solves this regularization issue on a more principled ground. We make the simple observation that alignment should simplify the data while preserving the useful information carried by them. Therefore we trade off fidelity and complexity of the aligned ensemble rather than minimizing the complexity alone. This eliminates the need for an explicit regularization of the transformations, and has a number of other useful properties such as noise suppression. We show the modeling and computational benefits of the approach to the some of the problems on which IC has been demonstrated. 1
6 0.086423323 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
7 0.085497141 7 nips-2006-A Local Learning Approach for Clustering
8 0.076709047 50 nips-2006-Chained Boosting
9 0.072690092 60 nips-2006-Convergence of Laplacian Eigenmaps
10 0.067553014 128 nips-2006-Manifold Denoising
11 0.060245372 164 nips-2006-Randomized PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
12 0.058075726 187 nips-2006-Temporal Coding using the Response Properties of Spiking Neurons
13 0.056339815 130 nips-2006-Max-margin classification of incomplete data
14 0.053658213 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
15 0.050946608 61 nips-2006-Convex Repeated Games and Fenchel Duality
16 0.047336061 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
17 0.046836156 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
18 0.046142172 186 nips-2006-Support Vector Machines on a Budget
19 0.04560107 105 nips-2006-Large Margin Component Analysis
20 0.04275351 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
topicId topicWeight
[(0, -0.159), (1, 0.036), (2, -0.0), (3, 0.102), (4, 0.026), (5, -0.019), (6, -0.044), (7, -0.01), (8, -0.003), (9, 0.202), (10, -0.006), (11, -0.079), (12, 0.268), (13, 0.184), (14, 0.019), (15, -0.066), (16, 0.019), (17, 0.059), (18, -0.095), (19, -0.043), (20, 0.05), (21, -0.047), (22, -0.02), (23, -0.051), (24, -0.04), (25, 0.021), (26, -0.109), (27, -0.163), (28, -0.044), (29, -0.191), (30, 0.181), (31, 0.052), (32, 0.036), (33, 0.041), (34, -0.022), (35, -0.11), (36, -0.021), (37, -0.069), (38, 0.041), (39, -0.01), (40, -0.049), (41, -0.027), (42, -0.028), (43, 0.011), (44, -0.104), (45, 0.01), (46, -0.077), (47, 0.067), (48, 0.023), (49, 0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.95862585 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much more stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
2 0.64343017 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
3 0.62242144 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
Author: Andrea Vedaldi, Stefano Soatto
Abstract: Image Congealing (IC) is a non-parametric method for the joint alignment of a collection of images affected by systematic and unwanted deformations. The method attempts to undo the deformations by minimizing a measure of complexity of the image ensemble, such as the averaged per-pixel entropy. This enables alignment without an explicit model of the aligned dataset as required by other methods (e.g. transformed component analysis). While IC is simple and general, it may introduce degenerate solutions when the transformations allow minimizing the complexity of the data by collapsing them to a constant. Such solutions need to be explicitly removed by regularization. In this paper we propose an alternative formulation which solves this regularization issue on a more principled ground. We make the simple observation that alignment should simplify the data while preserving the useful information carried by them. Therefore we trade off fidelity and complexity of the aligned ensemble rather than minimizing the complexity alone. This eliminates the need for an explicit regularization of the transformations, and has a number of other useful properties such as noise suppression. We show the modeling and computational benefits of the approach to the some of the problems on which IC has been demonstrated. 1
4 0.6072613 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
5 0.5335995 120 nips-2006-Learning to Traverse Image Manifolds
Author: Piotr DollĂĄr, Vincent Rabaud, Serge J. Belongie
Abstract: We present a new algorithm, Locally Smooth Manifold Learning (LSML), that learns a warping function from a point on an manifold to its neighbors. Important characteristics of LSML include the ability to recover the structure of the manifold in sparsely populated regions and beyond the support of the provided data. Applications of our proposed technique include embedding with a natural out-of-sample extension and tasks such as tangent distance estimation, frame rate up-conversion, video compression and motion transfer. 1
6 0.39281455 128 nips-2006-Manifold Denoising
7 0.34972692 60 nips-2006-Convergence of Laplacian Eigenmaps
8 0.33446911 50 nips-2006-Chained Boosting
9 0.30401972 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
10 0.30237663 105 nips-2006-Large Margin Component Analysis
11 0.29876381 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
12 0.29176465 130 nips-2006-Max-margin classification of incomplete data
13 0.27602011 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
15 0.26951343 129 nips-2006-Map-Reduce for Machine Learning on Multicore
16 0.25986004 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
17 0.24188049 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes
18 0.23672055 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
19 0.2288952 107 nips-2006-Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis
20 0.22764345 194 nips-2006-Towards a general independent subspace analysis
topicId topicWeight
[(1, 0.094), (3, 0.084), (7, 0.052), (9, 0.081), (20, 0.021), (22, 0.092), (44, 0.054), (57, 0.068), (60, 0.267), (65, 0.04), (69, 0.022)]
simIndex simValue paperId paperTitle
1 0.87665176 4 nips-2006-A Humanlike Predictor of Facial Attractiveness
Author: Amit Kagian, Gideon Dror, Tommer Leyvand, Daniel Cohen-or, Eytan Ruppin
Abstract: This work presents a method for estimating human facial attractiveness, based on supervised learning techniques. Numerous facial features that describe facial geometry, color and texture, combined with an average human attractiveness score for each facial image, are used to train various predictors. Facial attractiveness ratings produced by the final predictor are found to be highly correlated with human ratings, markedly improving previous machine learning achievements. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine's judgments which are remarkably similar to those of humans. These experiments shed new light on existing theories of facial attractiveness such as the averageness, smoothness and symmetry hypotheses. It is intriguing to find that a machine trained explicitly to capture an operational performance criteria such as attractiveness rating, implicitly captures basic human psychophysical biases characterizing the perception of facial attractiveness in general. 1 I n trod u cti on Philosophers, artists and scientists have been trying to capture the nature of beauty since the early days of philosophy. Although in modern days a common layman's notion is that judgments of beauty are a matter of subjective opinion, recent findings suggest that people might share a common taste for facial attractiveness and that their preferences may be an innate part of the primary constitution of our nature. Several experiments have shown that 2 to 8 months old infants prefer looking at faces which adults rate as being more attractive [1]. In addition, attractiveness ratings show very high agreement between groups of raters belonging to the same culture and even across cultures [2]. Such findings give rise to the quest for common factors which determine human facial attractiveness. Accordingly, various hypotheses, from cognitive, evolutional and social perspectives, have been put forward to describe the common preferences for facial beauty. Inspired by Sir Francis Galton’s photographic method of composing faces [3], Rubenstein, Langlois and Roggman created averaged faces by morphing multiple images together and proposed that averageness is the answer for facial attractiveness [4, 5]. Human judges found these averaged faces to be attractive and rated them with attractiveness ratings higher than the mean rating of the component faces composing them. Grammer and Thornhill have investigated symmetry and averageness of faces and concluded that symmetry was more important than averageness in facial attractiveness [6]. Little and colleagues have agreed that average faces are attractive but claim that faces with certain extreme features, such as extreme sexually dimorphic traits, may be more attractive than average faces [7]. Other researchers have suggested various conditions which may contribute to facial attractiveness such as neonate features, pleasant expressions and familiarity. Cunningham and his associates suggest a multiple fitness model in which there is no single constructing line that determines attractiveness. Instead, different categories of features signal different desirable qualities of the perceived target [8]. Even so, the multiple fitness model agrees that some facial qualities are universally physically attractive to people. Apart from eliciting the facial characteristics which account for attractiveness, modern researchers try to describe underlying mechanisms for these preferences. 
Many contributors refer to the evolutionary origins of attractiveness preferences [9]-[11]. According to this view, facial traits signal mate quality and imply chances for reproductive success and parasite resistance. Some evolutionary theorists suggest that preferred features might not signal mate quality but that the “good taste” by itself is an evolutionary adaptation (individuals with a preference for attractiveness will have attractive offspring that will be favored as mates) [9]. Another mechanism explains attractiveness' preferences through a cognitive theory - a preference for attractive faces might be induced as a by-product of general perception or recognition mechanisms [5, 12]: Attractive faces might be pleasant to look at since they are closer to the cognitive representation of the face category in the mind. These cognitive representations are described as a part of a cognitive mechanism that abstracts prototypes from distinct classes of objects. These prototypes relate to average faces when considering the averageness hypothesis. A third view has suggested that facial attractiveness originates in a social mechanism, where preferences may be dependent on the learning history of the individual and even on his social goals [12]. Different studies have tried to use computational methods in order to analyze facial attractiveness. Averaging faces with morph tools was done in several cases (e.g. [5, 13]). In [14], laser scans of faces were put into complete correspondence with the average face in order to examine the relationship between facial attractiveness, age, and averageness. Another approach was used in [15] where a genetic algorithm, guided by interactive user selections, was programmed to evolve a “most beautiful” female face. [16] used machine learning methods to investigate whether a machine can predict attractiveness ratings by learning a mapping from facial images to their attractiveness scores. Their predictor achieved a significant correlation of 0.6 with average human ratings, demonstrating that facial beauty can be learned by a machine, at least to some degree. However, as human raters still do significantly outperform the predictor of [16], the challenge of constructing a facial attractiveness machine with human level evaluation accuracy has remained open. A primary goal of this study is to surpass these results by developing a machine which obtains human level performance in predicting facial attractiveness. Having accomplished this, our second main goal is to conduct a series of simulated psychophysical experiments and study the resemblance between human and machine judgments. This latter task carries two potential rewards: A. To determine whether the machine can aid in understanding the psychophysics of human facial attractiveness, capitalizing on the ready accessibility of the analysis of its inner workings, and B. To study whether learning an explicit operational ratings prediction task also entails learning implicit humanlike biases, at least for the case of facial attractiveness. 2 2.1 T h e f aci al trai n in g d atab as e: Acq u i s i ti on , p rep roces s i n g an d rep res en tati on Rating facial attractiveness The chosen database was composed of 91 facial images of American females, taken by the Japanese photographer Akira Gomi. All 91 samples were frontal color photographs of young Caucasian females with a neutral expression. All samples were of similar age, skin color and gender. 
The subjects’ portraits had no accessories or other distracting items such as jewelry. All 91 facial images in the dataset were rated for attractiveness by 28 human raters (15 males, 13 females) on a 7-point Likert scale (1 = very unattractive, 7 = very attractive). Ratings were collected with a specifically designed html interface. Each rater was asked to view the entire set before rating in order to acquire a notion of attractiveness scale. There was no time limit for judging the attractiveness of each sample and raters could go back and adjust the ratings of already rated samples. The images were presented to each rater in a random order and each image was presented on a separate page. The final attractiveness rating of each sample was its mean rating across all raters. To validate that the number of ratings collected adequately represented the ``collective attractiveness rating'' we randomly divided the raters into two disjoint groups of equal size. For each facial image, we calculated the mean rating on each group, and calculated the Pearson correlation between the mean ratings of the two groups. This process was repeated 1,000 times. The mean correlation between two groups was 0.92 ( = 0.01). This corresponds well to the known level of consistency among groups of raters reported in the literature (e.g. [2]). Hence, the mean ratings collected are stable indicators of attractiveness that can be used for the learning task. The facial set contained faces in all ranges of attractiveness. Final attractiveness ratings range from 1.42 to 5.75 and the mean rating was 3.33 ( = 0.94). 2.2 Data preprocessing and representation Preliminary experimentation with various ways of representing a facial image have systematically shown that features based on measured proportions, distances and angles of faces are most effective in capturing the notion of facial attractiveness (e.g. [16]). To extract facial features we developed an automatic engine that is capable of identifying eyes, nose, lips, eyebrows, and head contour. In total, we measured 84 coordinates describing the locations of those facial features (Figure 1). Several regions are suggested for extracting mean hair color, mean skin color and skin texture. The feature extraction process was basically automatic but some coordinates needed to be manually adjusted in some of the images. The facial coordinates are used to create a distances-vector of all 3,486 distances between all pairs of coordinates in the complete graph created by all coordinates. For each image, all distances are normalized by face length. In a similar manner, a slopes-vector of all the 3,486 slopes of the lines connecting the facial coordinates is computed. Central fluctuating asymmetry (CFA), which is described in [6], is calculated from the coordinates as well. The application also provides, for each face, Hue, Saturation and Value (HSV) values of hair color and skin color, and a measurement of skin smoothness. Figure 1: Facial coordinates with hair and skin sample regions as represented by the facial feature extractor. Coordinates are used for calculating geometric features and asymmetry. Sample regions are used for calculating color values and smoothness. The sample image, used for illustration only, is of T.G. and is presented with her full consent. Combining the distances-vector and the slopes-vector yields a vector representation of 6,972 geometric features for each image. 
Since strong correlations are expected among the features in such representation, principal component analysis (PCA) was applied to these geometric features, producing 90 principal components which span the sub-space defined by the 91 image vector representations. The geometric features are projected on those 90 principal components and supply 90 orthogonal eigenfeatures representing the geometric features. Eight measured features were not included in the PCA analysis, including CFA, smoothness, hair color coordinates (HSV) and skin color coordinates. These features are assumed to be directly connected to human perception of facial attractiveness and are hence kept at their original values. These 8 features were added to the 90 geometric eigenfeatures, resulting in a total of 98 image-features representing each facial image in the dataset. 3 3.1 E xp eri men ts an d resu l ts Predictor construction and validation We experimented with several induction algorithms including simple Linear Regression, Least Squares Support Vector Machine (LS-SVM) (both linear as well as non-linear) and Gaussian Processes (GP). However, as the LS-SVM and GP showed no substantial advantage over Linear Regression, the latter was used and is presented in the sequel. A key ingredient in our methods is to use a proper image-features selection strategy. To this end we used subset feature selection, implemented by ranking the image-features by their Pearson correlation with the target. Other ranking functions produced no substantial gain. To measure the performance of our method we removed one sample from the whole dataset. This sample served as a test set. We found, for each left out sample, the optimal number of image-features by performing leave-one-out-cross-validation (LOOCV) on the remaining samples and selecting the number of features that minimizes the absolute difference between the algorithm's output and the targets of the training set. In other words, the score for a test example was predicted using a single model based on the training set only. This process was repeated n=91 times, once for each image sample. The vector of attractiveness predictions of all images is then compared with the true targets. These scores are found to be in a high Pearson correlation of 0.82 with the mean ratings of humans (P-value < 10 -23), which corresponds to a normalized Mean Squared Error of 0.39. This accuracy is a marked improvement over the recently published performance results of a Pearson correlation of 0.6 on a similar dataset [16]. The average correlation of an individual human rater to the mean correlations of all other raters in our dataset is 0.67 and the average correlation between the mean ratings of groups of raters is 0.92 (section 2.1). It should be noted that we tried to use this feature selection and training procedure with the original geometric features instead of the eigenfeatures, ranking them by their correlation to the targets and selecting up to 300 best ranked features. This, however, has failed to produce good predictors due to strong correlations between the original geometric features (maximal Pearson correlation obtained was 0.26). 3.2 S i m i l a r i t y o f ma c h i n e a n d h u m a n j u d g m e n t s Each rater (human and machine) has a 91 dimensional rating vector describing its Figure 2: Distribution of mean Euclidean distance from each human rater to all other raters in the ratings space. 
The machine’s average distance form all other raters (left bar) is smaller than the average distance of each of the human raters to all others. attractiveness ratings of all 91 images. These vectors can be embedded in a 91 dimensional ratings space. The Euclidian distance between all raters (human and machine) in this space was computed. Compared with each of the human raters, the ratings of the machine were the closest, on average, to the ratings of all other human raters (Figure 2). To verify that the machine ratings are not outliers that fall out of clusters of human raters (even though their mean distance from the other ratings is small) we surrounded each of the rating vectors in the ratings space with multidimensional spheres of several radius sizes. The machine had more human neighbors than the mean number of neighbors of human raters, testifying that it does not fall between clusters. Finally, for a graphic display of machine ratings among human ratings we applied PCA to machine and human ratings in the rating space and projected all ratings onto the resulting first 2 and 3 principal components. Indeed, the machine is well placed in a mid-zone of human raters (Figure 3). 5 Machine 0 Machine 0 -4 -8 -5 7 5 0 -10 -10 -5 0 5 10 (a) 0 -7 -5 (b) Figure 3: Location of machine ratings among the 28 human ratings: Ratings were projected into 2 dimensions (a) and 3 dimensions (b) by performing PCA on all ratings and projecting them on the first principal components. The projected data explain 29.8% of the variance in (a) and 36.6% in (b). 3.3 Psychophysical experiments in silico A number of simulated psychophysical experiments reveal humanlike biases of the machine's performance. Rubenstein et al. discuss a morphing technique to create mathematically averaged faces from multiple face images [5]. They reported that averaged faces made of 16 and 32 original component images were rated higher in attractiveness than the mean attractiveness ratings of their component faces and higher than composites consisting of fewer faces. In their experiment, 32-component composites were found to be the most attractive. We used a similar technique to create averaged virtually-morphed faces with various numbers of components, nc, and have let the machine predict their attractiveness. To this end, coordinate values of the original component faces were averaged to create a new set of coordinates for the composite. These coordinates were used to calculate the geometrical features and CFA of the averaged face. Smoothness and HSV values for the composite faces were calculated by averaging the corresponding values of the component faces 1. To study the effect of nc on the attractiveness score we produced 1,000 virtual morph images for each value of n c between 2 and 50, and used our attractiveness predictor (section 3.1) to compute the attractiveness scores of the resulting composites. In accordance with the experimental results of [5], the machine manifests a humanlike bias for higher scores of averaged composites over their components’ mean score. Figure 4a, presenting these results, shows the percent of components which were rated as less attractive than their corresponding composite, for each number of components n c. As evident, the attractiveness rating of a composite surpasses a larger percent of its components’ ratings as nc increases. 
Figure 4a also shows the mean scores of 1,000 1 HSV values are converted to RGB before averaging composites and the mean scores of their components, for each n c (scores are normalized to the range [0, 1]). Their actual attractiveness scores are reported in Table 1. As expected, the mean scores of the components images are independent of n c, while composites’ scores increase with nc. Mean values of smoothness and asymmetry of the composites are presented in Figure 4b. 0.4 Smoothness Asymmetry 0.8 0.2 0.75 0 -0.2 0.7 -0.4 0.65 -0.6 0.6 Fraction of less attractive components Composite's score (normalized) Components' mean score (normalized) 0.55 2 10 20 30 40 -0.8 50 -1 2 Number of components in composite 10 20 30 40 50 Number of components in composite (a) (b) Figure 4: Mean results over 1,000 composites made of varying numbers of image components: (a) Percent of components which were rated as less attractive than their corresponding composite accompanied with mean scores of composites and the mean scores of their components (scores are normalized to the range [0, 1]. actual attractiveness scores are reported in Table 1). (b) Mean values of smoothness and asymmetry of 1,000 composites for each number of components, nc. Table 1: Mean results over 1,000 composites made of varying numbers of component images NUMBER OF COMPONENTS IN COMPOSITE COMPOSITE SCORE COMPONENTS MEAN SCORE 2 4 12 25 50 3.46 3.66 3.74 3.82 3.94 3.34 3.33 3.32 3.32 3.33 COMPONENTS RATED LOWER THAN COMPOSITE (PERCENT) 55 64 70 75 81 % % % % % Recent studies have provided evidence that skin texture influences judgments of facial attractiveness [17]. Since blurring and smoothing of faces occur when faces are averaged together [5], the smooth complexion of composites may underlie the attractiveness of averaged composites. In our experiment, a preference for averageness is found even though our method of virtual-morphing does not produce the smoothening effect and the mean smoothness value of composites corresponds to the mean smoothness value in the original dataset, for all nc (see Figure 4b). Researchers have also suggested that averaged faces are attractive since they are exceptionally symmetric [18]. Figure 4b shows that the mean level of asymmetry is indeed highly correlated with the mean scores of the morphs (Pearson correlation of -0.91, P-value < 10 -19). However, examining the correlation between the rest of the features and the composites' scores reveals that this high correlation is not at all unique to asymmetry. In fact, 45 of the 98 features are strongly correlated with attractiveness scores (|Pearson correlation| > 0.9). The high correlation between these numerous features and attractiveness scores of averaged faces indicates that symmetry level is not an exceptional factor in the machine’s preference for averaged faces. Instead, it suggests that averaging causes many features, including both geometric features and symmetry, to change in a direction which causes an increase in attractiveness. It has been argued that although averaged faces are found to be attractive, very attractive faces are not average [18]. A virtual composite made of the 12 most attractive faces in the set (as rated by humans) was rated by the machine with a high score of 5.6 while 1,000 composites made of 50 faces got a maximum score of only 5.3. This type of preference resembles the findings of an experiment by Perrett et al. 
in which a highly attractive composite, morphed from only attractive faces, was preferred by humans over a composite made of 60 images of all levels of attractiveness [13]. Another study by Zaidel et al. examined the asymmetry of attractiveness perception and offered a relationship between facial attractiveness and hemispheric specialization [19]. In this research right-right and left-left chimeric composites were created by attaching each half of the face to its mirror image. Subjects were asked to look at left-left and right-right composites of the same image and judge which one is more attractive. For women’s faces, right-right composites got twice as many ‘more attractive’ responses than left-left composites. Interestingly, similar results were found when simulating the same experiment with the machine: Right-right and left-left chimeric composites were created from the extracted coordinates of each image and the machine was used to predict their attractiveness ratings (taking care to exclude the original image used for the chimeric composition from the training set, as it contains many features which are identical to those of the composite). The machine gave 63 out of 91 right-right composites a higher rating than their matching left-left composite, while only 28 left-left composites were judged as more attractive. A paired t-test shows these results to be statistically significant with P-value < 10 -7 (scores of chimeric composites are normally distributed). It is interesting to see that the machine manifests the same kind of asymmetry bias reported by Zaidel et al, though it has never been explicitly trained for that. 4 Di s cu s s i on In this work we produced a high quality training set for learning facial attractiveness of human faces. Using supervised learning methodologies we were able to construct the first predictor that achieves accurate, humanlike performance for this task. Our results add the task of facial attractiveness prediction to a collection of abstract tasks that has been successfully accomplished with current machine learning techniques. Examining the machine and human raters' representations in the ratings space identifies the ratings of the machine in the center of human raters, and closest, in average, to other human raters. The similarity between human and machine preferences has prompted us to further study the machine’s operation in order to capitalize on the accessibility of its inner workings and learn more about human perception of facial attractiveness. To this end, we have found that that the machine favors averaged faces made of several component faces. While this preference is known to be common to humans as well, researchers have previously offered different reasons for favoring averageness. Our analysis has revealed that symmetry is strongly related to the attractiveness of averaged faces, but is definitely not the only factor in the equation since about half of the image-features relate to the ratings of averaged composites in a similar manner as the symmetry measure. This suggests that a general movement of features toward attractiveness, rather than a simple increase in symmetry, is responsible for the attractiveness of averaged faces. Obviously, strictly speaking this can be held true only for the machine, but, in due of the remarkable ``humnalike'' behavior of the machine, it also brings important support to the idea that this finding may well extend also to human perception of facial attractiveness. 
Overall, it is quite surprising and pleasing to see that a machine trained explicitly to capture an operational performance criteria such as rating, implicitly captures basic human psychophysical biases related to facial attractiveness. It is likely that while the machine learns the ratings in an explicit supervised manner, it also concomitantly and implicitly learns other basic characteristics of human facial ratings, as revealed by studying its
same-paper 2 0.76382917 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much more stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
3 0.56746829 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Author: Hamed Valizadegan, Rong Jin
Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1
4 0.56187004 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
Author: Kilian Q. Weinberger, Fei Sha, Qihui Zhu, Lawrence K. Saul
Abstract: In many areas of science and engineering, the problem arises how to discover low dimensional representations of high dimensional data. Recently, a number of researchers have converged on common solutions to this problem using methods from convex optimization. In particular, many results have been obtained by constructing semidefinite programs (SDPs) with low rank solutions. While the rank of matrix variables in SDPs cannot be directly constrained, it has been observed that low rank solutions emerge naturally by computing high variance or maximal trace solutions that respect local distance constraints. In this paper, we show how to solve very large problems of this type by a matrix factorization that leads to much smaller SDPs than those previously studied. The matrix factorization is derived by expanding the solution of the original problem in terms of the bottom eigenvectors of a graph Laplacian. The smaller SDPs obtained from this matrix factorization yield very good approximations to solutions of the original problem. Moreover, these approximations can be further refined by conjugate gradient descent. We illustrate the approach on localization in large scale sensor networks, where optimizations involving tens of thousands of nodes can be solved in just a few minutes. 1
5 0.55743992 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
Author: J.t. Lindgren, Aapo Hyvärinen
Abstract: In previous studies, quadratic modelling of natural images has resulted in cell models that react strongly to edges and bars. Here we apply quadratic Independent Component Analysis to natural image patches, and show that up to a small approximation error, the estimated components are computing conjunctions of two linear features. These conjunctive features appear to represent not only edges and bars, but also inherently two-dimensional stimuli, such as corners. In addition, we show that for many of the components, the underlying linear features have essentially V1 simple cell receptive field characteristics. Our results indicate that the development of the V2 cells preferring angles and corners may be partly explainable by the principle of unsupervised sparse coding of natural images. 1
6 0.55381399 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
7 0.55337095 175 nips-2006-Simplifying Mixture Models through Function Approximation
8 0.55264115 65 nips-2006-Denoising and Dimension Reduction in Feature Space
9 0.55128318 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians
10 0.54891545 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
11 0.54725474 158 nips-2006-PG-means: learning the number of clusters in data
12 0.54702485 167 nips-2006-Recursive ICA
13 0.54614538 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
14 0.54485452 194 nips-2006-Towards a general independent subspace analysis
15 0.54344577 104 nips-2006-Large-Scale Sparsified Manifold Regularization
16 0.54300123 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
17 0.5429371 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
18 0.54293215 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
19 0.54230183 17 nips-2006-A recipe for optimizing a time-histogram
20 0.54205948 20 nips-2006-Active learning for misspecified generalized linear models