nips nips2010 nips2010-221 knowledge-graph by maker-knowledge-mining

221 nips-2010-Random Projections for $k$-means Clustering


Source: pdf

Author: Christos Boutsidis, Anastasios Zouzias, Petros Drineas

Abstract: This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ Rn×d) can be projected into t = Ω(k/ε2) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε−2 k/ log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + ε. The projection is done by post-multiplying A with a d × t random matrix R having entries +1/√t or −1/√t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 By post-multiplying A with a d × t random matrix R having entries +1/√t or −1/√t with equal probability. [sent-3, score-0.119]

2 A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results. [sent-4, score-0.114]

3 1 Introduction The k-means clustering algorithm [16] was recently recognized as one of the top ten data mining tools of the last fifty years [20]. [sent-5, score-0.18]

4 This paper focuses on the application of the random projection method (see Section 2. [sent-7, score-0.06]

5 3) to the k-means clustering problem (see Definition 1). [sent-8, score-0.146]

6 Formally, assuming as input a set of n points in d dimensions, our goal is to randomly project the points into d̃ dimensions, with d̃ ≪ d, and then apply a k-means clustering algorithm (see Definition 2) on the projected points. [sent-9, score-0.251]

7 Of course, one should be able to compute the projection fast without distorting significantly the “clusters” of the original point set. [sent-10, score-0.072]

8 Our algorithm (see Algorithm 1) satisfies both conditions by computing the embedding in time linear in the size of the input and by distorting the “clusters” of the dataset by a factor of at most 2 + ε, for some ε ∈ (0, 1/3) (see Theorem 1). [sent-11, score-0.217]

9 We believe that the high dimensionality of modern data will render our algorithm useful and attractive in many practical applications [9]. [sent-12, score-0.086]

10 Let A be an n × d matrix containing n d-dimensional points (A(i) denotes the i-th point of the set), and let k be the number of clusters (see also Section 2. [sent-14, score-0.174]

11 an embedding with image Ã ∈ Rn×c containing (rescaled) columns from A, such that with constant probability the clustering structure is preserved within a factor of 2 + ε. [sent-22, score-0.358]

12 In the RP methods the construction is done with random sign matrices and the mailman algorithm (see Sections 2. [sent-26, score-0.341]

13 2 Preliminaries We start by formally defining the k-means clustering problem using matrix notation. [sent-32, score-0.245]

14 Later in this section, we precisely describe the approximability framework adopted in the k-means clustering literature and fix the notation. [sent-33, score-0.186]

15 [THE K-MEANS CLUSTERING PROBLEM] Given a set of n points in d dimensions (rows in an n × d matrix A) and a positive integer k denoting the number of clusters, find the n × k indicator matrix Xopt such that Xopt = arg min_{X∈X} ‖A − XX⊤A‖²_F. [sent-35, score-0.394]

16 Here X denotes the set of all n × k indicator matrices X. [sent-36, score-0.077]

17 An n × k indicator matrix has exactly one non-zero element per row, which denotes cluster membership. [sent-38, score-0.172]

18 , k, the i-th point belongs to the j-th cluster if and only if Xij = 1/√zj, where zj denotes the number of points in the corresponding cluster. [sent-45, score-0.079]
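
To make the indicator-matrix formulation above concrete, here is a small numpy sketch (our addition, not from the paper) that builds X from a label vector and checks that ‖A − XX⊤A‖²_F coincides with the usual sum of squared distances to cluster centroids; the data and labels are synthetic.

```python
import numpy as np

def indicator_matrix(labels, k):
    """n x k indicator matrix of Definition 1: X[i, j] = 1/sqrt(z_j) if point i
    belongs to cluster j (z_j = size of cluster j), and 0 otherwise."""
    n = len(labels)
    X = np.zeros((n, k))
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members) > 0:
            X[members, j] = 1.0 / np.sqrt(len(members))
    return X

# Synthetic check: ||A - X X^T A||_F^2 equals the k-means cost of the partition.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
labels = rng.integers(0, 3, size=100)
X = indicator_matrix(labels, 3)
matrix_cost = np.linalg.norm(A - X @ X.T @ A, "fro") ** 2
centroids = np.vstack([A[labels == j].mean(axis=0) for j in range(3)])
pointwise_cost = ((A - centroids[labels]) ** 2).sum()
assert np.isclose(matrix_cost, pointwise_cost)
```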

19 1 Approximation Algorithms for k-means clustering Finding Xopt is an NP-hard problem even for k = 2 [3], thus research has focused on developing approximation algorithms for k-means clustering. [sent-48, score-0.146]

20 [K-MEANS APPROXIMATION ALGORITHM] An algorithm is a “γ-approximation” for the k-means clustering problem (γ ≥ 1) if it takes inputs A and k, and returns an indicator matrix Xγ that satisfies, with probability at least 1 − δγ, ‖A − XγXγ⊤A‖²_F ≤ γ min_{X∈X} ‖A − XX⊤A‖²_F. [sent-51, score-0.389]

21 For our discussion, we fix the γ-approximation algorithm to be the one presented in [14], which guarantees γ = 1 + ε′ for any ε′ ∈ (0, 1] with running time O(2^{(k/ε′)^{O(1)}} dn). [sent-53, score-0.094]

22 2 Notation Given an n × d matrix A and an integer k with k < min{n, d}, let Uk ∈ Rn×k (resp. [sent-55, score-0.125]

23 Vk ∈ Rd×k ) be the matrix of the top k left (resp. [sent-56, score-0.099]

24 right) singular vectors of A, and let Σk ∈ Rk×k be a diagonal matrix containing the top k singular values of A in non-increasing order. [sent-57, score-0.238]
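
For reference, a short numpy sketch (ours) of this notation: Uk, Σk, Vk, and the best rank-k approximation Ak = UkΣkVk⊤.

```python
import numpy as np

def top_k_svd(A, k):
    """Top-k left/right singular vectors and singular values of A,
    matching the U_k, Sigma_k, V_k notation of the Preliminaries."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values in non-increasing order
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T

A = np.random.default_rng(1).standard_normal((50, 20))
Uk, Sk, Vk = top_k_svd(A, 5)
Ak = Uk @ Sk @ Vk.T   # best rank-k approximation of A in Frobenius (and spectral) norm
```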

25 ‖A‖_F and ‖A‖_2 denote the Frobenius and the spectral norm of a matrix A, respectively. [sent-65, score-0.099]

26 A† denotes the Moore-Penrose pseudo-inverse of A, i.e., the unique d × n matrix satisfying A = AA†A, A†AA† = A†, (AA†)⊤ = AA†, and (A†A)⊤ = A†A. [sent-68, score-0.099]

27 A useful property of matrix norms is that for any two matrices C and T of appropriate dimensions, ‖CT‖_F ≤ ‖C‖_F ‖T‖_2; this is a stronger version of the standard submultiplicativity property. [sent-70, score-0.126]

28 We call P a projector matrix if it is square and P² = P. [sent-71, score-0.151]

29 Subsequent research simplified the proof of the above result by showing that such a projection can be generated using a d × t random Gaussian matrix R, i. [sent-82, score-0.139]

30 Notice that such an embedding Ã = AR preserves the metric structure of the point-set, so it also preserves, within a factor of 1 + ε, the optimal value of the k-means objective function of A. [sent-89, score-0.185]

31 Achlioptas proved that even a (rescaled) random sign matrix suffices in order to get the same guarantees as above [1], an approach that we adopt here (see step two in Algorithm 1). [sent-90, score-0.192]

32 3 A random-projection-type k-means algorithm Algorithm 1 takes as inputs the matrix A ∈ Rn×d , the number of clusters k, an error parameter ε ∈ (0, 1/3), and some γ-approximation k-means algorithm. [sent-92, score-0.212]

33 It returns an indicator matrix Xγ determining a k-partition of the rows of A. [sent-93, score-0.213]

34 Input: n × d matrix A (n points, d features), number of clusters k, error parameter ε ∈ (0, 1/3), and γ-approximation k-means algorithm. [sent-94, score-0.137]

35 Output: Indicator matrix Xγ determining a k-partition on the rows of A. [sent-95, score-0.163]

36 Run the γ-approximation algorithm on Ã to obtain X̃γ; return the indicator matrix X̃γ. Algorithm 1: A random projection algorithm for k-means clustering. [sent-110, score-0.277]
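
A minimal Python sketch of Algorithm 1 as we read it (not the authors' code): t is set to roughly k/ε², although the paper only requires t = Ω(k/ε²), and scikit-learn's Lloyd-type KMeans stands in for the γ-approximation step.

```python
import numpy as np
from sklearn.cluster import KMeans

def rp_kmeans(A, k, eps, seed=0):
    """Sketch of Algorithm 1: project the rows of A with a random sign matrix,
    then run a k-means solver on the projected points."""
    n, d = A.shape
    t = int(np.ceil(k / eps ** 2))           # t = Omega(k / eps^2); leading constant assumed to be 1
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(d, t)) / np.sqrt(t)   # entries +1/sqrt(t) or -1/sqrt(t)
    A_proj = A @ R                           # the embedding; BLAS here, mailman algorithm in the paper
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(A_proj)
    return labels                            # defines the k-partition of the rows of A
```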

37 1 Running time analysis Algorithm 1 reduces the dimensions of A by post-multiplying it with a random sign matrix R. [sent-112, score-0.245]

38 If R is constructed as in Algorithm 1, one can employ the so-called mailman algorithm for matrix multiplication [15] and compute the product AR in O(nd⌈ε−2 k/ log(d)⌉) time. [sent-114, score-0.436]

39 Indeed, the mailman algorithm computes (after preprocessing) a matrix-vector product of any d-dimensional vector (row of A) with a d × log(d) sign matrix in O(d) time. [sent-115, score-0.419]

40 By partitioning the columns of our d × t matrix R into ⌈t/ log(d)⌉ blocks, the claim follows. [sent-116, score-0.099]
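
The sketch below illustrates the mailman idea for a single d × m sign block (m ≈ log2(d)), written from the description in [15] rather than copied from it: entries of the input vector are first accumulated into one bucket per row pattern ("sorting the mail"), and the buckets are then folded bit by bit ("delivering the mail"), so that after the pattern indices are precomputed each block costs O(d + 2^m).

```python
import numpy as np

def mailman_block_multiply(a, B):
    """Compute a @ B for a length-d vector a and a d x m sign matrix B (entries +/-1),
    m <= log2(d).  Illustrative mailman-style implementation, not the authors' code."""
    d, m = B.shape
    # Phase 1: bucket the entries of a by the sign pattern of the corresponding row of B.
    # Computing the pattern indices costs O(d*m) here; the paper treats it as preprocessing.
    powers = 1 << np.arange(m)
    idx = (B > 0).astype(np.int64) @ powers          # pattern index of each row, in [0, 2^m)
    s = np.bincount(idx, weights=a, minlength=1 << m)
    total = s.sum()
    # Phase 2: for column j we need the sum of buckets whose j-th bit is 1; folding the
    # bucket vector in half at each step keeps the total work at O(2^m).
    out = np.empty(m)
    cur = s
    for j in range(m):
        pairs = cur.reshape(-1, 2)                   # adjacent buckets differ in bit j
        ones = pairs[:, 1].sum()                     # buckets with bit j equal to 1
        out[j] = 2.0 * ones - total                  # (+1)-rows minus (-1)-rows, weighted by a
        cur = pairs.sum(axis=1)                      # fold bit j away
    return out

# Quick check against the naive product.
rng = np.random.default_rng(0)
d, m = 1024, 10
B = rng.choice([-1.0, 1.0], size=(d, m))
a = rng.standard_normal(d)
assert np.allclose(mailman_block_multiply(a, B), a @ B)
```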

41 The latter assumption is reasonable in our setting since the need for dimension reduction in k-means clustering arises usually in high-dimensional data (large d). [sent-118, score-0.233]

42 Other choices of R would give the same approximation results; the time complexity to compute the embedding would be different though. [sent-119, score-0.096]

43 A matrix where each entry is a random Gaussian variable with zero mean and variance 1/√t would imply an O(knd/ε2) time complexity (naive multiplication). [sent-120, score-0.119]
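
For comparison, the Gaussian alternative mentioned here would look roughly as follows (a sketch with illustrative sizes; the entries are scaled by 1/sqrt(t) so that squared row norms are preserved in expectation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, eps = 1000, 4096, 40, 0.3
t = int(np.ceil(k / eps ** 2))
A = rng.standard_normal((n, d))
R = rng.standard_normal((d, t)) / np.sqrt(t)   # zero-mean Gaussian entries, scale 1/sqrt(t)
A_proj = A @ R                                 # naive multiplication: O(n*d*t) = O(k*n*d / eps^2)
print(A_proj.shape)
```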

44 In our experiments in Section 5 we use the matrix R described in Algorithm 1 and employ MatLab’s matrix-matrix BLAS implementation to carry out the third step of the algorithm. [sent-121, score-0.144]

45 We also experimented with a novel MatLab/C implementation of the mailman algorithm but, in the general case, we were not able to outperform MatLab’s built-in routines (see section 5. [sent-122, score-0.272]

46 Using, for example, the algorithm of [14] with γ = 1 + ε would result in an algorithm that preserves the clustering within a factor of 2 + ε, for any ε ∈ (0, 1/3), running in time O(nd⌈ε−2 k/ log(d)⌉ + 2^{(k/ε)^{O(1)}} kn/ε2). [sent-125, score-0.363]

47 We thus employ the Lloyd algorithm for our experimental evaluation of our algorithm in Section 5. [sent-127, score-0.092]

48 Note that, after using the proposed dimensionality reduction method, the cost of the Lloyd heuristic is only O(nk 2 /ε2 ) per iteration. [sent-128, score-0.13]

49 Let the n × d matrix A and the positive integer k < min{n, d} be the inputs of the k-means clustering problem. [sent-135, score-0.312]

50 Run Algorithm 1 with inputs A, k, ε, and the γ-approximation algorithm in order to construct an indicator matrix Xγ . [sent-137, score-0.224]

51 Before employing Corollary 11, Lemma 6, and Lemma 8 from [19] we need to make sure that the matrix R constructed in Algorithm 1 is consistent with Definition 1 and Lemma 5 in [19]. [sent-142, score-0.099]

52 1 of [1] immediately shows that the random sign matrix R of Algorithm 1 satisfies Definition 1 and Lemma 5 in [19]. [sent-144, score-0.162]

53 Assume that the matrix R is constructed by using Algorithm 1 with inputs A, k and ε. [sent-146, score-0.14]

54 1 Reading the input d × log d sign matrix requires O(d log d) time. [sent-158, score-0.198]

55 However, in our case we only consider multiplication with a random sign matrix, therefore we can avoid the preprocessing step by directly computing a random correspondence matrix as discussed in [15, Preprocessing Section]. [sent-159, score-0.27]

56 Let Φ = Vk⊤R; note that Φ is a k × t matrix and the SVD of Φ is Φ = UΦΣΦVΦ⊤, where UΦ and ΣΦ are k × k matrices, and VΦ is a t × k matrix. [sent-166, score-0.099]

57 By taking the SVD of (Vk⊤R)† and (Vk⊤R)⊤ we get ‖(Vk⊤R)† − (Vk⊤R)⊤‖_2 = ‖VΦΣΦ^{−1}UΦ⊤ − VΦΣΦUΦ⊤‖_2 = ‖VΦ(ΣΦ^{−1} − ΣΦ)UΦ⊤‖_2 = ‖ΣΦ^{−1} − ΣΦ‖_2, since VΦ and UΦ⊤ can be dropped without changing any unitarily invariant norm. [sent-167, score-0.155]
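
A quick numerical check of the identity just reconstructed, with a random full-row-rank matrix standing in for Vk⊤R (our addition):

```python
import numpy as np

rng = np.random.default_rng(0)
k, t = 5, 40
Phi = rng.standard_normal((k, t))                 # stand-in for V_k^T R (full row rank w.h.p.)
lhs = np.linalg.norm(np.linalg.pinv(Phi) - Phi.T, 2)
sigma = np.linalg.svd(Phi, compute_uv=False)
rhs = np.abs(1.0 / sigma - sigma).max()           # || Sigma_Phi^{-1} - Sigma_Phi ||_2
assert np.isclose(lhs, rhs)
```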

58 Assuming that, for all i ∈ [k], σi (Φ) and τi (Ψ) denote the i-th largest singular value of Φ and the i-th diagonal element of Ψ, respectively, it is τi (Ψ) = 1 − σi (Φ)σk+1−i (Φ) . [sent-169, score-0.069]

59 Under the same assumptions as in Lemma 2 and for any n × d matrix C w. [sent-173, score-0.099]

60 Then, setting Z = ‖CR‖²_F, using the third statement of Lemma 2, the fact that k ≥ 1, and Chebyshev’s inequality we get P(|Z − E[Z]| ≥ ε‖C‖²_F) ≤ Var[Z]/(ε²‖C‖⁴_F) ≤ 4‖C‖⁴_F/(tε²‖C‖⁴_F) ≤ 0. [sent-179, score-0.115]
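
An empirical illustration (not a substitute for the proof) of the concentration statement above: Z = ‖CR‖²_F stays close to its expectation ‖C‖²_F when R has ±1/√t entries.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, t = 50, 400, 100
C = rng.standard_normal((n, d))
Z = []
for _ in range(100):
    R = rng.choice([-1.0, 1.0], size=(d, t)) / np.sqrt(t)
    Z.append(np.linalg.norm(C @ R, "fro") ** 2)
Z = np.array(Z)
print(Z.mean() / np.linalg.norm(C, "fro") ** 2)   # close to 1: E[Z] = ||C||_F^2
print(Z.std() / Z.mean())                          # small relative spread, shrinking with t
```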

61 With probability at least 0.97, Ak = (AR)(Vk⊤R)†Vk⊤ + E, where E is an n × d matrix with ‖E‖_F ≤ 4ε‖A − Ak‖_F. [sent-187, score-0.099]

62 Then, setting A = Ak + Aρ−k, and using the triangle inequality we get ‖E‖_F ≤ ‖Ak − AkR(Vk⊤R)†Vk⊤‖_F + ‖Aρ−kR(Vk⊤R)†Vk⊤‖_F. [sent-190, score-0.1]

63 A well-known property connects the SVD of a matrix and k-means clustering. [sent-213, score-0.099]

64 Recall Definition 1, and notice that XoptXopt⊤A is a matrix of rank at most k. [sent-214, score-0.147]

65 Since I − X̃γX̃γ⊤ is a projector matrix, it can be dropped without increasing a unitarily invariant norm. [sent-223, score-0.177]

66 In Eq. (12) we used Lemma 5, the triangle inequality, and the fact that I − X̃γX̃γ⊤ is a projector matrix and can be dropped without increasing a unitarily invariant norm. [sent-229, score-0.306]

67 To better understand this step, notice that X̃γ gives a γ-approximation to the optimal k-means clustering of the matrix AR, and any other n × k indicator matrix (for example, the matrix Xopt) satisfies ‖(I − X̃γX̃γ⊤)AR‖²_F ≤ γ min_{X∈X} ‖(I − XX⊤)AR‖²_F ≤ γ‖(I − XoptXopt⊤)AR‖²_F. [sent-235, score-0.56]

69 Figure 1: The results of our experiments after running Algorithm 1 with k = 40 on the face images collection (x-axes of the panels: number of dimensions t, 0–300). [sent-265, score-0.319]

70 5 Experiments This section describes an empirical evaluation of Algorithm 1 on a face images collection. [sent-274, score-0.093]

71 We implemented our algorithm in MatLab and compared it against other prominent dimensionality reduction techniques such as the Local Linear Embedding (LLE) algorithm and the Laplacian scores for feature selection. [sent-275, score-0.232]

72 Our empirical findings are very promising, indicating that our algorithm and implementation could be very useful in real applications involving clustering of large-scale data. [sent-278, score-0.201]

73 1 An application of Algorithm 1 on a face images collection We experiment with a face images collection. [sent-280, score-0.186]

74 This collection contains 400 face images of dimensions 64 × 64 corresponding to 40 different people. [sent-282, score-0.176]

75 These images form 40 groups each one containing exactly 10 different images of the same person. [sent-283, score-0.109]

76 After vectorizing each 2-D image and putting it as a row vector in an appropriate matrix, one can construct a 400 × 4096 image-by-pixel matrix A. [sent-284, score-0.099]

77 In this matrix, objects are the face images of the ORL collection while features are the pixel values of the images. [sent-285, score-0.093]

78 To apply Lloyd’s heuristic on A, we employ MatLab’s function kmeans with the parameter determining the maximum number of repetitions set to 30. [sent-286, score-0.1]

79 whenever we call kmeans with inputs a matrix A ∈ R400×d, with d ≥ 1, and the integer k = 40, respectively. [sent-289, score-0.198]

80 , 391-th rows of A, corresponds to picking images from the forty different groups of the available collection, since the images of every group are stored sequentially in A. [sent-293, score-0.128]
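
A sketch of this experimental setup in Python; the data below is a synthetic stand-in for the 400 × 4096 ORL image-by-pixel matrix, and reading MatLab's "repetitions = 30" as an iteration cap is our assumption, not something the paper states.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for the 400 x 4096 image-by-pixel matrix A (one vectorized 64 x 64 face per row,
# the 10 images of each of the 40 people stored consecutively).
A = rng.standard_normal((400, 64 * 64))
k = 40

# Initialization described in the text: rows 1, 11, 21, ..., 391, i.e. one image per group.
init = A[::10]

km = KMeans(n_clusters=k, init=init, n_init=1, max_iter=30)  # 30 iterations: our reading of "repetitions"
labels = km.fit_predict(A)
```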

81 We evaluate the clustering outcome from two different perspectives. [sent-294, score-0.146]

82 First, we measure and report the objective function F of the k-means clustering problem. [sent-295, score-0.146]

83 Second, we report the mis-classification accuracy of the clustering result. [sent-299, score-0.146]

84 An accuracy of 0.9, for example, implies that 90% of the objects were assigned to the correct cluster after the application of the clustering algorithm. [sent-301, score-0.169]
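
The paper does not spell out how clusters are matched to the 40 ground-truth groups when computing this accuracy; the sketch below uses the optimal one-to-one assignment (Hungarian algorithm), which is a common convention and should be treated as an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, cluster_labels, k):
    """Fraction of points whose cluster is matched to their true group,
    under the best one-to-one cluster-to-group assignment."""
    confusion = np.zeros((k, k), dtype=int)
    for g, c in zip(true_labels, cluster_labels):
        confusion[g, c] += 1
    rows, cols = linear_sum_assignment(-confusion)   # maximize the matched counts
    return confusion[rows, cols].sum() / len(true_labels)
```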

85 In the sequel, we first perform experiments by running Algorithm 1 with everything fixed but t, which denotes the dimensionality of the projected data. [sent-302, score-0.147]

86 Then, for four representative values of t, we compare Algorithm 1 with three other dimensionality reduction methods as well as with the approach of running Lloyd’s heuristic on the original high-dimensional data. [sent-303, score-0.19]

87 First, the normalized objective function F̃ is a piece-wise non-increasing function of the number of dimensions t; it is large in the first few choices

of t; then, increasing the number of dimensions t of the projected data decreases F̃ by a smaller value. [sent-352, score-0.118]

89 Finally, we report the running time T of the algorithm which includes only the clustering step. [sent-358, score-0.24]

90 Compare, for example, the one second running time that is needed to solve the k-means problem when t = 275 against the 10 seconds that are necessary to solve the problem on the high dimensional data. [sent-362, score-0.095]

91 To our benefit, in this case, the multiplication AR takes only 0. [sent-363, score-0.062]

92 We now compare our algorithm against other dimensionality reduction techniques. [sent-366, score-0.146]

93 In terms of computational complexity, for example at t = 50, the times (in seconds) needed for all five methods (only the dimension reduction step) are TSVD = 5. [sent-369, score-0.087]

94 2 A note on the mailman algorithm for matrix-matrix and matrix-vector multiplication In this section, we compare three different implementations of the third step of Algorithm 1. [sent-376, score-0.313]

95 1, the mailman algorithm is asymptotically faster than naively multiplying the two matrices A and R. [sent-378, score-0.278]

96 In this section we want to understand whether this asymptotic behavior of the mailman algorithm is indeed achieved in a practical implementation. [sent-379, score-0.251]

97 We observed that when A is a vector (n = 1), the mailman algorithm is indeed faster than (MM1) and (MM2), as is also observed in the numerical experiments of [15]. [sent-383, score-0.251]

98 On the other hand, our best implementation of the mailman algorithm for matrix-matrix operations is inferior to both (MM1) and (MM2) for any 10 ≤ n ≤ 10,000. [sent-385, score-0.213]

99 Random projection in dimensionality reduction: applications to image and text data. [sent-407, score-0.092]

100 A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. [sent-468, score-0.18]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xopt', 0.612), ('vk', 0.531), ('mailman', 0.217), ('ak', 0.162), ('clustering', 0.146), ('ar', 0.131), ('lloyd', 0.111), ('lemma', 0.104), ('matrix', 0.099), ('svd', 0.097), ('embedding', 0.096), ('dimensions', 0.083), ('matlab', 0.067), ('cr', 0.064), ('multiplication', 0.062), ('running', 0.06), ('reduction', 0.06), ('boutsidis', 0.059), ('lle', 0.059), ('unitarily', 0.059), ('factor', 0.055), ('projector', 0.052), ('dimensionality', 0.052), ('singular', 0.051), ('rr', 0.05), ('indicator', 0.05), ('rp', 0.05), ('uk', 0.049), ('ik', 0.048), ('notice', 0.048), ('face', 0.048), ('drineas', 0.048), ('dropped', 0.047), ('aa', 0.047), ('images', 0.045), ('statement', 0.045), ('sign', 0.043), ('preserved', 0.042), ('inputs', 0.041), ('xx', 0.041), ('inequality', 0.04), ('symposium', 0.04), ('projection', 0.04), ('approximability', 0.04), ('ccf', 0.04), ('rpi', 0.04), ('submultiplicativity', 0.04), ('rows', 0.038), ('clusters', 0.038), ('projections', 0.038), ('rn', 0.036), ('projected', 0.035), ('petros', 0.035), ('christos', 0.035), ('orl', 0.035), ('seconds', 0.035), ('algorithm', 0.034), ('preserves', 0.034), ('focs', 0.032), ('kmeans', 0.032), ('distorting', 0.032), ('extraction', 0.032), ('get', 0.03), ('laplacian', 0.03), ('var', 0.03), ('triangle', 0.03), ('savings', 0.028), ('guyon', 0.028), ('log', 0.028), ('dimension', 0.027), ('matrices', 0.027), ('hd', 0.027), ('preprocessing', 0.026), ('determining', 0.026), ('integer', 0.026), ('scores', 0.026), ('losing', 0.026), ('feature', 0.026), ('stoc', 0.024), ('employ', 0.024), ('cluster', 0.023), ('ls', 0.023), ('implementation', 0.021), ('rescaled', 0.021), ('foundations', 0.021), ('concludes', 0.021), ('neighbors', 0.02), ('random', 0.02), ('nition', 0.02), ('johnson', 0.019), ('zj', 0.019), ('iii', 0.019), ('theorem', 0.019), ('invariant', 0.019), ('containing', 0.019), ('min', 0.019), ('heuristic', 0.018), ('diagonal', 0.018), ('points', 0.018), ('union', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 221 nips-2010-Random Projections for $k$-means Clustering

Author: Christos Boutsidis, Anastasios Zouzias, Petros Drineas

Abstract: This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ Rn×d) can be projected into t = Ω(k/ε2) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε−2 k/ log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + ε. The projection is done by post-multiplying A with a d × t random matrix R having entries +1/√t or −1/√t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.

2 0.19877 98 nips-2010-Functional form of motion priors in human motion perception

Author: Hongjing Lu, Tungyou Lin, Alan Lee, Luminita Vese, Alan L. Yuille

Abstract: It has been speculated that the human motion system combines noisy measurements with prior expectations in an optimal, or rational, manner. The basic goal of our work is to discover experimentally which prior distribution is used. More specifically, we seek to infer the functional form of the motion prior from the performance of human subjects on motion estimation tasks. We restricted ourselves to priors which combine three terms for motion slowness, first-order smoothness, and second-order smoothness. We focused on two functional forms for prior distributions: L2-norm and L1-norm regularization corresponding to the Gaussian and Laplace distributions respectively. In our first experimental session we estimate the weights of the three terms for each functional form to maximize the fit to human performance. We then measured human performance for motion tasks and found that we obtained better fit for the L1-norm (Laplace) than for the L2-norm (Gaussian). We note that the L1-norm is also a better fit to the statistics of motion in natural environments. In addition, we found large weights for the second-order smoothness term, indicating the importance of high-order smoothness compared to slowness and lower-order smoothness. To validate our results further, we used the best fit models using the L1-norm to predict human performance in a second session with different experimental setups. Our results showed excellent agreement between human performance and model prediction – ranging from 3% to 8% for five human subjects over ten experimental conditions – and give further support that the human visual system uses an L1-norm (Laplace) prior.

3 0.1614279 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations

Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan

Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.

4 0.15645291 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization

Author: Jason Lee, Ben Recht, Nathan Srebro, Joel Tropp, Ruslan Salakhutdinov

Abstract: The max-norm was proposed as a convex matrix regularizer in [1] and was shown to be empirically superior to the trace-norm for collaborative filtering problems. Although the max-norm can be computed in polynomial time, there are currently no practical algorithms for solving large-scale optimization problems that incorporate the max-norm. The present work uses a factorization technique of Burer and Monteiro [2] to devise scalable first-order algorithms for convex programs involving the max-norm. These algorithms are applied to solve huge collaborative filtering, graph cut, and clustering problems. Empirically, the new methods outperform mature techniques from all three areas. 1

5 0.13658966 257 nips-2010-Structured Determinantal Point Processes

Author: Alex Kulesza, Ben Taskar

Abstract: We present a novel probabilistic model for distributions over sets of structures— for example, sets of sequences, trees, or graphs. The critical characteristic of our model is a preference for diversity: sets containing dissimilar structures are more likely. Our model is a marriage of structured probabilistic models, like Markov random fields and context free grammars, with determinantal point processes, which arise in quantum physics as models of particles with repulsive interactions. We extend the determinantal point process model to handle an exponentially-sized set of particles (structures) via a natural factorization of the model into parts. We show how this factorization leads to tractable algorithms for exact inference, including computing marginals, computing conditional probabilities, and sampling. Our algorithms exploit a novel polynomially-sized dual representation of determinantal point processes, and use message passing over a special semiring to compute relevant quantities. We illustrate the advantages of the model on tracking and articulated pose estimation problems. 1

6 0.13277674 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms

7 0.12860404 78 nips-2010-Error Propagation for Approximate Policy and Value Iteration

8 0.09728577 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks

9 0.088561803 134 nips-2010-LSTD with Random Projections

10 0.082639486 148 nips-2010-Learning Networks of Stochastic Differential Equations

11 0.08035551 261 nips-2010-Supervised Clustering

12 0.077278726 45 nips-2010-CUR from a Sparse Optimization Viewpoint

13 0.076560922 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings

14 0.075771518 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

15 0.070549905 124 nips-2010-Inductive Regularized Learning of Kernel Functions

16 0.06277746 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

17 0.061186008 280 nips-2010-Unsupervised Kernel Dimension Reduction

18 0.058888018 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis

19 0.057049733 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection

20 0.056997743 270 nips-2010-Tight Sample Complexity of Large-Margin Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.168), (1, 0.014), (2, 0.063), (3, 0.028), (4, 0.013), (5, -0.125), (6, 0.046), (7, 0.006), (8, 0.055), (9, -0.017), (10, 0.126), (11, -0.161), (12, 0.081), (13, -0.031), (14, 0.273), (15, -0.026), (16, 0.092), (17, 0.034), (18, -0.014), (19, 0.047), (20, 0.096), (21, 0.017), (22, -0.008), (23, -0.025), (24, -0.09), (25, 0.051), (26, 0.036), (27, -0.139), (28, 0.035), (29, -0.125), (30, 0.08), (31, -0.088), (32, -0.053), (33, 0.033), (34, 0.116), (35, -0.039), (36, 0.129), (37, -0.081), (38, -0.031), (39, -0.072), (40, 0.035), (41, -0.064), (42, 0.117), (43, -0.022), (44, 0.015), (45, -0.093), (46, -0.005), (47, -0.065), (48, -0.08), (49, -0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93414909 221 nips-2010-Random Projections for $k$-means Clustering

Author: Christos Boutsidis, Anastasios Zouzias, Petros Drineas

Abstract: This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ Rn×d) can be projected into t = Ω(k/ε2) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε−2 k/ log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + ε. The projection is done by post-multiplying A with a d × t random matrix R having entries +1/√t or −1/√t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.

2 0.65337336 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations

Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan

Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.

3 0.60082918 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms

Author: Margareta Ackerman, Shai Ben-David, David Loker

Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinberg’s famous impossibility result, while providing a simpler proof. 1

4 0.56099856 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization

Author: Jason Lee, Ben Recht, Nathan Srebro, Joel Tropp, Ruslan Salakhutdinov

Abstract: The max-norm was proposed as a convex matrix regularizer in [1] and was shown to be empirically superior to the trace-norm for collaborative filtering problems. Although the max-norm can be computed in polynomial time, there are currently no practical algorithms for solving large-scale optimization problems that incorporate the max-norm. The present work uses a factorization technique of Burer and Monteiro [2] to devise scalable first-order algorithms for convex programs involving the max-norm. These algorithms are applied to solve huge collaborative filtering, graph cut, and clustering problems. Empirically, the new methods outperform mature techniques from all three areas. 1

5 0.53438658 45 nips-2010-CUR from a Sparse Optimization Viewpoint

Author: Jacob Bien, Ya Xu, Michael W. Mahoney

Abstract: The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.

6 0.51305664 261 nips-2010-Supervised Clustering

7 0.49591568 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA

8 0.46784669 257 nips-2010-Structured Determinantal Point Processes

9 0.42780483 98 nips-2010-Functional form of motion priors in human motion perception

10 0.41356823 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection

11 0.39355385 166 nips-2010-Minimum Average Cost Clustering

12 0.38585362 134 nips-2010-LSTD with Random Projections

13 0.38561812 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning

14 0.3793011 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings

15 0.37596267 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints

16 0.37285525 223 nips-2010-Rates of convergence for the cluster tree

17 0.36931032 220 nips-2010-Random Projection Trees Revisited

18 0.36799878 78 nips-2010-Error Propagation for Approximate Policy and Value Iteration

19 0.3342545 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization

20 0.31738177 287 nips-2010-Worst-Case Linear Discriminant Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.321), (17, 0.013), (27, 0.033), (30, 0.138), (35, 0.037), (45, 0.135), (50, 0.049), (52, 0.043), (60, 0.049), (77, 0.027), (78, 0.018), (90, 0.021), (95, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91758746 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms

Author: Benjamin Miller, Nadya Bliss, Patrick J. Wolfe

Abstract: When working with network datasets, the theoretical framework of detection theory for Euclidean vector spaces no longer applies. Nevertheless, it is desirable to determine the detectability of small, anomalous graphs embedded into background networks with known statistical properties. Casting the problem of subgraph detection in a signal processing context, this article provides a framework and empirical results that elucidate a “detection theory” for graph-valued data. Its focus is the detection of anomalies in unweighted, undirected graphs through L1 properties of the eigenvectors of the graph’s so-called modularity matrix. This metric is observed to have relatively low variance for certain categories of randomly-generated graphs, and to reveal the presence of an anomalous subgraph with reasonable reliability when the anomaly is not well-correlated with stronger portions of the background graph. An analysis of subgraphs in real network datasets confirms the efficacy of this approach. 1

same-paper 2 0.89225888 221 nips-2010-Random Projections for $k$-means Clustering

Author: Christos Boutsidis, Anastasios Zouzias, Petros Drineas

Abstract: This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ Rn×d) can be projected into t = Ω(k/ε2) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε−2 k/ log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + ε. The projection is done by post-multiplying A with a d × t random matrix R having entries +1/√t or −1/√t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.

3 0.88511908 192 nips-2010-Online Classification with Specificity Constraints

Author: Andrey Bernstein, Shie Mannor, Nahum Shimkin

Abstract: We consider the online binary classification problem, where we are given m classifiers. At each stage, the classifiers map the input to the probability that the input belongs to the positive class. An online classification meta-algorithm is an algorithm that combines the outputs of the classifiers in order to attain a certain goal, without having prior knowledge on the form and statistics of the input, and without prior knowledge on the performance of the given classifiers. In this paper, we use sensitivity and specificity as the performance metrics of the meta-algorithm. In particular, our goal is to design an algorithm that satisfies the following two properties (asymptotically): (i) its average false positive rate (fp-rate) is under some given threshold; and (ii) its average true positive rate (tp-rate) is not worse than the tp-rate of the best convex combination of the m given classifiers that satisfies fprate constraint, in hindsight. We show that this problem is in fact a special case of the regret minimization problem with constraints, and therefore the above goal is not attainable. Hence, we pose a relaxed goal and propose a corresponding practical online learning meta-algorithm that attains it. In the case of two classifiers, we show that this algorithm takes a very simple form. To our best knowledge, this is the first algorithm that addresses the problem of the average tp-rate maximization under average fp-rate constraints in the online setting. 1

4 0.88002306 45 nips-2010-CUR from a Sparse Optimization Viewpoint

Author: Jacob Bien, Ya Xu, Michael W. Mahoney

Abstract: The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.

5 0.84445935 261 nips-2010-Supervised Clustering

Author: Pranjal Awasthi, Reza B. Zadeh

Abstract: Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.

6 0.83714914 284 nips-2010-Variational bounds for mixed-data factor analysis

7 0.83588582 146 nips-2010-Learning Multiple Tasks using Manifold Regularization

8 0.75230616 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints

9 0.73611653 262 nips-2010-Switched Latent Force Models for Movement Segmentation

10 0.70477813 222 nips-2010-Random Walk Approach to Regret Minimization

11 0.69873047 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization

12 0.69798219 226 nips-2010-Repeated Games against Budgeted Adversaries

13 0.68893629 117 nips-2010-Identifying graph-structured activation patterns in networks

14 0.68736058 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection

15 0.68706387 89 nips-2010-Factorized Latent Spaces with Structured Sparsity

16 0.6772067 265 nips-2010-The LASSO risk: asymptotic results and real world examples

17 0.67162645 166 nips-2010-Minimum Average Cost Clustering

18 0.66637629 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA

19 0.6660009 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

20 0.65971732 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices