nips nips2012 nips2012-133 knowledge-graph by maker-knowledge-mining

133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery


Source: pdf

Author: Ehsan Elhamifar, Guillermo Sapiro, René Vidal

Abstract: Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. [sent-3, score-1.04]

2 We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. [sent-4, score-0.707]

3 When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. [sent-5, score-1.237]

4 Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i. [sent-6, score-0.45]

5 , it does not require that the pairwise dissimilarities come from a metric. [sent-8, score-0.398]

6 1 Introduction Finding a subset of data points, called representatives or exemplars, which can efficiently describe the data collection, is an important problem in scientific data analysis with applications in machine learning, computer vision, information retrieval, etc. [sent-10, score-0.945]

7 For example, the efficiency of the NN method improves [1] by comparing test samples to K representatives as opposed to all N training samples, where typically we have K ≪ N. [sent-13, score-0.849]

8 The first group of algorithms finds representatives from data that lie in one or multiple low-dimensional subspaces and typically operates on the measurement data vectors directly [5, 6, 7, 8, 9, 10, 11]. [sent-17, score-0.898]

9 1 The second group of algorithms finds representatives by assuming that there is a natural grouping of the data collection based on an appropriate measure of similarity between pairs of data points [2, 4, 12, 13, 14]. [sent-21, score-1.011]

10 The K-medoids algorithm [2] tries to find K representatives from pairwise dissimilarities between data points. [sent-23, score-1.283]

11 The Affinity Propagation (AP) algorithm [4, 13, 14] tries to find representatives from pairwise similarities between data points by using a message passing algorithm. [sent-26, score-1.027]

12 In this paper, we propose an algorithm for selecting representatives of a data collection given dissimilarities between pairs of data points. [sent-28, score-1.315]

13 We propose a row-sparsity regularized [18, 19] trace minimization program whose objective is to find a few representatives that encode well the collection of data points according to the provided dissimilarities. [sent-29, score-1.113]

14 The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. [sent-30, score-1.04]

15 Instead of choosing the number of representatives, the regularization parameter establishes a trade-off between the number of representatives and the encoding cost of the data points via the representatives, based on the dissimilarities. [sent-31, score-1.89]

16 We obtain the range of the regularization parameter where the solution of the proposed optimization program changes from selecting one representative for all data points to selecting each data point as a representative. [sent-32, score-0.636]

17 When there is a clustering of data points, defined based on their dissimilarities, we show that, for a suitable range of the regularization parameter, the algorithm finds representatives from each cluster. [sent-33, score-0.954]

18 Moreover, data points in each cluster select representatives only from the same cluster. [sent-34, score-1.059]

19 Unlike metric-based methods, we do not require that the dissimilarities come from a metric. [sent-35, score-0.365]

20 Specifically, the dissimilarities can be asymmetric or can violate the triangle inequality. [sent-36, score-0.45]

21 2 Problem Statement We consider the problem of finding representatives from a collection of N data points. [sent-38, score-0.884]

22 Assume we are given a set of nonnegative dissimilarities {dij}, i, j = 1, . . . , N, between pairs of data points. [sent-39, score-0.365]

23 The dissimilarity dij indicates how well the data point i is suited to be a representative of the data point j. [sent-43, score-0.435]

24 More specifically, the smaller the value of dij is, the better data point i serves as a representative of data point j. [sent-44, score-0.391]

25 Such dissimilarities can be built from measured data points, e.g. [sent-45, score-0.397]

26 We can arrange the dissimilarities into a matrix D ∈ R^{N×N} whose i-th row di = [di1 di2 · · · diN] collects the dissimilarities of data point i to all N data points, i.e., the (i, j) entry of D is dij. [sent-51, score-0.386]

27 Remark 1 We do not require the dissimilarities to satisfy the triangle inequality. [sent-64, score-0.389]

28 In the experiments, we will show an example of asymmetric dissimilarities for finding representative sentences in text documents. [sent-68, score-0.64]

29 Given D, our goal is to select a subset of data points, called representatives or exemplars, that efficiently represent the collection of data points. [sent-69, score-0.959]

30 We consider an optimization program that promotes selecting a few data points that can well encode all data points via the dissimilarities. [sent-70, score-0.437]

31 To do so, we consider variables zij associated with the dissimilarities dij and denote the matrix of all variables by Z; note that dii can be set to a nonzero value, as we will show in the experiments on the text data. [sent-71, score-0.703]

37 Figure 1: Data points (blue dots) in two clusters and the representatives (red circles) found by the proposed optimization program in (4) for several values of λ, with λmax,q defined in (6). [sent-100, score-1.109]

38 We interpret zij as the probability that data point i is a representative for data point j, hence zij ∈ [0, 1]. [sent-115, score-0.473]

39 A data point j can have multiple representatives, in which case zij > 0 for all the indices i of the representatives. [sent-116, score-0.992]

40 As a result, we must have ∑_{i=1}^{N} zij = 1, which ensures that the total probability of data point j choosing all its representatives is equal to one. [sent-117, score-0.992]

41 Our goal is to select a few representatives that well encode the data collection according to the dissimilarities. [sent-118, score-0.929]

42 First, we want the representatives to encode all data points well via the dissimilarities. [sent-120, score-0.978]

43 If the data point i is chosen to be a representative of a data point j with probability zij , the cost of encoding j with i is dij zij ∈ [0, dij ]. [sent-121, score-0.773]

44 Hence, the total cost of encoding j using all its representatives is ∑_{i=1}^{N} dij zij. [sent-122, score-1.114]

45 Second, we would like to have as few representatives as possible for all the data points. [sent-123, score-0.866]

46 Having a few representatives then corresponds to having a few nonzero rows in the matrix Z. [sent-127, score-0.903]

47 Putting these two goals together, we consider the following minimization program: min over Z of ∑_{j=1}^{N} ∑_{i=1}^{N} dij zij + λ ∑_{i=1}^{N} I(‖z^i‖_q), subject to ∑_{i=1}^{N} zij = 1 for every j and zij ≥ 0, where z^i denotes the i-th row of Z and I(·) is the indicator function that equals one when its argument is nonzero. (3) [sent-128, score-0.343]

48 The first term in the objective function corresponds to the total cost of encoding all data points using the representatives and the second term corresponds to the cost associated with the number of the representatives. [sent-131, score-1.034]

49 Since the minimization in (3), which involves counting the number of nonzero rows of Z, is in general NP-hard, we consider the following standard convex relaxation: min over Z of ∑_{j=1}^{N} ∑_{i=1}^{N} dij zij + λ ∑_{i=1}^{N} ‖z^i‖_q, subject to ∑_{i=1}^{N} zij = 1 for every j and zij ≥ 0, with q ∈ {2, ∞}. (4) [sent-133, score-0.302]
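
Since (4) is a standard convex program, it can be prototyped with an off-the-shelf solver. Below is a minimal sketch in Python using CVXPY and NumPy; the paper does not prescribe an implementation, so the function name, the default solver, and the handling of q are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

def find_representatives(D, lam, q=2):
    """Sketch of the convex relaxation (4): rows of Z index candidate
    representatives, columns index data points; q is 2 or np.inf."""
    N = D.shape[0]
    Z = cp.Variable((N, N), nonneg=True)
    encoding_cost = cp.sum(cp.multiply(D, Z))         # sum_ij d_ij * z_ij
    if q == 2:
        row_sparsity = cp.sum(cp.norm(Z, 2, axis=1))  # sum_i ||z^i||_2
    else:                                             # q = infinity
        row_sparsity = cp.sum(cp.max(Z, axis=1))      # Z >= 0, so row max = ||z^i||_inf
    constraints = [cp.sum(Z, axis=0) == 1]            # each column sums to one
    problem = cp.Problem(cp.Minimize(encoding_cost + lam * row_sparsity),
                         constraints)
    problem.solve()
    return Z.value
```

Sweeping lam from small values up toward λmax,q and re-solving reproduces the qualitative behavior in Figures 1 and 2: the number of nonzero rows of Z, and hence the number of representatives, shrinks as lam grows.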

50 As we change the regularization parameter λ in (4), the number of representatives found by the algorithm changes. [sent-193, score-0.873]

51 Figures 1 and 2 illustrate the representatives and the matrix Z, respectively, for several values of λ. [sent-200, score-0.855]

52 In Section 3, we compute the range of λ for which the solution of (4) changes from a single representative to all points being representatives. [sent-201, score-0.292]

53 Once we have solved the optimization program (4), we can find the representative indices from the nonzero rows of Z. [sent-204, score-0.321]

54 We can also obtain the clustering of data points into K clusters associated with K representatives by assigning each data point to its closest representative. [sent-205, score-1.118]
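
A minimal sketch of this post-processing step is given below, assuming the optimal Z returned by the sketch above; the helper name and the numerical tolerance are illustrative, and "closest" is taken in the dissimilarity sense, i.e., each point j is assigned to the selected representative i with the smallest dij.

```python
import numpy as np

def representatives_and_clusters(Z, D, tol=1e-6):
    # Representatives are the indices of (numerically) nonzero rows of Z.
    rep_idx = np.where(np.abs(Z).max(axis=1) > tol)[0]
    # Assign each data point j to the representative i with the smallest d_ij.
    labels = rep_idx[np.argmin(D[rep_idx, :], axis=0)]
    return rep_idx, labels
```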

55 In Section 3 we show that when there is a clustering of data points based on their dissimilarities (see Definition 1), each point selects representatives from its own cluster. [sent-208, score-1.38]

56 We obtain a threshold value on λ after which the solution of (4) remains the same, selecting only one representative data point. [sent-211, score-0.282]

57 However, the two cases obtain the same representative, given by the data point for which 1ᵀdi is minimum, i.e. [sent-218, score-0.282]

58 , the data point with the smallest sum of dissimilarities. Figure 3: Data points in two clusters with dissimilarities given by pairwise Euclidean distances. [sent-220, score-0.6]

59 For λ < ∆ − max{δ1 , δ2 }, in the solution of the optimization program (4), points in each cluster are represented by representatives from the same cluster. [sent-221, score-1.136]

60 Notice also that when the dissimilarities are the Euclidean distances between the data points, the single representative corresponds to the data point closest to the geometric median of all data points, as shown in the right plot of Figure 1. [sent-223, score-0.663]

61 When the regularization parameter λ is smaller than the threshold in (6), the optimization program in (4) can find multiple representatives for each data point. [sent-224, score-1.039]

62 However, when there is a clustering of data points based on their dissimilarities (see Definition 1), we expect to select representatives from each cluster. [sent-225, score-1.384]

63 In addition, we expect that the data points in each cluster be associated with the representatives in that cluster only. [sent-226, score-1.101]

64 Next, we show that for a suitable range of the regularization parameter that depends on the intraclass and interclass dissimilarities, the probability that a point chooses representatives from other clusters is zero. [sent-237, score-1.043]

65 [Equation (8) defines the clustering threshold λc in terms of the dissimilarities within and between the clusters Ci and Cj.] Then for λ ≤ λc, the optimization program (4) finds representatives in each cluster, where the data points in every Ci select representatives only from Ci. [sent-244, score-1.939]

66 A less tight clustering threshold λ′c ≤ λc on the regularization parameter is given by λ′c ≜ min_{i≠j} min_{i′∈Ci, j′∈Cj} di′j′ − max_i max_{i′,j′∈Ci} di′j′. [sent-245, score-0.277]

67 , when the intraclass dissimilarities increase or the interclass dissimilarities decrease, the maximum possible λ for which we obtain clustering increases. [sent-250, score-0.856]

68 As an illustrative example, consider Figure 3, where data points are distributed in two clusters according to the dissimilarities given by the pairwise Euclidean distances of the data points. [sent-251, score-0.629]

70 Figure 4: Number of representatives obtained by the proposed optimization program in (4) for data points in the two clusters shown in Fig. [sent-257, score-1.281]

72 If λmax,q(Ci) denotes the threshold on λ after which we obtain only one representative from Ci, then for max_i λmax,q(Ci) ≤ λ < λc, the data points in each Ci select only one representative that is in Ci. [sent-281, score-0.507]

73 As scaling of D and λ by the same value does not change the solution of (4), we always scale dissimilarities to lie in [0, 1] by dividing the elements of D by its largest element. [sent-292, score-0.386]

74 Figures 1 and 2 show the representatives and the matrix of variables Z, respectively, for several values of the regularization parameter. [sent-299, score-0.894]

75 Notice that, as discussed before, for small values of λ, we obtain more representatives and as we increase λ, the number of representatives decreases. [sent-300, score-1.688]

76 It is important to note that, as we showed in the theoretical analysis, when the regularization parameter is sufficiently small, data points in each cluster only select representatives from that cluster (see Figure 2), i. [sent-302, score-1.168]

77 The two left-hand side plots in Figure 4 show the number of the representatives for q = 2 and q = ∞, respectively, from each of the two clusters. [sent-307, score-0.834]

78 As shown, when λ gets larger than λmax,q , we obtain only one representative from the right cluster and no representative from the left cluster, i. [sent-308, score-0.404]

79 Horizontal axis shows the percentage of the selected representatives from each class (averaged over all classes). [sent-312, score-0.849]

80 The two right-hand side plots in Figure 4 show the number of the representatives when we increase the distance between the two clusters. [sent-316, score-0.834]

81 Notice that we obtain similar results as before except that the range of λ for which we select one representative from each cluster has increased. [sent-317, score-0.294]

82 Finding representatives that correspond to the modes of the data distribution helps to significantly reduce the computational cost and memory requirements of classification algorithms, while maintaining their performance. [sent-331, score-0.883]

83 We find the representatives of the training data in each class of a dataset and use the representatives as a reduced training set to perform NN classification on the test data. [sent-333, score-1.73]

84 We obtain the representatives by taking dissimilarities to be pairwise Euclidean distances between data points. [sent-334, score-1.305]
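
The classification experiment can be sketched on top of the two helpers above (find_representatives and representatives_and_clusters are illustrative names from the earlier sketches, not the authors' code); the per-class rescaling of D to [0, 1] follows sentence 73, and a plain 1-NN rule on the reduced training set is assumed for the classifier.

```python
import numpy as np

def reduced_training_set(X_train, y_train, lam, q=2):
    """Find class-wise representatives and return them as a smaller training set."""
    reps_X, reps_y = [], []
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        # Dissimilarities: pairwise Euclidean distances within the class,
        # rescaled to lie in [0, 1].
        D = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        D = D / D.max()
        Z = find_representatives(D, lam, q=q)
        rep_idx, _ = representatives_and_clusters(Z, D)
        reps_X.append(Xc[rep_idx])
        reps_y.append(np.full(len(rep_idx), c))
    return np.vstack(reps_X), np.concatenate(reps_y)

def nn_classify(X_test, reps_X, reps_y):
    """1-NN classification of test samples against the representatives only."""
    dists = np.linalg.norm(X_test[:, None, :] - reps_X[None, :, :], axis=2)
    return reps_y[np.argmin(dists, axis=1)]
```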

85 As the results show, the classification performance using the representatives found by our proposed algorithm is close to that of using all the training samples. [sent-340, score-0.862]

86 Specifically, in the USPS dataset, using representatives found by our proposed method, which consist of only 16% of the training samples, we obtain 6. [sent-341, score-0.882]

87 In the ISOLET dataset, with representatives corresponding to less than half of the training samples, we obtain very close classification performance to using all the training samples (12. [sent-344, score-0.884]

88 Notice that when the number of representatives decreases, as expected, the classification performance also decreases. [sent-347, score-0.834]

89 Figure 7: Some frames of a political debate video, which consists of multiple shots, and the automatically computed representatives (inside red rectangles) of the whole video sequence using our proposed algorithm. [sent-349, score-0.932]

90 We set the dissimilarities to be the Euclidean distances between pairs of data points. [sent-355, score-0.418]

91 Figure 7 shows some frames of the video and the representatives computed by our method. [sent-356, score-0.885]

92 It is worth mentioning that the computed representatives do not change for λ ∈ [2. [sent-358, score-0.834]

93 Finding Representative Sentences in Text Documents. As we discussed earlier, our proposed algorithm can deal with dissimilarities that are not necessarily metric, i.e. [sent-363, score-0.378]

94 We consider now an example of asymmetric dissimilarities where we find representative sentences in the text document of this paper. [sent-366, score-0.64]

95 For each word in sentence j, if the word matches a word in sentence i, we set the encoding cost for the word to the logarithm of the number of words in sentence i, which is the cost of encoding the index of the matched word. [sent-369, score-0.484]
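
A minimal sketch of assembling such asymmetric encoding-cost dissimilarities is given below. The matched-word cost follows the description above (the logarithm of the length of the encoding sentence i); the extract does not specify the cost of an unmatched word or the exact matching rule, so a fixed placeholder penalty and exact token matching are assumed here.

```python
import numpy as np

def sentence_dissimilarities(sentences, unmatched_cost=10.0):
    """sentences: list of token lists; returns an asymmetric N x N matrix D."""
    N = len(sentences)
    D = np.zeros((N, N))
    vocab = [set(s) for s in sentences]
    for i in range(N):
        # Cost of encoding the index of a word matched in sentence i.
        index_cost = np.log(max(len(sentences[i]), 1))
        for j in range(N):
            # d_ij: total cost of encoding sentence j using sentence i.
            D[i, j] = sum(index_cost if w in vocab[i] else unmatched_cost
                          for w in sentences[j])
    return D / D.max()  # rescale to [0, 1], as done for the other experiments
```

Note that with this construction the diagonal entries dii are nonzero, consistent with the remark in sentence 31 that dii can be set to a nonzero value for the text experiments.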

96 We found that 96% of the dissimilarities are asymmetric. [sent-374, score-0.365]

97 The four representative sentences obtained by our algorithm summarize the paper as follows: –Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. [sent-375, score-1.543]

98 –We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. [sent-376, score-0.707]

99 –When there is a clustering of data points, defined based on their dissimilarities, we show that, for a suitable range of the regularization parameter, the algorithm finds representatives from each cluster. [sent-377, score-0.954]

100 –As the results show, the classification performance using the representatives found by our proposed algorithm is close to that of using all the training samples. [sent-378, score-0.862]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('representatives', 0.834), ('dissimilarities', 0.365), ('representative', 0.157), ('dij', 0.122), ('zij', 0.102), ('kmedoids', 0.1), ('points', 0.095), ('program', 0.089), ('cluster', 0.07), ('ci', 0.067), ('sentence', 0.054), ('clusters', 0.051), ('di', 0.049), ('exemplars', 0.046), ('cj', 0.044), ('dissimilarity', 0.044), ('word', 0.043), ('sentences', 0.043), ('frey', 0.041), ('encoding', 0.039), ('regularization', 0.039), ('asymmetric', 0.038), ('dueck', 0.038), ('interclass', 0.038), ('intraclass', 0.038), ('text', 0.037), ('ap', 0.036), ('nds', 0.036), ('selecting', 0.034), ('pairwise', 0.033), ('nn', 0.033), ('sapiro', 0.032), ('data', 0.032), ('usps', 0.031), ('nonzero', 0.03), ('clustering', 0.03), ('isolet', 0.03), ('max', 0.029), ('elhamifar', 0.029), ('vidal', 0.029), ('select', 0.028), ('optimization', 0.027), ('video', 0.027), ('dii', 0.026), ('qr', 0.026), ('dn', 0.025), ('prototypes', 0.025), ('frames', 0.024), ('triangle', 0.024), ('point', 0.024), ('words', 0.024), ('rand', 0.024), ('violate', 0.023), ('emphasis', 0.023), ('zn', 0.022), ('matrix', 0.021), ('classi', 0.021), ('distances', 0.021), ('solution', 0.021), ('facility', 0.02), ('nding', 0.02), ('obtain', 0.02), ('range', 0.019), ('mahoney', 0.019), ('debate', 0.019), ('nity', 0.019), ('tries', 0.019), ('rows', 0.018), ('threshold', 0.018), ('collection', 0.018), ('euclidean', 0.017), ('cost', 0.017), ('notice', 0.017), ('propagation', 0.017), ('af', 0.017), ('min', 0.017), ('kmeans', 0.017), ('classification', 0.017), ('encode', 0.017), ('finding', 0.017), ('promotes', 0.016), ('subset', 0.015), ('revealing', 0.015), ('grayscale', 0.015), ('political', 0.015), ('trace', 0.015), ('percentage', 0.015), ('synthetic', 0.015), ('training', 0.015), ('image', 0.014), ('soda', 0.014), ('nsf', 0.014), ('passing', 0.014), ('partitions', 0.014), ('put', 0.014), ('logarithm', 0.014), ('cation', 0.013), ('minimization', 0.013), ('proposed', 0.013), ('factorization', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

Author: Ehsan Elhamifar, Guillermo Sapiro, René Vidal

Abstract: Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data. 1

2 0.057933424 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

3 0.052139554 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

4 0.050312445 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

5 0.04439858 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

6 0.041682269 126 nips-2012-FastEx: Hash Clustering with Exponential Families

7 0.040444687 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

8 0.039657474 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

9 0.039107416 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

10 0.038580839 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

11 0.037133131 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

12 0.034644552 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

13 0.033585619 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

14 0.033432443 279 nips-2012-Projection Retrieval for Classification

15 0.032897137 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

16 0.030919407 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

17 0.030243848 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

18 0.029722709 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

19 0.029634776 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

20 0.029455427 208 nips-2012-Matrix reconstruction with the local max norm


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.089), (1, 0.032), (2, -0.007), (3, -0.047), (4, 0.003), (5, -0.01), (6, -0.01), (7, -0.008), (8, 0.017), (9, 0.029), (10, 0.041), (11, -0.055), (12, -0.008), (13, 0.004), (14, 0.041), (15, 0.001), (16, -0.042), (17, 0.023), (18, -0.02), (19, -0.02), (20, -0.037), (21, -0.035), (22, -0.03), (23, 0.005), (24, -0.012), (25, 0.063), (26, 0.018), (27, 0.01), (28, 0.015), (29, 0.001), (30, 0.022), (31, -0.009), (32, 0.032), (33, 0.018), (34, -0.091), (35, -0.001), (36, -0.017), (37, 0.028), (38, 0.019), (39, -0.003), (40, -0.066), (41, 0.022), (42, 0.015), (43, 0.02), (44, 0.036), (45, -0.006), (46, -0.012), (47, -0.007), (48, -0.017), (49, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88701797 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

Author: Ehsan Elhamifar, Guillermo Sapiro, René Vidal

Abstract: Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data. 1

2 0.69619572 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

3 0.69188464 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

4 0.66930497 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

5 0.65949905 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

Author: Dijun Luo, Heng Huang, Feiping Nie, Chris H. Ding

Abstract: In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due the cost or the unavailability of ground truth. In this paper, we proposed a robust approach with convex optimization to “forge” a graph: with an input of a graph, to learn a graph with higher quality. Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain global optimal solutions with theoretical guarantees. With only one non-sensitive parameter, our method is shown by experimental results to be robust and achieve higher accuracy in semi-supervised learning and clustering under various settings. As a preprocessing of graphs, our method has a wide range of potential applications machine learning and data mining.

6 0.65604085 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

7 0.56878805 196 nips-2012-Learning with Partially Absorbing Random Walks

8 0.55765122 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

9 0.54194939 125 nips-2012-Factoring nonnegative matrices with linear programs

10 0.53330964 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

11 0.50878227 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

12 0.50173789 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

13 0.49327502 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

14 0.48936158 291 nips-2012-Reducing statistical time-series problems to binary classification

15 0.4871023 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

16 0.47261658 126 nips-2012-FastEx: Hash Clustering with Exponential Families

17 0.4714404 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

18 0.47055662 26 nips-2012-A nonparametric variable clustering model

19 0.46086299 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

20 0.44826579 251 nips-2012-On Lifting the Gibbs Sampling Algorithm


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.066), (17, 0.011), (21, 0.017), (38, 0.124), (42, 0.038), (49, 0.258), (54, 0.017), (55, 0.012), (74, 0.041), (76, 0.127), (80, 0.082), (92, 0.044), (98, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.77509564 231 nips-2012-Multiple Operator-valued Kernel Learning

Author: Hachem Kadri, Alain Rakotomamonjy, Philippe Preux, Francis R. Bach

Abstract: Positive definite operator-valued kernels generalize the well-known notion of reproducing kernels, and are naturally adapted to multi-output learning situations. This paper addresses the problem of learning a finite linear combination of infinite-dimensional operator-valued kernels which are suitable for extending functional data analysis methods to nonlinear contexts. We study this problem in the case of kernel ridge regression for functional responses with an r -norm constraint on the combination coefficients (r ≥ 1). The resulting optimization problem is more involved than those of multiple scalar-valued kernel learning since operator-valued kernels pose more technical and theoretical issues. We propose a multiple operator-valued kernel learning algorithm based on solving a system of linear operator equations by using a block coordinate-descent procedure. We experimentally validate our approach on a functional regression task in the context of finger movement prediction in brain-computer interfaces. 1

same-paper 2 0.74639416 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

Author: Ehsan Elhamifar, Guillermo Sapiro, René Vidal

Abstract: Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data. 1

3 0.74532789 145 nips-2012-Gradient Weights help Nonparametric Regressors

Author: Samory Kpotufe, Abdeslam Boularias

Abstract: In regression problems over Rd , the unknown function f often varies more in some coordinates than in others. We show that weighting each coordinate i with the estimated norm of the ith derivative of f is an efficient way to significantly improve the performance of distance-based regressors, e.g. kernel and k-NN regressors. We propose a simple estimator of these derivative norms and prove its consistency. Moreover, the proposed estimator is efficiently learned online. 1

4 0.68460548 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

Author: Nicolò Cesa-bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters. 1

5 0.6335097 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

6 0.6307472 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

7 0.6305179 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

8 0.63026154 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

9 0.6296069 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

10 0.62915963 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

11 0.62663364 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

12 0.62663007 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

13 0.62654471 292 nips-2012-Regularized Off-Policy TD-Learning

14 0.62618655 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

15 0.62596285 69 nips-2012-Clustering Sparse Graphs

16 0.62547004 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

17 0.62535453 180 nips-2012-Learning Mixtures of Tree Graphical Models

18 0.62493622 294 nips-2012-Repulsive Mixtures

19 0.62363267 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models

20 0.6232273 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity