nips nips2012 nips2012-70 knowledge-graph by maker-knowledge-mining

70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk


Source: pdf

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. [sent-7, score-0.329]

2 However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. [sent-8, score-0.52]

3 Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. [sent-9, score-0.479]

4 The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. [sent-12, score-0.154]

5 Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. [sent-13, score-0.147]

6 Nonnegative Matrix Factorization (NMF) as a relaxation technique for clustering has shown remarkable progress in the past decade (see e. [sent-15, score-0.354]

7 In general, NMF finds a low-rank approximating matrix to the input nonnegative data matrix, where the most popular approximation criterion or divergence in NMF is the Least Square Error (LSE). [sent-18, score-0.323]

8 It has been shown that certain NMF variants with this divergence measure are equivalent to k-means, kernel k-means, or spectral graph cuts [7]. [sent-19, score-0.221]

9 Although popularly used, previous NMF methods based on LSE often yield mediocre performance for clustering, especially for data that lie in a curved manifold. [sent-23, score-0.152]

10 In clustering analysis, the cluster assignment is often inferred from pairwise similarities between data samples. [sent-24, score-0.588]

11 Commonly the similarities are calculated based on Euclidean distances. [sent-25, score-0.15]

12 For data in a curved manifold, only local Euclidean distances are reliable and similarities between non-neighboring samples are usually set to zero, which yields a sparse input matrix to NMF. [sent-26, score-0.367]

13 The same problem occurs for clustering nodes of a sparse network. [sent-28, score-0.317]

14 In this paper we propose a new NMF method for clustering such manifold data or sparse network data. [sent-29, score-0.385]

15 Previous NMF clustering methods based on LSE used an approximated matrix that takes only similarities within the immediate neighborhood into account. [sent-30, score-0.588]

16 Here we consider multi-step similarities between data samples using graph random walk, which has been shown to be an effective smoothing approach for finding global data structures such as clusters. [sent-31, score-0.31]

17 In NMF the smoothing can reduce the sparsity gap in the approximation and thus ease cluster analysis. [sent-32, score-0.196]

18 We name the new method NMF using graph Random walk (NMFR). [sent-33, score-0.232]

19 To overcome the above obstacles, we employ (1) a regularization technique that supplements the orthogonality constraint for better clustering, and (2) a more scalable fixed-point algorithm to calculate the product of the inverted matrix and the factorizing matrix. [sent-36, score-0.409]

20 The proposed algorithm is compared with nine other state-of-the-art clustering approaches on a large variety of real-world datasets. [sent-38, score-0.292]

21 Experimental results show that with only simple initialization NMFR performs robustly across 46 clustering tasks. [sent-39, score-0.347]

22 The new method achieves the best clustering purity for 36 of the selected datasets, and nearly the best for the rest. [sent-40, score-0.367]

23 In the remainder, we briefly review related work on clustering by NMF in Section 2. [sent-42, score-0.292]

24 2 Pairwise Clustering by NMF Cluster analysis or clustering is the task of assigning a set of data samples into groups (called clusters) so that the objects in the same cluster are more similar to each other than to those in other clusters. [sent-46, score-0.4]

25 The pairwise similarities between n data samples can be encoded in an undirected graph with adjacency matrix S ∈ R_+^{n×n}. [sent-48, score-0.354]

26 Because clustered data tend to have higher similarities within clusters and lower similarities between clusters, the similarity matrix appears nearly diagonal blockwise in visualization if we sort the rows and columns by clusters. [sent-49, score-0.608]

27 Such structure motivated approximative low-rank factorization of S by the cluster indicator matrix U ∈ {0, 1}^{n×r} for r clusters: S ≈ U U^T, where U_ik = 1 if the i-th sample is assigned to the k-th cluster and 0 otherwise. [sent-50, score-0.437]
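As a concrete illustration of this factorization (a minimal Python/NumPy sketch of our own, not taken from the paper), the indicator matrix U can be built from a label vector, and sorting samples by cluster makes U U^T visibly block-diagonal:

import numpy as np

def indicator_matrix(labels, r):
    # U in {0,1}^(n x r): U[i, k] = 1 iff sample i is assigned to cluster k
    n = len(labels)
    U = np.zeros((n, r))
    U[np.arange(n), labels] = 1.0
    return U

# Toy example: 6 samples already sorted by cluster, 2 clusters.
labels = np.array([0, 0, 0, 1, 1, 1])
U = indicator_matrix(labels, r=2)
S_approx = U @ U.T   # diagonal blockwise approximation S ≈ U U^T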

28 Moreover, clusters of balanced sizes are desired in most clustering tasks. [sent-51, score-0.356]

29 One popular choice is the combination of nonnegativity and orthogonality constraints [11, 23]. [sent-58, score-0.184]

30 In this way, each row of W has only one non-zero entry because the non-zero parts of two nonnegative and orthogonal vectors do not overlap. [sent-60, score-0.161]

31 Some other Nonnegative Matrix Factorization (NMF) relaxations exist, for example, the kernel Convex NMF [9] and its special case Projective NMF [23], as well as the relaxation by using a left-stochastic matrix [2]. [sent-61, score-0.131]

32 Furthermore, SNMF with orthogonality has a tight connection to classical objectives such as kernel k-means and normalized cuts [7, 23]. [sent-67, score-0.172]

33 Figure 1: Illustration of clustering the SEMEION handwritten digit dataset by NMF based on LSE: (a) the symmetrized 5-NN graph, (b) the correct clusters to be found, (c) the ideally assumed data that suits the least square error, (d) the smoothed input by using graph random walk. [sent-71, score-0.612]

34 3 NMF Using Graph Random Walk There is a serious drawback in previous NMF clustering methods using least square errors. [sent-76, score-0.356]

35 When minimizing ‖S − Ŝ‖²_F for given S, the approximating matrix Ŝ should be diagonal blockwise for clustering analysis, as shown in Figure 1 (b). [sent-77, score-0.516]

36 However, the similarity matrix of real-world data often differs from the ideal case. [sent-79, score-0.182]

37 In many clustering tasks, the raw features of data are usually weak. [sent-80, score-0.322]

38 The similarities calculated from such distances are thus sparse, where the similarities between non-neighboring samples are usually set to zero. [sent-82, score-0.33]

39 For example, a symmetrized K-nearest-neighbor (K-NN) graph is a commonly used similarity input. [sent-83, score-0.219]
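For concreteness, a hedged sketch of building such a symmetrized K-NN similarity input with scikit-learn (the binary connectivity weighting below is an illustrative assumption, not necessarily the paper's exact preprocessing):

import numpy as np
from sklearn.neighbors import kneighbors_graph

def symmetrized_knn(X, k=5):
    # Sparse 0/1 K-NN adjacency, symmetrized so that S[i, j] = S[j, i].
    G = kneighbors_graph(X, n_neighbors=k, mode='connectivity', include_self=False)
    return G.maximum(G.T)   # keep an edge if either direction is a K-NN link

X = np.random.rand(100, 16)   # 100 samples, 16 raw features
S = symmetrized_knn(X, k=5)   # sparse similarity matrix fed to the clustering methods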

40 Therefore, similarity matrices in real-world clustering tasks often look like Figure 1 (a), where the non-zero entries are much sparser than in the ideal case. [sent-84, score-0.38]

41 It is a mismatch to approximate a sparse similarity matrix by a dense diagonal blockwise matrix using LSE. [sent-85, score-0.363]

42 Because squared Euclidean distance is a symmetric metric, the learning objective can be dominated by the approximation to the majority of zero entries, which is undesired for finding correct cluster assignments. [sent-86, score-0.135]

43 Although various matrix factorization schemes and factorizing matrix constraints have been proposed for NMF, little research effort has been made to overcome the above mismatch. [sent-87, score-0.355]

44 In this work we present a different way to formalize NMF for clustering to reduce the sparsity gap between input and output matrices. [sent-88, score-0.292]

45 Instead of approximation to the sparse input S, which only encodes the immediate similarities between data samples, we propose to approximate a smoothed version of S which takes farther relationships between data samples into account. [sent-89, score-0.298]

46 Graph random walk is a common way to implement multi-step similarities. [sent-90, score-0.16]

47 Denote Q = D^{-1/2} S D^{-1/2} the normalized similarity matrix, where D is a diagonal matrix with D_ii = Σ_j S_ij. [sent-91, score-0.226]

48 The similarities between data nodes using j steps are given by (αQ)^j, where α ∈ (0, 1) is a decay parameter controlling the random walk extent. [sent-92, score-0.31]

49 A smoothed approximated matrix A is shown in Figure 1 (d), from which we can see that the sparsity gap to the approximating matrix is reduced. [sent-97, score-0.32]
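A minimal sketch of this smoothing step, assuming the geometric-series form A ∝ Σ_{j≥1} (αQ)^j, i.e. A = (1 − α)(I − αQ)^{-1} up to normalization, which is consistent with the fragments above but may differ in detail from the paper's exact definition; the dense inverse is used only for illustration:

import numpy as np

def smoothed_similarity(S, alpha=0.8):
    # Q = D^{-1/2} S D^{-1/2}, then random-walk smoothing (dense, illustration only).
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)
    Q = S / np.sqrt(np.outer(d, d))
    n = S.shape[0]
    A = (1.0 - alpha) * np.linalg.inv(np.eye(n) - alpha * Q)
    return A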

50 Just smoothing the input matrix by random walk is not enough, as we are presented with two difficulties. [sent-98, score-0.342]

51 First, random walk only alters the spectrum of Q, while the eigensubspaces of A and Q are the same. [sent-99, score-0.16]

52 Smoothing therefore does not change the result of clustering algorithms that operate on the eigenvectors (e.g. spectral clustering). [sent-100, score-0.292]

53 That is, smoothing by random walk by itself brings little improvement unless we impose extra constraints or regularization. [sent-105, score-0.273]

54 To improve over spectral clustering, we propose to regularize the trace maximization by an extra penalty term on W . [sent-113, score-0.133]

55 The new optimization problem for pairwise clustering is: minimize_{W ≥ 0} J(W) = −Tr(W^T A W) + λ Σ_i (Σ_k W_ik^2)^2 (3), subject to W^T W = I (4), where λ > 0 is the tradeoff parameter. [sent-114, score-0.356]

56 The extra penalty term collaborates with the orthogonality constraint for pairwise clustering, which is justified by two interpretations. [sent-116, score-0.213]

57 Given the constraints W ≥ 0 and W^T W = I, it is beneficial to push the magnitudes in W W^T off-diagonal for maximizing the correlation to similarities between different data samples. [sent-120, score-0.15]

58 Because Σ_i a_i = r is constant, minimizing Σ_i a_i^2 actually maximizes Σ_{i≠j} a_i a_j. [sent-123, score-0.105]

59 The equalization by the proposed penalty term thus well supplements the nonnegativity and orthogonality constraints and, as a whole, provides a closer relaxation to the normalized cluster indicator matrix M. [sent-126, score-0.537]

60 Algorithm 1 Large-Scale Relaxed Majorization and Minimization Algorithm for W. Input: similarity matrix S, random walk extent α ∈ (0, 1), number of clusters r, nonnegative initial guess of W. [sent-127, score-0.538]

61 until W converges. Discretize W to the cluster indicator matrix U. Output: U. [sent-132, score-0.24]

62 Introducing Lagrangian multipliers {Λ_kl} for the orthogonality constraint, we have the augmented objective L(W, Λ) = J(W) + Tr(Λ(W^T W − I)). [sent-135, score-0.099]

63 We then use the orthogonality constraint to solve for the multipliers. [sent-137, score-0.124]

64 Substituting the multipliers in the preliminary update rule, we obtain an optimization algorithm which iterates the following multiplicative update rule: W_ik^new = W_ik [(A W + 2λ W W^T V W)_ik / (2λ V W + W W^T A W)_ik]^{1/4} (5), where V is a diagonal matrix with V_ii = Σ_l W_il^2. [sent-138, score-0.381]
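A hedged sketch of one iteration of the reconstructed update rule (5); the small epsilon guard is our own addition:

import numpy as np

def nmfr_update(W, A, lam, eps=1e-12):
    # One multiplicative MM step for W; A is the (dense) smoothed similarity matrix.
    v = (W ** 2).sum(axis=1)                     # V_ii = sum_l W_il^2
    VW = v[:, None] * W                          # V @ W for diagonal V
    AW = A @ W
    num = AW + 2.0 * lam * (W @ (W.T @ VW))      # (A W + 2*lam*W W^T V W)_ik
    den = 2.0 * lam * VW + W @ (W.T @ AW)        # (2*lam*V W + W W^T A W)_ik
    return W * (num / np.maximum(den, eps)) ** 0.25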

65 Instead, the monotonicity stated in the theorem justifies that the above algorithm jointly minimizes J(W) and drives W towards the manifold defined by the orthogonality constraint. [sent-143, score-0.167]

66 After W converges, we discretize it and obtain the cluster indicator matrix U . [sent-144, score-0.24]

67 We can thus avoid expensive computation and storage of the large smoothed similarity matrix. [sent-147, score-0.126]
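One way to realize this (our own sketch, not necessarily the authors' exact fixed-point scheme) is to compute the product A W on demand from the sparse normalized similarity Q, never forming the dense smoothed matrix A, assuming A = (1 − α)(I − αQ)^{-1} as above:

import numpy as np

def smoothed_product(Q, W, alpha=0.8, n_iter=200, tol=1e-8):
    # Fixed-point iteration X <- alpha*Q X + (1 - alpha)*W converges to A @ W.
    X = W.copy()
    for _ in range(n_iter):
        X_new = alpha * (Q @ X) + (1.0 - alpha) * W
        if np.linalg.norm(X_new - X) <= tol * max(np.linalg.norm(X), 1.0):
            return X_new
        X = X_new
    return X

Only sparse matrix products with Q are needed per iteration, which keeps both memory and time roughly linear in the number of graph edges.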

68 3 Initialization Most state-of-the-art clustering methods involve non-convex optimization objectives and thus only return local optima in general. [sent-162, score-0.318]

69 To achieve a better local optimum, a clustering algorithm should start from one or more carefully chosen initial guesses. [sent-164, score-0.292]

70 Different strategies for choosing the starting point can be classified into the following levels, sorted by their computational cost: Level-0: (random-init) The starting relaxed indicator matrix is filled with randomly generated numbers. [sent-165, score-0.132]

71 Level-1: (simple-init) The starting matrix is the result of a cheap clustering method, e. [sent-166, score-0.417]

72 Level-4: (meta-co-init) Same as Level-3 except that clustering methods provide initialization for each other. [sent-175, score-0.347]

73 A preferable clustering method should achieve high accuracy with cheap initialization. [sent-182, score-0.323]

74 As we shall see, the proposed NMFR algorithm can attain satisfactory clustering accuracy with only simple initialization (Level-1). [sent-183, score-0.347]

75 We also selected two recent clustering methods beyond NMF: 1-Spectral (1Spec) [14], which uses balanced graph cut, and the Interaction Component Model (ICM) [22], which is the symmetric version of the topic model [3]. [sent-185, score-0.391]

76 The best α and the corresponding clustering result were then obtained by minimizing ‖A − bW W^T‖²_F with a suitable positive scalar b. [sent-201, score-0.292]

77 Here we set b = 2λ using the heuristic that the penalty term in the gradient can be interpreted as removal of the diagonal effect of the approximating matrix. [sent-202, score-0.103]

78 The new clustering method is not very sensitive to the choice of α for large-scale datasets. [sent-204, score-0.292]

79 That is, their starting point was the Ncut cluster indicator matrix plus a small constant 0. [sent-208, score-0.24]

80 We have compared the above methods on clustering various datasets. [sent-210, score-0.292]

81 The clustering performance is evaluated by cluster purity = (1/n) Σ_{k=1}^{r} max_{1≤l≤r} n_k^l, where n_k^l is the number of data samples in cluster k that belong to ground-truth class l. [sent-218, score-0.653]

82 A larger purity in general corresponds to a better clustering result. [sent-219, score-0.367]
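A small sketch of this purity computation (our own code, following the formula above):

import numpy as np

def cluster_purity(pred, truth):
    # purity = (1/n) * sum_k max_l n_k^l, with n_k^l = |cluster k intersect class l|
    pred, truth = np.asarray(pred), np.asarray(truth)
    total = 0
    for k in np.unique(pred):
        _, counts = np.unique(truth[pred == k], return_counts=True)
        total += counts.max()          # best-matching ground-truth class for cluster k
    return total / len(pred)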

83 Note that cluster purity can be regarded as classification accuracy if we have a few labeled data samples to remove ambiguity between clusters and classes. [sent-709, score-0.247]

84 In this sense, the resulting purities for such manifold data are even comparable to the state-of-the-art supervised classification results. [sent-710, score-0.166]

85 In addition, NMFR also brings remarkable improvement for datasets beyond digit or letter recognition, for example, the text data RCV1, 20NEWS, protein data PROTEIN and sensor data SEISMIC. [sent-712, score-0.197]

86 Even for some small datasets where NMFR is not the winner, its cluster purities are still close to the best. [sent-714, score-0.245]

87 5 Conclusions We have presented a new NMF method using random walk for clustering. [sent-715, score-0.16]

88 Extensive empirical study has shown that our method can often improve clustering accuracy remarkably given simple initialization. [sent-717, score-0.316]

89 Moreover, the smoothing brings improved clustering accuracy but at the cost of increased running time. [sent-722, score-0.409]

90 In addition, the approximated matrix could also be learnable. [sent-724, score-0.12]

91 Given a real-valued matrix B, we can always decompose it into two nonnegative parts such that B = B^+ − B^−, where B^+_ij = (|B_ij| + B_ij)/2 and B^−_ij = (|B_ij| − B_ij)/2. [sent-731, score-0.255]
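For completeness, a short sketch of this decomposition (our own code):

import numpy as np

def split_nonnegative(B):
    # B = B_plus - B_minus with both parts elementwise nonnegative.
    B = np.asarray(B, dtype=float)
    return (np.abs(B) + B) / 2.0, (np.abs(B) - B) / 2.0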

92 (Minimization) Setting ∂G(W, Λ)/∂W_ik = 0 gives W_ik^new = W_ik [(∇^−_ik + 2(WΛ^−)_ik) / (∇^+_ik + 2(WΛ^+)_ik)]^{1/4}. [sent-734, score-0.224]

93 Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. [sent-770, score-0.255]

94 On the equivalence of nonnegative matrix factorization and spectral clustering. [sent-783, score-0.426]

95 Nonnegative matrix factorization for combinatorial optimization: Spectral clustering, graph matching, and clique finding. [sent-790, score-0.255]

96 On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. [sent-803, score-0.183]

97 Symmetric nonnegative matrix factorization: Algorithms and applications to probabilistic clustering. [sent-823, score-0.255]

98 An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. [sent-828, score-0.317]

99 How the result of graph clustering methods depends on the construction of the graph. [sent-846, score-0.364]

100 Unified development of multiplicative algorithms for linear and quadratic nonnegative matrix factorization. [sent-881, score-0.31]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nmf', 0.523), ('wik', 0.344), ('clustering', 0.292), ('nmfr', 0.244), ('lse', 0.179), ('nonnegative', 0.161), ('walk', 0.16), ('similarities', 0.15), ('aw', 0.124), ('ik', 0.112), ('cluster', 0.108), ('orthogonality', 0.099), ('purities', 0.098), ('matrix', 0.094), ('bij', 0.092), ('factorization', 0.089), ('smoothing', 0.088), ('icm', 0.086), ('spectral', 0.082), ('ncut', 0.079), ('purity', 0.075), ('ding', 0.074), ('dcd', 0.073), ('iterativesolver', 0.073), ('wil', 0.073), ('graph', 0.072), ('curved', 0.068), ('manifold', 0.068), ('smoothed', 0.067), ('uik', 0.065), ('clusters', 0.064), ('nonnegativity', 0.06), ('similarity', 0.059), ('projective', 0.056), ('multiplicative', 0.055), ('initialization', 0.055), ('blockwise', 0.053), ('factorizing', 0.053), ('symmetrized', 0.053), ('protein', 0.053), ('lsd', 0.049), ('mediocre', 0.049), ('onmf', 0.049), ('seismic', 0.049), ('snmf', 0.049), ('terative', 0.049), ('zhirong', 0.049), ('scalable', 0.046), ('yang', 0.043), ('nsc', 0.043), ('semeion', 0.043), ('cut', 0.042), ('majorization', 0.04), ('optdigits', 0.04), ('supplements', 0.04), ('square', 0.039), ('approximating', 0.039), ('datasets', 0.039), ('ij', 0.039), ('diagonal', 0.038), ('indicator', 0.038), ('pairwise', 0.038), ('cuts', 0.038), ('guesses', 0.037), ('finland', 0.037), ('obstacles', 0.037), ('relaxation', 0.037), ('nl', 0.035), ('popularly', 0.035), ('normalized', 0.035), ('ai', 0.035), ('euclidean', 0.034), ('icdm', 0.033), ('cheap', 0.031), ('sij', 0.031), ('usually', 0.03), ('farther', 0.03), ('divergence', 0.029), ('tr', 0.029), ('ideal', 0.029), ('brings', 0.029), ('calculating', 0.029), ('update', 0.028), ('calculate', 0.027), ('symmetric', 0.027), ('optimization', 0.026), ('immediate', 0.026), ('text', 0.026), ('approximated', 0.026), ('penalty', 0.026), ('overcome', 0.025), ('sparse', 0.025), ('extensive', 0.025), ('extra', 0.025), ('digit', 0.025), ('drawback', 0.025), ('remarkable', 0.025), ('comprehensive', 0.025), ('constraint', 0.025), ('remarkably', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

2 0.21385989 125 nips-2012-Factoring nonnegative matrices with linear programs

Author: Ben Recht, Christopher Re, Joel Tropp, Victor Bittorf

Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes. 1

3 0.19045928 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

4 0.17888363 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

5 0.16128856 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

Author: Jinfeng Yi, Rong Jin, Shaili Jain, Tianbao Yang, Anil K. Jain

Abstract: One of the main challenges in data clustering is to define an appropriate similarity measure between two objects. Crowdclustering addresses this challenge by defining the pairwise similarity based on the manual annotations obtained through crowdsourcing. Despite its encouraging results, a key limitation of crowdclustering is that it can only cluster objects when their manual annotations are available. To address this limitation, we propose a new approach for clustering, called semi-crowdsourced clustering that effectively combines the low-level features of objects with the manual annotations of a subset of the objects obtained via crowdsourcing. The key idea is to learn an appropriate similarity measure, based on the low-level features of objects and from the manual annotations of only a small portion of the data to be clustered. One difficulty in learning the pairwise similarity measure is that there is a significant amount of noise and inter-worker variations in the manual annotations obtained via crowdsourcing. We address this difficulty by developing a metric learning algorithm based on the matrix completion method. Our empirical study with two real-world image data sets shows that the proposed algorithm outperforms state-of-the-art distance metric learning algorithms in both clustering accuracy and computational efficiency. 1

6 0.12768671 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

7 0.12581603 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

8 0.1160363 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

9 0.10199727 126 nips-2012-FastEx: Hash Clustering with Exponential Families

10 0.09416686 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

11 0.088653944 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

12 0.078831278 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

13 0.075290933 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

14 0.075191185 165 nips-2012-Iterative ranking from pair-wise comparisons

15 0.074677527 192 nips-2012-Learning the Dependency Structure of Latent Factors

16 0.071529783 26 nips-2012-A nonparametric variable clustering model

17 0.069098145 291 nips-2012-Reducing statistical time-series problems to binary classification

18 0.066039748 196 nips-2012-Learning with Partially Absorbing Random Walks

19 0.062210035 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

20 0.0614217 208 nips-2012-Matrix reconstruction with the local max norm


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.181), (1, 0.08), (2, 0.013), (3, -0.114), (4, -0.022), (5, -0.028), (6, -0.03), (7, -0.068), (8, 0.053), (9, 0.076), (10, 0.098), (11, -0.225), (12, -0.082), (13, -0.016), (14, 0.148), (15, 0.009), (16, -0.071), (17, 0.098), (18, -0.03), (19, -0.11), (20, -0.179), (21, -0.116), (22, -0.031), (23, 0.014), (24, 0.004), (25, -0.002), (26, 0.02), (27, 0.023), (28, 0.018), (29, -0.026), (30, -0.008), (31, 0.005), (32, -0.019), (33, 0.027), (34, -0.02), (35, -0.037), (36, 0.001), (37, 0.078), (38, 0.015), (39, 0.066), (40, 0.015), (41, 0.01), (42, 0.149), (43, -0.0), (44, 0.087), (45, 0.011), (46, -0.035), (47, 0.012), (48, 0.087), (49, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96300411 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

2 0.84319299 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

3 0.82267433 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

4 0.75720042 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

5 0.73721051 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

Author: Dijun Luo, Heng Huang, Feiping Nie, Chris H. Ding

Abstract: In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due the cost or the unavailability of ground truth. In this paper, we proposed a robust approach with convex optimization to “forge” a graph: with an input of a graph, to learn a graph with higher quality. Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain global optimal solutions with theoretical guarantees. With only one non-sensitive parameter, our method is shown by experimental results to be robust and achieve higher accuracy in semi-supervised learning and clustering under various settings. As a preprocessing of graphs, our method has a wide range of potential applications machine learning and data mining.

6 0.70707452 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

7 0.68980467 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

8 0.67340487 125 nips-2012-Factoring nonnegative matrices with linear programs

9 0.61647475 26 nips-2012-A nonparametric variable clustering model

10 0.60046643 196 nips-2012-Learning with Partially Absorbing Random Walks

11 0.59437042 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

12 0.5760166 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

13 0.53169292 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks

14 0.52590168 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

15 0.51151705 126 nips-2012-FastEx: Hash Clustering with Exponential Families

16 0.48185498 291 nips-2012-Reducing statistical time-series problems to binary classification

17 0.45388302 155 nips-2012-Human memory search as a random walk in a semantic network

18 0.44368592 128 nips-2012-Fast Resampling Weighted v-Statistics

19 0.44360027 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

20 0.43661785 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.086), (17, 0.016), (21, 0.01), (38, 0.133), (42, 0.025), (54, 0.394), (55, 0.013), (74, 0.051), (76, 0.101), (80, 0.043), (92, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92037332 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partiallyobservable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1

2 0.79535705 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

Author: Grégoire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas Ziehe, Alexandre Tkatchenko, Anatole V. Lilienfeld, Klaus-Robert Müller

Abstract: The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy. 1

same-paper 3 0.78483927 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

4 0.76991165 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

Author: Paul Vernaza, Drew Bagnell

Abstract: Maximum entropy (MaxEnt) modeling is a popular choice for sequence analysis in applications such as natural language processing, where the sequences are embedded in discrete, tractably-sized spaces. We consider the problem of applying MaxEnt to distributions over paths in continuous spaces of high dimensionality— a problem for which inference is generally intractable. Our main contribution is to show that this intractability can be avoided as long as the constrained features possess a certain kind of low dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to learning models of high-dimensional human motion capture data. 1

5 0.74212205 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

6 0.7240535 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

7 0.62413472 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

8 0.6235677 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

9 0.58871996 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

10 0.58748662 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

11 0.57761931 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

12 0.57243836 38 nips-2012-Algorithms for Learning Markov Field Policies

13 0.56896126 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

14 0.56132913 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

15 0.56066269 160 nips-2012-Imitation Learning by Coaching

16 0.55514449 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

17 0.55304658 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

18 0.55282056 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

19 0.54940331 348 nips-2012-Tractable Objectives for Robust Policy Optimization

20 0.54858392 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization