nips nips2005 nips2005-79 knowledge-graph by maker-knowledge-mining

79 nips-2005-Fusion of Similarity Data in Clustering


Source: pdf

Author: Tilman Lange, Joachim M. Buhmann

Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 of Computer Science, ETH Zurich, Switzerland. Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. [sent-5, score-0.213]

2 Many studies have focussed on fusing information in supervised learning contexts. [sent-6, score-0.1]

3 We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. [sent-7, score-0.423]

4 Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. [sent-8, score-0.894]

5 The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. [sent-9, score-0.422]

6 For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. [sent-10, score-0.136]

7 The experiments demonstrate the performance of the method on toy as well as real world data sets. [sent-11, score-0.102]

8 The ability of an algorithm to determine an interesting partition of the set of objects under consideration, however, heavily depends on the available information. [sent-13, score-0.129]

9 How to reasonably identify a weighting of the different information sources such that an interesting group structure can be successfully uncovered remains, however, a largely unresolved issue. [sent-15, score-0.237]

10 Different sources of information about the same objects naturally arise in many application scenarios. [sent-16, score-0.251]

11 In computer vision, for example, information sources can consist of plain intensity measurements, edge maps, the similarity to other images or even human similarity assessments. [sent-17, score-0.632]

12 In this work, we use a non-negative matrix factorization approach (nmf) to pairwise clustering of similarity data that is extended in a second step in order to incorporate a suitable weighting of multiple information sources, leading to a mixture of similarities. [sent-21, score-0.915]

13 Algorithms for nmf have recently found a lot of attention. [sent-23, score-0.188]

14 Only recently, [18] have also employed an nmf to perform clustering. [sent-25, score-0.249]

15 For the purpose of model selection, we employ a stability-based approach that has already been successfully applied to model selection problems in clustering (e. [sent-26, score-0.278]

16 Some work has been devoted to feature selection and weighting in clustering problems. [sent-30, score-0.457]

17 In [14, 10], Gaussian mixture model-based approaches to feature selection are introduced. [sent-32, score-0.166]

18 Similarity measurements represent a particularly generic form of providing input to a clustering algorithm. [sent-36, score-0.27]

19 In [1], an approach to learning the bandwidth parameter of an rbf-kernel for spectral clustering is studied. [sent-40, score-0.232]

20 The paper is organized as follows: section 2 introduces the nmf-based clustering method combined with a data-source weighting (section 3). [sent-41, score-0.379]

21 Section 4 discusses an out-of-sample extension allowing us to predict assignments and to employ the stability principle for model selection. [sent-42, score-0.261]

22 2 Clustering by Non-Negative Matrix Factorization Suppose we want to group a finite set of objects $\mathcal{O}_n := \{o_1, \dots, o_n\}$. [sent-44, score-0.129]

23 Usually, there are multiple ways of measuring the similarity between different objects. [sent-48, score-0.288]

24 Such relations give rise to similarities $s_{ij} := s(o_i, o_j)$, where we assume non-negativity $s_{ij} \ge 0$, symmetry $s_{ji} = s_{ij}$, and boundedness $s_{ij} < \infty$. [sent-49, score-0.483]

25 For n objects, we summarize the similarity data in an $n \times n$ matrix $S = (s_{ij})$, which is re-normalized to $P = S / (\mathbf{1}_n^t S \mathbf{1}_n)$, where $\mathbf{1}_n := (1, \dots, 1)^t$. [sent-50, score-0.238]
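
As a concrete illustration of this re-normalization, a minimal NumPy sketch (the function name and the non-negativity check are my own; only the formula $P = S / (\mathbf{1}^t S \mathbf{1})$ comes from the text):

```python
import numpy as np

def normalize_similarity(S):
    """Re-normalize a non-negative, symmetric similarity matrix S into
    P = S / (1^T S 1), so that P sums to one and can be read as a joint
    probability of object co-occurrence."""
    S = np.asarray(S, dtype=float)
    assert np.all(S >= 0), "similarities must be non-negative"
    return S / S.sum()
```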

26 The re-normalized similarities can be interpreted as the probability of the joint occurrence of objects i, j. [sent-54, score-0.196]

27 We aim now at finding a non-negative matrix factorization of $P \in [0, 1]^{n \times n}$ into a product $W H^t$ of the $n \times k$ matrices W and H with non-negative entries, for which additionally $\mathbf{1}_n^t W \mathbf{1}_k = 1$ and $H^t \mathbf{1}_n = \mathbf{1}_k$ hold, where k denotes the number of clusters. [sent-55, score-0.257]

28 Given a factorization of P in W and H, we can use the maximum a posteriori estimate, $\arg\max_\nu h_{i\nu} \sum_j w_{j\nu}$, to arrive at a hard assignment of objects to classes. [sent-59, score-0.289]
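
The hard-assignment rule translates directly into code; this sketch assumes W and H are the n × k factors satisfying the constraints above (the helper name is illustrative):

```python
import numpy as np

def hard_assignments(W, H):
    """Assign object i to arg max_nu  h_{i,nu} * sum_j w_{j,nu}."""
    class_mass = W.sum(axis=0)          # sum_j w_{j,nu}, one value per cluster
    scores = H * class_mass[None, :]    # h_{i,nu} * sum_j w_{j,nu}
    return scores.argmax(axis=1)        # one class label per object
```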

29 In order to obtain a factorization, we minimize the cross-entropy $C(P \,\|\, W H^t) := -\sum_{i,j} p_{ij} \log \sum_\nu w_{i\nu} h_{j\nu}$ (1), which becomes minimal iff $P = W H^t$ and is not convex in W and H together. [sent-60, score-0.268]

30 Then, by the convexity of $-\log x$, we obtain $-\log \sum_\nu w_{i\nu} h_{j\nu} \le -\sum_\nu \tau_{\nu ij} \log \frac{w_{i\nu} h_{j\nu}}{\tau_{\nu ij}}$. (Footnote 1: In the following, we represent objects by their indices.) [sent-63, score-0.16]

31 This yields the relaxed objective function $\tilde{C}(P \,\|\, W H^t) := \sum_{i,j,\nu} p_{ij} \left( -\tau_{\nu ij} \log w_{i\nu} h_{j\nu} + \tau_{\nu ij} \log \tau_{\nu ij} \right) \ge C(P \,\|\, W H^t)$ (2). [sent-65, score-0.282]

32 With this relaxation, we can employ an alternating minimization scheme for minimizing the bound on C. [sent-66, score-0.117]

33 These updates are iterated until convergence, producing a sequence of estimates $\tau_{\nu ij}^{(t)} = \frac{w_{i\nu}^{(t)} h_{j\nu}^{(t)}}{\sum_\mu w_{i\mu}^{(t)} h_{j\mu}^{(t)}}, \quad w_{i\nu}^{(t+1)} = \sum_j p_{ij} \tau_{\nu ij}^{(t)}, \quad h_{j\nu}^{(t+1)} = \frac{\sum_i p_{ij} \tau_{\nu ij}^{(t)}}{\sum_{a,b} p_{ab} \tau_{\nu ab}^{(t)}}$ (3) that converges to a local minimum of $\tilde{C}$. [sent-73, score-0.497]

34 We use the convention $h_{j\nu} = 0$ whenever $\sum_{i,j} p_{ij} \tau_{\nu ij} = 0$. [sent-75, score-0.232]
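
To make the update scheme concrete, here is a minimal NumPy sketch of the alternating minimization in equation (3); the random initialization, iteration count, and numerical-stability constant are my own choices rather than the paper's:

```python
import numpy as np

def nmf_pairwise_clustering(P, k, n_iter=200, seed=0, eps=1e-12):
    """Alternating minimization of the cross-entropy C(P || W H^t) via eq. (3).
    P: n x n joint-probability matrix (non-negative, summing to one).
    Returns W (n x k, entries summing to one) and H (n x k, columns summing to one)."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    W = rng.random((n, k)); W /= W.sum()
    H = rng.random((n, k)); H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: tau[nu, i, j] = w_{i nu} h_{j nu} / sum_mu w_{i mu} h_{j mu}
        prod = np.einsum('iv,jv->vij', W, H)                 # shape (k, n, n)
        tau = prod / (prod.sum(axis=0, keepdims=True) + eps)
        # M-step, eq. (3)
        weighted = P[None, :, :] * tau                       # p_ij * tau_{nu i j}
        W = weighted.sum(axis=2).T                           # w_{i nu} = sum_j p_ij tau_{nu i j}
        col = weighted.sum(axis=(1, 2)) + eps                # sum_{a,b} p_ab tau_{nu a b}
        H = weighted.sum(axis=1).T / col[None, :]            # h_{j nu}, columns sum to one
    return W, H
```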

35 3 Fusing Multiple Data Sources Measuring the similarity of objects in, say, L different ways results in L normalized similarity matrices $P_1, \dots, P_L$. [sent-77, score-0.608]

36 For fixed $\alpha = (\alpha_l) \in [0, 1]^L$, the aggregated and normalized similarity becomes the convex combination $\bar{P} = \sum_l \alpha_l P_l$. [sent-82, score-0.208]

37 Hence, $\bar{p}_{ij}$ is a mixture of the individual similarities $p_{ij}^{(l)}$. [sent-83, score-0.587]

38 Again, we seek a good factorization of $\bar{P}$ by minimizing the cross-entropy, which then becomes $\min_{\alpha, W, H} \, \mathbb{E}_\alpha[C(P_l \,\|\, W H^t)]$ (4), where $\mathbb{E}_\alpha[f_l] = \sum_l \alpha_l f_l$ denotes the expectation w.r.t. $\alpha$. [sent-86, score-0.244]

39 Hence, we can employ a slightly modified, nested alternating minimization approach: Given fixed α , obtain estimates W and H using the relaxation of the last section. [sent-93, score-0.229]

40 The update equations change to $w_{i\nu}^{(t+1)} = \sum_l \alpha_l \sum_j p_{ij}^{(l)} \tau_{\nu ij}^{(t)}, \quad h_{j\nu}^{(t+1)} = \frac{\sum_l \alpha_l \sum_i p_{ij}^{(l)} \tau_{\nu ij}^{(t)}}{\sum_l \alpha_l \sum_{i,j} p_{ij}^{(l)} \tau_{\nu ij}^{(t)}}$ (5). [sent-94, score-0.464]

41 Given the current estimates of W and H, we could minimize the objective in equation (4) w.r.t. $\alpha$. [sent-95, score-0.315]

42 The LP solution is very sparse since the optimal solutions for the linear program lie on the corners of the simplex in the positive orthant spanned by the constraints. [sent-101, score-0.142]

43 In particular, it lacks a means to control the sparseness of the coefficients α . [sent-102, score-0.129]

44 We, therefore, use a maximum entropy approach ([6]) for sparseness control: the entropy is upper bounded by log L and measures the sparseness of the vector α , since the lower the entropy the more peaked the distribution α can be. [sent-103, score-0.522]

45 This approach is reasonable because we actually want to combine multiple (not only identify one) information sources, but the best fit in an unsupervised problem will usually be obtained by choosing only a single source. [sent-105, score-0.244]

46 Equation (6) has an analytical solution, namely the Gibbs distribution $\alpha_l \propto \exp(-c_l / \eta)$ (7). For $\eta \to \infty$ one obtains $\alpha_l = 1/L$, while for $\eta \to 0$ the LP solution is recovered; the estimates become sparser the more the individual $c_l$ differ. [sent-111, score-0.169]
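
A small sketch of the weight update of equation (7), assuming the per-source costs $c_l$ are the cross-entropies $C(P_l \,\|\, W H^t)$ evaluated at the current factors; the max-shift for numerical stability is an implementation detail, not part of the paper:

```python
import numpy as np

def source_weights(costs, eta):
    """Gibbs / softmax weights of eq. (7): alpha_l proportional to exp(-c_l / eta).
    eta -> infinity gives uniform weights alpha_l = 1/L; eta -> 0 recovers the
    sparse LP solution (all mass on the cheapest source)."""
    c = np.asarray(costs, dtype=float)
    z = -(c - c.min()) / eta           # shift by the minimum for numerical stability
    alpha = np.exp(z)
    return alpha / alpha.sum()
```

Since the cross-entropy is linear in its first argument, running the updates of equation (3) on the mixed matrix $\bar{P} = \sum_l \alpha_l P_l$ with these weights reproduces the weighted updates of equation (5), so the nested scheme alternates between this weighting step and the factorization step.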

47 Put differently, the parameter η enables us to explore the space of different similarity combinations. [sent-112, score-0.208]

48 The extension mechanism can be seen as being in the spirit of the Nyström extension. [sent-116, score-0.167]

49 (ii) The free parameters of the approach, the number of clusters k as well as the sparseness control parameter η, can be estimated using a re-sampling-based stability assessment that relies on the ability of an algorithm to generalize to previously unseen objects. [sent-120, score-0.423]

50 Out-of-Sample Extension: Suppose we have to predict class memberships for r (= n − m in the hold-out case) additional objects in the $r \times m$ matrices $\tilde{S}_l$. [sent-121, score-0.159]

51 We can express the weighted, normalized similarity between a new object o and object i as $\hat{p}_{io} := \sum_l \alpha_l \tilde{s}_{oi}^{(l)} \, / \, \sum_{l,j} \alpha_l \tilde{s}_{oj}^{(l)}$, and compute $\hat{z}_{o\nu}$ for a new object o by $\hat{z}_{o\nu} = \sum_i z_{i\nu} \hat{p}_{io}$. [sent-125, score-0.594]

52 These values can be obtained from the originally computed $z_{i\nu}$, which are weighted according to the similarity between object i and o. [sent-127, score-0.309]
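
A small sketch of this prediction rule; the per-source new-vs-training similarity blocks, the source weights α, and the m × k matrix Z of training soft assignments $z_{i\nu}$ are assumed inputs (the exact construction of $z_{i\nu}$ from W and H is not spelled out in the extracted text, so it is treated here as given):

```python
import numpy as np

def predict_new_objects(S_new_list, alpha, Z):
    """Out-of-sample sketch: S_new_list[l] is the r x m similarity block of
    source l between new and training objects, Z the m x k training soft
    assignments.  Returns r x k scores z_hat_{o nu} = sum_i z_{i nu} p_hat_{io}."""
    S_mix = sum(a * np.asarray(S, dtype=float)
                for a, S in zip(alpha, S_new_list))          # r x m weighted mixture
    P_hat = S_mix / S_mix.sum(axis=1, keepdims=True)         # normalize per new object o
    return P_hat @ Z                                         # extend assignments to new objects
```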

53 Model Selection: The approach presented so far has two free parameters, the number of classes k and the sparseness penalty η. [sent-130, score-0.129]

54 In [9], a method for determining the number of classes has been introduced that assesses the variability of clustering solutions. [sent-131, score-0.232]

55 The assessment can be regarded as a generalization of cross-validation, as it relies on the dissimilarity of solutions generated from multiple sub-samples. [sent-133, score-0.248]

56 [Figure 1; axis-tick residue removed. Recoverable axis labels: sparsity parameter η (panel (b)) and data source index (panel (c)).] Figure 1: Results on the toy data set (1(a)): The stability assessment (1(b)) suggests the range $\eta \in \{10^1, 10^2, 5 \cdot 10^2\}$, which yields solutions matching the ground-truth. [sent-154, score-0.521]

57 Given two clustering solutions $Y, Y' \in \{1, \dots, k\}^n$, we define their disagreement as $d(Y, Y') = \min_{\pi \in S_k} \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}\{y_i \neq \pi(y_i')\}$ (9), where $S_k$ denotes the set of all permutations on sets of size k and $\mathbb{I}_A$ is the indicator function of the expression A. [sent-160, score-0.273]

58 The measure quantifies the 0-1 loss after the labels have been permuted, so that the two clustering solutions are in the best possible agreement. [sent-161, score-0.3]
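
Equation (9) can be computed without enumerating all k! permutations by casting the label matching as a maximum-weight assignment problem; the sketch below uses SciPy's Hungarian-algorithm routine, which is an implementation choice rather than something stated in the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def disagreement(y, y_prime, k):
    """Eq. (9): minimal fraction of differing labels over all permutations of the
    k cluster labels (labels assumed to be integers 0..k-1)."""
    n = len(y)
    agree = np.zeros((k, k), dtype=int)            # agree[a, b] = #{i : y_i = a, y'_i = b}
    for a, b in zip(y, y_prime):
        agree[a, b] += 1
    rows, cols = linear_sum_assignment(-agree)     # permutation maximizing total agreement
    return 1.0 - agree[rows, cols].sum() / n
```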

59 Following the approach in [9], we select the η, given a pre-specified range of admissible values, such that the average disagreement observed on B sub-samples is minimal. [sent-164, score-0.262]
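
A minimal sketch of that selection loop; the specific sub-sampling protocol and the helper `cluster_and_extend` (cluster a sub-sample, then extend the labels to all n objects via the out-of-sample rule) are illustrative assumptions that simplify the procedure of [9]:

```python
import numpy as np

def select_eta(eta_grid, n, cluster_and_extend, k, B=10, m=500, seed=0):
    """For each candidate eta, cluster B pairs of random size-m sub-samples,
    extend each solution to all n objects, and return the eta with the
    smallest average disagreement (stability principle)."""
    rng = np.random.default_rng(seed)
    avg = []
    for eta in eta_grid:
        d = []
        for _ in range(B):
            i1 = rng.choice(n, size=m, replace=False)
            i2 = rng.choice(n, size=m, replace=False)
            y1 = cluster_and_extend(i1, eta, k)    # labels for all n objects
            y2 = cluster_and_extend(i2, eta, k)
            d.append(disagreement(y1, y2, k))
        avg.append(np.mean(d))
    return eta_grid[int(np.argmin(avg))]
```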

60 In this sense, the entropy regularization mechanism guides the search for similarity combinations leading to stable grouping solutions. [sent-165, score-0.464]

61 Note that multiple minima can occur and may yield solutions emphasizing different aspects of the data. [sent-166, score-0.159]

62 5 Experimental Results and Discussion The performance of our proposal is explored by analyzing toy and real world data. [sent-167, score-0.154]

63 For the stability assessment, different values of η have been chosen from $\eta \in \{10^{-3}, 10^{-2}, 10^{-1}, \dots\}$. [sent-170, score-0.163]

64 We compared our results with NCut [15] and Lee and Seung’s two NMF algorithms [11] (which measure the approximation error of the factorization with (i) the KL divergence and (ii) the squared Frobenius norm) applied to the uniform combination of similarities. [sent-172, score-0.214]

65 Toy Experiment: Figure 1(a) depicts a data set consisting of two nested rings, where the clustering task consists of identifying each ring as a class. [sent-173, score-0.35]

66 Figure 1(b) depicts the stability assessment, where we see very small disagreements for $\eta \in \{10^1, 10^2, 5 \cdot 10^2\}$. [sent-178, score-0.215]

67 Image segmentation example: The next task consists of finding a reasonable segmentation of the images depicted in figures 2(b) and 2(a). [sent-184, score-0.249]

68 For both images, we measured localized intensity histograms and additionally computed Gabor filter responses. [sent-185, score-0.12]

69 The resulting similarity matrices have been used as input for the nmf-based data fusion. [sent-192, score-0.24]

70 For the sub-sampling, m = 500 objects have been employed. [sent-193, score-0.129]

71 Figures 3(a) (for the shell image) and 3(b) (for the bird image) show the stability curves for these examples which exhibit minima for non-trivial η resulting in non-uniform α . [sent-194, score-0.163]

72 Figure 3(c) depicts the resulting segmentation generated using α indicated by the stability assessment, while 3(d) shows a segmentation result, where α is closer to the uniform distribution but the stability score for the corresponding η is low. [sent-195, score-0.558]

73 Again, we can see that weighting the different similarity measurements has a beneficial effect, since it leads to improved results. [sent-196, score-0.361]

74 Figure 3(e) confirms that a non-trivial weighting is desirable here. [sent-198, score-0.115]

75 For the image in figure 2(b), we observe similar behavior: the stability-selected solution (fig. [sent-201, score-0.206]

76 In this example, the intensity information dominates the solution obtained on the uniformly combined similarities. [sent-204, score-0.162]

77 However, the texture information alone does not yield a sensible segmentation. [sent-205, score-0.095]

78 Only the non-trivial combination, where the influence of intensity information is decreased and that of the texture information is increased, gives rise to the desired result. [sent-206, score-0.107]

79 It is additionally noteworthy that the prediction mechanism employed works rather well: in both examples, it has been able to generalize the segmentation from m = 500 to more than 3500 objects. [sent-207, score-0.249]

80 Since several of the 3588 proteins belong to more than one category, we extracted a subset of 1579 proteins exclusively belonging to one of the three categories cell cycle + DNA processing, transcription, and protein fate. [sent-213, score-0.149]

81 Of the matrices used in [7], we employed a Gaussian kernel derived from gene expression profiles, one derived from Smith-Waterman alignments, one obtained from comparisons of protein domains, as well as two diffusion kernels derived from protein-protein interaction data. [sent-215, score-0.272]

82 Although the data is not very discriminative for the 3-class problem, the solutions generated on the data combined using the α for the most stable η lead to more than 10% improvement w.r.t. [sent-216, score-0.142]

83 The nmf results are slightly worse than those of NCut. [sent-220, score-0.188]

84 [Figure 3; axis-tick residue removed. Panels (a)–(h); recoverable axis label: sparsity parameter η.] Figure 3: Stability plots and segmentation results for the images in 2(a) and 2(b) (see text). [sent-241, score-0.22]

85 ground-truth (the disagreement measure of section 4 is used) in comparison with the solution obtained using the least stable η-parameter. [sent-242, score-0.303]

86 The latter, however, was hardly better than random guessing, with an overall disagreement of more than 0. [sent-243, score-0.218]

87 For the most stable η, we observed a disagreement around 0. [sent-247, score-0.26]

88 NCut and the two nmf methods proposed in [11] lead to rates 0. [sent-252, score-0.188]

89 Note that the clustering results are comparable with some of those obtained in [7], where the protein-protein interaction data has been used to construct a (supervised) classifier. [sent-256, score-0.265]

90 6 Conclusion This work introduced an approach to combining similarity data originating from multiple sources for grouping a set of objects. [sent-257, score-0.442]

91 Adopting a pairwise clustering perspective enables a smooth integration of multiple similarity measurements. [sent-258, score-0.524]

92 To be able to distinguish between desired and distractive information, a weighting mechanism is introduced leading to a potentially sparse convex combination of the measurements. [sent-259, score-0.273]

93 Here, an entropy constraint is employed to control the amount of sparseness actually allowed. [sent-260, score-0.278]

94 A stability-based model selection mechanism is used to select this free parameter. [sent-261, score-0.138]

95 We emphasize that this procedure represents a completely unsupervised model selection strategy. [sent-262, score-0.149]

96 The experimental evaluation on toy and real world data demonstrates that our proposal yields meaningful partitions and is able to distinguish between desired and spurious structure in data. [sent-263, score-0.22]

97 Future work will focus on (i) improving the optimization of the proposed model, (ii) the integration of additional constraints and (iii) the introduction of a cluster-specific weighting mechanism. [sent-264, score-0.115]

98 On the convexity of some divergence measures based on entropy functions. [sent-278, score-0.173]

99 Kernelbased data fusion and its application to protein function prediction in yeast. [sent-317, score-0.113]

100 Distance metric learning with application to clustering with side-information. [sent-398, score-0.232]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('wht', 0.408), ('clustering', 0.232), ('pij', 0.232), ('disagreement', 0.218), ('similarity', 0.208), ('ncut', 0.188), ('nmf', 0.188), ('stability', 0.163), ('factorization', 0.16), ('pl', 0.153), ('ij', 0.144), ('hj', 0.134), ('assessment', 0.131), ('objects', 0.129), ('sparseness', 0.129), ('sources', 0.122), ('weighting', 0.115), ('sij', 0.104), ('fusing', 0.1), ('wi', 0.099), ('pio', 0.094), ('zo', 0.094), ('cl', 0.093), ('segmentation', 0.09), ('entropy', 0.088), ('selection', 0.075), ('toy', 0.072), ('nystr', 0.07), ('protein', 0.069), ('solutions', 0.068), ('similarities', 0.067), ('object', 0.066), ('nested', 0.066), ('volume', 0.063), ('lange', 0.063), ('mechanism', 0.063), ('employed', 0.061), ('lp', 0.057), ('mixture', 0.056), ('permutation', 0.055), ('intensity', 0.054), ('divergence', 0.054), ('texture', 0.053), ('extension', 0.052), ('depicts', 0.052), ('proposal', 0.052), ('roth', 0.052), ('nips', 0.051), ('zi', 0.051), ('objective', 0.05), ('fl', 0.05), ('multiple', 0.049), ('relaxation', 0.047), ('employ', 0.046), ('image', 0.046), ('sparsity', 0.045), ('comparisons', 0.044), ('unsupervised', 0.044), ('admissible', 0.044), ('fusion', 0.044), ('solution', 0.043), ('stable', 0.042), ('yield', 0.042), ('images', 0.04), ('proteins', 0.04), ('measurements', 0.038), ('alternating', 0.037), ('iff', 0.036), ('feature', 0.035), ('additionally', 0.035), ('distinguish', 0.035), ('pairwise', 0.035), ('weighted', 0.035), ('les', 0.034), ('minimizing', 0.034), ('grouping', 0.033), ('gene', 0.033), ('ii', 0.033), ('uniformly', 0.033), ('estimates', 0.033), ('interaction', 0.033), ('ct', 0.033), ('matrices', 0.032), ('combined', 0.032), ('wj', 0.032), ('mit', 0.032), ('sk', 0.031), ('meaningful', 0.031), ('convexity', 0.031), ('consideration', 0.031), ('histograms', 0.031), ('ways', 0.031), ('program', 0.031), ('selecting', 0.03), ('matrix', 0.03), ('introduced', 0.03), ('world', 0.03), ('leading', 0.03), ('procedure', 0.03), ('reasonable', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999917 79 nips-2005-Fusion of Similarity Data in Clustering

Author: Tilman Lange, Joachim M. Buhmann

Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1

2 0.16882835 178 nips-2005-Soft Clustering on Graphs

Author: Kai Yu, Shipeng Yu, Volker Tresp

Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1

3 0.1449441 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering

Author: Rong Jin, Feng Kang, Chris H. Ding

Abstract: Spectral clustering enjoys its success in both data clustering and semisupervised learning. But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm. 1

4 0.13802166 102 nips-2005-Kernelized Infomax Clustering

Author: David Barber, Felix V. Agakov

Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets. 1

5 0.10505723 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models

Author: Purnamrita Sarkar, Andrew W. Moore

Abstract: This paper explores two aspects of social network modeling. First, we generalize a successful static model of relationships into a dynamic model that accounts for friendships drifting over time. Second, we show how to make it tractable to learn such models from data, even as the number of entities n gets large. The generalized model associates each entity with a point in p-dimensional Euclidian latent space. The points can move as time progresses but large moves in latent space are improbable. Observed links between entities are more likely if the entities are close in latent space. We show how to make such a model tractable (subquadratic in the number of entities) by the use of appropriate kernel functions for similarity in latent space; the use of low dimensional kd-trees; a new efficient dynamic adaptation of multidimensional scaling for a first pass of approximate projection of entities into latent space; and an efficient conjugate gradient update rule for non-linear local optimization in which amortized time per entity during an update is O(log n). We use both synthetic and real-world data on upto 11,000 entities which indicate linear scaling in computation time and improved performance over four alternative approaches. We also illustrate the system operating on twelve years of NIPS co-publication data. We present a detailed version of this work in [1]. 1

6 0.097310282 98 nips-2005-Infinite latent feature models and the Indian buffet process

7 0.093233109 50 nips-2005-Convex Neural Networks

8 0.090707771 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators

9 0.090600692 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis

10 0.089828223 151 nips-2005-Pattern Recognition from One Example by Chopping

11 0.087032937 192 nips-2005-The Information-Form Data Association Filter

12 0.086248629 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection

13 0.080384001 159 nips-2005-Q-Clustering

14 0.080189951 131 nips-2005-Multiple Instance Boosting for Object Detection

15 0.080097482 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions

16 0.079496756 126 nips-2005-Metric Learning by Collapsing Classes

17 0.079203047 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories

18 0.079037257 127 nips-2005-Mixture Modeling by Affinity Propagation

19 0.076324776 177 nips-2005-Size Regularized Cut for Data Clustering

20 0.074655816 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.26), (1, 0.093), (2, -0.072), (3, 0.052), (4, -0.184), (5, -0.053), (6, 0.024), (7, 0.05), (8, -0.116), (9, -0.08), (10, -0.102), (11, -0.007), (12, -0.036), (13, 0.034), (14, 0.109), (15, 0.111), (16, -0.003), (17, 0.003), (18, 0.017), (19, -0.085), (20, -0.043), (21, 0.137), (22, 0.044), (23, -0.049), (24, -0.035), (25, -0.109), (26, 0.087), (27, -0.017), (28, 0.006), (29, -0.042), (30, 0.003), (31, -0.043), (32, 0.051), (33, -0.033), (34, -0.009), (35, -0.059), (36, -0.03), (37, -0.041), (38, -0.1), (39, 0.086), (40, 0.103), (41, -0.053), (42, 0.019), (43, -0.018), (44, -0.067), (45, 0.013), (46, -0.047), (47, 0.024), (48, 0.033), (49, 0.124)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95272022 79 nips-2005-Fusion of Similarity Data in Clustering

Author: Tilman Lange, Joachim M. Buhmann

Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1

2 0.68211812 178 nips-2005-Soft Clustering on Graphs

Author: Kai Yu, Shipeng Yu, Volker Tresp

Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1

3 0.68142468 102 nips-2005-Kernelized Infomax Clustering

Author: David Barber, Felix V. Agakov

Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets. 1

4 0.64591473 84 nips-2005-Generalization in Clustering with Unobserved Features

Author: Eyal Krupka, Naftali Tishby

Abstract: We argue that when objects are characterized by many attributes, clustering them on the basis of a relatively small random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove a finite sample generalization theorems for this novel learning scheme that extends analogous results from the supervised learning setting. The scheme is demonstrated for collaborative filtering of users with movies rating as attributes. 1

5 0.62074393 159 nips-2005-Q-Clustering

Author: Mukund Narasimhan, Nebojsa Jojic, Jeff A. Bilmes

Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. It is known that optimal clusterings into k clusters for any given k in polynomial time for this criterion can be computed. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and approximate partitioning (guaranteed to be within a factor of 2 of the the optimal) for more clusters can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application specific criterion for which efficient algorithm are not known.

6 0.53087264 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering

7 0.51001537 98 nips-2005-Infinite latent feature models and the Indian buffet process

8 0.5081014 127 nips-2005-Mixture Modeling by Affinity Propagation

9 0.49949041 55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes

10 0.49423483 71 nips-2005-Fast Krylov Methods for N-Body Learning

11 0.48297757 192 nips-2005-The Information-Form Data Association Filter

12 0.47103232 151 nips-2005-Pattern Recognition from One Example by Chopping

13 0.47003418 86 nips-2005-Generalized Nonnegative Matrix Approximations with Bregman Divergences

14 0.45725203 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation

15 0.43855813 131 nips-2005-Multiple Instance Boosting for Object Detection

16 0.42216468 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions

17 0.42113319 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories

18 0.41498491 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations

19 0.40987596 126 nips-2005-Metric Learning by Collapsing Classes

20 0.40869498 177 nips-2005-Size Regularized Cut for Data Clustering


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.047), (10, 0.029), (27, 0.027), (31, 0.053), (34, 0.098), (39, 0.027), (41, 0.341), (50, 0.013), (55, 0.036), (69, 0.039), (73, 0.073), (88, 0.085), (91, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92585355 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition

Author: J. I. Alvarez-hamelin, Luca Dall'asta, Alain Barrat, Alessandro Vespignani

Abstract: We use the k-core decomposition to develop algorithms for the analysis of large scale complex networks. This decomposition, based on a recursive pruning of the least connected vertices, allows to disentangle the hierarchical structure of networks by progressively focusing on their central cores. By using this strategy we develop a general visualization algorithm that can be used to compare the structural properties of various networks and highlight their hierarchical structure. The low computational complexity of the algorithm, O(n + e), where n is the size of the network, and e is the number of edges, makes it suitable for the visualization of very large sparse networks. We show how the proposed visualization tool allows to find specific structural fingerprints of networks. 1

2 0.90987229 103 nips-2005-Kernels for gene regulatory regions

Author: Jean-philippe Vert, Robert Thurman, William S. Noble

Abstract: We describe a hierarchy of motif-based kernels for multiple alignments of biological sequences, particularly suitable to process regulatory regions of genes. The kernels incorporate progressively more information, with the most complex kernel accounting for a multiple alignment of orthologous regions, the phylogenetic tree relating the species, and the prior knowledge that relevant sequence patterns occur in conserved motif blocks. These kernels can be used in the presence of a library of known transcription factor binding sites, or de novo by iterating over all k-mers of a given length. In the latter mode, a discriminative classifier built from such a kernel not only recognizes a given class of promoter regions, but as a side effect simultaneously identifies a collection of relevant, discriminative sequence motifs. We demonstrate the utility of the motif-based multiple alignment kernels by using a collection of aligned promoter regions from five yeast species to recognize classes of cell-cycle regulated genes. Supplementary data is available at http://noble.gs.washington.edu/proj/pkernel. 1

same-paper 3 0.83687943 79 nips-2005-Fusion of Similarity Data in Clustering

Author: Tilman Lange, Joachim M. Buhmann

Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1

4 0.54284418 178 nips-2005-Soft Clustering on Graphs

Author: Kai Yu, Shipeng Yu, Volker Tresp

Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1

5 0.53918254 198 nips-2005-Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails

Author: Nebojsa Jojic, Vladimir Jojic, Christopher Meek, David Heckerman, Brendan J. Frey

Abstract: We introduce a new model of genetic diversity which summarizes a large input dataset into an epitome, a short sequence or a small set of short sequences of probability distributions capturing many overlapping subsequences from the dataset. The epitome as a representation has already been used in modeling real-valued signals, such as images and audio. The discrete sequence model we introduce in this paper targets applications in genetics, from multiple alignment to recombination and mutation inference. In our experiments, we concentrate on modeling the diversity of HIV where the epitome emerges as a natural model for producing relatively small vaccines covering a large number of immune system targets known as epitopes. Our experiments show that the epitome includes more epitopes than other vaccine designs of similar length, including cocktails of consensus strains, phylogenetic tree centers, and observed strains. We also discuss epitome designs that take into account uncertainty about Tcell cross reactivity and epitope presentation. In our experiments, we find that vaccine optimization is fairly robust to these uncertainties. 1

6 0.50032103 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape

7 0.48741451 184 nips-2005-Structured Prediction via the Extragradient Method

8 0.48305857 23 nips-2005-An Application of Markov Random Fields to Range Sensing

9 0.48198822 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction

10 0.48171771 114 nips-2005-Learning Rankings via Convex Hull Separation

11 0.4811303 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

12 0.47947633 169 nips-2005-Saliency Based on Information Maximization

13 0.47824878 105 nips-2005-Large-Scale Multiclass Transduction

14 0.47683936 127 nips-2005-Mixture Modeling by Affinity Propagation

15 0.4762066 46 nips-2005-Consensus Propagation

16 0.47265968 133 nips-2005-Nested sampling for Potts models

17 0.47222409 51 nips-2005-Correcting sample selection bias in maximum entropy density estimation

18 0.47111705 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

19 0.46972397 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

20 0.46946651 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations