nips nips2008 nips2008-55 knowledge-graph by maker-knowledge-mining

55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph


Source: pdf

Author: Deli Zhao, Xiaoou Tang

Abstract: Detecting underlying clusters from large-scale data plays a central role in machine learning research. In this paper, we tackle the problem of clustering complex data of multiple distributions and multiple scales. To this end, we develop an algorithm named Zeta l-links (Zell) which consists of two parts: Zeta merging with a similarity graph and an initial set of small clusters derived from local l-links of samples. More specifically, we propose to structurize a cluster using cycles in the associated subgraph. A new mathematical tool, Zeta function of a graph, is introduced for the integration of all cycles, leading to a structural descriptor of a cluster in determinantal form. The popularity character of a cluster is conceptualized as the global fusion of variations of such a structural descriptor by means of the leave-one-out strategy in the cluster. Zeta merging proceeds, in the hierarchical agglomerative fashion, according to the maximum incremental popularity among all pairwise clusters. Experiments on toy data clustering, imagery pattern clustering, and image segmentation show the competitive performance of Zell. The 98.1% accuracy, in the sense of the normalized mutual information (NMI), is obtained on the FRGC face data of 16028 samples and 466 facial clusters. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Detecting underlying clusters from large-scale data plays a central role in machine learning research. [sent-4, score-0.204]

2 In this paper, we tackle the problem of clustering complex data of multiple distributions and multiple scales. [sent-5, score-0.125]

3 To this end, we develop an algorithm named Zeta l-links (Zell) which consists of two parts: Zeta merging with a similarity graph and an initial set of small clusters derived from local l-links of samples. [sent-6, score-0.52]

4 More specifically, we propose to structurize a cluster using cycles in the associated subgraph. [sent-7, score-0.319]

5 A new mathematical tool, Zeta function of a graph, is introduced for the integration of all cycles, leading to a structural descriptor of a cluster in determinantal form. [sent-8, score-0.363]

6 The popularity character of a cluster is conceptualized as the global fusion of variations of such a structural descriptor by means of the leave-one-out strategy in the cluster. [sent-9, score-0.673]

7 Zeta merging proceeds, in the hierarchical agglomerative fashion, according to the maximum incremental popularity among all pairwise clusters. [sent-10, score-0.561]

8 Experiments on toy data clustering, imagery pattern clustering, and image segmentation show the competitive performance of Zell. [sent-11, score-0.182]

9 In general, algorithms for clustering fall into two categories: partitional clustering and hierarchical clustering. [sent-15, score-0.306]

10 Hierarchical clustering proceeds by merging small clusters (agglomerative) or dividing large clusters into small ones (divisive). [sent-16, score-0.729]

11 The key point of agglomerative merging is the measurement of structural affinity between clusters. [sent-17, score-0.405]

12 This paper is devoted to handling the problem of data clustering via hierarchical agglomerative merging. [sent-18, score-0.249]

13 1 Related work: The representative algorithms for partitional clustering are the traditional K-means and the more recent Affinity Propagation (AP) [1]. [sent-20, score-0.156]

14 The AP algorithm addresses this issue by initially viewing each sample as an exemplar and then competitively transmitting exemplar-to-member and member-to-exemplar messages among all samples until a group of good exemplars and their corresponding clusters emerges. [sent-22, score-0.254]

15 However, AP is computationally expensive when the number of clusters is set in advance. [sent-24, score-0.408]

16 The classic algorithms for agglomerative clustering include three kinds of linkage algorithms: single, complete, and average linkage. [sent-26, score-0.264]

17 A novel agglomerative clustering algorithm was recently developed by Ma et al. [sent-28, score-0.224]

18 The core of their algorithm is to characterize the structures of clusters by means of the variational coding length: the length of coding two arbitrary merged clusters jointly against that of coding them individually. [sent-30, score-0.555]

19 The complexity of the graph can be characterized by the collective dynamics of these basic cycles. [sent-32, score-0.128]

20 Spectral clustering algorithms are another group of popular algorithms developed in recent years. [sent-35, score-0.151]

21 The core of the algorithm is based on a new cluster descriptor that is essentially the integration of all cycles in the cluster by means of the Zeta function of the corresponding graph. [sent-45, score-0.51]

22 The Zeta function leads to a rational form of cyclic interactions of members in the cluster, where cycles are employed as primitive structures of clusters. [sent-46, score-0.21]

23 With the cluster descriptor, the popularity of a cluster is quantified as the global fusion of variations of the structural descriptor by the leave-one-out strategy in the cluster. [sent-47, score-0.716]

24 This definition of the popularity is expressible in terms of the diagonal entries of a matrix inverse. [sent-48, score-0.278]

25 The structural inference between clusters may be performed with this popularity character. [sent-49, score-0.555]

26 Based on the novel popularity character, we propose a clustering method named Zeta merging, which proceeds in the hierarchical agglomerative fashion. [sent-50, score-0.709]

27 As a subsidiary procedure for Zeta merging, we present a simple method, called l-links, to find the initial set of clusters as the input of Zeta merging. [sent-52, score-0.231]

28 The Zell algorithm is the combination of Zeta merging and l-links. [sent-53, score-0.196]

29 2 Cyclizing a cluster with Zeta function: Our ideas are mainly inspired by recent progress in the study of the collective dynamics of complex networks. [sent-55, score-0.153]

30 Experiments have validated that the stochastic states of a neuronal network are partially modulated by information that transmits cyclically [6], and that the proportion of cycles in a network is strongly relevant to the level of its complexity [7]. [sent-56, score-0.193]

31 Recent studies [8], [9] unveil that short cycles and Hamilton cycles in graphs play a critical role in the structural connectivity and community structure of a network. [sent-57, score-0.442]

32 This progress inspires us to formalize the structural complexity of a cluster by means of the cyclic interactions of its members. [sent-58, score-0.303]

33 As illustrated in Figure 1, the relationship between samples can be characterized by the combination of all cycles in the graph. [sent-59, score-0.166]

34 Thus the structural complexity of the graph can be conveyed by the collective dynamics of these basic cycles. [sent-60, score-0.265]

35 Therefore, we may characterize a cluster with the global combination of structural cycles in the associated graph. [sent-61, score-0.426]

36 To do so, we need to model cycles of different lengths and combine them together as a structural descriptor. [sent-62, score-0.276]

37 1 Modeling cycles of equal length: Here we model cycles using sum-product codes to structurize a cluster. [sent-64, score-0.363]

38 We apply the factorial codes to retrieve the structural information of a cycle $\gamma$, thus defining $\nu_\gamma = W_{p_\ell \to p_1} \prod_{k=1}^{\ell-1} W_{p_k \to p_{k+1}}$, where $W_{p_k \to p_{k+1}}$ is the $(p_k, p_{k+1})$ entry of $W$. [sent-73, score-0.132]

39 For the set $K_\ell$ of all cycles of length $\ell$, the sum-product code $\nu_\ell$ is written as $\nu_\ell = \sum_{\gamma \in K_\ell} \nu_\gamma = \sum_{\gamma \in K_\ell} W_{p_\ell \to p_1} \prod_{k=1}^{\ell-1} W_{p_k \to p_{k+1}}$. [sent-75, score-0.166]

40 The structural complexity of the graph is measured by these quantities of cycles of all different lengths, i. [sent-77, score-0.373]
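The quantities $\nu_\ell$ take a simple matrix form if cycles are read as closed walks, a reading consistent with the determinantal identity in Eq. (3) below: $\nu_\ell = \operatorname{tr}(W^\ell)$. The following sketch is illustrative, not the authors' code; it checks this by brute-force enumeration on a toy matrix.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 5
W = rng.uniform(0.0, 0.2, size=(n, n))  # toy weighted adjacency matrix
np.fill_diagonal(W, 0.0)

def nu_bruteforce(W, ell):
    """Sum of weight products over all closed walks p1 -> ... -> p_ell -> p1."""
    total = 0.0
    for walk in product(range(W.shape[0]), repeat=ell):
        w = W[walk[-1], walk[0]]              # closing edge p_ell -> p_1
        for k in range(ell - 1):
            w *= W[walk[k], walk[k + 1]]      # edge p_k -> p_{k+1}
        total += w
    return total

for ell in range(1, 5):
    assert np.isclose(nu_bruteforce(W, ell),
                      np.trace(np.linalg.matrix_power(W, ell)))
```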

41 2 Integrating cycles using Zeta function: Zeta functions are widely applied in pure mathematics as tools for performing statistics in number theory, computing algebraic invariants in algebraic geometry, and measuring complexities in dynamical systems. [sent-89, score-0.224]

42 Here ζz may be viewed as a kind of functional organization of all cycles in {K1 , . [sent-92, score-0.166]

43 3 Modeling popularity: The popularity of a group of samples means how strongly the samples in the group are perceived to form a whole cluster. [sent-104, score-0.534]

44 To model the popularity, we need to formalize the complexity descriptor of the cluster C. [sent-105, score-0.217]

45 With the cyclic integration ζz in the preceding section, the complexity of the cluster can be measured by the polynomial entropy εC of logarithm form: ∞ z εC = ln ζz = ν = − ln det(I − zW). [sent-106, score-0.275]

46 (3) =1 The entropy εC will be employed to model the popularity of C. [sent-107, score-0.241]

47 As we analyze at the beginning of Section 2, cycles are strongly associated with structural communities of a network. [sent-108, score-0.276]

48 To model the popularity, therefore, we may investigate the variational information of cycles by successively leaving one member in C out. [sent-109, score-0.166]

49 More clearly, let χC denote the popularity character of C. [sent-110, score-0.32]

50 The popularity measure χC is a structural character of C , which can be exploited to handle problems in learning such as clustering, ranking, and classification. [sent-117, score-0.43]

51 4 Structural affinity measurement: Given a set of initial clusters Cc = {C1 , . [sent-124, score-0.231]

52 , Cm } and the adjacency matrix P of the corresponding samples, the affinities between clusters or data groups can be measured via the corresponding popularity character χC . [sent-127, score-0.569]

53 Under our framework, an intuitive inference is that the two clusters that share the largest reciprocal popularity have the most consistent structures, meaning the two clusters are most relevant from the structural point of view. [sent-128, score-0.784]

54 The incremental popularity δχCi embodies the information gain of Ci after being merged with Cj . [sent-130, score-0.28]
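One plausible formalization of the incremental popularity is sketched below; the exact symmetrization and normalization in the paper may differ. Here `idx_i` and `idx_j` are integer index arrays selecting the members of Ci and Cj in the full adjacency matrix P.

```python
import numpy as np

def popularity(W, z):
    """chi_C via the diagonal of (I - zW)^{-1}, as in Section 2.3."""
    return np.mean(np.log(np.diag(np.linalg.inv(np.eye(len(W)) - z * W))))

def incremental_popularity(P, idx_i, idx_j, z):
    """delta_chi: gain in popularity of Ci and Cj after being merged."""
    u = np.concatenate([idx_i, idx_j])
    chi_u = popularity(P[np.ix_(u, u)], z)
    return ((chi_u - popularity(P[np.ix_(idx_i, idx_i)], z)) +
            (chi_u - popularity(P[np.ix_(idx_j, idx_j)], z)))
```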

55 3 Zeta merging: We will develop the clustering algorithm using the structural character χC . [sent-133, score-0.51]

56 The automatic detection of the number of clusters is also taken into consideration. [sent-134, score-0.204]

57 1 Algorithm of Zeta merging: With the criterion of structural affinity in Section 2.4, [sent-136, score-0.306]

58 it is straightforward to write the procedure of clustering in the hierarchical agglomerative way. [sent-137, score-0.249]

59 The algorithm may proceed from the pair {Ci , Cj } that has the largest incremental popularity δχCi ∪Cj , i. [sent-138, score-0.241]

60 In general, Zeta merging will proceed smoothly if the damping factor z is bounded as $0 < z < \frac{1}{\|P\|_2}$. [sent-142, score-0.196]

61 Algorithm 1 Zeta merging. Inputs: the weighted adjacency matrix P, the m initial clusters Cc = {C1 , . [sent-143, score-0.472]

62 while 1 do: if t = mc then break; end if; search two clusters Ci and Cj such that $\{C_i, C_j\} = \arg\max_{\{C_i, C_j\} \in \mathcal{C}_c} \delta\chi_{C_i \cup C_j}$; end while. [sent-148, score-0.231]

63 The merits of Zeta merging are that it is free from restrictions on data distributions and is less affected by multiple scales in the data. [sent-150, score-0.223]
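A compact, unoptimized sketch of the merging loop in Algorithm 1 under the assumptions above (the paper presumably updates pairwise affinities incrementally rather than recomputing them every step; `mc` is the target number of clusters):

```python
import numpy as np

def popularity(W, z):
    return np.mean(np.log(np.diag(np.linalg.inv(np.eye(len(W)) - z * W))))

def zeta_merging(P, clusters, z, mc=1):
    """Greedily merge the pair with the largest incremental popularity."""
    clusters = [np.asarray(c) for c in clusters]

    def dchi(ci, cj):                         # pairwise affinity, as sketched above
        u = np.concatenate([ci, cj])
        chi_u = popularity(P[np.ix_(u, u)], z)
        return (chi_u - popularity(P[np.ix_(ci, ci)], z) +
                chi_u - popularity(P[np.ix_(cj, cj)], z))

    while len(clusters) > mc:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda p: dchi(clusters[p[0]], clusters[p[1]]))
        merged = np.concatenate([clusters[a], clusters[b]])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters
```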

64 Affinity propagation in Zeta merging proceeds on the graph according to cyclic associations, requiring no specification of data distributions. [sent-151, score-0.344]

65 Moreover, the popularity character χC of each cluster is obtained from the averaged amount of variational information conveyed by εC . [sent-152, score-0.469]

66 What’s most important is that cycles rooted at each point in C globally interact with all other points. [sent-154, score-0.166]

67 Thus, the global descriptor εC and the popularity character χC are not sensitive to the local data scale at each point, leading to the robustness of Zeta merging against the variation of data scales. [sent-155, score-0.612]

68 2 Number of clusters in Zeta merging: In some circumstances, it is necessary to automatically detect the number of underlying clusters from given data. [sent-157, score-0.604]

69 This functionality can be reasonably realized in Zeta merging if each cluster corresponds to a diagonal block structure in P, up to some permutations. [sent-158, score-0.318]

70 The principle is that the minimum δχCi ∪Cj will be zero when a set of separable clusters emerges, behind which is the mathematical principle that inverting a block-diagonal matrix is equivalent to inverting the matrices on the diagonal blocks. [sent-159, score-0.231]

71 Then the number of clusters corresponds to the step at the jumping point. [sent-161, score-0.25]
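A small sketch of this stopping rule: record the minimum incremental popularity before each merge and cut at the largest jump of the curve. The paper additionally restricts attention to the stable part of the curve; that refinement is omitted here.

```python
import numpy as np

def jump_step(min_dchi_per_step):
    """Merge step at the largest change of the min-delta_chi curve; with m
    initial clusters, the detected number of clusters is m minus this step."""
    curve = np.asarray(min_dchi_per_step, dtype=float)
    return int(np.argmax(np.abs(np.diff(curve)))) + 1
```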

72 4 The Zell algorithm: An issue arising in Zeta merging is the determination of the initial set of clusters. [sent-162, score-0.223]

73 The same cluster is denoted by markers with the same edge color. [sent-167, score-0.147]

74 Then from yi , messages are passed among Si in the sense of minimum distances (or general dissimilarities), thus locally forming an acyclic directed subgraph at each point. [sent-173, score-0.139]

75 for i from 1 to mo do: search the 2K nearest neighbors of yi and form Si . [sent-192, score-0.153]
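The extract abbreviates the l-links procedure, so the following is a loose sketch under an assumed reading: each sample links toward its nearest neighbor inside its 2K-neighborhood S_i, and the weakly connected components of the resulting directed graph serve as the initial small clusters. The function name and parameters are illustrative, not the paper's.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def l_links(Y, K=4):
    """Y: (m, d) sample matrix; returns a list of index arrays (seed clusters)."""
    m = len(Y)
    D = cdist(Y, Y)
    np.fill_diagonal(D, np.inf)
    S = np.argsort(D, axis=1)[:, :2 * K]   # S_i: the 2K nearest neighbors
    nn = S[:, 0]                           # link y_i -> its nearest neighbor
    G = csr_matrix((np.ones(m), (np.arange(m), nn)), shape=(m, m))
    k, labels = connected_components(G, directed=True, connection='weak')
    return [np.flatnonzero(labels == c) for c in range(k)]
```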

76 2 Graph construction: The directional connectivity of l-links leads us to build a directed graph in which each vertex yi points to its K nearest neighbors. [sent-199, score-0.166]

77 Algorithm 3 Directed graph construction. Inputs: the sample set Cy , the number K of nearest neighbors, and a free parameter a ∈ (0, 1]. [sent-203, score-0.132]

78 2. Estimate the parameter σ by $\sigma^2 = -\frac{1}{m_o \ln a} \sum_{y_i \in C_y} \sum_{y_j \in S_i^3} [\mathrm{distance}(y_i, y_j)]^2$. [sent-204, score-0.164]
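A sketch of Algorithm 3 under the stated inputs. The reconstruction assumes Gaussian edge weights $w_{ij} = \exp(-\mathrm{distance}(y_i, y_j)^2 / \sigma^2)$; with σ² estimated as above, the mean squared distance to the three nearest neighbors is mapped to the weight a ∈ (0, 1].

```python
import numpy as np
from scipy.spatial.distance import cdist

def directed_knn_graph(Y, K=8, a=0.95):
    """Each vertex points to its K nearest neighbors with Gaussian weights."""
    m = len(Y)
    D = cdist(Y, Y)
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)
    # sigma^2 = -(1 / (m ln a)) * sum of squared 3-NN distances (ln a < 0)
    d3 = np.take_along_axis(D, order[:, :3], axis=1)
    sigma2 = -np.sum(d3 ** 2) / (m * np.log(a))
    W = np.zeros((m, m))
    rows = np.repeat(np.arange(m), K)
    cols = order[:, :K].ravel()               # directed edges y_i -> K-NN(y_i)
    W[rows, cols] = np.exp(-D[rows, cols] ** 2 / sigma2)
    return W
```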

79 3 Our algorithm for data clustering in effect performs Zeta merging on the initial set of small clusters derived from l-links. [sent-210, score-0.552]

80 The differences between the sizes of different clusters are large. [sent-219, score-0.204]

81 Zeta merging may also be combined with K-means and Affinity Propagation for clustering. [sent-230, score-0.196]

82 So, they can be employed to generate initial clusters as the input of Zeta merging. [sent-232, score-0.231]

83 5 Experiment: Experiments are conducted on clustering toy data, hand-written digits, and cropped faces from captured images, and on segmenting images, to test the performance of Zell. [sent-233, score-0.226]

84 For fair comparison, we run Ncuts on a graph whose parameters are set the same as those of the graph used by Zell. [sent-241, score-0.14]

85 As shown in Figures 3 (b) and (c), the Zell algorithm accurately detects the underlying clusters. [sent-247, score-0.204]

86 Particularly, Zell is capable of simultaneously differentiating the cluster with five members and the cluster with 1500 members. [sent-248, score-0.244]

87 Figures 3 (d) and (e) show the curves of minimum variational δχ (for the data in Figure 3 (c)) where the number of clusters is determined at the largest gap of the curve in the stable part. [sent-250, score-0.231]

88 The scheme of Section 3.2 fails to automatically detect the number of clusters for the data in Figure 3 (a), because the corresponding P matrix has no clear diagonal block structures. [sent-252, score-0.204]

89 The last row shows the numbers of clusters automatically detected by Zell on the five data sets. [sent-256, score-0.234]

90 2 On imagery data: The imagery patterns we adopt are the hand-written digits in the MNIST and USPS databases and the facial images in the ORL and FRGC (Face Recognition Grand Challenge, http://www. [sent-287, score-0.205]

91 Such persons are selected as another group of clusters if the number of faces for each person is no less than forty. [sent-297, score-0.23]

92 In particular, the performance of Zell is encouraging on the FRGC data set which has the largest numbers of clusters and samples. [sent-304, score-0.204]

93 3 Image segmentation: We show several examples of the application of Zell to image segmentation on the Berkeley segmentation database. [sent-313, score-0.225]

94 The key point of the algorithm is the integration of structural cycles via the Zeta function of a graph. [sent-321, score-0.308]

95 A popularity character measuring the compactness of a cluster is defined via the Zeta function, on which the core of Zell for agglomerative clustering is based. [sent-322, score-0.666]

96 Figure 4: Image segmentation by Zell from the Berkeley segmentation database. [sent-337, score-0.134]

97 An approach for finding initial small clusters is presented, which is based on the merging of local links among samples. [sent-338, score-0.427]

98 Experimental results on toy data, hand-written digits, facial images, and image segmentation show the competitive performance of Zell. [sent-340, score-0.169]

99 (2007) On short cycles and their role in network structure. [sent-391, score-0.166]

100 (2005) Loops of any size and Hamilton cycles in random scale-free networks. [sent-396, score-0.166]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('zeta', 0.557), ('zell', 0.356), ('ci', 0.252), ('popularity', 0.241), ('cj', 0.234), ('clusters', 0.204), ('merging', 0.196), ('cycles', 0.166), ('clustering', 0.125), ('cluster', 0.122), ('structural', 0.11), ('nmi', 0.108), ('frgc', 0.108), ('agglomerative', 0.099), ('ncuts', 0.093), ('cc', 0.09), ('zw', 0.081), ('ap', 0.08), ('character', 0.079), ('orl', 0.077), ('graph', 0.07), ('nity', 0.069), ('descriptor', 0.068), ('segmentation', 0.067), ('af', 0.066), ('usps', 0.063), ('mo', 0.062), ('cy', 0.054), ('imagery', 0.052), ('mnist', 0.051), ('jumping', 0.046), ('sfrgc', 0.046), ('wpk', 0.046), ('adjacency', 0.045), ('cyclic', 0.044), ('linkage', 0.04), ('yj', 0.04), ('toy', 0.039), ('directed', 0.039), ('merged', 0.039), ('facial', 0.039), ('pk', 0.039), ('digits', 0.039), ('seed', 0.037), ('diagonals', 0.037), ('coding', 0.036), ('attentional', 0.035), ('nearest', 0.035), ('neighbors', 0.034), ('propagation', 0.034), ('si', 0.034), ('integration', 0.032), ('pi', 0.032), ('collective', 0.031), ('cyclizing', 0.031), ('determinantal', 0.031), ('partitional', 0.031), ('pref', 0.031), ('savchenko', 0.031), ('structurize', 0.031), ('ymo', 0.031), ('digit', 0.031), ('detected', 0.03), ('hong', 0.03), ('algebraic', 0.029), ('manifolds', 0.028), ('global', 0.028), ('free', 0.027), ('initial', 0.027), ('distances', 0.027), ('conveyed', 0.027), ('linkages', 0.027), ('mc', 0.027), ('minimum', 0.027), ('complexity', 0.027), ('face', 0.026), ('group', 0.026), ('quanti', 0.025), ('ln', 0.025), ('wright', 0.025), ('reciprocal', 0.025), ('hamilton', 0.025), ('lossy', 0.025), ('markers', 0.025), ('hierarchical', 0.025), ('variations', 0.025), ('image', 0.024), ('ep', 0.024), ('messages', 0.024), ('cuts', 0.024), ('newman', 0.023), ('xp', 0.023), ('spectral', 0.023), ('images', 0.023), ('named', 0.023), ('normalized', 0.023), ('yi', 0.022), ('cycle', 0.022), ('detecting', 0.022), ('passing', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph

Author: Deli Zhao, Xiaoou Tang

Abstract: Detecting underlying clusters from large-scale data plays a central role in machine learning research. In this paper, we tackle the problem of clustering complex data of multiple distributions and multiple scales. To this end, we develop an algorithm named Zeta l-links (Zell) which consists of two parts: Zeta merging with a similarity graph and an initial set of small clusters derived from local l-links of samples. More specifically, we propose to structurize a cluster using cycles in the associated subgraph. A new mathematical tool, Zeta function of a graph, is introduced for the integration of all cycles, leading to a structural descriptor of a cluster in determinantal form. The popularity character of a cluster is conceptualized as the global fusion of variations of such a structural descriptor by means of the leave-one-out strategy in the cluster. Zeta merging proceeds, in the hierarchical agglomerative fashion, according to the maximum incremental popularity among all pairwise clusters. Experiments on toy data clustering, imagery pattern clustering, and image segmentation show the competitive performance of Zell. The 98.1% accuracy, in the sense of the normalized mutual information (NMI), is obtained on the FRGC face data of 16028 samples and 466 facial clusters. 1

2 0.11565187 117 nips-2008-Learning Taxonomies by Dependence Maximization

Author: Matthew Blaschko, Arthur Gretton

Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data. 1

3 0.11333024 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

Author: Lukas Kroc, Ashish Sabharwal, Bart Selman

Abstract: We show that an important and computationally challenging solution space feature of the graph coloring problem (COL), namely the number of clusters of solutions, can be accurately estimated by a technique very similar to one for counting the number of solutions. This cluster counting approach can be naturally written in terms of a new factor graph derived from the factor graph representing the COL instance. Using a variant of the Belief Propagation inference framework, we can efficiently approximate cluster counts in random COL problems over a large range of graph densities. We illustrate the algorithm on instances with up to 100, 000 vertices. Moreover, we supply a methodology for computing the number of clusters exactly using advanced techniques from the knowledge compilation literature. This methodology scales up to several hundred variables. 1

4 0.098651305 48 nips-2008-Clustering via LP-based Stabilities

Author: Nikos Komodakis, Nikos Paragios, Georgios Tziritas

Abstract: A novel center-based clustering algorithm is proposed in this paper. We first formulate clustering as an NP-hard linear integer program and we then use linear programming and the duality theory to derive the solution of this optimization problem. This leads to an efficient and very general algorithm, which works in the dual domain, and can cluster data based on an arbitrary set of distances. Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, can automatically determine the number of clusters, and can also provide online optimality bounds about the quality of the estimated clustering solutions. To deal with the most critical issue in a centerbased clustering algorithm (selection of cluster centers), we also introduce the notion of stability of a cluster center, which is a well defined LP-based quantity that plays a key role to our algorithm’s success. Furthermore, we also introduce, what we call, the margins (another key ingredient in our algorithm), which can be roughly thought of as dual counterparts to stabilities and allow us to obtain computationally efficient approximations to the latter. Promising experimental results demonstrate the potentials of our method.

5 0.096752428 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

Author: Tae-kyun Kim, Roberto Cipolla

Abstract: We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way that maximises discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multimodalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks. 1

6 0.086062238 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation

7 0.081506968 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering

8 0.07571099 218 nips-2008-Spectral Clustering with Perturbed Data

9 0.067787431 107 nips-2008-Influence of graph construction on graph-based clustering measures

10 0.066715322 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations

11 0.065066181 143 nips-2008-Multi-label Multiple Kernel Learning

12 0.061175689 228 nips-2008-Support Vector Machines with a Reject Option

13 0.058094088 62 nips-2008-Differentiable Sparse Coding

14 0.057024106 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

15 0.056666423 29 nips-2008-Automatic online tuning for fast Gaussian summation

16 0.056549072 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

17 0.052695729 44 nips-2008-Characteristic Kernels on Groups and Semigroups

18 0.050081395 194 nips-2008-Regularized Learning with Networks of Features

19 0.04889353 202 nips-2008-Robust Regression and Lasso

20 0.048726141 165 nips-2008-On the Reliability of Clustering Stability in the Large Sample Regime


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.152), (1, -0.06), (2, 0.018), (3, 0.035), (4, 0.051), (5, -0.073), (6, 0.032), (7, -0.096), (8, 0.028), (9, -0.147), (10, 0.05), (11, -0.018), (12, -0.007), (13, 0.017), (14, -0.1), (15, 0.081), (16, -0.154), (17, -0.024), (18, 0.082), (19, -0.044), (20, 0.032), (21, 0.066), (22, -0.024), (23, 0.13), (24, 0.026), (25, -0.044), (26, 0.012), (27, -0.027), (28, -0.012), (29, 0.009), (30, 0.017), (31, -0.017), (32, 0.1), (33, 0.008), (34, 0.057), (35, 0.124), (36, -0.02), (37, 0.013), (38, 0.052), (39, 0.039), (40, 0.04), (41, -0.11), (42, -0.028), (43, 0.079), (44, -0.055), (45, 0.082), (46, 0.051), (47, 0.036), (48, 0.091), (49, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96366757 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph

Author: Deli Zhao, Xiaoou Tang

Abstract: Detecting underlying clusters from large-scale data plays a central role in machine learning research. In this paper, we tackle the problem of clustering complex data of multiple distributions and multiple scales. To this end, we develop an algorithm named Zeta l-links (Zell) which consists of two parts: Zeta merging with a similarity graph and an initial set of small clusters derived from local l-links of samples. More specifically, we propose to structurize a cluster using cycles in the associated subgraph. A new mathematical tool, Zeta function of a graph, is introduced for the integration of all cycles, leading to a structural descriptor of a cluster in determinantal form. The popularity character of a cluster is conceptualized as the global fusion of variations of such a structural descriptor by means of the leave-one-out strategy in the cluster. Zeta merging proceeds, in the hierarchical agglomerative fashion, according to the maximum incremental popularity among all pairwise clusters. Experiments on toy data clustering, imagery pattern clustering, and image segmentation show the competitive performance of Zell. The 98.1% accuracy, in the sense of the normalized mutual information (NMI), is obtained on the FRGC face data of 16028 samples and 466 facial clusters. 1

2 0.80450702 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering

Author: Shai Ben-David, Margareta Ackerman

Abstract: Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kleinberg, ([1]) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in [1]. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg’s axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg’s axioms while retaining consistency. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time. 1

3 0.78392059 117 nips-2008-Learning Taxonomies by Dependence Maximization

Author: Matthew Blaschko, Arthur Gretton

Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data. 1

4 0.66977084 48 nips-2008-Clustering via LP-based Stabilities

Author: Nikos Komodakis, Nikos Paragios, Georgios Tziritas

Abstract: A novel center-based clustering algorithm is proposed in this paper. We first formulate clustering as an NP-hard linear integer program and we then use linear programming and the duality theory to derive the solution of this optimization problem. This leads to an efficient and very general algorithm, which works in the dual domain, and can cluster data based on an arbitrary set of distances. Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, can automatically determine the number of clusters, and can also provide online optimality bounds about the quality of the estimated clustering solutions. To deal with the most critical issue in a centerbased clustering algorithm (selection of cluster centers), we also introduce the notion of stability of a cluster center, which is a well defined LP-based quantity that plays a key role to our algorithm’s success. Furthermore, we also introduce, what we call, the margins (another key ingredient in our algorithm), which can be roughly thought of as dual counterparts to stabilities and allow us to obtain computationally efficient approximations to the latter. Promising experimental results demonstrate the potentials of our method.

5 0.62362885 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

Author: Lukas Kroc, Ashish Sabharwal, Bart Selman

Abstract: We show that an important and computationally challenging solution space feature of the graph coloring problem (COL), namely the number of clusters of solutions, can be accurately estimated by a technique very similar to one for counting the number of solutions. This cluster counting approach can be naturally written in terms of a new factor graph derived from the factor graph representing the COL instance. Using a variant of the Belief Propagation inference framework, we can efficiently approximate cluster counts in random COL problems over a large range of graph densities. We illustrate the algorithm on instances with up to 100, 000 vertices. Moreover, we supply a methodology for computing the number of clusters exactly using advanced techniques from the knowledge compilation literature. This methodology scales up to several hundred variables. 1

6 0.56261837 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation

7 0.55643904 29 nips-2008-Automatic online tuning for fast Gaussian summation

8 0.53710318 107 nips-2008-Influence of graph construction on graph-based clustering measures

9 0.50737453 218 nips-2008-Spectral Clustering with Perturbed Data

10 0.4601135 165 nips-2008-On the Reliability of Clustering Stability in the Large Sample Regime

11 0.45128775 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

12 0.38871691 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries

13 0.36702842 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring

14 0.34335157 44 nips-2008-Characteristic Kernels on Groups and Semigroups

15 0.33443612 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees

16 0.32784706 207 nips-2008-Shape-Based Object Localization for Descriptive Classification

17 0.31936905 69 nips-2008-Efficient Exact Inference in Planar Ising Models

18 0.31767797 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning

19 0.30914715 249 nips-2008-Variational Mixture of Gaussian Process Experts

20 0.30507553 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.038), (7, 0.084), (12, 0.04), (15, 0.018), (26, 0.288), (28, 0.195), (57, 0.084), (59, 0.017), (63, 0.03), (71, 0.017), (77, 0.041), (78, 0.016), (81, 0.011), (83, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79291123 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph

Author: Deli Zhao, Xiaoou Tang

Abstract: Detecting underlying clusters from large-scale data plays a central role in machine learning research. In this paper, we tackle the problem of clustering complex data of multiple distributions and multiple scales. To this end, we develop an algorithm named Zeta l-links (Zell) which consists of two parts: Zeta merging with a similarity graph and an initial set of small clusters derived from local l-links of samples. More specifically, we propose to structurize a cluster using cycles in the associated subgraph. A new mathematical tool, Zeta function of a graph, is introduced for the integration of all cycles, leading to a structural descriptor of a cluster in determinantal form. The popularity character of a cluster is conceptualized as the global fusion of variations of such a structural descriptor by means of the leave-one-out strategy in the cluster. Zeta merging proceeds, in the hierarchical agglomerative fashion, according to the maximum incremental popularity among all pairwise clusters. Experiments on toy data clustering, imagery pattern clustering, and image segmentation show the competitive performance of Zell. The 98.1% accuracy, in the sense of the normalized mutual information (NMI), is obtained on the FRGC face data of 16028 samples and 466 facial clusters. 1

2 0.76769054 102 nips-2008-ICA based on a Smooth Estimation of the Differential Entropy

Author: Lev Faivishevsky, Jacob Goldberger

Abstract: In this paper we introduce the MeanNN approach for estimation of main information theoretic measures such as differential entropy, mutual information and divergence. As opposed to other nonparametric approaches the MeanNN results in smooth differentiable functions of the data samples with clear geometrical interpretation. Then we apply the proposed estimators to the ICA problem and obtain a smooth expression for the mutual information that can be analytically optimized by gradient descent methods. The improved performance of the proposed ICA algorithm is demonstrated on several test examples in comparison with state-ofthe-art techniques. 1

3 0.63240558 118 nips-2008-Learning Transformational Invariants from Natural Movies

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

4 0.63233763 138 nips-2008-Modeling human function learning with Gaussian processes

Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish

Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1

5 0.63136297 4 nips-2008-A Scalable Hierarchical Distributed Language Model

Author: Andriy Mnih, Geoffrey E. Hinton

Abstract: Neural probabilistic language models (NPLMs) have been shown to be competitive with and occasionally superior to the widely-used n-gram language models. The main drawback of NPLMs is their extremely long training and testing times. Morin and Bengio have proposed a hierarchical language model built around a binary tree of words, which was two orders of magnitude faster than the nonhierarchical model it was based on. However, it performed considerably worse than its non-hierarchical counterpart in spite of using a word tree created using expert knowledge. We introduce a fast hierarchical language model along with a simple feature-based algorithm for automatic construction of word trees from the data. We then show that the resulting models can outperform non-hierarchical neural models as well as the best n-gram models. 1

6 0.62673658 231 nips-2008-Temporal Dynamics of Cognitive Control

7 0.62673366 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference

8 0.62557125 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

9 0.62545598 66 nips-2008-Dynamic visual attention: searching for coding length increments

10 0.62533325 31 nips-2008-Bayesian Exponential Family PCA

11 0.6231032 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries

12 0.6227029 200 nips-2008-Robust Kernel Principal Component Analysis

13 0.62151736 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

14 0.62072724 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

15 0.6204766 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

16 0.62039346 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

17 0.62025565 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms

18 0.62005633 211 nips-2008-Simple Local Models for Complex Dynamical Systems

19 0.61990023 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

20 0.61937118 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds