nips nips2006 nips2006-123 knowledge-graph by maker-knowledge-mining

123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding


Source: pdf

Author: Dengyong Zhou, Jiayuan Huang, Bernhard Schölkopf

Abstract: We usually endow the investigated objects with pairwise relationships, which can be illustrated as graphs. In many real-world problems, however, relationships among the objects of our interest are more complex than pairwise. Naively squeezing the complex relationships into pairwise ones will inevitably lead to a loss of information that can be expected to be valuable for our learning tasks. We therefore consider using hypergraphs instead to completely represent the complex relationships among the objects of our interest, and thus the problem of learning with hypergraphs arises. Our main contribution in this paper is to generalize the powerful methodology of spectral clustering, which originally operates on undirected graphs, to hypergraphs, and further to develop algorithms for hypergraph embedding and transductive classification on the basis of the spectral hypergraph clustering approach. Our experiments on a number of benchmarks showed the advantages of hypergraphs over usual graphs. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Therefore we consider using hypergraphs instead to completely represent complex relationships among the objects of our interest, and thus the problem of learning with hypergraphs arises. [sent-11, score-0.564]

2 Our experiments on a number of benchmarks showed the advantages of hypergraphs over usual graphs. [sent-13, score-0.234]

3 1 Introduction In machine learning problem settings, we generally assume pairwise relationships among the objects of our interest. [sent-14, score-0.125]

4 An object set endowed with pairwise relationships can be naturally illustrated as a graph, in which the vertices represent the objects, and any two vertices that have some kind of relationship are joined together by an edge. [sent-15, score-0.376]

5 Whether such a graph is undirected or directed depends on whether the pairwise relationships among the objects are symmetric or not. [sent-17, score-0.125]

6 A hyperlink can be thought of as a directed edge because, given an arbitrary hyperlink, we cannot expect that an inverse one necessarily exists; that is, the hyperlink based relationships are asymmetric [20]. [sent-20, score-0.283]

7 However, in many real-world problems, representing a set of complex relational objects as undirected or directed graphs is not complete. [sent-21, score-0.218]

8 To illustrate this point, let us consider the problem of grouping a collection of articles into different topics. [sent-22, score-0.112]

9 One may construct an undirected graph in which two vertices are joined together by an edge if there is at least one common author of their corresponding articles (Figure 1), and then an undirected graph based clustering approach is applied. [sent-24, score-0.737]

10 The undirected graph may be further embellished by assigning a weight to each edge. (Figure 1: Hypergraph vs. simple graph.) [sent-27, score-0.192]

11 The entry (vi , ej ) is set to 1 if ej is an author of article vi , and 0 otherwise. [sent-30, score-0.11]

12 Middle: an undirected graph in which two articles are joined together by an edge if there is at least one author in common. [sent-31, score-0.398]

13 This graph cannot tell us whether the same person is the author of three or more articles or not. [sent-32, score-0.27]

14 Right: a hypergraph which completely illustrates the complex relationships among authors and articles. [sent-33, score-0.758]

15 The above method may sound natural, but in its graph representation we obviously miss the information on whether the same person co-authored three or more articles. [sent-35, score-0.294]

16 Such information loss is unexpected because the articles by the same person likely belong to the same topic and hence the information is useful for our grouping task. [sent-36, score-0.143]

17 A natural way of remedying the information loss issue occurring in the above methodology is to represent the data as a hypergraph instead. [sent-37, score-0.727]

18 A hypergraph is a graph in which an edge can connect more than two vertices [2]. [sent-38, score-0.943]

19 In what follows, we shall uniformly refer to the usual undirected or directed graphs as simple graphs. [sent-40, score-0.18]

20 It is obvious that a simple graph is a special kind of hypergraph with each edge containing two vertices only. [sent-42, score-0.943]

21 In the problem of clustering articles stated before, it is quite straightforward to construct a hypergraph with the vertices representing the articles, and the edges the authors (Figure 1). [sent-43, score-0.995]

22 Each edge contains all articles by its corresponding author. [sent-44, score-0.148]
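
As a concrete illustration of this construction (articles as vertices, authors as hyperedges), the short Python sketch below builds the |V| × |E| incidence matrix for a small hypothetical article–author example; the article and author names are made up for illustration and are not those of Figure 1.

    # Minimal sketch: articles as vertices, authors as hyperedges.
    # The article/author names are hypothetical, not taken from the paper.
    import numpy as np

    articles = ["a1", "a2", "a3", "a4", "a5"]      # vertex set V
    authors = {                                    # hyperedge set E: each author
        "Alice": {"a1", "a2", "a3"},               # "contains" all of her articles
        "Bob":   {"a3", "a4"},
        "Carol": {"a4", "a5"},
    }

    H = np.zeros((len(articles), len(authors)))    # |V| x |E| incidence matrix
    for j, papers in enumerate(authors.values()):
        for i, article in enumerate(articles):
            if article in papers:
                H[i, j] = 1.0                      # h(v, e) = 1 iff v is in e

    print(H)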

23 A powerful technique for partitioning simple graphs is spectral clustering. [sent-51, score-0.217]

24 Therefore, we generalize spectral clustering techniques to hypergraphs, more specifically, the normalized cut approach of [16]. [sent-52, score-0.366]

25 Moreover, as in the case of simple graphs, a real-valued relaxation of the hypergraph normalized cut criterion leads to the eigendecomposition of a positive semidefinite matrix, which can be regarded as an analogue of the so-called Laplacian for simple graphs (cf. [sent-53, score-0.99]

26 [5]), and hence we suggestively call it the hypergraph Laplacian. [sent-54, score-0.729]

27 Consequently, we develop algorithms for hypergraph embedding and transductive inference based on the hypergraph Laplacian. [sent-55, score-1.524]

28 There is in fact a large body of literature on hypergraph partitioning, arising from a variety of practical problems, such as partitioning circuit netlists [11], clustering categorical data [9], and image segmentation [1]. [sent-56, score-0.812]

29 Unlike the present work, however, they generally transformed hypergraphs into simple graphs using the heuristics discussed at the beginning or other domain-specific heuristics, and then applied simple graph based spectral clustering techniques. [sent-57, score-0.508]

30 We first introduce some basic notions on hypergraphs in Section 2. [sent-62, score-0.234]

31 In Section 3, we generalize the simple graph normalized cut to hypergraphs. [sent-63, score-0.276]

32 As shown in Section 4, the hypergraph normalized cut has an elegant probabilistic interpretation based on a random walk naturally associated with a hypergraph. [sent-64, score-0.952]

33 In Section 5, we introduce the real-valued relaxation to approximately obtain hypergraph normalized cuts, and also the hypergraph Laplacian derived from this relaxation. [sent-65, score-1.495]

34 In Section 6, we develop a spectral hypergraph embedding technique based on the hypergraph Laplacian. [sent-66, score-1.555]

35 In Section 7, we address transductive inference on hypergraphs, that is, classifying the vertices of a hypergraph provided that some of its vertices have been labeled. [sent-67, score-1.013]

36 Then we call G = (V, E) a hypergraph with the vertex set V and the hyperedge set E. [sent-70, score-0.946]

37 A hyperedge containing just two vertices is a simple graph edge. [sent-71, score-0.385]

38 A weighted hypergraph is a hypergraph that has a positive number w(e) associated with each hyperedge e, called the weight of hyperedge e. [sent-72, score-1.756]

39 A hyperedge e is said to be incident with a vertex v when v ∈ e. [sent-74, score-0.284]

40 For a hyperedge e ∈ E, its degree is defined to be δ(e) = |e|. [sent-77, score-0.178]

41 We say that there is a hyperpath between vertices v1 and vk when there is an alternating sequence of distinct vertices and hyperedges v1, e1, v2, e2, . . . , ek−1, vk such that {vi, vi+1} ⊆ ei for 1 ≤ i ≤ k − 1. [sent-78, score-0.369]

42 A hypergraph is connected if there is a path for every pair of vertices. [sent-82, score-0.7]

43 In what follows, the hypergraphs we mention are always assumed to be connected. [sent-83, score-0.234]

44 A hypergraph G can be represented by a |V | × |E| matrix H with entries h(v, e) = 1 if v ∈ e and 0 otherwise, called the incidence matrix of G. [sent-84, score-0.7]

45 Let Dv and De denote the diagonal matrices containing the vertex and hyperedge degrees respectively, and let W denote the diagonal matrix containing the weights of hyperedges. [sent-86, score-0.246]

46 Then the adjacency matrix A of hypergraph G is defined as A = HW H T − Dv , where H T is the transpose of H. [sent-87, score-0.7]
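
A hedged sketch of these definitions in NumPy; the incidence matrix below is a hypothetical 5-vertex, 3-hyperedge example, all hyperedge weights are set to 1 for illustration, and the vertex degree d(v) is taken, as is conventional, to be the sum of the weights of the hyperedges incident with v.

    # Sketch: degree matrices Dv, De, weight matrix W, and adjacency A = H W H^T - Dv.
    # H is hypothetical; unit hyperedge weights are assumed for illustration.
    import numpy as np

    H = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]], dtype=float)

    w = np.ones(H.shape[1])        # hyperedge weights w(e)
    W = np.diag(w)
    d_v = H @ w                    # vertex degree d(v): sum of w(e) over hyperedges containing v
    d_e = H.sum(axis=0)            # hyperedge degree delta(e) = |e|
    Dv, De = np.diag(d_v), np.diag(d_e)

    A = H @ W @ H.T - Dv           # adjacency matrix of the hypergraph
    print(A)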

47 3 Normalized hypergraph cut For a vertex subset S ⊂ V, let S^c denote the complement of S. [sent-88, score-0.904]

48 A cut of a hypergraph G = (V, E, w) is a partition of V into two parts S and S^c. [sent-89, score-0.866]

49 We say that a hyperedge e is cut if it is incident with the vertices in S and S^c simultaneously. [sent-90, score-0.467]

50 Given a vertex subset S ⊂ V, define the hyperedge boundary ∂S of S to be the set of hyperedges that are cut: [sent-91, score-0.483]

51 ∂S := {e ∈ E | e ∩ S ≠ ∅, e ∩ S^c ≠ ∅}, and define the volume vol S of S to be the sum of the degrees of the vertices in S, that is, vol S := Σ_{v ∈ S} d(v). [sent-93, score-0.51]

52 Moreover, define the volume of ∂S by vol ∂S := Σ_{e ∈ ∂S} w(e) |e ∩ S| |e ∩ S^c| / δ(e). (1) [sent-94, score-0.395]

53 Then, when a hyperedge e is cut, |e ∩ S| |e ∩ S^c| subedges are cut, and hence a single sum term in Equation (1) is the sum of the weights over the subedges which are cut. [sent-102, score-0.236]

54 Naturally, we try to obtain a partition in which the connection among the vertices in the same cluster is dense while the connection between two clusters is sparse. [sent-103, score-0.145]

55 argmin_{∅ ≠ S ⊂ V} c(S) := vol ∂S (1/vol S + 1/vol S^c). (2) For a simple graph, |e ∩ S| = |e ∩ S^c| = 1, and δ(e) = 2. [sent-105, score-1.185]

56 Thus the right-hand side of Equation (2) reduces to the simple graph normalized cut [16] up to a factor 1/2. [sent-106, score-0.276]
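
To make Equations (1) and (2) concrete, here is a hedged NumPy sketch that evaluates vol S, vol ∂S, and c(S) for one candidate bipartition; the incidence matrix and the subset S are hypothetical, and unit hyperedge weights are assumed.

    # Sketch: evaluate the normalized hypergraph cut c(S) of Equation (2)
    # for one candidate vertex subset S.  H, the weights, and S are hypothetical.
    import numpy as np

    H = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]], dtype=float)
    w = np.ones(H.shape[1])                        # hyperedge weights w(e)
    d_v = H @ w                                    # vertex degrees d(v)
    delta = H.sum(axis=0)                          # hyperedge degrees delta(e)

    S = np.array([True, True, True, False, False]) # candidate partition: S vs. its complement

    vol_S = d_v[S].sum()                           # vol S   = sum of d(v) over v in S
    vol_Sc = d_v[~S].sum()                         # vol S^c
    in_S = H[S].sum(axis=0)                        # |e ∩ S|   for every hyperedge e
    in_Sc = H[~S].sum(axis=0)                      # |e ∩ S^c|
    vol_bd = np.sum(w * in_S * in_Sc / delta)      # vol ∂S, Equation (1); uncut edges contribute 0

    c_S = vol_bd * (1.0 / vol_S + 1.0 / vol_Sc)    # normalized hypergraph cut, Equation (2)
    print(vol_S, vol_Sc, vol_bd, c_S)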

57 In what follows, we explain the hypergraph normalized cut in terms of random walks. [sent-107, score-0.884]

58 4 Random walk explanation We associate with each hypergraph a natural random walk with the following transition rule. [sent-108, score-0.836]

59 Given the current position u ∈ V, first choose a hyperedge e over all hyperedges incident with u with the probability proportional to w(e), and then choose a vertex v ∈ e uniformly at random. [sent-109, score-0.343]

60 Let P denote the transition probability matrix of this hypergraph random walk. [sent-111, score-0.7]

61 From Equation (4), we have vol S / vol V = Σ_{v ∈ S} π(v), (5) that is, the ratio vol S / vol V is the probability with which the random walk occupies some vertex in S. [sent-116, score-0.926]
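
A hedged sketch of this random walk in matrix form. The closed form P = Dv^{−1} H W De^{−1} H^T is derived here from the two-step rule stated above rather than quoted from the paper, and the incidence matrix is hypothetical; the final check confirms that π(v) = d(v) / vol V is a stationary distribution of this walk.

    # Sketch: transition matrix of the natural hypergraph random walk,
    # P = Dv^{-1} H W De^{-1} H^T (a form implied by the two-step rule above).
    import numpy as np

    H = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]], dtype=float)
    w = np.ones(H.shape[1])
    d_v = H @ w
    Dv_inv = np.diag(1.0 / d_v)                # 1 / d(v)
    De_inv = np.diag(1.0 / H.sum(axis=0))      # 1 / delta(e)
    W = np.diag(w)

    P = Dv_inv @ H @ W @ De_inv @ H.T          # p(u, v): pick e with prob. proportional to w(e), then v uniformly in e
    assert np.allclose(P.sum(axis=1), 1.0)     # each row is a probability distribution

    pi = d_v / d_v.sum()                       # pi(v) = d(v) / vol V
    assert np.allclose(pi @ P, pi)             # pi is invariant under P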

62 It is worth pointing out that the random walk view is consistent with that for the simple graph normalized cut [13]. [sent-119, score-0.344]

63 The consistency means that our generalization of the normalized cut approach from simple graphs to hypergraphs is reasonable. [sent-120, score-0.477]

64 As in [20] where the spectral clustering methodology is generalized from undirected to directed simple graphs, we may consider generalizing the present approach to directed hypergraphs [8]. [sent-131, score-0.621]

65 A directed hypergraph is a hypergraph in which each hyperedge e is an ordered pair (X, Y ) where X ⊆ V is the tail of e and Y ⊆ V \ X is the head. [sent-132, score-1.635]

66 Directed hypergraphs have been used to model various practical problems from biochemical networks [15] to natural language parsing [12]. [sent-133, score-0.234]
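
For completeness, a minimal sketch of how a directed hyperedge could be represented as an ordered pair (X, Y) with Y ⊆ V \ X; this is only an illustrative data structure, not part of the paper's algorithms, and the vertex names are hypothetical.

    # Minimal sketch of a directed hyperedge: an ordered pair (tail X, head Y) with Y ⊆ V \ X.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DirectedHyperedge:
        tail: frozenset   # X ⊆ V
        head: frozenset   # Y ⊆ V \ X

        def __post_init__(self):
            if self.tail & self.head:
                raise ValueError("head must be disjoint from tail (Y ⊆ V \\ X)")

    e = DirectedHyperedge(tail=frozenset({"u1", "u2"}), head=frozenset({"v1"}))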

67 6 Spectral hypergraph embedding As in the simple graph case [4, 10], it is straightforward to extend the spectral hypergraph clustering approach to k-way partitioning. [sent-134, score-1.715]

68 We may obtain a k-way partition by minimizing c(V1, · · · , Vk) = Σ_{i=1}^{k} vol ∂Vi / vol Vi over all k-way partitions. [sent-136, score-0.425]

69 Similarly, the combinatorial optimization problem can be relaxed into a real-valued one, of which the solution can be any orthogonal basis of the linear space spanned by the eigenvectors of ∆ associated with the k smallest eigenvalues. [sent-137, score-0.462]

70 Then c(V1, · · · , Vk) = Σ_{i=1}^{k} [ r_i^T (Dv − H W De^{−1} H^T) r_i ] / ( r_i^T Dv r_i ). Define s_i = Dv^{1/2} r_i, and f_i = s_i / ‖s_i‖, where ‖·‖ denotes the usual Euclidean norm. [sent-145, score-0.132]

71 The row vectors of X are then regarded as the representations of the graph vertices in k-dimensional Euclidean space. [sent-156, score-0.207]

72 Those vectors corresponding to the vertices are generally expected to be well separated, and consequently we can obtain a good partition simply by running k-means on them once. [sent-157, score-0.145]
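
A hedged end-to-end sketch of this embedding step followed by k-means. The explicit Laplacian ∆ = I − Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2} used below is the form implied by the relaxation above rather than a verbatim quote; the incidence matrix and k are hypothetical, and scikit-learn's KMeans stands in for any k-means implementation.

    # Sketch: spectral hypergraph embedding + k-means.
    # Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} is the Laplacian form implied
    # by the relaxation above; H, the weights, and k are hypothetical.
    import numpy as np
    from sklearn.cluster import KMeans

    H = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]], dtype=float)
    w = np.ones(H.shape[1])
    k = 2                                              # number of clusters (hypothetical)

    Dv_isqrt = np.diag(1.0 / np.sqrt(H @ w))
    De_inv = np.diag(1.0 / H.sum(axis=0))
    Theta = Dv_isqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_isqrt
    Delta = np.eye(H.shape[0]) - Theta                 # positive semidefinite hypergraph Laplacian

    eigvals, eigvecs = np.linalg.eigh(Delta)           # eigenvalues in ascending order
    X = eigvecs[:, :k]                                 # eigenvectors of the k smallest eigenvalues
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)   # rows of X represent the vertices
    print(labels)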

73 [18] has resorted to a semidefinite relaxation model for the k-way normalized cut instead of the relatively loose spectral relaxation, and then obtained a more accurate solution. [sent-158, score-0.345]

74 7 Transductive inference We have established algorithms for spectral hypergraph clustering and embedding. [sent-161, score-0.882]

75 Specifically, given a hypergraph G = (V, E, w) in which the vertices in a subset S ⊂ V have labels in L = {1, −1}, our task is to predict the labels of the remaining unlabeled vertices. [sent-163, score-0.815]

76 Basically, we should try to assign the same label to all vertices contained in the same hyperedge. [sent-164, score-0.115]

77 It is actually straightforward to derive a transductive inference approach from a clustering scheme. [sent-165, score-0.151]

78 Since in general normalized cuts are thought to be superior to mincuts, the transductive inference approach that we used in the later experiments is built on the above spectral hypergraph clustering method. [sent-168, score-1.013]
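
One way to build such a transductive method is sketched below under stated assumptions: minimize f^T ∆ f + μ ‖f − y‖^2 over a real-valued vertex function f, where y holds the known ±1 labels (0 for unlabeled vertices) and ∆ is the hypergraph Laplacian. This quadratic objective and its closed-form solution are a standard Laplacian-regularization formulation consistent with the description above, not a verbatim quote of the paper's algorithm; μ, the labels, and the incidence matrix are hypothetical.

    # Sketch: transductive classification by hypergraph Laplacian regularization,
    #   f* = argmin_f  f^T Delta f + mu * ||f - y||^2   =>   f* = mu * (Delta + mu*I)^{-1} y.
    # The objective is a standard formulation consistent with the text, not quoted from the paper.
    import numpy as np

    H = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]], dtype=float)
    w = np.ones(H.shape[1])
    n = H.shape[0]
    Dv_isqrt = np.diag(1.0 / np.sqrt(H @ w))
    De_inv = np.diag(1.0 / H.sum(axis=0))
    Delta = np.eye(n) - Dv_isqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_isqrt

    y = np.array([1.0, 0.0, 0.0, 0.0, -1.0])   # +1/-1 on labeled vertices, 0 elsewhere (hypothetical)
    mu = 1.0                                    # regularization trade-off (hypothetical)

    f = mu * np.linalg.solve(Delta + mu * np.eye(n), y)
    print(np.sign(f))                           # predicted labels for all vertices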

79 In our experiments, we constructed a hypergraph for each dataset, where attribute values were regarded as hyperedges. [sent-176, score-0.728]
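
A hedged sketch of this construction for categorical data: each distinct (attribute, value) pair becomes a hyperedge containing every instance that takes that value. The toy records below are hypothetical.

    # Sketch: incidence matrix of a hypergraph built from categorical attributes,
    # one hyperedge per (attribute, value) pair.  The toy records are hypothetical.
    import numpy as np

    records = [
        {"color": "red",  "shape": "round"},
        {"color": "red",  "shape": "square"},
        {"color": "blue", "shape": "round"},
        {"color": "blue", "shape": "square"},
    ]

    edges = sorted({(a, r[a]) for r in records for a in r})   # distinct (attribute, value) pairs
    H = np.zeros((len(records), len(edges)))
    for i, r in enumerate(records):
        for j, (attr, val) in enumerate(edges):
            if r.get(attr) == val:
                H[i, j] = 1.0                                  # instance i lies in hyperedge (attr, val)

    print(edges)
    print(H)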

80 We also constructed a simple graph for each dataset, and the simple graph spectral clustering based approach [19] was then used as the baseline. [sent-179, score-0.366]

81 Those simple graphs were constructed in the way discussed in the beginning of Section 1, which is essentially to define pairwise relationships among the objects by the adjacency matrices of hypergraphs. [sent-180, score-0.184]

82 We embedded those animals into Euclidean space by using the eigenvectors of the hypergraph Laplacian associated with the smallest eigenvalues (Figure 2). [sent-185, score-0.845]

[Figure 2 point labels: zoo animal names and axis tick values from the embedding scatter plots.]

86 Left panel: the eigenvectors with the 2nd and 3rd smallest eigenvalues; right panel: the eigenvectors with the 3rd and 4th smallest eigenvalues. [sent-222, score-0.134]

87 Note that dolphin is between class 1 (denoted by ◦) containing the animals having milk and living on land, and class 4 (denoted by ) containing the animals living in sea. [sent-223, score-0.325]

88 (a)-(c) Results from both the hypergraph based approach and the simple graph based approach. [sent-259, score-0.792]

89 Dolphin is mapped to a position between class 1, consisting of the animals having milk and living on land, and class 4, consisting of the animals living in the sea. [sent-261, score-0.274]

90 The third task is text categorization on a modified 20-newsgroup dataset with binary occurrence values for 100 words across 16242 articles (see http://www. [sent-267, score-0.112]

91 The articles belong to 4 different topics corresponding to the highest level of the original 20 newsgroups, with the sizes being 4605, 3519, 2657 and 5461 respectively. [sent-271, score-0.112]

92 The final task is to guess the letter categories with the letter dataset, in which each instance is described by 16 primitive numerical attributes (statistical moments and edge counts). [sent-272, score-0.178]

93 The results show that the hypergraph based method is consistently better than the baseline. [sent-278, score-0.7]

94 It is interesting that α influences the baseline much more than the hypergraph based approach. [sent-280, score-0.7]

95 9 Conclusion We generalized spectral clustering techniques to hypergraphs, and developed algorithms for hypergraph embedding and transductive inference. [sent-281, score-1.006]

96 It might be more sensible to model them as hypergraphs instead, so that the complex interactions are completely taken into account. [sent-286, score-0.234]

97 Spectral relaxation models and structure analysis for k-way graph clustering and bi-clustering. [sent-355, score-0.207]

98 New spectral methods for ratio cut partitioning and clustering. [sent-361, score-0.294]

99 Propagating distributions on a hypergraph by dual information regularization. [sent-408, score-0.7]

100 On semidefinite relaxation for normalized k-cut and connections to spectral clustering. [sent-418, score-0.209]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hypergraph', 0.7), ('vol', 0.395), ('hypergraphs', 0.234), ('hyperedge', 0.178), ('cut', 0.136), ('dv', 0.126), ('vertices', 0.115), ('spectral', 0.114), ('articles', 0.112), ('graph', 0.092), ('transductive', 0.083), ('vk', 0.08), ('animals', 0.078), ('clustering', 0.068), ('vertex', 0.068), ('walk', 0.068), ('undirected', 0.064), ('graphs', 0.059), ('hyperedges', 0.059), ('joined', 0.059), ('letter', 0.058), ('relationships', 0.058), ('directed', 0.057), ('hw', 0.054), ('dolphin', 0.051), ('normalized', 0.048), ('relaxation', 0.047), ('vi', 0.046), ('ri', 0.044), ('hyperlink', 0.044), ('milk', 0.044), ('partitioning', 0.044), ('embedding', 0.041), ('incident', 0.038), ('seal', 0.038), ('objects', 0.038), ('living', 0.037), ('edge', 0.036), ('author', 0.035), ('smallest', 0.034), ('eigenvectors', 0.033), ('erent', 0.032), ('person', 0.031), ('laplacian', 0.031), ('partition', 0.03), ('pairwise', 0.029), ('carp', 0.029), ('cavy', 0.029), ('clam', 0.029), ('dogfish', 0.029), ('dove', 0.029), ('flamingo', 0.029), ('flea', 0.029), ('girl', 0.029), ('gnat', 0.029), ('gorilla', 0.029), ('gull', 0.029), ('hawk', 0.029), ('honeybee', 0.029), ('housefly', 0.029), ('kiwi', 0.029), ('ladybird', 0.029), ('lobster', 0.029), ('mink', 0.029), ('newt', 0.029), ('ostrich', 0.029), ('penguin', 0.029), ('pitviper', 0.029), ('platypus', 0.029), ('pony', 0.029), ('pussycat', 0.029), ('scorpion', 0.029), ('seahorse', 0.029), ('sealion', 0.029), ('seasnake', 0.029), ('seawasp', 0.029), ('slowworm', 0.029), ('starfish', 0.029), ('stingray', 0.029), ('subedges', 0.029), ('suggestively', 0.029), ('swan', 0.029), ('toad', 0.029), ('tortoise', 0.029), ('tuatara', 0.029), ('vampire', 0.029), ('wasp', 0.029), ('worm', 0.029), ('zoo', 0.029), ('article', 0.029), ('attribute', 0.028), ('methodology', 0.027), ('euclidean', 0.027), ('categorical', 0.027), ('attributes', 0.026), ('frog', 0.025), ('remp', 0.025), ('crab', 0.025), ('deer', 0.025), ('lion', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding

Author: Dengyong Zhou, Jiayuan Huang, Bernhard Schölkopf

Abstract: We usually endow the investigated objects with pairwise relationships, which can be illustrated as graphs. In many real-world problems, however, relationships among the objects of our interest are more complex than pairwise. Naively squeezing the complex relationships into pairwise ones will inevitably lead to loss of information which can be expected valuable for our learning tasks however. Therefore we consider using hypergraphs instead to completely represent complex relationships among the objects of our interest, and thus the problem of learning with hypergraphs arises. Our main contribution in this paper is to generalize the powerful methodology of spectral clustering which originally operates on undirected graphs to hypergraphs, and further develop algorithms for hypergraph embedding and transductive classification on the basis of the spectral hypergraph clustering approach. Our experiments on a number of benchmarks showed the advantages of hypergraphs over usual graphs. 1

2 0.15517956 80 nips-2006-Fundamental Limitations of Spectral Clustering

Author: Boaz Nadler, Meirav Galun

Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1

3 0.12131169 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts

Author: Hariharan Narayanan, Mikhail Belkin, Partha Niyogi

Abstract: One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary. keywords: Clustering, Semi-Supervised Learning 1

4 0.097780123 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

Author: Hamed Valizadegan, Rong Jin

Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1

5 0.096414149 163 nips-2006-Prediction on a Graph with a Perceptron

Author: Mark Herbster, Massimiliano Pontil

Abstract: We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph – the graph diameter and the cut size of a partitioning of the graph – which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated. 1

6 0.093781255 77 nips-2006-Fast Computation of Graph Kernels

7 0.084687829 173 nips-2006-Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds

8 0.078257069 7 nips-2006-A Local Learning Approach for Clustering

9 0.06933409 150 nips-2006-On Transductive Regression

10 0.069147885 117 nips-2006-Learning on Graph with Laplacian Regularization

11 0.064849377 39 nips-2006-Balanced Graph Matching

12 0.064198107 35 nips-2006-Approximate inference using planar graph decomposition

13 0.059023857 60 nips-2006-Convergence of Laplacian Eigenmaps

14 0.057624262 128 nips-2006-Manifold Denoising

15 0.051240742 14 nips-2006-A Small World Threshold for Economic Network Formation

16 0.045247987 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering

17 0.043187138 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

18 0.040864706 115 nips-2006-Learning annotated hierarchies from relational data

19 0.040256336 169 nips-2006-Relational Learning with Gaussian Processes

20 0.037070412 130 nips-2006-Max-margin classification of incomplete data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.134), (1, 0.076), (2, -0.008), (3, 0.163), (4, 0.081), (5, -0.055), (6, 0.011), (7, 0.121), (8, 0.053), (9, -0.111), (10, 0.094), (11, -0.074), (12, -0.055), (13, 0.034), (14, -0.051), (15, -0.011), (16, -0.029), (17, 0.048), (18, 0.001), (19, -0.124), (20, -0.076), (21, -0.046), (22, 0.009), (23, -0.025), (24, 0.091), (25, 0.013), (26, -0.076), (27, 0.076), (28, 0.106), (29, -0.034), (30, -0.01), (31, 0.024), (32, 0.015), (33, 0.005), (34, -0.007), (35, -0.038), (36, -0.055), (37, -0.01), (38, -0.007), (39, -0.091), (40, -0.012), (41, 0.013), (42, -0.07), (43, 0.066), (44, 0.008), (45, -0.009), (46, -0.119), (47, -0.041), (48, 0.055), (49, -0.065)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94937754 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding

Author: Dengyong Zhou, Jiayuan Huang, Bernhard Schölkopf

Abstract: We usually endow the investigated objects with pairwise relationships, which can be illustrated as graphs. In many real-world problems, however, relationships among the objects of our interest are more complex than pairwise. Naively squeezing the complex relationships into pairwise ones will inevitably lead to loss of information which can be expected valuable for our learning tasks however. Therefore we consider using hypergraphs instead to completely represent complex relationships among the objects of our interest, and thus the problem of learning with hypergraphs arises. Our main contribution in this paper is to generalize the powerful methodology of spectral clustering which originally operates on undirected graphs to hypergraphs, and further develop algorithms for hypergraph embedding and transductive classification on the basis of the spectral hypergraph clustering approach. Our experiments on a number of benchmarks showed the advantages of hypergraphs over usual graphs. 1

2 0.69052142 163 nips-2006-Prediction on a Graph with a Perceptron

Author: Mark Herbster, Massimiliano Pontil

Abstract: We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph – the graph diameter and the cut size of a partitioning of the graph – which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated. 1

3 0.6227833 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts

Author: Hariharan Narayanan, Mikhail Belkin, Partha Niyogi

Abstract: One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary. keywords: Clustering, Semi-Supervised Learning 1

4 0.60436082 80 nips-2006-Fundamental Limitations of Spectral Clustering

Author: Boaz Nadler, Meirav Galun

Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1

5 0.57969099 77 nips-2006-Fast Computation of Graph Kernels

Author: Karsten M. Borgwardt, Nicol N. Schraudolph, S.v.n. Vishwanathan

Abstract: Using extensions of linear algebra concepts to Reproducing Kernel Hilbert Spaces (RKHS), we define a unifying framework for random walk kernels on graphs. Reduction to a Sylvester equation allows us to compute many of these kernels in O(n3 ) worst-case time. This includes kernels whose previous worst-case time complexity was O(n6 ), such as the geometric kernels of G¨ rtner et al. [1] and a the marginal graph kernels of Kashima et al. [2]. Our algebra in RKHS allow us to exploit sparsity in directed and undirected graphs more effectively than previous methods, yielding sub-cubic computational complexity when combined with conjugate gradient solvers or fixed-point iterations. Experiments on graphs from bioinformatics and other application domains show that our algorithms are often more than 1000 times faster than existing approaches. 1

6 0.47346318 39 nips-2006-Balanced Graph Matching

7 0.47090876 173 nips-2006-Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds

8 0.45376033 35 nips-2006-Approximate inference using planar graph decomposition

9 0.44877437 117 nips-2006-Learning on Graph with Laplacian Regularization

10 0.43617383 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering

11 0.43436453 7 nips-2006-A Local Learning Approach for Clustering

12 0.41924793 128 nips-2006-Manifold Denoising

13 0.41635221 60 nips-2006-Convergence of Laplacian Eigenmaps

14 0.41479981 92 nips-2006-High-Dimensional Graphical Model Selection Using $\ell 1$-Regularized Logistic Regression

15 0.41368064 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

16 0.40303084 14 nips-2006-A Small World Threshold for Economic Network Formation

17 0.39567515 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms

18 0.29113212 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

19 0.27720121 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data

20 0.2722097 169 nips-2006-Relational Learning with Gaussian Processes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.067), (3, 0.015), (7, 0.093), (9, 0.042), (12, 0.024), (20, 0.012), (22, 0.055), (44, 0.048), (57, 0.046), (64, 0.023), (65, 0.051), (69, 0.027), (83, 0.029), (87, 0.37)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76204765 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding

Author: Dengyong Zhou, Jiayuan Huang, Bernhard Schölkopf

Abstract: We usually endow the investigated objects with pairwise relationships, which can be illustrated as graphs. In many real-world problems, however, relationships among the objects of our interest are more complex than pairwise. Naively squeezing the complex relationships into pairwise ones will inevitably lead to loss of information which can be expected valuable for our learning tasks however. Therefore we consider using hypergraphs instead to completely represent complex relationships among the objects of our interest, and thus the problem of learning with hypergraphs arises. Our main contribution in this paper is to generalize the powerful methodology of spectral clustering which originally operates on undirected graphs to hypergraphs, and further develop algorithms for hypergraph embedding and transductive classification on the basis of the spectral hypergraph clustering approach. Our experiments on a number of benchmarks showed the advantages of hypergraphs over usual graphs. 1

2 0.65639722 144 nips-2006-Near-Uniform Sampling of Combinatorial Spaces Using XOR Constraints

Author: Carla P. Gomes, Ashish Sabharwal, Bart Selman

Abstract: We propose a new technique for sampling the solutions of combinatorial problems in a near-uniform manner. We focus on problems specified as a Boolean formula, i.e., on SAT instances. Sampling for SAT problems has been shown to have interesting connections with probabilistic reasoning, making practical sampling algorithms for SAT highly desirable. The best current approaches are based on Markov Chain Monte Carlo methods, which have some practical limitations. Our approach exploits combinatorial properties of random parity (XOR) constraints to prune away solutions near-uniformly. The final sample is identified amongst the remaining ones using a state-of-the-art SAT solver. The resulting sampling distribution is provably arbitrarily close to uniform. Our experiments show that our technique achieves a significantly better sampling quality than the best alternative. 1

3 0.52397579 34 nips-2006-Approximate Correspondences in High Dimensions

Author: Kristen Grauman, Trevor Darrell

Abstract: Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding based on a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithms suffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier, our approach achieves improved object recognition results over a state-of-the-art set kernel. 1

4 0.39005142 80 nips-2006-Fundamental Limitations of Spectral Clustering

Author: Boaz Nadler, Meirav Galun

Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1

5 0.38146332 163 nips-2006-Prediction on a Graph with a Perceptron

Author: Mark Herbster, Massimiliano Pontil

Abstract: We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph – the graph diameter and the cut size of a partitioning of the graph – which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated. 1

6 0.3812685 198 nips-2006-Unified Inference for Variational Bayesian Linear Gaussian State-Space Models

7 0.38000131 35 nips-2006-Approximate inference using planar graph decomposition

8 0.37876552 33 nips-2006-Analysis of Representations for Domain Adaptation

9 0.37754634 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds

10 0.37654582 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

11 0.37500352 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

12 0.37444526 77 nips-2006-Fast Computation of Graph Kernels

13 0.37376365 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions

14 0.37336743 117 nips-2006-Learning on Graph with Laplacian Regularization

15 0.3730346 65 nips-2006-Denoising and Dimension Reduction in Feature Space

16 0.3702496 150 nips-2006-On Transductive Regression

17 0.36975658 128 nips-2006-Manifold Denoising

18 0.36974236 20 nips-2006-Active learning for misspecified generalized linear models

19 0.36851302 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy

20 0.36850747 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation