nips nips2008 nips2008-48 knowledge-graph by maker-knowledge-mining

48 nips-2008-Clustering via LP-based Stabilities


Source: pdf

Author: Nikos Komodakis, Nikos Paragios, Georgios Tziritas

Abstract: A novel center-based clustering algorithm is proposed in this paper. We first formulate clustering as an NP-hard linear integer program, and we then use linear programming and duality theory to derive the solution of this optimization problem. This leads to an efficient and very general algorithm, which works in the dual domain and can cluster data based on an arbitrary set of distances. Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, can automatically determine the number of clusters, and can also provide online optimality bounds on the quality of the estimated clustering solutions. To deal with the most critical issue in a center-based clustering algorithm (the selection of cluster centers), we also introduce the notion of stability of a cluster center, a well-defined LP-based quantity that plays a key role in our algorithm’s success. Furthermore, we also introduce what we call the margins (another key ingredient in our algorithm), which can be roughly thought of as dual counterparts to stabilities and which allow us to obtain computationally efficient approximations to the latter. Promising experimental results demonstrate the potential of our method.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: A novel center-based clustering algorithm is proposed in this paper. [sent-7, score-0.266]

2 We first formulate clustering as an NP-hard linear integer program and we then use linear programming and the duality theory to derive the solution of this optimization problem. [sent-8, score-0.322]

3 This leads to an efficient and very general algorithm, which works in the dual domain, and can cluster data based on an arbitrary set of distances. [sent-9, score-0.431]

4 Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, can automatically determine the number of clusters, and can also provide online optimality bounds about the quality of the estimated clustering solutions. [sent-10, score-0.381]

5 To deal with the most critical issue in a center-based clustering algorithm (the selection of cluster centers), we also introduce the notion of stability of a cluster center, a well-defined LP-based quantity that plays a key role in our algorithm’s success. [sent-11, score-0.685]

6 Furthermore, we also introduce what we call the margins (another key ingredient in our algorithm), which can be roughly thought of as dual counterparts to stabilities and which allow us to obtain computationally efficient approximations to the latter. [sent-12, score-0.677]

7 Most of the clustering methods are center-based, thus trying to extract a set of cluster centers that best ‘describe’ the input data. [sent-16, score-0.545]

8 Typically, this translates into an optimization problem where one seeks to assign each input data point to a unique cluster center such that the total sum of the corresponding distances is minimized. [sent-17, score-0.263]

9 These techniques are extremely popular and they are thus essential even to other types of clustering algorithms such as Spectral Clustering methods [1],[2]. [sent-18, score-0.239]

10 Currently, most center-based clustering methods rely on EM-like schemes for optimizing their clustering objective function [3]. [sent-19, score-0.612]

11 It keeps greedily refining a current set of cluster centers based on a simple gradient descent scheme. [sent-21, score-0.306]

12 A second very important drawback of many center-based clustering methods, which severely limits their applicability, is that they either require the input data to be of vectorial form and/or impose strong restrictions on the type of distance functions they can handle. [sent-26, score-0.278]

13 To circumvent all the issues mentioned above, a novel center-based clustering algorithm is proposed in this paper. [sent-33, score-0.266]

14 Similarly to other methods, it reduces clustering to a well-defined (but NP-hard) minimization problem, where, of course, the challenge now is how to obtain solutions of minimum objective value. [sent-34, score-0.364]

15 By making heavy use of a dual LP relaxation to that program, we then manage to derive a dual-based algorithm for clustering. [sent-36, score-0.637]

16 As in all center-based clustering techniques, the most critical component in the resulting algorithm is deciding what cluster centers to choose. [sent-37, score-0.609]

17 To this end, we introduce, what we call, the stability of a data point as a cluster center (this is an LP-based quantity), which we consider as another contribution of this work. [sent-38, score-0.356]

18 Intuitively, the stability of a data point as a cluster center tries to measure how much we need to penalize that point (by appropriately modifying the objective function) such that it can no longer be chosen as a center in an optimal solution of the modified problem. [sent-39, score-0.663]

19 As we prove in this paper, margins can be considered as dual to stabilities. [sent-43, score-0.465]

20 The outcome is an efficient and very easily implementable optimization algorithm, which works in the dual domain by iteratively updating a dual solution via two very simple operations: DISTRIBUTE and PROJECT. [sent-47, score-0.654]

21 These guarantees come in the form of lower bounds on the cost of the optimal clustering and are computed (for free) by simply using the cost of the dual solutions generated during the course of the algorithm. [sent-54, score-0.735]

22 [Section 2: Clustering via stabilities based on Linear Programming] Given a set of objects V with distances d = {d_pq}, clustering amounts to choosing a set of cluster centers from V (say $\{q_i\}_{i=1}^k$) such that the sum of distances between each object and its closest center is minimized. [sent-55, score-1.162]

23 To this end, we are going to use the following objective function E(·) (which will be referred to as the primal cost hereafter): $\min_{k,\,\{q_i\}_{i=1}^k} E(\{q_i\}_{i=1}^k) = \sum_{p\in V} \min_i d_{p q_i} + \sum_i d_{q_i q_i}$ (1). Note that, in this case, we require that each cluster center is chosen from the set V. [sent-56, score-0.5]
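As a concrete illustration of the primal cost above, the following is a minimal numpy sketch; the function name and the convention of storing the penalties d_qq on the diagonal of the distance matrix are our own, not from the paper.

```python
import numpy as np

def primal_cost(d, Q):
    """Primal cost E(Q) following Eq. (1) as reconstructed above: every object
    pays its distance to the closest chosen center, and every chosen center q
    additionally pays its penalty d[q, q] (stored here on the diagonal of d)."""
    Q = np.asarray(Q)
    assignment = d[:, Q].min(axis=1).sum()   # sum_p  min_{q in Q} d_pq
    penalties = d[Q, Q].sum()                # sum_{q in Q} d_qq
    return assignment + penalties
```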

24 Also note that, besides {qi }, here we optimize over the number of cluster centers k as well. [sent-57, score-0.306]

25 Of course, to avoid the trivial solution of choosing all objects as centers, we regularize the problem by assigning a penalty dqq to each chosen center q. [sent-58, score-0.444]

26 PRIMAL: minimize $\sum_{p,q\in V} d_{pq}\, x_{pq}$ (2), subject to $\sum_{q\in V} x_{pq} = 1$ for every p (3), $x_{pq} \le x_{qq}$ (4), and $x_{pq} \ge 0$ (5). To get an equivalent problem to (1), we simply have to replace $x_{pq} \ge 0$ with $x_{pq} \in \{0,1\}$. [sent-61, score-1.41]
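The relaxation (2)-(5) can be handed directly to an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog and is only meant for small instances, since it builds the O(n²) variables and constraints densely; all names here are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def solve_primal_lp(d):
    """Solve the LP relaxation (2)-(5) and return the fractional x_pq matrix."""
    n = d.shape[0]
    idx = lambda p, q: p * n + q                 # position of x_pq in the flat vector

    # (3): each object p is assigned with total weight 1
    A_eq = np.zeros((n, n * n))
    for p in range(n):
        A_eq[p, p * n:(p + 1) * n] = 1.0
    b_eq = np.ones(n)

    # (4): x_pq - x_qq <= 0 for every p != q
    rows = []
    for p in range(n):
        for q in range(n):
            if p != q:
                row = np.zeros(n * n)
                row[idx(p, q)], row[idx(q, q)] = 1.0, -1.0
                rows.append(row)
    A_ub, b_ub = np.vstack(rows), np.zeros(len(rows))

    res = linprog(d.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")   # (5): x_pq >= 0
    return res.x.reshape(n, n)
```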

27 In this case, each binary variable x_pq with p ≠ q indicates whether object p has been assigned to cluster center q or not, while the binary variable x_qq indicates whether object q has been chosen as a cluster center or not. [sent-62, score-1.061]

28 Constraints (3) simply express the fact that each object must be assigned to exactly one center, while constraints (4) require that if p has been assigned to q then object q must obviously be chosen as a center. [sent-63, score-0.31]

29 Obviously, at the core of any clustering problem of this type lies the issue of deciding which objects will be chosen as centers. [sent-64, score-0.433]

30 This will be a well defined measure which, intuitively, tries to quantitatively answer the following question: “How much do we need to penalize an object in order to ensure that it is never selected as an optimal cluster center? [sent-66, score-0.321]

31 We will thus define the stability S(q) of an object q as follows: S(q) = inf{perturbation s that has to be applied to the penalty d_qq (i.e. [sent-68, score-0.4]

32 d_qq ← d_qq + s) such that PRIMAL has no optimal solution x with x_qq > 0} (6). An object q can be stable or unstable depending on whether S(q) ≥ 0 or S(q) < 0. [sent-70, score-0.655]
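For intuition, on a toy instance one can evaluate the integer-programming analogue of definition (6) by brute force: the smallest penalty increase that removes q from every optimal center set equals the gap between the best clustering that excludes q and the best one that includes it. The sketch below is our own illustration (exponential in the number of objects) and reuses the primal_cost helper from the earlier snippet.

```python
from itertools import combinations
import numpy as np

def brute_force_stability(d, q):
    """Toy, exhaustive version of the stability S(q) for the integer problem:
    S(q) = min_{Q excluding q} E(Q) - min_{Q including q} E(Q).
    Only feasible for very small instances."""
    n = d.shape[0]

    def best_cost(must_contain_q):
        best = np.inf
        for k in range(1, n + 1):
            for Q in combinations(range(n), k):
                if (q in Q) == must_contain_q:
                    best = min(best, primal_cost(d, list(Q)))
        return best

    return best_cost(False) - best_cost(True)
```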

33 To select a set of centers Q, we will then rely on the following observation: a stable object with high stability is also expected to be, with high probability, an optimal center in (1). [sent-71, score-0.597]

34 Hence, our strategy for generating Q will be to sequentially select a set of stable objects, trying, at each step, to select an object of approximately maximum stability (as already explained, there is high chance that this object will be an optimal center in (1)). [sent-73, score-0.518]

35 Furthermore, each time we insert a stable object q to Q, we reestimate stabilities for the remaining objects in order to take this fact into account (e. [sent-74, score-0.526]

36 For example, an object may become unstable if we know that x_q′q′ = 1 holds for another object q′. [sent-76, score-0.432]

37 To achieve that, we will need to impose extra constraints on PRIMAL (as we shall see, this will help us obtain an accurate estimate of the stabilities of the remaining objects, given that the objects in Q are already chosen as centers). [sent-77, score-0.556]

38 [Section 2.1: Margins and dual-based clustering] To have a practical algorithm, the most critical issue is how to obtain a rough approximation to the stability of an object q in a computationally efficient manner. [sent-80, score-0.49]

39 As we shall see, to achieve this we will need to move to the dual domain and introduce a novel concept that lies at the core of our approach: the margin of dual solutions. [sent-81, score-0.627]

40 But, first, we need to introduce the dual to problem PRIMAL, which is the linear program called DUAL in (7): $\mathrm{DUAL} \equiv \max D(h) = \sum_{p\in V} h_p$ (7), where $h_p = \min_{q\in V} h_{pq}$ for all $p \in V$, subject to $\sum_{p\in V} h_{pq} = \sum_{p\in V} d_{pq}$ for all $q \in V$ (8), and $h_{pq} \ge d_{pq}$ for all $p \ne q$ (9). [sent-82, score-1.578]

41 Given a feasible dual solution h, we can now define its margin $\Delta_q(h)$ (with respect to object q) as follows: $\Delta_q(h) = \sum_{p:\, h_{pq}=h_p} (\hat h_p - h_p) \;-\; \sum_{p\ne q} \big(h_{pq} - \max(h_p, d_{pq})\big) \;-\; h_{qq} - h_q$ (11), where (for any h) $\hat h_p$ hereafter denotes the next-to-minimum pseudo-distance from p. [sent-85, score-1.084]
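The following is a direct transcription of the margin, assuming the reconstruction of Eq. (11) above: the pseudo-distances h_pq are stored as an n×n matrix, h_p is the row minimum, and ĥ_p the second-smallest row entry. The exact form of the trailing terms should be checked against the original paper.

```python
import numpy as np

def margin(d, h, q):
    """Margin Delta_q(h) for a feasible dual solution h, per Eq. (11) as
    reconstructed above. d and h are n x n matrices."""
    n = d.shape[0]
    h_min = h.min(axis=1)                        # h_p
    h_next = np.sort(h, axis=1)[:, 1]            # next-to-minimum pseudo-distance
    attached = np.flatnonzero(h[:, q] == h_min)  # objects p with h_pq = h_p
    gain = (h_next[attached] - h_min[attached]).sum()
    others = np.array([p for p in range(n) if p != q])
    slack = (h[others, q] - np.maximum(h_min[others], d[others, q])).sum()
    return gain - slack - h[q, q] - h_min[q]
```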

42 h There is a very tight connection between margins of dual solutions and stabilities of objects. [sent-86, score-0.711]

43 The following lemma provides a first indication for this fact and shows that we can actually use margins to decide whether an object is stable or not and also to lower bound or upper bound its stability accordingly (see [7] for proofs): Lemma 1 ([7]). [sent-87, score-0.491]

44 Problem DUAL results from the standard dual to PRIMAL after applying a transformation to the dual variables. [sent-90, score-0.57]

45 In fact, the following fundamental theorem goes even further by proving that stabilities can be fully characterized solely in terms of margins. [sent-95, score-0.25]

46 Hence, margins and stabilities are two concepts that can be roughly considered as dual to each other: Theorem 2 ([7]). [sent-96, score-0.677]

47 What the above theorem (Eq. (14)) essentially tells us is that one can compute S(q) exactly, simply by considering the margins of optimal dual solutions. [sent-99, score-0.465]

48 Based on this fact, it is therefore safe to assume that solutions h with high (but not necessarily maximum) dual objective D(h) will have margins that are good approximations to S(q), i. [sent-100, score-0.59]

49 $\Delta_q(h) \approx S(q)$ (15). This is exactly the idea that our clustering algorithm will rely on in order to efficiently discover objects that are stable. [sent-103, score-0.439]

50 It thus maintains a dual solution h and a set Q containing all stable objects chosen as centers up to the current point (Q is empty initially). [sent-104, score-0.742]

51 At each iteration, it increases the dual objective D(h) by updating solution h via an operation called DISTRIBUTE. [sent-105, score-0.527]

52 This operation is repeatedly applied until a high enough objective value D(h) is obtained such that at least one stable object is revealed based on the estimated margins of h. [sent-106, score-0.522]

53 The operation used for increasing the dual objective also relies critically on the use of margins. [sent-111, score-0.311]

54 Another technical point that we need to address comes from the fact that Q gets populated with objects as the algorithm proceeds, which is something that we certainly need to take into account when estimating object stabilities. [sent-112, score-0.335]

55 Fortunately, there is a very elegant solution to this problem: since all objects in Q are assumed to be cluster centers (i. [sent-113, score-0.491]

56 In addition to that, the definition of the margin $\Delta_q(h)$ needs to be modified as follows: $\Delta_q(h) = \sum_{p\notin Q:\, h_{pq}=h_p} (\hat h_p - h_p) \;-\; \sum_{p\notin Q\cup\{q\}} \big(h_{pq} - \max(h_p, d_{pq})\big) \;-\; h_{qq} - h_q$. [sent-123, score-0.583]

57 $\bar q = \arg\max_{q\notin Q} \Delta_q(h)$ (17). Based on the fact that margins are used as approximations to the stabilities of objects, the above update simply says that the object q̄ with maximum stability should be chosen as the new center at the current iteration, provided of course that this object q̄ is stable. [sent-125, score-0.951]

58 (Footnote 3: Actually, to represent the dual of PRIMAL_Q exactly, we need to add a constant to the objective function of DUAL_Q.) [sent-126, score-0.403]

59 Furthermore, in this case, we also need to update the current dual solution h in order to take account of the fact that extra constraints have been added to DUAL_Q (these are a result of the extra constraint x_q̄q̄ = 1 that has been added to PRIMAL_Q). [sent-130, score-0.396]

60 Note that the update $h_{pp} \mathrel{+}= h_{\bar q\,p} - d_{\bar q\,p}$ in (18) is needed for maintaining the dual feasibility constraint (9). [sent-132, score-0.444]

61 Essentially, PROJECT is a warm-start operation that allows us to reuse existing information for computing a solution h that has a high dual objective value D(h) and is also feasible for the updated DUAL_Q. [sent-133, score-0.431]

62 The DISTRIBUTE operation: if it holds that $\Delta_q(h) < 0$ for all $q \notin Q$, this means that we are unable to find an object with good stability at the current iteration. [sent-134, score-0.281]

63 To counter that, we will thus need to update the solution h in order to increase its dual objective value (recall that, by Lemma 1, stable objects will necessarily be revealed at an optimal dual solution). [sent-135, score-0.962]

64 Intuitively, what happens is that as we increase the dual objective D(h), objects not in Q actually try to compete with each other for achieving a large margin. [sent-138, score-0.506]

65 Interestingly enough, in order to increase D(h), we will again have to rely on the margins of the current dual solution. [sent-139, score-0.508]

66 Here $L_Q = \{p \notin Q \mid h_p = \min_{q\in Q} h_{pq}\}$, while $|V_q|$ denotes the cardinality of the set $V_q = \{p \notin Q \cup L_Q \mid h_p \ge d_{pq}\} \cup \{q\}$. [sent-142, score-1.03]

67 If $\max_{q\notin Q} \Delta_q(h) < 0$, then the DISTRIBUTE operation maintains feasibility and, unless $V = Q \cup L_Q$, it also strictly increases the dual objective. [sent-144, score-0.377]

68 As already explained, it is an iterative algorithm, which keeps updating a dual solution h by using the DISTRIBUTE and PROJECT operations (the latter applied only when needed) until the dual objective can no longer increase. [sent-147, score-0.745]

69 Note also that, besides maintaining a dual solution h, the algorithm also maintains Q which provides a current clustering and also has a primal cost E(Q). [sent-148, score-0.82]

70 If $\max_{q\notin Q} \Delta_q(h) > 0$, then the EXPAND operation strictly decreases the primal cost E(Q). [sent-150, score-0.256]

71 We also note that the algorithm’s convergence is always guaranteed: the algorithm terminates when neither the primal cost E(Q) decreases nor the dual objective D(h) increases during the current iteration. [sent-154, score-0.592]
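Putting the pieces described above together, the overall control flow can be sketched as below. Only the loop structure is taken from the text; the bodies of DISTRIBUTE, PROJECT and the margin computation are passed in as placeholder callables because their exact update rules are not fully reproduced in this excerpt. Note that h = d is a feasible starting point under the dual constraints (8)-(9) as reconstructed above.

```python
import numpy as np

def lp_stability_clustering(d, distribute, project, margins, max_iters=1000):
    """Control-flow sketch of the dual-based clustering loop.
    distribute(h, Q), project(h, Q, q_bar) and margins(d, h, Q) are
    hypothetical callables standing in for the paper's update rules."""
    n = d.shape[0]
    h = d.copy()          # feasible for (8)-(9): equalities hold, h_pq >= d_pq
    Q = []                # chosen centers, empty initially
    for _ in range(max_iters):
        delta = margins(d, h, Q)
        candidates = [q for q in range(n) if q not in Q]
        q_bar = max(candidates, key=lambda q: delta[q], default=None)
        if q_bar is not None and delta[q_bar] >= 0:
            Q.append(q_bar)           # EXPAND: a stable object becomes a center
            h = project(h, Q, q_bar)  # PROJECT: warm-start a feasible dual for DUAL_Q
        else:
            h_new = distribute(h, Q)  # DISTRIBUTE: try to raise the dual objective
            if np.allclose(h_new, h):
                break                 # neither cost can improve: terminate
            h = h_new
    return Q, h
```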

72 Finally, we note that exactly the same algorithm applies to the general case where the objects in V form a graph with edges E (distance dpq is then defined only for pq ∈ E). [sent-155, score-0.369]

73 These can be considered as counterparts to our pseudo-distances hpq . [sent-163, score-0.27]

74 Affinity propagation also estimates the so-called self-availabilities for measuring the likelihood of an object being a cluster center. [sent-164, score-0.319]

75 On the contrary, we use for the same purpose the margins that approximate the stability of an object. [sent-165, score-0.307]

76 Very recently, another exemplar-based algorithm has been proposed as well, which relies on solving a convex formulation of clustering [8]. [sent-168, score-0.292]

77 Furthermore, it relies on a convex relaxation which is known to be much less tight than the LP relaxation PRIMAL we use here (essentially, [8] replaces all constraints $x_{pq} \le x_{qq}$, $\forall p \in V$ with the much looser constraint $\sum_p x_{pq} \le |V| \cdot x_{qq}$). [sent-170, score-0.762]

78 We also note that, unlike EM-like clustering algorithms such as K-means, our method is totally insensitive to initialization conditions and does not get stuck at bad local minima (thus yielding solutions of much better quality). [sent-172, score-0.372]

79 [Section 4: Experimental results] To illustrate the robustness of our algorithm to noise and its insensitivity to initialization, we start by showing clustering results on synthetic data. [sent-174, score-0.266]

80 The clustering result produced by our algorithm is shown in Fig. [sent-179, score-0.266]

81 As can be seen from that figure, despite the large percentage of noise, our method has been able to accurately detect all Gaussian centers and successfully cluster this 2D dataset. [sent-181, score-0.333]

82 Instead, it was inferred based on a common penalty term dqq for all objects q, which was set roughly equal to the median distance between points. [sent-183, score-0.318]
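The uniform penalty described in the sentence above can be set in one line; a small helper (our own, again keeping the penalties d_qq on the diagonal of the distance matrix) might look like this.

```python
import numpy as np

def with_median_penalty(d):
    """Return a copy of d whose diagonal (the self-penalties d_qq) is set to a
    common value, roughly the median of the off-diagonal pairwise distances."""
    d = d.copy()
    off_diag = d[~np.eye(d.shape[0], dtype=bool)]
    np.fill_diagonal(d, np.median(off_diag))
    return d
```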

83 Fig. 2(c) shows the primal and dual costs that were generated by our algorithm when it was applied to the example of Fig. [sent-189, score-0.512]

84 Note that the dual costs represent lower bounds on the optimum value of the objective function E(·), while the primal costs obviously represent upper bounds. [sent-192, score-0.733]

85 This fact allows us to obtain online optimality bounds on how far our current primal solution Q is from the unknown optimum of E(·). [sent-193, score-0.276]

86 For instance, in this particular example, we can be sure that the primal cost of our final solution is within 1% of the unknown optimum of the function E(·). [sent-195, score-0.276]
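Since any dual cost D(h) lower-bounds the unknown optimum and any primal cost E(Q) upper-bounds it, the certified gap is a one-liner (illustrative helper, not from the paper). A returned value of 0.01 corresponds to the 1% bound quoted in the text.

```python
def optimality_gap(primal_cost_value, dual_cost_value):
    """Relative gap certified by weak duality: D(h) <= E(Q*) <= E(Q), so the
    current solution is within (E(Q) - D(h)) / D(h) of the unknown optimum."""
    return (primal_cost_value - dual_cost_value) / dual_cost_value
```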

87 As we shall see, a non-Euclidean distance for clustering will have to be used in this case. [sent-199, score-0.308]

88 [Figure 2(c): Primal and dual costs.]

89 The centers of the big circles represent the points chosen as cluster centers by the 2 algorithms. [sent-227, score-0.493]

90 The primal and dual costs in (c) verify that the cost of our algorithm’s solution is within 1% of the optimum cost. [sent-228, score-0.625]

91 The 3D-motion segmentation problem is the task of clustering these N pixel pairs according to the K moving objects. [sent-230, score-0.393]

92 In this case, each pixel pair, say, $p_i = (y_i, z_i)$, that belongs to a moving object k should satisfy an epipolar constraint: $y_i^T F_k z_i = 0$ (19), where $F_k$ represents the fundamental matrix associated with the k-th 3D motion. [sent-232, score-0.323]

93 Hence, to solve the 3D segmentation problem, we need to estimate both the matrices F_k as well as the association of each pixel pair $p_i = (y_i, z_i)$ to the correct fundamental matrix F_k. [sent-234, score-0.304]

94 The resulting matrices, say, {Fk } will then correspond to cluster centers, whereas all the input pixel pairs {pi } will correspond to objects that need to be assigned to an active cluster center. [sent-238, score-0.495]

95 A clustering objective function of the form (1) thus results and by minimizing it we can also obtain a solution to the 3D segmentation problem. [sent-239, score-0.493]

96 Of course, in this case, the distance function d(pi , Fk ) between an object pi = (yi , zi ) and a cluster center will not be Euclidean. [sent-240, score-0.477]
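The non-Euclidean distance d(p_i, F_k) is not spelled out in this excerpt; one common choice that measures violation of the epipolar constraint (19) is the Sampson distance, sketched below. The function name and the use of the Sampson error are our assumptions; y and z are homogeneous 3-vectors.

```python
import numpy as np

def epipolar_distance(y, z, F):
    """Sampson-style distance between a pixel pair p = (y, z) and a motion
    hypothesis F: a first-order approximation to the geometric error of the
    constraint y^T F z = 0 from Eq. (19)."""
    r = float(y @ F @ z)                      # algebraic residual
    Fz, Fty = F @ z, F.T @ y
    denom = Fz[0]**2 + Fz[1]**2 + Fty[0]**2 + Fty[1]**2
    return r**2 / denom
```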

97 The ground-truth motion segmentation is also shown for each example and, as can be seen, it is almost identical to the segmentation estimated by our algorithm. [sent-245, score-0.258]

98 Some really impressive results on 4 very challenging datasets have been reported for that algorithm in [5], indicating that it outperforms any other center-based clustering method. [sent-247, score-0.266]

99 In particular, AP has been used for: clustering images of faces (using the squared-error distance), detecting genes in microarray data (using a distance based on exons’ transcription levels), and identifying representative sentences in manuscripts. [sent-248, score-0.378]

100 The primal costs generated by AP for this dataset (shown in (c)) demonstrate that AP fails to converge in this case (to prevent that, a properly chosen damping factor has to be used). [sent-256, score-0.256]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rimal', 0.309), ('dual', 0.285), ('hp', 0.274), ('hpq', 0.27), ('clustering', 0.239), ('dpq', 0.212), ('stabilities', 0.212), ('ual', 0.193), ('margins', 0.18), ('xpq', 0.174), ('centers', 0.16), ('xqq', 0.154), ('cluster', 0.146), ('primal', 0.136), ('ualq', 0.135), ('objects', 0.13), ('stability', 0.127), ('ap', 0.125), ('object', 0.124), ('dqq', 0.116), ('segmentation', 0.108), ('distribute', 0.101), ('fk', 0.091), ('objective', 0.091), ('center', 0.083), ('lq', 0.068), ('maxq', 0.068), ('vq', 0.068), ('operation', 0.067), ('costs', 0.064), ('stable', 0.06), ('rimalq', 0.058), ('hq', 0.058), ('solution', 0.055), ('pi', 0.055), ('cost', 0.053), ('nity', 0.052), ('tziritas', 0.051), ('propagation', 0.049), ('qi', 0.047), ('guaranteed', 0.047), ('cities', 0.046), ('pixel', 0.046), ('course', 0.045), ('af', 0.045), ('rely', 0.043), ('motion', 0.042), ('initialization', 0.042), ('lp', 0.042), ('dq', 0.041), ('relaxation', 0.04), ('furthermore', 0.04), ('genes', 0.039), ('distance', 0.039), ('crete', 0.039), ('dprev', 0.039), ('hqp', 0.039), ('hqq', 0.039), ('minq', 0.039), ('fundamental', 0.038), ('deciding', 0.037), ('hereafter', 0.037), ('project', 0.037), ('contrary', 0.036), ('motions', 0.036), ('obviously', 0.035), ('solutions', 0.034), ('distances', 0.034), ('komodakis', 0.034), ('nikos', 0.034), ('paragios', 0.034), ('sentences', 0.034), ('penalty', 0.033), ('insensitive', 0.033), ('optimum', 0.032), ('assessing', 0.032), ('hpp', 0.031), ('trapped', 0.031), ('holds', 0.03), ('shall', 0.03), ('clusters', 0.03), ('zi', 0.03), ('update', 0.029), ('gaussians', 0.029), ('damping', 0.029), ('updating', 0.029), ('program', 0.028), ('optimality', 0.027), ('chosen', 0.027), ('algorithm', 0.027), ('despite', 0.027), ('faces', 0.027), ('need', 0.027), ('correspondences', 0.026), ('relies', 0.026), ('bounds', 0.026), ('maintains', 0.025), ('clusterings', 0.025), ('bad', 0.024), ('penalize', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 48 nips-2008-Clustering via LP-based Stabilities

Author: Nikos Komodakis, Nikos Paragios, Georgios Tziritas

Abstract: A novel center-based clustering algorithm is proposed in this paper. We first formulate clustering as an NP-hard linear integer program, and we then use linear programming and duality theory to derive the solution of this optimization problem. This leads to an efficient and very general algorithm, which works in the dual domain and can cluster data based on an arbitrary set of distances. Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, can automatically determine the number of clusters, and can also provide online optimality bounds on the quality of the estimated clustering solutions. To deal with the most critical issue in a center-based clustering algorithm (the selection of cluster centers), we also introduce the notion of stability of a cluster center, a well-defined LP-based quantity that plays a key role in our algorithm’s success. Furthermore, we also introduce what we call the margins (another key ingredient in our algorithm), which can be roughly thought of as dual counterparts to stabilities and which allow us to obtain computationally efficient approximations to the latter. Promising experimental results demonstrate the potential of our method.

2 0.12518138 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations

Author: David Sontag, Amir Globerson, Tommi S. Jaakkola

Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1

3 0.12500516 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering

Author: Shai Ben-David, Margareta Ackerman

Abstract: Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kleinberg, ([1]) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in [1]. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg’s axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg’s axioms while retaining consistency. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time. 1

4 0.12098478 117 nips-2008-Learning Taxonomies by Dependence Maximization

Author: Matthew Blaschko, Arthur Gretton

Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data. 1

5 0.11981785 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation

Author: Laurent Jacob, Jean-philippe Vert, Francis R. Bach

Abstract: In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning. We show in simulations on synthetic examples and on the IEDB MHC-I binding dataset, that our approach outperforms well-known convex methods for multi-task learning, as well as related non-convex methods dedicated to the same problem. 1

6 0.116397 133 nips-2008-Mind the Duality Gap: Logarithmic regret algorithms for online optimization

7 0.11032368 218 nips-2008-Spectral Clustering with Perturbed Data

8 0.098651305 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph

9 0.095571078 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

10 0.087490231 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

11 0.083340719 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

12 0.080714978 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

13 0.071655095 179 nips-2008-Phase transitions for high-dimensional joint support recovery

14 0.070852607 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context

15 0.070136458 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

16 0.063902296 207 nips-2008-Shape-Based Object Localization for Descriptive Classification

17 0.063472018 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries

18 0.063130885 193 nips-2008-Regularized Co-Clustering with Dual Supervision

19 0.062506169 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

20 0.060181435 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.193), (1, -0.058), (2, -0.026), (3, 0.007), (4, 0.006), (5, -0.036), (6, -0.022), (7, -0.147), (8, -0.005), (9, -0.114), (10, -0.015), (11, 0.015), (12, 0.045), (13, 0.082), (14, -0.028), (15, 0.206), (16, -0.192), (17, 0.022), (18, 0.124), (19, 0.02), (20, -0.081), (21, 0.099), (22, -0.028), (23, 0.145), (24, 0.008), (25, -0.125), (26, 0.019), (27, 0.046), (28, -0.107), (29, 0.04), (30, -0.127), (31, 0.029), (32, -0.031), (33, -0.049), (34, 0.065), (35, -0.002), (36, -0.03), (37, 0.032), (38, 0.004), (39, -0.03), (40, 0.022), (41, -0.089), (42, 0.046), (43, 0.07), (44, -0.084), (45, -0.001), (46, 0.028), (47, 0.034), (48, 0.017), (49, -0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96836519 48 nips-2008-Clustering via LP-based Stabilities

Author: Nikos Komodakis, Nikos Paragios, Georgios Tziritas

Abstract: A novel center-based clustering algorithm is proposed in this paper. We first formulate clustering as an NP-hard linear integer program and we then use linear programming and the duality theory to derive the solution of this optimization problem. This leads to an efficient and very general algorithm, which works in the dual domain, and can cluster data based on an arbitrary set of distances. Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, can automatically determine the number of clusters, and can also provide online optimality bounds about the quality of the estimated clustering solutions. To deal with the most critical issue in a centerbased clustering algorithm (selection of cluster centers), we also introduce the notion of stability of a cluster center, which is a well defined LP-based quantity that plays a key role to our algorithm’s success. Furthermore, we also introduce, what we call, the margins (another key ingredient in our algorithm), which can be roughly thought of as dual counterparts to stabilities and allow us to obtain computationally efficient approximations to the latter. Promising experimental results demonstrate the potentials of our method.

2 0.80127841 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering

Author: Shai Ben-David, Margareta Ackerman

Abstract: Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kleinberg, ([1]) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in [1]. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg’s axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg’s axioms while retaining consistency. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time. 1

3 0.74642515 117 nips-2008-Learning Taxonomies by Dependence Maximization

Author: Matthew Blaschko, Arthur Gretton

Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data. 1

4 0.70141989 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation

Author: Laurent Jacob, Jean-philippe Vert, Francis R. Bach

Abstract: In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning. We show in simulations on synthetic examples and on the IEDB MHC-I binding dataset, that our approach outperforms well-known convex methods for multi-task learning, as well as related non-convex methods dedicated to the same problem. 1

5 0.69729573 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph

Author: Deli Zhao, Xiaoou Tang

Abstract: Detecting underlying clusters from large-scale data plays a central role in machine learning research. In this paper, we tackle the problem of clustering complex data of multiple distributions and multiple scales. To this end, we develop an algorithm named Zeta l-links (Zell) which consists of two parts: Zeta merging with a similarity graph and an initial set of small clusters derived from local l-links of samples. More specifically, we propose to structurize a cluster using cycles in the associated subgraph. A new mathematical tool, Zeta function of a graph, is introduced for the integration of all cycles, leading to a structural descriptor of a cluster in determinantal form. The popularity character of a cluster is conceptualized as the global fusion of variations of such a structural descriptor by means of the leave-one-out strategy in the cluster. Zeta merging proceeds, in the hierarchical agglomerative fashion, according to the maximum incremental popularity among all pairwise clusters. Experiments on toy data clustering, imagery pattern clustering, and image segmentation show the competitive performance of Zell. The 98.1% accuracy, in the sense of the normalized mutual information (NMI), is obtained on the FRGC face data of 16028 samples and 466 facial clusters. 1

6 0.56135261 218 nips-2008-Spectral Clustering with Perturbed Data

7 0.48541611 29 nips-2008-Automatic online tuning for fast Gaussian summation

8 0.47150365 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

9 0.47048965 165 nips-2008-On the Reliability of Clustering Stability in the Large Sample Regime

10 0.43556499 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations

11 0.40762809 133 nips-2008-Mind the Duality Gap: Logarithmic regret algorithms for online optimization

12 0.39952374 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries

13 0.38577631 23 nips-2008-An ideal observer model of infant object perception

14 0.38451394 207 nips-2008-Shape-Based Object Localization for Descriptive Classification

15 0.38201916 193 nips-2008-Regularized Co-Clustering with Dual Supervision

16 0.38005605 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

17 0.36379179 107 nips-2008-Influence of graph construction on graph-based clustering measures

18 0.36083844 85 nips-2008-Fast Rates for Regularized Objectives

19 0.3577354 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions

20 0.34444419 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.014), (6, 0.069), (7, 0.057), (12, 0.062), (15, 0.013), (21, 0.089), (28, 0.182), (53, 0.163), (57, 0.054), (59, 0.025), (63, 0.041), (71, 0.022), (77, 0.039), (83, 0.078)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84279633 48 nips-2008-Clustering via LP-based Stabilities

Author: Nikos Komodakis, Nikos Paragios, Georgios Tziritas

Abstract: A novel center-based clustering algorithm is proposed in this paper. We first formulate clustering as an NP-hard linear integer program and we then use linear programming and the duality theory to derive the solution of this optimization problem. This leads to an efficient and very general algorithm, which works in the dual domain, and can cluster data based on an arbitrary set of distances. Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, can automatically determine the number of clusters, and can also provide online optimality bounds about the quality of the estimated clustering solutions. To deal with the most critical issue in a centerbased clustering algorithm (selection of cluster centers), we also introduce the notion of stability of a cluster center, which is a well defined LP-based quantity that plays a key role to our algorithm’s success. Furthermore, we also introduce, what we call, the margins (another key ingredient in our algorithm), which can be roughly thought of as dual counterparts to stabilities and allow us to obtain computationally efficient approximations to the latter. Promising experimental results demonstrate the potentials of our method.

2 0.83312583 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

Author: Pierre Garrigues, Laurent E. Ghaoui

Abstract: It has been shown that the problem of 1 -penalized least-square regression commonly referred to as the Lasso or Basis Pursuit DeNoising leads to solutions that are sparse and therefore achieves model selection. We propose in this paper RecLasso, an algorithm to solve the Lasso with online (sequential) observations. We introduce an optimization problem that allows us to compute an homotopy from the current solution to the solution after observing a new data point. We compare our method to Lars and Coordinate Descent, and present an application to compressive sensing with sequential observations. Our approach can easily be extended to compute an homotopy from the current solution to the solution that corresponds to removing a data point, which leads to an efficient algorithm for leave-one-out cross-validation. We also propose an algorithm to automatically update the regularization parameter after observing a new data point. 1

3 0.80623323 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues

Author: Rama Natarajan, Iain Murray, Ladan Shams, Richard S. Zemel

Abstract: We explore a recently proposed mixture model approach to understanding interactions between conflicting sensory cues. Alternative model formulations, differing in their sensory noise models and inference methods, are compared based on their fit to experimental data. Heavy-tailed sensory likelihoods yield a better description of the subjects’ response behavior than standard Gaussian noise models. We study the underlying cause for this result, and then present several testable predictions of these models. 1

4 0.80564868 153 nips-2008-Nonlinear causal discovery with additive noise models

Author: Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jan R. Peters, Bernhard Schölkopf

Abstract: The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities. 1

5 0.8014971 134 nips-2008-Mixed Membership Stochastic Blockmodels

Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing

Abstract: In many settings, such as protein interactions and gene regulatory networks, collections of author-recipient email, and social networks, the data consist of pairwise measurements, e.g., presence or absence of links between pairs of objects. Analyzing such data with probabilistic models requires non-standard assumptions, since the usual independence or exchangeability assumptions no longer hold. In this paper, we introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) with a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodel with applications to social networks and protein interaction networks. 1

6 0.80106032 13 nips-2008-Adapting to a Market Shock: Optimal Sequential Market-Making

7 0.77051979 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

8 0.76789618 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

9 0.76628876 194 nips-2008-Regularized Learning with Networks of Features

10 0.76536 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

11 0.76425648 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models

12 0.76314509 196 nips-2008-Relative Margin Machines

13 0.76177692 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

14 0.76167631 195 nips-2008-Regularized Policy Iteration

15 0.76029617 202 nips-2008-Robust Regression and Lasso

16 0.75952047 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models

17 0.75904655 113 nips-2008-Kernelized Sorting

18 0.75885016 179 nips-2008-Phase transitions for high-dimensional joint support recovery

19 0.75859749 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

20 0.75848877 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data