nips nips2004 nips2004-61 knowledge-graph by maker-knowledge-mining

61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters


Source: pdf

Author: Massimiliano Pavan, Marcello Pelillo

Abstract: Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edgeweighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. [sent-3, score-0.268]

2 They generalize the notion of a maximal clique to edgeweighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. [sent-4, score-0.159]

3 We address the problem of grouping out-of-sample examples after the clustering process has taken place. [sent-5, score-0.232]

4 This may serve either to drastically reduce the computational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. [sent-6, score-0.22]

5 We show that the very notion of a dominant set offers a simple and efficient way of doing this. [sent-7, score-0.489]

6 Numerical experiments on various grouping problems show the effectiveness of the approach. [sent-8, score-0.165]

7 1 Introduction Proximity-based, or pairwise, data clustering techniques are gaining increasing popularity over traditional central grouping techniques, which are centered around the notion of “feature” (see, e. [sent-9, score-0.281]

8 Hence, it is natural to map (possibly implicitly) the data to be clustered to the nodes of a weighted graph, with edge weights representing similarity or dissimilarity relations. [sent-14, score-0.177]

9 Further, it allows one to use non-metric similarities and it is applicable to problems that do not have a natural embedding to a uniform feature space, such as the grouping of structural or graph-based representations. [sent-16, score-0.205]

10 We have recently developed a new framework for pairwise data clustering based on a novel graph-theoretic concept, that of a dominant set, which generalizes the notion of a maximal clique to edge-weighted graphs [7, 9]. [sent-17, score-0.644]

11 An intriguing connection between dominant sets and the solutions of a (continuous) quadratic optimization problem makes them related in a non-trivial way to spectral-based cluster notions, and allows one to use straightforward dynamics from evolutionary game theory to determine them [14]. [sent-18, score-0.869]

12 A nice feature of this framework is that it naturally provides a principled measure of a cluster’s cohesiveness as well as a measure of a vertex’s participation in its assigned group. [sent-19, score-0.441]

13 The approach has proven to be a powerful one when applied to problems such as intensity, color, and texture segmentation, or visual database organization, and is competitive with spectral approaches such as normalized cut [7, 8, 9]. [sent-21, score-0.213]

14 However, a typical problem associated with pairwise grouping algorithms in general, and hence with the dominant set framework in particular, is the scaling behavior with the number of data items. [sent-22, score-0.669]

15 In such situations, the trivial approach of recomputing the complete cluster structure upon the arrival of any new item is clearly unfeasible. [sent-25, score-0.229]

16 There is no straightforward way of accomplishing this within the pairwise grouping paradigm, short of recomputing the complete cluster structure. [sent-28, score-0.456]

17 By contrast, we shall see that the very notion of a dominant set, thanks to its clear combinatorial properties, offers a simple and efficient solution to this problem. [sent-30, score-0.527]

18 The basic idea consists of computing, for any new example, a quantity which measures the degree of cluster membership, and we provide simple approximations which allow us to do this in linear time and space, with respect to the cluster size. [sent-31, score-0.44]

19 Our classification schema inherits the main features of the dominant set formulation, i.e., [sent-32, score-0.411]

20 the ability to yield a soft classification of the input data and to provide principled measures for cluster membership and cohesiveness. [sent-34, score-0.372]

21 Numerical experiments show that the strategy of first grouping a small number of data items and then classifying the out-of-sample instances using our prediction rule is clearly successful as we are able to obtain essentially the same results as the dense problem in much less time. [sent-35, score-0.321]

22 We also present results on high-resolution image segmentation problems, a task where the dominant set framework would otherwise be computationally impractical. [sent-36, score-0.582]

23 2 Dominant Sets and Their Continuous Characterization We represent the data to be clustered as an undirected edge-weighted (similarity) graph with no self-loops G = (V, E, w), where V = {1, . [sent-37, score-0.109]

24 , n} is the vertex set, E ⊆ V × V is the edge set, and w : E → IR*+ (the positive reals) is the weight function. [sent-40, score-0.099]

25 Vertices in G correspond to data points, edges represent neighborhood relationships, and edge-weights reflect similarity between pairs of linked vertices. [sent-41, score-0.098]

26 As customary, we represent the graph G with the corresponding weighted adjacency (or similarity) matrix, which is the n × n nonnegative, symmetric matrix A = (aij ) defined as: aij = w(i, j) , 0, if (i, j) ∈ E otherwise . [sent-42, score-0.251]
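
As a concrete illustration (not taken from the paper's code), the sketch below assembles such a matrix from a list of weighted edges, keeping a zero diagonal since G has no self-loops; the toy graph at the end is hypothetical.

```python
import numpy as np

def adjacency_matrix(n, weighted_edges):
    """Build the symmetric weighted adjacency matrix A = (a_ij) of an
    undirected edge-weighted graph with no self-loops.

    weighted_edges : iterable of (i, j, w_ij) with positive weights.
    """
    A = np.zeros((n, n))
    for i, j, w in weighted_edges:
        if i != j:                 # no self-loops: a_ii = 0
            A[i, j] = A[j, i] = w  # symmetric: a_ij = a_ji = w(i, j)
    return A

# Hypothetical toy graph on 4 vertices
A = adjacency_matrix(4, [(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7), (2, 3, 0.1)])
```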

27 Let S ⊆ V be a non-empty subset of vertices and i ∈ V . [sent-43, score-0.156]

28 The average weighted degree of i with respect to S is defined as awdeg_S(i) = (1/|S|) Σ_{j∈S} a_ij (1), where |S| denotes the cardinality of S. [sent-47, score-0.112]

29 Moreover, if j ∉ S we define φ_S(i, j) = a_ij − awdeg_S(i), which is a measure of the similarity between nodes j and i, with respect to the average similarity between node i and its neighbors in S. [sent-48, score-0.277]
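
A small sketch of definition (1) and of φ_S, assuming A is the weighted adjacency matrix just defined; the example matrix and all function names are illustrative choices, not the authors'.

```python
import numpy as np

def awdeg(A, S, i):
    """Average weighted degree of vertex i with respect to S (Eq. 1):
    awdeg_S(i) = (1/|S|) * sum_{j in S} a_ij."""
    S = list(S)
    return A[i, S].sum() / len(S)

def phi(A, S, i, j):
    """phi_S(i, j) = a_ij - awdeg_S(i), for j not in S: similarity of j to i
    relative to the average similarity between i and its neighbors in S."""
    return A[i, j] - awdeg(A, S, i)

# Hypothetical 4-vertex similarity matrix
A = np.array([[0.0, 0.9, 0.8, 0.1],
              [0.9, 0.0, 0.7, 0.2],
              [0.8, 0.7, 0.0, 0.1],
              [0.1, 0.2, 0.1, 0.0]])
print(awdeg(A, {0, 1, 2}, 0), phi(A, {0, 1}, 0, 3))
```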

30 Note that w{1,2,3,4} (1) < 0 and this reflects the fact that vertex 1 is loosely coupled to vertices 2, 3 and 4. [sent-50, score-0.327]

31 Conversely, w{5,6,7,8} (5) > 0 and this reflects the fact that vertex 5 is tightly coupled with vertices 6, 7, and 8. [sent-51, score-0.331]

32 Let S ⊆ V be a non-empty subset of vertices and i ∈ S. [sent-52, score-0.156]

33 The total weight of S is defined as W(S) = Σ_{i∈S} w_S(i) (3). Intuitively, w_S(i) gives us a measure of the overall similarity between vertex i and the vertices of S \ {i} with respect to the overall similarity among the vertices in S \ {i}, with positive values indicating high internal coherency (see Fig. 1). [sent-57, score-0.607]
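
The recursive definition of w_S(i), equation (2) in [7, 9], is not among the extracted sentences; the sketch below assumes the standard form w_{{i}}(i) = 1 and w_S(i) = Σ_{j∈S\{i}} φ_{S\{i}}(j, i) · w_{S\{i}}(j), and computes W(S) as in (3). The naive recursion is exponential in |S| and is meant only to make the definitions concrete.

```python
import numpy as np
from functools import lru_cache

A = np.array([[0.0, 0.9, 0.8, 0.1],      # hypothetical 4-vertex similarity matrix
              [0.9, 0.0, 0.7, 0.2],
              [0.8, 0.7, 0.0, 0.1],
              [0.1, 0.2, 0.1, 0.0]])

def awdeg(S, i):
    return A[i, list(S)].sum() / len(S)

def phi(S, i, j):                         # phi_S(i, j) = a_ij - awdeg_S(i)
    return A[i, j] - awdeg(S, i)

@lru_cache(maxsize=None)
def w(S, i):
    """w_S(i), assumed recursion (Eq. 2 in [7, 9]); S is a frozenset containing i."""
    if len(S) == 1:
        return 1.0
    R = S - {i}
    return sum(phi(R, j, i) * w(R, j) for j in R)

def W(S):
    """Total weight W(S) = sum_{i in S} w_S(i)   (Eq. 3)."""
    S = frozenset(S)
    return sum(w(S, i) for i in S)

print(w(frozenset({0, 1, 2}), 0), W({0, 1, 2}))   # both positive: internally coherent subset
```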

34 A non-empty subset of vertices S ⊆ V such that W(T) > 0 for any non-empty T ⊆ S is said to be dominant if: 1. w_S(i) > 0, for all i ∈ S, and 2. w_{S∪{i}}(i) < 0, for all i ∉ S. [sent-59, score-0.567]

35 The two previous conditions correspond to the two main properties of a cluster: the first regards internal homogeneity, whereas the second regards external inhomogeneity. [sent-62, score-0.121]

36 The above definition represents our formalization of the concept of a cluster in an edgeweighted graph. [sent-63, score-0.27]

37 The following theorem, proved in [7], establishes an intriguing connection between dominant sets and local solutions of program (4). [sent-66, score-0.663]

38 Theorem 1 If S is a dominant subset of vertices, then its (weighted) characteristic vector x^S, defined as the vector of ∆^n with components x^S_i = w_S(i)/W(S) if i ∈ S, and 0 otherwise (5), is a strict local solution of program (4). [sent-67, score-0.538]

39 Conversely, if x is a strict local solution of program (4), then its support S = σ(x) is a dominant set, provided that w_{S∪{i}}(i) ≠ 0 for all i ∉ S. [sent-68, score-0.538]

40 By virtue of this result, we can find a dominant set by localizing a local solution of program (4) with an appropriate continuous optimization technique, such as replicator dynamics from evolutionary game theory [14], and then picking up its support. [sent-70, score-0.659]
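
Program (4) itself is not reproduced in the extracted sentences; in the dominant-set papers [7, 9, 14] it is the standard quadratic program "maximize f(x) = x · Ax subject to x ∈ ∆^n" (the standard simplex), which is what the sketch below assumes. The discrete-time replicator update and the extraction of the support might look as follows (tolerances and the starting point are arbitrary choices):

```python
import numpy as np

def find_dominant_set(A, tol=1e-8, max_iter=10000, support_eps=1e-6):
    """Localize a strict local solution of program (4), assumed here to be
    'maximize f(x) = x . Ax over the standard simplex', with discrete-time
    replicator dynamics, then pick up its support (a dominant set by the
    converse part of Theorem 1, up to the non-degeneracy proviso)."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)                # start from the simplex barycenter
    for _ in range(max_iter):
        Ax = A @ x
        x_new = x * Ax / (x @ Ax)          # replicator update
        if np.abs(x_new - x).sum() < tol:
            x = x_new
            break
        x = x_new
    support = np.flatnonzero(x > support_eps)
    return support, x                      # x approximates the characteristic vector (5)

# Hypothetical 4-vertex similarity matrix: {0, 1, 2} are tightly coupled, 3 is not
A = np.array([[0.0, 0.9, 0.8, 0.1],
              [0.9, 0.0, 0.7, 0.2],
              [0.8, 0.7, 0.0, 0.1],
              [0.1, 0.2, 0.1, 0.0]])
S, x = find_dominant_set(A)
print(S, x.round(3))
```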

41 Note that the components of the weighted characteristic vectors give us a natural measure of the participation of the corresponding vertices in the cluster, whereas the value of the objective function measures the cohesiveness of the class. [sent-71, score-0.561]

42 In order to get a partition of the input data into coherent groups, a simple approach is to iteratively find a dominant set and then remove it from the graph, until all vertices have been grouped (see [9] for a hierarchical extension of this framework). [sent-72, score-0.688]
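
A compact sketch of this peel-off strategy, using the same replicator iteration as above; the stopping criteria and the treatment of leftover vertices are simplifying assumptions.

```python
import numpy as np

def replicator_support(A, iters=5000, eps=1e-6):
    """One replicator run on A (same update as above); returns the support found."""
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        Ax = A @ x
        denom = x @ Ax
        if denom <= 0:            # degenerate (e.g. all-zero) submatrix: stop
            break
        x = x * Ax / denom
    return np.flatnonzero(x > eps)

def peel_off_partition(A):
    """Iteratively extract a dominant set and remove it from the graph,
    until all vertices have been grouped."""
    remaining = np.arange(A.shape[0])
    clusters = []
    while len(remaining) > 1:
        local = replicator_support(A[np.ix_(remaining, remaining)])
        if len(local) == 0 or len(local) == len(remaining):
            break                 # nothing further to peel off
        clusters.append(remaining[local])
        remaining = np.setdiff1d(remaining, remaining[local])
    if len(remaining) > 0:
        clusters.append(remaining)   # last extracted set or leftover vertices
    return clusters
```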

43 On the other hand, by finding all dominant sets, i.e., [sent-73, score-0.411]

44 local solutions of (4), of the original graph, one can obtain a “soft” partition of the dataset, whereby clusters are allowed to overlap. [sent-75, score-0.234]

45 Finally, note that spectral clustering approaches such as, e. [sent-76, score-0.126]

46 3 Predicting Cluster Membership for Out-of-Sample Data Suppose we are given a set V of n unlabeled items and let G = (V, E, w) denote the corresponding similarity graph. [sent-79, score-0.14]

47 We shall denote by Ĝ = (V̂, Ê, ŵ), with V ⊆ V̂, the similarity graph built upon all the n + k data. [sent-83, score-0.207]

48 Ĝ is thus a supergraph of G, namely a graph having V ⊆ V̂. Let S ⊆ V be a subset of vertices which is dominant in the original graph G, and let i ∈ V̂ \ V be a new data point. [sent-86, score-0.709]

49 As pointed out in the previous section, the sign of wS∪{i} (i) provides an indication as to whether i is tightly or loosely coupled with the vertices in S (the condition wS∪{i} (i) = 0 corresponds to a non-generic boundary situation that does not arise in practice and will therefore be ignored). [sent-87, score-0.325]

50 Accordingly, it is natural to propose the following rule for predicting cluster membership of unseen data: if w_{S∪{i}}(i) > 0, then assign vertex i to cluster S (6). [sent-88, score-0.628]

51 Note that, according to this rule, the same point can be assigned to more than one class, thereby yielding a soft partition of the input data. [sent-89, score-0.15]

52 To get a hard partition one can use the cluster membership approximation measures we shall discuss below. [sent-90, score-0.494]

53 Note that it may also happen for some instance i that no cluster S satisfies rule (6), in which case the point remains unclassified (or is assigned to an “outlier” group). [sent-91, score-0.288]

54 This should be interpreted as an indication that either the point is too noisy or that the cluster formation process was inaccurate. [sent-92, score-0.218]

55 Proposition 1 Let G = (V, E, w) be an edge-weighted (similarity) graph, A = (a_ij) its weighted adjacency matrix, and S ⊆ V a dominant set of G with characteristic vector x^S. Observe that w_S(i) depends only on the weights on the edges of the subgraph induced by S. [sent-98, score-0.578]

56 Let Ĝ = (V̂, Ê, ŵ) be a supergraph of G with weighted adjacency matrix Â = (â_ij). [sent-101, score-0.147]

57 From Theorem 1, x^S is a strict local solution of program (4) and hence it satisfies the Karush-Kuhn-Tucker (KKT) equality conditions. [sent-103, score-0.213]

58 Now, let n̂ = |V̂| be the cardinality of V̂, and let x̂^S be the (n̂-dimensional) characteristic vector of S in Ĝ, which is obtained by padding x^S with zeros. [sent-106, score-0.133]

59 Now, recall that the KKT equality conditions for program (4) imply Σ_{h∈S} a_hj x^S_h = x^S · Ax^S = f(x^S) for any j ∈ S [7]. [sent-109, score-0.257]
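
The full statement of Proposition 1 is truncated in the extraction. Combining the recursive definition of w assumed earlier with the KKT identity just quoted gives w_{S∪{i}}(i) = W(S) · [(Âx̂^S)_i − f(x^S)], and since W(S) > 0 for a dominant set, rule (6) reduces to comparing the new point's weighted affinity to the cluster against the cluster cohesiveness f(x^S); this appears to be the content of the proposition, and the sketch below assumes it. All names are illustrative.

```python
import numpy as np

def assign_out_of_sample(A_hat, S, x_S, candidates):
    """Rule (6) via the linear-time test: accept i when (A_hat x_hat^S)_i
    exceeds the cohesiveness f(x^S) = x^S . A x^S, which has the same sign
    as w_{S u {i}}(i) because W(S) > 0 for a dominant set.

    A_hat      : (n + k) x (n + k) similarity matrix of the supergraph G-hat
    S          : indices (into A_hat) of a dominant set found on the sample
    x_S        : its characteristic vector restricted to S (sums to 1)
    candidates : indices of the out-of-sample points
    """
    S = np.asarray(S)
    f_S = x_S @ A_hat[np.ix_(S, S)] @ x_S           # cluster cohesiveness f(x^S)
    accepted = [i for i in candidates
                if A_hat[i, S] @ x_S > f_S]         # O(|S|) work per new point
    return accepted
```

Each test costs time linear in the cluster size, consistent with the linear time and space claim made in the introduction.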

60 Given an out-of-sample vertex i and a class S such that rule (6) holds, we now provide an approximation of the degree of participation of i in S ∪ {i} which, as pointed out in the previous section, is given by the ratio between wS∪{i} (i) and W(S ∪ {i}). [sent-111, score-0.284]

61 This can be used, for example, to get a hard partition of the input data when an instance happens to be assigned to more than one class. [sent-112, score-0.142]

62 Hence, we approximate S with a clique having constant weight a, and impose that it has the same cohesiveness value f(x^S) = x^S · Ax^S as the original dominant set. [sent-116, score-0.66]

63 Using the above formula (9), one can easily get, by normalization, an approximation of the characteristic vector x̂^Ŝ ∈ ∆^{n+k} of Ŝ, the extension of cluster S obtained by applying rule (6): Ŝ = S ∪ {i ∈ V̂ \ V : w_{S∪{i}}(i) > 0}. [sent-119, score-0.406]

64 With an approximation of x̂^Ŝ at hand, it is also easy to compute an approximation of the cohesiveness of the new cluster Ŝ, i.e., of the value of the objective function f at x̂^Ŝ. [sent-120, score-0.453]

65 Indeed, assuming that Ŝ is dominant in Ĝ, and recalling the KKT equality conditions for program (4) [7], we get (Âx̂^Ŝ)_i = x̂^Ŝ · Âx̂^Ŝ for all i ∈ Ŝ. [sent-123, score-0.621]

66 It is therefore natural to approximate the cohesiveness of Ŝ as a weighted average of the (Âx̂^Ŝ)_i’s. [sent-124, score-0.235]

67 Average distance between approximated and actual cluster membership (left) and cohesiveness (middle) as a function of sampling rate. [sent-166, score-0.663]

68 Right: average CPU time as a function of sampling rate. [sent-167, score-0.11]

69 4 Experimental Results In an attempt to evaluate how the approximations given at the end of the previous section actually compare to the solutions obtained on the dense problem, we conducted the following preliminary experiment. [sent-168, score-0.197]

70 We generated 150 points on the plane so as to form a dominant set (we used a standard Gaussian kernel to obtain similarities), and extracted random samples with increasing sampling rate, ranging from 1/15 to 1. [sent-169, score-0.521]
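
A sketch of how such a toy experiment can be set up; the kernel width, point distribution, and sampling protocol below are assumptions, since the extracted sentences do not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# 150 points drawn around a single center, so that they form one coherent cluster
points = rng.normal(loc=0.0, scale=1.0, size=(150, 2))

def gaussian_similarity(X, sigma=1.0):
    """Standard Gaussian-kernel similarities with a zero diagonal (no self-loops)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(A, 0.0)
    return A

A_full = gaussian_similarity(points)

# Random samples at increasing sampling rates (1/15, 2/15, ..., 1)
for rate in np.linspace(1 / 15, 1.0, 15):
    sample = rng.choice(150, size=max(2, int(rate * 150)), replace=False)
    A_sample = A_full[np.ix_(sample, sample)]
    # ... find the dominant set on A_sample (replicator sketch of Section 2),
    #     extend it to the out-of-sample points with rule (6), and compare the
    #     approximated membership/cohesiveness with the dense-problem values.
```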

71 For each sampling rate 100 trials were made, for each of which we computed the Euclidean distance between the approximated and the actual characteristic vector (i.e., the cluster membership), [sent-170, score-0.33]

72 as well as the distance between the approximated and the actual cluster cohesiveness (that is, the value of the objective function f). [sent-172, score-0.631]

73 As can be seen, our approximations work remarkably well: with a sampling rate less than 10 % the distance between the characteristic vectors is around 0. [sent-175, score-0.332]

74 Also, note how the CPU time increases linearly as the sampling rate approaches 100%. [sent-182, score-0.195]

75 Our goal was to test how the solutions obtained on the sampled graph compare with those of the original, dense problem and to study how the performance of the algorithm scales w. [sent-185, score-0.265]

76 As before, we used sampling rates from 1/15 to 1, and for each such value 100 random samples were extracted. [sent-189, score-0.11]

77 After the grouping process, the out-of-sample instances were assigned to one of the two classes found using rule (6). [sent-190, score-0.268]

78 4%, which is even slightly higher than the one obtained on the dense (100% rate) problem, which is 72. [sent-195, score-0.096]

79 For the sake of comparison we also ran normalized cut on the whole dataset, and it yielded a classification rate of 72. [sent-198, score-0.241]

80 Finally, we applied our algorithm to the segmentation of brightness images. [sent-200, score-0.14]

81 The image to be segmented is represented as a graph where vertices correspond to pixels and edge-weights reflect the “similarity” between vertex pairs. [sent-201, score-0.445]

82 As customary, we defined a similarity measure between pixels based on brightness proximity. [sent-202, score-0.186]

83 Specifically, following [7], similarity between pixels i and j was measured by w(i, j) = exp(−(I(i) − I(j))^2/σ^2), where σ is a positive real number which affects the decreasing rate of w, and I(i) is defined as the (normalized) intensity value at node i. [sent-203, score-0.237]

84 After drawing a set of pixels at random with sampling rate p = 0.005, [sent-204, score-0.249]

85 we iteratively found a dominant set in the sampled graph using replicator dynamics [7, 14], and then removed it from the graph. [sent-205, score-0.568]
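
A data-preparation sketch for this pipeline: it samples pixels at rate p and builds the intensity-based similarity matrix of [7] on the sample. The value of σ, the stand-in image, and the downstream grouping/extension steps (peeling with replicator dynamics, then rule (6)) are placeholders that reuse the earlier sketches.

```python
import numpy as np

def sampled_pixel_graph(intensities, p=0.005, sigma=0.1, rng=None):
    """Similarity matrix on a random sample of pixels.

    intensities : 2-D array of (normalized) brightness values I(i)
    p           : sampling rate (0.005, i.e. 0.5%, as in the radar example)
    sigma       : kernel width; a hypothetical value, not taken from the paper
    """
    rng = np.random.default_rng() if rng is None else rng
    I = intensities.ravel()
    sample = rng.choice(I.size, size=max(2, int(p * I.size)), replace=False)
    diff = I[sample, None] - I[None, sample]
    A = np.exp(-diff ** 2 / sigma ** 2)      # w(i, j) = exp(-(I(i) - I(j))^2 / sigma^2)
    np.fill_diagonal(A, 0.0)                 # no self-loops
    return sample, A

# Usage sketch: peel dominant sets off A with replicator dynamics (Section 2),
# then assign every unsampled pixel to the cluster(s) satisfying rule (6).
image = np.random.default_rng(1).random((115, 97))   # stand-in for the radar image
pixels, A = sampled_pixel_graph(image)
```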

86 Average classification rate (left) and CPU time (right) as a function of sampling rate. [sent-240, score-0.195]

87 Figure 4: Segmentation results on a 115 × 97 weather radar image. [sent-241, score-0.092]

88 From left to right: original image, the two regions found on the sampled image (sampling rate = 0.5%), [sent-242, score-0.181]

89 and the two regions obtained on the whole image (sampling rate = 100%). [sent-243, score-0.215]

90 Figure 4 shows the results obtained on a 115 × 97 weather radar image, used in [13, 7] as an instance whereby edge-detection-based segmentation would perform poorly. [sent-245, score-0.309]

91 The leftmost cluster is the one obtained after the first iteration of the algorithm, and successive clusters are shown left to right. [sent-247, score-0.254]

92 Note how the segmentation obtained over the sparse image, sampled at 0.5%, compares with that obtained on the whole image. [sent-248, score-0.171]

93 On these images the sampling process produced a sample with no more than 1000 pixels, and our current MATLAB implementation took only a few seconds to return a solution. [sent-257, score-0.157]

94 Running the grouping algorithm on the whole images (which contain more than 150,000 pixels) would simply be unfeasible. [sent-258, score-0.196]

95 We also ran normalized cut on these images (using the same sample rate of 0. [sent-260, score-0.21]

96 5 Conclusions We have provided a simple and efficient extension to the dominant-set clustering framework to deal with the grouping of out-of-sample data. [sent-263, score-0.293]

97 This makes the approach applicable to very large grouping problems, such as high-resolution image segmentation, where it would otherwise be impractical. [sent-264, score-0.23]

98 Experiments show that the solutions extrapolated from the sparse data are comparable with those of the dense problem, which in turn compare favorably with spectral solutions such as normalized cut’s, and are obtained in much less time. [sent-265, score-0.333]

99 For each image, the first line shows the major regions obtained with our approximation algorithm, while the second line shows the results obtained with normalized cut. [sent-271, score-0.149]

100 Unsupervised texture segmentation by dominant sets and game dynamics. [sent-344, score-0.644]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ws', 0.425), ('dominant', 0.411), ('xs', 0.369), ('cohesiveness', 0.194), ('cluster', 0.185), ('grouping', 0.165), ('vertices', 0.156), ('pavan', 0.111), ('sampling', 0.11), ('membership', 0.107), ('segmentation', 0.106), ('vertex', 0.099), ('similarity', 0.098), ('program', 0.086), ('rate', 0.085), ('ahj', 0.083), ('axs', 0.083), ('aij', 0.081), ('cut', 0.081), ('whereby', 0.077), ('graph', 0.071), ('characteristic', 0.068), ('clustering', 0.067), ('solutions', 0.067), ('cpu', 0.066), ('participation', 0.066), ('ax', 0.065), ('kkt', 0.065), ('image', 0.065), ('dense', 0.062), ('pairwise', 0.062), ('spectral', 0.059), ('adjacency', 0.058), ('ahi', 0.055), ('awdegs', 0.055), ('edgeweighted', 0.055), ('replicator', 0.055), ('venezia', 0.055), ('intriguing', 0.055), ('clique', 0.055), ('equality', 0.055), ('partition', 0.055), ('game', 0.054), ('pixels', 0.054), ('evolutionary', 0.053), ('rule', 0.052), ('assigned', 0.051), ('notion', 0.049), ('supergraph', 0.048), ('radar', 0.048), ('seconds', 0.047), ('normalized', 0.044), ('customary', 0.044), ('recomputing', 0.044), ('regards', 0.044), ('weather', 0.044), ('sets', 0.044), ('soft', 0.044), ('coupled', 0.042), ('items', 0.042), ('weighted', 0.041), ('irn', 0.041), ('strict', 0.041), ('similarities', 0.04), ('organization', 0.04), ('burden', 0.039), ('shall', 0.038), ('clustered', 0.038), ('approximation', 0.037), ('nities', 0.037), ('get', 0.036), ('measures', 0.036), ('clusters', 0.035), ('nystr', 0.035), ('distance', 0.035), ('obtained', 0.034), ('approximations', 0.034), ('ionosphere', 0.034), ('tightly', 0.034), ('brightness', 0.034), ('conditions', 0.033), ('vision', 0.033), ('indication', 0.033), ('actual', 0.032), ('cardinality', 0.031), ('nice', 0.031), ('notions', 0.031), ('sampled', 0.031), ('deal', 0.031), ('whole', 0.031), ('hence', 0.031), ('pointed', 0.03), ('hi', 0.03), ('extension', 0.03), ('concept', 0.03), ('conversely', 0.03), ('loosely', 0.03), ('serve', 0.029), ('texture', 0.029), ('offers', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters

Author: Massimiliano Pavan, Marcello Pelillo

Abstract: Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edgeweighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach. 1

2 0.12728186 161 nips-2004-Self-Tuning Spectral Clustering

Author: Lihi Zelnik-manor, Pietro Perona

Abstract: We study a number of open issues in spectral clustering: (i) Selecting the appropriate scale of analysis, (ii) Handling multi-scale data, (iii) Clustering with irregular background clutter, and, (iv) Finding automatically the number of groups. We first propose that a ‘local’ scale should be used to compute the affinity between each pair of points. This local scaling leads to better clustering especially when the data includes multiple scales and when the clusters are placed within a cluttered background. We further suggest exploiting the structure of the eigenvectors to infer automatically the number of groups. This leads to a new algorithm in which the final randomly initialized k-means stage is eliminated. 1

3 0.10850438 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition

Author: Le Lu, Gregory D. Hager, Laurent Younes

Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.

4 0.10830852 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering

Author: Zhengdong Lu, Todd K. Leen

Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.

5 0.10198853 115 nips-2004-Maximum Margin Clustering

Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans

Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1

6 0.089839622 116 nips-2004-Message Errors in Belief Propagation

7 0.083248921 103 nips-2004-Limits of Spectral Clustering

8 0.08315406 125 nips-2004-Multiple Relational Embedding

9 0.080427684 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

10 0.079293437 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning

11 0.077790499 165 nips-2004-Semi-supervised Learning on Directed Graphs

12 0.075953849 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

13 0.075839981 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods

14 0.074471287 36 nips-2004-Class-size Independent Generalization Analsysis of Some Discriminative Multi-Category Classification

15 0.073004752 177 nips-2004-Supervised Graph Inference

16 0.070693038 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

17 0.069559664 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection

18 0.068500683 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

19 0.068380207 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

20 0.06738364 77 nips-2004-Hierarchical Clustering of a Mixture Model


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.221), (1, 0.068), (2, -0.044), (3, -0.069), (4, -0.019), (5, -0.023), (6, -0.109), (7, 0.058), (8, -0.121), (9, -0.086), (10, -0.12), (11, 0.012), (12, -0.047), (13, -0.033), (14, -0.02), (15, -0.015), (16, 0.007), (17, -0.007), (18, -0.071), (19, 0.018), (20, 0.071), (21, -0.002), (22, 0.001), (23, -0.023), (24, -0.029), (25, -0.045), (26, 0.016), (27, 0.027), (28, -0.022), (29, -0.073), (30, 0.075), (31, 0.077), (32, -0.053), (33, 0.054), (34, 0.014), (35, -0.043), (36, -0.008), (37, -0.005), (38, 0.102), (39, -0.043), (40, 0.029), (41, 0.053), (42, -0.092), (43, -0.006), (44, 0.016), (45, 0.18), (46, 0.023), (47, -0.061), (48, 0.03), (49, 0.057)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94182122 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters

Author: Massimiliano Pavan, Marcello Pelillo

Abstract: Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edgeweighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach. 1

2 0.64469475 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning

Author: Richard S. Zemel, Miguel Á. Carreira-Perpiñán

Abstract: Many machine learning algorithms for clustering or dimensionality reduction take as input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then partitioned (clustering) or used to redefine metric information (dimensionality reduction). There has been much recent work on new methods for graph-based clustering and dimensionality reduction, but not much on constructing the graph itself. Graphs typically used include the fullyconnected graph, a local fixed-grid graph (for image segmentation) or a nearest-neighbor graph. We suggest that the graph should adapt locally to the structure of the data. This can be achieved by a graph ensemble that combines multiple minimum spanning trees, each fit to a perturbed version of the data set. We show that such a graph ensemble usually produces a better representation of the data manifold than standard methods; and that it provides robustness to a subsequent clustering or dimensionality reduction algorithm based on the graph. 1

3 0.6039862 161 nips-2004-Self-Tuning Spectral Clustering

Author: Lihi Zelnik-manor, Pietro Perona

Abstract: We study a number of open issues in spectral clustering: (i) Selecting the appropriate scale of analysis, (ii) Handling multi-scale data, (iii) Clustering with irregular background clutter, and, (iv) Finding automatically the number of groups. We first propose that a ‘local’ scale should be used to compute the affinity between each pair of points. This local scaling leads to better clustering especially when the data includes multiple scales and when the clusters are placed within a cluttered background. We further suggest exploiting the structure of the eigenvectors to infer automatically the number of groups. This leads to a new algorithm in which the final randomly initialized k-means stage is eliminated. 1

4 0.59940666 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering

Author: Zhengdong Lu, Todd K. Leen

Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.

5 0.58463222 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)

Author: Kumar Chellapilla, Patrice Y. Simard

Abstract: Machine learning is often used to automatically solve human tasks. In this paper, we look for tasks where machine learning algorithms are not as good as humans with the hope of gaining insight into their current limitations. We studied various Human Interactive Proofs (HIPs) on the market, because they are systems designed to tell computers and humans apart by posing challenges presumably too hard for computers. We found that most HIPs are pure recognition tasks which can easily be broken using machine learning. The harder HIPs use a combination of segmentation and recognition tasks. From this observation, we found that building segmentation tasks is the most effective way to confuse machine learning algorithms. This has enabled us to build effective HIPs (which we deployed in MSN Passport), as well as design challenging segmentation tasks for machine learning algorithms. 1

6 0.57632518 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition

7 0.56482512 115 nips-2004-Maximum Margin Clustering

8 0.55511105 103 nips-2004-Limits of Spectral Clustering

9 0.52947545 165 nips-2004-Semi-supervised Learning on Directed Graphs

10 0.49564475 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods

11 0.4460023 158 nips-2004-Sampling Methods for Unsupervised Learning

12 0.44453952 141 nips-2004-Optimal sub-graphical models

13 0.43905035 196 nips-2004-Triangle Fixing Algorithms for the Metric Nearness Problem

14 0.43202731 177 nips-2004-Supervised Graph Inference

15 0.41568649 77 nips-2004-Hierarchical Clustering of a Mixture Model

16 0.41474783 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

17 0.41099373 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

18 0.40967804 130 nips-2004-Newscast EM

19 0.40966195 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images

20 0.40776768 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.109), (15, 0.174), (26, 0.043), (31, 0.037), (32, 0.014), (33, 0.185), (35, 0.023), (39, 0.019), (50, 0.041), (56, 0.012), (94, 0.016), (95, 0.219)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95925546 171 nips-2004-Solitaire: Man Versus Machine

Author: Xiang Yan, Persi Diaconis, Paat Rusmevichientong, Benjamin V. Roy

Abstract: In this paper, we use the rollout method for policy improvement to analyze a version of Klondike solitaire. This version, sometimes called thoughtful solitaire, has all cards revealed to the player, but then follows the usual Klondike rules. A strategy that we establish, using iterated rollouts, wins about twice as many games on average as an expert human player does. 1

2 0.8980276 58 nips-2004-Edge of Chaos Computation in Mixed-Mode VLSI - A Hard Liquid

Author: Felix Schürmann, Karlheinz Meier, Johannes Schemmel

Abstract: Computation without stable states is a computing paradigm different from Turing’s and has been demonstrated for various types of simulated neural networks. This publication transfers this to a hardware implemented neural network. Results of a software implementation are reproduced showing that the performance peaks when the network exhibits dynamics at the edge of chaos. The liquid computing approach seems well suited for operating analog computing devices such as the used VLSI neural network. 1

same-paper 3 0.86675197 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters

Author: Massimiliano Pavan, Marcello Pelillo

Abstract: Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edgeweighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach. 1

4 0.78513545 178 nips-2004-Support Vector Classification with Input Data Uncertainty

Author: Jinbo Bi, Tong Zhang

Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1

5 0.78374451 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer

Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake-bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a-priori assumptions on the shape or maximal depth of the target PST nor does it require any parameters. To our knowledge, this is the first provably-correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1

6 0.78190517 131 nips-2004-Non-Local Manifold Tangent Learning

7 0.77605754 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

8 0.77518636 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection

9 0.77460957 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

10 0.77226645 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill

11 0.77204251 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

12 0.77136916 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

13 0.77136481 127 nips-2004-Neighbourhood Components Analysis

14 0.77093858 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification

15 0.77012515 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods

16 0.76979756 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images

17 0.76958889 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

18 0.76933056 102 nips-2004-Learning first-order Markov models for control

19 0.76922601 168 nips-2004-Semigroup Kernels on Finite Sets

20 0.76817977 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data