nips nips2006 nips2006-117 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Rie K. Ando, Tong Zhang
Abstract: We consider a general form of transductive learning on graphs with Laplacian regularization, and derive margin-based generalization bounds using appropriate geometric properties of the graph. We use this analysis to obtain a better understanding of the role of normalization of the graph Laplacian matrix as well as the effect of dimension reduction. The results suggest a limitation of the standard degree-based normalization. We propose a remedy from our analysis and demonstrate empirically that the remedy leads to improved classification performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We consider a general form of transductive learning on graphs with Laplacian regularization, and derive margin-based generalization bounds using appropriate geometric properties of the graph. [sent-14, score-0.183]
2 We use this analysis to obtain a better understanding of the role of normalization of the graph Laplacian matrix as well as the effect of dimension reduction. [sent-15, score-0.363]
3 It was proposed in [3] that one may first project these data points into the eigenspace corresponding to the largest eigenvalues of a normalized adjacency matrix of the graph and then use the standard k-means method for clustering. [sent-19, score-0.194]
4 Although such normalization may significantly affect the performance, the issue has not been studied from the learning theory perspective. [sent-27, score-0.152]
5 The relationship of kernel design and graph learning was investigated in [6], which argued that quadratic regularization-based graph learning can be regarded as kernel design. [sent-28, score-0.286]
6 The goal of this paper is to provide some learning theoretical insight into the role of normalization of the graph Laplacian matrix (D − W). [sent-30, score-0.285]
7 We first present a model for transductive learning on graphs and develop a margin analysis for multi-class graph learning. [sent-31, score-0.222]
8 Based on this, we analyze the performance of Laplacian regularization-based graph learning in relation to graph properties. [sent-32, score-0.201]
9 We use this analysis to obtain a better understanding of the role of normalization of the graph Laplacian matrix as well as dimension reduction in graph learning. [sent-33, score-0.569]
10 The results indicate a limitation of the commonly practiced degree-based normalization mentioned above. [sent-34, score-0.245]
11 We propose a learning theoretical remedy based on our analysis and use experiments to demonstrate that the remedy leads to improved classification performance. [sent-35, score-0.17]
12 Assume that each node vj is associated with an output value yj ∈ Y, which we are interested in predicting. [sent-41, score-0.355]
13 We encode the label yj into a vector in RK, so that the problem becomes that of generating an estimation vector fj,· = [fj,1, ..., fj,K] ∈ RK, which can then be used to recover the label yj. [sent-48, score-0.329] [sent-51, score-0.329]
15 Assuming Y = {1, ..., K}, we encode each yj = k ∈ Y as ek ∈ RK, where ek is a vector of zero entries except for the k-th entry being one. [sent-55, score-0.346]
16 Given an estimation vector fj,· = [fj,1, ..., fj,K] ∈ RK (which is intended to approximate eyj), we decode the corresponding label estimation ŷj as: ŷj = arg max_k {fj,k : k = 1, ..., K}. [sent-59, score-0.633]
17 If the true label is yj, then the classification error is err(fj,·, yj) = I(ŷj ≠ yj), where we use I(·) to denote the set indicator function. [sent-63, score-0.937]
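As a concrete illustration of this encoding/decoding scheme, here is a minimal numpy sketch (the function names and the assumption of 0-based class labels are ours, not the paper's):

    import numpy as np

    def encode_labels(y, K):
        # e_{y_j}: one-hot target vectors, one row per node
        E = np.zeros((len(y), K))
        E[np.arange(len(y)), y] = 1.0
        return E

    def decode_and_err(F, y):
        # decode y_hat_j = arg max_k f_{j,k}; err is the 0/1 indicator I(y_hat_j != y_j)
        y_hat = np.argmax(F, axis=1)
        return y_hat, (y_hat != np.asarray(y)).astype(float)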
18 In order to estimate f = [fj,k] ∈ RmK from only a subset of labeled nodes, we consider, for a given kernel matrix K ∈ Rm×m, the quadratic regularization f^T QK f = Σ_{k=1}^{K} f·,k^T K^{−1} f·,k, where f·,k = [f1,k, ..., fm,k]. [sent-64, score-0.176]
19 We will consider the kernel matrix induced by the graph Laplacian, to be introduced later in the paper. [sent-69, score-0.174]
20 The quality of an estimation vector fj,· = [fj,1, ..., fj,K] ∈ RK is measured by a loss function φ(fj,·, yj). [sent-74, score-0.329]
21 Our learning method attempts to minimize the empirical risk on the set Zn of n labeled training nodes, subject to f^T QK f being small: f̂(Zn) = arg min_{f ∈ RmK} [ (1/n) Σ_{j∈Zn} φ(fj,·, yj) + λ f^T QK f ]. (1) [sent-75, score-0.358]
22 In this paper, we focus on a special class of loss functions of the form φ(fj,·, yj) = Σ_{k=1}^{K} φ0(fj,k, δk,yj), where δa,b is the delta function defined as: δa,b = 1 when a = b and δa,b = 0 otherwise. [sent-77, score-0.329]
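For the squared-loss choice φ0(f, δ) = (f − δ)², objective (1) has a closed-form minimizer; the sketch below computes it via a standard representer argument for a given kernel matrix K, an m × K one-hot target matrix Y, and the list of labeled node indices (the names and this particular closed form are our own working assumptions, not the paper's code):

    import numpy as np

    def fit_transductive(K, Y, labeled, lam):
        # Minimize (1/n) sum_{j in Z_n} ||f_{j,.} - e_{y_j}||^2 + lam * sum_k f_{.,k}^T K^{-1} f_{.,k}
        # Closed form: f = K[:, L] (K[L, L] + lam * n * I)^{-1} Y[L, :]
        n = len(labeled)
        K_LL = K[np.ix_(labeled, labeled)]
        alpha = np.linalg.solve(K_LL + lam * n * np.eye(n), Y[labeled])
        return K[:, labeled] @ alpha   # m x K matrix of estimation vectors f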
23 Theorem 1 Let φ(fj,·, yj) = Σ_{k=1}^{K} φ0(fj,k, δk,yj) in (1). [sent-79, score-0.304]
24 Then ∀p > 0, the expected generalization error of the learning method (1) over the random training samples Zn can be bounded by: EZn [ (1/(m−n)) Σ_{j∈Z̄n} err(f̂j,·(Zn), yj) ] ≤ (1/a) inf_{f∈RmK} [ (1/m) Σ_{j=1}^{m} φ(fj,·, yj) + λ f^T QK f + b·trp(K)/(λ n cp) ], where Z̄n = {1, ..., m} − Zn, trp(K) = ( (1/m) Σ_{j=1}^{m} Kj,j^p )^{1/p}, and Kj,j is the (j, j)-entry of K. [sent-81, score-0.386] [sent-84, score-0.406]
26 We have err(f̂j,·(Zn), yj) ≤ (1/a) φ(f̂j,·(Zn+1), yj) + b Kj,j^p. [sent-109, score-0.608]
27 3 Laplacian regularization. Consider an undirected graph G = (V, E) defined on the nodes V = {vj : j = 1, ..., m}. [sent-119, score-0.222]
28 Let degj(G) = Σ_{j′=1}^{m} wj,j′ be the degree of node j of graph G. [sent-130, score-0.241]
29 Definition 1 Consider a graph G = (V, E) of m nodes with weights wj,j′ (j, j′ = 1, ..., m). [sent-132, score-0.167]
30 The unnormalized Laplacian matrix L(G) ∈ Rm×m is defined as: Lj,j′(G) = −wj,j′ if j ≠ j′; degj(G) otherwise. [sent-136, score-0.313]
31 The corresponding regularization is based on: f·,k^T LS(G) f·,k = (1/2) Σ_{j,j′=1}^{m} wj,j′ ( fj,k/√Sj − fj′,k/√Sj′ )². [sent-142, score-0.321]
32 A common choice of S is S = I, corresponding to regularizing with the unnormalized Laplacian L. [sent-143, score-0.17]
33 The idea is natural: we assume that the predictive values fj,k and fj ′ ,k should be close when (j, j ′ ) ∈ E with a strong link. [sent-144, score-0.266]
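A small sketch of these constructions, assuming a dense numpy weight matrix W and a vector S of positive scaling factors; the form LS = S^{-1/2} L S^{-1/2} reproduces the quadratic form in the display above:

    import numpy as np

    def laplacian(W):
        # Unnormalized Laplacian L(G) = D - W with D_jj = deg_j(G)
        return np.diag(W.sum(axis=1)) - W

    def scaled_laplacian(W, S):
        # L_S(G) = S^{-1/2} L(G) S^{-1/2}, so that f^T L_S f
        # = (1/2) sum_{j,j'} w_{j,j'} (f_j / sqrt(S_j) - f_{j'} / sqrt(S_{j'}))^2
        s = 1.0 / np.sqrt(np.asarray(S, dtype=float))
        return s[:, None] * laplacian(W) * s[None, :]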
34 Given labels y = {yj} (j = 1, ..., m) on V, we define the cut for LS in Definition 1 as: cut(LS, y) = Σ_{j,j′: yj ≠ yj′} (wj,j′/2) (1/Sj + 1/Sj′) + Σ_{j,j′: yj = yj′} (wj,j′/2) ( 1/√Sj − 1/√Sj′ )². [sent-151, score-0.175]
35 Unlike typical graph-theoretical definitions of graph-cut, this learning theoretical definition of graph-cut penalizes not only between-class edge weights but also within-class edge weights when such an edge connects two nodes with different scaling factors. [sent-152, score-0.242]
36 This penalization is intuitive if we look at the regularizer in Definition 1, which encourages fj,k/√Sj to be similar to fj′,k/√Sj′ when wj,j′ is large. [sent-153, score-0.318]
37 If j and j′ belong to the same class, we want fj,k to be similar to fj′,k. [sent-154, score-0.266]
38 It can be easily verified that (1/m) Σ_{j=1}^{m} φ(fj,·, yj) + λ f^T QK f = λ(αs + cut(LS, y)). [sent-169, score-0.304]
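A direct (unoptimized) sketch of this cut quantity, under the reading of the definition given above:

    import numpy as np

    def cut_LS(W, S, y):
        # Between-class edges contribute (w/2)(1/S_j + 1/S_j');
        # within-class edges contribute (w/2)(1/sqrt(S_j) - 1/sqrt(S_j'))^2.
        total = 0.0
        m = len(y)
        for j in range(m):
            for jp in range(m):
                w = W[j, jp]
                if w == 0.0:
                    continue
                if y[j] != y[jp]:
                    total += 0.5 * w * (1.0 / S[j] + 1.0 / S[jp])
                else:
                    total += 0.5 * w * (1.0 / np.sqrt(S[j]) - 1.0 / np.sqrt(S[jp])) ** 2
        return total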
39 Definition 3 A subgraph G0 = (V0 , E0 ) of G = (V, E) is a pure component if G0 is connected, E0 is induced by restricting E on V0 , and if labels y have identical values on V0 . [sent-181, score-0.249]
40 A pure subgraph G′ = ∪_{ℓ=1}^{q} Gℓ of G divides V into q disjoint sets V = ∪_{ℓ=1}^{q} Vℓ such that each subgraph Gℓ = (Vℓ, Eℓ) is a pure component. [sent-182, score-0.382]
41 If we remove all edges of G that connect nodes with different labels, then the resulting subgraph is a pure subgraph (but not the only one). [sent-184, score-0.366]
42 For each pure component Gℓ , its first eigenvalue λ1 (Gℓ ) is always zero. [sent-185, score-0.162]
43 Theorem 3 Let the assumptions of Theorem 2 hold, and let G′ = ∪_{ℓ=1}^{q} Gℓ (Gℓ = (Vℓ, Eℓ)) be a pure subgraph of G. [sent-187, score-0.191]
44 To put this into perspective, suppose that we use the unnormalized Laplacian regularizer on a zero-cut graph. [sent-192, score-0.192]
45 Then S = I and cut(LS, y) = 0, and by letting p = 1 and p → ∞ in Theorem 3, we have: EZn [ (1/(m−n)) Σ_{j∈Z̄n} err(f̂j,·, yj) ] ≤ 2 √( b q / (a c n) ) for p = 1, with a corresponding bound involving b and m for p → ∞. [sent-193, score-0.662]
46 3.1 Near zero-cut optimum scaling factors. The above observation motivates a scaling matrix S that compensates for the unbalanced pure component sizes. [sent-198, score-0.355]
47 Here we focus on the case that scaling factors are constant within each pure component (Sj = sℓ when j ∈ Vℓ ) in order to derive optimum scaling factors. [sent-200, score-0.312]
48 Hence, if cut(G′, y) is small, then we should choose sℓ ∝ mℓ for each pure component ℓ so that the generalization performance is approximately (ac)^{−1} b · q/n. [sent-203, score-0.212]
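A tiny sketch of this heuristic: within each pure component the scaling factor is constant and proportional to the component size (the per-node component labeling is a hypothetical input here):

    import numpy as np

    def component_size_scaling(components):
        # S_j = s_l with s_l proportional to m_l = |V_l|, the size of the pure component containing node j
        components = np.asarray(components)
        sizes = np.bincount(components)
        return sizes[components].astype(float)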
49 The commonly practiced degree-based normalization method Sj = degj(G) provides such good normalization factors under a simplified “box model” used in early studies. [sent-205, score-0.528]
50 In this model, each node connects to itself and all other nodes of the same pure component with edge weight wj,j ′ = 1. [sent-208, score-0.296]
51 The degree is thus degj (Gℓ ) = |Vℓ | = mℓ , which gives the optimal scaling in our analysis. [sent-209, score-0.167]
52 For such a model, the degree-based normalization can fail because the degj (Gℓ ) within each pure component Gℓ is not approximately constant (thus raising cut(LS , y)), and it may not be proportional to mℓ . [sent-212, score-0.41]
53 We call this method of normalization (Sj = 1/Kj,j, with K = (αS^{−1} + LS)^{−1}) K-scaling in this paper, as it scales the kernel matrix K so that each Kj,j = 1. [sent-218, score-0.219]
54 By contrast, we call the standard degree-based normalization (Sj = degj(G), K = (αI + LS)^{−1}) L-scaling, as it scales the diagonal entries of LS to 1. [sent-219, score-0.309]
55 In fact, no one has proposed this normalization method in the graph learning setting before this work. [sent-221, score-0.241]
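The two normalizations can be sketched as follows; the K-scaling function implements one plausible reading of the text (symmetrically rescale the kernel so every diagonal entry equals 1), which may differ in detail from the paper's exact construction:

    import numpy as np

    def l_scaling_kernel(W, alpha):
        # L-scaling: S_j = deg_j(G), kernel K = (alpha*I + L_S)^{-1}
        deg = W.sum(axis=1)
        L = np.diag(deg) - W
        s = 1.0 / np.sqrt(deg)
        L_S = s[:, None] * L * s[None, :]
        return np.linalg.inv(alpha * np.eye(W.shape[0]) + L_S)

    def k_scaling_kernel(W, alpha):
        # K-scaling (our reading): start from the unnormalized-Laplacian kernel
        # and rescale so that each diagonal entry K_{j,j} equals 1
        L = np.diag(W.sum(axis=1)) - W
        K0 = np.linalg.inv(alpha * np.eye(W.shape[0]) + L)
        d = 1.0 / np.sqrt(np.diag(K0))
        return d[:, None] * K0 * d[None, :]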
56 4 Dimension Reduction. Normalization and dimension reduction have been commonly used in spectral clustering, as in [3, 4]. [sent-223, score-0.266]
57 For semi-supervised learning, dimension reduction (without normalization) is known to improve performance [1, 6] while normalization (without dimension reduction) has also been explored [7]. [sent-224, score-0.488]
58 An appropriate combination of normalization and dimension reduction can further improve performance. [sent-225, score-0.367]
59 We shall first introduce dimension reduction with normalized Laplacian LS (G). [sent-226, score-0.262]
60 The benefit of dimension reduction in graph learning has been investigated in [6], under the spectral kernel design framework. [sent-232, score-0.397]
61 Note that the normalization issue, which will change the eigenvectors and their ordering, wasn’t investigated there. [sent-233, score-0.174]
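A sketch of the dimension-reduction step as we read it: keep the r eigenvectors of LS with the smallest eigenvalues and build the kernel within their span (the exact form of the paper's regularizations (4)/(5) is not shown in this extraction, so the way α enters below is an assumption):

    import numpy as np

    def reduced_kernel(L_S, r, alpha):
        # Bottom-r eigenvectors of the (scaled) Laplacian; regularize only within their span
        eigvals, eigvecs = np.linalg.eigh(L_S)   # eigenvalues in ascending order
        V, lam_r = eigvecs[:, :r], eigvals[:r]
        return V @ np.diag(1.0 / (alpha + lam_r)) @ V.T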
62 Theorem 4 Let G′ = ∪_{ℓ=1}^{q} Gℓ (Gℓ = (Vℓ, Eℓ)) be a pure subgraph of G. [sent-236, score-0.191]
63 For simplicity, we only consider the least squares loss φ(fj,·, yj) = Σ_{k=1}^{K} (fj,k − δk,yj)² in (1), using regularization (4) and K0 = I. [sent-244, score-0.412]
64 With p = 1, we have (1/m) Σ_{j=1}^{m} φ(fj,·, yj) ≤ δr(S)² + λm. [sent-245, score-0.304]
65 Then we have EZn [ (1/(m−n)) Σ_{j∈Z̄n} err(f̂j,·, yj) ] ≤ 16(δr(S)² + λm) + 16r/(λnm). [sent-249, score-0.304]
66 By optimizing over λ, we obtain EZn [ (1/(m−n)) Σ_{j∈Z̄n} err(f̂j,·, yj) ] ≤ 16 δr(S)² + 32 √(r/n). [sent-250, score-0.304]
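A short check of the optimization over λ, taking the middle bound exactly as reconstructed above (so the constants are only as reliable as that reconstruction):

    \min_{\lambda>0}\left[16\lambda m+\frac{16r}{\lambda n m}\right]
    \quad\text{is attained at}\quad \lambda^{*}=\frac{\sqrt{r/n}}{m},
    \quad\text{giving}\quad 16\lambda^{*}m+\frac{16r}{\lambda^{*}n m}=16\sqrt{r/n}+16\sqrt{r/n}=32\sqrt{r/n}.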
67 Compared to Theorem 3, the advantage of dimension reduction in (5) is that the quantity cut(LS, y) is replaced by ‖LS(G) − LS(G′)‖2, which is typically much smaller. [sent-253, score-0.215]
68 The regularization parameter λ is chosen by cross validation on the n training labeled examples. [sent-263, score-0.235]
69 We will show performance either when the rest of the parameters (α and dimensionality r) are also chosen by cross validation or when they are set to the optimum (oracle performance). [sent-264, score-0.234]
70 Controlled data experiments The purpose of the controlled data experiments is to observe the correlation of the effectiveness of the normalization methods with graph properties. [sent-269, score-0.241]
71 We show the results when dimension reduction is applied. (Figure 1: classification accuracy (%) of Unnormalized, L-scaling, and K-scaling on graph1–graph3 and graph6–graph10; panel (a): nearly-constant degrees.) [sent-271, score-0.215]
72 With dimension reduction (dim ≤ 20; chosen by cross validation). [sent-276, score-0.289]
73 Figure 1 (a) shows classification accuracy on three graphs that were generated so that the node degrees (of either correct edges or erroneous edges) are close to constant within each class but vary across classes. [sent-279, score-0.203]
74 On these graphs, both K-scaling and L-scaling significantly improve classification accuracy over the unnormalized baseline. [sent-280, score-0.196]
75 Each class consists of core nodes and satellite nodes. [sent-284, score-0.17]
76 Satellite nodes are relatively weakly connected to core nodes of the same class. [sent-286, score-0.231]
77 They are also connected to some other classes’ satellite nodes (i.e., by erroneous edges). [sent-287, score-0.168]
78 For graphs generated in this manner, degrees vary within the same class since the satellite nodes have smaller degrees than the core nodes. [sent-291, score-0.285]
79 In particular, K-scaling does well even when L-scaling rather underperforms the unnormalized baseline. [sent-294, score-0.204]
80 As our baseline, we test the supervised configuration by letting W + βI be the kernel matrix and using the same least squares loss function, where we use the oracle β which is optimal. [sent-306, score-0.167]
(Figure 2: accuracy (%) as a function of the number of labeled examples, comparing the unnormalized Laplacian, L-scaling, and K-scaling, with and without dimension reduction; (a-1) MNIST, dim and α determined by cross validation; (b-1) RCV1, dim and α determined by cross validation.)
86 K-scaling outperforms L-scaling, and L-scaling outperforms the unnormalized Laplacian. [sent-320, score-0.17]
87 The performance of the unnormalized Laplacian (with dimension reduction) is roughly consistent with the performance with similar (m, n) with heuristic dimension selection in [1]. [sent-323, score-0.412]
88 Without dimension reduction, L-scaling and K-scaling still improve performance over the unnormalized Laplacian. [sent-324, score-0.291]
89 In Figure 2 (a-1), the unnormalized Laplacian with dimension reduction underperforms the unnormalized Laplacian without dimension reduction, indicating that dimension reduction rather degrades performance. [sent-326, score-0.902]
90 By comparing Figure 2 (a-1) and (a-2), we observe that this seemingly counterintuitive performance trend is caused by the difficulty of choosing the right dimensionality by cross validation. [sent-327, score-0.157]
91 As observed, if the optimal dimensionality is known (as in (a-2)), dimension reduction improves performance either with or without normalization by K-scaling and L-scaling, and all transductive configurations outperform the supervised baseline. [sent-329, score-0.483]
92 We also note that the comparison of Figure 2 (a-1) and (a-2) shows that choosing good dimensionality by cross validation is much harder than choosing α by cross validation, especially when the number of labeled examples is small. [sent-330, score-0.293]
93 In the setting of Figure 2 (b-1) where the dimensionality and α were determined by cross validation, K-scaling with dimension reduction generally performs the best. [sent-334, score-0.328]
94 By setting the dimensionality and α to the optimum, the benefit of K-scaling with dimension reduction is even clearer (Figure 2 (b-2)). [sent-335, score-0.254]
95 Its performance differences from the second and third best ‘L-scaling (w/ dim redu. [sent-336, score-0.185]
96 In our experiments, K-scaling with dimension reduction consistently outperformed others. [sent-340, score-0.215]
97 On real data, cut is not near-zero, and the effect of normalization is unclear (Section 3. [sent-343, score-0.327]
98 In particular, we explained the importance of Laplacian normalization and dimension reduction for graph learning. [sent-346, score-0.456]
99 We argued that the standard L-scaling normalization method has the undesirable property that the normalization factors can vary significantly within a pure component. [sent-347, score-0.451]
100 An alternate normalization method, which we call K-scaling, is proposed to remedy the problem. [sent-348, score-0.227]
wordName wordTfidf (topN-words)
[('zn', 0.471), ('ls', 0.399), ('yj', 0.304), ('fj', 0.266), ('cut', 0.175), ('unnormalized', 0.17), ('dim', 0.162), ('sj', 0.155), ('fin', 0.153), ('normalization', 0.152), ('err', 0.145), ('ezn', 0.136), ('laplacian', 0.135), ('degj', 0.119), ('reduction', 0.117), ('pure', 0.116), ('rmk', 0.102), ('trp', 0.102), ('dimension', 0.098), ('graph', 0.089), ('qk', 0.083), ('graphs', 0.079), ('nodes', 0.078), ('subgraph', 0.075), ('remedy', 0.075), ('cross', 0.074), ('mnist', 0.063), ('satellite', 0.063), ('theorem', 0.059), ('regularization', 0.055), ('fi', 0.055), ('ac', 0.054), ('transductive', 0.054), ('labeled', 0.054), ('validation', 0.052), ('amples', 0.051), ('practiced', 0.051), ('yin', 0.051), ('generalization', 0.05), ('scaling', 0.048), ('oracle', 0.047), ('nition', 0.047), ('optimum', 0.046), ('cp', 0.044), ('ez', 0.043), ('kernel', 0.043), ('rk', 0.041), ('eigenspace', 0.04), ('dimensionality', 0.039), ('diagonals', 0.038), ('ex', 0.036), ('bkin', 0.034), ('underperforms', 0.034), ('node', 0.033), ('inf', 0.032), ('factors', 0.031), ('penalization', 0.03), ('core', 0.029), ('pr', 0.028), ('squares', 0.028), ('spectral', 0.028), ('connected', 0.027), ('accuracy', 0.026), ('edge', 0.025), ('shall', 0.025), ('label', 0.025), ('erroneous', 0.025), ('loss', 0.025), ('matrix', 0.024), ('articles', 0.024), ('component', 0.023), ('commonly', 0.023), ('rm', 0.023), ('sup', 0.023), ('performance', 0.023), ('eigenvalue', 0.023), ('regularizer', 0.022), ('normalized', 0.022), ('edges', 0.022), ('investigated', 0.022), ('bound', 0.021), ('trend', 0.021), ('connects', 0.021), ('ek', 0.021), ('classes', 0.02), ('suggests', 0.02), ('rbf', 0.02), ('classi', 0.02), ('theoretical', 0.02), ('weakly', 0.019), ('motivates', 0.019), ('bold', 0.019), ('behaves', 0.019), ('limitation', 0.019), ('adjacency', 0.019), ('induced', 0.018), ('degrees', 0.018), ('vj', 0.018), ('labels', 0.017), ('reduced', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 117 nips-2006-Learning on Graph with Laplacian Regularization
Author: Rie K. Ando, Tong Zhang
Abstract: We consider a general form of transductive learning on graphs with Laplacian regularization, and derive margin-based generalization bounds using appropriate geometric properties of the graph. We use this analysis to obtain a better understanding of the role of normalization of the graph Laplacian matrix as well as the effect of dimension reduction. The results suggest a limitation of the standard degree-based normalization. We propose a remedy from our analysis and demonstrate empirically that the remedy leads to improved classification performance.
2 0.17103271 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
Author: Samuel S. Gross, Olga Russakovsky, Chuong B. Do, Serafim Batzoglou
Abstract: We consider the problem of training a conditional random field (CRF) to maximize per-label predictive accuracy on a training set, an approach motivated by the principle of empirical risk minimization. We give a gradient-based procedure for minimizing an arbitrarily accurate approximation of the empirical risk under a Hamming loss function. In experiments with both simulated and real data, our optimization procedure gives significantly better testing performance than several current approaches for CRF training, especially in situations of high label noise. 1
3 0.14642324 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
Author: Kenichi Kurihara, Max Welling, Nikos A. Vlassis
Abstract: Dirichlet Process (DP) mixture models are promising candidates for clustering applications where the number of clusters is unknown a priori. Due to computational considerations these models are unfortunately unsuitable for large scale data-mining applications. We propose a class of deterministic accelerated DP mixture models that can routinely handle millions of data-cases. The speedup is achieved by incorporating kd-trees into a variational Bayesian algorithm for DP mixtures in the stick-breaking representation, similar to that of Blei and Jordan (2005). Our algorithm differs in the use of kd-trees and in the way we handle truncation: we only assume that the variational distributions are fixed at their priors after a certain level. Experiments show that speedups relative to the standard variational algorithm can be significant. 1
4 0.12645674 159 nips-2006-Parameter Expanded Variational Bayesian Methods
Author: Tommi S. Jaakkola, Yuan Qi
Abstract: Bayesian inference has become increasingly important in statistical machine learning. Exact Bayesian calculations are often not feasible in practice, however. A number of approximate Bayesian methods have been proposed to make such calculations practical, among them the variational Bayesian (VB) approach. The VB approach, while useful, can nevertheless suffer from slow convergence to the approximate solution. To address this problem, we propose Parameter-eXpanded Variational Bayesian (PX-VB) methods to speed up VB. The new algorithm is inspired by parameter-expanded expectation maximization (PX-EM) and parameterexpanded data augmentation (PX-DA). Similar to PX-EM and -DA, PX-VB expands a model with auxiliary variables to reduce the coupling between variables in the original model. We analyze the convergence rates of VB and PX-VB and demonstrate the superior convergence rates of PX-VB in variational probit regression and automatic relevance determination. 1
5 0.1239104 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
Author: Hariharan Narayanan, Mikhail Belkin, Partha Niyogi
Abstract: One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary. keywords: Clustering, Semi-Supervised Learning 1
6 0.1190565 116 nips-2006-Learning from Multiple Sources
7 0.11097538 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
8 0.10137653 80 nips-2006-Fundamental Limitations of Spectral Clustering
9 0.095564157 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
10 0.084932573 65 nips-2006-Denoising and Dimension Reduction in Feature Space
11 0.084193729 163 nips-2006-Prediction on a Graph with a Perceptron
12 0.083292291 39 nips-2006-Balanced Graph Matching
13 0.077867106 7 nips-2006-A Local Learning Approach for Clustering
14 0.074998423 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
15 0.071328931 150 nips-2006-On Transductive Regression
16 0.071047366 128 nips-2006-Manifold Denoising
17 0.070937589 181 nips-2006-Stability of $K$-Means Clustering
18 0.069147885 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding
19 0.06870465 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
20 0.066834241 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
topicId topicWeight
[(0, -0.202), (1, 0.109), (2, -0.05), (3, 0.123), (4, 0.035), (5, 0.064), (6, 0.033), (7, 0.085), (8, -0.023), (9, -0.056), (10, 0.049), (11, -0.09), (12, -0.128), (13, -0.015), (14, -0.077), (15, 0.123), (16, 0.047), (17, -0.021), (18, 0.068), (19, -0.198), (20, 0.022), (21, 0.015), (22, -0.19), (23, -0.147), (24, -0.102), (25, -0.039), (26, 0.037), (27, -0.043), (28, 0.077), (29, -0.091), (30, -0.174), (31, 0.002), (32, -0.044), (33, -0.088), (34, 0.047), (35, -0.09), (36, -0.147), (37, 0.031), (38, -0.074), (39, 0.129), (40, 0.086), (41, -0.004), (42, 0.013), (43, 0.07), (44, -0.124), (45, 0.025), (46, -0.002), (47, 0.09), (48, -0.014), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.95122463 117 nips-2006-Learning on Graph with Laplacian Regularization
Author: Rie K. Ando, Tong Zhang
Abstract: We consider a general form of transductive learning on graphs with Laplacian regularization, and derive margin-based generalization bounds using appropriate geometric properties of the graph. We use this analysis to obtain a better understanding of the role of normalization of the graph Laplacian matrix as well as the effect of dimension reduction. The results suggest a limitation of the standard degree-based normalization. We propose a remedy from our analysis and demonstrate empirically that the remedy leads to improved classification performance.
2 0.60326391 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
Author: Samuel S. Gross, Olga Russakovsky, Chuong B. Do, Serafim Batzoglou
Abstract: We consider the problem of training a conditional random field (CRF) to maximize per-label predictive accuracy on a training set, an approach motivated by the principle of empirical risk minimization. We give a gradient-based procedure for minimizing an arbitrarily accurate approximation of the empirical risk under a Hamming loss function. In experiments with both simulated and real data, our optimization procedure gives significantly better testing performance than several current approaches for CRF training, especially in situations of high label noise. 1
3 0.53553057 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
Author: Kenichi Kurihara, Max Welling, Nikos A. Vlassis
Abstract: Dirichlet Process (DP) mixture models are promising candidates for clustering applications where the number of clusters is unknown a priori. Due to computational considerations these models are unfortunately unsuitable for large scale data-mining applications. We propose a class of deterministic accelerated DP mixture models that can routinely handle millions of data-cases. The speedup is achieved by incorporating kd-trees into a variational Bayesian algorithm for DP mixtures in the stick-breaking representation, similar to that of Blei and Jordan (2005). Our algorithm differs in the use of kd-trees and in the way we handle truncation: we only assume that the variational distributions are fixed at their priors after a certain level. Experiments show that speedups relative to the standard variational algorithm can be significant. 1
4 0.48645365 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
Author: Hariharan Narayanan, Mikhail Belkin, Partha Niyogi
Abstract: One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary. keywords: Clustering, Semi-Supervised Learning 1
5 0.41769764 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding
Author: Dengyong Zhou, Jiayuan Huang, Bernhard Schölkopf
Abstract: We usually endow the investigated objects with pairwise relationships, which can be illustrated as graphs. In many real-world problems, however, relationships among the objects of our interest are more complex than pairwise. Naively squeezing the complex relationships into pairwise ones will inevitably lead to loss of information which can be expected valuable for our learning tasks however. Therefore we consider using hypergraphs instead to completely represent complex relationships among the objects of our interest, and thus the problem of learning with hypergraphs arises. Our main contribution in this paper is to generalize the powerful methodology of spectral clustering which originally operates on undirected graphs to hypergraphs, and further develop algorithms for hypergraph embedding and transductive classification on the basis of the spectral hypergraph clustering approach. Our experiments on a number of benchmarks showed the advantages of hypergraphs over usual graphs. 1
6 0.41517365 159 nips-2006-Parameter Expanded Variational Bayesian Methods
7 0.39945969 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
8 0.37858325 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
9 0.37350509 181 nips-2006-Stability of $K$-Means Clustering
10 0.37092388 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering
11 0.36088648 116 nips-2006-Learning from Multiple Sources
12 0.35861897 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
13 0.32088396 54 nips-2006-Comparative Gene Prediction using Conditional Random Fields
14 0.31727555 39 nips-2006-Balanced Graph Matching
15 0.31455937 60 nips-2006-Convergence of Laplacian Eigenmaps
16 0.29471508 80 nips-2006-Fundamental Limitations of Spectral Clustering
17 0.29298043 163 nips-2006-Prediction on a Graph with a Perceptron
18 0.2927697 194 nips-2006-Towards a general independent subspace analysis
19 0.26953027 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
20 0.26905894 150 nips-2006-On Transductive Regression
topicId topicWeight
[(1, 0.105), (3, 0.025), (7, 0.112), (9, 0.069), (20, 0.013), (22, 0.055), (44, 0.105), (57, 0.073), (65, 0.077), (69, 0.022), (84, 0.24)]
simIndex simValue paperId paperTitle
1 0.89988786 18 nips-2006-A selective attention multi--chip system with dynamic synapses and spiking neurons
Author: Chiara Bartolozzi, Giacomo Indiveri
Abstract: Selective attention is the strategy used by biological sensory systems to solve the problem of limited parallel processing capacity: salient subregions of the input stimuli are serially processed, while non–salient regions are suppressed. We present an mixed mode analog/digital Very Large Scale Integration implementation of a building block for a multi–chip neuromorphic hardware model of selective attention. We describe the chip’s architecture and its behavior, when its is part of a multi–chip system with a spiking retina as input, and show how it can be used to implement in real-time flexible models of bottom-up attention. 1
2 0.8487159 2 nips-2006-A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
Author: Yee W. Teh, David Newman, Max Welling
Abstract: Latent Dirichlet allocation (LDA) is a Bayesian network that has recently gained much popularity in applications ranging from document modeling to computer vision. Due to the large scale nature of these applications, current inference procedures like variational Bayes and Gibbs sampling have been found lacking. In this paper we propose the collapsed variational Bayesian inference algorithm for LDA, and show that it is computationally efficient, easy to implement and significantly more accurate than standard variational Bayesian inference for LDA.
same-paper 3 0.81499761 117 nips-2006-Learning on Graph with Laplacian Regularization
Author: Rie K. Ando, Tong Zhang
Abstract: We consider a general form of transductive learning on graphs with Laplacian regularization, and derive margin-based generalization bounds using appropriate geometric properties of the graph. We use this analysis to obtain a better understanding of the role of normalization of the graph Laplacian matrix as well as the effect of dimension reduction. The results suggest a limitation of the standard degree-based normalization. We propose a remedy from our analysis and demonstrate empirically that the remedy leads to improved classification performance.
4 0.66996109 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
5 0.66969538 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
6 0.66414094 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
7 0.66024113 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
8 0.65858066 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
9 0.65852755 20 nips-2006-Active learning for misspecified generalized linear models
10 0.65743291 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
11 0.65721834 175 nips-2006-Simplifying Mixture Models through Function Approximation
12 0.65659881 109 nips-2006-Learnability and the doubling dimension
13 0.65452534 35 nips-2006-Approximate inference using planar graph decomposition
14 0.65428424 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
15 0.65225679 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics
16 0.65223092 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
17 0.65175951 21 nips-2006-AdaBoost is Consistent
18 0.65152061 163 nips-2006-Prediction on a Graph with a Perceptron
19 0.65055454 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
20 0.64875698 37 nips-2006-Attribute-efficient learning of decision lists and linear threshold functions under unconcentrated distributions