nips nips2013 nips2013-35 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in their target functions – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new light on several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We find that various well-known graph-based models exhibit a common important harmonic structure in their target functions – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. [sent-5, score-0.643]
2 Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. [sent-6, score-0.165]
3 In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. [sent-7, score-0.902]
4 We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. [sent-8, score-0.863]
5 1 Introduction Various graph-based models, regardless of application, aim to learn a target function on graphs that well respects the graph topology. [sent-11, score-0.255]
6 Whether these models can capture the graph structure faithfully, or whether their target functions possess desirable properties over the graph, remain unclear. [sent-13, score-0.22]
7 [16] showed that the target functions of Laplacian regularized methods become flat as the number of unlabeled points increases, but they also observed that a good classification can still be obtained if an appropriate threshold is used. [sent-17, score-0.19]
8 [22] proved that commute and hitting times are dominated by the local structures in large graphs, ignoring the global patterns. [sent-20, score-0.402]
9 Interestingly, despite this finding, the pseudo-inverse of the graph Laplacian, known as the kernel matrix of commute times, consistently delivers superior performance in collaborative filtering [10]. [sent-22, score-0.25]
10 In spectral clustering, the eigenvectors of the normalized graph Laplacian are preferred over those of the un-normalized one [20, 21]. [sent-23, score-0.27]
11 Also, for the recently proposed partially absorbing random walks [23], certain settings of absorption rates seem better than others. [sent-24, score-0.684]
12 1 Our starting point is the discovery of a common structure hidden in the target functions of various graph models. [sent-26, score-0.247]
13 We call this structure the harmonic structure for its resemblance to the harmonic function [9, 26]. [sent-28, score-0.956]
14 It naturally arises from the first step analysis of random walk models, and, as will be shown in this paper, implicitly exists in other methods such as pseudo-inverse of the graph Laplacian and eigenvectors of the Laplacian matrices. [sent-29, score-0.25]
15 The target functions of these models are characterized by their harmonic loss, a quantitative notion introduced in this paper to measure the discrepancy of a target function f on cuts of graphs. [sent-30, score-0.652]
16 The variations of f across cuts can then be upper and lower bounded by the ratio of its harmonic loss and the cut cost. [sent-31, score-0.694]
17 As long as the harmonic loss varies slowly, the graph conductance dominates the variations of f – it will remain smooth in a dense area but vary sharply otherwise. [sent-32, score-0.715]
18 We use this tool to study five popular models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. [sent-36, score-0.863]
19 The key message conveyed in our results is that various existing models enjoying the harmonic structure are actually capable of capturing the global graph topology, and understanding of this structure can guide us in applying them properly. [sent-38, score-0.65]
20 Denote by G = (V, W ) a graph with n vertices V and a symmetric non-negative affinity matrix W = [wij ] ∈ Rn×n (wii = 0). [sent-41, score-0.205]
21 Denote by $d_i = \sum_j w_{ij}$ the degree of vertex i, and by $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ the degree matrix. [sent-42, score-0.29]
22 The conductance of a subset S ⊂ V of vertices is defined as $\Phi(S) = \frac{w(S, \bar{S})}{\min(d(S), d(\bar{S}))}$, where $w(S, \bar{S}) = \sum_{i \in S, j \in \bar{S}} w_{ij}$ is the cut cost between S and its complement $\bar{S}$, and $d(S) = \sum_{i \in S} d_i$ is the volume of S. [sent-46, score-0.457]
23 The harmonic loss of f : V → R on any S ⊆ V is defined as: $\mathcal{L}_f(S) := \sum_{i \in S} \big( d_i f(i) - \sum_{j} w_{ij} f(j) \big) = \sum_{i \in S,\, j \in \bar{S}} w_{ij} \big( f(i) - f(j) \big)$. [sent-50, score-0.706]
24 However, as we shall see below, it is always non-negative on superlevel sets. [sent-53, score-0.22]
25 The following lemma shows that the harmonic loss couples the cut cost and the discrepancy of the function across the cut. [sent-54, score-0.676]
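To make these definitions concrete, here is a minimal NumPy sketch (an editor-added illustration, not code from the paper) on a toy two-cluster graph. It computes the cut cost w(S, S̄), the conductance Φ(S), and the harmonic loss L_f(S), and checks numerically that the loss equals the weighted discrepancy of f across the cut, which is the coupling referred to above.

```python
import numpy as np

# Toy 6-vertex graph: two triangles joined by one weak edge (a sparse cut).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
d = W.sum(axis=1)                               # vertex degrees d_i

def cut_cost(S):
    """w(S, S_bar): total weight of edges crossing the cut."""
    Sbar = np.setdiff1d(np.arange(len(d)), S)
    return W[np.ix_(S, Sbar)].sum()

def conductance(S):
    """Phi(S) = w(S, S_bar) / min(d(S), d(S_bar))."""
    Sbar = np.setdiff1d(np.arange(len(d)), S)
    return cut_cost(S) / min(d[S].sum(), d[Sbar].sum())

def harmonic_loss(f, S):
    """L_f(S) = sum_{i in S} ( d_i f(i) - sum_j w_ij f(j) )."""
    return float((d[S] * f[S] - W[S] @ f).sum())

f = np.array([1.0, 0.9, 0.85, 0.1, 0.05, 0.0])  # smooth within clusters, a gap across the cut
S = np.array([0, 1, 2])
cross = sum(W[i, j] * (f[i] - f[j]) for i in S for j in range(6) if j not in S)
print(harmonic_loss(f, S), cross, conductance(S))   # the first two numbers coincide
```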
26 In practice, to examine the variation of f on a graph, one does not need to examine it on every subset of vertices, since the number of subsets is exponential in the number of vertices. [sent-60, score-0.172]
27 Instead, it suffices to consider its variation on the superlevel sets defined as follows. [sent-61, score-0.306]
28 For any function f : V → R on a graph and a scalar c ∈ R, the set {i | f (i) ≥ c} is called a superlevel set of f with level c. [sent-64, score-0.347]
29 $S_i = \{1, \ldots, i\}$ is the superlevel set with level $f(i)$ if $f(i) > f(i + 1)$. [sent-73, score-0.22]
30 For convenience, we still call Si a superlevel set of f even if f (i) = f (i + 1). [sent-74, score-0.22]
31 In this paper, we will mainly examine the variation of f on its n superlevel sets S1 , . [sent-75, score-0.349]
32 Our first observation is that the harmonic loss on each superlevel set is non-negative, stated as follows. [sent-79, score-0.699]
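A small follow-up sketch (again an added illustration, reusing the toy graph above): sort the vertices so that f(1) ≥ · · · ≥ f(n), form the superlevel sets S_i, and print the harmonic loss on each (non-negative, as the observation states) together with the cut cost; their ratio is the quantity that bounds the drop of f across each cut.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
d = W.sum(axis=1)
f = np.array([0.85, 1.0, 0.9, 0.05, 0.1, 0.0])       # an unsorted function on the vertices

order = np.argsort(-f)                                # vertices sorted by decreasing f
for i in range(1, len(f)):
    S, Sbar = order[:i], order[i:]                    # superlevel set S_i and its complement
    loss = float((d[S] * f[S] - W[S] @ f).sum())      # harmonic loss L_f(S_i) >= 0
    cut = W[np.ix_(S, Sbar)].sum()                    # cut cost w(S_i, S_bar_i)
    drop = f[order[i - 1]] - f[order[i]]              # variation of f across this cut
    print(f"i={i}: loss={loss:.3f}  cut={cut:.2f}  loss/cut={loss / cut:.3f}  drop={drop:.3f}")
```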
33 2 Based on the notion of superlevel sets, it becomes legitimate to talk about the continuity of a function on graphs, which we formally define as follows. [sent-86, score-0.251]
34 First, for any function f on a graph, as long as its harmonic loss Lf (Si ) varies slowly on the superlevel sets, i. [sent-123, score-0.734]
35 , f is harmonic almost everywhere, the graph conductance Φ(Si ) will dominate the variation of f . [sent-125, score-0.715]
36 8, a big gap exists across the cut if Φ(Si ) is small (see Sec. [sent-128, score-0.173]
37 Second, the continuity (either left, right, or both) of f ensures that its variations conform with the graph connectivity, i. [sent-131, score-0.178]
38 For each model, we show its target function in harmonic forms, quantify its harmonic loss, analyze its dropping bounds, and provide corrections or justifications for its use. [sent-138, score-0.964]
39 Here we show that Laplacian regularization can actually capture the global graph structure and a simple normalization scheme would resolve the raised issue. [sent-143, score-0.234]
40 Denote by f : V → R the absorption probability vector from every point to the positive labeled point. [sent-145, score-0.384]
41 Assume the vertices are sorted such that 1 = f (1) > f (2) ≥ · · · ≥ f (n − 1) > f (n) = 0 (vertex 1 is labeled positive and vertex n is labeled negative). [sent-146, score-0.413]
42 The harmonic form of f (Eq. 4) reads $f(i) = \sum_{k \sim i} \frac{w_{ik}}{d_i} f(k)$, for $i = 2, \ldots, n-1$. Our first observation is that the harmonic loss of f is constant w.r.t. the superlevel sets. [sent-151, score-0.606]
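A minimal sketch (an added illustration, with NumPy) of how such an absorption probability vector can be obtained on a toy graph: fix f = 1 at the positive labeled vertex and f = 0 at the negative one, and solve the linear system given by the harmonic condition of Eq. (4) on the remaining vertices.

```python
import numpy as np

# Toy graph: two triangles joined by a weak edge; vertex 0 is labeled +, vertex 5 is labeled -.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
D = np.diag(W.sum(axis=1))
L = D - W

labeled, unlabeled = [0, 5], [1, 2, 3, 4]
f = np.zeros(6)
f[0], f[5] = 1.0, 0.0                      # boundary values at the two absorbing labels

# Harmonic condition d_i f(i) = sum_k w_ik f(k) on the unlabeled vertices:
# (D_UU - W_UU) f_U = W_UL f_L
f[unlabeled] = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                               W[np.ix_(unlabeled, labeled)] @ f[labeled])
print(np.round(f, 3))   # drops slowly inside a cluster, sharply across the weak edge
```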
43 Since the harmonic loss of f is a constant on the superlevel sets Si (Corollary 3. [sent-172, score-0.699]
44 8, the variation of f depends solely on the cut value $w(S_i, \bar{S}_i)$, which indicates that it will drop slowly when the cut is dense but drastically when the cut is sparse. [sent-175, score-0.604]
45 1, where the graph consists of 6 points in 2 classes denoted by different colors, with 3 points in each. [sent-180, score-0.203]
46 The absorption probabilities from all the vertices to vertex 1 are computed and shown. [sent-184, score-0.515]
47 We can see that since the cut $w(S_2, \bar{S}_2) = 2$ is quite dense, the drop between f (2) and f (3) is upper bounded by a small number (Theorem 2. [sent-185, score-0.185]
48 Now let f1 and f2 denote the absorption probability vectors to the two labeled points respectively. [sent-192, score-0.422]
49 The magenta triangle and the green circle denote labeled data. [sent-217, score-0.173]
50 The absorption probabilities to both labeled points are shown in Fig. [sent-225, score-0.453]
51 The green vector is well above the magenta vector, indicating that every unlabeled point has larger absorption probability to the green labeled point. [sent-227, score-0.591]
52 Comparing them classifies all the unlabeled points to the green Gaussian (Fig. [sent-228, score-0.163]
53 Since the green labeled point has larger degree than the magenta one, this result is expected from the analysis in [1]. [sent-230, score-0.173]
54 2 Partially Absorbing Random Walks Here we revisit the recently proposed partially absorbing random walks (PARW) [23], which generalizes absorbing random walks by allowing partial absorption at each state. [sent-249, score-1.042]
55 The absorption rate at state i is defined as $p_{ii} = \frac{\alpha\lambda_i}{\alpha\lambda_i + d_i}$, where $\alpha > 0$, $\lambda_i > 0$ are regularization parameters. [sent-250, score-0.446]
56 Given current state i, a PARW in the next step will get absorbed at i with probability $p_{ii}$, and with probability $(1 - p_{ii})\,\frac{w_{ij}}{d_i}$ move to state j. [sent-251, score-0.191]
57 Let $a_{ij}$ be the probability that a PARW starting from state i gets absorbed at state j within finite steps, and denote by $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ the absorption probability matrix. [sent-252, score-0.432]
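The first-step analysis stated above pins A down in closed form: collecting terms gives (αΛ + D)A = αΛ + WA, hence A = (αΛ + L)^{-1} αΛ. A short sketch (an added illustration on the toy graph used earlier) is given below; each row of A sums to one, since a walk is eventually absorbed somewhere.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
L = np.diag(W.sum(axis=1)) - W

alpha = 1e-3                               # a small alpha, as in the analysis below
Lam = np.eye(6)                            # Lambda = I; any positive diagonal Lambda works the same way

A = np.linalg.solve(alpha * Lam + L, alpha * Lam)   # A = (alpha*Lambda + L)^{-1} alpha*Lambda
print(np.round(A.sum(axis=1), 6))          # every row sums to 1
print(np.round(A[:, 0], 4))                # one column of A: absorption probabilities at vertex 0
```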
58 In what follows, we extend this result to show that the columns of A obtained by PARW with almost arbitrary Λ (not just Λ = I) actually exhibit strong harmonic structures and should be expected to work equally well. [sent-259, score-0.486]
59 Our second observation is that the harmonic structure exists in the probabilities of PARW from every vertex getting absorbed at a particular vertex, i. [sent-270, score-0.672]
60 Eq. (5): $p(1) = \frac{\alpha\lambda_1}{d_1 + \alpha\lambda_1} + \sum_{k \sim 1} \frac{w_{1k}}{d_1 + \alpha\lambda_1} p(k)$, $p(i) = \sum_{k \sim i} \frac{w_{ik}}{d_i + \alpha\lambda_i} p(k)$, $i = 2, \ldots, n$, which is equivalent to the following harmonic form: $p(1) = \frac{\alpha\lambda_1}{d_1}(1 - p(1)) + \sum_{k \sim 1} \frac{w_{1k}}{d_1} p(k)$, $p(i) = -\frac{\alpha\lambda_i}{d_i} p(i) + \sum_{k \sim i} \frac{w_{ik}}{d_i} p(k)$. The harmonic loss of p can be computed from Eq. [sent-283, score-1.17]
61 8, for $1 \le i \le k$, the drop of p(i) across the cut $\{S_i, \bar{S}_i\}$ is lower bounded by $\tfrac{1}{2}\alpha\lambda_1 / w(S_i, \bar{S}_i)$, if α is sufficiently small. [sent-304, score-0.185]
62 This indicates that p(i) will drop a lot when the cut $w(S_i, \bar{S}_i)$ is small. The comparison between the corresponding row and column of A is shown in Figs. [sent-306, score-0.185]
63 Figure 3: (a) Absorption probabilities that a PARW gets absorbed at other points when starting from i (see Fig. [sent-338, score-0.18]
64 3 Pseudo-inverse of the Graph Laplacian The pseudo-inverse L† of the graph Laplacian is a valid kernel corresponding to commute times [10, 12]. [sent-352, score-0.26]
65 While commute times may fail to capture the global topology in large graphs [22], L† , if used directly as a similarity measure, gives superior performance in practice [10]. [sent-353, score-0.25]
66 Here we provide a formal analysis and justification for L† by revealing the strong harmonic structure hidden in it. [sent-354, score-0.467]
67 The following lemma shows the harmonic form of ℓ. [sent-364, score-0.466]
68 ℓ has the following harmonic form: $\ell(1) = \frac{1 - 1/n}{d_1} + \sum_{k \sim 1} \frac{w_{1k}}{d_1} \ell(k)$, $\ell(i) = -\frac{1/n}{d_i} + \sum_{k \sim i} \frac{w_{ik}}{d_i} \ell(k)$, $i = 2, \ldots, n$. [sent-367, score-0.653]
69 Then the harmonic loss of ℓ on the set Si admits a very simple form, as shown below. [sent-376, score-0.479]
70 2, we can immediately conclude that the variation of ℓ(i) is dominated by the cut cost on the superlevel set Si . [sent-389, score-0.467]
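A quick numerical check of this harmonic structure (an added sketch, not from the paper): compute L† with NumPy's pseudo-inverse, take its column ℓ for vertex 1, and verify that d_i ℓ(i) − Σ_k w_ik ℓ(k) equals 1 − 1/n at the target vertex and −1/n everywhere else, which is exactly the harmonic form above.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
D = np.diag(W.sum(axis=1))
L = D - W

Ldag = np.linalg.pinv(L)                   # pseudo-inverse of the graph Laplacian
ell = Ldag[:, 0]                           # column of L^dagger for vertex 1 (index 0)
print(np.round(D @ ell - W @ ell, 4))      # [1 - 1/n, -1/n, ..., -1/n]
```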
71 4 Hitting Times The hitting time hij from vertex i to j is the expected number of steps it takes a random walk starting from i to reach j for the first time. [sent-393, score-0.39]
72 While it was proven in [22] that hitting times are dominated by the local structure of the target, we show below that the hitting times from other points to the same target admit a harmonic structure, and thus are still able to capture the global structure of graphs. [sent-394, score-1.1]
73 Our result is complementary to the analysis in [22], and provides a justification of using hitting times in information retrieval where the query is taken as the target to be hit by others [15]. [sent-395, score-0.351]
74 6 Let h : V → R be the hitting times from every vertex to a particular vertex. [sent-397, score-0.339]
75 , assume the vertices have been sorted such that h(1) ≥ h(2) ≥ · · · ≥ h(n − 1) > h(n) = 0, where vertex n is the target vertex. [sent-402, score-0.294]
76 Applying the first step analysis, we obtain the harmonic form of h (Eq. 8): $h(i) = 1 + \sum_{k \sim i} \frac{w_{ik}}{d_i} h(k)$, for $i = 1, \ldots, n-1$. [sent-403, score-0.526]
77 The harmonic loss on the set $S_i$ turns out to be the volume of the set, as stated below. [sent-407, score-0.606]
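A sketch (an added illustration) of computing h on the toy graph by solving the linear system implied by Eq. (8), with h fixed to 0 at the target vertex:

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
d = W.sum(axis=1)

target = 5                                  # the vertex to be hit
U = [i for i in range(6) if i != target]

# Eq. (8): h(i) = 1 + sum_k (w_ik/d_i) h(k) with h(target) = 0;
# multiplying by d_i gives (D_UU - W_UU) h_U = d_U
h = np.zeros(6)
h[U] = np.linalg.solve(np.diag(d[U]) - W[np.ix_(U, U)], d[U])
print(np.round(h, 2))                       # a clear gap separates the cluster far from the target
```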
78 Now let us examine the variation of h across any cut $\{S_i, \bar{S}_i\}$. [sent-417, score-0.266]
79 As i decreases from $d(S_i) > \tfrac{1}{2} d(V)$, the variation of $\alpha_i$ becomes slower and slower ($\alpha_i = 1$ when $d(S_i) \le \tfrac{1}{2} d(V)$), so the variation of h will depend on the variation of the conductance of $S_i$, i. [sent-421, score-0.323]
80 In contrast, there are no gaps exhibited in the hitting times from the target to other vertices (Fig. [sent-428, score-0.367]
81 5 Eigenvectors of the Laplacian Matrices The eigenvectors of the Laplacian matrices play a key role in graph partitioning [20]. [sent-431, score-0.243]
82 Here we address these issues by examining the harmonic structures in these eigenvectors. [sent-436, score-0.466]
83 Then we have (Eq. 10): $u(i) = \sum_{k \sim i} \frac{w_{ik}}{d_i - \lambda_u} u(k)$, $v(i) = \sum_{k \sim i} \frac{w_{ik}}{d_i (1 - \lambda_v)} v(k)$, for $i = 1, \ldots, n$. [sent-441, score-0.178]
84 We can see that the smaller $\lambda_u$ and $\lambda_v$, the stronger the harmonic structures of u and v. [sent-445, score-0.72]
85 As long as λu ≪ mini {di }, we are safe to say that u will have a significant harmonic structure, and thus will be informative for clustering. [sent-447, score-0.484]
86 However, if λu is close to mini {di }, no matter how small λu is, the harmonic structure of u will be weaker, and thus u is less useful. [sent-448, score-0.514]
87 (10), v will always enjoy a significant harmonic structure as long as λv is much smaller than 1. [sent-450, score-0.467]
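The following sketch (an added illustration on the toy graph used earlier) computes the second-smallest eigenpairs of L and of the normalized Laplacian, verifies the harmonic forms of Eq. (10) numerically, and prints λ_u next to min_i d_i, the comparison that determines how strong the harmonic structure of u is.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
d = W.sum(axis=1)
L = np.diag(d) - W

lam, U = np.linalg.eigh(L)                          # unnormalized Laplacian
lam_u, u = lam[1], U[:, 1]                          # second-smallest eigenpair

Dm = np.diag(1.0 / np.sqrt(d))                      # D^{-1/2}
lam2, Z = np.linalg.eigh(Dm @ L @ Dm)               # L_sym = D^{-1/2} L D^{-1/2}
lam_v, v = lam2[1], Dm @ Z[:, 1]                    # v is the matching eigenvector of L_rw = D^{-1} L

# Harmonic forms of Eq. (10):
print(np.allclose(u, (W @ u) / (d - lam_u)))        # u(i) = sum_k w_ik u(k) / (d_i - lam_u)
print(np.allclose(v, (W @ v) / (d * (1 - lam_v))))  # v(i) = sum_k w_ik v(k) / (d_i (1 - lam_v))
print(lam_u, d.min())                               # lam_u << min_i d_i: strong harmonic structure
```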
88 4 Experiments In the first experiment, we test absorbing random walks (ARW) for SSL, with the class mass normalization suggested in [26] (ARW-CMN), our proposed normalization (ARW-N-1NN, Sec. [sent-454, score-0.414]
89 1), and without any normalization (ARW-1NN) – where each unlabeled instance is assigned the class of the labeled instance at which it most likely gets absorbed. [sent-456, score-0.231]
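For concreteness, here is a minimal sketch (an added illustration, not the authors' code) of the unnormalized ARW-1NN rule described above: solve for the probability that each unlabeled vertex is absorbed at each labeled (absorbing) vertex, then take the class of the most probable absorbing label.

```python
import numpy as np

# Toy SSL setup on the earlier graph: vertices 0 and 5 are labeled with classes 0 and 1.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
L = np.diag(W.sum(axis=1)) - W

labeled, classes = np.array([0, 5]), np.array([0, 1])
unlabeled = np.array([1, 2, 3, 4])

# Absorption probabilities of an ARW from each unlabeled vertex to each labeled vertex:
# F = (D_UU - W_UU)^{-1} W_UL, one column per absorbing (labeled) vertex.
F = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)], W[np.ix_(unlabeled, labeled)])
pred = classes[np.argmax(F, axis=1)]       # ARW-1NN: class of the most likely absorbing label
print(dict(zip(unlabeled.tolist(), pred.tolist())))
```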
90 The results of LGC are not comparable to ARW-N-1NN and PARW (Λ = I), which is probably due to the lack of a harmonic structure. [sent-511, score-0.437]
91 We observe that the columns in Λ = R give significantly better results compared with rows, implying that the harmonic structure is vital to the performance. [sent-550, score-0.487]
92 This suggests that it is not the special setting of absorbing rates but the harmonic structure that determines the overall performance. [sent-552, score-0.677]
93 6426 In the third experiment, we test hitting times and pseudo-inverse of the graph Laplacian for SSL on USPS. [sent-572, score-0.353]
94 We compare two different uses of hitting times, the case of starting from the labeled data L to hit the unlabeled data U (HT(L → U)), and the case of the opposite direction (HT(U → L)). [sent-573, score-0.438]
95 Each unlabeled instance j is assigned the class of labeled instance j ∗ , where j ∗ = arg mini∈L {hij } in HT(L → U), j ∗ = arg mini∈L {hji } in (HT(U → L)), and j ∗ = arg maxi∈L {ℓji } in L† = (ℓij ). [sent-574, score-0.252]
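As a sketch of the three decision rules just described (an added illustration; the function name and signature are hypothetical, not from the paper), given an affinity matrix W, labeled and unlabeled index sets, and labels y for the labeled instances:

```python
import numpy as np

def ssl_predict(W, L_idx, U_idx, y, rule="HT(L->U)"):
    """Assign each unlabeled instance the class of a labeled instance under one of three rules."""
    n = W.shape[0]
    d = W.sum(axis=1)
    Lap = np.diag(d) - W
    L_idx, U_idx, y = np.asarray(L_idx), np.asarray(U_idx), np.asarray(y)

    def hitting_times_to(t):
        # Hitting times h(i -> t) for all i, via the harmonic system of Eq. (8).
        keep = [k for k in range(n) if k != t]
        h = np.zeros(n)
        h[keep] = np.linalg.solve(np.diag(d[keep]) - W[np.ix_(keep, keep)], d[keep])
        return h

    if rule == "HT(L->U)":                 # j gets the class of argmin_{i in L} h_ij
        H = np.array([hitting_times_to(j) for j in U_idx])   # one row per unlabeled target j
        return y[np.argmin(H[:, L_idx], axis=1)]
    if rule == "HT(U->L)":                 # j gets the class of argmin_{i in L} h_ji
        H = np.array([hitting_times_to(i) for i in L_idx])   # one row per labeled target i
        return y[np.argmin(H[:, U_idx], axis=0)]
    if rule == "pinv":                     # j gets the class of argmax_{i in L} ell_ji
        Ldag = np.linalg.pinv(Lap)
        return y[np.argmax(Ldag[np.ix_(U_idx, L_idx)], axis=1)]
    raise ValueError(rule)
```

On the toy graph from the earlier sketches, for example, one could call ssl_predict(W, [0, 5], [1, 2, 3, 4], [0, 1], rule="HT(L->U)").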
96 The results averaged over 100 trials are shown in Table 3, where we see that HT(L → U) performs much better than HT(U → L), which is expected as the former admits a desired harmonic structure. [sent-575, score-0.437]
97 5 Conclusion In this paper, we explore the harmonic structure that widely exists in graph models. [sent-582, score-0.594]
98 The proposed harmonic loss quantifies the discrepancy of a function across cuts, allows a unified treatment of various models from different contexts, and makes them easy to analyze. [sent-584, score-0.51]
99 New spectral methods for ratio cut partitioning and clustering. [sent-650, score-0.184]
100 Hitting and commute times in large graphs are often misleading. [sent-714, score-0.198]
wordName wordTfidf (topN-words)
[('harmonic', 0.437), ('si', 0.41), ('absorption', 0.293), ('parw', 0.22), ('superlevel', 0.22), ('absorbing', 0.21), ('laplacian', 0.191), ('hitting', 0.19), ('lf', 0.184), ('walks', 0.148), ('cut', 0.137), ('di', 0.127), ('graph', 0.127), ('vertex', 0.113), ('lrw', 0.11), ('commute', 0.097), ('eigenvectors', 0.093), ('labeled', 0.091), ('unlabeled', 0.089), ('wik', 0.089), ('variation', 0.086), ('ht', 0.078), ('vertices', 0.078), ('conductance', 0.065), ('pii', 0.065), ('graphs', 0.065), ('target', 0.063), ('absorbed', 0.061), ('ssl', 0.06), ('cuts', 0.058), ('lgc', 0.055), ('lsym', 0.055), ('corollary', 0.052), ('wij', 0.05), ('pagerank', 0.048), ('drop', 0.048), ('mini', 0.047), ('magenta', 0.046), ('examine', 0.043), ('von', 0.042), ('loss', 0.042), ('hit', 0.041), ('sorted', 0.04), ('supplement', 0.038), ('points', 0.038), ('pawr', 0.037), ('green', 0.036), ('gap', 0.036), ('times', 0.036), ('justi', 0.035), ('slowly', 0.035), ('luxburg', 0.034), ('partially', 0.033), ('belkin', 0.032), ('continuity', 0.031), ('discrepancy', 0.031), ('probabilities', 0.031), ('classi', 0.031), ('walk', 0.03), ('structure', 0.03), ('hij', 0.03), ('laplacians', 0.03), ('nadler', 0.03), ('structures', 0.029), ('diffusion', 0.029), ('lemma', 0.029), ('fi', 0.029), ('aij', 0.028), ('lh', 0.028), ('normalization', 0.028), ('dropping', 0.027), ('starting', 0.027), ('normalized', 0.026), ('eigenvalues', 0.026), ('global', 0.026), ('contexts', 0.026), ('superior', 0.026), ('eigenvector', 0.025), ('explains', 0.025), ('resistance', 0.025), ('dominated', 0.024), ('zhou', 0.024), ('dense', 0.024), ('cluster', 0.024), ('spectral', 0.024), ('arg', 0.024), ('gets', 0.023), ('partitioning', 0.023), ('usps', 0.023), ('regularization', 0.023), ('clusters', 0.022), ('comparing', 0.022), ('resemblance', 0.022), ('retrieval', 0.021), ('seminal', 0.021), ('manifolds', 0.021), ('columns', 0.02), ('columbia', 0.02), ('colt', 0.02), ('variations', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
2 0.1109376 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
Author: Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour
Abstract: We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [14]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action). In the undirected case, we show that the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner. 1
3 0.097404532 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
4 0.094170243 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds. 1
5 0.089024402 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
Author: Ari Pakman, Liam Paninski
Abstract: We present a new approach to sample from generic binary distributions, based on an exact Hamiltonian Monte Carlo algorithm applied to a piecewise continuous augmentation of the binary distribution of interest. An extension of this idea to distributions over mixtures of binary and possibly-truncated Gaussian or exponential variables allows us to sample from posteriors of linear and probit regression models with spike-and-slab priors and truncated parameters. We illustrate the advantages of these algorithms in several examples in which they outperform the Metropolis or Gibbs samplers. 1
6 0.088270806 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
7 0.074370131 202 nips-2013-Multiclass Total Variation Clustering
8 0.073403426 66 nips-2013-Computing the Stationary Distribution Locally
9 0.071537562 149 nips-2013-Latent Structured Active Learning
10 0.068344831 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
11 0.068059929 350 nips-2013-Wavelets on Graphs via Deep Learning
12 0.064187728 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
13 0.062065747 61 nips-2013-Capacity of strong attractor patterns to model behavioural and cognitive prototypes
14 0.057520017 339 nips-2013-Understanding Dropout
15 0.057390224 344 nips-2013-Using multiple samples to learn mixture models
16 0.055575591 211 nips-2013-Non-Linear Domain Adaptation with Boosting
17 0.055513281 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
18 0.055505902 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
19 0.055246837 7 nips-2013-A Gang of Bandits
20 0.053689774 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
topicId topicWeight
[(0, 0.141), (1, 0.031), (2, 0.02), (3, -0.004), (4, 0.052), (5, 0.039), (6, 0.002), (7, -0.081), (8, -0.039), (9, 0.015), (10, 0.039), (11, -0.077), (12, 0.124), (13, 0.006), (14, 0.043), (15, -0.019), (16, -0.058), (17, 0.018), (18, -0.021), (19, 0.004), (20, -0.034), (21, 0.0), (22, 0.016), (23, -0.03), (24, -0.035), (25, -0.034), (26, 0.054), (27, 0.0), (28, -0.12), (29, -0.027), (30, -0.099), (31, 0.023), (32, 0.007), (33, 0.096), (34, -0.024), (35, 0.005), (36, 0.057), (37, -0.066), (38, 0.001), (39, 0.113), (40, 0.057), (41, 0.083), (42, -0.02), (43, 0.071), (44, -0.034), (45, -0.084), (46, -0.094), (47, -0.024), (48, 0.028), (49, -0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.95778883 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
2 0.72079635 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
3 0.66495699 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
Author: Masayuki Karasuyama, Hiroshi Mamitsuka
Abstract: Label propagation is one of the state-of-the-art methods for semi-supervised learning, which estimates labels by propagating label information through a graph. Label propagation assumes that data points (nodes) connected in a graph should have similar labels. Consequently, the label estimation heavily depends on edge weights in a graph which represent similarity of each node pair. We propose a method for a graph to capture the manifold structure of input features using edge weights parameterized by a similarity function. In this approach, edge weights represent both similarity and local reconstruction weight simultaneously, both being reasonable for label propagation. For further justification, we provide analytical considerations including an interpretation as a cross-validation of a propagation model in the feature space, and an error analysis based on a low dimensional manifold model. Experimental results demonstrated the effectiveness of our approach both in synthetic and real datasets. 1
4 0.6529516 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds. 1
5 0.64782828 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
Author: Hu Ding, Ronald Berezney, Jinhui Xu
Abstract: In this paper, we study the following new variant of prototype learning, called k-prototype learning problem for 3D rigid structures: Given a set of 3D rigid structures, find a set of k rigid structures so that each of them is a prototype for a cluster of the given rigid structures and the total cost (or dissimilarity) is minimized. Prototype learning is a core problem in machine learning and has a wide range of applications in many areas. Existing results on this problem have mainly focused on the graph domain. In this paper, we present the first algorithm for learning multiple prototypes from 3D rigid structures. Our result is based on a number of new insights to rigid structures alignment, clustering, and prototype reconstruction, and is practically efficient with quality guarantee. We validate our approach using two type of data sets, random data and biological data of chromosome territories. Experiments suggest that our approach can effectively learn prototypes in both types of data. 1
6 0.64424276 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
7 0.63698012 350 nips-2013-Wavelets on Graphs via Deep Learning
8 0.58455282 202 nips-2013-Multiclass Total Variation Clustering
9 0.5804528 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
10 0.54570186 73 nips-2013-Convex Relaxations for Permutation Problems
11 0.53647935 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
12 0.52353466 25 nips-2013-Adaptive Anonymity via $b$-Matching
13 0.51474559 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
14 0.50588834 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
15 0.49497542 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
16 0.47849461 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
17 0.47386605 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
18 0.4690401 244 nips-2013-Parametric Task Learning
19 0.46833512 186 nips-2013-Matrix factorization with binary components
20 0.46314567 7 nips-2013-A Gang of Bandits
topicId topicWeight
[(2, 0.013), (7, 0.253), (16, 0.034), (33, 0.12), (34, 0.102), (41, 0.023), (49, 0.039), (56, 0.095), (70, 0.026), (85, 0.079), (89, 0.025), (93, 0.082), (95, 0.028), (98, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.78059369 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
2 0.71615517 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
Author: Leonidas Lefakis, François Fleuret
Abstract: We propose to train an ensemble with the help of a reservoir in which the learning algorithm can store a limited number of samples. This novel approach lies in the area between offline and online ensemble approaches and can be seen either as a restriction of the former or an enhancement of the latter. We identify some basic strategies that can be used to populate this reservoir and present our main contribution, dubbed Greedy Edge Expectation Maximization (GEEM), that maintains the reservoir content in the case of Boosting by viewing the samples through their projections into the weak classifier response space. We propose an efficient algorithmic implementation which makes it tractable in practice, and demonstrate its efficiency experimentally on several compute-vision data-sets, on which it outperforms both online and offline methods in a memory constrained setting. 1
3 0.63783926 99 nips-2013-Dropout Training as Adaptive Regularization
Author: Stefan Wager, Sida Wang, Percy Liang
Abstract: Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset. 1
4 0.63213629 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1
5 0.62926888 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth
Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1
6 0.62843591 5 nips-2013-A Deep Architecture for Matching Short Texts
7 0.62585509 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
8 0.62397659 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
9 0.62381071 215 nips-2013-On Decomposing the Proximal Map
10 0.62275511 69 nips-2013-Context-sensitive active sensing in humans
11 0.6215139 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
12 0.62058425 251 nips-2013-Predicting Parameters in Deep Learning
13 0.6190725 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
14 0.61903805 201 nips-2013-Multi-Task Bayesian Optimization
15 0.61891031 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
16 0.61872119 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
17 0.61717004 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
18 0.61614186 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models
19 0.61586791 232 nips-2013-Online PCA for Contaminated Data
20 0.61566615 196 nips-2013-Modeling Overlapping Communities with Node Popularities