nips nips2013 nips2013-35 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in their target functions – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new light on several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We find that various well-known graph-based models exhibit a common important harmonic structure in their target functions – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. [sent-5, score-0.643]
2 Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. [sent-6, score-0.165]
3 In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. [sent-7, score-0.902]
4 We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. [sent-8, score-0.863]
5 1 Introduction Various graph-based models, regardless of application, aim to learn a target function on graphs that well respects the graph topology. [sent-11, score-0.255]
6 Whether these models can capture the graph structure faithfully, or whether their target functions possess desirable properties over the graph, remain unclear. [sent-13, score-0.22]
7 [16] showed that the target functions of Laplacian regularized methods become flat as the number of unlabeled points increases, but they also observed that a good classification can still be obtained if an appropriate threshold is used. [sent-17, score-0.19]
8 [22] proved that commute and hitting times are dominated by the local structures in large graphs, ignoring the global patterns. [sent-20, score-0.402]
9 Interestingly, despite this finding, the pseudo-inverse of the graph Laplacian, known as the kernel matrix of commute times, consistently delivers superior performance in collaborative filtering [10]. [sent-22, score-0.25]
10 In spectral clustering, the eigenvectors of the normalized graph Laplacian are preferred over those of the un-normalized one [20, 21]. [sent-23, score-0.27]
11 Also, for the recently proposed partially absorbing random walks [23], certain settings of absorption rates seem better than others. [sent-24, score-0.684]
12 1 Our starting point is the discovery of a common structure hidden in the target functions of various graph models. [sent-26, score-0.247]
13 We call this structure the harmonic structure for its resemblance to the harmonic function [9, 26]. [sent-28, score-0.956]
14 It naturally arises from the first step analysis of random walk models, and, as will be shown in this paper, implicitly exists in other methods such as pseudo-inverse of the graph Laplacian and eigenvectors of the Laplacian matrices. [sent-29, score-0.25]
15 The target functions of these models are characterized by their harmonic loss, a quantitative notion introduced in this paper to measure the discrepancy of a target function f on cuts of graphs. [sent-30, score-0.652]
16 The variations of f across cuts can then be upper and lower bounded by the ratio of its harmonic loss and the cut cost. [sent-31, score-0.694]
17 As long as the harmonic loss varies slowly, the graph conductance dominates the variations of f – it will remain smooth in a dense area but vary sharply otherwise. [sent-32, score-0.715]
18 We use this tool to study five popular models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. [sent-36, score-0.863]
19 The key message conveyed in our results is that various existing models enjoying the harmonic structure are actually capable of capturing the global graph topology, and understanding of this structure can guide us in applying them properly. [sent-38, score-0.65]
20 Denote by G = (V, W ) a graph with n vertices V and a symmetric non-negative affinity matrix W = [wij ] ∈ Rn×n (wii = 0). [sent-41, score-0.205]
21 Denote by $d_i = \sum_j w_{ij}$ the degree of vertex i, and by $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ the degree matrix. [sent-42, score-0.29]
22 The conductance of a subset S ⊂ V of vertices is defined as $\Phi(S) = \frac{w(S, \bar{S})}{\min(d(S), d(\bar{S}))}$, where $w(S, \bar{S}) = \sum_{i \in S, j \in \bar{S}} w_{ij}$ is the cut cost between S and its complement $\bar{S}$, and $d(S) = \sum_{i \in S} d_i$ is the volume of S. [sent-46, score-0.457]
23 The harmonic loss of f : V → R on any S ⊆ V is defined as: $\mathcal{L}_f(S) := \sum_{i \in S} \big( d_i f(i) - \sum_{j} w_{ij} f(j) \big) = \sum_{i \in S,\, j \in \bar{S}} w_{ij} \big( f(i) - f(j) \big)$. [sent-50, score-0.706]
24 However, as we shall see below, it is always non-negative on superlevel sets. [sent-53, score-0.22]
25 The following lemma shows that the harmonic loss couples the cut cost and the discrepancy of the function across the cut. [sent-54, score-0.676]
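To make these definitions concrete, here is a minimal NumPy sketch (an editor-added illustration, not code from the paper) on a toy two-cluster graph. It computes the cut cost w(S, S̄), the conductance Φ(S), and the harmonic loss L_f(S), and checks numerically that the loss equals the weighted discrepancy of f across the cut, which is the coupling referred to above.

```python
import numpy as np

# Toy 6-vertex graph: two triangles joined by one weak edge (a sparse cut).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
d = W.sum(axis=1)                               # vertex degrees d_i

def cut_cost(S):
    """w(S, S_bar): total weight of edges crossing the cut."""
    Sbar = np.setdiff1d(np.arange(len(d)), S)
    return W[np.ix_(S, Sbar)].sum()

def conductance(S):
    """Phi(S) = w(S, S_bar) / min(d(S), d(S_bar))."""
    Sbar = np.setdiff1d(np.arange(len(d)), S)
    return cut_cost(S) / min(d[S].sum(), d[Sbar].sum())

def harmonic_loss(f, S):
    """L_f(S) = sum_{i in S} ( d_i f(i) - sum_j w_ij f(j) )."""
    return float((d[S] * f[S] - W[S] @ f).sum())

f = np.array([1.0, 0.9, 0.85, 0.1, 0.05, 0.0])  # smooth within clusters, a gap across the cut
S = np.array([0, 1, 2])
cross = sum(W[i, j] * (f[i] - f[j]) for i in S for j in range(6) if j not in S)
print(harmonic_loss(f, S), cross, conductance(S))   # the first two numbers coincide
```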
26 In practice, to examine the variation of f on a graph, one does not need to examine it on every subset of vertices, since the number of subsets is exponential in the number of vertices. [sent-60, score-0.172]
27 Instead, it suffices to consider its variation on the superlevel sets defined as follows. [sent-61, score-0.306]
28 For any function f : V → R on a graph and a scalar c ∈ R, the set {i | f (i) ≥ c} is called a superlevel set of f with level c. [sent-64, score-0.347]
29 $S_i = \{1, \ldots, i\}$ is the superlevel set with level $f(i)$ if $f(i) > f(i + 1)$. [sent-73, score-0.22]
30 For convenience, we still call Si a superlevel set of f even if f (i) = f (i + 1). [sent-74, score-0.22]
31 In this paper, we will mainly examine the variation of f on its n superlevel sets S1 , . [sent-75, score-0.349]
32 Our first observation is that the harmonic loss on each superlevel set is non-negative, stated as follows. [sent-79, score-0.699]
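A small follow-up sketch (again an added illustration, reusing the toy graph above): sort the vertices so that f(1) ≥ · · · ≥ f(n), form the superlevel sets S_i, and print the harmonic loss on each (non-negative, as the observation states) together with the cut cost; their ratio is the quantity that bounds the drop of f across each cut.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
d = W.sum(axis=1)
f = np.array([0.85, 1.0, 0.9, 0.05, 0.1, 0.0])       # an unsorted function on the vertices

order = np.argsort(-f)                                # vertices sorted by decreasing f
for i in range(1, len(f)):
    S, Sbar = order[:i], order[i:]                    # superlevel set S_i and its complement
    loss = float((d[S] * f[S] - W[S] @ f).sum())      # harmonic loss L_f(S_i) >= 0
    cut = W[np.ix_(S, Sbar)].sum()                    # cut cost w(S_i, S_bar_i)
    drop = f[order[i - 1]] - f[order[i]]              # variation of f across this cut
    print(f"i={i}: loss={loss:.3f}  cut={cut:.2f}  loss/cut={loss / cut:.3f}  drop={drop:.3f}")
```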
33 2 Based on the notion of superlevel sets, it becomes legitimate to talk about the continuity of a function on graphs, which we formally define as follows. [sent-86, score-0.251]
34 First, for any function f on a graph, as long as its harmonic loss Lf (Si ) varies slowly on the superlevel sets, i. [sent-123, score-0.734]
35 , f is harmonic almost everywhere, the graph conductance Φ(Si ) will dominate the variation of f . [sent-125, score-0.715]
36 8, a big gap exists across the cut if Φ(Si ) is small (see Sec. [sent-128, score-0.173]
37 Second, the continuity (either left, right, or both) of f ensures that its variations conform with the graph connectivity, i. [sent-131, score-0.178]
38 For each model, we show its target function in harmonic forms, quantify its harmonic loss, analyze its dropping bounds, and provide corrections or justifications for its use. [sent-138, score-0.964]
39 Here we show that Laplacian regularization can actually capture the global graph structure and a simple normalization scheme would resolve the raised issue. [sent-143, score-0.234]
40 Denote by f : V → R the absorption probability vector from every point to the positive labeled point. [sent-145, score-0.384]
41 Assume the vertices are sorted such that 1 = f (1) > f (2) ≥ · · · ≥ f (n − 1) > f (n) = 0 (vertex 1 is labeled positive and vertex n is labeled negative). [sent-146, score-0.413]
42 The harmonic form of f (Eq. 4) reads $f(i) = \sum_{k \sim i} \frac{w_{ik}}{d_i} f(k)$, for $i = 2, \ldots, n-1$. Our first observation is that the harmonic loss of f is constant w.r.t. the superlevel sets. [sent-151, score-0.606]
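A minimal sketch (an added illustration, with NumPy) of how such an absorption probability vector can be obtained on a toy graph: fix f = 1 at the positive labeled vertex and f = 0 at the negative one, and solve the linear system given by the harmonic condition of Eq. (4) on the remaining vertices.

```python
import numpy as np

# Toy graph: two triangles joined by a weak edge; vertex 0 is labeled +, vertex 5 is labeled -.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
D = np.diag(W.sum(axis=1))
L = D - W

labeled, unlabeled = [0, 5], [1, 2, 3, 4]
f = np.zeros(6)
f[0], f[5] = 1.0, 0.0                      # boundary values at the two absorbing labels

# Harmonic condition d_i f(i) = sum_k w_ik f(k) on the unlabeled vertices:
# (D_UU - W_UU) f_U = W_UL f_L
f[unlabeled] = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                               W[np.ix_(unlabeled, labeled)] @ f[labeled])
print(np.round(f, 3))   # drops slowly inside a cluster, sharply across the weak edge
```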
43 Since the harmonic loss of f is a constant on the superlevel sets Si (Corollary 3. [sent-172, score-0.699]
44 8, the variation of f depends solely on the cut value $w(S_i, \bar{S}_i)$, which indicates that it will drop slowly when the cut is dense but drastically when the cut is sparse. [sent-175, score-0.604]
45 1, where the graph consists of 6 points in 2 classes denoted by different colors, with 3 points in each. [sent-180, score-0.203]
46 The absorption probabilities from all the vertices to vertex 1 are computed and shown. [sent-184, score-0.515]
47 We can see that since the cut $w(S_2, \bar{S}_2) = 2$ is quite dense, the drop between f (2) and f (3) is upper bounded by a small number (Theorem 2. [sent-185, score-0.185]
48 Now let f1 and f2 denote the absorption probability vectors to the two labeled points respectively. [sent-192, score-0.422]
49 The magenta triangle and the green circle denote labeled data. [sent-217, score-0.173]
50 The absorption probabilities to both labeled points are shown in Fig. [sent-225, score-0.453]
51 The green vector is well above the magenta vector, indicating that every unlabeled point has larger absorption probability to the green labeled point. [sent-227, score-0.591]
52 Comparing them classifies all the unlabeled points to the green Gaussian (Fig. [sent-228, score-0.163]
53 Since the green labeled point has larger degree than the magenta one, this result is expected from the analysis in [1]. [sent-230, score-0.173]
54 2 Partially Absorbing Random Walks Here we revisit the recently proposed partially absorbing random walks (PARW) [23], which generalizes absorbing random walks by allowing partial absorption at each state. [sent-249, score-1.042]
55 The absorption rate at state i is defined as $p_{ii} = \frac{\alpha\lambda_i}{\alpha\lambda_i + d_i}$, where $\alpha > 0$, $\lambda_i > 0$ are regularization parameters. [sent-250, score-0.446]
56 Given current state i, a PARW in the next step will get absorbed at i with probability $p_{ii}$, and with probability $(1 - p_{ii})\,\frac{w_{ij}}{d_i}$ move to state j. [sent-251, score-0.191]
57 Let $a_{ij}$ be the probability that a PARW starting from state i gets absorbed at state j within finite steps, and denote by $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ the absorption probability matrix. [sent-252, score-0.432]
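The first-step analysis stated above pins A down in closed form: collecting terms gives (αΛ + D)A = αΛ + WA, hence A = (αΛ + L)^{-1} αΛ. A short sketch (an added illustration on the toy graph used earlier) is given below; each row of A sums to one, since a walk is eventually absorbed somewhere.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
L = np.diag(W.sum(axis=1)) - W

alpha = 1e-3                               # a small alpha, as in the analysis below
Lam = np.eye(6)                            # Lambda = I; any positive diagonal Lambda works the same way

A = np.linalg.solve(alpha * Lam + L, alpha * Lam)   # A = (alpha*Lambda + L)^{-1} alpha*Lambda
print(np.round(A.sum(axis=1), 6))          # every row sums to 1
print(np.round(A[:, 0], 4))                # one column of A: absorption probabilities at vertex 0
```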
58 In what follows, we extend this result to show that the columns of A obtained by PARW with almost arbitrary Λ (not just Λ = I) actually exhibit strong harmonic structures and should be expected to work equally well. [sent-259, score-0.486]
59 Our second observation is that the harmonic structure exists in the probabilities of PARW from every vertex getting absorbed at a particular vertex, i. [sent-270, score-0.672]
60 Eq. (5): $p(1) = \frac{\alpha\lambda_1}{d_1 + \alpha\lambda_1} + \sum_{k \sim 1} \frac{w_{1k}}{d_1 + \alpha\lambda_1} p(k)$, $p(i) = \sum_{k \sim i} \frac{w_{ik}}{d_i + \alpha\lambda_i} p(k)$, $i = 2, \ldots, n$, which is equivalent to the following harmonic form: $p(1) = \frac{\alpha\lambda_1}{d_1}(1 - p(1)) + \sum_{k \sim 1} \frac{w_{1k}}{d_1} p(k)$, $p(i) = -\frac{\alpha\lambda_i}{d_i} p(i) + \sum_{k \sim i} \frac{w_{ik}}{d_i} p(k)$. The harmonic loss of p can be computed from Eq. [sent-283, score-1.17]
61 8, for $1 \le i \le k$, the drop of p(i) across the cut $\{S_i, \bar{S}_i\}$ is lower bounded by $\tfrac{1}{2}\alpha\lambda_1 / w(S_i, \bar{S}_i)$, if α is sufficiently small. [sent-304, score-0.185]
62 This indicates that p(i) will drop a lot when the cut $w(S_i, \bar{S}_i)$ is small. The comparison between the corresponding row and column of A is shown in Figs. [sent-306, score-0.185]
63 Figure 3: (a) Absorption probabilities that a PARW gets absorbed at other points when starting from i (see Fig. [sent-338, score-0.18]
64 3 Pseudo-inverse of the Graph Laplacian The pseudo-inverse L† of the graph Laplacian is a valid kernel corresponding to commute times [10, 12]. [sent-352, score-0.26]
65 While commute times may fail to capture the global topology in large graphs [22], L† , if used directly as a similarity measure, gives superior performance in practice [10]. [sent-353, score-0.25]
66 Here we provide a formal analysis and justification for L† by revealing the strong harmonic structure hidden in it. [sent-354, score-0.467]
67 The following lemma shows the harmonic form of ℓ. [sent-364, score-0.466]
68 ℓ has the following harmonic form: $\ell(1) = \frac{1 - 1/n}{d_1} + \sum_{k \sim 1} \frac{w_{1k}}{d_1} \ell(k)$, $\ell(i) = -\frac{1/n}{d_i} + \sum_{k \sim i} \frac{w_{ik}}{d_i} \ell(k)$, $i = 2, \ldots, n$. [sent-367, score-0.653]
69 Then the harmonic loss of ℓ on the set Si admits a very simple form, as shown below. [sent-376, score-0.479]
70 2, we can immediately conclude that the variation of ℓ(i) is dominated by the cut cost on the superlevel set Si . [sent-389, score-0.467]
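A quick numerical check of this harmonic structure (an added sketch, not from the paper): compute L† with NumPy's pseudo-inverse, take its column ℓ for vertex 1, and verify that d_i ℓ(i) − Σ_k w_ik ℓ(k) equals 1 − 1/n at the target vertex and −1/n everywhere else, which is exactly the harmonic form above.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
D = np.diag(W.sum(axis=1))
L = D - W

Ldag = np.linalg.pinv(L)                   # pseudo-inverse of the graph Laplacian
ell = Ldag[:, 0]                           # column of L^dagger for vertex 1 (index 0)
print(np.round(D @ ell - W @ ell, 4))      # [1 - 1/n, -1/n, ..., -1/n]
```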
71 4 Hitting Times The hitting time hij from vertex i to j is the expected number of steps it takes a random walk starting from i to reach j for the first time. [sent-393, score-0.39]
72 While it was proven in [22] that hitting times are dominated by the local structure of the target, we show below that the hitting times from other points to the same target admit a harmonic structure, and thus are still able to capture the global structure of graphs. [sent-394, score-1.1]
73 Our result is complementary to the analysis in [22], and provides a justification of using hitting times in information retrieval where the query is taken as the target to be hit by others [15]. [sent-395, score-0.351]
74 6 Let h : V → R be the hitting times from every vertex to a particular vertex. [sent-397, score-0.339]
75 , assume the vertices have been sorted such that h(1) ≥ h(2) ≥ · · · ≥ h(n − 1) > h(n) = 0, where vertex n is the target vertex. [sent-402, score-0.294]
76 Applying the first step analysis, we obtain the harmonic form of h (Eq. 8): $h(i) = 1 + \sum_{k \sim i} \frac{w_{ik}}{d_i} h(k)$, for $i = 1, \ldots, n-1$. [sent-403, score-0.526]
77 The harmonic loss on the set $S_i$ turns out to be the volume of the set, as stated below. [sent-407, score-0.606]
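A sketch (an added illustration) of computing h on the toy graph by solving the linear system implied by Eq. (8), with h fixed to 0 at the target vertex:

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
d = W.sum(axis=1)

target = 5                                  # the vertex to be hit
U = [i for i in range(6) if i != target]

# Eq. (8): h(i) = 1 + sum_k (w_ik/d_i) h(k) with h(target) = 0;
# multiplying by d_i gives (D_UU - W_UU) h_U = d_U
h = np.zeros(6)
h[U] = np.linalg.solve(np.diag(d[U]) - W[np.ix_(U, U)], d[U])
print(np.round(h, 2))                       # a clear gap separates the cluster far from the target
```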
78 Now let us examine the variation of h across any cut $\{S_i, \bar{S}_i\}$. [sent-417, score-0.266]
79 As i decreases from $d(S_i) > \tfrac{1}{2} d(V)$, the variation of $\alpha_i$ becomes slower and slower ($\alpha_i = 1$ when $d(S_i) \le \tfrac{1}{2} d(V)$), so the variation of h will depend on the variation of the conductance of $S_i$, i. [sent-421, score-0.323]
80 In contrast, there are no gaps exhibited in the hitting times from the target to other vertices (Fig. [sent-428, score-0.367]
81 5 Eigenvectors of the Laplacian Matrices The eigenvectors of the Laplacian matrices play a key role in graph partitioning [20]. [sent-431, score-0.243]
82 Here we address these issues by examining the harmonic structures in these eigenvectors. [sent-436, score-0.466]
83 Then we have (Eq. 10): $u(i) = \sum_{k \sim i} \frac{w_{ik}}{d_i - \lambda_u} u(k)$, $v(i) = \sum_{k \sim i} \frac{w_{ik}}{d_i (1 - \lambda_v)} v(k)$, for $i = 1, \ldots, n$. [sent-441, score-0.178]
84 We can see that the smaller $\lambda_u$ and $\lambda_v$, the stronger the harmonic structures of u and v. [sent-445, score-0.72]
85 As long as λu ≪ mini {di }, we are safe to say that u will have a significant harmonic structure, and thus will be informative for clustering. [sent-447, score-0.484]
86 However, if λu is close to mini {di }, no matter how small λu is, the harmonic structure of u will be weaker, and thus u is less useful. [sent-448, score-0.514]
87 (10), v will always enjoy a significant harmonic structure as long as λv is much smaller than 1. [sent-450, score-0.467]
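The following sketch (an added illustration on the toy graph used earlier) computes the second-smallest eigenpairs of L and of the normalized Laplacian, verifies the harmonic forms of Eq. (10) numerically, and prints λ_u next to min_i d_i, the comparison that determines how strong the harmonic structure of u is.

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
d = W.sum(axis=1)
L = np.diag(d) - W

lam, U = np.linalg.eigh(L)                          # unnormalized Laplacian
lam_u, u = lam[1], U[:, 1]                          # second-smallest eigenpair

Dm = np.diag(1.0 / np.sqrt(d))                      # D^{-1/2}
lam2, Z = np.linalg.eigh(Dm @ L @ Dm)               # L_sym = D^{-1/2} L D^{-1/2}
lam_v, v = lam2[1], Dm @ Z[:, 1]                    # v is the matching eigenvector of L_rw = D^{-1} L

# Harmonic forms of Eq. (10):
print(np.allclose(u, (W @ u) / (d - lam_u)))        # u(i) = sum_k w_ik u(k) / (d_i - lam_u)
print(np.allclose(v, (W @ v) / (d * (1 - lam_v))))  # v(i) = sum_k w_ik v(k) / (d_i (1 - lam_v))
print(lam_u, d.min())                               # lam_u << min_i d_i: strong harmonic structure
```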
88 4 Experiments In the first experiment, we test absorbing random walks (ARW) for SSL, with the class mass normalization suggested in [26] (ARW-CMN), our proposed normalization (ARW-N-1NN, Sec. [sent-454, score-0.414]
89 1), and without any normalization (ARW-1NN) – where each unlabeled instance is assigned the class of the labeled instance at which it most likely gets absorbed. [sent-456, score-0.231]
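For concreteness, here is a minimal sketch (an added illustration, not the authors' code) of the unnormalized ARW-1NN rule described above: solve for the probability that each unlabeled vertex is absorbed at each labeled (absorbing) vertex, then take the class of the most probable absorbing label.

```python
import numpy as np

# Toy SSL setup on the earlier graph: vertices 0 and 5 are labeled with classes 0 and 1.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
L = np.diag(W.sum(axis=1)) - W

labeled, classes = np.array([0, 5]), np.array([0, 1])
unlabeled = np.array([1, 2, 3, 4])

# Absorption probabilities of an ARW from each unlabeled vertex to each labeled vertex:
# F = (D_UU - W_UU)^{-1} W_UL, one column per absorbing (labeled) vertex.
F = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)], W[np.ix_(unlabeled, labeled)])
pred = classes[np.argmax(F, axis=1)]       # ARW-1NN: class of the most likely absorbing label
print(dict(zip(unlabeled.tolist(), pred.tolist())))
```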
90 The results of LGC are not comparable to ARW-N-1NN and PARW (Λ = I), which is probably due to the lack of a harmonic structure. [sent-511, score-0.437]
91 We observe that the columns in Λ = R give significantly better results compared with rows, implying that the harmonic structure is vital to the performance. [sent-550, score-0.487]
92 This suggests that it is not the special setting of absorbing rates but the harmonic structure that determines the overall performance. [sent-552, score-0.677]
93 6426 In the third experiment, we test hitting times and pseudo-inverse of the graph Laplacian for SSL on USPS. [sent-572, score-0.353]
94 We compare two different uses of hitting times, the case of starting from the labeled data L to hit the unlabeled data U (HT(L → U)), and the case of the opposite direction (HT(U → L)). [sent-573, score-0.438]
95 Each unlabeled instance j is assigned the class of labeled instance j ∗ , where j ∗ = arg mini∈L {hij } in HT(L → U), j ∗ = arg mini∈L {hji } in (HT(U → L)), and j ∗ = arg maxi∈L {ℓji } in L† = (ℓij ). [sent-574, score-0.252]
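As a sketch of the three decision rules just described (an added illustration; the function name and signature are hypothetical, not from the paper), given an affinity matrix W, labeled and unlabeled index sets, and labels y for the labeled instances:

```python
import numpy as np

def ssl_predict(W, L_idx, U_idx, y, rule="HT(L->U)"):
    """Assign each unlabeled instance the class of a labeled instance under one of three rules."""
    n = W.shape[0]
    d = W.sum(axis=1)
    Lap = np.diag(d) - W
    L_idx, U_idx, y = np.asarray(L_idx), np.asarray(U_idx), np.asarray(y)

    def hitting_times_to(t):
        # Hitting times h(i -> t) for all i, via the harmonic system of Eq. (8).
        keep = [k for k in range(n) if k != t]
        h = np.zeros(n)
        h[keep] = np.linalg.solve(np.diag(d[keep]) - W[np.ix_(keep, keep)], d[keep])
        return h

    if rule == "HT(L->U)":                 # j gets the class of argmin_{i in L} h_ij
        H = np.array([hitting_times_to(j) for j in U_idx])   # one row per unlabeled target j
        return y[np.argmin(H[:, L_idx], axis=1)]
    if rule == "HT(U->L)":                 # j gets the class of argmin_{i in L} h_ji
        H = np.array([hitting_times_to(i) for i in L_idx])   # one row per labeled target i
        return y[np.argmin(H[:, U_idx], axis=0)]
    if rule == "pinv":                     # j gets the class of argmax_{i in L} ell_ji
        Ldag = np.linalg.pinv(Lap)
        return y[np.argmax(Ldag[np.ix_(U_idx, L_idx)], axis=1)]
    raise ValueError(rule)
```

On the toy graph from the earlier sketches, for example, one could call ssl_predict(W, [0, 5], [1, 2, 3, 4], [0, 1], rule="HT(L->U)").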
96 The results averaged over 100 trials are shown in Table 3, where we see that HT(L → U) performs much better than HT(U → L), which is expected as the former admits a desired harmonic structure. [sent-575, score-0.437]
97 5 Conclusion In this paper, we explore the harmonic structure that widely exists in graph models. [sent-582, score-0.594]
98 The proposed harmonic loss quantifies the discrepancy of a function across cuts, allows a unified treatment of various models from different contexts, and makes them easy to analyze. [sent-584, score-0.51]
99 New spectral methods for ratio cut partitioning and clustering. [sent-650, score-0.184]
100 Hitting and commute times in large graphs are often misleading. [sent-714, score-0.198]
wordName wordTfidf (topN-words)
[('harmonic', 0.437), ('si', 0.41), ('absorption', 0.293), ('parw', 0.22), ('superlevel', 0.22), ('absorbing', 0.21), ('laplacian', 0.191), ('hitting', 0.19), ('lf', 0.184), ('walks', 0.148), ('cut', 0.137), ('di', 0.127), ('graph', 0.127), ('vertex', 0.113), ('lrw', 0.11), ('commute', 0.097), ('eigenvectors', 0.093), ('labeled', 0.091), ('unlabeled', 0.089), ('wik', 0.089), ('variation', 0.086), ('ht', 0.078), ('vertices', 0.078), ('conductance', 0.065), ('pii', 0.065), ('graphs', 0.065), ('target', 0.063), ('absorbed', 0.061), ('ssl', 0.06), ('cuts', 0.058), ('lgc', 0.055), ('lsym', 0.055), ('corollary', 0.052), ('wij', 0.05), ('pagerank', 0.048), ('drop', 0.048), ('mini', 0.047), ('magenta', 0.046), ('examine', 0.043), ('von', 0.042), ('loss', 0.042), ('hit', 0.041), ('sorted', 0.04), ('supplement', 0.038), ('points', 0.038), ('pawr', 0.037), ('green', 0.036), ('gap', 0.036), ('times', 0.036), ('justi', 0.035), ('slowly', 0.035), ('luxburg', 0.034), ('partially', 0.033), ('belkin', 0.032), ('continuity', 0.031), ('discrepancy', 0.031), ('probabilities', 0.031), ('classi', 0.031), ('walk', 0.03), ('structure', 0.03), ('hij', 0.03), ('laplacians', 0.03), ('nadler', 0.03), ('structures', 0.029), ('diffusion', 0.029), ('lemma', 0.029), ('fi', 0.029), ('aij', 0.028), ('lh', 0.028), ('normalization', 0.028), ('dropping', 0.027), ('starting', 0.027), ('normalized', 0.026), ('eigenvalues', 0.026), ('global', 0.026), ('contexts', 0.026), ('superior', 0.026), ('eigenvector', 0.025), ('explains', 0.025), ('resistance', 0.025), ('dominated', 0.024), ('zhou', 0.024), ('dense', 0.024), ('cluster', 0.024), ('spectral', 0.024), ('arg', 0.024), ('gets', 0.023), ('partitioning', 0.023), ('usps', 0.023), ('regularization', 0.023), ('clusters', 0.022), ('comparing', 0.022), ('resemblance', 0.022), ('retrieval', 0.021), ('seminal', 0.021), ('manifolds', 0.021), ('columns', 0.02), ('columbia', 0.02), ('colt', 0.02), ('variations', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
2 0.1109376 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
Author: Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour
Abstract: We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [14]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action). In the undirected case, we show that the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner. 1
3 0.097404532 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
4 0.094170243 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds. 1
5 0.089024402 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
Author: Ari Pakman, Liam Paninski
Abstract: We present a new approach to sample from generic binary distributions, based on an exact Hamiltonian Monte Carlo algorithm applied to a piecewise continuous augmentation of the binary distribution of interest. An extension of this idea to distributions over mixtures of binary and possibly-truncated Gaussian or exponential variables allows us to sample from posteriors of linear and probit regression models with spike-and-slab priors and truncated parameters. We illustrate the advantages of these algorithms in several examples in which they outperform the Metropolis or Gibbs samplers. 1
6 0.088270806 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
7 0.074370131 202 nips-2013-Multiclass Total Variation Clustering
8 0.073403426 66 nips-2013-Computing the Stationary Distribution Locally
9 0.071537562 149 nips-2013-Latent Structured Active Learning
10 0.068344831 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
11 0.068059929 350 nips-2013-Wavelets on Graphs via Deep Learning
12 0.064187728 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
13 0.062065747 61 nips-2013-Capacity of strong attractor patterns to model behavioural and cognitive prototypes
14 0.057520017 339 nips-2013-Understanding Dropout
15 0.057390224 344 nips-2013-Using multiple samples to learn mixture models
16 0.055575591 211 nips-2013-Non-Linear Domain Adaptation with Boosting
17 0.055513281 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
18 0.055505902 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
19 0.055246837 7 nips-2013-A Gang of Bandits
20 0.053689774 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
topicId topicWeight
[(0, 0.141), (1, 0.031), (2, 0.02), (3, -0.004), (4, 0.052), (5, 0.039), (6, 0.002), (7, -0.081), (8, -0.039), (9, 0.015), (10, 0.039), (11, -0.077), (12, 0.124), (13, 0.006), (14, 0.043), (15, -0.019), (16, -0.058), (17, 0.018), (18, -0.021), (19, 0.004), (20, -0.034), (21, 0.0), (22, 0.016), (23, -0.03), (24, -0.035), (25, -0.034), (26, 0.054), (27, 0.0), (28, -0.12), (29, -0.027), (30, -0.099), (31, 0.023), (32, 0.007), (33, 0.096), (34, -0.024), (35, 0.005), (36, 0.057), (37, -0.066), (38, 0.001), (39, 0.113), (40, 0.057), (41, 0.083), (42, -0.02), (43, 0.071), (44, -0.034), (45, -0.084), (46, -0.094), (47, -0.024), (48, 0.028), (49, -0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.95778883 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
2 0.72079635 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
3 0.66495699 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
Author: Masayuki Karasuyama, Hiroshi Mamitsuka
Abstract: Label propagation is one of the state-of-the-art methods for semi-supervised learning, which estimates labels by propagating label information through a graph. Label propagation assumes that data points (nodes) connected in a graph should have similar labels. Consequently, the label estimation heavily depends on edge weights in a graph which represent similarity of each node pair. We propose a method for a graph to capture the manifold structure of input features using edge weights parameterized by a similarity function. In this approach, edge weights represent both similarity and local reconstruction weight simultaneously, both being reasonable for label propagation. For further justification, we provide analytical considerations including an interpretation as a cross-validation of a propagation model in the feature space, and an error analysis based on a low dimensional manifold model. Experimental results demonstrated the effectiveness of our approach both in synthetic and real datasets. 1
4 0.6529516 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds. 1
5 0.64782828 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
Author: Hu Ding, Ronald Berezney, Jinhui Xu
Abstract: In this paper, we study the following new variant of prototype learning, called k-prototype learning problem for 3D rigid structures: Given a set of 3D rigid structures, find a set of k rigid structures so that each of them is a prototype for a cluster of the given rigid structures and the total cost (or dissimilarity) is minimized. Prototype learning is a core problem in machine learning and has a wide range of applications in many areas. Existing results on this problem have mainly focused on the graph domain. In this paper, we present the first algorithm for learning multiple prototypes from 3D rigid structures. Our result is based on a number of new insights to rigid structures alignment, clustering, and prototype reconstruction, and is practically efficient with quality guarantee. We validate our approach using two type of data sets, random data and biological data of chromosome territories. Experiments suggest that our approach can effectively learn prototypes in both types of data. 1
6 0.64424276 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
7 0.63698012 350 nips-2013-Wavelets on Graphs via Deep Learning
8 0.58455282 202 nips-2013-Multiclass Total Variation Clustering
9 0.5804528 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
10 0.54570186 73 nips-2013-Convex Relaxations for Permutation Problems
11 0.53647935 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
12 0.52353466 25 nips-2013-Adaptive Anonymity via $b$-Matching
13 0.51474559 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
14 0.50588834 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
15 0.49497542 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
16 0.47849461 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
17 0.47386605 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
18 0.4690401 244 nips-2013-Parametric Task Learning
19 0.46833512 186 nips-2013-Matrix factorization with binary components
20 0.46314567 7 nips-2013-A Gang of Bandits
topicId topicWeight
[(2, 0.013), (7, 0.253), (16, 0.034), (33, 0.12), (34, 0.102), (41, 0.023), (49, 0.039), (56, 0.095), (70, 0.026), (85, 0.079), (89, 0.025), (93, 0.082), (95, 0.028), (98, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.78059369 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
2 0.71615517 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
Author: Leonidas Lefakis, François Fleuret
Abstract: We propose to train an ensemble with the help of a reservoir in which the learning algorithm can store a limited number of samples. This novel approach lies in the area between offline and online ensemble approaches and can be seen either as a restriction of the former or an enhancement of the latter. We identify some basic strategies that can be used to populate this reservoir and present our main contribution, dubbed Greedy Edge Expectation Maximization (GEEM), that maintains the reservoir content in the case of Boosting by viewing the samples through their projections into the weak classifier response space. We propose an efficient algorithmic implementation which makes it tractable in practice, and demonstrate its efficiency experimentally on several compute-vision data-sets, on which it outperforms both online and offline methods in a memory constrained setting. 1
3 0.63783926 99 nips-2013-Dropout Training as Adaptive Regularization
Author: Stefan Wager, Sida Wang, Percy Liang
Abstract: Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset. 1
4 0.63213629 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1
5 0.62926888 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth
Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1
6 0.62843591 5 nips-2013-A Deep Architecture for Matching Short Texts
7 0.62585509 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
8 0.62397659 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
9 0.62381071 215 nips-2013-On Decomposing the Proximal Map
10 0.62275511 69 nips-2013-Context-sensitive active sensing in humans
11 0.6215139 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
12 0.62058425 251 nips-2013-Predicting Parameters in Deep Learning
13 0.6190725 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
14 0.61903805 201 nips-2013-Multi-Task Bayesian Optimization
15 0.61891031 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
16 0.61872119 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
17 0.61717004 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
18 0.61614186 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models
19 0.61586791 232 nips-2013-Online PCA for Contaminated Data
20 0.61566615 196 nips-2013-Modeling Overlapping Communities with Node Popularities