nips nips2011 nips2011-150 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Blake Shaw, Bert Huang, Tony Jebara
Abstract: Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Many real-world networks are described by both connectivity information and features for every node. [sent-10, score-0.287]
2 To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. [sent-11, score-1.068]
3 Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. [sent-12, score-1.112]
4 We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. [sent-13, score-0.216]
5 We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges. [sent-14, score-0.191]
6 1 Introduction The proliferation of social networks on the web has spurred many significant advances in modeling networks [1, 2, 4, 12, 13, 15, 16, 26]. [sent-15, score-0.147]
7 Many social networks are of this form; on services such as Facebook, Twitter, or LinkedIn, there are profiles which describe each person, as well as the connections they make. [sent-17, score-0.127]
8 The relationship between a node’s features and connections is often not explicit. [sent-18, score-0.083]
9 We want to learn the relationship between profiles and links from massive social networks such that we can better predict who is likely to connect. [sent-20, score-0.16]
10 To model this relationship, one could simply model each link independently, where one simply learns what characteristics of two profiles imply a possible link. [sent-21, score-0.107]
11 However, this approach completely ignores the structural characteristics of the links in the network. [sent-22, score-0.092]
12 We posit that modeling independent links is insufficient, and in order to better model these networks one must account for the inherent topology of the network as well as the interactions between the features of nodes. [sent-23, score-0.257]
13 We thus propose structure preserving metric learning (SPML), a method for learning a distance metric between nodes that preserves the structural network behavior seen in data. [sent-24, score-0.816]
14 These methods first build a k-nearest neighbors (kNN) graph from training data with a fixed k. (Footnote: Blake Shaw is currently at Foursquare, and Bert Huang is currently at the University of Maryland.) [sent-27, score-0.108]
15 They then learn a Mahalanobis distance metric which tries to keep connected points with similar labels close while pushing away class impostors, pairs of points which are connected but of different classes. [sent-28, score-0.337]
16 Fundamentally, these supervised methods aim to learn a distance metric such that applying a connectivity algorithm (for instance, k-nearest neighbors) under the metric will produce a graph where no point is connected to others with different class labels. [sent-29, score-0.718]
17 Once the metric is learned, the class label for an unseen datapoint can be predicted by the majority vote of nearby points under the learned metric. [sent-31, score-0.212]
18 Unfortunately, these metric learning algorithms are not easily applied when we are given a network as input instead of class labels for each point. [sent-32, score-0.271]
19 Under this new regime, we want to learn a metric such that points connected in the network are close and points which are unconnected are more distant. [sent-33, score-0.34]
20 Intuitively, certain features or groups of features should influence how nodes connect, and thus it should be possible to learn a mapping from features to connectivity such that the mapping respects the underlying topological structure of the network. [sent-34, score-0.489]
21 Like previous metric learning methods, SPML learns a metric which reconciles the input features with some auxiliary information such as class labels. [sent-35, score-0.502]
22 In this case, instead of pushing away class impostors, SPML pushes away graph impostors, points which are close in terms of distance but which should remain unconnected in order to preserve the topology of the network. [sent-36, score-0.249]
23 Thus SPML learns a metric where the learned distances are inherently tied to the original input connectivity. [sent-37, score-0.382]
24 Preserving graph topology is possible by enforcing simple linear constraints on distances between nodes [21]. [sent-38, score-0.298]
25 By adapting the constraints from the graph embedding technique structure preserving embedding, we formulate simple linear structure preserving constraints for metric learning that enforce that neighbors of each node are closer than all others. [sent-39, score-0.986]
26 Furthermore, we adapt these constraints for an online setting similar to PEGASOS [20] and OASIS [3], such that we can apply SPML to large networks by optimizing with stochastic gradient descent (SGD). [sent-40, score-0.133]
27 2 Structure preserving metric learning Given as input an adjacency matrix A ∈ B^{n×n} and node features X ∈ R^{d×n}, structure preserving metric learning (SPML) learns a Mahalanobis distance metric parameterized by a positive semidefinite (PSD) matrix M ∈ R^{d×d}, where M ⪰ 0. [sent-41, score-1.317]
28 The distance between two points under the metric is defined as D_M(x_i, x_j) = (x_i − x_j)⊤ M (x_i − x_j). [sent-42, score-0.31]
29 When the metric is the identity M = I_d, D_M(x_i, x_j) is the squared Euclidean distance between the i'th and j'th points. [sent-43, score-0.31]
30 Learning M is equivalent to learning a linear scaling of the input features LX, where M = L⊤L and L ∈ R^{d×d}. [sent-44, score-0.084]
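As a concrete illustration of these definitions, a minimal NumPy sketch (the data and L here are made up for the demo; this is not code from the paper):

import numpy as np

d, n = 5, 20
rng = np.random.default_rng(0)
X = rng.normal(size=(d, n))               # node features, one column per node
L = rng.normal(size=(d, d))               # linear scaling
M = L.T @ L                               # PSD by construction, M = L'L

def mahalanobis_sq(M, xi, xj):
    # D_M(x_i, x_j) = (x_i - x_j)' M (x_i - x_j)
    diff = xi - xj
    return float(diff @ M @ diff)

# With M = I_d the distance reduces to the squared Euclidean distance.
assert np.isclose(mahalanobis_sq(np.eye(d), X[:, 0], X[:, 1]),
                  np.sum((X[:, 0] - X[:, 1]) ** 2))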
31 SPML learns an M which is structure preserving, as defined in Definition 1. [sent-45, score-0.081]
32 Given a connectivity algorithm G, SPML learns a metric such that applying G to the input data using the learned metric produces the input adjacency matrix exactly. [sent-46, score-0.75]
33 Possible choices for G include maximum weight b-matching, k-nearest neighbors, ε-neighborhoods, or maximum weight spanning tree. [sent-47, score-0.076]
34 Definition 1 Given a graph with adjacency matrix A, a distance metric parametrized by M ∈ Rd×d is structure preserving with respect to a connectivity algorithm G, if G(X, M) = A. [sent-48, score-0.799]
35 2.1 Preserving graph topology with linear constraints To preserve graph topology, we use the same linear constraints as structure preserving embedding (SPE) [21], but apply them to M, which parameterizes the distances between points. [sent-50, score-0.633]
36 A useful tool for defining distances as linear constraints on M is the transformation D_M(x_i, x_j) = x_i⊤ M x_i + x_j⊤ M x_j − x_i⊤ M x_j − x_j⊤ M x_i, (1) which allows linear constraints on the distances to be written as linear constraints on the matrix M. [sent-51, score-0.625]
37 For different connectivity schemes below, we present linear constraints which enforce graph structure to be preserved. [sent-52, score-0.347]
38 For k-nearest neighbors, with k set to the true degree for each node, the distances to all disconnected nodes must be larger than the distance to the farthest connected neighbor: D_M(x_i, x_j) > (1 − A_ij) max_l (A_il D_M(x_i, x_l)), ∀i, j. [sent-54, score-0.347]
39 Similarly, preserving an ε-neighborhood graph obeys linear constraints on M: D_M(x_i, x_j) ≤ ε, ∀{i, j | A_ij = 1}, and D_M(x_i, x_j) ≥ ε, ∀{i, j | A_ij = 0}. [sent-55, score-0.45]
40 If for each node the connected distances are less than the unconnected distances (or some ε), i.e. [sent-56, score-0.315]
41 the metric obeys the above linear constraints, then Definition 1 is satisfied, and thus the connectivity computed under the learned metric M is exactly A. [sent-58, score-0.583]
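A small sketch of checking Definition 1 for the nearest-neighbor case, assuming a directed kNN rule with k set to each node's true degree (an illustrative simplification, not the authors' implementation):

import numpy as np

def pairwise_dm(M, X):
    # All pairwise squared Mahalanobis distances; columns of X are node features.
    G = X.T @ M @ X
    sq = np.diag(G)
    return sq[:, None] + sq[None, :] - 2.0 * G

def knn_under_metric(M, X, A):
    # Reconnect each node to its k nearest neighbors under D_M,
    # where k is that node's degree in the input adjacency matrix A.
    D = pairwise_dm(M, X)
    np.fill_diagonal(D, np.inf)
    A_hat = np.zeros_like(A)
    for i in range(A.shape[0]):
        k = int(A[i].sum())
        if k > 0:
            A_hat[i, np.argsort(D[i])[:k]] = 1
    return A_hat

def is_structure_preserving(M, X, A):
    # Definition 1 for this connectivity algorithm: G(X, M) must equal A.
    return np.array_equal(knn_under_metric(M, X, A), A)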
42 Maximum weight subgraphs Unlike nearest neighbor algorithms, which select edges greedily for each node, maximum weight subgraph algorithms select edges from a weighted graph to produce a subgraph of maximal total weight [6]. [sent-59, score-0.419]
43 Given a metric parametrized by M, let the weight between two points (i, j) be the negated pairwise distance between them: Z_ij = −D_M(x_i, x_j) = −(x_i − x_j)⊤ M (x_i − x_j). [sent-60, score-0.476]
44 For example, maximum weight b-matching finds the maximum weight subgraph while also enforcing that each node i has a fixed degree b_i. [sent-61, score-0.186]
45 Unfortunately, preserving structure for these algorithms requires enforcing many linear constraints of the form tr(ZA) ≥ tr(ZÃ), ∀Ã ∈ G. [sent-63, score-0.292]
46 This reveals one critical difference between structure preserving constraints of these algorithms and those of nearest-neighbor graphs: there are exponentially many linear constraints. [sent-64, score-0.292]
47 2.2 Algorithm derivation By combining the linear constraints from the previous section with a Frobenius norm (denoted ||·||_F) regularizer on M and regularization parameter λ, we have a simple semidefinite program (SDP) which learns an M that is structure preserving and has minimal complexity. [sent-67, score-0.336]
48 Algorithm 1 summarizes the naive implementation of SPML when the connectivity algorithm is k-nearest neighbors, which is optimized by a standard SDP solver. [sent-68, score-0.184]
49 Each added constraint enforces that the total weight along the edges of the true graph is greater than total weight of any other graph by some margin. [sent-75, score-0.244]
50 We can rewrite the optimization as unconstrained over an objective function with a hinge-loss on the structure preserving constraints: f(M) = (λ/2) ||M||²_F + (1/|S|) Σ_{(i,j,k)∈S} max(D_M(x_i, x_j) − D_M(x_i, x_k) + 1, 0). [sent-85, score-0.322]
51 Here the constraints have been written in terms of hinge-losses over triplets, each consisting of a node, its neighbor, and its non-neighbor. [sent-86, score-0.121]
52 Using the distance transformation in Equation 1, each of the |S| constraints can be written using a sparse matrix C^(i,j,k), where C^(i,j,k)_jj = 1, C^(i,j,k)_ik = 1, C^(i,j,k)_ki = 1, C^(i,j,k)_ij = −1, C^(i,j,k)_ji = −1, C^(i,j,k)_kk = −1, and whose other entries are zero. [sent-88, score-0.144]
53 By construction, sparse matrix multiplication of C^(i,j,k) indexes the proper elements related to nodes i, j, and k, such that tr(C^(i,j,k) X⊤ M X) is equal to D_M(x_i, x_j) − D_M(x_i, x_k). [sent-89, score-0.149]
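To make the construction concrete, a short numerical check (an assumed sketch with arbitrary indices and random data) of the identity tr(C^(i,j,k) X⊤ M X) = D_M(x_i, x_j) − D_M(x_i, x_k):

import numpy as np

def constraint_matrix(n, i, j, k):
    # Sparse (dense here for clarity) n x n matrix for one triplet constraint.
    C = np.zeros((n, n))
    C[j, j] = 1.0;  C[i, k] = 1.0;  C[k, i] = 1.0
    C[i, j] = -1.0; C[j, i] = -1.0; C[k, k] = -1.0
    return C

rng = np.random.default_rng(1)
d, n = 4, 10
X = rng.normal(size=(d, n))
B = rng.normal(size=(d, d))
M = B.T @ B                                   # a random PSD metric

def dm(a, b):
    diff = X[:, a] - X[:, b]
    return float(diff @ M @ diff)

i, j, k = 0, 3, 7
lhs = np.trace(constraint_matrix(n, i, j, k) @ X.T @ M @ X)
assert np.isclose(lhs, dm(i, j) - dm(i, k))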
54 The subgradient of f at M is then ∇f = λM + (1/|S|) Σ_{(i,j,k)∈S+} X C^(i,j,k) X⊤, where S+ = {(i, j, k) | D_M(x_i, x_j) − D_M(x_i, x_k) + 1 > 0}. [sent-90, score-0.117]
55 If for all triplets this quantity is negative, there exists no unconnected neighbor of a point which is closer than a point’s farthest connected neighbor – precisely the structure preserving criterion for nearest neighbor algorithms. [sent-91, score-0.608]
56 If a true metric is necessary, we intermittently project M onto the PSD cone. [sent-94, score-0.187]
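Putting these pieces together, a sketch of the stochastic update loop implied by the subgradient above: sample a node, one of its neighbors, and a non-neighbor, take a hinge-loss step, and intermittently project onto the PSD cone. The PEGASOS-style step size and the projection interval are assumptions, and this is an illustrative reconstruction rather than the authors' Algorithm 2:

import numpy as np

def spml_sgd(X, A, lam=0.01, iters=2000, project_every=100, seed=0):
    rng = np.random.default_rng(seed)
    d, n = X.shape
    M = np.eye(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)
        nbrs = np.flatnonzero(A[i])
        nons = np.flatnonzero((A[i] == 0) & (np.arange(n) != i))
        if nbrs.size == 0 or nons.size == 0:
            continue
        j, k = rng.choice(nbrs), rng.choice(nons)
        dij = X[:, i] - X[:, j]
        dik = X[:, i] - X[:, k]
        grad = lam * M
        if dij @ M @ dij - dik @ M @ dik + 1.0 > 0:      # hinge is active
            grad = grad + np.outer(dij, dij) - np.outer(dik, dik)
        M = M - (1.0 / (lam * t)) * grad                 # assumed 1/(lam*t) step size
        if t % project_every == 0:                       # intermittent PSD projection
            w, V = np.linalg.eigh(M)
            M = (V * np.maximum(w, 0.0)) @ V.T
    return M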
57 Assuming the node feature vectors are of bounded norm, the radius of the input data R is constant with respect to n, since each constraint is constructed using the feature vectors of only three nodes. [sent-109, score-0.098]
58 The area under the receiver operating characteristic (ROC) curve is measured, which is related to the structure preserving hinge loss, and the plot clearly shows fast convergence and quickly diminishing returns at higher iteration counts. [sent-112, score-0.233]
59 2.4 Variations While stochastic SPML does not scale with the size of the input graph, evaluating distances using a full M matrix requires O(d²) work. [sent-114, score-0.129]
60 It has been shown that n points can be mapped into a space of dimensionality O(log n/ε2 ) such that distances are distorted by no more than a factor of (1 ± ε) [5, 11]. [sent-116, score-0.074]
61 In this case, all references to M can be replaced with L⊤L, thus inducing a true metric without projection. [sent-120, score-0.187]
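A sketch of this factored variant, where a rectangular L (here r × d, with r an assumed target dimensionality) is updated directly so that M = L⊤L is PSD by construction and no projection is needed; the resulting problem is no longer convex, and the code is illustrative only:

import numpy as np

def spml_sgd_lowrank(X, A, r=10, lam=0.01, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    d, n = X.shape
    L = 0.1 * rng.normal(size=(r, d))
    for t in range(1, iters + 1):
        i = rng.integers(n)
        nbrs = np.flatnonzero(A[i])
        nons = np.flatnonzero((A[i] == 0) & (np.arange(n) != i))
        if nbrs.size == 0 or nons.size == 0:
            continue
        j, k = rng.choice(nbrs), rng.choice(nons)
        dij = X[:, i] - X[:, j]
        dik = X[:, i] - X[:, k]
        grad = lam * L
        if np.sum((L @ dij) ** 2) - np.sum((L @ dik) ** 2) + 1.0 > 0:
            # d/dL of ||L dij||^2 - ||L dik||^2 is 2 L (dij dij' - dik dik')
            grad = grad + 2.0 * (np.outer(L @ dij, dij) - np.outer(L @ dik, dik))
        L = L - (1.0 / (lam * t)) * grad
    return L    # distances are then computed as ||L x_i - L x_j||^2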
62 Finally, when predicting links of new nodes, SPML does not know how many connections to predict. [sent-124, score-0.085]
63 To address this uncertainty, we propose a variant to SPML called degree distributional metric learning (DDML), which simultaneously learns the metric as well as parameters for the connectivity algorithm. [sent-125, score-0.602]
64 First we show how SPML performs on a simple synthetic dataset that is easily visualized in two dimensions and which we believe mimics many traditional network datasets. [sent-128, score-0.082]
65 We then demonstrate favorable performance for SPML in predicting links of the Wikipedia document network and the Facebook social network. [sent-129, score-0.171]
66 These vectors represent the true features for the n nodes X ∈ Rd×n . [sent-133, score-0.117]
67 Next, the true features are scrambled by applying a random linear transformation: RX where R ∈ Rd×d . [sent-135, score-0.119]
68 Given RX and A, the goal of SPML is to learn a metric M that undoes the linear scrambling, so that when b-matching is applied to RX using the learned distance metric, it produces the input adjacency matrix. [sent-136, score-0.367]
69 In Figure 1(a), we see an embedding of the graph using the true features for each node as coordinates, and connectivity generated from b-matching. [sent-138, score-0.415]
70 We posit that many real-world datasets resemble plot 1(b), with seemingly incongruous feature and connectivity information. [sent-140, score-0.184]
71 Applying b-matching to the scrambled data produces connections shown in Figure 1(c). [sent-141, score-0.088]
72 Finally, by learning M via SPML (Algorithm 2) and computing L by Cholesky decomposition of M, we can recover features LRX (Figure 1(d)) that respect the structure in the target adjacency matrix and thus more closely resemble the true features used to generate the data. [sent-142, score-0.22]
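A sketch of the recovery step just described (the setup mirrors the synthetic experiment in the text, but R and the stand-in metric here are fabricated for illustration; in the actual experiment M would come from running SPML on the scrambled features and the b-matched adjacency matrix):

import numpy as np

rng = np.random.default_rng(2)
d, n = 2, 50
X_true = rng.normal(size=(d, n))          # true two-dimensional features
R = rng.normal(size=(d, d))               # random scrambling transform
X_scrambled = R @ X_true                  # observed features RX

# Stand-in for the learned metric: the metric that exactly undoes R,
# so that D_M on RX matches Euclidean distances on the true features.
Rinv = np.linalg.inv(R)
M = Rinv.T @ Rinv

# Recover features via Cholesky: M = L'L, then LRX approximates the true layout.
C = np.linalg.cholesky(M)                 # lower-triangular, M = C C'
L = C.T                                   # so that L'L = C C' = M
X_recovered = L @ X_scrambled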
73 3.2 Link prediction We compare SPML to a variety of methods for predicting links from node features: Euclidean distances, relational topic models (RTM), and traditional support vector machines (SVM). [sent-145, score-0.13]
74 A simple baseline for comparison is how well the Euclidean distance metric performs at ranking possible connections. [sent-146, score-0.246]
75 Relational topic models learn a link probability function in addition to latent topic mixtures describing each node [2]. [sent-147, score-0.134]
76 For the SVM, we construct training examples consisting of the pairwise differences between node features. [sent-148, score-0.071]
77 Interestingly, an SVM with these inputs can be interpreted as an instance of SPML using diagonal M and the ε-neighborhood connectivity algorithm, which connects points based on their distance, completely independently of the rest of the graph structure. [sent-153, score-0.251]
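A sketch of this baseline under the stated interpretation; the element-wise squared-difference featurization and the use of scikit-learn's LinearSVC are assumptions made for illustration, not details from the paper:

import numpy as np
from sklearn.svm import LinearSVC

def svm_link_baseline(X, A, pairs):
    # Each training example is the element-wise squared difference between two
    # node feature vectors; the label is whether that pair is an edge in A.
    feats = np.array([(X[:, i] - X[:, j]) ** 2 for i, j in pairs])
    labels = np.array([A[i, j] for i, j in pairs])
    clf = LinearSVC(C=1.0, max_iter=5000)
    clf.fit(feats, labels)
    # The negated decision value then plays the role of a (diagonal-M) distance score.
    return clf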
78 The RTM approach is appropriate for data that consists of counts, and is a generative model which recovers a set of topics in addition to link predictions. [sent-155, score-0.092]
79 We skip the PSD projection step, since these experiments are only concerned with prediction, and obtaining a true metric is not necessary. [sent-158, score-0.187]
80 For SPML, we use the diagonal variant of Algorithm 3, since the high-dimensionality of the input features reduces the benefit of cross-feature weights. [sent-181, score-0.084]
81 On the held-out nodes, we task each algorithm to rank the unknown edges according to distance (or another measure of link likelihood), and compare the accuracy of the rankings using receiver operating characteristic (ROC) curves. [sent-182, score-0.156]
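A minimal sketch of this evaluation (assumed code, not the authors'): score each candidate edge of a held-out node by negated learned distance and compute the area under the ROC curve directly from its pairwise-ranking definition:

import numpy as np

def auc_for_node(M, X, A, i, candidates):
    # Rank candidate edges (i, j) by -D_M(x_i, x_j) and compare to the true row A[i].
    diffs = X[:, [i]] - X[:, candidates]                  # d x |candidates|
    scores = -np.einsum('dc,de,ec->c', diffs, M, diffs)   # negated distances
    labels = A[i, candidates]
    pos, neg = scores[labels == 1], scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        return np.nan
    # AUC = P(true edge outranks non-edge); ties count one half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)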
82 One possible explanation for why the SVM is unable to gain performance over Euclidean distance is that the wide range of degrees for nodes in these graphs makes it difficult to find a single threshold that separates edges from nonedges. [sent-186, score-0.187]
83 The resulting AUC on the edges of the held-out nodes is listed in Table 1 as the “Philosophy Crawl” dataset. [sent-192, score-0.094]
84 Facebook social networks Applying SPML to social network data allows us to more accurately predict who will become friends based on the profile information for those users. [sent-194, score-0.236]
85 Shown below: number of nodes n, number of edges m, dimensionality d, and AUC performance. [sent-203, score-0.094]
86 the Facebook networks for the four schools we consider: Harvard, MIT, Stanford, and Columbia. [sent-234, score-0.078]
87 [Figure 3 legend residue: schools Harvard, MIT, Stanford, Columbia; profile features status, gender, major, dorm, year.] [sent-239, score-0.089]
88 Figure 3: Comparison of Facebook social networks from four schools in terms of feature importance computed from the learned structure preserving metric. [sent-244, score-0.391]
89 For all schools except MIT, the graduating year is most important for determining distance between people. [sent-248, score-0.091]
90 A possible explanation for this difference is that MIT is the only school in the list that makes it easy for students to stay in a residence for all four years of their undergraduate program, and therefore which dorm one lives in may more strongly affect whom they connect to. [sent-250, score-0.085]
91 4 Discussion We have demonstrated a fast convex optimization for learning a distance metric from a network such that the distances are tied to the network’s inherent topological structure. [sent-251, score-0.439]
92 The structure preserving distance metrics introduced in this article allow us to better model and predict the behavior of large real-world networks. [sent-252, score-0.292]
93 Furthermore, these metrics are as lightweight as independent pairwise models, but capture structural dependency from features making them easy to use in practice for link-prediction. [sent-253, score-0.09]
94 In future work, we plan to exploit SPML’s lack of dependence on graph size to learn a structure preserving metric on massive-scale graphs, e. [sent-254, score-0.487]
95 Since each iteration requires only sampling a random node, following a link to a neighbor, and sampling a non-neighbor, this can all be done in an online fashion as the algorithm crawls a network such as the worldwide web, learning a metric that may gradually change over time. [sent-257, score-0.307]
96 Make new friends, but keep the old: recommending people on social networking sites. [sent-290, score-0.078]
97 A survey of link mining tasks for analyzing noisy and incomplete networks. [sent-354, score-0.088]
98 Distance metric learning for large margin nearest neighbor classification. [sent-403, score-0.279]
99 Distance metric learning with application to clustering with side-information. [sent-410, score-0.187]
100 Discovering disease-genes by topological features in human protein-protein interaction network. [sent-419, score-0.094]
wordName wordTfidf (topN-words)
[('spml', 0.759), ('preserving', 0.196), ('metric', 0.187), ('connectivity', 0.184), ('facebook', 0.163), ('rtm', 0.139), ('wikipedia', 0.138), ('dm', 0.128), ('philosophy', 0.075), ('distances', 0.074), ('node', 0.071), ('adjacency', 0.069), ('graph', 0.067), ('mt', 0.067), ('xj', 0.064), ('link', 0.063), ('neighbor', 0.062), ('dorm', 0.062), ('scrambled', 0.062), ('unconnected', 0.062), ('pegasos', 0.061), ('nodes', 0.06), ('constraints', 0.059), ('svm', 0.059), ('pro', 0.059), ('links', 0.059), ('distance', 0.059), ('psd', 0.059), ('network', 0.057), ('features', 0.057), ('tr', 0.055), ('social', 0.055), ('sdp', 0.051), ('aij', 0.05), ('engines', 0.05), ('bert', 0.046), ('cik', 0.046), ('cjj', 0.046), ('cki', 0.046), ('ckk', 0.046), ('networks', 0.046), ('mx', 0.045), ('learns', 0.044), ('auc', 0.043), ('xi', 0.041), ('neighbors', 0.041), ('cji', 0.041), ('impostors', 0.041), ('roc', 0.041), ('subgraph', 0.039), ('columbia', 0.039), ('weight', 0.038), ('triplets', 0.038), ('topology', 0.038), ('structure', 0.037), ('topological', 0.037), ('bn', 0.036), ('embedding', 0.036), ('shaw', 0.035), ('connected', 0.034), ('rd', 0.034), ('graphs', 0.034), ('edges', 0.034), ('mahalanobis', 0.033), ('euclidean', 0.033), ('structural', 0.033), ('harvard', 0.032), ('schools', 0.032), ('xc', 0.032), ('aik', 0.031), ('ail', 0.031), ('argminm', 0.031), ('ddml', 0.031), ('dmt', 0.031), ('maxl', 0.031), ('rx', 0.031), ('blake', 0.031), ('nearest', 0.03), ('cij', 0.03), ('les', 0.03), ('topics', 0.029), ('subgradient', 0.028), ('stochastic', 0.028), ('input', 0.027), ('crawl', 0.027), ('status', 0.027), ('mxi', 0.027), ('mxj', 0.027), ('connections', 0.026), ('transformation', 0.026), ('xk', 0.025), ('synthetic', 0.025), ('learned', 0.025), ('tied', 0.025), ('farthest', 0.025), ('chi', 0.025), ('mining', 0.025), ('friends', 0.023), ('pushing', 0.023), ('people', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 150 nips-2011-Learning a Distance Metric from a Network
Author: Blake Shaw, Bert Huang, Tony Jebara
Abstract: Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges. 1
2 0.13624161 171 nips-2011-Metric Learning with Multiple Kernels
Author: Jun Wang, Huyen T. Do, Adam Woznica, Alexandros Kalousis
Abstract: Metric learning has become a very active research field. The most popular representative–Mahalanobis metric learning–can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can be also readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection. 1
3 0.11539044 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
Author: Kristen Grauman, Fei Sha, Sung J. Hwang
Abstract: We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. Intuitively, this reflects that visual cues most useful to distinguish the generic classes (e.g., feline vs. canine) should be different than those cues most useful to distinguish their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm. 1
4 0.081522107 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
5 0.072935641 60 nips-2011-Confidence Sets for Network Structure
Author: David S. Choi, Patrick J. Wolfe, Edoardo M. Airoldi
Abstract: Latent variable models are frequently used to identify structure in dichotomous network data, in part because they give rise to a Bernoulli product likelihood that is both well understood and consistent with the notion of exchangeable random graphs. In this article we propose conservative confidence sets that hold with respect to these underlying Bernoulli parameters as a function of any given partition of network nodes, enabling us to assess estimates of residual network structure, that is, structure that cannot be explained by known covariates and thus cannot be easily verified by manual inspection. We demonstrate the proposed methodology by analyzing student friendship networks from the National Longitudinal Survey of Adolescent Health that include race, gender, and school year as covariates. We employ a stochastic expectation-maximization algorithm to fit a logistic regression model that includes these explanatory variables as well as a latent stochastic blockmodel component and additional node-specific effects. Although maximumlikelihood estimates do not appear consistent in this context, we are able to evaluate confidence sets as a function of different blockmodel partitions, which enables us to qualitatively assess the significance of estimated residual network structure relative to a baseline, which models covariates but lacks block structure. 1
6 0.061555188 167 nips-2011-Maximum Covariance Unfolding : Manifold Learning for Bimodal Data
7 0.061500549 263 nips-2011-Sparse Manifold Clustering and Embedding
8 0.058677536 213 nips-2011-Phase transition in the family of p-resistances
9 0.057443663 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
10 0.054880138 157 nips-2011-Learning to Search Efficiently in High Dimensions
11 0.054342888 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
12 0.053004846 231 nips-2011-Randomized Algorithms for Comparison-based Search
13 0.052722488 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
14 0.052662633 67 nips-2011-Data Skeletonization via Reeb Graphs
15 0.052640993 244 nips-2011-Selecting Receptive Fields in Deep Networks
16 0.0519628 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
17 0.051641326 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
18 0.05128574 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time
19 0.049501125 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
20 0.049484409 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
topicId topicWeight
[(0, 0.159), (1, 0.045), (2, -0.068), (3, -0.01), (4, -0.019), (5, -0.038), (6, -0.021), (7, -0.016), (8, -0.009), (9, -0.137), (10, -0.025), (11, 0.041), (12, -0.047), (13, 0.028), (14, 0.001), (15, 0.061), (16, -0.042), (17, 0.098), (18, -0.001), (19, -0.027), (20, -0.027), (21, 0.031), (22, 0.108), (23, 0.057), (24, 0.034), (25, -0.011), (26, -0.009), (27, -0.039), (28, -0.033), (29, 0.041), (30, -0.036), (31, -0.051), (32, -0.102), (33, 0.1), (34, 0.048), (35, 0.003), (36, -0.065), (37, 0.078), (38, 0.173), (39, -0.075), (40, 0.029), (41, -0.058), (42, 0.008), (43, 0.064), (44, -0.092), (45, 0.083), (46, 0.011), (47, -0.07), (48, -0.087), (49, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.93325043 150 nips-2011-Learning a Distance Metric from a Network
Author: Blake Shaw, Bert Huang, Tony Jebara
Abstract: Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges. 1
2 0.72182465 171 nips-2011-Metric Learning with Multiple Kernels
Author: Jun Wang, Huyen T. Do, Adam Woznica, Alexandros Kalousis
Abstract: Metric learning has become a very active research field. The most popular representative–Mahalanobis metric learning–can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can be also readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection. 1
3 0.63887602 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
Author: Kristen Grauman, Fei Sha, Sung J. Hwang
Abstract: We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. Intuitively, this reflects that visual cues most useful to distinguish the generic classes (e.g., feline vs. canine) should be different than those cues most useful to distinguish their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm. 1
4 0.57986313 67 nips-2011-Data Skeletonization via Reeb Graphs
Author: Xiaoyin Ge, Issam I. Safa, Mikhail Belkin, Yusu Wang
Abstract: Recovering hidden structure from complex and noisy non-linear data is one of the most fundamental problems in machine learning and statistical inference. While such data is often high-dimensional, it is of interest to approximate it with a lowdimensional or even one-dimensional space, since many important aspects of data are often intrinsically low-dimensional. Furthermore, there are many scenarios where the underlying structure is graph-like, e.g, river/road networks or various trajectories. In this paper, we develop a framework to extract, as well as to simplify, a one-dimensional ”skeleton” from unorganized data using the Reeb graph. Our algorithm is very simple, does not require complex optimizations and can be easily applied to unorganized high-dimensional data such as point clouds or proximity graphs. It can also represent arbitrary graph structures in the data. We also give theoretical results to justify our method. We provide a number of experiments to demonstrate the effectiveness and generality of our algorithm, including comparisons to existing methods, such as principal curves. We believe that the simplicity and practicality of our algorithm will help to promote skeleton graphs as a data analysis tool for a broad range of applications.
5 0.52957857 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
6 0.52027702 274 nips-2011-Structure Learning for Optimization
7 0.50511408 167 nips-2011-Maximum Covariance Unfolding : Manifold Learning for Bimodal Data
8 0.46801531 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
9 0.46431106 181 nips-2011-Multiple Instance Learning on Structured Data
10 0.46164262 248 nips-2011-Semi-supervised Regression via Parallel Field Regularization
11 0.45566481 263 nips-2011-Sparse Manifold Clustering and Embedding
12 0.45545059 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
13 0.45449904 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
14 0.43179488 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
15 0.42756966 62 nips-2011-Continuous-Time Regression Models for Longitudinal Networks
16 0.42625913 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
17 0.42417231 71 nips-2011-Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
18 0.42080542 157 nips-2011-Learning to Search Efficiently in High Dimensions
19 0.41742709 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
20 0.41007501 213 nips-2011-Phase transition in the family of p-resistances
topicId topicWeight
[(0, 0.025), (4, 0.094), (11, 0.23), (20, 0.028), (26, 0.011), (31, 0.065), (33, 0.019), (43, 0.043), (45, 0.12), (56, 0.021), (57, 0.071), (65, 0.014), (74, 0.041), (83, 0.074), (99, 0.041)]
simIndex simValue paperId paperTitle
1 0.81131399 73 nips-2011-Divide-and-Conquer Matrix Factorization
Author: Lester W. Mackey, Michael I. Jordan, Ameet Talwalkar
Abstract: This work introduces Divide-Factor-Combine (DFC), a parallel divide-andconquer framework for noisy matrix factorization. DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the subproblem solutions using techniques from randomized matrix approximation. Our experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.
same-paper 2 0.78968436 150 nips-2011-Learning a Distance Metric from a Network
Author: Blake Shaw, Bert Huang, Tony Jebara
Abstract: Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges. 1
3 0.75498199 18 nips-2011-Action-Gap Phenomenon in Reinforcement Learning
Author: Amir-massoud Farahmand
Abstract: Many practitioners of reinforcement learning problems have observed that oftentimes the performance of the agent reaches very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove that for an ˆ agent following the greedy policy π with respect to an action-value function Q, the ˆ ˆ ˆ performance loss E V ∗ (X) − V π (X) is upper bounded by O( Q − Q∗ 1+ζ ), ∞ in which ζ ≥ 0 is the parameter quantifying the action-gap regularity. For ζ > 0, our results indicate smaller performance loss compared to what previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms. 1
4 0.75247657 36 nips-2011-Analysis and Improvement of Policy Gradient Estimation
Author: Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama
Abstract: Policy gradient is a useful model-free reinforcement learning approach, but it tends to suffer from instability of gradient estimates. In this paper, we analyze and improve the stability of policy gradient methods. We first prove that the variance of gradient estimates in the PGPE (policy gradients with parameter-based exploration) method is smaller than that of the classical REINFORCE method under a mild assumption. We then derive the optimal baseline for PGPE, which contributes to further reducing the variance. We also theoretically show that PGPE with the optimal baseline is more preferable than REINFORCE with the optimal baseline in terms of the variance of gradient estimates. Finally, we demonstrate the usefulness of the improved PGPE method through experiments. 1
5 0.73644346 66 nips-2011-Crowdclustering
Author: Ryan G. Gomes, Peter Welinder, Andreas Krause, Pietro Perona
Abstract: Is it possible to crowdsource categorization? Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters / categories, as well as worker parameters, using this model. Our experiments, carried out on large collections of images, suggest that Bayesian crowdclustering works well and may be superior to single-expert annotations. 1
6 0.6266737 64 nips-2011-Convergent Bounds on the Euclidean Distance
7 0.61964357 127 nips-2011-Image Parsing with Stochastic Scene Grammar
8 0.61438596 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
9 0.61197674 219 nips-2011-Predicting response time and error rates in visual search
10 0.61009115 171 nips-2011-Metric Learning with Multiple Kernels
11 0.60928857 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
12 0.6090408 139 nips-2011-Kernel Bayes' Rule
13 0.60896415 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
14 0.60851079 231 nips-2011-Randomized Algorithms for Comparison-based Search
15 0.6069743 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
16 0.60600609 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
17 0.60591263 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
18 0.60576791 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
19 0.60471505 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
20 0.60226518 167 nips-2011-Maximum Covariance Unfolding : Manifold Learning for Bimodal Data