nips nips2011 nips2011-146 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shilin Ding, Grace Wahba, Xiaojin Zhu
Abstract: In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P (Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y ’s. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. The functional spaces can be flexibly determined by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These groups with overlaps are designed to enforce hierarchical function selection. In this way, we are able to shrink higher order interactions to obtain a sparse graph structure. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. [sent-5, score-0.514]
2 We study the case where there is another input random vector X (e.g. observed features) such that the distribution P (Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y ’s. [sent-8, score-0.124]
3 The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. [sent-9, score-0.253]
4 We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. [sent-10, score-0.259]
5 The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. [sent-11, score-0.396]
6 Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. [sent-13, score-0.373]
7 These groups with overlaps are designed to enforce hierarchical function selection. [sent-14, score-0.206]
8 In this way, we are able to shrink higher order interactions to obtain a sparse graph structure. [sent-15, score-0.402]
9 1 Introduction In undirected graphical models (UGMs), a graph is defined as G = (V, E), where V = {1, · · · , K} is the set of nodes and E ⊂ V × V is the set of edges between the nodes. [sent-16, score-0.41]
10 The graph structure specifies the conditional independence among nodes. [sent-17, score-0.401]
11 Much prior work has focused on graphical model structure learning without conditioning on X. [sent-18, score-0.115]
12 Ising models are special cases of discrete UGMs with (usually) only pairwise interactions, and without features. [sent-25, score-0.121]
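For reference, a minimal sketch of this Ising special case in textbook notation (not this paper's): the parameters θ are constants, rather than the feature-dependent functions f(X) studied here.

```latex
% Ising model: a discrete UGM with node potentials and pairwise interactions only
P(y) \;=\; \frac{1}{Z}\exp\Big(\sum_{i \in V} \theta_i\, y_i \;+\; \sum_{(i,j) \in E} \theta_{ij}\, y_i y_j\Big),
\qquad Z \;=\; \sum_{y \in \{0,1\}^K} \exp(\cdots).
```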
13 We focus on discrete UGMs with both higher order interactions and features. [sent-26, score-0.218]
14 It is important to note that the graph structure may change conditioned on different X’s; thus, our approach may lead to better estimates and interpretation. [sent-27, score-0.283]
15 In addressing the problem of structure learning with features, Liu et al. [sent-28, score-0.086]
16 Bradley and Guestrin [7] learned a tree CRF that recovers a maximum spanning tree of a complete graph based on heuristic pairwise link scores. [sent-32, score-0.351]
17 The closest work is Schmidt and Murphy [8], which examined higher-order graphical structure learning. [sent-34, score-0.115]
18 They used an active set method to learn higher order interactions in a greedy manner. [sent-39, score-0.202]
19 Their model is over-parameterized, and the hierarchical assumption is sufficient but not necessary for conditional independence in the graph. [sent-40, score-0.207]
20 To the best of our knowledge, no previous work addressed the issue of graph structure learning of all orders while conditioning on input features. [sent-41, score-0.253]
21 Our contributions include a reparameterization of UGMs with bivariate outcomes into multivariate Bernoulli (MVB) models. [sent-42, score-0.108]
22 The set of conditional log odds ratios in MVB models is complete in representing the effects of features on responses and their interactions at all levels. [sent-43, score-0.387]
23 The sparsity in the set of functions is sufficient and necessary for the conditional independence in the graph, i.e. [sent-44, score-0.148]
24 , two nodes are conditionally independent iff the pairwise interaction is identically zero; and a higher order interaction among a subset of nodes means none of the variables is separable from the others in the joint distribution. [sent-46, score-0.547]
25 To obtain a sparse graph structure, we impose a Structure Lasso (SLasso) penalty on groups of functions with overlaps. [sent-47, score-0.373]
26 SLasso can be viewed as group lasso with overlaps. [sent-48, score-0.148]
27 [10] considered the penalty on groups with arbitrary overlaps. [sent-51, score-0.172]
28 Our groups are designed to shrink higher order interactions, similar to the hierarchical inclusion restriction in Schmidt and Murphy [8]. [sent-54, score-0.345]
29 We give a proximal linearization algorithm that efficiently learns the complete model. [sent-55, score-0.177]
30 We then propose a greedy search algorithm to scale our method up to large graphs, since the number of parameters grows exponentially. [sent-57, score-0.098]
31 2 Conditional Independence in Discrete Undirected Graphical Models In this section, we first discuss the relationship between the multivariate Bernoulli (MVB) model and the UGM whose nodes are binary, i.e., taking values in {0, 1}. [sent-58, score-0.138]
32 In UGMs, the distribution of the multivariate discrete random variables Y1 , . . . , YK factorizes over a set of cliques C as P (Y = y | X = x) = Z(x)^{-1} ∏_{C∈C} ΦC (yC ; x). (1) [sent-62, score-0.094]
33 Each clique C ⊆ {1, . . . , K} is a set of nodes that are fully connected. [sent-73, score-0.101]
34 This factorization follows from the Markov property: any two nodes not in a common clique are conditionally independent given the others [13]. [sent-75, score-0.248]
35 So C does not have to comply with the graph structure, as long as it is sufficient. [sent-76, score-0.201]
36 For example, the most general choice for any given graph is C = {Ω}. [sent-77, score-0.201]
37 Given the graph structure, the potential functions characterize the distribution on the graph. [sent-82, score-0.201]
38 But if the graph is unknown in advance, estimating the potential functions on all possible cliques tends to be over-parameterized [8]. [sent-83, score-0.263]
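As a concrete illustration of the clique factorization (1), a brute-force Python sketch follows; the names `ugm_joint` and `potentials` are hypothetical (not from the authors' code), and the enumeration is only feasible for small K.

```python
import itertools
import math

def ugm_joint(potentials, K, x):
    """P(Y = y | X = x) ∝ ∏_{C ∈ C} Φ_C(y_C; x) for K binary nodes.

    `potentials` maps each clique C (a tuple of node indices) to a positive
    function Φ_C(y_C, x); the normalizer Z(x) is computed by full enumeration.
    """
    table = {}
    for y in itertools.product([0, 1], repeat=K):
        log_phi = sum(math.log(phi(tuple(y[i] for i in C), x))
                      for C, phi in potentials.items())
        table[y] = math.exp(log_phi)
    Z = sum(table.values())  # partition function Z(x)
    return {y: p / Z for y, p in table.items()}
```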
39 Furthermore, log ΦC (yC ; X) = 0 is sufficient for the conditional independence among the nodes but not necessary (see Example 2.1). [sent-84, score-0.249]
40 The MVB model (2) is P (Y1 = y1 , . . . , YK = yK | X = x) = exp( Σ_{ω∈ΨK} y^ω f^ω (x) − b(f (x)) ) = exp( y1 f^1 (x) + · · · + yK f^K (x) + y1 y2 f^{1,2} (x) + · · · + y1 · · · yK f^{1,··· ,K} (x) − b(f (x)) ). [sent-92, score-0.154]
41 Let ω denote a set in ΨK , and define Y = (y^1 , · · · , y^ω , · · · , y^Ω ) to be the augmented response with y^ω = ∏_{i∈ω} yi . [sent-103, score-0.105]
42 f = (f^1 , · · · , f^ω , · · · , f^Ω ) is the vector of conditional log odds ratios [14]. [sent-110, score-0.185]
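A minimal sketch of evaluating the MVB log-probability (2) under this parameterization; `mvb_log_prob` and `f` are illustrative names, and any f^ω missing from `f` is treated as identically zero.

```python
import itertools
import math

def mvb_log_prob(f, y, x):
    """log P(Y = y | X = x) = Σ_ω y^ω f^ω(x) − b(f(x)), with y^ω = ∏_{i∈ω} y_i.

    `f` maps frozensets ω ⊆ {0, ..., K−1} to callables f^ω(x); the log
    normalizer b(f(x)) is computed by enumerating all 2^K configurations.
    """
    K = len(y)
    def score(yy):
        return sum(fw(x) * math.prod(yy[i] for i in w) for w, fw in f.items())
    b = math.log(sum(math.exp(score(yy))
                     for yy in itertools.product([0, 1], repeat=K)))
    return score(tuple(y)) - b
```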
43 We focus on estimating the set of f ω (x) with feature x where the sparsity in the set specifies the graph structure. [sent-113, score-0.201]
44 The following property holds, where Ψω^even (resp. Ψω^odd ) denotes the subsets κ ⊆ ω with |ω ∖ κ| even (resp. odd): f^ω = log [ ∏_{κ∈Ψω^even} P (Yi = 1, i ∈ κ; Yj = 0, j ∈ Ω∖κ | X) / ∏_{κ∈Ψω^odd} P (Yi = 1, i ∈ κ; Yj = 0, j ∈ Ω∖κ | X) ], b(f ) = log [ Z(x) / ∏_{C∈C} ΦC (0; x) ]. (3) Theorem 2.1. [sent-118, score-0.166]
45 A UGM of the general form (1) with binary nodes is equivalent to an MVB model of (2). [sent-120, score-0.101]
46 In addition, the following are equivalent: 1) There is no |C|-order interaction in {Yi , i ∈ C}; 2) There is no clique C ∈ ΨK in the graph; 3) f ω = 0 for all ω such that C ⊆ ω. [sent-121, score-0.172]
47 It states that there is a clique C in the graph iff there is some ω ⊇ C with f^ω ≠ 0 in the MVB model. [sent-123, score-0.173]
48 The advantage of modeling by MVB is that the sparsity in f ω ’s is sufficient and necessary for the conditional independence in the graph, thus fully specifying the graph structure. [sent-124, score-0.349]
49 Specifically, Yi , Yj are conditionally independent iff f^ω = 0 for all ω ⊇ {i, j}. [sent-125, score-0.11]
50 This shows that an interaction is non-zero iff none of the nodes involved is conditionally independent of the others. [sent-126, score-0.278]
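Property (3) can be checked mechanically: f^ω is an alternating sum of log-probabilities over the subsets κ ⊆ ω, with sign given by the parity of |ω ∖ κ|. A sketch under that reading, with the hypothetical name `f_omega` and `joint` mapping 0/1 tuples to P(y | x):

```python
import itertools
import math

def f_omega(joint, omega, K):
    """f^ω = Σ_{κ ⊆ ω} (−1)^{|ω ∖ κ|} · log P(Y_i = 1, i ∈ κ; Y_j = 0, j ∈ Ω ∖ κ | x)."""
    omega = sorted(omega)
    total = 0.0
    for r in range(len(omega) + 1):
        for kappa in itertools.combinations(omega, r):
            y = tuple(1 if i in kappa else 0 for i in range(K))
            sign = 1.0 if (len(omega) - r) % 2 == 0 else -1.0
            total += sign * math.log(joint[y])
    return total
```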
51 The relation between UGM and MVB is f^1 = log(Φ10 /Φ00 ), f^2 = log(Φ01 /Φ00 ), f^{1,2} = log( (Φ11 · Φ00 ) / (Φ01 · Φ10 ) ). Note, the independence between Y1 and Y2 implies f^{1,2} = 0, i.e., Φ11 · Φ00 = Φ01 · Φ10 . [sent-131, score-0.089]
52 Therefore, f 1,2 being zero in MVB model is sufficient and necessary for the conditional independence in the model. [sent-132, score-0.148]
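Continuing the sketch above, a quick numerical check of Example 2.1 using the hypothetical `f_omega` helper: for a product joint P(Y1)P(Y2), the pairwise conditional log odds ratio vanishes.

```python
p1, p2 = 0.3, 0.7
joint = {(a, b): (p1 if a else 1 - p1) * (p2 if b else 1 - p2)
         for a in (0, 1) for b in (0, 1)}
# Independence of Y1 and Y2 forces f^{1,2} = log(Φ11·Φ00 / (Φ01·Φ10)) = 0.
assert abs(f_omega(joint, [0, 1], K=2)) < 1e-12
```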
53 3 Structure Penalty In many applications, the assumption is that the graph has very few large cliques. [sent-142, score-0.201]
54 Similar to the hierarchical inclusion restriction in Schmidt and Murphy [8], we will include a higher order interaction only when all its subsets are included. [sent-143, score-0.163]
55 The hierarchical assumption is that if there is no interaction on clique C, then all f ω should be zero, for ω ⊇ C. [sent-155, score-0.231]
56 The penalty is designed to shrink such f ω toward zero. [sent-156, score-0.127]
57 We consider the Structure Lasso (SLasso) penalty guided by the lattice in Figure 2. [sent-157, score-0.16]
58 [16] discussed how to define the groups to achieve different nonzero patterns. [sent-167, score-0.085]
59 Figure 2: Hierarchical lattice for penalty. Let Tv = {ω ∈ ΨK | v ⊆ ω} be the subgraph rooted at v in T , including all the descendants of v. [sent-168, score-0.16]
60 All the functions are categorized into groups with overlaps as (T1 , . . . , T|ΨK | ), one group Tv per lattice node v. [sent-170, score-0.147]
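A sketch of building these overlapping groups from the lattice; `slasso_groups` is an illustrative name and nodes are 0-indexed.

```python
import itertools

def slasso_groups(K):
    """T_v = {ω ∈ Ψ_K : v ⊆ ω}: one (overlapping) group per lattice node v."""
    psi = [frozenset(s) for r in range(1, K + 1)
           for s in itertools.combinations(range(K), r)]
    return {v: [w for w in psi if v <= w] for v in psi}

groups = slasso_groups(3)
# e.g. T_{0,1} = [{0,1}, {0,1,2}]: shrinking this whole group to zero removes
# the pairwise interaction and every higher-order interaction containing it.
```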
61 The SLasso penalty on the group Tv is J(fTv ) = pv · √( Σ_{ω∈Tv} ‖f^ω‖²_{Hω} ), where pv is the weight for the penalty on Tv , empirically chosen as √(1/|Tv |). [sent-174, score-0.251]
62 Then, the objective is: min_f Iλ (f ) = L(f ) + λ Σ_v pv √( Σ_{ω∈Tv} ‖f^ω‖²_{Hω} ). (6) The following theorem shows that by minimizing the objective (6), f^{ω1} will enter the model before f^{ω2} if ω1 ⊂ ω2 . [sent-175, score-0.283]
63 That is to say, if f^{ω1} is zero, there will be no higher order interactions on any ω2 ⊃ ω1 . [sent-176, score-0.161]
64 In terms of the coefficient vector c, this becomes min_c Iλ (c) = L(c) + λ Σ_v pv ‖cTv ‖. (7) 4.1 Estimating the complete model on small graphs Many applications do not involve a large number of responses, so it is desirable to learn the complete model when the graph is small, for consistency reasons. [sent-197, score-0.452]
65 With slight abuse of notation, we denote by ck the value of c at the kth step. [sent-199, score-0.102]
66 Following the analysis in Wright [12], we can ensure that the proximal linearization algorithm will converge for the negative log-likelihood loss function with the SLasso penalty. [sent-201, score-0.129]
67 However, solving group lasso with overlaps is not trivial due to the non-smoothness at the singular point. [sent-202, score-0.21]
68 [10] duplicated the design matrix columns that appear in group overlaps, then solved the problem as group lasso without overlaps. [sent-205, score-0.214]
69 Kim and Xing [17] reparameterized the group norm with additional dummy variables. [sent-206, score-0.102]
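Under the column-duplication reformulation of [10], the groups no longer overlap, so the proximal operator separates into blockwise group soft-thresholding. A sketch of one proximal-linearization step under that reformulation (illustrative names; `groups` holds index arrays into the duplicated coefficient vector):

```python
import numpy as np

def prox_step(c, grad, mu, groups, weights, lam):
    """c⁺ = argmin_{c'} ⟨grad, c' − c⟩ + (μ/2)‖c' − c‖² + λ Σ_v p_v ‖c'_{T_v}‖.

    With non-overlapping groups this is group soft-thresholding of
    u = c − grad/μ, with per-group thresholds λ·p_v/μ.
    """
    u = c - grad / mu
    out = u.copy()
    for g, p_v in zip(groups, weights):
        norm = np.linalg.norm(u[g])
        t = lam * p_v / mu
        out[g] = 0.0 if norm <= t else (1.0 - t / norm) * u[g]
    return out
```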
70 2 Estimating large graphs The above algorithm is efficient on small graphs (K < 20). [sent-212, score-0.114]
71 However, the issue with estimating a complete model is the exponential number of f^ω ’s and the same number of groups involved in objective (7). [sent-214, score-0.133]
72 The hierarchical assumption and the SLasso penalty lend themselves naturally to a greedy search algorithm: 1. [sent-216, score-0.187]
73 In step i, remove the nodes that are not in Ai from the lattice in Figure 2. [sent-219, score-0.174]
74 Keep adding a higher order interaction into Ai+1 if all its order-i subsets are included in A′ (a sketch of this expansion step follows). [sent-224, score-0.228]
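A sketch of that expansion step (hypothetical names, not the authors' code): `active` is the set of interactions, as frozensets, selected by the SLasso fit at step i.

```python
import itertools

def expand(active, order):
    """Admit an order-(order+1) interaction ω only when every order-`order`
    subset of ω is already active (hierarchical inclusion)."""
    nodes = sorted(set().union(*active)) if active else []
    new = set()
    for omega in itertools.combinations(nodes, order + 1):
        if all(frozenset(s) in active
               for s in itertools.combinations(omega, order)):
            new.add(frozenset(omega))
    return active | new
```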
75 Graph 5 has 100 nodes where the first 8 nodes have the same structure as in Figure 1(c) and the others are independent. [sent-234, score-0.254]
76 Graph 6 also has 100 nodes where the first 10 nodes have the same connection as in Figure 1(d) and the others are independent. [sent-235, score-0.202]
77 We set the intercepts c^ω_{jk} in the main effects to 1, and those in second or higher order interactions to 2. [sent-246, score-0.191]
78 We use AIC for greedy search (Greedy-AIC) in graphs 5 and 6 due to computational considerations. [sent-253, score-0.098]
79 In Figure 3, we show the learning results in terms of the true positive rate (TPR) as the sample size increases from 100 to 1000. [sent-262, score-0.091]
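For reference, the reported metric reduces to a simple set comparison; a minimal sketch where edges are unordered pairs:

```python
def true_positive_rate(recovered, truth):
    """Fraction of true edges (unordered pairs) present in the recovered set."""
    truth = {frozenset(e) for e in truth}
    recovered = {frozenset(e) for e in recovered}
    return len(recovered & truth) / len(truth)
```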
80 By adjusting λ, we observe new interactions enter the model. [sent-277, score-0.124]
81 Figure 3: The True Positive Rate (TPR) of graph structure learning methods (GACV, BGACV, BMN) with increasing sample size; panel (f): Graph 6 (< 10^{−20}). [sent-321, score-0.253]
82 Table 2: Selected response variables. Vote: 2004 votes for Republican presidential candidate; Poverty: Poverty Rate; VCrime: Violent Crime Rate; PCrime: Property Crime Rate, e.g. [sent-324, score-0.122]
83 burglary; URate: Unemployment Rate; PChange: Population change in percent from 2000 to 2006 (a Positive% column is truncated, first value 81). [sent-326, score-0.135]
84 The unemployment rate plays an important role as a hub as discovered by SLasso, but not by BMN. [sent-335, score-0.098]
85 The numbers in brackets are the function norm on the clique and the absolute value of the elements in the concentration matrix, respectively. [sent-338, score-0.147]
86 We note SLasso discovers, at the 7th step, two third-order interactions, which are displayed by two circles in (a). [sent-339, score-0.124]
87 Although its absolute pairwise correlation, 0.0389, is the second lowest, this link is recovered first by SLasso. [sent-342, score-0.133]
88 This shows that after taking features into account, the dependence structure of response variables may change and hidden relations could be discovered. [sent-344, score-0.115]
89 The main factors in this case are “percentage of housing unit change” (X1 ) and “population percentage of people over 65” (X2 ). [sent-345, score-0.148]
90 The part of the fitted model shown below suggests that as housing units increase, the counties are more likely to have both positive results for “Vote” and “PChange”. [sent-346, score-0.15]
91 But this tendency will be counteracted by the increase of people over 65: the responses are less likely to take both positive values. [sent-347, score-0.104]
92 · · · 0.0458 · X2 + · · · 6 Conclusions Our SLasso method can learn the graph structure that is specified by the conditional log odds ratios conditioned on input features X, which allows the graphical model to depend on features. [sent-354, score-0.501]
93 A greedy approach is applied when the graph is large. [sent-357, score-0.242]
94 Conversely, given the MVB model of (2), the cliques can be determined by the nonzero f^ω : clique C exists if C = ω and f^ω ≠ 0. [sent-368, score-0.108]
95 Then the maximal cliques can be inferred from the graph structure. [sent-369, score-0.263]
96 Thus, UGM (1) with bivariate nodes is equivalent to MVB (2). [sent-382, score-0.138]
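This converse direction suggests a simple structure read-out from a fitted model; a hypothetical sketch, where `nonzero` is the set of supports ω with f^ω ≠ 0 and, under the hierarchical assumption, inclusion-maximal supports are cliques of the recovered graph.

```python
def edges_from_nonzero(nonzero):
    """An edge (i, j) exists iff some nonzero f^ω has {i, j} ⊆ ω."""
    return {(i, j) for w in nonzero for i in w for j in w if i < j}

def maximal_supports(nonzero):
    """Inclusion-maximal nonzero supports: cliques of the recovered graph,
    though not necessarily all of its maximal cliques."""
    return {w for w in nonzero if not any(w < v for v in nonzero)}
```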
97 To show 2 ⇒ 3, let y_C^ω be a realization of YC such that y_C^ω = (yi )_{i∈C} , where yi = 1 if i ∈ ω and yi = 0 otherwise. [sent-384, score-0.144]
98 So the partial derivative of the objective (7) with respect to c^{ω1} at ĉ^{ω1} is ∂L/∂c^{ω1} |_{c^{ω1}=ĉ^{ω1}} + λ Σ_{v⊆ω1} pv · ĉ^{ω1}/‖ĉTv ‖ = 0. (11) Thus, the probability of {ĉ^{ω2} ≠ 0} equals the probability of {∂L/∂c^{ω1} |_{c^{ω1}=ĉ^{ω1}} = 0}, which is 0. [sent-400, score-0.098]
99 Convex structure learning in log-linear models: Beyond pairwise potentials. [sent-456, score-0.116]
100 Tree-guided group lasso for multi-task regression with structured sparsity. [sent-511, score-0.148]
wordName wordTfidf (topN-words)
[('mvb', 0.389), ('bmn', 0.363), ('slasso', 0.296), ('yc', 0.231), ('ugm', 0.228), ('graph', 0.201), ('tv', 0.182), ('yk', 0.154), ('ugms', 0.13), ('interactions', 0.124), ('clique', 0.105), ('percent', 0.105), ('gacv', 0.104), ('ck', 0.102), ('nodes', 0.101), ('schmidt', 0.099), ('pv', 0.098), ('independence', 0.089), ('penalty', 0.087), ('groups', 0.085), ('odd', 0.083), ('lasso', 0.082), ('pchange', 0.078), ('tpr', 0.078), ('lattice', 0.073), ('yi', 0.072), ('odds', 0.071), ('murphy', 0.07), ('iff', 0.068), ('interaction', 0.067), ('group', 0.066), ('proximal', 0.065), ('dk', 0.064), ('linearization', 0.064), ('pairwise', 0.064), ('graphical', 0.063), ('census', 0.063), ('overlaps', 0.062), ('cliques', 0.062), ('fpr', 0.059), ('housing', 0.059), ('people', 0.059), ('conditional', 0.059), ('vote', 0.059), ('hierarchical', 0.059), ('discrete', 0.057), ('graphs', 0.057), ('ratios', 0.055), ('structure', 0.052), ('bgacv', 0.052), ('bureau', 0.052), ('ctv', 0.052), ('presidential', 0.052), ('republican', 0.052), ('unemployment', 0.052), ('complete', 0.048), ('rate', 0.046), ('counties', 0.046), ('county', 0.046), ('wahba', 0.046), ('crime', 0.046), ('poverty', 0.046), ('parameterization', 0.045), ('positive', 0.045), ('undirected', 0.045), ('ising', 0.044), ('conditionally', 0.042), ('jacob', 0.042), ('bracket', 0.042), ('aic', 0.042), ('tol', 0.042), ('greedy', 0.041), ('wright', 0.04), ('zhao', 0.04), ('shrink', 0.04), ('link', 0.038), ('higher', 0.037), ('votes', 0.037), ('koh', 0.037), ('bivariate', 0.037), ('multivariate', 0.037), ('kim', 0.036), ('peng', 0.036), ('bradley', 0.036), ('dummy', 0.036), ('ci', 0.036), ('spline', 0.034), ('outcomes', 0.034), ('et', 0.034), ('response', 0.033), ('meinshausen', 0.033), ('lk', 0.032), ('ai', 0.031), ('yj', 0.031), ('jenatton', 0.031), ('recovered', 0.031), ('effects', 0.03), ('percentage', 0.03), ('change', 0.03), ('gj', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
Author: Shilin Ding, Grace Wahba, Xiaojin Zhu
Abstract: In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P (Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y ’s. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. The functional spaces can be flexibly determined by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These groups with overlaps are designed to enforce hierarchical function selection. In this way, we are able to shrink higher order interactions to obtain a sparse graph structure. 1
2 0.17344764 78 nips-2011-Efficient Methods for Overlapping Group Lasso
Author: Lei Yuan, Jun Liu, Jieping Ye
Abstract: The group Lasso is an extension of the Lasso for feature selection on (predefined) non-overlapping groups of features. The non-overlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation, where groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of the gradient descent type of algorithms for the optimization. We have performed empirical evaluations using both synthetic and the breast cancer gene expression data set, which consists of 8,141 genes organized into (overlapping) gene sets. Experimental results show that the proposed algorithm is more efficient than existing state-of-the-art algorithms. 1
3 0.11747137 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
Author: Ali Jalali, Christopher C. Johnson, Pradeep K. Ravikumar
Abstract: In this paper, we address the problem of learning the structure of a pairwise graphical model from samples in a high-dimensional setting. Our first main result studies the sparsistency, or consistency in sparsity pattern recovery, properties of a forward-backward greedy algorithm as applied to general statistical models. As a special case, we then apply this algorithm to learn the structure of a discrete graphical model via neighborhood estimation. As a corollary of our general result, we derive sufficient conditions on the number of samples n, the maximum nodedegree d and the problem size p, as well as other conditions on the model parameters, so that the algorithm recovers all the edges with high probability. Our result guarantees graph selection for samples scaling as n = Ω(d2 log(p)), in contrast to existing convex-optimization based algorithms that require a sample complexity of Ω(d3 log(p)). Further, the greedy algorithm only requires a restricted strong convexity condition which is typically milder than irrepresentability assumptions. We corroborate these results using numerical simulations at the end.
4 0.11358344 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
Author: Animashree Anandkumar, Vincent Tan, Alan S. Willsky
Abstract: We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation based on conditional mutual information thresholding. This simple local algorithm requires only loworder statistics of the data and decides whether two nodes are neighbors in the unknown graph. We identify graph families for which the proposed algorithm has low sample and computational complexities. Under some transparent assumptions, we establish that the proposed algorithm is −4 structurally consistent (or sparsistent) when the number of samples scales as n = Ω(Jmin log p), where p is the number of nodes and Jmin is the minimum edge potential. We also develop novel non-asymptotic techniques for obtaining necessary conditions for graphical model selection. Keywords: Graphical model selection, high-dimensional learning, local-separation property, necessary conditions, typical sets, Fano’s inequality.
5 0.091432825 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
Author: Edouard Grave, Guillaume R. Obozinski, Francis R. Bach
Abstract: Using the 1 -norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net. 1
6 0.08458893 229 nips-2011-Query-Aware MCMC
7 0.082466818 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
8 0.0820942 63 nips-2011-Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
9 0.076508999 213 nips-2011-Phase transition in the family of p-resistances
10 0.076406173 276 nips-2011-Structured sparse coding via lateral inhibition
11 0.074935742 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
12 0.074685402 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
13 0.072478764 251 nips-2011-Shaping Level Sets with Submodular Functions
14 0.071820945 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations
15 0.071285971 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
16 0.070792645 154 nips-2011-Learning person-object interactions for action recognition in still images
17 0.069494277 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
18 0.068331294 274 nips-2011-Structure Learning for Optimization
19 0.066164695 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
20 0.066158466 67 nips-2011-Data Skeletonization via Reeb Graphs
topicId topicWeight
[(0, 0.195), (1, 0.033), (2, -0.063), (3, -0.065), (4, -0.06), (5, -0.028), (6, -0.104), (7, 0.028), (8, 0.054), (9, -0.037), (10, 0.009), (11, -0.044), (12, -0.102), (13, 0.037), (14, -0.021), (15, 0.074), (16, 0.157), (17, -0.071), (18, 0.013), (19, -0.002), (20, -0.057), (21, 0.059), (22, -0.033), (23, 0.113), (24, -0.007), (25, -0.077), (26, 0.035), (27, -0.049), (28, -0.021), (29, -0.031), (30, 0.052), (31, -0.076), (32, -0.113), (33, 0.09), (34, -0.084), (35, 0.019), (36, 0.015), (37, -0.138), (38, 0.027), (39, -0.075), (40, 0.138), (41, -0.023), (42, 0.035), (43, -0.092), (44, 0.034), (45, 0.009), (46, -0.031), (47, -0.0), (48, -0.045), (49, 0.096)]
simIndex simValue paperId paperTitle
same-paper 1 0.92791247 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
Author: Shilin Ding, Grace Wahba, Xiaojin Zhu
Abstract: In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P (Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y ’s. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. The functional spaces can be flexibly determined by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These groups with overlaps are designed to enforce hierarchical function selection. In this way, we are able to shrink higher order interactions to obtain a sparse graph structure. 1
2 0.70753211 78 nips-2011-Efficient Methods for Overlapping Group Lasso
Author: Lei Yuan, Jun Liu, Jieping Ye
Abstract: The group Lasso is an extension of the Lasso for feature selection on (predefined) non-overlapping groups of features. The non-overlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation, where groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of the gradient descent type of algorithms for the optimization. We have performed empirical evaluations using both synthetic and the breast cancer gene expression data set, which consists of 8,141 genes organized into (overlapping) gene sets. Experimental results show that the proposed algorithm is more efficient than existing state-of-the-art algorithms. 1
3 0.70029515 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
Author: Animashree Anandkumar, Vincent Tan, Alan S. Willsky
Abstract: We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation based on conditional mutual information thresholding. This simple local algorithm requires only loworder statistics of the data and decides whether two nodes are neighbors in the unknown graph. We identify graph families for which the proposed algorithm has low sample and computational complexities. Under some transparent assumptions, we establish that the proposed algorithm is −4 structurally consistent (or sparsistent) when the number of samples scales as n = Ω(Jmin log p), where p is the number of nodes and Jmin is the minimum edge potential. We also develop novel non-asymptotic techniques for obtaining necessary conditions for graphical model selection. Keywords: Graphical model selection, high-dimensional learning, local-separation property, necessary conditions, typical sets, Fano’s inequality.
4 0.62157643 67 nips-2011-Data Skeletonization via Reeb Graphs
Author: Xiaoyin Ge, Issam I. Safa, Mikhail Belkin, Yusu Wang
Abstract: Recovering hidden structure from complex and noisy non-linear data is one of the most fundamental problems in machine learning and statistical inference. While such data is often high-dimensional, it is of interest to approximate it with a lowdimensional or even one-dimensional space, since many important aspects of data are often intrinsically low-dimensional. Furthermore, there are many scenarios where the underlying structure is graph-like, e.g, river/road networks or various trajectories. In this paper, we develop a framework to extract, as well as to simplify, a one-dimensional ”skeleton” from unorganized data using the Reeb graph. Our algorithm is very simple, does not require complex optimizations and can be easily applied to unorganized high-dimensional data such as point clouds or proximity graphs. It can also represent arbitrary graph structures in the data. We also give theoretical results to justify our method. We provide a number of experiments to demonstrate the effectiveness and generality of our algorithm, including comparisons to existing methods, such as principal curves. We believe that the simplicity and practicality of our algorithm will help to promote skeleton graphs as a data analysis tool for a broad range of applications.
5 0.59632844 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
Author: Ali Jalali, Christopher C. Johnson, Pradeep K. Ravikumar
Abstract: In this paper, we address the problem of learning the structure of a pairwise graphical model from samples in a high-dimensional setting. Our first main result studies the sparsistency, or consistency in sparsity pattern recovery, properties of a forward-backward greedy algorithm as applied to general statistical models. As a special case, we then apply this algorithm to learn the structure of a discrete graphical model via neighborhood estimation. As a corollary of our general result, we derive sufficient conditions on the number of samples n, the maximum nodedegree d and the problem size p, as well as other conditions on the model parameters, so that the algorithm recovers all the edges with high probability. Our result guarantees graph selection for samples scaling as n = Ω(d2 log(p)), in contrast to existing convex-optimization based algorithms that require a sample complexity of Ω(d3 log(p)). Further, the greedy algorithm only requires a restricted strong convexity condition which is typically milder than irrepresentability assumptions. We corroborate these results using numerical simulations at the end.
6 0.54479504 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
7 0.53242707 274 nips-2011-Structure Learning for Optimization
8 0.53052616 81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs
9 0.52086163 63 nips-2011-Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
10 0.51890785 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
11 0.50011486 161 nips-2011-Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation
12 0.48454088 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
13 0.451529 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation
14 0.44377303 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction
15 0.44141769 213 nips-2011-Phase transition in the family of p-resistances
16 0.4397727 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
17 0.43668613 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
18 0.42156783 60 nips-2011-Confidence Sets for Network Structure
19 0.41596687 130 nips-2011-Inductive reasoning about chimeric creatures
20 0.40366125 150 nips-2011-Learning a Distance Metric from a Network
topicId topicWeight
[(0, 0.024), (4, 0.039), (20, 0.02), (26, 0.029), (31, 0.049), (33, 0.011), (43, 0.531), (45, 0.093), (57, 0.031), (74, 0.046), (77, 0.01), (83, 0.026), (99, 0.02)]
simIndex simValue paperId paperTitle
1 0.99501568 26 nips-2011-Additive Gaussian Processes
Author: David K. Duvenaud, Hannes Nickisch, Carl E. Rasmussen
Abstract: We introduce a Gaussian process model of functions which are additive. An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks. 1
same-paper 2 0.96140611 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
Author: Shilin Ding, Grace Wahba, Xiaojin Zhu
Abstract: In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P (Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y ’s. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. The functional spaces can be flexibly determined by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These groups with overlaps are designed to enforce hierarchical function selection. In this way, we are able to shrink higher order interactions to obtain a sparse graph structure. 1
3 0.95670813 282 nips-2011-The Fast Convergence of Boosting
Author: Matus J. Telgarsky
Abstract: This manuscript considers the convergence rate of boosting under a large class of losses, including the exponential and logistic losses, where the best previous rate of convergence was O(exp(1/ε²)). First, it is established that the setting of weak learnability aids the entire class, granting a rate O(ln(1/ε)). Next, the (disjoint) conditions under which the infimal empirical risk is attainable are characterized in terms of the sample and weak learning class, and a new proof is given for the known rate O(ln(1/ε)). Finally, it is established that any instance can be decomposed into two smaller instances resembling the two preceding special cases, yielding a rate O(1/ε), with a matching lower bound for the logistic loss. The principal technical hurdle throughout this work is the potential unattainability of the infimal empirical risk; the technique for overcoming this barrier may be of general interest. 1
4 0.94019836 44 nips-2011-Bayesian Spike-Triggered Covariance Analysis
Author: Jonathan W. Pillow, Il M. Park
Abstract: Neurons typically respond to a restricted number of stimulus features within the high-dimensional space of natural stimuli. Here we describe an explicit modelbased interpretation of traditional estimators for a neuron’s multi-dimensional feature space, which allows for several important generalizations and extensions. First, we show that traditional estimators based on the spike-triggered average (STA) and spike-triggered covariance (STC) can be formalized in terms of the “expected log-likelihood” of a Linear-Nonlinear-Poisson (LNP) model with Gaussian stimuli. This model-based formulation allows us to define maximum-likelihood and Bayesian estimators that are statistically consistent and efficient in a wider variety of settings, such as with naturalistic (non-Gaussian) stimuli. It also allows us to employ Bayesian methods for regularization, smoothing, sparsification, and model comparison, and provides Bayesian confidence intervals on model parameters. We describe an empirical Bayes method for selecting the number of features, and extend the model to accommodate an arbitrary elliptical nonlinear response function, which results in a more powerful and more flexible model for feature space inference. We validate these methods using neural data recorded extracellularly from macaque primary visual cortex. 1
5 0.93437672 264 nips-2011-Sparse Recovery with Brownian Sensing
Author: Alexandra Carpentier, Odalric-ambrym Maillard, Rémi Munos
Abstract: We consider the problem of recovering the parameter α ∈ RK of a sparse function f (i.e. the number of non-zero entries of α is small compared to the number K of features) given noisy evaluations of f at a set of well-chosen sampling points. We introduce an additional randomization process, called Brownian sensing, based on the computation of stochastic integrals, which produces a Gaussian sensing matrix, for which good recovery properties are proven, independently of the number of sampling points N , even when the features are arbitrarily non-orthogonal. Under the assumption that f is Hölder continuous with exponent at least 1/2, we provide an estimate α̂ of the parameter such that ‖α − α̂‖₂ = O(‖η‖₂ /√N ), where η is the observation noise. The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. We report numerical experiments illustrating our method. 1
6 0.92435896 288 nips-2011-Thinning Measurement Models and Questionnaire Design
7 0.76521355 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
8 0.7334519 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
9 0.71847445 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
10 0.7169289 123 nips-2011-How biased are maximum entropy models?
11 0.71328551 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
12 0.69466174 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction
13 0.69146079 294 nips-2011-Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning
14 0.68147749 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
15 0.680722 24 nips-2011-Active learning of neural response functions with Gaussian processes
16 0.67513734 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
17 0.67016262 152 nips-2011-Learning in Hilbert vs. Banach Spaces: A Measure Embedding Viewpoint
18 0.66941637 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
19 0.66652673 306 nips-2011-t-divergence Based Approximate Inference
20 0.66259789 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis