nips nips2011 nips2011-274 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shulin Yang, Ali Rahimi
Abstract: We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. The solutions of these are subsequently combined with message passing algorithms. We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. To develop these methods, we introduce a notion of coupling between variables of optimization. This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. 1
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. [sent-4, score-0.194]
2 We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. [sent-6, score-0.252]
3 To develop these methods, we introduce a notion of coupling between variables of optimization. [sent-7, score-0.32]
4 This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. [sent-8, score-0.451]
5 Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. [sent-9, score-0.388]
6 We propose solving such optimization problems by first estimating the internal structure of the black box function, then optimizing the function with message passing algorithms that take advantage of this structure. [sent-12, score-0.237]
7 This lets us solve global optimization problems as a sequence of small grid searches that are coordinated by dynamic programming. [sent-13, score-0.375]
8 Many optimization problems exhibit only loose coupling between many of the variables of optimization. [sent-16, score-0.326]
9 Such notions of conditional decoupling are conveniently depicted in a graphical form that represents the way the objective function factors into a sum or product of terms each involving only a small subset of the variables. [sent-19, score-0.578]
10 This factorization structure can then be exploited by optimization procedures such as dynamic programming on trees or junction trees. [sent-20, score-0.48]
11 We introduce a notion of decoupling that can be more readily estimated from function evaluations. [sent-22, score-0.625]
12 At the same time, this notion of decoupling is more general than the factorization notion of decoupling in that functions that do not factorize may still exhibit this type of decoupling. [sent-23, score-1.417]
13 We say that two variables are decoupled if the optimal setting of one variable does not depend on the setting of the other. [sent-24, score-0.154]
14 This is formalized below in a way that parallels the notion of conditional decoupling between random variables in statistics. [sent-25, score-0.711]
15 This parallel allows us to migrate much of the machinery developed for inference on graphical models to global optimization. [sent-26, score-0.168]
16 For example, decoupling can be visualized with a graphical model whose semantics are similar to those of a Markov network. [sent-27, score-0.551]
17 Analogs of the max-product algorithm on trees, the junction tree algorithm, and loopy belief propagation can be readily adapted to global optimization. [sent-28, score-0.454]
18 We also introduce a simple procedure to estimate decoupling structure. [sent-29, score-0.551]
19 The resulting recipe for global optimization is to first estimate the decoupling structure of the objective function, then to optimize it with a message passing algorithm that utilizes this structure. [sent-30, score-0.78]
20 The message passing algorithm relies on a simple grid search to solve the sub-problems it generates. [sent-31, score-0.362]
21 In many cases, using the same number of function evaluations, this procedure produces solutions with objective values that improve over those produced by existing global optimizers by as much as 10%. [sent-32, score-0.162]
22 This happens because knowledge of the independence structure allows this procedure to explore the objective function only along directions that cause the function to vary, and because the grid search that solves the sub-problems does not get stuck in local minima. [sent-33, score-0.323]
23 2 Related work The idea of estimating and exploiting loose coupling between variables of optimization appears implicitly in Quasi-Newton methods that numerically estimate the Hessian matrix, such as BFGS (Nocedal & Wright, 2006, Chap. [sent-34, score-0.298]
24 This is strictly a less powerful notion of coupling than the factorization model, which, we argue below, is in turn less powerful than our notion of decoupling. [sent-37, score-0.419]
25 The procedure we develop here seeks only to approximate decoupling structure of the function, a much simpler task to carry out accurately. [sent-40, score-0.551]
26 A similar notion of decoupling has been explored in the decision theory literature Keeney & Raiffa (1976); Bacchus & Grove (1996), where decoupling was used to reason about preferences and utilities during decision making. [sent-41, score-1.176]
27 In contrast, we use decoupling to solve black-box optimization problems and present a practical algorithm to estimate the decoupling structure. [sent-42, score-1.154]
28 3 Decoupling between variables of optimization A common way to minimize an objective function over many variables is to factorize it into terms, each of which involves only a small subset of the variables Aji & McEliece (2000). [sent-43, score-0.289]
29 For example, a factorization for the function $f_3(x, y, z) = x^2 y^2 z^2 + x^2 + y^2 + z^2$ is elusive, yet the arguments of $f_3$ are decoupled in the sense that the setting of any two variables does not affect the optimal setting of the third. [sent-49, score-0.272]
30 This decoupling allows us to optimize over the variables separately. [sent-51, score-0.599]
31 For example, the function $f_4(x, y, z) = (x - y)^2 + (y - z)^2$ exhibits no such decoupling between x and y because $\arg\min_x f_4(x, y_0, z_0) = y_0$, which is obviously a function of the second argument of $f_4$. [sent-53, score-0.582]
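To make the contrast concrete, here is a small sketch (not from the paper) that checks both examples by brute force: the conditional minimizer over x is tabulated on a grid for a few settings of the remaining arguments. The grid range, probe values, and function names f3/f4 are illustrative choices.

```python
import numpy as np

# Illustrative sketch (not from the paper): brute-force check of decoupling
# for the two example functions above. Grid range and probe values are arbitrary.
def f3(x, y, z):
    return x**2 * y**2 * z**2 + x**2 + y**2 + z**2

def f4(x, y, z):
    return (x - y)**2 + (y - z)**2

grid = np.linspace(-2.0, 2.0, 41)

def argmin_x(f, y, z):
    """Grid value of x minimizing f(x, y, z) with y and z held fixed."""
    return grid[np.argmin([f(x, y, z) for x in grid])]

# For f3 the minimizer over x stays at 0 as y varies: x and y are decoupled.
print([argmin_x(f3, y, 0.5) for y in (-1.0, 0.0, 1.5)])
# For f4 the minimizer over x tracks y: x and y are coupled.
print([argmin_x(f4, y, 0.5) for y in (-1.0, 0.0, 1.5)])
```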
32 We say that the coordinates Z block X from Y under f if the set of minimizers of f over X does not change for any setting of the variables Y given a setting of the variables Z: $\forall\, Y_1, Y_2 \in \mathcal{Y},\ \forall\, Z \in \mathcal{Z}:\ \arg\min_{X \in \mathcal{X}} f(X, Y_1, Z) = \arg\min_{X \in \mathcal{X}} f(X, Y_2, Z)$. [sent-57, score-0.227]
33 We will say that X and Y are decoupled conditioned on Z under f, or $X \perp Y \mid Z$. [sent-58, score-0.155]
34 For a function $f(x_1, \ldots, x_n)$, decoupling between the variables can be represented graphically with an undirected graph analogous to a Markov network: Definition 2. [sent-66, score-0.731]
35 A graph $G = (\{x_1, \ldots, x_n\}, E)$ is a coupling graph for a function $f(x_1, \ldots, x_n)$ if $(i, j) \notin E$ implies $x_i$ and $x_j$ are decoupled under f. [sent-70, score-0.33] [sent-73, score-0.281]
37 The following result mirrors the notion of separation in Markov networks and makes it easy to reason about decoupling between groups of variables with coupling graphs (see the appendix for a proof): Proposition 1. [sent-74, score-0.9]
38 Let X , Y, Z be groups of nodes in a coupling graph for a function f . [sent-75, score-0.313]
39 However, decoupling is strictly more powerful than factorization. [sent-101, score-0.551]
40 4 Optimization procedures that utilize decoupling When a cost function factorizes, dynamic programming algorithms can be used to optimize over the variables Aji & McEliece (2000). [sent-106, score-0.663]
41 When a cost function exhibits decoupling as defined above, the same dynamic programming algorithms can be applied with a few minor modifications. [sent-107, score-0.583]
42 1 Optimization over trees Suppose the coupling graph between some partitioning X1 , . [sent-116, score-0.278]
43 For all other nodes i, define recursively, starting from the parents of the leaf nodes, the functions $\hat{X}_i(X_{p_i}) = \arg\min_{X_i \in \mathcal{X}_i} f\bigl(X_i, X_{p_i}, \hat{X}_{C_i^1}(X_i), \hat{X}_{C_i^2}(X_i), \ldots\bigr)$. [sent-127, score-0.145]
44 To compute the entries of the table, a subordinate global optimizer computes the minimization that appears in the definition of $\hat{X}_i$. [sent-132, score-0.507]
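As a rough illustration of this recursion, the sketch below runs it on a made-up objective whose coupling graph is a star with root x0 and leaf children x1 and x2, with grid search standing in for the subordinate optimizer. The objective, tree, and grid are assumptions for the example, not the paper's benchmarks.

```python
import numpy as np

# Illustrative sketch of the tree optimizer: root x0 with leaf children x1, x2.
# Grid search plays the role of the subordinate optimizer; the toy objective,
# the tree, and the grid are made-up stand-ins.
grid = np.linspace(-1.0, 1.0, 21)

def f(x0, x1, x2):
    return (x0 - 0.3) ** 2 + (x1 - x0) ** 2 + (x2 - x0) ** 2

# Leaf tables: best response of each leaf to every grid value of its parent x0.
# Because x1 and x2 are decoupled given x0, the other leaf is held at an arbitrary value.
best_x1 = {x0: grid[np.argmin([f(x0, x1, 0.0) for x1 in grid])] for x0 in grid}
best_x2 = {x0: grid[np.argmin([f(x0, 0.0, x2) for x2 in grid])] for x0 in grid}

# Root step: minimize over x0 with the leaves plugged in at their best responses.
x0_star = grid[np.argmin([f(x0, best_x1[x0], best_x2[x0]) for x0 in grid])]
print(x0_star, best_x1[x0_star], best_x2[x0_star])  # all close to 0.3
```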
45 2 Optimization over junction trees Even when the coupling graph for a function is not tree-structured, a thin junction tree can often be constructed for it. [sent-134, score-1.006]
46 A variant of the above algorithm that mirrors the junction tree algorithm can be used to efficiently search for the optima of the function. [sent-135, score-0.53]
47 These properties guarantee that T is tree-structured, that it covers all nodes and edges in G, and that two nodes v and u in two different cliques Xi and Xj are decoupled from each other conditioned on the union of the cliques on the path between u and v in T . [sent-137, score-0.455]
48 Many heuristics exist for constructing a thin junction tree for a graph Jensen & Graven-Nielsen (2007); Huang & Darwiche (1996). [sent-138, score-0.517]
49 To search for the minimizers of f using a junction tree for its coupling graph, denote by $X_{ij} := X_i \cap X_j$ the intersection of the groups of variables $X_i$ and $X_j$, and by $X_{i \setminus j} := X_i \setminus X_j$ the set of nodes in $X_i$ but not in $X_j$. [sent-139, score-0.829]
50 At every leaf clique $\ell$ of the junction tree, construct the function $\hat{X}_\ell(X_{\ell, p_\ell}) := \arg\min_{X_{\ell \setminus p_\ell} \in \mathcal{X}_{\ell \setminus p_\ell}} f(X_\ell)$.  (3) [sent-140, score-0.433]
51 For all other cliques i, compute recursively, starting from the parents of the leaf cliques, $\hat{X}_i(X_{i, p_i}) = \arg\min_{X_{i \setminus p_i} \in \mathcal{X}_{i \setminus p_i}} f\bigl(X_i, \hat{X}_{C_i^1}(X_{i, C_i^1}), \hat{X}_{C_i^2}(X_{i, C_i^2}), \ldots\bigr)$.  (4) [sent-141, score-0.305]
52 As before, decoupling between the cliques, conditioned on the intersection of the cliques, guarantees that $\hat{X}_i(X^*_{i, p_i}) = X^*_i$. [sent-145, score-0.6]
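For intuition, the following hedged sketch applies this clique-by-clique recursion to a toy objective with two overlapping cliques C1 = {x0, x1, x2} and C2 = {x1, x2, x3} and separator {x1, x2}; the objective, cliques, and grid are invented for the example, and grid search again plays the subordinate optimizer.

```python
import numpy as np

# Hedged sketch of the junction-tree recursion on a toy objective whose coupling
# graph has two overlapping cliques C1 = {x0, x1, x2} and C2 = {x1, x2, x3} with
# separator {x1, x2}. C2 is the leaf clique, C1 its parent; grid search is the
# subordinate optimizer. The objective, cliques, and grid are invented.
grid = np.linspace(-1.0, 1.0, 21)

def f(x0, x1, x2, x3):
    return (x0 - x1 * x2) ** 2 + (x3 - x1 - x2) ** 2 + (x1 - 0.4) ** 2 + (x2 + 0.2) ** 2

# Leaf table: best x3 for every separator setting (x1, x2); x0 can be held at an
# arbitrary value because x3 is decoupled from x0 given the separator.
best_x3 = {(x1, x2): grid[np.argmin([f(0.0, x1, x2, x3) for x3 in grid])]
           for x1 in grid for x2 in grid}

# Root clique: joint grid search over (x0, x1, x2) with the leaf's response plugged in.
best_obj, best_assign = np.inf, None
for x0 in grid:
    for x1 in grid:
        for x2 in grid:
            obj = f(x0, x1, x2, best_x3[(x1, x2)])
            if obj < best_obj:
                best_obj, best_assign = obj, (x0, x1, x2, best_x3[(x1, x2)])
print(best_assign)  # roughly (-0.1, 0.4, -0.2, 0.2) on this grid
```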
53 3 Other strategies When the cliques of the junction tree are large, the subordinate optimizations in the above algorithm become costly. [sent-148, score-0.759]
54 • The algorithm of Section 4.1 can be applied to a maximal spanning tree of the coupling graph. [sent-150, score-0.332]
55 • Loops in the coupling graph can be broken by conditioning on a node in each loop, resulting in a tree-structured coupling graph conditioned on those nodes. [sent-153, score-0.644]
56 The algorithm of Section 4.1 then searches for the minima conditioned on the values of those nodes, in the inner loop of a global optimizer that searches for good settings of the conditioned nodes. [sent-155, score-0.472]
57 5 Graph structure learning It is possible to estimate decoupling structure between the arguments of a function f with the help of a subordinate optimizer that only evaluates f . [sent-156, score-1.045]
58 A straightforward application of definition 1 to assess empirically whether groups of variables X and Y are decoupled conditioned on a group of variables Z would require comparing the minimizer of f over X for every possible value of Z and Y. [sent-157, score-0.251]
59 Following this result, an approximate coupling graph can be constructed by positing and invalidating decoupling relations. [sent-162, score-0.867]
60 Starting with a graph containing no edges, we consider all groupings $X = \{x_i\}$, $Y = \{x_j\}$, $Z = \Omega \setminus \{x_i, x_j\}$ of the variables $x_1, \ldots, x_n$. [sent-163, score-0.181]
61 We posit various values of $Z \in \mathcal{Z}$, $Y_0 \in \mathcal{Y}$ and $Y_1 \in \mathcal{Y}$ under this grouping, and compute the minimizers over $X \in \mathcal{X}$ of $f(X, Y_0, Z)$ and $f(X, Y_1, Z)$ with a subordinate optimizer. [sent-167, score-0.295]
62 If the minimizers differ, then by the above proposition, X and Y are not decoupled conditioned on Z, and an edge is added between xi and xj in the graph. [sent-168, score-0.325]
63 Algorithm 1 Estimating the coupling graph of a function. [sent-170, score-0.278]
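The listing of Algorithm 1 is not reproduced in this summary, so the following is only a hedged sketch of the procedure described above: for each pair (x_i, x_j), posit a few values of Z and of x_j, compare the grid-search minimizers over x_i, and add an edge when they differ. The probe values, tolerance, grid, and toy function are illustrative assumptions, and only the single direction described in the text is tested.

```python
import itertools
import numpy as np

# Hedged sketch of the structure-estimation step: for each pair (x_i, x_j),
# posit a few settings of the remaining variables Z and of x_j, minimize over
# x_i with a grid search, and add an edge when the minimizer moves.
# Probe values, tolerance, grid, and the toy objective are illustrative.
def estimate_coupling_graph(f, n_vars, grid, probes=(-0.5, 0.0, 0.7), tol=1e-9):
    edges = set()
    for i, j in itertools.combinations(range(n_vars), 2):
        for z_val in probes:                      # one shared setting for Z
            base = [z_val] * n_vars
            minimizers = []
            for y_val in probes:                  # vary Y = {x_j}
                point = list(base)
                point[j] = y_val
                objs = []
                for x_val in grid:                # grid search over X = {x_i}
                    point[i] = x_val
                    objs.append(f(point))
                minimizers.append(grid[int(np.argmin(objs))])
            if max(minimizers) - min(minimizers) > tol:
                edges.add((i, j))                 # the minimizer moved: not decoupled
                break
    return edges

grid = np.linspace(-2.0, 2.0, 41)
f_toy = lambda x: (x[0] - x[1]) ** 2 + x[1] ** 2 * x[2] ** 2 + x[2] ** 2
print(estimate_coupling_graph(f_toy, 3, grid))   # expect {(0, 1), (1, 2)}
```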
64 6 Experiments We evaluate a two-step process for global optimization: first estimating decoupling between variables using the algorithm of Section 5, then optimizing with this structure using an algorithm from Section 4. [sent-185, score-0.689]
65 Whenever Algorithm 1 detects tree-structured decoupling, we use the tree optimizer of Section 4.1. [sent-186, score-0.306]
66 Otherwise we either construct a junction tree and apply the junction tree optimizer of Section 4.2 if the junction tree is thin, or we approximate the graph with a maximum spanning tree and apply the tree solver of Section 4.1. [sent-188, score-0.993] [sent-189, score-0.715]
68 (2004) (a biologically inspired randomized algorithm), and MEGA Hazen & Gupta (2009) (a multiresolution search strategy with numerically computed gradients). [sent-193, score-0.139]
69 As the subordinate optimizer for Algorithm 1, we use a simple grid search for all our experiments. [sent-196, score-0.719]
70 As the subordinate optimizer for the algorithms of Section 4, we experiment with grid search and the aforementioned state-of-the-art global optimizers. [sent-197, score-0.777]
71 Since our method does not iterate, we vary the number of function calls its subordinate optimizer invokes (when the subordinate optimizer is grid search, we vary the grid resolution). [sent-201, score-1.284]
72 The experiments demonstrate that using grid search as a subordinate strategy is sufficient to produce better solutions than all the other global optimizers we evaluated. [sent-202, score-0.653]
73 Table 1: Value of the iterates of the functions of Table 2 after 10,000 function evaluations (for our approach, this includes the function evaluations for structure learning). [sent-206, score-0.284]
74 In these experiments, the subordinate grid search of Algorithm 1 discretized each dimension into four discrete values. [sent-238, score-0.585]
75 The algorithms of Section 4 also used grid search as a subordinate optimizer. [sent-239, score-0.518]
76 For this grid search, each dimension was discretized into $GR = \left(E_{\max}/N_{mc}\right)^{1/S_{mc}}$ discrete values, where $E_{\max}$ is a cap on the number of function evaluations to perform, $S_{mc}$ is the size of the largest clique in the junction tree, and $N_{mc}$ is the number of nodes in the junction tree. [sent-240, score-1.058]
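A quick arithmetic check of this rule, as reconstructed above, with made-up junction-tree sizes:

```python
# Arithmetic check of the grid-resolution rule as reconstructed above,
# GR = (Emax / Nmc) ** (1 / Smc), with made-up junction-tree sizes.
Emax = 10_000   # cap on the total number of function evaluations
Smc = 3         # size of the largest clique in the junction tree
Nmc = 5         # number of cliques (nodes) in the junction tree
GR = (Emax / Nmc) ** (1.0 / Smc)
print(int(GR))  # about 12 values per dimension, so Nmc * GR**Smc stays near Emax
```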
77 Figure 1 shows that in all cases, Algorithm 1 recovered decoupling structure exactly even for very coarse grids. [sent-241, score-0.596]
78 [Figure 1 axis residue: panels for the Colville and Rosenbrock test functions; top x-axis shows the number of evaluations.] [sent-257, score-0.142] [sent-267, score-0.142]
80 The plots show percentage of incorrectly recovered edges in the coupling graph on four synthetic cost functions as a function of the grid resolution (bottom x-axis) and the number of function evaluations (top x-axis). [sent-273, score-0.692]
81 [Figure axis label: face detector classification error (%).] Algorithm 1 was run with a grid search as a subordinate optimizer with three discrete values along the continuous dimensions. [sent-295, score-0.789]
82 It invoked 90 function evaluations and produced a coupling graph wherein the first three of the above parameters formed a clique and the remaining two parameters were decoupled from the others. [sent-296, score-0.593]
83 To evaluate the accuracy of our method under different numbers of function invocations, we varied the grid resolution from 2 to 12. [sent-301, score-0.227]
84 These experiments demonstrate how a grid search can help overcome local minima that cause FIPS and Direct Search to get stuck. [sent-303, score-0.27]
85 Algorithm 1 was run with a grid search as the subordinate optimizer, discretizing the search space into four discrete values along each dimension. [sent-320, score-0.695]
86 This results in a graph that admits no thin junction tree, so we approximate it with a maximal spanning tree. [sent-321, score-0.441]
87 We then apply the tree optimizer of Section 4.1, using as subordinate optimizers Direct Search, FIPS, and grid search (with five discrete values along each dimension). [sent-323, score-0.629]
88 After a total of roughly 300 function evaluations, the tree optimizer with FIPS produces parameters that result in a classification error of 29. [sent-324, score-0.306]
89 The tree optimizer with Direct Search and grid search as subordinate optimizers resulted in error rates of 31. [sent-329, score-0.929]
90 In this application, the proposed method enjoys only modest gains of ∼2% because the variables are tightly coupled, as indicated by the denseness of the graph and the thickness of the junction tree. [sent-332, score-0.419]
91 Algorithm 1 with grid search as a subordinate optimizer with a grid resolution of three discrete values along each dimension found no coupling between the hyperparameters. [sent-344, score-1.178]
92 We carried out these one-dimensional optimizations with Direct Search, FIPS, and grid search (discretizing each dimension into 100 values). [sent-347, score-0.27]
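A hedged sketch of such independent one-dimensional searches on a made-up separable objective (the hyperparameter names, ranges, defaults, and objective are illustrative stand-ins):

```python
import numpy as np

# Hedged sketch: when Algorithm 1 reports an empty coupling graph, every
# hyperparameter can be tuned by its own one-dimensional grid search.
# The parameter names, ranges, defaults, and toy objective are illustrative.
def tune_independently(objective, ranges, defaults, n_points=100):
    best = dict(defaults)
    for name, (lo, hi) in ranges.items():
        grid = np.linspace(lo, hi, n_points)
        scores = [objective({**best, name: v}) for v in grid]
        best[name] = grid[int(np.argmin(scores))]
    return best

# Toy separable objective: each hyperparameter has its own optimum.
obj = lambda p: (p["lr"] - 0.1) ** 2 + (p["reg"] - 1.0) ** 2
print(tune_independently(obj, {"lr": (0.0, 1.0), "reg": (0.0, 5.0)},
                         {"lr": 0.5, "reg": 0.5}))  # close to lr=0.1, reg=1.0
```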
93 7 Conclusion We quantified the coupling between variables of optimization in a way that parallels the notion of independence in statistics. [sent-356, score-0.436]
94 This lets us identify decoupling between variables in cases where the function does not factorize, making it strictly stronger than the notion of decoupling in statistical estimation. [sent-357, score-1.252]
95 This type of decoupling is also easier to evaluate empirically. [sent-358, score-0.551]
96 Despite these differences, this notion of decoupling allows us to migrate to global optimization many of the message passing algorithms that were developed to leverage factorization in statistics and optimization. [sent-359, score-0.958]
97 These include belief propagation and the junction tree algorithm. [sent-360, score-0.396]
98 We show empirically that optimizing cost functions by applying these algorithms to an empirically estimated decoupling structure outperforms existing black box optimization procedures that rely on numerical gradients, deterministic space carving, or biologically inspired searches. [sent-361, score-0.728]
99 Notably, we observe that it is advantageous to decompose optimization problems into a sequence of small deterministic grid searches using this technique, as opposed to employing existing black box optimizers directly. [sent-362, score-0.395]
100 After running these experiments, we discovered a result of Fan & Lin (2007) showing that optimizing the macro-average F-measure is equivalent to optimizing per-category F-measure, thereby validating decoupling structure recovered by Algorithm 1. [sent-367, score-0.66]
wordName wordTfidf (topN-words)
[('decoupling', 0.551), ('fips', 0.307), ('junction', 0.291), ('subordinate', 0.248), ('optimizer', 0.201), ('coupling', 0.198), ('grid', 0.165), ('evaluations', 0.142), ('mega', 0.134), ('cliques', 0.115), ('decoupled', 0.106), ('tree', 0.105), ('search', 0.105), ('nz', 0.101), ('graph', 0.08), ('optimizers', 0.077), ('xci', 0.077), ('xpi', 0.077), ('notion', 0.074), ('factorization', 0.073), ('xi', 0.07), ('aji', 0.067), ('mceliece', 0.067), ('clique', 0.067), ('factorize', 0.066), ('tune', 0.065), ('resolution', 0.062), ('global', 0.058), ('migrate', 0.058), ('direct', 0.056), ('xj', 0.053), ('xn', 0.052), ('optimization', 0.052), ('minutes', 0.052), ('conditioned', 0.049), ('variables', 0.048), ('message', 0.048), ('minimizers', 0.047), ('recovered', 0.045), ('arguments', 0.045), ('passing', 0.044), ('levy', 0.044), ('argmin', 0.042), ('face', 0.041), ('thin', 0.041), ('searches', 0.04), ('node', 0.039), ('program', 0.039), ('bacchus', 0.038), ('carving', 0.038), ('codecs', 0.038), ('colville', 0.038), ('elusive', 0.038), ('hazen', 0.038), ('invalidating', 0.038), ('keeney', 0.038), ('nmc', 0.038), ('parallels', 0.038), ('powell', 0.038), ('raiffa', 0.038), ('rastrigin', 0.038), ('tague', 0.038), ('discretizing', 0.038), ('gr', 0.037), ('detector', 0.036), ('box', 0.035), ('nodes', 0.035), ('discrete', 0.034), ('rosenbrock', 0.034), ('recognizer', 0.034), ('multiresolution', 0.034), ('perttunen', 0.034), ('grove', 0.034), ('mendes', 0.034), ('discretized', 0.033), ('leaf', 0.033), ('parent', 0.033), ('procedures', 0.032), ('optimizing', 0.032), ('dynamic', 0.032), ('gist', 0.031), ('distributive', 0.031), ('emax', 0.031), ('argminx', 0.031), ('commutative', 0.031), ('darwiche', 0.031), ('mirrors', 0.029), ('pearl', 0.029), ('spanning', 0.029), ('classi', 0.029), ('exhibit', 0.028), ('resulted', 0.028), ('xp', 0.028), ('vary', 0.028), ('lets', 0.028), ('programs', 0.027), ('objective', 0.027), ('srinivas', 0.026), ('independence', 0.026), ('black', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 274 nips-2011-Structure Learning for Optimization
Author: Shulin Yang, Ali Rahimi
Abstract: We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. The solutions of these are subsequently combined with message passing algorithms. We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. To develop these methods, we introduce a notion of coupling between variables of optimization. This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. 1
2 0.094591752 157 nips-2011-Learning to Search Efficiently in High Dimensions
Author: Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang
Abstract: High dimensional similarity search in large scale databases becomes an important challenge due to the advent of Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. Consequently such an embedding has to be used with linear scan or another search algorithm. Hence learning to hash does not directly address the search efficiency issue. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large scale search. Our approach takes both search quality and computational cost into consideration. Specifically, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. The output of this search forest can be efficiently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efficiency. Experimental results show that our approach significantly outperforms the start-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees).
3 0.092932895 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
4 0.092219204 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
5 0.086340979 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data
Author: Konrad Koerding, Ian Stevenson
Abstract: Synaptic plasticity underlies learning and is thus central for development, memory, and recovery from injury. However, it is often difficult to detect changes in synaptic strength in vivo, since intracellular recordings are experimentally challenging. Here we present two methods aimed at inferring changes in the coupling between pairs of neurons from extracellularly recorded spike trains. First, using a generalized bilinear model with Poisson output we estimate time-varying coupling assuming that all changes are spike-timing-dependent. This approach allows model-based estimation of STDP modification functions from pairs of spike trains. Then, using recursive point-process adaptive filtering methods we estimate more general variation in coupling strength over time. Using simulations of neurons undergoing spike-timing dependent modification, we show that the true modification function can be recovered. Using multi-electrode data from motor cortex we then illustrate the use of this technique on in vivo data. 1
6 0.070730783 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
7 0.068331294 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
8 0.067968011 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
9 0.066434942 55 nips-2011-Collective Graphical Models
10 0.064640231 226 nips-2011-Projection onto A Nonnegative Max-Heap
11 0.061954394 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
12 0.060433015 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
13 0.058226518 114 nips-2011-Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation
14 0.053126901 30 nips-2011-Algorithms for Hyper-Parameter Optimization
15 0.053034671 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
16 0.05174597 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
17 0.04961586 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
18 0.04880359 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
19 0.048733298 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
20 0.047876287 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
topicId topicWeight
[(0, 0.162), (1, 0.037), (2, -0.052), (3, 0.007), (4, -0.021), (5, -0.065), (6, -0.068), (7, -0.039), (8, 0.011), (9, -0.112), (10, -0.021), (11, 0.021), (12, -0.124), (13, 0.057), (14, -0.011), (15, 0.055), (16, 0.04), (17, -0.031), (18, -0.017), (19, 0.002), (20, -0.026), (21, -0.004), (22, 0.02), (23, 0.026), (24, -0.007), (25, 0.04), (26, 0.003), (27, -0.032), (28, -0.009), (29, 0.048), (30, 0.01), (31, -0.018), (32, -0.009), (33, 0.038), (34, 0.002), (35, 0.015), (36, -0.054), (37, -0.005), (38, 0.001), (39, 0.002), (40, 0.026), (41, 0.067), (42, -0.057), (43, 0.003), (44, 0.002), (45, -0.085), (46, -0.069), (47, -0.026), (48, -0.068), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.92946351 274 nips-2011-Structure Learning for Optimization
Author: Shulin Yang, Ali Rahimi
Abstract: We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. The solutions of these are subsequently combined with message passing algorithms. We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. To develop these methods, we introduce a notion of coupling between variables of optimization. This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. 1
2 0.72543508 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
Author: Fabio Vitale, Nicolò Cesa-bianchi, Claudio Gentile, Giovanni Zappella
Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, S HAZOO, which is nearly optimal on any weighted tree. Moreover, we show that S HAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that S HAZOO performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods. 1
3 0.68580502 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
Author: Yusuke Watanabe
Abstract: While loopy Belief Propagation (LBP) has been utilized in a wide variety of applications with empirical success, it comes with few theoretical guarantees. Especially, if the interactions of random variables in a graphical model are strong, the behaviors of the algorithm can be difficult to analyze due to underlying phase transitions. In this paper, we develop a novel approach to the uniqueness problem of the LBP fixed point; our new “necessary and sufficient” condition is stated in terms of graphs and signs, where the sign denotes the types (attractive/repulsive) of the interaction (i.e., compatibility function) on the edge. In all previous works, uniqueness is guaranteed only in the situations where the strength of the interactions are “sufficiently” small in certain senses. In contrast, our condition covers arbitrary strong interactions on the specified class of signed graphs. The result of this paper is based on the recent theoretical advance in the LBP algorithm; the connection with the graph zeta function.
4 0.66563857 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graph, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
5 0.65808135 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
Author: Animashree Anandkumar, Vincent Tan, Alan S. Willsky
Abstract: We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation based on conditional mutual information thresholding. This simple local algorithm requires only low-order statistics of the data and decides whether two nodes are neighbors in the unknown graph. We identify graph families for which the proposed algorithm has low sample and computational complexities. Under some transparent assumptions, we establish that the proposed algorithm is structurally consistent (or sparsistent) when the number of samples scales as $n = \Omega(J_{\min}^{-4} \log p)$, where p is the number of nodes and $J_{\min}$ is the minimum edge potential. We also develop novel non-asymptotic techniques for obtaining necessary conditions for graphical model selection. Keywords: Graphical model selection, high-dimensional learning, local-separation property, necessary conditions, typical sets, Fano’s inequality.
6 0.63932258 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
7 0.63546395 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
8 0.6229021 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
9 0.58064932 67 nips-2011-Data Skeletonization via Reeb Graphs
10 0.57314068 157 nips-2011-Learning to Search Efficiently in High Dimensions
11 0.57140058 226 nips-2011-Projection onto A Nonnegative Max-Heap
12 0.56747442 150 nips-2011-Learning a Distance Metric from a Network
13 0.56509703 158 nips-2011-Learning unbelievable probabilities
14 0.55384433 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
15 0.55274266 177 nips-2011-Multi-armed bandits on implicit metric spaces
16 0.54985934 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
17 0.5350492 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
18 0.51811218 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
19 0.50662154 130 nips-2011-Inductive reasoning about chimeric creatures
20 0.5044843 213 nips-2011-Phase transition in the family of p-resistances
topicId topicWeight
[(0, 0.025), (4, 0.05), (20, 0.03), (26, 0.021), (31, 0.056), (33, 0.03), (43, 0.079), (45, 0.1), (48, 0.013), (57, 0.069), (65, 0.023), (74, 0.049), (77, 0.014), (83, 0.048), (95, 0.263), (99, 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.76859659 274 nips-2011-Structure Learning for Optimization
Author: Shulin Yang, Ali Rahimi
Abstract: We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. The solutions of these are subsequently combined with message passing algorithms. We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. To develop these methods, we introduce a notion of coupling between variables of optimization. This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. 1
2 0.5522688 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
Author: Kamiar R. Rad, Liam Paninski
Abstract: Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.
3 0.55142349 258 nips-2011-Sparse Bayesian Multi-Task Learning
Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau
Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1
4 0.5492363 276 nips-2011-Structured sparse coding via lateral inhibition
Author: Arthur D. Szlam, Karol Gregor, Yann L. Cun
Abstract: This work describes a conceptually simple method for structured sparse coding and dictionary design. Supposing a dictionary with K atoms, we introduce a structure as a set of penalties or interactions between every pair of atoms. We describe modifications of standard sparse coding algorithms for inference in this setting, and describe experiments showing that these algorithms are efficient. We show that interesting dictionaries can be learned for interactions that encode tree structures or locally connected structures. Finally, we show that our framework allows us to learn the values of the interactions from the data, rather than having them pre-specified. 1
5 0.54831815 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation
Author: Patrick O. Perry, Michael W. Mahoney
Abstract: Recently, Mahoney and Orecchia demonstrated that popular diffusion-based procedures to compute a quick approximation to the first nontrivial eigenvector of a data graph Laplacian exactly solve certain regularized Semi-Definite Programs (SDPs). In this paper, we extend that result by providing a statistical interpretation of their approximation procedure. Our interpretation will be analogous to the manner in which $\ell_2$-regularized or $\ell_1$-regularized $\ell_2$-regression (often called Ridge regression and Lasso regression, respectively) can be interpreted in terms of a Gaussian prior or a Laplace prior, respectively, on the coefficient vector of the regression problem. Our framework will imply that the solutions to the Mahoney-Orecchia regularized SDP can be interpreted as regularized estimates of the pseudoinverse of the graph Laplacian. Conversely, it will imply that the solution to this regularized estimation problem can be computed very quickly by running, e.g., the fast diffusion-based PageRank procedure for computing an approximation to the first nontrivial eigenvector of the graph Laplacian. Empirical results are also provided to illustrate the manner in which approximate eigenvector computation implicitly performs statistical regularization, relative to running the corresponding exact algorithm. 1
6 0.54714763 186 nips-2011-Noise Thresholds for Spectral Clustering
7 0.54674053 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
8 0.54605329 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
9 0.54571056 219 nips-2011-Predicting response time and error rates in visual search
10 0.54510862 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
11 0.54474694 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
12 0.54471868 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
13 0.5446105 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
14 0.54459071 265 nips-2011-Sparse recovery by thresholded non-negative least squares
15 0.54382783 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
16 0.54318756 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
17 0.54314971 171 nips-2011-Metric Learning with Multiple Kernels
18 0.54300576 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
19 0.54257643 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
20 0.54241985 227 nips-2011-Pylon Model for Semantic Segmentation