nips nips2010 nips2010-58 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ronny Luss, Saharon Rosset, Moni Shahar
Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 10⁵ variables and 10⁷ constraints.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. [sent-12, score-1.117]
2 We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. [sent-13, score-0.367]
3 Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 10⁵ variables and 10⁷ constraints. [sent-15, score-1.007]
4 Define G as the family of isotonic functions, that is, g ∈ G satisfies x1 ⪯ x2 ⇒ g(x1) ≤ g(x2), where the partial order ⪯ here will usually be the standard Euclidean one, i.e., x1 ⪯ x2 if x1j ≤ x2j for every coordinate j. [sent-23, score-0.789]
5 Given these definitions, isotonic regression solves f̂ = arg min_{g∈G} ‖y − g(x)‖² (1). [sent-26, score-0.942]
6 As many authors have noted, the optimal solution to this problem comprises a partitioning of the space X into regions obeying a monotonicity property, with a constant value of f̂ fitted in each region. [sent-27, score-0.291]
7 It is clear that isotonic regression is a very attractive model for situations where monotonicity is a reasonable assumption, but other common assumptions like linearity or additivity are not. [sent-28, score-0.914]
8 Practicality of isotonic regression has already been demonstrated in various fields and in this paper we focus on algorithms for computing isotonic regressions on large problems. [sent-30, score-1.701]
9 An equivalent formulation of L2 isotonic regression seeks an optimal isotonic fit ŷi at every point by solving: minimize ∑_{i=1}^{n} (ŷi − yi)² subject to ŷi ≤ ŷj ∀(i, j) ∈ I (2), where I denotes a set of isotonic constraints. [sent-31, score-2.835]
10 Problem (2) is a quadratic program subject to simple linear constraints, and, according to a literature review, appears to be largely ignored due to computational difficulty on large problems. [sent-35, score-0.181]
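For small instances, problem (2) can nevertheless be handed directly to an off-the-shelf QP solver. Below is a minimal sketch using the cvxpy modeling library as an illustrative baseline, not the paper's method; `constraints` stands in for the set I of pairs (i, j).

```python
import cvxpy as cp
import numpy as np

def isotonic_qp(y, constraints):
    """Solve problem (2) with a generic QP solver; small instances only.

    `constraints` is the set I of pairs (i, j) encoding yhat[i] <= yhat[j].
    """
    y = np.asarray(y, dtype=float)
    yhat = cp.Variable(y.shape[0])
    cons = [yhat[i] <= yhat[j] for (i, j) in constraints]
    cp.Problem(cp.Minimize(cp.sum_squares(yhat - y)), cons).solve()
    return yhat.value
```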
11 The discussion of isotonic regression originally focused on the case x ∈ R, where ⪯ denoted a complete order [4]. [sent-37, score-0.889]
12 For this case, the well known pooled adjacent violators algorithm (PAVA) efficiently solves the isotonic regression problem. [sent-38, score-0.942]
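For reference, a minimal sketch of PAVA for the 1-D complete-order case follows; the function name `pava` and the block-merging bookkeeping are a standard textbook rendering, not code from the paper.

```python
import numpy as np

def pava(y, w=None):
    """Pool Adjacent Violators for 1-D isotonic regression (complete order).

    Returns the fit minimizing the (weighted) squared error subject to
    fit[0] <= fit[1] <= ... <= fit[n-1].
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi); weights.append(wi); counts.append(1)
        # Merge blocks while the last one violates monotonicity with its predecessor.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, c2 = means.pop(), weights.pop(), counts.pop()
            m1, w1, c1 = means.pop(), weights.pop(), counts.pop()
            wt = w1 + w2
            means.append((w1 * m1 + w2 * m2) / wt)
            weights.append(wt)
            counts.append(c1 + c2)
    # Expand block means back to per-observation fitted values.
    return np.concatenate([np.full(c, m) for m, c in zip(means, counts)])
```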
13 Problem (1) can also be treated as a separable quadratic program subject to simple linear inequality constraints. [sent-41, score-0.181]
14 Algorithms in [9] related to those described here were applied to scheduling reorder intervals in production systems; they have complexity O(n⁴), and connections to isotonic regression can be made through [1]. [sent-44, score-1.022]
15 An even better complexity of O(n log n) can be obtained for the optimal solution when the isotonic constraints take a special structure such as a tree. [sent-48, score-0.949]
16 1 Contribution Our novel approach to isotonic regression offers an exact solution of (1) with a complexity bounded by O(n⁴), but acts on the order of O(n³) for practical problems. [sent-52, score-0.988]
17 The main goal of this paper is to make isotonic regression a reasonable computational tool for large data sets, as the monotonicity assumption underlying this framework is natural in many real-world applications. [sent-54, score-0.889]
18 Our framework solves quadratic programs with 2 × 10⁵ variables and more than 10⁷ constraints, a problem size not previously addressed in the isotonic regression literature, and with the decomposition detailed below, even larger problems can be solved. [sent-55, score-1.104]
19 Section 2 describes a partitioning algorithm for isotonic regression and proves convergence to the globally optimal solution. [sent-57, score-1.143]
20 Section 4 demonstrates that the partitioning algorithm performs significantly better in practice than its O(n⁴) worst-case complexity suggests. [sent-59, score-0.236]
21 A and B are then said to be isotonic blocks (or obey isotonicity). [sent-68, score-0.871]
22 A group of nodes X majorizes (minorizes) another group Y if X ⪰ Y (X ⪯ Y). [sent-69, score-0.308]
23 A group X is a majorant (minorant) of X ∪ A, where A = ∪_{i=1}^{k} Ai, if X ⪰ Ai (X ⪯ Ai) ∀i = 1, . . . , k. [sent-70, score-0.208]
24 2 Partitioning Algorithm We first describe the structure of the classic L2 isotonic regression problem and continue to detail the partitioning algorithm. [sent-74, score-1.075]
25 The section concludes by proving convergence of the algorithm to the globally optimal isotonic regression solution. [sent-75, score-0.957]
26 1 Structure Problem (2) is a quadratic program subject to simple linear constraints. [sent-77, score-0.181]
27 Observations are divided into k groups where the fits in each group take the group mean observation value. [sent-79, score-0.236]
28 This can be seen through the following Karush-Kuhn-Tucker (KKT) conditions: (a) ŷi = yi − ½(∑_{j:(i,j)∈I} λij − ∑_{j:(j,i)∈I} λji); (b) ŷi ≤ ŷj ∀(i, j) ∈ I; (c) λij ≥ 0 ∀(i, j) ∈ I; (d) λij(ŷi − ŷj) = 0 ∀(i, j) ∈ I. [sent-80, score-0.288]
29 Hence λij can be non-zero only within blocks of the isotonic solution that have the same fitted value ŷ. [sent-82, score-0.891]
30 Thus, we get the familiar characterization of the isotonic regression problem as one of finding a division into isotonic blocks. [sent-87, score-1.678]
31 2 Partitioning In order to take advantage of the optimal solution’s structure, we propose solving the isotonic regression problem (2) as a sequence of subproblems that divides a group of nodes into two groups at each iteration. [sent-89, score-1.304]
32 An important property of our partitioning approach is that nodes separated at one iteration are never rejoined into the same group in future iterations. [sent-90, score-0.423]
33 We now describe the partitioning criterion used for each subproblem. [sent-92, score-0.186]
34 Suppose a current block V is optimal, and thus ŷi* = ȳV ∀i ∈ V, where ȳV denotes the mean of the observations in V. [sent-93, score-0.19]
35 From condition (a) of the KKT conditions, we define the net outflow of a group V as ∑_{i∈V} (yi − ŷi*). [sent-94, score-0.199]
36 Finding two groups within V such that the net outflow from the higher group is greater than the net outflow from the lower group should be infeasible, according to the KKT conditions. [sent-95, score-0.284]
37 Let CV denote the set of all feasible (i.e., isotonic) cuts through the network defined by the nodes in V. [sent-99, score-0.212]
38 A cut is called isotonic if the two blocks created by the cut are isotonic. [sent-100, score-1.081]
39 The optimal cut is determined as the cut that solves max_{c∈CV} ∑_{i∈Vc⁺} (yi − ȳV) − ∑_{i∈Vc⁻} (yi − ȳV) (3), where Vc⁻ (Vc⁺) is the group on the lower (upper) side of the edges of cut c. [sent-101, score-0.536]
40 In terms of isotonic regression, the optimal cut is the one that maximizes the difference between the sums of the normalized fits (yi − ȳV) over the two groups. [sent-102, score-1.069]
41 The optimal cut problem (3) can also be written as the binary program: maximize ∑_{i∈V} xi (yi − ȳV) subject to xi ≤ xj ∀(i, j) ∈ I and xi ∈ {−1, +1} ∀i ∈ V; relaxing the integrality to −1 ≤ xi ≤ 1 gives the linear program (5) referred to below. [sent-104, score-0.364]
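As an illustration, the LP relaxation above can be solved for small instances with a generic solver such as scipy's `linprog`; this is not the network-flow approach used in the paper, and the rounding of the solution to ±1 is a simplifying safeguard.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_cut_lp(z, constraints):
    """LP relaxation of the cut subproblem: maximize sum_i z[i]*x[i] subject to
    x[i] <= x[j] for (i, j) in I and -1 <= x[i] <= 1.

    x = -1 marks the lower group, x = +1 the upper group.
    """
    n = len(z)
    A_ub = np.zeros((len(constraints), n))
    for row, (i, j) in enumerate(constraints):
        A_ub[row, i] = 1.0   # encodes x_i - x_j <= 0
        A_ub[row, j] = -1.0
    res = linprog(c=-np.asarray(z, dtype=float),            # maximize z^T x
                  A_ub=A_ub if constraints else None,
                  b_ub=np.zeros(len(constraints)) if constraints else None,
                  bounds=[(-1.0, 1.0)] * n, method="highs")
    return np.where(res.x > 0, 1, -1)
```

The box-constrained LP has ±1 vertex solutions, so the thresholding above mainly guards against solver tolerances.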
42 This group-wise partitioning operation is the basis for our partitioning algorithm which is explicitly given in Algorithm 1. [sent-106, score-0.372]
43 The algorithm starts with all observations in a single group ({1, . . . , n}) and recursively splits each group optimally by solving subproblem (5). [sent-112, score-0.207]
44 At each iteration, a list C of potential optimal cuts for each group generated thus far is maintained, and the cut among them with the highest objective value is performed. [sent-113, score-0.278]
45 As proven next, this algorithm terminates with the optimal global (isotonic) solution to the isotonic regression problem (2). [sent-118, score-0.969]
46 4: for all v ∈ {w−, w+} do 5: Set zi = yi − ȳv ∀i ∈ v, where ȳv is the mean of the observations in v. [sent-132, score-0.303]
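A minimal structural sketch of the recursive partitioning loop follows. It is simplified relative to Algorithm 1: it splits any group whose best cut has a strictly positive value instead of maintaining the global list C of candidate cuts, and `solve_cut` is an assumed helper returning the value and the two sides of the optimal isotonic cut.

```python
import numpy as np

def partition_isotonic(y, solve_cut):
    """Recursive partitioning sketch. `solve_cut(group)` is assumed to return
    (value, lower, upper) for the optimal isotonic cut of `group`; value <= 0
    (or an empty side) means the group is already an optimal block."""
    blocks, active = [], [list(range(len(y)))]
    while active:
        group = active.pop()
        value, lower, upper = solve_cut(group)
        if value <= 0 or not lower or not upper:
            blocks.append(group)           # group is an optimal block
        else:
            active.extend([lower, upper])  # split; nodes are never rejoined
    fit = np.empty(len(y), dtype=float)
    for block in blocks:
        fit[block] = np.mean([y[i] for i in block])  # block mean is the fit
    return fit
```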
47 3 Convergence Theorem 1 next states the main result that allows for a no-regret partitioning algorithm for isotonic regression. [sent-140, score-0.975]
48 Theorem 1 Assume a group V is a union of blocks from the optimal solution to problem (2). [sent-145, score-0.26]
49 Then a cut made by solving (5) does not cut through any block in the global optimal solution. [sent-146, score-0.381]
50 Define M1 (MK ) to be a minorant (majorant) block in M. [sent-149, score-0.24]
51 Since the algorithm starts with the full group {1, . . . , n}, which is a union of (all) optimal blocks, we can conclude from this theorem that partitions never cut an optimal block. [sent-159, score-0.295]
52 3 Efficient solutions of the subproblems Linear program (5) has a special structure that can be taken advantage of in order to solve larger problems faster. [sent-163, score-0.212]
53 We first show why these problems can be solved faster than typical linear programs, followed by a novel decomposition of the structure that allows problems of extremely large size to be solved efficiently. [sent-164, score-0.193]
54 The network flow constraints are identical to those in (6) below, but the objective is the sum of squares ∑_{i=1}^{n} (si² + ti²), which, to the authors’ knowledge, currently still precludes this dual from being efficiently solved with special network algorithms. [sent-167, score-0.253]
55 While this structure does not directly help solve the quadratic program, the network structure allows the linear program for the subproblems to be solved very efficiently. [sent-168, score-0.379]
56 The dual program to (5) is: minimize ∑_{i∈V} (si + ti) subject to ∑_{j:(i,j)∈I} λij − ∑_{j:(j,i)∈I} λji − si + ti = zi ∀i ∈ V and λ, s, t ≥ 0 (6), where again zi = yi − ȳV. [sent-169, score-0.376]
57 Linear program (6) is a network flow problem with |V| + 2 nodes and |I| + 2|V| arcs. [sent-170, score-0.282]
58 The network flow problem here minimizes the total sum of flow over links from the source and into the sink, with the goal of leaving zi units of flow at each node i ∈ V. [sent-172, score-0.223]
59 Note that this is very similar to the network flow problem solved in [14], where zi represents the classification performance on node i. [sent-173, score-0.238]
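To make the flow formulation concrete, here is a hedged sketch that sets up (6) as a min-cost flow with networkx. The construction (an auxiliary root node standing in for both source and sink, demands scaled to integers) is one possible encoding, not the paper's implementation, and recovering the primal cut from the optimal flow is not shown.

```python
import networkx as nx

def cut_dual_flow(z, constraints, scale=10**6):
    """Set up the dual (6) as a min-cost flow problem.

    z[i] = y_i - group mean, so sum(z) == 0. Demands are scaled and rounded to
    integers (network-simplex codes expect integral data) and the last one is
    adjusted so they still sum to zero. Slack variables s_i, t_i become
    unit-cost arcs to/from the "root" node; isotonic arcs (i, j) in I carry
    the lambda_ij flow at zero cost.
    """
    d = [int(round(zi * scale)) for zi in z]
    d[-1] = -sum(d[:-1])                  # restore exact balance after rounding
    G = nx.DiGraph()
    G.add_node("root", demand=0)
    for i, di in enumerate(d):
        G.add_node(i, demand=-di)         # inflow - outflow must equal -z_i
        G.add_edge("root", i, weight=1)   # s_i
        G.add_edge(i, "root", weight=1)   # t_i
    for (i, j) in constraints:
        G.add_edge(i, j, weight=0)        # lambda_ij, uncapacitated
    return nx.min_cost_flow(G)
```

Since z sums to zero over the group, the node demands balance and the flow problem is feasible by construction.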
60 The problem can first be reduced by considering the nodes with zi < 0 (zi ≥ 0), where zi = yi − ȳV, that are bounded only from above (below). [sent-182, score-0.338]
61 It is trivial to observe that these nodes will equal −1 (+1) in the optimal solution, and that eliminating them does not affect the solution of (5) on the remaining nodes. [sent-183, score-0.275]
62 The main idea is that it can be solved through a sequence of smaller linear programs that reduce the total size of the full linear program on each iteration. [sent-187, score-0.228]
63 Consider a minorant group of nodes J ⊆ V and the subset of arcs IJ ⊆ I connecting them. [sent-188, score-0.402]
64 Solving problem (5) on this reduced network with the original input z divides the nodes in J into a lower and upper group, denoted JL and JU . [sent-189, score-0.209]
65 In addition, the same problem solved on the remaining nodes in V \ JL will give the optimal solution for these nodes. [sent-191, score-0.217]
66 Proposition 3 Let J ⊆ V be a minorant group of nodes in V. [sent-193, score-0.377]
67 The optimal solution for the remaining nodes (V \ J ) can be found by solving (5) over only those nodes. [sent-196, score-0.231]
68 The same claims can be made when J ⊆ V is a majorant group of nodes in V, where instead wi* = +1 ⇒ xi* = +1 ∀i ∈ J. [sent-197, score-0.322]
69 Clearly, Problem (5) over the nodes in W has the solution with all variables equal to −1. [sent-200, score-0.219]
70 Problem (5) can be written in the following form with a separable objective: maximize ∑_{i∈W} zi xi + ∑_{i∈V\W} zi xi subject to xi ≤ xj ∀(i, j) ∈ I, i, j ∈ W; xi ≤ xj ∀(i, j) ∈ I, i ∈ V, j ∈ V \ W; −1 ≤ xi ≤ 1 ∀i ∈ V (7). Start with an initial solution xi = 1 ∀i ∈ V. [sent-201, score-0.382]
71 First, an upper triangular adjacency matrix C can be constructed to represent I, where Cij = 1 if xi ≤ xj is an isotonic constraint and Cij = 0 otherwise. [sent-209, score-0.816]
72 A minorant (majorant) subtree with k nodes is then constructed as the upper left (lower right) k × k sub-matrix of C. [sent-210, score-0.313]
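The sketch below illustrates this bookkeeping. The helper names and the assumption that nodes are already indexed consistently with the partial order (so every constraint (i, j) has i < j) are illustrative choices, not the paper's code.

```python
import numpy as np

def adjacency_matrix(n, constraints):
    """Upper-triangular 0/1 matrix C with C[i, j] = 1 iff (i, j) is in I.
    Assumes nodes are indexed consistently with the partial order (i < j)."""
    C = np.zeros((n, n), dtype=np.int8)
    for (i, j) in constraints:
        C[i, j] = 1
    return C

def minorant_subtree(C, k):
    """Upper-left k x k block of C read back as a small constraint set;
    the majorant case would use the lower-right k x k block instead."""
    sub = C[:k, :k]
    nodes = list(range(k))
    arcs = [(i, j) for i in range(k) for j in range(k) if sub[i, j]]
    return nodes, arcs
```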
73 1: while |V| ≥ MAXSIZE do 2: ELIMINATE A MINORANT SET OF NODES: 3: Build a minorant subtree T. [sent-219, score-0.199]
74 5: L = L ∪ {v ∈ T : ŷv = −1}, V = V \ {v ∈ T : ŷv = −1}. [sent-221, score-0.234]
75 9: U = U ∪ {v ∈ T : ŷv = +1}, V = V \ {v ∈ T : ŷv = +1}. [sent-224, score-0.234]
76 12: L = L ∪ {v ∈ T : ŷv = −1}, U = U ∪ {v ∈ T : ŷv = +1}. [sent-226, score-0.234]
77 The computational bottleneck of Algorithm 2 is solving linear program (5), which is done efficiently by solving the dual network flow problem (6). [sent-227, score-0.286]
78 This shows that, if the first network flow problem is too large to solve, it can be solved by a sequence of smaller network flow problems as illustrated in Figure 1. [sent-228, score-0.228]
79 In the worst case, many network flow problems will be solved until the original full-size network flow problem is solved. [sent-230, score-0.259]
80 The result follows from repeated application of Proposition 3 over the set of nodes V that has not yet been optimally solved for. [sent-235, score-0.222]
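The following sketch captures the overall structure of this decomposition; `solve_cut` and `subtree` are hypothetical helpers, and the loop is a simplified rendering of Algorithm 2 rather than a faithful implementation.

```python
def decompose_cut(z, solve_cut, subtree, maxsize):
    """Simplified structure of the decomposition (Algorithm 2).

    `solve_cut(nodes)` is assumed to return a dict node -> {-1, +1} for the cut
    subproblem restricted to `nodes`; `subtree(nodes, kind)` returns a minorant
    or majorant subtree of at most `maxsize` nodes. By Proposition 3, nodes
    assigned -1 on a minorant subtree (+1 on a majorant subtree) keep that
    value in the full solution, so they are fixed and removed.
    """
    lower, upper = set(), set()
    active = set(range(len(z)))
    while len(active) > maxsize:
        progressed = False
        for kind, fix_val, target in (("minorant", -1, lower),
                                      ("majorant", +1, upper)):
            T = subtree(active, kind)
            x = solve_cut(T)
            fixed = {v for v in T if x[v] == fix_val}
            progressed = progressed or bool(fixed)
            target |= fixed
            active -= fixed
        if not progressed:       # no node could be fixed; solve what remains
            break
    x = solve_cut(active)        # final, now small, cut problem
    lower |= {v for v in active if x[v] == -1}
    upper |= {v for v in active if x[v] == +1}
    return lower, upper
```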
81 4 Complexity of the partitioning algorithm Linear program (5) can be solved in O(n³) using interior point methods. [sent-236, score-0.389]
82 Consider the case of balanced partitioning at each iteration until there are n final blocks. [sent-240, score-0.212]
83 In this case, we can represent the partitioning path as a binary tree with log n levels; at each level k, LP (5) is solved 2ᵏ times on instances of size n/2ᵏ, which leads to a total complexity of ∑_{k=0}^{log n} 2ᵏ (n/2ᵏ)³ = n³ ∑_{k=0}^{log n} (1/4)ᵏ ≤ n³/(1 − 1/4) ≈ 1.33n³. [sent-241, score-0.329]
84 Hence, in this case, the partitioning algorithm has complexity O(1.33n³) = O(n³). [sent-245, score-0.243]
85 (Figure 1 caption) LP (5) solved on the entire set of nodes in the first panel may be too large for memory. [sent-354, score-0.179]
86 Hence subproblems are solved on the lower-left (red dots) and upper-right (green dots) portions of the network, and some nodes are fixed from the solutions of these subproblems. [sent-355, score-0.284]
87 5 Numerical experiments Here we demonstrate that exact isotonic regression is computationally tractable for very large problems, and compare against the time it takes to compute an approximation. [sent-384, score-0.889]
88 We first show the computational performance of isotonic regression on simulated data sets as large as 2 × 10⁵ training points with more than 10⁷ constraints. [sent-385, score-0.974]
89 We then show the favorable predictive performance of isotonic regression on large simulated data sets. [sent-386, score-0.963]
90 1 Large-Scale Computations Figure 2 demonstrates that the partitioning algorithm with decompositions of the partitioning step can solve very large isotonic regressions. [sent-388, score-1.245]
91 The left figure shows that, even for very large problems, the partitioning algorithm finds the globally optimal isotonic regression solution in not much more time than it takes to compute the approximation of [6]. [sent-391, score-1.185]
92 More partitions (left axis) require solving more network flow problems; however, as discussed, these problems shrink very quickly along the partitioning path, resulting in the practical complexity seen in the figure on the left. [sent-395, score-0.431]
93 2 Predictive Performance Here we show that isotonic regression is a useful tool when the data fits the monotonic framework. [sent-409, score-0.919]
94 A comparison is made between isotonic regression and linear least squares regression. [sent-414, score-0.91]
95 5000 training points are sufficient to fit the model well with up to 4 dimensions, after which linear regression outperforms isotonic regression, and 50000 training points fit the model well with up to 5 dimensions. [sent-416, score-1.008]
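For intuition only, a toy 1-D comparison can be run with scikit-learn, whose IsotonicRegression implements PAVA for the complete order rather than the multivariate partial orders studied here; the data-generating choices below are illustrative, not the paper's simulation setup.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LinearRegression

# Toy 1-D illustration: a monotone but nonlinear signal with additive noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 3.0, size=500))
y = np.exp(x) + rng.normal(scale=2.0, size=500)

iso = IsotonicRegression(out_of_bounds="clip").fit(x, y)
lin = LinearRegression().fit(x.reshape(-1, 1), y)

x_test = np.linspace(0.0, 3.0, 200)
y_true = np.exp(x_test)
mse_iso = np.mean((iso.predict(x_test) - y_true) ** 2)
mse_lin = np.mean((lin.predict(x_test.reshape(-1, 1)) - y_true) ** 2)
print(f"isotonic MSE: {mse_iso:.2f}   linear MSE: {mse_lin:.2f}")
```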
96 6 Conclusion This paper demonstrates that isotonic regression can be used to solve extremely large problems. [sent-483, score-0.948]
97 Indeed, isotonic regression as done here performs with a complexity of O(n³) in practice. [sent-485, score-0.946]
98 As also shown, isotonic regression performs well at reasonable dimensions, but suffers from overfitting as the dimension of the data increases. [sent-486, score-0.889]
99 Statistical complexity of the models generated by partitioning will be examined. [sent-488, score-0.243]
100 Furthermore, similar results will be developed for isotonic regression with different loss functions. [sent-489, score-0.889]
wordName wordTfidf (topN-words)
[('isotonic', 0.789), ('partitioning', 0.186), ('minorant', 0.166), ('ow', 0.153), ('irp', 0.148), ('yv', 0.117), ('cut', 0.116), ('nodes', 0.114), ('majorant', 0.111), ('mk', 0.106), ('regression', 0.1), ('program', 0.097), ('group', 0.097), ('partitions', 0.08), ('yi', 0.078), ('block', 0.074), ('pava', 0.074), ('zi', 0.073), ('network', 0.071), ('mse', 0.066), ('solved', 0.065), ('subproblems', 0.063), ('blocks', 0.06), ('complexity', 0.057), ('kkt', 0.056), ('aviv', 0.055), ('isotonicity', 0.055), ('tel', 0.055), ('lp', 0.053), ('solves', 0.053), ('vc', 0.047), ('ij', 0.043), ('optimally', 0.043), ('groups', 0.042), ('solution', 0.042), ('interior', 0.041), ('optimal', 0.038), ('favorable', 0.038), ('simplex', 0.038), ('solving', 0.037), ('axsize', 0.037), ('burdakov', 0.037), ('gpav', 0.037), ('moni', 0.037), ('sysoev', 0.037), ('simulated', 0.036), ('observations', 0.035), ('jl', 0.033), ('subtree', 0.033), ('saharon', 0.032), ('reorder', 0.032), ('val', 0.032), ('subject', 0.032), ('ls', 0.032), ('tted', 0.032), ('worst', 0.031), ('quadratic', 0.031), ('solve', 0.031), ('varies', 0.03), ('monotonic', 0.03), ('globally', 0.03), ('axis', 0.03), ('subproblem', 0.03), ('node', 0.029), ('demonstrates', 0.028), ('mosek', 0.028), ('cuts', 0.027), ('xi', 0.027), ('yj', 0.027), ('points', 0.027), ('sink', 0.026), ('bold', 0.026), ('iteration', 0.026), ('monotonicity', 0.025), ('arcs', 0.025), ('decompositions', 0.025), ('net', 0.024), ('cv', 0.024), ('divides', 0.024), ('ts', 0.024), ('programs', 0.024), ('links', 0.024), ('eliminating', 0.023), ('cij', 0.023), ('constraints', 0.023), ('intervals', 0.023), ('demonstrated', 0.023), ('dual', 0.023), ('union', 0.023), ('practice', 0.022), ('obey', 0.022), ('proposition', 0.022), ('training', 0.022), ('problems', 0.021), ('linear', 0.021), ('trivial', 0.021), ('dots', 0.021), ('variables', 0.021), ('path', 0.021), ('monotone', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
Author: Ronny Luss, Saharon Rosset, Moni Shahar
Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 105 variables and 107 constraints.
2 0.12759221 181 nips-2010-Network Flow Algorithms for Structured Sparsity
Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski
Abstract: We consider a class of learning problems that involve a structured sparsityinducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems. 1
3 0.087864466 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
Author: Lauren Hannah, Warren Powell, David M. Blei
Abstract: In this paper we study convex stochastic optimization problems where a noisy objective function value is observed after a decision is made. There are many stochastic optimization problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state s. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel based weights and Dirichlet process based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems. 1
4 0.078342862 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
5 0.077359572 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
Author: Tamir Hazan, Raquel Urtasun
Abstract: In this paper we propose an approximated structured prediction framework for large scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation and derive an efficient messagepassing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to learn efficiently graphical models with cycles and very large number of parameters. 1
6 0.075861514 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
7 0.073055848 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts
8 0.071374021 155 nips-2010-Learning the context of a category
9 0.068972446 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering
10 0.063747764 208 nips-2010-Policy gradients in linearly-solvable MDPs
11 0.063461594 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
12 0.063170001 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
13 0.057889163 117 nips-2010-Identifying graph-structured activation patterns in networks
14 0.055554364 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression
15 0.051382933 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
16 0.050344642 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
17 0.050101008 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
18 0.049862783 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
19 0.048846494 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
20 0.048390701 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
topicId topicWeight
[(0, 0.152), (1, 0.032), (2, 0.057), (3, 0.041), (4, -0.041), (5, -0.076), (6, -0.02), (7, 0.064), (8, 0.063), (9, 0.002), (10, -0.034), (11, -0.041), (12, -0.028), (13, 0.071), (14, -0.027), (15, -0.059), (16, -0.04), (17, -0.042), (18, -0.026), (19, -0.066), (20, 0.078), (21, -0.04), (22, 0.03), (23, 0.019), (24, 0.007), (25, -0.045), (26, -0.071), (27, 0.019), (28, -0.015), (29, -0.008), (30, -0.039), (31, 0.033), (32, 0.065), (33, -0.012), (34, -0.062), (35, 0.047), (36, -0.014), (37, -0.038), (38, 0.01), (39, 0.053), (40, -0.053), (41, 0.029), (42, -0.128), (43, -0.07), (44, 0.023), (45, -0.007), (46, 0.01), (47, -0.085), (48, 0.002), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.92267776 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
Author: Ronny Luss, Saharon Rosset, Moni Shahar
Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 105 variables and 107 constraints.
2 0.78539175 181 nips-2010-Network Flow Algorithms for Structured Sparsity
Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski
Abstract: We consider a class of learning problems that involve a structured sparsityinducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems. 1
3 0.64430338 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem
Author: Gilbert Leung, Novi Quadrianto, Kostas Tsioutsiouliklis, Alex J. Smola
Abstract: We present a fast online solver for large scale parametric max-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize tier arrangement of over 84 million web pages on a layered set of caches to serve an incoming query stream optimally. 1
4 0.59579682 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
Author: Nikos Karampatziakis
Abstract: We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. In general, inference in segmentation models, such as semi-CRFs, can be cubic in the length of the sequence. By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands bytes long. Furthermore, we introduce two loss functions which are appropriate for our problem and show how to use structural SVMs to optimize the learned mapping for these losses. Finally, we present experimental results that demonstrate the advantages of our method against a strong baseline. 1
5 0.58855665 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods
Author: Katya Scheinberg, Shiqian Ma, Donald Goldfarb
Abstract: Gaussian graphical models are of great interest in statistical learning. Because the conditional independencies between different nodes correspond to zero entries in the inverse covariance matrix of the Gaussian distribution, one can learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, by solving a convex maximum likelihood problem with an ℓ1 -regularization term. In this paper, we propose a first-order method based on an alternating linearization technique that exploits the problem’s special structure; in particular, the subproblems solved in each iteration have closed-form solutions. Moreover, our algorithm obtains an ϵ-optimal solution in O(1/ϵ) iterations. Numerical experiments on both synthetic and real data from gene association networks show that a practical version of this algorithm outperforms other competitive algorithms. 1
6 0.57681417 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
7 0.56485003 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
8 0.54998642 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
9 0.53697091 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
10 0.53678393 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts
11 0.52498066 63 nips-2010-Distributed Dual Averaging In Networks
12 0.49659094 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
13 0.47471488 234 nips-2010-Segmentation as Maximum-Weight Independent Set
14 0.45932066 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
16 0.45199338 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models
17 0.43485901 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
18 0.41871589 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
19 0.41781527 223 nips-2010-Rates of convergence for the cluster tree
20 0.4166964 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
topicId topicWeight
[(13, 0.049), (17, 0.012), (27, 0.044), (30, 0.436), (35, 0.012), (45, 0.181), (50, 0.051), (52, 0.026), (60, 0.035), (77, 0.049), (90, 0.013)]
simIndex simValue paperId paperTitle
1 0.99038583 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume log 1 and curvature is unavoidable. Whether the known lower bound of O( k + 2 δ ) 2 for the sample complexity of Empirical Risk minimization on k−means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of 2 log 1 log4 k log 1 O( k2 + 2 δ ) to O k min k, 2 + 2 δ . Based on these results, we 2 devise a simple algorithm for k−means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
2 0.9774043 264 nips-2010-Synergies in learning words and their referents
Author: Mark Johnson, Katherine Demuth, Bevan Jones, Michael J. Black
Abstract: This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information. The models themselves are novel kinds of Adaptor Grammars that are an extension of an embedding of topic models into PCFGs. These models simultaneously segment phoneme sequences into words and learn the relationship between non-linguistic objects to the words that refer to them. We show (i) that modelling inter-word dependencies not only improves the accuracy of the word segmentation but also of word-object relationships, and (ii) that a model that simultaneously learns word-object relationships and word segmentation segments more accurately than one that just learns word segmentation on its own. We argue that these results support an interactive view of language acquisition that can take advantage of synergies such as these. 1
3 0.95117289 283 nips-2010-Variational Inference over Combinatorial Spaces
Author: Alexandre Bouchard-côté, Michael I. Jordan
Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1
same-paper 4 0.89276123 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
Author: Ronny Luss, Saharon Rosset, Moni Shahar
Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 105 variables and 107 constraints.
5 0.85744846 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori
Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance. 1
6 0.83948249 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
7 0.79697794 220 nips-2010-Random Projection Trees Revisited
8 0.7368294 222 nips-2010-Random Walk Approach to Regret Minimization
9 0.72744203 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars
10 0.70795137 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
11 0.70579666 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
12 0.69806534 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
13 0.68624288 285 nips-2010-Why are some word orders more common than others? A uniform information density account
14 0.68194044 155 nips-2010-Learning the context of a category
15 0.68127733 221 nips-2010-Random Projections for $k$-means Clustering
16 0.67687875 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
17 0.6717177 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
18 0.66639906 233 nips-2010-Scrambled Objects for Least-Squares Regression
19 0.66371179 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
20 0.65840167 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case